This API lets users store a text and request a summary of it.
Typical workflow:
- send a POST request to store text in a database and get a text id
- send a GET request to retrieve the text corresponding to a text id
- send a GET request to get a summary of the text corresponding to a text id
For the summarization service, the API uses the gensim implementation of the TextRank algorithm (extractive summarization). See the documentation here: https://radimrehurek.com/gensim_3.8.3/auto_examples/tutorials/run_summarization.html
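To give a feel for what extractive summarization with TextRank does, here is a minimal, self-contained sketch. It is not gensim's implementation and will not reproduce the API's summaries: the naive sentence splitter, the length-normalised word-overlap similarity, and the names `split_sentences`, `similarity` and `toy_textrank_summary` are simplifying assumptions made for illustration only.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def similarity(words_a, words_b):
    """Number of shared distinct words, normalised by the log sentence lengths."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator, since log(1) == 0
    overlap = len(set(words_a) & set(words_b))
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def toy_textrank_summary(text, ratio=0.2, damping=0.85, iterations=50):
    """Return the top-ranked sentences of `text`, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return ""
    words = [re.findall(r"\w+", s.lower()) for s in sentences]

    # Weighted graph: one node per sentence, edge weight = similarity.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = similarity(words[i], words[j])
    out_sum = [sum(row) for row in weights]

    # Fixed number of iterations of the weighted PageRank update.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    # Keep roughly `ratio` of the sentences, and at least one.
    keep = max(1, round(n * ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return "\n".join(sentences[i] for i in sorted(top))
```

The key property this illustrates is that the summary is made only of sentences copied from the input (extractive), selected because they are well connected to the other sentences.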
First, clone this project:
$ git clone https://github.com/hamzameur/summarization_api.git
Then change into the summarization_api directory:
$ cd summarization_api
If you have Docker installed, you simply need to run the following commands:
$ sudo docker build . -t summarization_api
Once you have made sure the image has been successfully built, run the following command:
$ sudo docker run -p 5000:5000 -d summarization_api
If you use poetry, you can run the following command in the summarization_api directory to install the virtual environment (Python version 3.8.11):
$ poetry install
Once the environment is installed, run the following commands to start the API (feel free to change the port):
$ export FLASK_APP=summarization_api/views.py
$ poetry run python -m flask run --host=0.0.0.0 --port=5000
You should see output similar to the following (the last address depends on your machine):
* Serving Flask app 'summarization_api/views.py'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://192.168.1.83:5000
Press CTRL+C to quit
If you use virtualenv, you can run the following commands in the summarization_api directory (Python version 3.8.11):
$ export FLASK_APP=summarization_api/views.py
$ python3 -m venv env
$ source env/bin/activate
$ python -m pip install -r requirements.txt
$ python -m flask run --host=0.0.0.0 --port=5000
You should see output similar to the following (the last address depends on your machine):
* Serving Flask app 'summarization_api/views.py'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://192.168.1.83:5000
Press CTRL+C to quit
The API implements the following three features:
- store text in a database and get a text id via a POST call to the /texts service. You should provide the text in the text parameter in the form.
- retrieve the text corresponding to a text id via a GET call to the /texts/{textId} service
- get a summary of the text corresponding to a text id via a GET call to the /texts/{textId}/summarize service
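As an alternative to curl, the three calls listed above can be sketched in Python with the standard library alone. This is a hedged illustration, not part of the project: the helper names (`build_store_request`, `build_retrieve_request`, `build_summarize_request`, `send`) are hypothetical, and the base URL assumes the API is running locally on port 5000.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: API running locally on port 5000


def build_store_request(text, base_url=BASE_URL):
    """POST /texts with the text in the `text` form parameter."""
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/texts", data=body, method="POST")


def build_retrieve_request(text_id, base_url=BASE_URL):
    """GET /texts/{textId}."""
    safe_id = urllib.parse.quote(text_id, safe="")
    return urllib.request.Request(f"{base_url}/texts/{safe_id}", method="GET")


def build_summarize_request(text_id, base_url=BASE_URL):
    """GET /texts/{textId}/summarize."""
    safe_id = urllib.parse.quote(text_id, safe="")
    return urllib.request.Request(
        f"{base_url}/texts/{safe_id}/summarize", method="GET"
    )


def send(request):
    """Send a request and decode the JSON body (requires the API to be running)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server up, `send(build_store_request("Hello, world !"))` would return the decoded JSON response, whose textId can then be passed to the other two helpers.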
Once the API is running (see the instructions in section 1), you can try out the following requests:
$ curl -d "text=Hello, world !" -X POST http://localhost:5000/texts
Output:
{"textId":"da1180d4c2cec7514b8f9707719e6fce5d872e393860b54a848b4060a8463a7c"}
You can retrieve the text you just saved using the following GET call:
$ curl http://localhost:5000/texts/da1180d4c2cec7514b8f9707719e6fce5d872e393860b54a848b4060a8463a7c
Output:
{"text":"Hello, world !","textId":"da1180d4c2cec7514b8f9707719e6fce5d872e393860b54a848b4060a8463a7c"}
We have provided an example of a larger text in the sample_text.txt file for demo purposes:
$ curl -d "@sample_text.txt" -X POST http://localhost:5000/texts
Output:
{"textId":"7f52f7e7c217848eaa291544784fc44735ac0cd14988cb4f432fad4754d2c49d"}
To get a summary of this text with the default parameters, run the following command:
$ curl http://localhost:5000/texts/7f52f7e7c217848eaa291544784fc44735ac0cd14988cb4f432fad4754d2c49d/summarize
Output:
{
"summary":"Morpheus awakens Neo to the real world, a ravaged wasteland where most of humanity have been captured by a race of machines that live off of the humans' body heat and electrochemical energy and who imprison their minds within an artificial reality known as the Matrix.",
"summaryParameters":{"ratio":0.2},
"textId":"7f52f7e7c217848eaa291544784fc44735ac0cd14988cb4f432fad4754d2c49d"
}
To get a summary with a ratio of 0.19, run the following command (the URL is quoted so that the shell does not interpret the ? character):
$ curl "http://localhost:5000/texts/7f52f7e7c217848eaa291544784fc44735ac0cd14988cb4f432fad4754d2c49d/summarize?ratio=0.19"
Output:
{
"summary":"Morpheus awakens Neo to the real world, a ravaged wasteland where most of humanity have been captured by a race of machines that live off of the humans' body heat and electrochemical energy and who imprison their minds within an artificial reality known as the Matrix.",
"summaryParameters":{"ratio":0.19},
"textId":"7f52f7e7c217848eaa291544784fc44735ac0cd14988cb4f432fad4754d2c49d"
}
To get a summary with a word count of 100, run the following command (the URL is quoted so that the shell does not interpret the ? character):
$ curl "http://localhost:5000/texts/7f52f7e7c217848eaa291544784fc44735ac0cd14988cb4f432fad4754d2c49d/summarize?wordCount=100"
Output:
{
"summary":"By day he is an average computer programmer and by night a hacker known as Neo. Neo has always questioned his reality, but the truth is far beyond his imagination.\nMorpheus awakens Neo to the real world, a ravaged wasteland where most of humanity have been captured by a race of machines that live off of the humans' body heat and electrochemical energy and who imprison their minds within an artificial reality known as the Matrix.\nAs a rebel against the machines, Neo must return to the Matrix and confront the agents: super-powerful computer programs devoted to snuffing out Neo and the entire human rebellion.",
"summaryParameters":{"wordCount":100},
"textId":"7f52f7e7c217848eaa291544784fc44735ac0cd14988cb4f432fad4754d2c49d"
}
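The query strings used in the summarize requests above can also be built programmatically. The following sketch uses only the standard library; the function name `build_summarize_url` and the default base URL are assumptions for illustration, while the parameter names ratio and wordCount are the ones shown in the examples.

```python
from urllib.parse import urlencode


def build_summarize_url(text_id, ratio=None, word_count=None,
                        base_url="http://localhost:5000"):
    """Build the /texts/{textId}/summarize URL with optional query parameters."""
    params = {}
    if ratio is not None:
        params["ratio"] = ratio
    if word_count is not None:
        params["wordCount"] = word_count
    url = f"{base_url}/texts/{text_id}/summarize"
    query = urlencode(params)
    # Only append "?" when there is at least one parameter.
    return f"{url}?{query}" if query else url
```

Leaving both arguments as None produces the plain summarize URL, which corresponds to the request with default parameters.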
- if you provide both ratio and wordCount parameters, the ratio will be used and wordCount will be ignored
- if you provide a ratio equal to zero or a non-float ratio, then the default value of 0.2 will be used
- if you provide a word count equal to zero or a non-int word count, then the default value of 10 will be used
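The rules above can be summarised as a small function. This is only a sketch of the behaviour as described here, not the project's actual code: the name `resolve_summary_parameters` is hypothetical, and the case where neither parameter is given is assumed to fall back to a ratio of 0.2, as suggested by the default-parameters example output.

```python
DEFAULT_RATIO = 0.2
DEFAULT_WORD_COUNT = 10


def resolve_summary_parameters(ratio=None, word_count=None):
    """Map raw query-string values (strings or None) to the parameters used."""
    if ratio is not None:
        # ratio takes precedence: wordCount is ignored when both are given.
        try:
            value = float(ratio)
        except (TypeError, ValueError):
            value = DEFAULT_RATIO  # non-float ratio
        return {"ratio": value if value != 0 else DEFAULT_RATIO}
    if word_count is not None:
        try:
            value = int(word_count)
        except (TypeError, ValueError):
            value = DEFAULT_WORD_COUNT  # non-int word count
        return {"wordCount": value if value != 0 else DEFAULT_WORD_COUNT}
    # Assumption: no parameter at all means the default ratio.
    return {"ratio": DEFAULT_RATIO}
```

Negative values are passed through unchanged in this sketch, since the rules above do not say how they are handled.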
To run the tests, run the following command:
$ poetry run pytest
or, if you would like to see the coverage for each module:
$ poetry run pytest --cov=summarization_api