GutenTag is a big-data application that uses LDA (Latent Dirichlet Allocation) and other NLP techniques to automatically tag news articles.
- Constant influx of articles
- Manually tagging each article is a difficult and time-consuming process
- Tags/Keywords are crucial data for search engines
- Tagging helps improve recommendations
- Consume APIs to fetch articles
- Extract metadata (tags and sentiments)
- The sentiment can be used to determine the overall opinion conveyed by the articles
- Index the metadata to provide improved search
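To make the tagging step concrete, here is a minimal sketch of LDA-based tagging using a small collapsed Gibbs sampler in pure Python. It is not GutenTag's actual pipeline: the function names (`tokenize`, `lda_gibbs`, `tag_document`), the stopword list, the hyperparameters, and the toy articles are all assumptions made for illustration, and a real deployment would more likely rely on an established LDA library.

```python
import random
from collections import Counter

# Tiny illustrative stopword list (an assumption, not GutenTag's list).
STOPWORDS = {"the", "a", "of", "in", "on", "and", "to", "for", "as", "after"}


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopwords."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def lda_gibbs(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def tag_document(d, doc_topic, topic_word, num_tags=3):
    """Use the top words of the document's dominant topic as its tags."""
    dominant = max(range(len(doc_topic[d])), key=lambda k: doc_topic[d][k])
    return [w for w, c in topic_word[dominant].most_common(num_tags) if c > 0]


# Toy articles, invented purely for this example.
articles = [
    "The central bank raised interest rates as inflation climbed",
    "Inflation and interest rates worry the bank and investors",
    "The striker scored a late goal in the cup final",
    "Fans cheered the goal as the team won the cup",
]
token_lists = [tokenize(a) for a in articles]
doc_topic, topic_word = lda_gibbs(token_lists, num_topics=2, seed=1)
all_tags = [tag_document(d, doc_topic, topic_word) for d in range(len(articles))]
for article, tags in zip(articles, all_tags):
    print(tags, "<-", article)
```

With a corpus this small the topics are noisy and depend on the seed; the sketch only shows how topic-word counts can be turned into per-article tags.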
- Install the required packages via `pip install -r requirements.txt`
- Ensure you have Docker installed, and run `scripts\run.cmd` (on Windows) or `scripts/run.sh` (on Mac/Linux)
- Run `start.cmd` (on Windows) or `start.sh` (on Mac/Linux)
- Open http://localhost:5000/index.html in your browser
- Click ‘Extract’ on the navbar.
- Enter the umbrella term and click the ‘Extract’ button.
- The process will fetch the raw data from Elasticsearch, run it through the ML pipeline, and store the processed data back into another index on Elasticsearch.
- Go to the ‘Search’ page.
- Enter the search query and click the search button.
- The results will appear in the table.
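The extract and search steps above can be sketched as the two request bodies they would roughly send to Elasticsearch. This is a sketch under stated assumptions, not GutenTag's code: the index name `articles_processed`, the field names (`title`, `content`, `tags`, `sentiment`), and the `enrich` placeholder standing in for the ML pipeline are hypothetical. Only the NDJSON layout of the Bulk API and the `multi_match` query shape are standard Elasticsearch.

```python
import json

PROCESSED_INDEX = "articles_processed"  # hypothetical index name


def enrich(source):
    """Stand-in for the ML pipeline: attach placeholder tags and sentiment."""
    enriched = dict(source)
    enriched["tags"] = sorted(set(source["title"].lower().split()))[:3]
    enriched["sentiment"] = "neutral"
    return enriched


def build_bulk_payload(index_name, hits):
    """Turn search hits into the NDJSON body used by the _bulk endpoint.

    Each document becomes two lines: an action line and the document itself.
    The body must end with a newline.
    """
    lines = []
    for hit in hits:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": hit["_id"]}}))
        lines.append(json.dumps(enrich(hit["_source"])))
    return "\n".join(lines) + "\n" if lines else ""


def build_search_body(query_text, size=10):
    """Build a multi_match query that weights tag matches above body text."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["tags^2", "title", "content"],  # hypothetical fields
            }
        },
    }


# Hits shaped like entries of an Elasticsearch search response (hits.hits).
sample_hits = [
    {"_id": "1", "_source": {"title": "Markets rally on rate cut"}},
    {"_id": "2", "_source": {"title": "Cup final ends in penalties"}},
]
payload = build_bulk_payload(PROCESSED_INDEX, sample_hits)
search_body = build_search_body("rate cut")
print(payload)
print(json.dumps(search_body, indent=2))
```

Building the bodies as plain dictionaries keeps the sketch independent of any particular client version; the same structures could be passed to an HTTP client or an Elasticsearch client library.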