This project analyzes and summarizes facts and figures from news channels over RSS feeds. It is intended to become a standalone framework that aggregates multiple sources for the purpose of:
- Convenient news distribution
- Fact-checking of information from different sectors
- Collection of all facts pertaining to an object
- Making trend analysis easier
Many news channels are being considered as sources of information, including BBC, CNN, Al Jazeera, Times of India, and a few more.
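As a rough illustration of what collecting items from an RSS feed involves, here is a minimal sketch using only the Python standard library. It is not the project's sourcing module: the function name `parse_rss_items` and the sample feed are assumptions made up for this example, and a real feed would be downloaded from the channel's RSS URL first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for the XML a news channel would serve.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>
"""


def parse_rss_items(xml_text):
    """Return a list of dicts (title, link, description) for each RSS <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child element is missing
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


for entry in parse_rss_items(SAMPLE_RSS):
    print(entry["title"], "->", entry["link"])
```

The same function could be applied to each channel's feed in turn, which is the basic shape of aggregating multiple sources.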
The test project is complete, but it has not been optimized or polished. Many optimizations and corner cases remain to be covered. This work will be done only if someone is interested in working with this tool further.
To install, run the provided installer:
sh ubuntu_installer.sh
For execution, refer to the shell script 'nasty.sh', which gives the exact procedure for executing the program. It is also an automated script that can be run directly in a Linux environment:
sh nasty.sh
If you want to execute on multiple servers, the commands are:
- Executing the sourcing module
python3 ./python/sourcing
- Starting the CoreNLP server, tailored for information extraction
java -mx8g -cp "./factserver.jar:../corenlp/*" factserver.Driver
- Running the analyzer
python3 ./python/analysis/analyzer.py
- Make use of MongoDB to check results
- A mockup of one of the three major use cases of this project is available at: NASTY UI
- Sourcing module run
- Analysis module run
- MongoDB view of acquired data
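For checking results in MongoDB, a short script can be more convenient than the shell. The sketch below assumes the third-party `pymongo` driver and a MongoDB server on the default local port. The database name `nasty` and the collection name `facts` are hypothetical placeholders, since the names the analyzer actually writes to are not stated here.

```python
def recent_documents(collection, limit=5):
    """Return up to `limit` documents from a MongoDB collection."""
    # find({}) matches every document; limit() caps how many are returned.
    return list(collection.find({}).limit(limit))


def main():
    # pymongo is a third-party driver (pip install pymongo). It is imported
    # here so the helper above can be used without it.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    # "nasty" and "facts" are hypothetical names; substitute the real ones.
    collection = client["nasty"]["facts"]
    for doc in recent_documents(collection):
        print(doc)
```

Calling `main()` after the analyzer has run prints a handful of stored documents, which is enough to confirm that data is arriving.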
The project is on hold pending the creation of a better NER API, which will be developed in the NLP repository. The primary reason is the lack of accuracy in the Stanford NER model. Significant upgrades to the project can be expected after that.