Automated journalist recommendation for media coverage with Nearest Neighbour search.

rubyruins/pressmatch

newsentity 📰

A dashboard for analysing entities (people and organisations) mentioned in recent news articles, and for comparing media coverage of them by sentiment.

It uses the LatestNews API to fetch news content. The deployed website is at https://newsentity.herokuapp.com/.


Features:

  • Named entities (person or organisation) can be filtered by number to obtain graphs and wordclouds.
  • Selecting a particular entity finds other trending topics in the news related to it, and groups the news sources covering the topic to measure the intensity of their sentiment using the AFINN lexicon.
  • The sentiment.py file creates the dashboard by loading the relevant files. It uses NLTK's part-of-speech (POS) tagger and chunks the tokens by their POS tags to find named entities.
  • The ner.ipynb file contains previous attempts to solve the problem by using scikit-learn's CountVectorizer.
  • The dashboard currently works only on a small subset of data for the purpose of this experiment. Extending it to cover daily news articles is left as future work.
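
As a rough illustration of chunking by POS tags, here is a minimal sketch that merges runs of consecutive proper-noun tags into candidate entities. The chunking rule actually used in sentiment.py is not shown in this README, so the rule, the `chunk_entities` helper and the hand-tagged sample are all assumptions for illustration; it is written in plain Python so it runs without downloading any NLTK data.

```python
from itertools import groupby

# Penn Treebank tags for proper nouns (singular and plural).
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def chunk_entities(tagged_tokens):
    """Merge each run of consecutive proper-noun tokens into one candidate entity.

    Hypothetical helper: an assumed chunking rule, not the project's own code.
    """
    entities = []
    runs = groupby(tagged_tokens, key=lambda pair: pair[1] in PROPER_NOUN_TAGS)
    for is_proper_noun, run in runs:
        if is_proper_noun:
            entities.append(" ".join(word for word, _tag in run))
    return entities


# Hand-tagged sample sentence. In practice the (word, tag) pairs would come
# from NLTK, e.g. nltk.pos_tag(nltk.word_tokenize(text)).
tagged = [
    ("Angela", "NNP"), ("Merkel", "NNP"), ("met", "VBD"),
    ("officials", "NNS"), ("from", "IN"), ("Microsoft", "NNP"),
    ("in", "IN"), ("Berlin", "NNP"), (".", "."),
]

entities = chunk_entities(tagged)
print(entities)  # ['Angela Merkel', 'Microsoft', 'Berlin']
```

Note that this rule also picks up places such as "Berlin"; separating people and organisations from other proper nouns needs an extra step that this sketch leaves out.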

Screenshots:


Tech stack:

  • nltk: POS tagging to perform NER by chunking tokens.
  • pandas: formatting and cleaning the data.
  • afinn: lexicon to measure coverage sentiment.
  • plotly express: visualisations.
  • streamlit: web framework.
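
To show how lexicon-based sentiment can be grouped by news source, here is a small sketch in plain Python. The `TOY_LEXICON` values are made up for illustration and are not real AFINN entries (AFINN assigns words integer scores from -5 to +5); the helper names, the sample headlines and the choice of a per-source mean are likewise assumptions, not the dashboard's actual code.

```python
from collections import defaultdict

# Hypothetical mini-lexicon standing in for AFINN.
# Illustrative values only, NOT taken from the real AFINN word list.
TOY_LEXICON = {"praised": 3, "success": 2, "criticised": -2, "scandal": -3}


def score_text(text):
    """Sum the lexicon scores of all words; unknown words count as 0."""
    words = (word.strip(".,!?").lower() for word in text.split())
    return sum(TOY_LEXICON.get(word, 0) for word in words)


def sentiment_by_source(articles):
    """Return the mean sentiment score per news source."""
    scores = defaultdict(list)
    for source, text in articles:
        scores[source].append(score_text(text))
    return {source: sum(vals) / len(vals) for source, vals in scores.items()}


# Made-up (source, text) pairs about one entity.
articles = [
    ("Source A", "Minister praised for vaccine success."),
    ("Source A", "Minister criticised over delays."),
    ("Source B", "Scandal engulfs minister."),
]

result = sentiment_by_source(articles)
print(result)  # {'Source A': 1.5, 'Source B': -3.0}
```

With the afinn package installed, `score_text` could be swapped for `Afinn().score(text)`, and the grouping could equally be done with a pandas groupby.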

Deployment:

The live project is deployed on https://newsentity.herokuapp.com/.


Local installation:

You must have Python 3.6 or higher to run the application.

  • Create a new virtual environment for running the application. You can follow the instructions here.
  • Navigate to the virtual environment and activate it.
  • Install the dependencies using pip install -r requirements.txt
  • Run the news.py file with streamlit run news.py
