Bright News (backend)

Intro

Django REST API backend for a web app that analyses the positivity of a news site. Links to...

Back-end architecture

The back-end is a Django RESTful API hosted on Heroku's free tier. It's completely separate from the front-end, and its codebase sits in this repo.

Views

There's only one view in this Django project: the API endpoint that the front-end calls. It receives the URL of the news site and returns the analysis data. I set up this view with Django REST framework's generic APIView class.

Models

Two models store all the site submissions, together with the data generated by my analysis:

  1. WebsiteModel - sites that were successfully analysed
  2. FailedWebsiteModel - sites where the back-end couldn't successfully send analysis results back to the front-end

This enables me to see all the data in my Django admin panel and discover gems 😄

Scoring process

Fetching the raw data

The scoring process starts by requesting the URL that was sent to the API. If the request succeeds, it uses Beautiful Soup to fetch all the text and splits it into a list of text pieces according to the HTML elements they came from.
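A minimal sketch of the splitting step, for illustration only. The app itself uses Beautiful Soup; this stand-in uses the standard library's html.parser instead so it runs without extra packages, and it leaves out the HTTP request. `TextPieceParser` and `extract_text_pieces` are hypothetical names, and skipping script and style content is an assumption.

```python
from html.parser import HTMLParser

# Tags whose contents are never visible page text (assumed filter).
SKIPPED_TAGS = {"script", "style"}


class TextPieceParser(HTMLParser):
    """Collects one stripped string per run of text between tags."""

    def __init__(self):
        super().__init__()
        self.pieces = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.pieces.append(text)


def extract_text_pieces(html):
    """Return the visible text of an HTML document as a list of pieces."""
    parser = TextPieceParser()
    parser.feed(html)
    parser.close()
    return parser.pieces
```

Calling `extract_text_pieces(page_html)` on a downloaded page gives the list that the cleansing step then filters. Note that this stand-in also splits at inline tags such as `<b>`, which may differ from how the real code groups text.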

Data cleansing

I remove any text piece that meets any of the following criteria:

  • Includes site-generic terms such as "cookie" or "sign up".
  • Doesn't have enough words; the sentiment models struggle to analyse very short pieces.
  • Is a duplicate of another piece.

I also apply some encoding/decoding black magic using the text_transform function to remove odd characters.
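The three filters above can be sketched roughly as follows. This is an illustration under assumptions, not the repo's code: `clean_pieces` is a hypothetical name, the term list holds only the two examples mentioned, and the minimum word count of 4 is a made-up threshold. The text_transform step is not reproduced here.

```python
GENERIC_TERMS = ("cookie", "sign up")  # only the two examples named above
MIN_WORDS = 4  # hypothetical threshold; the real cut-off is not stated


def clean_pieces(pieces):
    """Drop generic, too-short, and duplicate pieces, keeping first-seen order."""
    seen = set()
    kept = []
    for piece in pieces:
        lowered = piece.lower()
        if any(term in lowered for term in GENERIC_TERMS):
            continue  # site-generic text such as cookie banners
        if len(piece.split()) < MIN_WORDS:
            continue  # too short for a sentiment model to judge
        if piece in seen:
            continue  # exact duplicate of an earlier piece
        seen.add(piece)
        kept.append(piece)
    return kept
```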

Absolute scoring

I selected the VADER and AFINN sentiment analysis libraries for generating my site positivity scores. Both are well-known in the NLP space. I also tested them on a few samples; they seem to be quite reliable and to complement each other.

I go through each piece of text and compare the scores generated by the two libraries. The aggregate score is based on several situations:

  1. Both scores are of the same sign -> aggregate score is +/-1, matching that sign.
  2. One score is 0 while the other is non-0 -> aggregate score is +/-1 depending on the sign of the non-0 score.
  3. The two scores are of opposite signs -> aggregate score is 0 because the models seem unreliable.
  4. The two scores are 0 -> aggregate score is again 0.

You'll notice I convert all scores to +/-1 and 0; using the actual magnitudes didn't produce reliable results for me.
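The four rules above can be sketched as a small function. This is an illustration rather than the repo's code: `sign` and `aggregate_score` are hypothetical names, the inputs are assumed to be one VADER score and one AFINN score for the same text piece, and rule 1 is assumed to yield -1 when both scores are negative (the site score compares +1 pieces against all non-0 pieces, which implies that -1 pieces exist).

```python
def sign(value):
    """Return +1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def aggregate_score(vader_score, afinn_score):
    """Combine two sentiment scores for one text piece into +1, -1, or 0."""
    a, b = sign(vader_score), sign(afinn_score)
    if a == b:
        return a  # rule 1 (same sign) or rule 4 (both zero)
    if a == 0 or b == 0:
        return a + b  # rule 2: follow the non-zero score
    return 0  # rule 3: opposite signs, treated as unreliable
```

Only the signs of the inputs matter, which matches the choice to discard magnitudes.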

Finally, I calculate the entire site's absolute score by dividing the number of (+1) text pieces by the number of non-0 text pieces. But we're not done yet :)
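As a worked sketch of that ratio: `absolute_score` is a hypothetical name, and returning None when every piece scored 0 is an assumption, since the text doesn't say how that case is handled.

```python
def absolute_score(aggregate_scores):
    """Share of +1 pieces among the pieces with a non-zero aggregate score."""
    positives = sum(1 for s in aggregate_scores if s == 1)
    non_zero = sum(1 for s in aggregate_scores if s != 0)
    if non_zero == 0:
        return None  # assumption: no usable pieces means no score
    return positives / non_zero
```

For example, three positive pieces, one negative piece, and any number of 0 pieces give 3 / 4 = 0.75.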

Relative scoring

I retrieve all the absolute scores that are stored in the WebsiteModel and add the site-in-focus' score to the list. I then use pandas to group them by URL and take the mean score for each URL.

With scikit-learn, I scale all the scores so that they're between 0 and 1. 0 represents the least positive score in our pandas DataFrame, while 1 represents the most positive score.
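The grouping and scaling can be illustrated without pandas or scikit-learn. The sketch below is a plain-Python stand-in that takes the mean score per URL and then applies min-max scaling; it is not the repo's code. `relative_scores` is a hypothetical name, the input is assumed to be (url, absolute score) pairs, and mapping every site to 0.0 when all means are equal is an assumption.

```python
from collections import defaultdict
from statistics import mean


def relative_scores(records):
    """Map each URL to its mean score, min-max scaled into [0, 1].

    records: iterable of (url, absolute_score) pairs.
    """
    by_url = defaultdict(list)
    for url, score in records:
        by_url[url].append(score)
    if not by_url:
        return {}

    means = {url: mean(scores) for url, scores in by_url.items()}
    low, high = min(means.values()), max(means.values())
    spread = high - low
    if spread == 0:
        return {url: 0.0 for url in means}  # assumption: all sites tie
    return {url: (m - low) / spread for url, m in means.items()}
```

The least positive site ends up at 0 and the most positive at 1, with every other site placed proportionally in between.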

This scaled dataset is what's sent back to the front-end for the wonderful results page. The end 🥳

DevOps

Running the app locally

Install the Python packages

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

You might encounter issues with installing psycopg2 on Linux (Ubuntu). If so, run the commands below as per the installation guide, replacing X.Y with your installed PostgreSQL version:

sudo apt-get install python3-dev libpq-dev build-essential
export PATH=/usr/lib/postgresql/X.Y/bin/:$PATH

Create a PostgreSQL db.

Create a .env file that stores the following...

DJANGO_ENV=development
SECRET_KEY=create_your_key
DB={"ENGINE": "django.db.backends.postgresql", "NAME": "your_db_name", "USER": "your_db_username", "PASSWORD": "your_db_password!", "HOST": "localhost", "PORT": "5432"}
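The DB value is a JSON object, so one plausible way for the settings module to consume it is to parse it with json.loads. This is an assumption, since the settings file isn't shown here, and `load_databases` is a hypothetical helper. Loading the .env file into the process environment is left out.

```python
import json
import os


def load_databases(environ=os.environ):
    """Build a Django-style DATABASES dict from the DB variable."""
    raw = environ.get("DB")
    if raw is None:
        raise KeyError("DB environment variable is not set")
    # Assumption: the JSON keys map directly onto Django's database options.
    return {"default": json.loads(raw)}
```

In a settings module this could be used as `DATABASES = load_databases()`.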

Set up your database tables

python manage.py migrate

Run the app

python manage.py runserver

Test the app

python manage.py test

GitHub Actions

I set up a workflow that:

  1. Creates a PostgreSQL database service
  2. Tests the single API endpoint with a good request
  3. Tests the API with a bad request

Heroku deployment

heroku login
git push heroku master
heroku ps:scale web=1

Add the environment variables as per the local setup.

License

Licensed under Mozilla Public License 2.0.
