news scrapper
images
myenv
static
templates
train_data
train_scripts
.gitignore
Procfile
README.md
app.py
count_vectorizer.sav
finalized_model.sav
polish.stopwords.txt
requirements.txt

README.md

Scrappy news

This is a simple news scraper with a machine learning algorithm that categorizes news headlines. It has been uploaded as a beta version.

You can try it HERE

How does it work?

  • First, it queries the https://newsapi.org API for top news from Poland. The method retrieves each headline's title and URL.
  • Then each headline is converted into a vector of word counts by a count vectorizer that I fitted beforehand.
  • This yields an encoded vector for each headline title.
  • The encoded vectors are sparse matrices, which the model accepts as input for prediction.
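
The first steps above can be sketched with the standard library only. This is a rough illustration, not the code from app.py: the function names, the sample payload, and the toy vocabulary are all made up for the example. It assumes the NewsAPI top-headlines endpoint with `country=pl`, and it stands in a hand-written word counter for the fitted count vectorizer that the project stores in count_vectorizer.sav.

```python
import json
import re
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

NEWSAPI_URL = "https://newsapi.org/v2/top-headlines"


def build_request_url(api_key, country="pl"):
    """Build the top-headlines request URL for one country."""
    return NEWSAPI_URL + "?" + urlencode({"country": country, "apiKey": api_key})


def fetch_payload(api_key):
    """Download and decode the JSON response (needs network and a valid key)."""
    with urlopen(build_request_url(api_key)) as response:
        return json.load(response)


def extract_headlines(payload):
    """Keep only the title and URL of each article in the response."""
    return [(a["title"], a["url"]) for a in payload.get("articles", [])]


def encode_title(title, vocabulary):
    """Encode a title as word counts over a fixed vocabulary (word -> column)."""
    # Tokens are runs of two or more word characters, lowercased.
    counts = Counter(re.findall(r"\w\w+", title.lower()))
    vector = [0] * len(vocabulary)
    for word, n in counts.items():
        if word in vocabulary:  # words unseen at fit time are ignored
            vector[vocabulary[word]] = n
    return vector


# Hypothetical sample data, not real NewsAPI output.
sample_payload = {
    "status": "ok",
    "articles": [
        {"title": "Mecz Polska Niemcy: Polska wygrywa", "url": "https://example.com/1"},
        {"title": "Nowy budżet państwa", "url": "https://example.com/2"},
    ],
}
# Toy vocabulary standing in for the one learned by the fitted vectorizer.
toy_vocabulary = {"budżet": 0, "mecz": 1, "polska": 2}

headlines = extract_headlines(sample_payload)
vectors = [encode_title(title, toy_vocabulary) for title, _ in headlines]
print(vectors)
```

In the project itself the vocabulary would come from the previously fitted vectorizer, and the resulting rows would be handed to the saved model for prediction. The dense lists here are only for readability; a real count vectorizer typically returns a sparse matrix.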

In detail:

The main and only screen

Features in progress

  • Add more news categories
  • Consider removing stopwords
  • Lemmatize headlines
  • Gather more data from various sources
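
As a rough idea of what the planned stopword removal could look like, here is a minimal sketch. The helper names are hypothetical, and it assumes that polish.stopwords.txt holds one word per line, which the repository does not confirm.

```python
def load_stopwords(lines):
    """Build a stopword set from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(title, stopwords):
    """Drop stopwords from a title, preserving the order of the other words."""
    return " ".join(w for w in title.split() if w.lower() not in stopwords)


# Hypothetical excerpt of a stopword list. With the real file this could be:
#   with open("polish.stopwords.txt", encoding="utf-8") as f:
#       stopwords = load_stopwords(f)
stopwords = load_stopwords(["i\n", "w\n", "na\n", "\n"])
cleaned = remove_stopwords("Protest w Warszawie i na Śląsku", stopwords)
print(cleaned)
```

Filtering would have to happen both when fitting the vectorizer and when encoding new headlines, otherwise the two vocabularies would not match.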