This project uses a regression model that is trained on the OnlineNewsPopularityClassification.csv dataset and then checks the virality of the scraped data. Virality is judged from various information scraped from the Times of India website.
The file includes these data for evaluation:
After applying sentiment analysis and weighting the relevant words against those found in popular news, a model is built.
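The sentiment and word-weighting step can be sketched as below. This is a minimal illustration, not the project's actual code: the tiny `POSITIVE`/`NEGATIVE` lexicons, the helper names, and the `popular_vocab` mapping (word to weight derived from popular news) are all hypothetical assumptions.

```python
import re

# Hypothetical toy lexicons for illustration only; a real system would use
# a proper sentiment lexicon or library.
POSITIVE = {"wins", "record", "growth", "success", "rally"}
NEGATIVE = {"crash", "loss", "scandal", "protest", "dies"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_polarity(tokens: list[str]) -> float:
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no sentiment words occur."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def popular_word_weight(tokens: list[str], popular_vocab: dict[str, float]) -> float:
    """Sum of the popular-news weights of the article's tokens, divided by its token count."""
    if not tokens:
        return 0.0
    hits = [popular_vocab[t] for t in tokens if t in popular_vocab]
    return sum(hits) / len(tokens)
```

For example, `tokenize("India wins record series")` yields four tokens, two of which are positive and none negative, so `sentiment_polarity` returns 1.0.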
To build the model, features such as the number of tokens and the number of shares are taken from the respective website.
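A minimal sketch of turning one scraped article into token-count features follows. The keys mirror column names from the UCI Online News Popularity data; whether OnlineNewsPopularityClassification.csv uses exactly these names is an assumption, and `extract_features` is a hypothetical helper.

```python
import re


def extract_features(title: str, body: str) -> dict[str, float]:
    """Compute a few token-count features for one scraped article.

    Key names follow the UCI Online News Popularity columns (assumed, not
    confirmed, to match OnlineNewsPopularityClassification.csv).
    """
    title_tokens = re.findall(r"\w+", title.lower())
    body_tokens = re.findall(r"\w+", body.lower())
    n_body = len(body_tokens)
    return {
        "n_tokens_title": float(len(title_tokens)),
        "n_tokens_content": float(n_body),
        # Fraction of distinct words in the body (0.0 for an empty body).
        "n_unique_tokens": len(set(body_tokens)) / n_body if n_body else 0.0,
    }
```

The share count itself is normally the target rather than an input feature, so it is left out of this dictionary.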
Finally, a virality (popularity) score is assigned:
The score lies between 0 and 1, where 0 corresponds to unpopular news and 1 to popular news.
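One way to read the 0 to 1 score is sketched below, under stated assumptions. Training labels are assumed to come from a share threshold (1400 shares, roughly the median of the UCI Online News Popularity data, is a common choice but is not confirmed for this project). The score is modelled as the fraction of trees in a forest that vote "popular", which is roughly how a random forest's predicted probability behaves when its trees are fully grown. All names are hypothetical.

```python
# Assumed threshold; the project may define "popular" differently.
POPULARITY_THRESHOLD = 1400


def label_from_shares(shares: int, threshold: int = POPULARITY_THRESHOLD) -> int:
    """Binary training label: 1 = popular, 0 = not popular."""
    return 1 if shares >= threshold else 0


def score_from_votes(tree_votes: list[int]) -> float:
    """Virality score in [0, 1]: fraction of trees that voted 'popular' (1)."""
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    return sum(tree_votes) / len(tree_votes)


def is_popular(score: float, cutoff: float = 0.5) -> bool:
    """Turn a score into a yes/no decision using an (assumed) 0.5 cutoff."""
    return score >= cutoff
```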
The model was tried with several algorithms, including Logistic Regression, Random Forest Classifier and SVM; the one actively used is RandomForestClassifier because it gave the best results.
The output shows the essentials of the RandomForestClassifier model after it has been trained and tested.
The output shows the labels before and after standardization, along with the model's accuracy.
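Standardization is assumed here to mean z-scoring, as scikit-learn's `StandardScaler` does (subtract the mean, divide by the population standard deviation). The sketch below shows that transform on one column together with a plain accuracy computation; both helpers are illustrative, not the project's code.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Z-score one column: subtract the mean, divide by the population std (ddof=0)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In a real pipeline the mean and standard deviation are computed on the training split only and then reused for the test split.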
This model uses Bayesian Linear Regression to solve the problem and reach the required accuracy.
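For reference, Bayesian linear regression with a Gaussian prior $w \sim \mathcal{N}(0, \alpha^{-1} I)$ and noise precision $\beta$ has the closed-form posterior $S_N^{-1} = \alpha I + \beta \Phi^\top \Phi$, $m_N = \beta S_N \Phi^\top y$. The sketch below applies it to a single input feature with basis $\phi(x) = [1, x]$. The choice of feature, the values of `alpha` and `beta`, and the helper names are illustrative assumptions, not the project's settings.

```python
def fit_bayesian_linear(xs, ys, alpha=1.0, beta=25.0):
    """Posterior mean m_N and covariance S_N for y ~ w0 + w1 * x.

    alpha: prior precision of the weights; beta: noise precision (both assumed).
    """
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    # Posterior precision A = alpha*I + beta*Phi^T Phi = [[a, b], [b, d]].
    a, b, d = alpha + beta * n, beta * sx, alpha + beta * sxx
    det = a * d - b * b
    S = [[d / det, -b / det], [-b / det, a / det]]  # S_N = A^{-1}
    r0, r1 = beta * sy, beta * sxy  # beta * Phi^T y
    m = [S[0][0] * r0 + S[0][1] * r1, S[1][0] * r0 + S[1][1] * r1]
    return m, S


def predict(m, S, x, beta=25.0):
    """Predictive mean and variance at x: m^T phi and 1/beta + phi^T S phi."""
    phi = (1.0, x)
    pred_mean = m[0] * phi[0] + m[1] * phi[1]
    pred_var = 1.0 / beta + sum(phi[i] * S[i][j] * phi[j] for i in range(2) for j in range(2))
    return pred_mean, pred_var
```

A weak prior (small `alpha`) recovers roughly the least-squares fit, while a strong prior shrinks the weights toward zero; the predictive variance is what distinguishes this from ordinary linear regression.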
I also built a website using Django that can later be used to display the popular news on a single site.
Web Scraping of Times of India:
Web Scraping of Hindustan Times:
Web Scraping of The Economist:
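The scraping step can be sketched with Python's standard library alone, as below. The page structure of Times of India, Hindustan Times and The Economist is not described here, so `extract_headlines` simply collects every link's text and `href` and filters on a caller-supplied substring; that filter, and the helper names, are hypothetical. Check each site's terms of use and robots.txt before scraping.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs from every <a href="..."> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if text:
                self.links.append((text, self._href))
            self._href = None


def extract_headlines(html: str, must_contain: str = "") -> list[tuple[str, str]]:
    """Return (headline, href) pairs whose href contains must_contain."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(t, h) for t, h in parser.links if must_contain in h]


def fetch(url: str) -> str:
    """Download a page as text (assumes UTF-8; some sites may need another charset)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would look like `extract_headlines(fetch(url), "/some-article-path/")`, where the path fragment is whatever pattern the target site uses for article URLs.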
A proper integration of the model with the NewsAggregator is planned, so that it can predict the virality of news and display the link to the site on my website as soon as possible.
The project is available as open source under the terms of the MIT License.