Topic Modelling on Financial News Articles

Summary

This repo contains code for pre-processing and vectorizing the raw text of 85,000 news articles on finance, business and the economy, downloaded from a variety of online broadsheet newspapers and newswires.

A detailed blog post can be found at http://mattmurray.net/topic-modelling-financial-news-with-natural-language-processing/

Pre-processing removed stop words, punctuation and numbers, then stemmed the remaining words with the Snowball stemmer.
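
The pre-processing step can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the function name `preprocess`, the tiny `STOP_WORDS` set and the letters-only tokenizer are all assumptions made for the example, and the stemmer is passed in as a callable so the sketch runs without extra dependencies.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "by", "for", "is", "was"}

# Keeping only runs of letters drops punctuation and numbers in one pass.
# Side effect: contractions and abbreviations such as "U.S." are split apart.
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(text, stop_words=STOP_WORDS, stem=lambda word: word):
    """Lower-case, tokenize, drop stop words, then stem each remaining token.

    `stem` defaults to the identity function; pass a real stemmer to reduce
    words to their stems.
    """
    tokens = TOKEN_RE.findall(text.lower())
    return [stem(tok) for tok in tokens if tok not in stop_words]


if __name__ == "__main__":
    print(preprocess("The Fed raised rates by 0.25% in 2015, and markets rallied."))
```

To get Snowball stemming, one option is to pass `SnowballStemmer("english").stem` from `nltk.stem.snowball` as the `stem` argument, assuming NLTK is installed.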

The cleaned text was vectorized into a TF-IDF matrix, and Latent Semantic Analysis (LSA) was then applied to reduce its dimensionality to a smaller number of latent features.
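
To make the two steps concrete, here is a from-scratch sketch in plain Python. It is an illustration under stated assumptions, not the notebooks' implementation: the function names are hypothetical, the IDF uses a common smoothed variant, rows are L2-normalised, and the truncated SVD behind LSA is approximated by power iteration with deflation, which is only practical for toy inputs.

```python
import math
import random
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def tfidf_rows(docs):
    """Return (vocab, rows): one L2-normalised TF-IDF row per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    col = {tok: j for j, tok in enumerate(vocab)}
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency (assumed variant): rarer terms weigh more.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[col[tok]] = count * idf[tok]
        norm = math.sqrt(dot(row, row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


def lsa_components(rows, k, n_iter=200, seed=0):
    """Approximate the top-k right singular vectors of `rows` by power iteration."""
    rng = random.Random(seed)
    n_terms = len(rows[0])
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
        for _step in range(n_iter):
            doc_scores = [dot(row, v) for row in rows]  # X v
            # X^T (X v): pull the document scores back into term space.
            w = [sum(s * row[j] for s, row in zip(doc_scores, rows))
                 for j in range(n_terms)]
            # Deflation: remove the directions already found.
            for comp in components:
                overlap = dot(w, comp)
                w = [a - overlap * b for a, b in zip(w, comp)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                raise ValueError("k exceeds the rank of the TF-IDF matrix")
            v = [a / norm for a in w]
        components.append(v)
    return components


def project(rows, components):
    """Map each TF-IDF row to its k latent-feature coordinates."""
    return [[dot(row, comp) for comp in components] for row in rows]
```

At the scale of tens of thousands of articles, a sparse-matrix library is the practical route. scikit-learn's `TfidfVectorizer` and `TruncatedSVD` are one common pairing; this README does not say which tools the notebooks use.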

Finally, the articles were grouped into topic clusters based on their latent features, and the trends in those topics were visualized over time.
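
A rough sketch of this last stage is shown below. k-means is used because the notebook filenames mention it, but everything else is an assumption made for illustration: the function names, the deterministic farthest-point initialisation, the squared Euclidean distance and the per-year counting used as chart input.

```python
from collections import Counter, defaultdict


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm with farthest-point initialisation (assumed)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def topic_counts_by_year(years, labels):
    """Count articles per cluster label for each publication year."""
    counts = defaultdict(Counter)
    for year, label in zip(years, labels):
        counts[year][label] += 1
    return {year: dict(c) for year, c in sorted(counts.items())}
```

The per-year counts can then be passed to any plotting library to draw one line or stacked area per topic. Cluster labels here are arbitrary integers; giving them readable topic names is a separate, manual step.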

Files in this repository:

01_nlp_stopwords_punct_entity.ipynb
02_trim_cleaned_article_data.ipynb
03_nlp_lda_extra_stopwords.ipynb
04_nlp_stem_lemmatize_text.ipynb
05_nlp_viz_clusters.ipynb
06_nlp_snowball_dbscan_k_means_clustering.ipynb
07_explore_label_k_means_clusters.ipynb
08_nlp_chart_data.ipynb
README.md
articles_by_year.png
cb_regulation_chart.png
country_region_chart.png
economy_chart.png

Repository: mattmurray/topic_modelling_financial_news
