NU_453_NLP

The sotu_corpus_small.csv file contains 101 speeches and does not have any of the cell breaks. Please use this one for the project.

Project Scope

The project uses NLP techniques to perform data mining, compute term frequency–inverse document frequency (TF-IDF) values, fit latent Dirichlet allocation (LDA) models, perform topic modeling, and run sentiment analysis on 101 State of the Union addresses from 1791 to 2019.
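As a rough illustration of the TF-IDF step, here is a minimal sketch using only the Python standard library. It assumes the common definition tf = term count / document length and idf = log(N / df); the repository's own notebooks may use a different weighting or a library implementation. The `tokenize` and `tf_idf` helpers and the three toy sentences are hypothetical stand-ins, not project code or real speech text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy stand-ins for speeches (hypothetical text, not taken from the corpus).
speeches = [
    "the union is strong",
    "the economy is growing",
    "the union must preserve peace",
]
scores = tf_idf(speeches)
```

Under this definition, a term that appears in every document (such as "the" above) gets a weight of zero, while a term unique to one document gets the highest weight in that document.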

Desired Outcome

Use sentiment analysis, topic modeling, and TF-IDF and LDA values to derive deeper insights into American politics through the centuries, and to deepen understanding of NLP processes and results.
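To make the sentiment analysis goal concrete, here is a minimal lexicon-based sketch. It is only an assumption about one simple way to score text: the word lists `POSITIVE` and `NEGATIVE` and the `sentiment_score` helper are hypothetical and tiny, and a real analysis would more likely rely on an established lexicon or a trained model.

```python
import re

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"strong", "peace", "prosperity", "progress", "hope"}
NEGATIVE = {"war", "crisis", "debt", "fear", "threat"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / token count, in [-1, 1]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)
```

Scoring each address this way and plotting the result by year would give a first, coarse view of how tone shifts over time, subject to the limits of whatever lexicon is used.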

Corpus Development

The corpus is developed from SOTU addresses published on the State of the Union website. A scoped-down subset of the full set of 243 files was used for speed and simplicity.

Model

The NLP modeling incorporates a variety of scripts and Jupyter notebooks from the MSDS 453 Winter 2019 course, from GitHub, and from the SOTU Kaggle website.

GitHub credits:

Daniel Bashir, https://github.com/db7894/sentiment-of-the-union

Shayne, https://github.com/shngli/SOTU-mining

