
A data science project using NLP to explore tweets relating to the 2018 Champions League Final.


jb-0/twitter-nlp


NLP and Unsupervised Learning Using Champions League Final 2018 Tweets

This repository is my final project submission for the General Assembly Data Science Essentials course. I used NLP techniques and KMeans clustering to identify topics within a dataset of tweets about the 2018 Champions League Final.

The repository comprises two Jupyter notebooks and two output files:

  1. eda-rea-v-liv-2018.ipynb - This contains the project brief (an outline of the project and its aspirations) along with exploratory data analysis that provides insights into the selected dataset.
  2. nlp-rea-v-liv-2018.ipynb - This contains the final project report, which covers NLP and KMeans. The two .txt files below hold output from code in this notebook; they were used to identify the parameter combination that produced strong, clear clustering. That combination was then explored further and visualised in the notebook.
    1. 2020-09-03 1913-TF-IDF.txt - Results of clustering TF-IDF features with KMeans across various parameters.
    2. 2020-09-04 0651-COUNT-VEC.txt - Results of clustering count-vectorised features with KMeans across various parameters.
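A parameter sweep of the kind summarised in the two .txt files might look roughly like the sketch below. It is a hypothetical reconstruction, not the notebook's code: it assumes scikit-learn, sweeps only the number of clusters, scores each run with the cosine silhouette score (the notebook's actual quality measure is not stated here), and the `sweep` and `write_report` helpers and the timestamped filename pattern are illustrative choices.

```python
from datetime import datetime

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical sample tweets standing in for the real dataset.
tweets = [
    "Salah injured after Ramos challenge",
    "Ramos foul on Salah, Salah leaves injured",
    "Karius mistakes hand Real Madrid the title",
    "Karius error gifts Benzema a goal",
    "Bale bicycle kick wonder goal",
    "What a bicycle kick from Bale",
]

# The two feature representations compared in the project.
VECTORIZERS = {
    "TF-IDF": lambda: TfidfVectorizer(stop_words="english"),
    "COUNT-VEC": lambda: CountVectorizer(stop_words="english"),
}


def sweep(texts, vectorizer_name, ks=(2, 3, 4), random_state=0):
    """Return (k, silhouette) pairs for KMeans on one representation (hypothetical helper)."""
    X = VECTORIZERS[vectorizer_name]().fit_transform(texts)
    results = []
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        n_found = len(set(labels))
        # silhouette_score requires 2 <= number of labels <= n_samples - 1
        if 2 <= n_found <= X.shape[0] - 1:
            score = float(silhouette_score(X, labels, metric="cosine"))
        else:
            score = float("nan")
        results.append((k, score))
    return results


def write_report(results, vectorizer_name):
    """Write one line per setting to a timestamped .txt file (hypothetical naming)."""
    stamp = datetime.now().strftime("%Y-%m-%d %H%M")
    path = f"{stamp}-{vectorizer_name}.txt"
    with open(path, "w", encoding="utf-8") as fh:
        for k, score in results:
            fh.write(f"k={k}, silhouette={score:.3f}\n")
    return path


for name in VECTORIZERS:
    print(name, sweep(tweets, name))
```

Keeping one file per representation, as the project does, makes it easy to compare TF-IDF and count features side by side before committing to one combination for visualisation.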
