giuseppegambino/Italian-Sentiment-Analysis-with-Spark

ItalianSentimentAnalysis

This is the project for my thesis in Computer Science, carried out at the University of Palermo under the supervision of Professor Roberto Pirrone.

The goal was to build a data analysis pipeline using Big Data technologies, covering the following stages:

  • Data collection
  • Data pre-processing
  • Data labeling
  • Machine Learning model tuning
  • Application of the Naive Bayes algorithm
  • Model evaluation
  • Insight extraction
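
To make the Naive Bayes and model evaluation stages more concrete, here is a minimal plain-Python sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the code from this repository, which runs the model on Apache Spark. The function names `train_nb`, `predict_nb` and `accuracy`, the toy Italian sentences, and the whitespace tokenization are all assumptions made for illustration, and only two of the three sentiment labels appear, for brevity.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens, alpha=1.0):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        # Denominator of the smoothed word likelihood for this class.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Toy training data (invented for this example, not from the thesis dataset).
train_docs = [
    "che bella canzone".split(),
    "concerto fantastico".split(),
    "album brutto e noioso".split(),
    "canzone noiosa e brutta".split(),
]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_nb(train_docs, train_labels)
```

Calling `predict_nb(model, "concerto bella".split())` classifies a new token list, and `accuracy` can be applied to a held-out set of labeled tweets to evaluate the model.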

The technologies used are:

  • Python 3.7
  • Tweepy, Twitter API
  • Pandas, Python Data Analysis Library
  • NLTK, Natural Language Toolkit Library
  • Apache Spark 2.4

The project consists of four Python scripts:

  • tweetSave.py collects the tweets; it is set up to collect Italian tweets matching music-related keywords
  • tweetClean.py cleans and pre-processes the data
  • tweetSentimentRadici.py labels each tweet with a positive, negative, or neutral sentiment
  • tweetSpark.py applies the machine learning tools (runs on Spark)
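
As an illustration of the kind of work a cleaning step such as tweetClean.py performs, the sketch below normalizes a raw tweet using only the Python standard library. It is an assumption about what such a step could look like, not the script from this repository (which lists NLTK among its tools). The function name `clean_tweet`, the regular expressions, and the tiny stop-word set are hypothetical.

```python
import re

# Patterns for typical tweet noise (assumed, not taken from the repository).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-zàèéìòù\s]")

# Tiny illustrative stop-word set; a real pipeline would use a fuller list.
STOP_WORDS = {"il", "la", "di", "e", "che", "un", "una", "per"}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions and symbols, drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links
    text = MENTION_RE.sub(" ", text)     # remove @user mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the sign
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation, emoji
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

For example, `clean_tweet("Che bella #canzone di @artista! https://t.co/abc")` yields a short list of content words that a later labeling or classification step can consume.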

Write to me if you have questions or suggestions for improving the solution.