
ItalianSentimentAnalysis

This is the project for my thesis in Computer Science, carried out at the University of Palermo under the supervision of Professor Roberto Pirrone.

The goal was to build a data analysis pipeline using Big Data technologies, covering the following steps:

Data collection
Data pre-processing
Data labeling
Machine Learning model tuning
Application of the Naive Bayes algorithm
Model evaluation
Insight extraction
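
To make the Naive Bayes step above concrete, here is a minimal plain-Python sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the code from this repository, which runs the model on Spark; the function names (train_naive_bayes, predict) and the toy tweets are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (tokens, label) pairs standing in for labeled tweets.
TOY_TWEETS = [
    ("bellissimo concerto stasera".split(), "positive"),
    ("canzone stupenda bellissimo album".split(), "positive"),
    ("concerto pessimo audio orribile".split(), "negative"),
    ("album noioso canzone orribile".split(), "negative"),
]


def train_naive_bayes(docs, alpha=1.0):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "label_counts": label_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,  # Laplace smoothing constant
    }


def predict(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Start from the log prior of the class.
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # ignore words never seen during training
            # Smoothed log likelihood of the word given the class.
            score += math.log(
                (counts[tok] + alpha) / (total_words + alpha * vocab_size)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TOY_TWEETS)
print(predict(model, ["bellissimo", "album"]))
print(predict(model, ["orribile", "audio"]))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing constant keeps a word that is unseen for one class from forcing that class's probability to zero.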

The technologies used are:

Python 3.7
Tweepy, Twitter API
Pandas, Python Data Analysis Library
NLTK, Natural Language Toolkit Library
Apache Spark 2.4

The project consists of four Python scripts:

tweetSave.py collects the tweets; it is set to collect Italian tweets matching music-related keywords
tweetClean.py cleans and pre-processes the data
tweetSentimentRadici.py labels each tweet with a positive, negative or neutral sentiment
tweetSpark.py applies the machine learning tools (runs on Spark)
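
As a rough illustration of what the cleaning and labeling scripts described above might do, the sketch below strips URLs, mentions and punctuation from a tweet and then assigns a label by matching word stems. This is an assumption, not the actual logic of tweetClean.py or tweetSentimentRadici.py: the file name hints at a stem-based approach ("radici" is Italian for "roots"), but the regular expressions, the tiny stem lists and the function names here are all hypothetical.

```python
import re

# Hypothetical patterns; the real cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-zàèéìòù\s]")

# Hypothetical, deliberately tiny stem lists for illustration only.
POSITIVE_STEMS = ("bell", "stupend", "fantastic")
NEGATIVE_STEMS = ("brutt", "orribil", "noios")


def clean_tweet(text):
    """Lowercase the text, drop URLs and mentions, and return word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Removes punctuation, digits and the '#' sign; the hashtag word is kept.
    text = NON_LETTER_RE.sub(" ", text)
    return text.split()


def label_tweet(tokens):
    """Score tokens by stem prefix and map the total to a sentiment label."""
    score = 0
    for tok in tokens:
        if tok.startswith(POSITIVE_STEMS):
            score += 1
        elif tok.startswith(NEGATIVE_STEMS):
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tokens = clean_tweet("Che BELLISSIMO concerto! @amico https://t.co/abc #musica")
print(tokens, label_tweet(tokens))
```

Matching on stems rather than full words lets one entry cover several inflected forms, which matters for Italian, where adjectives change with gender and number.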

Contact me if you have questions or suggestions for improving the solution.

