Sentiment Analysis of Arabic and Persian Tweets Following the Assassination of Qasem Soleimani

On January 3rd, 2020, a United States drone strike near Baghdad International Airport targeted and killed Iranian Major General Qasem Soleimani of the Islamic Revolutionary Guard Corps (IRGC), along with Abu Mahdi al-Muhandis, deputy commander of the Popular Mobilization Forces (al-Hashd al-Sha'abi). Soleimani was commander of the Quds Force and was widely considered the second most powerful person in Iran, subordinate only to Supreme Leader Ali Khamenei. The assassination caused a massive escalation of tensions between the United States and Iran.

In this study I analyzed the sentiment of Arabic- and Persian-language tweets that used the hashtag "#قاسم_سليماني" (Qasem Soleimani in Arabic/Persian) between January 3rd, the date of the assassination, and January 31st. The aim was to track how sentiment changed over time and how it was affected by major news and events.

Repositories I forked to use in this project:

  1. Optimized-modified-GetOldTweets3-OMGOT
    While searching for a library or script that could download tweets older than 7 days, I found this project. It was the only one that ran reliably and returned the expected data. However, I had to add a 3-second sleep after every 10 downloaded tweets to avoid "too many requests" errors (see the sketch after this list).
  2. Persian Stop Words List: I used this project because NLTK does not have Persian stop words.
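
As a rough illustration of that workaround, here is a minimal sketch of the batch-and-sleep pattern. `fetch_batch` is a hypothetical stand-in for the OMGOT download call, and the 10-tweet batch size and 3-second pause simply mirror the numbers mentioned above:

    import time

    def download_in_batches(fetch_batch, n_batches, batch_size=10, pause=3):
        """Collect tweets in small batches, pausing between batches.

        `fetch_batch` is a hypothetical stand-in for the OMGOT download
        call; the 10-tweet batch size and 3-second pause mirror the
        workaround described in item 1 above.
        """
        tweets = []
        for _ in range(n_batches):
            tweets.extend(fetch_batch(batch_size))
            time.sleep(pause)  # back off to avoid "too many requests" (HTTP 429)
        return tweets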

The approach I took:

After spending two weeks trying to find a way to obtain old tweets, I finally got them using the Optimized-modified-GetOldTweets3-OMGOT project. The next issue was determining the language of each tweet, since both Persian and Arabic tweets used the same hashtag. After trying many algorithms and solutions, I found that the TextBlob library had the highest accuracy, but the downside was that, to avoid "too many requests" errors, I had to sleep after every 10 tweets (see the sketch below). Once the languages were identified, I randomly selected and labeled 1000 tweets from each language and ran some NLTK sentiment-analysis algorithms on the Arabic tweets only, following the code taught by Harrison Kinsley on his excellent website. It turned out that the training data was not sufficient, so the results were highly inaccurate and inconsistent. Finally I realized that, because the topic of the tweets is very specific, I could easily find a set of keywords that determined the sentiment of the tweets in both languages. The result was satisfying, consistent, and computationally cheap.
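
As a rough sketch of the language-detection step, the loop below tags each tweet with TextBlob and sleeps after every 10 tweets. Note that detect_language() relied on an external translation service and has been removed from recent TextBlob releases, and the 3-second pause here is an assumption, not a value taken from the project's code:

    import time
    from textblob import TextBlob

    def detect_languages(texts, batch_size=10, pause=3):
        """Return a language code ('ar', 'fa', ...) for each tweet.

        detect_language() calls an external translation API and is gone
        from recent TextBlob releases, so treat this as a sketch of the
        original setup only; the pause length is an assumption.
        """
        langs = []
        for i, text in enumerate(texts, start=1):
            try:
                langs.append(TextBlob(text).detect_language())
            except Exception:
                langs.append(None)  # e.g. a "too many requests" response
            if i % batch_size == 0:
                time.sleep(pause)   # throttle the requests
        return langs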

Here are the files of the project. To reproduce the results, run the files in the order listed below. However, because labeling the tweets and determining their language are not easily reproducible, I recommend running only the last three or four files.

Code:

  • preprocessingData.ipynb: Cleaning the tweets and extracting mentions and hashtags (see the sketch after this list).
  • textblob_lang_classification.py: Using textblob library to determine the language of the tweets.
  • addingLanguage.ipynb: Adding the language as a column to the main data.
  • random_labeling.py: Labeling 1000 randomly selected tweets from each language.
  • arabic_full_analysis.ipynb: Running some NLTK sentiment-analysis algorithms on the Arabic tweets.
  • noMLar.ipynb: Classifying Arabic tweets without ML algorithms, using the keyword approach described above (see the sketch after this list).
  • noMLfa.ipynb: Classifying Persian tweets without ML algorithms.
  • noML_fa_sqlite.ipynb: Same as noMLfa.ipynb, but using sqlite3.
  • tweets_visualization.ipynb: Visualizing the tweets' sentiments.
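
For illustration, here is a minimal, self-contained sketch of the two rule-based steps mentioned in the list above: extracting mentions and hashtags from a tweet, and classifying sentiment by keyword matching. The regular expressions and keyword sets are illustrative placeholders, not the exact patterns and lists used in the notebooks:

    import re

    MENTION_RE = re.compile(r"@\w+")
    HASHTAG_RE = re.compile(r"#\w+")

    # Illustrative placeholder keywords only -- the real lists live in
    # noMLar.ipynb / noMLfa.ipynb and are specific to this topic.
    POSITIVE_KEYWORDS = {"شهيد"}    # e.g. "martyr", used approvingly
    NEGATIVE_KEYWORDS = {"إرهابي"}  # e.g. "terrorist", used disapprovingly

    def extract_entities(tweet):
        """Return the mentions and hashtags found in a tweet."""
        return MENTION_RE.findall(tweet), HASHTAG_RE.findall(tweet)

    def classify_tweet(tweet):
        """Rule-based polarity: count keyword hits and compare."""
        pos = sum(word in tweet for word in POSITIVE_KEYWORDS)
        neg = sum(word in tweet for word in NEGATIVE_KEYWORDS)
        if pos > neg:
            return "positive"
        if neg > pos:
            return "negative"
        return "neutral"

A classifier this simple only works because the hashtag keeps the topic so narrow, which is exactly the observation that motivated the keyword approach described above.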

Data:

  • concatenated_data.csv: Raw data.
  • cleaned_data.csv: Pre-processed data.
  • tweets_lang_added.csv: The data with the language column added.
  • tweets_sent_fa.csv: Persian tweets, 1000 of which are labeled; ready for sentiment analysis.
  • tweets_sent_ar.csv: Arabic tweets, 1000 of which are labeled; ready for sentiment analysis.