# Project - Sentiment Classifier

We have provided some synthetic (fake, semi-randomly generated) Twitter data in a CSV file named *project_twitter_data.csv*, which contains the text of each tweet, the number of retweets of that tweet, and the number of replies to that tweet. We have also provided lists of words that express positive and negative sentiment, in the files *positive_words.txt* and *negative_words.txt*.

Your task is to build a sentiment classifier that detects how positive or negative each tweet is. You will create a CSV file with columns for the Number of Retweets, Number of Replies, Positive Score (how many happy words are in the tweet), Negative Score (how many angry words are in the tweet), and the Net Score (Positive Score minus Negative Score) for each tweet. At the end, you will upload the CSV file to Excel or Google Sheets and produce a graph of the Net Score vs Number of Retweets.

To start, define a function called `strip_punctuation` which takes one parameter, a string which represents a word, and removes characters considered punctuation from everywhere in the word. (Hint: remember the `.replace()` method for strings.)

In [1]:

# YOUR CODE HERE
# Characters we treat as punctuation
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']

# Remove every punctuation character from the given word
def strip_punctuation(word):
    for char in punctuation_chars:
        word = word.replace(char, "")
    return word

# Quick checks
print(strip_punctuation("hello!"))       # hello
print(strip_punctuation("good#morning")) # goodmorning
print(strip_punctuation("it's@fun"))     # itsfun


hello
goodmorning
itsfun
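
The loop over `.replace()` makes one pass over the word per punctuation character. As an alternative sketch (not part of the assignment, and the names `PUNCTUATION_CHARS` and `strip_punctuation_translate` are made up for this illustration), the same characters can be deleted in a single pass with `str.translate`:

```python
# Same punctuation set as the cell above, written as one string.
PUNCTUATION_CHARS = "'\",.!:;#@"

# str.maketrans("", "", chars) builds a table that maps each listed
# character to None, so translate() deletes all of them in one pass.
_DELETE_PUNCTUATION = str.maketrans("", "", PUNCTUATION_CHARS)


def strip_punctuation_translate(word):
    """Return word with every character in PUNCTUATION_CHARS removed."""
    return word.translate(_DELETE_PUNCTUATION)


print(strip_punctuation_translate("hello!"))        # hello
print(strip_punctuation_translate("good#morning"))  # goodmorning
print(strip_punctuation_translate("it's@fun"))      # itsfun
```

Both versions should give the same results for this character set; the `.replace()` loop is the one the hint in the instructions points to.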


Next, define a function called `get_pos` which takes one parameter, a string which represents one or more sentences, and calculates how many words in the string are considered positive words. Use the list `positive_words` to determine what words will count as positive. The function should return a non-negative integer: how many occurrences there are of positive words in the text. Note that all of the words in `positive_words` are lower cased, so you’ll need to convert all the words in the input string to lower case as well.

In [2]:
# list of positive words to use
positive_words = []
with open("assets/positive_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())

# YOUR CODE HERE
# Count the positive words in a piece of text
def get_pos(text):
    count = 0
    words = text.lower().split()  # lower-case the text and split it into words
    for word in words:
        clean_word = strip_punctuation(word)  # remove punctuation
        if clean_word in positive_words:
            count += 1
    return count

# Quick checks
print(get_pos("I love sunny days!"))  # result depends on the contents of positive_words
print(get_pos("This is awesome and fantastic!"))  # returns the number of positive words


1
2


Next, define a function called `get_neg` which takes one parameter, a string which represents one or more sentences, and calculates how many words in the string are considered negative words. Use the list `negative_words` to determine what words will count as negative. The function should return a non-negative integer: how many occurrences there are of negative words in the text. Note that all of the words in `negative_words` are lower cased, so you’ll need to convert all the words in the input string to lower case as well.

In [3]:
# list of negative words to use
negative_words = []
with open("assets/negative_words.txt") as neg_f:
    for lin in neg_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())

# YOUR CODE HERE
# Count the negative words in a piece of text
def get_neg(text):
    count = 0
    words = text.lower().split()  # lower-case the text and split it into words
    for word in words:
        clean_word = strip_punctuation(word)  # remove punctuation
        if clean_word in negative_words:
            count += 1
    return count

# Quick checks
print(get_neg("I hate rainy days!"))  # result depends on the contents of negative_words
print(get_neg("This is terrible and awful!"))  # returns the number of negative words


1
2
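
`get_pos` and `get_neg` differ only in which word list they consult, so the shared logic can be sketched as one helper. The block below is an illustration under assumptions: `count_matches`, `sample_positive`, and `sample_negative` are hypothetical names, and the tiny sets stand in for the real word lists so the example runs without the `assets` files. Storing the vocabulary as a `set` also makes each membership test a hash lookup instead of a scan through a list.

```python
def strip_punctuation(word, punctuation_chars="'\",.!:;#@"):
    """Remove the punctuation characters used earlier in this notebook."""
    for char in punctuation_chars:
        word = word.replace(char, "")
    return word


def count_matches(text, vocabulary):
    """Count the words in text that appear in vocabulary after cleaning."""
    return sum(
        1
        for word in text.lower().split()
        if strip_punctuation(word) in vocabulary
    )


# Hypothetical stand-ins for the real word lists, stored as sets.
sample_positive = {"love", "awesome", "fantastic"}
sample_negative = {"hate", "terrible", "awful"}

text = "I LOVE this awesome phone, but the battery is terrible!"
print(count_matches(text, sample_positive))  # 2
print(count_matches(text, sample_negative))  # 1
```

With the real data, `set(positive_words)` and `set(negative_words)` could be passed as the vocabulary in the same way.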


Finally, write code that opens the file *project_twitter_data.csv*, which has the fake generated Twitter data (the text of a tweet, the number of retweets of that tweet, and the number of replies to that tweet). Your task is to build a sentiment classifier, which will detect how positive or negative each tweet is.

Now, you will write code to create a CSV file called `resulting_data.csv`, which contains the Number of Retweets, Number of Replies, Positive Score (which is how many happy words are in the tweet), Negative Score (which is how many angry words are in the tweet), and the Net Score (how positive or negative the text is overall) for each tweet. The file should have those headers in that order. Remember that there is another component to this project. You will upload the CSV file to Excel or Google Sheets and produce a graph of the Net Score vs Number of Retweets. Check Coursera for that portion of the assignment, if you’re accessing this textbook from Coursera.

In [4]:

# YOUR CODE HERE
import csv

# Open the source file for reading
with open("assets/project_twitter_data.csv", 'r') as infile:
    reader = csv.reader(infile)
    header = next(reader)  # skip the header row

    # Open a new file for writing
    with open("resulting_data.csv", 'w', newline='') as outfile:
        writer = csv.writer(outfile)

        # Write the required headers
        writer.writerow(["Number of Retweets", "Number of Replies", "Positive Score", "Negative Score", "Net Score"])

        # Process each row of the source file
        for row in reader:
            tweet = row[0]            # tweet text
            retweets = int(row[1])    # number of retweets
            replies = int(row[2])     # number of replies

            pos_score = get_pos(tweet)   # number of positive words
            neg_score = get_neg(tweet)   # number of negative words
            net_score = pos_score - neg_score  # net score

            # Write the scores to the new file
            writer.writerow([retweets, replies, pos_score, neg_score, net_score])

print("Created resulting_data.csv successfully ✅")


Created resulting_data.csv successfully ✅
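
Before moving to the spreadsheet, it can help to read the output back and check it. The following is a rough sketch, not part of the assignment: `mean_retweets_by_net_score` is a hypothetical helper, and `sample_csv` holds made-up rows in the same column layout so the example runs without the real file. It verifies that each Net Score equals Positive Score minus Negative Score, then averages the retweets for each Net Score, which is roughly the relationship the graph will show.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the same column layout as resulting_data.csv.
sample_csv = """Number of Retweets,Number of Replies,Positive Score,Negative Score,Net Score
3,0,0,0,0
1,0,2,2,0
19,0,1,0,1
5,2,0,1,-1
"""


def mean_retweets_by_net_score(csv_file):
    """Map each Net Score to the mean Number of Retweets of its tweets."""
    retweets_by_score = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # The Net Score column must equal positive minus negative.
        net = int(row["Positive Score"]) - int(row["Negative Score"])
        assert net == int(row["Net Score"]), row
        retweets_by_score[net].append(int(row["Number of Retweets"]))
    return {
        score: sum(values) / len(values)
        for score, values in sorted(retweets_by_score.items())
    }


print(mean_retweets_by_net_score(io.StringIO(sample_csv)))
# {-1: 5.0, 0: 2.0, 1: 19.0}
```

To run the same check on the real output, pass `open("resulting_data.csv", newline="")` in place of the `io.StringIO` object.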
