# Project: Rank Tweets by Kindness

**Sentiment Analysis**: Social media can be a great place to connect with others and share positive thoughts and experiences. However, it can also be a breeding ground for negativity and hate speech. This project aims to develop a system to rank tweets by kindness in order to promote more positive and supportive online interactions.

**Benefits:**

The proposed system has the potential to provide a number of benefits, including:

- Promoting more positive and supportive online interactions
- Reducing the spread of negativity and hate speech
- Helping people to find and connect with others who share their values
- Making social media a more enjoyable and welcoming place for everyone

**Tasks:**

The proposed system will use sentiment analysis to identify and rank tweets by their level of kindness, in the following steps:

1. Read the `nice_words.txt` file into a list. This file will contain a list of words that are typically associated with kindness, such as "love," "compassion," and "gratitude."
1. Read the `tweets.txt` file into a tweets list. This file will contain a collection of tweets to be ranked.
1. Look at each of the tweets and count the number of nice words.
1. Sort the tweets in descending order based on the number of nice words, with the most kind tweet first.
1. Display the tweets, along with the count of nice words in each tweet.

```
sample tweets:
[
    "great and awesome",
    "what a good day"
]
sample output:
[
    ("great and awesome", 2),
    ("what a good day", 1),
]
```
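The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the final solution: the function name `rank_by_kindness` is an assumption made for the example, and the two input files are replaced by literal lists so that the snippet runs on its own.

```python
def rank_by_kindness(tweets, nice_words):
    """Return (tweet, count) pairs, most nice words first."""
    nice = {word.lower() for word in nice_words}
    ranked = []
    for tweet in tweets:
        # Count the whitespace-separated words of the tweet found in the nice-word set
        count = sum(1 for word in tweet.lower().split() if word in nice)
        ranked.append((tweet, count))
    # Sort in descending order of count
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical stand-ins for the contents of nice_words.txt and tweets.txt
nice_words = ["great", "awesome", "good"]
tweets = ["what a good day", "great and awesome"]

for tweet, count in rank_by_kindness(tweets, nice_words):
    print(count, tweet)
```

Splitting on whitespace is the simplest possible word boundary; punctuation attached to a word (such as "good!") would need extra handling.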

## Start of Solution

### Methodology:

#### 1- Collecting tweets through an API on RapidAPI.com and getting the response in JSON format

The response has the following JSON structure:

```
{ 'results': [{..., 'text': 'the tweet we are looking for', ...}],
  'Metadata': [{...}]
}
```
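To make the structure concrete, the sketch below pulls the `text` field out of every entry under `results` using plain Python. The helper name `extract_texts` and the sample payload are illustrative assumptions; the real response carries many more fields per tweet.

```python
def extract_texts(payload):
    """Collect the 'text' value of every tweet under the 'results' key."""
    texts = []
    for item in payload.get("results", []):
        # Skip entries without a 'text' field rather than raising KeyError
        if "text" in item:
            texts.append(item["text"])
    return texts


# Hypothetical, heavily shortened payload with the same shape as the response
sample_payload = {
    "results": [
        {"tweet_id": "1", "text": "first tweet"},
        {"tweet_id": "2", "text": "second tweet"},
    ]
}

print(extract_texts(sample_payload))
```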


In [8]:
import os

import requests

url = "https://twitter154.p.rapidapi.com/user/tweets"

querystring = {
    "username": "omarmhaimdat",
    "limit": "40",
    "user_id": "96479162",
    "include_replies": "false",
    "include_pinned": "false",
}

headers = {
    # RAPIDAPI_KEY is an assumed environment variable name; never hard-code the key
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "twitter154.p.rapidapi.com",
}

response = requests.get(url, headers=headers, params=querystring)

print(response.json())



{'results': [{'tweet_id': '1681795360748892161', 'creation_date': 'Wed Jul 19 22:37:14 +0000 2023', 'text': 'RT @lvwerra: Did you know that you can train all Llama-2 models on your own data in just a few lines?\n\nThe script even works with the 70B m…', 'media_url': None, 'video_url': None, 'user': {'creation_date': 'Sun Dec 13 03:52:21 +0000 2009', 'user_id': '96479162', 'username': 'omarmhaimdat', 'name': 'Omar MHAIMDAT', 'follower_count': 958, 'following_count': 1249, 'favourites_count': 6489, 'is_private': False, 'is_verified': False, 'is_blue_verified': True, 'location': 'Casablanca, Morocco', 'profile_pic_url': 'https://pbs.twimg.com/profile_images/1271521722945110016/AvKfKpLo_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/96479162/1599303392', 'description': 'Data Scientist | Software Engineer | Better programming and Heartbeat contributor', 'external_url': 'https://t.co/UTbAycjH5Q', 'number_of_tweets': 3169, 'bot': False, 'timestamp': 1260676341, 'has

In [9]:
# Save the response into a JSON file called tweets.json.

import json

with open('tweets.json', 'w') as json_file:
    json.dump(response.json(), json_file)

#### 2- Extract and keep only the needed information (here, the tweet text)

We are only interested in `text`, which is one of the values stored under the `results` key.

Pandas arranges the data in a DataFrame, which lets us process it as a table.


In [10]:
import pandas as pd

df = pd.read_json('tweets.json')
print(df)


                                              results  \
0   {'tweet_id': '1681795360748892161', 'creation_...   
1   {'tweet_id': '1654934719086092290', 'creation_...   
2   {'tweet_id': '1646992900851396609', 'creation_...   
3   {'tweet_id': '1619711691515899906', 'creation_...   
4   {'tweet_id': '1619711572947128320', 'creation_...   
5   {'tweet_id': '1609522402735652865', 'creation_...   
6   {'tweet_id': '1609414546816393217', 'creation_...   
7   {'tweet_id': '1602648640149147648', 'creation_...   
8   {'tweet_id': '1600382718071685120', 'creation_...   
9   {'tweet_id': '1600168069300572173', 'creation_...   
10  {'tweet_id': '1599485631004307457', 'creation_...   
11  {'tweet_id': '1599153739843051520', 'creation_...   
12  {'tweet_id': '1586838973309132801', 'creation_...   
13  {'tweet_id': '1583804340682526720', 'creation_...   
14  {'tweet_id': '1581012889728856064', 'creation_...   
15  {'tweet_id': '1579514658368561156', 'creation_...   
16  {'tweet_id': '1570868934420

In [12]:
print(df['results'])

0     {'tweet_id': '1681795360748892161', 'creation_...
1     {'tweet_id': '1654934719086092290', 'creation_...
2     {'tweet_id': '1646992900851396609', 'creation_...
3     {'tweet_id': '1619711691515899906', 'creation_...
4     {'tweet_id': '1619711572947128320', 'creation_...
5     {'tweet_id': '1609522402735652865', 'creation_...
6     {'tweet_id': '1609414546816393217', 'creation_...
7     {'tweet_id': '1602648640149147648', 'creation_...
8     {'tweet_id': '1600382718071685120', 'creation_...
9     {'tweet_id': '1600168069300572173', 'creation_...
10    {'tweet_id': '1599485631004307457', 'creation_...
11    {'tweet_id': '1599153739843051520', 'creation_...
12    {'tweet_id': '1586838973309132801', 'creation_...
13    {'tweet_id': '1583804340682526720', 'creation_...
14    {'tweet_id': '1581012889728856064', 'creation_...
15    {'tweet_id': '1579514658368561156', 'creation_...
16    {'tweet_id': '1570868934420889600', 'creation_...
17    {'tweet_id': '1522990095711621120', 'creat

The `results` column is the one we are interested in, because each of its entries holds the tweet text under the `text` key. We extract that text into a Series named `tweets`.

In [14]:
# Keep only the tweet text

col = df['results']
tweets = col.apply(lambda x: x['text'])  # pass a lambda function as the parameter

In [26]:
type(tweets)

pandas.core.series.Series

#### 3- Saving to a file

Now that we have the tweets, we save them to a file. We chose the CSV format because it preserves the order and keeps each tweet in its own record, which makes later parsing easier.

In [27]:
tweets.to_csv('tweet.csv')
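Tweets often contain line breaks, so it is worth checking that a CSV round trip keeps each tweet in a single record. The sketch below uses Python's standard `csv` module on an in-memory buffer instead of pandas, purely to stay self-contained; the sample tweets are made up for the illustration.

```python
import csv
import io

# Made-up tweets; the first one contains line breaks and a comma
sample_tweets = ["Did you know?\n\nIt works, really", "what a good day"]

# Write one tweet per record; the csv module quotes fields that need it
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
for index, text in enumerate(sample_tweets):
    writer.writerow([index, text])

# Read the records back and recover the tweet texts in their original order
buffer.seek(0)
restored = [row[1] for row in csv.reader(buffer)]

print(restored == sample_tweets)
```

A tweet with embedded line breaks spans several physical lines in the file, but a CSV parser still returns it as one field of one record.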

#### 4- Count the Nice-Word Occurrences in Every Tweet

In [192]:

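result = {}  # maps each tweet to its count of nice words

# Read the nice words once, stripping the trailing '\n' and skipping blank lines
# (an empty string is a substring of every tweet and would inflate the counts).
with open("nice_words.txt", "r") as nice_words_file:
    nice_words = [line.strip() for line in nice_words_file if line.strip()]

for tweet in tweets:  # one tweet at a time
    text = tweet.lower()
    # Substring match: a word counts once if it appears anywhere in the tweet
    counter = sum(1 for word in nice_words if word in text)
    # Save the total number of nice words found in this tweet
    result[tweet] = counter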
    

In [193]:
# Sort the results in descending order, so the tweet with the most nice words comes first.
res = sorted(result.items(), key=lambda x: x[1], reverse=True)

print(res)

[('RT @lvwerra: Did you know that you can train all Llama-2 models on your own data in just a few lines?\n\nThe script even works with the 70B m…', 4), ('Imagine you switch from FastAPI to actix-web (#rustlang) to discover it has some serious memory leaks 😞.  2023 will be fun 🥳', 3), ('RT @cwolferesearch: Large Language Models (LLMs) work great when applied to language, but what happens when you pre-train them over other s…', 3), ('RT @Tim_Dettmers: @karpathy Super excited to push this even further:\n- Next week: bitsandbytes 4-bit closed beta that allows you to finetun…', 2), ('RT @DanHollick: How does a Large Language Model like ChatGPT actually work?\n\nWell, they are both amazingly simple and exceedingly complex a…', 2), ('🚀 Whatlang-pyo3 v0.5.0 | 𝐁𝐚𝐭𝐜𝐡 𝐃𝐞𝐭𝐞𝐜𝐭𝐢𝐨𝐧 🤩\n\nAnnouncing batch detection for text language detection using python with the performance of Rust.\n\nIt can be up to 5x 𝐟𝐚𝐬𝐭𝐞𝐫 due to the lack of data and call overheads https://t.co/aSUaZD7Go8', 2), ("Sometimes you k
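One side effect of storing the counts in a dictionary keyed by the tweet text is that identical tweets collapse into a single entry. A possible alternative, sketched below with made-up data and a hypothetical `counts` list, keeps one `(tweet, count)` pair per tweet and relies on `sorted` being stable so that ties stay in their original order.

```python
# Made-up tweets and precomputed counts; the first and last tweets are identical
sample_tweets = ["thanks a lot", "great and awesome", "thanks a lot"]
counts = [1, 2, 1]

# A dict keyed by tweet text keeps only one entry per distinct tweet
as_dict = dict(zip(sample_tweets, counts))

# A list of pairs keeps every tweet, including repeats
as_pairs = list(zip(sample_tweets, counts))
ranked = sorted(as_pairs, key=lambda pair: pair[1], reverse=True)

print(len(as_dict), len(ranked))
print(ranked)
```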

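### Challenges
The word search is an exact, literal match, so a different form of a word (for example, "grateful" when the list contains "gratitude") is not counted. Matching on word roots would be more effective. In the meantime, to get reasonable results, we extended the nice-words file with extra words.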

We recommend using stemming and tokenization.
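
As a rough illustration of that recommendation, the sketch below tokenizes each tweet with a regular expression and reduces words to a crude root by stripping a few common suffixes. The `naive_stem` helper and its suffix list are deliberately simplistic stand-ins written for this example, not a real stemming algorithm; a library stemmer such as NLTK's `PorterStemmer` would be the more robust choice.

```python
import re

SUFFIXES = ("ing", "ed", "ly", "s")  # hypothetical list, far from exhaustive


def tokenize(text):
    """Split text into lowercase ASCII alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_nice_words(tweet, nice_words):
    """Count tokens whose stem matches the stem of a nice word."""
    nice_stems = {naive_stem(word.lower()) for word in nice_words}
    return sum(1 for token in tokenize(tweet) if naive_stem(token) in nice_stems)


print(count_nice_words("Thanks for helping, kindly!", ["thank", "help", "kind"]))
print(count_nice_words("A serious threat", ["eat", "us"]))
```

Comparing whole tokens also avoids the false positives of substring search: "eat" no longer matches inside "threat", and "us" no longer matches inside "serious".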