
# <center> <u>User Analysis via Twitter Likes </u></center>

### What kind of information can be extracted from a user's Twitter likes?

My name is Matthew Kang. I have had my Twitter account since January 2011.

To this day, I have made 368 tweets, but I have liked 5,686 tweets.

Looking through someone's likes is an intensely personal thing to do. 

Tweets, Retweets, and Quotes are all meant to be displayed to the public. But a user's likes are much more intimate. 

Liking a tweet often means that you approve of the message it conveys. Or perhaps a tweet is liked because it is from a personal friend. Other times, likes are used the way one tosses a coin into a wishing well (e.g., "Like this tweet to find love this week").

A great deal of information is conveyed through a user's likes. I believe that by analyzing a user's likes, you can determine one's:
    
1. Friends
2. Interests
3. Political Opinions

<br>
   
### Methodology and Data

The data used in this study will be liked tweets from a single Twitter account. 
For the sake of privacy and consent, I will be using my own Twitter account, @kangaroomatthew. 

I am utilizing Twitter's API in order to obtain the data used in this study. 

Tweepy, a third-party library, is used to interact with Twitter's API from Python.

Pandas is used to conduct data analysis, data visualization, and dataframe creation. 

---
## <center>I. Data Preparation</center>


### Step 1 : Get the user's liked tweets

In this step, I will use Tweepy to get a user's likes from the Twitter API.

I will need to enter my own API Credentials, so I have imported a file called Secret.py which holds them. 

The end result is a CSV file holding all my likes. The file name of the CSV is "[userhandle]-[unixtimestamp].csv".
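
The naming convention can be captured in a pair of small helpers, which makes it easy to recover the handle and download time from a file such as the one read in Step 2. This is a minimal sketch; the function names likes_csv_name and parse_likes_csv_name are hypothetical and not part of the original code. Splitting on the last hyphen is safe because Twitter handles only contain letters, digits, and underscores.

```python
import time


def likes_csv_name(handle, timestamp=None):
    """Build "[userhandle]-[unixtimestamp].csv" (hypothetical helper)."""
    if timestamp is None:
        timestamp = int(time.time())  # current Unix time in whole seconds
    return f"{handle}-{int(timestamp)}.csv"


def parse_likes_csv_name(filename):
    """Split a likes CSV filename back into (handle, unix_timestamp)."""
    stem = filename[:-len(".csv")] if filename.endswith(".csv") else filename
    handle, timestamp = stem.rsplit("-", 1)  # handles never contain "-"
    return handle, int(timestamp)


print(likes_csv_name("kangaroomatthew", 1661467599))           # kangaroomatthew-1661467599.csv
print(parse_likes_csv_name("kangaroomatthew-1661467599.csv"))  # ('kangaroomatthew', 1661467599)
```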

<br><br>

<center><i>This code is not meant to be run. It is simply here to show the viewer how the CSV containing user likes was created.</i></center>

In [12]:
import Secret   # Used to import private api credentials from my Secret file. 
import tweepy   # Third party python library used to interact with twitter api
import os       # Handles creation and naming of paths and files
import time     # Used to get current unix timestamp for use in the csv filename
import csv      # Used to write the response to csv

### Instantiate tweepy authentication handler by passing in API credentials
auth = tweepy.OAuth1UserHandler(
    Secret.apikey, 
    Secret.apikeysecret, 
    Secret.accesstoken, 
    Secret.accesstokensecret)

### Instantiate tweepy API object by passing in the previously made authentication handler
api = tweepy.API(
    auth,
    wait_on_rate_limit=True
    )

### Only download likes if the CSV used in Step 2 does not already exist
if not os.path.exists('LIKEDTWEETS/kangaroomatthew-1661467599.csv'):
    ### Ask for the handle of the account whose likes should be downloaded
    myUserName = input('What is the username of the account whose liked tweets you wish to view? ')

    ### Create a directory called LIKEDTWEETS in the current working directory, if needed
    userFolder = os.path.join(os.getcwd(), 'LIKEDTWEETS')
    os.makedirs(userFolder, exist_ok=True)

    ### Build the filename "[userhandle]-[unixtimestamp].csv"
    filename = myUserName + "-" + str(int(time.time())) + '.csv'
    fullpath = os.path.join(userFolder, filename)

    ### Page through the user's liked tweets, keeping key info: [NAME, USERNAME, TWEET_TEXT, TIMESTAMP, FAVORITE_COUNT]
    ### Write each tweet as one row of the csv (no header row is written)
    ### newline='' stops the csv module from adding blank lines on Windows; utf-8 keeps emoji intact
    with open(fullpath, 'w', newline='', encoding='utf-8') as liked:
        writer = csv.writer(liked)
        for page in tweepy.Cursor(api.get_favorites, screen_name=myUserName, tweet_mode='extended').pages():
            for tweet in page:
                writer.writerow([
                    tweet._json['user']['name'],
                    tweet._json['user']['screen_name'],
                    tweet._json['full_text'],
                    tweet.created_at,
                    tweet._json['favorite_count']
                    ])
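
Each row written above takes five fields from the tweet's raw Twitter API v1.1 JSON. The sketch below pulls the same fields out of a plain dictionary, so the mapping can be checked without API access. The helper name tweet_json_to_row and the sample dictionary are hypothetical; the loop above relies on Tweepy's already-parsed tweet.created_at rather than parsing the raw created_at string itself. Because csv.writer calls str() on the datetime, the Time column ends up in the 2022-08-18 09:22:36+00:00 form seen in Step 2.

```python
from datetime import datetime

# Format of the "created_at" string in Twitter API v1.1 JSON,
# e.g. "Thu Aug 18 09:22:36 +0000 2022"
V1_CREATED_AT = "%a %b %d %H:%M:%S %z %Y"


def tweet_json_to_row(tweet_json):
    """Return [NAME, USERNAME, TWEET_TEXT, TIMESTAMP, FAVORITE_COUNT]."""
    created = datetime.strptime(tweet_json["created_at"], V1_CREATED_AT)
    return [
        tweet_json["user"]["name"],
        tweet_json["user"]["screen_name"],
        tweet_json["full_text"],       # present when tweet_mode='extended'
        created,                       # timezone-aware datetime (UTC)
        tweet_json["favorite_count"],
    ]


# Hypothetical sample shaped like the last row of the table in Step 2
sample = {
    "user": {"name": "Danny DeVito", "screen_name": "DannyDeVito"},
    "full_text": "gaming",
    "created_at": "Fri Oct 16 20:06:37 +0000 2009",
    "favorite_count": 200791,
}
row = tweet_json_to_row(sample)
print(str(row[3]))  # 2009-10-16 20:06:37+00:00
```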

### Step 2 : Create Pandas Dataframe from CSV of likes

I am using the CSV file kangaroomatthew-1661467599.csv, which I created in Step 1. Because Step 1 writes no header row, the column names are passed to read_csv through names.

In [93]:
import pandas as pd
df = pd.read_csv('LIKEDTWEETS/kangaroomatthew-1661467599.csv',names=['Name','@','Tweet','Time','Likes'])
df

| | Name | @ | Tweet | Time | Likes |
|---|---|---|---|---|---|
| 0 | Ivan Pozderac | pozda | @UGD_Zephyr @DennisCode Here's the whole list ... | 2022-08-18 09:22:36+00:00 | 262 |
| 1 | Sakun | sakofchit | so i made a lil #CSS library that allows you t... | 2022-08-11 15:27:31+00:00 | 2553 |
| 2 | 😡fermion | angryfermion | what the heck?? just watched all of The Matrix... | 2022-08-05 22:35:06+00:00 | 5031 |
| 3 | eevee | eevee | cool i must've somehow missed that only christ... | 2022-08-02 02:27:13+00:00 | 1349 |
| 4 | Ryan | RyanRhyn0 | Was out on a walk and an old lady randomly sto... | 2022-07-31 00:25:39+00:00 | 2 |
| ... | ... | ... | ... | ... | ... |
| 2414 | alyssa | alyssevans | the only #BlueLives that matter to me TBH http... | 2016-08-12 18:15:34+00:00 | 7 |
| 2415 | wint | dril | imagine how fucked uop it would be to have a b... | 2014-09-20 19:06:20+00:00 | 2282 |
| 2416 | Dong Nguyen | dongatory | I am sorry 'Flappy Bird' users, 22 hours from ... | 2014-02-08 19:02:33+00:00 | 115108 |
| 2417 | Danny DeVito | DannyDeVito | gaming | 2009-10-16 20:06:37+00:00 | 200791 |


---
## <center>II. Predicting a Twitter User's Friends based on Likes</center>

##### Here are the accounts whose posts the user has liked most frequently.

#### <center>{<u>Key</u> = Twitter Account   :   <u>Value</u> = Number of times the user has liked a tweet from that account}</center>

In [96]:
mostliked = dict(df['@'].value_counts())
### Keep only accounts whose tweets the user has liked more than once
mostliked = {k: v for k, v in mostliked.items() if v > 1}
mostliked

{'niyacatt': 50,
 'alyssevans': 38,
 'orangejellie': 36,
 'booritney': 35,
 'existentialkale': 30,
 'BillRatchet': 30,
 'ysablanch': 28,
 'ninaxrosalie': 25,
 'Rada_Rada__': 24,
 'cake_hoarder': 24,
 'gaycrimsonrain': 22,
 'Chainbody': 22,
 'pvato13': 17,
 'akumar_Ftw': 15,
 'juicykelp': 15,
 'Sadieisonfire': 14,
 'father': 12,
 'stepsus69': 12,
 'pokmon_facts': 12,
 'stephanielzhou': 12,
 'hiyooun': 12,
 'BobbyKrane': 12,
 'postedinthecrib': 11,
 'shegonsuck': 11,
 'guwop': 10,
 'goodbeanalt': 10,
 'Kaylaaway_': 10,
 'seanslimed': 10,
 'mejor_patricio': 9,
 'tuIuna': 9,
 'beanytuesday': 9,
 'shar1qa': 8,
 'maaddiso': 8,
 'chanbanhi': 8,
 'rheeeah': 8,
 'thotfuss_': 7,
 'mineifiwildout': 7,
 'sweatyhairy': 7,
 'jaboukie': 7,
 'Public_Citizen': 6,
 'KylePlantEmoji': 6,
 'WTMMP': 6,
 'bambooney': 6,
 'superskrong': 6,
 'heygetoverhere': 6,
 'katanasIice': 6,
 'johnniathan': 6,
 'lilsasquatch66': 6,
 'tylerthecreator': 6,
 'iucipur': 5,
 'quartoporto2': 5,
 'JordanReitzes': 5,
 'vincestap
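
The value_counts() call followed by the "more than once" filter is a frequency count with a threshold. For readers without pandas, the same logic can be sketched with collections.Counter. The helper name frequently_liked and the handles in the example list are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def frequently_liked(handles, min_likes=2):
    """Map each account to its like count, keeping accounts liked at least min_likes times.

    Results are ordered from most to least liked, like pandas' value_counts().
    """
    counts = Counter(handles)
    return {h: n for h, n in counts.most_common() if n >= min_likes}


# Hypothetical column of '@' values
liked_from = ["alice", "bob", "alice", "carol", "alice", "bob"]
print(frequently_liked(liked_from))  # {'alice': 3, 'bob': 2}
```

With min_likes=2 this matches the v > 1 condition in the cell above.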


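##### Here is the number of times each account tweeted at the user and the user liked that tweet.

A liked tweet is counted here if its text contains the string "kang", which serves as a rough proxy for a mention of @kangaroomatthew.

#### <center>{<u>Key</u> = Twitter Account   :   <u>Value</u> = Number of times that Twitter account tweeted at the user, and the user liked the tweet}</center>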

In [5]:
df[df['Tweet'].str.contains('kang')]['@'].value_counts()

stephanielzhou     6
niyacatt           4
ysablanch          3
NLinstad           3
pvato13            3
hiyooun            3
mejor_patricio     2
booritney          2
kognomee_          2
Rada_Rada__        2
jacobhjkim         1
juicykelp          1
akumar_Ftw         1
Kaylaaway_         1
orangejellie       1
K4M1___            1
_Sebass45_         1
GroovinwithC       1
Sebastian__Thor    1
alyssevans         1
lucasmitchell9     1
Valarian11         1
FM310DY            1
ipablo_m           1
_guop              1
Name: @, dtype: int64
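
Matching the substring "kang" is a loose proxy: it also matches any tweet that merely contains a word like "kangaroo", and because the match is case-sensitive, a mention written as "@KangarooMatthew" would be missed. A stricter sketch looks for the handle as an @-mention with a regular expression. The helper name mentions_user is hypothetical; in pandas, a similar pattern string could be passed to df['Tweet'].str.contains(..., case=False).

```python
import re

HANDLE = "kangaroomatthew"

# "@handle" not followed by another handle character, matched case-insensitively
MENTION = re.compile(r"@" + re.escape(HANDLE) + r"(?![A-Za-z0-9_])", re.IGNORECASE)


def mentions_user(text):
    """True if the tweet text @-mentions HANDLE."""
    return MENTION.search(text) is not None


print(mentions_user("@kangaroomatthew lol same"))        # True
print(mentions_user("@KangarooMatthew see you there"))   # True
print(mentions_user("a kangaroo at the zoo"))            # False
print(mentions_user("@kangaroomatthew2 hi"))             # False
```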

In [64]:
### Look up each account's current follower count (one API request per account)
numfollowers = []
for i in mostliked:
    numfollowers.append(api.get_user(screen_name = i)._json['followers_count'])
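
The loop above makes one get_user request per account, which adds up quickly under rate limits. Assuming Tweepy 4.x, API.lookup_users wraps the v1.1 users/lookup endpoint, which accepts up to 100 screen names per request. The sketch below batches names in groups of 100 and keys the result by lowercase screen name, because that endpoint does not guarantee that response order matches the request and omits suspended or deleted accounts. The helper names and the FakeAPI stand-in are hypothetical; FakeAPI only exists so the sketch runs offline.

```python
from types import SimpleNamespace


def chunks(items, size=100):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def follower_counts(api, screen_names):
    """Return {lowercase screen name: followers_count}, one request per 100 names.

    `api` is expected to behave like tweepy.API. Suspended or deleted accounts
    are not returned by users/lookup, so they are simply absent here.
    """
    counts = {}
    for batch in chunks(screen_names, 100):
        for user in api.lookup_users(screen_name=batch):
            # Key by lowercase name: response order may differ from the request,
            # and handle capitalization may differ as well.
            counts[user.screen_name.lower()] = user.followers_count
    return counts


class FakeAPI:
    """Hypothetical offline stand-in for tweepy.API (returns users in reverse order)."""

    def __init__(self, followers):
        self.followers = followers
        self.calls = 0

    def lookup_users(self, screen_name):
        self.calls += 1
        return [SimpleNamespace(screen_name=name, followers_count=self.followers[name])
                for name in reversed(screen_name) if name in self.followers]


fake = FakeAPI({"niyacatt": 119, "alyssevans": 413})
names = ["niyacatt", "alyssevans", "gone_account"]
counts = follower_counts(fake, names)
# Align with the original order, using None for accounts that were not returned
aligned = [counts.get(name.lower()) for name in names]
print(aligned)  # [119, 413, None]
```

With a real client, the call would look like follower_counts(api, mostliked), since iterating over the dict yields its keys.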


In [123]:
rel = pd.DataFrame(
    {'username': list(mostliked.keys()),
     'followers': numfollowers,
     "# of their tweets you've liked": list(mostliked.values()),
    })


In [124]:
rel.head(50)

| | username | followers | # of their tweets you've liked |
|---|---|---|---|
| 0 | niyacatt | 119 | 50 |
| 1 | alyssevans | 413 | 38 |
| 2 | orangejellie | 113 | 36 |
| 3 | booritney | 3558 | 35 |
| 4 | existentialkale | 2646 | 30 |
| 5 | BillRatchet | 450365 | 30 |
| 6 | ysablanch | 408 | 28 |
| 7 | ninaxrosalie | 89 | 25 |
| 8 | Rada_Rada__ | 123 | 24 |
| 9 | cake_hoarder | 152132 | 24 |
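
The table pairs how often the user likes an account with how large that account's audience is. One possible reading, offered only as a heuristic sketch, is that frequent likes of accounts with small followings (for example, niyacatt with 119 followers and 50 likes) point to personal acquaintances, while frequent likes of large accounts such as BillRatchet (450,365 followers) point to interests. The 1,000-follower cutoff and the helper name likely_friends are arbitrary assumptions, not part of the original analysis.

```python
# (username, followers, likes) for the ten rows shown in the table above
rows = [
    ("niyacatt", 119, 50),
    ("alyssevans", 413, 38),
    ("orangejellie", 113, 36),
    ("booritney", 3558, 35),
    ("existentialkale", 2646, 30),
    ("BillRatchet", 450365, 30),
    ("ysablanch", 408, 28),
    ("ninaxrosalie", 89, 25),
    ("Rada_Rada__", 123, 24),
    ("cake_hoarder", 152132, 24),
]


def likely_friends(rows, max_followers=1000):
    """Accounts with fewer than max_followers followers, most-liked first (heuristic only)."""
    small = [r for r in rows if r[1] < max_followers]
    return [name for name, _, _ in sorted(small, key=lambda r: r[2], reverse=True)]


print(likely_friends(rows))
# ['niyacatt', 'alyssevans', 'orangejellie', 'ysablanch', 'ninaxrosalie', 'Rada_Rada__']
```

In pandas, the same filter would be rel[rel['followers'] < 1000]. Five of these six accounts also appear in the count of liked tweets containing "kang", which is consistent with, though not proof of, a personal connection.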


### Limitations and problems encountered 

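1. Rate limiting: Twitter's API caps the number of requests per time window. Passing wait_on_rate_limit=True to tweepy.API makes Tweepy pause until the limit resets instead of raising an error, so downloading likes and follower counts can be slow.
2. A maximum of 3,200 liked tweets is returned, so not all 5,686 likes on the account can be retrieved.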