# Twitter API
By: Gloria Sladek

Using Twitter data, I hope to answer the question of how artists are using Twitter as a platform. More specifically, I would like to identify the types of art being posted, i.e. traditional vs. digital.

## Gathering Data

In [1]:
import requests
import pandas as pd
import json
import urllib.parse

I imported the different libraries I would be using.

Then I read my _Twitter API_ credentials text file as a _csv_ and assigned the result to a variable called __bearer_token__.

    The bearer token is used to access the Twitter API

In [2]:
bearer_token = pd.read_csv("twitterApp.txt", header = 0)

I isolated the Bearer Token from its label using _iloc_, then formatted a header dictionary with the key 'Authorization' and a value of 'Bearer' followed by the token value.

In [None]:
bearer_token['Bearer Token'].iloc[0]

In [4]:
header = {'Authorization' : 'Bearer {}'.format(bearer_token['Bearer Token'].iloc[0])}

I then set the endpoint url to a variable. This will allow me to use a query to search Twitter for specified content.

In [5]:
endpoint_url = 'https://api.twitter.com/2/tweets/search/recent'

## Creating a Query

Next I built my query and assigned it to a variable. The function _urllib.parse.quote_ takes the search string that I put in quotes and replaces the special characters (spaces, #, parentheses, colons) with _%xx_ escapes so that it can be placed in a url.

        The query I chose is meant to identify tweets with one or more of the hashtags listed. I combined #art with #digitalart
        (a space between two terms means AND) rather than adding #art on its own with OR, because in my experience #art is used a lot outside the context of what I am looking for.

In [6]:
query = urllib.parse.quote('( #art #digitalart OR #digitalart OR #traditionalart OR #myart OR #Inktober) lang:en')
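To see what the encoding does, here is a small sketch on a shorter, made-up query. The string below is only an illustration, not the query used in this project.

```python
import urllib.parse

# A shortened example query, used only to show the encoding.
sample = '(#art #digitalart OR #Inktober) lang:en'
encoded = urllib.parse.quote(sample)

print(encoded)
# Round trip: unquote restores the original text.
print(urllib.parse.unquote(encoded) == sample)
```

Spaces become `%20`, `#` becomes `%23`, the parentheses become `%28` and `%29`, and the colon becomes `%3A`.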

I created a variable to hold the tweet fields I wanted to get back from my query. Later I added these fields and my query to the endpoint url defined earlier.

        author_id: provides the id of the __user__ who posted
        id: provides the __tweet id__
        text: provides the __text__ of the tweet
        public_metrics: provides the __retweet_count, like_count, reply_count,__ and __quote_count__
        created_at: provides the __date and time__ at which the post was made
        
I added __entities__ as well, which contains __hashtags__. This field will be useful when answering questions on how an artist is using Twitter. For example, I could use this information to check whether posts with #digitalart got more reactions than posts with #traditionalart.

In [7]:
tweet_fields = 'author_id,id,text,public_metrics,created_at,entities'

I set the expansion to "author_id" and passed it to my query as _expansions_, together with a _user.fields_ value of "username", so that the response includes user information for each author.

In [8]:
expansions = 'author_id'

In [43]:
url = endpoint_url + '?query={}&tweet.fields={}&expansions={}&user.fields={}&max_results=100'.format(query,tweet_fields,expansions,"username")
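Another way to build the same kind of url is to keep the parameters in a dictionary and let _urllib.parse.urlencode_ do the encoding. This is a sketch rather than the code I ran above; the names `params` and `example_url` and the shortened query are placeholders made up for the example.

```python
import urllib.parse

endpoint_url = 'https://api.twitter.com/2/tweets/search/recent'

# Placeholder parameters for illustration; the query here is shortened.
params = {
    'query': '(#art #digitalart OR #Inktober) lang:en',
    'tweet.fields': 'author_id,id,text,public_metrics,created_at,entities',
    'expansions': 'author_id',
    'user.fields': 'username',
    'max_results': 100,
}

# quote_via=quote encodes spaces as %20, the same way urllib.parse.quote does.
example_url = endpoint_url + '?' + urllib.parse.urlencode(
    params, quote_via=urllib.parse.quote)
print(example_url)
```

With this approach each value is encoded in one place, so adding or removing a field only means changing the dictionary.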

## Creating Dataframes

Using __request__ from the _requests_ library, I requested the data from the url I defined, passing my header, and assigned the response to a variable.
    I then used __loads__ from the _json_ library to create a ___dictionary___ out of the response text.

In [11]:
response_1 = requests.request("GET", url, headers=header)

In [12]:
response_1_dict = json.loads(response_1.text)

I created __three dataframes__: one that held the original data, one that held the user data, and one that held the public metrics data.

In [13]:
df = pd.DataFrame(response_1_dict['data'])

In [14]:
user1 = pd.DataFrame(response_1_dict['includes']['users'])

In [15]:
metrics = pd.DataFrame(list(df['public_metrics']))

I then added the columns I wanted from each of these dataframes onto my primary dataframe. I did this with each of the separate "pages" of data.

In [16]:
df['name'] = user1['name']
df['username'] = user1['username']
df['likes'] = metrics['like_count']
df['retweets'] = metrics['retweet_count']
df['replys'] = metrics['reply_count']
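One thing to watch with the assignments above: they pair rows by position, so row 0 of the user data goes with row 0 of the tweets. As far as I understand the API, the users list under 'includes' holds each author only once, so it can be shorter than the tweet list and in a different order. That may also be why some name and username values come out as NaN in later rows. A minimal sketch of matching on author_id instead, using made-up records in the same shape as the response (the helper name `flatten_page` is my own):

```python
def flatten_page(page):
    """Return one flat record per tweet, with user fields matched by id."""
    # Look up users by their id rather than by position in the list.
    users_by_id = {u['id']: u for u in page.get('includes', {}).get('users', [])}
    rows = []
    for tweet in page.get('data', []):
        user = users_by_id.get(tweet['author_id'], {})
        metrics = tweet.get('public_metrics', {})
        rows.append({
            'id': tweet['id'],
            'author_id': tweet['author_id'],
            'text': tweet['text'],
            'name': user.get('name'),
            'username': user.get('username'),
            'likes': metrics.get('like_count'),
            'retweets': metrics.get('retweet_count'),
            'replys': metrics.get('reply_count'),
        })
    return rows


# Made-up page: two tweets by one author and one tweet by another.
sample_page = {
    'data': [
        {'id': '1', 'author_id': 'b', 'text': 'first',
         'public_metrics': {'like_count': 2, 'retweet_count': 0, 'reply_count': 1}},
        {'id': '2', 'author_id': 'a', 'text': 'second',
         'public_metrics': {'like_count': 0, 'retweet_count': 5, 'reply_count': 0}},
        {'id': '3', 'author_id': 'b', 'text': 'third',
         'public_metrics': {'like_count': 1, 'retweet_count': 1, 'reply_count': 0}},
    ],
    'includes': {'users': [
        {'id': 'a', 'name': 'Ann', 'username': 'ann_draws'},
        {'id': 'b', 'name': 'Bo', 'username': 'bo_inks'},
    ]},
}

rows = flatten_page(sample_page)
for row in rows:
    print(row['id'], row['username'], row['likes'])
```

The list of records can be passed to `pd.DataFrame(rows)` to get a dataframe with these columns.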

Because each response maxed out at 100 tweets, I had to access the next set or "page" of data. This is done with the _next_token_.
    To access it, I listed the keys of the dictionary and read the 'next_token' value from the 'meta' key.

In [17]:
response_1_dict.keys()

dict_keys(['data', 'includes', 'meta'])

In [None]:
response_1_dict['meta']['next_token']

To access more data:

    - I added the next token to my url and assigned it to a variable 
    - I then repeated the process above to create a new dictionary of data
    - I repeated this process 2 more times to create 4 different dictionaries
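
The same steps can also be written as a loop. This is a sketch under assumptions: `fetch_page` stands in for whatever makes the request and returns the parsed dictionary (for example a small wrapper around _requests.request_ and _json.loads_), and both helper names are my own. A fake fetcher is used here so the loop can be run without calling the API.

```python
def collect_pages(fetch_page, max_pages=4):
    """Call fetch_page(next_token) until max_pages is reached or no token is left."""
    pages = []
    next_token = None  # None means "ask for the first page"
    for _ in range(max_pages):
        page = fetch_page(next_token)
        pages.append(page)
        # The last page has no 'next_token' under 'meta'.
        next_token = page.get('meta', {}).get('next_token')
        if next_token is None:
            break
    return pages


# Fake pages standing in for real responses, keyed by the token that requests them.
FAKE_PAGES = {
    None: {'data': [{'id': '1'}], 'meta': {'next_token': 't2'}},
    't2': {'data': [{'id': '2'}], 'meta': {'next_token': 't3'}},
    't3': {'data': [{'id': '3'}], 'meta': {}},
}


def fake_fetch(next_token):
    return FAKE_PAGES[next_token]


pages = collect_pages(fake_fetch, max_pages=4)
print(len(pages))
```

With the real API, `fetch_page` would add `&next_token=...` to the url whenever a token is given, as in the cells that follow.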
    

In [19]:
page2 = url +'&next_token={}'.format(response_1_dict['meta']['next_token'])

In [20]:
response_2 = requests.request("GET",url = page2, headers=header)

In [21]:
response_2_dict = json.loads(response_2.text)

In [22]:
df2 = pd.DataFrame(response_2_dict['data'])
user2 = pd.DataFrame(response_2_dict['includes']['users'])
metrics2 = pd.DataFrame(list(df2['public_metrics']))
ht2 = pd.DataFrame(list(df2['entities']))

In [23]:
df2['name'] = user2['name']
df2['username'] = user2['username']
df2['likes'] = metrics2['like_count']
df2['retweets'] = metrics2['retweet_count']
df2['replys'] = metrics2['reply_count']

In [24]:
page3 = url +'&next_token={}'.format(response_2_dict['meta']['next_token'])

In [25]:
response_3 = requests.request("GET",url = page3, headers=header)

In [26]:
response_3_dict = json.loads(response_3.text)

In [27]:
df3 = pd.DataFrame(response_3_dict['data'])
user3 = pd.DataFrame(response_3_dict['includes']['users'])
metrics3 = pd.DataFrame(list(df3['public_metrics']))

In [28]:
df3['name'] = user3['name']
df3['username'] = user3['username']
df3['likes'] = metrics3['like_count']
df3['retweets'] = metrics3['retweet_count']
df3['replys'] = metrics3['reply_count']

In [29]:
page4 = url +'&next_token={}'.format(response_3_dict['meta']['next_token'])

In [30]:
response_4 = requests.request("GET",url = page4, headers=header)

In [31]:
response_4_dict = json.loads(response_4.text)

In [32]:
df4 = pd.DataFrame(response_4_dict['data'])
user4 = pd.DataFrame(response_4_dict['includes']['users'])
metrics4 = pd.DataFrame(list(df4['public_metrics']))
ht4 = pd.DataFrame(list(df4['entities']))

In [33]:
df4['name'] = user4['name']
df4['username'] = user4['username']
df4['likes'] = metrics4['like_count']
df4['retweets'] = metrics4['retweet_count']
df4['replys'] = metrics4['reply_count']

## Combining Data Frames

I __combined__ the four dataframes to create a single _400 row_ dataframe by using the __concat__ function from the _pandas library_.

In [34]:
tweets = pd.concat([df, df2, df3, df4])

I deleted the 'public_metrics' column from my master dataframe, since its values had already been copied into the likes, retweets, and replys columns.

In [35]:
del tweets['public_metrics']

In [36]:
len(tweets.index) 

400

The combined dataframe, which I called "tweets", was 400 rows long: one row for each tweet across the four pages.

Then I used the __head__ and __tail__ functions to display the first and last 10 rows in my master dataframe.

In [37]:
tweets.head(10)

| | author_id | entities | id | text | created_at | name | username | likes | retweets | replys |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1439500749239504897 | {'hashtags': [{'start': 153, 'end': 157, 'tag'... | 1450657432359694340 | Hello everyone!\n\nOur white list is now open,... | 2021-10-20T02:57:38.000Z | fiveliondanceclub | fivelionsdc | 0 | 0 | 0 |
| 1 | 1257491531491139587 | {'hashtags': [{'start': 67, 'end': 78, 'tag': ... | 1450657426181660672 | RT @Cubebrush: Depth sketching with Mental Can... | 2021-10-20T02:57:36.000Z | Mr. Pineapple | MrPineapple7u7 | 0 | 931 | 0 |
| 2 | 1002180392361906177 | {'hashtags': [{'start': 37, 'end': 48, 'tag': ... | 1450657415347777539 | RT @SifaSeven: The first four in the #mooglejo... | 2021-10-20T02:57:34.000Z | SifaSeven | SifaSeven | 0 | 3 | 0 |
| 3 | 1011160246973292544 | {'hashtags': [{'start': 124, 'end': 134, 'tag'... | 1450657408292843520 | RT @KyuYongEom: Hahaha. I design clothes of va... | 2021-10-20T02:57:32.000Z | Hepkept | ArnoldKang7 | 0 | 33 | 0 |
| 4 | 1108083423049105409 | {'hashtags': [{'start': 32, 'end': 43, 'tag': ... | 1450657403331096576 | RT @mcflarey: alright listen.. \n#digitalart #... | 2021-10-20T02:57:31.000Z | 🔞 BlackBerry~♪ | berry_blk | 0 | 111 | 0 |
| 5 | 1713751680 | {'hashtags': [{'start': 15, 'end': 24, 'tag': ... | 1450657398901952520 | RT @Asmerrith: #Inktober for this one I drew @... | 2021-10-20T02:57:30.000Z | RYAN 🎃 | RatherMalicious | 0 | 5 | 0 |
| 6 | 1246921421315547146 | {'hashtags': [{'start': 39, 'end': 49, 'tag': ... | 1450657388793671680 | RT @scrapchallenge1: I fixed his eyes. #GoodOm... | 2021-10-20T02:57:28.000Z | Manda Lynn | MandaLynn5304 | 0 | 41 | 0 |
| 7 | 578634352 | {'hashtags': [{'start': 40, 'end': 53, 'tag': ... | 1450657373857615876 | Day 19: Loop\nEh not the greatest pic...\n#ink... | 2021-10-20T02:57:24.000Z | Mellera_Derg Gurl! | DragonMellera | 0 | 0 | 0 |
| 8 | 1361763565196111876 | {'hashtags': [{'start': 139, 'end': 155, 'tag'... | 1450657369097129986 | Day 19: Loop\nA pen and ink octopus! 🐙 Need I ... | 2021-10-20T02:57:23.000Z | Sarah Morrison | smorrisonartist | 0 | 0 | 0 |
| 9 | 59698667 | {'hashtags': [{'start': 67, 'end': 78, 'tag': ... | 1450657363011280902 | RT @Cubebrush: Depth sketching with Mental Can... | 2021-10-20T02:57:21.000Z | Sasha Valentine | Sasha_AT | 0 | 931 | 0 |


In [38]:
tweets.tail(10)

| | author_id | entities | id | text | created_at | name | username | likes | retweets | replys |
|---|---|---|---|---|---|---|---|---|---|---|
| 90 | 1413896013388161030 | {'annotations': [{'start': 25, 'end': 29, 'pro... | 1450654431515750400 | RT @Bikini_Boody: I feel Hilda’s desire to not... | 2021-10-20T02:45:42.000Z | がじら | gajira1341 | 0 | 2337 | 0 |
| 91 | 1368944485262974977 | {'hashtags': [{'start': 32, 'end': 37, 'tag': ... | 1450654421994524673 | RT @Rodrigo75532403: A new draw #cute #digital... | 2021-10-20T02:45:40.000Z | Naoi | Naoi_O | 0 | 1 | 0 |
| 92 | 819889946 | {'hashtags': [{'start': 46, 'end': 54, 'tag': ... | 1450654414809796616 | RT @noize_exe: Noizetober Pokedraws:\nMilotic\... | 2021-10-20T02:45:39.000Z | Bettle Jam | BettleJam | 0 | 4 | 0 |
| 93 | 1157768153755246594 | {'mentions': [{'start': 3, 'end': 17, 'usernam... | 1450654414767788035 | RT @ShiftyCatProd: Always watching\n\n-\n\nWhe... | 2021-10-20T02:45:38.000Z | I Want To Commit Link Start | InactiveSpades | 0 | 1 | 0 |
| 94 | 881630110046093312 | {'annotations': [{'start': 32, 'end': 35, 'pro... | 1450654412041510917 | RT @JestaCrow: Inktober Day: 19 Yuri, Yuzuru, ... | 2021-10-20T02:45:38.000Z | RandomDoodler | ARandomDoodler | 0 | 1 | 0 |
| 95 | 109879816 | {'annotations': [{'start': 36, 'end': 48, 'pro... | 1450654391137026050 | RT @Cubebrush: Depth sketching with Mental Can... | 2021-10-20T02:45:33.000Z | NaN | NaN | 0 | 931 | 0 |
| 96 | 867763419847458816 | {'hashtags': [{'start': 63, 'end': 72, 'tag': ... | 1450654383356719104 | RT @Rafchu: Street Fighter rival schoolgirls K... | 2021-10-20T02:45:31.000Z | NaN | NaN | 0 | 482 | 0 |
| 97 | 1131363440721846277 | {'hashtags': [{'start': 23, 'end': 27, 'tag': ... | 1450654382874365956 | Late night sketch! ❤🐸\n\n#ink #inktober #inkto... | 2021-10-20T02:45:31.000Z | NaN | NaN | 0 | 0 | 0 |
| 98 | 832797067814662144 | {'annotations': [{'start': 25, 'end': 29, 'pro... | 1450654380622065665 | RT @Bikini_Boody: I feel Hilda’s desire to not... | 2021-10-20T02:45:30.000Z | NaN | NaN | 0 | 2337 | 0 |
| 99 | 1420789229181972483 | {'hashtags': [{'start': 23, 'end': 32, 'tag': ... | 1450654363186311176 | She’s got the zoomies\n\n#inktober #inktober20... | 2021-10-20T02:45:26.000Z | NaN | NaN | 0 | 1 | 0 |
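
The index in the tail above runs from 90 to 99 rather than 390 to 399, because _concat_ keeps each page's own 0 to 99 index by default. A small sketch with made-up frames showing the _ignore_index_ option, which renumbers the rows:

```python
import pandas as pd

# Two tiny made-up "pages", each with its own 0..1 index.
page_a = pd.DataFrame({'id': ['1', '2']})
page_b = pd.DataFrame({'id': ['3', '4']})

kept = pd.concat([page_a, page_b])
renumbered = pd.concat([page_a, page_b], ignore_index=True)

print(list(kept.index))        # index labels repeat across pages
print(list(renumbered.index))  # index labels run straight through
```

A unique index makes it possible to refer to a single row by its label.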


I was satisfied with this data, so I __exported__ my full dataframe as a csv using the __to_csv__ method from pandas.

I saved my data into the same folder as my Jupyter notebook so that I can refer to it later when testing my hypotheses.

In [39]:
tweets.to_csv(r'C:\Users\glori\Data in EMAT\tweets.csv')

## Potential weaknesses:
- This is only 400 tweets, which in the grand scheme of things is not much data. However, it is enough to give me some evidence to answer the questions I have. There are some issues with the data as it stands right now. For example, from _entities_ I only want the _hashtags_, so I need to find a way to isolate that part and change the header to read __hashtags__. The issue is that 'entities' is stored as a string, not a list, and I don't yet have the knowledge to parse it into a list.

- It should also be noted that the like_count appears to be 0 for all the rows shown. This is most likely inaccurate. For example, a tweet with over 2000 retweets, such as the one in row 98, would most likely NOT have 0 likes.

- There is a lot of duplicate code. It would be beneficial to create a function that builds each of the individual dataframes, so that I could simply pass in a couple of variables and cut down on the lines of code.
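
For the _entities_ issue, one possible approach is sketched below. It assumes the column holds text that looks like a Python dictionary (which is what happens after a round trip through a csv file) and uses _ast.literal_eval_ from the standard library to turn it back into a dictionary. The helper name `get_tags` and the sample values are made up for the example.

```python
import ast


def get_tags(entities):
    """Return the list of hashtag texts from one 'entities' value."""
    # After reading a csv the value is a string; parse it back into a dict.
    if isinstance(entities, str):
        entities = ast.literal_eval(entities)
    # Missing values (for example NaN) give an empty list.
    if not isinstance(entities, dict):
        return []
    # Tweets without hashtags have no 'hashtags' key.
    return [h['tag'] for h in entities.get('hashtags', [])]


as_text = ("{'hashtags': [{'start': 0, 'end': 9, 'tag': 'inktober'}, "
           "{'start': 10, 'end': 14, 'tag': 'ink'}]}")
no_tags = {'mentions': [{'start': 3, 'end': 17, 'username': 'someone'}]}

print(get_tags(as_text))
print(get_tags(no_tags))
```

On the dataframe this could be applied with `tweets['hashtags'] = tweets['entities'].apply(get_tags)`.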

## Next Steps

In order to get the best results for my hypothesis I would need to:
1. Retrieve the 'tag' value from the _entities_ column
2. Find a way to retrieve accurate "like" counts
3. Understand why some of the values are listed as NaN
4. Create a function to create and combine the different "pages" of data.
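
For step 2, one thing that may be worth trying: many of the rows are retweets (their text starts with "RT @"), and as far as I can tell the API reports the original tweet's retweet count on a retweet but not its likes. The search query accepts an _is:retweet_ operator that can be negated with a minus sign, so a sketch of a query that leaves retweets out could look like this (the shortened hashtag list is a placeholder):

```python
import urllib.parse

# Shortened hashtag list for illustration; '-is:retweet' excludes retweets.
raw_query = '(#digitalart OR #traditionalart OR #Inktober) lang:en -is:retweet'
query_no_rt = urllib.parse.quote(raw_query)
print(query_no_rt)
```

Whether this gives more realistic like counts would still need to be checked against the returned data.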