# Twitter API
By: Gloria Sladek

Using Twitter data, I hope to answer the question of how artists are using Twitter as a platform. More specifically, I would like to identify the types of art being posted, i.e. traditional vs. digital.

## Gathering Data

In [1]:
import requests
import pandas as pd
import json
import urllib.parse

I imported the different libraries I would be using.

Then I read my _Twitter API_ text file in as a _csv_ and assigned it to a variable called __bearer_token__.

    The bearer token is used to access the Twitter API

In [2]:
bearer_token = pd.read_csv("twitterApp.txt", header = 0)

I isolated the Bearer Token from its label using _iloc_, then formatted a header dictionary whose 'Authorization' key holds 'Bearer' followed by the token value.

In [None]:
bearer_token['Bearer Token'].iloc[0]

In [4]:
header = {'Authorization' : 'Bearer {}'.format(bearer_token['Bearer Token'].iloc[0])}

I then assigned the endpoint URL to a variable. This will allow me to use a query to search Twitter for specified content.

In [5]:
endpoint_url = 'https://api.twitter.com/2/tweets/search/recent'

## Creating a Query

Next I built my query and assigned it to a variable. The function _urllib.parse.quote_ takes the search string that I put in quotes and replaces the special characters with _%xx_ escapes.

        The query I chose is meant to identify tweets with one or more of the hashtags listed. I required #art AND #digitalart
        together rather than #art on its own because in my experience #art is used a lot outside the context I am looking for.

In [6]:
query = urllib.parse.quote('( #art #digitalart OR #digitalart OR #traditionalart OR #myart OR #Inktober) lang:en')
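To make the _%xx_ replacement concrete, here is a small sketch of what _urllib.parse.quote_ does to a search string. The shortened query below is a made-up example, not the one used in this notebook.

```python
import urllib.parse

# Hypothetical shortened search string, used only to show the encoding.
raw_query = '(#digitalart OR #Inktober) lang:en'
encoded_query = urllib.parse.quote(raw_query)

# Parentheses, '#', spaces and ':' are all replaced by %xx escapes.
print(encoded_query)

# Round trip: unquote restores the original search string.
print(urllib.parse.unquote(encoded_query) == raw_query)
```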

I created a variable to hold the tweet fields I wanted to get back from my query. Then I added my query and these fields to the endpoint url defined earlier.

        author_id: provides the id of the __user__ who posted
        id: provides the __tweet id__
        text: provides the __text__ of the tweet
        public_metrics: provides the __retweet_count, like_count, reply_count,__ and __quote_count__
        created_at: provides the __date__ on which the post was made

I added __entities__ as well, which contains __hashtags__. This field will be useful when answering questions on how an artist is using Twitter. For example, I could use this information to check if posts with #digitalart got more reactions than posts with #traditionalart.

In [7]:
tweet_fields = 'author_id,id,text,public_metrics,created_at,entities'

I set _expansions_ to "author_id" so that the response includes the user object for each author, and I requested "username" through _user.fields_ when building the url.

In [8]:
expansions = 'author_id'

In [9]:
url = endpoint_url + '?query={}&tweet.fields={}&expansions={}&user.fields={}&max_results=100'.format(query,tweet_fields,expansions,"username")
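An alternative to formatting the url by hand is to keep the parameters in a dictionary and let _urllib.parse.urlencode_ do the escaping. This is only a sketch: the shortened query is made up, and it is passed unencoded because _urlencode_ applies the percent-encoding itself.

```python
from urllib.parse import urlencode, quote

endpoint_url = 'https://api.twitter.com/2/tweets/search/recent'

# Unencoded parameters; urlencode handles the percent-encoding.
params = {
    'query': '(#digitalart OR #Inktober) lang:en',  # hypothetical shortened query
    'tweet.fields': 'author_id,id,text,public_metrics,created_at,entities',
    'expansions': 'author_id',
    'user.fields': 'username',
    'max_results': 100,
}

# quote_via=quote encodes spaces as %20 instead of '+'.
url = endpoint_url + '?' + urlencode(params, quote_via=quote)
print(url)
```

The _requests_ library also accepts such a dictionary directly through its `params` argument, which avoids building the string at all.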

## Creating Dataframes

Using __request__ from the _requests_ library, I requested the data from the url I defined, passing my header, and assigned the response to a variable. I then used __loads__ from the _json_ library to create a ___dictionary___ out of the response text.

In [10]:
response_1 = requests.request("GET", url, headers=header)

In [11]:
response_1_dict = json.loads(response_1.text)
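Before building data frames it can help to confirm that the response actually holds tweets. The sketch below uses a made-up response body and a hypothetical helper, `parse_payload`, and it assumes that a failed or empty search comes back without a 'data' key.

```python
import json

def parse_payload(text):
    """Parse a response body and fail early if it holds no tweets."""
    payload = json.loads(text)
    if 'data' not in payload:
        raise ValueError('No tweets in response: {}'.format(payload))
    return payload

# Made-up body in the shape the notebook relies on.
sample_text = '{"data": [{"id": "1", "text": "hello"}], "meta": {"result_count": 1}}'
sample_dict = parse_payload(sample_text)
print(sample_dict.keys())
```

With a live response, `response_1.raise_for_status()` raises on HTTP errors, and `response_1.json()` gives the same dictionary as `json.loads(response_1.text)`.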

I created __three dataframes__: one that held the original data, one that held the user data, and one that held the public metrics data.

In [12]:
df = pd.DataFrame(response_1_dict['data'])

In [13]:
user1 = pd.DataFrame(response_1_dict['includes']['users'])

In [14]:
metrics = pd.DataFrame(list(df['public_metrics']))
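Another way to get at the nested metrics is _pd.json_normalize_, which flattens nested dictionaries into dotted column names in one step. The records below are made up to mimic the shape of `response_1_dict['data']`.

```python
import pandas as pd

# Made-up records mimicking response_1_dict['data'].
sample_data = [
    {'id': '1', 'author_id': '10', 'text': 'first',
     'public_metrics': {'retweet_count': 3, 'reply_count': 0,
                        'like_count': 5, 'quote_count': 0}},
    {'id': '2', 'author_id': '11', 'text': 'second',
     'public_metrics': {'retweet_count': 0, 'reply_count': 1,
                        'like_count': 2, 'quote_count': 0}},
]

# Nested keys become columns such as 'public_metrics.like_count'.
flat = pd.json_normalize(sample_data)
print(flat.columns.tolist())
```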

I then added the columns I wanted from each dataframe onto my primary dataframe. I did this for each of the separate "pages" of data.

In [15]:
df['name'] = user1['name']
df['username'] = user1['username']
df['likes'] = metrics['like_count']
df['retweets'] = metrics['retweet_count']
df['replys'] = metrics['reply_count']
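One caveat with assigning the user columns by position: as far as I understand, the 'users' list holds each author only once, so it can be shorter than the tweet list and its rows need not line up with the tweets. That would leave NaN values in the last rows. A sketch of matching on the author id instead, using made-up data:

```python
import pandas as pd

# Made-up data: three tweets, two of them by the same author.
tweets_df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'author_id': ['10', '11', '10'],
})
# Users table in the shape of response['includes']['users'].
users_df = pd.DataFrame({
    'id': ['10', '11'],
    'name': ['Ana', 'Ben'],
    'username': ['ana_draws', 'ben_inks'],
})

# Rename the user key so it does not collide with the tweet id.
users_df = users_df.rename(columns={'id': 'author_id'})

# A left merge keeps every tweet and looks up its author by id.
merged = tweets_df.merge(users_df, on='author_id', how='left')
print(merged)
```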

Because each request maxed out at 100 tweets, I had to access the next set or "page" of data. This is done with the _next_token_. To access it, I listed the keys of the dictionary and read the 'next_token' key from the 'meta' entry.

In [16]:
response_1_dict.keys()

dict_keys(['data', 'includes', 'meta'])

In [None]:
response_1_dict['meta']['next_token']

To access more data:

    - I added the next token to my url and assigned it to a variable 
    - I then repeated the process above to create a new dictionary of data
    - I repeated this process 2 more times to create 4 different dictionaries
    

In [18]:
page2 = url +'&next_token={}'.format(response_1_dict['meta']['next_token'])

In [19]:
response_2 = requests.request("GET",url = page2, headers=header)

In [20]:
response_2_dict = json.loads(response_2.text)

In [21]:
df2 = pd.DataFrame(response_2_dict['data'])
user2 = pd.DataFrame(response_2_dict['includes']['users'])
metrics2 = pd.DataFrame(list(df2['public_metrics']))
ht2 = pd.DataFrame(list(df2['entities']))

In [22]:
df2['name'] = user2['name']
df2['username'] = user2['username']
df2['likes'] = metrics2['like_count']
df2['retweets'] = metrics2['retweet_count']
df2['replys'] = metrics2['reply_count']

In [23]:
page3 = url +'&next_token={}'.format(response_2_dict['meta']['next_token'])

In [24]:
response_3 = requests.request("GET",url = page3, headers=header)

In [25]:
response_3_dict = json.loads(response_3.text)

In [26]:
df3 = pd.DataFrame(response_3_dict['data'])
user3 = pd.DataFrame(response_3_dict['includes']['users'])
metrics3 = pd.DataFrame(list(df3['public_metrics']))

In [27]:
df3['name'] = user3['name']
df3['username'] = user3['username']
df3['likes'] = metrics3['like_count']
df3['retweets'] = metrics3['retweet_count']
df3['replys'] = metrics3['reply_count']

In [28]:
page4 = url +'&next_token={}'.format(response_3_dict['meta']['next_token'])

In [29]:
response_4 = requests.request("GET",url = page4, headers=header)

In [30]:
response_4_dict = json.loads(response_4.text)

In [31]:
df4 = pd.DataFrame(response_4_dict['data'])
user4 = pd.DataFrame(response_4_dict['includes']['users'])
metrics4 = pd.DataFrame(list(df4['public_metrics']))
ht4 = pd.DataFrame(list(df4['entities']))

In [32]:
df4['name'] = user4['name']
df4['username'] = user4['username']
df4['likes'] = metrics4['like_count']
df4['retweets'] = metrics4['retweet_count']
df4['replys'] = metrics4['reply_count']
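The four near-identical page requests above could be folded into one loop. This is a sketch rather than the code I ran: `collect_pages` and `fetch_page` are hypothetical names, and a dictionary of fake pages stands in for the API so the example runs offline.

```python
import pandas as pd

def collect_pages(fetch_page, url, max_pages=4):
    """Follow next_token links and return one dataframe per page.

    fetch_page is any callable that takes a URL and returns the parsed
    response dictionary.
    """
    frames = []
    page_url = url
    for _ in range(max_pages):
        payload = fetch_page(page_url)
        frames.append(pd.DataFrame(payload['data']))
        next_token = payload.get('meta', {}).get('next_token')
        if next_token is None:
            break  # no further pages
        page_url = url + '&next_token={}'.format(next_token)
    return frames

# Stand-in for the API: two fake pages keyed by the URL that requests them.
fake_pages = {
    'base': {'data': [{'id': '1'}], 'meta': {'next_token': 'abc'}},
    'base&next_token=abc': {'data': [{'id': '2'}], 'meta': {}},
}
pages = collect_pages(fake_pages.get, 'base')
print(len(pages))
```

Against the live API, `fetch_page` could be something like `lambda u: requests.get(u, headers=header).json()`.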

I __combined__ the four dataframes into one _400 row_ dataframe by using the __concat__ function from the _pandas library_.

## Combining Data Frames

In [33]:
tweets = pd.concat([df, df2, df3, df4])
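By default _concat_ keeps each page's own row labels, so the labels 0 to 99 repeat once per page. Passing `ignore_index=True` renumbers the rows instead. A small sketch with made-up frames:

```python
import pandas as pd

page_a = pd.DataFrame({'id': ['1', '2']})
page_b = pd.DataFrame({'id': ['3', '4']})

# Default: each frame keeps its own labels, so they repeat.
kept = pd.concat([page_a, page_b])

# ignore_index=True gives one continuous range of labels.
renumbered = pd.concat([page_a, page_b], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```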

I deleted the 'public_metrics' column from my master data frame to avoid duplication.

In [34]:
del tweets['public_metrics']

In [35]:
len(tweets.index) 

400

The __concat__ function from the _pandas_ library combined all my previous data frames into one master data frame I called "tweets", which is 400 rows long.

Then I used the __head__ and __tail__ functions to display the first and last 10 rows in my master dataframe.

In [36]:
tweets.head(10)

Unnamed: 0,created_at,id,entities,text,author_id,name,username,likes,retweets,replys
0,2021-10-20T03:37:30.000Z,1450667466196201473,"{'urls': [{'start': 216, 'end': 239, 'url': 'h...",When it hits your cheeks and you feel it in th...,937347617658363904,Julius J Jervoso,ArtbyJoebalde,0,0,0
1,2021-10-20T03:37:30.000Z,1450667466187677696,"{'mentions': [{'start': 3, 'end': 13, 'usernam...",RT @larapedan: Mother Night\n#inktober #inktob...,2448621548,S A R R A H 🍥🔩🎀,SarrahPrinsesa,0,42,0
2,2021-10-20T03:37:30.000Z,1450667463637491713,"{'mentions': [{'start': 3, 'end': 17, 'usernam...","RT @nanasibrushes: ""Lan Zhan, let's go back ho...",158554940,战해 🌻☀️,lmaoHae,0,30,0
3,2021-10-20T03:37:27.000Z,1450667452438704128,"{'urls': [{'start': 59, 'end': 82, 'url': 'htt...",Girls 👩🏻‍🤝‍👩🏼\n#procreate \n#digitalillustrati...,1447588345035452420,Suina 千千,fangsuinas,0,0,0
4,2021-10-20T03:37:27.000Z,1450667451457413120,"{'mentions': [{'start': 3, 'end': 19, 'usernam...",RT @eugenia_kelheor: Day 17 of Fantober with K...,1344895880558731266,🕸️| spooky month spooky sluts,mistalucilfer,0,127,0
5,2021-10-20T03:37:26.000Z,1450667449263796225,"{'urls': [{'start': 148, 'end': 171, 'url': 'h...",Couldn't figure out what to draw for today so ...,881304831276843009,Crapworks,crapworks1980,0,0,0
6,2021-10-20T03:37:24.000Z,1450667441047097345,"{'mentions': [{'start': 3, 'end': 10, 'usernam...",RT @Rafchu: Street Fighter rival schoolgirls K...,2852005678,Kyle Parker,Kylethemonkey24,0,542,0
7,2021-10-20T03:37:18.000Z,1450667413523947520,"{'mentions': [{'start': 3, 'end': 15, 'usernam...","RT @judyhopps44: Oh, hello and on the fifteent...",397149313,Hisou,hisousihou,0,14,0
8,2021-10-20T03:37:16.000Z,1450667405076844549,"{'annotations': [{'start': 16, 'end': 44, 'pro...",RT @AFinnstark: Sekiro Vs The Demon of Hatred ...,1221195406706647040,Ruisu,RuisuTheFallen,0,464,0
9,2021-10-20T03:37:14.000Z,1450667399779270658,"{'annotations': [{'start': 36, 'end': 48, 'pro...",RT @Cubebrush: Depth sketching with Mental Can...,951066608889356288,Powdered Donut Bwushies~,bwushies,0,1029,0


In [37]:
tweets.tail(10)

Unnamed: 0,created_at,id,entities,text,author_id,name,username,likes,retweets,replys
90,2021-10-20T03:23:10.000Z,1450663859035983874,"{'urls': [{'start': 179, 'end': 202, 'url': 'h...",My next Inktober drawing is called Star Phanto...,774385051282862080,artofjim,Art0fJim,1,0,0
91,2021-10-20T03:23:10.000Z,1450663858947956736,"{'hashtags': [{'start': 22, 'end': 32, 'tag': ...",RT @sayatale: Today's #monstober prompt: Corru...,897983798,雨宮ハルノフ,amamiyap9,0,3,0
92,2021-10-20T03:23:08.000Z,1450663849489678337,"{'hashtags': [{'start': 85, 'end': 89, 'tag': ...",RT @BoredSourHeads: Been okay LFG washed up…ne...,1349639691428003843,,,0,1,0
93,2021-10-20T03:23:06.000Z,1450663840690184195,"{'urls': [{'start': 135, 'end': 158, 'url': 'h...","I'm not proud of this one, but I wanted to get...",941534340,,,1,0,0
94,2021-10-20T03:23:03.000Z,1450663828727877633,"{'hashtags': [{'start': 85, 'end': 89, 'tag': ...",RT @BoredSourHeads: Been okay LFG washed up…ne...,1349639691428003843,,,0,2,0
95,2021-10-20T03:23:00.000Z,1450663816795017219,"{'mentions': [{'start': 3, 'end': 19, 'usernam...",RT @dalmatian_guard: December Dalmatian.\n\nA ...,1340457607060852737,,,0,3,0
96,2021-10-20T03:23:00.000Z,1450663814488379392,"{'hashtags': [{'start': 23, 'end': 32, 'tag': ...",RT @HatebitX: Day 9 of #inktober and I'm under...,1076494890081677313,,,0,7,0
97,2021-10-20T03:22:59.000Z,1450663811682234368,"{'urls': [{'start': 115, 'end': 138, 'url': 'h...",Star Wars Inktober day 17: Collide\ndope scene...,3086934937,,,1,0,0
98,2021-10-20T03:22:57.000Z,1450663802576375815,"{'urls': [{'start': 87, 'end': 110, 'url': 'ht...",RT @Rafchu: Street Fighter rival schoolgirls K...,991532009913696257,,,0,542,0
99,2021-10-20T03:22:55.000Z,1450663796557574145,"{'hashtags': [{'start': 76, 'end': 80, 'tag': ...",RT @BoredSourHeads: Been okay LFG washed up…ne...,1349639691428003843,,,0,2,0


I was satisfied with this data, so I __exported__ my full data frame as a csv using the __to_csv__ method from pandas.

I saved my data into the same folder as my Jupyter notebook so that I can refer to it later when testing my hypotheses.

In [38]:
tweets.to_csv(r'C:\Users\glori\Data in EMAT\tweets.csv')
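A sketch of the export and re-import round trip, assuming I do not need the row labels in the file. The frame is made up, and an in-memory buffer stands in for the file path so the example runs anywhere.

```python
import io
import pandas as pd

frame = pd.DataFrame({'id': ['1', '2'], 'likes': [0, 1]})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)  # leave the row index out of the file
buffer.seek(0)

# Read ids back as strings so long tweet ids are not turned into numbers.
restored = pd.read_csv(buffer, dtype={'id': str})
print(restored.columns.tolist())
```

Without `index=False`, the row labels are written as an extra unnamed first column, which pandas shows as "Unnamed: 0" when the file is read back.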

## Potential Weaknesses
- This is only 400 tweets, which in the grand scheme of things is not much data. However, it is enough to give me some evidence to answer the questions I have. There are some issues with the data as it stands right now. For example, from _entities_ I only want _hashtags_, so I need to find a way to isolate that part and change the header to read __hashtags__. However, 'entities' is stored as a string rather than a list, and I don't yet have the knowledge to parse it into a list.

- It should also be noted that the like_count appears to be 0 for nearly all of the rows. This is most likely inaccurate. For example, a tweet with over 2000 retweets, such as the one on line 98, would most likely NOT have 0 likes.

- There is a lot of duplicate code. It would be beneficial to write a function that creates each of the individual data frames so that I could simply pass in a couple of variables and cut down on the lines of code.

## Next Steps

In order to have the best results for my hypothesis I would need to:
1. Retrieve the 'tag' value from the _entities_ column
2. Find a way to retrieve accurate "like" counts
3. Understand why some of the values are listed as NaN
4. Create a function to create and combine the different "pages" of data
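For step 1, a possible starting point is sketched below. It assumes each _entities_ value is either a dictionary or the string form of one (as it would be after a csv round trip), and `extract_tags` is a hypothetical helper name.

```python
import ast

def extract_tags(entities):
    """Return the list of hashtag strings from one entities value."""
    if isinstance(entities, str):
        # Turn the string form of a dictionary back into a dictionary.
        entities = ast.literal_eval(entities)
    if not isinstance(entities, dict):
        return []  # e.g. NaN when a tweet has no entities
    return [h['tag'] for h in entities.get('hashtags', [])]

# Made-up entities value in the shape shown in the tables above.
sample = {'hashtags': [{'start': 0, 'end': 9, 'tag': 'inktober'},
                       {'start': 10, 'end': 21, 'tag': 'digitalart'}]}
print(extract_tags(sample))
print(extract_tags(str(sample)))
```

Applied to the master data frame, this could look like `tweets['hashtags'] = tweets['entities'].apply(extract_tags)`.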