Application
=======
![twitter is an application](figs/application.png)

Interface
=====
![twitter has an interface](figs/interface.png)

Which can be accessed using a programming language
=====================================================
![twitter api screenshot](figs/twitterapi.png)

The documentation for the Twitter Application Programming Interface (API) can be found at: https://developer.twitter.com/en/docs


There are many ways to talk to Twitter using Python, but for this workshop we will start with the [tweepy](http://www.tweepy.org/) library. If you haven't installed it yet, [open a terminal](https://github.com/GCDigitalFellows/installdri.github.io/blob/master/anaconda.md) and type:
```bash
conda install -c conda-forge tweepy -y
```

What are authentication keys and access tokens?
===============
Just as people need usernames and passwords, so do programs that talk to websites. Twitter uses a protocol called [OAuth authentication](http://tweepy.readthedocs.io/en/v3.6.0/auth_tutorial.html). Manage your keys and tokens at https://apps.twitter.com/

![app management page](figs/register.png)


In [6]:
# import tweepy and the private file that holds my access credentials
import tweepy

# replace my authentication credentials with yours
import my_tokens
consumer_key = my_tokens.twitter_consumer_key
consumer_secret = my_tokens.twitter_consumer_secret
access_token = my_tokens.twitter_access_token
access_token_secret = my_tokens.twitter_access_token_secret

In [8]:
# connect to twitter
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

In [60]:
# let's get up to 100 tweets with the hashtag "#digitalgc" from the past week
# (the Twitter search API only returns tweets from roughly the past week)
digitalgc_tweets = api.search(q="#digitalgc", count=100, lang="en",
                              since="2018-01-01")


In [61]:
# display digitalgc_tweets to see what we have
digitalgc_tweets

[Status(_api=<tweepy.api.API object at 0x10f8834a8>, _json={'created_at': 'Wed Mar 14 03:30:39 +0000 2018', 'id': 973763311429963776, 'id_str': '973763311429963776', 'text': 'RT @psmfCUNY: Join us for an interdisciplinary presentation+panel discussion on critical issues in social media and digital literacy! Socia…', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'psmfCUNY', 'name': 'Social Mediums', 'id': 949918950, 'id_str': '949918950', 'indices': [3, 12]}], 'urls': []}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 2802755778, 'id_str': '2802755778', 'name': 'GC LAILAC', 'screen_name': 'GC_LAILAC', 'location': '365 Fifth Ave., NYC, NY 10016', '

# How do we just get the fields we're interested in?
Look at the response object, which is documented in the [tweet data dictionary](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object). I'm interested in:
* created_at
* text
* retweet_count
* favorite_count
* user.name
* user.screen_name
* user.verified
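
The dotted names in the list above, such as `user.name`, refer to an attribute of a nested object. As a minimal sketch of how those fields could be pulled out in one step, the example below uses `operator.attrgetter`, which accepts dotted names. The names `FIELDS`, `pick_fields`, and `fake_tweet` are hypothetical, and `fake_tweet` is a stand-in with made-up values rather than a real API response.

```python
from operator import attrgetter
from types import SimpleNamespace

# Field names copied from the list above; a dot means "attribute of an attribute".
FIELDS = ["created_at", "text", "retweet_count", "favorite_count",
          "user.name", "user.screen_name", "user.verified"]


def pick_fields(tweet, fields=FIELDS):
    """Return a dict mapping each dotted field name to its value on `tweet`."""
    return {name: attrgetter(name)(tweet) for name in fields}


# Stand-in object with made-up values, used instead of a real API response.
fake_tweet = SimpleNamespace(
    created_at="2018-03-14 03:30:39",
    text="example text",
    retweet_count=12,
    favorite_count=0,
    user=SimpleNamespace(name="Example User", screen_name="example",
                         verified=False),
)

print(pick_fields(fake_tweet)["user.screen_name"])
```

The same function should work on any object that exposes these attributes, which is how the loops in this notebook read `tweet.created_at` and `tweet.user.name`.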

In [62]:
# let's loop over the tweets in digitalgc_tweets
for tweet in digitalgc_tweets:
    print(tweet.created_at, tweet.text, tweet.retweet_count)

2018-03-14 03:30:39 RT @psmfCUNY: Join us for an interdisciplinary presentation+panel discussion on critical issues in social media and digital literacy! Socia… 12
2018-03-14 02:41:32 RT @jojokarlin: Don't miss @psmyth01's new blog post on programming paradigms! #digitalgc https://t.co/TS3YzpOblb 2
2018-03-14 02:31:41 RT @jojokarlin: Don't miss @psmyth01's new blog post on programming paradigms! #digitalgc https://t.co/TS3YzpOblb 2
2018-03-14 02:31:20 RT @nemersonian: Folks introduce themselves at Sound! with @KChatlosh and @rachelrakov. Lots of cool projects on sound and gender identity,… 2
2018-03-14 01:37:15 RT @psmfCUNY: Join us for an interdisciplinary presentation+panel discussion on critical issues in social media and digital literacy! Socia… 12
2018-03-14 00:56:09 RT @psmfCUNY: Join us for an interdisciplinary presentation+panel discussion on critical issues in social media and digital literacy! Socia… 12
2018-03-14 00:53:54 RT @nemersonian: Folks introduce themselves at Sound!

In [63]:
# let's store all the information in a structured way,
# such as a list of Python dictionaries
tweet_list = []
for tweet in digitalgc_tweets:
    td = dict()
    td['created'] = tweet.created_at
    td['text'] = tweet.text
    td['retweets'] = tweet.retweet_count
    td['favorites'] = tweet.favorite_count
    td['user'] = tweet.user.name
    tweet_list.append(td)


In [64]:
# let's turn that list into a spreadsheet-like table (a pandas DataFrame)
import pandas as pd

tweets = pd.DataFrame(tweet_list)
tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
created      50 non-null datetime64[ns]
favorites    50 non-null int64
retweets     50 non-null int64
text         50 non-null object
user         50 non-null object
dtypes: datetime64[ns](1), int64(2), object(2)
memory usage: 2.0+ KB


In [65]:
# let's look at the first 5 rows
tweets.head()

|   | created | favorites | retweets | text | user |
|---|---|---|---|---|---|
| 0 | 2018-03-14 03:30:39 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | GC LAILAC |
| 1 | 2018-03-14 02:41:32 | 0 | 2 | RT @jojokarlin: Don't miss @psmyth01's new blo... | Stephen Zweibel |
| 2 | 2018-03-14 02:31:41 | 0 | 2 | RT @jojokarlin: Don't miss @psmyth01's new blo... | GC Digital Fellows |
| 3 | 2018-03-14 02:31:20 | 0 | 2 | RT @nemersonian: Folks introduce themselves at... | GC Digital Fellows |
| 4 | 2018-03-14 01:37:15 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | CUNYAcademicCommons |


In [66]:
# let's find the most retweeted tweets by sorting on the retweet count
tweets.sort_values(by="retweets", ascending=False)

|   | created | favorites | retweets | text | user |
|---|---|---|---|---|---|
| 0 | 2018-03-14 03:30:39 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | GC LAILAC |
| 11 | 2018-03-13 20:43:47 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Gerry Martini |
| 16 | 2018-03-13 19:38:11 | 10 | 12 | Join us for an interdisciplinary presentation+... | Social Mediums |
| 15 | 2018-03-13 19:41:40 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Humanities Center GC |
| 4 | 2018-03-14 01:37:15 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | CUNYAcademicCommons |
| 5 | 2018-03-14 00:56:09 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | CUNYGCDI |
| 14 | 2018-03-13 19:45:48 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Alise Tifentale |
| 7 | 2018-03-14 00:08:45 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | CUNYChemistryPhD |
| 13 | 2018-03-13 20:03:17 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Futures Initiative |
| 12 | 2018-03-13 20:30:14 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Danica Savonick |
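
The sorted result is dominated by retweets of a single announcement, and each retweet row repeats the retweet count of the original tweet. One way to find the most retweeted original tweet, sketched here in plain Python on a list of dictionaries shaped like `tweet_list`, is to skip any text that starts with `RT @`. That prefix check is a heuristic assumed for illustration, the function names are hypothetical, and the sample rows are abbreviated from the output above.

```python
def is_retweet(text):
    """Heuristic (an assumption): treat text starting with 'RT @' as a retweet."""
    return text.startswith("RT @")


def most_retweeted_original(rows):
    """Return the non-retweet row with the highest retweet count, or None."""
    originals = [row for row in rows if not is_retweet(row["text"])]
    return max(originals, key=lambda row: row["retweets"], default=None)


# Three rows abbreviated from the output shown in this notebook.
sample_rows = [
    {"user": "GC LAILAC", "retweets": 12,
     "text": "RT @psmfCUNY: Join us for an interdisciplinary..."},
    {"user": "Stephen Zweibel", "retweets": 2,
     "text": "RT @jojokarlin: Don't miss @psmyth01's new blo..."},
    {"user": "Social Mediums", "retweets": 12,
     "text": "Join us for an interdisciplinary presentation+..."},
]

top = most_retweeted_original(sample_rows)
print(top["user"], top["retweets"])
```

Passing the real `tweet_list` instead of `sample_rows` should work the same way, since both use the `text` and `retweets` keys.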


In [76]:
# What if there were more than 100 results?
# we need a Cursor to page through the results and go further back; the API also has rate limits
tweet_list = []
for tweet in tweepy.Cursor(api.search, q="#digitalgc", lang="en", 
                           count=100, since="2018-01-01").items():
    td = dict()
    td['created'] = tweet.created_at
    td['text'] = tweet.text
    td['retweets'] = tweet.retweet_count
    td['favorites'] = tweet.favorite_count
    td['user'] = tweet.user.name
    tweet_list.append(td)
    
tweets = pd.DataFrame(tweet_list)
# let's save our results
tweets.to_csv("digitalgc_tweets.csv")
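
`tweepy.Cursor` hides the paging from us. The sketch below illustrates the general idea of paging with a `max_id` parameter, where each request asks only for items no newer than the oldest one seen so far. It is not tweepy's implementation: `iter_all`, `fake_search`, and the in-memory `ALL_IDS` data are hypothetical stand-ins so that the example runs without network access.

```python
def iter_all(fetch_page, page_size=100):
    """Yield items page by page until fetch_page returns an empty page."""
    max_id = None
    while True:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            return
        yield from page
        # Ask the next request for items strictly older than the oldest seen.
        max_id = min(item["id"] for item in page) - 1


# Hypothetical in-memory "service": 250 items with ids 250 down to 1.
ALL_IDS = list(range(250, 0, -1))


def fake_search(count, max_id=None):
    """Return up to `count` items with id <= max_id, newest first."""
    eligible = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return [{"id": i} for i in eligible[:count]]


results = list(iter_all(fake_search, page_size=100))
print(len(results))
```

Here three pages are fetched before an empty page ends the loop. A real client would also need to respect the API's rate limits between requests.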

# Now try with a hashtag that interests you