# Session 3: Harvesting data from the web: APIs  

### A first API

[Chronicling America](http://chroniclingamerica.loc.gov/about/) is a joint project of the National Endowment for the Humanities and the Library of Congress .

Search for articles that mention "[slavery](http://chroniclingamerica.loc.gov/search/pages/results/?andtext=slavery)".

<div class="alert alert-info">

Look at the URL. What happens if you change the word slavery to abolition? 

What happens to the URL when you go to the second page? Can you get to page 251?

</div>

What if we append ``&format=json`` to the end of the search URL? 


http://chroniclingamerica.loc.gov/search/pages/results/?andtext=slavery&format=json


[``requests``](http://docs.python-requests.org/en/master/) is a useful and commonly used HTTP library for python. It is not a part of the default installation, but is included with Anaconda Python Distribution. 

In [None]:
import requests

It would be possible to use the API URL and parameters directly in the requests command, but since the most likely scenario involves making repeating calls to ``requests`` as part of a loop -- the search returned less than 1% of the results -- I store the strings first. 

In [None]:
base_url   = 'http://chroniclingamerica.loc.gov/search/pages/results/'
parameters = '?andtext=slavery&format=json'

`requests.get()` is used for both accessing websites and APIs. The command can be modified by several arguements, but at a minimum, it requires the URL.

In [None]:
r = requests.get(base_url + parameters)

`r` is a `requests` response object. Any JSON returned by the server are stored in `.json().`

In [None]:
search_json = r.json()

JSONs are dictionary like objects, in that they have keys (think variable names) and values. `.keys()` returns a list of the keys.

In [None]:
search_json.keys()

You can return the value of any key by putting the key name in brackets.

In [None]:
search_json['totalItems']

<div class="alert alert-info">
What else is in there? Where is the stuff we want?
</div>

As is often the case with results from an API, most of the keys and values are metadate about either the search or what is being returned. These are useful for knowing if the search is returning what you want, which is particularly important when you are making multiple calls to the API. 

The data I'm intereted in is all in `items`. 

In [None]:
type(search_json['items'])

In [None]:
len(search_json['items'])

`items` is a list with 20 items.

In [None]:
type(search_json['items'][3])

Each of the 20 items in the list is a dictionary. 

In [None]:
first_item = search_json['items'][0]

first_item.keys()

<div class="alert alert-info">
What is the title of the first item?
</div>

While a standard CSV file has a header row that describes the contents of each column, a JSON file has keys identifying the values found in each case. Importantly, these keys need not be the same for each item. Additionally, values don't have to be numbers of strings, but could be lists or dictionaries. For example, this JSON could have included a `newspaper` key that was a dictionary with all the metadata about the newspaper the article and issue was published, an `article` key that include the article specific information as another dictionary, and a `text` key whose value was a string with the article text.

As before, we can examine the contents of a particular item, such as the publication's `title`.

In [None]:
first_item['ocr_eng']

In [None]:
print(first_item['ocr_eng'])

The easiest way to view or analyze this data is to convert it to a dataset-like structure. While Python does not have a builting in dataframe type, the popular `pandas` library does. By convention, it is imported as `pd`.

In [None]:
print(first_item['ocr_eng'][:200])

In [None]:
import pandas as pd

# Make sure all columns are displayed
pd.set_option("display.max_columns",101)

pandas is prety smart about importing different JSON-type objects and converting them to dataframes with its `.DataFrame()` function.

In [None]:
df = pd.DataFrame(search_json['items'])

df.head(6)

Note that I converted `search_json['items']` to  dataframe and not the entire JSON file. This is because I wanted each row to be an article. 

In [None]:
pd.DataFrame(search_json)

If this dataframe contained all the items that you were looking for, it would be easy to save this to a csv file for storage and later analysis.

In [None]:
df.to_csv('lynching_articles.csv')

In [None]:
df.to_csv('lynching_articles.csv', encoding='utf8')

In [None]:
!head lynching_articles.csv

<div class="alert alert-info">
<h3> Your turn</h3>
<p> Conduct your own search of the API. Store the results in a csv file.

</div>



In [None]:
r = requests.get('https://exchangeratesapi.io/api/latest?base=EUR')

In [None]:
pd.DataFrame(r.json())

<div class="alert alert-info">
<h3> Your turn</h3>
<p>What is the current exchange rate using the Norwegian krone as the base rate?</code> Save the results in a new csv file.

</div>



This is only a small subset of the articles on lynching that are available, however. The API returns results in batches of 20 and this is only the first page of results. As is often the case, I'll need to make multiple calls to the API to retrieve all the data of interest. The easiest way to do that is to define a small function for getting the article information and put that in a loop. While it isn't a requirement that you create a function for making the API call, it will make your code easier to read and debug.


Looking at the API guidelines, there is an additional paramater `page` that tells the API which subset of results we want. This name varies by API but their is usually some mechanism for retrieiving results beyond the initial JSON.

Before creating the loop and making multiple calls to the API, I want to make sure that the API is working the way I think it is. 

<div class="alert alert-info">
Look at the API guidelines. How can we get the third page?
</div>


[Guidelines](https://chroniclingamerica.loc.gov/about/api/)

In [None]:
base_url   = 'http://chroniclingamerica.loc.gov/search/pages/results/'
parameters = '?andtext=slavery&format=json&page=3'

r = requests.get(base_url + parameters)
results =  r.json()

print results['startIndex']
print results['endIndex']

A call to random selected page 3 returns results 41 through 60, which is what I expected since each page has 20 items.

The parameters are getting pretty ugly, so fortunately `requests` accepts a dictionary where the keys are the parameter names as defined by the API and the values are the search paramaters you are looking for. So the same request can be rewritten:

In [None]:
base_url = 'http://chroniclingamerica.loc.gov/search/pages/results/'
parameters = {'andtext': 'lynching',
              'page'   : 3,
              'format'  : 'json'}

r = requests.get(base_url, params=parameters)

results =  r.json()

results['startIndex'], results['endIndex']

This can be rewritten as function:

In [None]:
def get_articles():
    '''
    Make calls to the Chronicling America API.
    '''

    base_url = 'http://chroniclingamerica.loc.gov/search/pages/results/'
    parameters = {'andtext': 'lynching', 'page': 3, 'format': 'json'}

    r = requests.get(base_url, params=parameters)
    results = r.json()

    return results

In [None]:
results = get_articles()

results['startIndex'], results['endIndex']

The advantage of writing a function, however, would be that you can pass along your own parameters, such as the search term and page number, which would make this much more useful. 

In [None]:
def get_articles(search_term, page_number):
    '''
    Make calls to the Chronicling America API.
    '''
    
    base_url = 'http://chroniclingamerica.loc.gov/search/pages/results/'
    parameters = {'andtext': search_term,
                  'page'   : page_number,
                  'format' : 'json'}
    
    r = requests.get(base_url, params = parameters)
    results =  r.json()

    return results

In [None]:
results = get_articles('lynching', 3)

results['startIndex'], results['endIndex']

In [None]:
results = get_articles('cows', 45)

results['startIndex'], results['endIndex']

In [None]:
pd.DataFrame(results['items'])

<div class="alert alert-info">
<h3> Your turn</h3>
<p>This url
<p><code>https://itunes.apple.com/search?term=beyonce&entity=song</code>
<p>will return 50 songs in the iTunes store by Beyoncé.
<p>
<p>Write a function that will return the results of a call for any artists into a dataframe. Hint: inspect the contents of the resulting JSON to make sure you are getting what you want.</p>
<p>
<p><b>Bonus challenge:</b> Open up a new notebook. Se the first cell type to "Markdown" and write a brief introduction about your function. But your code in subsequent cells! Don't forget your <code>import</code> statements!  
    
</div>

[API Manual](https://affiliate.itunes.apple.com/resources/documentation/itunes-store-web-service-search-api/)

Back to Chronicling America. Now, the first 60 results could downloaded in a just a few lines:

In [None]:
for page_number in [1, 2, 3]: 
    print(page_number)
    

In [None]:
for page_number in range(1, 4): 
    print(page_number)
    

In [None]:
for page_number in range(1,4):
    
    results = get_articles('lynching', page_number)
    results['startIndex'], results['endIndex']
    

Everything appears to be working, but unfortunately I only have the last page of results still. Each call to the API was redefining `results` variable. In this case, I set up an empty dataframe to store the results and will append the items from each page of results.

In [None]:
dfs = [] # empty list to store dataframes

for page_number in range(1,4):
    results = get_articles('lynching', page_number)
    new_df = pd.DataFrame(results['items'])
    
    dfs.append(new_df) 

df = pd.concat(dfs, ignore_index = True)
df.info()

For a large download, you would still want to tweak this a bit by pausing between each API call and making it robust to internet or API errors, but this is a solid framework for collecting data from an API.

In [None]:
from time import sleep

In [None]:
dfs = [] # empty list to store dataframes

for page_number in range(1,4):
    results = get_articles('lynching', page_number)
    new_df = pd.DataFrame(results['items'])
    
    dfs.append(new_df) 
    sleep(1)
    print('Getting page: ' + str(page_number))
    
df = pd.concat(dfs, ignore_index = True)
df.info()

<div class="alert alert-info">
<h3> Your turn</h3>
<p>Can you modify your iTunes search to get more results?  
    
</div>

[API Manual](https://affiliate.itunes.apple.com/resources/documentation/itunes-store-web-service-search-api/)

### How about Twitter?

In [6]:
import pandas as pd

In [1]:
from twython import Twython

Sign up as a developer

In [2]:
APP_KEY            = 'J8TGgv1SlKgAtqvxGZzc9XiNx'
APP_SECRET         = '8bEieGM73FLqbnWu6WcTR3vM6ICfEBEmQ8lXgqojw5IL1uzQ0Z'
OAUTH_TOKEN        = '594565064-erQRFPOFk520ePJjR86b9H2PTPxAF9i1d3A7pzjp'
OAUTH_TOKEN_SECRET = 'qGwK0Jz7f0YyYlQIoLSJ8FJCIX3ydFQBo0yJOePsaWXL9'



Store your credentials

In [3]:
twitter = Twython(APP_KEY,
                  APP_SECRET,
                  OAUTH_TOKEN,
                  OAUTH_TOKEN_SECRET)

Start your searches!

In [4]:
user_timeline = twitter.get_user_timeline(screen_name='oprah')

In [7]:
pd.DataFrame(user_timeline)

Unnamed: 0,contributors,coordinates,created_at,entities,favorite_count,favorited,geo,id,id_str,in_reply_to_screen_name,...,is_quote_status,lang,place,possibly_sensitive,retweet_count,retweeted,source,text,truncated,user
0,,,Sat Dec 08 16:00:00 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",8860,False,,1071434225944068104,1071434225944068104,,...,False,en,,False,1200,False,"<a href=""https://studio.twitter.com"" rel=""nofo...",Every father has a dream for their family and ...,True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
1,,,Fri Dec 07 16:00:10 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",3549,False,,1071071878901456896,1071071878901456896,,...,False,en,,False,566,False,"<a href=""https://studio.twitter.com"" rel=""nofo...",.@ItsGabrielleU and @DwyaneWade dispel many my...,True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
2,,,Tue Dec 04 23:34:30 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",2714,False,,1070099055382908928,1070099055382908928,,...,False,en,,False,375,False,"<a href=""https://studio.twitter.com"" rel=""nofo...",Why wait until after the holidays to get healt...,True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
3,,,Tue Dec 04 18:51:13 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",10935,False,,1070027764072296448,1070027764072296448,,...,False,en,,False,3175,False,"<a href=""http://twitter.com/download/iphone"" r...",This story struck my heart. I’ve done this a 1...,True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
4,,,Wed Nov 21 21:00:01 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",3928,False,,1065349132452122624,1065349132452122624,,...,False,en,,False,551,False,"<a href=""https://studio.twitter.com"" rel=""nofo...",Roll call! Greenleafers…this is the EXPLOSIVE ...,True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
5,,,Tue Nov 20 22:12:30 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",11077,False,,1065004989372743680,1065004989372743680,,...,False,en,,False,1159,False,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...","Look who I got to meet! Little Kaavia James, t...",True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
6,,,Mon Nov 19 21:00:00 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",5827,False,,1064624353227169792,1064624353227169792,,...,False,en,,False,1035,False,"<a href=""https://studio.twitter.com"" rel=""nofo...",Did you know that 1 in 8 Americans struggles w...,True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
7,,,Sat Nov 17 17:00:00 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",18033,False,,1063839179958878210,1063839179958878210,,...,False,en,,False,2518,False,"<a href=""https://studio.twitter.com"" rel=""nofo...",Seeing all your comments on social &amp; feeli...,True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
8,,,Thu Nov 15 18:03:56 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",28843,False,,1063130492969480192,1063130492969480192,,...,False,en,,False,5982,False,"<a href=""https://studio.twitter.com"" rel=""nofo...","Michelle, I never thought of it that way befor...",True,"{'id': 19397785, 'id_str': '19397785', 'name':..."
9,,,Thu Nov 15 03:16:43 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",34,False,,1062907220382375936,1062907220382375936,Deborahjoywina1,...,False,en,,,6,False,"<a href=""http://twitter.com/download/iphone"" r...",@Deborahjoywina1 @DeborahJWinans @MerleDandrid...,True,"{'id': 19397785, 'id_str': '19397785', 'name':..."




<div class="alert alert-info">
<h3> Your turn</h3>
<p> Find the tweets from someone else. If you add <code>, count = 200 </code> after the username, you can get up to 200 tweets. Do it!

</div>



In [8]:
python_tweets = twitter.search(q='ipynb', count=200)

In [11]:
pd.DataFrame(python_tweets['statuses'])

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user
0,,,Wed Dec 12 11:53:57 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072821855306825728,1072821855306825728,...,,,,4,False,{'created_at': 'Wed Dec 12 10:22:05 +0000 2018...,"<a href=""http://twitter.com/download/android"" ...",RT @poliastro_py: What's the orbital future of...,False,"{'id': 369558203, 'id_str': '369558203', 'name..."
1,,,Wed Dec 12 11:23:17 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,1,False,,1072814138756354050,1072814138756354050,...,,,,0,False,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",stil transferi ile alakalı bir başka güzel dem...,False,"{'id': 68239920, 'id_str': '68239920', 'name':..."
2,,,Wed Dec 12 11:14:33 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072811943663153154,1072811943663153154,...,,,,4,False,{'created_at': 'Wed Dec 12 10:22:05 +0000 2018...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @poliastro_py: What's the orbital future of...,False,"{'id': 1083927277, 'id_str': '1083927277', 'na..."
3,,,Wed Dec 12 11:03:29 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072809155159883776,1072809155159883776,...,,,,4,False,{'created_at': 'Wed Dec 12 10:22:05 +0000 2018...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @poliastro_py: What's the orbital future of...,False,"{'id': 636890886, 'id_str': '636890886', 'name..."
4,,,Wed Dec 12 10:51:18 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 1072609199291547648, 'id_str...",0,False,,1072806089765216257,1072806089765216257,...,,,,5,False,{'created_at': 'Tue Dec 11 21:49:07 +0000 2018...,"<a href=""http://twitter.com/download/iphone"" r...",RT @tdualdir: ワイもCNNでくずし字の分類やってみた（＾ω＾）\nhttps:...,False,"{'id': 2370471067, 'id_str': '2370471067', 'na..."
5,,,Wed Dec 12 10:39:30 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072803120068128770,1072803120068128770,...,,,,0,False,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",Hint of the day\nProbability Density Functions...,True,"{'id': 126638564, 'id_str': '126638564', 'name..."
6,,,Wed Dec 12 10:36:36 +0000 2018,"{'hashtags': [{'text': 'bpstudy', 'indices': [...",,0,False,,1072802391966154752,1072802391966154752,...,,,,1,False,{'created_at': 'Wed Dec 12 10:36:08 +0000 2018...,"<a href=""http://twitter.com/download/iphone"" r...",RT @mamono_jingu: https://t.co/fp8rtGXOie #bps...,False,"{'id': 14946295, 'id_str': '14946295', 'name':..."
7,,,Wed Dec 12 10:36:08 +0000 2018,"{'hashtags': [{'text': 'bpstudy', 'indices': [...",,1,False,,1072802274823495682,1072802274823495682,...,,,,1,False,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",https://t.co/fp8rtGXOie #bpstudy googleコラボラトリー...,False,"{'id': 623559636, 'id_str': '623559636', 'name..."
8,,,Wed Dec 12 10:29:02 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072800488461090816,1072800488461090816,...,,,,4,False,{'created_at': 'Wed Dec 12 10:22:05 +0000 2018...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @poliastro_py: What's the orbital future of...,False,"{'id': 611662254, 'id_str': '611662254', 'name..."
9,,,Wed Dec 12 10:22:05 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,4,False,,1072798736676470784,1072798736676470784,...,,,,4,False,,"<a href=""https://about.twitter.com/products/tw...",What's the orbital future of MarCO-A &amp; B? ...,True,"{'id': 989455044650119168, 'id_str': '98945504..."


In [None]:
python_tweets

In [None]:
python_tweets['search_metadata']

In [None]:
python_tweets.keys()

In [None]:
pd.DataFrame(python_tweets['statuses'])

In [12]:
python_tweets = twitter.search(q     = 'ipynb', 
                               count = 200,
                               maxid = 1023186440514142207)                        

In [13]:
pd.DataFrame(python_tweets['statuses'])

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user
0,,,Wed Dec 12 11:53:57 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072821855306825728,1072821855306825728,...,,,,4,False,{'created_at': 'Wed Dec 12 10:22:05 +0000 2018...,"<a href=""http://twitter.com/download/android"" ...",RT @poliastro_py: What's the orbital future of...,False,"{'id': 369558203, 'id_str': '369558203', 'name..."
1,,,Wed Dec 12 11:23:17 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,1,False,,1072814138756354050,1072814138756354050,...,,,,0,False,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",stil transferi ile alakalı bir başka güzel dem...,False,"{'id': 68239920, 'id_str': '68239920', 'name':..."
2,,,Wed Dec 12 11:14:33 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072811943663153154,1072811943663153154,...,,,,4,False,{'created_at': 'Wed Dec 12 10:22:05 +0000 2018...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @poliastro_py: What's the orbital future of...,False,"{'id': 1083927277, 'id_str': '1083927277', 'na..."
3,,,Wed Dec 12 11:03:29 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072809155159883776,1072809155159883776,...,,,,4,False,{'created_at': 'Wed Dec 12 10:22:05 +0000 2018...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @poliastro_py: What's the orbital future of...,False,"{'id': 636890886, 'id_str': '636890886', 'name..."
4,,,Wed Dec 12 10:51:18 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 1072609199291547648, 'id_str...",0,False,,1072806089765216257,1072806089765216257,...,,,,5,False,{'created_at': 'Tue Dec 11 21:49:07 +0000 2018...,"<a href=""http://twitter.com/download/iphone"" r...",RT @tdualdir: ワイもCNNでくずし字の分類やってみた（＾ω＾）\nhttps:...,False,"{'id': 2370471067, 'id_str': '2370471067', 'na..."
5,,,Wed Dec 12 10:39:30 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072803120068128770,1072803120068128770,...,,,,0,False,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",Hint of the day\nProbability Density Functions...,True,"{'id': 126638564, 'id_str': '126638564', 'name..."
6,,,Wed Dec 12 10:36:36 +0000 2018,"{'hashtags': [{'text': 'bpstudy', 'indices': [...",,0,False,,1072802391966154752,1072802391966154752,...,,,,1,False,{'created_at': 'Wed Dec 12 10:36:08 +0000 2018...,"<a href=""http://twitter.com/download/iphone"" r...",RT @mamono_jingu: https://t.co/fp8rtGXOie #bps...,False,"{'id': 14946295, 'id_str': '14946295', 'name':..."
7,,,Wed Dec 12 10:36:08 +0000 2018,"{'hashtags': [{'text': 'bpstudy', 'indices': [...",,1,False,,1072802274823495682,1072802274823495682,...,,,,1,False,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",https://t.co/fp8rtGXOie #bpstudy googleコラボラトリー...,False,"{'id': 623559636, 'id_str': '623559636', 'name..."
8,,,Wed Dec 12 10:29:02 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,0,False,,1072800488461090816,1072800488461090816,...,,,,4,False,{'created_at': 'Wed Dec 12 10:22:05 +0000 2018...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @poliastro_py: What's the orbital future of...,False,"{'id': 611662254, 'id_str': '611662254', 'name..."
9,,,Wed Dec 12 10:22:05 +0000 2018,"{'hashtags': [], 'symbols': [], 'user_mentions...",,4,False,,1072798736676470784,1072798736676470784,...,,,,4,False,,"<a href=""https://about.twitter.com/products/tw...",What's the orbital future of MarCO-A &amp; B? ...,True,"{'id': 989455044650119168, 'id_str': '98945504..."


In [14]:
df = pd.DataFrame(python_tweets['statuses'])
df['text'].values

array(["RT @poliastro_py: What's the orbital future of MarCO-A &amp; B? Check out this notebook shared in the @LibreSpace_Fnd community!\n\nhttps://t.co/…",
       'stil transferi ile alakalı bir başka güzel demo: https://t.co/6cLW79Pgb7',
       "RT @poliastro_py: What's the orbital future of MarCO-A &amp; B? Check out this notebook shared in the @LibreSpace_Fnd community!\n\nhttps://t.co/…",
       "RT @poliastro_py: What's the orbital future of MarCO-A &amp; B? Check out this notebook shared in the @LibreSpace_Fnd community!\n\nhttps://t.co/…",
       'RT @tdualdir: ワイもCNNでくずし字の分類やってみた（＾ω＾）\nhttps://t.co/SZrUoo8IAY https://t.co/71NJebzsfV',
       'Hint of the day\nProbability Density Functions in Python: the early evaluation approach (numpy array) and the lazy e… https://t.co/ZxAwo0zsrG',
       'RT @mamono_jingu: https://t.co/fp8rtGXOie #bpstudy googleコラボラトリーってこれかな？ #bpstudy',
       'https://t.co/fp8rtGXOie #bpstudy googleコラボラトリーってこれかな？ #bpstudy',
       "RT @poliastro_py: What's

In [15]:
for status_update in df['text'].values:
    print(status_update)

RT @poliastro_py: What's the orbital future of MarCO-A &amp; B? Check out this notebook shared in the @LibreSpace_Fnd community!

https://t.co/…
stil transferi ile alakalı bir başka güzel demo: https://t.co/6cLW79Pgb7
RT @poliastro_py: What's the orbital future of MarCO-A &amp; B? Check out this notebook shared in the @LibreSpace_Fnd community!

https://t.co/…
RT @poliastro_py: What's the orbital future of MarCO-A &amp; B? Check out this notebook shared in the @LibreSpace_Fnd community!

https://t.co/…
RT @tdualdir: ワイもCNNでくずし字の分類やってみた（＾ω＾）
https://t.co/SZrUoo8IAY https://t.co/71NJebzsfV
Hint of the day
Probability Density Functions in Python: the early evaluation approach (numpy array) and the lazy e… https://t.co/ZxAwo0zsrG
RT @mamono_jingu: https://t.co/fp8rtGXOie #bpstudy googleコラボラトリーってこれかな？ #bpstudy
https://t.co/fp8rtGXOie #bpstudy googleコラボラトリーってこれかな？ #bpstudy
RT @poliastro_py: What's the orbital future of MarCO-A &amp; B? Check out this notebook shared in the @LibreSpace_Fnd co



<div class="alert alert-info">
<h3> Your turn</h3>
<p> Do a search!
</div>




Get 3,200 tweets from somone

In [None]:
pd.DataFrame(user_timeline).iloc[-1]['id']

In [None]:
user_timeline = twitter.get_user_timeline(screen_name='oprah', 
                                         count = 200,
                                         max_id = 929540230465458177)

In [None]:
def get_timeline(screen_name):
    tweets = []
    user_timeline = twitter.get_user_timeline(screen_name=screen_name,
                                              count = 200)
    df = pd.DataFrame(user_timeline)
    tweets.append(df)
    
    most_recent = pd.DataFrame(user_timeline).iloc[-1]['id']
    
    for i in range(0,15):
        user_timeline = twitter.get_user_timeline(screen_name=screen_name,
                                              count = 200,
                                                 max_id = most_recent)
        df = pd.DataFrame(user_timeline)
        tweets.append(df)
        most_recent = pd.DataFrame(user_timeline).iloc[-1]['id']
    
    tweet_df = pd.concat(tweets, ignore_index = True)
    return tweets    
    