# Using APIs to Get Data From the Internet


**API** stands for Application Programming Interface.

An API is a set of instructions that describe how computers can interact with each other to request and receive information.

Some important questions that help us explore an API are below.

|Question | In technical terms |
|:---------|:--------------------|
|Where is my data? | What is the domain? |
|How do I learn what data is available?| Where is the documentation? |
|How do I request specific data?| How do I formulate a URL for a specific purpose? |
|How do I interpret the data?| What is the structure and format of the output?|



**Let's walk through an example in the browser**

PlaceKitten!

In a browser, go to http://www.placekitten.com

|In technical terms | PlaceKitten |
|:---------|:--------------------|
|What is the domain? | http://www.placekitten.com |
|Where is the documentation?| The documentation is on the home page. |
|How do I formulate a URL for a specific purpose? | Put the width and height in the URL, like http://www.placekitten.com/width/height |
|What is the structure and format of the output?| It's an image! |
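The URL pattern in the table can be written as a small helper. This is only a sketch: the function name `placekitten_url` is made up for illustration, and it builds the address without downloading anything.

```python
# Hypothetical helper: build a PlaceKitten URL from a width and a height.
BASE_URL = 'http://www.placekitten.com'  # the domain from the table above

def placekitten_url(width, height):
    """Return the URL for a width-by-height kitten image."""
    return f'{BASE_URL}/{int(width)}/{int(height)}'

print(placekitten_url(200, 300))
```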

# Accessing placekitten in python

We're going to use a special library called <code>requests</code>

In [2]:
from IPython.display import display, Image  # This line lets you display images. We'll use that in a bit.

# This line lets you use python to download data from the web.
import requests

In [3]:
# Get a 200 by 300 image from placekitten.
r = requests.get('http://www.placekitten.com/200/300')

In [4]:
# Look at the status code
r.status_code

200
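A status code of `200` means the request succeeded. As a rough illustration, Python's standard library can translate a status code into its standard name. The helper `describe_status` below is a made-up name, not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase

for code in (200, 404, 500):
    print(code, describe_status(code))
```

With a real response you could call `describe_status(r.status_code)`. `requests` also offers `r.raise_for_status()`, which raises an exception for 4xx and 5xx codes.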

In [5]:
# print the content
r.content

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01\x00`\x00`\x00\x00\xff\xfe\x00;CREATOR: gd-jpeg v1.0 (using IJG JPEG v80), quality = 65\n\xff\xdb\x00C\x00\x0b\x08\x08\n\x08\x07\x0b\n\t\n\r\x0c\x0b\r\x11\x1c\x12\x11\x0f\x0f\x11"\x19\x1a\x14\x1c)$+*($\'\'-2@7-0=0\'\'8L9=CEHIH+6OUNFT@GHE\xff\xdb\x00C\x01\x0c\r\r\x11\x0f\x11!\x12\x12!E.\'.EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE\xff\xc0\x00\x11\x08\x01,\x00\xc8\x03\x01"\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01}\x01\x02\x03\x00\x04\x11\x05\x12!1A\x06\x13Qa\x07"q\x142\x81\x91\xa1\x08#B\xb1\xc1\x15R\xd1\xf0$3br\x82\t\n\x16\x17\x18\x19\x1a%&\'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc

In [6]:
# Use the Image function to display the image
display(Image(r.content))

<IPython.core.display.Image object>

### Exercise 1

Write a function that takes in the width and height and prints an image

In [7]:
# Get a 300 by 400 image from placekitten.
r = requests.get('http://www.placekitten.com/300/400')

In [8]:
r.status_code

200

In [9]:
r.content

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01\x00`\x00`\x00\x00\xff\xfe\x00;CREATOR: gd-jpeg v1.0 (using IJG JPEG v80), quality = 65\n\xff\xdb\x00C\x00\x0b\x08\x08\n\x08\x07\x0b\n\t\n\r\x0c\x0b\r\x11\x1c\x12\x11\x0f\x0f\x11"\x19\x1a\x14\x1c)$+*($\'\'-2@7-0=0\'\'8L9=CEHIH+6OUNFT@GHE\xff\xdb\x00C\x01\x0c\r\r\x11\x0f\x11!\x12\x12!E.\'.EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE\xff\xc0\x00\x11\x08\x01\x90\x01,\x03\x01"\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01}\x01\x02\x03\x00\x04\x11\x05\x12!1A\x06\x13Qa\x07"q\x142\x81\x91\xa1\x08#B\xb1\xc1\x15R\xd1\xf0$3br\x82\t\n\x16\x17\x18\x19\x1a%&\'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc

In [10]:
# Use the Image function to display the image
display(Image(r.content))

<IPython.core.display.Image object>

In [11]:
def get(width, height):
    # Request a width-by-height image from PlaceKitten and display it
    r = requests.get(f'http://www.placekitten.com/{width}/{height}')
    display(Image(r.content))

### Exercise 2

Can you write a loop to show several images?


In [12]:
# Write a loop that shows multiple images

n_images = 5 # Change for number of images to display

for i_image in range(0, n_images):
    width_image  = 200 + i_image 
    height_image = 300 + i_image 
    r = requests.get('http://www.placekitten.com/' + str(width_image) + '/' + str(height_image))
    display(Image(r.content))
    r = 0


<IPython.core.display.Image object>

<IPython.core.display.Image object>

<IPython.core.display.Image object>

<IPython.core.display.Image object>

<IPython.core.display.Image object>

# Example 2: Getting World Times

This example introduces a slightly more complicated API. It also introduces **JSON**, a very common data format.

The API (including some documentation) is at http://worldtimeapi.org/
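Before calling the API, it may help to see what JSON is: plain text that maps onto Python lists, dictionaries, strings, and numbers. The sketch below parses a short, hand-written JSON string (not real API output) with the standard `json` module.

```python
import json

# A hand-written JSON string shaped like a list of time zone names.
raw = b'["Africa/Abidjan", "America/Boise", "Europe/Paris"]'

zones = json.loads(raw)          # bytes or str in, Python objects out
print(type(zones), len(zones))

# Keep only the zones in one region.
american = [z for z in zones if z.startswith('America/')]
print(american)
```

For a `requests` response `r`, calling `r.json()` performs the same kind of parsing on the response body.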

In [13]:
# Download list of time zones
r = requests.get("http://worldtimeapi.org/api/timezone")
print(r.content)

b'["Africa/Abidjan","Africa/Accra","Africa/Algiers","Africa/Bissau","Africa/Cairo","Africa/Casablanca","Africa/Ceuta","Africa/El_Aaiun","Africa/Johannesburg","Africa/Juba","Africa/Khartoum","Africa/Lagos","Africa/Maputo","Africa/Monrovia","Africa/Nairobi","Africa/Ndjamena","Africa/Sao_Tome","Africa/Tripoli","Africa/Tunis","Africa/Windhoek","America/Adak","America/Anchorage","America/Araguaina","America/Argentina/Buenos_Aires","America/Argentina/Catamarca","America/Argentina/Cordoba","America/Argentina/Jujuy","America/Argentina/La_Rioja","America/Argentina/Mendoza","America/Argentina/Rio_Gallegos","America/Argentina/Salta","America/Argentina/San_Juan","America/Argentina/San_Luis","America/Argentina/Tucuman","America/Argentina/Ushuaia","America/Asuncion","America/Atikokan","America/Bahia","America/Bahia_Banderas","America/Barbados","America/Belem","America/Belize","America/Blanc-Sablon","America/Boa_Vista","America/Bogota","America/Boise","America/Cambridge_Bay","America/Campo_Grande","A

### Exercise 3

Use the .json() function to get the response converted to a dictionary or list

In [14]:
# Use the .json() function to get the response converted to a dictionary or list
# What did it return?
import json



In [15]:
# Parse the JSON text in the response body (r.json() would do the same)
json.loads(r.content)
  

['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Algiers',
 'Africa/Bissau',
 'Africa/Cairo',
 'Africa/Casablanca',
 'Africa/Ceuta',
 'Africa/El_Aaiun',
 'Africa/Johannesburg',
 'Africa/Juba',
 'Africa/Khartoum',
 'Africa/Lagos',
 'Africa/Maputo',
 'Africa/Monrovia',
 'Africa/Nairobi',
 'Africa/Ndjamena',
 'Africa/Sao_Tome',
 'Africa/Tripoli',
 'Africa/Tunis',
 'Africa/Windhoek',
 'America/Adak',
 'America/Anchorage',
 'America/Araguaina',
 'America/Argentina/Buenos_Aires',
 'America/Argentina/Catamarca',
 'America/Argentina/Cordoba',
 'America/Argentina/Jujuy',
 'America/Argentina/La_Rioja',
 'America/Argentina/Mendoza',
 'America/Argentina/Rio_Gallegos',
 'America/Argentina/Salta',
 'America/Argentina/San_Juan',
 'America/Argentina/San_Luis',
 'America/Argentina/Tucuman',
 'America/Argentina/Ushuaia',
 'America/Asuncion',
 'America/Atikokan',
 'America/Bahia',
 'America/Bahia_Banderas',
 'America/Barbados',
 'America/Belem',
 'America/Belize',
 'America/Blanc-Sablon',
 'America/Boa_Vista

### Exercise 4

Get the time for your time zone

In [16]:
# Your code here
jsonIndiana = json.loads(requests.get("http://worldtimeapi.org/api/timezone/America/Indiana/Indianapolis").content)

In [17]:
jsonIndiana['datetime']

'2021-10-28T14:44:04.181725-04:00'

In [18]:
Timeloc = jsonIndiana['datetime'].find("T")
print(Timeloc)
time   = jsonIndiana['datetime'][Timeloc + 1: Timeloc +  9 ]
print(time)

10
14:44:04
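Slicing the string works, but the timestamp is in ISO 8601 format, so the standard `datetime` module can parse it directly. A minimal sketch, using the timestamp printed above as a fixed string so that it runs without a network connection:

```python
from datetime import datetime

# The timestamp returned by the API in the cell above.
stamp = '2021-10-28T14:44:04.181725-04:00'

parsed = datetime.fromisoformat(stamp)   # understands the date, time, and UTC offset
print(parsed.strftime('%H:%M:%S'))       # time of day only
print(parsed.utcoffset())                # offset from UTC as a timedelta
```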


### Exercise 5

Get the time for your IP address

In [19]:
# Get the time for your IP address

jsonIP = json.loads(requests.get("http://worldtimeapi.org/api/ip").content)
jsonIP

{'abbreviation': 'EDT',
 'client_ip': '69.174.156.149',
 'datetime': '2021-10-28T14:44:09.623462-04:00',
 'day_of_week': 4,
 'day_of_year': 301,
 'dst': True,
 'dst_from': '2021-03-14T07:00:00+00:00',
 'dst_offset': 3600,
 'dst_until': '2021-11-07T06:00:00+00:00',
 'raw_offset': -18000,
 'timezone': 'America/Indiana/Indianapolis',
 'unixtime': 1635446649,
 'utc_datetime': '2021-10-28T18:44:09.623462+00:00',
 'utc_offset': '-04:00',
 'week_number': 43}

In [20]:
TimeIP = jsonIP['datetime'].find("T")
print(TimeIP)
time   = jsonIP['datetime'][TimeIP + 1: TimeIP +  9 ]
print(time)

10
14:44:09
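The other fields in the response can be cross-checked against each other. The sketch below copies a few values from the output above into a dictionary (so it runs offline) and checks that `unixtime` and the two offsets agree with `utc_datetime` and `utc_offset`. Treat the offset arithmetic as an observation about this one response, not a documented guarantee of the API.

```python
from datetime import datetime, timezone, timedelta

# Values copied from the response shown above.
sample = {'unixtime': 1635446649,
          'raw_offset': -18000,   # seconds from UTC without daylight saving
          'dst_offset': 3600,     # extra seconds while daylight saving is active
          'utc_offset': '-04:00'}

# Seconds since 1970-01-01 UTC, converted to a timezone-aware datetime.
utc_time = datetime.fromtimestamp(sample['unixtime'], tz=timezone.utc)
print(utc_time.isoformat())

# Combine the two offsets and shift the UTC time into local time.
offset = timedelta(seconds=sample['raw_offset'] + sample['dst_offset'])
local_time = utc_time.astimezone(timezone(offset))
print(local_time.strftime('%H:%M:%S'), local_time.strftime('%z'))
```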


# Example 3: Getting Wikipedia pages

Wikipedia also has an open API, and I want to use it to show one other tip for using the `requests` library; many APIs will take in a set of parameters, which you can pass as a parameter dictionary.

The documentation for the very extensive API is [here](https://www.mediawiki.org/wiki/API:Main_page). Many of the operations require you to authenticate (which we will cover next), but some things, like getting the content of a page, do not.

For example, the following code gets the recent changes to Wikipedia.

In [2]:
import requests

endpt = 'https://en.wikipedia.org/w/api.php'


def get_last_pages_changed(n):
    params = {'action': 'query',
          'format': 'json',
          'list': 'recentchanges',
          'rcnamespace': '0',
          'rclimit': n}
    r = requests.get(endpt, params = params)
    #print(r.json())
    #print(r.json()['query']['recentchanges'])
    result = []
    content = r.json()['query']['recentchanges']
    for page in content:
        result.append(page['title'])
    return result

In [3]:
get_last_pages_changed(20)

['Duplain Township, MI',
 'Good Sam (TV series)',
 'Have a Good Time (Ruth Brown album)',
 'DuPage County, IL',
 'Anti-communism',
 'Bela Lugosi filmography',
 'Vuzu',
 'Dunwoody, GA',
 'Fraser, Iowa',
 'Dunstable, MA',
 'Mulan (Disney character)',
 'Starfire (Teen Titans)',
 'Vu!',
 'Dunsmuir, CA',
 'Dunseith, ND',
 'Governor-General of Pakistan',
 "October 2021 Sudanese coup d'état",
 'List of defunct Canadian companies',
 'Dunreith, IN',
 'Dunnstown, PA']
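To see what the parameter dictionary does, it can help to build the final URL by hand. This sketch uses the standard library's `urllib.parse.urlencode`; `requests` does a comparable encoding internally when you pass `params`, though its exact output may differ for unusual characters.

```python
from urllib.parse import urlencode

endpt = 'https://en.wikipedia.org/w/api.php'
params = {'action': 'query',
          'format': 'json',
          'list': 'recentchanges',
          'rcnamespace': '0',
          'rclimit': 5}

# Each key/value pair becomes key=value, joined with & after the ?
url = endpt + '?' + urlencode(params)
print(url)
```

Pasting the printed URL into a browser should show the same kind of JSON that `r.json()` parses.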

## Exercise 6

Review the documentation (and Google) to see if you can figure out how to get a list of all of the users who have ever edited the most recently edited Wikipedia page.

In [4]:
## Your code here
print(get_last_pages_changed(1))


def get_contributors():
    params = {'action': 'query',
          'format': 'json',
          'prop': 'contributors',
          'titles': get_last_pages_changed(1)[0]}
    r = requests.get(endpt, params=params)
    #print(r.json())
    info = next(iter(r.json()['query']['pages'].items()))[1]
    #print(info)
    if 'anoncontributors' in info:
        anon = info['anoncontributors']
    else:
        anon = 0
        
    contributors = info['contributors']
    
    #print(anon, 'anonymous contributors')
    #print(contributors)
  
    result = []
    
    for user in contributors:
        result.append(user['name'])
        
    result.append(str(anon) + ' anonymous contributor(s)')
    return result

get_contributors()

['Durham, KS']


['Red Director', 'Sethbot', '0 anonymous contributor(s)']
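The nested structure of the response is the tricky part of this exercise. Here is a sketch of the same extraction on a hand-made dictionary that imitates the shape of a `prop=contributors` response. The page ID, title, and user names are invented, and `summarize_contributors` is a hypothetical helper name.

```python
# Invented sample shaped like r.json() for a prop=contributors query.
sample = {'query': {'pages': {'12345': {
    'pageid': 12345,
    'title': 'Example page',
    'anoncontributors': 2,
    'contributors': [{'userid': 1, 'name': 'ExampleUserA'},
                     {'userid': 2, 'name': 'ExampleUserB'}]}}}}

def summarize_contributors(response):
    """Return (list of user names, number of anonymous contributors)."""
    # 'pages' is keyed by page ID; a single-title query has one entry.
    page = next(iter(response['query']['pages'].values()))
    names = [user['name'] for user in page.get('contributors', [])]
    anon = page.get('anoncontributors', 0)  # default to 0 if the key is missing
    return names, anon

print(summarize_contributors(sample))
```

Using `.get()` with a default also avoids a `KeyError` if a page has no `contributors` entry at all.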

# Example 4: Intro to Twitter API

In order to use the Twitter API, you need to do two things:

1. Install tweepy. This is a Python library designed to make it easier to use the API (rather than using `requests` directly). I made [this video](https://www.youtube.com/watch?v=TASX3evcgG4) to walk you through how to install tweepy in Anaconda.

2. To use the Twitter API, you need to be authenticated, and so you need a developer account. [This page](https://wiki.communitydata.science/Intro_to_Programming_and_Data_Science_(Summer_2020)/Twitter_authentication_setup) explains how to get a developer account.

Once you have your keys, you should create a file called `twitter_authentication.py` in the same directory as this file. It should contain the following four lines (replace the fake strings below with the corresponding keys from your twitter account):

```
CONSUMER_KEY = 'zFxMGdKmbo4e72X8Fi2FYr54v'
CONSUMER_SECRET = 'SetuIC9x6zPQXPZrc9cKTph7AMSngUZSf745GXT0QZTrnWeELQ'
ACCESS_TOKEN = '16614440-V09URsqNfP0V0JYZCD65NhpJAcPZ6Wb9A5ar9JrUT'
ACCESS_TOKEN_SECRET = 'oxVSzC1OjXOVVYrBvGyy6XKKe772Jdvvw6Opb3bSLdIb'
```

Note that the consumer key and consumer secret are called API key and API secret in the new Twitter interface.

In general, it is a good practice to keep your keys (which should be secret) separate from your code, which you can share. In this case, we put them in a different file and then import them.
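Another common way to keep keys out of shared code is to read them from environment variables. This is a sketch of that alternative, not what the rest of this notebook does; the variable names and the `load_keys` helper are assumptions chosen to match the four names above.

```python
import os

KEY_NAMES = ['CONSUMER_KEY', 'CONSUMER_SECRET',
             'ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET']

def load_keys(env=os.environ):
    """Collect the four keys from a mapping, failing loudly if any is missing."""
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise KeyError('Missing keys: ' + ', '.join(missing))
    return {name: env[name] for name in KEY_NAMES}

# Demonstration with a fake mapping instead of the real environment.
fake_env = {name: 'placeholder-' + name.lower() for name in KEY_NAMES}
keys = load_keys(fake_env)
print(sorted(keys))
```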

The following code loads the tweepy library and imports these keys from the `twitter_authentication.py` file, and then prepares to "log in" to your account for the Twitter API.

In [5]:
import tweepy

from twitter_authentication import CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# We then create an api object, based on the auth object created with your credentials
api = tweepy.API(auth, wait_on_rate_limit = True)

## Rate Limiting

You will quickly learn that the Twitter API is "rate limited". This means that Twitter will only let each account make a certain number of calls to the API in a given time period. The default rate is quite low: many endpoints allow only 15 calls per 15 minutes.

You may notice above that we had the code:
```
api = tweepy.API(auth, wait_on_rate_limit = True)
```
The `wait_on_rate_limit=True` argument tells your code to wait until the limit resets (up to 15 minutes) if it gets back a message that you've exceeded a rate limit. This can get annoying when debugging, so be careful with how often you try things. For example, it often makes sense to request a small amount of data that only takes one call and make sure that your code works before trying to get all of the data.
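A little arithmetic shows why rate limits matter. The sketch below estimates the waiting time for a large download, assuming the 15-calls-per-15-minutes limit mentioned above and a fixed number of items per call. Actual limits vary by endpoint, and `estimate_wait_minutes` is a made-up helper.

```python
import math

def estimate_wait_minutes(n_items, per_call, calls_per_window=15, window_minutes=15):
    """Estimate minutes spent waiting on rate limits to fetch n_items."""
    calls = math.ceil(n_items / per_call)
    windows = math.ceil(calls / calls_per_window)
    # The first window starts immediately; each later one needs a full wait.
    return max(windows - 1, 0) * window_minutes

print(estimate_wait_minutes(3000, per_call=200))    # fits in one window
print(estimate_wait_minutes(10000, per_call=200))   # needs several windows
```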

## Timeline

This first example is just to make sure it's working. It should print out the last 100 tweets from your timeline.

In [6]:
# Grab the last 100 tweets
public_tweets = api.home_timeline(count = 100)

# And print the text from them
for tweet in public_tweets:
    print(tweet.text)


BREAKING: Criminal complaint filed with court accuses former New York Gov. Andrew Cuomo of groping a woman. Cuomo r… https://t.co/SZ4tcFpjuR
La futura colocación de 4.364 millones de acciones, que con el precio de hoy en la Bolsa de Valores de Colombia (BV… https://t.co/oLWHhQumyt
UPDATE: On Thursday afternoon, the U.S. Supreme Court vacated a legal stay, allowing the state to move forward with… https://t.co/8cxxxlWEwF
Police surround home in Cypress Park as residents shelter in place https://t.co/4w8luZwIEK
Early voting, surging in both parties, shows signs of becoming a way of life in Virginia https://t.co/MirRTwgmcP
Detenida una persona tras la muerte de un menor de nueve años en Lardero (La Rioja) https://t.co/thDzsoWGYJ
Más de 20 millones de pasajeros se han movilizado vía aérea en 2021. Entérese acá 
👉  https://t.co/hoT1zZXsRE https://t.co/8knVEdFRZ9
Jennifer Lawrence protagonizó el último estilismo más apetecible del otoño. https://t.co/GF8WT3bMWr
The 'Dune' sequel announcement 

Each of these `tweet` objects contains lots of additional information. This shows all of the metadata available for the last one we looked at.

In [7]:
tweet._json

{'created_at': 'Thu Oct 28 21:07:07 +0000 2021',
 'id': 1453830713589964803,
 'id_str': '1453830713589964803',
 'text': 'RT @hgfernan: R ideas -- Benchmark para R, usando programa de cálculo de eps\nhttps://t.co/14dIUgvCJH   \n\nApresentação de benchmarks em R, s…',
 'truncated': False,
 'entities': {'hashtags': [],
  'symbols': [],
  'user_mentions': [{'screen_name': 'hgfernan',
    'name': 'Hilton Fernandes',
    'id': 45841081,
    'id_str': '45841081',
    'indices': [3, 12]}],
  'urls': [{'url': 'https://t.co/14dIUgvCJH',
    'expanded_url': 'https://youtu.be/_ezGRrlXWcA',
    'display_url': 'youtu.be/_ezGRrlXWcA',
    'indices': [77, 100]}]},
 'source': '<a href="http://127.0.0.1" rel="nofollow">rstatsretweetingtool</a>',
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 1011817655957893120,
  'id_str': '1011817655957893120',
  'name': 'Rstats',
  's

You can try to change the `count` argument above, and you'll quickly learn that if you raise it over 200, you will still only get 200 tweets. If you want to print more than 200 tweets, you may need to use a [cursor](http://docs.tweepy.org/en/v3.5.0/cursor_tutorial.html).

This is basically tweepy's clever way of breaking what you want to do into multiple calls to the API.

For example, this call will get 350 tweets. The `count` argument (optional) says how many tweets to get per call, and the argument in `.items()` is how many to get in total.

In [8]:
for tweet in tweepy.Cursor(api.home_timeline, count = 175).items(350):
    print(tweet.text)

"Sometimes surrender means giving up trying to understand and becoming comfortable with not knowing." - Eckhart Tol… https://t.co/pLX0utohtr
How someone’s level of physical activity can influence their sleep quality and overall well-being https://t.co/xUfpUBH2Gq
#LoUltimo | Sin embargo, este carné de vacunación contra la Covid-19 se exigirá para aquellos que reserven espacios… https://t.co/KeaZuxvpsy
It found that more than 1 million people live in areas of the county that by mid-century will experience an additio… https://t.co/aV14mFmgEq
Their research found an even stronger correlation for the most heavily Latino neighborhoods, which in the Los Angel… https://t.co/BnYdSm4ckj
In a study of 20 metro areas in the southwestern U.S., UC Davis researchers found that on extreme heat days, Califo… https://t.co/Yy4OYgZW2E
#Comunicado | Durante la audiencia pública sobre las afectaciones por la ola invernal en Putumayo la Ministra María… https://t.co/sGZHgnjBnb
BREAKING: Criminal complaint fil

Global equities, U.S. yields rise despite weak U.S. economic growth data https://t.co/AEtm5zQUk7 https://t.co/6qpMrsadj6
Week of Meals: Cookbook author Dawn Perry's genius weeknight recipes https://t.co/NNCw15AXXT
Join ESL Library and Florida educators at SSTESOL 2021! We will be doing virtual presentations on Friday from 1:30… https://t.co/S8iAooCN9T
😷 ¿Cómo le fue a Colombia con el manejo de la pandemia? Hablemos del tema con los epidemiólogos @ZulmaCucunuba y… https://t.co/eF8jpKiIJb
¡Respeta el espacio del peatón! Cuando invades a la cebra pones en riesgo la vida de todas las personas que están c… https://t.co/KoCvFMrJsm
Kaapo Kakko likely returning to Rangers' lineup against Blue Jackets https://t.co/ao9bq1XpyB https://t.co/PcieAA0U1h
¡Yo quiero una píldora mágica! ¿Y tú amiga? Sigue leyendo aquí:https://t.co/WSMLhHQFAg https://t.co/aKy3WQNTqU
There have been bats and endangered butterflies, wild and rare bees; a coyote in Central Park; beavers and salamand… https://t.co/FJbkjD2NL

Si lo pruebas, no lo cambias.
https://t.co/kaGjFUQaac
Tu localidad tiene voz y tú serás quien la difunda. Cuétanos tu historia y dirígela junto a nuestro equipo de Cuida… https://t.co/ZsFrObhXW1
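To make the cursor idea concrete, here is a rough sketch of pagination in plain Python. It is not tweepy's actual implementation: `fake_timeline` stands in for an API call, and `paged_items` is a hypothetical helper that keeps requesting pages until it has yielded the requested total.

```python
calls_made = []

def fake_timeline(count, start):
    """Stand-in for one API call: return `count` fake tweets beginning at `start`."""
    calls_made.append(count)
    return ['tweet ' + str(i) for i in range(start, start + count)]

def paged_items(fetch_page, count, total):
    """Yield up to `total` items, fetching `count` at a time."""
    yielded = 0
    while yielded < total:
        page = fetch_page(count, yielded)
        if not page:          # no more data available
            break
        for item in page:
            if yielded == total:
                return
            yield item
            yielded += 1

tweets = list(paged_items(fake_timeline, count=175, total=350))
print(len(tweets), 'tweets from', len(calls_made), 'calls')
```

With `count=175` and a total of 350, the sketch makes two calls, which mirrors how the cursor example above splits one request into several.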


## Followers

You can also get information about a user, such as who their followers are.

Here's information about me and some of my followers.

In [9]:
user = api.get_user(screen_name = 'jdfoote')

print(user.screen_name + " has " + str(user.followers_count) + " followers.")

print("They include these 100 people:")

for follower in user.followers(count=100):
    print(follower.screen_name)

jdfoote has 826 followers.
They include these 100 people:
BTimmOSU
wes_deng
cal_liang
nprandchill
asbruckman
ywu2450
hyperrealestate
Communalytic
Work_Lina
SandraCmgo
abirsaha_
Helena_K1
CallieKalny
_mohsen_m
JohnyVegasLaw
PrinceAduGyamf6
takeoutphoto
ElaTheGrad
vermouthkuo
isabellagbrown
UNLCommDept
ZeningDuan_Ze
oeasy102
Ruth66011432
hyejin_youn
mansour_ameera
ettaboyle
DavidnLang
zbensonzhou
AmbedkarCaravan
HonglinB
BrianePVSamson
annjoann_jo
AryaArshia
YulinYuResearch
sabeti_sepehr
SteelCityKid87
divyasiddarth
shagunjhaver
Rachel_chloe_1
audelau
RummanShikder1
Testingapi9
mercedes_rss
ctokelly
dianeljacks
Peter_G_Royal
arvidmartin
OnkarSadekar
Subhasree249
joely_wu
profbohns
nic_tilly
UTWthePodcast
pmannino5
saiphcita
SrishtiPatil15
nishalsach
ragazzacciorock
rachelsteenblik
jalbacutler
diptodas175
shahanmemon
MattNicholswag
solicitous_sri
KavehKadkhoda
Lalhorr
richardhuskey
arjun_s2
tuanhe_lee
LINKatNU
yanniknoc
zarinahagnew
dvshah
hci_vis
DGaff_
MIT_CSAIL
James95223946
feedkoko
S

Here is what that user object looks like for my user

In [10]:
user._json

{'id': 16614440,
 'id_str': '16614440',
 'name': 'Jeremy Foote',
 'screen_name': 'jdfoote',
 'location': 'West Lafayette, IN, USA',
 'profile_location': None,
 'description': 'Assistant Prof of Communication at #Purdue. Computational social scientist: online organizations, collective decision making. @comdatasci member. Dad. (he/him)',
 'url': 'https://t.co/vovAhYeNah',
 'entities': {'url': {'urls': [{'url': 'https://t.co/vovAhYeNah',
     'expanded_url': 'http://jeremydfoote.com',
     'display_url': 'jeremydfoote.com',
     'indices': [0, 23]}]},
  'description': {'urls': []}},
 'protected': False,
 'followers_count': 826,
 'friends_count': 985,
 'listed_count': 22,
 'created_at': 'Mon Oct 06 14:43:31 +0000 2008',
 'favourites_count': 5533,
 'utc_offset': None,
 'time_zone': None,
 'geo_enabled': True,
 'verified': False,
 'statuses_count': 2910,
 'lang': None,
 'status': {'created_at': 'Tue Oct 26 22:06:17 +0000 2021',
  'id': 1453120828753813510,
  'id_str': '1453120828753813510',


And here's the user object for one of my followers, which is nearly identical.

In [11]:
follower._json

{'id': 75654382,
 'id_str': '75654382',
 'name': 'mastress of public health 🏳️\u200d⚧️🏳️\u200d⚧️🏳️\u200d⚧️',
 'screen_name': 'onion_technique',
 'location': 'philly',
 'description': "living my chillest academia life at @TempleSBS\n\nlet's talk about mass media and behavior change",
 'url': None,
 'entities': {'description': {'urls': []}},
 'protected': False,
 'followers_count': 206,
 'friends_count': 774,
 'listed_count': 0,
 'created_at': 'Sat Sep 19 22:08:27 +0000 2009',
 'favourites_count': 6612,
 'utc_offset': None,
 'time_zone': None,
 'geo_enabled': True,
 'verified': False,
 'statuses_count': 1262,
 'lang': None,
 'status': {'created_at': 'Thu Oct 28 17:46:10 +0000 2021',
  'id': 1453780141906309121,
  'id_str': '1453780141906309121',
  'text': 'RT @ashtroid22: Please, and I can not stress this enough…stop emailing me.',
  'truncated': False,
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [{'screen_name': 'ashtroid22',
     'name': 'Ashley Holub, PhD',
  

Note that 200 is the maximum number of followers that you can get at one time. If you want to get information about all of a user's followers, you will need to use a cursor. If you are getting many followers, you will almost certainly hit rate limits.

In [12]:
f = []
for follower in tweepy.Cursor(api.get_followers, screen_name='jdfoote', count=200).items():
    #print(follower.screen_name)
    f.append(follower.screen_name)

In [13]:
print(f)

['BTimmOSU', 'wes_deng', 'cal_liang', 'nprandchill', 'asbruckman', 'ywu2450', 'hyperrealestate', 'Communalytic', 'Work_Lina', 'SandraCmgo', 'abirsaha_', 'Helena_K1', 'CallieKalny', '_mohsen_m', 'JohnyVegasLaw', 'PrinceAduGyamf6', 'takeoutphoto', 'ElaTheGrad', 'vermouthkuo', 'isabellagbrown', 'UNLCommDept', 'ZeningDuan_Ze', 'oeasy102', 'Ruth66011432', 'hyejin_youn', 'mansour_ameera', 'ettaboyle', 'DavidnLang', 'zbensonzhou', 'AmbedkarCaravan', 'HonglinB', 'BrianePVSamson', 'annjoann_jo', 'AryaArshia', 'YulinYuResearch', 'sabeti_sepehr', 'SteelCityKid87', 'divyasiddarth', 'shagunjhaver', 'Rachel_chloe_1', 'audelau', 'RummanShikder1', 'Testingapi9', 'mercedes_rss', 'ctokelly', 'dianeljacks', 'Peter_G_Royal', 'arvidmartin', 'OnkarSadekar', 'Subhasree249', 'joely_wu', 'profbohns', 'nic_tilly', 'UTWthePodcast', 'pmannino5', 'saiphcita', 'SrishtiPatil15', 'nishalsach', 'ragazzacciorock', 'rachelsteenblik', 'jalbacutler', 'diptodas175', 'shahanmemon', 'MattNicholswag', 'solicitous_sri', 'Kaveh

## Searching

For most of your research, you may be interested in how people are talking about a given topic. There are two main ways to do this.

The first is the search API ([Official Twitter info on the Search API](https://developer.twitter.com/en/docs/tweets/search/overview)). We only have access to "[Standard Search](https://developer.twitter.com/en/docs/tweets/search/overview/standard)", the most limited of Twitter Search API options, which is limited to the last 7 days.


**Note that if you would like to use Twitter for a project or a paper, you can request access to the Academic Research API, which includes historical search and a much higher limit on the number of tweets you can request**

Unfortunately, tweepy doesn't yet support the v2 API for Twitter, but [here is an example of how to use it](https://github.com/twitterdev/Twitter-API-v2-sample-code/blob/master/Full-Archive-Search/full-archive-search.py) with just requests.


[This page](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets) is the documentation for Standard Search and has some helpful intel about modifying the parameters.

Below is a simple example that gets up to 20 recent tweets from the account `jdfoote`.

In [14]:
public_tweets = api.search_tweets('"from:@jdfoote"', count=20)

for tweet in public_tweets:
    print(tweet.user.screen_name + "\t" + str(tweet.created_at) + "\t" + tweet.text)

jdfoote	2021-10-26 22:06:17+00:00	RT @eegilbert: I hope we get meaningful change out of the Facebook leak. I'm worried, however, that the one definite thing we're going to g…
jdfoote	2021-10-26 19:05:15+00:00	RT @asbruckman: A qualitative study of small subreddits (between 5 and 100 posters per month) finds that they provide value for their membe…
jdfoote	2021-10-23 14:06:16+00:00	RT @dannagal: This is so important. As some scholars have tried to “debunk the echo chamber myth” (that SM causes most users to restrict to…
jdfoote	2021-10-23 03:05:00+00:00	RT @nickchk: I've always found it interesting how it is drastically cheaper to rent a U-Haul than a regular car. Sometimes less than a thir…
jdfoote	2021-10-21 17:22:02+00:00	@suhemparack Follower/followee networks


Note that many of these results are truncated. If you want the full tweet, you actually have to modify the call a little bit, like so.

In [15]:
public_tweets = api.search_tweets('"data science"', count = 20, tweet_mode = 'extended')

for tweet in public_tweets:
    print(tweet.user.screen_name + "\t" + str(tweet.created_at) + "\t" + tweet.full_text)

iPythonistaBot	2021-10-28 21:18:27+00:00	RT @byLilyV: #FEATURED #COURSES

Machine Learning, Data Science and Deep Learning with Python

Complete hands-on #machine #learning tutoria…
coder_487	2021-10-28 21:18:25+00:00	RT @EdKwedar: The Best #DataScience #Books for Beginners and Experts in 2021. #BigData #Analytics #IoT #IIoT #Python #RStats #TensorFlow #J…
coder_487	2021-10-28 21:18:25+00:00	RT @gp_pulipaka: AI Best: A Great List of 60 #DataScience #Books for ML People. #BigData #Analytics #GIS #IoT #IIoT #Python #RStats #Tensor…
iPythonistaBot	2021-10-28 21:18:24+00:00	RT @EdKwedar: The Best #DataScience #Books for Beginners and Experts in 2021. #BigData #Analytics #IoT #IIoT #Python #RStats #TensorFlow #J…
aProgrammerBot	2021-10-28 21:18:24+00:00	RT @EdKwedar: The Best #DataScience #Books for Beginners and Experts in 2021. #BigData #Analytics #IoT #IIoT #Python #RStats #TensorFlow #J…
Noberg4	2021-10-28 21:18:23+00:00	RT @EdKwedar: The Best #DataScience #Books for Beginners and Exper
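Even in extended mode, the retweets in the output above still end in an ellipsis. In the standard API, a retweet's complete text is reported inside the nested `retweeted_status` object, so a helper along these lines can pick the best available text. The sketch works on plain dictionaries shaped like `tweet._json`; the sample values are invented and `best_text` is a hypothetical name.

```python
def best_text(tweet_json):
    """Return the longest available text for a tweet dictionary."""
    # For a retweet, look at the original tweet nested inside it.
    source = tweet_json.get('retweeted_status', tweet_json)
    # 'full_text' exists in extended mode; 'text' is the fallback.
    return source.get('full_text', source.get('text', ''))

# Invented samples: a plain tweet and a retweet in extended mode.
plain = {'text': 'Short tweet'}
retweet = {'full_text': 'RT @someone: The start of a long twe…',
           'retweeted_status': {'full_text': 'The start of a long tweet, in full.'}}

print(best_text(plain))
print(best_text(retweet))
```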

### Additional Search resources

* [Tweepy extended tweets documentation](http://docs.tweepy.org/en/latest/extended_tweets.html)
* [Twitter documentation for crafting queries](https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators). This includes things like how to search by geography or remove retweets.

## Streaming

The other option is to "stream" tweets. Instead of looking backward, this keeps you connected to Twitter, and whenever new tweets come in, they are sent to your program. You would typically just keep the program running and keep writing the data that you want to an external file.

As with the search API, there are some caveats. One is that (I believe) there is no guarantee that this is all of the tweets that match. If you try to filter by very popular terms, then Twitter may give you only a sample of them.

In [None]:
class Streamer(tweepy.Stream):
    def on_status(self, tweet):
        print(tweet.author.screen_name + "\t" + tweet.text)

    def on_error(self, status_code):
        print( 'Error: ' + repr(status_code))
        return False

streamer = Streamer(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)


keywords = ['Purdue', '"data science"']
streamer.filter(track = keywords)

DawnBlueberry	@RobertFaturechi Why only going after Butr. So did @KLoeffler David Purdue, Trump, Kushner, Mnuchin etc etc
PGSGinfo	RT @GradProfCareers: Dr. Keenan Shimko, @purduedepthist alumnus, now in Mgmt Consulting, will discuss career diversity &amp; careers beyond aca…
NushPowell	RT @GradProfCareers: Dr. Keenan Shimko, @purduedepthist alumnus, now in Mgmt Consulting, will discuss career diversity &amp; careers beyond aca…
WatkinsShow	Oregons last 5 games 
vs. Colorado
@ Washington 
vs. Washington State
@ Utah 
vs. Oregon State 

Ohio States last 5… https://t.co/oX7ZmctBye
LeeDoe11	@JaniceClaire14 @RobertFaturechi Exactly! TFG put a stop to investigations of Purdue and Loeffler. He was pissed at Burr.
scot23132	Tom Petty &amp;Prince brought us joy &amp; beauty. 
Purdue Pharma/Sacklers brought untold suffering pain &amp; needless death t… https://t.co/F1KrNfPLnC
scot23132	@RexChapman Tom Petty &amp;Prince brought us joy &amp; beauty. 
Purdue Pharma/Sacklers brought untold suffering

AFSKaegan	@theStevenRuiz 2020 joe burrow would take purdue to a Natty
madalynkathrynm	RT @JeromeAdamsMD: Got my flu shot at Purdue University! @purduepharmacy https://t.co/SvUfH9JzxO
madalynkathrynm	RT @JeromeAdamsMD: I would never ask anyone to do something for their health that I’m not willing to do myself. That’s why every year I get…
HailVarsity	With Purdue coming to town, Scott Frost liked what he’s seen from his team coming off its first bye week after eigh… https://t.co/gA5zBExh7t
andyluther300	@CBBonFOX How is Brian Cardinal not on the list? No Purdue player should make the list over Cardinal.
HoosierJV	RT @TheMopLady: Cody’s starting to look like Purdue sophomore.
BarmakN	As public institutions are starved of state support to fulfill their mission, their newly empowered administrators… https://t.co/TrzMrjm8cG
BarmakN	Take a toxic asset like Kaplan, slap the name "Purdue" on it and sell, sell, sell like the for-profit entity that K… https://t.co/CKvG90NiQv
BarmakN	Needless to s

psuhottakes	Franklin vs #big10

Rutgers 7-0 
Wisconsin 3-0
Purdue 2-0 
Indiana 7-1
Maryland 5-2 
Iowa 4-2
Illinois 3-2
Minnesot… https://t.co/PoUvVzbSZ4
goldenmistas	I was working during the Iowa game, but the Purdue game in 2018 ruined my life
jimpierzchalski	RT @TheMopLady: Cody’s starting to look like Purdue sophomore.
Stoga_83	@johnnyc59722461 Our 1st home match is #Purdue.
airhasiescardo	RT @EsportsSnhu: Tonight we’re in for a great match going up against Purdue’s Gold team in Overwatch! Watch live on Twitch or come into the…
HunterBlue24	Purdue fans getting all excited for a Sweet 16 boot
ESPNLincoln	Jeremiah Sirles previews Nebraska-Purdue  https://t.co/X96mNYSqDe
HawgsOnMain	@TheGruffSpartan Purdue bein weird af in the back is so fitting
Optimus_Crime12	@edsbs Thought we couldn't get Purdue into the playoff conversation? Think again fucko
BlasingameKara	RT @PurdueSoccer: 𝐀𝐥𝐥-𝐁𝐢𝐠 𝐓𝐞𝐧 𝐅𝐢𝐫𝐬𝐭 𝐓𝐞𝐚𝐦

For the first time since 2009, Purdue has 2 All-Big Ten First Team honorees, in Sara

ClearwaterBK	RT @acklaw: Stay And Direct Appeal Requests Denied In Purdue Pharma; District Court Commits To Shielding Case From Equitable Mootness Conce…
LJSHuskers	Another big game looms for the Huskers. Our staffers make their predictions for Saturday's game against Purdue — ov… https://t.co/CJI3Lz6BKl
TannerLee92	RT @BoilerBreakPod: Death, Taxes, Purdue Losing to Wisconsin in Football https://t.co/CAGVth2Skg
CDuffSports	RT @KitchDuff: "To come back against Wisconsin and see the productivity on the offensive side again was pretty disappointing." @Kelly_Kitch…
CDuffSports	RT @KitchDuff: Kitch and Duff-  Why Can't Purdue Top 13 Points? Oct 28, 2021 https://t.co/sVB4PgNzB9 via @YouTube
JeezumCrow2	@ReportsDaNews Waiting on Purdue &amp; Loeffler…
JeezumCrow2	@birdiedoesit @ReportsDaNews Loeffler. And David Purdue.
MattooLab	RT @hurryram: Faculty positions at Purdue Biological Sciences.  Great place to work. I have been here more than 2 years and enjoyed it thor…
RNJenni74	I watched the P

Stream connection has errored or timed out
Stream encountered HTTP error: 420
Stream connection has errored or timed out
Stream encountered HTTP error: 420


# Exercises


1. Use the streaming API to produce a list of 1000 tweets about a topic.
2. From that list of 1000 tweets, eliminate retweets.
3. For each original tweet, create a dictionary with the number of times you see it retweeted in your dataset.
4. Get a list of the URLs in your dataset.
5. Now, see if you can figure out how to eliminate retweets in the query instead.
6. Get the last 50 tweets from West Lafayette, using the search API. (Hint: look up the geocode information [here](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets).)
7. Alter the streaming code to include a "locations" filter to get tweets from New York City. The four coordinates must be given in the order sw_lng, sw_lat, ne_lng, ne_lat (a bounding box), rather than as a point and radius as in the search API.

### BONUS Questions
1. For each of your followers, get *their* followers (investigate time.sleep to throttle your computation)
2. Identify the follower you have that also follows the most of your followers.
3. How many users follow you but none of your followers?

First exercise: use the streaming API to produce a list of 1000 tweets about a topic.

In [None]:
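import tweepy

topicStream = []    # collected tweets end up here
MAX_TWEETS = 1000   # lower this while testing


class StreamerText(tweepy.Stream):
    """Collect matching tweets into topicStream, then disconnect."""

    def __init__(self, *args, max_tweets=MAX_TWEETS, **kwargs):
        # Pass the four credentials through to tweepy.Stream.
        super().__init__(*args, **kwargs)
        self.num_tweets = 0
        self.max_tweets = max_tweets

    def on_status(self, tweet):
        author = tweet.author.screen_name
        text = tweet.text

        # Retweets start with "RT "; checking only the prefix avoids
        # false matches such as "SMART " in the middle of a tweet.
        rt = text.startswith("RT ")
        if rt:
            text = text[len("RT "):]

        theTweet = {
            "Author": author,
            "RT": rt,
            "Tweet": text,
        }

        print(theTweet)
        topicStream.append(theTweet)
        self.num_tweets += 1
        if self.num_tweets >= self.max_tweets:
            self.disconnect()  # stop once enough tweets are collected

    def on_request_error(self, status_code):
        # Called on HTTP errors such as 420 (rate limited).
        print('Error: ' + repr(status_code))
        self.disconnect()


keywords = ['Purdue', '"data science"']

streamer = StreamerText(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
streamer.filter(track=keywords)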
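
Once `topicStream` is filled, the exercises on retweets and URLs reduce to plain list processing. The following is a rough starting point, not a full solution. It assumes each tweet is a dictionary with the `Author`, `RT`, and `Tweet` keys built in `on_status`; the `sample` data and the function names are made up for illustration. Because streamed retweet text is often truncated, matching retweets back to originals by text is only approximate.

```python
import re
from collections import Counter

# Hypothetical sample in the same shape as the dictionaries built in on_status.
sample = [
    {"Author": "a", "RT": False, "Tweet": "Purdue wins https://t.co/abc123"},
    {"Author": "b", "RT": True, "Tweet": "@a: Purdue wins https://t.co/abc123"},
    {"Author": "c", "RT": True, "Tweet": "@a: Purdue wins https://t.co/abc123"},
    {"Author": "d", "RT": False, "Tweet": "Go Boilers"},
]


def originals(tweets):
    """Keep only tweets that are not retweets."""
    return [t for t in tweets if not t["RT"]]


def retweet_counts(tweets):
    """Count retweets per text, after dropping the leading '@user: ' prefix."""
    prefix = re.compile(r"^@\w+:\s*")
    return Counter(prefix.sub("", t["Tweet"]) for t in tweets if t["RT"])


def urls(tweets):
    """Collect every http(s) URL that appears in the tweet texts."""
    pattern = re.compile(r"https?://\S+")
    found = []
    for t in tweets:
        found.extend(pattern.findall(t["Tweet"]))
    return found


print(originals(sample))
print(retweet_counts(sample))
print(urls(sample))
```

To run this on real data, replace `sample` with `topicStream`.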



In [None]:
# Adapted from https://stackoverflow.com/questions/20863486/tweepy-streaming-stop-collecting-tweets-at-x-amount
# Rewritten for Tweepy 4.x, where the listener methods live on tweepy.Stream itself.

import json, time, sys

import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

collection = []  # a plain list stands in for the database collection in the original answer


class StdOutListener(tweepy.Stream):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.num_tweets = 0

    def on_status(self, status):
        record = {'Text': status.text, 'Created At': status.created_at}
        print(record)  # See Tweepy documentation to learn how to access other fields
        collection.append(record)
        self.num_tweets += 1
        if self.num_tweets >= 20:
            self.disconnect()  # stop after 20 tweets

    def on_request_error(self, status_code):
        print('Error on status', status_code)

    def on_limit(self, track):
        print('Limit threshold exceeded', track)

    def on_connection_error(self):
        print('Stream disconnected; continuing...')


stream = StdOutListener(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
stream.filter(track=['tv'])