# Acquiring Data:

In [1]:
import pandas as pd

import requests

Requests is a library for making HTTP requests to websites and APIs and reading their responses. The response body can then be parsed into Python structures such as dictionaries, which pandas can turn into DataFrames.

In [2]:
#requests.get returns a Response object; displaying it shows the HTTP status code:
response = requests.get('http://example.com')
response

<Response [200]>

Notes about HTTP status codes:

- 200s: everything is good
- 300s: redirecting
- 400s: you did something wrong
- 500s: something is wrong with the server
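
As a rough illustration of the ranges above, the sketch below maps a status code to its broad category. The helper name `status_category` is made up for this example; `HTTPStatus` comes from Python's standard library and supplies the standard phrase for each code.

```python
from http import HTTPStatus


def status_category(code: int) -> str:
    """Map an HTTP status code to the broad category from the notes above."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (200, 301, 404, 503):
    # HTTPStatus(code).phrase is the standard reason phrase, e.g. "Not Found"
    print(code, HTTPStatus(code).phrase, "->", status_category(code))
```

With a real response, the integer to pass in is `response.status_code`.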


In [5]:
#response.text returns the raw html data of the website
print(response.text)

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domai

# Consuming JSON API endpoints:

We can also hit APIs and get structured data back!

In [6]:
#using a quote generator url...
response = requests.get('https://aphorisms.glitch.me')
response

<Response [200]>

In [7]:
response.text

'{"quote":"One can choose to go back toward safety or forward toward growth. Growth must be chosen again and again; fear must be overcome again and again.","author":"Abraham Maslow"}'

In [8]:
#response.json() parses the JSON body into a Python dictionary (structured data)
data = response.json()
data

{'quote': 'One can choose to go back toward safety or forward toward growth. Growth must be chosen again and again; fear must be overcome again and again.',
 'author': 'Abraham Maslow'}

### What is the difference between response.text and response.json()?

In [9]:
print('response.text type is:', type(response.text))
print('response.json() type is:', type(response.json()))

response.text type is: <class 'str'>
response.json() type is: <class 'dict'>
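
To see where the difference comes from, here is a small sketch that parses the same text with the standard-library `json` module. It assumes, roughly, that `response.json()` parses the response body the way `json.loads` parses a string; the variable names are only for illustration.

```python
import json

# The raw body as a string, copied from the response.text output above
raw_text = (
    '{"quote":"One can choose to go back toward safety or forward toward growth. '
    'Growth must be chosen again and again; fear must be overcome again and again.",'
    '"author":"Abraham Maslow"}'
)

# Parsing turns the JSON text into a Python dictionary
parsed = json.loads(raw_text)

print(type(raw_text))    # the raw body is a str
print(type(parsed))      # the parsed body is a dict
print(parsed["author"])  # keys are only available after parsing
```

The string version has no keys to look up; only the parsed dictionary supports `parsed["author"]`.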


### So now we can treat data just like a dictionary?

In [10]:
data

{'quote': 'One can choose to go back toward safety or forward toward growth. Growth must be chosen again and again; fear must be overcome again and again.',
 'author': 'Abraham Maslow'}

In [11]:
data['quote']

'One can choose to go back toward safety or forward toward growth. Growth must be chosen again and again; fear must be overcome again and again.'

In [12]:
data['author']

'Abraham Maslow'

# Let's use the Codeup API:

In [13]:
url = 'https://api.data.codeup.com'
response = requests.get(url)
response.json()

{'api': '/api/v1', 'help': '/documentation'}

### The response lists paths that we can append to the URL to reach other parts of the API

In [14]:
#let's use documentation
url = 'https://api.data.codeup.com' + '/documentation'
response = requests.get(url)
response.json()

{'payload': '\nThe API accepts GET requests for all endpoints, where endpoints are prefixed\nwith\n\n    /api/{version}\n\nWhere version is "v1"\n\nValid endpoints:\n\n- /stores[/{store_id}]\n- /items[/{item_id}]\n- /sales[/{sale_id}]\n\nAll endpoints accept a `page` parameter that can be used to navigate through\nthe results.\n',
 'status': 'ok'}

In [15]:
#it's a bit messy, let's clear it up a bit:
response.json()['payload']

'\nThe API accepts GET requests for all endpoints, where endpoints are prefixed\nwith\n\n    /api/{version}\n\nWhere version is "v1"\n\nValid endpoints:\n\n- /stores[/{store_id}]\n- /items[/{item_id}]\n- /sales[/{sale_id}]\n\nAll endpoints accept a `page` parameter that can be used to navigate through\nthe results.\n'

In [16]:
#I can see above that there is some formatting to it, so I'm going
#to print it:
print(response.json()['payload'])



The API accepts GET requests for all endpoints, where endpoints are prefixed
with

    /api/{version}

Where version is "v1"

Valid endpoints:

- /stores[/{store_id}]
- /items[/{item_id}]
- /sales[/{sale_id}]

All endpoints accept a `page` parameter that can be used to navigate through
the results.



### Above, the {} mark placeholders that we can fill in with specific values (e.g. a store_id) to navigate and explore
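
A minimal sketch of filling in those placeholders follows. The helper `build_url` and the constant names are hypothetical, and the sketch assumes the square brackets in the documentation mark an optional part of the path, which is a common documentation convention rather than something the API states.

```python
DOMAIN = "https://api.data.codeup.com"
API_PREFIX = "/api/v1"  # /api/{version} with version "v1"


def build_url(resource, record_id=None):
    """Build a URL such as .../api/v1/stores or .../api/v1/stores/3."""
    url = f"{DOMAIN}{API_PREFIX}/{resource}"
    # The id segment is only appended when one is given (assumed optional)
    if record_id is not None:
        url += f"/{record_id}"
    return url


print(build_url("stores"))
print(build_url("stores", 3))
print(build_url("items", 10))
```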

# What is an endpoint?
* An endpoint is the part of the URL that comes after the domain (the main part of the URL).
* In this case our endpoints go after .com in the URL, separated by slashes.

Extra: .com, .gov, and .net are known as TLDs, or Top Level Domains, in a URL.
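
To make the pieces of a URL concrete, this sketch splits one with the standard-library `urllib.parse` module. The URL is assembled from the domain and a paginated items path used in this notebook; nothing is requested over the network.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.data.codeup.com/api/v1/items?page=2"
parts = urlsplit(url)

print(parts.scheme)           # the protocol
print(parts.netloc)           # the domain
print(parts.path)             # the endpoint path after the domain
print(parse_qs(parts.query))  # query parameters, each value as a list
```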

### So with this info we can now start retrieving data from the api


#### Example: Check out the stores data:

In [17]:
url = 'https://api.data.codeup.com/api/v1/stores'
response = requests.get(url)
data = response.json()
data

{'payload': {'max_page': 1,
  'next_page': None,
  'page': 1,
  'previous_page': None,
  'stores': [{'store_address': '12125 Alamo Ranch Pkwy',
    'store_city': 'San Antonio',
    'store_id': 1,
    'store_state': 'TX',
    'store_zipcode': '78253'},
   {'store_address': '9255 FM 471 West',
    'store_city': 'San Antonio',
    'store_id': 2,
    'store_state': 'TX',
    'store_zipcode': '78251'},
   {'store_address': '2118 Fredericksburg Rdj',
    'store_city': 'San Antonio',
    'store_id': 3,
    'store_state': 'TX',
    'store_zipcode': '78201'},
   {'store_address': '516 S Flores St',
    'store_city': 'San Antonio',
    'store_id': 4,
    'store_state': 'TX',
    'store_zipcode': '78204'},
   {'store_address': '1520 Austin Hwy',
    'store_city': 'San Antonio',
    'store_id': 5,
    'store_state': 'TX',
    'store_zipcode': '78218'},
   {'store_address': '1015 S WW White Rd',
    'store_city': 'San Antonio',
    'store_id': 6,
    'store_state': 'TX',
    'store_zipcode': '78220

In [18]:
#what are the basic keys I can get from this?
data.keys()


dict_keys(['payload', 'status'])

In [19]:
#looking at status
data['status']

'ok'
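
One possible use of the `status` key is to check it before touching the payload. The sketch below is an assumption-heavy illustration: the helper `get_payload` is hypothetical, and the notebook only ever shows the value `'ok'`, so the failing response here is made up.

```python
def get_payload(data):
    """Return data['payload'], raising if the API did not report 'ok'."""
    status = data.get("status")
    if status != "ok":
        raise ValueError(f"unexpected API status: {status!r}")
    return data["payload"]


# Same top-level shape as the stores response above, trimmed down
good = {"payload": {"page": 1, "stores": []}, "status": "ok"}
print(get_payload(good))

# Made-up failure shape, only to exercise the error branch
bad = {"payload": None, "status": "error"}
try:
    get_payload(bad)
except ValueError as err:
    print("caught:", err)
```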

In [20]:
#looking at what payload holds
data['payload']


{'max_page': 1,
 'next_page': None,
 'page': 1,
 'previous_page': None,
 'stores': [{'store_address': '12125 Alamo Ranch Pkwy',
   'store_city': 'San Antonio',
   'store_id': 1,
   'store_state': 'TX',
   'store_zipcode': '78253'},
  {'store_address': '9255 FM 471 West',
   'store_city': 'San Antonio',
   'store_id': 2,
   'store_state': 'TX',
   'store_zipcode': '78251'},
  {'store_address': '2118 Fredericksburg Rdj',
   'store_city': 'San Antonio',
   'store_id': 3,
   'store_state': 'TX',
   'store_zipcode': '78201'},
  {'store_address': '516 S Flores St',
   'store_city': 'San Antonio',
   'store_id': 4,
   'store_state': 'TX',
   'store_zipcode': '78204'},
  {'store_address': '1520 Austin Hwy',
   'store_city': 'San Antonio',
   'store_id': 5,
   'store_state': 'TX',
   'store_zipcode': '78218'},
  {'store_address': '1015 S WW White Rd',
   'store_city': 'San Antonio',
   'store_id': 6,
   'store_state': 'TX',
   'store_zipcode': '78220'},
  {'store_address': '12018 Perrin Beitel 

In [21]:
#what are keys in payload?
data['payload'].keys()


dict_keys(['max_page', 'next_page', 'page', 'previous_page', 'stores'])

In [22]:
#I want to look at the stores key inside the payload dictionary
data['payload']['stores']


[{'store_address': '12125 Alamo Ranch Pkwy',
  'store_city': 'San Antonio',
  'store_id': 1,
  'store_state': 'TX',
  'store_zipcode': '78253'},
 {'store_address': '9255 FM 471 West',
  'store_city': 'San Antonio',
  'store_id': 2,
  'store_state': 'TX',
  'store_zipcode': '78251'},
 {'store_address': '2118 Fredericksburg Rdj',
  'store_city': 'San Antonio',
  'store_id': 3,
  'store_state': 'TX',
  'store_zipcode': '78201'},
 {'store_address': '516 S Flores St',
  'store_city': 'San Antonio',
  'store_id': 4,
  'store_state': 'TX',
  'store_zipcode': '78204'},
 {'store_address': '1520 Austin Hwy',
  'store_city': 'San Antonio',
  'store_id': 5,
  'store_state': 'TX',
  'store_zipcode': '78218'},
 {'store_address': '1015 S WW White Rd',
  'store_city': 'San Antonio',
  'store_id': 6,
  'store_state': 'TX',
  'store_zipcode': '78220'},
 {'store_address': '12018 Perrin Beitel Rd',
  'store_city': 'San Antonio',
  'store_id': 7,
  'store_state': 'TX',
  'store_zipcode': '78217'},
 {'store

In [23]:
#so I can now take the list of store dictionaries under this key and
#stick it in a df:
pd.DataFrame(data['payload']['stores'])


|  | store_address | store_city | store_id | store_state | store_zipcode |
|---|---|---|---|---|---|
| 0 | 12125 Alamo Ranch Pkwy | San Antonio | 1 | TX | 78253 |
| 1 | 9255 FM 471 West | San Antonio | 2 | TX | 78251 |
| 2 | 2118 Fredericksburg Rdj | San Antonio | 3 | TX | 78201 |
| 3 | 516 S Flores St | San Antonio | 4 | TX | 78204 |
| 4 | 1520 Austin Hwy | San Antonio | 5 | TX | 78218 |
| 5 | 1015 S WW White Rd | San Antonio | 6 | TX | 78220 |
| 6 | 12018 Perrin Beitel Rd | San Antonio | 7 | TX | 78217 |
| 7 | 15000 San Pedro Ave | San Antonio | 8 | TX | 78232 |
| 8 | 735 SW Military Dr | San Antonio | 9 | TX | 78221 |
| 9 | 8503 NW Military Hwy | San Antonio | 10 | TX | 78231 |
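
The conversion works because each dictionary in the list becomes one row and its keys become the column names. Here is a small offline sketch using the first two store records from the output above; the variable names and the `set_index` step are additions for illustration, not part of the original cell.

```python
import pandas as pd

# First two records copied from the stores payload shown earlier
stores = [
    {"store_address": "12125 Alamo Ranch Pkwy", "store_city": "San Antonio",
     "store_id": 1, "store_state": "TX", "store_zipcode": "78253"},
    {"store_address": "9255 FM 471 West", "store_city": "San Antonio",
     "store_id": 2, "store_state": "TX", "store_zipcode": "78251"},
]

# One row per dictionary, one column per key
df = pd.DataFrame(stores)
print(df.shape)
print(list(df.columns))

# Optionally use store_id as the index so rows can be looked up by id
by_id = df.set_index("store_id")
print(by_id.loc[2, "store_address"])
```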


In [24]:
#let's do the same with items:
url = 'https://api.data.codeup.com/api/v1/items'
response = requests.get(url)
data = response.json()
data.keys()

dict_keys(['payload', 'status'])

In [25]:
data['payload']

{'items': [{'item_brand': 'Riceland',
   'item_id': 1,
   'item_name': 'Riceland American Jazmine Rice',
   'item_price': 0.84,
   'item_upc12': '35200264013',
   'item_upc14': '35200264013'},
  {'item_brand': 'Caress',
   'item_id': 2,
   'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
   'item_price': 6.44,
   'item_upc12': '11111065925',
   'item_upc14': '11111065925'},
  {'item_brand': 'Earths Best',
   'item_id': 3,
   'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
   'item_price': 2.43,
   'item_upc12': '23923330139',
   'item_upc14': '23923330139'},
  {'item_brand': 'Boars Head',
   'item_id': 4,
   'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
   'item_price': 3.14,
   'item_upc12': '208528800007',
   'item_upc14': '208528800007'},
  {'item_brand': 'Back To Nature',
   'item_id': 5,
   'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
   'item_price': 2.61,
   'item_upc12': '759283100036',

In [26]:
#going back to find the items keys
data['payload'].keys()

dict_keys(['items', 'max_page', 'next_page', 'page', 'previous_page'])

In [27]:
#putting this as a tuple so I can look at them all together to see
#what is in each of these:
(
    data['payload']['page'], 
    data['payload']['max_page'], 
    data['payload']['next_page'],
    data['payload']['previous_page'],
)

(1, 3, '/api/v1/items?page=2', None)

Notes: the current page is 1 and the max page is 3. There is a path for the next page, and no previous page because we are on the first one.

In [28]:
#let's see if items makes a df well too:
pd.DataFrame(data['payload']['items'])


|  | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |
| 5 | Sally Hansen | 6 | Sally Hansen Nail Color Magnetic 903 Silver El... | 6.93 | 74170388732 | 74170388732 |
| 6 | Twinings Of London | 7 | Twinings Of London Classics Lady Grey Tea - 20 Ct | 9.64 | 70177154004 | 70177154004 |
| 7 | Lea & Perrins | 8 | Lea & Perrins Marinade In-a-bag Cracked Pepper... | 1.68 | 51600080015 | 51600080015 |
| 8 | Van De Kamps | 9 | Van De Kamps Fillets Beer Battered - 10 Ct | 1.79 | 19600923015 | 19600923015 |
| 9 | Ahold | 10 | Ahold Cocoa Almonds | 3.17 | 688267141676 | 688267141676 |


### NOTES: these DataFrames hold only page 1 of each endpoint (stores and items). Stores has a single page, but items has 3.

#### What can we do with multiple pages?

In [29]:
domain = 'https://api.data.codeup.com'
endpoint = '/api/v1/items'
#the list I want to store the items in:
items = []

url = domain + endpoint

response = requests.get(url)
data = response.json()
# .extend adds the elements of one list to another list
items.extend(data['payload']['items'])

In [30]:
data['payload']['next_page']


'/api/v1/items?page=2'
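
The `next_page` value suggests a loop: keep requesting the next path until it comes back as `None`. Below is a minimal sketch of that idea, not the notebook's own solution. The function `fetch_all` and its `fetch_json` parameter are hypothetical, and `FAKE_API` is invented stand-in data shaped like the payloads above so the loop can run without a network connection.

```python
def fetch_all(first_path, key, fetch_json):
    """Collect the records under `key` from every page, following next_page."""
    records = []
    path = first_path
    while path is not None:
        payload = fetch_json(path)["payload"]
        records.extend(payload[key])   # add this page's records
        path = payload["next_page"]    # None on the last page ends the loop
    return records


# Invented two-page stand-in for the real API responses
FAKE_API = {
    "/api/v1/items": {
        "payload": {"items": [{"item_id": 1}, {"item_id": 2}],
                    "next_page": "/api/v1/items?page=2"},
        "status": "ok",
    },
    "/api/v1/items?page=2": {
        "payload": {"items": [{"item_id": 3}], "next_page": None},
        "status": "ok",
    },
}

all_items = fetch_all("/api/v1/items", "items", FAKE_API.__getitem__)
print(len(all_items))
print([item["item_id"] for item in all_items])
```

Against the real API, `fetch_json` could be something like `lambda path: requests.get(domain + path).json()`, and the combined list can then go into `pd.DataFrame` as before.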