# Acquisition - Lesson

- JSON (JavaScript Object Notation): a lightweight text format for exchanging structured data

- RESTful: describes a server on the internet whose URLs follow a predictable pattern

- API (Application Programming Interface): a set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service.

- RESTful JSON API

    - it's a server on the internet and the URLs follow a certain pattern

    - the API speaks JSON

    - the API is something we can interact with programmatically
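
To make "the API speaks JSON" concrete, here is a minimal sketch using only Python's standard `json` module. The string is the body this API returns from its base URL (it appears in the exercise later in this lesson); parsing it yields an ordinary Python dictionary, which is what `response.json()` does for us.

```python
import json

# Body returned by the API's base URL, as printed later in this lesson.
body = '{"api":"/api/v1","help":"/documentation"}'

# json.loads turns JSON text into Python objects (here, a dict of strings).
parsed = json.loads(body)
print(parsed["api"])   # /api/v1
print(parsed["help"])  # /documentation

# json.dumps goes the other way: Python objects back to JSON text.
round_trip = json.dumps(parsed)
```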

In [76]:
import requests
import pandas as pd
from os import path

import warnings
warnings.filterwarnings('ignore')

import acquire

## Send Request, Receive Response

In [2]:
# We send requests, we receive response objects

response = requests.get('http://example.com')
response

# 200 is an HTTP status code
# 200 means 'OK': the request succeeded

<Response [200]>

### .ok

In [3]:
response.ok

True

### .status_code

In [4]:
response.status_code

200
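
HTTP status codes are grouped by their first digit, and `requests` treats any code below 400 as a success when computing `.ok`. The sketch below uses only the standard library's `http.HTTPStatus` to look up what a code means. `describe_status` is a hypothetical helper written for this illustration, not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return (phrase, ok) for a numeric HTTP status code."""
    phrase = HTTPStatus(code).phrase  # e.g. 200 -> 'OK'
    ok = code < 400                   # same cutoff that response.ok uses
    return phrase, ok


print(describe_status(200))  # ('OK', True)
print(describe_status(404))  # ('Not Found', False)
```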

### .text

In [5]:
response.text

'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <

# Acquisition - Exercise Example

## Using the code from the lesson as a guide, create a dataframe named items that has all of the data for items.

### Quick look at base url

In [6]:
base_url = 'https://python.zach.lol'
print(requests.get(base_url).text)

{"api":"/api/v1","help":"/documentation"}



### Make a request to take a look at the documentation

In [7]:
response = requests.get(base_url + '/documentation')
print(response.json()['payload'])


The API accepts GET requests for all endpoints, where endpoints are prefixed
with

    /api/{version}

Where version is "v1"

Valid endpoints:

- /stores[/{store_id}]
- /items[/{item_id}]
- /sales[/{sale_id}]

All endpoints accept a `page` parameter that can be used to navigate through
the results.



### .json()

- Take a look at the items

In [8]:
response = requests.get('https://python.zach.lol/api/v1/items')

data = response.json()

# print keys from dictionary
data.keys()

dict_keys(['payload', 'status'])

### .keys()

- take a look at the keys

In [9]:
data['payload'].keys()

dict_keys(['items', 'max_page', 'next_page', 'page', 'previous_page'])

### Max and Next pages

In [10]:
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

max_page: 3
next_page: /api/v1/items?page=2
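
`next_page` is a path, not a full URL, so it has to be combined with the base URL before it can be requested. A minimal sketch using the standard library's `urllib.parse`, with the values printed above:

```python
from urllib.parse import parse_qs, urljoin, urlsplit

base_url = 'https://python.zach.lol'
next_page = '/api/v1/items?page=2'

# Combine the base URL with the path reported by the API.
full_url = urljoin(base_url, next_page)
print(full_url)  # https://python.zach.lol/api/v1/items?page=2

# The page number lives in the query string.
query = parse_qs(urlsplit(full_url).query)
page_number = int(query['page'][0])
print(page_number)  # 2
```

Plain string concatenation (`base_url + next_page`) gives the same result here, because the base URL has no trailing slash and the path starts with one.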


### Data

- Peek inside the full response to see the context around the items


In [11]:
data

{'payload': {'items': [{'item_brand': 'Riceland',
    'item_id': 1,
    'item_name': 'Riceland American Jazmine Rice',
    'item_price': 0.84,
    'item_upc12': '35200264013',
    'item_upc14': '35200264013'},
   {'item_brand': 'Caress',
    'item_id': 2,
    'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
    'item_price': 6.44,
    'item_upc12': '11111065925',
    'item_upc14': '11111065925'},
   {'item_brand': 'Earths Best',
    'item_id': 3,
    'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
    'item_price': 2.43,
    'item_upc12': '23923330139',
    'item_upc14': '23923330139'},
   {'item_brand': 'Boars Head',
    'item_id': 4,
    'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
    'item_price': 3.14,
    'item_upc12': '208528800007',
    'item_upc14': '208528800007'},
   {'item_brand': 'Back To Nature',
    'item_id': 5,
    'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
    'item_price':

In [12]:
# The response payload is a wrapper around the list stored under 'items'

data['payload']['items'][:2]

[{'item_brand': 'Riceland',
  'item_id': 1,
  'item_name': 'Riceland American Jazmine Rice',
  'item_price': 0.84,
  'item_upc12': '35200264013',
  'item_upc14': '35200264013'},
 {'item_brand': 'Caress',
  'item_id': 2,
  'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
  'item_price': 6.44,
  'item_upc12': '11111065925',
  'item_upc14': '11111065925'}]

### DataFrame

- Turn data into a pandas dataframe

- data from 1st page

In [13]:
items = pd.DataFrame(data['payload']['items'])
items.head()

| | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |


### Next_page

- get and add data from 2nd page into our df

In [14]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# Note: reset_index() keeps the old index as a new 'index' column
items = pd.concat([items, pd.DataFrame(data['payload']['items'])]).reset_index()

max_page: 3
next_page: /api/v1/items?page=3


In [15]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# Note: 'index' already exists, so this reset_index() adds a 'level_0' column
items = pd.concat([items, pd.DataFrame(data['payload']['items'])]).reset_index()

max_page: 3
next_page: None


In [16]:
items.shape

(50, 8)
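
The shape above has 8 columns even though each item has only 6 fields. The two extra columns come from calling `reset_index()` after each `pd.concat`: the first call stores the old index in a column named `index`, and the second, finding that name taken, adds `level_0`. The sketch below reproduces this on tiny made-up frames and shows `ignore_index=True` as one alternative. The frame contents are illustrative, not API data.

```python
import pandas as pd

# Tiny stand-ins for the three pages of items (illustrative values only).
page1 = pd.DataFrame({'item_id': [1, 2]})
page2 = pd.DataFrame({'item_id': [3, 4]})
page3 = pd.DataFrame({'item_id': [5]})

# Pattern used in the lesson: reset_index() keeps the old index as a column.
messy = pd.concat([page1, page2]).reset_index()
messy = pd.concat([messy, page3]).reset_index()
print(list(messy.columns))  # ['level_0', 'index', 'item_id']

# Alternative: let concat renumber the rows and discard the old index.
clean = pd.concat([page1, page2, page3], ignore_index=True)
print(list(clean.columns))  # ['item_id']
print(clean.shape)          # (5, 1)
```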

## Do the same thing, but for stores.

### .json()

- Take a look at the stores

In [18]:
response = requests.get('https://python.zach.lol/api/v1/stores')
data = response.json()

# print keys from dictionary

data.keys()

dict_keys(['payload', 'status'])

### .keys()

- take a look at the keys

In [19]:
data['payload'].keys()

dict_keys(['max_page', 'next_page', 'page', 'previous_page', 'stores'])

### Max and Next pages

In [20]:
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

max_page: 1
next_page: None


### DataFrame

- Turn data into a pandas dataframe

- data from 1st page

- No need to repeat the process because there is no next page

In [21]:
stores = pd.DataFrame(data['payload']['stores'])
stores.head()

| | store_address | store_city | store_id | store_state | store_zipcode |
|---|---|---|---|---|---|
| 0 | 12125 Alamo Ranch Pkwy | San Antonio | 1 | TX | 78253 |
| 1 | 9255 FM 471 West | San Antonio | 2 | TX | 78251 |
| 2 | 2118 Fredericksburg Rdj | San Antonio | 3 | TX | 78201 |
| 3 | 516 S Flores St | San Antonio | 4 | TX | 78204 |
| 4 | 1520 Austin Hwy | San Antonio | 5 | TX | 78218 |


In [22]:
stores.shape

(10, 5)

### Extract the data for sales. There are a lot of pages of data here, so your code will need to be a little more complex. Your code should continue fetching data from the next page until all of the data is extracted.

In [25]:
response = requests.get('https://python.zach.lol/api/v1/sales')

data = response.json()

# print keys from dictionary
data.keys()

dict_keys(['payload', 'status'])

### .keys()

In [26]:
data['payload'].keys()

dict_keys(['max_page', 'next_page', 'page', 'previous_page', 'sales'])

### Max and Next page

In [27]:
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

max_page: 183
next_page: /api/v1/sales?page=2


In [28]:
sales = pd.DataFrame(data['payload']['sales'])

In [29]:
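# Page 1 is already loaded, so request the remaining max_page - 1 pages.
# range() is evaluated once, so reassigning `data` inside the loop is safe.
for _ in range(1, data['payload']['max_page']):
    # Follow the link to the next page reported by the current response
    response = requests.get(base_url + data['payload']['next_page'])
    data = response.json()
    sales = pd.concat([sales, pd.DataFrame(data['payload']['sales'])])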
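
The loop above relies on `max_page` to know how many requests to make. An alternative is to keep following `next_page` until the API reports `None`, which is how the last page is marked in the items output earlier in this lesson. The sketch below assumes the payload shape shown above (a `payload` holding `next_page` and a list under the endpoint's name). `fetch_all`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and `fake_fetch` stands in for a real HTTP call so the example runs offline.

```python
def fetch_all(fetch_page, first_path, key):
    """Collect every record under `key`, following next_page links.

    fetch_page: callable that takes a path and returns the parsed JSON dict.
    """
    records = []
    path = first_path
    while path is not None:
        payload = fetch_page(path)['payload']
        records.extend(payload[key])
        path = payload['next_page']  # None on the last page
    return records


# A stand-in for the API with two pages, so the loop can be run offline.
FAKE_PAGES = {
    '/api/v1/sales': {'payload': {'sales': [{'sale_id': 1}, {'sale_id': 2}],
                                  'next_page': '/api/v1/sales?page=2'}},
    '/api/v1/sales?page=2': {'payload': {'sales': [{'sale_id': 3}],
                                         'next_page': None}},
}


def fake_fetch(path):
    return FAKE_PAGES[path]


all_sales = fetch_all(fake_fetch, '/api/v1/sales', 'sales')
print(len(all_sales))  # 3
```

Against the real API, `fetch_page` could be something like `lambda p: requests.get(base_url + p).json()`. The combined list can then be passed to `pd.DataFrame` once at the end, which avoids re-copying a growing dataframe on every `pd.concat`.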


In [30]:
sales.shape

(913000, 5)

### Save the data in your files to local csv files so that it will be faster to access in the future.



In [31]:
sales.to_csv('sales.csv')

In [32]:
items.to_csv('items.csv')

In [33]:
stores.to_csv('stores.csv')

In [49]:
sales.head()

| | item_id | sale_amount | sale_date | sale_id | store_id |
|---|---|---|---|---|---|
| 0 | 1 | 13.0 | Tue, 01 Jan 2013 00:00:00 GMT | 1 | 1 |
| 1 | 1 | 11.0 | Wed, 02 Jan 2013 00:00:00 GMT | 2 | 1 |
| 2 | 1 | 14.0 | Thu, 03 Jan 2013 00:00:00 GMT | 3 | 1 |
| 3 | 1 | 13.0 | Fri, 04 Jan 2013 00:00:00 GMT | 4 | 1 |
| 4 | 1 | 10.0 | Sat, 05 Jan 2013 00:00:00 GMT | 5 | 1 |


In [50]:
sales.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 913000 entries, 0 to 2999
Data columns (total 5 columns):
item_id        913000 non-null int64
sale_amount    913000 non-null float64
sale_date      913000 non-null object
sale_id        913000 non-null int64
store_id       913000 non-null int64
dtypes: float64(1), int64(3), object(1)
memory usage: 41.8+ MB


In [51]:
sales.describe()

| | item_id | sale_amount | sale_id | store_id |
|---|---|---|---|---|
| count | 913000.0 | 913000.0 | 913000.0 | 913000.0 |
| mean | 25.5 | 52.250287 | 456500.5 | 5.5 |
| std | 14.430878 | 28.801144 | 263560.542224 | 2.872283 |
| min | 1.0 | 0.0 | 1.0 | 1.0 |
| 25% | 13.0 | 30.0 | 228250.75 | 3.0 |
| 50% | 25.5 | 47.0 | 456500.5 | 5.5 |
| 75% | 38.0 | 70.0 | 684750.25 | 8.0 |
| max | 50.0 | 231.0 | 913000.0 | 10.0 |


In [52]:
# this shows that each value in this column is unique and could identify
# the rows in the table

sales.sale_id.nunique()

913000

### Combine the data from your three separate dataframes into one large dataframe.

In [53]:
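# No-op for this API: the columns are already named item_id and store_id.
# The rename only matters if they arrive as 'item' and 'store'.
sales = sales.rename(columns={'item': 'item_id', 'store': 'store_id'})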


In [55]:
df = sales.merge(items, on='item_id').merge(stores, on='store_id')

In [56]:
# Drop the leftover index columns that reset_index() added to items
df = df.drop(columns=['level_0', 'index'])

In [57]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 913000 entries, 0 to 912999
Data columns (total 14 columns):
item_id          913000 non-null int64
sale_amount      913000 non-null float64
sale_date        913000 non-null object
sale_id          913000 non-null int64
store_id         913000 non-null int64
item_brand       913000 non-null object
item_name        913000 non-null object
item_price       913000 non-null float64
item_upc12       913000 non-null object
item_upc14       913000 non-null object
store_address    913000 non-null object
store_city       913000 non-null object
store_state      913000 non-null object
store_zipcode    913000 non-null object
dtypes: float64(2), int64(3), object(9)
memory usage: 104.5+ MB
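
Each sale refers to exactly one item and one store, so both merges are many-to-one joins on the shared key column. The sketch below runs the same chained merge on made-up rows, using the `validate` argument of `DataFrame.merge` so that pandas raises an error if a key is duplicated on the lookup side. All names prefixed with `demo_` and all values are illustrative.

```python
import pandas as pd

demo_sales = pd.DataFrame({'sale_id': [1, 2, 3],
                           'item_id': [1, 1, 2],
                           'store_id': [1, 2, 1]})
demo_items = pd.DataFrame({'item_id': [1, 2],
                           'item_name': ['rice', 'soap']})
demo_stores = pd.DataFrame({'store_id': [1, 2],
                            'store_city': ['San Antonio', 'San Antonio']})

# many_to_one: many sales rows may share a key, but each key must appear
# only once in the items / stores tables, otherwise pandas raises MergeError.
demo_df = (demo_sales
           .merge(demo_items, on='item_id', validate='many_to_one')
           .merge(demo_stores, on='store_id', validate='many_to_one'))

print(demo_df.shape)  # (3, 5)
```

The row count staying equal to the number of sales is a quick sanity check that the merges neither dropped nor duplicated any sale.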


### Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv

In [75]:
solar = pd.read_csv('https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv')

In [79]:
solar.head()

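| | Date | Consumption | Wind | Solar | Wind+Solar |
|---|---|---|---|---|---|
| 0 | 2006-01-01 | 1069.184 | NaN | NaN | NaN |
| 1 | 2006-01-02 | 1380.521 | NaN | NaN | NaN |
| 2 | 2006-01-03 | 1442.533 | NaN | NaN | NaN |
| 3 | 2006-01-04 | 1457.217 | NaN | NaN | NaN |
| 4 | 2006-01-05 | 1477.131 | NaN | NaN | NaN |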


### Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions in the acquire.py file and be able to re-run the functions and get the same data.

In [83]:
df = acquire.get_opsd_data()
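
The `acquire` module itself is not shown in this lesson. As a rough idea of what a function like `acquire.get_opsd_data()` might look like, here is a minimal sketch that caches the CSV locally so later runs skip the download. The `cache_file` and `url` parameters, the default file name, and the `OPSD_URL` constant are assumptions made for this illustration, not the actual module's interface.

```python
from os import path

import pandas as pd

OPSD_URL = ('https://raw.githubusercontent.com/jenfly/opsd/'
            'master/opsd_germany_daily.csv')


def get_opsd_data(cache_file='opsd_germany_daily.csv', url=OPSD_URL):
    """Return the OPSD Germany data, downloading it only if not cached."""
    if path.exists(cache_file):
        return pd.read_csv(cache_file)
    df = pd.read_csv(url)
    # index=False keeps the row index out of the file, so reading it back
    # does not produce an extra 'Unnamed: 0' column.
    df.to_csv(cache_file, index=False)
    return df
```

The same pattern could be applied to the items, stores, and sales helpers, with the paginated requests taking the place of `pd.read_csv(url)`.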

In [88]:
df['Date'] = pd.to_datetime(df['Date'])

In [89]:
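# Resample to daily frequency; note that sum() reports days whose values
# are all missing (NaN) as 0
df.set_index('Date').resample('D').sum()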

| Date | Consumption | Wind | Solar | Wind+Solar |
|---|---|---|---|---|
| 2006-01-01 | 1069.18400 | 0.000 | 0.000 | 0.000 |
| 2006-01-02 | 1380.52100 | 0.000 | 0.000 | 0.000 |
| 2006-01-03 | 1442.53300 | 0.000 | 0.000 | 0.000 |
| 2006-01-04 | 1457.21700 | 0.000 | 0.000 | 0.000 |
| 2006-01-05 | 1477.13100 | 0.000 | 0.000 | 0.000 |
| 2006-01-06 | 1403.42700 | 0.000 | 0.000 | 0.000 |
| 2006-01-07 | 1300.28700 | 0.000 | 0.000 | 0.000 |
| 2006-01-08 | 1207.98500 | 0.000 | 0.000 | 0.000 |
| 2006-01-09 | 1529.32300 | 0.000 | 0.000 | 0.000 |
| 2006-01-10 | 1576.91100 | 0.000 | 0.000 | 0.000 |
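
In the output above, Wind and Solar show 0.000 for dates where the raw file had no value. That happens because `sum()` treats a group with no valid values as 0 by default. A minimal sketch on a two-row made-up frame, showing `min_count=1` as one way to keep such days as missing:

```python
import pandas as pd

nan = float('nan')
demo = pd.DataFrame(
    {'Consumption': [1069.184, 1380.521], 'Wind': [nan, nan]},
    index=pd.to_datetime(['2006-01-01', '2006-01-02']),
)

# Default behavior: a day with only missing values sums to 0.
default_sum = demo.resample('D').sum()
print(default_sum['Wind'].tolist())  # [0.0, 0.0]

# min_count=1 requires at least one non-missing value per group;
# otherwise the result for that group is NaN.
kept_nan = demo.resample('D').sum(min_count=1)
print(kept_nan['Wind'].isna().all())  # True
```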
