The end result of this exercise should be a file named acquire.py.

1. Using the code from the lesson as a guide, acquire the data from the REST API at https://python.zgulde.net/api/v1/items and create a dataframe named items that holds all of the item data.

2. Do the same thing, but for stores (https://python.zgulde.net/api/v1/stores).

3. Extract the data for sales (https://python.zgulde.net/api/v1/sales). There are a lot of pages of data here, so your code will need to be a little more complex. Your code should continue fetching data from the next page until all of the data is extracted.

4. Save the data in your dataframes to local CSV files so that it is faster to access in the future.

5. Combine the data from your three separate dataframes into one large dataframe.

6. Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv

7. Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions in the acquire.py file and be able to re-run the functions and get the same data.
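Tasks 1 through 3 share one pattern: request a page, keep its records, and follow `next_page` until it is `None`. Below is a minimal sketch of a reusable helper for acquire.py. It assumes that every endpoint returns the `payload` shape shown in the cells below and stores its records under the resource name (`items`, `stores`, `sales`). `fetch_all_pages` and its `get_json` parameter are hypothetical names. `get_json` exists only so the network call can be replaced, for example in a test.

```python
from typing import Callable, Optional


def _requests_get_json(url: str) -> dict:
    # Imported lazily so the helper can be exercised without the network
    import requests

    response = requests.get(url)
    response.raise_for_status()  # stop early on HTTP errors
    return response.json()


def fetch_all_pages(
    domain: str,
    endpoint: str,
    key: str,
    get_json: Optional[Callable[[str], dict]] = None,
) -> list:
    """Collect every record under payload[key], following next_page until it is None.

    Assumption: each page's payload holds its records under `key` and a
    'next_page' path relative to `domain` (None on the last page).
    """
    get_json = get_json or _requests_get_json
    records = []
    while endpoint is not None:
        data = get_json(domain + endpoint)
        records.extend(data['payload'][key])
        # move on to the next page; None ends the loop
        endpoint = data['payload']['next_page']
    return records
```

For example, `pd.DataFrame(fetch_all_pages('https://python.zgulde.net', '/api/v1/sales', 'sales'))` would walk every sales page. Because the loop follows `next_page` rather than computing page numbers from `max_page`, it stops exactly when the API reports that no pages are left.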

In [25]:
import pandas as pd
import requests
import os

In [3]:
domain = 'https://python.zgulde.net'
endpoint = '/api/v1/items'
items = []

url = domain + endpoint

response = requests.get(url)
response

<Response [200]>

In [4]:
data = response.json()

In [5]:
data

{'payload': {'items': [{'item_brand': 'Riceland',
    'item_id': 1,
    'item_name': 'Riceland American Jazmine Rice',
    'item_price': 0.84,
    'item_upc12': '35200264013',
    'item_upc14': '35200264013'},
   {'item_brand': 'Caress',
    'item_id': 2,
    'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
    'item_price': 6.44,
    'item_upc12': '11111065925',
    'item_upc14': '11111065925'},
   {'item_brand': 'Earths Best',
    'item_id': 3,
    'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
    'item_price': 2.43,
    'item_upc12': '23923330139',
    'item_upc14': '23923330139'},
   {'item_brand': 'Boars Head',
    'item_id': 4,
    'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
    'item_price': 3.14,
    'item_upc12': '208528800007',
    'item_upc14': '208528800007'},
   {'item_brand': 'Back To Nature',
    'item_id': 5,
    'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
    'item_price':

In [6]:
page_n = data['payload']['max_page']

In [7]:
page_n

3

In [8]:
type(data)

dict

In [9]:
data['status']

'ok'

In [11]:
data['payload'].keys()

dict_keys(['items', 'max_page', 'next_page', 'page', 'previous_page'])
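The cells above show that each response is a dict with a `status` of `'ok'` and a `payload` that holds the records plus the pagination fields. The following small sketch unpacks one response and fails loudly otherwise. It assumes that any status other than `'ok'` means the response should not be trusted, because the API's actual error values are not shown here. `split_page` is a hypothetical helper name.

```python
def split_page(data: dict, key: str):
    """Return (records, next_page) from one decoded API response.

    Assumption: a successful response has data['status'] == 'ok' and a
    'payload' dict with the records under `key` plus 'next_page'.
    """
    if data.get('status') != 'ok':
        raise ValueError(f"unexpected status: {data.get('status')!r}")
    payload = data['payload']
    return payload[key], payload['next_page']
```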

In [14]:
data_df = pd.DataFrame(data['payload']['items'])

In [22]:
data_df.shape

(20, 6)

In [16]:
url = domain + data['payload']['next_page']
print('next url:', url)

next url: https://python.zgulde.net/api/v1/items?page=2
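Because `next_page` is a path relative to the domain, as the printed URL shows, `urllib.parse.urljoin` from the standard library is one alternative to string concatenation. It also avoids a double slash if the domain is ever written with a trailing `/`. Here is a small sketch:

```python
from urllib.parse import urljoin

domain = 'https://python.zgulde.net'
next_page = '/api/v1/items?page=2'  # the value returned in payload['next_page']

# An absolute path ('/...') replaces any path on the base URL, so a
# trailing slash on the domain does not produce '//api/...'
url = urljoin(domain, next_page)
url_with_slash = urljoin(domain + '/', next_page)
print(url)  # https://python.zgulde.net/api/v1/items?page=2
```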


In [17]:
response = requests.get(url)
data = response.json()
items.extend(data['payload']['items'])

In [18]:
print('next endpoint', data['payload']['next_page'])
url = domain + data['payload']['next_page']
print('next url:', url)

next endpoint /api/v1/items?page=3
next url: https://python.zgulde.net/api/v1/items?page=3


In [23]:
items_df = pd.DataFrame(items)
items_df.shape
# same shape as data_df, but these are page 2's items: page 1's items were never added to the list

(20, 6)

In [24]:
# setup
domain = 'https://api.data.codeup.com'
endpoint = '/api/v1/items'
items = []

# fetch each page until next_page is None
while endpoint is not None:
    url = domain + endpoint
    response = requests.get(url)
    data = response.json()
    items.extend(data['payload']['items'])
    # update the endpoint to the next page (None after the last page)
    endpoint = data['payload']['next_page']

In [46]:
def get_items_df(cached=False):
    '''
    This function requests the items data from the REST API at https://api.data.codeup.com/api/v1/items,
    collects every page into a pandas dataframe, and caches it as items.csv.
    If cached is True and items.csv already exists, the dataframe is read from that file instead.
    '''
    # Unless cached is True and the file exists, read from the API into a dataframe and then cache it as a CSV.
    if not cached or not os.path.isfile('items.csv'):

        # create the empty list that will collect the items from every page
        items = []

        # define the url where the data is stored
        domain = 'https://api.data.codeup.com'
        endpoint = '/api/v1/items'
        url = domain + endpoint

        # request the first page only to learn how many pages there are
        response = requests.get(url)
        data = response.json()

        # define the number of pages based on the max_page value
        n = data['payload']['max_page']

        # iterate through each page, starting with page 1 and ending on page n
        # (range stops before n + 1, so the last page is included)
        # p is the page number
        for p in range(1, n + 1):

            # request page p; params builds the ?page=p query string
            response = requests.get(url, params={'page': p})

            # convert response to json
            data = response.json()

            # add the items from this page to the items list
            items.extend(data['payload']['items'])

        # create a dataframe from the items list, which now holds the items from all pages
        items_df = pd.DataFrame(items)

        # cache the data read from the REST API under the same file name the else branch reads
        items_df.to_csv('items.csv')

    else:
        # cached is True and items.csv exists on disk, so read that instead of calling the API
        items_df = pd.read_csv('items.csv', index_col=0)

    return items_df
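The `cached` branch in `get_items_df` is a read-or-fetch pattern that stores and sales will also need for task 4. The following hedged sketch shows a generic version. `read_or_fetch_csv` is a hypothetical name, and `fetch` is any zero-argument function that returns a dataframe from the API. Unlike `get_items_df`, this sketch defaults to `cached=True`, so a second call reads from disk.

```python
import os
from typing import Callable

import pandas as pd


def read_or_fetch_csv(path: str, fetch: Callable[[], pd.DataFrame], cached: bool = True) -> pd.DataFrame:
    """Return the dataframe cached at `path`, or build it with fetch() and cache it.

    With cached=False the existing file is ignored and overwritten.
    """
    if cached and os.path.isfile(path):
        return pd.read_csv(path, index_col=0)
    df = fetch()
    df.to_csv(path)  # writes the index, which index_col=0 reads back
    return df
```

Because the path is passed once, the file that is written is always the file that is read back. For example, `read_or_fetch_csv('stores.csv', get_stores_from_api)` would work with a hypothetical `get_stores_from_api` function.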

In [47]:
items_df = get_items_df()

In [48]:
items_df.shape

(50, 6)
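Task 5 asks for the three dataframes to be combined into one. The following minimal sketch uses two left merges. It assumes, although the cells here do not confirm it, that the sales data has `item` and `store` columns whose ids match `items['item_id']` and `stores['store_id']`. `combine_sales` is a hypothetical name. With `validate='many_to_one'`, pandas raises an error if an item or store id appears more than once on the right-hand side, because a repeated id would otherwise silently duplicate sales rows.

```python
import pandas as pd


def combine_sales(sales: pd.DataFrame, items: pd.DataFrame, stores: pd.DataFrame) -> pd.DataFrame:
    """Attach item and store details to each sale.

    Assumption: sales has 'item' and 'store' id columns matching
    items['item_id'] and stores['store_id']; rename the keys if they differ.
    """
    # left merges keep every sale, even one whose item or store is missing
    combined = sales.merge(items, how='left', left_on='item', right_on='item_id',
                           validate='many_to_one')
    combined = combined.merge(stores, how='left', left_on='store', right_on='store_id',
                              validate='many_to_one')
    return combined
```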

In [None]:
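# unfinished cell: loop over every page number, 1 through page_n inclusive
for page in range(1, page_n + 1):
    pass  # loop body not written yet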
    
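For task 6, `pd.read_csv` can read the OPSD file directly from its URL. The sketch below also caches the file locally, following the same approach as task 4. `get_opsd_germany` is a hypothetical name, and the `Date` column name is an assumption about the file's header. The `source` parameter also accepts a local path, which makes the function testable offline.

```python
import os

import pandas as pd

OPSD_URL = 'https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'


def get_opsd_germany(source: str = OPSD_URL, path: str = 'opsd_germany_daily.csv',
                     cached: bool = True) -> pd.DataFrame:
    """Read the OPSD Germany daily data, caching it to a local CSV.

    `source` may be a URL or a local file path; pd.read_csv accepts either.
    Assumption: the file has a 'Date' column, which is parsed as datetimes.
    """
    if cached and os.path.isfile(path):
        return pd.read_csv(path, parse_dates=['Date'])
    df = pd.read_csv(source, parse_dates=['Date'])
    df.to_csv(path, index=False)  # no index column, so the cached file mirrors the source
    return df
```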