USA CPI Data from 2007 to 2019:

Extract: 

The US Average Price Data (Consumer Price Index, CPI) from 2007 to 2019 was extracted from the United States Bureau of Labor Statistics site: https://www.bls.gov/cpi/data.htm.

The data was saved as a CSV file and then loaded into the Jupyter notebook.

In [1]:
# https://beta.bls.gov/dataViewer/view;jsessionid=CABC201CD643E5C31B56321479FD40D8

In [2]:
# import dependencies
import pandas as pd
from sqlalchemy import create_engine

In [3]:
# The path to our CSV file
CPIfile = "Resources/CPI_2007_2019.csv"

# Read our CPI data into pandas
df = pd.read_csv(CPIfile)
df.head()

Unnamed: 0,Series ID,Year,Period,Value
0,CUUR0000SA0,2007,M01,202.416
1,CUUR0000SA0,2007,M02,203.499
2,CUUR0000SA0,2007,M03,205.352
3,CUUR0000SA0,2007,M04,206.686
4,CUUR0000SA0,2007,M05,207.949


Transform: 
The first step was to format the dates. To get the month and year, I used pandas.date_range to generate one month-start date per row, in order, beginning with January 2007. Each date was then formatted as MMM YYYY (for example, Jan 2007).

Second, I renamed the columns and then created a new dataframe containing only the Date and CPI columns.

In [4]:
# Format Dates
df['date'] = pd.date_range(start='1/1/2007', periods=len(df), freq='MS')
df['date'] = df["date"].dt.strftime('%b %Y')
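
The cell above assigns dates by row position, so it relies on the CSV holding exactly one row per month, sorted, with no gaps and no annual-average rows (BLS period code M13). As a hedged alternative, the label can be derived from each row's own Year and Period values. The sketch below uses only the standard library; the helper name period_to_label is hypothetical and not part of the original notebook.

```python
from datetime import date


def period_to_label(year, period):
    """Convert a BLS year and period code (e.g. 2007, 'M01') to 'Jan 2007'.

    Returns None for anything that is not a monthly code M01..M12,
    such as 'M13' (annual average).
    """
    if not period.startswith("M"):
        return None
    month = int(period[1:])
    if not 1 <= month <= 12:
        return None
    # %b gives the abbreviated month name in the current locale
    return date(int(year), month, 1).strftime("%b %Y")


# Sample rows shaped like the first records shown by df.head()
rows = [(2007, "M01"), (2007, "M02"), (2007, "M03")]
labels = [period_to_label(y, p) for y, p in rows]
print(labels)
```

In the notebook this could be applied as df['date'] = [period_to_label(y, p) for y, p in zip(df['Year'], df['Period'])], after which any rows labelled None could be dropped.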

In [5]:
df.head()

Unnamed: 0,Series ID,Year,Period,Value,date
0,CUUR0000SA0,2007,M01,202.416,Jan 2007
1,CUUR0000SA0,2007,M02,203.499,Feb 2007
2,CUUR0000SA0,2007,M03,205.352,Mar 2007
3,CUUR0000SA0,2007,M04,206.686,Apr 2007
4,CUUR0000SA0,2007,M05,207.949,May 2007


In [6]:
# Check datatypes
print(df.dtypes)

Series ID     object
Year           int64
Period        object
Value        float64
date          object
dtype: object


In [7]:
# Rename Columns
df.rename(columns = {'date':'Date', 'Value':'CPI'}, inplace = True) 
df.head()

Unnamed: 0,Series ID,Year,Period,CPI,Date
0,CUUR0000SA0,2007,M01,202.416,Jan 2007
1,CUUR0000SA0,2007,M02,203.499,Feb 2007
2,CUUR0000SA0,2007,M03,205.352,Mar 2007
3,CUUR0000SA0,2007,M04,206.686,Apr 2007
4,CUUR0000SA0,2007,M05,207.949,May 2007


In [8]:
# Create new dataframe with only the required columns
CPI = df[['Date', 'CPI']].copy()
CPI.head()

Unnamed: 0,Date,CPI
0,Jan 2007,202.416
1,Feb 2007,203.499
2,Mar 2007,205.352
3,Apr 2007,206.686
4,May 2007,207.949


Loading: 

A connection was made to the border_db database in Postgres. A table called border_entry was created (see border_entry_schema.sql for completed schema). Using pandas, the dataframe was appended to the cpi table in border_db.

In [9]:
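# Connect to local database (format: user:password@host:port/database;
# the password after "postgres:" is left blank here, so supply your own)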
connection_string = "postgres: @localhost:5432/border_db"
engine = create_engine(f'postgresql://{connection_string}')
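
One way to avoid leaving the password in the notebook is to read it from an environment variable and percent-encode it before building the connection string. This is only a sketch: the variable name BORDER_DB_PASSWORD and the helper build_connection_string are assumptions for illustration, not part of the original setup.

```python
import os
from urllib.parse import quote


def build_connection_string(user, password, host="localhost",
                            port=5432, database="border_db"):
    """Return 'user:password@host:port/database' with the password escaped.

    Percent-encoding keeps characters such as '@', ':' or '/' in the
    password from being mistaken for URL delimiters.
    """
    return f"{user}:{quote(password, safe='')}@{host}:{port}/{database}"


# BORDER_DB_PASSWORD is a hypothetical variable name; set it in the shell
# before starting Jupyter. An empty default keeps this cell from raising.
password = os.environ.get("BORDER_DB_PASSWORD", "")
connection_string = build_connection_string("postgres", password)

# The result drops into the existing call unchanged:
# engine = create_engine(f'postgresql://{connection_string}')
```

If the variable is unset, the password is empty and the server will still reject the login, so it is worth checking the value before creating the engine.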

In [10]:
# Use pandas to load Dataframe into the database
CPI.to_sql(name='cpi', con=engine, if_exists='append', index=False)

OperationalError: (psycopg2.OperationalError) FATAL:  password authentication failed for user "postgres"

(Background on this error at: http://sqlalche.me/e/e3q8)

In [None]:
# Readback data
pd.read_sql_query('select * from cpi', con=engine).head()