# Energy data from CSV into PostgreSQL
The goal of this project is to load a CSV file of Austin energy data into a PostgreSQL database. We'll go through the following steps:
1. Understand the data from the CSV file
2. Create a SQL table that fits the data
3. Import a single value into the database
4. Import all the rows


# Step 1: Understanding the data from the CSV file

In [42]:
import psycopg2
import csv
with open('Residential_Average_Monthly_kWh_and_Bills.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    first_row = next(reader)
    second_row = next(reader)
    print(columns)
    print(first_row)
    print(second_row)

['Date', 'Average kWh', 'Fuel Charge (Cents/kWh)', 'Average Bill']
['01/01/2000 12:00:00 AM', '820', '1.372', '$54.26']
['02/01/2000 12:00:00 AM', '766', '1.372', '$50.27']


So we have four columns: the date comes first, followed by the average energy consumption in kWh, the fuel charge in cents per kWh, and the average bill in dollars.
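
The raw values are all strings, so each one needs a conversion before it matches a typed column. Here is a minimal sketch of that conversion, assuming every row has the same shape as the two printed above. `parse_row` is a hypothetical helper used for illustration; it is not part of the import code in this notebook.

```python
from datetime import datetime

def parse_row(row):
    """Convert one raw CSV row (a list of four strings) into typed Python values."""
    raw_date, raw_kwh, raw_fuel_charge, raw_bill = row
    # assumption: dates look like '01/01/2000 12:00:00 AM'; only the date part is kept
    day = datetime.strptime(raw_date, '%m/%d/%Y %I:%M:%S %p').date()
    # the bill carries a leading dollar sign that has to go before float() can parse it
    return day, int(raw_kwh), float(raw_fuel_charge), float(raw_bill.lstrip('$'))

print(parse_row(['01/01/2000 12:00:00 AM', '820', '1.372', '$54.26']))
# prints: (datetime.date(2000, 1, 1), 820, 1.372, 54.26)
```

The resulting types line up with the `date`, `integer`, and `real` columns used in the next step.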

# Step 2: Create a PostgreSQL table for the data

In [43]:
sql_create_energy_table = """CREATE TABLE IF NOT EXISTS energy(
                                ID SERIAL PRIMARY KEY,
                                date date,
                                kWh integer,
                                Fuel_Charge real,
                                average_bill real
                                )"""   

# create a connection with the database (this should already exist)
try:
    connection = psycopg2.connect("dbname='austin_weather_energy' user='muriel' host='localhost' password='1'")
    print("connected to austin_weather_energy")
except psycopg2.Error:
    print("Unable to connect to the database")
    raise  # without a connection, nothing below can run
# the cursor lets us execute SQL
cursor = connection.cursor()

# delete the table if it already exists
sql = """DROP TABLE IF EXISTS energy"""
cursor.execute(sql)
connection.commit()

# now let's create the table
cursor.execute(sql_create_energy_table)
# and commit to the DB
connection.commit()

# next, let's print the column names to see if it worked:


def print_values():
    cursor.execute("SELECT * from energy")
    colnames = [desc[0] for desc in cursor.description]
    print("Columns in database:")
    print(colnames)
    rows = cursor.fetchall()
    print("Values in database:")
    for row in rows[0:4]:  # only print the first 4 rows to avoid clutter
        print(" ", row)
    connection.commit()

# print the first 4 values     
print_values()

connected to austin_weather_energy
Columns in database:
['id', 'date', 'kwh', 'fuel_charge', 'average_bill']
Values in database:


# Step 3: Import a single value into the Database
Let's start with a single value: make sure we can add it to the database, then proceed to the rest.

In [44]:
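# now let's try to add a single value to the database
print(first_row[0])
# let psycopg2 fill in the %s placeholder instead of formatting the string ourselves
sql = """INSERT INTO energy(date) VALUES (%s)"""
# mogrify returns the query exactly as it will be sent to the database
print(cursor.mogrify(sql, (first_row[0],)).decode())
cursor.execute(sql, (first_row[0],))
connection.commit()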
print_values()

01/01/2000 12:00:00 AM
INSERT INTO energy(date) VALUES ('01/01/2000 12:00:00 AM')
Columns in database:
['id', 'date', 'kwh', 'fuel_charge', 'average_bill']
Values in database:
  (1, datetime.date(2000, 1, 1), None, None, None)
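
When inserting values, psycopg2 can fill `%s` placeholders itself, which takes care of the quoting. Below is a minimal sketch of wrapping that in a function. `insert_date` and the `RecordingCursor` stand-in are hypothetical names used only so the example runs without a database; with a real psycopg2 cursor the same `execute(sql, params)` call applies.

```python
def insert_date(cursor, date_text):
    """Insert one date into the energy table, letting the driver quote the value."""
    sql = "INSERT INTO energy(date) VALUES (%s)"
    # parameters are passed separately as a tuple, never formatted into the string
    cursor.execute(sql, (date_text,))

class RecordingCursor:
    """Stand-in for a database cursor that only records what it was asked to run."""
    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))

cursor_stub = RecordingCursor()
insert_date(cursor_stub, '01/01/2000 12:00:00 AM')
print(cursor_stub.calls)
```

Note the trailing comma in `(date_text,)`: the parameters must be a sequence, even when there is only one.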


# Step 4: Import all rows
Now we're ready to import all the rows. First let's clean up the table we created earlier and create a new table, then we can import all the data. 

In [45]:
# delete the table if it already exists
sql = """DROP TABLE IF EXISTS energy"""
cursor.execute(sql)
connection.commit()

# now let's create the table
cursor.execute(sql_create_energy_table)
# and commit to the DB
connection.commit()

In [46]:
with open('Residential_Average_Monthly_kWh_and_Bills.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    columns = next(reader)  # skip the header row
    sql = """INSERT INTO energy(date, kwh, fuel_charge, average_bill) VALUES (%s, %s, %s, %s)"""
    # iterate over the reader itself so that every data row is read exactly once
    for my_row in reader:
        date, kwh, fuel_charge, average_bill = my_row
        # strip the dollar sign and pass typed values; psycopg2 handles the quoting
        cursor.execute(sql, (date, int(kwh), float(fuel_charge), float(average_bill.strip('$'))))
connection.commit()
# hurrah! It works :)

In [47]:
# print the first 3 rows
cursor.execute("SELECT * FROM energy ORDER BY id LIMIT 3")
for row in cursor.fetchall():
    print(" ", row)
connection.commit()

  (1, datetime.date(2000, 1, 1), 820, 1.372, 54.26)
  (2, datetime.date(2000, 2, 1), 766, 1.372, 50.27)
  (3, datetime.date(2000, 3, 1), 707, 1.372, 45.91)
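
A quick sanity check after a bulk import is to compare the number of data rows in the CSV file with the number of rows in the table. Here is a minimal sketch, assuming the file has a single header row. `count_data_rows`, `count_table_rows`, and the `FakeCursor` stand-in are hypothetical names for illustration; with a real connection you would pass the open file and the psycopg2 cursor instead.

```python
import csv
import io

def count_data_rows(lines):
    """Count the rows in a CSV source, not counting the header row."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    return sum(1 for _ in reader)

def count_table_rows(cursor):
    """Ask the database how many rows the energy table holds."""
    cursor.execute("SELECT count(*) FROM energy")
    return cursor.fetchone()[0]

class FakeCursor:
    """Stand-in cursor that pretends the table holds a fixed number of rows."""
    def __init__(self, row_count):
        self.row_count = row_count

    def execute(self, sql, params=None):
        pass

    def fetchone(self):
        return (self.row_count,)

# a two-row sample in the same shape as the CSV file
sample = io.StringIO(
    "Date,Average kWh,Fuel Charge (Cents/kWh),Average Bill\n"
    "01/01/2000 12:00:00 AM,820,1.372,$54.26\n"
    "02/01/2000 12:00:00 AM,766,1.372,$50.27\n"
)
expected = count_data_rows(sample)
actual = count_table_rows(FakeCursor(2))
print(expected == actual)
```

If the two counts differ, some rows were skipped or inserted more than once.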


In [48]:
# close the cursor first, then the connection it belongs to
cursor.close()
connection.close()