* Getting started with Database Programming using Pandas
* Role of `SQLAlchemy` and `psycopg2`
* Read data from Database Table into Dataframe
* Read query results into Dataframe
* Database Programming Best Practices
* Write data from Dataframe to Database Table
* Overview of using `method` in `to_sql`
* Exercise and Solution

In [None]:
import psycopg2

# read data from the file
with open('data/retail_db/departments/part-00000') as fp:
    data = fp.read().splitlines()

# build list of tuples for insert
recs = []
for rec in data:
    r = rec.split(',')
    recs.append((int(r[0]), r[1]))

# connect to the database and open a cursor
conn = psycopg2.connect(
    host='localhost',
    port=5432,
    database='itversity_retail_db',
    user='itversity_retail_user',
    password='itversity'
)
cur = conn.cursor()

# insert every tuple in recs into the departments table; psycopg2 fills each %s placeholder
query = '''
    INSERT INTO departments (department_id, department_name)
    VALUES (%s, %s)
'''

cur.executemany(query, recs)

conn.commit()  # persist the inserts
cur.close()
conn.close()

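# Limitations of this approach
# Complexity: High (manual parsing, connection and cursor handling, hand-written INSERT)
# Readability: Low
# Reusability: Low (tied to one file layout and one table)
# Pandas with data driven development addresses these limitations: read_csv parses the file and to_sql writes the table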
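The parsing loop above splits each line on commas, which would break if a department name ever contained a quoted comma. A minimal sketch of the same parsing with the standard-library `csv` module, assuming the same two-column `department_id,department_name` layout; `parse_departments` is a hypothetical helper name and the sample rows are made up.

```python
import csv
import io


def parse_departments(fp):
    """Parse `department_id,department_name` lines into (int, str) tuples.

    csv.reader honours quoting, so a quoted name containing a comma stays in
    one field. Open a real file with open(path, newline='') when using csv.
    """
    return [(int(dept_id), dept_name) for dept_id, dept_name in csv.reader(fp)]


# In-memory stand-in for the departments file (made-up rows).
sample = io.StringIO('2,Fitness\n3,Footwear\n8,"Fan Shop, Outdoor"\n')
recs = parse_departments(sample)
print(recs)
```

The resulting `recs` list has the same shape as the one built by the loop, so it could be passed to `cur.executemany` unchanged.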

In [None]:
# Getting started with Database Programming using Pandas
# Make sure pandas, psycopg2-binary and SQLAlchemy are installed, for example:
#   pip install pandas psycopg2-binary SQLAlchemy
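To confirm the prerequisites from inside the notebook, a small check with `importlib.util.find_spec` can help. Note that the import names differ from the pip package names: `psycopg2-binary` installs the module `psycopg2`, and `SQLAlchemy` installs `sqlalchemy`. `missing_modules` is a hypothetical helper, not part of any of these libraries.

```python
import importlib.util

# Import names of the packages this notebook relies on.
REQUIRED_MODULES = ['pandas', 'psycopg2', 'sqlalchemy']


def missing_modules(modules):
    """Return the names from `modules` that cannot be found on this interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print('Install before continuing:', ', '.join(missing))
else:
    print('All prerequisites are installed.')
```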

In [None]:
# Role of `SQLAlchemy` and `psycopg2`
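In short, pandas does not talk to PostgreSQL by itself: when given a connection URI it builds a SQLAlchemy engine, and SQLAlchemy hands the actual database I/O to a DBAPI driver, which for `postgresql://` URIs defaults to `psycopg2`. The sketch below only parses a URI with SQLAlchemy's `make_url` (no connection is opened) to show which part of the URI selects each layer; the explicit `+psycopg2` suffix is optional and is spelled out here for illustration.

```python
from sqlalchemy.engine import make_url

# Same settings as this notebook; '+psycopg2' names the DBAPI driver explicitly.
url = make_url(
    'postgresql+psycopg2://itversity_retail_user:itversity@localhost:5432/itversity_retail_db'
)

print(url.get_backend_name())  # dialect SQLAlchemy generates SQL for
print(url.get_driver_name())   # DBAPI module that sends it to the server
print(url.host, url.port, url.database)
```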

In [None]:
# Read data from Database Table into Dataframe
import pandas as pd

In [None]:
host = 'localhost'
port = 5432
database = 'itversity_retail_db'
user = 'itversity_retail_user'
password = 'itversity'

In [None]:
# SQLAlchemy connection URI (a string, not an open connection); pandas creates an engine from it
conn = f'postgresql://{user}:{password}@{host}:{port}/{database}'
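The f-string URI works as long as the user name and password contain no URI delimiters such as `@`, `/` or `:`. A sketch of escaping them with `urllib.parse.quote_plus`, the approach SQLAlchemy's documentation recommends; the password below is hypothetical, chosen only because it contains reserved characters.

```python
from urllib.parse import quote_plus

host = 'localhost'
port = 5432
database = 'itversity_retail_db'
user = 'itversity_retail_user'
password = 'p@ss/word'  # hypothetical password containing reserved characters

# Percent-encode credentials so '@' and '/' are not parsed as URI delimiters.
uri = f'postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}'
print(uri)
```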

In [None]:
# A bare table name reads the whole table (this form needs SQLAlchemy)
pd.read_sql('departments', con=conn)
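For tables too large to load in one go, `read_sql` also accepts `chunksize`, in which case it returns an iterator of DataFrames rather than a single DataFrame. The sketch below assumes an in-memory SQLite database as a stand-in for PostgreSQL, with made-up rows. pandas accepts a `sqlite3` connection directly, but then the first argument must be a query rather than a bare table name.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stand-in for the PostgreSQL departments table (made-up rows).
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT)')
con.executemany('INSERT INTO departments VALUES (?, ?)',
                [(2, 'Fitness'), (3, 'Footwear'), (4, 'Apparel')])
con.commit()

# With chunksize, read_sql yields DataFrames of at most 2 rows each.
chunks = pd.read_sql('SELECT * FROM departments ORDER BY department_id',
                     con=con, chunksize=2)
sizes = [len(chunk) for chunk in chunks]
print(sizes)
```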

In [None]:
# Read query results into Dataframe
pd.read_sql('SELECT * FROM departments', con=conn)
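When a query depends on a value supplied at run time, it is safer to pass the value through `params` than to format it into the SQL string, because the driver then handles quoting. The placeholder style is driver-specific: psycopg2 uses `%s` or `%(name)s`, while the in-memory `sqlite3` stand-in assumed in this sketch uses `?`. The rows are made up.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE departments (department_id INTEGER, department_name TEXT)')
con.executemany('INSERT INTO departments VALUES (?, ?)',
                [(2, 'Fitness'), (3, 'Footwear'), (4, 'Apparel')])
con.commit()

# The value 3 is bound by the driver, never concatenated into the SQL text.
df = pd.read_sql(
    'SELECT * FROM departments WHERE department_id >= ? ORDER BY department_id',
    con=con,
    params=(3,),
)
print(df)
```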

In [None]:
pd.read_sql(
    "SELECT * FROM information_schema.tables WHERE table_schema = 'public'", 
    con=conn
)

In [None]:
pd.read_sql('SELECT current_date', con=conn)

In [None]:
# Database Programming Best Practices
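# Database servers are typically provisioned with far more CPU, memory and I/O capacity than the servers where applications run
# Implement core data processing (filtering, joins, aggregations) in SQL so it runs on the database server, and pull only the results into pandas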
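As an illustration of that advice, the sketch below computes revenue per order twice: once with `GROUP BY` on the database, which transfers one row per order, and once by pulling every row into pandas first. It assumes a tiny in-memory SQLite table with made-up `order_items` rows as a stand-in for PostgreSQL. Both paths give the same totals, but only the first keeps the row-level data on the server.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stand-in with a few made-up order_items rows.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE order_items (order_item_order_id INTEGER, order_item_subtotal REAL)')
con.executemany('INSERT INTO order_items VALUES (?, ?)',
                [(1, 299.98), (2, 199.99), (2, 250.0), (2, 129.99)])
con.commit()

# Preferred: aggregate on the database server; only one row per order is transferred.
in_db = pd.read_sql(
    '''SELECT order_item_order_id, SUM(order_item_subtotal) AS order_revenue
       FROM order_items
       GROUP BY order_item_order_id
       ORDER BY order_item_order_id''',
    con=con,
)

# Alternative: transfer every row, then aggregate in pandas on the application side.
all_rows = pd.read_sql('SELECT * FROM order_items', con=con)
in_pandas = (
    all_rows.groupby('order_item_order_id', as_index=False)['order_item_subtotal']
    .sum()
    .rename(columns={'order_item_subtotal': 'order_revenue'})
)

print(in_db)
```

With four rows the difference is invisible, but the second approach moves every row over the network before reducing it, which is what the best practice above is meant to avoid.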

In [None]:
# Write data from Dataframe to Database Table
df = pd.read_csv(
    'data/retail_db/departments/part-00000',
    names=['department_id', 'department_name']
)

In [None]:
df

In [None]:
help(df.to_sql)

In [None]:
df.to_sql(
    'departments',
    con=conn,
    if_exists='append',  # Make sure the table is pre-created with its constraints
    index=False
)
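The comment on `if_exists='append'` matters because `to_sql` only creates a table when it is missing, and a table it creates gets column types inferred from the DataFrame and no primary key. The sketch below assumes an in-memory SQLite database in place of PostgreSQL and made-up rows: it pre-creates `departments` with a primary key, appends the DataFrame, and then shows that appending the same rows a second time is rejected by the constraint.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(':memory:')
# Pre-create the table so the primary key exists before any data is loaded.
con.execute('''CREATE TABLE departments (
                   department_id INTEGER PRIMARY KEY,
                   department_name TEXT NOT NULL
               )''')

df = pd.DataFrame({'department_id': [2, 3],
                   'department_name': ['Fitness', 'Footwear']})
df.to_sql('departments', con=con, if_exists='append', index=False)

# Appending the same keys again violates the primary key and is rolled back.
try:
    df.to_sql('departments', con=con, if_exists='append', index=False)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

loaded = pd.read_sql('SELECT * FROM departments ORDER BY department_id', con=con)
print(duplicate_rejected)
print(loaded)
```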

In [None]:
pd.read_sql('departments', con=conn)

In [None]:
# Overview of using `method` in `to_sql`
help(df.to_sql)

In [None]:
order_items = pd.read_csv(
    'data/retail_db/order_items/part-00000',
    names=['order_item_id', 'order_item_order_id', 'order_item_product_id', 
           'order_item_quantity', 'order_item_subtotal', 'order_item_product_price']
)

In [None]:
order_items.shape

In [None]:
order_items.to_sql(
    'order_items',
    con=conn,
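    if_exists='replace',  # drops the table and recreates it with inferred column types (no constraints)
    index=False,
    chunksize=10000,      # write 10,000 rows per batch
    method='multi'        # one multi-row INSERT per batch instead of one INSERT per row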
)
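Per the pandas documentation, `method` controls how rows are sent: `None` (the default) uses a standard single-row INSERT per row, `'multi'` passes many rows in one INSERT, and a callable with the signature `(pd_table, conn, keys, data_iter)` can implement a custom path such as PostgreSQL `COPY`. With `'multi'`, `chunksize` bounds the rows per statement, so each statement carries columns times chunksize values. The sketch below assumes SQLite as a stand-in and synthetic data, and uses a small `chunksize` because SQLite caps the number of bound parameters per statement.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(':memory:')

# Synthetic stand-in for order_items: 1,000 rows, 3 columns.
items = pd.DataFrame({
    'order_item_id': range(1, 1001),
    'order_item_order_id': [i // 4 + 1 for i in range(1000)],
    'order_item_subtotal': [19.99] * 1000,
})

# 'multi' writes up to `chunksize` rows per INSERT; 100 rows x 3 columns
# keeps each statement at 300 bound parameters, well under SQLite's cap.
items.to_sql('order_items', con=con, if_exists='replace', index=False,
             chunksize=100, method='multi')

row_count = pd.read_sql('SELECT COUNT(*) AS n FROM order_items', con=con)['n'].iloc[0]
print(row_count)
```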

In [None]:
pd.read_sql('order_items', con=conn)

* Exercise: Read data from `data/sales/part-00000` and write to `sales` table in the database.
  * Make sure to reset the table if it exists (recreate or truncate).
  * Use `pandas` to read the data from file into dataframe.
  * Write dataframe into `sales` table using `pandas`.
  * Use `read_sql` to verify that the data was written to the table.
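One possible shape for a solution, sketched under assumptions: the layout of `data/sales/part-00000` is not given here, so the column names below are hypothetical, the sample file is synthetic, and an in-memory SQLite database stands in for PostgreSQL. `load_csv_to_table` is a hypothetical helper that resets the table by deleting its rows (a truncate, which keeps any pre-created constraints), appends the DataFrame, and reads the table back.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_to_table(path, table, con, names=None):
    """Hypothetical helper: reset `table`, load the CSV at `path`, return the table's rows.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    df = pd.read_csv(path, names=names)  # names=None -> first row is the header
    exists = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone() is not None
    if exists:
        con.execute(f'DELETE FROM {table}')  # SQLite has no TRUNCATE; PostgreSQL does
        con.commit()
    df.to_sql(table, con=con, if_exists='append', index=False)
    return pd.read_sql(f'SELECT * FROM {table}', con=con)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'part-00000'
    path.write_text('1,2024-01-01,10.5\n2,2024-01-02,7.25\n')  # synthetic rows
    con = sqlite3.connect(':memory:')
    cols = ['sale_id', 'sale_date', 'amount']  # hypothetical column names
    first = load_csv_to_table(path, 'sales', con, names=cols)
    second = load_csv_to_table(path, 'sales', con, names=cols)  # reload does not duplicate rows

print(second)
```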