* Getting started with Database Programming using Pandas
* Role of `SQLAlchemy` and `psycopg2`
* Read data from Database Table into Dataframe
* Read query results into Dataframe
* Database Programming Best Practices
* Write data from Dataframe to Database Table
* Overview of `method` parameter in `to_sql`
* Exercise and Solution

In [None]:
# Getting started with Database Programming using Pandas
# Make sure pandas, SQLAlchemy and psycopg2-binary are installed
# python -m pip install pandas
# python -m pip install sqlalchemy
# python -m pip install psycopg2-binary

In [None]:
# Role of `SQLAlchemy` and `psycopg2`
# Pandas uses SQLAlchemy to work with databases
# SQLAlchemy in turn needs a low-level driver library for the target database (psycopg2 for Postgres)
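
As a rough illustration of how the two libraries divide the work, the connection URL itself names both layers. The sketch below only parses a URL with SQLAlchemy's `make_url` and does not connect to anything. The explicit `postgresql+psycopg2` prefix is an assumption added for illustration; the rest mirrors the URL used later in this notebook.

```python
from sqlalchemy.engine import make_url

# Illustrative URL; nothing is connected, the string is only parsed
url = make_url(
    'postgresql+psycopg2://itversity_retail_user:itversity@localhost:5432/itversity_retail_db'
)

backend = url.get_backend_name()  # database dialect handled by SQLAlchemy
driver = url.get_driver_name()    # low-level DBAPI driver that talks to Postgres

print(backend, driver, url.host, url.port, url.database)
```

When the `+psycopg2` part is left out, as in `postgresql://...`, SQLAlchemy falls back to its default Postgres driver, which is psycopg2.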

In [None]:
# Read data from Database Table into Dataframe
import pandas as pd

In [None]:
conn = 'postgresql://itversity_retail_user:itversity@localhost:5432/itversity_retail_db'

In [None]:
pd.read_sql('departments', con=conn)

In [None]:
# Read query results into Dataframe
pd.read_sql('SELECT current_date', con=conn)
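
Queries usually take input values, and `read_sql` accepts them through `params` rather than string formatting. The following is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the Postgres database, and the `conn_demo` name and the sample rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# Stand-in database: in-memory SQLite instead of the Postgres database above
conn_demo = sqlite3.connect(':memory:')
conn_demo.execute(
    'CREATE TABLE departments (department_id INTEGER, department_name TEXT)'
)
conn_demo.executemany(
    'INSERT INTO departments VALUES (?, ?)',
    [(2, 'Fitness'), (3, 'Footwear'), (4, 'Apparel')]  # made-up sample rows
)
conn_demo.commit()

# Filtering happens in the database; the value is bound through params
# (sqlite3 uses ? placeholders; psycopg2 uses %s or %(name)s instead)
filtered = pd.read_sql(
    'SELECT department_id, department_name FROM departments '
    'WHERE department_id >= ? ORDER BY department_id',
    con=conn_demo,
    params=(3,)
)
print(filtered)
```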

In [None]:
# Database Programming Best Practices
# Database servers are generally more powerful than the servers where applications run
# Implement the core processing logic in SQL so that it runs on the database server
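
One way to picture this practice is to compare an aggregation done in SQL with the same aggregation done after pulling every row into Pandas. This is only a sketch: an in-memory SQLite database stands in for the database server, and the `conn_demo` name, the trimmed-down `orders` table and its rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# Stand-in database with a few made-up orders (in-memory SQLite)
conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE orders (order_id INTEGER, order_status TEXT)')
conn_demo.executemany(
    'INSERT INTO orders VALUES (?, ?)',
    [(1, 'CLOSED'), (2, 'COMPLETE'), (3, 'COMPLETE'), (4, 'PENDING')]
)
conn_demo.commit()

# Preferred: the database aggregates and only the summary rows are transferred
status_counts = pd.read_sql(
    'SELECT order_status, count(*) AS order_count '
    'FROM orders GROUP BY order_status ORDER BY order_status',
    con=conn_demo
)

# For comparison: pull every row and aggregate on the application side
all_orders = pd.read_sql('SELECT * FROM orders', con=conn_demo)
client_counts = all_orders.groupby('order_status').size()

print(status_counts)
```

Both approaches give the same counts, but the first one moves one row per status over the network instead of one row per order.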

In [None]:
# Write data from Dataframe to Database Table
df = pd.read_csv(
    'data/retail_db/departments/part-00000',
    names=['department_id', 'department_name']
)

In [None]:
df.to_sql(
    'departments', 
    con=conn,
    index=False
) # raises ValueError if the table already exists (default if_exists='fail')

In [None]:
df.to_sql(
    'departments', 
    con=conn, 
    if_exists='replace',
    index=False
) # drops and recreates the table; use if_exists='append' to add rows instead
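
The three `if_exists` behaviours can be tried side by side without touching the real table. The sketch below assumes an in-memory SQLite database as a stand-in for Postgres; `conn_demo`, `df_demo` and the two sample rows are made up for illustration.

```python
import sqlite3

import pandas as pd

conn_demo = sqlite3.connect(':memory:')  # stand-in for the Postgres database
df_demo = pd.DataFrame({
    'department_id': [2, 3],
    'department_name': ['Fitness', 'Footwear']
})

df_demo.to_sql('departments', con=conn_demo, index=False)  # creates the table

try:
    # Default if_exists='fail': the table now exists, so this raises
    df_demo.to_sql('departments', con=conn_demo, index=False)
    second_write_failed = False
except ValueError:
    second_write_failed = True

# replace: the table is dropped and recreated with the DataFrame contents
df_demo.to_sql('departments', con=conn_demo, if_exists='replace', index=False)
rows_after_replace = pd.read_sql(
    'SELECT count(*) AS cnt FROM departments', con=conn_demo
)['cnt'][0]

# append: the rows are added to what is already in the table
df_demo.to_sql('departments', con=conn_demo, if_exists='append', index=False)
rows_after_append = pd.read_sql(
    'SELECT count(*) AS cnt FROM departments', con=conn_demo
)['cnt'][0]

print(second_write_failed, rows_after_replace, rows_after_append)
```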

In [None]:
# Overview of `method` parameter in `to_sql`
# By default (method=None), inserts one record per INSERT statement
# Set method to 'multi' to insert multiple records in a single INSERT statement
# We can combine chunksize with method='multi' to populate the table in batches
# Make sure to reset orders table (create if not exists, truncate if exists)

In [None]:
help(df.to_sql)

In [None]:
df = pd.read_csv(
    'data/retail_db/orders/part-00000',
    names=['order_id', 'order_date', 'order_customer_id', 'order_status']
)

In [None]:
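df.to_sql(
    'orders',
    con=conn,
    if_exists='append', # orders table already exists after the reset
    index=False, # do not write the DataFrame index as a column
    chunksize=10000, # rows per batch
    method='multi' # multiple rows per INSERT statement
)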
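
The reset step and a batched load with `method='multi'` can be rehearsed end to end on a throwaway database. This is a minimal sketch under stated assumptions: in-memory SQLite stands in for Postgres, `DELETE` stands in for `TRUNCATE` (which SQLite lacks), and `conn_demo`, `orders_demo`, the column types and the generated rows are made up for illustration.

```python
import sqlite3

import pandas as pd

conn_demo = sqlite3.connect(':memory:')  # stand-in for the Postgres database

# Reset step: create the table if missing, then empty it
# (SQLite has no TRUNCATE, so DELETE is used; Postgres supports TRUNCATE TABLE)
conn_demo.execute(
    'CREATE TABLE IF NOT EXISTS orders ('
    'order_id INTEGER, order_date TEXT, '
    'order_customer_id INTEGER, order_status TEXT)'
)
conn_demo.execute('DELETE FROM orders')
conn_demo.commit()

# Made-up data: 250 orders
orders_demo = pd.DataFrame({
    'order_id': range(1, 251),
    'order_date': '2013-07-25 00:00:00.0',
    'order_customer_id': range(1001, 1251),
    'order_status': 'COMPLETE'
})

# Rows are sent in batches of up to 100, several rows per INSERT statement
orders_demo.to_sql(
    'orders',
    con=conn_demo,
    if_exists='append',
    index=False,
    chunksize=100,
    method='multi'
)

loaded = pd.read_sql('SELECT count(*) AS cnt FROM orders', con=conn_demo)['cnt'][0]
print(loaded)
```

The small `chunksize` here keeps each statement under SQLite's bound-parameter limit; a suitable value for Postgres depends on row width and available memory.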

* Exercise: Read data from `data/sales/part-00000` and write to `sales` table in the database.
  * Make sure to reset the table if it exists (recreate or truncate).
  * Use `pandas` to read the data from file into dataframe.
  * Write dataframe into `sales` table using `pandas`.
  * Use `read_sql` to see if the data is written to the table or not.
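
One possible shape for a solution is sketched below. It is not a definitive answer: the layout of the sales file is not given here, so the helper `load_sales`, the column names, the sample rows and the assumption of a comma-separated file without a header row are all hypothetical, and an in-memory SQLite database stands in for Postgres.

```python
import io
import sqlite3

import pandas as pd


def load_sales(source, con, columns):
    """Read a header-less CSV and load it into the sales table, replacing old data."""
    sales = pd.read_csv(source, names=columns)
    # if_exists='replace' drops and recreates the table, which covers the reset step
    sales.to_sql('sales', con=con, if_exists='replace', index=False)
    # Read the table back to confirm that the data was written
    return pd.read_sql('SELECT * FROM sales', con=con)


# Hypothetical stand-ins: made-up columns and rows, in-memory SQLite database
demo_file = io.StringIO('1,2013-07-25,299.98\n2,2013-07-26,129.99\n')
demo_columns = ['sale_id', 'sale_date', 'sale_amount']
conn_demo = sqlite3.connect(':memory:')

written = load_sales(demo_file, conn_demo, demo_columns)
print(written)
```

For the actual exercise, `source` would be `'data/sales/part-00000'`, `con` the Postgres connection URL used earlier, and `columns` the real column names of the sales file.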