# Load CSV files into the database

## Monkey patching pandas sql IO

pandas has a known issue: `to_sql` does not insert multiple rows per statement. With the default of one row at a time, loading these files would take far too long. See the link below for more about the issue.

[pandas `to_sql` issue.](https://github.com/pandas-dev/pandas/issues/8953)

Thanks to GitHub user `nhockham` for suggesting the monkey patch below.

In [None]:
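from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    """Insert a whole chunk of rows with one multi-row INSERT statement."""
    # Print a dot per chunk as a simple progress indicator.
    print('.', end='', flush=True)
    # Build one {column: value} dict per row in the chunk.
    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

# Replace pandas' default insert method with the version above.
SQLTable._execute_insert = _execute_insert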
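
To make the idea behind the patch concrete, here is a minimal sketch of a multi-row insert using only the standard library's `sqlite3` module. It is not the pandas or SQLAlchemy code path; the helper `insert_rows_multi`, the `parcels` table, and its columns are hypothetical names chosen for illustration. The point is only that all rows travel in one `INSERT ... VALUES (...), (...), ...` statement instead of one statement per row.

```python
import sqlite3


def insert_rows_multi(conn, table, keys, rows):
    """Insert all rows with a single multi-row INSERT statement.

    Hypothetical helper for illustration. `table` and `keys` are
    interpolated into the SQL text, so they must come from trusted code.
    """
    rows = list(rows)
    if not rows:
        return 0
    columns = ', '.join(keys)
    # One "(?, ?, ...)" group per row, all joined into a single VALUES clause.
    row_placeholder = '(' + ', '.join('?' for _ in keys) + ')'
    all_placeholders = ', '.join(row_placeholder for _ in rows)
    sql = 'INSERT INTO {0} ({1}) VALUES {2}'.format(table, columns, all_placeholders)
    # Flatten the rows so the parameters line up with the placeholders.
    flat_params = [value for row in rows for value in row]
    conn.execute(sql, flat_params)
    return len(rows)


# Made-up table and data, kept in memory.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE parcels (parcel_id INTEGER, city TEXT)')
inserted = insert_rows_multi(
    conn, 'parcels', ['parcel_id', 'city'],
    [(1, 'Seattle'), (2, 'Bellevue'), (3, 'Renton')],
)
row_count = conn.execute('SELECT COUNT(*) FROM parcels').fetchone()[0]
```

pandas 0.24 and later also accept `method='multi'` in `to_sql`, which requests multi-row inserts without a monkey patch; whether that is available depends on the installed version.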

In [None]:
from getpass import getpass
from os import listdir
from os.path import join

import pandas as pd
import psycopg2  # not used directly; SQLAlchemy's default driver for postgresql:// URLs
from sqlalchemy import create_engine

In [None]:
# Collect every CSV file in the data directory.
csv_files = [file for file in listdir('../data/') if file.endswith('.csv')]

u = input('Database user: ')
p = getpass('Database password: ')
engine_string = 'postgresql://{0}:{1}@handelstaccato.homenet.org:5432/king_county'.format(u, p)
engine = create_engine(engine_string)

for csv_file in csv_files:
    # Name each table after its file: everything before the first dot.
    table_name = csv_file.split('.')[0]
    df = pd.read_csv(join('../data', csv_file), quotechar='"', encoding='latin1')
    # chunksize=1000 hands rows to the patched _execute_insert in batches of 1000.
    df.to_sql(table_name, engine, schema='assessor_data', index=False, chunksize=1000)
    # Just in case. I've been hurt too many times.
    print('"Finished"', table_name)
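
One caveat with the connection string above: the user name and password are pasted into the URL as typed, so characters such as `@` or `/` in a password would be misread as URL delimiters. A minimal sketch of one way around that, assuming the standard library's `urllib.parse.quote_plus` is used to escape the credentials first. The helper `build_engine_string` and the example credentials and host are hypothetical.

```python
from urllib.parse import quote_plus


def build_engine_string(user, password, host, port, database):
    """Build a postgresql:// URL with the credentials percent-escaped.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    return 'postgresql://{0}:{1}@{2}:{3}/{4}'.format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Made-up credentials and host; '@' and '/' in the password are escaped.
example = build_engine_string('analyst', 'p@ss/word', 'db.example.org', 5432, 'king_county')
print(example)
# postgresql://analyst:p%40ss%2Fword@db.example.org:5432/king_county
```

The resulting string can be passed to `create_engine` in the same way as `engine_string`.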
