# Pandas + SQL
This notebook walks through how to use Python, pandas and SQL together, with a couple of examples.

First, we need to set up a virtual environment, install the required modules and import them into this project.

``` bash
$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install pandas
```

The sqlite3 module is part of the Python standard library (it has been included since Python 2.5), so there is no need to pip install it.

In [1]:
import pandas
import sqlite3
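
As a quick sanity check that both modules imported correctly, the sketch below prints their versions. The exact version strings depend on your environment, so no particular output is assumed here.

```python
import sqlite3
import pandas

# Version of the SQLite C library that this Python build is linked against
sqlite_library_version = sqlite3.sqlite_version
# Version of the installed pandas package
pandas_version = pandas.__version__

print(sqlite_library_version)
print(pandas_version)
```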

## Example 1: Storing data in database
We will create a pandas DataFrame and store its data in a sqlite3 database. This process is helpful for taking tabular data from the web, Excel, a CSV file or another database and storing it in your own database with ease.
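
To illustrate the CSV case, here is a minimal sketch that reads CSV text into a DataFrame and writes it to a temporary database. The CSV content, the table name `csv_prices` and the variable names are made up for this illustration; a real workflow would pass a file path or URL to `pandas.read_csv()`.

```python
import io
import sqlite3
import pandas

# Hypothetical CSV content standing in for a downloaded or exported file
csv_text = """date,bhp,nab
2018-01-01,100,120
2018-02-01,110,130
"""

csv_df = pandas.read_csv(io.StringIO(csv_text))

# ':memory:' gives a throwaway database that vanishes when the connection closes
csv_conn = sqlite3.connect(':memory:')
try:
    # index=False skips the automatic 0..n row index, which carries no information here
    csv_df.to_sql(name='csv_prices', con=csv_conn, index=False)
    csv_row_count = csv_conn.execute('select count(*) from csv_prices').fetchone()[0]
finally:
    csv_conn.close()

print(csv_row_count)
```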

For further reading on Pandas please see:
* ..insert link to pandas intro..
* https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
* http://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.html

For further reading on SQLite3:
* https://www.sqlite.org/lang.html
* https://www.sqlite.org/datatype3.html

### Setup example DataFrame

In [8]:
data1 = {'bhp':[100,110,125,130],
        'nab':[120,130,140,150],
        'rio':[120,130,160,140]}

data2 = {'bhp':[135,150,155,160],
        'nab':[170,160,150,150],
        'rio':[150,130,120,110]}

dates1 = ['2018-01-01','2018-02-01','2018-03-01','2018-04-01']
dates2 = ['2018-05-01','2018-06-01','2018-07-01','2018-08-01']
df1 = pandas.DataFrame(data=data1, index=dates1)
df2 = pandas.DataFrame(data=data2, index=dates2)
print(df1)
print(df2)

            bhp  nab  rio
2018-01-01  100  120  120
2018-02-01  110  130  130
2018-03-01  125  140  160
2018-04-01  130  150  140
            bhp  nab  rio
2018-05-01  135  170  150
2018-06-01  150  160  130
2018-07-01  155  150  120
2018-08-01  160  150  110


### Store DataFrame in SQLite3
Pandas has the ability to store data in a variety of databases, including Postgres, MySQL and other relational databases. In this case we will be using SQLite3.

SQLite3 is a lightweight database that stores data on disk as a single file, or in memory. Python's standard library includes bindings for it, and it is exceptionally easy to use because there is no server to connect to. It is also relatively fast for local reads and writes, as it is written in C and the data is stored locally.
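
The difference between the two storage modes can be sketched as follows. The table `t`, the file name `demo.db` and the temporary directory are assumptions chosen for this illustration only.

```python
import os
import sqlite3
import tempfile

# In-memory database: nothing is written to disk
mem_conn = sqlite3.connect(':memory:')
mem_conn.execute('create table t (x integer)')
mem_conn.close()

# On-disk database: the whole database lives in one ordinary file
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.db')  # hypothetical location
disk_conn = sqlite3.connect(demo_path)
disk_conn.execute('create table t (x integer)')
disk_conn.commit()
disk_conn.close()

# The file is still there after the connection is closed
file_exists = os.path.exists(demo_path)
print(file_exists)
```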

To begin, we need to open a database connection. We will use Python's context manager syntax, `with <expression> as <name>:`, which commits the transaction for us when the block finishes. To view more on context managers please see ....link....
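
One detail worth knowing: a `sqlite3` connection used as a context manager commits on success and rolls back on an exception, but it does not close the connection. If you also want the connection closed, one option is to wrap it in `contextlib.closing`, as in this sketch (the table `demo` and the variable names are illustrative):

```python
import sqlite3
from contextlib import closing

# closing() guarantees close(); the inner "with ctx_conn" handles commit/rollback
with closing(sqlite3.connect(':memory:')) as ctx_conn:
    with ctx_conn:
        ctx_conn.execute('create table demo (x integer)')
        ctx_conn.execute('insert into demo values (1)')
    # The transaction is committed, but the connection is still open here
    rows = ctx_conn.execute('select x from demo').fetchall()

print(rows)
```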

We will then use `DataFrame.to_sql()` to insert the DataFrame into the **SQLite3** database. To control how this is done, we can use a number of parameters:
* name: string - name of the SQL table
* con: connection - the database connection
* if_exists: string {'fail', 'replace', 'append'}, default 'fail' - what to do if the table already exists: 'fail' raises a ValueError, 'replace' drops the table and writes only the new data in the DataFrame, 'append' adds the data as additional rows
* index: boolean, default True - write the DataFrame index as a column
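
The three `if_exists` modes can be compared on a throwaway in-memory database. This is a small sketch; the table name `demo_prices` and the variable names are invented for the illustration.

```python
import sqlite3
import pandas

demo_df = pandas.DataFrame({'bhp': [100, 110]}, index=['2018-01-01', '2018-02-01'])

demo_conn = sqlite3.connect(':memory:')
try:
    # The first write creates the table, so the default if_exists='fail' is fine
    demo_df.to_sql(name='demo_prices', con=demo_conn)

    # A second write with the default raises ValueError because the table exists
    try:
        demo_df.to_sql(name='demo_prices', con=demo_conn)
        second_write_failed = False
    except ValueError:
        second_write_failed = True

    # 'append' adds the rows to the existing table
    demo_df.to_sql(name='demo_prices', con=demo_conn, if_exists='append')
    rows_after_append = demo_conn.execute('select count(*) from demo_prices').fetchone()[0]

    # 'replace' drops the table and writes only the new rows
    demo_df.to_sql(name='demo_prices', con=demo_conn, if_exists='replace')
    rows_after_replace = demo_conn.execute('select count(*) from demo_prices').fetchone()[0]
finally:
    demo_conn.close()

print(second_write_failed, rows_after_append, rows_after_replace)
```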

We will then read the data back from the database and load it into another DataFrame using the `pandas.read_sql()` function.
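
`pandas.read_sql()` accepts any select statement, not only `select *`. The sketch below filters rows with a parameterised query; `?` is the placeholder style used by `sqlite3`, and the table name, threshold and variable names are assumptions for this example.

```python
import sqlite3
import pandas

prices = pandas.DataFrame(
    {'bhp': [100, 110, 125]},
    index=['2018-01-01', '2018-02-01', '2018-03-01'],
)

query_conn = sqlite3.connect(':memory:')
try:
    prices.to_sql(name='demo_prices', con=query_conn)
    # Values are passed separately via params rather than formatted into the SQL string
    filtered = pandas.read_sql(
        'select * from demo_prices where bhp > ?',
        query_conn,
        params=(105,),
    )
finally:
    query_conn.close()

print(filtered)
```

Passing values through `params` keeps user-supplied data out of the SQL text. Table names cannot be parameterised this way, which is why a table name has to be built into the query string itself.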

In [12]:
db_path = 'pandas-sql.db'
table_name = 'sec_prices'

# if you want to play around with appending and replacing data, change the variable below
append_replace = 'append'

sql = 'select * from {}'.format(table_name)

with sqlite3.connect(db_path) as conn:
    # df1 always replaces the table so that re-running this cell gives the same result
    df1.to_sql(name=table_name, con=conn, if_exists='replace')
    # df2 is written according to append_replace
    df2.to_sql(name=table_name, con=conn, if_exists=append_replace)
    
    df3 = pandas.read_sql(sql, conn)

print(df3)

        index  bhp  nab  rio
0  2018-01-01  100  120  120
1  2018-02-01  110  130  130
2  2018-03-01  125  140  160
3  2018-04-01  130  150  140
4  2018-05-01  135  170  150
5  2018-06-01  150  160  130
6  2018-07-01  155  150  120
7  2018-08-01  160  150  110
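
Note that the dates come back as an ordinary column named `index` rather than as the DataFrame index. If you want the original shape back, one option is the `index_col` argument of `pandas.read_sql()`, sketched here on a small in-memory table (the table and variable names are illustrative):

```python
import sqlite3
import pandas

original = pandas.DataFrame({'bhp': [100, 110]}, index=['2018-01-01', '2018-02-01'])

roundtrip_conn = sqlite3.connect(':memory:')
try:
    original.to_sql(name='demo_prices', con=roundtrip_conn)
    # index_col turns the stored "index" column back into the DataFrame index
    restored = pandas.read_sql(
        'select * from demo_prices',
        roundtrip_conn,
        index_col='index',
    )
finally:
    roundtrip_conn.close()

print(restored)
```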
