In [35]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [36]:
%%sql 
postgresql://basilbeirouti@localhost:5432

u'Connected: basilbeirouti@None'

In [50]:
%%sql
DROP TABLE mytable;

Done.


[]

The data type of **wine_id** is SERIAL. This is a special notation for creating unique identifier columns: the values are generated automatically and incremented for every new row that is added to the table.

See the docs: https://www.postgresql.org/docs/9.3/static/datatype-numeric.html

In [45]:
%%sql
CREATE TABLE mytable(
    wine_id SERIAL PRIMARY KEY,
    fixed_acidity  VARCHAR(13) NOT NULL
  ,volatile_acidity  VARCHAR(16) NOT NULL
  ,citric_acid  VARCHAR(11) NOT NULL
  ,residual_sugar  VARCHAR(14) NOT NULL
  ,chlorides  VARCHAR(9) NOT NULL
  ,free_so2  VARCHAR(19) NOT NULL
  ,total_so2  VARCHAR(20) NOT NULL
  ,density  VARCHAR(7) NOT NULL
  ,pH  VARCHAR(4) NOT NULL
  ,sulphates VARCHAR(9) NOT NULL
  ,alcohol VARCHAR(11) NOT NULL
  ,quality VARCHAR(7) NOT NULL
);

Done.


[]

Note that the column **wine_id** is not listed in the INSERT statement below. PostgreSQL will automatically generate the values for that column, since we declared its data type as SERIAL (see above).

In [46]:
%%sql
INSERT INTO mytable( fixed_acidity,
                    volatile_acidity,
                    citric_acid,
                    residual_sugar,
                    chlorides,
                    free_so2,
                    total_so2,
                    density,
                    pH,
                    sulphates,
                    alcohol,
                    quality)
VALUES
('7.4','0.7','0','1.9','0.076','11','34','0.9978','3.51','0.56','9.4','5');


1 rows affected.


[]

Note how the wine_id value was automatically generated.

In [47]:
%%sql
SELECT * FROM mytable;

1 rows affected.


wine_id,fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_so2,total_so2,density,ph,sulphates,alcohol,quality
1,7.4,0.7,0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,5


Here, instead of merely leaving the first column, **wine_id**, out of the INSERT INTO statement, I include **wine_id** in the column list and put the keyword DEFAULT in its place in the VALUES clause instead of a real value. The effect is the same, but this syntax might be easier to understand.

In [48]:
%%sql
INSERT INTO mytable( wine_id,
                    fixed_acidity,
                    volatile_acidity,
                    citric_acid,
                    residual_sugar,
                    chlorides,
                    free_so2,
                    total_so2,
                    density,
                    pH,
                    sulphates,
                    alcohol,
                    quality)
VALUES
(DEFAULT, '7.5','0.75','0.1','1.2','0.048','12','31','0.9782','3.21','0.86','5.2','5');


1 rows affected.


[]

As you can see, the unique ID is generated automatically for both entries.

In [49]:
%%sql

SELECT * FROM mytable;

2 rows affected.


wine_id,fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_so2,total_so2,density,ph,sulphates,alcohol,quality
1,7.4,0.7,0.0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,5
2,7.5,0.75,0.1,1.2,0.048,12,31,0.9782,3.21,0.86,5.2,5
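
To try the same idea without a running PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. This is a stand-in, not the PostgreSQL mechanism: SQLite has no SERIAL type, so the sketch assumes that SQLite's INTEGER PRIMARY KEY is a close enough analogue for illustration. The table is trimmed to two data columns to keep it short.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# INTEGER PRIMARY KEY is SQLite's auto-assigned row id. It stands in
# for PostgreSQL's SERIAL PRIMARY KEY in this sketch (assumption).
conn.execute(
    "CREATE TABLE mytable ("
    " wine_id INTEGER PRIMARY KEY,"
    " fixed_acidity TEXT NOT NULL,"
    " quality TEXT NOT NULL)"
)

# wine_id is left out of the column list, so the database fills it in.
insert = "INSERT INTO mytable (fixed_acidity, quality) VALUES (?, ?)"
conn.execute(insert, ("7.4", "5"))
conn.execute(insert, ("7.5", "5"))

rows = conn.execute(
    "SELECT wine_id, fixed_acidity, quality FROM mytable ORDER BY wine_id"
).fetchall()
print(rows)
conn.close()
```

Each inserted row gets the next integer ID even though the INSERT never mentions wine_id.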


## CSV to SQL

You still need to create the table first. COPY loads rows into an existing table; it does not create one.

In [51]:
%%sql
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(
    wine_id SERIAL PRIMARY KEY,
    fixed_acidity  VARCHAR(13) NOT NULL
  ,volatile_acidity  VARCHAR(16) NOT NULL
  ,citric_acid  VARCHAR(11) NOT NULL
  ,residual_sugar  VARCHAR(14) NOT NULL
  ,chlorides  VARCHAR(9) NOT NULL
  ,free_so2  VARCHAR(19) NOT NULL
  ,total_so2  VARCHAR(20) NOT NULL
  ,density  VARCHAR(7) NOT NULL
  ,pH  VARCHAR(4) NOT NULL
  ,sulphates VARCHAR(9) NOT NULL
  ,alcohol VARCHAR(11) NOT NULL
  ,quality VARCHAR(7) NOT NULL
);

Done.
Done.


[]

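Now we can use the COPY [tablename] FROM [filepath] syntax.

A few things to note:
   1. Use the absolute path to the file. COPY ... FROM reads the file from the server's point of view, so a relative path is resolved against the server's working directory rather than the notebook's.
   
   2. The path must be written as a single-quoted string.
   
   3. wine_id is not listed with the rest of the column names, because it is autogenerated.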
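
The three notes above can be sketched as a small Python helper that assembles the COPY statement as a string. The function name build_copy_statement and its parameters are hypothetical, made up for this illustration; it only builds text and does not talk to the database. Table and column names are inserted as-is, so it should only be used with trusted input.

```python
from pathlib import Path


def build_copy_statement(table, columns, csv_path):
    """Return a COPY ... FROM statement for a comma-separated file with a header row.

    Hypothetical helper for illustration; not part of psql or ipython-sql.
    """
    # Note 1: turn the path into an absolute one.
    absolute = Path(csv_path).expanduser().resolve()
    # Note 2: wrap the path in single quotes; a single quote inside the
    # path is escaped by doubling it, as in any SQL string literal.
    quoted = "'" + str(absolute).replace("'", "''") + "'"
    # Note 3: the caller lists only the columns present in the file,
    # so an autogenerated wine_id is simply left out.
    column_list = ", ".join(columns)
    return f"COPY {table}({column_list}) FROM {quoted} DELIMITER ',' CSV HEADER;"


statement = build_copy_statement(
    "mytable", ["fixed_acidity", "quality"], "wine_noid.csv"
)
print(statement)
```

The printed statement can be pasted into a %%sql cell. The path is resolved on the machine where Python runs, so this assumes the notebook and the PostgreSQL server share a filesystem, as they do with a local server.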

In [52]:
%%sql
COPY mytable(fixed_acidity, 
             volatile_acidity, 
             citric_acid, 
             residual_sugar, 
             chlorides, 
             free_so2, 
             total_so2, 
             density, 
             pH, 
             sulphates, 
             alcohol, 
             quality) 
FROM
'/Users/basilbeirouti/Github/DSI-ATX-1/curriculum/04-lessons/week-06.5/PostgreSQL/wine_noid.csv' 
DELIMITER ',' CSV HEADER;

1599 rows affected.


[]

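If the CSV file already has an ID column, then we add **wine_id** to the list of column names. Note that loading explicit IDs this way does not advance the sequence behind the SERIAL column, so a later INSERT that relies on the generated value can collide with an existing ID unless the sequence is reset first.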

In [53]:
%%sql
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(
    wine_id SERIAL PRIMARY KEY,
    fixed_acidity  VARCHAR(13) NOT NULL
  ,volatile_acidity  VARCHAR(16) NOT NULL
  ,citric_acid  VARCHAR(11) NOT NULL
  ,residual_sugar  VARCHAR(14) NOT NULL
  ,chlorides  VARCHAR(9) NOT NULL
  ,free_so2  VARCHAR(19) NOT NULL
  ,total_so2  VARCHAR(20) NOT NULL
  ,density  VARCHAR(7) NOT NULL
  ,pH  VARCHAR(4) NOT NULL
  ,sulphates VARCHAR(9) NOT NULL
  ,alcohol VARCHAR(11) NOT NULL
  ,quality VARCHAR(7) NOT NULL
);

Done.
Done.


[]

In [54]:
%%sql
COPY mytable(wine_id, 
             fixed_acidity, 
             volatile_acidity, 
             citric_acid, 
             residual_sugar, 
             chlorides, 
             free_so2, 
             total_so2, 
             density, 
             pH, 
             sulphates, 
             alcohol, 
             quality) 
FROM
'/Users/basilbeirouti/Github/DSI-ATX-1/curriculum/04-lessons/week-06.5/PostgreSQL/wine.csv' 
DELIMITER ',' CSV HEADER;

1599 rows affected.


[]

## DataFrame to SQL

In this technique, we don't have to create the table beforehand. We first read the CSV into a DataFrame as we would if we were working with the data directly in Pandas. Then we call the to_sql method on the DataFrame to export to a SQL table directly. 

Out of the box, Pandas can only write to SQLite (through Python's built-in sqlite3 connections). For other databases such as PostgreSQL it relies on SQLAlchemy, so we use SQLAlchemy's create_engine function to build the connection, and SQLAlchemy takes care of generating SQL in the PostgreSQL dialect.

In [55]:
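import pandas as pd
from sqlalchemy import create_engine

# SQLAlchemy engine pointing at the same PostgreSQL server as the %%sql cells above
con = create_engine('postgresql://basilbeirouti@localhost:5432')
df = pd.read_csv('/Users/basilbeirouti/Github/DSI-ATX-1/curriculum/04-lessons/week-06.5/PostgreSQL/wine.csv')

# Create (or replace) the table "newtable" from the DataFrame;
# index=False keeps the DataFrame index from being written as a column
df.to_sql(name="newtable", con=con, if_exists='replace', index=False)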

## Running a .SQL Script

You can write SQL code in a text editor of your choice and save it as a .sql file. Put all the SQL for creating a new table and inserting values into it in that file, then run the file from the command line like this:

psql -f [filename.sql]

Example (the file passed to -f must be a .sql script, not the CSV data file):

psql -f /Users/basilbeirouti/Github/DSI-ATX-1/curriculum/04-lessons/week-06.5/PostgreSQL/[filename.sql]

    

This tells psql to read its commands from the file at that path instead of from the keyboard. psql executes the statements in the file in order, from top to bottom.
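
As a rough sketch of what such a script might contain, the following Python snippet writes a small .sql file and prints the command that would run it. The file name create_mytable.sql and the temporary directory are assumptions made for this illustration, and the statements are a trimmed, two-column version of the ones used earlier. The snippet does not call psql itself.

```python
import tempfile
from pathlib import Path

# Statements adapted from the earlier cells, trimmed to two data columns.
script = """\
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(
    wine_id SERIAL PRIMARY KEY,
    fixed_acidity VARCHAR(13) NOT NULL,
    quality VARCHAR(7) NOT NULL
);
INSERT INTO mytable(fixed_acidity, quality) VALUES ('7.4', '5');
"""

# Hypothetical file name; a temporary directory keeps the sketch self-contained.
script_path = Path(tempfile.mkdtemp()) / "create_mytable.sql"
script_path.write_text(script)

# The command to run from a shell once the file exists.
command = f"psql -f {script_path}"
print(command)
```

Running the printed command in a terminal would drop the table if it exists, recreate it, and insert one row, in that order.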