### Postgres for Data Engineers: Loading and Extracting Data with Tables
#### These are exercises done as part of <a href = "https://www.dataquest.io">DataQuest</a>'s Data Engineer Path
This is not replicated for commercial use; strictly personal development.<br>
All exercises are (c) DataQuest, with slight modifications so they use the Postgres server on my localhost.

#### Loading and Extracting Data with Tables Mission
<b>1.  </b>Instructions:
- Use the provided `cur` variable.
- Load the `ign.csv` file found in terminal table using the `csv` module.
- Run the insert query on the `ign_reviews` table with the `execute` method and a prepared statement.
- Insert every row from the `ign.csv` file except for the header row.
- Note that the last column is `release_date` instead of the three separate `release_day`, `release_month`, and `release_year` columns.
- Commit your changes using the `conn` object.

<font color = 'blue'>I used SQLAlchemy to complete this task in the <a href = "https://github.com/nmolivo/dataquest_eng/blob/master/1_production_databases/02_opt_tables.ipynb">second mission</a> of this quest. In that mission, I also explored populating a database from a csv file and wound up using `\copy`, a psql command. Of all these methods, I like SQLAlchemy best. Nevertheless, I complete all exercises here.</font>

```python
import csv
import psycopg2

conn = psycopg2.connect("dbname=dq user=dq")
cur = conn.cursor()
with open('ign.csv', 'r') as f:
    next(f)  # skip the header row
    reader = csv.reader(f)
    for row in reader:
        # psycopg2 adapts each value in `row` to its %s placeholder
        cur.execute("INSERT INTO ign_reviews VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)", row)
conn.commit()
```
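Typing nine `%s` placeholders by hand is easy to get wrong. A minimal sketch, assuming the header row lists exactly the columns being inserted, builds the placeholder list from the header instead. The sample rows below are made up and stand in for `ign.csv`; no database is touched.

```python
import csv
import io

# Hypothetical two-row sample standing in for ign.csv (values are made up).
SAMPLE = (
    "id,score_phrase,title,url,platform,score,genre,editors_choice,release_date\n"
    "1,Amazing,Example Game,/games/example,PC,9.0,Platformer,Y,2012-09-12\n"
    '2,Good,"Example, the Sequel",/games/example-2,PC,7.5,Puzzle,N,2013-01-05\n'
)

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # consume the header through the reader itself

# One %s per column, so the query always matches the width of the file.
placeholders = ", ".join(["%s"] * len(header))
query = f"INSERT INTO ign_reviews VALUES ({placeholders})"

# Keep only rows whose field count matches the header.
rows = [row for row in reader if len(row) == len(header)]

print(query)
print(len(rows), "rows ready for cur.execute(query, row)")
```

Each entry in `rows` could then be passed to `cur.execute(query, row)` exactly as in the loop above.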

<b>2.  </b>Instructions:
- Use the provided `cur` variable.
- Load the `ign.csv` file found in terminal table using the `csv` module.
- Create a comma-separated string of mogrified values using the `mogrify()` method.
- Mogrify every row from the `ign.csv` file and skip the header row.
- Set the comma-separated string to the variable `mogrified_values`.
- Execute the insert query on the `ign_reviews` table using the execute method.
- Concat the `mogrified_values` to the `INSERT` statement.
- Commit your changes using the `conn` object.

> From the previous screen, we discussed how the prepared statement safely converts the Python types to the Postgres types when executing an `INSERT` statement. The conversion takes place in a separate step within the `psycopg2` library using a method called `mogrify()`.
>
>DataQuest

```python
conn = psycopg2.connect("dbname=dq user=dq")
cur = conn.cursor()
with open('ign.csv', 'r') as f:
    next(f)
    reader = csv.reader(f)
    mogrified = [cur.mogrify("(%s, %s, %s, %s, %s, %s, %s, %s, %s)", row).decode('utf-8') for row in reader]
mogrified_values = ",".join(mogrified)
cur.execute("INSERT INTO ign_reviews VALUES " + mogrified_values)
conn.commit()
```

> `>>> cur.mogrify("INSERT INTO test (num, data) VALUES (%s, %s)", (42, 'bar'))
"INSERT INTO test (num, data) VALUES (42, E'bar')"`
>
> <a href = "http://initd.org/psycopg/docs/cursor.html">Psycopg2 Cursor Class Documentation</a>
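To see the shape of the statement that the joined string produces, here is a rough sketch that needs no database. `render_literal` and `render_row` are hypothetical helpers written for this illustration, as are the `demo` table and its rows. They are not psycopg2's implementation and handle far fewer cases than `mogrify()`, so real code should keep using `cur.mogrify()`.

```python
def render_literal(value):
    """Hypothetical stand-in for the adaptation mogrify() performs; illustration only."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quote so the literal cannot end early.
    return "'" + str(value).replace("'", "''") + "'"


def render_row(row):
    """Render one row as a parenthesized, comma-separated group of literals."""
    return "(" + ", ".join(render_literal(v) for v in row) + ")"


# Made-up rows; the apostrophe shows why quoting matters.
rows = [(1, "Assassin's Creed", 9.0), (2, "Example Game", None)]

values = ",".join(render_row(r) for r in rows)
statement = "INSERT INTO demo VALUES " + values
print(statement)
```

The point is that every row becomes one group in a single multi-row `VALUES` list, so the server receives one statement instead of one per row.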

<b>3.  </b>Instructions:
- Use the provided `cur` variable.
- Load the `ign.csv` file.
- Execute the `COPY ... FROM` method on the `ign_reviews` table using the `copy_expert` method.
- Add the `CSV` and `HEADER` options.
- Commit your changes using the `conn` object.

> The `cur.copy_from()` method provides a useful API for file copying but only if the file is defined with a simple separator (delimiter) character.
>
> To use the `copy_expert()` method, you first have to declare the full `COPY` statement and then pass in the Python file descriptor. The biggest difference you may notice is that we don't copy from a file, but from the `STDIN` which in this case is the Python file object.
>
> DataQuest
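The limitation of a simple separator shows up as soon as a field contains a quoted comma. The sketch below uses a made-up line in the style of `ign.csv` and compares a plain split, which is roughly how a separator-only parser sees the line, with the `csv` module, which understands quoting.

```python
import csv
import io

# Made-up line: the title field contains a comma inside quotes.
line = '7,Great,"Example, the Sequel",PC,8.0\n'

# Separator-only view: the quoted title is cut in two.
naive = line.rstrip("\n").split(",")

# CSV-aware view: the quotes keep the title together.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The `CSV` option in the `COPY` statement asks Postgres for the second behaviour, which is why `copy_expert()` suits this file.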

```python
conn = psycopg2.connect("dbname=dq user=dq")
cur = conn.cursor()
with open('ign.csv',  'r') as f:
    cur.copy_expert('COPY ign_reviews FROM STDIN WITH CSV HEADER', f)
conn.commit()
```

<b>4.  </b>Instructions:
- Using the time module, play around with the following screen to determine which of the last three methods we introduced is the fastest.

```python
import time
conn = psycopg2.connect("dbname=dq user=dq")
cur = conn.cursor()
# Multiple single insert statements.
start = time.time()
with open('ign.csv', 'r') as f:
    next(f)
    reader = csv.reader(f)
    for row in reader:
        cur.execute(
            "INSERT INTO ign_reviews VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)",
            row
        )
conn.rollback()  # undo the inserts so each method starts from the same table
print("Single statement insert: ", time.time() - start)

# Multiple mogrify insert.
start = time.time()
with open('ign.csv', 'r') as f:
    next(f)
    reader = csv.reader(f)
    mogrified = [
        cur.mogrify("(%s, %s, %s, %s, %s, %s, %s, %s, %s)", row).decode('utf-8')
        for row in reader
    ]
    mogrified_values = ",".join(mogrified)
    cur.execute('INSERT INTO ign_reviews VALUES ' + mogrified_values)
conn.rollback()
print("Multiple mogrify insert: ", time.time() - start)

# Copy expert method.
start = time.time()
with open('ign.csv', 'r') as f:
    cur.copy_expert('COPY ign_reviews FROM STDIN WITH CSV HEADER', f)
conn.rollback()
print("Copy expert method: ", time.time() - start)
```

Single statement insert:  2.948253631591797<br>
Multiple mogrify insert:  1.0108413696289062<br>
Copy expert method:  0.16642284393310547<br>
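To put those three timings side by side, the short calculation below expresses each one as a multiple of the fastest. The numbers are copied from the single run above, so the ratios only describe that run on that machine.

```python
# Timings in seconds, copied from the run above.
timings = {
    "single": 2.948253631591797,
    "mogrify": 1.0108413696289062,
    "copy_expert": 0.16642284393310547,
}

fastest = min(timings, key=timings.get)

# How many times slower each method was than the fastest one.
slowdown = {name: round(t / timings[fastest], 1) for name, t in timings.items()}

print(fastest)
print(slowdown)
```

In this run `copy_expert` was about 18 times faster than row-by-row inserts and about 6 times faster than the single mogrified statement.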

<b>5.  </b>Instructions:
- Use the provided `cur` variable.
- Open an `old_ign_reviews.csv` file using a `with open(...) as f` statement.
- Execute the `COPY ... TO` method on the `old_ign_reviews` table using the `copy_expert` method.
- Add the `CSV` and `HEADER` options.
- Write it out to the `old_ign_reviews.csv` file.

```python
conn = psycopg2.connect("dbname=dq user=dq")
cur = conn.cursor()

with open('old_ign_reviews.csv', 'w') as f:
    cur.copy_expert('COPY old_ign_reviews TO STDOUT CSV HEADER', f)
```

> In the previous mission we discussed how to alter an older table and adjust it to incoming new data. However, there are times where we might want to create a new table without altering the older data. With the new table created, we then want to copy all the older data over from the old table and transform it into the new table.
>
>DataQuest

<b>6.  </b>Instructions:
- Use the provided `cur` variable.
- Open an `old_ign_reviews.csv` file using a `with open(...) as f` statement.
- Execute the `COPY ... TO` method on the `old_ign_reviews` table using the `copy_expert` method.
- Add the `CSV` and `HEADER` options.
- Process the data and transform it to match the `ign_reviews` table. <font color ='blue'>Meaning: combine the three date columns into one column containing the full date.</font>
- Insert the processed rows into the `ign_reviews` table using whatever `INSERT` command you want.
- Commit your changes.

> ### open()
>
|Character|Meaning|
|------|------|
|'r'|open for reading (default)|
|'w'|open for writing, truncating the file first|
|'x'|open for exclusive creation, failing if the file already exists|
|'a'|open for writing, appending to the end of the file if it exists|
|'b'|binary mode|
|'t'|text mode (default)|
|'+'|open a disk file for updating (reading and writing)|
|'U'|universal newlines mode (deprecated)|
>
>The default mode is 'r' (open for reading text, synonym of 'rt'). For binary read-write access, the mode 'w+b' opens and truncates the file to 0 bytes. 'r+b' opens the file without truncation.
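The difference between `'r+'` and `'w+'` matters when a file is rewritten with shorter content. A small sketch with a throwaway temporary file (the path and contents are made up for illustration):

```python
import os
import tempfile

# Throwaway file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

with open(path, "w") as f:
    f.write("old,longer,content")

# 'r+' keeps the existing bytes: a shorter write leaves stale data behind.
with open(path, "r+") as f:
    f.write("new")
    f.seek(0)
    after_r_plus = f.read()

# 'w+' truncates first, then allows reading back what was written.
with open(path, "w+") as f:
    f.write("new")
    f.seek(0)
    after_w_plus = f.read()

print(repr(after_r_plus))
print(repr(after_w_plus))
```

Note also that `'r+'` raises `FileNotFoundError` when the file does not exist yet, while `'w+'` creates it.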


```python
import csv
from datetime import date

conn = psycopg2.connect("dbname=dq user=dq")
cur = conn.cursor()
# 'w+' creates or truncates the file, then lets us read back what COPY wrote
with open('old_ign_reviews.csv', 'w+') as f:
    cur.copy_expert('COPY old_ign_reviews TO STDOUT WITH CSV HEADER', f)
    f.seek(0)  # rewind to the start of the exported data
    next(f)  # skip header
    reader = csv.reader(f)
    for row in reader:
        updated_row = row[:8]
        # fields 8-10 are passed to date() as year, month, day
        updated_row.append(date(int(row[8]), int(row[9]), int(row[10])))
        cur.execute("INSERT INTO ign_reviews VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)", updated_row)
    conn.commit()
```

> Referring to the last method:
> This approach is great for tables that contain fewer than a million rows, but as the size of the table increases, it becomes unlikely that this approach would work.
>
> DataQuest
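The row transformation from exercise 6 can be pulled out into a small function and checked without a database. This is a minimal sketch: `merge_date` is a hypothetical helper name, the sample export is made up, and it assumes the last three fields arrive as year, month, day, matching the order passed to `date()` in exercise 6.

```python
import csv
import io
from datetime import date


def merge_date(row):
    """Return the first eight fields plus one date built from fields 8-10.

    Assumes fields 8, 9 and 10 hold year, month and day, in that order.
    """
    year, month, day = (int(part) for part in row[8:11])
    return row[:8] + [date(year, month, day)]


# Made-up export with eleven columns; the real file comes from COPY ... TO.
SAMPLE = (
    "id,score_phrase,title,url,platform,score,genre,editors_choice,"
    "release_year,release_month,release_day\n"
    "1,Amazing,Example Game,/games/example,PC,9.0,Platformer,Y,2012,9,12\n"
)

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip header
converted = [merge_date(row) for row in reader]

print(converted[0][-1])
```

Each converted row has nine fields, so it lines up with the nine-placeholder `INSERT` used earlier.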

<b>7.  </b>Instructions:
- Use the provided `cur` variable.
- Insert rows into the `ign_reviews` table using the `INSERT` with `SELECT` from the `old_ign_reviews` table.
- Commit your changes.

```python
conn = psycopg2.connect("dbname=dq user=dq")
cur = conn.cursor()

cur.execute("""
INSERT INTO ign_reviews (
    id, score_phrase, title, url, platform,
    score, genre, editors_choice, release_date
)
SELECT
    id, score_phrase, title_of_game_review, url, platform,
    score, genre, editors_choice,
    to_date(release_day || '-' || release_month || '-' || release_year, 'DD-MM-YYYY') AS release_date
FROM old_ign_reviews
""")
conn.commit()
```
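The same `INSERT ... SELECT` pattern can be tried without a Postgres server. The sketch below swaps in Python's built-in `sqlite3` purely as a stand-in, with hypothetical, cut-down tables (`old_reviews`, `reviews`) and made-up rows. SQLite has no `to_date()`, so `printf()` builds an ISO date string instead; against Postgres the query above is the one to use.

```python
import sqlite3

# sqlite3 is only a stand-in so this sketch runs without a Postgres server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, cut-down versions of the old and new tables.
cur.execute(
    "CREATE TABLE old_reviews ("
    "id INTEGER, title TEXT, "
    "release_day INTEGER, release_month INTEGER, release_year INTEGER)"
)
cur.execute("CREATE TABLE reviews (id INTEGER, title TEXT, release_date TEXT)")
cur.executemany(
    "INSERT INTO old_reviews VALUES (?, ?, ?, ?, ?)",
    [(1, "Example Game", 12, 9, 2012), (2, "Example Sequel", 5, 1, 2013)],
)

# The copy and the date merge happen in one statement inside the database.
# printf() replaces Postgres's to_date(), which SQLite does not provide.
cur.execute("""
    INSERT INTO reviews (id, title, release_date)
    SELECT id, title,
           printf('%04d-%02d-%02d', release_year, release_month, release_day)
    FROM old_reviews
""")
conn.commit()

result = cur.execute("SELECT * FROM reviews ORDER BY id").fetchall()
print(result)
```

No rows travel through Python in the copy step, which is what lets this approach scale past the file-based round trip in exercise 6.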