### Optimizing Postgres Databases: Using an Index
#### These are exercises done as part of <a href = "www.dataquest.io"> DataQuest</a>'s Data Engineer Path
This is not replicated for commercial use; strictly personal development.<br>
All exercises are (c) DataQuest, with slight modifications so they use my PostGres server on my localhost

> In this mission, we will follow up on the join query, and work through strategies to make it more efficient. To begin, we will learn about different query scans a `SELECT` performs. Next, we will introduce the concept of an index, and how indexes are used to speed up common queries. 
>
>An index creates a b-tree structure on a column, separate from the table, which allows filtered queries to perform binary search.
>
> Using an index, we will show that we can speed up queries to run in $Olog(n)$ complexity from $O(n)$ We will both prove it theoretically, and then using `EXPLAIN`, show how query speeds will decrease as a result of adding the index. Finally, we will finish by demonstrating the positive effect an index can have on joins.
>
>DataQuest

#### Using an Index
<b>1.</b> Instructions:
- Use the provided `cur` object.
- Run the `EXPLAIN` command for a `SELECT` all query on the `homeless_by_coc` table filtering by `id`=10.
- Format the `EXPLAIN` query with json output.
- Call `.fetchall()` and pretty print the output.

In [1]:
import psycopg2
import pprint as pp

conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()

cur.execute("EXPLAIN (FORMAT json) SELECT * FROM vbstatic WHERE index = 5")
pp.pprint(cur.fetchall())

[([{'Plan': {'Alias': 'vbstatic',
             'Index Cond': '(index = 5)',
             'Index Name': 'vbstatic_pkey',
             'Node Type': 'Index Scan',
             'Parallel Aware': False,
             'Plan Rows': 1,
             'Plan Width': 100,
             'Relation Name': 'vbstatic',
             'Scan Direction': 'Forward',
             'Startup Cost': 0.42,
             'Total Cost': 8.44}}],)]


Since we were searching through the primary key, (see `vbstatic_pkey` in the output), Our query knows to stop searching after finding the first record where `index = 5` since in Postgres, all primary key values are unique. Our query does a binary search.<br><br>
>A binary search can help us find an item in a list efficiently if we know the list is ordered. We can check the middle element of the list, compare it to the item we're looking for, and continue narrowing our search in this manner.
>
>DataQuest

<b>2.</b> Instructions:
- Use the provided `cur` object.
- Run the `EXPLAIN` command on a select query from each table that filters on their corresponding primary keys:
    - Format by json.
    - `homeless_by_coc.id` equal to 5 and assign `fetchall()` to the variable `homeless_query_plan`.
    - `state_info.name` equal to Alabama and assign `fetchall()` to the variable `state_query_plan`.
    - `state_household_incomes.state` equal to Georgia and assign `fetchall()` to the variable `incomes_query_plan`.
- For each `query_plan` variable (`homeless_query_plan`, `state_query_plan`, `incomes_query_plan`), pretty print the output.

```python
conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()

cur.execute("EXPLAIN (format json) SELECT * FROM homeless_by_coc WHERE id=10")
homeless_query_plan = cur.fetchall()
pp.pprint(homeless_query_plan)

cur.execute("EXPLAIN (format json) SELECT * FROM state_info WHERE name='Alabama'")
state_query_plan = cur.fetchall()
pp.pprint(state_query_plan)

cur.execute("EXPLAIN (format json) SELECT * FROM state_household_incomes WHERE state='Georgia'")
incomes_query_plan = cur.fetchall()
pp.pprint(incomes_query_plan)
```

`[Output]`

`[([{'Plan': {'Alias': 'homeless_by_coc',
             'Index Cond': '(id = 10)',
             'Index Name': 'homeless_by_coc_pkey',
             'Node Type': 'Index Scan',
             'Plan Rows': 1,
             'Plan Width': 480,
             'Relation Name': 'homeless_by_coc',
             'Scan Direction': 'Forward',
             'Startup Cost': 0.29,
             'Total Cost': 8.3}}],)]
[([{'Plan': {'Alias': 'state_info',
             'Index Cond': "((name)::text = 'Alabama'::text)",
             'Index Name': 'state_info_pkey',
             'Node Type': 'Index Scan',
             'Plan Rows': 1,
             'Plan Width': 132,
             'Relation Name': 'state_info',
             'Scan Direction': 'Forward',
             'Startup Cost': 0.15,
             'Total Cost': 8.17}}],)]
[([{'Plan': {'Alias': 'state_household_incomes',
             'Index Cond': "((state)::text = 'Georgia'::text)",
             'Index Name': 'state_household_incomes_pkey',
             'Node Type': 'Index Scan',
             'Plan Rows': 1,
             'Plan Width': 318,
             'Relation Name': 'state_household_incomes',
             'Scan Direction': 'Forward',
             'Startup Cost': 0.14,
             'Total Cost': 8.16}}],)]`

If we were searching on anything besides each of these databases primary keys, it would take way longer! Because we are using primary keys, it is able to search it with $Olog(n)$ time.

>Let's create a separate table that's optimized for lookups by a different column than `id` from the `homeless_by_coc` table. First, we assign the column we want to query part of the primary key, so we get the speed benefits, and add the next part of the primary key as the `id` value from the `homeless_by_coc`. We call this table an index and each row in the index contains:
>
>- the value we want to be able to search by,
>- an `id` value for the corresponding row in `homeless_by_coc`,
>- assign both as composite primary keys for the table.
>
>DataQuest

<font color = 'blue'> This sounds similar to what I want to do with `vbstatic` data, I want a station ID as a possible index. <strike>with the datetime of record.</strike> So let's try doing this with `vbstatic`</font>

<b>3.</b> Instructions:
- Use the provided `cur` object.
- Create a table, `state_idx`, that contains the columns `state` and `homeless_id`.
    - Create a composite primary key containing both `state` and `homeless_id`.
    - Insert into `state_idx` the columns `state` and `id` from `homeless_by_coc`.
- Select `state`, `year`, and `coc_number` from `homeless_by_coc` by joining with the `state_idx` id.
    - Filter by `CA` state on `state_idx`.
- Call `fetchall()` and pretty print the results.

In [2]:
import pandas as pd
conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()
cur.execute("SELECT DISTINCT name from vbstatic")
names = pd.DataFrame(cur.fetchall())

In [3]:
names.columns = ['name']

In [4]:
names['stationid'] = ["%03d" % (x) for x in list(range(1,306))]

<font color = 'blue'>In case you run this a few times, like I did, here is the drop table:</font>

In [8]:
conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()
cur.execute("DROP TABLE stations;")
conn.commit()

In [9]:
import sqlalchemy
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://nmolivo:MYPASSWORD@localhost/valenbisi2018')
names.to_sql('stations', engine, dtype = {'name': sqlalchemy.types.CHAR(length=55), \
                                         'stationid':sqlalchemy.types.CHAR(length=3)})

In [48]:
conn = psycopg2.connect(dbname="valenbisi2018", user = "nmolivo")
cur = conn.cursor()
cur.execute("DROP TABLE vbstatic2;")
conn.commit()

In [49]:
conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()
cur.execute("SELECT vbstatic.index, vbstatic.update, vbstatic.free, vbstatic.available, vbstatic.total, vbstatic.lat, vbstatic.long, stations.stationid, stations.name\
             INTO vbstatic2\
             FROM vbstatic\
             FULL OUTER JOIN stations\
             ON stations.name = vbstatic.name")
conn.commit()

<font color = 'blue'>I'm having trouble making a composite primary key in `vbstatic2` because there are duplicate update and station id records! I will first check where duplicates are and then remove them.</font>

In [50]:
conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()
cur.execute("SELECT * from vbstatic2")
data = pd.DataFrame(cur.fetchall())

In [51]:
data.columns = ['index', 'update', 'free', 'available', 'total', 'lat', 'long', 'stationid', 'name']

In [52]:
station271 = data[data['stationid'] =='271']

In [53]:
station271

Unnamed: 0,index,update,free,available,total,lat,long,stationid,name
111,111,2018-02-20 05:27:07,0,0,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
384,384,2018-02-20 05:27:07,0,0,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
657,657,2018-02-20 05:41:52,0,0,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
930,930,2018-02-20 05:56:30,0,0,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
1139,1139,2018-02-20 06:11:17,0,0,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
1431,1431,2018-02-20 06:26:48,0,0,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
1848,1848,2018-02-20 06:42:17,0,0,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
1951,1951,2018-02-20 06:56:44,15,0,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
2189,2189,2018-02-20 07:12:19,8,7,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...
2463,2463,2018-02-20 07:26:46,8,7,15,39.48154732,-0.39839384,271,145_PLAZA_BADAJOZ ...


In [58]:
conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()
cur.execute("SELECT update, COUNT( stationid ) FROM vbstatic2 GROUP BY update HAVING COUNT (stationid)>1 ORDER BY update")
dupcheck = pd.DataFrame(cur.fetchall())

In [59]:
dupcheck

Unnamed: 0,0,1
0,2018-02-20 05:27:07,273
1,2018-02-20 05:41:52,273
2,2018-02-20 05:56:30,273
3,2018-02-20 06:11:17,268
4,2018-02-20 06:12:18,5
5,2018-02-20 06:26:48,273
6,2018-02-20 06:42:17,273
7,2018-02-20 06:56:44,273
8,2018-02-20 07:12:19,273
9,2018-02-20 07:26:46,273


<font color ='blue'>Interesting, we have some datapoints where we didn't collect data from all stations.</font>

In [8]:
conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()

query = """
DELETE FROM vbstatic2 
WHERE index IN (SELECT index
                 FROM (SELECT index, ROW_NUMBER() OVER (PARTITION BY stationid, update ORDER BY index) AS rnum 
                       FROM vbstatic2) t 
                 WHERE t.rnum >1);
"""

#read this query from the inside, out: What are the things we count as duplicates and lets groupby those things
# Start with the groupby aka partition by:
    # Station ID and Update
    # Order by index - We created index a unique identifyer, it's just an ordered number by collection or 'update' time. 
# Now that we've grouped dups, select index and the new variable that we've created on the fly, called ROW_NUMBER
    # ROWNUMBER counts the number of records stored for a particular stationid/update combination as we iterate through. 
    # Any stationid/update with a ROWNUMBER >1 will be a dup.
    # We declare this as a table t and ROW NUMBER variable to be rnum
    # Select all where Rownumber is >1, and delete it.
    
cur.execute(query) 
conn.commit()

<font color = 'blue'>Now we can perform on `valenbisi2` what the instructions for the dataquest mission loosely instruct.</font>

In [7]:
conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()

cur.execute("CREATE TABLE station_update_idx (update TIMESTAMP, stationid CHAR(3), PRIMARY KEY (update, stationid))")
cur.execute("INSERT INTO station_update_idx SELECT update, stationid FROM vbstatic2")
conn.commit()

In [9]:
conn = psycopg2.connect(dbname="valenbisi2018", user="nmolivo")
cur = conn.cursor()

cur.execute("SELECT vbstatic2.available, vbstatic2.free FROM vbstatic2, station_update_idx idx\
             WHERE idx.update = '2018-02-20 07:12:19' AND idx.stationid = vbstatic2.stationid")
pp.pprint(cur.fetchall())

[(0, 24),
 (3, 17),
 (14, 0),
 (0, 16),
 (2, 18),
 (5, 11),
 (6, 19),
 (9, 6),
 (7, 12),
 (0, 20),
 (2, 15),
 (2, 23),
 (12, 8),
 (17, 6),
 (14, 6),
 (12, 7),
 (18, 2),
 (17, 8),
 (16, 4),
 (11, 4),
 (1, 19),
 (15, 0),
 (12, 3),
 (8, 7),
 (11, 9),
 (0, 30),
 (7, 11),
 (7, 8),
 (2, 18),
 (15, 0),
 (20, 0),
 (18, 2),
 (20, 5),
 (19, 0),
 (19, 0),
 (6, 9),
 (6, 14),
 (0, 20),
 (2, 23),
 (1, 19),
 (22, 3),
 (20, 0),
 (10, 5),
 (4, 20),
 (14, 0),
 (12, 8),
 (11, 10),
 (14, 0),
 (3, 12),
 (16, 4),
 (17, 0),
 (17, 3),
 (13, 1),
 (3, 12),
 (1, 21),
 (1, 24),
 (0, 15),
 (10, 8),
 (2, 25),
 (1, 18),
 (1, 23),
 (2, 18),
 (2, 38),
 (1, 23),
 (11, 9),
 (7, 8),
 (11, 4),
 (6, 14),
 (6, 8),
 (15, 0),
 (17, 0),
 (20, 0),
 (15, 0),
 (18, 1),
 (17, 3),
 (0, 25),
 (1, 19),
 (1, 19),
 (9, 11),
 (7, 8),
 (4, 16),
 (10, 5),
 (17, 3),
 (0, 20),
 (1, 19),
 (1, 24),
 (22, 18),
 (1, 19),
 (2, 18),
 (0, 30),
 (3, 11),
 (0, 40),
 (4, 11),
 (3, 17),
 (34, 4),
 (10, 9),
 (0, 36),
 (9, 12),
 (22, 1),
 (14, 1),
 (16,

<font color = 'blue'>Ok, here's the DQ task:</font>

```python
conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()

cur.execute("CREATE TABLE state_idx (state CHAR(2), homeless_id INT, PRIMARY KEY (state, homeless_id))")
cur.execute("INSERT INTO state_idx SELECT state, id FROM homeless_by_coc")
conn.commit()
cur.execute("SELECT hbc.state, hbc.year, hbc.coc_number FROM homeless_by_coc hbc, state_idx WHERE state_idx.state = 'CA' AND state_idx.homeless_id=hbc.id")
pp.pprint(cur.fetchall()) 
```

<b>4.</b> Instructions:
- Use the provided `cur` object.
- Run the `EXPLAIN ANALYZE` on the query you built in the last screen.
    - Format the ouptut with the json type.
- Call `.fetchall()` and pretty print the output.
- Run the `EXPLAIN ANALYZE` on a query that returns the columns `id`, `year`, and `coc_number`, and filters `state` equal to `CA` on the `homeless_by_coc` table.
    - Format the ouptut with the `json` type.
- Call `.fetchall()` and pretty print the output.   

In [12]:
conn = psycopg2.connect(dbname = 'valenbisi2018', user = 'nmolivo')
cur = conn.cursor()
cur.execute("""
            EXPLAIN (ANALYZE, format json) SELECT vbstatic2.available, vbstatic2.free FROM vbstatic2,\
                                                                                      station_update_idx idx\
            WHERE idx.update = '2018-02-20 07:12:19' AND idx.stationid = vbstatic2.stationid
            """)
pp.pprint(cur.fetchall())

[([{'Execution Time': 238.214,
    'Plan': {'Actual Loops': 1,
             'Actual Rows': 272968,
             'Actual Startup Time': 0.404,
             'Actual Total Time': 223.079,
             'Hash Cond': '(vbstatic2.stationid = idx.stationid)',
             'Inner Unique': True,
             'Join Type': 'Inner',
             'Node Type': 'Hash Join',
             'Parallel Aware': False,
             'Plan Rows': 258270,
             'Plan Width': 8,
             'Plans': [{'Actual Loops': 1,
                        'Actual Rows': 273000,
                        'Actual Startup Time': 0.025,
                        'Actual Total Time': 65.008,
                        'Alias': 'vbstatic2',
                        'Node Type': 'Seq Scan',
                        'Parallel Aware': False,
                        'Parent Relationship': 'Outer',
                        'Plan Rows': 273000,
                        'Plan Width': 12,
                        'Relation Name': 'vbstatic2',

<font color = 'blue'>Answer to the DQ mission:</font>

```python
conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()
cur.execute("""
SELECT hbc.id, hbc.year, hbc.coc_number FROM homeless_by_coc hbc, state_idx
WHERE state_idx.state = 'CA' AND state_idx.homeless_id = hbc.id
""")
```
```python
conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()
cur.execute("""
EXPLAIN (ANALYZE, format json) SELECT hbc.id, hbc.year, hbc.coc_number FROM homeless_by_coc hbc, state_idx
WHERE state_idx.state = 'CA' AND state_idx.homeless_id = hbc.id
""")
pp.pprint(cur.fetchall())
```
```python
cur.execute("""
EXPLAIN (ANALYZE, format json) SELECT id, year, coc_number FROM homeless_by_coc WHERE state='CA'
""")
pp.pprint(cur.fetchall())
```

```[Output]
[([{'Execution Time': 183.424,
    'Plan': {'Actual Loops': 1,
             'Actual Rows': 8946,
             'Actual Startup Time': 115.559,
             'Actual Total Time': 180.863,
             'Hash Cond': '(state_idx.homeless_id = hbc.id)',
             'Join Type': 'Inner',
             'Node Type': 'Hash Join',
             'Plan Rows': 339,
             'Plan Width': 44,
             'Plans': [{'Actual Loops': 1,
                        'Actual Rows': 8946,
                        'Actual Startup Time': 0.812,
                        'Actual Total Time': 3.877,
                        'Alias': 'state_idx',
                        'Exact Heap Blocks': 50,
                        'Lossy Heap Blocks': 0,
                        'Node Type': 'Bitmap Heap Scan',
                        'Parent Relationship': 'Outer',
                        'Plan Rows': 339,
                        'Plan Width': 4,
                        'Plans': [{'Actual Loops': 1,
                                   'Actual Rows': 8946,
                                   'Actual Startup Time': 0.798,
                                   'Actual Total Time': 0.798,
                                   'Index Cond': '(state = '
                                                 "'CA'::bpchar)",
                                   'Index Name': 'state_idx_pkey',
                                   'Node Type': 'Bitmap Index Scan',
                                   'Parent Relationship': 'Outer',
                                   'Plan Rows': 339,
                                   'Plan Width': 0,
                                   'Startup Cost': 0.0,
                                   'Total Cost': 14.96}],
                        'Recheck Cond': "(state = 'CA'::bpchar)",
                        'Relation Name': 'state_idx',
                        'Rows Removed by Index Recheck': 0,
                        'Startup Cost': 15.04,
                        'Total Cost': 407.05},
                       {'Actual Loops': 1,
                        'Actual Rows': 86529,
                        'Actual Startup Time': 114.712,
                        'Actual Total Time': 114.712,
                        'Hash Batches': 2,
                        'Hash Buckets': 4096,
                        'Node Type': 'Hash',
                        'Original Hash Batches': 1,
                        'Parent Relationship': 'Inner',
                        'Peak Memory Usage': 4097,
                        'Plan Rows': 20512,
                        'Plan Width': 44,
                        'Plans': [{'Actual Loops': 1,
                                   'Actual Rows': 86529,
                                   'Actual Startup Time': 0.005,
                                   'Actual Total Time': 71.008,
                                   'Alias': 'hbc',
                                   'Node Type': 'Seq Scan',
                                   'Parent Relationship': 'Outer',
                                   'Plan Rows': 20512,
                                   'Plan Width': 44,
                                   'Relation Name': 'homeless_by_coc',
                                   'Startup Cost': 0.0,
                                   'Total Cost': 1487.12}],
                        'Startup Cost': 1487.12,
                        'Total Cost': 1487.12}],
             'Startup Cost': 1758.56,
             'Total Cost': 2156.92},
    'Planning Time': 0.415,
    'Triggers': []}],)]
[([{'Execution Time': 15.981,
    'Plan': {'Actual Loops': 1,
             'Actual Rows': 8946,
             'Actual Startup Time': 0.04,
             'Actual Total Time': 13.522,
             'Alias': 'homeless_by_coc',
             'Filter': "(state = 'CA'::bpchar)",
             'Node Type': 'Seq Scan',
             'Plan Rows': 103,
             'Plan Width': 44,
             'Relation Name': 'homeless_by_coc',
             'Rows Removed by Filter': 77583,
             'Startup Cost': 0.0,
             'Total Cost': 1538.4},
    'Planning Time': 0.044,
    'Triggers': []}],)]```

<b>5.</b> Instructions:
- Use the provided `cur` and `conn` object.
- Create an index on state for the `homeless_by_coc` table.
    - Commit your changes.
- Run `EXPLAIN ANALYZE` on a select all from `homeless_by_coc` and filter by `CA` on the indexed `state` column.
    - Format the output with `json`.
- Call `.fetchall()` and pretty print the output.

>By letting Postgres maintain the indexes, we know that they will remain up to date as rows are added to the table. In addition, Postgres will automatically take advantages of indexes whenever possible, so we can focus on writing queries. This occurs during the planning/optimization stage, which is why we can see it in the EXPLAIN query.
>
>While creating indexes gives us tremendous speed benefits, they come at the cost of space. Each index needs to be stored in the database file. In addition, adding, editing, and deleting rows takes longer since each of the affected indexes need to be updated. Because indexes can be created after a table is created, it's recommended to only create an index when you find yourself querying on a specific column frequently.
>
>DataQuest

```python
conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()

cur.execute("DROP INDEX state_idx")
conn.commit()
cur.execute("CREATE INDEX state_idx ON homeless_by_coc(state)")
conn.commit()
cur.execute("EXPLAIN (ANALYZE, format json) SELECT * FROM homeless_by_coc where state = 'CA'")
pp.pprint(cur.fetchall())
```

```
[Output]
[([{'Execution Time': 2.657,
    'Plan': {'Actual Loops': 1,
             'Actual Rows': 8946,
             'Actual Startup Time': 1.0,
             'Actual Total Time': 2.284,
             'Alias': 'homeless_by_coc',
             'Exact Heap Blocks': 142,
             'Lossy Heap Blocks': 0,
             'Node Type': 'Bitmap Heap Scan',
             'Plan Rows': 8875,
             'Plan Width': 88,
             'Plans': [{'Actual Loops': 1,
                        'Actual Rows': 8946,
                        'Actual Startup Time': 0.981,
                        'Actual Total Time': 0.981,
                        'Index Cond': "(state = 'CA'::bpchar)",
                        'Index Name': 'state_idx',
                        'Node Type': 'Bitmap Index Scan',
                        'Parent Relationship': 'Outer',
                        'Plan Rows': 8875,
                        'Plan Width': 0,
                        'Startup Cost': 0.0,
                        'Total Cost': 166.85}],
             'Recheck Cond': "(state = 'CA'::bpchar)",
             'Relation Name': 'homeless_by_coc',
             'Rows Removed by Index Recheck': 0,
             'Startup Cost': 169.07,
             'Total Cost': 1562.01},
    'Planning Time': 0.438,
    'Triggers': []}],)]
```

<b>6. </b> Instructions:
- Use the provided `cur` and `conn` objects.
- Proceeding the `EXPLAIN ANALYZE` command's `fetchall()`, drop the index on the `homeless_by_coc` table.
    - Commit your changes.
- Re-run `EXPLAIN ANALYZE` on a select all from `homeless_by_coc` and filter by `CA` on the indexed `state` column.
    - Format the output with `json`.
- Call `.fetchall()` and pretty print the output.

```python
conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()
#cur.execute("CREATE INDEX state_idx ON homeless_by_coc(state)")
#conn.commit()
#cur.execute("EXPLAIN (ANALYZE, format json) SELECT * FROM homeless_by_coc WHERE state='CA'")
#pp.pprint(cur.fetchall())
cur.execute("DROP INDEX IF EXISTS state_idx")
conn.commit()
cur.execute("EXPLAIN (ANALYZE, format json) SELECT * FROM homeless_by_coc WHERE state = 'CA'")
pp.pprint(cur.fetchall())
```

```
[Output]
[([{'Execution Time': 17.527,
    'Plan': {'Actual Loops': 1,
             'Actual Rows': 8946,
             'Actual Startup Time': 0.063,
             'Actual Total Time': 17.125,
             'Alias': 'homeless_by_coc',
             'Filter': "(state = 'CA'::bpchar)",
             'Node Type': 'Seq Scan',
             'Plan Rows': 103,
             'Plan Width': 480,
             'Relation Name': 'homeless_by_coc',
             'Rows Removed by Filter': 77583,
             'Startup Cost': 0.0,
             'Total Cost': 1538.4},
    'Planning Time': 14.656,
    'Triggers': []}],)]
```

<b>7. </b> Instructions:
- Use the provided `cur` and `conn` objects.
- Create and drop the index for `state` on `homeless_by_coc` to test the benchmark.
    - Run `EXPLAIN ANALYZE` on the given join query for `homeless_by_coc` before and after the drop.
    - Call `.fetchall()` to return the output.
- Pretty print the output from `fechall()`.

```python
conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()
#cur.execute("CREATE INDEX state_idx ON homeless_by_coc(state)")
#conn.commit()
query = "EXPLAIN (ANALYZE, format json) SELECT hbc.state, hbc.coc_number, hbc.coc_name, si.name FROM homeless_by_coc as hbc, state_info as si WHERE hbc.state = si.postal"

cur.execute(query)
pp.pprint(cur.fetchall())
```

```
[Output]
[([{'Execution Time': 37.718,
    'Plan': {'Actual Loops': 1,
             'Actual Rows': 85449,
             'Actual Startup Time': 0.062,
             'Actual Total Time': 34.366,
             'Hash Cond': '(hbc.state = si.postal)',
             'Join Type': 'Inner',
             'Node Type': 'Hash Join',
             'Plan Rows': 216322,
             'Plan Width': 89,
             'Plans': [{'Actual Loops': 1,
                        'Actual Rows': 86529,
                        'Actual Startup Time': 0.014,
                        'Actual Total Time': 10.275,
                        'Alias': 'hbc',
                        'Node Type': 'Seq Scan',
                        'Parent Relationship': 'Outer',
                        'Plan Rows': 86529,
                        'Plan Width': 43,
                        'Relation Name': 'homeless_by_coc',
                        'Startup Cost': 0.0,
                        'Total Cost': 2147.29},
                       {'Actual Loops': 1,
                        'Actual Rows': 50,
                        'Actual Startup Time': 0.025,
                        'Actual Total Time': 0.025,
                        'Hash Batches': 1,
                        'Hash Buckets': 1024,
                        'Node Type': 'Hash',
                        'Original Hash Batches': 1,
                        'Parent Relationship': 'Inner',
                        'Peak Memory Usage': 3,
                        'Plan Rows': 500,
                        'Plan Width': 58,
                        'Plans': [{'Actual Loops': 1,
                                   'Actual Rows': 50,
                                   'Actual Startup Time': 0.007,
                                   'Actual Total Time': 0.01,
                                   'Alias': 'si',
                                   'Node Type': 'Seq Scan',
                                   'Parent Relationship': 'Outer',
                                   'Plan Rows': 500,
                                   'Plan Width': 58,
                                   'Relation Name': 'state_info',
                                   'Startup Cost': 0.0,
                                   'Total Cost': 15.0}],
                        'Startup Cost': 15.0,
                        'Total Cost': 15.0}],
             'Startup Cost': 21.25,
             'Total Cost': 9956.15},
    'Planning Time': 0.505,
    'Triggers': []}],)]
```