### Optimizing Postgres Databases: Advanced Indexing
#### These are exercises done as part of <a href = "www.dataquest.io"> DataQuest</a>'s Data Engineer Path
This is not replicated for commercial use; strictly personal development.<br>
All exercises are (c) DataQuest, with slight modifications so they use the Postgres server on my localhost.

>In this mission, we will be expanding on the concept of indexing, and we will dive into Postgres' advanced indexing features. The features we will investigate are multiple column indexes, different types of indexes, and partial indexes. You can think of these concepts as adding options to an index.
>
>DataQuest

<font color = 'blue'>Remember: an Index Scan is usually more efficient than a Sequential Scan when the query selects only a small fraction of a table's rows.</font>

#### Advanced Indexing
<b>1.</b> Instructions:
- Use the provided `cur` and `conn` objects.
- Create an index on `state` for the `homeless_by_coc` table.
    - Commit your changes.
- Run `EXPLAIN` on a select all from `homeless_by_coc`.
    - Filter by `CA` on the indexed `state` column.
    - Filter years greater than `1991-01-01` on the non-indexed `year` column.
    - Format the output with `json`.
- Call `.fetchall()` and pretty print the output.

```python
import pprint as pp

import psycopg2

conn = psycopg2.connect(dbname="dq", user="hud_admin", password="abc123")
cur = conn.cursor()

# Drop the index first in case it was created on an earlier run.
cur.execute("DROP INDEX IF EXISTS state_idx")
conn.commit()

cur.execute("CREATE INDEX state_idx ON homeless_by_coc(state)")
conn.commit()

# Ask the planner how it would run the query, as JSON.
cur.execute("EXPLAIN (format json) SELECT * FROM homeless_by_coc WHERE state = 'CA' AND year > '1991-01-01'")
pp.pprint(cur.fetchall())
```

```
[Output]
[([{'Plan': {'Alias': 'homeless_by_coc',
             'Filter': "(year > '1991-01-01'::date)",
             'Node Type': 'Bitmap Heap Scan',
             'Plan Rows': 9137,
             'Plan Width': 88,
             'Plans': [{'Index Cond': "(state = 'CA'::bpchar)",
                        'Index Name': 'state_idx',
                        'Node Type': 'Bitmap Index Scan',
                        'Parent Relationship': 'Outer',
                        'Plan Rows': 9137,
                        'Plan Width': 0,
                        'Startup Cost': 0.0,
                        'Total Cost': 172.82}],
             'Recheck Cond': "(state = 'CA'::bpchar)",
             'Relation Name': 'homeless_by_coc',
             'Startup Cost': 175.1,
             'Total Cost': 1594.16}}],)]
```

>A `Bitmap Heap Scan` occurs when Postgres encounters two, or more, columns that contain an index. Our heap scan follows these steps:
>
>1. Run through the indexed column, state, and select all the rows that match CA. This is the `Bitmap Index Scan`.
>2. Create a `Bitmap Heap` that is used as the temporary index.
>3. Scan through the `Bitmap Heap`, and select all rows that have a year value greater than 1991-01-01. This is the `Bitmap Heap Scan`.
>4. Return the results.
>
>This type of scan is more efficient than a pure Seq Scan, because the number of filtered rows in an index will always be less than or equal to the number of rows in the full table. Unfortunately, each filtered row must be sequentially searched again to find values that match the second filter (e.g., year greater than 1991).
>
>We can eliminate the second sequential scan by adding an additional index to another column in our table. This type of index is called a multi-column index. If you commonly run queries that filter two columns, then using a multi-column index can speed up your query times.
>
>DataQuest
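
To relate the steps above to the `EXPLAIN` output, it can help to walk the JSON plan and list its nodes. The following is a minimal sketch: `node_types` is a hypothetical helper (not part of psycopg2 or the exercise), and `result` is a trimmed copy of the output shown earlier. In that plan, `state` appears as the `Index Cond` of the `Bitmap Index Scan`, while the `year` condition appears as a `Filter` on the `Bitmap Heap Scan`.

```python
def node_types(plan):
    """Return the node types in a plan tree, parent before children."""
    types = [plan["Node Type"]]
    for child in plan.get("Plans", []):
        types.extend(node_types(child))
    return types


# Trimmed copy of the EXPLAIN (format json) result shown above,
# in the same shape that cur.fetchall() returned.
result = [([{"Plan": {
    "Node Type": "Bitmap Heap Scan",
    "Filter": "(year > '1991-01-01'::date)",
    "Recheck Cond": "(state = 'CA'::bpchar)",
    "Plans": [{
        "Node Type": "Bitmap Index Scan",
        "Index Name": "state_idx",
        "Index Cond": "(state = 'CA'::bpchar)",
    }],
}}],)]

# One row, one column, one JSON document with a top-level "Plan" key.
plan = result[0][0][0]["Plan"]
print(node_types(plan))  # ['Bitmap Heap Scan', 'Bitmap Index Scan']
```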

<b>2. </b> Instructions:
- Use the provided `cur` and `conn` objects.
- Create and drop a single column index for `state` on `homeless_by_coc` to test the benchmark.
    - Run `EXPLAIN ANALYZE` on a select all from `homeless_by_coc`.
    - Filter by CA on the indexed `state` column.
    - Filter years greater than `1991-01-01` on the non-indexed year column.
    - Format the output with `json`.
    - Call `fetchall()` and pretty print the output.
- Create a multi-column index on state and year on `homeless_by_coc` and run the same `EXPLAIN ANALYZE`.
    - pretty print the output from `fetchall()`.
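
One possible way to work through this exercise is sketched below. The helper names `explain_with_index` and `run_benchmark`, and the index name `state_year_idx`, are my own assumptions rather than anything the exercise prescribes. The functions take the provided `cur` and `conn` as arguments so that nothing connects to the database until they are called.

```python
import pprint as pp

EXPLAIN_QUERY = (
    "EXPLAIN (ANALYZE, FORMAT json) SELECT * FROM homeless_by_coc "
    "WHERE state = 'CA' AND year > '1991-01-01'"
)


def explain_with_index(cur, conn, index_name, columns, drop_after=False):
    """Create an index, run EXPLAIN ANALYZE, and optionally drop the index."""
    # Start clean in case the index is left over from an earlier run.
    cur.execute(f"DROP INDEX IF EXISTS {index_name}")
    cur.execute(f"CREATE INDEX {index_name} ON homeless_by_coc({columns})")
    conn.commit()

    cur.execute(EXPLAIN_QUERY)
    plan = cur.fetchall()

    if drop_after:
        cur.execute(f"DROP INDEX {index_name}")
        conn.commit()
    return plan


def run_benchmark(cur, conn):
    # Baseline: single-column index, dropped once it has been measured.
    pp.pprint(explain_with_index(cur, conn, "state_idx", "state", drop_after=True))
    # Multi-column index on state and year, same query.
    pp.pprint(explain_with_index(cur, conn, "state_year_idx", "state, year"))
```

Calling `run_benchmark(cur, conn)` prints both plans, so the timing figures that `ANALYZE` adds can be compared side by side.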

<b>3.</b> Instructions:
- Use the provided `cur` and `conn` objects.
- Create a multi-column index on `state`, `year`, and `coc_number` on `homeless_by_coc`.
    - Use the convention of naming your index by `snake_casing` the columns in order.
- Commit the index with the `conn` object.
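
The naming convention can be sketched as a small helper. This is only an illustration: `index_name_for` and `create_multi_column_index` are hypothetical names, and the `_idx` suffix is an assumption carried over from the `state_idx` index used earlier.

```python
def index_name_for(columns):
    """Join the column names in order and append an `_idx` suffix."""
    return "_".join(columns) + "_idx"


def create_multi_column_index(cur, conn, table, columns):
    """Create one index covering `columns` in the given order, then commit."""
    name = index_name_for(columns)
    cur.execute(f"CREATE INDEX {name} ON {table}({', '.join(columns)})")
    conn.commit()
    return name


print(index_name_for(["state", "year", "coc_number"]))
# state_year_coc_number_idx
```

With the provided objects, the call would be `create_multi_column_index(cur, conn, "homeless_by_coc", ["state", "year", "coc_number"])`.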

<b>4. </b>Instructions:
- Use the provided `cur` and `conn` objects.
- Run a copy statement that loads the `homeless_by_coc.csv` file into the `homeless_by_coc` table.
    - Enclose the `COPY` by a start and end time, then print the `end_time`.
- Delete all the rows in the `homeless_by_coc` table.
- Create a double column index on `state`, `year` for `homeless_by_coc`.
- Run another copy statement that loads the `homeless_by_coc.csv` file into the `homeless_by_coc` table.
    - Enclose the `COPY` by a start and end time, then print the `end_time`.
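
A rough sketch of the timing comparison follows, using psycopg2's `copy_expert`. Several details are assumptions on my part: that the CSV file has a header row, that printing the elapsed seconds is what "print the `end_time`" is after, and the names `timed_copy`, `compare_copy_times`, and `state_year_idx`.

```python
import time


def timed_copy(cur, conn, csv_path):
    """Load csv_path into homeless_by_coc and return the elapsed seconds."""
    start_time = time.time()
    with open(csv_path) as f:
        # Assumption: the CSV file starts with a header row.
        cur.copy_expert("COPY homeless_by_coc FROM STDIN WITH CSV HEADER", f)
    conn.commit()
    return time.time() - start_time


def compare_copy_times(cur, conn, csv_path="homeless_by_coc.csv"):
    # First load: no multi-column index on the table yet.
    print(timed_copy(cur, conn, csv_path))

    # Empty the table, then add the index before loading again.
    cur.execute("DELETE FROM homeless_by_coc")
    cur.execute("CREATE INDEX state_year_idx ON homeless_by_coc(state, year)")
    conn.commit()

    # Second load: every inserted row now has to update the index as well.
    print(timed_copy(cur, conn, csv_path))
```

Running `compare_copy_times(cur, conn)` prints two durations; the second is expected to be the larger one, since the index has to be maintained during the load.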

<b>5. </b>Instructions:
- Use the provided `cur` and `conn` objects.
- Create a double column index on `state`, `year` for `homeless_by_coc`.
    - Add the descending order by option to `year`.
- Commit the index.
- Run a select on `homeless_by_coc`.
    - Select distinct `year`.
    - Filter by CA on the indexed `state` column.
    - Filter years greater than `1991-01-01` on the `year` column, which the index stores in descending order.
- Call `fetchall()` and assign the return value to `ordered_years`
- pretty print `ordered_years`.
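
These steps might look roughly like the sketch below. The function name `distinct_years_desc` and the index name `state_year_idx` are assumptions. Note that the query has no `ORDER BY`, so SQL itself does not guarantee the order of the returned rows; whether they come back in descending order depends on the plan Postgres picks.

```python
def distinct_years_desc(cur, conn):
    """Index (state, year DESC), then fetch the distinct years for CA."""
    # Drop any earlier index of the same name so CREATE INDEX does not fail.
    cur.execute("DROP INDEX IF EXISTS state_year_idx")
    cur.execute(
        "CREATE INDEX state_year_idx ON homeless_by_coc(state, year DESC)"
    )
    conn.commit()

    cur.execute(
        "SELECT DISTINCT year FROM homeless_by_coc "
        "WHERE state = 'CA' AND year > '1991-01-01'"
    )
    return cur.fetchall()
```

With the provided objects: `ordered_years = distinct_years_desc(cur, conn)`, followed by `pp.pprint(ordered_years)`.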

<b>6. </b>Instructions:
- Use the provided `cur` and `conn` objects.
- Create a case-insensitive expression index on `measures` for `homeless_by_coc`.
- Commit the index.
- Run a select all from `homeless_by_coc`.
    - Filter `measures` to rows with `'unsheltered homeless people in families'`.
    - Limit to 1 row.
- Call `fetchone()` and assign the return value to `unsheltered_row`
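
A minimal sketch of an expression index on `LOWER(measures)` is shown below. The names `first_unsheltered_row` and `measures_idx` are hypothetical. For Postgres to consider the index, the query has to filter on the same expression, which is why the `WHERE` clause also uses `LOWER(measures)`.

```python
def first_unsheltered_row(cur, conn):
    """Index LOWER(measures), then fetch one row via a lower-cased match."""
    cur.execute(
        "CREATE INDEX measures_idx ON homeless_by_coc(LOWER(measures))"
    )
    conn.commit()

    # Compare against the same expression that the index was built on.
    cur.execute(
        "SELECT * FROM homeless_by_coc "
        "WHERE LOWER(measures) = 'unsheltered homeless people in families' "
        "LIMIT 1"
    )
    return cur.fetchone()
```

With the provided objects: `unsheltered_row = first_unsheltered_row(cur, conn)`.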

<b>7. </b>Instructions:
- Use the provided `cur` and `conn` objects.
- Create a partial index on `homeless_by_coc`.
    - Index on the `state` column.
    - Restrict the index to rows that have a `count` greater than 0.
- Commit the index.
- Run an `EXPLAIN ANALYZE` on a select all from `homeless_by_coc`.
    - Filter `state` on CA and count greater than 0.
    - Limit to 1 row.
- Call `fetchall()` and pretty print the result. 
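
As a sketch, a partial index adds a `WHERE` clause to `CREATE INDEX`, so that only rows matching the predicate are indexed. The names `explain_partial_index` and `state_count_idx` are assumptions made for this illustration.

```python
def explain_partial_index(cur, conn):
    """Index state only for rows with count > 0, then EXPLAIN a matching query."""
    cur.execute(
        "CREATE INDEX state_count_idx ON homeless_by_coc(state) "
        "WHERE count > 0"
    )
    conn.commit()

    # The query repeats the index predicate (count > 0) in its own filter.
    cur.execute(
        "EXPLAIN ANALYZE SELECT * FROM homeless_by_coc "
        "WHERE state = 'CA' AND count > 0 LIMIT 1"
    )
    return cur.fetchall()
```

With the provided objects: `pp.pprint(explain_partial_index(cur, conn))`.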

<b>8. </b>Instructions:
- Use the provided `cur` and `conn` objects.
- Create a multi-column index that speeds up the following query:
    - `SELECT hbc.year, si.name, hbc.count FROM homeless_by_coc hbc, state_info si WHERE hbc.state = si.postal AND hbc.year > '2007-01-01' AND hbc.measures != 'total homeless'`
- Run `EXPLAIN ANALYZE` on the query.
- Call `.fetchall()` and pretty print the results.
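
One candidate, offered only as a sketch, is an index over the three `homeless_by_coc` columns that the query touches in its `WHERE` clause: `state`, `year`, and `measures`. The column order, the index name `state_year_measures_idx`, and the helper `explain_join` are my assumptions. Whether the planner actually uses the index should be checked against the `EXPLAIN ANALYZE` output.

```python
JOIN_QUERY = (
    "SELECT hbc.year, si.name, hbc.count "
    "FROM homeless_by_coc hbc, state_info si "
    "WHERE hbc.state = si.postal AND hbc.year > '2007-01-01' "
    "AND hbc.measures != 'total homeless'"
)


def explain_join(cur, conn):
    """Create a candidate multi-column index, then EXPLAIN ANALYZE the join."""
    # Assumption: join column first, then the two filtered columns.
    cur.execute(
        "CREATE INDEX state_year_measures_idx "
        "ON homeless_by_coc(state, year, measures)"
    )
    conn.commit()

    cur.execute("EXPLAIN ANALYZE " + JOIN_QUERY)
    return cur.fetchall()
```

With the provided objects: `pp.pprint(explain_join(cur, conn))`.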