### Optimizing Postgres Databases: Exploring Postgres Internals
#### These are exercises done as part of <a href = "www.dataquest.io"> DataQuest</a>'s Data Engineer Path
This is not replicated for commercial use; strictly personal development.<br>
All exercises are (c) DataQuest, with slight modifications so they use my Postgres server on localhost.

> We learn about Postgres schemas and how to build a database description from scratch. Using internal tables, we were able to describe metadata associated with our Postgres database. The metadata included data types, table names, created schemas, and plenty of other useful information.
>
>DataQuest

#### Exploring Postgres Internals Mission
<b>1.  </b>Instructions:
- Import the `psycopg2` library.
- Connect to the `dq` database with the user `hud_admin` and the password `eRqg123EEkl` using the keyword arguments.
- Use the `print` function to display the Connection object.

In [7]:
import psycopg2
conn = psycopg2.connect(dbname="valenbisi2018", user="stormadmin", password="admin123")
print(conn)

<connection object at 0x1061afe60; dsn: 'dbname=valenbisi2018 password=xxx user=stormadmin', closed: 0>


<font color = 'blue'>This seems to let any user into any database with any password, possibly because local connections use `trust` authentication in `pg_hba.conf`. Resources to fix this:</font><br>
- https://stackoverflow.com/questions/21054549/postgres-accepts-any-password
- https://dba.stackexchange.com/questions/17790/created-user-can-access-all-databases-in-postgresql-without-any-grants

<b>2.  </b>Instructions:
- Use the provided `cur` object.
- Using the `SELECT` query, grab the `table_name` column from the `information_schema.tables` table with the `ORDER BY` option on the `table_name` column.
- Fetch all the results and assign them to the variable `table_names`.
- Loop through `table_names`:
    - Print each `table_name` from the query.

<font color = 'blue'>If you don't know the table names in your schema, you can fetch them by doing the following. Here is a list of the columns in `information_schema.tables`:</font>

>|Name|Data Type|Description|
>|------|------|------|
>|`table_catalog`|`sql_identifier`|Name of the database that contains the table (always the current database)|
>|`table_schema`|`sql_identifier`|Name of the schema that contains the table|
>|`table_name`|`sql_identifier`|Name of the table|
>|`table_type`|`character_data`|Type of the table: BASE TABLE for a persistent base table (the normal table type), VIEW for a view, FOREIGN TABLE for a foreign table, or LOCAL TEMPORARY for a temporary table|
>|`self_referencing_column_name`|`sql_identifier`|Applies to a feature not available in PostgreSQL|
>|`reference_generation`|`character_data`|Applies to a feature not available in PostgreSQL|
>|`user_defined_type_catalog`|`sql_identifier`|If the table is a typed table, the name of the database that contains the underlying data type (always the current database), else null.|
>|`user_defined_type_schema`|`sql_identifier`|If the table is a typed table, the name of the schema that contains the underlying data type, else null.|
>|`user_defined_type_name`|`sql_identifier`|If the table is a typed table, the name of the underlying data type, else null.|
>|`is_insertable_into`|`yes_or_no`|YES if the table is insertable into, NO if not (Base tables are always insertable into, views not necessarily.)|
>|`is_typed`|`yes_or_no`|YES if the table is a typed table, NO if not|
>|`commit_action`|`character_data`|If the table is a temporary table, then PRESERVE, else null. (The SQL standard defines other commit actions for temporary tables, which are not supported by PostgreSQL.)|
>
><a href = "https://www.postgresql.org/docs/9.1/static/infoschema-tables.html">PostgreSQL Documentation: Information Schema</a>

In [8]:
conn = psycopg2.connect(dbname="dq_exercises", user="nmolivo")
cur = conn.cursor()
cur.execute('SELECT table_name FROM information_schema.tables ORDER BY table_name')
table_names = cur.fetchall()
for name in table_names:
    print(name)

('_pg_foreign_data_wrappers',)
('_pg_foreign_servers',)
('_pg_foreign_table_columns',)
('_pg_foreign_tables',)
('_pg_user_mappings',)
('administrable_role_authorizations',)
('applicable_roles',)
('attributes',)
('character_sets',)
('check_constraint_routine_usage',)
('check_constraints',)
('collation_character_set_applicability',)
('collations',)
('column_domain_usage',)
('column_options',)
('column_privileges',)
('column_udt_usage',)
('columns',)
('constraint_column_usage',)
('constraint_table_usage',)
('data_type_privileges',)
('domain_constraints',)
('domain_udt_usage',)
('domains',)
('element_types',)
('enabled_roles',)
('foreign_data_wrapper_options',)
('foreign_data_wrappers',)
('foreign_server_options',)
('foreign_servers',)
('foreign_table_options',)
('foreign_tables',)
('information_schema_catalog_name',)
('key_column_usage',)
('parameters',)
('pg_aggregate',)
('pg_am',)
('pg_amop',)
('pg_amproc',)
('pg_attrdef',)
('pg_attribute',)
('pg_auth_members',)
('pg_authid',)
('pg_avai

> In the output, you would have noticed many tables that started with the prefix `pg_*`. Each one of these tables is part of the `pg_catalog` group of internal tables. These are the system catalog tables. 
>
> DataQuest
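As a quick way to see that split, the prefix can be checked directly on the fetched rows. The sketch below uses a small hand-copied subset of the output above instead of a live query, so the `rows` list is illustrative only.

```python
# Rows in the shape returned by cur.fetchall(): one 1-tuple per table name.
# This is a hand-copied subset of the output above, not a live query result.
rows = [
    ("attributes",),
    ("columns",),
    ("pg_aggregate",),
    ("pg_am",),
    ("pg_attribute",),
]

# Unpack each 1-tuple and split the names on the pg_ prefix.
catalog_tables = [name for (name,) in rows if name.startswith("pg_")]
other_tables = [name for (name,) in rows if not name.startswith("pg_")]

print(catalog_tables)
print(other_tables)
```

Names such as `_pg_foreign_tables` begin with an underscore, so they do not match the `pg_` prefix test.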

<b>3.  </b>Instructions:
- Use the provided `cur` object.
- Using the `SELECT` query, grab the `table_name` column from the `information_schema.tables` table.
    - Find user-created tables by filtering the query on the `table_schema` column.
    - `ORDER BY` the `table_name` again.
- Loop through `cur.fetchall()` and print each `table_name` from the query.

> In Postgres, schemas are used as a namespace for tables, with the distinct purpose of separating them into isolated groups or sets within a single database.
>
> DataQuest
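To make the namespace idea concrete, here is a minimal sketch that groups table names by schema. The `rows` list is a hand-written stand-in for what a query selecting both `table_schema` and `table_name` from `information_schema.tables` might return; it is not real output from my server.

```python
from collections import defaultdict

# Hypothetical (table_schema, table_name) rows, written by hand for illustration.
rows = [
    ("information_schema", "columns"),
    ("information_schema", "tables"),
    ("pg_catalog", "pg_type"),
    ("public", "stormdata"),
]

# Group table names under the schema that contains them.
tables_by_schema = defaultdict(list)
for schema, table in rows:
    tables_by_schema[schema].append(table)

# User-created tables live in the public schema unless another one was chosen.
user_tables = tables_by_schema["public"]
print(user_tables)
```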

In [16]:
conn = psycopg2.connect(dbname="dq_exercises", user="nmolivo")
cur = conn.cursor()
cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public' ORDER BY table_name")
# Fetch once: a second cur.fetchall() would return an empty list.
table_names = cur.fetchall()
for table_name in table_names:
    # Each row is a 1-tuple such as ('stormdata',).
    name = table_name[0]
    print(name)

<b>4.  </b>Instructions:
- Import `AsIs` from `psycopg2.extensions`.
- Within the loop for `cur.fetchall()` for each table name:
    - Run a `SELECT` query with the table variable using `AsIs`.
    - Print the `cur.description` attribute.
    - Print a blank line to separate the descriptions at the end of each loop.

In [11]:
from psycopg2.extensions import AsIs

conn = psycopg2.connect(dbname="dq_exercises", user="nmolivo")
cur = conn.cursor()
cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public'")
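for table in cur.fetchall():
    # Each row is a 1-tuple; unwrap it to get the bare table name.
    table = table[0]
    # AsIs inserts the name unquoted, so only use it with trusted names.
    cur.execute("SELECT * FROM %s LIMIT 0", [AsIs(table)])
    print(cur.description, "\n")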

(Column(name='fid', type_code=1043, display_size=None, internal_size=32, precision=None, scale=None, null_ok=None), Column(name='btid', type_code=1042, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None), Column(name='name', type_code=1042, display_size=None, internal_size=9, precision=None, scale=None, null_ok=None), Column(name='lat', type_code=1700, display_size=None, internal_size=4, precision=4, scale=2, null_ok=None), Column(name='long', type_code=1700, display_size=None, internal_size=7, precision=7, scale=4, null_ok=None), Column(name='wind_kts', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None), Column(name='pressure', type_code=23, display_size=None, internal_size=4, precision=None, scale=None, null_ok=None), Column(name='cat', type_code=1042, display_size=None, internal_size=2, precision=None, scale=None, null_ok=None), Column(name='basin', type_code=1042, display_size=None, internal_size=15, precision=None,
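`AsIs` splices the table name into the query verbatim. That is fine for names that come straight from `information_schema`, but it offers no protection if a name contains unusual characters. One common alternative is to quote the name as an SQL identifier. The helper below is a hand-rolled sketch of the standard quoting rule (wrap in double quotes, double any embedded double quote); `quote_ident` and `select_no_rows` are hypothetical names of mine, not part of `psycopg2`.

```python
def quote_ident(name: str) -> str:
    """Quote a name as an SQL identifier (hypothetical helper, not a psycopg2 API)."""
    if "\x00" in name:
        raise ValueError("identifiers cannot contain NUL characters")
    # Double any embedded double quote, then wrap the whole name in double quotes.
    return '"' + name.replace('"', '""') + '"'


def select_no_rows(table: str) -> str:
    """Build the zero-row SELECT used above to read a table's column metadata."""
    return f"SELECT * FROM {quote_ident(table)} LIMIT 0"


print(select_no_rows("stormdata"))
print(quote_ident('weird"name'))
```

In real code, `psycopg2.sql.Identifier` is probably the better choice, since the library then handles the quoting itself.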

<b>5.  </b>Instructions:
- Use the provided `cur` object.
- Using `execute()`, `SELECT` from the `pg_catalog.pg_type` table, choosing two columns that can map an integer type code to a human-readable string.
- Create a dict and assign it to the variable `type_mappings`.
- Loop through the returned `SELECT` query and map the integer type code to the string.

In [12]:
conn = psycopg2.connect(dbname="dq_exercises", user="nmolivo")
cur = conn.cursor()
cur.execute("SELECT oid, typname FROM pg_catalog.pg_type")
type_mappings = {
    int(oid): typname
    for oid, typname in cur.fetchall()
}

>One interesting thing to note about <a href = "https://www.postgresql.org/docs/9.6/static/catalog-pg-type.html">`pg_catalog.pg_type`</a> is that it can be used to create your own Postgres types from scratch.
>
>DataQuest

In [13]:
type_mappings

{16: 'bool',
 17: 'bytea',
 18: 'char',
 19: 'name',
 20: 'int8',
 21: 'int2',
 22: 'int2vector',
 23: 'int4',
 24: 'regproc',
 25: 'text',
 26: 'oid',
 27: 'tid',
 28: 'xid',
 29: 'cid',
 30: 'oidvector',
 32: 'pg_ddl_command',
 71: 'pg_type',
 75: 'pg_attribute',
 81: 'pg_proc',
 83: 'pg_class',
 114: 'json',
 142: 'xml',
 143: '_xml',
 194: 'pg_node_tree',
 199: '_json',
 210: 'smgr',
 325: 'index_am_handler',
 600: 'point',
 601: 'lseg',
 602: 'path',
 603: 'box',
 604: 'polygon',
 628: 'line',
 629: '_line',
 650: 'cidr',
 651: '_cidr',
 700: 'float4',
 701: 'float8',
 702: 'abstime',
 703: 'reltime',
 704: 'tinterval',
 705: 'unknown',
 718: 'circle',
 719: '_circle',
 774: 'macaddr8',
 775: '_macaddr8',
 790: 'money',
 791: '_money',
 829: 'macaddr',
 869: 'inet',
 1000: '_bool',
 1001: '_bytea',
 1002: '_char',
 1003: '_name',
 1005: '_int2',
 1006: '_int2vector',
 1007: '_int4',
 1008: '_regproc',
 1009: '_text',
 1010: '_tid',
 1011: '_xid',
 1012: '_cid',
 1013: '_oidvector'
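Many names in this mapping begin with an underscore, such as `_xml` and `_json`. By Postgres naming convention these are the array types of the corresponding element type. Assuming that convention holds for the types of interest, the mapping can be split as in the sketch below, which uses a handful of entries copied from the output above.

```python
# A few entries copied from the type_mappings output above.
type_mappings = {
    23: "int4",
    114: "json",
    142: "xml",
    143: "_xml",
    199: "_json",
    1007: "_int4",
}

# Assumption: a leading underscore marks an array type (naming convention only).
scalar_types = {oid: name for oid, name in type_mappings.items()
                if not name.startswith("_")}
array_types = {oid: name for oid, name in type_mappings.items()
               if name.startswith("_")}

print(scalar_types)
print(array_types)
```

A stricter check would query `pg_type` itself (I believe its `typcategory` column marks arrays), but the name test is often enough for display purposes.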

>Let's put all this together and create our own table descriptions. We want to rewrite the description attributes from a list of tuples towards something human readable. In the following exercise, we will assemble output from the previous exercises into a dictionary.
>
>DataQuest

<b>6.  </b>Instructions:
- Use the provided `cur`, `type_mappings`, and `table_names` objects.
- Create a dict and assign it to the variable `readable_description`.
- Loop through the `table_names` with the table variable and do the following:
    - Get the description attribute for the given table.
    - Map the name of the table to a dictionary with a columns key.
    - Recreate the columns list from the screen example by iterating through the description, and mapping the appropriate types.
- Print the `readable_description` dictionary at the end.

In [21]:
from psycopg2.extensions import AsIs
conn = psycopg2.connect(dbname="dq_exercises", user="nmolivo")
cur = conn.cursor()
# `table_names` and `type_mappings` come from the earlier cells.

readable_description = {}
for table in table_names:
    # `table` is a 1-tuple such as ('stormdata',): unwrap it for the query,
    # but keep the tuple itself as the dictionary key.
    cur.execute("SELECT * FROM %s LIMIT 0", [AsIs(table[0])])
    readable_description[table] = dict(
        columns=[
            dict(
                name=col.name,
                type=type_mappings[col.type_code],
                length=col.internal_size
            )
            for col in cur.description
        ]
    )
print(readable_description)

{('stormdata',): {'columns': [{'length': 32, 'name': 'fid', 'type': 'varchar'}, {'length': 4, 'name': 'btid', 'type': 'bpchar'}, {'length': 9, 'name': 'name', 'type': 'bpchar'}, {'length': 4, 'name': 'lat', 'type': 'numeric'}, {'length': 7, 'name': 'long', 'type': 'numeric'}, {'length': 4, 'name': 'wind_kts', 'type': 'int4'}, {'length': 4, 'name': 'pressure', 'type': 'int4'}, {'length': 2, 'name': 'cat', 'type': 'bpchar'}, {'length': 15, 'name': 'basin', 'type': 'bpchar'}, {'length': 9, 'name': 'shape_len', 'type': 'numeric'}, {'length': 8, 'name': 'date', 'type': 'timestamp'}]}}
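The `length` key above comes from `internal_size`, which for `numeric` columns such as `lat` appears to mirror the precision reported in `cur.description` and is not a byte length. A possible refinement is to carry `precision` and `scale` through when the driver reports them. The sketch below runs without a database: `Column` is a stand-in namedtuple with the same seven fields as the `cur.description` output shown earlier, and `describe_column` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the Column objects found in cur.description (same seven fields).
Column = namedtuple(
    "Column",
    "name type_code display_size internal_size precision scale null_ok",
)

# Subset of the oid-to-name mapping, taken from the outputs in this notebook.
type_mappings = {23: "int4", 1042: "bpchar", 1043: "varchar", 1700: "numeric"}

# Three columns copied from the stormdata description printed earlier.
description = (
    Column("fid", 1043, None, 32, None, None, None),
    Column("lat", 1700, None, 4, 4, 2, None),
    Column("wind_kts", 23, None, 4, None, None, None),
)


def describe_column(col, type_mappings):
    """Return a readable dict for one column (hypothetical helper)."""
    info = {
        "name": col.name,
        # Fall back to the raw type code if the mapping has no entry for it.
        "type": type_mappings.get(col.type_code, str(col.type_code)),
        "length": col.internal_size,
    }
    # Only numeric-like columns report a precision and scale.
    if col.precision is not None:
        info["precision"] = col.precision
        info["scale"] = col.scale
    return info


columns = [describe_column(col, type_mappings) for col in description]
print(columns)
```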


> Let's provide our description with the number of rows in the table.
>
> DataQuest

<b>7.  </b>Instructions:
- Use the provided `cur` object and `AsIs` class.
- Loop through the `readable_description` keys:
    - Fetch the value of each table's row count and assign it to a `total` key for that table.
- Print the `readable_description` dictionary at the end.

In [23]:
from psycopg2.extensions import AsIs
conn = psycopg2.connect(dbname="dq_exercises", user="nmolivo")
cur = conn.cursor()

for table in readable_description.keys():
    # Keys are 1-tuples such as ('stormdata',); unwrap the name for the query.
    cur.execute("SELECT COUNT(*) FROM %s", [AsIs(table[0])])
    # fetchone() returns a 1-tuple, e.g. (118456,).
    readable_description[table]["total"] = cur.fetchone()

In [24]:
readable_description

{('stormdata',): {'columns': [{'length': 32, 'name': 'fid', 'type': 'varchar'},
   {'length': 4, 'name': 'btid', 'type': 'bpchar'},
   {'length': 9, 'name': 'name', 'type': 'bpchar'},
   {'length': 4, 'name': 'lat', 'type': 'numeric'},
   {'length': 7, 'name': 'long', 'type': 'numeric'},
   {'length': 4, 'name': 'wind_kts', 'type': 'int4'},
   {'length': 4, 'name': 'pressure', 'type': 'int4'},
   {'length': 2, 'name': 'cat', 'type': 'bpchar'},
   {'length': 15, 'name': 'basin', 'type': 'bpchar'},
   {'length': 9, 'name': 'shape_len', 'type': 'numeric'},
   {'length': 8, 'name': 'date', 'type': 'timestamp'}],
  'total': (118456,)}}
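`fetchone()` returns a whole row, so a `COUNT(*)` query yields a 1-tuple like `(118456,)` where a bare integer might be expected. The sketch below shows the unwrapping step using the standard-library `sqlite3` module as a stand-in for Postgres, since both drivers follow the Python DB-API. The two-row table is made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stormdata (fid TEXT, wind_kts INTEGER)")
cur.executemany(
    "INSERT INTO stormdata VALUES (?, ?)",
    [("2001", 50), ("2002", 45)],
)

cur.execute("SELECT COUNT(*) FROM stormdata")
row = cur.fetchone()   # a 1-tuple holding the count
total = row[0]         # unwrap to a plain integer
conn.close()

print(row, total)
```

Storing `total` instead of `row` would make the `total` key hold an integer, which is easier to use in later arithmetic.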

>Finally, let's add some sample rows to the `readable_description` dictionary.
>
>DataQuest

<b>8.  </b>Instructions:
- Use the provided `cur` object and `AsIs` class.
- Loop through the `readable_description` keys and run the following:
    - Select the first 100 rows with `SELECT ... LIMIT` using `execute()` and `AsIs`.
    - Fetch all the rows and assign them to the `readable_description` dictionary for the given table using the `sample_rows` key.
- Print the `readable_description` dictionary at the end.

In [25]:
from psycopg2.extensions import AsIs
conn = psycopg2.connect(dbname="dq_exercises", user="nmolivo")
cur = conn.cursor()

for table in readable_description.keys():
    # Keys are 1-tuples such as ('stormdata',); unwrap the name for the query.
    cur.execute("SELECT * FROM %s LIMIT 100", [AsIs(table[0])])
    readable_description[table]["sample_rows"] = cur.fetchall()
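The rows stored under `sample_rows` are plain tuples, so the column names collected earlier are needed to read them. A small sketch of pairing the two, using the first `stormdata` row from this notebook as hand-copied sample data: the trailing blanks in `btid` and `name` come from the `bpchar` (fixed-width `char`) type, which pads values with spaces.

```python
from datetime import datetime
from decimal import Decimal

# Column names in the order reported by the stormdata description.
column_names = [
    "fid", "btid", "name", "lat", "long", "wind_kts",
    "pressure", "cat", "basin", "shape_len", "date",
]

# First sample row, copied by hand from the notebook output.
sample_row = (
    "2001", "63  ", "NOTNAMED ", Decimal("22.50"), Decimal("-140.0000"),
    50, 0, "TS", "Eastern Pacific", Decimal("1.1401750"),
    datetime(1957, 8, 8, 0, 0),
)

# Pair each value with its column name.
record = dict(zip(column_names, sample_row))

# Strip the blank padding that bpchar columns carry; leave other types alone.
cleaned = {
    key: value.rstrip() if isinstance(value, str) else value
    for key, value in record.items()
}

print(cleaned["btid"], cleaned["name"], cleaned["date"].year)
```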

In [26]:
readable_description

{('stormdata',): {'columns': [{'length': 32, 'name': 'fid', 'type': 'varchar'},
   {'length': 4, 'name': 'btid', 'type': 'bpchar'},
   {'length': 9, 'name': 'name', 'type': 'bpchar'},
   {'length': 4, 'name': 'lat', 'type': 'numeric'},
   {'length': 7, 'name': 'long', 'type': 'numeric'},
   {'length': 4, 'name': 'wind_kts', 'type': 'int4'},
   {'length': 4, 'name': 'pressure', 'type': 'int4'},
   {'length': 2, 'name': 'cat', 'type': 'bpchar'},
   {'length': 15, 'name': 'basin', 'type': 'bpchar'},
   {'length': 9, 'name': 'shape_len', 'type': 'numeric'},
   {'length': 8, 'name': 'date', 'type': 'timestamp'}],
  'sample_rows': [('2001',
    '63  ',
    'NOTNAMED ',
    Decimal('22.50'),
    Decimal('-140.0000'),
    50,
    0,
    'TS',
    'Eastern Pacific',
    Decimal('1.1401750'),
    datetime.datetime(1957, 8, 8, 0, 0)),
   ('2002',
    '116 ',
    'PAULINE  ',
    Decimal('22.10'),
    Decimal('-140.2000'),
    45,
    0,
    'TS',
    'Eastern Pacific',
    Decimal('1.1661900'),
 