# Building a Crimes Database with Postgres

The purpose of this project is to create a database for storing data related to crimes that occurred in Boston. 

The objectives for this database are:

- Create a database ```crimes_db``` with a table ```boston_crimes``` with appropriate data types
- Create a schema and create the table inside it
- Load the data from ```boston.csv``` into the table
- Create readonly and readwrite groups with appropriate privileges
- Create users for each of the groups

In [None]:
import psycopg2

conn = psycopg2.connect("dbname=dq user=dq")
# CREATE DATABASE cannot run inside a transaction block, so enable autocommit first.
conn.autocommit = True
cur = conn.cursor()
cur.execute('CREATE DATABASE crimes_db;')
conn.close()

In [None]:
conn = psycopg2.connect("dbname = crimes_db user = dq")
cur = conn.cursor()
cur.execute('CREATE SCHEMA crimes_sch;')

In [None]:
import csv
rows = []
with open('boston.csv') as file:
    reader = csv.reader(file)
    for row in reader:
        rows.append(row)
        
col_headers = rows[0]
first_row = rows[1]

print(col_headers)
print(first_row)

In [None]:
import pandas as pd

def get_col_set(file, index):
    """Return the set of distinct values in the column at position `index`."""
    df = pd.read_csv(file)
    return set(df.iloc[:, index])

In [None]:
# Number of distinct values in each column
for item in range(len(col_headers)):
    print(len(get_col_set("boston.csv", item)))

In [None]:
# Length of the longest description, used to size the VARCHAR column
col_values = get_col_set("boston.csv", 2)
len(max(col_values, key=len))

In [None]:
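# Range of offense codes, used to choose an integer type
off_code_col = get_col_set("boston.csv", 1)
print(min(off_code_col), max(off_code_col))

In [None]:
# Range of latitudes, used to choose the decimal precision
lat_col = get_col_set("boston.csv", -2)
print(min(lat_col), max(lat_col))

In [None]:
# Range of longitudes, used to choose the decimal precision
long_col = get_col_set("boston.csv", -1)
print(min(long_col), max(long_col))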

After inspecting the dataset, I can create the table. Doing so requires choosing a data type for each column.
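
A minimal sketch of how such an inspection can feed the type choices, using only the standard library. The helper `profile_columns` and the two sample rows are hypothetical and are not taken from `boston.csv`. The idea is that the longest text value bounds a `VARCHAR(n)`, and the number of distinct values hints at whether an enumerated type is practical.

```python
import csv
import io


def profile_columns(csv_file):
    """Return {header: (distinct_count, max_text_length)} for a CSV file object."""
    reader = csv.reader(csv_file)
    headers = next(reader)
    values = {header: set() for header in headers}
    for row in reader:
        for header, value in zip(headers, row):
            values[header].add(value)
    return {
        header: (len(seen), max((len(v) for v in seen), default=0))
        for header, seen in values.items()
    }


# Hypothetical sample data standing in for boston.csv.
sample = io.StringIO(
    "offense_code,description,day_of_the_week\n"
    "619,LARCENY ALL OTHERS,Monday\n"
    "1402,VANDALISM,Monday\n"
)
profile = profile_columns(sample)
for header, (distinct, longest) in profile.items():
    print(header, distinct, longest)
```

Unlike `get_col_set`, this reads the file once for all columns, which may matter for larger files.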

Before I create the table, I create an enumerated datatype for the day of the week column:

In [None]:
cur.execute("CREATE TYPE days_week AS ENUM ('Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday');")
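
Because `COPY` fails on any value that is not one of the enum labels, it can help to check the CSV before loading. The sketch below assumes the day column is named `day_of_the_week` in the file header and uses hypothetical sample rows; `unexpected_days` is an illustrative helper, not part of the project.

```python
import csv
import io

# Labels copied from the CREATE TYPE statement above.
DAYS_WEEK = {"Saturday", "Sunday", "Monday", "Tuesday",
             "Wednesday", "Thursday", "Friday"}


def unexpected_days(csv_file, column):
    """Return the set of values in `column` that are not valid enum labels."""
    reader = csv.DictReader(csv_file)
    return {row[column] for row in reader} - DAYS_WEEK


# Hypothetical rows; "monday" is lowercase, and enum labels are case sensitive.
sample = io.StringIO("day_of_the_week\nMonday\nmonday\nFriday\n")
bad_values = unexpected_days(sample, "day_of_the_week")
print(bad_values)
```

An empty result would suggest the column can be loaded into `days_week` as is.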

Now I create the table:

In [None]:
cur.execute("""CREATE TABLE crimes_sch.boston_crimes (
    incident_number integer PRIMARY KEY, -- values come from the CSV, so no sequence is needed
    offense_code smallint,
    description VARCHAR(100),
    date date,
    day_of_the_week days_week,
    lat decimal(10,8),
    long decimal(10,8)
);""")

In [None]:
with open("boston.csv") as f:
    cur.copy_expert("COPY crimes_sch.boston_crimes FROM STDIN WITH CSV HEADER;", f)

Check that all rows have been copied over:

In [None]:
# Count rows in the database instead of fetching every row into Python.
cur.execute("SELECT COUNT(*) FROM crimes_sch.boston_crimes;")
print(cur.fetchone()[0])
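
To make that check explicit, the table count can be compared with the number of data rows in the file. This is a sketch that assumes the CSV has exactly one header line; `count_csv_rows` is a hypothetical helper, and the sample text stands in for `boston.csv`.

```python
import csv
import io


def count_csv_rows(csv_file):
    """Count data rows in a CSV file object, excluding the header line."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header
    return sum(1 for _ in reader)


# Hypothetical sample; with the real file, use open("boston.csv", newline="").
sample = io.StringIO("incident_number,offense_code\n1,619\n2,1402\n")
expected = count_csv_rows(sample)
print(expected)
```

With a live connection, `expected` would be compared with the row count reported for `crimes_sch.boston_crimes`.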

In [None]:
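# Remove the default privileges that every role receives through PUBLIC.
cur.execute("REVOKE ALL ON SCHEMA public FROM public;")
cur.execute("REVOKE ALL ON DATABASE crimes_db FROM public;")
# Commit everything done on this connection: schema, type, table, data load, and revocations.
conn.commit()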
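
The remaining objectives, the readonly and readwrite groups and their users, could be approached along these lines. This is a sketch only: the group names follow the objectives list, while the user names `data_analyst` and `data_scientist`, the placeholder passwords, and the `group_statements` helper are assumptions made for illustration. The statements are built as strings so they can be reviewed before being passed to `cur.execute`.

```python
def group_statements(group, table_privileges):
    """Build the SQL that creates a NOLOGIN group role with the given table privileges."""
    privileges = ", ".join(table_privileges)
    return [
        f"CREATE ROLE {group} NOLOGIN;",
        f"GRANT CONNECT ON DATABASE crimes_db TO {group};",
        f"GRANT USAGE ON SCHEMA crimes_sch TO {group};",
        f"GRANT {privileges} ON ALL TABLES IN SCHEMA crimes_sch TO {group};",
    ]


statements = (
    group_statements("readonly", ["SELECT"])
    + group_statements("readwrite", ["SELECT", "INSERT", "UPDATE", "DELETE"])
    + [
        # Hypothetical users; replace the placeholder passwords before use.
        "CREATE USER data_analyst WITH PASSWORD 'change_me';",
        "GRANT readonly TO data_analyst;",
        "CREATE USER data_scientist WITH PASSWORD 'change_me';",
        "GRANT readwrite TO data_scientist;",
    ]
)
for statement in statements:
    print(statement)
```

Creating the groups as `NOLOGIN` roles and granting them to users keeps the privileges in one place, so adding a user later only needs one `GRANT`.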