# Building a Database for Crime Reports
The goal of this project is to create a database named *crimes_db* containing a table, *boston_crimes*, whose columns use appropriate datatypes for the data in the boston.csv file. The table will live inside a schema named *crimes*. We'll also create *readonly* and *readwrite* groups with the appropriate privileges, and finally one user for each group.
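The group and user setup described above is the project's last step. Here is a minimal sketch of the SQL it might involve. It assumes that "read-only" means `SELECT` and "read-write" means `SELECT`, `INSERT`, `UPDATE` and `DELETE`. The helper `build_privilege_statements`, the user names `data_analyst` and `data_scientist`, and their passwords are all hypothetical. `GRANT ... ON ALL TABLES IN SCHEMA` only covers tables that already exist, so these statements would have to run after *boston_crimes* is created.

```python
# Hypothetical helper: user names and passwords are placeholders, and the
# privilege split (SELECT for readonly; SELECT, INSERT, UPDATE, DELETE for
# readwrite) is an assumption about what "appropriate privileges" means.
def build_privilege_statements(db="crimes_db", schema="crimes"):
    """Return the SQL statements that set up both groups and their users."""
    statements = [
        # Stop new roles from inheriting default privileges through PUBLIC.
        "REVOKE ALL ON SCHEMA public FROM public;",
        f"REVOKE ALL ON DATABASE {db} FROM public;",
        # A group is simply a role without the LOGIN attribute.
        "CREATE ROLE readonly NOLOGIN;",
        "CREATE ROLE readwrite NOLOGIN;",
    ]
    # Both groups need to connect to the database and see the schema.
    for group in ("readonly", "readwrite"):
        statements += [
            f"GRANT CONNECT ON DATABASE {db} TO {group};",
            f"GRANT USAGE ON SCHEMA {schema} TO {group};",
        ]
    statements += [
        # Table-level privileges (only affects tables that already exist).
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO readonly;",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {schema} TO readwrite;",
        # One login user per group, with placeholder credentials.
        "CREATE USER data_analyst WITH PASSWORD 'change_me_1';",
        "GRANT readonly TO data_analyst;",
        "CREATE USER data_scientist WITH PASSWORD 'change_me_2';",
        "GRANT readwrite TO data_scientist;",
    ]
    return statements
```

Each statement could then be passed to `cur.execute()` on a connection to *crimes_db*, followed by `conn.commit()`.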

In [1]:
import psycopg2

conn = psycopg2.connect(dbname="dq", user="dq")
conn.autocommit = True
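# creating the crimes_db database (CREATE DATABASE cannot run inside a
# transaction block, which is why autocommit is enabled above)
cur = conn.cursor()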
cur.execute('CREATE DATABASE crimes_db')
conn.close()
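A second run of the cell above fails because PostgreSQL's `CREATE DATABASE` has no `IF NOT EXISTS` option. Here is a minimal sketch of a rerunnable version. It assumes `conn` is an autocommit connection to the `dq` database and checks the `pg_database` catalog first. The helper `create_database_if_missing` is hypothetical. Its regex check on the name is a simple stand-in for proper identifier quoting, and psycopg2's `sql.Identifier` would be the more robust choice.

```python
import re

def create_database_if_missing(conn, dbname):
    """Create `dbname` unless it already exists; return True if created.

    Assumes `conn` is an open psycopg2-style connection to a different
    database (e.g. "dq") with autocommit enabled.
    """
    # Identifiers cannot be sent as query parameters, so only allow
    # simple lowercase names before putting one into the SQL text.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", dbname):
        raise ValueError(f"unsafe database name: {dbname!r}")
    cur = conn.cursor()
    # pg_database lists every database in the cluster.
    cur.execute("SELECT 1 FROM pg_database WHERE datname = %s", (dbname,))
    if cur.fetchone() is not None:
        return False
    cur.execute(f"CREATE DATABASE {dbname}")
    return True
```

`create_database_if_missing(conn, 'crimes_db')` would return `True` on the first run and `False` on later runs instead of raising an error.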

In [2]:
conn = psycopg2.connect(dbname="crimes_db", user="dq")
conn.autocommit = True
# creating the crimes schema inside crimes_db
cur = conn.cursor()
cur.execute('CREATE SCHEMA crimes')
# from here on, statements run inside a transaction and need conn.commit()
conn.autocommit = False

In [3]:
import csv
# reading boston.csv file
with open('boston.csv', 'r') as f:
    rows = list(csv.reader(f))
    col_header = rows[0]
    first_row = rows[1]
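The cell above loads every row of boston.csv into memory just to look at two of them. A lighter alternative calls `next()` on the reader twice and stops reading there. The helper `read_header_and_first_row` below is hypothetical. The demo writes a two-line sample to a temporary file. That sample is reconstructed from the printed header and first row, assuming plain comma-separated values with no quoting.

```python
import csv
import os
import tempfile

def read_header_and_first_row(csv_filename):
    """Return (header, first_data_row) without reading the rest of the file."""
    with open(csv_filename, "r", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)      # first line: column names
        first_row = next(reader)   # second line: first record
    return header, first_row

# Demo on a hypothetical two-line sample mirroring the output shown below.
sample = (
    "incident_number,offense_code,description,date,day_of_the_week,lat,long\n"
    "1,619,LARCENY ALL OTHERS,2018-09-02,Sunday,42.35779134,-71.13937053\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
header, first = read_header_and_first_row(tmp.name)
os.remove(tmp.name)
print(header[0], first[2])  # incident_number LARCENY ALL OTHERS
```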

In [4]:
col_header, first_row

(['incident_number',
  'offense_code',
  'description',
  'date',
  'day_of_the_week',
  'lat',
  'long'],
 ['1',
  '619',
  'LARCENY ALL OTHERS',
  '2018-09-02',
  'Sunday',
  '42.35779134',
  '-71.13937053'])
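Before choosing datatypes, it can help to check that each value in the first row parses into a matching Python type. This is a minimal sketch. The name `converters` is hypothetical, and the choice of `int`, `date` and `Decimal` is an assumption about the eventual SQL types. One row proves nothing about the rest of the file, so the same check would need to run over every row before anyone relied on it.

```python
import datetime
from decimal import Decimal

# Hypothetical parser per column; the types are assumptions that mirror
# candidate SQL types (INTEGER, DATE, NUMERIC, text).
converters = [int, int, str, datetime.date.fromisoformat,
              str, Decimal, Decimal]

# The first data row shown above.
first_row = ['1', '619', 'LARCENY ALL OTHERS', '2018-09-02',
             'Sunday', '42.35779134', '-71.13937053']
typed = [convert(value) for convert, value in zip(converters, first_row)]

# Total digits and decimal places of the latitude hint at a
# NUMERIC(precision, scale) size, for this row only.
lat = typed[5].as_tuple()
print(len(lat.digits), -lat.exponent)  # 10 8
```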

In [5]:
def get_col_value_set(csv_filename, col_index):
    '''
    Args:
        csv_filename: name of a CSV file
        col_index: index of a column in that CSV file

    Returns:
        a Python set containing all distinct values from that column
    '''
    col_set = set()
    with open(csv_filename, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        # stream the remaining rows instead of building a full list in memory
        for row in reader:
            col_set.add(row[col_index])
    return col_set
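Distinct-value counts help with choosing types. Sizing a `VARCHAR` column also needs the length of its longest value. Here is a minimal sketch of a companion helper, `get_col_max_length`, which is hypothetical. It takes any iterable of CSV lines rather than a file name, so it can be tried on a small in-memory sample. The sample reuses the header and first row printed above.

```python
import csv

def get_col_max_length(lines, col_index):
    """Length of the longest value in one column of CSV text.

    `lines` is any iterable of CSV lines (an open file object works);
    the first line is treated as the header and skipped.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    return max((len(row[col_index]) for row in reader), default=0)

sample = [
    "incident_number,offense_code,description,date,day_of_the_week,lat,long",
    "1,619,LARCENY ALL OTHERS,2018-09-02,Sunday,42.35779134,-71.13937053",
]
print(get_col_max_length(sample, 2))  # 18
```

On the real data, opening boston.csv with `newline=''` and passing the file object plus a column index would give that column's maximum length.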

In [6]:
# counting the distinct values in each column
for i, name in enumerate(col_header):
    values = get_col_value_set('boston.csv', i)
    print(name, len(values))

incident_number 298329
offense_code 219
description 239
date 1177
day_of_the_week 7
lat 18177
long 18177
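The counts above suggest a type for each column:

- `day_of_the_week` has only 7 distinct values, so an `ENUM` fits.
- `offense_code` has 219 distinct integer codes. `SMALLINT` can hold them as long as the largest code is below 32,767.
- `incident_number` has 298,329 distinct values. It could serve as an `INTEGER` primary key if it turns out to be unique per row.
- `date` maps to `DATE`, and `lat`/`long` map to `DECIMAL`.

The sketch below turns those assumptions into DDL. The type name `weekday_enum` and the helper `build_schema_sql` are hypothetical. The `VARCHAR` length for `description` is left as a parameter because the longest description is not shown here.

```python
def build_schema_sql(description_max_len, schema="crimes"):
    """Return (enum DDL, table DDL) for the boston_crimes table.

    The column types are assumptions based on the distinct-value counts.
    """
    weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday"]
    labels = ", ".join(f"'{day}'" for day in weekdays)
    # Seven fixed labels: an ENUM stores them compactly and rejects typos.
    enum_sql = f"CREATE TYPE {schema}.weekday_enum AS ENUM ({labels});"
    table_sql = f"""CREATE TABLE {schema}.boston_crimes (
    incident_number INTEGER PRIMARY KEY,
    offense_code SMALLINT,
    description VARCHAR({description_max_len}),
    date DATE,
    day_of_the_week {schema}.weekday_enum,
    lat DECIMAL,
    long DECIMAL
);"""
    return enum_sql, table_sql
```

The enum has to exist before the table that uses it, so the two statements would be executed in the order returned.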
