# Building a Database for Crime Reports

In this project, we create a PostgreSQL database with appropriate data types from a CSV file. We also create user groups and users with appropriate privileges.

## Creating the Crime Database

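We connect to the existing dq database and create crime_db. CREATE DATABASE cannot run inside a transaction block, so autocommit is enabled first.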

In [1]:
import psycopg2

conn = psycopg2.connect("dbname=dq user=dq password=dq")
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE DATABASE crime_db")
conn.autocommit = False
conn.close()

Next, we connect to crime_db and create a schema named crimes.

In [2]:
conn = psycopg2.connect("dbname=crime_db user=dq password=dq")
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE SCHEMA crimes")
# The connection stays open, with autocommit on, for the cells that follow.

## Obtaining the Column Names and Sample

We read the column headers and the first data row from the CSV file.

In [3]:
import csv
with open("boston.csv", "r") as file:
    reader = csv.reader(file)
    col_headers = next(reader)
    first_row = next(reader)

## Creating an Auxiliary Function

We define a helper that returns the set of distinct values in a column, then use it to count the distinct values in each column.

In [4]:
def get_col_set(csv_filename, col_index):
    col_set = set()
    with open(csv_filename, "r") as file:
        reader = csv.reader(file)
        next(reader)  # skip the header row via the reader, not the raw file
        for row in reader:
            col_set.add(row[col_index])
    return col_set

for i in range(len(col_headers)):
    values = get_col_set("boston.csv", i)
    print(col_headers[i], len(values))

incident_number 298329
offense_code 219
description 239
date 1177
day_of_the_week 7
lat 18177
long 18177
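
The loop above reopens and rereads the file once per column. A single-pass alternative is sketched below: it keeps one set per column and reads the data only once. The function name `count_distinct_per_column` and the inline sample data are illustrative assumptions, not part of the original notebook.

```python
import csv
import io


def count_distinct_per_column(file_obj):
    """Return {column_name: number_of_distinct_values} in a single pass."""
    reader = csv.reader(file_obj)
    headers = next(reader)  # first row holds the column names
    col_sets = [set() for _ in headers]
    for row in reader:
        for col_set, value in zip(col_sets, row):
            col_set.add(value)
    return {name: len(values) for name, values in zip(headers, col_sets)}


# Tiny made-up sample standing in for boston.csv.
sample = io.StringIO(
    "incident_number,offense_code,day_of_the_week\n"
    "1,619,Sunday\n"
    "2,619,Monday\n"
    "3,1402,Sunday\n"
)
distinct_counts = count_distinct_per_column(sample)
print(distinct_counts)
```

With the real file, the same function could be called as `count_distinct_per_column(open("boston.csv", "r"))`, ideally inside a `with` block. The trade-off is memory: all column sets are held at once instead of one at a time.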


## Finding the Maximum Length

We want the length of the longest value in the description column.

First, we find the index of the description column.

In [5]:
print(col_headers)

['incident_number', 'offense_code', 'description', 'date', 'day_of_the_week', 'lat', 'long']


The description column is at index 2. We compute the maximum length over its distinct values.

In [6]:
desc_col = get_col_set("boston.csv", 2)
len_desc = 0
for row in desc_col:
    len_desc = max(len_desc, len(row))
print(len_desc)

58
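
The same maximum can be computed with the built-in `max` and a generator expression. This is a minimal sketch; the helper name `max_value_length` is hypothetical, and the second sample string is made up for illustration.

```python
def max_value_length(values):
    """Return the length of the longest string in values (0 if empty)."""
    return max((len(value) for value in values), default=0)


# "LARCENY ALL OTHERS" comes from the first data row; "VANDALISM" is a
# made-up value used only to give the function something to compare.
sample_descriptions = {"LARCENY ALL OTHERS", "VANDALISM"}
longest = max_value_length(sample_descriptions)
print(longest)
```

In the notebook this could be applied as `max_value_length(get_col_set("boston.csv", 2))`.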


## Creating the Table

We now create a table to store the Boston crime data.

First, we recall the column names so the table columns can match them.

In [7]:
print(col_headers)

['incident_number', 'offense_code', 'description', 'date', 'day_of_the_week', 'lat', 'long']


We print the first row to see the kind of data each column contains.

In [8]:
print(first_row)

['1', '619', 'LARCENY ALL OTHERS', '2018-09-02', 'Sunday', '42.35779134', '-71.13937053']


We use integer for both the incident number and the offense code. The description is a varchar with a limit of 100, which leaves headroom over the longest value found earlier (58 characters). The date column uses the date type. The day of the week has only seven distinct values, so an enumerated type fits. Latitude and longitude are stored as real, which holds about six significant decimal digits; a decimal (numeric) type would be needed to keep every digit of values such as 42.35779134.

First, we create the enumerated type for the day of the week; then we create the table.

In [9]:
# conn = psycopg2.connect("dbname=dq user=dq password=dq")
# cur = conn.cursor()
cur.execute("""
    CREATE TYPE day_enum AS ENUM (
    'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday');
""")
cur.execute("""
CREATE TABLE crimes.boston_crimes(
    incident_number integer PRIMARY KEY,
    offense_code integer, 
    description varchar(100), 
    date date, 
    day_of_the_week day_enum, 
    lat real, 
    long real
);
""")
# conn.commit()
# conn.close()
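
The choice of integer types above can be sanity-checked against PostgreSQL's documented integer ranges. The sketch below picks the smallest integer type that holds a column's values. The helper name `smallest_int_type` and the sample values are assumptions for illustration; the function expects a non-empty collection of integer strings.

```python
# PostgreSQL integer ranges (inclusive), smallest type first.
INT_TYPES = [
    ("smallint", -32768, 32767),
    ("integer", -2147483648, 2147483647),
    ("bigint", -9223372036854775808, 9223372036854775807),
]


def smallest_int_type(values):
    """Return the name of the smallest integer type that fits all values."""
    numbers = [int(value) for value in values]
    low, high = min(numbers), max(numbers)
    for name, type_min, type_max in INT_TYPES:
        if type_min <= low and high <= type_max:
            return name
    return "numeric"  # fallback for values beyond the bigint range


# Made-up samples; in the notebook the input would be get_col_set(...).
incident_type = smallest_int_type(["1", "298329"])
offense_type = smallest_int_type(["619", "3301"])
print(incident_type, offense_type)
```

A smaller type saves storage per row, but it also leaves less room if future data contains larger values, so the result is a lower bound rather than a recommendation.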

## Loading the Data

We load the data from boston.csv into the crimes.boston_crimes table, then count the rows to confirm the load.

In [11]:
with open("boston.csv", "r") as f:
    cur.copy_expert("COPY crimes.boston_crimes FROM STDIN WITH CSV HEADER;", f)
# Count on the server instead of fetching every row into Python.
cur.execute("SELECT COUNT(*) FROM crimes.boston_crimes")
print(cur.fetchone()[0])

298329
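
To confirm that no rows were dropped, the table count printed above can be compared with the number of data rows in the file. The sketch below counts CSV records with the `csv` module, so quoted fields containing newlines are still counted once. The name `count_data_rows` and the inline sample are illustrative assumptions.

```python
import csv
import io


def count_data_rows(file_obj):
    """Count CSV records, excluding the header row."""
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the header; tolerate an empty file
    return sum(1 for _ in reader)


# Made-up sample; the second record has a quoted field with a newline.
sample = io.StringIO(
    "incident_number,description\n"
    "1,LARCENY ALL OTHERS\n"
    '2,"MULTI\nLINE"\n'
)
row_count = count_data_rows(sample)
print(row_count)
```

With the real file, the result of `count_data_rows(open("boston.csv", "r"))` would be expected to equal the count returned by the database.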


## Revoking Public Privileges

We revoke all privileges of the public group on the public schema.

In [12]:
cur.execute("REVOKE ALL ON SCHEMA public FROM public;")

We also revoke all privileges of the public group on the crime_db database.

In [13]:
cur.execute("REVOKE ALL ON DATABASE crime_db FROM public;")

## Creating User Groups

The goal is to create two user groups with different privileges.

First, we create two groups named readonly and readwrite. Neither group can log in directly.

In [14]:
cur.execute("CREATE GROUP readonly NOLOGIN;")
cur.execute("CREATE GROUP readwrite NOLOGIN;")

We grant both groups permission to connect to crime_db.

In [15]:
cur.execute("GRANT CONNECT ON DATABASE crime_db TO readonly;")
cur.execute("GRANT CONNECT ON DATABASE crime_db TO readwrite;")

We grant both groups usage on the crimes schema.

In [17]:
cur.execute("GRANT USAGE ON SCHEMA crimes TO readonly;")
cur.execute("GRANT USAGE ON SCHEMA crimes TO readwrite;")

We grant table-level privileges on all tables in the crimes schema: SELECT for readonly, and SELECT, INSERT, DELETE, and UPDATE for readwrite.

In [18]:
cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA crimes TO readonly;")
cur.execute("GRANT SELECT, INSERT, DELETE, UPDATE ON ALL TABLES IN SCHEMA crimes TO readwrite;")
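
The group setup above repeats the same three grants per group. One way to keep them in sync is to describe the groups in a mapping and generate the statements from it. This is a sketch only: `GROUP_PRIVILEGES` and `build_group_grants` are hypothetical names, and the f-string formatting is acceptable here only because every identifier is a hard-coded, trusted constant.

```python
# Hypothetical mapping of group name to table-level privileges.
GROUP_PRIVILEGES = {
    "readonly": ["SELECT"],
    "readwrite": ["SELECT", "INSERT", "DELETE", "UPDATE"],
}


def build_group_grants(database, schema, group_privileges):
    """Build the CONNECT, USAGE, and table GRANT statements for each group.

    Identifiers are interpolated directly, so they must come from trusted,
    hard-coded values and never from user input.
    """
    statements = []
    for group, privileges in group_privileges.items():
        privilege_list = ", ".join(privileges)
        statements.append(f"GRANT CONNECT ON DATABASE {database} TO {group};")
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {group};")
        statements.append(
            f"GRANT {privilege_list} ON ALL TABLES IN SCHEMA {schema} TO {group};"
        )
    return statements


grant_statements = build_group_grants("crime_db", "crimes", GROUP_PRIVILEGES)
for statement in grant_statements:
    print(statement)
```

Each generated string could then be passed to `cur.execute(...)` in a loop, which produces the same grants as the cells above.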

## Creating Users

We create a user for data analysts and assign it to the readonly group.

In [20]:
cur.execute("CREATE USER data_analyst WITH PASSWORD 'secret1'")
cur.execute("GRANT readonly TO data_analyst;")

We create a user for data scientists and assign it to the readwrite group.

In [21]:
cur.execute("CREATE USER data_scientist WITH PASSWORD 'secret2'")
cur.execute("GRANT readwrite TO data_scientist;")
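
The passwords above are hard-coded literals, which is fine for a practice database but not for a shared one. A hedged alternative, assuming the standard-library `secrets` module is acceptable, is to generate them. The helper name `generate_password` is hypothetical, and the commented-out call is left unexecuted because it needs a live connection.

```python
import secrets


def generate_password(num_bytes=16):
    """Return a random URL-safe password built from num_bytes of entropy."""
    return secrets.token_urlsafe(num_bytes)


analyst_password = generate_password()
scientist_password = generate_password()

# With an open psycopg2 cursor, the value could be passed as a query
# parameter so that psycopg2 handles the quoting:
# cur.execute("CREATE USER data_analyst WITH PASSWORD %s", (analyst_password,))
print(len(analyst_password))
```

The generated values still need to be handed to the users through some secure channel, which is outside the scope of this notebook.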

## Testing

We close the old connection and open a new one.

In [61]:
conn.close()

conn = psycopg2.connect(dbname="crime_db", user="dq", password="dq")
cur = conn.cursor()

We check the attributes of the users and groups in pg_roles.

In [52]:
cur.execute("""
    SELECT rolname, rolsuper, rolcreaterole, rolcreatedb, rolcanlogin
    FROM pg_roles
    WHERE rolname IN ('readonly', 'readwrite', 'data_analyst', 'data_scientist');
""")
for user in cur:
    print(user)


We check the table privileges granted to the groups.

In [46]:
cur.execute("""
    SELECT grantee, privilege_type
    FROM information_schema.table_privileges
    WHERE grantee IN ('readonly', 'readwrite');
""")
users = cur.fetchall()
for user in users:
    print(user)

('readonly', 'SELECT')
('readwrite', 'INSERT')
('readwrite', 'SELECT')
('readwrite', 'UPDATE')
('readwrite', 'DELETE')
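
Rather than reading the printed rows by eye, the result can be compared with the intended privileges. The sketch below groups `(grantee, privilege_type)` rows by grantee and reports any difference. `EXPECTED_PRIVILEGES` and `privilege_mismatches` are hypothetical names, and the rows are copied from the output above.

```python
# Intended table privileges per group (an assumption drawn from the grants).
EXPECTED_PRIVILEGES = {
    "readonly": {"SELECT"},
    "readwrite": {"SELECT", "INSERT", "UPDATE", "DELETE"},
}


def privilege_mismatches(rows, expected):
    """Return {grantee: {"missing": set, "unexpected": set}} for differences."""
    actual = {}
    for grantee, privilege in rows:
        actual.setdefault(grantee, set()).add(privilege)
    mismatches = {}
    for grantee, wanted in expected.items():
        granted = actual.get(grantee, set())
        if granted != wanted:
            mismatches[grantee] = {
                "missing": wanted - granted,
                "unexpected": granted - wanted,
            }
    return mismatches


# Rows as returned by the information_schema.table_privileges query.
privilege_rows = [
    ("readonly", "SELECT"),
    ("readwrite", "INSERT"),
    ("readwrite", "SELECT"),
    ("readwrite", "UPDATE"),
    ("readwrite", "DELETE"),
]
mismatches = privilege_mismatches(privilege_rows, EXPECTED_PRIVILEGES)
print(mismatches)
```

An empty dictionary means every group has exactly the expected privileges.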


In [68]:
# LIMIT 0 returns no rows but still fills cur.description with the column names.
cur.execute("SELECT * FROM pg_roles LIMIT 0")
for desc in cur.description:
    print(desc[0])

rolname
rolsuper
rolinherit
rolcreaterole
rolcreatedb
rolcanlogin
rolreplication
rolconnlimit
rolpassword
rolvaliduntil
rolbypassrls
rolconfig
oid
