# Assignment #4 - Data Gathering and Warehousing - DSSA-5102

Instructor: Melissa Laurino<br>
Spring 2025<br>

Name: Thinh Le<br>
Date: February 15, 2025<br>
<br>
<b>Only Murders in the...Database?</b><br>
An introduction to navigating databases using SQL, Python and Jupyter Notebook. <br>
<br>
A crime has taken place and you are the perfect amateur detective to solve it!
<br>
The detective gave you the crime scene report, but you somehow lost it...? You vaguely remember that the crime
was a murder that occurred sometime on <b>Jan.15, 2018</b> and that it took place in <b>SQL
City</b>. All the clues to this mystery are buried in a huge database, and you need to use
SQL to navigate through this network of information. Your first step to solving the mystery is to retrieve the corresponding crime scene report from the police department’s database. Take a look at the cheatsheet to learn how to do this! From there, you can use your SQL skills to find the murderer.<br>

Your task for <b>Assignment #4</b> is to:<br>
- Connect to the database stored locally using Python.
- Explore the database by listing all tables, fields within the tables, and data types.
- Find out who the murderer is!<br>
- Add detailed comments to explain EVERY query or SQL command you use while we are still learning and practicing. I have my steps outlined, but please add more cells in between for additional queries! There is no limit on the number of queries you can use. <br>
- For each query include comments such as "SELECT all records FROM table WHERE column name = X"<br>
- Answer any prompts in markdown cells.<br>
<br>

This fabulous database was created by @NUKnightLab on Github and can be found here: https://github.com/NUKnightLab/sql-mysteries

Database files are indicated by the .db file extension. For this exercise, our database has already been created. To avoid setting your working directory, have the .db file and Assignment#4_LastName.ipynb in the same file location. There are multiple libraries depending on your platform that can be used to navigate database files and connect to SQL for databases that are stored on a server or your computer as a local host.

## Load libraries

In [1]:
# Load necessary packages:
from sqlalchemy import create_engine, inspect, text # Object Relational Mapper
import pandas as pd # Python data manipulation

## Create a new engine

In [2]:
# How to connect to our .db file using library SQLAlchemy
# Notice that we are not specifying a host, password, or server since this .db file is stored locally.
db_path = "sql-murder-mystery.db"
# Use the create_engine function to connect to the database.
engine = create_engine(f"sqlite:///{db_path}")
engine

Engine(sqlite:///sql-murder-mystery.db)

SQLAlchemy needs to know what kind of database we are accessing. A .db file can be treated as a SQLite database; other options include MySQL, PostgreSQL, etc. The connection URL must therefore begin with the database type prefix, here `sqlite:///`.

## Explore the database

### Create an inspector

In [3]:
inspector = inspect(engine)

### View table names

In [4]:
# List the tables in our database:
table_names = inspector.get_table_names()
table_names

['crime_scene_report',
 'drivers_license',
 'facebook_event_checkin',
 'get_fit_now_check_in',
 'get_fit_now_member',
 'income',
 'interview',
 'person',
 'solution']

So much data! How would we possibly make sense of ANY of this without SQL? :)

### View column names in each table

In [5]:
# What are the column names for each table? What table will help us get to the next clue?
for table_name in table_names:
    print(f"All column names in table '{table_name}':")
    # Get all columns in the table
    columns = inspector.get_columns(table_name)
    # Extract only column names
    column_names = [column["name"] for column in columns]
    print(column_names)
    print() # empty line

All column names in table 'crime_scene_report':
['date', 'type', 'description', 'city']

All column names in table 'drivers_license':
['id', 'age', 'height', 'eye_color', 'hair_color', 'gender', 'plate_number', 'car_make', 'car_model']

All column names in table 'facebook_event_checkin':
['person_id', 'event_id', 'event_name', 'date']

All column names in table 'get_fit_now_check_in':
['membership_id', 'check_in_date', 'check_in_time', 'check_out_time']

All column names in table 'get_fit_now_member':
['id', 'person_id', 'name', 'membership_start_date', 'membership_status']

All column names in table 'income':
['ssn', 'annual_income']

All column names in table 'interview':
['person_id', 'transcript']

All column names in table 'person':
['id', 'name', 'license_id', 'address_number', 'address_street_name', 'ssn']

All column names in table 'solution':
['user', 'value']



Based on the clue from the introduction, the `crime_scene_report` table contains the date when the crime happened. This information can be used to search for data in other tables. For example, the `facebook_event_checkin` table also contains date data and includes a person ID that references the `person` table.
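The task list also asks for data types, which the loop above does not print. With the SQLAlchemy inspector, each dictionary returned by `inspector.get_columns()` also carries a `"type"` entry. The sketch below shows the same idea using only the standard-library `sqlite3` module and SQLite's `PRAGMA table_info`; the in-memory table and its schema are assumptions made up for illustration, not read from the real database file.

```python
import sqlite3

# Toy in-memory database so the example runs without the .db file.
# For the real data, sqlite3.connect("sql-murder-mystery.db") would be used instead.
conn = sqlite3.connect(":memory:")
# Hypothetical schema that mimics the interview table
conn.execute("CREATE TABLE interview (person_id INTEGER, transcript TEXT)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
column_types = {
    row[1]: row[2]
    for row in conn.execute("PRAGMA table_info(interview)")
}
conn.close()

print(column_types)
```

Knowing whether a column such as `date` is stored as an integer or as text helps decide whether to quote values in a `WHERE` clause.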

## Practice

Practice using commands useful in queries:

In [6]:
# Example: 
# Select all records from the table crime_scene_report where type is murder:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM crime_scene_report
        WHERE type='murder'
    """)
    # Use pandas to read the sql query with the connection to the database
    murder_crimes = pd.read_sql(query, connection)
    
# Print the results
murder_crimes

Unnamed: 0,date,type,description,city
0,20180115,murder,Life? Dont talk to me about life.,Albany
1,20180115,murder,"Mama, I killed a man, put a gun against his he...",Reno
2,20180215,murder,REDACTED REDACTED REDACTED,SQL City
3,20180215,murder,Someone killed the guard! He took an arrow to ...,SQL City
4,20170107,murder,‘It proves nothing of the sort!’ said Alice. ‘...,Memphis
...,...,...,...,...
143,20180427,murder,"‘Only a thimble,’ said Alice sadly.\n",St. Petersburg
144,20180428,murder,wag my tail when I’m angry. Therefore I’m mad.’\n,Appleton
145,20180429,murder,"(she was obliged to say ‘creatures,’ you see, ...",Toledo
146,20180430,murder,‘But what am I to do?’ said Alice.\n,Spokane


In [7]:
# Select all records from the table drivers_license where the eye color is blue:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM drivers_license
        WHERE eye_color='blue'
    """)
    # Use pandas to read the sql query with the connection to the database
    blue_eye_drivers = pd.read_sql(query, connection)

# Print the results
blue_eye_drivers

Unnamed: 0,id,age,height,eye_color,hair_color,gender,plate_number,car_make,car_model
0,101255,18,79,blue,grey,female,5162Z1,Lexus,GS
1,101494,48,55,blue,red,female,81X1N7,Kia,Sportage
2,101611,40,65,blue,white,female,5O8VW7,GMC,Sierra Denali
3,101726,78,81,blue,blue,female,236T17,Oldsmobile,Intrigue
4,101836,56,51,blue,grey,female,3VSG76,Hyundai,Azera
...,...,...,...,...,...,...,...,...,...
2020,996697,68,73,blue,black,female,QVEW13,Ferrari,612 Scaglietti
2021,997724,65,73,blue,blonde,male,640570,Toyota,Highlander
2022,997950,84,69,blue,grey,female,66DP8D,Infiniti,G37
2023,998114,46,50,blue,brown,female,I0GI33,Mercedes-Benz,SLK-Class


In [8]:
# Select distinct records of eye_color from the table drivers_license:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT DISTINCT eye_color
        FROM drivers_license
    """)
    # Use pandas to read the sql query with the connection to the database
    distinct_eye_color_drivers = pd.read_sql(query, connection)

# Print the results
distinct_eye_color_drivers

Unnamed: 0,eye_color
0,brown
1,green
2,amber
3,blue
4,black


In [9]:
# Select distinct records of name from the table person:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT DISTINCT name
        FROM person
    """)
    # Use pandas to read the sql query with the connection to the database
    distinct_person_names = pd.read_sql(query, connection)

# Print the results
distinct_person_names

Unnamed: 0,name
0,Christoper Peteuil
1,Kourtney Calderwood
2,Muoi Cary
3,Era Moselle
4,Trena Hornby
...,...
10006,Luba Benser
10007,Roxana Mckimley
10008,Cherie Zeimantz
10009,Allen Cruse


In [10]:
# Select distinct records of gender AND car_model from the table drivers_license:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT DISTINCT gender, car_model
        FROM drivers_license
    """)
    # Use pandas to read the sql query with the connection to the database
    distinct_gender_and_car_model_drivers = pd.read_sql(query, connection)

# Print the results
distinct_gender_and_car_model_drivers

Unnamed: 0,gender,car_model
0,male,MDX
1,female,SRX
2,female,xB
3,female,Rogue
4,female,GS
...,...,...
1365,female,550
1366,female,Prius c
1367,female,BRZ
1368,male,Mark LT


In [11]:
# Select distinct records of type and city from the table crime_scene_report:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT DISTINCT type, city
        FROM crime_scene_report
    """)
    # Use pandas to read the sql query with the connection to the database
    distinct_type_and_city_crimes = pd.read_sql(query, connection)

# Print the results
distinct_type_and_city_crimes

Unnamed: 0,type,city
0,robbery,NYC
1,murder,Albany
2,murder,Reno
3,murder,SQL City
4,theft,Chicago
...,...,...
1038,robbery,Trenton
1039,bribery,Garden Grove
1040,fraud,Houma
1041,assault,Fontana


In [12]:
# Select all records from table drivers_license where the age is == 40:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM drivers_license
        WHERE age = 40
    """)
    # Use pandas to read the sql query with the connection to the database
    _40_years_old_drivers = pd.read_sql(query, connection)

# Print the results
_40_years_old_drivers

Unnamed: 0,id,age,height,eye_color,hair_color,gender,plate_number,car_make,car_model
0,101611,40,65,blue,white,female,5O8VW7,GMC,Sierra Denali
1,103725,40,57,black,green,female,81WZ22,Land Rover,Freelander
2,116863,40,63,black,black,female,1ZSI8C,Audi,A8
3,118332,40,75,green,black,male,6SG3W0,Mitsubishi,Diamante
4,122134,40,61,black,white,female,8616Y4,Chevrolet,Astro
...,...,...,...,...,...,...,...,...,...
143,969815,40,63,green,black,female,G40MZY,Chevrolet,Suburban 2500
144,973824,40,50,brown,blue,female,LEJ771,Ford,F250
145,974364,40,52,green,black,female,822S0D,Ford,E250
146,990554,40,63,amber,white,male,XH2QC0,Chrysler,Town & Country


In [13]:
# Select all records from table drivers_license where the age is greater than 21:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM drivers_license
        WHERE age > 21
    """)
    # Use pandas to read the sql query with the connection to the database
    greater_21_years_old_drivers = pd.read_sql(query, connection)

# Print the results
greater_21_years_old_drivers

Unnamed: 0,id,age,height,eye_color,hair_color,gender,plate_number,car_make,car_model
0,100280,72,57,brown,red,male,P24L4U,Acura,MDX
1,100460,63,72,brown,brown,female,XF02T6,Cadillac,SRX
2,101029,62,74,green,green,female,VKY5KR,Scion,xB
3,101198,43,54,amber,brown,female,Y5NZ08,Nissan,Rogue
4,101494,48,55,blue,red,female,81X1N7,Kia,Sportage
...,...,...,...,...,...,...,...,...,...
9434,999509,42,80,amber,blue,female,8E403S,Morgan,Aero 8
9435,999536,56,65,green,green,female,4FFUD5,Mercury,Sable
9436,999940,71,61,green,green,male,1B8QN8,Mitsubishi,Eclipse
9437,999981,67,69,brown,blue,female,1684K3,Land Rover,LR2


In [14]:
#Select all records from table drivers_license where the age is less than OR equal to 21:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM drivers_license
        WHERE age <= 21
    """)
    # Use pandas to read the sql query with the connection to the database
    less_or_equal_21_years_old_drivers = pd.read_sql(query, connection)

# Print the results
less_or_equal_21_years_old_drivers

Unnamed: 0,id,age,height,eye_color,hair_color,gender,plate_number,car_make,car_model
0,101255,18,79,blue,grey,female,5162Z1,Lexus,GS
1,102536,21,76,blue,blonde,male,3DG322,Suzuki,Grand Vitara
2,105064,20,53,black,blue,male,25UO1B,Mercedes-Benz,SL-Class
3,108374,18,63,brown,red,male,X2KE6N,Ford,Escape
4,109617,19,76,green,blue,male,UO2B77,Infiniti,G
...,...,...,...,...,...,...,...,...,...
563,992923,20,77,black,brown,female,GT0EX0,MINI,Cooper
564,996996,20,56,black,green,female,5Y0W6U,Maserati,Quattroporte
565,998350,20,78,amber,grey,male,6HRAXP,Mitsubishi,Lancer
566,999923,19,77,amber,black,female,5L0ZI4,GMC,Sierra 3500


Do you see a difference in the number of rows?<br>
Yes, only 568 records are returned, far fewer than in the previous query.

In [15]:
#Select all records from table get_fit_now_check_in and order by check_in_date:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM get_fit_now_check_in
        ORDER BY check_in_date
    """)
    # Use pandas to read the sql query with the connection to the database
    get_fit_now_by_checking_date = pd.read_sql(query, connection)

# Print the results
get_fit_now_by_checking_date

Unnamed: 0,membership_id,check_in_date,check_in_time,check_out_time
0,82GA2,20170101,431,1116
1,KE0FH,20170101,375,447
2,C84S4,20170101,580,864
3,XTE42,20170101,1188,1199
4,TQU5U,20170101,381,1069
...,...,...,...,...
2698,C581L,20180430,76,642
2699,0U51D,20180501,88,218
2700,7788A,20180501,532,705
2701,VHN8M,20180501,658,989


Do you see a difference in the number of rows?<br>
There are 2703 records. Ordering changes the sequence of the rows, not how many are returned.

In [16]:
#Select all records from table get_fit_now_check_in and order by check_in_date and order by check_in_time:

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM get_fit_now_check_in
        ORDER BY check_in_date, check_in_time
    """)
    # Use pandas to read the sql query with the connection to the database
    get_fit_now_by_checking_date_and_time = pd.read_sql(query, connection)

# Print the results
get_fit_now_by_checking_date_and_time

Unnamed: 0,membership_id,check_in_date,check_in_time,check_out_time
0,C581L,20170101,23,1071
1,3JP0X,20170101,154,609
2,J2033,20170101,242,955
3,KE0FH,20170101,375,447
4,TQU5U,20170101,381,1069
...,...,...,...,...
2698,133ED,20180430,1197,1200
2699,0U51D,20180501,88,218
2700,7788A,20180501,532,705
2701,VHN8M,20180501,658,989


Do you see a difference in the order of the rows?<br>
The records are sorted by the `check_in_date` first, then by the `check_in_time`.
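A small sketch of how a two-column `ORDER BY` behaves, using the standard-library `sqlite3` module and three made-up rows (the table name `check_in` and its values are assumptions for illustration). The second column only breaks ties within equal values of the first, and each column can take its own direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE check_in (membership_id TEXT, check_in_date INTEGER, check_in_time INTEGER)"
)
# Toy rows: two check-ins on the same date, one on a later date
conn.executemany(
    "INSERT INTO check_in VALUES (?, ?, ?)",
    [("A", 20170102, 500), ("B", 20170101, 900), ("C", 20170101, 100)],
)

# Sort by date first, then by time within each date (both ascending by default)
ascending_ids = [
    row[0]
    for row in conn.execute(
        "SELECT membership_id FROM check_in ORDER BY check_in_date, check_in_time"
    )
]

# Newest date first, but still earliest time first within a date
mixed_ids = [
    row[0]
    for row in conn.execute(
        "SELECT membership_id FROM check_in ORDER BY check_in_date DESC, check_in_time ASC"
    )
]
conn.close()

print(ascending_ids, mixed_ids)
```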

## Solve the problem - Find the murderer(s)

Now that we have practiced, let's solve this murder!

All we know at the start of our investigation is the <b>murder</b> occurred on <b>Jan.15, 2018</b> and that it took place in <b>SQL City</b>.

In [17]:
# Write a query in SQL to view the crime scene report for the murder that occurred on Jan. 15, 2018 in SQL City.
# Select all records from crime_scene_report where type is murder, date is 20180115, and city is SQL City.

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM crime_scene_report
        WHERE type = 'murder' AND date = 20180115 AND city = 'SQL City'
    """)
    # Use pandas to read the sql query with the connection to the database
    report_clue = pd.read_sql(query, connection)

# Print the results
report_clue

Unnamed: 0,date,type,description,city
0,20180115,murder,Security footage shows that there were 2 witne...,SQL City


From the description: Security footage shows that there were 2 witnesses. The first witness lives at the last house on "Northwestern Dr". The second witness, named Annabel, lives somewhere on "Franklin Ave".<br><br>
Another clue! It looks like we have 2 witnesses! Now who are they and how do we find them?

In [18]:
# Write two queries in SQL to find each witness in the person table.

# For the first query, first_witness, select all records from person where address_street_name is 'Northwestern Dr',
# order by address_number descending, and keep only the first row (the last house on the street).

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM person
        WHERE address_street_name = 'Northwestern Dr'
        ORDER BY address_number DESC
        LIMIT 1
    """)
    # Use pandas to read the sql query with the connection to the database
    first_witness = pd.read_sql(query, connection)

# Print the results
first_witness

Unnamed: 0,id,name,license_id,address_number,address_street_name,ssn
0,14887,Morty Schapiro,118009,4919,Northwestern Dr,111564949


We have identified our first witness as **Morty Schapiro**. Now let's find the second!

In [19]:
# For the third query as second_witness,
# Select all the records from person where the address_street_name is Franklin Ave and the name includes Annabel.

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM person
        WHERE address_street_name = 'Franklin Ave' AND name LIKE 'Annabel%'
    """)
    # Use pandas to read the sql query with the connection to the database
    second_witness = pd.read_sql(query, connection)

# Print the results
second_witness

Unnamed: 0,id,name,license_id,address_number,address_street_name,ssn
0,16371,Annabel Miller,490173,103,Franklin Ave,318771143


We have identified our second witness as **Annabel Miller**. What were their witness statements? What did they see?

In [20]:
# For our fourth query, witness_statements, lets find out what each witness saw...or claim they saw?
# Select all records from interview where person_id is 14887 or 16371 (the IDs found in the queries above)

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM interview
        WHERE person_id = '14887' OR person_id = '16371'
    """)
    # Use pandas to read the sql query with the connection to the database
    witness_statements = pd.read_sql(query, connection)

# Print the results
witness_statements

Unnamed: 0,person_id,transcript
0,14887,I heard a gunshot and then saw a man run out. ...
1,16371,"I saw the murder happen, and I recognized the ..."


Statement 1: I heard a gunshot and then saw **a man** run out. He had a **"Get Fit Now Gym"** bag. The membership number on the bag started with **"48Z"**. Only **gold members** have those bags. The man got into a car with a plate that included **"H42W"**.<br>

Statement 2: I saw the murder happen, and I recognized the killer **from my gym** when I was working out last week on **January the 9th**.

A few new clues! Explore the different clues with multiple queries. Seems like a reliable statement!
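The witness IDs were typed by hand into the interview query. One alternative is to pass them as bound parameters, which avoids copy errors and quoting issues. The sketch below uses the standard-library `sqlite3` module with `?` placeholders on a made-up table; the IDs and transcripts are assumptions for illustration. SQLAlchemy's `text()` supports the same idea with named `:name` parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interview (person_id INTEGER, transcript TEXT)")
# Toy rows, not taken from the real database
conn.executemany(
    "INSERT INTO interview VALUES (?, ?)",
    [(1, "toy statement one"), (2, "toy statement two"), (3, "unrelated")],
)

witness_ids = [1, 2]  # hypothetical IDs collected from earlier queries

# Build one "?" placeholder per ID; the values themselves are never
# pasted into the SQL string.
placeholders = ", ".join("?" for _ in witness_ids)
query = (
    "SELECT person_id, transcript FROM interview "
    f"WHERE person_id IN ({placeholders}) ORDER BY person_id"
)
statements = conn.execute(query, witness_ids).fetchall()
conn.close()

print(statements)
```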

In [21]:
# Create a query to select for the following clues: (You can create multiple queries or combine them into one and go from there.)

# Remember to write out your query in the comments!

# Statement 1's clue #1: The membership starts with '48Z' and a gold membership

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM get_fit_now_member
        WHERE id LIKE '48Z%' AND membership_status = 'gold'
    """)
    # Use pandas to read the sql query with the connection to the database
    statement_1_clue_1 = pd.read_sql(query, connection)

# Print the results
statement_1_clue_1

Unnamed: 0,id,person_id,name,membership_start_date,membership_status
0,48Z7A,28819,Joe Germuska,20160305,gold
1,48Z55,67318,Jeremy Bowers,20160101,gold


In [22]:
# Statement 1's clue #2: The man got into a car with a plate that included 'H42W'.

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM drivers_license
        WHERE plate_number LIKE '%H42W%' AND gender = 'male'
    """)
    # Use pandas to read the sql query with the connection to the database
    statement_1_clue_2 = pd.read_sql(query, connection)

# Print the results
statement_1_clue_2

Unnamed: 0,id,age,height,eye_color,hair_color,gender,plate_number,car_make,car_model
0,423327,30,70,brown,brown,male,0H42W2,Chevrolet,Spark LS
1,664760,21,71,black,black,male,4H42WR,Nissan,Altima


Next, we match the driver's license IDs found above to people in the `person` table and compare them with the gym members from the first clue.

In [23]:
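# Select all records from person where license_id is 423327 or 664760 (the two licenses found above)

# Establish a connection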
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT *
        FROM person
        WHERE license_id = '423327' OR license_id = '664760'
    """)
    # Use pandas to read the sql query with the connection to the database
    suspects_1 = pd.read_sql(query, connection)

# Print the results
suspects_1

Unnamed: 0,id,name,license_id,address_number,address_street_name,ssn
0,51739,Tushar Chandra,664760,312,Phi St,137882671
1,67318,Jeremy Bowers,423327,530,"Washington Pl, Apt 3A",871539279


Of these two people, only **Jeremy Bowers** (person ID 67318) also appears among the gold members whose membership number starts with "48Z". He is the suspect from statement 1.
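The three lookups for statement 1 (gym membership, license plate, person) can also be combined into a single query with `JOIN`. The following is a minimal sketch using the standard-library `sqlite3` module; the tables are cut down to the columns needed and filled with made-up rows, so the names and values are assumptions, not the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER, name TEXT, license_id INTEGER);
    CREATE TABLE drivers_license (id INTEGER, gender TEXT, plate_number TEXT);
    CREATE TABLE get_fit_now_member (id TEXT, person_id INTEGER, membership_status TEXT);

    -- Toy rows: only person 1 satisfies every clue
    INSERT INTO person VALUES
        (1, 'Toy Suspect', 10), (2, 'Toy Bystander', 20), (3, 'Toy Driver', 30);
    INSERT INTO drivers_license VALUES
        (10, 'male', '0H42W2'), (20, 'male', 'ABC123'), (30, 'male', '4H42WR');
    INSERT INTO get_fit_now_member VALUES
        ('48Z55', 1, 'gold'), ('48Z7A', 2, 'gold');
""")

# Join member -> person -> license, then apply all clues at once
matches = conn.execute("""
    SELECT p.name
    FROM get_fit_now_member AS m
    JOIN person AS p ON p.id = m.person_id
    JOIN drivers_license AS d ON d.id = p.license_id
    WHERE m.id LIKE '48Z%'
      AND m.membership_status = 'gold'
      AND d.gender = 'male'
      AND d.plate_number LIKE '%H42W%'
""").fetchall()
conn.close()

print(matches)
```

Person 2 has the right membership but the wrong plate, and person 3 has the right plate but no membership, so the join returns only the row that passes every condition.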

Let's try to find the suspects in statement 2.

In [24]:
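# Find every person who checked in to the gym on January 9th, 2018:
# select all records from person whose id belongs to a get_fit_now_member whose membership id checked in on 20180109.
# Note that the witness herself (person ID 16371) also appears in the results and is not a suspect.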

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT * FROM person
        WHERE id IN (
            SELECT person_id FROM get_fit_now_member
            WHERE id IN (
                SELECT membership_id from get_fit_now_check_in
                WHERE check_in_date = 20180109
            )
        )
    """)
    # Use pandas to read the sql query with the connection to the database
    suspects_2 = pd.read_sql(query, connection)

# Print the results
suspects_2

Unnamed: 0,id,name,license_id,address_number,address_street_name,ssn
0,10815,Adriane Pelligra,952073,948,Emba Ave,243639527
1,15247,Shondra Ledlow,108978,2906,Chuck Dr,492143109
2,16371,Annabel Miller,490173,103,Franklin Ave,318771143
3,28073,Zackary Cabotage,402017,3823,S Winthrop Ave,367741547
4,28819,Joe Germuska,173289,111,Fisk Rd,138909730
5,31523,Blossom Crescenzo,737886,1245,Ruxshire St,753962462
6,55662,Sarita Bartosh,556026,1031,Legacy Pointe Blvd,564780417
7,67318,Jeremy Bowers,423327,530,"Washington Pl, Apt 3A",871539279
8,83186,Burton Grippe,915564,484,Lemcrow Way,426280783
9,92736,Carmen Dimick,890722,2965,Kilmaine Circle,622279052


**Jeremy Bowers** is the only person who satisfies both clues from statement 1 and also checked in to the gym on the date given in statement 2.
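The nested `IN` subqueries used for statement 2 can be written as joins, and the witness can be filtered out in the same query. Below is a hedged sketch with the standard-library `sqlite3` module; the reduced tables, IDs, and names are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER, name TEXT);
    CREATE TABLE get_fit_now_member (id TEXT, person_id INTEGER);
    CREATE TABLE get_fit_now_check_in (membership_id TEXT, check_in_date INTEGER);

    -- Toy rows: persons 1 and 2 checked in on the target date, person 3 a day later
    INSERT INTO person VALUES
        (1, 'Toy Witness'), (2, 'Toy Gym Goer'), (3, 'Toy Absentee');
    INSERT INTO get_fit_now_member VALUES
        ('AAA11', 1), ('BBB22', 2), ('CCC33', 3);
    INSERT INTO get_fit_now_check_in VALUES
        ('AAA11', 20180109), ('BBB22', 20180109), ('CCC33', 20180110);
""")

witness_id = 1  # hypothetical ID of the person who gave the statement

# check_in -> member -> person, keeping the target date and dropping the witness.
# DISTINCT guards against a member checking in more than once that day.
gym_suspects = conn.execute(
    """
    SELECT DISTINCT p.id, p.name
    FROM get_fit_now_check_in AS c
    JOIN get_fit_now_member AS m ON m.id = c.membership_id
    JOIN person AS p ON p.id = m.person_id
    WHERE c.check_in_date = ?
      AND p.id != ?
    ORDER BY p.id
    """,
    (20180109, witness_id),
).fetchall()
conn.close()

print(gym_suspects)
```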

We have a suspect! Let's check their witness statement. Do they have an alibi?

In [31]:
# Remember to write your query in the comments!
# Select the transcript from interview where person_id is 67318 (Jeremy Bowers).

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT transcript
        FROM interview
        WHERE person_id = '67318'
    """)
    result = connection.execute(query)
    # Retrieve a single row from the result set of a database query
    row = result.fetchone()

# Print the results
row[0]

'I was hired by a woman with a lot of money. I don\'t know her name but I know she\'s around 5\'5" (65") or 5\'7" (67"). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.\n'

Statement: I was hired by **a woman** with a lot of money. I don't know her name but I know she's around **5'5" (65") or 5'7" (67")**. She has **red hair** and she drives a **Tesla Model S**. I know that she **attended the SQL Symphony Concert 3 times** in **December 2017**.

Uh oh...turns out it wasn't just one person behind the crime...who is this additional person involved?

In [26]:
# Select all records from drivers_license for a woman between 65" and 67" tall with red hair who drives a Tesla Model S.
# The concert attendance from the statement is checked in a later query.

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT * FROM drivers_license
        WHERE height BETWEEN 65 AND 67
            AND hair_color = 'red'
            AND gender = 'female'
            AND car_make = 'Tesla'
            AND car_model = 'Model S';
    """)
    # Use pandas to read the sql query with the connection to the database
    additional_suspect = pd.read_sql(query, connection)

# Print the results
additional_suspect

Unnamed: 0,id,age,height,eye_color,hair_color,gender,plate_number,car_make,car_model
0,202298,68,66,green,red,female,500123,Tesla,Model S
1,291182,65,66,blue,red,female,08CM64,Tesla,Model S
2,918773,48,65,black,red,female,917UU3,Tesla,Model S


Three different people match this description! Now which one attended the SQL Symphony Concert three times?

First, find the people who hold these three driver's licenses.

In [27]:
# Let's find the killer!
# Remember to write out your query in the comments!
# Select all records from person where license_id is 202298, 291182, or 918773.

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT * FROM person
        WHERE license_id = '202298' OR license_id = '291182' OR license_id = '918773'
    """)
    # Use pandas to read the sql query with the connection to the database
    _3_suspect_woman = pd.read_sql(query, connection)

# Print the results
_3_suspect_woman

Unnamed: 0,id,name,license_id,address_number,address_street_name,ssn
0,78881,Red Korb,918773,107,Camerata Dr,961388910
1,90700,Regina George,291182,332,Maple Ave,337169072
2,99716,Miranda Priestly,202298,1883,Golden Ave,987756388


Finally, find the woman who attended the SQL Symphony Concert three times.

In [28]:
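# Count check-ins per person in facebook_event_checkin for the three suspects,
# keeping only 'SQL Symphony Concert' events in December 2017, and return those with exactly 3.

# Establish a connection
with engine.connect() as connection:
    # Define the query - text() ensures that the query string is read as a SQL expression
    query = text("""
        SELECT person_id, COUNT(*) FROM facebook_event_checkin
        WHERE person_id IN (78881, 90700, 99716)
            AND event_name = 'SQL Symphony Concert'
            AND date BETWEEN 20171201 AND 20171231
        GROUP BY person_id
        HAVING COUNT(*) = 3
    """)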
    # Use pandas to read the sql query with the connection to the database
    woman_attended_3_times = pd.read_sql(query, connection)

# Print the results
woman_attended_3_times

Unnamed: 0,person_id,COUNT(*)
0,99716,3


So the woman behind the crime is **Miranda Priestly**.

The murderer(s) is/are: **Jeremy Bowers** and **Miranda Priestly**

In [29]:
# Each `with engine.connect()` block above already closed its connection on exit.
# Dispose of the engine to release any remaining pooled connections.
engine.dispose()