# Query Incidents CSV with SQL
This notebook loads `data/Incidents.csv`, registers it as a SQL table, runs example SQL queries, and persists the table for reuse.

## Import Required Libraries
We use `pandas` for CSV loading and `duckdb` for SQL queries over DataFrames.

In [14]:
import pandas as pd
import duckdb

## Load `Incidents.csv` into a DataFrame
Read the CSV, inspect the schema, and preview a few rows.

In [15]:
csv_path = "data/Incidents.csv"

df = pd.read_csv(csv_path)
df.head()


Unnamed: 0,IncidentDateTime,City,IncidentState,Country,Shape,DurationSeconds,Comments
0,2005-10-31 18:00:00.000,poughkeepsie,ny,us,light,37800.0,Several bright lights moving erratically for e...
1,2005-10-31 18:30:00.000,linwood,nj,us,light,5.0,VERY bright apparent meteor over Southern New ...
2,2005-10-31 19:00:00.000,clarksville,md,us,other,5.0,White ball shaped bright object whizzing acros...
3,2005-10-31 19:00:00.000,newark,de,us,light,45.0,Very fast&#44 brillant bluish/white light trav...
4,2005-10-31 19:00:00.000,scottsdale,az,us,triangle,600.0,Gilbert


In [16]:
df.info()

df.dtypes

<class 'pandas.DataFrame'>
RangeIndex: 6452 entries, 0 to 6451
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   IncidentDateTime  6452 non-null   str    
 1   City              6452 non-null   str    
 2   IncidentState     6202 non-null   str    
 3   Country           5702 non-null   str    
 4   Shape             6310 non-null   str    
 5   DurationSeconds   6452 non-null   float64
 6   Comments          6451 non-null   str    
dtypes: float64(1), str(6)
memory usage: 353.0 KB


IncidentDateTime        str
City                    str
IncidentState           str
Country                 str
Shape                   str
DurationSeconds     float64
Comments                str
dtype: object

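The schema above shows `IncidentDateTime` loaded as a string. If date arithmetic or time-based grouping is needed later, one option is to parse it with `pd.to_datetime`. The sketch below uses a small stand-in frame (the name `sample` and the derived `Hour` column are illustrative, not part of the notebook) built from two timestamps in the preview.

```python
import pandas as pd

# Two timestamps copied from the preview above; the frame itself is a stand-in.
sample = pd.DataFrame(
    {
        "IncidentDateTime": ["2005-10-31 18:00:00.000", "2005-10-31 18:30:00.000"],
        "DurationSeconds": [37800.0, 5.0],
    }
)

# errors="coerce" turns unparseable strings into NaT instead of raising.
sample["IncidentDateTime"] = pd.to_datetime(sample["IncidentDateTime"], errors="coerce")

# With a datetime column, components such as the hour are available via .dt
sample["Hour"] = sample["IncidentDateTime"].dt.hour
```

Applied to the full DataFrame, the same call would be `df["IncidentDateTime"] = pd.to_datetime(df["IncidentDateTime"], errors="coerce")`, assuming every row follows the format shown in the preview.
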
## Register DataFrame as an SQL Table
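Register the DataFrame in an in-memory DuckDB connection so it can be queried with SQL. `register` exposes the DataFrame as a view rather than copying it into a table.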

In [17]:
con = duckdb.connect()
con.register("incidents", df)

con.execute("SELECT COUNT(*) AS total_rows FROM incidents").df()

Unnamed: 0,total_rows
0,6452


## Run SQL Queries Against the Table
Use SQL to select, filter, and group data.

In [18]:
con.execute("PRAGMA table_info('incidents')").df()

Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,IncidentDateTime,VARCHAR,False,,False
1,1,City,VARCHAR,False,,False
2,2,IncidentState,VARCHAR,False,,False
3,3,Country,VARCHAR,False,,False
4,4,Shape,VARCHAR,False,,False
5,5,DurationSeconds,DOUBLE,False,,False
6,6,Comments,VARCHAR,False,,False


In [19]:

con.execute("""
SELECT Shape,
       AVG(DurationSeconds) AS Average,
       MIN(DurationSeconds) AS Minimum,
       MAX(DurationSeconds) AS Maximum
FROM incidents
GROUP BY Shape
-- Keep only the shapes whose minimum DurationSeconds is greater than 1
HAVING MIN(DurationSeconds) > 1
""").df()


Unnamed: 0,Shape,Average,Minimum,Maximum
0,changing,3191.674419,2.0,172800.0
1,egg,558.95614,1.5,7200.0
2,rectangle,969.613208,4.0,28800.0
3,cylinder,795.241758,3.0,37800.0
4,teardrop,3501.685185,2.0,172800.0
5,crescent,10.0,10.0,10.0
6,chevron,1100.59375,2.0,21600.0
7,cross,848.133333,2.0,7200.0


In [9]:
# Simple SELECT
con.execute("SELECT * FROM incidents LIMIT 5").df()

Unnamed: 0,IncidentDateTime,City,IncidentState,Country,Shape,DurationSeconds,Comments
0,2005-10-31 18:00:00.000,poughkeepsie,ny,us,light,37800.0,Several bright lights moving erratically for e...
1,2005-10-31 18:30:00.000,linwood,nj,us,light,5.0,VERY bright apparent meteor over Southern New ...
2,2005-10-31 19:00:00.000,clarksville,md,us,other,5.0,White ball shaped bright object whizzing acros...
3,2005-10-31 19:00:00.000,newark,de,us,light,45.0,Very fast&#44 brillant bluish/white light trav...
4,2005-10-31 19:00:00.000,scottsdale,az,us,triangle,600.0,Gilbert


In [10]:
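# WHERE example: IS NOT NULL is valid for any column type, so no schema knowledge is needed
where_col = df.columns[0]
con.execute(
    # Double quotes keep the interpolated column name a valid identifier
    f'SELECT * FROM incidents WHERE "{where_col}" IS NOT NULL LIMIT 5'
).df()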

Unnamed: 0,IncidentDateTime,City,IncidentState,Country,Shape,DurationSeconds,Comments
0,2005-10-31 18:00:00.000,poughkeepsie,ny,us,light,37800.0,Several bright lights moving erratically for e...
1,2005-10-31 18:30:00.000,linwood,nj,us,light,5.0,VERY bright apparent meteor over Southern New ...
2,2005-10-31 19:00:00.000,clarksville,md,us,other,5.0,White ball shaped bright object whizzing acros...
3,2005-10-31 19:00:00.000,newark,de,us,light,45.0,Very fast&#44 brillant bluish/white light trav...
4,2005-10-31 19:00:00.000,scottsdale,az,us,triangle,600.0,Gilbert


In [11]:
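# GROUP BY example using the first column as a key.
# The result is assigned and displayed explicitly: a notebook only auto-displays
# the last top-level expression, not one nested inside an `if` block.
grouped = None
if len(df.columns) > 0:
    group_col = df.columns[0]
    grouped = con.execute(
        f'SELECT "{group_col}" AS key, COUNT(*) AS total '
        f'FROM incidents GROUP BY "{group_col}" ORDER BY total DESC LIMIT 10'
    ).df()
grouped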

## Persist Table to Disk
Save the table to a DuckDB file so it can be reused later.

In [12]:
db_path = "data/incidents.duckdb"

con_disk = duckdb.connect(db_path)
con_disk.register("incidents_df", df)
con_disk.execute(
    "CREATE OR REPLACE TABLE incidents AS SELECT * FROM incidents_df"
)

# Fetch the table list before closing, then display it as the cell's last expression
tables = con_disk.execute("SHOW TABLES").df()
con_disk.close()
tables