## Set Up



In this notebook, I explore three ways of running SQL in a Jupyter Notebook, as demonstrated in [lecture 20 of DATA 100](https://ds100.org/fa24/resources/assets/lectures/lec20/lec20.html): SQL Magic, `pandas`, and DuckDB.

In [1]:
# Uncomment to install the SQL Magic extension and the DuckDB SQLAlchemy driver
# %pip install jupysql --upgrade
# %pip install duckdb-engine --quiet

In [2]:
import sqlite3
import duckdb
import pandas as pd
%load_ext sql

In [3]:
# Create a DuckDB database file (example_duck.db)
conn = duckdb.connect('example_duck.db')

# Optional: Create a table
conn.execute('''
    CREATE TABLE IF NOT EXISTS Dragon (
        name TEXT PRIMARY KEY,
        year INTEGER,
        cute INTEGER
    )
''')

# Optional: Insert data
# OR IGNORE skips rows whose primary key already exists, so the cell can be re-run.
# With a plain INSERT INTO, a second run against the same file fails with:
#   ConstraintException: Constraint Error: Duplicate key "name: hiccup" violates primary key constraint.
conn.execute('''
    INSERT OR IGNORE INTO Dragon (name, year, cute) VALUES
    ('hiccup', 2010, 10),
    ('drogon', 2011, -100),
    ('dragon 2', 2019, 0),
    ('puff', 2010, 100),
    ('smaug', 2011, NULL)
''')

# Save changes and close the connection
conn.close()
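
The `ConstraintException` comes from running the insert against a database file that already holds these rows: `CREATE TABLE IF NOT EXISTS` is safe to repeat, but a plain `INSERT INTO` is not. Below is a minimal sketch of the failure and one way around it. It uses the standard-library `sqlite3` module with an in-memory database rather than DuckDB so that it runs without extra packages (that swap is my assumption, not part of the lecture), and the `insert_dragons` helper is hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for the DuckDB file
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS Dragon (
        name TEXT PRIMARY KEY,
        year INTEGER,
        cute INTEGER
    )
""")

rows = [("hiccup", 2010, 10), ("drogon", 2011, -100)]

def insert_dragons(conn, rows, ignore_duplicates):
    """Hypothetical helper: insert rows, optionally skipping existing keys."""
    verb = "INSERT OR IGNORE" if ignore_duplicates else "INSERT"
    conn.executemany(f"{verb} INTO Dragon (name, year, cute) VALUES (?, ?, ?)", rows)

# First run: the table is empty, so the plain insert succeeds
insert_dragons(conn, rows, ignore_duplicates=False)

# Second run: the same names already exist, so the primary key is violated
try:
    insert_dragons(conn, rows, ignore_duplicates=False)
    duplicate_error = None
except sqlite3.IntegrityError as exc:
    duplicate_error = exc

# With OR IGNORE, re-running the insert leaves the table unchanged
insert_dragons(conn, rows, ignore_duplicates=True)
row_count = conn.execute("SELECT COUNT(*) FROM Dragon").fetchone()[0]
conn.close()
```

Recent DuckDB versions accept the same `INSERT OR IGNORE INTO` form, as well as `INSERT ... ON CONFLICT DO NOTHING`.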

Next, connect SQL Magic to the DuckDB database file.

In [None]:
%sql duckdb:///example_duck.db

In [None]:
%%sql 
SELECT * 
FROM Dragon;

name,year,cute
hiccup,2010,10.0
drogon,2011,-100.0
dragon 2,2019,0.0
puff,2010,100.0
smaug,2011,


Next, use `pandas` to run the same query and load the result into a DataFrame.

In [None]:
import sqlalchemy
import pandas as pd

# The duckdb:/// URL scheme is provided by the duckdb-engine package
engine = sqlalchemy.create_engine("duckdb:///example_duck.db")

In [None]:
query = """
SELECT * 
FROM Dragon;
"""

df = pd.read_sql(query, engine)
df

Unnamed: 0,name,year,cute
0,hiccup,2010,10.0
1,drogon,2011,-100.0
2,dragon 2,2019,0.0
3,puff,2010,100.0
4,smaug,2011,
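
Note that `cute` was declared as `INTEGER` but is displayed as `10.0`, `-100.0`, and so on. The likely cause is smaug's `NULL`: by default `pandas` represents a missing number as `NaN`, which forces the whole column to `float64`. Here is a minimal sketch of that behavior, assuming `pandas` is installed and using an in-memory `sqlite3` database as a stand-in for the DuckDB file (my substitution, to keep the example self-contained).

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for example_duck.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("smaug", 2011, None)],  # None is stored as NULL
)

df = pd.read_sql("SELECT * FROM Dragon", conn)
year_dtype = str(df["year"].dtype)  # no NULLs, so the column stays an integer dtype
cute_dtype = str(df["cute"].dtype)  # NULL becomes NaN, so the column becomes float

# Optional: opt in to pandas' nullable integer dtype to keep whole numbers
df_nullable = df.astype({"cute": "Int64"})
conn.close()
```

Converting to `Int64` is a display and typing choice only; the missing value is still missing, shown as `<NA>` instead of `NaN`.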


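The same workflow applies to a larger dataset. Here I load the CORGIS [finance data](https://corgis-edu.github.io/corgis/csv/finance/) from a local `finance.csv` into a new DuckDB database.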

In [None]:
conn = duckdb.connect('finance_database.db')

# Create the finance table from the CSV file, unless it already exists
conn.execute('''
    CREATE TABLE IF NOT EXISTS finance AS SELECT * FROM read_csv_auto('finance.csv')
''')

conn.close()

In [None]:
%sql duckdb:///finance_database.db

In [None]:
%%sql 
SELECT * 
FROM finance;

State,Year,Totals.Capital outlay,Totals.Revenue,Totals.Expenditure,Totals.General expenditure,Totals.General revenue,Totals.Insurance trust revenue,Totals.Intergovernmental,Totals.License tax,Totals.Selective sales tax,Totals.Tax,Details.Correction.Correction Total,Details.Education.Education Total,Details.Financial Aid.Assistance and Subsidies,Details.Financial Aid.Cash and Securities Total,Details.Health.Health Total Expenditure,Details.Intergovernmental.Intergovernmental Expenditure,Details.Intergovernmental.Intergovernmental to Combined and Unallocable,Details.Natural Resources.Natural Resources Construction,Details.Utilities.Utilities Current Operation,Details.Welfare.Welfare Institution Total Expenditure,Details.Natural Resources.Parks.Parks Total Expenditure,Details.Transportation.Highways.Highways Total Expenditure,Totals. Debt at end of fiscal year,Details.Insurance benefits and repayments,Details.Interest on debt,Details.Interest on general debt,Details.Miscellaneous general revenue,Details.Other taxes,Details.Police protection
ALABAMA,1992,664748,10536166,9650515,8788293,8910315,1473217,2737180,395202,1103368,4217916,182698,3570524,273050,14594322,394119,2143312,518611,151432,5564374,1853436,9728,694874,4128724,724852,280179,280179,607453,205227,77789
ALABAMA,1993,781952,11389335,10242374,9339796,9688246,1570768,2965310,377723,1324610,4639784,182217,3663465,306485,15506309,412456,2211563,527474,156698,5913144,2016935,11031,856228,4170084,761582,267648,267648,599988,224878,78320
ALABAMA,1994,767100,11599362,10815221,9922352,10014415,1454982,3077084,386771,1280747,4767108,216691,3969277,315344,16206051,487044,2349153,563733,169019,6370171,2167799,12053,883852,3853804,762811,250642,250642,643807,234592,86839
ALABAMA,1995,808001,12279726,11541881,10489513,10582838,1566923,3240417,480698,1280494,5077827,231357,4400912,303811,17373025,491648,2619713,565505,185903,6703955,2291264,10645,924411,3758726,912649,193752,193752,643469,232783,83482
ALABAMA,1996,760751,12741148,12126587,10991713,10894396,1710360,3347019,422841,1334829,5257771,220046,4872259,301698,18013799,514380,3076820,587074,173797,6782766,2325418,7788,881381,3645292,987710,216842,216842,649073,265426,86936
ALABAMA,1997,770809,14007883,12944867,11668841,11487011,2381696,3553541,424165,1360764,5484161,233870,5175279,282494,21639230,566651,3292491,625765,175948,7241270,2537627,16320,837255,3780493,1134137,223666,223666,669731,279304,93163
ALABAMA,1998,755486,14843951,13728431,12475614,12433410,2268663,4021037,434433,1422930,5739128,257214,5362196,668076,23466316,554980,3419845,620254,180408,7575449,3027120,20222,882080,4166572,1107068,202507,202507,778771,268751,102178
ALABAMA,1999,851302,15501093,14701938,13289087,13093255,2262606,4281715,454177,1480248,6032234,282458,5751997,682036,24768728,588248,3631426,655661,202937,8068008,3195049,15401,921658,4467074,1271203,197963,197963,793813,308206,113709
ALABAMA,2000,1136551,16856646,15872589,14399604,14116821,2588130,4781099,541645,1526560,6438438,278856,6224938,726205,25953796,598894,3908350,770460,194417,8503158,3484564,11326,1076311,5291796,1322395,275930,275930,797887,353816,117831
ALABAMA,2001,1214496,17859899,16718151,15055914,15731646,1972924,5715592,432671,1585107,6747707,303759,6385451,783882,28094479,644288,3892653,712374,205794,9053735,3776253,15956,1163889,5577158,1505848,267537,267537,1052701,378117,124884
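
The column names in this table contain dots and spaces (for example `Totals.Revenue`), so they have to be wrapped in double quotes when selected individually. Unquoted, `Totals.Revenue` would be parsed as a column `Revenue` of a table named `Totals`. A minimal sketch follows, using two rows copied from the output above and an in-memory `sqlite3` database as a stand-in for `finance_database.db` (my substitution so the example runs without extra packages; double-quoted identifiers are standard SQL, and DuckDB accepts them as well).

```python
import sqlite3

# In-memory SQLite table with three of the finance columns
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE finance ("State" TEXT, "Year" INTEGER, "Totals.Revenue" INTEGER)'
)
conn.executemany(
    "INSERT INTO finance VALUES (?, ?, ?)",
    [("ALABAMA", 1992, 10536166), ("ALABAMA", 1993, 11389335)],
)

# Double quotes mark "Totals.Revenue" as a single column name
query = '''
SELECT State, Year, "Totals.Revenue" AS revenue
FROM finance
WHERE Year >= 1993
ORDER BY Year;
'''
result = conn.execute(query).fetchall()
conn.close()
```

In the notebook, the same `SELECT` could be placed in a `%%sql` cell or passed to `pd.read_sql`.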
