The SQL component of this course will use DuckDB, an _in-memory_ analytics engine that lets you write full-featured SQL without the need for a stand-alone database.

If you nerd out over that stuff, like me, you can read more [here](https://open.substack.com/pub/casewhen/p/data-explained-what-is-duckdb?r=rnul&utm_campaign=post&utm_medium=web). 

For now, you can just assume that the following code will load DuckDB + the necessary datasets, so you can sit back and relax:

In [1]:
import duckdb

# Load SQL extension
%load_ext sql

# Initialize 🦆 DuckDB connection
conn = duckdb.connect()

# Import database
%sql conn --alias duckdb
%sql IMPORT DATABASE '../../data/nps';

Config,value
feedback,True
autopandas,True
displaylimit,10
displaycon,False


Unnamed: 0,Count
0,224


Now we can focus on writing SQL! DuckDB works like any other SQL dialect: you can `SELECT` columns `FROM` some data source. (Note that we need `%%sql` at the beginning of the cell to make it work with our setup.)

In [2]:
%%sql 
SELECT
    *
FROM nps_public_data.parks
LIMIT 1

Unnamed: 0,relevanceScore,designation,weatherInfo,addresses,operatingHours,entrancePasses,name,description,directionsUrl,fees,...,activities,url,longitude,id,images,directionsInfo,fullName,parkCode,latLong,latitude
0,1,National Memorial,http://forecast.weather.gov/MapClick.php?CityN...,"[{'type': 'Physical', 'line2': '', 'line1': '1...","[{'name': 'Hours of Operation', 'standardHours...",[],Federal Hall,"Here on Wall Street, George Washington took th...",http://www.nps.gov/feha/planyourvisit/directio...,[],...,"[{'name': 'Arts and Culture', 'id': '09DF0950-...",https://www.nps.gov/feha/index.htm,-74.010256,2337D255-2D32-4997-957A-D461EEA03AF8,[{'url': 'https://www.nps.gov/common/uploads/s...,The main entrance of Federal Hall is located a...,Federal Hall National Memorial,feha,"lat:40.70731192, long:-74.01025636",40.707312


Because this is a _transformation_-focused course, we'll assume you know the basics of SQL, but here are a few quick refreshers if you're rusty.

Some SQL basics and refreshers:
- Every query we write is built from a `SELECT` and a `FROM`.
- Between those two, we list the columns, separated by commas.
- We can _alias_ columns or our data source using `AS` (technically not required, but a good idea).

In [None]:
%%sql 
SELECT
    fullName AS full_name,
    weatherInfo AS weather_info,
    operatingHours AS operating_hours
FROM nps_public_data.parks AS not_parks
LIMIT 3

If there's anything you don't know, feel free to poke around online or play around with the parks data loaded into this notebook.

Otherwise, I highly encourage learning by observing and playing: feel free to open up a new cell, drop in `%%sql`, and query some sample data! You can find out more about the database by running `SHOW ALL TABLES`.

In [3]:
%%sql
SHOW ALL TABLES

Unnamed: 0,database,schema,name,column_names,column_types,temporary
0,memory,nps_public_data,events,"[tags, subjectname, recurrencerule, sitetype, ...","[VARCHAR[], VARCHAR, VARCHAR, VARCHAR, DATE, V...",False
1,memory,nps_public_data,feespasses,"[fees, relatedMultiSitePasses, contentOrderOrd...","[STRUCT(npsGovPurchaseUrl VARCHAR, description...",False
2,memory,nps_public_data,lessonplans,"[duration, subject, parks, questionObjective, ...","[VARCHAR, VARCHAR[], VARCHAR[], VARCHAR, VARCH...",False
3,memory,nps_public_data,meta,"[ts, endpoint, total, start, limit, table, pag...","[VARCHAR, VARCHAR, BIGINT, BIGINT, BIGINT, VAR...",False
4,memory,nps_public_data,multimedia__audio,"[versions, callToActionUrl, callToAction, tran...","[STRUCT(url VARCHAR, fileType VARCHAR, fileSiz...",False
5,memory,nps_public_data,multimedia__galleries,"[tags, assetCount, relatedParks, copyright, im...","[VARCHAR[], BIGINT, STRUCT(""name"" VARCHAR, ful...",False
6,memory,nps_public_data,multimedia__galleries__assets,"[tags, copyright, constraintsInfo, permalinkUr...","[VARCHAR[], VARCHAR, STRUCT(grantingRights VAR...",False
7,memory,nps_public_data,multimedia__videos,"[versions, hasOpenCaptions, audioDescribedBuil...","[STRUCT(fileType VARCHAR, url VARCHAR, aspectR...",False
8,memory,nps_public_data,newsreleases,"[lastIndexedDate, credit, geometryPoiId, longi...","[TIMESTAMP WITH TIME ZONE, VARCHAR, VARCHAR, V...",False
9,memory,nps_public_data,park_hours,"[park_name, park_id, description, category, mo...","[VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, ...",False


Next, let's talk a bit about structure.

You'll notice my queries are formatted very precisely. Why do we do this? Simple: consistent formatting is easy to read, and it makes code repeatable, editable, and understandable. As we go through the course, pay attention to how queries and CTEs are structured.

I'll be sure to call these out as we go along. There are also tools out there, called linters, that can automagically format the code in your SQL files and repos. [SQLFluff](https://www.sqlfluff.com/) is a good place to start!