# School

Let's start with a **Warm-Up** challenge and a quick `SELECT` statement.

## Data
We will work with the `school.sqlite` database available at this URL:
`https://wagon-public-datasets.s3.amazonaws.com/sql_databases/school.sqlite`

Run the cell below to download the file:

In [1]:
!curl https://wagon-public-datasets.s3.amazonaws.com/sql_databases/school.sqlite > data/school.sqlite

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12288  100 12288    0     0  76795      0 --:--:-- --:--:-- --:--:-- 77283


## Database Discovery

You can use the VS Code SQLite extension to explore the database (once you have downloaded it):

- Hit `Ctrl-Shift-P` or `Cmd-Shift-P`
- Start typing until you see `SQLite: Open Database`
- `Enter`
- Select the database file from the dropdown
- In the bottom left corner, click on `SQLITE EXPLORER`

Answer the following questions:

- How many tables do you have?
- For each table, what are the columns?
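
If you would rather answer these questions from Python, the sketch below shows one possible way to list tables and columns with the standard `sqlite3` module. The helper name `describe_schema` and the in-memory stand-in table are assumptions made for illustration; the real `school.sqlite` file may contain more tables than the one created here.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each user table name to the list of its column names."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        # Table names cannot be bound as parameters, so the name is quoted instead.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info returns one row per column; index 1 holds the column name.
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


# Stand-in database so the sketch runs on its own (illustrative table only).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, birth_city TEXT)"
)
demo_schema = describe_schema(demo)
demo.close()
```

To inspect the downloaded file instead, pass `sqlite3.connect('data/school.sqlite')` to `describe_schema` and compare the result with what you see in the SQLite explorer.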

Once you are comfortable with the schema, let's write a SQL query.


## Setup

Pandas and sqlite3 are all we need :-)

In [2]:
import pandas as pd
from sqlite3 import connect

## Paris Students

👉 Write a SQL Query to select all students from `Paris`.

In [3]:
# Return all students from Paris
query_students = """
    SELECT *
    FROM students
    WHERE birth_city = 'Paris';
"""

In [5]:
with connect('data/school.sqlite') as conn:
    df = pd.read_sql(
        query_students,
        con=conn
    )
df.head()

| | id | first_name | last_name | birth_city |
|---|---|---|---|---|
| 0 | 1 | Oran | Southern | Paris |
| 1 | 6 | Bertha | Brook | Paris |
| 2 | 7 | Neha | Salazar | Paris |
| 3 | 8 | Ignacy | Casey | Paris |
| 4 | 10 | Shirley | Mayer | Paris |


## Dynamic queries

Cool, we got our Paris students.

But wouldn't it be nice if this query were **dynamic** (i.e. worked with any city)?

👉 Rewrite the query so that we can use it for any city. See the code below for how we will use it when we load our DataFrame.

In [6]:
# Return all students from a city
query_students = """
   SELECT *
   FROM students
   WHERE birth_city=?;
"""

In [7]:
city = 'London'

with connect('data/school.sqlite') as conn:
    df = pd.read_sql(
        query_students,
        con=conn,
        params=[city]  # This is where we pass the parameter, the city
    )
df.head()

| | id | first_name | last_name | birth_city |
|---|---|---|---|---|
| 0 | 2 | Safa | Lugo | London |
| 1 | 5 | Rick | Broadhurst | London |
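
To see why a `?` placeholder is preferable to pasting the city into the SQL string, here is a minimal sketch using only `sqlite3` and an in-memory table. The table contents are illustrative (the third row is made up for the demo), and the helper `students_from` is a hypothetical name, not part of the challenge.

```python
import sqlite3

# In-memory stand-in for the students table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, birth_city TEXT)"
)
conn.executemany(
    "INSERT INTO students (first_name, birth_city) VALUES (?, ?)",
    [("Oran", "Paris"), ("Safa", "London"), ("Zoe", "L'Isle-Adam")],
)

QUERY = "SELECT first_name FROM students WHERE birth_city = ?"


def students_from(city):
    """Return first names of students born in `city`, using a bound parameter."""
    return [row[0] for row in conn.execute(QUERY, (city,))]


paris = students_from("Paris")
# The apostrophe needs no escaping: the value is bound, not spliced into the SQL.
tricky = students_from("L'Isle-Adam")

# The same value pasted into the SQL text produces invalid SQL.
city = "L'Isle-Adam"
try:
    conn.execute(f"SELECT first_name FROM students WHERE birth_city = '{city}'")
    formatted_ok = True
except sqlite3.OperationalError:
    formatted_ok = False

# sqlite3 also accepts named placeholders with a dict of values.
london = [
    row[0]
    for row in conn.execute(
        "SELECT first_name FROM students WHERE birth_city = :city",
        {"city": "London"},
    )
]
conn.close()
```

Beyond avoiding quoting bugs like this one, bound parameters keep user-supplied values from being interpreted as SQL, which is the usual argument for using them by default.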


When the result looks as expected, run the following cell to test your query.

In [8]:
from nbresult import ChallengeResult
result = ChallengeResult(
    'school',
    query=query_students
)
result.write(); print(result.check())


platform darwin -- Python 3.12.9, pytest-8.3.4, pluggy-1.5.0 -- /Users/simonhingant/.pyenv/versions/3.12.9/envs/lewagon/bin/python
cachedir: .pytest_cache
rootdir: /Users/simonhingant/code/simsam56/02-Data-Toolkit/05-SQL-Advanced/data-back_to_school_query/tests
plugins: anyio-4.8.0, typeguard-4.4.2
collecting ... collected 5 items

test_school.py::TestSchool::test_barcelona PASSED                        [ 20%]
test_school.py::TestSchool::test_berlin PASSED                           [ 40%]
test_school.py::TestSchool::test_brussels PASSED                         [ 60%]
test_school.py::TestSchool::test_london PASSED                           [ 80%]
test_school.py::TestSchool::test_paris PASSED                            [100%]


💯 You can commit your code:

git add tests/school.pickle

git commit -m 'Completed school step'

git push origin master

## Key learning points

- Use Pandas to query a database
- Use parameter substitution to write dynamic queries
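
As a possible extension of parameter substitution, the sketch below filters on several cities at once by generating one `?` per value. The function name `students_from_cities` and the in-memory rows are assumptions for illustration; only the number of placeholders is formatted into the SQL, never the values themselves.

```python
import sqlite3


def students_from_cities(conn, cities):
    """Return (first_name, birth_city) rows for students born in any given city."""
    if not cities:
        return []  # "IN ()" is not valid SQL, so short-circuit on empty input
    # One "?" per value; the values themselves are still bound by the driver.
    placeholders = ", ".join("?" for _ in cities)
    sql = (
        "SELECT first_name, birth_city FROM students "
        f"WHERE birth_city IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(cities)).fetchall()


# Illustrative in-memory data, not the real school.sqlite contents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, birth_city TEXT)"
)
conn.executemany(
    "INSERT INTO students (first_name, birth_city) VALUES (?, ?)",
    [("Oran", "Paris"), ("Safa", "London"), ("Max", "Berlin")],
)
two_cities = students_from_cities(conn, ["Paris", "London"])
no_cities = students_from_cities(conn, [])
conn.close()
```

The same generated SQL string and list of values could be handed to `pd.read_sql` through `params`, as in the notebook cells above.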