# Query the Database

Now it is time to make advanced SQL requests to an `ecommerce` database!

## Data
We will work with the `ecommerce.sqlite` database available at this URL:  
`https://wagon-public-datasets.s3.amazonaws.com/sql_databases/ecommerce.sqlite`

Run the cell below to download the file.

In [1]:
!curl https://wagon-public-datasets.s3.amazonaws.com/sql_databases/ecommerce.sqlite > data/ecommerce.sqlite

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  9216  100  9216    0     0  46365      0 --:--:-- --:--:-- --:--:-- 46545


## Setup

Pandas and sqlite3 are all we need :-)

In [2]:
import pandas as pd
from sqlite3 import connect

## Orders

👉 Get all the orders, displaying all the columns

In [3]:
# Return a list of orders displaying each column
query_orders = """
    SELECT *
    FROM orders;
"""

In [4]:
with connect('data/ecommerce.sqlite') as conn:
    df = pd.read_sql(
        query_orders,
        con=conn
    )
df.head()

Unnamed: 0,OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,FreightCharge
0,1,1,1,2012-01-04,2012-01-09,2012-01-05,1,3.75
1,2,2,2,2012-01-27,2012-02-01,2012-01-28,1,7.25
2,3,4,1,2012-02-19,2012-02-24,2012-02-23,2,5.5
3,4,2,4,2012-03-13,2012-03-18,2012-03-14,2,13.5
4,5,4,2,2012-04-05,2012-04-10,2012-04-06,3,8.75
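
One detail about the cell above: with `sqlite3`, the `with connect(...) as conn` block manages the transaction (commit on success, rollback on error), but it does not close the connection. If you want the connection closed when the block ends, `contextlib.closing` is one option. The sketch below is only an illustration: it uses a tiny in-memory database with made-up rows as a stand-in for `data/ecommerce.sqlite`, and the names `df_demo` and `still_open` are hypothetical.

```python
from contextlib import closing
from sqlite3 import ProgrammingError, connect

import pandas as pd

# Hypothetical stand-in for data/ecommerce.sqlite: a tiny in-memory database
with closing(connect(":memory:")) as conn:
    conn.execute("CREATE TABLE orders (OrderID INTEGER, OrderDate TEXT)")
    conn.execute("INSERT INTO orders VALUES (1, '2012-01-04')")
    df_demo = pd.read_sql("SELECT * FROM orders;", con=conn)

print(df_demo)

# closing() called conn.close() on exit, so the connection can no longer be used
try:
    conn.execute("SELECT 1")
    still_open = True
except ProgrammingError:
    still_open = False

print(still_open)
```

For a short notebook like this one, leaving the connection open is mostly harmless; closing it explicitly matters more in long-running scripts.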


When the result looks as expected, run the following cell to test your query.

In [5]:
from nbresult import ChallengeResult
result = ChallengeResult(
    'query_orders',
    query=query_orders
)
result.write(); print(result.check())


platform darwin -- Python 3.12.9, pytest-8.3.4, pluggy-1.5.0 -- /Users/simonhingant/.pyenv/versions/3.12.9/envs/lewagon/bin/python
cachedir: .pytest_cache
rootdir: /Users/simonhingant/code/simsam56/02-Data-Toolkit/05-SQL-Advanced/data-query-the-db/tests
plugins: anyio-4.8.0, typeguard-4.4.2
collecting ... collected 2 items

test_query_orders.py::TestQueryOrders::test_first_element PASSED         [ 50%]
test_query_orders.py::TestQueryOrders::test_length_list PASSED           [100%]



💯 You can commit your code:

git add tests/query_orders.pickle

git commit -m 'Completed query_orders step'

git push origin master



## Orders range

👉 Get all the orders made between two given dates, sorted by ascending OrderDate (excluding date_from and including date_to)

In [6]:
# Return a list of orders displaying all columns with OrderDate between
# date_from and date_to (excluding date_from and including date_to),
# sorted by ascending OrderDate
query_orders_range = """
    SELECT *
    FROM orders
    WHERE OrderDate > ? AND OrderDate <= ?
    ORDER BY OrderDate ASC;
"""

👉 This time, to try out the query, write the code yourself to load the data into a DataFrame. Make sure it's dynamic: we want to be able to easily change the start and end dates!

Not sure how to do that? Get inspired by the previous challenge!

In [10]:
start_date = '2013-01-01'
end_date = '2013-01-31'

with connect('data/ecommerce.sqlite') as conn:
    df = pd.read_sql(
        query_orders_range,
        con=conn,
        params=[start_date, end_date]
    )
df.head()

Unnamed: 0,OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,FreightCharge
0,17,5,1,2013-01-06,2013-01-11,2013-01-07,3,6.25
1,18,3,3,2013-01-29,2013-02-03,2013-01-30,1,10.75
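
The `?` placeholders above are filled in by position. As a possible alternative, `sqlite3` also accepts named placeholders, which `pd.read_sql` forwards when `params` is a dictionary. The sketch below is a hedged illustration, not the expected solution: the table contents are made-up sample rows in an in-memory database, and `query_named` and `df_range` are hypothetical names. The comparison works on plain text because ISO-8601 dates (`YYYY-MM-DD`) sort the same way alphabetically and chronologically.

```python
from sqlite3 import connect

import pandas as pd

# Named placeholders (:date_from, :date_to) are an alternative to positional "?"
query_named = """
    SELECT *
    FROM orders
    WHERE OrderDate > :date_from AND OrderDate <= :date_to
    ORDER BY OrderDate ASC;
"""

# Hypothetical sample rows, not taken from ecommerce.sqlite
conn = connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2013-01-01"), (2, "2013-01-15"), (3, "2013-01-31"), (4, "2013-02-01")],
)

df_range = pd.read_sql(
    query_named,
    con=conn,
    params={"date_from": "2013-01-01", "date_to": "2013-01-31"},
)
conn.close()

# Order 1 falls on date_from (excluded); order 3 falls on date_to (included)
print(df_range["OrderID"].tolist())  # [2, 3]
```

Either style keeps the values out of the SQL string itself, which avoids quoting mistakes and SQL injection.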


When the result looks as expected, run the following cell to test your query.

In [7]:
from nbresult import ChallengeResult
result = ChallengeResult(
    'get_orders_range',
    query=query_orders_range
)
result.write(); print(result.check())


platform darwin -- Python 3.12.9, pytest-8.3.4, pluggy-1.5.0 -- /Users/simonhingant/.pyenv/versions/3.12.9/envs/lewagon/bin/python
cachedir: .pytest_cache
rootdir: /Users/simonhingant/code/simsam56/02-Data-Toolkit/05-SQL-Advanced/data-query-the-db/tests
plugins: anyio-4.8.0, typeguard-4.4.2
collecting ... collected 4 items

test_get_orders_range.py::TestGetOrdersRange::test_len_results PASSED    [ 25%]
test_get_orders_range.py::TestGetOrdersRange::test_results_0 PASSED      [ 50%]
test_get_orders_range.py::TestGetOrdersRange::test_results_1 PASSED      [ 75%]
test_get_orders_range.py::TestGetOrdersRange::test_type_results PASSED   [100%]



💯 You can commit your code:

git add tests/get_orders_range.pickle

git commit -m 'Completed get_orders_range step'

git push origin master



## Waiting time

👉 Get all the orders with their delivery time, sorted in ascending order (from the smallest timedelta to the largest).

Hint: search for "sqlite julianday"

In [13]:
# Get a list with all the orders displaying each column
# and calculate an extra TimeDelta column displaying the number of days
# between OrderDate and ShippedDate, ordered by ascending TimeDelta
query_waiting_time = """
    SELECT *, julianday(ShippedDate) - julianday(OrderDate) AS TimeDelta
    FROM orders
    ORDER BY TimeDelta ASC;
"""

When the result looks as expected, run the following cells to try and test your query.

In [14]:
with connect('data/ecommerce.sqlite') as conn:
    df = pd.read_sql(
        query_waiting_time,
        con=conn
    )
df.head()

Unnamed: 0,OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,FreightCharge,TimeDelta
0,1,1,1,2012-01-04,2012-01-09,2012-01-05,1,3.75,1.0
1,2,2,2,2012-01-27,2012-02-01,2012-01-28,1,7.25,1.0
2,4,2,4,2012-03-13,2012-03-18,2012-03-14,2,13.5,1.0
3,5,4,2,2012-04-05,2012-04-10,2012-04-06,3,8.75,1.0
4,6,3,3,2012-04-28,2012-05-03,2012-04-29,2,11.0,1.0
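
To see what `julianday` does in isolation, here is a minimal sketch that needs no table at all. It runs against an empty in-memory database and uses the dates of order 3 shown earlier (ordered 2012-02-19, shipped 2012-02-23); the variable names `delta` and `missing` are illustrative. The `NULL` case is included only as an assumption about what could happen with an unshipped order; this notebook does not check whether `ecommerce.sqlite` contains any.

```python
from sqlite3 import connect

conn = connect(":memory:")

# julianday() converts an ISO-8601 date string to a fractional day count,
# so subtracting two results gives the gap in days
(delta,) = conn.execute(
    "SELECT julianday('2012-02-23') - julianday('2012-02-19');"
).fetchone()

# julianday(NULL) is NULL, which Python receives as None
(missing,) = conn.execute("SELECT julianday(NULL);").fetchone()
conn.close()

print(delta)    # 4.0
print(missing)  # None
```

The result is a float rather than an integer, which is why the `TimeDelta` column above shows values such as `1.0`.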


In [15]:
from nbresult import ChallengeResult
result = ChallengeResult(
    'get_waiting_time',
    query=query_waiting_time
)
result.write(); print(result.check())


platform darwin -- Python 3.12.9, pytest-8.3.4, pluggy-1.5.0 -- /Users/simonhingant/.pyenv/versions/3.12.9/envs/lewagon/bin/python
cachedir: .pytest_cache
rootdir: /Users/simonhingant/code/simsam56/02-Data-Toolkit/05-SQL-Advanced/data-query-the-db/tests
plugins: anyio-4.8.0, typeguard-4.4.2
collecting ... collected 4 items

test_get_waiting_time.py::TestGetWaitingTime::test_first_result PASSED   [ 25%]
test_get_waiting_time.py::TestGetWaitingTime::test_last_result PASSED    [ 50%]
test_get_waiting_time.py::TestGetWaitingTime::test_size_list PASSED      [ 75%]
test_get_waiting_time.py::TestGetWaitingTime::test_type_results PASSED   [100%]



💯 You can commit your code:

git add tests/get_waiting_time.pickle

git commit -m 'Completed get_waiting_time step'

git push origin master



## Key learning points

- Use Pandas to query a database
- Use parameter substitution to write dynamic queries
- Work with dates, and make calculations with them