## Commissioner Equi-Join's Toughest Case (5 Points)
Copyright Jens Dittrich, Christian Schön & Jors Nix, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

In this exercise you will help Commissioner Equi-Join solve one of his old, unsolved cases.

In [1]:
import duckdb

## Load Data

Before we can start analyzing the data, we first have to load it from the corresponding CSV files into an appropriate database schema. All of the data is fake.

In [2]:
duckdb.sql("""
CREATE TABLE households (
    id INTEGER PRIMARY KEY,
    street VARCHAR,
    postcode INTEGER,
    city VARCHAR,
    floor INTEGER
);""")

duckdb.sql("""
CREATE TABLE citizens (
    id INTEGER PRIMARY KEY,
    firstname VARCHAR,
    lastname VARCHAR,
    birthday TIMESTAMP
);""")

duckdb.sql("""
CREATE TABLE live_in (
    household_id INTEGER,
    citizen_id INTEGER,
    start TIMESTAMP,
    until TIMESTAMP,
    FOREIGN KEY(household_id) REFERENCES households(id),
    FOREIGN KEY(citizen_id) REFERENCES citizens(id),
    PRIMARY KEY(citizen_id, start)
);""")

duckdb.sql("""
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    label VARCHAR,
    unit VARCHAR
);""")

duckdb.sql("""
CREATE TABLE groceries (
    id INTEGER PRIMARY KEY,
    caloriesPer100g INTEGER,
    FOREIGN KEY(id) REFERENCES articles(id)
);""")

duckdb.sql("""
CREATE TABLE purchases (
    article_id INTEGER,
    citizen_id INTEGER,
    date TIMESTAMP,
    amount FLOAT,
    FOREIGN KEY(article_id) REFERENCES articles(id),
    FOREIGN KEY(citizen_id) REFERENCES citizens(id),
    PRIMARY KEY(article_id, citizen_id, date)
);""")

In [3]:
duckdb.sql("COPY households FROM './data/nsa/households_no_header.csv' (FORMAT CSV, DELIMITER ',');")
duckdb.sql("COPY citizens FROM './data/nsa/citizens_no_header.csv' (FORMAT CSV, DELIMITER ',');")
duckdb.sql("COPY live_in FROM './data/nsa/live_in_no_header.csv' (FORMAT CSV, DELIMITER ',');")
duckdb.sql("COPY articles FROM './data/nsa/articles_no_header.csv' (FORMAT CSV, DELIMITER ',');")
duckdb.sql("COPY groceries FROM './data/nsa/groceries_no_header.csv' (FORMAT CSV, DELIMITER ',');")
duckdb.sql("COPY purchases FROM './data/nsa/purchases_no_header.csv' (FORMAT CSV, DELIMITER ',');")

### Your query

Enter your query in the following cell. It should output the list of main suspects in the following format:
1. The suspects' first names as 'First_Name'
2. The last names of the suspects as 'Last_Name'

You are allowed to use subqueries and views.
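
As a rough illustration of the requested output shape, the following sketch builds a view that uses a subquery and exposes exactly the two columns `First_Name` and `Last_Name`. It is not part of the exercise: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without the CSV files, the two rows are made up, and `demo_suspects` is a hypothetical view name. The SQL is plain enough that it should carry over to `duckdb.sql(...)`, but that is an assumption.

```python
import sqlite3

# In-memory stand-in database with made-up rows (not the exercise data).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE citizens (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
)
con.executemany(
    "INSERT INTO citizens VALUES (?, ?, ?)",
    [(1, "Ada", "Example"), (2, "Bob", "Sample")],
)

# Hypothetical view: column aliases give the required output names,
# and the WHERE clause uses a subquery to pick the rows.
con.execute("""
CREATE VIEW demo_suspects AS
    SELECT firstname AS First_Name, lastname AS Last_Name
    FROM citizens
    WHERE id IN (SELECT id FROM citizens WHERE lastname = 'Example')
""")

cur = con.execute("SELECT * FROM demo_suspects")
columns = [d[0] for d in cur.description]  # column names of the result
rows = cur.fetchall()
print(columns, rows)
```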

# All plain SQL statements are only here for testing!

## Only the views and the final SELECT matter!

In [None]:
duckdb.sql('''
    select ci.firstname, ci.lastname, h.street, li.start as started_living_there,a.label, p.date as purchase_date, p.amount
    from purchases as p
    join articles as a on p.article_id = a.id
    join live_in as li on p.citizen_id = li.citizen_id
    join households as h on li.household_id = h.id
    join citizens as ci on p.citizen_id = ci.id
    where p.date between '1943-11-19 15:00:00' and '1943-11-24 15:00:00'
        and li.start <= '1943-11-24 15:00:00'
    order by p.citizen_id
''')

In [None]:
duckdb.sql('''
    select p.citizen_id, min(p.date) as earliest_purchase, max(p.date) as latest_purchase
    from purchases as p
    join articles as a on p.article_id = a.id
    where p.date between '1943-11-19 15:00:00' and '1943-11-24 15:00:00'
    group by p.citizen_id
''')

In [None]:
duckdb.sql('''
    select p.citizen_id,ci.firstname,ci.lastname, a.label, round(sum(p.amount),2) as amount_in_Kg
    from purchases as p
    join articles as a on p.article_id = a.id
    join citizens as ci on p.citizen_id = ci.id
    where p.date between '1943-11-19 15:00:00' and '1943-11-24 15:00:00' and a.label = 'Apple'
    group by p.citizen_id,ci.firstname,ci.lastname, a.label
    having sum(p.amount) >= 2
    union
    select p.citizen_id,ci.firstname,ci.lastname, a.label, round(sum(p.amount),2) as amount_in_Kg
    from purchases as p
    join articles as a on p.article_id = a.id
    join citizens as ci on p.citizen_id = ci.id
    where p.date between '1943-11-19 15:00:00' and '1943-11-24 15:00:00' and a.label = 'Onion'
    group by p.citizen_id,ci.firstname,ci.lastname, a.label
    having sum(p.amount) >= 1
    union
    select p.citizen_id,ci.firstname,ci.lastname, a.label, round(sum(p.amount),2) as amount_in_Kg
    from purchases as p
    join articles as a on p.article_id = a.id
    join citizens as ci on p.citizen_id = ci.id
    where p.date between '1943-11-19 15:00:00' and '1943-11-24 15:00:00' and a.label = 'Carrot'
    group by p.citizen_id,ci.firstname,ci.lastname, a.label
    having sum(p.amount) >=0.5
    order by p.citizen_id
''')

# Only the cells from here on are relevant for the submission

In [4]:
duckdb.sql("DROP VIEW IF EXISTS suspiciousOrders;")

duckdb.sql("""
CREATE VIEW suspiciousOrders AS
    select p.citizen_id, a.label, round(sum(p.amount),2) as amount
    from purchases as p
    join articles as a on p.article_id = a.id
    where p.date between '1943-11-19 15:00:00' and '1943-11-24 15:00:00' and a.label = 'Apple'
    group by p.citizen_id, a.label
    having sum(p.amount) >= 2
    union
    select p.citizen_id, a.label, round(sum(p.amount),2) as amount
    from purchases as p
    join articles as a on p.article_id = a.id
    where p.date between '1943-11-19 15:00:00' and '1943-11-24 15:00:00' and a.label = 'Onion'
    group by p.citizen_id, a.label
    having sum(p.amount) >= 1
    union
    select p.citizen_id, a.label, round(sum(p.amount),2) as amount
    from purchases as p
    join articles as a on p.article_id = a.id
    where p.date between '1943-11-19 15:00:00' and '1943-11-24 15:00:00' and a.label = 'Carrot'
    group by p.citizen_id, a.label
    having sum(p.amount) >=0.5;
""")

In [5]:
duckdb.sql("DROP VIEW IF EXISTS suspects;")

duckdb.sql("""
CREATE VIEW suspects AS
    select distinct firstname as First_Name, lastname as Last_Name
    from citizens as ci
    join live_in as li on ci.id = li.citizen_id
    join households as h on li.household_id = h.id
    join suspiciousOrders as so on ci.id = so.citizen_id
    where h.street like '%13' OR h.street like '%bucht%' OR h.street like 'Kor%'
    order by firstname, lastname
""")

In [6]:
duckdb.sql("SELECT * FROM suspects")

┌────────────┬───────────┐
│ First_Name │ Last_Name │
│  varchar   │  varchar  │
├────────────┼───────────┤
│ Norman     │ Bates     │
└────────────┴───────────┘
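
The `suspiciousOrders` view above repeats the same query three times and only changes the article label and the threshold. One possible way to express the same filter in a single query is a `CASE` expression in the `HAVING` clause, sketched below. This is an alternative formulation, not the submitted solution. It again uses Python's built-in `sqlite3` as a stand-in for DuckDB, with a few made-up purchases, so that it runs without the CSV files. The thresholds (2, 1 and 0.5) and the time window are copied from the view.

```python
import sqlite3

# In-memory stand-in database with made-up rows (not the exercise data).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE purchases (article_id INTEGER, citizen_id INTEGER, date TEXT, amount REAL);
INSERT INTO articles VALUES (1, 'Apple'), (2, 'Onion'), (3, 'Carrot');
INSERT INTO purchases VALUES
    (1, 10, '1943-11-20 10:00:00', 1.5),  -- citizen 10: 2.5 apples in total
    (1, 10, '1943-11-21 10:00:00', 1.0),
    (2, 10, '1943-11-22 10:00:00', 0.5),  -- onions below the threshold of 1
    (3, 11, '1943-11-23 10:00:00', 0.5),  -- carrots exactly at the threshold
    (1, 11, '1943-12-01 10:00:00', 5.0);  -- outside the time window
""")

# One grouped query; the per-label threshold is chosen by CASE.
rows = con.execute("""
    SELECT p.citizen_id, a.label, ROUND(SUM(p.amount), 2) AS amount
    FROM purchases AS p
    JOIN articles AS a ON p.article_id = a.id
    WHERE p.date BETWEEN '1943-11-19 15:00:00' AND '1943-11-24 15:00:00'
      AND a.label IN ('Apple', 'Onion', 'Carrot')
    GROUP BY p.citizen_id, a.label
    HAVING SUM(p.amount) >= CASE a.label
        WHEN 'Apple' THEN 2
        WHEN 'Onion' THEN 1
        WHEN 'Carrot' THEN 0.5
    END
    ORDER BY p.citizen_id, a.label
""").fetchall()
print(rows)
```

Because each branch of the original `UNION` selects a different label, the branches cannot produce duplicate rows, so the grouped query should return the same set of rows.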