
## Feedback

1.a) 1/2; You correctly identified the problem, but the suggested solution violates 1NF. If the same item is bought multiple times within one purchase, your table will contain identical rows, so only row order distinguishes them, and that violates 1NF. You could either introduce an id that tracks the index of each item in a purchase and use a compound key of (`transaction_id`, `purchase_idx`), or track the quantity in a new column, so that (`transaction_id`, `item_id`) becomes a unique key.

1.b) 2/2

2.a) 2/2

2.b) 2/2

---


# Homework 3
## Balázs Menkó (O67UT7)

In [1]:
# packages
import psycopg2
import psycopg2.extras
import pandas as pd

In [2]:
# Create connection
pgsql_settings = {
    'pguser' : 'menkobalazs1',
    'pgpasswd' : 'O67UT7',
    'pghost' : 'postgres-datasci.db-test',
    'pgport' : 5432,
    'pgdb' : 'menkobalazs1_homework',
    'schema' : 'public'
}

def connect_from_settings(settings):
    return psycopg2.connect(
        host = settings['pghost'],
        port = settings['pgport'],
        database = settings['pgdb'],
        user = settings['pguser'],
        password = settings['pgpasswd'],
        options=f'--search_path={settings["schema"]}'
    )

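def run_query(query, only_head=False):
    """Run a query and return the result as a DataFrame."""
    connection = connect_from_settings(pgsql_settings)
    try:
        cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
        cursor.execute(query)
        dict_res = cursor.fetchall()
        # Read column names from the cursor so an empty result does not raise IndexError
        columns = [col[0] for col in cursor.description]
        cursor.close()
    finally:
        # Close the connection even if the query fails
        connection.close()
    df = pd.DataFrame(dict_res, columns=columns)
    return df.head() if only_head else df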

### Tables and columns:
- `branch`: location_id, location, manager, café, drive_thru, wi_fi_password
- `employee`: personel_id, first_name, last_name, date_of_birth, role, salary, supervisor_id, branch_id
- `purchases`: transaction_id, transaction_date, location_id, cashier_id, total_paid, items_sold
- `supplier`: branch_id, supplier_name, product
- `menu`: item_id, item_name, item_price, item_type

---
# Task 1
Is the restaurant database normalized?

- What do you need to do, to achieve 1NF? (2 points)
- Once it’s in 1NF does it also comply with 2NF and 3NF? What else do you have to do to make it fully normalized? (2 points)

In [3]:
query = """
    SELECT *
    FROM branch
"""
run_query(query)

|   | location_id | location | manager | café | drive_thru | wi_fi_password |
|---|---|---|---|---|---|---|
| 0 | 1 | Tokyo | 129 | False | False | `ahZ7^2q3sxkIIqs2` |
| 1 | 2 | New York | 220 | True | True | `)pvV8^9$7@5sIDX)` |
| 2 | 3 | London | 304 | True | False | `jUkx^sk@rxsuyxrQ` |
| 3 | 4 | Cape Town | 402 | False | True | `MHueAaLu$AI4nHcM` |
| 4 | 5 | Wellington | 509 | False | True | `q9^eqSpdH8vxjwEg` |


In [4]:
query = """
    SELECT *
    FROM employee
"""
run_query(query, only_head=True)

|   | personel_id | first_name | last_name | date_of_birth | role | salary | supervisor_id | branch_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 125 | Sadao | Maō | 1989-06-23 | assistant manager | 115.0 | 129.0 | 1 |
| 1 | 226 | Nikita | Dinley | 1995-09-20 | assistant manager | 106.0 | 220.0 | 2 |
| 2 | 303 | Benjamin | Rowbotham | 2001-02-09 | assistant manager | 126.0 | 304.0 | 3 |
| 3 | 307 | Madison | Goodson | 1988-06-10 | assistant manager | 126.0 | 304.0 | 3 |
| 4 | 421 | Leora | Willicott | 1984-05-06 | assistant manager | 121.0 | 402.0 | 4 |


In [5]:
query = """
    SELECT *
    FROM purchases
"""
run_query(query, only_head=True)

|   | transaction_id | transaction_date | location_id | cashier_id | total_paid | items_sold |
|---|---|---|---|---|---|---|
| 0 | 7d2c076803fe468882b88a7b695dbf4f | 2022-09-22 | 1 | 125 | 10.0 | [1007, 1003, 1007, 1006] |
| 1 | d25f6d2a790f4c37a1245e10a2df45f3 | 2022-09-24 | 1 | 125 | 9.0 | [1007, 1008, 1001, 1001] |
| 2 | cd7b18d66bd649758266be6902df62c3 | 2022-09-23 | 2 | 211 | 6.0 | [1004, 1008, 1008] |
| 3 | 50e32d24c68144aead338e24cec306a9 | 2022-09-26 | 4 | 421 | 5.0 | [1008, 1001] |
| 4 | 1a9de3692b0440de947e99eefc1a77f5 | 2022-09-20 | 3 | 313 | 4.0 | [1005, 1007] |


In [6]:
query = """
    SELECT *
    FROM supplier
"""
run_query(query, only_head=True)

|   | branch_id | supplier_name | product |
|---|---|---|---|
| 0 | 1 | Global Cola JP | beverages |
| 1 | 1 | Beaver Bread | buns |
| 2 | 1 | Mito Co. Tokyo | meat |
| 3 | 1 | Hokkaido Potato Farm | potatoes |
| 4 | 2 | Global Cola US | beverages |


In [7]:
query = """
    SELECT *
    FROM menu
"""
run_query(query)

|   | item_id | item_name | item_price | item_type |
|---|---|---|---|---|
| 0 | 1001 | Hamburger | 3.0 | [meat, buns] |
| 1 | 1002 | Cheeseburger | 4.0 | [meat, buns] |
| 2 | 1003 | Chickenburger | 4.0 | [meat, buns] |
| 3 | 1004 | Small fires | 2.0 | [potatoes] |
| 4 | 1005 | Large fries | 3.0 | [potatoes] |
| 5 | 1006 | Chicken tendies | 4.0 | [meat] |
| 6 | 1007 | Small soda | 1.0 | [beverages] |
| 7 | 1008 | Large soda | 2.0 | [beverages] |
| 8 | 1009 | Espresso | 2.0 | [coffee] |
| 9 | 1010 | Latte | 2.0 | [coffee] |


## Requirements of $N$th normal form
### 1NF
1. Using row order to convey information is not permitted.
2. Mixing data types within the same column is not permitted.
3. Having a table without a primary key is not permitted.
4. Repeating groups (*arrays*) are not permitted.


### 2NF
1. The database is in 1NF.
2. There are no partial dependencies: each non-key attribute must depend on the entire primary key.

### 3NF
1. The database is in 2NF.
2. Every non-key attribute of an entity should depend on the key, the whole key, and nothing but the key.

For the vast majority of cases, a database in third normal form can be considered fully normalized.


### 1NF
- The `purchases.items_sold` column contains arrays of item IDs, which violates 1NF. We need to ensure that each row has only one item. $\xrightarrow{}$ Create a separate table, say `purchase_items`, to store each item from a transaction as a separate record:
```sql
CREATE TABLE purchase_items (
  transaction_id VARCHAR(32),  -- transaction IDs are 32-character hex strings, not integers
  item_id INT
);
```
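The feedback at the top points out that one transaction can contain the same item several times (for example `[1007, 1003, 1007, 1006]`), so plain (`transaction_id`, `item_id`) rows would repeat. Below is a minimal sketch of the quantity-column variant suggested there. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and both the `quantity` column and the `explode_items` helper are assumptions made for illustration, not part of the original schema.

```python
import sqlite3
from collections import Counter

# Two sample rows from the `purchases` output above: (transaction_id, items_sold)
purchases = [
    ("7d2c076803fe468882b88a7b695dbf4f", [1007, 1003, 1007, 1006]),
    ("d25f6d2a790f4c37a1245e10a2df45f3", [1007, 1008, 1001, 1001]),
]

def explode_items(rows):
    """Turn each items_sold array into (transaction_id, item_id, quantity) rows."""
    out = []
    for transaction_id, items in rows:
        # Counter collapses repeated item IDs into a single row with a count
        for item_id, quantity in sorted(Counter(items).items()):
            out.append((transaction_id, item_id, quantity))
    return out

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
conn.execute("""
    CREATE TABLE purchase_items (
        transaction_id TEXT    NOT NULL,
        item_id        INTEGER NOT NULL,
        quantity       INTEGER NOT NULL,  -- hypothetical column
        PRIMARY KEY (transaction_id, item_id)
    )
""")
purchase_item_rows = explode_items(purchases)
conn.executemany("INSERT INTO purchase_items VALUES (?, ?, ?)", purchase_item_rows)
conn.commit()
```

With the compound primary key, a duplicate (`transaction_id`, `item_id`) pair is rejected by the database instead of silently relying on row order.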

- The `menu.item_type` column contains arrays of item categories, which also violates 1NF. $\xrightarrow{}$ Create another table, say `item_types`, to store item categories separately:

```sql
CREATE TABLE item_types (
  item_id INT,
  item_type VARCHAR(50)
)
```
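With one row per (item, type) pair, a type lookup becomes an ordinary join instead of an array search. The following is a small sketch of that idea, again with `sqlite3` standing in for PostgreSQL. The sample rows come from the `menu` output above; the composite primary key and the foreign key are assumptions.

```python
import sqlite3

# Three sample rows from the `menu` output: (item_id, item_name, item_price, item_type)
menu = [
    (1001, "Hamburger", 3.0, ["meat", "buns"]),
    (1005, "Large fries", 3.0, ["potatoes"]),
    (1006, "Chicken tendies", 4.0, ["meat"]),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE menu (item_id INTEGER PRIMARY KEY, item_name TEXT, item_price REAL)"
)
conn.execute("""
    CREATE TABLE item_types (
        item_id   INTEGER REFERENCES menu(item_id),
        item_type TEXT,
        PRIMARY KEY (item_id, item_type)  -- assumed key: one row per item/type pair
    )
""")

# The array column is dropped from `menu` and flattened into `item_types`
conn.executemany("INSERT INTO menu VALUES (?, ?, ?)", [row[:3] for row in menu])
conn.executemany(
    "INSERT INTO item_types VALUES (?, ?)",
    [(item_id, item_type) for item_id, _, _, types in menu for item_type in types],
)

# All menu items that contain meat, found through a join
meat_items = [
    name
    for (name,) in conn.execute(
        """
        SELECT m.item_name
        FROM menu AS m
        JOIN item_types AS t ON t.item_id = m.item_id
        WHERE t.item_type = ?
        ORDER BY m.item_id
        """,
        ("meat",),
    )
]
```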

### 2NF
- In the current structure, `purchases` already seems to follow 2NF, because all non-key attributes (`transaction_date`, `location_id`, `cashier_id`, etc.) are fully dependent on the primary key (`transaction_id`).
- The `menu` table might have an issue with functional dependencies if item types depend on something other than `item_id`. After separating `item_type` into its own table, as described under 1NF, `menu` will be in 2NF.


### 3NF 
- Transitive dependency in `branch`: The `café`, `drive_thru`, and `wi_fi_password` columns depend on `location_id`, but they could also be considered transitively dependent on `manager`. To eliminate this, the `branch` table could be split further. $\xrightarrow{}$ Create a `branch_features` table that stores the `café`, `drive_thru`, and `wi_fi_password` details separately.

```sql
CREATE TABLE branch_features (
  location_id INT,
  café BOOLEAN,
  drive_thru BOOLEAN,
  wi_fi_password VARCHAR(50)
)
```

- Transitive dependency in `employee`: If `supervisor_id` references another employee, there may be a transitive dependency from `personel_id` to `branch_id` through `supervisor_id`. This is more of a hierarchical structure issue, but it should be checked to avoid redundancy.
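
One way to probe such a suspected dependency is to test whether it holds on the stored rows. The helper below is a hypothetical sketch (`functionally_determines` is not part of the homework code), applied to the five `employee` rows shown earlier. A dependency that holds on sample data is only a hint; whether it is a real rule has to be decided from the meaning of the columns.

```python
def functionally_determines(rows, lhs, rhs):
    """Return True if equal values of `lhs` always come with equal values of `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True

# Sample rows copied from the `employee` output above
employees = [
    {"personel_id": 125, "role": "assistant manager", "salary": 115.0, "supervisor_id": 129, "branch_id": 1},
    {"personel_id": 226, "role": "assistant manager", "salary": 106.0, "supervisor_id": 220, "branch_id": 2},
    {"personel_id": 303, "role": "assistant manager", "salary": 126.0, "supervisor_id": 304, "branch_id": 3},
    {"personel_id": 307, "role": "assistant manager", "salary": 126.0, "supervisor_id": 304, "branch_id": 3},
    {"personel_id": 421, "role": "assistant manager", "salary": 121.0, "supervisor_id": 402, "branch_id": 4},
]

# supervisor_id -> branch_id holds on this sample, which is consistent with the
# chain personel_id -> supervisor_id -> branch_id discussed above
supervisor_fixes_branch = functionally_determines(employees, ["supervisor_id"], ["branch_id"])

# role -> salary does not hold: the same role appears with different salaries
role_fixes_salary = functionally_determines(employees, ["role"], ["salary"])
```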

After these changes, the database would be fully normalized, adhering to 1NF, 2NF, and 3NF.


---
# Task 2
Add a new page to the main.py of the Flask App! On this page the user should be able to search the menu of the restaurant using multiple filters (simultaneously): The name of the menu items, price range, or the type. Use query parametrization to ensure it is injection-safe! (2 points) Your answer should start at @app.route(), you don’t need to upload the entire main.py

Also, create the corresponding menu.html that has the specified functionality! (2 points)

## Added to `main.py`
```python
@app.route('/menu', methods=['GET', 'POST'])
def menu():
    title = "Menu Search"
    if request.method == 'POST':
        item_name = request.form.get('item_name', '')
        min_price = request.form.get('min_price', None)
        max_price = request.form.get('max_price', None)
        item_type = request.form.get('item_type', '')

        # Constructing the base query; "WHERE 1=1" lets every filter be appended as " AND ..."
        query = """
            SELECT item_id, item_name, item_price, item_type 
            FROM menu
            WHERE 1=1
        """
        params = {}

        # Adding filters if they are provided.
        # No trailing ';' after a filter, otherwise combining several filters yields invalid SQL.
        if item_name:
            query += " AND item_name LIKE %(item_name)s"
            params['item_name'] = '%' + item_name + '%'
        
        if min_price:
            query += " AND item_price >= %(min_price)s"
            params['min_price'] = float(min_price)
        
        if max_price:
            query += " AND item_price <= %(max_price)s"
            params['max_price'] = float(max_price)
        
        if item_type:
            query += " AND %(item_type)s = ANY(item_type)"
            params['item_type'] = item_type
        
        # If no params are given, return with error
        if not params:
            error = "Fill at least one cell."
            return render_template('menu.html', table=None, query=query, error=error, title=title)            

        connection = connect_from_settings(db_settings)
        cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
        cursor.execute(query, params)
        dict_res = cursor.fetchall()
        cursor.close()
        connection.close()
        if dict_res:
            df = pd.DataFrame(dict_res, columns=list(dict_res[0].keys()))
            return render_template('menu.html', table=df.to_html(), query=query, error=None, title=title)
        else:
            error = "No menu items match your filters."
            return render_template('menu.html', table=None, query=query, error=error, title=title)
    else:
        return render_template('menu.html', title=title)
```
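
The filter-building step can also be isolated into a pure function, which makes it possible to check without a database that several filters combine into one valid statement. This is only a sketch: `build_menu_query` is a hypothetical name, and the column names follow the `menu` table used above. User input ends up only in `params`, while the SQL text contains nothing but fixed placeholders, which is what keeps the query injection-safe.

```python
def build_menu_query(item_name="", min_price=None, max_price=None, item_type=""):
    """Return (query, params) for the menu search; empty filters are skipped."""
    clauses = []
    params = {}
    if item_name:
        clauses.append("item_name LIKE %(item_name)s")
        params["item_name"] = f"%{item_name}%"
    if min_price not in (None, ""):
        clauses.append("item_price >= %(min_price)s")
        params["min_price"] = float(min_price)
    if max_price not in (None, ""):
        clauses.append("item_price <= %(max_price)s")
        params["max_price"] = float(max_price)
    if item_type:
        # Assumes item_type is still stored as a PostgreSQL array column
        clauses.append("%(item_type)s = ANY(item_type)")
        params["item_type"] = item_type

    query = "SELECT item_id, item_name, item_price, item_type FROM menu"
    if clauses:
        # Joining with AND avoids the "WHERE 1=1" placeholder condition
        query += " WHERE " + " AND ".join(clauses)
    return query, params

# Two filters at once: name contains "soda" and price at most 1.5
query, params = build_menu_query(item_name="soda", max_price="1.5")
```

The returned pair can be passed directly to `cursor.execute(query, params)`.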

## Added in `templates/home.html`
In line 15.
```html
<li><a href="/menu">Search Menu</a></li>
```

## Created `templates/menu.html`
(I used ChatGPT to generate the HTML code.)
```html
<!DOCTYPE html>
<html>
<head>
    <title>{{ title }}</title>
</head>
<body>
    <h1>{{ title }}</h1>
    <form method="POST" action="{{ url_for('menu') }}">
        <label for="item_name">Menu Item Name:</label>
        <input type="text" id="item_name" name="item_name" placeholder="Enter item name" value="{{ request.form.item_name }}">
        <br><br>

        <label for="min_price">Minimum Price:</label>
        <input type="number" step="0.1" id="min_price" name="min_price" placeholder="Enter min price" value="{{ request.form.min_price }}">
        <br><br>

        <label for="max_price">Maximum Price:</label>
        <input type="number" step="0.1" id="max_price" name="max_price" placeholder="Enter max price" value="{{ request.form.max_price }}">
        <br><br>

        <label for="item_type">Item Type:</label>
        <input type="text" id="item_type" name="item_type" placeholder="Enter item type" value="{{ request.form.item_type }}">
        <br><br>

        <input type="submit" value="Search Menu">
    </form>

    <br>

    {% if table %}
        <h2>Search Results</h2>
        <div>
            {{ table|safe }} <!-- Renders the DataFrame as HTML table -->
        </div>
    {% elif error %}
        <p style="color:red">{{ error }}</p>
    {% endif %}

    <br><br>
    <a href="{{ url_for('home') }}">Back to Home</a>

</body>
</html>
```