**Recall:** Groceries dataset from Notebook 2.

In [None]:
from os.path import isfile

# Download the dataset only if it is not already present locally.
if not isfile('groceries.csv'):
    from requests import get
    response = get('https://cse6040.gatech.edu/datasets/groceries.csv')
    response.raise_for_status()  # Stop here rather than saving an error page as the CSV
    with open('groceries.csv', 'wt') as fp:
        fp.write(response.text)
    print("Recall the `groceries.csv` file from Notebook 2:")
    print(response.text[:250], "\n... (and so on) ...")

print("`groceries.csv` exists. You may proceed.")

Load the file and convert it into two Python data structures:
* `bags`: A Python list of itemsets, where each line of the groceries input file is one "shopping bag" (e.g., `bags[0]`, `bags[1]`) and each itemset is stored as a Python set.
* `items`: A Python set of unique items from all the bags.

In [None]:
bags = []
items = set()
with open('groceries.csv', 'rt') as fp:
    for line in fp:
        line = line.strip()
        if not line:  # Skip blank lines so they don't become a bag containing ''
            continue
        bag = set(line.split(','))
        items |= bag
        bags.append(bag)
print(len(items), "unique items")
print(len(bags), "grocery bags")
print("Bag 0:", bags[0])
print("Bag 1:", bags[1])

**Create SQL tables to hold these data.**

In [None]:
import sqlite3 as db

# Connect to a database (or create one if it doesn't exist)
conn = db.connect('groceries.db')
c = conn.cursor()

**`Items` table.**

In [None]:
c.execute("DROP TABLE IF EXISTS Items")
c.execute("CREATE TABLE Items (id INTEGER, name TEXT)")

item_to_id = {item: k for k, item in enumerate(items)}
c.executemany('INSERT INTO Items VALUES (?, ?)', [(k, item) for item, k in item_to_id.items()])
conn.commit()

from pandas import read_sql_query

df_items = read_sql_query('SELECT * FROM Items', conn)
df_items

**`Bags` table.**

In [None]:
c.execute("DROP TABLE IF EXISTS Bags")
c.execute("CREATE TABLE Bags (id INTEGER, item_id INTEGER)")
for k, bag in enumerate(bags):
    for item in bag:
        item_id = item_to_id[item]
        # Use placeholders so values are passed as integers, not formatted into the SQL string
        c.execute("INSERT INTO Bags VALUES (?, ?)", (k, item_id))
conn.commit()

df_bags = read_sql_query('SELECT * FROM Bags', conn)
df_bags

**Example:** Get items by name from bag 0.

In [None]:
query = '''
    SELECT Bags.id, Bags.item_id, Items.name
        FROM Bags, Items
        WHERE Bags.id=0 AND Bags.item_id=Items.id
'''

# Alternative, suggested in class by Agustina:
query_agustina = '''
    SELECT Bags.id, Bags.item_id, Items.name
        FROM Bags
        JOIN Items ON Bags.item_id=Items.id
        WHERE Bags.id=0
'''
read_sql_query(query_agustina, conn)
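
The two queries above should return the same rows: listing both tables in `FROM` and matching them in `WHERE` is an implicit inner join, while `JOIN ... ON` spells the join out. A minimal sketch to check this on a tiny in-memory database (the rows and the names `demo_conn` and `demo_c` are made up for illustration, not taken from the groceries data):

```python
import sqlite3

# Throwaway in-memory database with the same two-table layout as above
demo_conn = sqlite3.connect(':memory:')
demo_c = demo_conn.cursor()
demo_c.execute("CREATE TABLE Items (id INTEGER, name TEXT)")
demo_c.execute("CREATE TABLE Bags (id INTEGER, item_id INTEGER)")
demo_c.executemany("INSERT INTO Items VALUES (?, ?)",
                   [(0, 'milk'), (1, 'bread'), (2, 'eggs')])
demo_c.executemany("INSERT INTO Bags VALUES (?, ?)",
                   [(0, 0), (0, 2), (1, 1)])
demo_conn.commit()

implicit_join = '''
    SELECT Bags.id, Bags.item_id, Items.name
        FROM Bags, Items
        WHERE Bags.id=0 AND Bags.item_id=Items.id
'''
explicit_join = '''
    SELECT Bags.id, Bags.item_id, Items.name
        FROM Bags
        JOIN Items ON Bags.item_id=Items.id
        WHERE Bags.id=0
'''

# Sort before comparing, since SQL does not guarantee row order without ORDER BY
rows_implicit = sorted(demo_c.execute(implicit_join).fetchall())
rows_explicit = sorted(demo_c.execute(explicit_join).fetchall())
print(rows_implicit)
print("Same rows?", rows_implicit == rows_explicit)
```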

**Pandas version of the above.** You need to _construct_ the solution.

In [None]:
bags[0]

_Explicit filter._

In [None]:
df2 = df_bags[df_bags['id'] == 0]
df2

Two options for attaching item names to these rows:

1. Merge (like a join in SQL)
2. Remap values (`Series.map()`)

_Option 1: merge._

In [None]:
df_items.head()

In [None]:
# Both frames have an `id` column, so pandas renames them `id_x` (bag ID) and `id_y` (item ID)
df2.merge(df_items, left_on='item_id', right_on='id')

_Option 2: remap._ Use the `Series.map()` function with a dictionary that converts item IDs to item names.

In [None]:
id_to_item = {k: name for name, k in item_to_id.items()}
id_to_item

In [None]:
df3 = df2.copy()
df3['name'] = df3['item_id'].map(id_to_item)
df3
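
One practical difference between the two options shows up when an item ID has no matching name. A minimal sketch on made-up data (the frames `toy_bags` and `toy_items` are hypothetical, not from the groceries file): `merge` defaults to an inner join and drops the unmatched row, while `.map()` keeps the row and fills in `NaN`.

```python
import pandas as pd

# Item ID 99 deliberately has no entry in toy_items
toy_bags = pd.DataFrame({'id': [0, 0, 0], 'item_id': [10, 11, 99]})
toy_items = pd.DataFrame({'id': [10, 11], 'name': ['milk', 'bread']})

# Option 1: merge. The default how='inner' keeps only rows with a match.
merged = toy_bags.merge(toy_items, left_on='item_id', right_on='id',
                        suffixes=('_bag', '_item'))
print(merged)

# Option 2: map. Every row survives; IDs missing from the dictionary become NaN.
toy_id_to_item = dict(zip(toy_items['id'], toy_items['name']))
mapped = toy_bags.copy()
mapped['name'] = mapped['item_id'].map(toy_id_to_item)
print(mapped)
```

Passing `how='left'` to `merge` would keep the unmatched row as well, so the choice between the two is mostly a matter of convenience.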

**Exercise for you to do at home:** Compute a table of pairwise counts, that is, for each pair of items, the number of bags that contain both.