# 2. Retrieval

The main objective here is to retrieve nutritional information from our dataset for a given list of food names and their quantities.

Open Food Facts is one of the most complete open food product databases available. It supports downloading the entire database or querying it through an API.

1. Find a way to host a database locally, or test the API's suitability for this task
2. Explore the dataset
    - multi-language support
    - much of the information is unrelated to this application's needs and could be removed to reduce disk size
3. Find a way to match food names with the ones found in the database
    - fuzzy search?
    - semantic or lexical search?
    - Keep it simple, just use the API's text search function
4. Find a way to output the nutritional value for the given amount
    - simply unpack the retrieved json or db record
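
Step 4 can be sketched as follows. This is a minimal example, assuming the retrieved record exposes per-100 g values under keys ending in `_100g` (as the CSV columns used below do). The function name `scale_nutriments` and the sample record are hypothetical and only serve as illustration.

```python
def scale_nutriments(nutriments: dict, quantity_g: float) -> dict:
    """Scale every numeric `*_100g` entry to the given quantity in grams."""
    scaled = {}
    for key, value in nutriments.items():
        if not key.endswith("_100g"):
            continue
        try:
            per_100g = float(value)
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric values
        scaled[key.removesuffix("_100g")] = per_100g * quantity_g / 100
    return scaled


# Hypothetical record, shaped like a retrieved product's nutrient fields.
sample = {"carbohydrates_100g": 23, "proteins_100g": "1.1", "fat_100g": None}
print(scale_nutriments(sample, 150))
```

Entries that are missing or cannot be parsed as numbers are skipped rather than treated as zero, so a gap in the data does not silently turn into "0 g".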

In [None]:
import pandas as pd

df_chunks = pd.read_csv(
    "data/en.openfoodfacts.org.products.csv",
    sep="\t",
    chunksize=10_000,
    usecols=[
        "code",
        "product_name",
        "abbreviated_product_name",
        "generic_name",
        "quantity",
        "ingredients_text",
        "allergens",
        "serving_size",
        "serving_quantity",
        "energy-kj_100g",
        "energy-kcal_100g",
        "fat_100g",
        "saturated-fat_100g",
        "carbohydrates_100g",
        "sugars_100g",
        "fiber_100g",
        "proteins_100g",
        "glycemic-index_100g",
    ],
    dtype=str,
)

In [None]:
next(df_chunks)

## 2.2 Create a database

Exploring the data directly from the CSV file will not achieve the required performance.

- Build using DuckDB
- Fuzzy search based on product_name
- Retrieve carb data
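
The three points above could look roughly like this. Note that this sketch uses the standard-library `sqlite3` module as a stand-in for DuckDB so that it runs without extra dependencies, and registers a Python similarity function (based on `difflib`) for the fuzzy search. The table layout, the `find_carbs` helper, and the rows are made up for illustration. With DuckDB, its built-in string-similarity functions could presumably replace the registered function.

```python
import sqlite3
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL so the ranking happens inside the query.
conn.create_function("similarity", 2, similarity)
conn.execute(
    "CREATE TABLE products (code TEXT, product_name TEXT, carbohydrates_100g REAL)"
)
# Made-up rows for illustration only; not real Open Food Facts records.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("0001", "Banana chips", 58.0),
        ("0002", "Dark chocolate 70%", 34.0),
        ("0003", "Bananas", 23.0),
    ],
)


def find_carbs(query: str, limit: int = 1) -> list:
    """Return the closest product names with their carbs per 100 g."""
    return conn.execute(
        "SELECT product_name, carbohydrates_100g, "
        "similarity(product_name, ?) AS score "
        "FROM products ORDER BY score DESC LIMIT ?",
        (query, limit),
    ).fetchall()


print(find_carbs("banana"))
```

Scoring every row in Python will not scale to the full dataset; a real setup would likely pre-filter candidates first and only rank the remainder.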

## 2.3 API

Let's just try using the API.

In [None]:
from openfoodfacts import API, APIVersion, Environment, Country

api = API(
    user_agent="DAIA/0.1", version=APIVersion.v2, environment=Environment.net, country=Country.us
)

In [None]:
r = api.product.text_search("Banana", page_size=10000)

In [None]:
# (name, barcode) pairs: a convenient input for fuzzy search
name_code_pairs = [
    (product.get("product_name"), product["code"]) for product in r["products"]
]

In [None]:
import pandas as pd


def retrieve_nutritional_information(foods: list[str]) -> pd.DataFrame:
    """Look up each food by name and collect the nutriments of the first hit."""
    nutritional_info = {}
    for food in foods:
        try:
            response = api.product.text_search(food)
        except Exception as e:
            raise ValueError(f"Could not retrieve information for {food}") from e
        products = response.get("products", [])
        if not products:
            raise ValueError(f"No product found for {food}")
        # TODO: improve product selection
        # For now, take the first product returned
        product = products[0]
        nutritional_info[product["code"]] = {
            "user_input": food,
            "retrieved_product": product.get("product_name"),
            **product.get("nutriments", {}),
        }
    return pd.DataFrame.from_dict(data=nutritional_info, orient="index")


foods = ["Dark Chocolate", "Coke"]
df = retrieve_nutritional_information(foods)

In [None]:
df

## 2.4 Use a combined search strategy

1. Query the API with the given food name
2. Retrieve all API output names and their corresponding barcodes
3. Perform fuzzy search to find the most similar product to the query
4. Use the barcode to retrieve the product's nutritional information
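
Steps 2 and 3 could be sketched like this, using `difflib` from the standard library as one possible fuzzy matcher. The `best_match` helper, the `min_score` threshold, and the candidate list are assumptions for illustration; in practice the candidates would be the (name, barcode) pairs taken from the API response.

```python
from difflib import SequenceMatcher
from typing import Optional


def best_match(
    query: str, candidates: list, min_score: float = 0.5
) -> Optional[tuple]:
    """Return the (name, code) pair most similar to `query`, or None."""
    best, best_score = None, min_score
    for name, code in candidates:
        if not name:
            continue  # some products have no name
        score = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        if score > best_score:
            best, best_score = (name, code), score
    return best


# Hypothetical (name, barcode) pairs standing in for API results.
candidates = [
    ("Coca-Cola Zero", "111"),
    ("Coke", "222"),
    ("", "333"),
    ("Cokies", "444"),
]
print(best_match("coke", candidates))
```

The barcode of the returned pair can then be used for the lookup in step 4. Returning `None` below the threshold lets the caller ask the user to rephrase instead of silently picking an unrelated product.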

## 2.5 Output data for bolus calculation

1. Get the first product retrieved by the API
2. Calculate the amount of carbs based on the quantity given by the user
3. Return the result as a dictionary

In [None]:
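def count_carbs(nutriments: dict, quantity_g: float) -> float:
    """Return the grams of carbohydrate in `quantity_g` grams of the product."""
    carbs = float(nutriments["carbohydrates_100g"])
    return carbs * quantity_g / 100


def calculate_bolus_for_product(carbs: float, insulin_carb_factor: float) -> dict:
    """Return the carbs and the bolus (carbs divided by the insulin-carb factor)."""
    if insulin_carb_factor <= 0:
        raise ValueError("insulin_carb_factor must be positive")
    bolus = carbs / insulin_carb_factor
    return {"carbs": carbs, "bolus": bolus}


carbs = count_carbs(r["products"][0]["nutriments"], 50)
# The function takes exactly two arguments; 10 is a placeholder factor, not a recommendation.
calculate_bolus_for_product(carbs, insulin_carb_factor=10)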