# Take Home Test - Backend - Data Processing Task

A naïve calculation of the impact of a recipe involves multiplying the weight of each ingredient by the impact per kilogram of that ingredient.
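As a minimal sketch of this naïve calculation, the snippet below sums weight times impact per kilogram. The function name `naive_recipe_impact` and all numbers are made up for illustration and do not come from the provided data.

```python
# Hypothetical (weight in kg, impact per kg) pairs for one recipe
ingredients = [
    (0.25, 2.0),
    (0.10, 5.0),
    (0.50, 0.4),
]

def naive_recipe_impact(items):
    """Sum weight * impact-per-kg over all ingredients."""
    return sum(weight * impact_per_kg for weight, impact_per_kg in items)

total = naive_recipe_impact(ingredients)
print(round(total, 3))  # 0.5 + 0.5 + 0.2 = 1.2
```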

However, we don't necessarily have direct impacts for every ingredient. For instance, there might be scarce academic literature on the impact of Jerusalem artichokes. However, since Jerusalem artichokes are root vegetables, and we have impact data for other root vegetables, we can produce a best estimate for the impact of a generic root vegetable and use that value as a proxy for the impact of Jerusalem artichokes.

To model this, we have created what we call a "food class hierarchy", which is just an n-ary tree. When we want to retrieve the impact for a given ingredient, we first look at its corresponding food class. If that food class has an impact, we use it. Otherwise, we move to its parent food class and repeat the process. We continue this recursion up the tree until we reach a food class without a parent, in which case we've reached a root node. If this node has no impact, then we can't recurse and we've failed to retrieve an impact and should throw an error.

The other issue is that ingredient names are entered by users and must be matched to food classes. In some cases, we're lucky and the ingredient name entered by the user matches the name of a food class exactly. But often this is not the case. As such, when matching names, we ignore case, punctuation and word order. For example, the ingredient `tomatoes, plum (Chopped)` would match with the food class `Chopped Plum Tomatoes`.

##### Inputs

You are provided with two CSV files: `food_classes.csv` and `recipes.csv`.

`food_classes.csv` contains data with the following schema:
- ID
- Name
- Impact / kg
- Parent ID

_Impact values are randomly generated._

`recipes.csv` contains data with the following schema:
- Recipe ID
- Recipe Name
- Ingredient Name
- Ingredient Weight / kg

##### Goal

You are required to write a script to calculate the impact of all the recipes in `recipes.csv`. If there is an ingredient in a recipe that cannot be matched to a food class, you should not give an impact for that recipe.

## Step 0: Import Modules
- **`pandas`**: Data manipulation library in Python used to handle data in tabular form with DataFrames.
- **`re`**: Module for working with regular expressions used to process text by matching patterns.
- **`os`**: Provides functions to interact with the operating system. Here, it's used to access file paths dynamically.

In [1]:
import pandas as pd
import re
import os

## Step 1: Load Data
Reads the CSV files into two DataFrames.

In [3]:
food_classes_df = pd.read_csv('food_classes.csv')
recipes_df = pd.read_csv('recipes.csv')

In [4]:
food_classes_df.head()

Unnamed: 0,ID,Name,Impact / kg,Parent ID
0,1,Beef Mince,,2.0
1,2,Beef,2.649402,3.0
2,3,Ruminant Meat,5.508226,4.0
3,4,Meat,1.310414,
4,5,Pork Mince,,6.0


In [5]:
recipes_df.head()

Unnamed: 0,Recipe ID,Recipe Name,Ingredient Name,Ingredient Weight / kg
0,1,Spaghetti Bolognese,beef mince,0.25
1,1,Spaghetti Bolognese,Pork Mince,0.25
2,1,Spaghetti Bolognese,Carrots,0.1
3,1,Spaghetti Bolognese,Onions,0.25
4,1,Spaghetti Bolognese,Celery,0.1


In the code challenge, this step was implemented as a function that gets the directory of the current Python file, so that the CSV files can be located relative to the script, and then reads both files.

In [6]:
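def load_data() -> tuple[pd.DataFrame, pd.DataFrame]:
    # __file__ is defined when this runs as a script, but not inside a notebook cell
    current_dir = os.path.dirname(__file__)
    food_classes_df = pd.read_csv(os.path.join(current_dir, 'food_classes.csv'))
    recipes_df = pd.read_csv(os.path.join(current_dir, 'recipes.csv'))
    # Both DataFrames are returned together as a tuple
    return food_classes_df, recipes_df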
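Because `__file__` is not defined inside a notebook cell, a small helper can fall back to the current working directory. This is only a sketch: the helper name `data_dir` is hypothetical, and it assumes the CSV files sit next to the script (or in the working directory when run from a notebook).

```python
from pathlib import Path

def data_dir() -> Path:
    """Return the folder expected to hold the CSV files.

    In a script, __file__ points at the current file; in a notebook it is
    undefined, so fall back to the current working directory.
    """
    try:
        return Path(__file__).resolve().parent
    except NameError:
        return Path.cwd()

food_classes_path = data_dir() / "food_classes.csv"
recipes_path = data_dir() / "recipes.csv"
print(food_classes_path.name, recipes_path.name)
```

The resulting paths can be passed to `pd.read_csv` in place of the `os.path.join` calls.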

## Step 2: Name normalisation
The following function standardises a string by removing punctuation, converting it to lowercase, and sorting the words alphabetically. This allows names to be matched even when the user input does not exactly match the name of a food class.

In [7]:
def normalize_name(name: str) -> str:
    name = re.sub(r'[^\w\s]', '', name)  # Remove anything that is neither a word character nor whitespace
    name = name.lower()  # Convert to lowercase
    name = ' '.join(sorted(name.split()))  # Arrange words in alphabetical order and join
    return name

Here is an example of how the function works on a few strings.

In [8]:
print(normalize_name("Chicken Breast"))     
print(normalize_name("Spaghetti Bolognese!"))  
print(normalize_name("Eggplant Parmesan."))
print(normalize_name("pork, loin steak"))

breast chicken
bolognese spaghetti
eggplant parmesan
loin pork steak
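One edge case is worth a quick check: because punctuation is removed rather than replaced, two words separated only by a comma are fused together. The sketch below repeats `normalize_name` so it runs on its own and compares it with a hypothetical variant, `normalize_name_spaced`, that turns punctuation into a space. The variant is an illustration, not part of the submitted solution.

```python
import re

def normalize_name(name: str) -> str:
    # Same logic as the function above: drop punctuation, lowercase, sort words
    name = re.sub(r'[^\w\s]', '', name)
    return ' '.join(sorted(name.lower().split()))

def normalize_name_spaced(name: str) -> str:
    # Hypothetical variant: punctuation becomes a space instead of vanishing
    name = re.sub(r'[^\w\s]', ' ', name)
    return ' '.join(sorted(name.lower().split()))

# The example from the task statement matches
print(normalize_name("tomatoes, plum (Chopped)") == normalize_name("Chopped Plum Tomatoes"))  # True

# Without a space after the comma, the first version fuses the two words
print(normalize_name("tomatoes,plum"))         # tomatoesplum
print(normalize_name_spaced("tomatoes,plum"))  # plum tomatoes
```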


Here is an example of how to use the function on the values of a specific column in a DataFrame. The `apply` method applies the function to each element in that column.

In [9]:
recipes_df_dummy = recipes_df.copy()
recipes_df_dummy['Normalized Ingredient Name'] = recipes_df_dummy['Ingredient Name'].apply(normalize_name)
recipes_df_dummy.head()

Unnamed: 0,Recipe ID,Recipe Name,Ingredient Name,Ingredient Weight / kg,Normalized Ingredient Name
0,1,Spaghetti Bolognese,beef mince,0.25,beef mince
1,1,Spaghetti Bolognese,Pork Mince,0.25,mince pork
2,1,Spaghetti Bolognese,Carrots,0.1,carrots
3,1,Spaghetti Bolognese,Onions,0.25,onions
4,1,Spaghetti Bolognese,Celery,0.1,celery


## Step 3: Food Classes Hierarchy
The code sets up an n-ary tree structure to represent the "food class hierarchy". Here is a breakdown of the approach and logic:
1. Data Structure for Food Classes: The `FoodClass` class is created to store details about each food class, including an `id`, `name`, `impact`, and `parent_id`. A dictionary (`food_classes`) maps each food class ID to a FoodClass object. This allows us to easily access any food class by its ID.
2. Building the Food Class Hierarchy: The `build_food_class_hierarchy` function iterates over the DataFrame containing food classes. For each row, it creates a FoodClass object and adds it to the food_classes dictionary. This dictionary represents the hierarchy as an n-ary tree, where each food class is linked to its parent class via the `parent_id` attribute.

In [10]:
food_classes_df_dummy = food_classes_df.copy()

In [11]:
# Class to represent a Food Class
class FoodClass:
    def __init__(self, id: int, name: str, impact: float, parent_id: int):
        self.id = id
        self.name = name
        self.impact = impact
        self.parent_id = parent_id

# Function to build the food class hierarchy
def build_food_class_hierarchy(df: pd.DataFrame) -> dict:
    food_classes = {}
    for _, row in df.iterrows():
        food_classes[row['ID']] = FoodClass(row['ID'], row['Name'], row['Impact / kg'], row['Parent ID'])
    return food_classes

The next cell uses the `build_food_class_hierarchy` function to build the hierarchy and then prints the resulting dictionary of `FoodClass` instances.

In [12]:
food_classes = build_food_class_hierarchy(food_classes_df_dummy)
print('\nFood Class Hierarchy:')
for class_id, food_class in food_classes.items():
    print(f'ID: {food_class.id}, Name: {food_class.name}, Impact {food_class.impact}, Parent ID: {food_class.parent_id}')


Food Class Hierarchy:
ID: 1, Name: Beef Mince, Impact nan, Parent ID: 2.0
ID: 2, Name: Beef, Impact 2.649402237, Parent ID: 3.0
ID: 3, Name: Ruminant Meat, Impact 5.508225903, Parent ID: 4.0
ID: 4, Name: Meat, Impact 1.310413655, Parent ID: nan
ID: 5, Name: Pork Mince, Impact nan, Parent ID: 6.0
ID: 6, Name: Pork, Impact nan, Parent ID: 7.0
ID: 7, Name: Pork & Poultry, Impact 4.548881992, Parent ID: 4.0
ID: 8, Name: Carrots, Impact 9.811717734, Parent ID: 9.0
ID: 9, Name: Root Vegetables, Impact 7.781639031, Parent ID: 10.0
ID: 10, Name: Vegetables, Impact 3.161091821, Parent ID: nan
ID: 11, Name: Onions, Impact 9.000471107, Parent ID: 12.0
ID: 12, Name: Onions and Leeks, Impact 4.764500409, Parent ID: 13.0
ID: 13, Name: Bulbs, Impact 7.706728936, Parent ID: 10.0
ID: 14, Name: Celery, Impact 0.2183862344, Parent ID: 15.0
ID: 15, Name: Stem Vegetables, Impact 5.441168674, Parent ID: 10.0
ID: 16, Name: Garlic, Impact 4.436081072, Parent ID: 13.0
ID: 17, Name: Rosemary, Impact nan, Paren

#### Why not a `DataFrame`?
Using a class like `FoodClass` rather than just a DataFrame provides several benefits, especially when representing a hierarchical structure such as a tree. Here are some reasons why this approach is advantageous:
1. **Encapsulation of Data and Behaviour**: The `FoodClass` class can encapsulate not only data but also the methods that operate on it. For example, a behaviour specific to a `FoodClass` can be added as a method within the class. This keeps related data and behaviour together, improving readability and maintainability.
2. **Ease of Recursion and Hierarchical Structure**: A class-based structure is easier to manage when dealing with parent-child relationships. In this setup, each `FoodClass` object can be accessed and referenced independently. By storing each `FoodClass` instance in a dictionary, it is possible to retrieve an instance by its `id` or `parent_id` and perform recursive operations. Handling the hierarchy with a class-based structure makes it easier to add specific methods for navigation, comparison, or aggregation that might not be as straightforward with a flat `DataFrame`.
3. **Improved Code Clarity**: Each instance has well-defined fields, which helps prevent errors when accessing attributes. A `DataFrame` does not enforce a fixed schema in the same way, so it is more prone to errors if column names or types are misused.
4. **Flexibility**: The class structure provides more flexibility if `FoodClass` has to be extended with more complex logic or additional attributes. For example, if you need to include methods that calculate certain values or manage additional relationships, it's easier to add them to `FoodClass` than to a `DataFrame`.
5. **Optimisation for Large Datasets**: While DataFrames are optimised for vectorised operations, navigating a hierarchical structure within a `DataFrame` can be slow and clunky, particularly if recursive lookups are involved. Using a dictionary of `FoodClass` instances allows fast, direct access to specific nodes, which can make recursive operations more efficient.
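To make the encapsulation point concrete, here is a minimal sketch of a class that carries its own navigation behaviour. The name `FoodClassNode`, the `is_root` and `ancestors` methods, and the sample data are all hypothetical. The sketch also assumes that missing values are stored as `None` rather than `NaN`.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional

@dataclass
class FoodClassNode:
    """Hypothetical variant of FoodClass that carries its own behaviour."""
    id: int
    name: str
    impact: Optional[float]
    parent_id: Optional[int]

    def is_root(self) -> bool:
        return self.parent_id is None

    def ancestors(self, registry: Dict[int, "FoodClassNode"]) -> Iterator["FoodClassNode"]:
        """Yield the parent, then the grandparent, and so on up to the root."""
        node = self
        while not node.is_root():
            node = registry[node.parent_id]
            yield node

registry = {
    1: FoodClassNode(1, "Grains", 2.5, None),
    2: FoodClassNode(2, "Rice", None, 1),
    4: FoodClassNode(4, "White Rice", 3.0, 2),
}
print([n.name for n in registry[4].ancestors(registry)])  # ['Rice', 'Grains']
```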

#### Example 1 - Retrieve the Parent Class: Dictionary vs DataFrame
If you want to retrieve the parent class of a given food class, you can easily do this with a dictionary:

In [13]:
food_classes_example = {
    1: FoodClass(id=1, name='Grains', impact=2.5, parent_id=None),
    2: FoodClass(id=2, name='Rice', impact=None, parent_id=1),
    3: FoodClass(id=3, name='Wheat', impact=1.5, parent_id=1),
    4: FoodClass(id=4, name='White Rice', impact=3.0, parent_id=2)
}

def get_parent(food_class_id):
    food_class = food_classes_example.get(food_class_id)
    if food_class:
        return food_classes_example.get(food_class.parent_id)

In [14]:
parent_class = get_parent(4)
if parent_class:
    print(f"Parent of White Rice is {parent_class.name}")
else:
    print("White Rice has no parent.")

Parent of White Rice is Rice


Now with a `DataFrame`:

In [15]:
data_example = {
    'ID': [1, 2, 3, 4],
    'Name': ['Grains', 'Rice', 'Wheat', 'White Rice'],
    'Impact': [2.5, None, 1.5, 3.0],
    'Parent ID': [None, 1, 1, 2]
}

df_example = pd.DataFrame(data_example)

def get_parent_row(food_class_id, df):
    # Get the parent ID for the specified food class
    parent_id = df.loc[df['ID'] == food_class_id, 'Parent ID'].values[0]
    
    # Retrieve the row where 'ID' matches the parent ID
    if pd.notna(parent_id):
        parent_row = df[df['ID'] == parent_id]
        return parent_row
    else:
        return None

In [16]:
parent_row = get_parent_row(4, df_example)
print(parent_row)

   ID  Name  Impact  Parent ID
1   2  Rice     NaN        1.0


#### Example 2 - Traverse Up the Hierarchy 
If you want to retrieve the entire chain of parent classes up to the root for a given food class, you can use a loop or recursive function.

In [17]:
def get_ancestor_chain(food_class_id):
    current_id = food_class_id
    ancestors = []
    
    while current_id is not None:
        parent = get_parent(current_id)
        if parent is not None:
            ancestors.append(parent.name)
            current_id = parent.id  # Move up to the next ancestor
        else:
            current_id = None  # Stop when there is no parent left    
    return ancestors

In [18]:
# Get the ancestor chain for "White Rice" (ID 4)
ancestors = get_ancestor_chain(4)
print("Ancestor chain for White Rice:", " -> ".join(ancestors))

Ancestor chain for White Rice: Rice -> Grains


Now with a `DataFrame`:

In [19]:
def get_ancestor_chain(food_class_id, df):
    current_id = food_class_id
    ancestors = []
    
    while current_id is not None:
        parent_row = get_parent_row(current_id, df)
        
        if parent_row is not None and not parent_row.empty:
            parent_name = parent_row['Name'].values[0]
            ancestors.append(parent_name)
            current_id = parent_row['ID'].values[0]  # Move to the next ancestor ID
        else:
            current_id = None  # Stop when there is no parent left
    return ancestors

In [20]:
# Example: Get the ancestor chain for White Rice (ID 4)
ancestors = get_ancestor_chain(4, df_example)
print("Ancestor chain for White Rice:", " -> ".join(ancestors))

Ancestor chain for White Rice: Rice -> Grains


#### Example 3 - New Instance in the Dictionary
A new food class can be added to the existing hierarchy by creating a new instance of `FoodClass` and adding it to the `food_classes` dictionary under a unique ID.

In [21]:
new_food_class_id = 72
new_food_class_name = "Orange"
new_food_class_impact = 0.3  
new_food_class_parent_id = 1 

# Create a new FoodClass instance
new_food_class = FoodClass(new_food_class_id, new_food_class_name, new_food_class_impact, new_food_class_parent_id)

# Add it to the food_classes dictionary
food_classes[new_food_class_id] = new_food_class

# Verify the new class was added
print(f"Added new food class: ID = {new_food_class.id}, Name = {new_food_class.name}, "
      f"Impact = {new_food_class.impact}, Parent ID = {new_food_class.parent_id}")

# Display the updated food_classes dictionary
print("\nUpdated Food Class Hierarchy:")
for class_id, food_class in food_classes.items():
    print(f"ID: {food_class.id}, Name: {food_class.name}, Impact: {food_class.impact}, Parent ID: {food_class.parent_id}")

Added new food class: ID = 72, Name = Orange, Impact = 0.3, Parent ID = 1

Updated Food Class Hierarchy:
ID: 1, Name: Beef Mince, Impact: nan, Parent ID: 2.0
ID: 2, Name: Beef, Impact: 2.649402237, Parent ID: 3.0
ID: 3, Name: Ruminant Meat, Impact: 5.508225903, Parent ID: 4.0
ID: 4, Name: Meat, Impact: 1.310413655, Parent ID: nan
ID: 5, Name: Pork Mince, Impact: nan, Parent ID: 6.0
ID: 6, Name: Pork, Impact: nan, Parent ID: 7.0
ID: 7, Name: Pork & Poultry, Impact: 4.548881992, Parent ID: 4.0
ID: 8, Name: Carrots, Impact: 9.811717734, Parent ID: 9.0
ID: 9, Name: Root Vegetables, Impact: 7.781639031, Parent ID: 10.0
ID: 10, Name: Vegetables, Impact: 3.161091821, Parent ID: nan
ID: 11, Name: Onions, Impact: 9.000471107, Parent ID: 12.0
ID: 12, Name: Onions and Leeks, Impact: 4.764500409, Parent ID: 13.0
ID: 13, Name: Bulbs, Impact: 7.706728936, Parent ID: 10.0
ID: 14, Name: Celery, Impact: 0.2183862344, Parent ID: 15.0
ID: 15, Name: Stem Vegetables, Impact: 5.441168674, Parent ID: 10.0
ID
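Plain dictionary assignment silently overwrites an existing entry, so the uniqueness of the ID is not enforced. A small guard can check it, as sketched below. The helper name `add_food_class` and the sample data are hypothetical, and `FoodClass` is repeated only so that the snippet runs on its own.

```python
class FoodClass:
    # Same shape as the FoodClass defined earlier, repeated so the sketch runs alone
    def __init__(self, id, name, impact, parent_id):
        self.id = id
        self.name = name
        self.impact = impact
        self.parent_id = parent_id

def add_food_class(registry: dict, new_class: FoodClass) -> None:
    """Hypothetical helper: insert new_class, rejecting duplicates and orphans."""
    if new_class.id in registry:
        raise ValueError(f"ID {new_class.id} is already used by {registry[new_class.id].name}")
    if new_class.parent_id is not None and new_class.parent_id not in registry:
        raise ValueError(f"Parent ID {new_class.parent_id} does not exist")
    registry[new_class.id] = new_class

registry = {1: FoodClass(1, "Fruit", 1.2, None)}
add_food_class(registry, FoodClass(2, "Orange", 0.3, 1))
try:
    add_food_class(registry, FoodClass(2, "Lemon", 0.4, 1))
except ValueError as e:
    print(e)  # ID 2 is already used by Orange
```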

## Step 4: Recursive Impact Retrieval
The `get_impact` function is the key to finding the CO2 impact of a given food class. Here's how it works:
1. Check whether the requested food class ID is in the `food_classes` dictionary; if it is not, a `ValueError` is raised.
2. If the class has a defined `impact` (i.e., it's not `NaN`), the function returns the impact.
3. If the class doesn't have an impact but has a `parent_id`, the function calls itself recursively with the parent's ID.

This recursion continues until one of two conditions is met:
- An impact value is found and returned
- The recursion reaches a root node (a food class with no parent) that has no impact. If this happens, a `ValueError` is raised, indicating that the impact could not be determined for the ingredient.

In [22]:
def get_impact(food_class_id: int) -> float:
    food_class = food_classes.get(food_class_id)
    if not food_class:
        raise ValueError(f"No food class found for ID {food_class_id}")
    if not pd.isna(food_class.impact):
        return food_class.impact
    if pd.isna(food_class.parent_id):
        raise ValueError(f"No impact available and no parent to recurse for {food_class.name}")
    return get_impact(food_class.parent_id)

Example of using the function with different food class IDs.

In [23]:
test_ids = [2, 17, 32, 70] 

for food_class_id in test_ids:
    try:
        impact = get_impact(food_class_id)
        print(f"Food Class ID {food_class_id}: Impact = {impact} kg CO2")
    except ValueError as e:
        print(f"Food Class ID {food_class_id}: {e}")

Food Class ID 2: Impact = 2.649402237 kg CO2
Food Class ID 17: Impact = 2.102187648 kg CO2
Food Class ID 32: Impact = 7.57055912 kg CO2
Food Class ID 70: Impact = 3.171540022 kg CO2
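The same walk up the tree can also be written as a loop that takes the hierarchy as an argument instead of reading a global, and that stops if the parent links ever form a cycle. The sketch below is an illustration, not the submitted solution: the name `resolve_impact`, the tuple layout and the sample values are assumptions. Parent IDs are kept as floats, as in the DataFrame output above. They still work as dictionary keys because `2.0 == 2` and both hash to the same value in Python.

```python
import math

def resolve_impact(food_class_id, registry):
    """Walk up parent links until a node with an impact is found.

    registry maps ID -> (name, impact, parent_id); NaN marks a missing value,
    mirroring what pandas produces for empty CSV cells.
    """
    seen = set()
    current = food_class_id
    while True:
        if current in seen:
            raise ValueError(f"Cycle detected at ID {current}")
        seen.add(current)
        if current not in registry:
            raise ValueError(f"No food class found for ID {current}")
        name, impact, parent_id = registry[current]
        if not math.isnan(impact):
            return impact
        if math.isnan(parent_id):
            raise ValueError(f"No impact available and no parent for {name}")
        current = parent_id

nan = float("nan")
# Hypothetical tree with float parent IDs
registry = {
    1: ("Pork Mince", nan, 2.0),
    2: ("Pork", nan, 3.0),
    3: ("Pork & Poultry", 4.5, nan),
}
print(resolve_impact(1, registry))  # 4.5
```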


## Step 5: Recipe Calculator
The next function calculates the total CO₂ impact of a recipe by summing the impacts of its ingredients.
- **`recipe.iterrows()`**: Iterates over each row in the recipe DataFrame.
- **Ingredient Matching**: Normalises the ingredient’s name, then searches for a matching food class name within `food_classes`.
- **Impact Calculation**: Uses the `get_impact` function to find the CO₂ impact of the ingredient’s food class, then multiplies it by the ingredient’s weight and accumulates it in `total_impact`.
- **Error Handling**: If an ingredient is not found or impact cannot be calculated, the function prints an error message and returns `None`.

In [24]:
def calculate_recipe_impact(recipe: pd.DataFrame) -> float:
    # Take the recipe identifiers from the data rather than from global variables
    recipe_id = recipe['Recipe ID'].iloc[0]
    recipe_name = recipe['Recipe Name'].iloc[0]
    total_impact = 0
    matched_classes = []
    for _, ingredient in recipe.iterrows():
        ingredient_name = normalize_name(ingredient['Ingredient Name'])
        # First food class whose normalised name equals the ingredient's, else None
        matched_class = next(
            (fc for fc in food_classes.values() if normalize_name(fc.name) == ingredient_name), None)
        if not matched_class:
            print(f"Recipe ID {recipe_id} - {recipe_name}: Ingredient '{ingredient_name}' not found.")
            return None
        # Store the matched class for visualization
        matched_classes.append((ingredient_name, matched_class.name))
        try:
            impact = get_impact(matched_class.id)
            total_impact += impact * ingredient['Ingredient Weight / kg']
        except ValueError as e:
            # No impact could be resolved anywhere up the hierarchy
            print(e)
            return None
    matched_df = pd.DataFrame(matched_classes)
    display(matched_df)  # display() is provided by IPython/Jupyter
    return total_impact

The most intricate part of the function above is matching the ingredient names with the corresponding food classes. Here is a more detailed explanation:
1. `for _, ingredient in recipe.iterrows()`: This loop iterates over each row in the recipe DataFrame.
    - `iterrows()` is a method provided by pandas to iterate over DataFrame rows as (index, Series) pairs.
    - Here, `_` is used as a placeholder for the index (since it’s not needed in this context), and `ingredient` represents each row (as a pandas Series object) containing data about an individual ingredient in the recipe.

2. `ingredient_name = normalize_name(ingredient['Ingredient Name'])`: This line retrieves the 'Ingredient Name' from the current ingredient row and then applies the `normalize_name` function to it.

3. `matched_class = next((fc for fc in food_classes.values() if normalize_name(fc.name) == ingredient_name), None)`: This line is attempting to find a matching food class for the current ingredient.
    - `food_classes` is a dictionary where each value is a FoodClass object, and `food_classes.values()` returns a view of the objects themselves.
    - The code uses a generator expression `(fc for fc in food_classes.values() if normalize_name(fc.name) == ingredient_name)`, which iterates over each `FoodClass` object fc in the dictionary and applies `normalize_name(fc.name)` to normalise the class name.
    - It checks if this normalised class name is equal to the `ingredient_name` (the normalized name of the ingredient).
    - `next(..., None)` retrieves the first `FoodClass` object that matches the `ingredient_name` condition. If no match is found, `None` is returned.

4. `if not matched_class`: This line checks whether `matched_class` is `None`, which indicates that no matching food class was found for the current ingredient. In that case, the function prints an error message and returns `None`, because the recipe impact cannot be calculated when one of the ingredients cannot be matched to a food class.
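The generator expression re-normalises every food class name for every ingredient. A possible alternative is to build a lookup table once and reuse it, as sketched below. The names `name_index` and `match_food_class` and the sample data are hypothetical.

```python
import re

def normalize_name(name: str) -> str:
    # Same normalisation as in Step 2, repeated so the sketch runs alone
    name = re.sub(r'[^\w\s]', '', name)
    return ' '.join(sorted(name.lower().split()))

# Hypothetical food class names keyed by ID
food_class_names = {1: "Beef Mince", 5: "Pork Mince", 8: "Carrots"}

# Build the index once: normalised name -> food class ID
name_index = {normalize_name(name): class_id for class_id, name in food_class_names.items()}

def match_food_class(ingredient_name: str):
    """Return the matching food class ID, or None when there is no match."""
    return name_index.get(normalize_name(ingredient_name))

print(match_food_class("mince, PORK"))  # 5
print(match_food_class("xanthum gum"))  # None
```

One difference from `next(...)`: if two food classes normalise to the same key, the dictionary keeps the last one, whereas `next(...)` returns the first.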

In [25]:
for recipe_id, recipe_data in recipes_df.groupby('Recipe ID'):
    recipe_name = recipe_data['Recipe Name'].iloc[0]  # Get the recipe name
    impact = calculate_recipe_impact(recipe_data)
    if impact is not None:
        print(f"Recipe ID {recipe_id} - {recipe_name}: Total Impact = {impact} kg CO2")

Unnamed: 0,0,1
0,beef mince,Beef Mince
1,mince pork,Pork Mince
2,carrots,Carrots
3,onions,Onions
4,celery,Celery
5,garlic,Garlic
6,rosemary,Rosemary
7,oil olive,Olive Oil
8,plum tinned tomatoes,Tinned Plum Tomatoes
9,water,Water


Recipe ID 1 - Spaghetti Bolognese: Total Impact = 13.989865665656001 kg CO2


Unnamed: 0,0,1
0,cream double,Double Cream
1,mascarpone,Mascarpone
2,marsala,Marsala
3,caster golden sugar,Golden Caster Sugar
4,coffee granules,Coffee Granules
5,water,Water
6,fingers lady,Lady Fingers
7,chocolate dark,Dark Chocolate
8,cocoa powder,Cocoa Powder


Recipe ID 2 - Tiramisu: Total Impact = 4.8948132164 kg CO2


Unnamed: 0,0,1
0,chickpeas,Chickpeas
1,coriander,Coriander
2,parsley,Parsley
3,onions red,Red Onions
4,tomatoes,Tomatoes
5,oil olive,Olive Oil
6,juice lemon,Lemon Juice
7,harissa,Harissa


Recipe ID 3 - Chickpea Salad: Total Impact = 11.188634844578 kg CO2
Recipe ID 4 - Salted Caramel Ice Cream: Ingredient 'gum xanthum' not found.


#### Final Function

In [26]:
def calculate_recipe_impact_2(recipe: pd.DataFrame) -> float:
    total_impact = 0
    for _, ingredient in recipe.iterrows():
        ingredient_name = normalize_name(ingredient['Ingredient Name'])
        matched_class = next(
            (fc for fc in food_classes.values() if normalize_name(fc.name) == ingredient_name), None)
        if not matched_class:
            print(f"Ingredient '{ingredient_name}' not found.")
            return None
        try:
            impact = get_impact(matched_class.id)
            total_impact += impact * ingredient['Ingredient Weight / kg']
        except ValueError as e:
            print(e)
            return None
    return total_impact

In [27]:
for recipe_id, recipe_data in recipes_df_dummy.groupby('Recipe ID'):
    recipe_name = recipe_data['Recipe Name'].iloc[0]  # Get the recipe name
    impact = calculate_recipe_impact_2(recipe_data)
    if impact is not None:
        print(f"Recipe ID {recipe_id} - {recipe_name}: Total Impact = {impact} kg CO2")

Recipe ID 1 - Spaghetti Bolognese: Total Impact = 13.989865665656001 kg CO2
Recipe ID 2 - Tiramisu: Total Impact = 4.8948132164 kg CO2
Recipe ID 3 - Chickpea Salad: Total Impact = 11.188634844578 kg CO2
Ingredient 'gum xanthum' not found.


#### Alternative method - Fuzzy matching
To find the closest match between the ingredient names in the recipes and the food class names, a string-similarity library such as `fuzzywuzzy` can be used. It measures how similar two strings are and can identify the closest match even when the strings don't match exactly.

In [28]:
from fuzzywuzzy import process

In [29]:
def calculate_recipe_impact_3(recipe: pd.DataFrame) -> float:
    total_impact = 0
    matched_classes = []
    
    # Extract food class names for fuzzy matching
    food_class_names = [fc.name for fc in food_classes.values()]
    
    for _, ingredient in recipe.iterrows():
        ingredient_name = normalize_name(ingredient['Ingredient Name'])
        
        # Use fuzzy matching to find the closest food class name
        closest_match, score = process.extractOne(ingredient_name, food_class_names)
        
        # Reject matches below a similarity threshold (80 here; adjust as needed)
        if score < 80:
            print(f"Ingredient '{ingredient_name}' not closely matched to any food class.")
            # Note: `continue` skips this ingredient, so the returned total is only partial
            continue
        
        matched_class = next(fc for fc in food_classes.values() if fc.name == closest_match)
        matched_classes.append({"Ingredient Name": ingredient_name, "Matched Food Class": matched_class.name})
        
        try:
            impact = get_impact(matched_class.id)
            total_impact += impact * ingredient['Ingredient Weight / kg']
        except ValueError as e:
            print(e)
            return None
    
    matched_df = pd.DataFrame(matched_classes)
    print("Matched Ingredient Names and Food Class Names:")
    display(matched_df)
    
    return total_impact


In [30]:
for recipe_id, recipe_data in recipes_df_dummy.groupby('Recipe ID'):
    recipe_name = recipe_data['Recipe Name'].iloc[0]  # Get the recipe name
    impact = calculate_recipe_impact_3(recipe_data)
    if impact is not None:
        print(f"Recipe ID {recipe_id} - {recipe_name}: Total Impact = {impact} kg CO2")

Matched Ingredient Names and Food Class Names:


Unnamed: 0,Ingredient Name,Matched Food Class
0,beef mince,Beef Mince
1,mince pork,Pork Mince
2,carrots,Carrots
3,onions,Onions
4,celery,Celery
5,garlic,Garlic
6,rosemary,Rosemary
7,oil olive,Olive Oil
8,plum tinned tomatoes,Tinned Plum Tomatoes
9,water,Water


Recipe ID 1 - Spaghetti Bolognese: Total Impact = 13.989865665656001 kg CO2
Matched Ingredient Names and Food Class Names:


Unnamed: 0,Ingredient Name,Matched Food Class
0,cream double,Double Cream
1,mascarpone,Mascarpone
2,marsala,Marsala
3,caster golden sugar,Golden Caster Sugar
4,coffee granules,Coffee Granules
5,water,Water
6,fingers lady,Lady Fingers
7,chocolate dark,Dark Chocolate
8,cocoa powder,Cocoa Powder


Recipe ID 2 - Tiramisu: Total Impact = 4.8948132164 kg CO2
Matched Ingredient Names and Food Class Names:


Unnamed: 0,Ingredient Name,Matched Food Class
0,chickpeas,Chickpeas
1,coriander,Coriander
2,parsley,Parsley
3,onions red,Red Onions
4,tomatoes,Tomatoes
5,oil olive,Olive Oil
6,juice lemon,Lemon Juice
7,harissa,Harissa


Recipe ID 3 - Chickpea Salad: Total Impact = 11.188634844578 kg CO2
Ingredient 'gum xanthum' not closely matched to any food class.
Ingredient 'milk powder' not closely matched to any food class.
Matched Ingredient Names and Food Class Names:


Unnamed: 0,Ingredient Name,Matched Food Class
0,cream double,Double Cream
1,caster sugar,Caster Sugar


Recipe ID 4 - Salted Caramel Ice Cream: Total Impact = 0.8426292078000001 kg CO2
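In the output above, Recipe 4 receives a total even though two of its ingredients were skipped, which differs from the stated goal of giving no impact for such recipes. A stricter matcher would return `None` for an unmatched ingredient so that the caller can reject the whole recipe. The sketch below swaps `fuzzywuzzy` for the standard library's `difflib`, purely to stay dependency-free. The function name `closest_food_class`, the 0.8 cutoff and the sample names are assumptions, and `difflib` scores will not be identical to `fuzzywuzzy` scores.

```python
import difflib
import re

def normalize_name(name: str) -> str:
    # Same normalisation as in Step 2, repeated so the sketch runs alone
    name = re.sub(r'[^\w\s]', '', name)
    return ' '.join(sorted(name.lower().split()))

def closest_food_class(ingredient_name, food_class_names, cutoff=0.8):
    """Return the best-matching food class name, or None below the cutoff."""
    # Normalise both sides so word order and punctuation do not affect the score
    by_key = {normalize_name(n): n for n in food_class_names}
    hits = difflib.get_close_matches(
        normalize_name(ingredient_name), list(by_key), n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None

names = ["Double Cream", "Caster Sugar", "Carrots"]  # hypothetical subset
print(closest_food_class("cream, double", names))  # Double Cream
print(closest_food_class("carrot", names))         # Carrots
print(closest_food_class("xanthum gum", names))    # None
```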


# Process Impact Streamlit App
The Python file `process_impact.py` was used as a custom module to run a Streamlit app for better visualisation of the data and results. Here is the code, step by step.

## Step 0: Import Modules
- **`streamlit`**: A Python library that allows you to build interactive web applications for data science and machine learning projects.
- **`pandas`**: A data manipulation library that helps in reading, analyzing, and manipulating data stored in tables.
- **`os`**: Provides operating system-dependent functions, which are used here to navigate file paths.
- **`from process_impact import *`**: Imports all functions and classes from the custom `process_impact` module, which should contain the data-loading and impact-calculation functions.

In [None]:
import streamlit as st
import pandas as pd
import os
from process_impact import *

## Step 1: Load Data

In [None]:
current_dir = os.path.dirname(__file__)
food_classes_df = pd.read_csv(os.path.join(current_dir, 'food_classes.csv'))
recipes_df = pd.read_csv(os.path.join(current_dir, 'recipes.csv'))

## Step 2: Streamlit UI Setup
The title and purpose of the app are displayed first.

In [None]:
st.title("Recipe Impact Calculator")
st.write("This app allows the user to preview the recipes data and to calculate recipe impacts by clicking the button below.")

## Step 3: Display the Recipes Data
The first five rows of the recipes data are retrieved and displayed using `st.write()`. Only the recipes data is shown, on the assumption that a user is not interested in seeing the food classes.

In [None]:
st.header("Recipes Data")
st.write("Below is a preview of the recipes data:")
st.write(recipes_df.head())  # Show the first rows of the recipes data

## Step 4: Calculate Recipe Impacts with Button Interaction
- **`st.button()`**: Creates a button that users can click to trigger the calculation.
- **If the button is clicked**:
    - `st.header()`: Displays a new section header for the results.
    - **Try-Except Block**: Handles errors that might occur during the calculation process.
        - `impacts = {}`: Initializes an empty dictionary to store the impact results for each recipe.
        - `recipes_df.groupby('Recipe ID')`: Groups the `recipes_df` DataFrame by `Recipe ID`, allowing calculations on each unique recipe.
        - **Loop**:
            - Retrieves the name of each recipe.
            - Calls `calculate_recipe_impact(recipe_data)`, imported from `process_impact`, to calculate the CO₂ impact of the recipe.
            - If a valid impact is calculated, the result is stored in the `impacts` dictionary.
    - **Display Results**:
        - Loops over `impacts` and prints the recipe ID, name, and total impact on the app using `st.write()`.
- **Error Handling**: If an error occurs, `st.error()` displays the error message on the page.

In [None]:
if st.button("Calculate Recipe Impacts"):
    st.header("Recipe Impact Results")

    # Process each recipe and calculate impact
    try:
        impacts = {}
        for recipe_id, recipe_data in recipes_df.groupby('Recipe ID'):
            recipe_name = recipe_data['Recipe Name'].iloc[0]
            impact = calculate_recipe_impact(recipe_data)  # Make sure this function is defined
            if impact is not None:
                impacts[recipe_id] = (recipe_name, impact)

        for recipe_id, (recipe_name, impact) in impacts.items():
            st.write(f"Recipe ID {recipe_id} - {recipe_name}: Total Impact = {impact:.2f} kg CO2")
    except Exception as e:
        st.error(f"Error: {e}")

Link to the [Streamlit App](https://take-home-test-fs-jdygehunlwkzwbfrcybjhj.streamlit.app/).

# Take Home Test - Frontend - Data Processing Task
This task uses the API provided by <https://jsonplaceholder.typicode.com>.

Create a single page application using JavaScript or TypeScript that fetches data from the API, and displays a list of users along with their latest post. You should include a search bar which allows filtering the results by users' names using a simple substring match.

Assuming post ID corresponds to date created (lower ID means created earlier), for each user, show:

- Their name
- The title and body of their latest post

Feel free to use a framework (e.g. Create React App, React) or to do it in plain JavaScript using `<script>` tags, or anything inbetween – whatever’s easiest for you.

Styling is unimportant. Layout is important only insofar as the required information is presented in a sensible manner.

I solved the code challenge first using Python and Streamlit.

## Step 0: Import Modules
- **`streamlit`**: Used to build the web app interface.
- **`requests`**: Helps fetch data from the web via HTTP requests.

In [None]:
import streamlit as st
import requests

## Step 1: Title

In [None]:
st.title("Users and Posts")

## Step 2: Fetch Data from API
- `requests.get()`: Fetches data from the specified API endpoints.
- The users data and posts data are stored as JSON.

In [None]:
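# Fetch users; fail early on an HTTP error instead of parsing an error page.
response = requests.get("https://jsonplaceholder.typicode.com/users", timeout=10)
response.raise_for_status()
users_data = response.json()

# Fetch posts the same way.
response_posts = requests.get("https://jsonplaceholder.typicode.com/posts", timeout=10)
response_posts.raise_for_status()
posts_data = response_posts.json()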

## Step 3: Store the Latest Posts for Each User
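The loop goes through all posts and keeps one post per user in a dictionary keyed by `userId`, chosen by post ID. This first version contains a bug: the comparison `post_id < latest_posts[user_id]["id"]` keeps the lowest ID, so it returns the earliest post for each user rather than the latest.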

In [None]:
latest_posts = {}
for post in posts_data:
    user_id = post["userId"]
    post_id = post["id"]
    if user_id not in latest_posts or post_id < latest_posts[user_id]["id"]:
        latest_posts[user_id] = post

The cell was then corrected by flipping the comparison to `>`, which keeps the post with the highest `id` and so treats it as the latest post for each user.

In [None]:
latest_posts = {}
for post in posts_data:
    user_id = post["userId"]
    post_id = post["id"]
    if user_id not in latest_posts or post_id > latest_posts[user_id]["id"]:
        latest_posts[user_id] = post
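
The difference between the two versions can be checked without calling the API. The following is a minimal sketch, assuming posts shaped like the `/posts` response; the helper name `pick_post_per_user`, its `newest` flag, and the sample data are hypothetical and not part of the original solution.

```python
from typing import Any

Post = dict[str, Any]


def pick_post_per_user(posts: list[Post], newest: bool = True) -> dict[int, Post]:
    """Return one post per user: the highest id if newest, else the lowest."""
    chosen: dict[int, Post] = {}
    for post in posts:
        user_id = post["userId"]
        current = chosen.get(user_id)
        if current is None:
            # First post seen for this user.
            chosen[user_id] = post
        elif newest and post["id"] > current["id"]:
            chosen[user_id] = post
        elif not newest and post["id"] < current["id"]:
            chosen[user_id] = post
    return chosen


# Hypothetical sample data shaped like the /posts response.
sample_posts = [
    {"userId": 1, "id": 1, "title": "first", "body": "a"},
    {"userId": 1, "id": 2, "title": "second", "body": "b"},
    {"userId": 2, "id": 3, "title": "only", "body": "c"},
]

latest = pick_post_per_user(sample_posts)
earliest = pick_post_per_user(sample_posts, newest=False)
print(latest[1]["title"])    # second
print(earliest[1]["title"])  # first
```

With `newest=True` the helper behaves like the corrected cell, and with `newest=False` it reproduces the result of the buggy one.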

## Step 4: Search Bar for Filtering
A search bar is created where users can type in a name to filter the displayed users.

In [None]:
search_term = st.text_input("Search for a user by name:")
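
The value returned by the search bar is later used for a case-insensitive substring match. A minimal sketch of that match as a standalone function is shown here; the name `name_matches` is hypothetical, and stripping surrounding whitespace and using `casefold()` are assumptions that go slightly beyond the `lower()` comparison used in this notebook.

```python
def name_matches(name: str, search_term: str) -> bool:
    """Case-insensitive substring match; an empty term matches every name."""
    # Assumption: leading/trailing spaces in the search box are ignored.
    return search_term.strip().casefold() in name.casefold()


# Sample name used only for illustration.
print(name_matches("Leanne Graham", "GRAHAM"))  # True
print(name_matches("Leanne Graham", ""))        # True
print(name_matches("Leanne Graham", "xyz"))     # False
```

Because the empty string is a substring of every string, an empty search bar shows all users, which matches the behaviour expected from a filter.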

## Step 5: Display Users and Their Latest Posts
- Filters and displays each user whose name matches the search term.
- Displays the user’s latest post by title and body.

In [None]:
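for user in users_data:
    # Case-insensitive substring match on the user's name.
    if search_term.lower() in user["name"].lower():
        st.subheader(user["name"])
        # Use .get() so a user without posts does not raise KeyError.
        latest_post = latest_posts.get(user["id"])
        if latest_post is None:
            st.write("No posts available.")
        else:
            st.write(f"**Title:** {latest_post['title']}")
            st.write(f"**Body:** {latest_post['body']}")
        st.write("---")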
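
One way to test the filtering and lookup without running Streamlit is to separate them from the rendering calls. The sketch below assumes the same data shapes as above; `build_rows` and the sample data are hypothetical names introduced only for illustration.

```python
from typing import Any, Optional

User = dict[str, Any]
Post = dict[str, Any]


def build_rows(
    users: list[User], latest_posts: dict[int, Post], search_term: str
) -> list[tuple[str, Optional[Post]]]:
    """Pair each matching user's name with their latest post, or None."""
    term = search_term.lower()
    return [
        (user["name"], latest_posts.get(user["id"]))
        for user in users
        if term in user["name"].lower()
    ]


# Hypothetical sample data; user 2 has no posts.
sample_users = [{"id": 1, "name": "Ann Lee"}, {"id": 2, "name": "Bob Ray"}]
sample_latest = {1: {"userId": 1, "id": 7, "title": "hello", "body": "world"}}

for name, post in build_rows(sample_users, sample_latest, "b"):
    print(name, "->", post["title"] if post else "No posts available.")
# Bob Ray -> No posts available.
```

Each returned pair could then be passed to `st.subheader` and `st.write`, keeping the display code free of lookup logic.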

Link to the [Streamlit App](https://take-home-test-fs-dmdb6hg6h2qiqgedej7wfh.streamlit.app/).

# Create React App
This is a React component that displays users and their latest posts, allowing users to search by name.

In [None]:
import React, { useEffect, useState } from 'react';

## Step 1: State Management
- `users`: Stores the list of user data fetched from the API's `/users` endpoint.
- `posts`: Stores the list of posts fetched from the API's `/posts` endpoint.
- `filteredUsers`: Stores the subset of users that match the current search term.
- `searchTerm`: Keeps track of the current text entered in the search bar.

In [None]:
const [users, setUsers] = useState([]);
const [posts, setPosts] = useState([]);
const [filteredUsers, setFilteredUsers] = useState([]);
const [searchTerm, setSearchTerm] = useState('');

## Step 2: Data Fetching with `useEffect`
This effect runs only once, when the component mounts, because of the empty dependency array `[]`. It defines an asynchronous `fetchData` function that:
- Fetches users and posts from two endpoints.
- Parses the responses to JSON.
- Stores the data in the corresponding state variables (`users` and `posts`) and initializes `filteredUsers` with the full user list.
- Logs any error raised while fetching the data to the console.

In [None]:
useEffect(() => {
    const fetchData = async () => {
        try {
            const usersResponse = await fetch('https://jsonplaceholder.typicode.com/users');
            const usersData = await usersResponse.json();
            const postsResponse = await fetch('https://jsonplaceholder.typicode.com/posts');
            const postsData = await postsResponse.json();

            setUsers(usersData);
            setPosts(postsData);
            setFilteredUsers(usersData);
        } catch (error) {
            console.error('Error fetching data:', error);
        }
    };
    fetchData();
}, []);

## Step 3: Search Filter with `useEffect`
This effect runs every time the `searchTerm` or `users` state changes. It filters the `users` array based on whether each user’s name includes the search term, ignoring case. The resulting filtered list is stored in `filteredUsers`.

In [None]:
useEffect(() => {
    const filtered = users.filter(user =>
        user.name.toLowerCase().includes(searchTerm.toLowerCase())
    );
    setFilteredUsers(filtered);
}, [searchTerm, users]);

## Step 4: Get the Latest Post
This function retrieves all posts made by a specific user (by `userId`). If no posts are found, it returns `null`.
It sorts the posts by `id` in descending order, assuming the highest `id` corresponds to the most recent post. Finally, it returns the most recent post for that user.

In [None]:
const getLatestPost = (userId) => {
    const userPosts = posts.filter(post => post.userId === userId);
    if (userPosts.length === 0) return null;
    userPosts.sort((a, b) => b.id - a.id);
    return userPosts[0];
};

If the goal were instead to obtain the earliest post, the function could be modified to sort the posts by `id` in ascending order.

In [None]:
const getEarliestPost = (userId) => {
    const userPosts = posts.filter(post => post.userId === userId);
    if (userPosts.length === 0) return null;
    userPosts.sort((a, b) => a.id - b.id);
    return userPosts[0];
};

## Step 5: Render the UI
- A search bar is rendered with an `onChange` event handler that updates the `searchTerm` state.
- The `filteredUsers` array is iterated over to render each user.
- For each user, it retrieves their latest post using the `getLatestPost` function and displays it.
- If a user has no posts, it displays “No posts available.”

In [None]:
return (
    <div style={{ padding: '20px', fontFamily: 'Arial, sans-serif' }}>
        <h1>User Posts</h1>

        <input
            type="text"
            placeholder="Search users by name"
            value={searchTerm}
            onChange={(e) => setSearchTerm(e.target.value)}
            style={{ padding: '10px', width: '100%', marginBottom: '20px' }}
        />

        <div>
            {filteredUsers.map(user => {
                const latestPost = getLatestPost(user.id);
                return (
                    <div key={user.id} style={{ borderBottom: '1px solid #ccc', marginBottom: '20px', paddingBottom: '20px' }}>
                        <h2>{user.name}</h2>
                        {latestPost ? (
                            <div>
                                <h3>{latestPost.title}</h3>
                                <p>{latestPost.body}</p>
                            </div>
                        ) : (
                            <p>No posts available.</p>
                        )}
                    </div>
                );
            })}
        </div>
    </div>
);

## Originating from Streamlit
To translate this task from Streamlit to React, I focused on:
1. **API Data Handling**: Instead of using `requests.get()`, I used JavaScript’s `fetch` API.
2. **State Management**: React’s `useState` and `useEffect` replace the plain variable assignments of the Python script, which Streamlit reruns from top to bottom.
3. **Event-Driven Updates**: Streamlit reruns the whole script whenever a widget value changes, whereas React re-renders a component when its state changes, so filtering users based on input is driven by updates to `searchTerm`.

The core logic—fetching data, filtering it, and rendering based on user interactions—remains conceptually similar between a Streamlit app and this React app.