# What's in an Avocado Toast: A Supply Chain Analysis

You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil. It's no small feat that most of these ingredients are readily available in grocery stores. 

In this project, you'll conduct a supply chain analysis of three of the ingredients in an avocado toast using the Open Food Facts database. This database contains extensive, openly sourced information on food products, including their countries of origin. Through this analysis, you will gain an in-depth understanding of the complex supply chain behind a single dish.

Three pairs of files are provided in the `data` folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item and its countries of origin.
- A TXT file for each ingredient, such as `relevant_avocado_categories.txt`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some rows in each of the three CSV files are not relevant to this investigation. In each dataset, you will need to filter out the irrelevant rows based on the values in the `categories_tags` column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter each DataFrame to include only rows where `categories_tags` contains at least one of the relevant tags for that ingredient.
- Each row usually has multiple category tags in the `categories_tags` column, stored as a single comma-separated string.
- Each CSV file has an `origins_tags` column containing the country-of-origin tags for each item.

After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

![](avocado_wallpaper.jpeg)

## 1. Read in the avocado data

Begin by reading the avocado data from the CSV file in the `data` folder. Despite the `.csv` extension, the file is tab-delimited. This creates quite a large DataFrame, so it's a good idea to subset it to a smaller number of relevant columns. Then read in the file of relevant category tags for avocados.

```python
import pandas as pd

# Read the tab-delimited avocado.csv using pandas
df_avocado = pd.read_csv('data/avocado.csv', sep='\t')

# Subset the large DataFrame to the columns of interest
df_avocado = df_avocado[['code', 'lc', 'product_name_en', 'quantity', 'serving_size',
                         'packaging_tags', 'brands', 'brands_tags', 'categories_tags',
                         'labels_tags', 'countries', 'countries_tags', 'origins',
                         'origins_tags']]

# Open the text file and read its contents into a list, one tag per line
file_path = "data/relevant_avocado_categories.txt"
with open(file_path, 'r') as file:
    relevant_avocado_categories = file.read().splitlines()
```

```python
df_avocado.head()
```

|  | code | lc | product_name_en | quantity | serving_size | packaging_tags | brands | brands_tags | categories_tags | labels_tags | countries | countries_tags | origins | origins_tags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59749979702 | fr | | | | | Naturalia | naturalia | en:plant-based-foods-and-beverages,en:plant-ba... | | Canada | en:canada | | |
| 1 | 7610095131409 | en | | | | | Zweifel | zweifel | en:snacks,en:salty-snacks,en:appetizers,en:chi... | en:vegetarian,en:vegan | Switzerland, World | en:switzerland,en:world | | |
| 2 | 4005514005578 | en | Gelbe Linse Avocado Brotaufstrich | | | | Tartex | tartex | de:abendbrotsufstrich | en:organic,en:eu-organic,en:eg-oko-verordnung | Germany | en:germany | | |
| 3 | 879890002513 | en | Avocado toast chili lime | | | | | | | | United States | en:united-states | | |
| 4 | 223086613685 | en | Avocado | | | | | | | | United States | en:united-states | | |
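Reading every column and subsetting afterwards, as in the cell above, means pandas first builds the full DataFrame. As an alternative, `pd.read_csv` accepts a `usecols` argument so that only the listed columns are kept while parsing, which can reduce memory use on a large export. The sketch below runs on a hypothetical two-row, tab-delimited sample (`sample_tsv`) rather than the real file and uses a shortened column list. It also reads `code` as a string via `dtype`, since barcodes can begin with zeros that an integer parse would drop.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for data/avocado.csv (tab-delimited)
sample_tsv = (
    "code\tlc\tproduct_name_en\tcategories_tags\tcountries\torigins_tags\tenergy_100g\n"
    "0123\ten\tAvocado\ten:fruits,en:avocados\tUnited Kingdom\ten:peru\t160\n"
    "456\tfr\t\ten:snacks\tFrance\t\t500\n"
)

# A shortened list of columns to keep (the project uses a longer one)
columns_of_interest = ["code", "lc", "product_name_en", "categories_tags",
                       "countries", "origins_tags"]

# usecols tells the parser to keep only these columns; dtype keeps barcodes as
# strings so that leading zeros are not lost to integer parsing
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t",
                 usecols=columns_of_interest, dtype={"code": str})
print(df.columns.tolist())
```

Note that `usecols` returns columns in file order, not list order; select with `df[columns_of_interest]` afterwards if the order matters.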


```python
relevant_avocado_categories
```

```
['en:avocadoes',
 'en:avocados',
 'en:fresh-foods',
 'en:fresh-vegetables',
 'en:fruchte',
 'en:fruits',
 'en:raw-green-avocados',
 'en:tropical-fruits',
 'en:tropische-fruchte',
 'en:vegetables-based-foods',
 'fr:hass-avocados']
```

## 2. Filter avocado data using relevant category tags

Each food DataFrame contains a column called `categories_tags`, which holds the food item's categories, e.g., fruits, vegetables, fruit-based oils. Start by dropping rows with null values in `categories_tags`. Each value in this column is a comma-separated string, so you'll first need to turn it into a column of lists so that you can treat each item in the list as a separate tag. Then filter this reduced DataFrame to contain only the rows with at least one relevant category tag.

```python
# Drop rows with null values in categories_tags
df_avocado = df_avocado.dropna(subset=['categories_tags'])

# Turn the column of comma-separated tags into a column of lists
df_avocado['categories_list'] = df_avocado['categories_tags'].str.split(',')

# Keep rows where at least one tag is in the relevant categories
df_avocado = df_avocado[df_avocado['categories_list'].apply(
    lambda tags: any(tag in relevant_avocado_categories for tag in tags))]

df_avocado.head()
```

|  | code | lc | product_name_en | quantity | serving_size | packaging_tags | brands | brands_tags | categories_tags | labels_tags | countries | countries_tags | origins | origins_tags | categories_list |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 3662994002063 | fr | | 3 fruits | | | la compagnie des fruits mûrs | la-compagnie-des-fruits-murs | en:plant-based-foods-and-beverages,en:plant-ba... | | France | en:france | | | [en:plant-based-foods-and-beverages, en:plant-... |
| 6 | 8437013031011 | fr | | 1 kg | | | | | en:plant-based-foods-and-beverages,en:plant-ba... | | France | en:france | | | [en:plant-based-foods-and-beverages, en:plant-... |
| 14 | 4016249238155 | de | | 135g | 100g | de:gläschen | Allos | allos | en:plant-based-foods-and-beverages,en:plant-ba... | en:organic,en:vegetarian,en:eu-organic,en:no-g... | Deutschland | en:germany | Europäische Union | en:european-union | [en:plant-based-foods-and-beverages, en:plant-... |
| 17 | 8718963381532 | de | | | | | | | en:plant-based-foods-and-beverages,en:plant-ba... | | Deutschland | en:germany | | | [en:plant-based-foods-and-beverages, en:plant-... |
| 23 | 8436002746707 | es | | | | | | | en:plant-based-foods-and-beverages,en:plant-ba... | | España | en:spain | | | [en:plant-based-foods-and-beverages, en:plant-... |
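The row-wise `apply` above works well. A vectorized alternative is to `explode` the list column so that each tag gets its own row (sharing its product's index), test each tag with `isin`, and collapse back to one flag per product with `groupby(level=0).any()`. A minimal sketch on hypothetical toy data; the products and the `relevant_categories` set are invented for illustration:

```python
import pandas as pd

# Hypothetical toy data in the same shape as the Open Food Facts extract
df = pd.DataFrame({
    "product_name_en": ["Hass avocado", "Avocado crisps", "Guacamole", "Mystery item"],
    "categories_tags": ["en:fruits,en:avocados", "en:snacks,en:salty-snacks",
                        "en:dips,en:fresh-foods", None],
})
relevant_categories = {"en:avocados", "en:fresh-foods", "en:fruits"}

# Drop rows without tags, then split each comma-separated string into a list
df = df.dropna(subset=["categories_tags"]).copy()
df["categories_list"] = df["categories_tags"].str.split(",")

# explode() yields one row per tag (the product's index is repeated),
# isin() flags relevant tags, and groupby(level=0).any() gives one flag per product
is_relevant = (
    df["categories_list"].explode().isin(relevant_categories)
    .groupby(level=0).any()
)
filtered = df[is_relevant]
print(filtered["product_name_en"].tolist())
```

In the `apply` version, converting the category list to a set makes each `tag in ...` check constant-time on average, which helps if the list of relevant tags grows.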


## 3. Where do most UK avocados come from?

Your avocado DataFrame should contain a column called `origins_tags`. Create a variable called `top_avocado_origin` containing the most common country of origin for avocados sold in the United Kingdom.

```python
# Keep only products sold in the UK
df_avocado = df_avocado[df_avocado['countries'] == 'United Kingdom']

# Count the unique values in the country-of-origin column and take the most
# frequent one (value_counts() sorts in descending order)
top_avocado_origin = df_avocado['origins_tags'].value_counts().index[0]

# Strip out the language prefix before the country name
top_avocado_origin = top_avocado_origin.replace('en:', '')

# Replace hyphens with spaces and capitalize the first letter
# (str.replace matches text literally, so a plain '-' is used rather than a regex)
top_avocado_origin = top_avocado_origin.replace('-', ' ').capitalize()
```
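Python's built-in `str.replace` matches its first argument as literal text, so a regular expression such as `r'-(?=[a-zA-Z])'` only takes effect through `re.sub`. Also, `capitalize()` uppercases only the first character, so a two-word tag comes out as "United kingdom". A sketch of a small, hypothetical helper (`tag_to_name`, not part of the project) that handles both, assuming tags look like a two-letter language prefix, a colon, then hyphen-separated words:

```python
import re


def tag_to_name(tag: str) -> str:
    """Hypothetical helper: turn a tag such as 'en:united-kingdom' into 'United Kingdom'."""
    # Drop a two-letter language prefix such as 'en:' or 'fr:' if present
    name = re.sub(r"^[a-z]{2}:", "", tag)
    # re.sub interprets the pattern as a regex, unlike str.replace
    name = re.sub(r"-(?=[a-zA-Z])", " ", name)
    # title() capitalizes every word, whereas capitalize() only does the first
    return name.title()


print(tag_to_name("en:peru"))            # Peru
print(tag_to_name("en:united-kingdom"))  # United Kingdom
```

`title()` has its own quirks: it capitalizes small words and letters after apostrophes (e.g. `bosnia-and-herzegovina` becomes "Bosnia And Herzegovina"), so a lookup table may suit some countries better.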

```python
top_avocado_origin
```

```
'Peru'
```
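Two caveats about the counting above. The exact match on `countries` keeps only rows whose free-text value is exactly `'United Kingdom'`, so multi-country entries such as `'United Kingdom, France'` are dropped; the sample output in step 2 suggests that `countries_tags` holds normalized English tags (`Deutschland` appears as `en:germany`). And `value_counts()` on `origins_tags` counts whole strings, so if an entry lists several comma-separated origins, that combination is counted as its own value. A hedged sketch on hypothetical toy rows, assuming UK products carry the tag `en:united-kingdom`; on the real data it may give different counts from the result above.

```python
import pandas as pd

# Hypothetical toy rows standing in for the filtered avocado DataFrame
df = pd.DataFrame({
    "countries": ["United Kingdom", "United Kingdom, France", "Royaume-Uni", "France"],
    "countries_tags": ["en:united-kingdom", "en:united-kingdom,en:france",
                       "en:united-kingdom", "en:france"],
    "origins_tags": ["en:peru", "en:peru,en:chile", "en:peru", "en:spain"],
})

# Match the normalized country tag (assumed to be 'en:united-kingdom') instead of
# the free-text 'countries' value, so multi-country and localized entries are kept
is_uk = df["countries_tags"].fillna("").str.split(",").apply(
    lambda tags: "en:united-kingdom" in tags)
uk = df[is_uk]

# Count each origin tag separately rather than whole comma-separated strings
origin_counts = uk["origins_tags"].dropna().str.split(",").explode().value_counts()
top_origin_tag = origin_counts.index[0]
print(top_origin_tag)  # en:peru
```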

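## 4. Create a function to call for each ingredient

To repeat steps 1 to 3 for the olive oil and sourdough data, wrap them in a function that takes the ingredient's CSV filename and relevant-categories filename and returns the most common country of origin among products sold in the United Kingdom.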

```python
def main_origin_countries(filename, relevant_category_filename):
    """Return the most common country of origin among relevant UK products,
    or None if no origin is recorded."""
    # Read the tab-delimited CSV using pandas
    df = pd.read_csv('data/' + filename, sep='\t')

    # Subset the large DataFrame to the columns of interest
    df = df[['code', 'lc', 'product_name_en', 'quantity', 'serving_size',
             'packaging_tags', 'brands', 'brands_tags', 'categories_tags',
             'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']]

    # Drop rows with null values in categories_tags
    df = df.dropna(subset=['categories_tags'])

    # Turn the column of comma-separated tags into a column of lists
    df['categories_list'] = df['categories_tags'].str.split(',')

    # Open the text file and read its contents into a list
    file_path = 'data/' + relevant_category_filename
    with open(file_path, 'r') as file:
        relevant_category_list = file.read().splitlines()

    # Keep rows where at least one tag is in the relevant categories
    df = df[df['categories_list'].apply(
        lambda tags: any(tag in relevant_category_list for tag in tags))]

    # Keep only products sold in the UK
    df = df[df['countries'] == 'United Kingdom']

    # No origin recorded (this also covers an empty DataFrame)
    if df['origins_tags'].isna().all():
        print("The 'origins_tags' column is empty")
        return None

    # Get the country with the highest count
    origin = df['origins_tags'].value_counts().index[0]

    # Strip out the language prefix before the country name
    origin = origin.replace('en:', '')

    # Replace hyphens with spaces and capitalize the first letter
    origin = origin.replace('-', ' ').capitalize()

    return origin
```
    

```python
top_olive_oil_origin = main_origin_countries('olive_oil.csv', 'relevant_olive_oil_categories.txt')
top_olive_oil_origin
```

```
'Greece'
```

```python
top_sourdough_origin = main_origin_countries('sourdough.csv', 'relevant_sourdough_categories.txt')
top_sourdough_origin
```

```
'United kingdom'
```
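`main_origin_countries` combines file reading with the analysis, which makes the counting logic hard to check without the data files. One possible refactor, sketched here under stated assumptions, is a hypothetical `top_origin` helper that takes an already-loaded DataFrame and category list, plus a loop over ingredients. Unlike the function above, it filters on `countries_tags` (assuming the tag `en:united-kingdom`), returns the raw origin tag, and returns `None` when nothing is recorded. The toy DataFrames and category tags are invented for illustration.

```python
import pandas as pd

UK_TAG = "en:united-kingdom"  # assumed Open Food Facts tag for the United Kingdom


def top_origin(df, relevant_categories, country_tag=UK_TAG):
    """Hypothetical helper: most common origin tag among relevant products
    sold in `country_tag`, or None if none of them has an origin recorded."""
    df = df.dropna(subset=["categories_tags"])
    if df.empty:
        return None
    relevant_set = set(relevant_categories)

    # True where at least one category tag is relevant
    is_relevant = df["categories_tags"].str.split(",").apply(
        lambda tags: not relevant_set.isdisjoint(tags))
    # True where the product is listed for the chosen country
    in_country = df["countries_tags"].fillna("").str.split(",").apply(
        lambda tags: country_tag in tags)

    origins = df.loc[is_relevant & in_country, "origins_tags"].dropna()
    if origins.empty:
        return None
    return origins.value_counts().index[0]


# Hypothetical in-memory inputs; in the project these would be read from
# data/<ingredient>.csv (tab-delimited) and data/relevant_<ingredient>_categories.txt
ingredients = {
    "olive_oil": (
        pd.DataFrame({
            "categories_tags": ["en:olive-oils", "en:olive-oils", "en:snacks"],
            "countries_tags": ["en:united-kingdom"] * 3,
            "origins_tags": ["en:greece", "en:greece", "en:italy"],
        }),
        ["en:olive-oils"],
    ),
    "sourdough": (
        pd.DataFrame({
            "categories_tags": ["en:breads", None],
            "countries_tags": ["en:france", "en:united-kingdom"],
            "origins_tags": ["en:france", "en:united-kingdom"],
        }),
        ["en:breads"],
    ),
}

results = {name: top_origin(df, cats) for name, (df, cats) in ingredients.items()}
print(results)  # {'olive_oil': 'en:greece', 'sourdough': None}
```

Keeping file I/O out of the helper lets the same logic be exercised on small hand-made frames before running it on the full exports.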

## 5. Conclusions 

By cleaning, filtering, and aggregating the Open Food Facts data, we were able to address an important question: where do the key ingredients of the popular avocado toast in the UK primarily come from?

Among products sold in the United Kingdom, the most frequently recorded country of origin is Peru for avocados and Greece for olive oil. It's also not surprising that sourdough bread is mostly produced locally in the United Kingdom, given the general preference for freshly baked bread.

**Author:** Estephania Pivac Alcaraz