# What's in an Avocado Toast: A Supply Chain Analysis

![](avocado_wallpaper.jpeg)

You find yourself in London, crafting a delectable avocado toast, a dish that has risen dramatically in popularity on breakfast menus since the 2010s. This straightforward recipe requires just five ingredients: a ripe avocado, half a lemon, a generous pinch of salt flakes, two slices of sourdough bread, and a good drizzle of extra virgin olive oil. Most of these ingredients are now staples in grocery stores, and as you will find with this project, that is no small feat!

In this project, you'll conduct a supply chain analysis of three ingredients used in avocado toast using the Open Food Facts database. This database contains extensive, openly sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

Three pairs of files are provided in the data folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item and countries of origin.
- A TXT file for each ingredient, such as `relevant_avocado_categories.txt`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some of the rows of data in each of the three CSV files do not contain relevant data for your investigation. In each dataset, you will need to filter out rows with irrelevant data, based on values in the `categories_tags` column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where `categories_tags` contains one of the tags in the relevant categories for that ingredient.
- Each row of data usually has multiple category tags in the `categories_tags` column.
- There is a column in each CSV file called `origins_tags`, which contains strings for the country of origin of each item.
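
The filtering rule described above can be sketched on a few made-up rows. This is a minimal illustration, not the project's data: the `toy` DataFrame, its product names, and the `has_relevant_tag` helper are all hypothetical, and the sketch assumes the tags in `categories_tags` are separated by commas.

```python
import pandas as pd

# Hypothetical toy rows standing in for one ingredient's CSV file
toy = pd.DataFrame({
    "product_name_en": ["Hass avocado", "Guacamole dip", "Avocado oil"],
    "categories_tags": [
        "en:fruits,en:avocados",   # several tags in one cell
        "en:dips,en:guacamole",    # no relevant tag
        None,                      # missing tags
    ],
    "origins_tags": ["en:peru", "en:mexico", "en:spain"],
})

# Tags of interest, as they might be read from the TXT file
relevant = {"en:avocados", "en:fruits"}

def has_relevant_tag(tags):
    """Return True when the comma-separated tag string shares a tag with `relevant`."""
    if not isinstance(tags, str):  # missing values never match
        return False
    return not relevant.isdisjoint(tags.split(","))

# Keep only rows with at least one relevant category tag
filtered = toy[toy["categories_tags"].apply(has_relevant_tag)]
print(filtered["product_name_en"].tolist())
```

Splitting on the separator and comparing whole tags avoids accidental substring matches, such as `en:fruits` matching inside a longer tag name.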

After completing this project, you'll be armed with a list of ingredients and their countries of origin and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

[Open Food Facts database](https://world.openfoodfacts.org/)

In [53]:
# Importing libs
import csv
import os
import pandas as pd

In [54]:
# Checking where the files are
os.listdir('data')

['relevant_avocado_categories.txt',
 'olive_oil.csv',
 'sourdough.csv',
 'lemon.csv',
 'relevant_olive_oil_categories.txt',
 'salt_flakes.csv',
 'relevant_sourdough_categories.txt',
 'avocado.csv']

In [55]:
# Checking an example of relevant categories in a .txt file
with open("data/relevant_avocado_categories.txt", "r") as file:
    print(file.read())

en:avocadoes
en:avocados
en:fresh-foods
en:fresh-vegetables
en:fruchte
en:fruits
en:raw-green-avocados
en:tropical-fruits
en:tropische-fruchte
en:vegetables-based-foods
fr:hass-avocados


In [56]:
# Reading all the relevant categories files
def read_categories(filename):
    filepath = os.path.join("data", filename)
    try:
        with open(filepath, "r") as file:
            return file.read().splitlines()
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
        return []
    
relevant_avocado_categories = read_categories("relevant_avocado_categories.txt")
relevant_olive_oil_categories = read_categories("relevant_olive_oil_categories.txt")
relevant_sourdough_categories = read_categories("relevant_sourdough_categories.txt")

In [57]:
# Checking the separator of csv files
def identify_separator(filename):
    try:
        with open(filename, 'r') as f:
            dialect = csv.Sniffer().sniff(f.read(1024))
            return dialect.delimiter
    except csv.Error:
        return None

filename = 'data/avocado.csv'
separator = identify_separator(filename)

if separator:
    print(f"The identified separator is: '{separator}'")
else:
    print("Could not identify the separator.")

The identified separator is: '	'
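
As a possible follow-up, the detected separator can be passed straight to `pd.read_csv` instead of being hard-coded. The sketch below uses an in-memory sample rather than the project files; the `sample` string and the `sniff_delimiter` helper are hypothetical, and restricting the candidate delimiters is a design choice made here, not something the notebook does.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for the first lines of a data file
sample = "code\tlc\tproduct_name_en\n123\ten\tAvocado\n456\tfr\tAvocat\n"

def sniff_delimiter(text, candidates=",\t;|"):
    """Guess the delimiter of `text`, considering only the candidate characters."""
    try:
        return csv.Sniffer().sniff(text, delimiters=candidates).delimiter
    except csv.Error:
        return None  # the sniffer could not decide

detected = sniff_delimiter(sample)

# Fall back to a comma when detection fails
df = pd.read_csv(io.StringIO(sample), sep=detected or ",")
print(repr(detected), list(df.columns))
```

Limiting the candidates keeps the sniffer from picking a letter or digit that happens to occur equally often on every line.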


In [58]:
# Reading all tsv files
avocado = pd.read_csv('data/avocado.csv', sep='\t')
olive_oil = pd.read_csv('data/olive_oil.csv', sep='\t')
sourdough = pd.read_csv('data/sourdough.csv', sep='\t')

In [59]:
# Declaring cols to keep
cols_to_keep = [
    'code',
    'lc',
    'product_name_en',
    'quantity',
    'serving_size',
    'packaging_tags',
    'brands',
    'brands_tags',
    'categories_tags',
    'labels_tags',
    'countries',
    'countries_tags',
    'origins',
    'origins_tags'
]

In [60]:
# Subsetting each dataframe
avocado = avocado[cols_to_keep]
olive_oil = olive_oil[cols_to_keep]
sourdough = sourdough[cols_to_keep]

In [61]:
def analyze_data(df, relevant_categories):
    # Work on a copy so the caller's DataFrame is not modified;
    # drop rows without category tags, then split the tags into a list
    df = df.dropna(subset=['categories_tags']).copy()
    df['categories'] = df['categories_tags'].str.split(',')
    
    # Keeping only rows with at least one relevant category
    df = df[df['categories'].apply(lambda cats: any(cat in cats for cat in relevant_categories))]
    
    # Subsetting UK
    df_uk = df[df['countries'].str.contains('United Kingdom', na=False)]
    
    # Getting origins and top origin
    origins = df_uk['origins_tags'].value_counts()
    top_tag = origins.index[0]
    # Remove only the literal "en:" prefix (str.lstrip would strip a
    # set of characters, which can eat the start of the country name)
    if top_tag.startswith("en:"):
        top_tag = top_tag[len("en:"):]
    top_origin = top_tag.replace('-', ' ').strip()
    
    # Print the output
    print(f"Top origin country: {top_origin}\n")

    return top_origin

In [62]:
top_avocado_origin = analyze_data(avocado, relevant_avocado_categories)
top_olive_oil_origin = analyze_data(olive_oil, relevant_olive_oil_categories)
top_sourdough_origin = analyze_data(sourdough, relevant_sourdough_categories)

Top origin country: peru

Top origin country: greece

Top origin country: united kingdom
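
One possible refinement of the origin count, sketched on made-up data: if a single `origins_tags` cell can list several comma-separated origins (an assumption here, not something the files above confirm), counting whole cells would treat `en:peru,en:chile` as its own "country". Splitting and exploding the tags counts each origin separately. The `toy` DataFrame and the `tidy` helper are hypothetical.

```python
import pandas as pd

# Hypothetical origins column; one cell lists two origins, one is missing
toy = pd.DataFrame({
    "origins_tags": ["en:peru", "en:peru,en:chile", None, "en:south-africa"],
})

# One row per individual origin tag, then count occurrences
origin_counts = (
    toy["origins_tags"]
    .dropna()
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)

def tidy(tag):
    """Drop a leading language prefix such as 'en:' and replace hyphens with spaces."""
    prefix, sep, rest = tag.partition(":")
    name = rest if sep else prefix
    return name.replace("-", " ")

top = tidy(origin_counts.index[0])
print(top)
```

Here `str.partition` removes the prefix as a whole, so a name that begins with the letters `e` or `n` is left intact.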

