# What's in an Avocado Toast: A Supply Chain Analysis

You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil.

It's no small feat that most of these ingredients are readily available in grocery stores. In this project, you'll conduct a supply chain analysis of the ingredients used in an avocado toast, using the [Open Food Facts database](https://world.openfoodfacts.org/). This database contains extensive, openly sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish. The data is provided in the `data/` folder as files that carry a `.csv` extension but are tab-separated.

After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

In [215]:
import pandas as pd

## Reading Data

In [216]:
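columns = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']

# The file is tab-separated despite its .csv extension
df = pd.read_csv("data/avocado.csv", sep='\t')
# Copy the column subset so later column assignments don't trigger SettingWithCopyWarning
df = df[columns].copy()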
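The files in `data/` are tab-separated even though they use a `.csv` extension, hence `sep='\t'`. The sketch below, assuming pandas is available, reads only the needed columns at parse time with `usecols` instead of subsetting afterwards. The inline sample and the barcode `012345678905` are hypothetical stand-ins for `data/avocado.csv`. Reading `code` as a string (`dtype={"code": str}`) keeps barcode leading zeros, which integer parsing would drop.

```python
import io

import pandas as pd

# Hypothetical sample in the same tab-separated layout as the data/ files
sample_tsv = (
    "code\tlc\tproduct_name_en\tcountries\tunused\n"
    "012345678905\ten\tAvocado\tUnited Kingdom\tx\n"
    "223086613685\ten\tAvocado\tUnited States\tx\n"
)
wanted = ["code", "lc", "product_name_en", "countries"]

# usecols skips unwanted columns while parsing; indexing with `wanted` fixes the order
sample = pd.read_csv(
    io.StringIO(sample_tsv), sep="\t", usecols=wanted, dtype={"code": str}
)[wanted]
print(sample)
```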

In [217]:
df.head()

|   | code | lc | product_name_en | quantity | serving_size | packaging_tags | brands | brands_tags | categories_tags | labels_tags | countries | countries_tags | origins | origins_tags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59749979702 | fr |  |  |  |  | Naturalia | naturalia | en:plant-based-foods-and-beverages,en:plant-ba... |  | Canada | en:canada |  |  |
| 1 | 7610095131409 | en |  |  |  |  | Zweifel | zweifel | en:snacks,en:salty-snacks,en:appetizers,en:chi... | en:vegetarian,en:vegan | Switzerland, World | en:switzerland,en:world |  |  |
| 2 | 4005514005578 | en | Gelbe Linse Avocado Brotaufstrich |  |  |  | Tartex | tartex | de:abendbrotsufstrich | en:organic,en:eu-organic,en:eg-oko-verordnung | Germany | en:germany |  |  |
| 3 | 879890002513 | en | Avocado toast chili lime |  |  |  |  |  |  |  | United States | en:united-states |  |  |
| 4 | 223086613685 | en | Avocado |  |  |  |  |  |  |  | United States | en:united-states |  |  |


## Filtering Category Tags

In [218]:
# Drop rows with no `categories_tags`
df = df.dropna(subset=['categories_tags'])

In [219]:
# Turn each comma-separated tag string into a list of tags
df['categories_tags'] = df['categories_tags'].str.split(',')
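`.str.split(',')` turns each tag string into a Python list and passes missing values through unchanged. That is why rows with no `categories_tags` are dropped first: the list-based filter that follows would fail when it tries to iterate over a missing value. The toy illustration below uses hypothetical tags.

```python
import numpy as np
import pandas as pd

# Hypothetical tag strings; the middle product has no categories
tags = pd.Series(["en:fruits,en:avocados", np.nan, "en:snacks"])

split_tags = tags.str.split(",")
print(split_tags.tolist())
```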

In [220]:
relevant_categories = ['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods','fr:hass-avocados']

df = df[df['categories_tags'].apply(lambda tags: any(tag in relevant_categories for tag in tags))]
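The `apply` filter runs a Python function once per row. The sketch below is an equivalent vectorized formulation that uses `explode` and `isin`. It runs on a hypothetical three-product frame, with a hypothetical subset standing in for `relevant_categories`.

```python
import pandas as pd

relevant = {"en:avocados", "en:fruits"}  # hypothetical subset of relevant_categories

products = pd.DataFrame({
    "product_name_en": ["Avocado", "Crisps", "Guacamole"],
    "categories_tags": [
        ["en:fruits", "en:avocados"],
        ["en:snacks", "en:salty-snacks"],
        ["en:dips", "en:avocados"],
    ],
})

# One row per (product, tag); each product's index label repeats once per tag
exploded = products["categories_tags"].explode()
# Keep a product if any of its tags is relevant
mask = exploded.isin(relevant).groupby(level=0).any()
filtered = products[mask]

# The row-wise apply/any approach, for comparison
via_apply = products[products["categories_tags"].apply(lambda tags: any(t in relevant for t in tags))]
print(filtered["product_name_en"].tolist())
```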

## Where do most avocados come from?

In [221]:
df = df[df['countries'] == "United Kingdom"]
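The exact comparison `df['countries'] == "United Kingdom"` keeps only products whose `countries` field is exactly that string. A product listed under several countries is therefore dropped; the `head()` output above shows such combined values, for example "Switzerland, World". The sketch below is a broader, tag-based filter on `countries_tags`, run on hypothetical rows. It changes which products are counted, so the results could differ from those shown in this notebook.

```python
import pandas as pd

# Hypothetical rows: the second product is sold in two countries, the last has no data
products = pd.DataFrame({
    "countries": ["United Kingdom", "United Kingdom, France", "France", None],
    "countries_tags": ["en:united-kingdom", "en:united-kingdom,en:france", "en:france", None],
})

exact = products[products["countries"] == "United Kingdom"]

# Compare whole tags (not substrings); missing tags become "" and never match
sold_in_uk = (
    products["countries_tags"]
    .fillna("")
    .str.split(",")
    .apply(lambda tags: "en:united-kingdom" in tags)
    .astype(bool)
)
broad = products[sold_in_uk]
print(len(exact), len(broad))  # 1 2
```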

In [222]:
avocados = df['origins_tags'].value_counts()
avocado_origin = "Peru"

## Don't Repeat Yourself (DRY)

In [223]:
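def read_and_filter_data(filepath, relevant_categories):
    """Read data/<filepath>, keep relevant UK products and print their origin counts."""
    # Files are tab-separated; keep only the columns of interest
    df = pd.read_csv('data/' + filepath, sep='\t')
    df = df[columns].copy()
    # Split comma-separated category tags into lists (missing values stay NaN)
    df['categories_list'] = df['categories_tags'].str.split(',')
    # Drop products without categories, then keep those with a relevant tag
    df = df.dropna(subset=['categories_list'])
    df = df[df['categories_list'].apply(lambda tags: any(tag in relevant_categories for tag in tags))]
    # Keep only products whose countries field is exactly 'United Kingdom'
    df = df[df['countries'] == 'United Kingdom']
    print(f'**{filepath[:-4]} origins**', '\n', df['origins_tags'].value_counts(), '\n')
    return df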
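`read_and_filter_data` mixes file reading, filtering and printing, which makes it hard to check without the data files. The sketch below pulls the filtering into a pure function that can be tested on a small in-memory frame. `filter_products` and its `country` parameter are hypothetical, and a `set` replaces the list for faster membership tests.

```python
import pandas as pd


def filter_products(df, relevant_categories, country="United Kingdom"):
    """Keep products with at least one relevant category tag, sold in `country`."""
    df = df.dropna(subset=["categories_tags"]).copy()
    df["categories_list"] = df["categories_tags"].str.split(",")
    relevant = set(relevant_categories)
    # astype(bool) keeps the mask boolean even if no rows remain
    has_relevant = df["categories_list"].apply(lambda tags: not relevant.isdisjoint(tags)).astype(bool)
    return df[has_relevant & (df["countries"] == country)]


# Hypothetical toy data standing in for data/lemon.csv
toy = pd.DataFrame({
    "categories_tags": ["en:lemons,en:fruits", None, "en:lemons", "en:snacks"],
    "countries": ["United Kingdom", "United Kingdom", "France", "United Kingdom"],
    "origins_tags": ["en:south-africa", None, "en:spain", None],
})
kept = filter_products(toy, ["en:lemons"])
print(kept["origins_tags"].value_counts())
```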

In [224]:
relevant_lemon_categories = ['en:aromatic-plants', 'en:citron', 'en:citrus', 'en:fresh-fruits', 'en:fresh-lemons', 'en:fruits', 'en:lemons', 'en:unwaxed-lemons']
lemons = read_and_filter_data('lemon.csv', relevant_lemon_categories)
lemon_origin = 'South Africa'

**lemon origins** 
 en:brazil,en:south-africa    1
en:south-africa              1
Name: origins_tags, dtype: int64 
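Origins such as `avocado_origin` and `lemon_origin` are typed in by hand after reading the counts. In those counts, a multi-origin tag like `en:brazil,en:south-africa` is tallied as its own category. The sketch below derives the top origin instead, assuming each country in a multi-origin tag should be credited once per product. The two-product Series reproduces the lemon output above, and `tag_to_name` is a hypothetical helper that turns tags back into display names (it uses `str.removeprefix`, which needs Python 3.9+).

```python
import pandas as pd

# The two UK lemon products' origins_tags, as in the output above
origins_tags = pd.Series(["en:brazil,en:south-africa", "en:south-africa"])

# One entry per (product, country), then count products per country
per_country = origins_tags.dropna().str.split(",").explode().value_counts()


def tag_to_name(tag):
    """Turn a tag like 'en:south-africa' into 'South Africa' (hypothetical helper)."""
    return tag.removeprefix("en:").replace("-", " ").title()


print(tag_to_name(per_country.idxmax()))  # South Africa
```

On the real data, the same chain would start from `lemons['origins_tags']`.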



In [225]:
# Load the relevant olive oil category tags, one per line
with open("data/relevant_olive_oil_categories.txt", "r") as file:
    relevant_olive_oil_categories = file.read().splitlines()
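`file.read().splitlines()` keeps any stray spaces around a tag and turns each blank line into an empty-string tag. The sketch below is a reusable loader that strips both. It assumes the category files are UTF-8 text with one tag per line, and `load_categories` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def load_categories(path):
    """Read one category tag per line, ignoring surrounding whitespace and blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Usage with a throwaway file standing in for data/relevant_olive_oil_categories.txt
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "categories.txt"
    sample.write_text("ar:huile-d-olive\n  ar:oil \n\nbg:green-olive-paste\n", encoding="utf-8")
    categories = load_categories(sample)

print(categories)  # ['ar:huile-d-olive', 'ar:oil', 'bg:green-olive-paste']
```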

In [226]:
relevant_olive_oil_categories[:5]

['ar:huile-d-olive',
 'ar:oil',
 'bg:green-olive-paste',
 'de:ol',
 'en:aceites-de-oliva']

In [227]:
# Load the relevant sourdough category tags, one per line
with open("data/relevant_sourdough_categories.txt", "r") as file:
    relevant_sourdough_categories = file.read().splitlines()

In [228]:
olive_oil = read_and_filter_data('olive_oil.csv', relevant_olive_oil_categories)
olive_oil_origin = 'Greece'

**olive_oil origins** 
 en:greece                                             6
en:spain                                              4
en:italy                                              4
en:greece,en:italy,en:portugal,en:spain,en:tunisia    2
en:produce-of-italy                                   1
en:european-union-and-non-european-union              1
en:produced-in-italy                                  1
en:european-union                                     1
Name: origins_tags, dtype: int64 
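The raw tags mix spelling variants of the same country (`en:italy`, `en:produce-of-italy`, `en:produced-in-italy`). They also include a multi-origin tag shared by two products. The sketch below normalizes both before counting, rebuilt from the counts printed above. The `aliases` mapping is a hypothetical hand-written assumption, and splitting the multi-origin tag credits a blended oil to every listed country.

```python
import pandas as pd

# origins_tags counts for UK olive oils, copied from the output above
counts = pd.Series({
    "en:greece": 6,
    "en:spain": 4,
    "en:italy": 4,
    "en:greece,en:italy,en:portugal,en:spain,en:tunisia": 2,
    "en:produce-of-italy": 1,
    "en:european-union-and-non-european-union": 1,
    "en:produced-in-italy": 1,
    "en:european-union": 1,
})

# Hypothetical hand-written mapping of tag variants onto a single country tag
aliases = {"en:produce-of-italy": "en:italy", "en:produced-in-italy": "en:italy"}

# Rebuild one entry per product, split multi-origin tags, then normalize variants
per_product = pd.Series(counts.index.repeat(counts.to_numpy()))
per_country = per_product.str.split(",").explode().replace(aliases).value_counts()
print(per_country.head(3))
```

On the raw tags, `en:greece` leads with 6 products. After this normalization, Greece and Italy both reach 8 (Spain 6), so the choice of `olive_oil_origin` depends on how blends and spelling variants are counted.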



In [229]:
sourdough = read_and_filter_data('sourdough.csv', relevant_sourdough_categories)
sourdough_origin = 'United Kingdom'

**sourdough origins** 
 en:united-kingdom    3
en:france            1
Name: origins_tags, dtype: int64 



In [230]:
relevant_salt_categories = [
    'en:edible-common-salt',
    'en:salts',
    'en:sea-salts',
]
salt_flakes = read_and_filter_data('salt_flakes.csv', relevant_salt_categories)


**salt_flakes origins** 
 Series([], Name: origins_tags, dtype: int64) 
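The empty `Series` above means that no salt-flake product surviving the filters has a non-missing `origins_tags` value, because `value_counts()` skips missing values by default. This notebook therefore assigns no origin to salt flakes. An empty result could mean either that no rows survived the filters or that the surviving rows lack origins. Counting rows after each step tells these two cases apart. The sketch below uses a hypothetical `filter_funnel` helper that mirrors the steps of `read_and_filter_data`, with toy data in place of `data/salt_flakes.csv`.

```python
import pandas as pd


def filter_funnel(df, relevant_categories, country="United Kingdom"):
    """Count rows surviving each step of the read_and_filter_data pipeline."""
    steps = {"all rows": len(df)}
    df = df.dropna(subset=["categories_tags"])
    steps["has categories"] = len(df)
    relevant = set(relevant_categories)
    is_relevant = df["categories_tags"].str.split(",").apply(lambda tags: not relevant.isdisjoint(tags))
    df = df[is_relevant.astype(bool)]
    steps["relevant category"] = len(df)
    df = df[df["countries"] == country]
    steps[f"sold in {country}"] = len(df)
    steps["has origins_tags"] = int(df["origins_tags"].notna().sum())
    return steps


# Hypothetical rows: one UK sea salt with no origin recorded
toy = pd.DataFrame({
    "categories_tags": ["en:salts,en:sea-salts", "en:salts", None],
    "countries": ["United Kingdom", "France", "United Kingdom"],
    "origins_tags": [None, "en:france", None],
})
funnel = filter_funnel(toy, ["en:sea-salts", "en:salts"])
print(funnel)
```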


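The project's goal is a list of ingredients and their countries of origin. The sketch below collects the origins chosen in this notebook into one table. The dictionary is typed in by hand rather than computed. Salt flakes is left missing because the filter above returned no origins.

```python
import pandas as pd

# Origins chosen by hand in the cells above; salt flakes has none
chosen_origins = {
    "avocado": "Peru",
    "lemon": "South Africa",
    "sourdough bread": "United Kingdom",
    "extra virgin olive oil": "Greece",
    "salt flakes": None,
}

summary = (
    pd.Series(chosen_origins, name="origin")
    .rename_axis("ingredient")
    .reset_index()
)
print(summary.to_string(index=False))
```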