# What's in an Avocado Toast: A Supply Chain Analysis

You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil.

It's no small feat that most of these ingredients are readily available in grocery stores. In this project, you'll conduct a supply chain analysis of the ingredients used in an avocado toast, using the [Open Food Facts database](https://world.openfoodfacts.org/). This database contains extensive, openly sourced information on many foods, including their origins. Through this analysis, you'll gain an in-depth understanding of the complex supply chain behind a single dish. The data is provided as `.csv` files in the `data/` folder.

After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

![](avocado_wallpaper.jpeg)

## Instructions
All required data is provided in the `data/` folder. Start by loading the avocado data into a DataFrame called `avocado`. The file is tab-delimited. Subset the data so it contains only these columns: `['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']`.
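Below is a minimal sketch of this loading step. It uses pandas and a made-up one-row, tab-delimited sample in place of the real file. The extra `product_name_fr` column stands in for the many columns you'll discard. Passing `usecols` reads only the needed columns, and the final indexing puts them in the listed order. `dtype={'code': str}` is an optional choice (not part of the original instructions) that keeps barcodes with leading zeros intact.

```python
import io

import pandas as pd

# The 14 columns the instructions ask for.
KEEP_COLUMNS = [
    'code', 'lc', 'product_name_en', 'quantity', 'serving_size',
    'packaging_tags', 'brands', 'brands_tags', 'categories_tags',
    'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags',
]

# Hypothetical stand-in for the real tab-delimited avocado file:
# one extra column (to be dropped) and a single product row.
header = KEEP_COLUMNS + ['product_name_fr']
row = ['0059749979702', 'fr'] + [''] * 12 + ['Naturalia Avocado Oil']
sample_tsv = io.StringIO('\t'.join(header) + '\n' + '\t'.join(row) + '\n')

# sep='\t' because the file is tab-delimited; usecols skips unwanted columns.
avocado = pd.read_csv(
    sample_tsv, sep='\t', usecols=KEEP_COLUMNS, dtype={'code': str}
)[KEEP_COLUMNS]
print(avocado.shape)  # (1, 14)
```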

Filter the DataFrame on the `categories_tags` column so that it keeps only rows where `categories_tags` contains at least one of the following: `['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods', 'fr:hass-avocados']`.
Start by dropping rows with null values in `categories_tags`. Then split this column of comma-separated values into a column of lists, and save it as a new column, `categories_list`.
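The two preparation steps can be sketched on a small hypothetical DataFrame. Rows without categories are dropped, and `str.split(',')` then turns each remaining string into a list. The `.copy()` is an assumption added here so that the new column is assigned to an independent frame rather than a view.

```python
import pandas as pd

# Hypothetical sample: the second product has no categories at all.
df = pd.DataFrame({
    'code': ['1', '2', '3'],
    'categories_tags': ['en:fruits,en:avocados', None, 'en:snacks'],
})

# Drop rows with no categories, then split each comma-separated string into a list.
df = df.dropna(subset=['categories_tags']).copy()
df['categories_list'] = df['categories_tags'].str.split(',')
print(df['categories_list'].tolist())
# [['en:fruits', 'en:avocados'], ['en:snacks']]
```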

You can then compare the list in each row of `categories_list` with the reference list of relevant categories.
One way to do this is to call `apply(lambda x: any(___))` on the new column of lists, where `any()` wraps a comprehension and returns `True` if at least one item in it is truthy.
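Here is a minimal sketch of the `apply(lambda x: any(...))` pattern. It uses a shortened, illustrative category list and three hypothetical products. Feeding `any()` a generator of booleans (`tag in relevant_categories`) states the intent directly. A list of matching strings also works, because non-empty strings are truthy.

```python
import pandas as pd

# Shortened reference list, for illustration only.
relevant_categories = ['en:avocados', 'en:fruits']

# Hypothetical products with their category lists.
df = pd.DataFrame({
    'product': ['Avocado', 'Avocado chips', 'Guacamole'],
    'categories_list': [
        ['en:fruits', 'en:avocados'],
        ['en:snacks', 'en:chips'],
        ['en:dips', 'en:fruits'],
    ],
})

# any() returns True as soon as one tag in the row's list is relevant.
mask = df['categories_list'].apply(
    lambda tags: any(tag in relevant_categories for tag in tags)
)
filtered = df[mask]
print(filtered['product'].tolist())  # ['Avocado', 'Guacamole']
```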

Your `avocado` DataFrame should contain a column called `origins_tags`, which lists origin countries. Create a variable called `avocado_origin` containing the top country that avocados sold in the United Kingdom come from. Begin by using the `countries` column to filter for only products recorded in the United Kingdom.
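A sketch of these two steps follows. The rows and origin values are made up and do not reflect the real dataset. `value_counts()` excludes missing values and sorts by frequency by default, so the first index entry is the most common origin.

```python
import pandas as pd

# Hypothetical filtered avocado products (values are invented for illustration).
avocado = pd.DataFrame({
    'countries': ['United Kingdom', 'France', 'United Kingdom',
                  'United Kingdom', 'United Kingdom'],
    'origins_tags': ['en:peru', 'en:spain', 'en:peru', 'en:south-africa', None],
})

# Keep only products recorded in the United Kingdom.
avocado_uk = avocado[avocado['countries'] == 'United Kingdom']

# Count rows per origin (NaN excluded); the top entry is the most likely origin.
origin_counts = avocado_uk['origins_tags'].value_counts()
avocado_origin = origin_counts.index[0]
print(origin_counts)
print(avocado_origin)  # en:peru
```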


Repeat the steps above for the rest of the ingredients. To make this reusable, you can create a function called `read_and_filter_data()` that takes two arguments, `filepath` and `relevant_categories`. It returns a filtered DataFrame containing only rows whose `categories_tags` include at least one tag from `relevant_categories`. The function also prints the number of rows per origin, which you'll use to identify the most likely country of origin.
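One possible shape for `read_and_filter_data()` is sketched below. It chains the load, column-subset, `dropna`, split and `any()` steps. Converting the categories to a `set` is an optional choice for faster lookups. The demo writes a hypothetical three-row sample to an in-memory tab-delimited buffer, so it runs without the real `data/` files.

```python
import io

import pandas as pd

# Columns kept for every ingredient (same list as for the avocado data).
COLUMNS = [
    'code', 'lc', 'product_name_en', 'quantity', 'serving_size',
    'packaging_tags', 'brands', 'brands_tags', 'categories_tags',
    'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags',
]


def read_and_filter_data(filepath, relevant_categories):
    """Load a tab-delimited file and keep rows with at least one relevant category tag."""
    df = pd.read_csv(filepath, sep='\t', usecols=COLUMNS)
    df = df.dropna(subset=['categories_tags'])
    # Split the comma-separated tags into one list per row.
    categories_list = df['categories_tags'].str.split(',')
    relevant = set(relevant_categories)  # set membership tests are fast
    # astype(bool) keeps the mask boolean even if no rows remain.
    mask = categories_list.apply(
        lambda tags: any(tag in relevant for tag in tags)
    ).astype(bool)
    df = df[mask]
    # Number of rows per origin, used to judge the most likely country of origin.
    print(df['origins_tags'].value_counts())
    return df


# Usage with a hypothetical sample instead of a real file path.
sample = pd.DataFrame({col: [None] * 3 for col in COLUMNS})
sample['categories_tags'] = ['en:fruits,en:lemons', 'en:snacks', None]
sample['countries'] = 'United Kingdom'
sample['origins_tags'] = ['en:spain', 'en:italy', 'en:spain']
buffer = io.StringIO(sample.to_csv(sep='\t', index=False))

lemon = read_and_filter_data(buffer, ['en:citrus', 'en:lemons'])
```

In the demo, only the first row survives. The second row has no relevant tag, and the third has no categories at all.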

Apply your user-defined function to filter the data in `lemon.csv`, where the relevant categories are `['en:aromatic-plants', 'en:citron', 'en:citrus', 'en:fresh-fruits', 'en:fresh-lemons', 'en:fruits', 'en:lemons', 'en:unwaxed-lemons']`, and save the result to a variable called `lemon`. Create a variable called `lemon_origin` containing the top country that lemons sold in the United Kingdom come from.

Just as you did with the lemon data, create the variables `olive_oil_origin` and `sourdough_origin`. The relevant categories are in the files `relevant_olive_oil_categories.txt` and `relevant_sourdough_categories.txt`, because these lists are slightly longer than the ones you've been working with. For salt, use the categories `['en:edible-common-salt', 'en:salts', 'en:sea-salts']`. You'll find that there is no data on the origin of salt flakes. This is common with supply chain datasets, which are often incomplete, and it makes a great jumping-off point for your next analysis!
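Two small helpers can cover these last steps. They are sketched here under assumptions. The layout of the `.txt` category files is not described, so the hypothetical `load_categories()` accepts tags separated by newlines and/or commas and strips quotes and brackets in case the file holds a pasted Python list. The hypothetical `top_origin()` returns `None` instead of raising an `IndexError` when no origin is recorded, as with salt. The tag names in the demo file are illustrative.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def load_categories(filepath):
    """Read category tags from a text file (assumed newline- and/or comma-separated)."""
    text = Path(filepath).read_text(encoding='utf-8')
    # Split on commas or newlines, then strip whitespace, quotes and brackets.
    parts = (p.strip().strip('[]\'"').strip() for p in re.split(r'[,\n]', text))
    return [p for p in parts if p]


def top_origin(df):
    """Return the most frequent origins_tags value, or None if no origin is recorded."""
    counts = df['origins_tags'].value_counts()  # NaN values are excluded
    return counts.index[0] if not counts.empty else None


# Usage with a hypothetical categories file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'relevant_olive_oil_categories.txt'
    path.write_text("['en:olive-oils',\n 'en:extra-virgin-olive-oils']\n", encoding='utf-8')
    olive_oil_categories = load_categories(path)
print(olive_oil_categories)  # ['en:olive-oils', 'en:extra-virgin-olive-oils']

# Hypothetical salt products with no recorded origin: there is no top country.
salt = pd.DataFrame({'origins_tags': [None, None]})
print(top_origin(salt))  # None
```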


In [1]:
import pandas as pd

In [2]:
avocado = pd.read_csv('avocado.csv', sep='\t')
print(avocado.head(5))

            code  lc product_name_de product_name_el  \
0  0059749979702  fr             NaN             NaN   
1  7610095131409  en             NaN             NaN   
2  4005514005578  en             NaN             NaN   
3  0879890002513  en             NaN             NaN   
4  0223086613685  en             NaN             NaN   

                     product_name_en product_name_es product_name_fi  \
0                                NaN             NaN             NaN   
1                                NaN             NaN             NaN   
2  Gelbe Linse Avocado Brotaufstrich             NaN             NaN   
3           Avocado toast chili lime             NaN             NaN   
4                            Avocado             NaN             NaN   

         product_name_fr product_name_id product_name_it  ...  \
0  Naturalia Avocado Oil             NaN             NaN  ...   
1     Avocado Bowl chips             NaN             NaN  ...   
2                    NaN           

In [3]:
avocado.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1785 entries, 0 to 1784
Columns: 184 entries, code to data_sources
dtypes: float64(58), int64(1), object(125)
memory usage: 2.5+ MB


In [4]:
new_columns = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']
avocado = avocado[new_columns]
print(avocado.head())

            code  lc                    product_name_en quantity serving_size  \
0  0059749979702  fr                                NaN      NaN          NaN   
1  7610095131409  en                                NaN      NaN          NaN   
2  4005514005578  en  Gelbe Linse Avocado Brotaufstrich      NaN          NaN   
3  0879890002513  en           Avocado toast chili lime      NaN          NaN   
4  0223086613685  en                            Avocado      NaN          NaN   

  packaging_tags     brands brands_tags  \
0            NaN  Naturalia   naturalia   
1            NaN    Zweifel     zweifel   
2            NaN     Tartex      tartex   
3            NaN        NaN         NaN   
4            NaN        NaN         NaN   

                                     categories_tags  \
0  en:plant-based-foods-and-beverages,en:plant-ba...   
1  en:snacks,en:salty-snacks,en:appetizers,en:chi...   
2                              de:abendbrotsufstrich   
3                             

In [5]:
avocado = avocado.dropna(subset=['categories_tags'])
print(avocado.head())

            code  lc                    product_name_en  quantity  \
0  0059749979702  fr                                NaN       NaN   
1  7610095131409  en                                NaN       NaN   
2  4005514005578  en  Gelbe Linse Avocado Brotaufstrich       NaN   
5  3662994002063  fr                                NaN  3 fruits   
6  8437013031011  fr                                NaN      1 kg   

  serving_size packaging_tags                        brands  \
0          NaN            NaN                     Naturalia   
1          NaN            NaN                       Zweifel   
2          NaN            NaN                        Tartex   
5          NaN            NaN  la compagnie des fruits mûrs   
6          NaN            NaN                           NaN   

                    brands_tags  \
0                     naturalia   
1                       zweifel   
2                        tartex   
5  la-compagnie-des-fruits-murs   
6                           NaN

In [6]:
avocado['categories_list'] = avocado['categories_tags'].str.split(',')
print(avocado.head())

            code  lc                    product_name_en  quantity  \
0  0059749979702  fr                                NaN       NaN   
1  7610095131409  en                                NaN       NaN   
2  4005514005578  en  Gelbe Linse Avocado Brotaufstrich       NaN   
5  3662994002063  fr                                NaN  3 fruits   
6  8437013031011  fr                                NaN      1 kg   

  serving_size packaging_tags                        brands  \
0          NaN            NaN                     Naturalia   
1          NaN            NaN                       Zweifel   
2          NaN            NaN                        Tartex   
5          NaN            NaN  la compagnie des fruits mûrs   
6          NaN            NaN                           NaN   

                    brands_tags  \
0                     naturalia   
1                       zweifel   
2                        tartex   
5  la-compagnie-des-fruits-murs   
6                           NaN

In [13]:
relevant_avocado_categories = ['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods', 'fr:hass-avocados']
# Filter a DataFrame on a column of lists: keep rows with at least one relevant tag
avocado = avocado[avocado['categories_list'].apply(lambda x: any(i in relevant_avocado_categories for i in x))]
print(avocado.head())

             code  lc product_name_en  quantity serving_size packaging_tags  \
5   3662994002063  fr             NaN  3 fruits          NaN            NaN   
6   8437013031011  fr             NaN      1 kg          NaN            NaN   
14  4016249238155  de             NaN      135g         100g    de:gläschen   
17  8718963381532  de             NaN       NaN          NaN            NaN   
23  8436002746707  es             NaN       NaN          NaN            NaN   

                          brands                   brands_tags  \
5   la compagnie des fruits mûrs  la-compagnie-des-fruits-murs   
6                            NaN                           NaN   
14                         Allos                         allos   
17                           NaN                           NaN   
23                           NaN                           NaN   

                                      categories_tags  \
5   en:plant-based-foods-and-beverages,en:plant-ba...   
6   en:plant-b

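The `apply()` call in the cell above checks each row in a Python loop. A vectorized alternative, swapped in here and not the method used in this notebook, works in three steps. `explode()` produces one tag per row while keeping the original index, `isin()` tests all tags at once, and `groupby(level=0).any()` folds the results back into one boolean per product. The sample below is hypothetical.

```python
import pandas as pd

# Shortened reference list, for illustration only.
relevant_categories = ['en:avocados', 'en:fruits']

# Hypothetical products with their category lists.
df = pd.DataFrame({
    'product': ['Avocado', 'Avocado chips', 'Guacamole'],
    'categories_list': [['en:fruits', 'en:avocados'], ['en:snacks'], ['en:dips', 'en:fruits']],
})

# One row per tag (index repeats per product), tested in a single isin() call.
tag_matches = df['categories_list'].explode().isin(relevant_categories)
# Collapse back to one boolean per original row: True if any tag matched.
mask = tag_matches.groupby(level=0).any()
print(df[mask]['product'].tolist())  # ['Avocado', 'Guacamole']
```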
In [14]:
avocado_uk = avocado[avocado['countries'] == 'United Kingdom']
print(avocado_uk.head())

              code  lc           product_name_en quantity serving_size  \
361       00985833  en                   Avacado    650 g          NaN   
381       00040464  en                   Avocado      NaN          NaN   
414  4088600100173  en                   Avocado    100 g          NaN   
468       01307351  en          Avacados organic      NaN          NaN   
508  5057172125395  en  Just Essentials Avocados    4pack          NaN   

                              packaging_tags                  brands  \
361                                      NaN         Marks & Spencer   
381                                      NaN                     NaN   
414                 en:mixed-plastic-unknown                    Aldi   
468                 en:card-tray,en:ldpe-bag  Sainsbury’s SO organic   
508  en:mixed-plastic-film-packet-to-recycle                    Asda   

                brands_tags  \
361           marks-spencer   
381                     NaN   
414                    aldi  

In [9]:
avocado_uk.shape

(13, 15)
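The exact comparison `== 'United Kingdom'` leaves 13 rows, and it skips any product whose `countries` value lists the UK alongside other countries. Open Food Facts tag fields such as `countries_tags` and `origins_tags` can hold several comma-separated values. The sketch below is an optional check on made-up rows, assuming the UK tag is `en:united-kingdom`. It matches the tag anywhere in the field and splits multi-origin entries before counting. Whether this changes the result depends on the real data.

```python
import pandas as pd

# Hypothetical rows: tag fields may hold several comma-separated values.
avocado = pd.DataFrame({
    'countries_tags': ['en:united-kingdom', 'en:france,en:united-kingdom', 'en:france'],
    'origins_tags': ['en:peru', 'en:peru,en:chile', 'en:spain'],
})

# Keep rows whose countries_tags mention the UK anywhere (missing values count as no match).
in_uk = avocado['countries_tags'].str.contains('en:united-kingdom', regex=False, na=False)
avocado_uk = avocado[in_uk]

# One row per origin tag, so a product listing two origins counts toward both.
origin_counts = avocado_uk['origins_tags'].str.split(',').explode().value_counts()
print(origin_counts)
print(origin_counts.index[0])  # en:peru
```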