## 2021: Week 7 - Vegan Shopping List

Now that Veganuary has come and gone, we thought it would be interesting to take a look at some common supermarket products and use Prep to work out whether they are vegan. Some of the results may surprise you!

For this analysis we treat bee by-products (beeswax, honey, etc.) as non-vegan.

### Inputs

1. A shopping list of products and their ingredients (or allergens where ingredients were not available). I have a child-like palate, so it's mostly sweet treats, some of which you'd expect to be vegan and some of which you wouldn't. Everything is commonly found in UK supermarkets, so no specialist shops are required.

![img](https://1.bp.blogspot.com/-BfYMo--X7R0/X_RxbJVxzVI/AAAAAAAABDs/yNFOPkii-M8oy4jHtebRFhOt_zQJmJWWwCLcBGAsYHQ/w640-h248/01%2BShoppingListInput.jpg)

2. Two lists of common non-vegan ingredients and E numbers

![img2](https://1.bp.blogspot.com/-OKXkRvjBrUQ/X_RxbNpC42I/AAAAAAAABDo/dljP44u8uyMkN_5q4Kw8zUSX_zayCuGQACPcBGAYYCw/w640-h90/02%2BNonVeganIngredients.jpg)

### Requirements

- Input the data
- Prepare the keyword data
    - Add an 'E' in front of every E number.
    - Stack Animal Ingredients and E Numbers on top of each other.
    - Get every ingredient and E number onto separate rows.
- Append the keywords onto the product list.
- Check whether each product contains any non-vegan ingredients.
- Prepare a final shopping list of vegan products.
    - Aggregate the products into vegan and non-vegan.
    - Filter out the non-vegan products.
- Prepare a list explaining why the other products aren't vegan.
    - Keep only non-vegan products.
    - Duplicate the keyword field.
    - Rows to columns pivot the keywords using the duplicate as a header.
    - Write a calculation to concatenate all the keywords into a single comma-separated list for each product, e.g. "whey, milk, egg".
- Output the data.

### Outputs

Vegan Shopping List
- Product
- Description
- 20 rows (21 including headers)

![img3](https://1.bp.blogspot.com/-1WRcHfNPDAc/X_SEXlXvagI/AAAAAAAABD8/VmGKs9tDP1s713EKO1jYSO_yr3Kz-7_YwCLcBGAsYHQ/w400-h271/03%2BVeganOutput.jpg)

Non-Vegan List
- Product
- Description
- Contains
- 19 rows (20 including headers)

![img4](https://1.bp.blogspot.com/-n90BmuM-xgQ/X_SEe5KydyI/AAAAAAAABEA/uQpiNeQNhCoOT5ntFPgzDZzdbcHQDT8uQCLcBGAsYHQ/w400-h269/04%2BNonVeganOutput.jpg)

In [1]:
```python
# Only pandas and NumPy are used below; the unused plotting imports have been dropped
import pandas as pd
import numpy as np
```

### Input the data

In [2]:
```python
data = pd.read_excel("./data/Shopping List and Ingredients.xlsx", sheet_name=["Shopping List", "Keywords"])
df = data["Shopping List"].copy()
keywords = data["Keywords"].copy()
keywords
```

|  | Animal Ingredients | E Numbers |
|---|---|---|
| 0 | Milk, Whey, Honey, Egg, Lactose, Collagen, Ela... | 120, 441, 545, 901, 904, 910, 920, 921, 913, 966 |


In [3]:
```python
df.head()
```

|  | Product | Description | Ingredients/Allergens |
|---|---|---|---|
| 0 | Tesco Bacon Rashers Snacks | Bacon flavour baked snacks made with maize, ri... | Maize, Rice Flour, Sunflower Oil, Soya Flour, ... |
| 1 | Pringles Bbq | Texas Barbecue Sauce Flavour Savoury Snack | Dehydrated Potatoes, Vegetable Oils (Sunflower... |
| 2 | Doritos Chilli Heatwave Tortilla Chips | Chilli Heatwave Flavour Corn Chips | Corn (Maize), Vegetable Oils (Corn, Sunflower,... |
| 3 | Walkers Max Flamin Hot Crisps | Fiercely Flamin' Hot Flavour Ridged Potato Crisps | Potatoes, Vegetable Oils (Sunflower, Rapeseed,... |
| 4 | Smiths Frazzles Bacon Snacks | Crispy Bacon Flavour Corn Snack | Maize, Rapeseed Oil, Bacon Flavour Seasoning [... |


### Prepare the keyword data: add an 'E' in front of every E number

In [4]:
```python
# Split the comma-separated cell into individual E numbers
e_numbers = keywords["E Numbers"].str.split(",").values
e_numbers = np.concatenate(e_numbers).tolist()

# Strip stray spaces and prefix each number with "E"
e_numbers = ["E" + num.strip() for num in e_numbers]
e_numbers
```

['E120',
 'E441',
 'E545',
 'E901',
 'E904',
 'E910',
 'E920',
 'E921',
 'E913',
 'E966']
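The loop works, but pandas can do the split, the one-value-per-row step and the prefix in a single vectorised chain. A minimal sketch, assuming the `E Numbers` cell is read from Excel as one comma-separated string (the frame below is a truncated stand-in, not the full sheet):

```python
import pandas as pd

# Truncated stand-in for the one-row "Keywords" sheet
keywords = pd.DataFrame({"E Numbers": ["120, 441, 545, 901"]})

# Split the string, give each value its own row, trim spaces, then prefix with "E"
e_numbers = (
    "E" + keywords["E Numbers"].str.split(",").explode().str.strip()
).tolist()
print(e_numbers)  # ['E120', 'E441', 'E545', 'E901']
```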

### Stack Animal Ingredients and E Numbers on top of each other

In [5]:
```python
# Split the comma-separated animal ingredients and strip stray spaces
ingredients = keywords["Animal Ingredients"].str.split(",").values
ingredients = np.concatenate(ingredients).tolist()
ingredients = [i.strip() for i in ingredients]
ingredients
```

['Milk',
 'Whey',
 'Honey',
 'Egg',
 'Lactose',
 'Collagen',
 'Elastin',
 'Keratin',
 'Gelatine',
 'Gelatin',
 'Pepsin',
 'Isinglass',
 'Shellac',
 'Lard',
 'Aspic',
 'Beeswax']

In [6]:
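```python
# Stack the animal ingredients and E numbers into a single keyword list
# (a `global` statement has no effect at notebook level, so it is not needed)
total_ingredients = ingredients + e_numbers
```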
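Stacking the two lists and splitting them onto separate rows can also be done directly on the DataFrame, which is closer to how the requirements phrase it. A rough sketch, on a truncated stand-in for the `Keywords` sheet, using `melt` (wide to long) plus `explode`; the `Type` column name is our own choice and is only used to decide which values get the 'E' prefix:

```python
import pandas as pd

# Truncated stand-in for the "Keywords" sheet; assumes both cells are read as text
keywords = pd.DataFrame({
    "Animal Ingredients": ["Milk, Whey, Honey"],
    "E Numbers": ["120, 441"],
})

# Stack the two columns on top of each other: one row per original cell
stacked = keywords.melt(var_name="Type", value_name="Keyword")

# Split each cell and give every keyword its own row, trimming spaces
stacked["Keyword"] = stacked["Keyword"].str.split(",")
stacked = stacked.explode("Keyword", ignore_index=True)
stacked["Keyword"] = stacked["Keyword"].str.strip()

# Prefix only the values that came from the E Numbers column
is_e = stacked["Type"] == "E Numbers"
stacked.loc[is_e, "Keyword"] = "E" + stacked.loc[is_e, "Keyword"]

total_ingredients = stacked["Keyword"].tolist()
print(total_ingredients)  # ['Milk', 'Whey', 'Honey', 'E120', 'E441']
```

Keeping the `Type` column around also makes it easy to report later whether a product failed on an ingredient or on an additive.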

### Check whether each product contains any non-vegan ingredients

In [7]:
```python
import re

# Remove brackets, braces and commas, then split each ingredient string into single-word tokens
df["Ingredients_list"] = (
    df["Ingredients/Allergens"]
    .map(lambda text: re.sub(r"[{}\[\](),]", "", text))
    .str.split(" ")
)

def check_non_vegan(tokens):
    """Return True if any token exactly matches a non-vegan keyword (case-sensitive)."""
    return any(token in total_ingredients for token in tokens)

df["Non_vegan"] = df["Ingredients_list"].map(check_non_vegan)
df.head()
```

|  | Product | Description | Ingredients/Allergens | Ingredients_list | Non_vegan |
|---|---|---|---|---|---|
| 0 | Tesco Bacon Rashers Snacks | Bacon flavour baked snacks made with maize, ri... | Maize, Rice Flour, Sunflower Oil, Soya Flour, ... | [Maize, Rice, Flour, Sunflower, Oil, Soya, Flo... | False |
| 1 | Pringles Bbq | Texas Barbecue Sauce Flavour Savoury Snack | Dehydrated Potatoes, Vegetable Oils (Sunflower... | [Dehydrated, Potatoes, Vegetable, Oils, Sunflo... | False |
| 2 | Doritos Chilli Heatwave Tortilla Chips | Chilli Heatwave Flavour Corn Chips | Corn (Maize), Vegetable Oils (Corn, Sunflower,... | [Corn, Maize, Vegetable, Oils, Corn, Sunflower... | False |
| 3 | Walkers Max Flamin Hot Crisps | Fiercely Flamin' Hot Flavour Ridged Potato Crisps | Potatoes, Vegetable Oils (Sunflower, Rapeseed,... | [Potatoes, Vegetable, Oils, Sunflower, Rapesee... | True |
| 4 | Smiths Frazzles Bacon Snacks | Crispy Bacon Flavour Corn Snack | Maize, Rapeseed Oil, Bacon Flavour Seasoning [... | [Maize, Rapeseed, Oil, Bacon, Flavour, Seasoni... | True |
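The notebook never literally appends the keywords to the product list; it tests each token against a Python list instead, which is exact and case-sensitive, so tokens such as `Eggs` or `MILK` would not match `Egg` or `Milk`. A sketch closer to the requirement cross-joins every keyword onto every product (`merge(how="cross")`, available from pandas 1.2) and then runs a case-insensitive substring test. The products and keywords below are made up for illustration:

```python
import pandas as pd

# Made-up products and keywords, for illustration only
products = pd.DataFrame({
    "Product": ["Plain Crisps", "Choc Bar"],
    "Ingredients/Allergens": [
        "Potatoes, Sunflower Oil, Salt",
        "Sugar, Cocoa Butter, Skimmed MILK Powder, Eggs",
    ],
})
kw = pd.DataFrame({"Keyword": ["Milk", "Whey", "Egg", "E120"]})

# Append every keyword to every product: 2 products x 4 keywords = 8 rows
appended = products.merge(kw, how="cross")

# Case-insensitive substring test, row by row
appended["Match"] = [
    keyword.lower() in text.lower()
    for keyword, text in zip(appended["Keyword"], appended["Ingredients/Allergens"])
]

matched = appended.loc[appended["Match"], ["Product", "Keyword"]]
print(matched)
```

Substring matching catches plurals and odd capitalisation, but it can over-flag: `Milk` would also match "Coconut Milk", so flagged products are worth a manual check either way.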


### Prepare a final shopping list of vegan products
 - Aggregate the products into vegan and non-vegan
 - Filter out the non-vegan products

In [8]:
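```python
# Group by the flag and keep the vegan (False) group; grouping by a plain column
# name rather than a one-element list lets get_group take a scalar key without a
# FutureWarning in recent pandas
vegan_product = df.groupby("Non_vegan")[["Product", "Description"]].get_group(False)
vegan_product = vegan_product.reset_index(drop=True)
vegan_product
```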

|  | Product | Description |
|---|---|---|
| 0 | Tesco Bacon Rashers Snacks | Bacon flavour baked snacks made with maize, ri... |
| 1 | Pringles Bbq | Texas Barbecue Sauce Flavour Savoury Snack |
| 2 | Doritos Chilli Heatwave Tortilla Chips | Chilli Heatwave Flavour Corn Chips |
| 3 | Greggs Glazed Ring Doughnuts | A ring doughnut topped with fondant icing. |
| 4 | Co-op Bakery 5 Jam Ball Doughnuts | Jam Doughnut 5s |
| 5 | Oreos Original Vanilla | Chocolate Flavour Sandwich Biscuits with a Van... |
| 6 | Lotus Biscoff Sandwich Original Cream | Caramelised sandwich biscuits with a biscoff c... |
| 7 | Tesco Dark Chocolate Digestives | Digestive biscuits half coated in dark chocolate |
| 8 | Jammie Dodgers Jam Biscuits | Shortcake biscuits with a raspberry flavoured ... |
| 9 | Cadbury Bourneville Chocolate Fingers | Crisp biscuits covered with dark chocolate (48... |
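Here the grouping is effectively a filter on a flag that was already computed per product. If you start instead from a product-by-keyword match table (one row per product and keyword, as a cross join would give), the "aggregate then filter" steps map onto `groupby(...).any()` followed by a boolean mask. A sketch on a hypothetical match table:

```python
import pandas as pd

# Hypothetical match table: one row per product and keyword
matches = pd.DataFrame({
    "Product": ["Plain Crisps", "Plain Crisps", "Choc Bar", "Choc Bar"],
    "Description": ["Salted crisps", "Salted crisps", "Milk chocolate", "Milk chocolate"],
    "Keyword": ["Milk", "Egg", "Milk", "Egg"],
    "Match": [False, False, True, False],
})

# Aggregate to one row per product: non-vegan if any keyword matched
summary = (
    matches.groupby(["Product", "Description"], sort=False)["Match"]
    .any()
    .reset_index(name="Non_vegan")
)

# Filter out the non-vegan products
vegan_list = summary.loc[~summary["Non_vegan"], ["Product", "Description"]].reset_index(drop=True)
print(vegan_list)
```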


### Prepare a list explaining why the other products aren't vegan
- Keep only non-vegan products
- Duplicate the keywords field
- Rows to columns pivot the keywords using the duplicate as a header.
- Write a calculation to concatenate all the keywords into a single comma-separated list for each product, e.g. "whey, milk, egg".
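The notebook skips the pivot and joins the matches directly. For comparison, here is a minimal sketch of the pivot route the requirements describe, on a hypothetical long table with one row per product and matched keyword. `pivot` raises an error on duplicate product/keyword pairs, so de-duplicate first if needed; the pivoted columns also come out sorted, so the final list is alphabetical:

```python
import pandas as pd

# Hypothetical long table: one row per product and matched keyword
found = pd.DataFrame({
    "Product": ["Product A", "Product A", "Product A", "Product B", "Product B"],
    "Keyword": ["Milk", "Lactose", "Whey", "Milk", "Egg"],
})

# Duplicate the keyword field: one copy becomes the header, the other the value
found["Header"] = found["Keyword"]

# Rows to columns: one column per keyword, NaN where a product lacks it
wide = found.pivot(index="Product", columns="Header", values="Keyword")

# Concatenate the non-null cells of each row into one comma-separated string
contains = wide.apply(lambda row: ", ".join(row.dropna()), axis=1).rename("Contains")
print(contains.to_dict())
# {'Product A': 'Lactose, Milk, Whey', 'Product B': 'Egg, Milk'}
```

The same result can be had in one step with `found.groupby("Product")["Keyword"].agg(", ".join)`, though the keyword order then follows row order rather than the sorted column order.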

In [9]:
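```python
# Keep the non-vegan group; .copy() avoids a SettingWithCopyWarning
# when the "Contains" column is added below
non_vegan_product = df.groupby("Non_vegan")[["Product", "Description", "Ingredients_list"]].get_group(True).copy()
non_vegan_product.shape
```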

(19, 3)

In [10]:
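```python
# Match each product's tokens against the non-vegan keywords
def contain_non_vegan_ingredients(tokens):
    """Return the matching keywords, de-duplicated in order of first appearance."""
    # dict.fromkeys keeps insertion order, unlike set(), so the output order is stable between runs
    return list(dict.fromkeys(token for token in tokens if token in total_ingredients))
```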

In [11]:
```python
non_vegan_product["Contains"] = (
    non_vegan_product["Ingredients_list"]
    .map(contain_non_vegan_ingredients)
    .str.join(",")
)
non_vegan_product = non_vegan_product.drop(columns="Ingredients_list").reset_index(drop=True)
non_vegan_product
```

|  | Product | Description | Contains |
|---|---|---|---|
| 0 | Walkers Max Flamin Hot Crisps | Fiercely Flamin' Hot Flavour Ridged Potato Crisps | Milk,Whey |
| 1 | Smiths Frazzles Bacon Snacks | Crispy Bacon Flavour Corn Snack | Milk,Lactose,Whey |
| 2 | Sensations Thai Sweet Chilli | Thai Sweet Chilli Flavour Potato Crisps | Milk |
| 3 | Tesco 5 Pack Jam Doughnuts | Jam Doughnut 5PK | Milk |
| 4 | Krispy Kreme Original Glazed Doughnuts | Bring some light and fluffy joy into your day ... | Milk,Egg |
| 5 | Mcvities Dark Chocolate Digestives | Wheatmeal Biscuits Covered in Plain Chocolate | Milk |
| 6 | Tesco Jam Sandwich Creams Biscuit | Shortcake biscuits sandwiched with a vanilla f... | Milk |
| 7 | Cadbury Chocolate Fingers | Crisp Biscuits Covered with Cadbury Milk Choco... | Milk |
| 8 | Lindt Lindor 60% Dark Chocolate Truffles | Extra dark chocolate with a smooth melting fil... | Milk,Lactose |
| 9 | Lindt Excellence Mint Intense Dark Chocolate | Fine dark chocolate with an intense taste of mint | Milk |


In [12]:
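```python
# index=False keeps the DataFrame index out of the files, so they contain only the output columns
vegan_product.to_csv("./output/Week7_output_1.csv", index=False)
non_vegan_product.to_csv("./output/Week7_output_2.csv", index=False)
```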
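`to_csv` does not create missing folders, so writing to `./output/...` fails if that folder does not exist yet. A small helper, `write_output` (our own hypothetical name, not part of pandas), can create the folder first and drop the index. It is shown here writing to a temporary directory rather than the real output folder:

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_output(frame: pd.DataFrame, path: Path) -> Path:
    """Hypothetical helper: create the parent folder if needed, then write without the index."""
    path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(path, index=False)
    return path


# Demo in a throwaway directory rather than the real ./output folder
demo = pd.DataFrame({"Product": ["Plain Crisps"], "Description": ["Salted crisps"]})
with tempfile.TemporaryDirectory() as tmp:
    written = write_output(demo, Path(tmp) / "output" / "Week7_output_1.csv")
    round_trip = pd.read_csv(written)

print(round_trip.columns.tolist())  # ['Product', 'Description']
```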