# Do Longer Recipes Get Higher Ratings?

**Name(s)**: Keilani Li, Casey So

**Website Link**: https://keil4ni.github.io/recipe-analysis/

In [21]:
import pandas as pd
import numpy as np
from pathlib import Path

import plotly.express as px
pd.options.plotting.backend = 'plotly'

from dsc80_utils import *  # course-provided helper utilities

## Step 1: Introduction

In [22]:
# TODO

## Step 2: Data Cleaning and Exploratory Data Analysis

In [23]:
# read in recipes df
recipes_path = Path('data') / 'RAW_recipes.csv'
recipes = pd.read_csv(recipes_path)
recipes.head()

Unnamed: 0,name,id,minutes,contributor_id,...,steps,description,ingredients,n_ingredients
0,1 brownies in the world best ever,333281,40,985201,...,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9
1,1 in canada chocolate chip cookies,453467,45,1848091,...,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11
2,412 broccoli casserole,306168,40,50969,...,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9
3,millionaire pound cake,286009,120,461724,...,"['freheat the oven to 300 degrees', 'grease a ...",why a millionaire pound cake? because it's su...,"['butter', 'sugar', 'eggs', 'all-purpose flour...",7
4,2000 meatloaf,475785,90,2202916,...,"['pan fry bacon , and set aside on a paper tow...","ready, set, cook! special edition contest entr...","['meatloaf mixture', 'unsmoked bacon', 'goat c...",13


In [24]:
# read in interactions df
interactions_path = Path('data') / 'interactions.csv'
interactions = pd.read_csv(interactions_path)
interactions.head()

Unnamed: 0,user_id,recipe_id,date,rating,review
0,1293707,40893,2011-12-21,5,"So simple, so delicious! Great for chilly fall..."
1,126440,85009,2010-02-27,5,I made the Mexican topping and took it to bunk...
2,57222,85009,2011-10-01,5,"Made the cheddar bacon topping, adding a sprin..."
3,124416,120345,2011-08-06,0,"Just an observation, so I will not rate. I fo..."
4,2000192946,120345,2015-05-10,2,This recipe was OVERLY too sweet. I would sta...


In [77]:
# left-merge recipes with interactions so recipes with no reviews are kept
# (their interaction columns, including rating, are NaN)
df = recipes.merge(interactions, how='left', left_on='id', right_on='recipe_id')
df.head()

Unnamed: 0,name,id,minutes,contributor_id,...,recipe_id,date,rating,review
0,1 brownies in the world best ever,333281,40,985201,...,333281.0,2008-11-19,4.0,"These were pretty good, but took forever to ba..."
1,1 in canada chocolate chip cookies,453467,45,1848091,...,453467.0,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...
2,412 broccoli casserole,306168,40,50969,...,306168.0,2008-12-31,5.0,This was one of the best broccoli casseroles t...
3,412 broccoli casserole,306168,40,50969,...,306168.0,2009-04-13,5.0,I made this for my son's first birthday party ...
4,412 broccoli casserole,306168,40,50969,...,306168.0,2013-08-02,5.0,Loved this. Be sure to completely thaw the br...
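
A left merge keeps recipes that have no matching row in `interactions`. A minimal sketch of how such recipes could be counted, using small toy tables with made-up values rather than the real data, and assuming the `indicator` option of `DataFrame.merge` is acceptable here:

```python
import pandas as pd

# Toy stand-ins for the recipes and interactions tables (hypothetical values).
toy_recipes = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})
toy_interactions = pd.DataFrame({'recipe_id': [1, 1, 3], 'rating': [5, 0, 4]})

# indicator=True adds a `_merge` column recording where each row came from.
toy_merged = toy_recipes.merge(
    toy_interactions, how='left', left_on='id', right_on='recipe_id', indicator=True
)

# Rows marked 'left_only' are recipes with no matching interaction.
unreviewed = toy_merged[toy_merged['_merge'] == 'left_only']
n_unreviewed = len(unreviewed)
```

On the real `df`, the same filter would list the recipes whose `rating` is missing because nobody reviewed them, as opposed to ratings that were recorded as 0.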


In [78]:
# num of (rows, cols)
df.shape

(234429, 17)

In [79]:
# num of nans before replacement
df[df['rating'].isnull()].shape

(1, 17)

In [80]:
# num of 0 ratings before replacement
df[df['rating'] == 0.0].shape

(15035, 17)

In [81]:
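# num of nans after replacement (0 ratings are treated as missing)
# note: df.replace acts on every column, so zeros in other columns are
# replaced too (minutes is displayed as float from here on)
df = df.replace(0.0, np.nan)
df[df['rating'].isnull()].shape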

(15036, 17)
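
`DataFrame.replace` applies to every column, not only `rating`. A small sketch on a toy frame (hypothetical values) contrasting the frame-wide replacement with one scoped to the `rating` column, in case zeros in other columns should be preserved:

```python
import numpy as np
import pandas as pd

# Toy frame with a 0 in both columns (hypothetical values).
toy = pd.DataFrame({'minutes': [0, 40, 45], 'rating': [0, 5, 4]})

# Frame-wide replace: zeros in *every* column become NaN.
replaced_everywhere = toy.replace(0, np.nan)

# Column-scoped replace: only `rating` is touched; `minutes` keeps its 0.
scoped = toy.copy()
scoped['rating'] = scoped['rating'].replace(0, np.nan)

nan_counts_everywhere = replaced_everywhere.isna().sum().to_dict()
nan_counts_scoped = scoped.isna().sum().to_dict()
```

The count of missing ratings is the same either way; the difference is only in what happens to the other columns.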

In [82]:
# find avg rating per recipe
df['avg_rating'] = df.groupby('id')['rating'].transform('mean')
df

Unnamed: 0,name,id,minutes,contributor_id,...,date,rating,review,avg_rating
0,1 brownies in the world best ever,333281,40.0,985201,...,2008-11-19,4.0,"These were pretty good, but took forever to ba...",4.0
1,1 in canada chocolate chip cookies,453467,45.0,1848091,...,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...,5.0
2,412 broccoli casserole,306168,40.0,50969,...,2008-12-31,5.0,This was one of the best broccoli casseroles t...,5.0
...,...,...,...,...,...,...,...,...,...
234426,cookies by design sugar shortbread cookies,298509,20.0,506822,...,2008-06-19,1.0,This recipe tastes nothing like the Cookies by...,3.0
234427,cookies by design sugar shortbread cookies,298509,20.0,506822,...,2010-02-08,5.0,"yummy cookies, i love this recipe me and my sm...",3.0
234428,cookies by design sugar shortbread cookies,298509,20.0,506822,...,2014-11-01,,I work at a Cookies By Design and can say this...,3.0


In [83]:
# spot-check avg_rating for a single recipe
df[df['id'] == 306168]

Unnamed: 0,name,id,minutes,contributor_id,...,date,rating,review,avg_rating
2,412 broccoli casserole,306168,40.0,50969,...,2008-12-31,5.0,This was one of the best broccoli casseroles t...,5.0
3,412 broccoli casserole,306168,40.0,50969,...,2009-04-13,5.0,I made this for my son's first birthday party ...,5.0
4,412 broccoli casserole,306168,40.0,50969,...,2013-08-02,5.0,Loved this. Be sure to completely thaw the br...,5.0
5,412 broccoli casserole,306168,40.0,50969,...,2017-10-17,5.0,"5 stars from my husband and son, my toughest c...",5.0
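
A recipe whose ratings are all 5 cannot reveal a wrong average, so a broader check may be useful. The sketch below, on toy data with made-up ids and ratings, compares `transform('mean')` against an independently computed per-recipe mean; it also shows that NaN ratings are skipped when averaging:

```python
import numpy as np
import pandas as pd

# Toy ratings: recipe 10 has one missing rating (hypothetical values).
toy = pd.DataFrame({
    'id': [10, 10, 10, 20],
    'rating': [1.0, 5.0, np.nan, 4.0],
})

# transform('mean') broadcasts each group's mean back onto its rows;
# NaN ratings are skipped, so recipe 10 averages (1 + 5) / 2 = 3.
toy['avg_rating'] = toy.groupby('id')['rating'].transform('mean')

# Independent check: aggregate per id, then map the result back onto the rows.
per_recipe = toy.groupby('id')['rating'].mean()
matches = np.allclose(toy['avg_rating'], toy['id'].map(per_recipe))
```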


In [84]:
df.columns

Index(['name', 'id', 'minutes', 'contributor_id', 'submitted', 'tags',
       'nutrition', 'n_steps', 'steps', 'description', 'ingredients',
       'n_ingredients', 'user_id', 'recipe_id', 'date', 'rating', 'review',
       'avg_rating'],
      dtype='object')

In [85]:
# df[['id', 'recipe_id']]

# drop id because it duplicates recipe_id, which is the more specific column name
# (caveat: recipe_id is NaN for a recipe with no reviews, since the merge was a left join)
# drop contributor_id because our analysis does not use it
df = df.drop(columns=['id', 'contributor_id'])

In [86]:
df

Unnamed: 0,name,minutes,submitted,tags,...,date,rating,review,avg_rating
0,1 brownies in the world best ever,40.0,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...",...,2008-11-19,4.0,"These were pretty good, but took forever to ba...",4.0
1,1 in canada chocolate chip cookies,45.0,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...",...,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...,5.0
2,412 broccoli casserole,40.0,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...",...,2008-12-31,5.0,This was one of the best broccoli casseroles t...,5.0
...,...,...,...,...,...,...,...,...,...
234426,cookies by design sugar shortbread cookies,20.0,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...",...,2008-06-19,1.0,This recipe tastes nothing like the Cookies by...,3.0
234427,cookies by design sugar shortbread cookies,20.0,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...",...,2010-02-08,5.0,"yummy cookies, i love this recipe me and my sm...",3.0
234428,cookies by design sugar shortbread cookies,20.0,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...",...,2014-11-01,,I work at a Cookies By Design and can say this...,3.0


In [89]:
# TODO: reorder the columns for better readability, putting recipe_id near the front
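
One possible way to do the reordering noted above, sketched on a toy frame. The choice of which columns go first (`front`) is an assumption for illustration, not a decision made in this notebook:

```python
import pandas as pd

# Toy frame with a few of the merged columns (hypothetical values).
toy = pd.DataFrame({'name': ['a'], 'minutes': [40], 'recipe_id': [1], 'rating': [5.0]})

# Columns to move to the front, in the desired order (hypothetical choice).
front = ['recipe_id', 'name']

# Keep every remaining column in its original relative order.
rest = [col for col in toy.columns if col not in front]
toy = toy[front + rest]
new_order = list(toy.columns)
```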

In [87]:
# univariate analysis

In [44]:
# bivariate analysis

In [45]:
# interesting aggregates

## Step 3: Assessment of Missingness

In [None]:
# TODO

## Step 4: Hypothesis Testing

In [None]:
# TODO

## Step 5: Framing a Prediction Problem

In [None]:
# TODO

## Step 6: Baseline Model

In [None]:
# TODO

## Step 7: Final Model

In [None]:
# TODO

## Step 8: Fairness Analysis

In [None]:
# TODO