# Do Longer Recipes Get Higher Ratings?

**Name(s)**: Casey So and Keilani Li

**Website Link**: https://keil4ni.github.io/recipe-analysis/

In [2]:
import pandas as pd
import numpy as np
from pathlib import Path

import plotly.express as px
pd.options.plotting.backend = 'plotly'

# from dsc80_utils import * # Feel free to uncomment and use this.

## Step 1: Introduction

When looking for a recipe online, one of the first things people notice, aside from the ingredients, is how long it takes to cook. Some users are looking for quick meals they can prepare in under 30 minutes, while others are willing to invest time in more complex dishes. But does the time required to cook a recipe actually affect how well it's rated?

This project explores the connection between cooking time and user ratings of recipes. The goal is to find out whether recipes that take longer to make tend to receive better ratings, or if users prefer faster, simpler options. To do this, we will be working with a dataset of recipes that includes details like total cooking time, ingredients, steps, and user ratings.

By analyzing these variables, we want to see if there's a pattern: do people reward effort with higher ratings, or do they value convenience more? The results might help explain what makes a recipe more appealing to home cooks, and whether time investment is actually reflected in how satisfied users are with the outcome.

## Data Sets


We are analyzing two datasets from Food.com, containing recipes and user ratings posted between 2008 and 2018. These datasets were originally compiled for a research paper on recommender systems titled "Generating Personalized Recipes from Historical User Preferences" by Majumder et al.

The first dataset, called recipes, includes 83,782 entries, each representing a unique recipe. It contains 12 columns that capture various attributes of each recipe:

      Column             | Description
      -------------------|------------------
      'name'             | Recipe name
      'id'               | Recipe ID
      'minutes'          | Minutes to prepare recipe
      'contributor_id'   | User ID who submitted this recipe
      'submitted'        | Date recipe was submitted
      'tags'             | Food.com tags for recipe
      'nutrition'        | Nutrition information in the form
                         | [calories (#), total fat (PDV), sugar (PDV), sodium (PDV), protein (PDV),
                         | saturated fat (PDV), carbohydrates (PDV)];
                         | PDV stands for "percentage of daily value"
      'n_steps'          | Number of steps in recipe
      'steps'            | Text for recipe steps, in order
      'description'      | User-provided description
      'ingredients'      | Text for recipe ingredients
      'n_ingredients'    | Number of ingredients in recipe
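
Since `'nutrition'` packs seven values into one field, it may be convenient to split it into separate columns. The sketch below assumes the column is read from the CSV as a string such as `"[138.4, 10.0, ...]"`; the two-row `sample` frame and the names in `nutrition_cols` are our own illustrative choices, not part of the dataset.

```python
import ast
import pandas as pd

# Hypothetical two-row sample shaped like the 'nutrition' column
sample = pd.DataFrame({
    'id': [333281, 453467],
    'nutrition': ['[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]',
                  '[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]'],
})

# Illustrative column names, in the order given in the table above
nutrition_cols = ['calories', 'total_fat_pdv', 'sugar_pdv', 'sodium_pdv',
                  'protein_pdv', 'saturated_fat_pdv', 'carbohydrates_pdv']

# literal_eval safely turns the string "[1.0, 2.0]" into a Python list
parsed = sample['nutrition'].apply(ast.literal_eval)
nutrition = pd.DataFrame(parsed.tolist(), columns=nutrition_cols,
                         index=sample.index)
sample = sample.join(nutrition)
```

After this, `sample['calories']` is a numeric column that can be plotted or aggregated directly.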

The second dataset, interactions, contains 731,927 entries, with each row representing a user's interaction with a specific recipe—typically a review or rating. This dataset helps capture user preferences and engagement over time. The columns included are:

      Column             | Description
      -------------------|------------------
      'user_id'          | User ID
      'recipe_id'        | Recipe ID
      'date'             | Date of interaction
      'rating'           | Rating given
      'review'           | Review text


In [3]:
# TODO

## Step 2: Data Cleaning and Exploratory Data Analysis

In [26]:
# read in recipes df
# recipes_path = Path('data') / 'RAW_recipes.csv'

# Colab path; comment this out and use the line above when running locally
recipes_path = Path('/content/RAW_recipes.csv')

recipes = pd.read_csv(recipes_path)
recipes.head()

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients
0,1 brownies in the world best ever,333281,40,985201,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9
1,1 in canada chocolate chip cookies,453467,45,1848091,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11
2,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9
3,millionaire pound cake,286009,120,461724,2008-02-12,"['time-to-make', 'course', 'cuisine', 'prepara...","[878.3, 63.0, 326.0, 13.0, 20.0, 123.0, 39.0]",7,"['freheat the oven to 300 degrees', 'grease a ...",why a millionaire pound cake? because it's su...,"['butter', 'sugar', 'eggs', 'all-purpose flour...",7
4,2000 meatloaf,475785,90,2202916,2012-03-06,"['time-to-make', 'course', 'main-ingredient', ...","[267.0, 30.0, 12.0, 12.0, 29.0, 48.0, 2.0]",17,"['pan fry bacon , and set aside on a paper tow...","ready, set, cook! special edition contest entr...","['meatloaf mixture', 'unsmoked bacon', 'goat c...",13


In [27]:
# read in interactions df
# interactions_path = Path('data') / 'interactions.csv'

# Colab path; comment this out and use the line above when running locally
interactions_path = Path('/content/interactions.csv')

interactions = pd.read_csv(interactions_path)
interactions.head()

Unnamed: 0,user_id,recipe_id,date,rating,review
0,1293707,40893,2011-12-21,5,"So simple, so delicious! Great for chilly fall..."
1,126440,85009,2010-02-27,5,I made the Mexican topping and took it to bunk...
2,57222,85009,2011-10-01,5,"Made the cheddar bacon topping, adding a sprin..."
3,124416,120345,2011-08-06,0,"Just an observation, so I will not rate. I fo..."
4,2000192946,120345,2015-05-10,2,This recipe was OVERLY too sweet. I would sta...


In [28]:
# merge recipes + interactions dfs
df = recipes.merge(interactions, how = 'left', left_on = 'id', right_on = 'recipe_id')
df.head()

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,user_id,recipe_id,date,rating,review
0,1 brownies in the world best ever,333281,40,985201,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9,386585.0,333281.0,2008-11-19,4.0,"These were pretty good, but took forever to ba..."
1,1 in canada chocolate chip cookies,453467,45,1848091,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11,424680.0,453467.0,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...
2,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,29782.0,306168.0,2008-12-31,5.0,This was one of the best broccoli casseroles t...
3,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,1196280.0,306168.0,2009-04-13,5.0,I made this for my son's first birthday party ...
4,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,768828.0,306168.0,2013-08-02,5.0,Loved this. Be sure to completely thaw the br...
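
A left merge keeps every recipe, including any with no interactions at all; those rows get `NaN` in the interaction columns. A minimal sketch on made-up frames (`recipes_demo` and `interactions_demo` are hypothetical) shows how the `indicator` option of `merge` can make such rows visible:

```python
import pandas as pd

# Hypothetical miniature versions of the two datasets
recipes_demo = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})
interactions_demo = pd.DataFrame({'recipe_id': [1, 1, 3],
                                  'rating': [5, 4, 0]})

merged = recipes_demo.merge(interactions_demo, how='left',
                            left_on='id', right_on='recipe_id',
                            indicator=True)

# 'left_only' marks recipes that had no matching interaction
unreviewed = merged[merged['_merge'] == 'left_only']
```

Here recipe 2 has no interaction, so it appears once in `merged` with a missing `recipe_id` and `rating`.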


In [8]:
# num of (rows, cols) after merging
# df.shape
# (234429, 17)

# num of nans before replacement
# df[df['rating'].isnull()].shape
# (1, 17)

# num of 0 ratings before replacement
# df[df['rating'] == 0.0].shape
# (15035, 17)

In [9]:
# fill all 0 ratings w np.nan
df = df.replace(0.0, np.nan)
df.head()

# num of nans after replacement
# df[df['rating'].isnull()].shape
# (15036, 17)

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,user_id,recipe_id,date,rating,review
0,1 brownies in the world best ever,333281,40.0,985201,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9,386585.0,333281.0,2008-11-19,4.0,"These were pretty good, but took forever to ba..."
1,1 in canada chocolate chip cookies,453467,45.0,1848091,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11,424680.0,453467.0,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...
2,412 broccoli casserole,306168,40.0,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,29782.0,306168.0,2008-12-31,5.0,This was one of the best broccoli casseroles t...
3,412 broccoli casserole,306168,40.0,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,1196280.0,306168.0,2009-04-13,5.0,I made this for my son's first birthday party ...
4,412 broccoli casserole,306168,40.0,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,768828.0,306168.0,2013-08-02,5.0,Loved this. Be sure to completely thaw the br...
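
Note that `df.replace(0.0, np.nan)` applies to every column, not only `rating`. That appears to be why `minutes` is displayed as a float (`40.0`) after this cell. If only zero ratings should be treated as missing, the replacement can be limited to that column. A minimal sketch on a made-up frame (`demo` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a zero in both columns
demo = pd.DataFrame({'minutes': [0, 40, 45], 'rating': [0, 5, 4]})

# Replace zeros only in 'rating'; 'minutes' keeps its integer dtype and its 0
demo['rating'] = demo['rating'].replace(0, np.nan)
```

The zero in `minutes` is left untouched, while the zero rating becomes `NaN`.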


In [10]:
# find avg rating per recipe
df['avg_rating'] = df.groupby('id')['rating'].transform('mean')
df.head()

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,user_id,recipe_id,date,rating,review,avg_rating
0,1 brownies in the world best ever,333281,40.0,985201,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9,386585.0,333281.0,2008-11-19,4.0,"These were pretty good, but took forever to ba...",4.0
1,1 in canada chocolate chip cookies,453467,45.0,1848091,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11,424680.0,453467.0,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...,5.0
2,412 broccoli casserole,306168,40.0,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,29782.0,306168.0,2008-12-31,5.0,This was one of the best broccoli casseroles t...,5.0
3,412 broccoli casserole,306168,40.0,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,1196280.0,306168.0,2009-04-13,5.0,I made this for my son's first birthday party ...,5.0
4,412 broccoli casserole,306168,40.0,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,768828.0,306168.0,2013-08-02,5.0,Loved this. Be sure to completely thaw the br...,5.0


In [11]:
# check if avg rating is correct for any recipe
# df[(df['id'] == 306168)]
# yes!
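
The same check can be done programmatically. The sketch below uses a small made-up frame (`demo` is hypothetical) to show that `transform('mean')` repeats the per-recipe mean on every row and skips `NaN` ratings, and that a plain `groupby(...).mean()` gives one value per recipe for cross-checking:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: recipe 1 has one missing rating, recipe 3 has none at all
demo = pd.DataFrame({'id': [1, 1, 2, 3],
                     'rating': [5.0, np.nan, 3.0, np.nan]})

# transform keeps one value per original row; NaN ratings are skipped by mean
demo['avg_rating'] = demo.groupby('id')['rating'].transform('mean')

# Aggregating collapses to one row per recipe, handy as a cross-check
per_recipe = demo.groupby('id')['rating'].mean()
```

A recipe whose ratings are all missing (recipe 3 here) ends up with a missing `avg_rating` as well.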

In [12]:
df.columns

Index(['name', 'id', 'minutes', 'contributor_id', 'submitted', 'tags',
       'nutrition', 'n_steps', 'steps', 'description', 'ingredients',
       'n_ingredients', 'user_id', 'recipe_id', 'date', 'rating', 'review',
       'avg_rating'],
      dtype='object')

In [13]:
# df[['id', 'recipe_id']]

# drop id bc its a dupe of recipe_id col. recipe_id is a more specific col name
# drop contributor_id bc it's unique, doesn't contrib to our analysis
# drop user_id bc we aren't looking at who commented
# drop date bc we aren't looking at when the comment was posted

df = df.drop(columns = ['id', 'contributor_id', 'user_id', 'date'])

In [14]:
# show cols in a more readable order (the result is not assigned, so df keeps its own column order)
df[['recipe_id', 'name', 'minutes', 'submitted', 'tags', 'nutrition', 'n_steps', 'steps', 'description', 'ingredients', 'n_ingredients', 'rating', 'review', 'avg_rating']]

# shape after dropping cols: (234429, 14)

Unnamed: 0,recipe_id,name,minutes,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,rating,review,avg_rating
0,333281.0,1 brownies in the world best ever,40.0,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9,4.0,"These were pretty good, but took forever to ba...",4.0
1,453467.0,1 in canada chocolate chip cookies,45.0,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11,5.0,Originally I was gonna cut the recipe in half ...,5.0
2,306168.0,412 broccoli casserole,40.0,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,5.0,This was one of the best broccoli casseroles t...,5.0
3,306168.0,412 broccoli casserole,40.0,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,5.0,I made this for my son's first birthday party ...,5.0
4,306168.0,412 broccoli casserole,40.0,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,5.0,Loved this. Be sure to completely thaw the br...,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
234424,308080.0,zydeco ya ya deviled eggs,40.0,2008-06-07,"['60-minutes-or-less', 'time-to-make', 'course...","[59.2, 6.0, 2.0, 3.0, 6.0, 5.0, 0.0]",7,"['in a bowl , combine the mashed yolks and may...","deviled eggs, cajun-style","['hard-cooked eggs', 'mayonnaise', 'dijon must...",8,5.0,These were very good. I meant to add some jala...,5.0
234425,298512.0,cookies by design cookies on a stick,29.0,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0]",9,['place melted butter in a large mixing bowl a...,"i've heard of the 'cookies by design' company,...","['butter', 'eagle brand condensed milk', 'ligh...",10,1.0,I would rate this a zero if I could. I followe...,1.0
234426,298509.0,cookies by design sugar shortbread cookies,20.0,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0]",5,"['whip sugar and shortening in a large bowl , ...","i've heard of the 'cookies by design' company,...","['granulated sugar', 'shortening', 'eggs', 'fl...",7,1.0,This recipe tastes nothing like the Cookies by...,3.0
234427,298509.0,cookies by design sugar shortbread cookies,20.0,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0]",5,"['whip sugar and shortening in a large bowl , ...","i've heard of the 'cookies by design' company,...","['granulated sugar', 'shortening', 'eggs', 'fl...",7,5.0,"yummy cookies, i love this recipe me and my sm...",3.0


In [15]:
df[['minutes']].sort_values(by = 'minutes', ascending = False)

Unnamed: 0,minutes
109932,1051200.0
109931,1051200.0
106700,288000.0
107394,259205.0
107395,129600.0
...,...
147501,1.0
147383,1.0
223950,
223951,


In [16]:
# investigating the first 2 longest recipes
print(df.iloc[109931])
df.iloc[109932]

# how to preserve a husband recipe

name                                     how to preserve a husband
minutes                                                  1051200.0
submitted                                               2011-02-01
tags             ['time-to-make', 'course', 'preparation', 'for...
nutrition                [407.4, 57.0, 50.0, 1.0, 7.0, 115.0, 5.0]
n_steps                                                          9
steps            ['be careful in your selection', "don't choose...
description      found this in a local wyoming cookbook "a coll...
ingredients                                     ['cream', 'peach']
n_ingredients                                                    2
recipe_id                                                 447963.0
rating                                                         5.0
review           I'd thought that I would like to keep mine in ...
avg_rating                                                     5.0
Name: 109931, dtype: object


Unnamed: 0,109932
name,how to preserve a husband
minutes,1051200.0
submitted,2011-02-01
tags,"['time-to-make', 'course', 'preparation', 'for..."
nutrition,"[407.4, 57.0, 50.0, 1.0, 7.0, 115.0, 5.0]"
n_steps,9
steps,"['be careful in your selection', ""don't choose..."
description,"found this in a local wyoming cookbook ""a coll..."
ingredients,"['cream', 'peach']"
n_ingredients,2


In [17]:
# investigating third longest recipe
df.iloc[106700]

# homemade fruit liquers

Unnamed: 0,106700
name,homemade fruit liquers
minutes,288000.0
submitted,2008-03-12
tags,"['time-to-make', 'course', 'main-ingredient', ..."
nutrition,"[836.2, 0.0, 333.0, 0.0, 0.0, 0.0, 27.0]"
n_steps,12
steps,"['rinse the fruit or berries , fruit must be c..."
description,this should be a nice easy project for those w...
ingredients,"['berries', 'vodka', 'granulated sugar']"
n_ingredients,3


In [33]:
df = df.drop([109931, 109932])
df.head()

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,user_id,recipe_id,date,rating,review
0,1 brownies in the world best ever,333281,40,985201,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9,386585.0,333281.0,2008-11-19,4.0,"These were pretty good, but took forever to ba..."
1,1 in canada chocolate chip cookies,453467,45,1848091,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11,424680.0,453467.0,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...
2,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,29782.0,306168.0,2008-12-31,5.0,This was one of the best broccoli casseroles t...
3,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,1196280.0,306168.0,2009-04-13,5.0,I made this for my son's first birthday party ...
4,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,768828.0,306168.0,2013-08-02,5.0,Loved this. Be sure to completely thaw the br...
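
`df.iloc[...]` selects by position, while `df.drop([...])` removes rows by index label. The two agree in this notebook only because the merged frame still has its default 0, 1, 2, ... index. A sketch of an alternative that does not rely on this, using a boolean mask on a made-up frame (`demo` is hypothetical):

```python
import pandas as pd

# Hypothetical frame with non-default labels, so labels != positions
demo = pd.DataFrame(
    {'name': ['quick salad', 'how to preserve a husband', 'pound cake'],
     'minutes': [10, 1051200, 120]},
    index=[7, 42, 99],
)

# Boolean mask: keep every row except the named joke recipe
demo_clean = demo[demo['name'] != 'how to preserve a husband']
```

Filtering by name (or by a threshold on `minutes`) also removes every review row of that recipe at once, however many there are.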


### Univariate Analysis

In [36]:
'''
univar analysis
look at rating distribution, histogram
'''
fig = px.histogram(df, x = 'rating')
title = 'Distribution of Recipe Ratings'
fig.update_layout(title = title)
fig.show()

In [43]:
'''
univar analysis
look at minutes distribution, histogram
'''

bins = [0, 360, 720, 1080, 1440]  # 0–6hr, 6–12hr, 12–18hr, 18–24hr
labels = ['0–6 hrs', '6–12 hrs', '12–18 hrs', '18–24 hrs']

# bin data
filter_minutes = df.copy()
filter_minutes['minutes_to_day'] = pd.cut(filter_minutes['minutes'], bins = bins, labels = labels)

fig2 = px.histogram(filter_minutes,
                    x = 'minutes_to_day',
                    labels = {'minutes_to_day': 'recipe time length'},
                    title = 'Recipe Count based on Length of Recipe')
fig2.show()
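
One thing to keep in mind with `pd.cut`: values outside the outermost bin edges are assigned `NaN`, so recipes longer than 24 hours do not appear in any of the four bins. A small sketch with made-up values (plain hyphens are used in the labels here for simplicity):

```python
import pandas as pd

# Hypothetical cooking times in minutes; 2000 is longer than 24 hours
minutes = pd.Series([5, 360, 361, 1440, 2000])
bins = [0, 360, 720, 1080, 1440]
labels = ['0-6 hrs', '6-12 hrs', '12-18 hrs', '18-24 hrs']

# Intervals are closed on the right by default: (0, 360], (360, 720], ...
binned = pd.cut(minutes, bins=bins, labels=labels)

# dropna=False makes the out-of-range values visible as NaN
counts = binned.value_counts(dropna=False)
```

Because the intervals are open on the left, a value of exactly 0 would also fall outside the first bin unless `include_lowest=True` is passed.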

### Bivariate Analysis

In [40]:
'''
bivariate analysis
'''
# Scatterplot version -KC
fig3 = px.scatter(df, x='minutes', y='rating',
                 title='Scatterplot of Cooking Time vs. Rating',
                 labels={'minutes': 'Cooking Time (minutes)', 'rating': 'User Rating'},
                 opacity=0.4)
fig3.show()

In [42]:
# Boxplot version (not sure which one is better you pick -KC)
# np.inf as the top edge so recipes longer than 9999 minutes still land in '120+'
df['time_bin'] = pd.cut(df['minutes'], bins=[0, 15, 30, 60, 120, np.inf],
                        labels=['<15', '15–30', '30–60', '60–120', '120+'])

fig4 = px.box(df, x='time_bin', y='rating',
             title='Recipe Rating by Cooking Time Range',
             labels={'time_bin': 'Cooking Time (Minutes)', 'rating': 'User Rating'},
             category_orders={'time_bin': ['<15', '15–30', '30–60', '60–120', '120+']})
fig4.show()
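
The box plot can be complemented with a numeric summary per time range. The sketch below is one possible way to do this on made-up data (`demo` is hypothetical); it uses `np.inf` as the top bin edge so that very long recipes fall into the last bin.

```python
import numpy as np
import pandas as pd

# Hypothetical cooking times and ratings
demo = pd.DataFrame({'minutes': [10, 20, 45, 90, 300, 20000],
                     'rating':  [5, 4, 5, 3, 4, 5]})

demo['time_bin'] = pd.cut(demo['minutes'],
                          bins=[0, 15, 30, 60, 120, np.inf],
                          labels=['<15', '15-30', '30-60', '60-120', '120+'])

# Count, mean and median rating within each cooking-time range
summary = (demo.groupby('time_bin', observed=False)['rating']
               .agg(['count', 'mean', 'median']))
```

Reading `count` next to `mean` helps judge how much weight each bin's average deserves.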

## Step 3: Assessment of Missingness

In [22]:
# Count the missing values in each column
missing_recipes = df.isnull().sum().sort_values(ascending=False)
missing_recipes

Unnamed: 0,0
rating,15036
avg_rating,2777
description,114
review,58
minutes,3
name,1
recipe_id,1
nutrition,0
tags,0
submitted,0


Not every recipe in the dataset has a rating, so it’s important to think about why some ratings are missing.

In theory, there are a few possible reasons:

- Missing completely at random (MCAR): the missingness is unrelated to any data, for example a glitch that caused some ratings not to be recorded. That’s unlikely here.

- Missing at random (MAR): the missingness depends on other columns we can observe. For example, people may be less likely to rate very long or complicated recipes.

- Not missing at random (NMAR): the missingness depends on the unobserved rating itself, for example someone tries a recipe, dislikes it, and decides not to leave a rating.

- Missing by design: the missingness can be determined exactly from other columns.

In this case, the missing ratings are most likely Not Missing At Random (NMAR). People are generally more likely to leave a rating if they either loved a recipe or really disliked it. If a recipe was just okay, or they didn’t finish making it, they might not rate it at all. So the missing ratings may reflect lower satisfaction, but because we cannot see those ratings, we cannot confirm this.
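
NMAR cannot be verified from the data alone, but whether missingness depends on an observed column (the MAR case) can be checked with a permutation test. The sketch below runs on synthetic data, not the Food.com data: `n_steps`, the missingness mechanism, and the number of repetitions are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 500 recipes with 1-29 steps
n = 500
n_steps = rng.integers(1, 30, size=n)

# Hypothetical mechanism: ratings go missing more often for recipes with many steps
p_missing = np.where(n_steps > 15, 0.5, 0.05)
is_missing = rng.random(n) < p_missing

def mean_diff(values, mask):
    """Mean of values where rating is missing minus mean where it is present."""
    return values[mask].mean() - values[~mask].mean()

observed = mean_diff(n_steps, is_missing)

# Shuffle the missingness labels to simulate "missingness unrelated to n_steps"
n_reps = 1000
stats = np.array([mean_diff(n_steps, rng.permutation(is_missing))
                  for _ in range(n_reps)])

# Two-sided p-value: share of shuffles at least as extreme as the observed gap
p_value = (np.abs(stats) >= abs(observed)).mean()
```

On the real data, `is_missing` would be something like `df['rating'].isna().to_numpy()` and `n_steps` the corresponding column. A small p-value would suggest that missingness depends on that column, without ruling out NMAR.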

### Handling Missingness

Since our project relies on ratings, we remove the rows that do not have a rating. This also removes the missing values in average rating, which occur only for recipes with no ratings at all, and keeps unrated rows out of the analysis.

In [25]:
# Remove rows with a missing rating
df = df.dropna(subset=['rating'])
df

Unnamed: 0,name,minutes,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,recipe_id,rating,review,avg_rating
0,1 brownies in the world best ever,40.0,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9,333281.0,4.0,"These were pretty good, but took forever to ba...",4.0
1,1 in canada chocolate chip cookies,45.0,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11,453467.0,5.0,Originally I was gonna cut the recipe in half ...,5.0
2,412 broccoli casserole,40.0,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,306168.0,5.0,This was one of the best broccoli casseroles t...,5.0
3,412 broccoli casserole,40.0,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,306168.0,5.0,I made this for my son's first birthday party ...,5.0
4,412 broccoli casserole,40.0,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,306168.0,5.0,Loved this. Be sure to completely thaw the br...,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
234423,zydeco ya ya deviled eggs,40.0,2008-06-07,"['60-minutes-or-less', 'time-to-make', 'course...","[59.2, 6.0, 2.0, 3.0, 6.0, 5.0, 0.0]",7,"['in a bowl , combine the mashed yolks and may...","deviled eggs, cajun-style","['hard-cooked eggs', 'mayonnaise', 'dijon must...",8,308080.0,5.0,"I halved the recipe, and they turned out great...",5.0
234424,zydeco ya ya deviled eggs,40.0,2008-06-07,"['60-minutes-or-less', 'time-to-make', 'course...","[59.2, 6.0, 2.0, 3.0, 6.0, 5.0, 0.0]",7,"['in a bowl , combine the mashed yolks and may...","deviled eggs, cajun-style","['hard-cooked eggs', 'mayonnaise', 'dijon must...",8,308080.0,5.0,These were very good. I meant to add some jala...,5.0
234425,cookies by design cookies on a stick,29.0,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0]",9,['place melted butter in a large mixing bowl a...,"i've heard of the 'cookies by design' company,...","['butter', 'eagle brand condensed milk', 'ligh...",10,298512.0,1.0,I would rate this a zero if I could. I followe...,1.0
234426,cookies by design sugar shortbread cookies,20.0,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0]",5,"['whip sugar and shortening in a large bowl , ...","i've heard of the 'cookies by design' company,...","['granulated sugar', 'shortening', 'eggs', 'fl...",7,298509.0,1.0,This recipe tastes nothing like the Cookies by...,3.0


## Step 4: Hypothesis Testing

In [None]:
# TODO

## Step 5: Framing a Prediction Problem

In [None]:
# TODO

## Step 6: Baseline Model

In [None]:
# TODO

## Step 7: Final Model

In [None]:
# TODO

## Step 8: Fairness Analysis

In [None]:
# TODO