In [174]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Tidy Tuesday: Recipe
**September 16, 2025**

Goals for today:
* Which authors are most successful: who is most prolific, who has the highest average ratings or popularity, and do top authors specialize by cuisine, ingredient, or recipe length?
* Is there a relationship between prep/cook time and average rating?
* Which recipe categories or cuisines tend to have the highest average ratings and review counts?
* Which recipes are the most "actionable" — high rating with low total time?

## Data Preparation

In [175]:
# Load in datasets
all_recipes = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-09-16/all_recipes.csv')
cuisines = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-09-16/cuisines.csv')

In [176]:
# Comment out to view data
#all_recipes
#cuisines

For simplicity, I will combine the two datasets into one.

In [177]:
cuisines2 = cuisines.copy() # Create an actual copy of the original cuisines (plain assignment only aliases it)
cuisines2 = cuisines2.drop(['country', 'url'], axis=1) # Remove unneeded cols

all_recipes2 = all_recipes.drop(['url'], axis=1) # Create a copy and remove unneeded col

Append / concatenate the two dataframes.

In [178]:
complete = pd.concat([all_recipes2, cuisines2]) # Concatenate
complete = complete.drop_duplicates() # Drop the 953 duplicates
complete.shape

(15593, 15)

We get **15,593 total recipes** in the `complete` dataset.
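As a sanity check on the duplicate count, the same concatenate-then-deduplicate step can be sketched on toy data. The two small frames and their column names below are illustrative stand-ins for `all_recipes2` and `cuisines2`, not the real data; the point is only that the number of dropped rows equals `duplicated().sum()` on the stacked frame.

```python
import pandas as pd

# Toy stand-ins for all_recipes2 and cuisines2 (hypothetical rows and columns)
a = pd.DataFrame({"name": ["Pad Thai", "Tacos", "Ramen"],
                  "author": ["Kim", "Ana", "Kim"]})
b = pd.DataFrame({"name": ["Tacos", "Pho"],
                  "author": ["Ana", "Lee"]})

stacked = pd.concat([a, b], ignore_index=True)

# duplicated() flags every row that is identical to an earlier row
n_dupes = int(stacked.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each distinct row
deduped = stacked.drop_duplicates().reset_index(drop=True)

print(f"{len(stacked)} stacked rows, {n_dupes} duplicates, {len(deduped)} kept")
```

Note that `drop_duplicates()` with no arguments only removes rows that match on every column, so two copies of a recipe that differ in any field would both survive.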

## Data Analysis
Answering the questions.

### Authors

In [179]:
complete.author.describe()

count              15593
unique              8982
top       John Mitzewich
freq                 716
Name: author, dtype: object

In [180]:
complete.author.value_counts()

author
John Mitzewich       716
Nicole McLaughlin    274
TheDailyGourmet      149
Allrecipes Member    147
Kim                  138
                    ... 
Diana Penning          1
iheartcooking          1
Teri Lynn              1
KSU_brett              1
NIKKIJM                1
Name: count, Length: 8982, dtype: int64

John Mitzewich has the most recipes by far (716, versus 274 for the next author).\
There are 8,981 other authors listed. To evaluate author ratings and popularity, I will ignore authors who have posted fewer than 10 recipes.

In [193]:
counts = complete.author.value_counts() # Recipes per author, computed once
print(f'There are {(counts < 10).sum()} authors who have posted fewer than 10 recipes.')
print(f'{(counts >= 10).sum()} authors have posted 10 or more recipes.')

There are 8890 authors who have posted fewer than 10 recipes.
92 authors have posted 10 or more recipes.
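One way to apply this cutoff to the recipe rows themselves, rather than to the author counts, is to map each row's author to their recipe count and filter on it. This is a minimal sketch on toy data; the frame, the `name` column, and the `MIN_RECIPES` constant are assumptions for illustration (the notebook's threshold is 10).

```python
import pandas as pd

# Toy recipe table (hypothetical authors and recipe names)
toy = pd.DataFrame({
    "author": ["A", "A", "A", "B", "C", "C"],
    "name": ["r1", "r2", "r3", "r4", "r5", "r6"],
})

MIN_RECIPES = 2  # illustrative threshold; the notebook uses 10

# Look up each row's author in the per-author counts
recipes_per_author = toy["author"].map(toy["author"].value_counts())

# Keep only rows whose author meets the threshold
prolific = toy[recipes_per_author >= MIN_RECIPES]

print(prolific["author"].value_counts())
```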


In [218]:
names = complete.author.value_counts().rename_axis('author').reset_index(name='recipe count') # Convert value_counts to dataframe
less10 = names[names['recipe count'] < 10].reset_index(drop=True) # Keep just the authors with fewer than 10 recipes
less10

| | author | recipe count |
|---|---|---|
| 0 | Lisawas | 9 |
| 1 | Brenda | 9 |
| 2 | Amanda | 9 |
| 3 | Kevin Ryan | 9 |
| 4 | Jessica | 9 |
| ... | ... | ... |
| 8885 | Diana Penning | 1 |
| 8886 | iheartcooking | 1 |
| 8887 | Teri Lynn | 1 |
| 8888 | KSU_brett | 1 |


In [219]:
less10.loc[1]

author          Brenda
recipe count         9
Name: 1, dtype: object
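Once the low-volume authors are set aside, the per-author comparison of ratings and popularity could be sketched roughly as follows. The frame is toy data, and the column names `avg_rating` and `total_ratings` are assumptions about the dataset's schema, so check them against `complete.columns` before reusing this.

```python
import pandas as pd

# Toy frame standing in for the recipes of prolific authors only.
# Column names avg_rating / total_ratings are assumed, not confirmed.
prolific = pd.DataFrame({
    "author": ["A", "A", "B", "B", "B"],
    "avg_rating": [4.5, 4.0, 4.8, 4.6, 5.0],
    "total_ratings": [100, 50, 10, 20, 30],
})

# One row per author: recipe count, mean rating, and total ratings received
summary = (
    prolific.groupby("author")
    .agg(
        recipes=("avg_rating", "size"),
        mean_rating=("avg_rating", "mean"),
        ratings_received=("total_ratings", "sum"),
    )
    .sort_values("mean_rating", ascending=False)
)

print(summary)
```

Sorting by `mean_rating` answers "highest average rating", while sorting by `ratings_received` would be one proxy for popularity; the two rankings need not agree.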