When teaching at The Data School, I love it when people ask a question that spurs a Preppin' Data challenge. 

This week we'll use the same input dataset as we first used in the early weeks of 2022. This time we're looking at adding in the school term dates to further the initial birthday cake challenge. If you want to go back and do the original challenge then there is a link here.

This challenge is based on the School Year dates of schools in the county of Essex in the UK. 'Terms' describe when children go to school.

### Inputs
There are two inputs this week. One dates back to the 2022 challenge and a new data set to enable you to complete the challenge:

2022 Week 2 Input
2024 Week 33 Input

### Requirements
 - Determine each student's birthday for this school year
 - Input the data sets 
 - The school year starts on 1st September and ends on 31st August the following year
 - Create a Cake Day field that shows when the school needs to buy each student a cake to celebrate their birthday. The Cake Day rules are:
    -   If you birthday falls on a school day, you will receive cake on that day
    -   If your birthday falls on a Saturday or Sunday, you will receive a cake on the Friday before that weekend
    -   You might notice the 1st Sept is a weekend day so those kids will have received their birthday cake in the previous school year.
    -   If your birthday falls on a day during a holiday, you will receive your cake on the last Friday in the previous school term
 - Count how many cakes are needed for each school day and what day of the week that day is. 
 - Output the data set

### Output
3 fields:
Cake Needed On Date
Cake Weekday
Count of Cakes
186 rows (187 rows incl. header)
You can view the output here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter or LinkedIn using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 



In [1]:
import pandas as pd
import numpy as np
import os
import datetime as dt

In [2]:
# Input data
student_df = pd.read_csv('PD 2022 Wk 1 Input.csv')
term_df = pd.read_csv('PD 2024 Wk 33 Input.csv')

In [3]:
# Students clean df
student_df_copy=student_df.copy()
keep_clm=['id', 'Date of Birth']
student_df_copy=student_df_copy[keep_clm]
student_df_copy.head()
student_df_copy.columns=[i.strip().lower().replace(' ','_') for i in student_df_copy.columns]
student_df_copy.date_of_birth=pd.to_datetime(student_df_copy.date_of_birth)

In [4]:

# Solve for school year Start 01 Sep 2024 end 31 Aug 2025
reference_date = pd.to_datetime('2024-09-01')

# Function to calculate the next birthday after the reference date
def next_birthday(dob, ref_date):
    # Extract month and day
    month_day = dob.replace(year=ref_date.year)
    
    # If the birthday this year is before or on the reference date, use next year
    if month_day < ref_date:
        month_day = month_day.replace(year=ref_date.year + 1)
    
    return month_day

# Apply the function to calculate the next birthday
student_df_copy['date_of_birth'] = student_df_copy['date_of_birth'].apply(next_birthday, ref_date=reference_date)

In [5]:

# Get DOW
student_df_copy['dow']=student_df_copy.date_of_birth.dt.day_name()

# Solve for weekends
student_df_copy['date_of_birth']=np.where(student_df_copy['dow']=="Saturday",
         student_df_copy.date_of_birth - pd.Timedelta(days=1),
            np.where(student_df_copy['dow']=="Sunday",
            student_df_copy.date_of_birth - pd.Timedelta(days=2),
           student_df_copy.date_of_birth))

# Update DOW
student_df_copy['dow']=student_df_copy.date_of_birth.dt.day_name()

student_df_copy=student_df_copy.sort_values('date_of_birth').reset_index(drop=True)

# Solve for students that are outide the school year
student_df_copy=student_df_copy[(student_df_copy.date_of_birth>=reference_date)&
                (student_df_copy.date_of_birth.dt.date<=dt.date(2025,8,31))].reset_index(drop=True)


In [6]:
# Term clean df
term_df_copy=term_df.copy()
term_df_copy.columns = [i.lower().strip().replace(' ','_') for i in term_df_copy.columns]

term_df_copy.starts=pd.to_datetime(term_df_copy.starts)
term_df_copy.ends=pd.to_datetime(term_df_copy.ends)

# Add start date of next term
term_df_copy['start_of_semester']=term_df_copy.starts.shift(-1)


term_df_copy.start_of_semester=term_df_copy.start_of_semester.fillna(dt.date(2025,8,31))
term_df_copy.start_of_semester=pd.to_datetime(term_df_copy.start_of_semester)


In [7]:
# Join DF

# Solve for terms

def evaluate_date(date, terms_df):
    for _, row in terms_df.iterrows():
        # Check if the date falls between start_date_above and ends
        row_ends,row_starts=row['ends'],row['start_of_semester']
        if pd.Timestamp(date)>=row['ends'] and pd.Timestamp(date) < row['start_of_semester']:
            # Return the Friday before the 'ends' date
            date= row['ends'] - pd.DateOffset(days=row['ends'].weekday() + 2)
    # If no matching term is found, return the original date
    return date,row_ends,row_starts


In [9]:

# Apply the function to the dates
student_df_copy['date_of_birth_adjusted']=student_df_copy['date_of_birth'].apply(lambda x: evaluate_date(x, term_df_copy))
# student_df_copy['end_of_semester']=student_df_copy['date_of_birth'].apply(lambda x: evaluate_date(x, term_df_copy)[1])
# student_df_copy['start_of_semester']=student_df_copy['date_of_birth'].apply(lambda x: evaluate_date(x, term_df_copy)[2])

# print(dates_df)



In [10]:
student_df_copy['end_of_semester']=student_df_copy['date_of_birth_adjusted'].apply(lambda x:x[1])
student_df_copy['start_of_semester']=student_df_copy['date_of_birth_adjusted'].apply(lambda x:x[2])
student_df_copy['date_of_birth_adjusted']=student_df_copy['date_of_birth_adjusted'].apply(lambda x:x[0])

In [11]:
student_df_copy[student_df_copy['date_of_birth']!=student_df_copy['date_of_birth_adjusted']]

Unnamed: 0,id,date_of_birth,dow,date_of_birth_adjusted,end_of_semester,start_of_semester
143,282,2024-10-25,Friday,2024-10-19,2025-08-29,2025-08-31
144,41,2024-10-25,Friday,2024-10-19,2025-08-29,2025-08-31
145,718,2024-10-25,Friday,2024-10-19,2025-08-29,2025-08-31
146,955,2024-10-25,Friday,2024-10-19,2025-08-29,2025-08-31
147,213,2024-10-25,Friday,2024-10-19,2025-08-29,2025-08-31
...,...,...,...,...,...,...
993,850,2025-08-29,Friday,2025-08-23,2025-08-29,2025-08-31
994,771,2025-08-29,Friday,2025-08-23,2025-08-29,2025-08-31
995,447,2025-08-29,Friday,2025-08-23,2025-08-29,2025-08-31
996,389,2025-08-29,Friday,2025-08-23,2025-08-29,2025-08-31
