# Examples of Split-Apply-Combine

```python
# First, load all our datasets. Run this cell.
import pandas as pd

# Example 1: Restaurant Revenue
data1 = {'Restaurant': ['A', 'A', 'A', 'B', 'B', 'B'],
         'DayOfWeek': ['Monday', 'Monday', 'Tuesday', 'Monday', 'Tuesday', 'Tuesday'],
         'Transactions': [10.00, 11.00, 9.00, 9.00, 8.50, 5.00]}
df1 = pd.DataFrame(data1)

# Example 2: Test Scores by Class
data2 = {'Student': ['Alice', 'Bob', 'Alice', 'Bob'],
         'Class': ['Math', 'Math', 'Sci', 'Sci'],
         'TestScore': [90, 85, 95, 82]}
df2 = pd.DataFrame(data2)

# Example 3: Sales by Region
data3 = {'Salesperson': ['John', 'Jane', 'Mark', 'Mary'],
         'Region': ['East', 'East', 'West', 'West'],
         'Sales': [3000, 2500, 2000, 2200]}
df3 = pd.DataFrame(data3)

# Example 4: Daily Web Traffic
data4 = {'Date': ['2023-04-01', '2023-04-01', '2023-04-02', '2023-04-02'],
         'Page': ['Home', 'Contact', 'Home', 'Contact'],
         'Visits': [1000, 200, 1200, 250]}
df4 = pd.DataFrame(data4)

# Example 5: Car Sales by Brand
data5 = {'Brand': ['Ford', 'Ford', 'BMW', 'BMW'],
         'Model': ['F-150', 'Focus', 'X5', '3'],
         'UnitsSold': [10000, 8000, 5000, 7000]}
df5 = pd.DataFrame(data5)

# Example 6: Library Book Checkouts
data6 = {'PatronID': [1, 2, 1, 3],
         'BookTitle': ['To Kill a Mockingbird', 'The Catcher in the Rye',
                       'Pride and Prejudice', 'To Kill a Mockingbird'],
         'CheckoutDate': ['2023-04-01', '2023-04-01', '2023-04-02', '2023-04-03']}
df6 = pd.DataFrame(data6)

# Example 7: Library Events by Age Group
data7 = {'EventName': ['Storytime', 'Coding Club', 'Book Club', 'Storytime'],
         'AgeGroup': ['0-5', '6-12', '13-18', '0-5'],
         'Attendees': [15, 20, 10, 18]}
df7 = pd.DataFrame(data7)

# Example 8: Library Patrons by Membership Type
data8 = {'PatronID': [1, 2, 3, 4],
         'MembershipType': ['Adult', 'Child', 'Adult', 'Child'],
         'BooksCheckedOut': [5, 3, 7, 2]}
df8 = pd.DataFrame(data8)
```

## Example 1: Restaurant Revenue

Here's our dummy data:

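```python
df1
```

|   | Restaurant | DayOfWeek | Transactions |
|---|------------|-----------|--------------|
| 0 | A | Monday | 10.0 |
| 1 | A | Monday | 11.0 |
| 2 | A | Tuesday | 9.0 |
| 3 | B | Monday | 9.0 |
| 4 | B | Tuesday | 8.5 |
| 5 | B | Tuesday | 5.0 |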



**Research Question (RQ)**: What is the total revenue per day of the week for each restaurant?

**Tip**: Splitting the data is like sorting all the receipts into separate piles, one for each restaurant and day. Applying is like adding up the receipts in each pile, and combining is like writing each total into a ledger with columns for restaurant, day, and revenue.
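One way to answer this RQ in pandas is sketched below (a minimal example, not the only approach). It rebuilds `df1` so the cell runs on its own and treats each `Transactions` value as a receipt amount; the names `revenue` and `ledger` are our own choice.

```python
import pandas as pd

# Rebuild df1 so this cell runs on its own
df1 = pd.DataFrame({'Restaurant': ['A', 'A', 'A', 'B', 'B', 'B'],
                    'DayOfWeek': ['Monday', 'Monday', 'Tuesday', 'Monday', 'Tuesday', 'Tuesday'],
                    'Transactions': [10.00, 11.00, 9.00, 9.00, 8.50, 5.00]})

# Split: one group per (Restaurant, DayOfWeek) pair
# Apply: sum the transaction amounts in each group
revenue = df1.groupby(['Restaurant', 'DayOfWeek'])['Transactions'].sum()

# Combine: flatten the MultiIndex Series into a restaurant/day/revenue ledger
ledger = revenue.reset_index(name='Revenue')
print(ledger)
```

Grouping on two columns gives one pile per restaurant-day pair, and `reset_index` turns the result back into an ordinary table, which matches the ledger in the tip.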

## Example 2: Test Scores by Class

Here's our dummy data:

```python
df2
```

|   | Student | Class | TestScore |
|---|---------|-------|-----------|
| 0 | Alice | Math | 90 |
| 1 | Bob | Math | 85 |
| 2 | Alice | Sci | 95 |
| 3 | Bob | Sci | 82 |


**RQ**: What is the average test score for each class?

**Tip**: Splitting the data is like sorting a stack of papers by subject. Applying the function is like calculating the average grade for each subject, and combining is assembling those averages into a report card.
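A minimal pandas sketch for this RQ, assuming the average should be taken over every score recorded for a class. It rebuilds `df2` inline; `avg_by_class` is an illustrative name.

```python
import pandas as pd

# Rebuild df2 so this cell runs on its own
df2 = pd.DataFrame({'Student': ['Alice', 'Bob', 'Alice', 'Bob'],
                    'Class': ['Math', 'Math', 'Sci', 'Sci'],
                    'TestScore': [90, 85, 95, 82]})

# Split by Class, apply mean() to each group's scores, combine into one Series
avg_by_class = df2.groupby('Class')['TestScore'].mean()
print(avg_by_class)
```

Selecting `TestScore` before aggregating means pandas only averages the numeric column; without it, recent pandas versions raise an error when they try to average the text column `Student`.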

## Example 3: Sales by Region

Here's our dummy data:


```python
df3
```

|   | Salesperson | Region | Sales |
|---|-------------|--------|-------|
| 0 | John | East | 3000 |
| 1 | Jane | East | 2500 |
| 2 | Mark | West | 2000 |
| 3 | Mary | West | 2200 |


**RQ**: What is the total sales amount for each region?

**Tip**: Splitting the data is like dividing a map into regions. Applying the function is like summing up the sales within each region, and combining is creating a summary table showing sales per region.
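Here is one possible sketch that produces the summary table directly. It rebuilds `df3` inline; `sales_by_region` is an illustrative name, and `as_index=False` is one choice among several (calling `.reset_index()` afterwards works too).

```python
import pandas as pd

# Rebuild df3 so this cell runs on its own
df3 = pd.DataFrame({'Salesperson': ['John', 'Jane', 'Mark', 'Mary'],
                    'Region': ['East', 'East', 'West', 'West'],
                    'Sales': [3000, 2500, 2000, 2200]})

# Split by Region and sum Sales in each group;
# as_index=False keeps Region as a regular column, so the result is a DataFrame
sales_by_region = df3.groupby('Region', as_index=False)['Sales'].sum()
print(sales_by_region)
```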

## Example 4: Daily Web Traffic

Here's our dummy data:

```python
df4
```

|   | Date | Page | Visits |
|---|------|------|--------|
| 0 | 2023-04-01 | Home | 1000 |
| 1 | 2023-04-01 | Contact | 200 |
| 2 | 2023-04-02 | Home | 1200 |
| 3 | 2023-04-02 | Contact | 250 |


**RQ**: What is the maximum number of daily visits each page has had?

**Tip**: Splitting the data is like organizing the daily visit counts by page. Applying the function is like finding the maximum for each page group, and combining is presenting those maximums in a table.
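A minimal sketch of the maximum-visits question, assuming each row is one day's visit count for one page. It rebuilds `df4` inline; `max_visits` is an illustrative name.

```python
import pandas as pd

# Rebuild df4 so this cell runs on its own
df4 = pd.DataFrame({'Date': ['2023-04-01', '2023-04-01', '2023-04-02', '2023-04-02'],
                    'Page': ['Home', 'Contact', 'Home', 'Contact'],
                    'Visits': [1000, 200, 1200, 250]})

# Split by Page, apply max() to each group's daily Visits, combine into one Series
max_visits = df4.groupby('Page')['Visits'].max()
print(max_visits)
```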

## Example 5: Car Sales by Brand

Here's our dummy data:

```python
df5
```

|   | Brand | Model | UnitsSold |
|---|-------|-------|-----------|
| 0 | Ford | F-150 | 10000 |
| 1 | Ford | Focus | 8000 |
| 2 | BMW | X5 | 5000 |
| 3 | BMW | 3 | 7000 |


**RQ**: What is the total number of units sold for each car brand?

**Tip**: Splitting the data is like grouping the sales records by brand. Applying the function is like adding up the units sold across each brand's models, and combining is displaying the total units sold per brand in a table.
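A minimal sketch for this RQ. Because each row already holds a unit count for one model, the apply step is a sum rather than a count of rows. It rebuilds `df5` inline; `units_by_brand` is an illustrative name.

```python
import pandas as pd

# Rebuild df5 so this cell runs on its own
df5 = pd.DataFrame({'Brand': ['Ford', 'Ford', 'BMW', 'BMW'],
                    'Model': ['F-150', 'Focus', 'X5', '3'],
                    'UnitsSold': [10000, 8000, 5000, 7000]})

# Split by Brand and sum UnitsSold; .count() here would only count models (rows) per brand
units_by_brand = df5.groupby('Brand')['UnitsSold'].sum()
print(units_by_brand)
```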

## Example 6: Library Book Checkouts

Here's our dummy data:

```python
df6
```

|   | PatronID | BookTitle | CheckoutDate |
|---|----------|-----------|--------------|
| 0 | 1 | To Kill a Mockingbird | 2023-04-01 |
| 1 | 2 | The Catcher in the Rye | 2023-04-01 |
| 2 | 1 | Pride and Prejudice | 2023-04-02 |
| 3 | 3 | To Kill a Mockingbird | 2023-04-03 |


**RQ**: What is the total number of checkouts for each book title?

**Tip**: Splitting the data is like arranging books on shelves by their title. Applying the function is like counting the number of times each book was checked out, and combining is creating a table showing the total checkouts per book title.
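A minimal sketch, assuming each row records exactly one checkout. In that case the apply step just counts rows, which `size()` does per group. It rebuilds `df6` inline; `checkouts` is an illustrative name.

```python
import pandas as pd

# Rebuild df6 so this cell runs on its own
df6 = pd.DataFrame({'PatronID': [1, 2, 1, 3],
                    'BookTitle': ['To Kill a Mockingbird', 'The Catcher in the Rye',
                                  'Pride and Prejudice', 'To Kill a Mockingbird'],
                    'CheckoutDate': ['2023-04-01', '2023-04-01', '2023-04-02', '2023-04-03']})

# Split by BookTitle and count the rows (checkouts) in each group
checkouts = df6.groupby('BookTitle').size()
print(checkouts)
```

`df6['BookTitle'].value_counts()` gives the same counts as a shortcut, sorted from most to least checked out.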

## Example 7: Library Events by Age Group

Here's our dummy data:

```python
df7
```

|   | EventName | AgeGroup | Attendees |
|---|-----------|----------|-----------|
| 0 | Storytime | 0-5 | 15 |
| 1 | Coding Club | 6-12 | 20 |
| 2 | Book Club | 13-18 | 10 |
| 3 | Storytime | 0-5 | 18 |


**RQ**: What is the average number of attendees per event for each age group?

**Tip**: Splitting the data is like sorting event flyers into piles by age group. Applying the function is like averaging the attendance figures of the events in each pile, and combining is presenting those averages in a table.
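A minimal sketch, assuming each row is one event session, so the mean is taken over sessions within an age group. It rebuilds `df7` inline; `avg_attendees` is an illustrative name.

```python
import pandas as pd

# Rebuild df7 so this cell runs on its own
df7 = pd.DataFrame({'EventName': ['Storytime', 'Coding Club', 'Book Club', 'Storytime'],
                    'AgeGroup': ['0-5', '6-12', '13-18', '0-5'],
                    'Attendees': [15, 20, 10, 18]})

# sort=False keeps groups in order of first appearance; the default sorts the
# labels as text, which would put '13-18' before '6-12'
avg_attendees = df7.groupby('AgeGroup', sort=False)['Attendees'].mean()
print(avg_attendees)
```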

## Example 8: Library Patrons by Membership Type

Here's our dummy data:

```python
df8
```

|   | PatronID | MembershipType | BooksCheckedOut |
|---|----------|----------------|-----------------|
| 0 | 1 | Adult | 5 |
| 1 | 2 | Child | 3 |
| 2 | 3 | Adult | 7 |
| 3 | 4 | Child | 2 |


**RQ**: What is the average number of books checked out per membership type?

**Tip**: Splitting the data is like organizing library cards by membership type (e.g., Adult, Child). Applying the function is like calculating the average number of books checked out for each membership type, and combining is displaying the averages in a table.
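A minimal sketch for this last RQ. It rebuilds `df8` inline; `avg_books` is an illustrative name.

```python
import pandas as pd

# Rebuild df8 so this cell runs on its own
df8 = pd.DataFrame({'PatronID': [1, 2, 3, 4],
                    'MembershipType': ['Adult', 'Child', 'Adult', 'Child'],
                    'BooksCheckedOut': [5, 3, 7, 2]})

# Split by MembershipType and average BooksCheckedOut in each group
avg_books = df8.groupby('MembershipType')['BooksCheckedOut'].mean()
print(avg_books)
```

Selecting `BooksCheckedOut` matters here too: `PatronID` is numeric, so calling `.mean()` on the whole grouped frame would also average the IDs, which has no meaning.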