<h1 style="text-align: center">
<div style="color: #DD3403; font-size: 60%">Data Science DISCOVERY MicroProject #09</div>
<span style="">MicroProject: Valentine's Day</span>
<div style="font-size: 60%;"><a href="https://discovery.cs.illinois.edu/microproject/09-valentines-day/">https://discovery.cs.illinois.edu/microproject/09-valentines-day/</a></div>
</h1>

<hr style="color: #DD3403;">

# Data Source: The National Retail Federation


The National Retail Federation (NRF) is the world's largest retail trade association. Its members include department stores; specialty, discount, catalog, Internet, and independent retailers; chain restaurants; grocery stores; and multi-level marketing companies. NRF has surveyed consumers annually for over a decade about how they plan to celebrate Valentine's Day, including how much they expect to spend and which gifts they plan to buy. To learn more about the data, visit [NRF's website](https://nrf.com/research-insights/holiday-data-and-trends/valentines-day).

## Importing the NRF Valentine's Day Dataset

The National Retail Federation dataset is included as part of this MicroProject as `valentines_day.csv`. Import `valentines_day.csv` into this notebook by reading the CSV into a new DataFrame called `df`:
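As a minimal sketch of how `pd.read_csv` behaves, the example below reads a tiny made-up CSV from an in-memory string instead of a file. The variable names and values are hypothetical and are not taken from `valentines_day.csv`; with a real file you would pass its path in place of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk (not the real dataset).
csv_text = "Year,Candy Total\n2020,100\n2021,120\n"

# pd.read_csv accepts a file path or any file-like object and returns a DataFrame.
toy = pd.read_csv(io.StringIO(csv_text))
print(toy.shape)  # (number of rows, number of columns)
```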

In [None]:
# Import the Valentine's Day dataset:
...

In [None]:
## == TEST CASES for Data Import ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any errors, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.
assert("df" in vars()), "Make sure to name the DataFrame df"
assert(len(df) == 13), "Make sure you read in the correct csv"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

<hr style="color: #DD3403;">

## Part 1: Exploratory Data Analysis

Before doing any analysis, let's explore the data.  Use `df.columns` to display the columns in the dataset:

In [None]:
...

Next, use `df.head()` to explore the first few rows of data:

In [None]:
...

## Puzzle 1.1: Setting a custom index column

By default, pandas uses **a numeric index starting with 0**.  However, if one value uniquely describes the entire row, we may want to use that value as a custom index instead.

Since the `Year` column describes all the data in each row, let's set `Year` as the index of the DataFrame.  To set an index on a DataFrame, update the DataFrame by using the DataFrame method `df.set_index(index_column_name)`, where we replace `index_column_name` with the name of the column to use as the index.  For example:

```
df = df.set_index("column_name")
```
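To see what `set_index` changes, here is a small sketch on a made-up DataFrame (the name `toy` and its values are hypothetical, not the real dataset). Note that `set_index` returns a new DataFrame rather than modifying the original, which is why the result is assigned back.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset).
toy = pd.DataFrame({"Year": [2020, 2021], "Candy Total": [100, 120]})

# set_index returns a new DataFrame, so assign the result back to the variable.
toy = toy.set_index("Year")

print(toy.index.name)     # the index is now named "Year"
print(list(toy.columns))  # "Year" is no longer one of the regular columns
```

After this, rows can be looked up by year, for example `toy.loc[2021]`.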

Set the index to be the `Year` column:

In [None]:
# Set the index of the DataFrame df to the "Year" column:
...

In [None]:
## == TEST CASES for Puzzle 1.1 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any errors, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.
assert(df.iloc[0,0] == "47%"), "Make sure the \"Year\" column is set as the index"
assert(df.shape[1] == 22), "Make sure the \"Year\" column is set as the index"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

## Puzzle 1.2: Finding the Total Amount Spent Each Year

Using the list of column names you found earlier, calculate the total amount spent on Valentine's Day each year by summing all columns whose names end in the word "Total" (e.g., "Candy Total", "Flowers Total").

Add this total spending per year data to a new column called `Total Spending` in your DataFrame `df`.
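One possible way to add up several columns row by row is sketched below on made-up data. The column names `A Total`, `B Total`, `A Percent`, and `Row Sum` are hypothetical placeholders; the idea is to select the columns of interest and call `sum(axis=1)`, which adds across each row instead of down each column.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset).
toy = pd.DataFrame(
    {"A Total": [1, 2], "B Total": [10, 20], "A Percent": ["5%", "6%"]},
    index=pd.Index([2020, 2021], name="Year"),
)

# Keep only the column names that end in "Total".
total_columns = [name for name in toy.columns if name.endswith("Total")]

# axis=1 sums across the selected columns, giving one value per row.
toy["Row Sum"] = toy[total_columns].sum(axis=1)
print(toy["Row Sum"].tolist())
```

Listing the column names by hand works just as well; the filter simply avoids typing each one.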

In [None]:
df["Total Spending"] = ...

## Puzzle 1.3: Finding the year where the most money was spent

Using the `"Total Spending"` column you just made, find the year with the largest total spending and store that row in a DataFrame called `df_largest_total`:
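As a sketch of one way to pull out the row with the largest value while keeping the result a DataFrame, the example below uses `DataFrame.nlargest` on made-up data (the `Score` column and its values are hypothetical). Filtering with a boolean condition against the column's maximum is an equally reasonable alternative.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset).
toy = pd.DataFrame(
    {"Score": [3, 9, 5]},
    index=pd.Index([2020, 2021, 2022], name="Year"),
)

# nlargest(1, column) returns a one-row DataFrame with the largest value in that column.
top_row = toy.nlargest(1, "Score")
print(top_row.index[0])  # the index label (year) of that row
```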

In [None]:
df_largest_total = ...
df_largest_total

In [None]:
## == TEST CASES for Puzzle 1.2 and 1.3 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any errors, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.

assert('Total Spending' in df), "Make sure you've named the Total Spending column properly and added it to the dataframe"
assert(df['Total Spending'].iloc[0] == 12700000000), "Double check the values of your Total Spending column"
assert(df_largest_total.iloc[0,0] == "56%"), "Make sure you correctly selected the row with the largest total spending"
assert(df_largest_total.index[0] == 2022), "Make sure you correctly selected the row with the largest total spending"
## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

<hr style="color: #DD3403;">

# Part 2: Descriptive Statistics

Calculate the **mean** of the total spent on flowers and store it in the variable `flowers_mean`:

In [None]:
flowers_mean = ...
flowers_mean

Calculate the **mode** of the total spent on clothing and store it in the variable `clothing_mode`:
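Unlike `mean()` and `median()`, the pandas `mode()` method returns a Series rather than a single number, because several values can tie for most frequent. A small sketch on made-up numbers (not the real dataset):

```python
import pandas as pd

# Hypothetical values: 4 and 7 each appear twice, so both are modes.
values = pd.Series([4, 7, 7, 2, 4])

# mode() returns every tied value, sorted in ascending order.
modes = values.mode()
print(modes.tolist())  # [4, 7]
```

Individual modes can then be read by position, for example `modes[0]`.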

In [None]:
clothing_mode = ...
clothing_mode

Calculate the **median** of the total spent on candy and store it in the variable `candy_median`:

In [None]:
candy_median = ...
candy_median

Calculate the **standard deviation** of the total spent on jewelry and store it in the variable `jewelry_std`:
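When computing a standard deviation, it helps to know that pandas' `Series.std()` uses the sample formula by default (`ddof=1`, dividing by n - 1), while the population formula divides by n. The sketch below compares the two on made-up numbers; the values are hypothetical and unrelated to the dataset.

```python
import pandas as pd

# Hypothetical values with mean 5 and squared deviations summing to 32.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = values.std()            # pandas default, ddof=1: sqrt(32 / 7)
population_std = values.std(ddof=0)  # divide by n instead: sqrt(32 / 8) = 2.0

print(sample_std, population_std)
```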

In [None]:
jewelry_std = ...
jewelry_std

In [None]:
## == TEST CASES for Part 2 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any errors, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.
import math

assert(flowers_mean == df["Flowers Total"].sum() / df["Flowers Total"].count()), "Make sure you're using the correct formula to calculate the mean"
assert(clothing_mode[2] == df["Clothing Total"].mode()[2]), "Make sure you're using the correct formula to calculate the mode"
assert(clothing_mode[3] == df["Clothing Total"].mode()[3]), "Make sure you're using the correct formula to calculate the mode"
assert(candy_median == df["Candy Total"].sort_values().iloc[int(len(df["Candy Total"].sort_values())/2)]), "Make sure you're using the correct formula to calculate the median"
assert(math.isclose(jewelry_std, 861647200.097521)), "Make sure you're using the correct formula to calculate the standard deviation"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

<hr style="color: #DD3403;">

# Part 3: Analysis

Use `df.plot.line()` to create a line plot of your data.

- By default, `df.plot.line()` will use the **index column** as the x-axis.  Since we set the index to `"Year"` earlier, we do not need to specify an `x` value and pandas will use our index.
- To plot a single column, tell `df.plot.line()` which column to use for the y-axis by passing the column name as a string to the `y` parameter.

A plot of the total spending on flowers could be created with:

```
df.plot.line(y="Flowers Total")
```
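The sketch below shows the same kind of call on a made-up DataFrame so the effect of the `y` keyword is easy to see. The values are hypothetical, and the `matplotlib.use("Agg")` line is an assumption that only matters when running outside a notebook. The first positional argument of `plot.line` is `x`, so passing the column name by keyword as `y=` avoids accidentally using it as the x-axis.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import pandas as pd

# Hypothetical toy data indexed by year (not the real dataset).
toy = pd.DataFrame(
    {"Flowers Total": [1.0, 3.0, 2.0], "Candy Total": [2.0, 2.5, 4.0]},
    index=pd.Index([2020, 2021, 2022], name="Year"),
)

# Passing y by keyword plots only that column, with the index on the x-axis.
ax = toy.plot.line(y="Flowers Total")
print(len(ax.get_lines()))  # number of lines drawn on the plot
```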

## Your Turn:

Create a line graph using the `"Per person Expected Valentines Day Spend"` column:

In [None]:
# Create a line plot of expected Valentine's Day spend per person:
...

<hr style="color: #DD3403;">

## Submission

You're almost done!  All you need to do is commit your MicroProject to GitHub and run the GitHub Actions Grader:

1.  ⚠️ **Make certain to save your work.** ⚠️ To do this, go to **File => Save All**

2.  After you have saved, exit this notebook and follow the instructions to commit and grade this MicroProject!