# **Love and Relationships Analysis**

- **AUTHOR**: Edwin Ronald Lambert
- **LAST MODIFIED**: July 4, 2024

## **Introduction**

Romance is in the air! But how much does that cost?

Valentine's Day, also known as Saint Valentine's Day or the Feast of Saint Valentine, is celebrated on February 14. It began as a Christian feast day honoring a martyr named Valentine and later became a significant celebration of romance and love in many regions of the world. ([*Valentine's Day - Wikipedia*, 2024](https://en.wikipedia.org/wiki/Valentine%27s_Day))

The day grew into an occasion for couples to express their love by presenting flowers, offering chocolates, and sending cards, making it a day dedicated to romantic gestures.

As the years pass, the cost of these romantic gestures keeps rising. Item prices, marketing by major companies to increase sales, and societal pressure have all shaped Valentine's Day across regions.

Yes, it costs a lot to have a boyfriend or girlfriend. To those who are single: I understand the struggle, but financially, you're better off!

So, how much impact is there? Let's find out.


## **About Dataset**

We're exploring the [Happy Valentine's Day 2022](https://www.kaggle.com/datasets/infinator/happy-valentines-day-2022) dataset from Kaggle. The National Retail Federation in the United States conducted the surveys and publishes the results in its [Valentine's Day Data Center](https://nrf.com/research-insights/holiday-data-and-trends/valentines-day/valentines-day-data-center).

The data covers 2009 to 2022 and was organized by Suraj Das for the Kaggle dataset above.

In this dataset, we'll analyze the types of gifts people buy, the expected spending on Valentine's Day, and how spending differs across age groups and genders.

Here is the list of gift options for Valentine's Day in this dataset:
- Candy
- Flowers
- Jewelry
- Greeting cards
- An evening out
- Clothing
- Gift cards

Here is the list of age groups in this dataset:
- 18-24
- 25-34
- 35-44
- 45-54
- 55-64
- 65+

Here is the list of genders in this dataset:
- Men
- Women


## **Exploratory Data Analysis**

Here is what we are trying to find out with this dataset:

1. Understand the relationship between average spending and prominent gift options.
2. Understand the relationship between total expected spending and percentage of celebration.
3. Understand the relationship between age group and prominent gift options.
4. Understand the relationship between gender and prominent gift options.
5. Predict the average expected spending per person and the total expected spending.


## **Importing Libraries**

Here is the list of libraries used in this project:

1. **pandas**: pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. Check out the [pandas PyPI](https://pypi.org/project/pandas/) page.
2. **plotly**: plotly.py is an interactive, open-source, and browser-based graphing library for Python. Check out the [plotly PyPI](https://pypi.org/project/plotly/) page.

In [68]:
# Installing Libraries
# !pip install pandas
# !pip install plotly

In [69]:
# Importing Libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import nbformat

In [70]:
# Save the dataset folder location.
raw_dataset_folder = "../data/raw/happy_valentines_day_2022"

# Files that have year as the identifier.
files_to_process_for_year = [
    "historical_spending_average_expected_spending.csv",
    "historical_gift_trends_per_person_spending.csv"
]

In [71]:
# Function to load the DataFrame and update column headers.
def load_and_process_columns(file_name):
    # Load the DataFrame.
    df = pd.read_csv(f"{raw_dataset_folder}/{file_name}")
    # Rename the "Unnamed: 0" column to "year", if it is present.
    df = df.rename(columns={"Unnamed: 0": "year"})
    # Update column headers to lowercase, separated by underscores (_).
    df.columns = [col.replace(" ", "_").lower() for col in df.columns]

    return df
    
# Dictionary to hold the DataFrames.
dataframes = {
    file_name: load_and_process_columns(file_name) for file_name in files_to_process_for_year
}

print(dataframes[files_to_process_for_year[0]].head())

   year per_person_expected_valentines_day_spend
0  2009                                  $102.50
1  2010                                  $103.00
2  2011                                  $116.21
3  2012                                  $126.03
4  2013                                  $130.97
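
To see what the header clean-up does without the CSV files at hand, here is a minimal sketch on a small in-memory DataFrame. The helper name `normalize_headers`, the sample column names, and the placeholder amounts are assumptions for illustration; only the renaming logic mirrors the cell above.

```python
import pandas as pd


def normalize_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Rename the unnamed first column to "year" and snake_case every header."""
    df = df.rename(columns={"Unnamed: 0": "year"})
    df.columns = [col.replace(" ", "_").lower() for col in df.columns]
    return df


# Hypothetical raw frame: pandas labels a headerless first column "Unnamed: 0".
raw = pd.DataFrame({"Unnamed: 0": [2009, 2010], "Greeting Cards": ["$1.00", "$2.00"]})
clean = normalize_headers(raw)
print(list(clean.columns))
```

Because `rename` returns a new DataFrame here, the original `raw` frame keeps its headers, which makes the step safe to re-run in a notebook.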


## **Clean Dataset**

In [72]:
# Clean each DataFrame: report missing values and convert dollar amounts to floats.
for file in files_to_process_for_year:
    df = dataframes[file]

    # Find the number of missing values in each column.
    print(df.isnull().sum())

    # Keep year as a string; strip "$" and "," from the other text columns and cast them to float.
    df = df.apply(
        lambda x: x.astype(str) if x.name == "year"
        else x.replace(r"[\$,]", "", regex=True).astype(float) if x.dtype == "object"
        else x
    )

    # Update the DataFrame in the dictionary.
    dataframes[file] = df
    
# Assigning DataFrame to relevant variable name for understanding.
average_expected_spending_df = dataframes[files_to_process_for_year[0]]
gift_trends_per_person_spending_df = dataframes[files_to_process_for_year[1]]

year                                        0
per_person_expected_valentines_day_spend    0
dtype: int64
year              0
candy             0
flowers           0
jewelry           0
greeting_cards    0
an_evening_out    0
clothing          0
gift_cards        0
dtype: int64
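
The currency clean-up can be tried in isolation. The sketch below uses two amounts from the `head()` output shown earlier plus one hypothetical value, `"$1,234.50"`, added only to show the thousands-separator case. It uses `Series.str.replace`, which is one way to do this and not necessarily identical to the cell above.

```python
import pandas as pd

# "$102.50" and "$103.00" come from the head() output; "$1,234.50" is hypothetical.
amounts = pd.Series(["$102.50", "$103.00", "$1,234.50"])

# Remove "$" and "," with a regex, then convert the remaining text to floats.
cleaned = amounts.str.replace(r"[$,]", "", regex=True).astype(float)
print(cleaned.tolist())
```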


In [73]:
# Display the analytics and summary statistics of the spending_average_expected_spending dataset.
average_expected_spending_df.info()
average_expected_spending_df.describe().round(2)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 2 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   year                                      14 non-null     object 
 1   per_person_expected_valentines_day_spend  14 non-null     float64
dtypes: float64(1), object(1)
memory usage: 356.0+ bytes


| | per_person_expected_valentines_day_spend |
|---|---:|
| count | 14.0 |
| mean | 141.45 |
| std | 26.63 |
| min | 102.5 |
| 25% | 127.26 |
| 50% | 139.44 |
| 75% | 158.18 |
| max | 196.31 |
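
Summary statistics hide how spending moves from one year to the next. As a rough sketch, the snippet below computes the year-over-year percentage change using only the five rows printed by `head()` earlier (2009 to 2013). Applying the same call to the full `average_expected_spending_df` column should work the same way, but that is an assumption and is not run here.

```python
import pandas as pd

# The five rows printed by head() earlier in this notebook.
spend = pd.Series(
    [102.50, 103.00, 116.21, 126.03, 130.97],
    index=["2009", "2010", "2011", "2012", "2013"],
    name="per_person_expected_valentines_day_spend",
)

# Percentage change relative to the previous year; the first year has no predecessor.
yoy_pct = (spend.pct_change() * 100).round(2)
print(yoy_pct)
```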


In [74]:
# Display the analytics and summary statistics of the gift_trend_per_person_spending dataset.
gift_trends_per_person_spending_df.info()
gift_trends_per_person_spending_df.describe().round(2)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   year            13 non-null     object 
 1   candy           13 non-null     float64
 2   flowers         13 non-null     float64
 3   jewelry         13 non-null     float64
 4   greeting_cards  13 non-null     float64
 5   an_evening_out  13 non-null     float64
 6   clothing        13 non-null     float64
 7   gift_cards      13 non-null     float64
dtypes: float64(7), object(1)
memory usage: 964.0+ bytes


| | candy | flowers | jewelry | greeting_cards | an_evening_out | clothing | gift_cards |
|---|---:|---:|---:|---:|---:|---:|---:|
| count | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 | 13.0 |
| mean | 12.84 | 14.65 | 32.55 | 7.68 | 27.47 | 14.94 | 11.5 |
| std | 2.4 | 1.35 | 6.19 | 0.87 | 3.22 | 3.7 | 2.72 |
| min | 8.6 | 12.33 | 21.52 | 5.91 | 21.39 | 10.42 | 8.42 |
| 25% | 10.85 | 13.49 | 30.34 | 7.31 | 25.66 | 12.0 | 10.23 |
| 50% | 12.7 | 14.78 | 30.94 | 7.87 | 27.48 | 14.04 | 11.04 |
| 75% | 14.12 | 15.42 | 34.1 | 8.32 | 28.46 | 16.08 | 12.52 |
| max | 17.3 | 16.71 | 45.75 | 9.01 | 33.46 | 21.46 | 17.22 |
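
To read the summary more easily, we can rank the gift categories by their mean per-person spending. The sketch below copies the `mean` row from the `describe()` output above into a Series. The `share_pct` column is only each category's share of the seven listed means, not of total Valentine's Day spending.

```python
import pandas as pd

# Mean per-person spending by gift category, copied from the describe() output above.
mean_spend = pd.Series({
    "candy": 12.84,
    "flowers": 14.65,
    "jewelry": 32.55,
    "greeting_cards": 7.68,
    "an_evening_out": 27.47,
    "clothing": 14.94,
    "gift_cards": 11.50,
})

# Sort from the highest to the lowest mean and compute each category's share.
ranked = mean_spend.sort_values(ascending=False)
share_pct = (ranked / ranked.sum() * 100).round(1)
print(pd.DataFrame({"mean_spend": ranked, "share_pct": share_pct}))
```

On these numbers, jewelry and an evening out rank first and second, and greeting cards rank last.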
