# Grammys Project
This project analyzes real-world web analytics data from both websites owned by The Recording Academy, the non-profit organization behind the Grammy Awards. Following a strategic decision to split the websites into grammy.com and recordingacademy.com, the goal is to assess the impact of this separation on user engagement, traffic trends, and audience behavior.

The analysis focuses on identifying patterns in key performance indicators (KPIs), including pages per session, session duration, and bounce rate, as well as understanding demographic differences across the two domains. Insights from this project aim to inform recommendations for optimizing website structure and content to better serve distinct audience segments.


## Data Dictionary
I will be working with two files, `grammys_live_web_analytics.csv` and `ra_live_web_analytics.csv`.

Both files contain the following columns:

- **date** - The date the data was confirmed. It is in `yyyy-mm-dd` format.
- **visitors** - The number of users who visited the website on that day.
- **pageviews** - The total number of pages viewed by all users on that day.
- **sessions** - The total number of sessions on the website. A session is a group of user interactions with the website that take place within a given time frame. For example, a single session can contain multiple page views, events, and social interactions.
- **bounced_sessions** - The total number of bounced sessions on the website. A bounced session is one in which a visitor arrives at the website and leaves without interacting with any pages or links.
- **avg_session_duration_secs** - The average session length, in seconds, across all sessions on the website that day.
- **awards_week** - A binary flag indicating whether the date falls within the marketing campaigns that run before and after the Grammy Awards ceremony. This is the big marketing push to draw as many viewers to the event as possible.
- **awards_night** - A binary flag indicating whether the date is the night the Grammy Awards ceremony was held.
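
As a rough illustration of this layout, the sketch below builds a two-row stand-in for one of the files and loads it with pandas. The rows are made up, and `sample_csv` and `sample_df` are hypothetical names; the `parse_dates` argument is an optional choice here, not something the notebook's own cells use.

```python
import io
import pandas as pd

# Two made-up rows in the documented column layout (values are illustrative only).
sample_csv = io.StringIO(
    "date,visitors,pageviews,sessions,bounced_sessions,"
    "avg_session_duration_secs,awards_week,awards_night\n"
    "2022-04-02,1000,2500,1200,480,95.5,1,0\n"
    "2022-04-03,9000,30000,11000,3300,120.0,1,1\n"
)

# parse_dates converts the yyyy-mm-dd strings into datetime64 values.
sample_df = pd.read_csv(sample_csv, parse_dates=["date"])

print(sample_df.dtypes)
print(sample_df)
```

Parsing the dates up front is one option; leaving them as strings also works for this project because the `yyyy-mm-dd` format sorts chronologically.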

# Part I - Exploratory Data Analysis



## Task 1

Import the `pandas`, `numpy`, and `plotly.express` libraries.

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import plotly.express as px

In [None]:
# format numbers to two decimal places when shown in pandas
pd.set_option('display.float_format', lambda x: '%.2f' % x)

## Task 2

Load in and preview files.

In [None]:
# Read in dataframes
full_df = pd.read_csv('/content/grammy_live_web_analytics.csv')
rec_academy = pd.read_csv('/content/ra_live_web_analytics.csv')


In [None]:
# preview full_df dataframe
full_df.head()

In [None]:
# preview rec_academy dataframe
rec_academy.head()

## Task 3

Create a line chart of the number of visitors on the site for every day in `full_df`.

In [None]:
# Plot a line chart of the visitors on the site.
px.line(full_df,
       x = 'date',
       y = 'visitors')

**Observation**


Website traffic on grammy.com peaks on the days of the Grammy Awards ceremony and during nominee announcement periods, indicating that these events drive the highest levels of user engagement.
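
One way to check which dates produce the peaks seen in the chart is to list the highest-traffic days alongside the `awards_night` flag. The sketch below does this on made-up numbers; `traffic` and `top_days` are hypothetical names, and in the notebook the same call would be applied to `full_df`.

```python
import pandas as pd

# Made-up daily traffic; the real values come from full_df.
traffic = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-06-01", "2020-06-02"],
    "visitors": [40_000, 1_200_000, 300_000, 250_000, 35_000],
    "awards_night": [0, 1, 0, 0, 0],
})

# nlargest returns the rows with the highest visitor counts, largest first.
top_days = traffic.nlargest(3, "visitors")
print(top_days)
```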

## Task 4

Investigate what an "average" day looks like when the awards show is being hosted versus the other 364 days out of the year.


In [None]:
# Average daily visitors, grouped by whether the date was an awards night
full_df.groupby('awards_night').agg({'visitors': 'mean'})

**Observation**

On average, the grammy.com website receives 1,389,590 visitors on ceremony days, whereas regular days see only 32,388 visitors. Ceremony days therefore attract over 42 times the traffic of a typical day, highlighting the enormous spike in engagement during these events.
This illustrates a key challenge for The Recording Academy: converting a platform that peaks around a single annual event into one that maintains consistent user engagement throughout the year.
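
The ratio quoted above follows directly from the two averages. A quick check, with variable names chosen only for illustration:

```python
ceremony_day_mean = 1_389_590  # mean visitors on awards nights (from the observation)
regular_day_mean = 32_388      # mean visitors on all other days

# How many times more traffic a ceremony day draws than a regular day.
ratio = ceremony_day_mean / regular_day_mean
print(f"Ceremony days draw about {ratio:.1f}x the visitors of a regular day")
```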


## Task 5

When The Recording Academy split its website into two domains, grammy.com and recordingacademy.com, data capture for grammy.com was not affected. As a result, `full_df` covers the period both before and after the split and needs to be divided into two dataframes. The domains were switched on `2022-02-01`.

Create two new dataframes:

1. `combined_site` for all dates before `2022-02-01`
2. `grammys` for all dates after (and including) `2022-02-01`

In [None]:
# Split the data to separate the full_df into two new dataframes.
# One for before the switch of the websites and one for after
combined_site = full_df[full_df['date'] < '2022-02-01']
grammys = full_df[full_df['date'] >= '2022-02-01']


In [None]:
# Work on independent copies so that adding columns later does not raise SettingWithCopyWarning.
combined_site = combined_site.copy()
grammys = grammys.copy()

In [None]:
# print the shape of the combined_site dataframe
combined_site.shape
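
The split above compares date strings directly, which works because `yyyy-mm-dd` text sorts in chronological order. The sketch below, on made-up rows with hypothetical names (`demo`, `before_str`, `after_dt`, and so on), shows the string comparison next to an equivalent datetime comparison and checks that no row is lost or duplicated.

```python
import pandas as pd

# Made-up rows straddling the switch date.
demo = pd.DataFrame({
    "date": ["2022-01-30", "2022-01-31", "2022-02-01", "2022-02-02"],
    "visitors": [10, 20, 30, 40],
})

switch_date = "2022-02-01"

# String comparison: valid only because yyyy-mm-dd sorts chronologically.
before_str = demo[demo["date"] < switch_date]
after_str = demo[demo["date"] >= switch_date]

# Datetime comparison: the same split, independent of the text format.
dates = pd.to_datetime(demo["date"])
before_dt = demo[dates < pd.Timestamp(switch_date)]
after_dt = demo[dates >= pd.Timestamp(switch_date)]

# Every row lands in exactly one half, and both approaches agree.
print(len(before_str) + len(after_str) == len(demo))
print(before_str.equals(before_dt) and after_str.equals(after_dt))
```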

# Part II - Investigate KPIs






## Task 6

**A.** Create a new list called `frames` that has the `combined_site`, `rec_academy`, and `grammys` dataframes as entries.

In [None]:
# label each dataframe for use in printed output, then collect them in a list
combined_site.name = 'Combined Site'
rec_academy.name = 'Recording Academy'
grammys.name = 'Grammys'
frames = [combined_site, rec_academy, grammys]
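
A `.name` set this way is an ordinary Python attribute on the object, and pandas does not document it as being carried over to dataframes derived from it, so it is safest to set it after all copying and filtering is done. An alternative sketch, assuming nothing beyond the standard library and pandas, keeps the labels in a dictionary instead; the `*_demo` frames and `named_frames` are hypothetical stand-ins.

```python
import pandas as pd

# Tiny stand-ins for combined_site, rec_academy, and grammys.
combined_demo = pd.DataFrame({"sessions": [100, 200]})
rec_academy_demo = pd.DataFrame({"sessions": [50, 60]})
grammys_demo = pd.DataFrame({"sessions": [300, 400]})

# Map each display label to its dataframe instead of attaching a .name attribute.
named_frames = {
    "Combined Site": combined_demo,
    "Recording Academy": rec_academy_demo,
    "Grammys": grammys_demo,
}

for label, frame in named_frames.items():
    print(f"{label}: {frame['sessions'].sum()} sessions")
```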

**B.** For each frame in the frames list, create a new column `pages_per_session`. This new column is the average number of pageviews per session on a given day.


In [None]:
# create the `pages_per_session` column for all 3 dataframes.
for frame in frames:
    frame['pages_per_session'] = frame['pageviews'] / frame['sessions']
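
If a day ever recorded zero sessions, the division above would produce `inf` or `NaN` for that row. Nothing in the data description says this happens, so the guard below is only a precaution, shown on made-up numbers with hypothetical names (`demo`, `safe_sessions`).

```python
import pandas as pd

demo = pd.DataFrame({
    "pageviews": [300, 0, 450],
    "sessions": [100, 0, 150],
})

# Keep session counts above zero; zero-session days become NaN (missing).
safe_sessions = demo["sessions"].where(demo["sessions"] > 0)

# Dividing by NaN yields NaN, so those days drop out of later averages.
demo["pages_per_session"] = demo["pageviews"] / safe_sessions
print(demo)
```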

**C.** Visualize this new metric using a line chart for each site.

In [None]:
# combined_site graph
px.line(combined_site,
       x = 'date',
       y = 'pages_per_session',
       title = 'Pages per Session for Combined Site')

In [None]:
# grammys graph
px.line(grammys,
       x = 'date',
       y = 'pages_per_session',
       title = 'Pages per Session for Grammys Site')

In [None]:
# rec_academy graph
px.line(rec_academy,
       x = 'date',
       y = 'pages_per_session',
       title = 'Pages per Session for Recording Academy Site')

**Observation**

After the websites were split, average pages per session at least doubled. The combined site recorded approximately 1 page per session on average, whereas grammy.com and recordingacademy.com each recorded between 2 and 3. This suggests that users engage more deeply with content when the sites are separated.

Note: Any large spikes that do not correspond with the Grammy Awards ceremony are likely due to anomalies in data collection and have been excluded from this analysis.


## Task 7

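**A.** Create a function `bounce_rate` that calculates the bounce rate for a dataframe: the total number of bounced sessions divided by the total number of sessions, expressed as a percentage.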



In [None]:
def bounce_rate(dataframe):
    sum_bounced = dataframe['bounced_sessions'].sum()
    sum_sessions = dataframe['sessions'].sum()
    return 100 * sum_bounced / sum_sessions

**B.** Loop over each website and print its bounce rate.




In [None]:
# Calculate the Bounce Rate for each site.
for frame in frames:
    rate = bounce_rate(frame)
    print(f"Bounce rate for {frame.name}: {rate:.2f}%")
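
Because `bounce_rate` divides summed bounced sessions by summed sessions, busy days count for more than quiet days. The sketch below uses made-up numbers to show how that overall rate can differ from a plain average of daily rates; `demo`, `overall`, and `daily_mean` are hypothetical names.

```python
import pandas as pd

def bounce_rate(dataframe):
    """Overall bounce rate in percent: total bounced sessions over total sessions."""
    sum_bounced = dataframe["bounced_sessions"].sum()
    sum_sessions = dataframe["sessions"].sum()
    return 100 * sum_bounced / sum_sessions

# Made-up data: one quiet day and one very busy day.
demo = pd.DataFrame({
    "sessions": [100, 10_000],
    "bounced_sessions": [80, 3_000],
})

overall = bounce_rate(demo)  # weights each day by its session count
daily_mean = (100 * demo["bounced_sessions"] / demo["sessions"]).mean()  # each day counts equally

print(f"Overall bounce rate: {overall:.2f}%")
print(f"Mean of daily rates: {daily_mean:.2f}%")
```

For a site whose traffic is dominated by a handful of event days, the summed version reflects what a typical session experienced, while the daily mean reflects what a typical day looked like.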

**C.** Calculate how long, on average, visitors stay on each website.



In [None]:
# Calculate the average of the avg_session_duration_secs.
for frame in frames:
    site_mean = frame['avg_session_duration_secs'].mean()
    print(f"Mean session duration for {frame.name}: {site_mean:.2f} seconds")
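
The mean above treats every day equally, regardless of how many sessions that day had. As a possible cross-check, the sketch below compares it with a session-weighted mean on made-up numbers. The weighted version is an alternative, not what the notebook computes, and `simple_mean` and `weighted_mean` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Made-up data: a quiet day with long sessions and a busy day with short ones.
demo = pd.DataFrame({
    "sessions": [100, 10_000],
    "avg_session_duration_secs": [200.0, 60.0],
})

# Each day counts equally.
simple_mean = demo["avg_session_duration_secs"].mean()

# Each day counts in proportion to its number of sessions.
weighted_mean = np.average(demo["avg_session_duration_secs"], weights=demo["sessions"])

print(f"Simple mean of daily averages: {simple_mean:.2f} seconds")
print(f"Session-weighted mean: {weighted_mean:.2f} seconds")
```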

**Observation**

The Recording Academy website demonstrates stronger performance overall, with the lowest bounce rate and longest average session duration. This suggests that visitors are highly engaged and spend more time exploring its content. In contrast, the Grammy website shows a shorter average session duration and a bounce rate similar to the combined site, indicating that its content may perform better when integrated within a broader platform.

# Part III - Demographics


## Task 8

The `grammys_age_demographics.csv` and `tra_age_demographics.csv` each contain the following information:

- **age_group** - The age group range, e.g. `18-24` covers all visitors between the ages of 18 and 24 who come to the site.
- **pct_visitors** - The percentage of all of the website's visitors who come from that specific age group.

**A.** Read in the `grammys_age_demographics.csv` and `tra_age_demographics.csv` files and store them into dataframes named `age_grammys` and `age_tra`, respectively.

In [None]:
# read in the files
age_grammys = pd.read_csv('datasets/grammys_age_demographics.csv')
age_tra = pd.read_csv('datasets/tra_age_demographics.csv')

In [None]:
# preview the age_grammys file.
age_grammys.head()

**B.** For each dataframe, create a new column called `website` whose value is the name of the website.

In [None]:
# create the website column
age_grammys['website'] = 'Grammys'
age_tra['website'] = 'Recording Academy'

**C.** Join these two datasets together.


In [None]:
# use pd.concat to stack the two datasets; ignore_index avoids duplicate row labels
age_df = pd.concat([age_grammys, age_tra], ignore_index=True)
age_df.head()

**D.** Create a grouped bar chart of `pct_visitors` by `age_group`, with one bar per `website`.


In [None]:
# Create bar chart
px.bar(age_df, x='age_group',
       y='pct_visitors',
       color='website',
       barmode='group')

**Observation**

Both websites attract a similar distribution of visitors across age groups. However, the youngest and oldest audiences show a slight preference for the Grammy website, while middle-aged users are more likely to visit the Recording Academy website.
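
To put numbers on such a comparison, the stacked data can be pivoted so that each age group has one column per website. The sketch below uses made-up percentages, and the `*_demo` frames, `wide`, and the `difference` column are hypothetical names introduced for illustration.

```python
import pandas as pd

# Made-up percentages; the real values come from the two demographics files.
age_grammys_demo = pd.DataFrame({
    "age_group": ["18-24", "25-34", "35-44"],
    "pct_visitors": [30.0, 40.0, 30.0],
})
age_tra_demo = pd.DataFrame({
    "age_group": ["18-24", "25-34", "35-44"],
    "pct_visitors": [25.0, 45.0, 30.0],
})

age_grammys_demo["website"] = "Grammys"
age_tra_demo["website"] = "Recording Academy"

# ignore_index=True gives the stacked rows a fresh 0..n-1 index.
age_demo = pd.concat([age_grammys_demo, age_tra_demo], ignore_index=True)

# One row per age group, one column per website.
wide = age_demo.pivot(index="age_group", columns="website", values="pct_visitors")

# Positive values mean the age group makes up a larger share of Grammys visitors.
wide["difference"] = wide["Grammys"] - wide["Recording Academy"]
print(wide)
```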

# Part IV - Recommendation


I recommend that grammy.com and recordingacademy.com remain separate websites. The key performance indicators (KPIs) show that each site serves a distinct audience segment. Following the split, both sites recorded more pages per session than the combined site, and recordingacademy.com improved on every KPI measured.

Average pages per session rose from approximately 1 on the combined site to between 2 and 3 on each individual site, indicating that users are finding more relevant content. Average session duration moved in different directions: visitors to recordingacademy.com averaged 128.5 seconds per session, up from 102.85 seconds on the combined site, while grammy.com visitors averaged 82.99 seconds, a decline. The more focused content therefore holds attention longer on recordingacademy.com, while grammy.com has room to improve time on site.

Bounce rate improved most clearly for recordingacademy.com, falling to 33.67% from the combined site's 41.58%, which suggests that visitors are finding the information they want more readily. The bounce rate for grammy.com stayed close to that of the combined site. Audience age distribution is broadly similar across both sites, though the youngest and oldest visitors lean slightly toward grammy.com and middle-aged visitors toward recordingacademy.com.
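
The changes relative to the combined site can be computed directly from the figures quoted in this section. The variable names below are chosen only for illustration, and only figures stated in the text are used.

```python
# Figures quoted in the recommendation.
combined_secs = 102.85
rec_academy_secs = 128.5
grammys_secs = 82.99

combined_bounce_pct = 41.58
rec_academy_bounce_pct = 33.67

# Positive duration change = longer sessions; negative bounce change = fewer bounces.
duration_change_ra = rec_academy_secs - combined_secs
duration_change_grammys = grammys_secs - combined_secs
bounce_change_ra = rec_academy_bounce_pct - combined_bounce_pct

print(f"recordingacademy.com session duration: {duration_change_ra:+.2f} seconds")
print(f"grammy.com session duration: {duration_change_grammys:+.2f} seconds")
print(f"recordingacademy.com bounce rate: {bounce_change_ra:+.2f} percentage points")
```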
   
Overall, the enhanced engagement metrics and clear audience differentiation support maintaining the current dual-site structure. This approach ensures that both fan-oriented and industry-focused users receive content that aligns with their interests, thereby maximizing engagement and optimizing user experience.
