# **Project** | Analyzing Website Performance for The Grammys

<div style="text-align: center;">
<img src="https://upload.wikimedia.org/wikipedia/en/thumb/0/01/The_Recording_Academy_logo.svg/2560px-The_Recording_Academy_logo.svg.png" alt="The Recording Academy Logo" width="320"/>
</div>
<br>

You'll work on real data from both websites owned by The Recording Academy, better known as "the Grammys."

As you saw in the videos, the VP of Digital Strategy, Ray Starck, decided in 2022 to split the websites into grammy.com and recordingacademy.com to better serve the Recording Academy's various audience needs. You're tasked with examining the impact of splitting up the two websites, and analyzing the data for a better understanding of trends and audience behavior.


## Data Dictionary
You'll be working with two files, `grammy_live_web_analytics.csv` and `ra_live_web_analytics.csv`.

These files will contain the following information:

- **date** - The date the data was confirmed. It is in `yyyy-mm-dd` format.
- **visitors** - The number of users who went on the website on that day.
- **pageviews** - The number of pages that all users viewed on the website.
- **sessions** - The total number of sessions on the website. A session is a group of user interactions with your website that take place within a given time frame. For example, a single session can contain multiple page views, events, and social interactions.
- **bounced_sessions** - The total number of bounced sessions on the website. A bounced session is when a visitor comes to the website and does not interact with any pages / links and leaves.
- **avg_session_duration_secs** - The average length for all session durations for all users that came to the website that day.
- **awards_week** - A binary flag indicating whether the date falls within the marketing campaigns that run before and after the Grammy Awards ceremony. This is the big marketing push to get as many eyeballs as possible watching the event.
- **awards_night** - The actual night that the Grammy Awards event was held.

# Part 1: Exploring the Data

This task will help you build a foundational understanding of the web analytics data for The Grammy Awards and The Recording Academy. By exploring the dataset first, you'll be better equipped to make meaningful observations and informed decisions later in the Project.


## Task 1

To start, import both the `pandas` and `plotly.express` libraries so that you can load the data into a DataFrame and visualize it.


In [None]:
# Import libraries
import pandas as pd
import plotly.express as px


## Task 2

Load in the first two files for your analysis. They are the `grammy_live_web_analytics.csv` and `ra_live_web_analytics.csv`.


**A.** Read the `grammy_live_web_analytics.csv` file into your notebook. Store the data in a DataFrame named `full_df`.

**B.** Read the `ra_live_web_analytics.csv` file into your notebook. Store that data into a DataFrame called `rec_academy`.

**C.** Preview both DataFrames to familiarize yourself with the data.

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Remember: </strong>These files can be found in the datasets folder!
</span>
</div>

In [None]:
# Read in dataframes
full_df = pd.read_csv('grammy_live_web_analytics.csv')
rec_academy = pd.read_csv('ra_live_web_analytics.csv')


In [None]:
# Preview full_df dataframe
full_df.head()

In [None]:
# Preview rec_academy dataframe
rec_academy.head()

## Task 3

The Grammy Awards are among the most prominent events in the global music industry. With such high visibility, it's important to understand how this event impacts web traffic.

**A.** Create a line chart of the number of visitors on the site for every day in `full_df`.

In [None]:
# Plot a line chart of the visitors on the site.
px.line(full_df, x='date', y='visitors', title='Daily Visitors on Grammys.com')


**B.** What do you notice about when and why traffic spikes occur? Are the traffic spikes in your visualization only aligning with "Show Night," or are there lesser-known events that could explain certain spikes in website traffic?

<div style="border: 3px solid #30EE99; background-color: #f0fff4; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
    <strong>Try This AI Prompt:</strong> Can you identify any specific lesser-known events (with exact dates) that might have caused significant increases in website traffic on grammy.com? What external data sources could help confirm these trends?
  </span>
</div>

Based on the line chart, traffic on grammy.com shows very clear spikes around the Grammys period. The largest spike aligns with Awards Night, but there are also several smaller increases in the days leading up to the show and shortly after it. These smaller spikes are likely tied to nomination announcements, performer reveals, red carpet coverage, recap articles, and promotional campaigns that happen before and after the ceremony. Traffic is therefore not driven by a single one-night event alone: several lesser-known marketing and media moments also cause noticeable rises in engagement.
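One way to check which dates sit behind the spikes is to list the busiest days next to the event flags. The sketch below is only illustrative: `sample_df` and all of its values are made up as a stand-in for `full_df`, and it assumes the `awards_week` and `awards_night` flags are stored as 0/1.

```python
import pandas as pd

# Hypothetical stand-in for full_df; every value here is invented for illustration.
sample_df = pd.DataFrame({
    'date': ['2021-03-12', '2021-03-13', '2021-03-14', '2021-03-15', '2021-06-01'],
    'visitors': [40000, 55000, 900000, 120000, 30000],
    'awards_week': [1, 1, 1, 1, 0],    # assumed to be 0/1 flags
    'awards_night': [0, 0, 1, 0, 0],
})

# The three busiest days, sorted from highest to lowest traffic.
top_days = sample_df.nlargest(3, 'visitors')[
    ['date', 'visitors', 'awards_week', 'awards_night']
]
print(top_days)

# Busy days that fall outside the awards window are the ones worth researching.
unexplained = top_days[(top_days['awards_week'] == 0) & (top_days['awards_night'] == 0)]
print(unexplained)
```

Running the same two steps on `full_df` with a larger `n` would give a shortlist of dates to compare against news or marketing calendars.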

## Task 4

To evaluate the impact of the Grammy Awards on user engagement, you'll compare average site traffic on the day of the ceremony versus all other days.

Understanding this contrast provides insight into how concentrated user attention is around a single event — and highlights the challenge of sustaining traffic throughout the year.


**A.** Use the pandas `.groupby()` to compare the average daily website visitors on days when an award ceremony was held to those when no awards ceremonies were held.

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>You'll group by the <span style="font-family: monospace; color: #222;">awards_night</span> column!
</span>
</div>

In [None]:
# Average number of visitors on awards nights versus other nights
full_df.groupby('awards_night')['visitors'].mean()

**B.** What does this comparison reveal about the difference in traffic between award ceremony days and regular days? How many more visitors does the Grammy Awards site receive on Show Night?


<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Remark: </strong>This is The Recording Academy's biggest challenge! How do you transform a business that relies on the success of one event per year into one that continues to bring users back on the site year round?
</span>
</div>

The comparison shows that traffic on grammy.com is dramatically higher on Awards Night than on regular days. Average visitors on Show Night are several times the daily average on non-award days. This shows how heavily the site’s traffic depends on the Grammy Awards event. While regular days maintain steady but lower activity, Show Night brings an intense spike in interest, confirming that the Grammys ceremony is the primary driver of annual website engagement.
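To put a number on "how many more visitors," the grouped averages can be turned into a difference and a ratio. This is a minimal sketch on invented data (`sample_df` is hypothetical), assuming `awards_night` is a 0/1 flag as in the data dictionary.

```python
import pandas as pd

# Hypothetical data: three regular days and one awards night.
sample_df = pd.DataFrame({
    'visitors': [30000, 50000, 40000, 1200000],
    'awards_night': [0, 0, 0, 1],
})

# Same groupby as in Task 4, then pull out each group's average.
avg_by_night = sample_df.groupby('awards_night')['visitors'].mean()
regular_avg = avg_by_night.loc[0]
awards_avg = avg_by_night.loc[1]

difference = awards_avg - regular_avg   # extra visitors on awards night
ratio = awards_avg / regular_avg        # awards night as a multiple of a regular day

print(f"Awards night averages {difference:,.0f} more visitors "
      f"({ratio:.1f}x a regular day)")
```

Reporting both the absolute difference and the multiple makes the written answer more specific than "several times higher."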

## Task 5

When The Recording Academy split its digital presence across two domains, grammy.com and recordingacademy.com, the data capture for grammy.com was not affected. That is, the way visitor data was collected for grammy.com stayed exactly the same before and after the split. You'll need to separate the data from before the split (when both sites were combined) and after the split (when grammy.com data continued independently). The split happened on February 1, 2022 (`2022-02-01`).


Create two new DataFrames:

1. `combined_site` should contain all data with dates before `2022-02-01`.

2. `grammys` should contain all data with dates on or after `2022-02-01`.

In [None]:
# Split the data to separate the full_df into two new dataframes.
# One for before the switch of the websites and one for after

combined_site = full_df[full_df['date'] < '2022-02-01']
grammys = full_df[full_df['date'] >= '2022-02-01']

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Tip: </strong>After creating these DataFrames, best practice is to use the .copy() method to avoid any warning messages from pandas when you modify them later.
</span>
</div>

In [None]:
# Run the following cell - DO NOT MODIFY
# .copy() prevents pandas from printing a warning message
combined_site = combined_site.copy()
grammys = grammys.copy()

In [None]:
# Print the shape of the combined_site dataframe
combined_site.shape

<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
      If done correctly, the <span style="font-family: monospace; color: #222;">combined_site</span> DataFrame should have a total of <strong>1857</strong> rows and <strong>8</strong> columns.
  </span>
</div>
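The string comparison used for the split works because `yyyy-mm-dd` dates sort chronologically as text. A slightly more defensive variant, sketched here on a hypothetical `sample_df`, converts the column to datetimes first and then checks that the two pieces account for every row.

```python
import pandas as pd

# Hypothetical stand-in for full_df with dates on both sides of the split.
sample_df = pd.DataFrame({
    'date': ['2022-01-30', '2022-01-31', '2022-02-01', '2022-02-02'],
    'visitors': [100, 110, 500, 450],
})

# Convert to real datetimes so comparisons no longer depend on string format.
sample_df['date'] = pd.to_datetime(sample_df['date'])
split_date = pd.Timestamp('2022-02-01')

before = sample_df[sample_df['date'] < split_date].copy()
after = sample_df[sample_df['date'] >= split_date].copy()

# Sanity check: the two pieces should partition the original rows.
assert len(before) + len(after) == len(sample_df)
print(before.shape, after.shape)
```

The row-count check is a cheap way to catch an off-by-one in the boundary condition, such as using `<=` and `>=` together.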



# Part 2: Analyzing Key Metrics

Remember the overall goal of this Project: to analyze whether splitting the website into two has improved user engagement. This part will focus on evaluating key metrics, such as bounce rate, pages per session, and average time on site, to determine if the split has had a positive or negative impact on how visitors interact with the site.

## Task 6

In this Task, you'll calculate the `pages_per_session` metric by dividing the total `pageviews` by the total number of `sessions`. Pages per session is an important measure of how many pages a user views before leaving the site -- a strong indicator of engagement!


**A.** Create a new list called `frames` that has each dataframe as an entry. e.g. If there were 3 dataframes, `df1`, `df2`, and `df3`, then the code would look like:

```python
frames = [df1, df2, df3]
```

**B.** `For` each frame in the frames list, create a new column called `pages_per_session`. This column should represent the *average* number of pageviews per session for each day.


<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>Divide the <span style="font-family: monospace; color: #222;">pageviews</span> column by <span style="font-family: monospace; color: #222;">sessions</span> column.
</span>
</div>

This can be achieved by using the following template:

```python
frame['new_col'] = frame['col_A'] / frame['col_B']
```



In [None]:
# Create the `pages_per_session` column for all 3 dataframes.

frames = [combined_site, grammys, rec_academy]

for frame in frames:
    frame['pages_per_session'] = frame['pageviews'] / frame['sessions']

**C.** Visualize this new `pages_per_session` metric using a line chart for each site. You will have 3 separate graphs!

In [None]:
# combined_site graph
px.line(combined_site, x='date', y='pages_per_session',
        title='Pages per Session - Combined Site')


In [None]:
# grammys graph
px.line(grammys, x='date', y='pages_per_session',
        title='Pages per Session - Grammys.com')


In [None]:
# rec_academy graph
px.line(rec_academy, x='date', y='pages_per_session',
        title='Pages per Session - RecordingAcademy.com')


**D.** In one sentence, what does the `pages_per_session` metric suggest regarding the impact of the website split?

<div style="border: 3px solid #30EE99; background-color: #f0fff4; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
    <strong>Try This AI Prompt:</strong> What does pages per session reveal about user engagement? How should I interpret changes in this metric after the website split?
  </span>
</div>
<br>
<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Note: </strong>Any large spikes in the data that do not correspond with the Grammy Awards Ceremony can be attributed to abnormalities in the data collection process and ignored in your analysis.
</span>
</div>

The `pages_per_session` metric suggests that user engagement held steady or improved slightly after the website split, meaning visitors are viewing at least as many pages per visit as before.
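To support the visual impression with a number, one option is to compare a typical daily value for each period. The sketch below uses the median rather than the mean so that a single anomalous spike does not dominate; the `before` and `after` frames and their values are hypothetical.

```python
import pandas as pd

# Hypothetical daily data; the last "after" row mimics a data-collection anomaly.
before = pd.DataFrame({'pageviews': [200, 300, 250], 'sessions': [100, 150, 100]})
after = pd.DataFrame({'pageviews': [375, 240, 9000], 'sessions': [150, 120, 100]})

for frame in (before, after):
    frame['pages_per_session'] = frame['pageviews'] / frame['sessions']

# The median is less sensitive to one-off spikes than the mean.
summary = {
    'before': before['pages_per_session'].median(),
    'after': after['pages_per_session'].median(),
}
print(summary)
```

With these made-up numbers the mean of the "after" period would be pulled far upward by the anomaly, while the median stays close to a normal day.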


## Task 7

Next, you'll calculate the `bounce_rate` metric by dividing the total `bounced_sessions` by the total number of `sessions`. Bounce rate is an important metric that measures the percentage of sessions in which a user comes to your site, never interacts with the page, and leaves. These users are said to have "bounced" off your home page. It is a measure of how engaging your home page is to users.

**A.** Create a function called `bounce_rate` that:

1. Takes in a `dataframe` as input
2. Adds up all of the values in the `bounced_sessions` column and stores the result in a variable called `sum_bounced`
3. Adds up all of the values in the `sessions` column and stores the result in a variable called `sum_sessions`
4. returns `100 * sum_bounced / sum_sessions`

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>You will need to use the <span style="font-family: monospace; color: #222;">.sum()</span> function both in the <span style="font-family: monospace; color: #222;">sum_bounced</span> and <span style="font-family: monospace; color: #222;">sum_sessions</span> calculations. Don't forget to multiply by <strong>100</strong> so that the answer appears as a percentage instead of a decimal.
</span>
</div>

In [None]:
def bounce_rate(dataframe):
    '''
    Calculates the bounce rate for visitors on the website.
    input: dataframe with bounced_sessions and sessions columns
    output: numeric value from bounce rate
    '''

    # WRITE YOUR CODE HERE
    sum_bounced = dataframe['bounced_sessions'].sum()
    sum_sessions = dataframe['sessions'].sum()
    
    return 100 * sum_bounced / sum_sessions
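Summing before dividing weights each day by its traffic, which is not the same as averaging the daily bounce rates. The small example below, on a made-up two-day frame (`sample_df` is hypothetical), shows how far apart the two can be.

```python
import pandas as pd

def bounce_rate(dataframe):
    """Overall bounce rate (%): total bounced sessions over total sessions."""
    sum_bounced = dataframe['bounced_sessions'].sum()
    sum_sessions = dataframe['sessions'].sum()
    return 100 * sum_bounced / sum_sessions

# Hypothetical data: a quiet day with a 50% bounce rate, a busy day with 20%.
sample_df = pd.DataFrame({
    'sessions': [100, 1000],
    'bounced_sessions': [50, 200],
})

overall = bounce_rate(sample_df)  # 100 * 250 / 1100
daily_mean = (100 * sample_df['bounced_sessions'] / sample_df['sessions']).mean()

print(f"Overall: {overall:.2f}%  |  Mean of daily rates: {daily_mean:.2f}%")
```

The overall figure sits closer to the busy day's rate because that day contributes most of the sessions, which is usually what a site-wide bounce rate is meant to reflect.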

**B.** Use the `frames` variable from Task 6 to loop over each website (represented by a dataframe) to calculate the bounce rate. Print the bounce rate for each site.

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>To get the bounce rate use <span style="font-family: monospace; color: #222;">bounce_rate(frame)</span>.
</span>
</div>

<br>

<div style="border: 3px solid #30EE99; background-color: #f0fff4; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
    <strong>Try This AI Prompt:</strong> How do I show a number with only 2 decimal places in an f-string?
  </span>
</div>
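For reference, the format specifier `:.2f` inside an f-string rounds a number to two decimal places for display. The values below are hypothetical and only demonstrate the formatting.

```python
# Hypothetical values, used only to demonstrate the formatting.
rate = 41.23789
sessions = 1234567.891

two_decimals = f"Bounce Rate: {rate:.2f}%"
with_commas = f"Sessions: {sessions:,.1f}"   # a comma adds thousands separators

print(two_decimals)
print(with_commas)
```

The formatting only changes how the value is printed; the underlying variable keeps its full precision.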

In [None]:
# Calculate the Bounce Rate for each site

for frame, name in zip(frames, ['Combined Site', 'Grammys', 'Recording Academy']):
    rate = bounce_rate(frame)
    print(f"{name} Bounce Rate: {rate:.2f}%")


<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
  If done correctly, the <span style="font-family: monospace; color: #222;">combined_site</span> and <span style="font-family: monospace; color: #222;">grammys</span> site will each have bounce rates in the low 40s. The <span style="font-family: monospace; color: #222;">rec_academy</span> will have a bounce rate in the low 30s.
  </span>
</div>


**C.** Next, you'll calculate the `average_time_on_site` metric. To do this, you only need to calculate the average of the `avg_session_duration_secs` column. Average Time on Site measures how engaging your website experience is for your users. The higher the number, the longer they are staying on your page and engaging with the content.

For each site (DataFrame), use an f-string to print the average time on site in a clean, readable format.

In [None]:
# Calculate the average of the avg_session_duration_secs

for frame, name in zip(frames, ['Combined Site', 'Grammys', 'Recording Academy']):
    avg_time = frame['avg_session_duration_secs'].mean()
    print(f"{name} - Average Time on Site: {avg_time:.2f} seconds")

**D.** Which of these three metrics changed the most after the site split? What do these changes suggest about user behavior?
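Because the three metrics are on different scales (pages, percent, seconds), comparing relative change is one reasonable way to decide which moved the most. The numbers below are placeholders invented for illustration, not results from the dataset; substitute the values you computed for `combined_site` and `grammys`.

```python
# Placeholder values for illustration only; replace with your own results.
before = {'pages_per_session': 2.0, 'bounce_rate': 40.0, 'avg_time_on_site': 100.0}
after = {'pages_per_session': 2.5, 'bounce_rate': 42.0, 'avg_time_on_site': 90.0}

# Percent change of each metric relative to its pre-split value.
pct_change = {
    metric: 100 * (after[metric] - before[metric]) / before[metric]
    for metric in before
}

# The metric with the largest change in either direction.
largest = max(pct_change, key=lambda metric: abs(pct_change[metric]))

for metric, change in pct_change.items():
    print(f"{metric}: {change:+.1f}%")
print(f"Largest relative change: {largest}")
```

Keep the direction in mind when interpreting the result: a rise in pages per session or time on site is generally good, while a rise in bounce rate is not.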


# Part 3: Demographics

Understanding age demographics helps identify which audiences are most engaged with your content. These insights can guide marketing strategies, advertising decisions, and content planning.

You’ll analyze the age demographics for both websites. To do this, you’ll need to read in two new datasets and combine them into one!



## Task 8

The `grammys_age_demographics.csv` and `tra_age_demographics.csv` each contain the following information:

- **age_group** - The age group range. e.g. `18-24` are all visitors between the ages of 18 to 24 who come to the site.
- **pct_visitors** - The percentage of all of the website's visitors that come from that specific age group.

**A.** Read in the `grammys_age_demographics.csv` and `tra_age_demographics.csv` files and store them into dataframes named `age_grammys` and `age_tra`, respectively.

In [None]:
# Read in the files
age_grammys = pd.read_csv('grammys_age_demographics.csv')
age_tra = pd.read_csv('tra_age_demographics.csv')


In [None]:
# Preview the age_grammys file
age_grammys.head()

**B.** For each dataframe, create a new column called `website` whose value is the name of the website.
e.g. the `age_grammys` values for `website` should all be `Grammys` and for the `age_tra` they should be `Recording Academy`.

In [None]:
# Label rows as 'Grammys'
age_grammys['website'] = 'Grammys'

# Label rows as 'Recording Academy'
age_tra['website'] = 'Recording Academy'

**C.** Use the `pd.concat()` function to join these two datasets together. Store the result in a new variable called `age_df`.

<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>Remember that you need to put your dataframe variables inside of a <strong>list</strong> first. Then pass that list as your input of <span style="font-family: monospace; color: #222;">pd.concat()</span>.
</span>
</div>

In [None]:
# Concatenate dataframes
age_df = pd.concat([age_grammys, age_tra])

# Preview combined data
age_df

<div style="border: 3px solid #f8c43e; background-color: #fff3c1; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
  <span style="font-size: 10pt;">
      If done correctly, your new DataFrame will have <strong>12</strong> rows and <strong>3</strong> columns.
  </span>
</div>
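By default `pd.concat()` keeps each frame's original row labels, so the stacked result repeats index values. Passing `ignore_index=True` renumbers the rows. The sketch below uses two tiny hypothetical frames (`age_a`, `age_b`) in place of the real demographics files.

```python
import pandas as pd

# Hypothetical two-row frames standing in for age_grammys and age_tra.
age_a = pd.DataFrame({'age_group': ['18-24', '25-34'],
                      'pct_visitors': [30.0, 25.0],
                      'website': 'Site A'})
age_b = pd.DataFrame({'age_group': ['18-24', '25-34'],
                      'pct_visitors': [20.0, 28.0],
                      'website': 'Site B'})

stacked = pd.concat([age_a, age_b])                          # index: 0, 1, 0, 1
renumbered = pd.concat([age_a, age_b], ignore_index=True)    # index: 0, 1, 2, 3

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Either version has the same shape, so the expected row and column counts are unaffected; a unique index simply avoids surprises with later label-based lookups.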

**D.** Create a bar chart of the `age_group` and `pct_visitors`. This chart should have, for each age group, one color for the Recording Academy and a different color for the Grammys.


<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>You will need to use the <span style="font-family: monospace; color: #222;">barmode='group'</span> option in <span style="font-family: monospace; color: #222;">px.bar()</span>. See the code snippet below to guide you.
</span>
</div>


```python
# Template for visualization
px.bar(dataframe, x='variable1', y='variable2', color='variable3', barmode='group')
```

In [None]:
# age_group and pct_visitors bar chart
px.bar(
    age_df,
    x='age_group',
    y='pct_visitors',
    color='website',
    barmode='group',
    title='Age Demographics: Grammys vs Recording Academy'
)

**E.** Looking at the chart above, what can you say about how the age demographics differ between the two websites?

The chart shows that Grammys.com attracts a younger audience with higher percentages in the 18–34 age groups, while the Recording Academy site has a relatively older audience, with stronger representation in the 35+ age ranges.
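A side-by-side table can back up what the bar chart shows. The sketch below pivots a small, invented frame (`sample_age_df` is hypothetical, with only two age groups) so each website becomes a column, then subtracts one from the other.

```python
import pandas as pd

# Hypothetical long-format data in the same layout as age_df.
sample_age_df = pd.DataFrame({
    'age_group': ['18-24', '25-34', '18-24', '25-34'],
    'pct_visitors': [30.0, 25.0, 20.0, 28.0],
    'website': ['Grammys', 'Grammys', 'Recording Academy', 'Recording Academy'],
})

# One row per age group, one column per website.
wide = sample_age_df.pivot(index='age_group', columns='website', values='pct_visitors')

# Positive values mean the age group is a larger share of the Grammys audience.
wide['difference'] = wide['Grammys'] - wide['Recording Academy']
print(wide)
```

Quoting the per-group differences from a table like this makes the written comparison more precise than describing bar heights.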

# Part 4: Making a Business Recommendation


## Task 9

Now that you've analyzed the engagement metrics before and after the website split, it’s time to interpret your findings and make a recommendation to The Recording Academy team.


**A.** Write a clear and specific prompt for ChatGPT to draft a brief business memo to The Recording Academy. Your prompt should guide ChatGPT to summarize key findings and suggest a recommendation based on the data: should The Recording Academy keep the sites separate, merge them back, or consider an alternative approach? Paste your prompt below.


You are a data analyst for The Recording Academy. I have analyzed website traffic, engagement metrics, and age demographics for Grammys.com and RecordingAcademy.com both before and after the site split. Using all these findings, write a clear business memo summarizing the key insights, including when traffic spikes occur, how engagement metrics like pages per session, bounce rate, and time on site changed after the split, and how the audience age demographics differ between the two websites. Based on the data, recommend whether The Recording Academy should keep the sites separate, merge them back together, or consider an alternative strategy. Provide 2–3 actionable suggestions to improve year-round engagement.

**B.** What did ChatGPT do well? Did it capture the key trends and insights? What was missing or inaccurate? Were any important details left out or misrepresented?

ChatGPT did a good job summarizing the main trends, including the traffic spikes around Awards Night, how engagement metrics changed after the site split, and the differences in age demographics between the two websites. It captured the overall story clearly and highlighted the key insights in a concise way. However, the response was somewhat general at times and did not reference specific numeric values from my actual calculations, which made some points less precise. It also simplified a few findings, so some nuance was missing, such as the exact size of the traffic difference between award nights and regular nights. Overall, it provided a strong summary but needed more data-specific detail for full accuracy.


**C.** Based on your reflection and evaluation of the AI's assistance, write your final, revised business memo below. This version should be polished and ready as if you were presenting it to Ray and The Recording Academy team.

To: Ray Starck and The Recording Academy Leadership Team
From: Damierra-Joy Mbu Besong
Subject: Web Performance Insights & Recommendation Following the Site Split

After analyzing traffic, engagement, and demographic data for both Grammys.com and RecordingAcademy.com, several clear patterns emerged that help explain how users engage with each platform and how the 2022 site split has influenced performance.

First, grammy.com continues to experience dramatic traffic surges around Awards Night, when visitor volume rises to several times the daily average on regular days. Smaller spikes also appear leading up to the ceremony due to nomination announcements, performance reveals, and promotional campaigns. This confirms that the Grammys ceremony remains the core driver of web interest, while off-season traffic remains comparatively low.

Second, engagement metrics following the site split show signs of improvement. Pages per session remained steady or increased slightly, and bounce rates dropped on the music-facing site, indicating that visitors are finding more relevant content faster. Average time on site also remained strong. These trends suggest that separating fan-focused content (Grammys) from industry- and membership-focused content (Recording Academy) has helped streamline user pathways and improved the overall quality of engagement.

Demographic data reinforces this separation: Grammys.com attracts a significantly younger audience, particularly within the 18–34 range, while RecordingAcademy.com draws a more mature audience with heavier concentration in the 35+ brackets. This aligns well with the intended purpose of each platform and supports the value of keeping them distinct.

Based on these insights, I recommend keeping the two sites separate, while strengthening the year-round engagement strategy on Grammys.com and optimizing navigation across both platforms. To support this, I suggest:
1. Develop ongoing, evergreen fan content (editorials, playlists, short-form videos, performance archives) to reduce dependence on Awards Night spikes.
2. Improve cross-site navigation and interlinking, guiding highly engaged Grammys visitors toward Recording Academy initiatives such as membership, advocacy, and programs.
3. Enhance user experience with clearer content categories and stronger calls-to-action to continue reducing bounce rates and increasing pages per session.

Overall, the site split has improved content relevance and supported stronger user pathways. With a more intentional year-round content strategy, The Recording Academy can transform peak Award Night traffic into a sustained audience across both platforms.

# LevelUp

Ray and Harvey are both interested to see how the Grammys.com website compares to that of their main music award competitor, The American Music Awards (AMA). The dashboard below shows aggregated information about the performance of The AMA website for the months of April, May, and June of 2023.

Your goal is to determine how the Grammys website is performing relative to The AMA website. In particular, you will be looking at the device distribution and total visits over the same time span and leveraging information about Visit Duration, Bounce Rate, and Pages / Visit from your work in the core of this project.


![](figs/TheAMAs.png)



The **Total Visits** column is the total number of visitors on the website during the timespan given.
The **Device Distribution** is the percentage share of visitors coming from Desktop users (PCs, Macs, etc.) and Mobile Users (iPhone, Android, etc.).

Visitors on the AMA website are spending, on average, 5 minutes and 53 seconds on the site and viewing 2.74 pages per visit (aka session). They have a bounce rate of 54.31%.
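Since the Grammys data stores session length in seconds (`avg_session_duration_secs`), it helps to express the AMA figures in the same units before comparing. This is simple arithmetic on the numbers quoted above; the variable names are just illustrative.

```python
# AMA figures as quoted from the dashboard description.
ama_minutes, ama_seconds = 5, 53
ama_pages_per_visit = 2.74
ama_bounce_rate = 54.31

# Convert the visit duration to seconds to match avg_session_duration_secs.
ama_duration_secs = ama_minutes * 60 + ama_seconds

print(f"AMA average visit duration: {ama_duration_secs} seconds")
```

The resulting 353 seconds can be compared directly with the average time on site computed in Task 7.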

**A.** Load in the two files. The `desktop_users.csv` and `mobile_users.csv` files contain the visitor counts from desktop users and mobile users, respectively.

Store them in variables named `desktop_users` and `mobile_users`.

In [None]:
# Load in the data


In [None]:
# Preview the desktop_users file


In [None]:
# Preview mobile_users file


As you can imagine, you will be joining the two datasets together! But before you do that, you will modify the column names so that the combined data is easier to use.

**B.** For each dataframe, change the name of the `visitors` column so that it says which category they come from. For example, the `desktop_users` dataframe should have a column named `desktop_visitors` instead of `visitors`.

Additionally, drop the `segment` column since it is no longer needed.
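A possible shape for this step is sketched below on an invented frame. It assumes each file has `date`, `segment`, and `visitors` columns; only `visitors` and `segment` are named in the instructions, so treat `date` and all values as assumptions.

```python
import pandas as pd

# Hypothetical stand-in for desktop_users; the 'date' column is an assumption.
desktop_sample = pd.DataFrame({
    'date': ['2023-04-01', '2023-04-02'],
    'segment': ['Desktop', 'Desktop'],
    'visitors': [100, 120],
})

# Rename the generic column, then drop the now-redundant segment label.
desktop_sample = (
    desktop_sample
    .rename(columns={'visitors': 'desktop_visitors'})
    .drop(columns=['segment'])
)
print(desktop_sample.columns.tolist())
```

The same two calls, with `mobile_visitors` as the new name, would apply to the mobile frame.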

In [None]:
# Change name of the visitors column to indicate which category it comes from


In [None]:
# Drop the segment column from each dataframe


**C.** Join the two dataframes together in a new variable called `segment_df`.
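One way to join the frames is a merge on a shared key. The sketch below assumes both frames carry a `date` column to join on, which the instructions imply but do not state; the sample frames and values are hypothetical.

```python
import pandas as pd

# Hypothetical frames after renaming and dropping 'segment'.
desktop_sample = pd.DataFrame({
    'date': ['2023-04-01', '2023-04-02'],
    'desktop_visitors': [100, 120],
})
mobile_sample = pd.DataFrame({
    'date': ['2023-04-01', '2023-04-02'],
    'mobile_visitors': [300, 280],
})

# Inner join keeps only dates present in both frames.
joined = desktop_sample.merge(mobile_sample, on='date', how='inner')
print(joined)
```

If the two files do not cover exactly the same dates, comparing `len(joined)` with the original lengths shows how many rows the inner join dropped.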

In [None]:
# Join the two dataframes and preview the dataframe


**D.** In the next few steps, you will calculate the percentage share of users coming from desktop and mobile on the Grammys website.

Calculate a new column, `total_visitors`, that is the sum of `desktop_visitors` and `mobile_visitors`.

In [None]:
# Create total_visitors column


<div style="border: 3px solid #b67ae5; background-color: #f9f1ff; padding: 15px; border-radius: 8px; color: #222; display: flex; align-items: center;">
<span style="font-size: 10pt;">
<strong>Hint: </strong>To calculate the percentage share, first filter the data to dates on or after <span style="font-family: monospace; color: #222;">2023-04-01</span>. Then calculate the <span style="font-family: monospace; color: #222;">sum</span> of desktop visitors and of total visitors, and divide the first by the second. The percentage share of mobile visitors will be the value needed to get to 100%.
</span>
</div>
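The steps in the hint can be sketched as follows. The frame `segment_sample` and its values are invented, and the example assumes dates are stored as `yyyy-mm-dd` strings so that a string comparison orders them correctly.

```python
import pandas as pd

# Hypothetical stand-in for segment_df; the first row falls before the cutoff.
segment_sample = pd.DataFrame({
    'date': ['2023-03-31', '2023-04-01', '2023-04-02'],
    'desktop_visitors': [500, 100, 200],
    'mobile_visitors': [500, 300, 400],
})
segment_sample['total_visitors'] = (
    segment_sample['desktop_visitors'] + segment_sample['mobile_visitors']
)

# Keep only dates on or after 2023-04-01.
recent = segment_sample[segment_sample['date'] >= '2023-04-01']

# Share of all visitors in the window, not an average of daily shares.
desktop_share = 100 * recent['desktop_visitors'].sum() / recent['total_visitors'].sum()
mobile_share = 100 - desktop_share

print(f"Desktop: {desktop_share:.2f}%")
print(f"Mobile: {mobile_share:.2f}%")
```

Dividing the summed columns matches how the dashboard's Device Distribution is described: a share of all visitors over the whole time span.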



In [None]:
# Filter and calculate the percentage share
# Use an f string to print each percentage to the screen



**E.** How is the Grammys website performing relative to its competitor? What is the Grammys doing well and what KPIs does it need to improve?

