# Homework 4 - Tidy and Process the Billboard Dataset
The Billboard dataset comes with **76 columns** corresponding to the chart position of each song from `x1st.week` through `x76th.week`. This is a classic example of **wide** data that needs to be **melted** (unpivoted) into a long (tidy) format.

### Instructions
1. Follow the instructions in the class notes to set up your Python and Jupyter (or VSCode) environment and to clone or download our repository.
2. Fill in the missing pieces of code in the provided notebook.
3. Run the notebook and make sure everything works.


### Dataset Overview
The dataset consists of songs and their weekly chart positions on the Billboard Hot 100. It includes the following columns, among others:
- `year`: The year the song entered the chart.
- `artist`: The artist of the song.
- `track`: The title of the song.
- `time`: The duration of the song.
- `date.entered`: The date the song entered the chart.
- `x1st.week` to `x76th.week`: The chart position of the song for each week.

### Goals

1. **Load** the Billboard dataset from CSV.
2. **Tidy** the data so each row represents one song in one week.
3. **Calculate** the actual date for each week using `date.entered + week * 7 days`.
4. **Split** the data into two tables:
   - A **songs** table with static song information.
   - A **positions** table with `(song_id, week, rank, date)`.
5. **Save** the tidy data in **Feather** format in the same directory, with a `_tidy` suffix.

### Submission Guidelines

- Submit your completed notebook as an HTML export or a PDF file.

To export to HTML, if you are on Jupyter, select `File` > `Export Notebook As` > `HTML`.

If you are on VSCode, you can use the `Jupyter: Export to HTML` command:
- Open the command palette (Ctrl+Shift+P, or Cmd+Shift+P on Mac).
- Search for `Jupyter: Export to HTML`.
- Save the HTML file to your computer and submit it via Canvas.

---

In [None]:
import pandas as pd

# 1. Load the Billboard dataset
df_bill = pd.read_csv("../../Datasets/billboard.csv")

# Let's check a few columns to see the structure.
df_bill.head()

 The dataset has columns like:

 - **year**, **artist.inverted**, **track**, **time**, **genre** … (song info)

 - **date.entered**, **date.peaked** … (chart-related dates)

 - **x1st.week** through **x76th.week** … (chart positions over 76 weeks)



 We want to **melt** these weekly columns into a single `week` and `rank` column.
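As a warm-up, here is a minimal sketch of melting on a toy table rather than the real dataset. The frame `toy`, its values, and the choice to drop missing ranks are assumptions made for illustration; the assignment does not say whether off-chart weeks should be kept.

```python
import pandas as pd

# Hypothetical stand-in for the Billboard data: two songs, three weekly columns.
toy = pd.DataFrame({
    "track": ["Song A", "Song B"],
    "date.entered": ["2000-01-01", "2000-02-05"],
    "x1st.week": [80, 60],
    "x2nd.week": [75, None],
    "x3rd.week": [70, None],
})

# Columns listed in id_vars are repeated on every output row;
# every other column is unpivoted into (week, rank) pairs.
id_cols = ["track", "date.entered"]
toy_long = toy.melt(id_vars=id_cols, var_name="week", value_name="rank")

# Assumption: weeks with no chart position (NaN) are not needed, so drop them.
toy_long = toy_long.dropna(subset=["rank"]).reset_index(drop=True)
print(toy_long)
```

On the real data, `id_vars` would list all of the song-info and date columns, so that only the weekly columns are unpivoted.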

In [None]:
# Your code here

 Notice how each row is now **one song** in **one week**. However, the `week` column currently contains strings like `"x1st.week"`, `"x2nd.week"`, etc. Let's clean those up and create a numeric week column.
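One possible way to pull the number out of labels like these is a regular expression that captures the run of digits. The sketch below works on a standalone Series (`weeks` is a made-up name) and is only one of several approaches that would work.

```python
import pandas as pd

# Hypothetical sample of the week labels produced by the melt step.
weeks = pd.Series(["x1st.week", "x2nd.week", "x3rd.week", "x76th.week"])

# Capture the first run of digits in each label, then convert to integers.
week_num = weeks.str.extract(r"(\d+)", expand=False).astype(int)
print(week_num.tolist())  # [1, 2, 3, 76]
```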

In [None]:
# Your code here

 Now, `week = 1, 2, 3, ... 76`. Next, we want to calculate the **exact date** on the chart for each row by adding `week * 7` days to `date.entered`. Create a column named "date" to hold the result. See the expected result in our lecture materials for tidy data.
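The date arithmetic can be sketched on a toy frame as follows. The frame `toy` and its values are made up; the formula is the `week * 7` days stated above.

```python
import pandas as pd

# Hypothetical two-row example: the same song in week 1 and week 2.
toy = pd.DataFrame({
    "date.entered": ["2000-01-01", "2000-01-01"],
    "week": [1, 2],
})

# Dates read from CSV are strings, so convert them before doing arithmetic.
toy["date.entered"] = pd.to_datetime(toy["date.entered"])

# Turn the week number into a day offset and add it to the entry date.
toy["date"] = toy["date.entered"] + pd.to_timedelta(toy["week"] * 7, unit="D")
print(toy)
```

With this formula, week 1 falls seven days after `date.entered`. If the expected result in the lecture materials places week 1 on the entry date itself, the offset would be `(week - 1) * 7` instead.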

In [None]:
# Note that after doing that, you should have a new column called date
# Your code here


 ### Split into Two Tables



 **Why split?** We often separate the **static** song info (e.g., artist, track, time, genre) from the **weekly** chart performance (week, rank, date).



 - **Songs Table**: Contains unique identifiers for each song plus basic metadata.

 - **Positions Table**: Contains `(song_id, week, rank, date)`, referencing the **song_id** from the songs table.
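A minimal sketch of building a songs table from a toy tidy frame is shown below. The names `toy_tidy` and `toy_songs`, the choice of `artist` and `track` as the identifying columns, and the 1-based integer `song_id` are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical tidy data: Song A appears in two weeks, Song B in one.
toy_tidy = pd.DataFrame({
    "artist": ["A", "A", "B"],
    "track": ["Song A", "Song A", "Song B"],
    "week": [1, 2, 1],
    "rank": [80, 75, 60],
})

# Keep one row per song by de-duplicating on the static columns.
song_cols = ["artist", "track"]
toy_songs = toy_tidy[song_cols].drop_duplicates().reset_index(drop=True)

# Assign a simple sequential identifier as the first column.
toy_songs.insert(0, "song_id", range(1, len(toy_songs) + 1))
print(toy_songs)
```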

In [None]:
# Your code here

 Next, we merge this `song_id` back into our `df_tidy` so we can create the positions table.
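The merge can be sketched on toy frames like this. The frames and the key columns (`artist`, `track`) are assumptions; the `validate` argument is optional and simply makes pandas raise an error if a key matches more than one song.

```python
import pandas as pd

# Hypothetical songs table and tidy table sharing the artist/track key.
toy_songs = pd.DataFrame({
    "song_id": [1, 2],
    "artist": ["A", "B"],
    "track": ["Song A", "Song B"],
})
toy_tidy = pd.DataFrame({
    "artist": ["A", "A", "B"],
    "track": ["Song A", "Song A", "Song B"],
    "week": [1, 2, 1],
    "rank": [80, 75, 60],
})

# A left merge keeps every weekly row and attaches the matching song_id.
toy_merged = toy_tidy.merge(
    toy_songs,
    on=["artist", "track"],
    how="left",
    validate="many_to_one",  # many weekly rows per song, one song per key
)
print(toy_merged)
```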

In [None]:
# Your code here


 ### Create the Positions Table



 We only keep the **relevant columns** for weekly positions: `song_id`, `week`, `rank`, and `date`.

In [None]:
# Your code here


## 8. Playing with the Data
 Now that we have our data in a tidy format, let's do some analysis.

### Only songs that reached top 10
We can use `query()` to filter the data for songs that reached the top 10 at least once. We will merge this back to the songs table to get the song details.

Get a dataframe of the songs that reached the top 10, along with their details.
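As an illustration on toy tables, the filter-then-merge pattern might look like the sketch below. `toy_songs` and `toy_positions` are made-up frames, and "top 10" is taken to mean `rank <= 10`.

```python
import pandas as pd

# Hypothetical songs and positions tables.
toy_songs = pd.DataFrame({
    "song_id": [1, 2, 3],
    "track": ["Song A", "Song B", "Song C"],
})
toy_positions = pd.DataFrame({
    "song_id": [1, 1, 1, 2, 2, 3],
    "week": [1, 2, 3, 1, 2, 1],
    "rank": [12, 9, 4, 50, 40, 10],
})

# Keep only the weekly rows in the top 10, then attach the song details.
toy_top10 = (
    toy_positions.query("rank <= 10")
    .merge(toy_songs, on="song_id", how="left")
)
print(toy_top10)
```

A song that stayed in the top 10 for several weeks appears once per week in the result.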

In [None]:
# Your code here


You may want to remove duplicates to get a list of unique songs that reached the top 10. See `df.drop_duplicates()` for more details.
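A small sketch of de-duplicating on a toy frame follows. The frame `toy_top10` is made up, and keeping only `song_id` and `track` afterwards is an assumption about which columns are of interest.

```python
import pandas as pd

# Hypothetical top-10 rows: song 1 appears twice (two weeks in the top 10).
toy_top10 = pd.DataFrame({
    "song_id": [1, 1, 3],
    "track": ["Song A", "Song A", "Song C"],
    "week": [2, 3, 1],
    "rank": [9, 4, 10],
})

# Keep the first row for each song_id, then keep only the song-level columns.
toy_unique = (
    toy_top10.drop_duplicates(subset="song_id")[["song_id", "track"]]
    .reset_index(drop=True)
)
print(toy_unique)
```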

In [None]:
# Your code here


### How long did each song stay in the top 10?
Add to the current dataframe, or create a new dataframe, with the following columns:
- `song_id` : the song id
- `weeks_in_top_10` : the number of weeks the song was in the top 10
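One way to count weeks per song is to filter and then group, as in this sketch on a made-up `toy_positions` frame. Songs that never reached the top 10 do not appear in the result under this approach.

```python
import pandas as pd

# Hypothetical positions table.
toy_positions = pd.DataFrame({
    "song_id": [1, 1, 1, 2, 2, 3],
    "week": [1, 2, 3, 1, 2, 1],
    "rank": [12, 9, 4, 50, 40, 10],
})

# Count the top-10 rows for each song; each row is one week on the chart.
toy_weeks = (
    toy_positions.query("rank <= 10")
    .groupby("song_id")
    .size()
    .reset_index(name="weeks_in_top_10")
)
print(toy_weeks)
```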

In [None]:
# Your code here


### In which week did each song reach the top 10?
Create a new dataframe, or add to an existing one, with the following columns:
- `week_reached_top_10` : the week in which the song reached the top 10 for the first time
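The first top-10 week can be found by taking the minimum `week` among a song's top-10 rows. The sketch below uses a made-up `toy_positions` frame and keeps `song_id` alongside the new column so the result can be merged with other tables.

```python
import pandas as pd

# Hypothetical positions table.
toy_positions = pd.DataFrame({
    "song_id": [1, 1, 1, 2, 2, 3],
    "week": [1, 2, 3, 1, 2, 1],
    "rank": [12, 9, 4, 50, 40, 10],
})

# Among the top-10 rows, the smallest week number is the first week in the top 10.
toy_first = (
    toy_positions.query("rank <= 10")
    .groupby("song_id", as_index=False)["week"]
    .min()
    .rename(columns={"week": "week_reached_top_10"})
)
print(toy_first)
```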

In [None]:
# Your code here


 ### 9. Save Tidy Data to Feather



 We want to save:

 - The **tidy** DataFrame (`df_tidy`) to a single file with the suffix `_tidy`.

 - (Optionally) Also save **songs** and **positions** as separate Feather files if needed.
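A possible sketch for deriving the output path and writing the file is shown below. The helper names `tidy_path` and `save_tidy` are hypothetical, as is the `.feather` extension on the output file; `DataFrame.to_feather` requires the `pyarrow` package to be installed.

```python
from pathlib import Path

import pandas as pd


def tidy_path(csv_path):
    """Return a sibling path with a `_tidy` suffix and a .feather extension."""
    p = Path(csv_path)
    return p.with_name(p.stem + "_tidy.feather")


def save_tidy(df, csv_path):
    """Write df next to the source CSV and return the path that was written."""
    out = tidy_path(csv_path)
    # Some pandas versions reject a non-default index when writing Feather,
    # so reset it in case rows were filtered earlier.
    df.reset_index(drop=True).to_feather(out)
    return out


print(tidy_path("../../Datasets/billboard.csv"))
```

Calling `save_tidy(df_tidy, "../../Datasets/billboard.csv")` would then write the file, and `pd.read_feather` can be used to confirm that it loads back correctly.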

In [None]:
# Your code here
