# Section 1 - Final Project: Data In Hand

**Report Title:** Data In Hand <br>
**Name:** Morgan Rawski <br>
**Date:** 11/15/25

# Section 2 - Data Collection Information
I had to change the direction of my project because I wasn't able to find data for my original proposal. My new question is: How does the time of day that I listen to music affect the duration of the songs that I listen to? I use Amazon Music rather than Spotify, so I requested my data through the Amazon website and received an email with a download link a few days later. The data is a CSV file of strings and integers that covers every song I played, and how long I listened to each one, from October 2024 to November 2025. It contains 8,928 rows and 12 columns, where each row represents one playback event recorded by Amazon Music. The file tells me the date and time I listened to each song, the length of the song itself, how long I actually listened to it, the title of the song, and the territory I was listening in.

# Section 3 - Data in a Pandas DataFrame

In [28]:
# imports (pandas is the only library this notebook uses)
import pandas as pd

df = pd.read_csv(r'/Users/morawski28/Downloads/Amazon-Music/listening.csv')
df.head()

Unnamed: 0,Date,Product Name,ASIN,Listen Duration in Milliseconds,Reason for Stopped Playback,Device Type,Music Territory,Playback from Shuffle,Title Set ASIN,Track Length in Milliseconds,Artist ASIN,Selection Source Artist ASIN
0,2025-11-11 03:13:39 UTC,It's Beginning to Look a Lot like Christmas,B005UA0FTQ,1000,trackFinished,iOS,United States,No,B0DHHFVHLL,207000,B0011Z76FK,Not Available
1,2025-11-11 02:56:14 UTC,It's Beginning to Look a Lot like Christmas,B005UA0FTQ,54000,systemStop,iOS,United States,No,B0DHHFVHLL,207000,B0011Z76FK,Not Available
2,2025-11-11 02:30:57 UTC,Like It's Christmas,B07ZTZ719C,66000,trackFinished,iOS,United States,No,B0991P8SBX,201000,B00137FSEC,Not Available
3,2025-11-11 02:29:42 UTC,Like It's Christmas,B07ZTZ719C,134000,systemStop,iOS,United States,No,B0991P8SBX,201000,B00137FSEC,Not Available
4,2025-11-11 02:20:57 UTC,What Christmas Means to Me,B07ZS6G617,160000,trackFinished,iOS,United States,No,B0DHDC4YCK,160000,B000QJIC2I,Not Available


In [29]:
df = df.drop(columns=["ASIN", "Device Type", "Music Territory", "Title Set ASIN", "Artist ASIN", "Selection Source Artist ASIN", "Playback from Shuffle"])
df.head(30)

Unnamed: 0,Date,Product Name,Listen Duration in Milliseconds,Reason for Stopped Playback,Track Length in Milliseconds
0,2025-11-11 03:13:39 UTC,It's Beginning to Look a Lot like Christmas,1000,trackFinished,207000
1,2025-11-11 02:56:14 UTC,It's Beginning to Look a Lot like Christmas,54000,systemStop,207000
2,2025-11-11 02:30:57 UTC,Like It's Christmas,66000,trackFinished,201000
3,2025-11-11 02:29:42 UTC,Like It's Christmas,134000,systemStop,201000
4,2025-11-11 02:20:57 UTC,What Christmas Means to Me,160000,trackFinished,160000
5,2025-11-11 02:17:57 UTC,Christmas (Baby Please Come Home),170000,trackFinished,170000
6,2025-11-11 02:14:57 UTC,Santa Tell Me,204000,trackFinished,204000
7,2025-11-11 02:11:57 UTC,Run Rudolph Run,165000,trackFinished,163000
8,2025-11-11 02:08:57 UTC,"Baby, It's Cold Outside",142000,trackFinished,144000
9,2025-11-11 02:06:57 UTC,Wonderful Christmastime (Edited Version / Rema...,228000,trackFinished,228000


In [34]:
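# Parse the timestamp strings into timezone-aware datetimes (the export records them in UTC)
df['Date'] = pd.to_datetime(df['Date'])

def time_of_day(hour):
    """Map an hour of the day (0-23) to a time-of-day label."""
    if 5 <= hour < 12:
        return "Morning"
    elif 12 <= hour < 17:
        return "Afternoon"
    elif 17 <= hour < 21:
        return "Evening"
    else:
        # 21:00 through 04:59
        return "Night"

# Label each playback by the hour of its timestamp (this is the UTC hour, not my local hour)
df['Time of Day'] = df['Date'].dt.hour.map(time_of_day)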

df["Time of Day"].value_counts()

df.head(30)

Unnamed: 0,Date,Product Name,Listen Duration in Milliseconds,Reason for Stopped Playback,Track Length in Milliseconds,Time of Day
0,2025-11-11 03:13:39+00:00,It's Beginning to Look a Lot like Christmas,1000,trackFinished,207000,Night
1,2025-11-11 02:56:14+00:00,It's Beginning to Look a Lot like Christmas,54000,systemStop,207000,Night
2,2025-11-11 02:30:57+00:00,Like It's Christmas,66000,trackFinished,201000,Night
3,2025-11-11 02:29:42+00:00,Like It's Christmas,134000,systemStop,201000,Night
4,2025-11-11 02:20:57+00:00,What Christmas Means to Me,160000,trackFinished,160000,Night
5,2025-11-11 02:17:57+00:00,Christmas (Baby Please Come Home),170000,trackFinished,170000,Night
6,2025-11-11 02:14:57+00:00,Santa Tell Me,204000,trackFinished,204000,Night
7,2025-11-11 02:11:57+00:00,Run Rudolph Run,165000,trackFinished,163000,Night
8,2025-11-11 02:08:57+00:00,"Baby, It's Cold Outside",142000,trackFinished,144000,Night
9,2025-11-11 02:06:57+00:00,Wonderful Christmastime (Edited Version / Rema...,228000,trackFinished,228000,Night
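
Because the timestamps are stored in UTC, the labels above describe the UTC hour rather than the hour where I was listening. A minimal sketch of converting to a local time zone before labeling is shown here. The zone name `America/Chicago` is only a placeholder assumption (the export says "United States" but not which zone), and the three timestamps are copied from the table above.

```python
import pandas as pd

# Three timestamps copied from the table above (the full DataFrame has 8,928 rows).
sample = pd.DataFrame({"Date": ["2025-11-11 03:13:39 UTC",
                                "2025-11-11 02:56:14 UTC",
                                "2025-11-11 02:06:57 UTC"]})
sample["Date"] = pd.to_datetime(sample["Date"], utc=True)

# ASSUMPTION: placeholder zone; replace it with the zone I actually listened in.
LOCAL_TZ = "America/Chicago"
sample["Local Date"] = sample["Date"].dt.tz_convert(LOCAL_TZ)

def time_of_day(hour):
    """Same hour buckets as the notebook cell above."""
    if 5 <= hour < 12:
        return "Morning"
    elif 12 <= hour < 17:
        return "Afternoon"
    elif 17 <= hour < 21:
        return "Evening"
    else:
        return "Night"

# Label by the local hour instead of the UTC hour.
sample["Time of Day (Local)"] = sample["Local Date"].dt.hour.map(time_of_day)
print(sample[["Date", "Local Date", "Time of Day (Local)"]])
```

Under this assumed zone, two of the three plays labeled "Night" in UTC fall in the "Evening" bucket, so the choice of time zone can change the categories used in the rest of the analysis.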


In [40]:
cols = df.columns.tolist()

# Take "Time of Day" out of its current position so it is not listed twice
cols.remove("Time of Day")

# Find where "Date" is located
date_index = cols.index("Date")

# Insert "Time of Day" right after "Date"
cols.insert(date_index + 1, "Time of Day")

# Reorder the dataframe
df = df[cols]

df.head(30)

Unnamed: 0,Date,Time of Day,Product Name,Listen Duration in Milliseconds,Reason for Stopped Playback,Track Length in Milliseconds
0,2025-11-11 03:13:39+00:00,Night,It's Beginning to Look a Lot like Christmas,1000,trackFinished,207000
1,2025-11-11 02:56:14+00:00,Night,It's Beginning to Look a Lot like Christmas,54000,systemStop,207000
2,2025-11-11 02:30:57+00:00,Night,Like It's Christmas,66000,trackFinished,201000
3,2025-11-11 02:29:42+00:00,Night,Like It's Christmas,134000,systemStop,201000
4,2025-11-11 02:20:57+00:00,Night,What Christmas Means to Me,160000,trackFinished,160000
5,2025-11-11 02:17:57+00:00,Night,Christmas (Baby Please Come Home),170000,trackFinished,170000
6,2025-11-11 02:14:57+00:00,Night,Santa Tell Me,204000,trackFinished,204000
7,2025-11-11 02:11:57+00:00,Night,Run Rudolph Run,165000,trackFinished,163000
8,2025-11-11 02:08:57+00:00,Night,"Baby, It's Cold Outside",142000,trackFinished,144000
9,2025-11-11 02:06:57+00:00,Night,Wonderful Christmastime (Edited Version / Rema...,228000,trackFinished,228000


# Section 4 - Describing the Data with Statistics

Rows = 8,928 <br>
Columns = 12 <br>
Minimum Date = 10/13/2024 <br>
Maximum Date = 11/11/2025
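
The four values above can be read straight from the DataFrame. The sketch below uses three rows copied from the Section 3 table as a stand-in for the full file, so its printed numbers are for the sample only. Note that on the real data, `df.shape` would need to be checked right after `pd.read_csv`, before any columns are dropped, to get the 12-column count.

```python
import pandas as pd

# Three rows copied from the Section 3 output; a stand-in for the full export.
sample = pd.DataFrame({
    "Date": ["2025-11-11 03:13:39 UTC",
             "2025-11-11 02:56:14 UTC",
             "2025-11-11 02:30:57 UTC"],
    "Product Name": ["It's Beginning to Look a Lot like Christmas",
                     "It's Beginning to Look a Lot like Christmas",
                     "Like It's Christmas"],
    "Listen Duration in Milliseconds": [1000, 54000, 66000],
})
sample["Date"] = pd.to_datetime(sample["Date"], utc=True)

# shape is a (rows, columns) tuple
n_rows, n_cols = sample.shape

# Earliest and latest playback timestamps
min_date = sample["Date"].min()
max_date = sample["Date"].max()

print("Rows =", n_rows)
print("Columns =", n_cols)
print("Minimum Date =", min_date.strftime("%m/%d/%Y"))
print("Maximum Date =", max_date.strftime("%m/%d/%Y"))
```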

In [15]:
# Listen Duration Stats
mean_listen = round(df["Listen Duration in Milliseconds"].mean() / 60000, 2)
median_listen = round(df["Listen Duration in Milliseconds"].median() / 60000, 2)
min_listen = round(df["Listen Duration in Milliseconds"].min() / 60000, 2)
max_listen = round(df["Listen Duration in Milliseconds"].max() / 60000, 2)

print("Listen Duration Statistics (Minutes):")
print("Mean:", mean_listen)
print("Median:", median_listen)
print("Minimum:", min_listen)
print("Maximum:", max_listen)

Listen Duration Statistics (Minutes):
Mean: 2.78
Median: 2.97
Minimum: 0.0
Maximum: 24.2
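
The 24.2-minute maximum is longer than any track in the data (the track-length maximum in this section is 9.78 minutes), so at least one row has a listen duration that exceeds its track length. One way to find rows like that is to compare the two columns directly. This is only a sketch: the four rows are copied from the Section 3 table, and the `0.5` cutoff for "finished but barely listened" is my own assumption, not something Amazon defines.

```python
import pandas as pd

# Four rows copied from the Section 3 output.
sample = pd.DataFrame({
    "Product Name": ["It's Beginning to Look a Lot like Christmas",
                     "Santa Tell Me",
                     "Run Rudolph Run",
                     "Baby, It's Cold Outside"],
    "Listen Duration in Milliseconds": [1000, 204000, 165000, 142000],
    "Reason for Stopped Playback": ["trackFinished"] * 4,
    "Track Length in Milliseconds": [207000, 204000, 163000, 144000],
})

# Listen time as a share of the track length (1.0 means the whole track).
sample["Listen Ratio"] = (
    sample["Listen Duration in Milliseconds"]
    / sample["Track Length in Milliseconds"]
)

# Rows where the recorded listen time is longer than the track itself.
overlong = sample[sample["Listen Ratio"] > 1.0]

# ASSUMPTION: 0.5 is an arbitrary cutoff for "marked finished but mostly unheard".
short_but_finished = sample[
    (sample["Reason for Stopped Playback"] == "trackFinished")
    & (sample["Listen Ratio"] < 0.5)
]

print(overlong[["Product Name", "Listen Ratio"]])
print(short_but_finished[["Product Name", "Listen Ratio"]])
```

Rows flagged by either filter are worth inspecting by hand before deciding whether to keep, cap, or drop them.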


In [14]:
# Track Length Stats
mean_track = round(df["Track Length in Milliseconds"].mean() / 60000, 2)
median_track = round(df["Track Length in Milliseconds"].median() / 60000, 2)
min_track = round(df["Track Length in Milliseconds"].min() / 60000, 2)
max_track = round(df["Track Length in Milliseconds"].max() / 60000, 2)

print("Track Length Statistics (Minutes):")
print("Mean:", mean_track)
print("Median:", median_track)
print("Minimum:", min_track)
print("Maximum:", max_track)

Track Length Statistics (Minutes):
Mean: 3.17
Median: 3.12
Minimum: 0.95
Maximum: 9.78
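
These statistics are in decimal minutes, so the digits after the decimal point are fractions of a minute, not seconds. A small helper like the one below (the name `minutes_to_min_sec` is my own, not part of pandas) converts a decimal-minute value into minutes and seconds.

```python
def minutes_to_min_sec(decimal_minutes):
    """Convert decimal minutes (e.g. 9.78) to a 'M min SS s' string."""
    total_seconds = round(decimal_minutes * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} min {seconds:02d} s"

# Values taken from the statistics printed in this section.
print(minutes_to_min_sec(0.95))  # 0 min 57 s
print(minutes_to_min_sec(3.12))  # 3 min 07 s
print(minutes_to_min_sec(9.78))  # 9 min 47 s
print(minutes_to_min_sec(2.97))  # 2 min 58 s
```

Because the inputs were already rounded to two decimals, the converted values are accurate to about a second.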


In [33]:
reason_counts = df["Reason for Stopped Playback"].value_counts(dropna=False)
reason_percent = df["Reason for Stopped Playback"].value_counts(normalize=True, dropna=False) * 100

print(reason_counts)
print(reason_percent.round(2))

Reason for Stopped Playback
trackFinished      7745
systemStop          496
userStop            329
userNext            296
trackScrub           35
userPrev             24
trackInitFailed       3
Name: count, dtype: int64
Reason for Stopped Playback
trackFinished      86.75
systemStop          5.56
userStop            3.69
userNext            3.32
trackScrub          0.39
userPrev            0.27
trackInitFailed     0.03
Name: proportion, dtype: float64


My data has 12 columns and 8,928 rows, one for each song playback between October 13, 2024 and November 11, 2025. Many of the columns hold strings that I will have to categorize and organize later. For the two numerical columns, I calculated the mean, median, minimum, and maximum of how long each song is supposed to be (track length) and how long I actually listened to it (listen duration).

From these calculations, I learned that the songs I listen to average just over three minutes (3.17 minutes), and that I listen to a song for about two and three-quarter minutes on average (2.78 minutes). The track length statistics show that the shortest song in my data is 0.95 minutes (about 57 seconds), the longest is 9.78 minutes (about 9 minutes 47 seconds), and the median song is 3.12 minutes (about 3 minutes 7 seconds). The listen duration statistics show that the shortest time I listened to a song rounds to 0 minutes (probably from skipping it right away), the longest was 24.2 minutes, and the median was 2.97 minutes (about 2 minutes 58 seconds).

I am not sure what happened with the 24.2-minute value: the data from Amazon says I finished the song, but it also says I listened for about 24 minutes to a song that is only about 3.5 minutes long. Despite this one hiccup, I found it really valuable to see that I usually do listen to a typical three-minute song all the way through.

The last column I was able to produce values for is Reason for Stopped Playback. Counting each value shows whether I finished a song, stopped it, skipped it, or had it stopped by the system. Converting those counts into percentages also shows how often I really finish a song: almost 87 percent of the songs I play, which is more than I originally thought.

With the data summarized in these statistics, I can see more of the meaning behind the numbers on the screen. It also moves me closer to the ultimate goal of this project, which is to test whether how much of a song I listen to is related to the time of day I'm listening.
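
As a possible next step toward that goal, the statistics above could be computed separately for each time-of-day label. The sketch below is one way to do that with `groupby`, not a finished analysis. It uses the first five rows from the Section 3 table, which are all labeled "Night", so it prints a single group. On the full data there would be up to four groups. The output column names (`plays`, `mean_listen_min`, `pct_finished`) are my own choices.

```python
import pandas as pd

# First five rows from the Section 3 output (all labeled "Night").
sample = pd.DataFrame({
    "Time of Day": ["Night"] * 5,
    "Listen Duration in Milliseconds": [1000, 54000, 66000, 134000, 160000],
    "Reason for Stopped Playback": ["trackFinished", "systemStop",
                                    "trackFinished", "systemStop",
                                    "trackFinished"],
})

summary = (
    sample.assign(
        # Listen time in minutes, matching the units used in Section 4
        listen_min=sample["Listen Duration in Milliseconds"] / 60000,
        # True when Amazon recorded the track as finished
        finished=sample["Reason for Stopped Playback"].eq("trackFinished"),
    )
    .groupby("Time of Day")
    .agg(
        plays=("finished", "size"),
        mean_listen_min=("listen_min", "mean"),
        pct_finished=("finished", "mean"),
    )
)

# The mean of a True/False column is a proportion; scale it to a percentage.
summary["pct_finished"] = summary["pct_finished"] * 100

print(summary.round(2))
```

Comparing `mean_listen_min` and `pct_finished` across the four labels would give a first look at whether my listening changes with the time of day.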