# Demand Forecast
   
---
   
This project aims to forecast demand, defined here as streaming time, as a time series. The data is an aggregate of several datasets from my personal streaming history, which I requested from Spotify.
     
---
  

# 1. Streaming

## Data Overview

---    
   
The datasets used are the `Streaming_History_Audio_#.json` files, which provide detailed streaming history over a time range long enough to train a model. The fields used in this project are:
   
*  `ts`: This field is a timestamp indicating when the track stopped playing, in UTC (Coordinated Universal Time). The format is year, month, and day, followed by the time in 24-hour notation (e.g. `2018-02-05T00:58:40Z`).
    
*  `ms_played`: This field is the number of milliseconds the stream was played.   
   
*  `platform`: This field is the platform used when streaming the track (e.g. Android OS, Google Chromecast).    
   
*  `conn_country`: This field is the country code of the country where the stream was played (e.g. SE - Sweden).   
   
*  `reason_start`: This field is a value telling why the track started (e.g. “trackdone”).
   
*  `reason_end`: This field is a value telling why the track ended (e.g. “endplay”).   
   
*  `shuffle`: This field has the value True or False depending on if shuffle mode was used when playing the track.   
   
*  `skipped`: This field indicates if the user skipped to the next song.   
   
*  `offline`: This field indicates whether the track was played in offline mode (“True”) or not (“False”).    
   
*  `incognito_mode`: This field indicates whether the track was played during a private session (“True”) or not (“False”).   
   
---
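
Because `ts` records when a track stopped and `ms_played` records how long it played, an approximate start time can be derived by subtracting one from the other. The sketch below does this for one record taken from the first rows of the data. It assumes playback was uninterrupted (no pauses), and the helper name `approx_start` is only illustrative.

```python
from datetime import datetime, timedelta, timezone


def approx_start(ts: str, ms_played: int) -> datetime:
    """Estimate when a stream started from its stop time and play duration.

    Assumption: playback was continuous, so start = stop - ms_played.
    """
    # `ts` is a UTC timestamp such as "2018-02-05T01:03:51Z"
    stopped = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return stopped - timedelta(milliseconds=ms_played)


# Values copied from one of the first rows of the streaming history
started = approx_start("2018-02-05T01:03:51Z", 236759)
print(started.isoformat())
```

If a stream was paused partway through, the real start time is earlier than this estimate, so the result is best treated as an upper bound on the start time.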

## Library

In [1]:
# Import Library
import os

## Universal Data Processing
import numpy as np
import pandas as pd

## Regular Expression for Text Data
import re

## JSON Files Manipulation
import json

## Visualization
import matplotlib.pyplot as plt
import seaborn as sns

*Display Settings*

In [11]:
pd.set_option("display.max_columns", None)   # show all columns
pd.set_option("display.width", None)         # auto-detect width
pd.set_option("display.max_colwidth", None)  # don't truncate cell content

## Data Preparation

In [14]:
# Get Current Directory Address
base_dir = os.getcwd()
dataset_dir = os.path.join(base_dir, "Dataset")

# List the paths of the included datasets
paths = [
    os.path.join(dataset_dir, "Streaming_History_Audio_2018-2019_0.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2019-2020_1.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2020_2.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2020-2021_3.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2021-2022_4.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2022-2023_5.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2023_6.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2023-2024_7.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2024-2025_8.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2025_9.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2025-2026_10.json"),
    os.path.join(dataset_dir, "Streaming_History_Audio_2026_11.json"),
]

# Load each dataset
all_data = []

for idx, path in enumerate(paths):              # use `for count, item in enumerate(items, start=1)` to customize the indexing
    print(f"Loading file {idx}: {path}")
    with open(path, "r", encoding="utf-8") as json_file:
        data_idx = json.load(json_file)
        all_data.append(data_idx)

print("Files loaded", len(all_data))

Loading file 0: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2018-2019_0.json
Loading file 1: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2019-2020_1.json
Loading file 2: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2020_2.json
Loading file 3: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2020-2021_3.json
Loading file 4: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2021-2022_4.json
Loading file 5: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2022-2023_5.json
Loading file 6: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2023_6.json
Loading file 7: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2023-2024_7.json
Loading file 8: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2024-2025_8.json
Loading file 9: c:\03. Other\Spotify_Unwrapped\Dataset\Streaming_History_Audio_2025_9.json
Loading file 10: c:\03. Other\Spotify_Unwrapped\Dataset
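
Hard-coding every file name works, but the list has to be edited whenever a new export arrives. As an alternative, the files could be discovered by pattern and ordered by the numeric suffix in their names. This is a sketch only: `find_history_files` and `load_history` are hypothetical helpers, and it assumes every file follows the `Streaming_History_Audio_<years>_<index>.json` naming seen above and contains a JSON list of records.

```python
import json
import re
from pathlib import Path

# Captures the trailing index, e.g. "10" in "..._2025-2026_10.json"
INDEX_PATTERN = re.compile(r"_(\d+)\.json$")


def find_history_files(dataset_dir):
    """Return the audio history files in dataset_dir, ordered by trailing index."""
    candidates = Path(dataset_dir).glob("Streaming_History_Audio_*.json")
    matched = [p for p in candidates if INDEX_PATTERN.search(p.name)]
    # Sort numerically so that index 10 comes after index 2, not before it
    return sorted(matched, key=lambda p: int(INDEX_PATTERN.search(p.name).group(1)))


def load_history(paths):
    """Load every file and return one flat list of streaming records."""
    records = []
    for path in paths:
        with open(path, "r", encoding="utf-8") as json_file:
            records.extend(json.load(json_file))
    return records
```

With these helpers, `flat_data = load_history(find_history_files(dataset_dir))` would produce one flat list of records without listing each file by hand.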

In [15]:
# Convert loaded data into suitable objects for ease of manipulation
flat_data = []

for file_data in all_data:
    flat_data.extend(file_data)

# Normalize the records into a flat table (nested dictionaries, if any, are expanded into columns)
df_json = pd.json_normalize(flat_data)
df_json.head()

Unnamed: 0,ts,platform,ms_played,conn_country,master_metadata_track_name,master_metadata_album_artist_name,master_metadata_album_album_name,spotify_track_uri,episode_name,episode_show_name,spotify_episode_uri,audiobook_title,audiobook_uri,audiobook_chapter_uri,audiobook_chapter_title,reason_start,reason_end,shuffle,skipped,offline,offline_timestamp,incognito_mode
0,2018-02-05T00:58:40Z,"Android OS 5.1.1 API 22 (OPPO, F1f)",9887,ID,A Different Way (with Lauv),DJ Snake,A Different Way (with Lauv),spotify:track:0Wv5wuenRLI3BcwgT3HPIP,,,,,,,,playbtn,fwdbtn,True,False,False,,False
1,2018-02-05T00:59:02Z,"Android OS 5.1.1 API 22 (OPPO, F1f)",11193,ID,The Story Never Ends,Lauv,The Story Never Ends,spotify:track:5cDNs4utoHt0WNRZuziews,,,,,,,,fwdbtn,fwdbtn,True,False,False,,False
2,2018-02-05T00:59:26Z,"Android OS 5.1.1 API 22 (OPPO, F1f)",11463,ID,Don't Matter - Recorded at Spotify Studios NYC,Lauv,Spotify Singles,spotify:track:4a00SVbdG6saNqRJlC4XKQ,,,,,,,,fwdbtn,fwdbtn,True,False,False,,False
3,2018-02-05T00:59:48Z,"Android OS 5.1.1 API 22 (OPPO, F1f)",7826,ID,Comfortable,Lauv,Lost in the Light,spotify:track:4DPezINQNrkmmzjasvltzW,,,,,,,,fwdbtn,endplay,True,False,False,,False
4,2018-02-05T01:03:51Z,"Android OS 5.1.1 API 22 (OPPO, F1f)",236759,ID,Falling,Fancy Feelings,Falling,spotify:track:5HeufMDrbfDWQj7AZlhlky,,,,,,,,playbtn,trackdone,True,False,False,,False
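
Since the project defines demand as streaming time, one possible next step is to turn the event-level rows into a regular time series. The sketch below rebuilds the `ts` and `ms_played` values of the five rows shown above and sums them per UTC day. The daily granularity, the minutes unit, and the names `sample` and `daily_minutes` are assumptions for illustration, not choices made in the notebook.

```python
import pandas as pd

# `ts` and `ms_played` copied from the five rows displayed above
records = [
    {"ts": "2018-02-05T00:58:40Z", "ms_played": 9887},
    {"ts": "2018-02-05T00:59:02Z", "ms_played": 11193},
    {"ts": "2018-02-05T00:59:26Z", "ms_played": 11463},
    {"ts": "2018-02-05T00:59:48Z", "ms_played": 7826},
    {"ts": "2018-02-05T01:03:51Z", "ms_played": 236759},
]
sample = pd.DataFrame(records)

# Parse the ISO 8601 strings into timezone-aware (UTC) timestamps
sample["ts"] = pd.to_datetime(sample["ts"], utc=True)

# Convert milliseconds to minutes for readability
sample["minutes_played"] = sample["ms_played"] / 60_000

# Sum streaming minutes per calendar day (UTC); days without streams would sum to 0
daily_minutes = sample.set_index("ts")["minutes_played"].resample("D").sum()
print(daily_minutes)
```

The same steps should apply to the full `df_json` table. Because `ts` is in UTC, day boundaries may not match the listener's local day unless the timestamps are first converted to a local time zone.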
