# Analysis Plan

### Project Idea 
I will explore my “like” activity on Instagram alongside my music listening history. By looking at when I like posts and when I listen to music, I aim to find patterns in my Instagram activity across days of the week and times of day, and to determine whether I use Instagram more or less on days when I listen to more music.

### Hypothesis 
I’m more likely to engage with Instagram posts on days when I don’t listen to music: when I set aside time to listen to music for productivity, I spend less time on my phone.



### Import Statements

In [1]:
import pandas as pd
import json
from pathlib import Path

### Spotify from Downloads into DataFrame

In [124]:
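# folder that holds the Spotify export
spotify_folder = Path.home() / "Downloads" / "Spotify Account Data 2"

# streaming-history files whose number starts with 2, in filename order
spotify_files = sorted(spotify_folder.glob("StreamingHistory_music_2*.json"))
# read each file into its own DataFrame, then combine them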
spotify_dfs = []
for f in spotify_files:
    df_temp = pd.read_json(f)
    spotify_dfs.append(df_temp)

spotify_raw = pd.concat(spotify_dfs, ignore_index=True)

spotify_raw.head(5) 

,endTime,artistName,trackName,msPlayed
0,2025-06-20 02:10,Ariana Grande,intro (end of the world),3690
1,2025-06-20 02:13,Ariana Grande,we can't be friends (wait for your love),3541
2,2025-06-20 02:13,SZA,Nobody Gets Me,173760
3,2025-06-20 02:13,Ariana Grande,the boy is mine,41322
4,2025-06-20 02:14,Ariana Grande,The Way,16000


### Spotify Timestamps to dates

In [125]:
# convert endTime strings to datetimes
spotify_raw['endTime'] = pd.to_datetime(spotify_raw['endTime'])

# calendar date of each play
spotify_raw['date'] = spotify_raw['endTime'].dt.date

# play length in minutes (msPlayed is in milliseconds)
spotify_raw['minutes_played'] = spotify_raw['msPlayed'] / 1000 / 60

# one row per date: minutes played, number of plays, distinct artists
spotify_daily = (
    spotify_raw
    .groupby('date', as_index=False)
    .agg(
        total_minutes=('minutes_played', 'sum'),
        total_tracks=('trackName', 'count'),
        unique_artists=('artistName', 'nunique')
    )
    .sort_values('date')
)

spotify_daily.head(10)

,date,total_minutes,total_tracks,unique_artists
0,2025-06-20,113.095083,118,52
1,2025-06-21,167.6218,142,75
2,2025-06-22,59.834717,62,38
3,2025-06-23,63.471417,49,14
4,2025-06-24,39.418767,44,30
5,2025-06-25,98.86615,79,23
6,2025-06-26,98.220933,112,61
7,2025-06-27,255.14265,93,44
8,2025-06-28,17.8566,21,14
9,2025-06-29,25.80985,20,12


### Spotify adding Weekends/Weekdays 

In [127]:
# copy endTime (already a datetime after the previous cell) into a datetime column
spotify_raw['datetime'] = pd.to_datetime(spotify_raw['endTime'])

# extract day of week (0=Monday, 6=Sunday)
spotify_raw['day_of_week'] = spotify_raw['datetime'].dt.dayofweek

# create weekday/weekend column
spotify_raw['day_type'] = spotify_raw['day_of_week'].apply(lambda x: 'Weekend' if x >= 5 else 'Weekday')

spotify_daily_days = (
    spotify_raw
    .groupby('date', as_index=False)
    .agg(
        total_minutes=('minutes_played', 'sum'),
        total_tracks=('trackName', 'count'),
        unique_artists=('artistName', 'nunique')
    )
)

day_info = spotify_raw[['date', 'day_of_week', 'day_type']].drop_duplicates()
spotify_daily_days = pd.merge(spotify_daily_days, day_info, on='date', how='left')

spotify_daily_days.head(10)

,date,total_minutes,total_tracks,unique_artists,day_of_week,day_type
0,2025-06-20,113.095083,118,52,4,Weekday
1,2025-06-21,167.6218,142,75,5,Weekend
2,2025-06-22,59.834717,62,38,6,Weekend
3,2025-06-23,63.471417,49,14,0,Weekday
4,2025-06-24,39.418767,44,30,1,Weekday
5,2025-06-25,98.86615,79,23,2,Weekday
6,2025-06-26,98.220933,112,61,3,Weekday
7,2025-06-27,255.14265,93,44,4,Weekday
8,2025-06-28,17.8566,21,14,5,Weekend
9,2025-06-29,25.80985,20,12,6,Weekend


### Spotify Summary by Day Type

In [132]:
# totals per day type, plus the number of days that had any listening
spotify_daytype_summary = spotify_daily_days.groupby('day_type').agg(
    total_minutes=('total_minutes', 'sum'),
    total_tracks=('total_tracks', 'sum'),
    unique_artists=('unique_artists', 'sum'),  # sum of daily unique counts, not distinct artists overall
    days_count=('date', 'count')  # number of days with at least one play
).reset_index()

print(spotify_daytype_summary)

  day_type  total_minutes  total_tracks  unique_artists  days_count
0  Weekday   10484.786433          7539            3245         107
1  Weekend    2840.237700          2208             916          40
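
One caveat about the `unique_artists` column above: it adds up each day's unique-artist count, so an artist played on several days is counted once per day. A minimal sketch of the difference, using a made-up `toy_raw` table (hypothetical rows, not the real export) in place of `spotify_raw`:

```python
import pandas as pd

# Toy stand-in for spotify_raw; these rows are made up for illustration.
toy_raw = pd.DataFrame({
    "date": ["2025-06-20", "2025-06-20", "2025-06-23", "2025-06-21", "2025-06-22"],
    "artistName": ["SZA", "Ariana Grande", "SZA", "SZA", "SZA"],
    "day_type": ["Weekday", "Weekday", "Weekday", "Weekend", "Weekend"],
})

# What the summary above does: unique artists per day, then summed per day type.
daily_unique = toy_raw.groupby(["day_type", "date"])["artistName"].nunique()
summed_daily_unique = daily_unique.groupby(level="day_type").sum()

# Distinct artists per day type: each artist counts once, however many days they appear.
distinct_artists = toy_raw.groupby("day_type")["artistName"].nunique()

print(summed_daily_unique)
print(distinct_artists)
```

If the goal is "how many different artists did I hear on weekdays vs. weekends", the second number is probably the one to report; the first is closer to "artist-days".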


### Instagram Download

In [134]:

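# path to the likes folder and the liked-posts JSON file
insta_folder = Path.home() / "Downloads" / "instagram-gabby" / "your_instagram_activity" / "likes"
path = insta_folder / "liked_posts.json"

# read the export as UTF-8 so the result does not depend on the system default encoding
with open(path, "r", encoding="utf-8") as j:
    data = json.load(j)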

# Flatten the nested structure
rows = []
for like in data["likes_media_likes"]:
    for item in like.get("string_list_data", []):
        rows.append({
            "title": like.get("title"),
            "timestamp": item.get("timestamp")
        })
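
The same flattening can probably be done in one call with `pd.json_normalize`. This is only a sketch: `sample` is a hypothetical dictionary shaped like the export the loop above reads, with made-up account names and timestamps.

```python
import pandas as pd

# Hypothetical sample shaped like liked_posts.json; all values are made up.
sample = {
    "likes_media_likes": [
        {"title": "account_a", "string_list_data": [{"timestamp": 1700000000}]},
        {"title": "account_b", "string_list_data": [{"timestamp": 1700003600}]},
    ]
}

# One row per entry in string_list_data, with the parent title carried along.
likes = pd.json_normalize(
    sample["likes_media_likes"],
    record_path="string_list_data",
    meta=["title"],
)[["title", "timestamp"]]

print(likes)
```

One difference from the loop: `json_normalize` raises a `KeyError` if a record has no `string_list_data` key, while the loop's `.get(..., [])` silently skips it, so the loop is the more forgiving choice for a messy export.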


### Instagram DataFrame with Weekends/Weekdays 

In [144]:
# Create DataFrame
df = pd.DataFrame(rows)

# Convert Unix timestamps (seconds) to datetimes; the result is in UTC
df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')

# Extract day of week (0=Monday, 6=Sunday)
df['day_of_week'] = df['datetime'].dt.dayofweek

# Weekday vs Weekend
df['day_type'] = df['day_of_week'].apply(lambda x: 'Weekend' if x >= 5 else 'Weekday')

# Preview
print(df[['datetime', 'day_of_week', 'day_type', 'title']].head(10))


             datetime  day_of_week day_type            title
0 2025-11-17 19:11:52            0  Weekday  linkon_tyrrell_
1 2025-11-17 03:45:02            0  Weekday      ecfromthedc
2 2025-11-17 03:44:35            0  Weekday  artdate.couples
3 2025-11-17 02:11:33            0  Weekday   juliananoelle_
4 2025-11-17 02:11:13            0  Weekday   jaylaamariiee_
5 2025-11-17 02:10:26            0  Weekday        kellymllr
6 2025-11-16 07:53:38            6  Weekend         trinpat_
7 2025-11-16 02:53:19            6  Weekend        down2errf
8 2025-11-15 23:49:31            5  Weekend      kyliejenner
9 2025-11-15 18:02:59            5  Weekend      stinkyasher
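
Because these times are in UTC, a like made late in the evening local time can land on the next calendar day, which can flip a Weekend like into a Weekday one. A small sketch of how converting to local time changes the day of week, using two of the times from the preview above. The zone `"America/New_York"` is only a placeholder assumption and should be swapped for whichever zone actually applies.

```python
import pandas as pd

# Two like times from the preview above, marked explicitly as UTC.
likes_utc = pd.Series(
    pd.to_datetime(["2025-11-17 02:11:33", "2025-11-15 18:02:59"], utc=True)
)

# Placeholder zone (an assumption); replace with the zone that applies.
likes_local = likes_utc.dt.tz_convert("America/New_York")

comparison = pd.DataFrame({
    "utc": likes_utc,
    "utc_day": likes_utc.dt.dayofweek,      # 0=Monday, 6=Sunday
    "local": likes_local,
    "local_day": likes_local.dt.dayofweek,
})

print(comparison)
```

Under that assumed zone, the first like moves from Monday (UTC) to Sunday evening (local), so it would switch from Weekday to Weekend. The same conversion would need to be applied to the Spotify times, if those are also recorded in UTC, so both sources are bucketed the same way.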


### Instagram Summary by Day Type

In [142]:

insta_daytype_summary = df.groupby('day_type').agg(
    total_likes=('title', 'count'),   # total likes
    unique_days=('datetime', lambda x: x.dt.date.nunique())  # number of unique days
).reset_index()

print(insta_daytype_summary)

  day_type  total_likes  unique_days
0  Weekday          800           64
1  Weekend          342           26


### Instagram vs Spotify Data Comparison 

In [136]:

# Instagram: number of likes per day type
insta_daytype_counts = df['day_type'].value_counts().reset_index()
insta_daytype_counts.columns = ['day_type', 'instagram_count']

# Spotify: number of plays per day type
spotify_daytype_counts = spotify_raw['day_type'].value_counts().reset_index()
spotify_daytype_counts.columns = ['day_type', 'spotify_count']

# merge the two count tables on day type
combined_daytype = pd.merge(insta_daytype_counts, spotify_daytype_counts, on='day_type', how='outer')

# fix the row order: Weekday first, then Weekend
combined_daytype = combined_daytype.set_index('day_type').reindex(['Weekday', 'Weekend']).reset_index()

print(combined_daytype)

  day_type  instagram_count  spotify_count
0  Weekday              800           7539
1  Weekend              342           2208
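
Raw totals favor weekdays simply because there are more of them, and the two exports cover different numbers of days. A rough way to make the rows comparable is to divide by the number of days in each group. The sketch below just reuses the numbers printed in the summary tables above; the column names `insta_days` and `spotify_days` are my own labels for illustration.

```python
import pandas as pd

# Totals and day counts copied from the summary outputs above.
rates = pd.DataFrame({
    "day_type": ["Weekday", "Weekend"],
    "total_likes": [800, 342],
    "insta_days": [64, 26],
    "total_tracks": [7539, 2208],
    "spotify_days": [107, 40],
})

# Average per day that had any activity in that source.
rates["likes_per_active_day"] = rates["total_likes"] / rates["insta_days"]
rates["tracks_per_active_day"] = rates["total_tracks"] / rates["spotify_days"]

print(rates[["day_type", "likes_per_active_day", "tracks_per_active_day"]].round(2))
```

These are averages over active days only: a day with zero likes or zero plays does not appear in either count, so the true per-calendar-day averages would be somewhat lower.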


## TO DO for Rough Draft

This is my overall analysis plan for the final project: get my Spotify and Instagram data, build a table of summary stats (especially for weekends, since those are my days off), and compare the two to support or refute my hypothesis.

My code definitely still needs tweaking and cleaning up. But since this is just an analysis plan, I think this is what you were looking for as a plan and overview of how the draft and final will look.

Still need to:
- Change the date range and days I pull from so I get the most accurate results
- Add graphs/maps to the final results to showcase the data visually
- Clean up code
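
For the first to-do item, one possible approach is to compare the two sources day by day, limited to the dates both exports cover. This is a sketch under assumptions: `spotify_daily_toy` and `insta_daily_toy` are hypothetical stand-ins with made-up values, and their `date` columns are real datetimes (the `date` column built earlier holds plain Python dates, so it would need `pd.to_datetime` first).

```python
import pandas as pd

# Toy daily tables; every value here is made up, so the result means nothing.
spotify_daily_toy = pd.DataFrame({
    "date": pd.to_datetime(["2025-09-01", "2025-09-02", "2025-09-04"]),
    "total_minutes": [120.0, 30.0, 60.0],
})
insta_daily_toy = pd.DataFrame({
    "date": pd.to_datetime(["2025-09-02", "2025-09-03", "2025-09-04", "2025-09-05"]),
    "likes": [10, 14, 6, 3],
})

# Keep only the window that both exports cover.
start = max(spotify_daily_toy["date"].min(), insta_daily_toy["date"].min())
end = min(spotify_daily_toy["date"].max(), insta_daily_toy["date"].max())
all_days = pd.DataFrame({"date": pd.date_range(start, end, freq="D")})

# One row per calendar day; days missing from a source count as zero activity.
daily = (
    all_days
    .merge(spotify_daily_toy, on="date", how="left")
    .merge(insta_daily_toy, on="date", how="left")
    .fillna({"total_minutes": 0, "likes": 0})
)

# Pearson correlation between minutes listened and likes per day.
correlation = daily["total_minutes"].corr(daily["likes"])
print(daily)
print(correlation)
```

A negative correlation on the real data would be consistent with the hypothesis, and a positive one would count against it. Filling missing days with zero matters here, since the hypothesis is specifically about days with little or no listening.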

  