### Likes Report: Discover how interests change over time through Instagram likes 
#### Jenna Bal

The data featured in this report was collected by data analysts at Instagram to better infer my (the user's) interests, so that the content I see, such as ads, suggested content, and sponsored posts, is tailored to me. The data is reliable in one sense: it is a record of everything I have liked since I first created this account, so it tracks my interests over time. For the same reason, however, it could be considered unreliable, because I no longer follow many of the accounts whose posts I used to like. If all of the data were considered when analyzing my current interests, advertisers would not be marketing to me correctly.
 
**I started this project how every project starts: by opening a Jupyter notebook, importing pandas, and loading my JSON file.**

In [20]:
import pandas as pd
import json

In [21]:
with open("/Users/jennabal/Desktop/Data in Emerging Tech/jenna.bal3_20220906/likes/liked_posts.json") as j:
    liked_posts = json.load(j)

Each data item included the "title" (the user who made the post), "media list data," and "string list data," which holds the link ("href"), a "value," and a "timestamp." For this assignment, the directions were to organize the liked posts first by timestamp, then by the number of posts liked from each individual user.

In [None]:
liked_posts

**I created a table using the "DataFrame" function.**

In [23]:
liked_posts_df = pd.DataFrame(liked_posts['likes_media_likes'])

In [24]:
liked_posts_df

Unnamed: 0,title,media_list_data,string_list_data
0,kenzie,[],[{'href': 'https://www.instagram.com/p/5kTrMyM...
1,maddieziegler,[],[{'href': 'https://www.instagram.com/p/5EhGi5P...
2,kenzie,[],[{'href': 'https://www.instagram.com/p/4rdp11s...
3,maddieziegler,[],[{'href': 'https://www.instagram.com/p/4Sv0WXP...
4,maddieziegler,[],[{'href': 'https://www.instagram.com/p/4M6FIwv...
...,...,...,...
6330,hollyhumberstone,[],[{'href': 'https://www.instagram.com/p/CengbYU...
6331,_taypet,[],[{'href': 'https://www.instagram.com/p/CepDw7z...
6332,kendallg_11,[],[{'href': 'https://www.instagram.com/p/CepAG1h...
6333,emmachamberlain,[],[{'href': 'https://www.instagram.com/p/Ceo-fIK...


**I isolated a single timestamp from the first data point.**

In [25]:
liked_posts_df['string_list_data'][0]

[{'href': 'https://www.instagram.com/p/5kTrMyMypi/',
  'value': 'ð\x9f\x91\x8d',
  'timestamp': 1437849168}]

In [26]:
liked_posts_df['string_list_data'][0][0]

{'href': 'https://www.instagram.com/p/5kTrMyMypi/',
 'value': 'ð\x9f\x91\x8d',
 'timestamp': 1437849168}
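
The "value" field above looks garbled. My assumption (the export does not document this) is that it is UTF-8 text, most likely an emoji, that was read as Latin-1. A minimal sketch of undoing that, using the string exactly as printed above:

```python
# The 'value' string exactly as it appears in the output above.
garbled = 'ð\x9f\x91\x8d'

# Assumption: the text is UTF-8 that was mis-read as Latin-1.
# Turn the characters back into raw bytes, then decode them as UTF-8.
repaired = garbled.encode('latin-1').decode('utf-8')

print(repaired)
```

If the assumption is wrong for some other value, `decode` raises a `UnicodeDecodeError`, so this is only worth applying where it succeeds.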

In [27]:
liked_posts_df['string_list_data'][0][0]['timestamp']

1437849168

In [28]:
timestamp = [x[0]['timestamp'] for x in liked_posts_df['string_list_data']]

**I also created a list of all the timestamps.**

In [None]:
timestamp

In [30]:
liked_posts_df['timestamps'] = timestamp

In [31]:
liked_posts_df.head()

Unnamed: 0,title,media_list_data,string_list_data,timestamps
0,kenzie,[],[{'href': 'https://www.instagram.com/p/5kTrMyM...,1437849168
1,maddieziegler,[],[{'href': 'https://www.instagram.com/p/5EhGi5P...,1436797483
2,kenzie,[],[{'href': 'https://www.instagram.com/p/4rdp11s...,1435941153
3,maddieziegler,[],[{'href': 'https://www.instagram.com/p/4Sv0WXP...,1435161440
4,maddieziegler,[],[{'href': 'https://www.instagram.com/p/4M6FIwv...,1434921328

**Because the assignment focused on the timestamp of each like and the user who made the post, "string list data" and "media list data" were not relevant to my work. To tidy up my workspace, I dropped both columns using ".drop".**

In [32]:
liked_posts_df_ = liked_posts_df.drop('media_list_data', axis = 1)

In [37]:
liked_posts_df_

Unnamed: 0,title,string_list_data,timestamps
0,kenzie,[{'href': 'https://www.instagram.com/p/5kTrMyM...,1437849168
1,maddieziegler,[{'href': 'https://www.instagram.com/p/5EhGi5P...,1436797483
2,kenzie,[{'href': 'https://www.instagram.com/p/4rdp11s...,1435941153
3,maddieziegler,[{'href': 'https://www.instagram.com/p/4Sv0WXP...,1435161440
4,maddieziegler,[{'href': 'https://www.instagram.com/p/4M6FIwv...,1434921328
...,...,...,...
6330,hollyhumberstone,[{'href': 'https://www.instagram.com/p/CengbYU...,1654901966
6331,_taypet,[{'href': 'https://www.instagram.com/p/CepDw7z...,1654901815
6332,kendallg_11,[{'href': 'https://www.instagram.com/p/CepAG1h...,1654901789
6333,emmachamberlain,[{'href': 'https://www.instagram.com/p/Ceo-fIK...,1654897794


From a quick glance at the data, it is obvious that my interests have changed drastically from middle school (when I first created my account) to now. Many accounts whose posts I liked frequently a few years ago I no longer even follow. For example, I used to like a lot of posts from the stars of the TV show Dance Moms (examples above include "kenzie" and "maddieziegler"), but I now follow only one of the girls. It was interesting to look back on posts I have liked because it jogged my memory about many old interests that I had forgotten I was once very invested in.

In [34]:
liked_posts_df_final = liked_posts_df_.drop('string_list_data', axis = 1)

In [35]:
liked_posts_df_final

Unnamed: 0,title,timestamps
0,kenzie,1437849168
1,maddieziegler,1436797483
2,kenzie,1435941153
3,maddieziegler,1435161440
4,maddieziegler,1434921328
...,...,...
6330,hollyhumberstone,1654901966
6331,_taypet,1654901815
6332,kendallg_11,1654901789
6333,emmachamberlain,1654897794
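
The extract-then-drop steps above could probably also be done in one pass with `pd.json_normalize`. This is only a sketch: the `sample` dictionary is a hypothetical two-record stand-in for `liked_posts`, the second link is a placeholder, and the "value" strings are made up.

```python
import pandas as pd

# Hypothetical stand-in for the loaded JSON, shaped like the export above.
sample = {
    "likes_media_likes": [
        {"title": "kenzie", "media_list_data": [],
         "string_list_data": [{"href": "https://www.instagram.com/p/5kTrMyMypi/",
                               "value": "like", "timestamp": 1437849168}]},
        {"title": "maddieziegler", "media_list_data": [],
         "string_list_data": [{"href": "https://www.instagram.com/p/PLACEHOLDER/",
                               "value": "like", "timestamp": 1436797483}]},
    ]
}

# Expand each entry of "string_list_data" into its own row and carry the
# account name ("title") along as a metadata column.
flat = pd.json_normalize(
    sample["likes_media_likes"],
    record_path="string_list_data",
    meta=["title"],
)

# Keep only the two columns the assignment needs.
likes = flat[["title", "timestamp"]]
print(likes)
```

Unlike the list comprehension, this does not assume that every "string list data" list has exactly one entry; a list with several entries simply becomes several rows.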


**Lastly, I used the "groupby" function to see the number of posts I have liked from each account. Throughout the entire project, I referenced past in-class work that dealt with Facebook data, adapting it to my needs for Instagram data.**

In [38]:
liked_posts_df_final.groupby('title').count().sort_values('title', ascending = False)

Unnamed: 0_level_0,timestamps
title,Unnamed: 1_level_1
zheng.mei.ling,1
zendaya,28
zack_vannette76,1
xxclose,2
xtinemay,19
...,...
_baileybaird,5
_alainaconte_,1
_adriennebundy_,25
_abbycoe_,3


Another interesting discovery is that I do not like posts from every account I follow. I follow over 800 people, but there are only 680 accounts that have a like count. I could use this data to evaluate whether any of the accounts I follow are simply taking up space in my feed.
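
The table above is sorted by account name, so the most-liked accounts are hard to spot. A minimal sketch of ranking accounts by like count and counting distinct accounts, using a small hypothetical miniature of `liked_posts_df_final` built from rows shown earlier:

```python
import pandas as pd

# Hypothetical miniature of liked_posts_df_final (a handful of rows only).
likes = pd.DataFrame({
    "title": ["kenzie", "maddieziegler", "kenzie",
              "maddieziegler", "maddieziegler", "hollyhumberstone"],
    "timestamps": [1437849168, 1436797483, 1435941153,
                   1435161440, 1434921328, 1654901966],
})

# value_counts() counts rows per account and sorts by count, largest first.
likes_per_account = likes["title"].value_counts()

# nunique() gives the number of distinct accounts that have at least one like.
n_accounts = likes["title"].nunique()

print(likes_per_account)
print(n_accounts)
```

On the full data, `n_accounts` is the figure to compare against the number of accounts I follow.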

When looked at as a whole, the data can track the evolution of my interests. If all the data were considered when analyzing my current interests, the correct content would not reach me. If I wanted to focus on my current interests, I would only use data from a certain point forward, for example from when I started college until now. However, if I wanted to compare my interests over time, I would create a timeline with different sections (middle school, high school, and college, for example). To do this, I would need to convert each timestamp into an ordinary calendar date and time. This would be my next step.
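
A minimal sketch of that next step, assuming the timestamps are Unix timestamps (seconds since January 1, 1970, UTC), which their size suggests. The cut-off dates for each life stage are hypothetical placeholders, not my real dates.

```python
import pandas as pd

# A few rows taken from the tables above.
likes = pd.DataFrame({
    "title": ["kenzie", "maddieziegler", "hollyhumberstone", "emmachamberlain"],
    "timestamps": [1437849168, 1436797483, 1654901966, 1654897794],
})

# Assumption: the values are seconds since 1970-01-01 00:00:00 UTC.
likes["liked_at"] = pd.to_datetime(likes["timestamps"], unit="s")

# Hypothetical boundaries between life stages.
HIGH_SCHOOL_START = pd.Timestamp("2016-09-01")
COLLEGE_START = pd.Timestamp("2020-09-01")

def life_stage(moment):
    """Label a like with the life stage in which it happened."""
    if moment < HIGH_SCHOOL_START:
        return "middle school"
    if moment < COLLEGE_START:
        return "high school"
    return "college"

likes["period"] = likes["liked_at"].apply(life_stage)

# Likes per account within each period.
print(likes.groupby(["period", "title"]).size())
```

Filtering to `likes[likes["period"] == "college"]` would then give the "current interests" subset described above.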

If I were to conduct an analysis of other Instagram data, I would use the "followers" file to test whether I gained followers at a faster rate at significant points of change in my life, such as when I moved from middle school to high school or joined a sorority in college, than I did on a normal day.

The theoretical hypothesis would be that I gain followers at a faster rate at points of significant change in my life than I do on a normal day.

The statistical hypothesis would (hopefully) be similar to the theoretical hypothesis but would also include the specific values that need to be met to prove the hypothesis true (or false). To do this, I would look at the timestamps of when people started following me to find periods when I gained a significant number of followers in a short span of time.
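
A rough sketch of how that comparison might look. Everything here is an assumption: I have not opened the "followers" file, so the dates below are invented, and "three times the median month" is a placeholder threshold rather than a tested statistical criterion.

```python
import pandas as pd

# Invented follow dates; the real ones would come from the "followers" file.
followed_at = pd.to_datetime([
    "2019-06-03", "2019-07-15", "2019-08-20",
    "2019-09-01", "2019-09-02", "2019-09-02", "2019-09-05", "2019-09-09",
    "2019-10-11",
])

# One row per new follower, indexed by the date they followed.
followers = pd.Series(1, index=followed_at)

# Count new followers in each calendar month ("MS" = month start).
per_month = followers.resample("MS").sum()

# Treat the median month as a "normal" month.
baseline = per_month.median()

# Placeholder rule: flag months with at least three times the normal gain.
spikes = per_month[per_month >= 3 * baseline]

print(per_month)
print(spikes)
```

The flagged months could then be compared against the dates of the life events to see whether they line up.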