<h1>Data Report On Instagram Likes</h1>

<h4>Sophia Swengel</h4>

September 15, 2023

<hr/>

<p>Instagram is one of the social media platforms I use the most to connect with friends and the wider world. I follow a variety of accounts, the majority of which belong to people I know personally. Because of this, I hypothesize that <b>there is a positive relationship between the number of "likes" I grant to an account's posts and that account being owned by an individual as opposed to a group or organization.</b></p>

<p>The data included in the liked_posts.json file is appropriate for testing the hypothesis because it records the timestamp of each "like" I grant to an Instagram post as well as the name of the account that posted it. From this data I can determine which accounts I have granted the most likes to.</p>

<p>This data was collected by Meta, Instagram's owner, in order to curate the Instagram experience through algorithms that serve relevant advertising and recommended posts, effectively tracking the user.</p>
<p>The data may be <b>reliable</b> because it is what it is: every time I "like" a post, Meta's data gets another tick. It is all the "likes" that I have given to Instagram posts, and the data is an accurate reflection of my "liking" habits.</p>
<p>The data may be <b>unreliable</b> because of Facebook-Instagram cross-posting and "like" fatigue. When people I follow publish the same post on Facebook and Instagram, I often like whichever one I see first on either platform. Hence, many posts that I viewed and liked on Facebook that were also posted to Instagram will not be represented in the Instagram like data. Also, I am often quite picky about what I give "likes" to on Instagram and am sometimes simply uninterested in "liking" posts. Hence, posts that I enjoyed but not enough to "like" will not be represented.</p>

<h3>Process</h3>

In order to test our hypothesis, we can first load the Instagram likes data for my account, which I downloaded from Instagram itself.

In [22]:
import pandas as pd
import json

In [23]:
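# Open the export explicitly as UTF-8 and parse the JSON into a dict
with open(r'/Users/sophiaswengel/Downloads/sophia_swengel_20230830/likes/liked_posts.json', encoding='utf-8') as likes_file:
    likes_data = json.load(likes_file)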

We can display the data to see which keys it defines and decide which list elements to examine. The records are stored in a list under 'likes_media_likes'. Each record has a 'title' value (the name of the Instagram account) and a 'string_list_data' value, a list whose entries hold 'href' (the link to the liked Instagram post), 'value' (the emoji recorded for the like, which is the same for every like in this file and shows up in the output as mis-encoded characters), and 'timestamp' (the time of the like in seconds since 1970).

In [25]:
likes_data

{'likes_media_likes': [{'title': 'fruits_magazine_archives',
   'string_list_data': [{'href': 'https://www.instagram.com/p/B4OAKpdlsBm/',
     'value': 'ð\x9f\x91\x8d',
     'timestamp': 1572391466}]},
  {'title': 'michaelpilmer',
   'string_list_data': [{'href': 'https://www.instagram.com/p/B4NwMhlBht_/',
     'value': 'ð\x9f\x91\x8d',
     'timestamp': 1572383890}]},
  {'title': 'j0cko_h0mo',
   'string_list_data': [{'href': 'https://www.instagram.com/p/B4MOhGQH8jO/',
     'value': 'ð\x9f\x91\x8d',
     'timestamp': 1572348184}]},
  {'title': 'fruits_magazine_archives',
   'string_list_data': [{'href': 'https://www.instagram.com/p/B4LhueTltza/',
     'value': 'ð\x9f\x91\x8d',
     'timestamp': 1572311081}]},
  {'title': 'joshfreese',
   'string_list_data': [{'href': 'https://www.instagram.com/p/B4Kw13LhoBe/',
     'value': 'ð\x9f\x91\x8d',
     'timestamp': 1572283094}]},
  {'title': 'michaelpilmer',
   'string_list_data': [{'href': 'https://www.instagram.com/p/B4K2Npnh0I3/',
     'v

Now we know where the timestamp values, which are what we want to access for this study, are located within the data structure. From this we can isolate the timestamps.

First we load up the 'likes_media_likes' data so we can access the values within.

In [26]:
likes_data['likes_media_likes'][0]

{'title': 'fruits_magazine_archives',
 'string_list_data': [{'href': 'https://www.instagram.com/p/B4OAKpdlsBm/',
   'value': 'ð\x9f\x91\x8d',
   'timestamp': 1572391466}]}

Then we access the first entry of 'string_list_data', where the timestamp data is located.

In [27]:
likes_data['likes_media_likes'][0]['string_list_data'][0]

{'href': 'https://www.instagram.com/p/B4OAKpdlsBm/',
 'value': 'ð\x9f\x91\x8d',
 'timestamp': 1572391466}
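
The 'value' string looks garbled because the export appears to store the emoji's UTF-8 bytes as individual Latin-1 characters. The following is a minimal sketch of how such a string could be repaired, assuming that is the cause; the helper name `fix_mojibake` is hypothetical and not part of the original notebook.

```python
def fix_mojibake(text: str) -> str:
    """Re-interpret Latin-1 characters as the UTF-8 bytes they stand for."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mis-encoded in this particular way: return the text unchanged.
        return text


# The 'value' string from the output above (it displays as 'ð\x9f\x91\x8d').
raw_value = "\xf0\x9f\x91\x8d"
decoded_value = fix_mojibake(raw_value)

# Print the code point rather than the glyph to avoid console encoding issues.
print(f"U+{ord(decoded_value):04X}")  # U+1F44D
```

For this sample the repaired character is U+1F44D, the thumbs-up emoji, rather than the heart shown in the app. Plain ASCII strings such as account names pass through the helper unchanged.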

Then we read the 'timestamp' value from that entry of 'string_list_data'.

In [28]:
likes_data['likes_media_likes'][0]['string_list_data'][0]['timestamp']

1572391466
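
Since the timestamp is a count of seconds since 1970, it can be turned into a readable date. This is a small sketch using only the standard library; it assumes the value is a Unix timestamp in UTC, and the variable names are illustrative.

```python
from datetime import datetime, timezone

timestamp = 1572391466  # the first timestamp shown above

# Interpret the value as seconds since 1970-01-01 00:00:00 UTC.
liked_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(liked_at.isoformat())  # 2019-10-29T23:24:26+00:00
```

The same conversion could be applied to every timestamp if the analysis were extended to look at when likes were given, not only to whom.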

Now that we know where the timestamp data is located, we can convert the likes data into a dataframe to make it easier to inspect, using the overarching 'likes_media_likes' list as the starting point.

In [24]:
likes_frame = pd.DataFrame(likes_data['likes_media_likes'])

We can add columns to this frame and sort it to make the data easier to digest. The expression "['string_list_data'][0]['timestamp']" navigates from a single record in 'likes_media_likes' to its timestamp value. To make a new column from it, we first build a list holding the value we want for each row.

In [29]:
data_list = [x['string_list_data'][0]['timestamp'] for x in likes_data['likes_media_likes']]

This is what the list of timestamps looks like:

In [30]:
data_list[0:10]

[1572391466,
 1572383890,
 1572348184,
 1572311081,
 1572283094,
 1572283065,
 1572277593,
 1572277582,
 1572227450,
 1572227429]
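
The list comprehension indexes `[0]`, so it assumes every record has at least one entry in 'string_list_data'. The sketch below shows one way to check that assumption; it runs on two records copied from the output above rather than on the full export, and the names `sample_likes` and `entry_counts` are illustrative.

```python
from collections import Counter

# Two records copied from the output above (the 'value' field is omitted).
sample_likes = [
    {"title": "fruits_magazine_archives",
     "string_list_data": [{"href": "https://www.instagram.com/p/B4OAKpdlsBm/",
                           "timestamp": 1572391466}]},
    {"title": "michaelpilmer",
     "string_list_data": [{"href": "https://www.instagram.com/p/B4NwMhlBht_/",
                           "timestamp": 1572383890}]},
]

# Count how many records have 0, 1, 2, ... entries in 'string_list_data'.
entry_counts = Counter(len(record["string_list_data"]) for record in sample_likes)
print(entry_counts)

# Extract timestamps only from records that have at least one entry.
timestamps = [record["string_list_data"][0]["timestamp"]
              for record in sample_likes if record["string_list_data"]]
print(timestamps)
```

If the count for the full export showed only records with exactly one entry, the simpler comprehension used in the notebook would be safe as written.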

Now we can make a new column in our dataframe from this list and look at the frame itself.

In [31]:
likes_frame['timestamp'] = data_list

In [32]:
likes_frame

,title,string_list_data,timestamp
0,fruits_magazine_archives,[{'href': 'https://www.instagram.com/p/B4OAKpd...,1572391466
1,michaelpilmer,[{'href': 'https://www.instagram.com/p/B4NwMhl...,1572383890
2,j0cko_h0mo,[{'href': 'https://www.instagram.com/p/B4MOhGQ...,1572348184
3,fruits_magazine_archives,[{'href': 'https://www.instagram.com/p/B4LhueT...,1572311081
4,joshfreese,[{'href': 'https://www.instagram.com/p/B4Kw13L...,1572283094
...,...,...,...
5916,neozeiss,[{'href': 'https://www.instagram.com/p/B4Svtt0...,1572562649
5917,boojiboy_,[{'href': 'https://www.instagram.com/p/B4SkjNU...,1572562641
5918,michaelpilmer,[{'href': 'https://www.instagram.com/p/B4SrWyr...,1572562637
5919,fruits_magazine_archives,[{'href': 'https://www.instagram.com/p/B4QjXzY...,1572510700


We want to analyze like counts by account. Let's group our frame by the names of the Instagram accounts, represented by the 'title' value, and examine it.

In [106]:
grouped_frame = likes_frame.groupby('title').count()
grouped_frame

title,string_list_data,timestamp
123pingu,1,1
2000s.internet,1,1
2008_hyundai_sonata,1,1
3chordpolitics,1,1
3oclockrock_la,1,1
...,...,...
zoomtecheurope,1,1
zorakfanclub,1,1
zuggyandkathy,8,8
zuorio,1,1


From this grouped frame, we can sort the values in descending order and use head() to keep as many of the top accounts as we want to analyze. I chose the top twenty because it is less narrow than the top ten without being excessive.

This is our final data table:

In [117]:
grouped_frame.sort_values('timestamp', ascending = False).head(20)

title,string_list_data,timestamp
krk_ryden,208,208
clubdevo,197,197
darrinzwengel,169,169
michaelpilmer,136,136
martiancolors,136,136
robalster1,107,107
joshfreese,103,103
winnerjeff,102,102
msmollyharvey,101,101
theonion,100,100


Now we have the top twenty accounts sorted by number of likes given by my account.
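
As a cross-check on the grouped frame, the same per-account ranking can be sketched without pandas using `collections.Counter`. The example runs on the account names of the first five likes shown in the raw output earlier, not on the full export, and the variable names are illustrative.

```python
from collections import Counter

# Account names of the first five likes shown in the raw output earlier.
sample_titles = [
    "fruits_magazine_archives",
    "michaelpilmer",
    "j0cko_h0mo",
    "fruits_magazine_archives",
    "joshfreese",
]

# Count likes per account, then keep the accounts with the most likes.
likes_per_account = Counter(sample_titles)
top_accounts = likes_per_account.most_common(2)
print(top_accounts)
```

On the real data, the list of names could be built from the 'title' of each record in likes_data['likes_media_likes'], and `most_common(20)` should then agree with the sorted frame.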

<h3>Data analysis</h3>

<p>Upon generating my list, I realized that my initial hypothesis had overlooked an important distinction. When I drafted it, I did not consider the difference between average Joes using Instagram and popular, more public figures using the platform as an extension of whatever work they are trying to promote. I may consider myself immune to "influencer" culture, but I'm still following people with some degree of a following on Instagram and using Instagram in general, so I guess I'm still playing into it. To take this into account, I split "individuals" into <b>individuals</b> (the "average Joes") and <b>professional individuals</b> (the "well known people"), alongside the <b>organizations</b> category. The original two-way split between individuals and organizations turned out to be too vague. The top twenty accounts, in descending order of likes, fall into these categories as follows:</p>

<ol>
    <li>An artist who I like. <b>Professional individual</b>.</li>
    <li>A band I like. <b>Organization</b>.</li>
    <li>My dad! <b>Individual</b>.</li>
    <li>Someone I don't talk to anymore, but I was friends with him. <b>Individual</b>.</li>
    <li>A friend of mine. <b>Individual</b>.</li>
    <li>Also a friend. <b>Individual</b>.</li>
    <li>A famous drummer. <b>Professional individual</b>.</li>
    <li>A friend. <b>Individual</b>.</li>
    <li>A musician I used to listen to. <b>Professional individual</b>.</li>
    <li>Satirical news. <b>Organization</b>.</li>
    <li>A friend. <b>Individual</b>.</li>
    <li>A streaming service I used to use. <b>Organization</b>.</li>
    <li>A friend. <b>Individual</b>.</li>
    <li>A friend. <b>Individual</b>.</li>
    <li>A band I used to like. <b>Organization</b>.</li>
    <li>A musician I like. <b>Professional individual</b>.</li>
    <li>A musician I like. <b>Professional individual</b>.</li>
    <li>Funny edited videos. <b>Organization</b>.</li>
    <li>The creators behind the Criterion DVDs. <b>Organization</b>.</li>
    <li>A friend. <b>Individual</b>.</li>
</ol>

<p>Out of the twenty accounts with the most liked posts,</p>
<ul>
    <li><b>Nine</b> of the accounts represented <b>individuals</b>.</li>
    <li><b>Six</b> of the accounts represented <b>organizations</b>.</li>
    <li><b>Five</b> of the accounts represented <b>professional individuals</b>.</li>
</ul>
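
<p>The tally can be re-derived from the numbered list of twenty accounts above. The sketch below simply transcribes the category of each account in list order and counts them; the variable names are illustrative, and the percentages are plain shares of the twenty accounts, not a statistical test.</p>

```python
from collections import Counter

# Category of each of the twenty most-liked accounts, in list order (1 to 20).
categories = [
    "professional individual", "organization", "individual", "individual",
    "individual", "individual", "professional individual", "individual",
    "professional individual", "organization", "individual", "organization",
    "individual", "individual", "organization", "professional individual",
    "professional individual", "organization", "organization", "individual",
]

tally = Counter(categories)
total = len(categories)
for category, count in tally.most_common():
    print(f"{category}: {count} of {total} ({count / total:.0%})")

# Individuals of either kind, for comparison with organizations.
all_individuals = tally["individual"] + tally["professional individual"]
print(f"all individuals: {all_individuals} of {total}")
```

<p>Counting this way gives nine individuals, six organizations, and five professional individuals, so individuals of either kind make up fourteen of the twenty accounts.</p>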

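<p>Hence, the data supports the hypothesis of a positive relationship between my liked posts and their being posted by an individual rather than an organization: fourteen of the twenty accounts belong to individuals of either kind, against six organizations, so I consider the hypothesis <b>correct</b> for this sample.</p>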

<p>However, there were further nuances to this analysis than I expected, considering the distinction between non-professional and professional individuals. On top of this, I have actually met three of the "professional individuals" featured in my data and consider myself an acquaintance of theirs. I went into this data report thinking that the data would show curated pages by bands and entertainment pages versus friends not chained to any sort of social rep just posting things that happen in their lives. It turns out that, in my case, the lines are a little blurrier! An analysis like this could take many different approaches depending on how one classifies the accounts being analyzed.</p>

<p>Such data could also be used to draw wider conclusions about how people use Instagram than just focusing on one user's liking habits. An interesting perspective to take using this data could be to examine whether or not accounts with many "likes" are specifically labeled as "professional" accounts or not. Many accounts run by individuals are registered as "professional" accounts with a tagline on their profile for one's occupation, some of which don't even coincide with one's line of work for humor or "aesthetic" purposes. (I am not immune to this; my account claims I am a "scientist" just because I think it's cheeky.) We could draw parallels between the likes granted to average Joe individuals and individuals who deem themselves relevant enough in some way to click the button for the professional account for added profile oomph.</p>

<p>While the data here is useful, it seems that most of the analysis comes from examining the accounts themselves for further information since the data does not contain any of the intricacies of these accounts and is solely focused on one user's "like" data, as opposed to anything deeper. It is up to human analysis and inference to come to a meaningful conclusion, which is a slight limitation.</p>