## Data Analysis Project: Analyzing Instagram Engagement

### Project Summary

This project analyzes a dataset related to the engagement of posts on the Instagram account of a fictional company. The goal is to identify the factors that most influence engagement, with a focus on the tags used.

The dataset covers the period from the account's creation until March 27, 2022, and includes metrics such as likes, comments, and interactions. Views were disregarded, as the focus of the analysis is on direct engagement metrics. This project concentrates on data preparation and exploratory analysis to generate insights into the performance of the posts.

In [1]:
import pandas as pd

from src.config import ORIGINAL_DATA

df_insta = pd.read_excel(ORIGINAL_DATA)

df_insta.head()

Unnamed: 0,Type,Date,Likes,Comments,Views,Tags,People,Campaign,Carousel,Interactions
0,Photo,2021-09-11,2858,16,,Shop,N,N,,2874
1,Photo,2021-09-11,2930,28,,Shop/Products,N,N,,2958
2,Photo,2021-09-11,2807,9,,Shop,N,N,,2816
3,Video,2021-09-12,5115,49,82878.0,Products,N,N,,5164
4,Photo,2021-09-13,4392,45,,Products,Y,N,,4437


The fictional company asked us to disregard the "Views" column, so we removed it from the dataset.

In [2]:
df_insta = df_insta.drop("Views", axis=1)

df_insta.head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaign,Carousel,Interactions
0,Photo,2021-09-11,2858,16,Shop,N,N,,2874
1,Photo,2021-09-11,2930,28,Shop/Products,N,N,,2958
2,Photo,2021-09-11,2807,9,Shop,N,N,,2816
3,Video,2021-09-12,5115,49,Products,N,N,,5164
4,Photo,2021-09-13,4392,45,Products,Y,N,,4437


In [3]:
df_insta.tail()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaign,Carousel,Interactions
47,IGTV,2022-03-12,5489,77,Usages tips/New Products,Y,N,,5566
48,Photo,2022-03-20,29084,479,Celebration date/Promotions,Y,Y,,29563
49,Photo,2022-03-22,9087,106,,Y,Y,,9193
50,Photo,2022-03-26,16551,186,,Y,N,,16737
51,IGTV,2022-03-27,4934,65,Usages tips/Products,Y,N,,4999
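
In the rows displayed so far, "Interactions" equals "Likes" plus "Comments". A quick consistency check along the following lines could confirm whether that holds for every post. This is only a sketch: the frame name `sample` is illustrative, and it contains just three rows copied from the outputs above rather than the full dataset.

```python
import pandas as pd

# Three rows copied from the head/tail outputs above (not the full dataset).
sample = pd.DataFrame(
    {
        "Likes": [2858, 5115, 4934],
        "Comments": [16, 49, 65],
        "Interactions": [2874, 5164, 4999],
    }
)

# True where Interactions is exactly the sum of Likes and Comments.
matches = sample["Interactions"].eq(sample["Likes"] + sample["Comments"])
print(matches.all())
```

If the same check passes on `df_insta`, "Interactions" carries no information beyond the other two columns.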


In [4]:
df_insta.shape

(52, 9)

In [5]:
df_insta.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Type           52 non-null     object        
 1   Date           52 non-null     datetime64[ns]
 2   Likes          52 non-null     int64         
 3   Comments       52 non-null     int64         
 4   Tags           44 non-null     object        
 5   People         52 non-null     object        
 6   Campaign       52 non-null     object        
 7   Carousel       8 non-null      object        
 8   Interactions   52 non-null     int64         
dtypes: datetime64[ns](1), int64(3), object(5)
memory usage: 3.8+ KB


To Do: Analyze why the "Carousel" column has only 8 non-null values.

In [6]:
df_insta["Carousel"].value_counts()

Carousel
Y    8
Name: count, dtype: int64

In [7]:
df_insta.loc[df_insta["Carousel"].isnull()].head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaign,Carousel,Interactions
0,Photo,2021-09-11,2858,16,Shop,N,N,,2874
1,Photo,2021-09-11,2930,28,Shop/Products,N,N,,2958
2,Photo,2021-09-11,2807,9,Shop,N,N,,2816
3,Video,2021-09-12,5115,49,Products,N,N,,5164
4,Photo,2021-09-13,4392,45,Products,Y,N,,4437


In [8]:
df_insta.loc[df_insta["Carousel"].notnull()].head(10)

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaign,Carousel,Interactions
5,Photo,2021-09-17,5359,62,New Products,N,Y,Y,5421
8,Photo,2021-09-27,6355,89,Products,Y,N,Y,6444
12,Photo,2021-10-21,6166,55,New Products,Y,Y,Y,6221
21,Photo,2021-12-23,8328,93,Products,Y,N,Y,8421
25,Photo,2022-01-02,12193,138,New Products,Y,N,Y,12331
26,Photo,2022-01-08,24585,354,Celebration date,Y,Y,Y,24939
28,Photo,2022-01-15,9936,119,New Products,Y,N,Y,10055
40,Photo,2022-02-21,21621,213,Influencers,Y,Y,Y,21834


To Do: Assume that the null values in the "Carousel" column represent "N", since all non-null values are "Y".

In [9]:
df_insta.loc[df_insta["Carousel"].isnull(), "Carousel"] = "N"
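
The same replacement can also be written with `Series.fillna`. The sketch below runs on a small made-up frame (`toy` is a hypothetical name, and its values only mimic the Y/missing pattern of the real column), so it illustrates the technique rather than reproducing the dataset.

```python
import pandas as pd

# Illustrative values only: "Y" where a post is a carousel, missing otherwise.
toy = pd.DataFrame({"Carousel": ["Y", None, None, "Y", None]})

# Assumption carried over from the text: a missing value means "not a carousel".
toy["Carousel"] = toy["Carousel"].fillna("N")

counts = toy["Carousel"].value_counts()
print(counts)
```

Either form works; `fillna` states the intent ("fill the gaps") slightly more directly than a boolean mask.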

In [10]:
df_insta.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Type           52 non-null     object        
 1   Date           52 non-null     datetime64[ns]
 2   Likes          52 non-null     int64         
 3   Comments       52 non-null     int64         
 4   Tags           44 non-null     object        
 5   People         52 non-null     object        
 6   Campaign       52 non-null     object        
 7   Carousel       52 non-null     object        
 8   Interactions   52 non-null     int64         
dtypes: datetime64[ns](1), int64(3), object(5)
memory usage: 3.8+ KB


In [11]:
df_insta["Carousel"].value_counts()

Carousel
N    44
Y     8
Name: count, dtype: int64

In [12]:
df_insta["Type"].value_counts()

Type
Photo    36
Video     6
Reels     5
IGTV      5
Name: count, dtype: int64

To Do: Use the describe() function to analyze the statistics of the numeric columns.

In [13]:
with pd.option_context("float_format", "{:.2f}".format):
    display(df_insta.describe(exclude=["datetime", "object"]))

Unnamed: 0,Likes,Comments,Interactions
count,52.0,52.0,52.0
mean,12262.73,189.5,12452.23
std,8165.88,170.69,8299.39
min,2807.0,9.0,2816.0
25%,5492.0,69.5,5562.5
50%,9603.0,128.0,9773.5
75%,17621.75,265.25,17920.75
max,37351.0,852.0,37853.0


In [14]:
df_insta.sort_values(by="Likes", ascending=False).head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaign,Carousel,Interactions
39,Photo,2022-02-17,37351,502,Promotions,Y,Y,N,37853
30,Reels,2022-01-24,29981,502,Trends,Y,Y,N,30483
48,Photo,2022-03-20,29084,479,Celebration date/Promotions,Y,Y,N,29563
33,Photo,2022-02-06,24655,186,Influencers,Y,Y,N,24841
26,Photo,2022-01-08,24585,354,Celebration date,Y,Y,Y,24939


In [15]:
df_insta.sort_values(by="Likes", ascending=True).head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaign,Carousel,Interactions
2,Photo,2021-09-11,2807,9,Shop,N,N,N,2816
0,Photo,2021-09-11,2858,16,Shop,N,N,N,2874
20,Photo,2021-12-16,2881,29,Products,N,N,N,2910
1,Photo,2021-09-11,2930,28,Shop/Products,N,N,N,2958
17,Video,2021-11-09,3213,60,Products,N,N,N,3273


The 5 posts with the highest number of likes all featured people and were part of campaigns, while the 5 posts with the lowest number of likes neither included people nor were part of campaigns.

This suggests that likes are associated with the presence of people in the posts and with the execution of campaigns, although ten posts alone cannot establish a causal link.
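
The comparison between the two extremes can be made explicit by computing, for each group, the share of posts that feature people or belong to a campaign. The sketch below rebuilds only the ten rows shown in the two sorted outputs above; the names `posts`, `top5`, `bottom5`, and `summary` are illustrative.

```python
import pandas as pd

# The ten rows shown in the two sorted outputs above (Likes, People, Campaign).
posts = pd.DataFrame(
    {
        "Likes": [37351, 29981, 29084, 24655, 24585, 2807, 2858, 2881, 2930, 3213],
        "People": ["Y"] * 5 + ["N"] * 5,
        "Campaign": ["Y"] * 5 + ["N"] * 5,
    }
)

top5 = posts.nlargest(5, "Likes")
bottom5 = posts.nsmallest(5, "Likes")

# Fraction of posts in each group flagged "Y" for People and for Campaign.
summary = pd.DataFrame(
    {
        "top5": top5[["People", "Campaign"]].eq("Y").mean(),
        "bottom5": bottom5[["People", "Campaign"]].eq("Y").mean(),
    }
)
print(summary)
```

On the full dataset the same idea would apply to `df_insta` directly, and a larger cutoff than 5 would give a less fragile comparison.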

In [16]:
pd.options.display.float_format = "{:,.2f}".format

To Do: Use groupby() to calculate averages per category and compare columns, in order to identify the elements most present in the posts that generated the highest engagement.

In [17]:
df_insta.groupby("Type")["Likes"].mean()

Type
IGTV     6,833.40
Photo   13,341.14
Reels   14,873.00
Video    8,141.50
Name: Likes, dtype: float64

In [18]:
df_insta.groupby(["Type"])[["Likes", "Comments"]].mean().sort_values("Likes", ascending=False)

Type,Likes,Comments
Reels,14873.0,244.4
Photo,13341.14,193.42
Video,8141.5,166.83
IGTV,6833.4,133.6


The data indicates that it is worth investing in Reels and photo posts.
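
Because the group sizes differ a lot (36 photos versus 5 Reels in the type counts above), it can help to report the number of posts next to each mean. The following is a minimal sketch using a handful of rows copied from earlier outputs, not the full dataset; `sample` and `by_type` are illustrative names.

```python
import pandas as pd

# Rows copied from earlier outputs (indices 0, 3, 17, 30, 39, 47, 51).
sample = pd.DataFrame(
    {
        "Type": ["Photo", "Video", "Video", "Reels", "Photo", "IGTV", "IGTV"],
        "Likes": [2858, 5115, 3213, 29981, 37351, 5489, 4934],
    }
)

# Mean likes per type, with the number of posts behind each mean.
by_type = (
    sample.groupby("Type")["Likes"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(by_type)
```

A mean backed by only a few posts can shift a lot with one viral post, so the `count` column is a useful guard when ranking formats.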

In [19]:
df_insta.groupby(["People"])[["Likes", "Comments"]].mean().sort_values("Likes", ascending=False)

People,Likes,Comments
Y,14664.55,230.5
N,4256.67,52.83


It was also observed that posts featuring people generate higher engagement.

In [20]:
df_insta.groupby(["Type", "People"])[["Likes", "Comments"]].mean().sort_values("Likes", ascending=False)

Type,People,Likes,Comments
Reels,Y,20832.0,342.0
Video,Y,16409.5,370.0
Photo,Y,15236.67,226.2
IGTV,Y,6833.4,133.6
Reels,N,5934.5,98.0
Video,N,4007.5,65.25
Photo,N,3863.5,29.5


Videos featuring people also showed a high level of engagement, with the highest average number of comments (370) of any combination in the table.

In [21]:
df_insta.groupby(["Type", "People", "Campaign"])[["Likes", "Comments"]].mean().sort_values(
    "Likes", ascending=False
)

Type,People,Campaign,Likes,Comments
Reels,Y,Y,24801.0,388.5
Photo,Y,Y,19809.47,298.33
Video,Y,Y,16409.5,370.0
Reels,Y,N,12894.0,249.0
Photo,Y,N,10815.29,159.93
Photo,Y,S,8544.0,72.0
IGTV,Y,N,6833.4,133.6
Reels,N,N,5934.5,98.0
Photo,N,Y,5852.5,47.5
Video,N,N,4007.5,65.25
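
The table above contains a "Campaign" value of "S" next to the expected "Y" and "N". Whether "S" is a variant spelling of "Y" is an assumption that the data does not confirm, so it should be checked with whoever maintains the spreadsheet. If it is, the values could be normalized before grouping, roughly as sketched below on a made-up column (`toy` and its values are illustrative).

```python
import pandas as pd

# Illustrative values only, mimicking a column with one stray "S".
toy = pd.DataFrame({"Campaign": ["Y", "N", "S", "N", "Y"]})

# Inspect the distinct values first to spot unexpected categories.
print(toy["Campaign"].value_counts())

# Assumption: "S" means the same as "Y"; confirm before applying to real data.
toy["Campaign"] = toy["Campaign"].replace({"S": "Y"})
print(toy["Campaign"].value_counts())
```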


In conclusion, Reels and photos are the most effective formats for engagement, especially when they include people.

Videos stood out when associated with campaigns featuring people, even generating more comments than photos.

In [22]:
df_insta.groupby("Tags")["Likes"].mean()

Tags
Celebration date              17,975.00
Celebration date/Promotions   29,084.00
Influencers                   15,197.29
New Products                  11,619.57
Products                       5,666.92
Promotions                    26,645.50
Shop                           2,832.50
Shop/Products                  2,930.00
Trends                        22,400.67
Trends/Products               12,894.00
Usages tips/New Products       5,703.50
Usages tips/Products           7,586.67
Name: Likes, dtype: float64

To Do: Split posts that have two or more tags into separate rows to facilitate analysis.

In [23]:
df_insta["Tags"] = df_insta["Tags"].str.split("/")

df_insta.head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaign,Carousel,Interactions
0,Photo,2021-09-11,2858,16,[Shop],N,N,N,2874
1,Photo,2021-09-11,2930,28,"[Shop, Products]",N,N,N,2958
2,Photo,2021-09-11,2807,9,[Shop],N,N,N,2816
3,Video,2021-09-12,5115,49,[Products],N,N,N,5164
4,Photo,2021-09-13,4392,45,[Products],Y,N,N,4437


In [24]:
df_insta = df_insta.explode("Tags")
df_insta.head()

Unnamed: 0,Type,Date,Likes,Comments,Tags,People,Campaign,Carousel,Interactions
0,Photo,2021-09-11,2858,16,Shop,N,N,N,2874
1,Photo,2021-09-11,2930,28,Shop,N,N,N,2958
1,Photo,2021-09-11,2930,28,Products,N,N,N,2958
2,Photo,2021-09-11,2807,9,Shop,N,N,N,2816
3,Video,2021-09-12,5115,49,Products,N,N,N,5164
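
The split-and-explode step can be seen in isolation on the first three rows of the dataset. The sketch below copies only the "Likes" and "Tags" values of those rows; `sample` and `exploded` are illustrative names.

```python
import pandas as pd

# Likes and Tags of the first three rows, as shown in the outputs above.
sample = pd.DataFrame(
    {"Likes": [2858, 2930, 2807], "Tags": ["Shop", "Shop/Products", "Shop"]}
)

# Split "A/B" into ["A", "B"], then give each tag its own row.
exploded = sample.assign(Tags=sample["Tags"].str.split("/")).explode("Tags")

print(exploded)
print(len(sample), len(exploded))
```

Note that `explode` keeps the original index and repeats the other columns, so the post at index 1 now appears twice with the same likes. That is what the per-tag averages need, but any post-level statistic computed after this point would count multi-tag posts more than once.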


In [25]:
df_insta.groupby("Tags")["Likes"].mean()

Tags
Celebration date   20,752.25
Influencers        15,197.29
New Products       10,304.89
Products            6,269.82
Promotions         27,458.33
Shop                2,865.00
Trends             20,024.00
Usages tips         6,833.40
Name: Likes, dtype: float64

To Do: Assign a placeholder value to the null rows in the "Tags" column.

In [26]:
df_insta.loc[df_insta["Tags"].isnull(), "Tags"]

11    NaN
19    NaN
29    NaN
38    NaN
41    NaN
43    NaN
49    NaN
50    NaN
Name: Tags, dtype: object

In [27]:
df_insta.loc[df_insta["Tags"].isnull(), "Tags"] = "No Tag"

In [28]:
df_insta.groupby("Tags")[["Likes", "Comments"]].mean().sort_values("Likes", ascending=False)

Tags,Likes,Comments
Promotions,27458.33,531.0
Celebration date,20752.25,343.5
Trends,20024.0,352.25
No Tag,15347.88,207.75
Influencers,15197.29,161.71
New Products,10304.89,198.56
Usages tips,6833.4,133.6
Products,6269.82,94.12
Shop,2865.0,17.67


Posts tagged "Promotions" show the highest average engagement, followed by those tagged "Celebration date" and "Trends".

In [29]:
df_insta.groupby(["People", "Tags"])[["Likes", "Comments"]].mean().sort_values("Likes", ascending=False)

People,Tags,Likes,Comments
Y,Promotions,27458.33,531.0
Y,Celebration date,20752.25,343.5
Y,Trends,20024.0,352.25
Y,No Tag,15347.88,207.75
Y,Influencers,15197.29,161.71
Y,New Products,10923.12,215.62
Y,Products,8316.38,131.62
Y,Usages tips,6833.4,133.6
N,New Products,5359.0,62.0
N,Products,4450.67,60.78


In [30]:
df_insta.groupby(["Campaign", "Tags"])[["Likes", "Comments"]].mean().sort_values("Likes", ascending=False)

Campaign,Tags,Likes,Comments
Y,Promotions,33217.5,490.5
Y,Trends,22400.67,386.67
Y,Influencers,21258.25,229.0
Y,Celebration date,20752.25,343.5
Y,No Tag,16850.75,257.75
N,Promotions,15940.0,612.0
N,No Tag,13845.0,157.75
N,Trends,12894.0,249.0
Y,New Products,11040.67,323.0
N,New Products,9937.0,136.33
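
To compare campaign and non-campaign posts tag by tag, the grouped means can be reshaped so that each tag has one row and each "Campaign" value one column. The sketch below copies the mean likes from the output above, limited to the tags that appear with both "Campaign" values in the rows shown; the names `mean_likes`, `wide`, and `ratio` are illustrative.

```python
import pandas as pd

# Mean likes copied from the Campaign/Tags output above.
mean_likes = pd.Series(
    {
        ("Y", "Promotions"): 33217.50,
        ("N", "Promotions"): 15940.00,
        ("Y", "Trends"): 22400.67,
        ("N", "Trends"): 12894.00,
        ("Y", "No Tag"): 16850.75,
        ("N", "No Tag"): 13845.00,
        ("Y", "New Products"): 11040.67,
        ("N", "New Products"): 9937.00,
    }
)
mean_likes.index.names = ["Campaign", "Tags"]

# One row per tag, one column per Campaign value, plus the Y/N ratio.
wide = mean_likes.unstack("Campaign")
wide["ratio"] = wide["Y"] / wide["N"]
print(wide.sort_values("ratio", ascending=False).round(2))
```

In these rows every tag averages more likes with a campaign than without, though some cells may rest on very few posts, so the ratios are indicative rather than conclusive.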


## Conclusions

Using people in posts is essential for good engagement.

Creating campaigns boosts engagement with the brand.

Promotions is the tag with the highest average engagement, but it should be used strategically due to its cost.

Using trending content also brings a lot of engagement to the brand.

Leveraging special dates to promote the brand is a great marketing strategy.

New products perform better when the posts feature people.

The "Shop" tag, at first glance, does not help promote the brand, but it also hasn’t been used alongside people or campaigns, which this analysis associates with higher engagement.

There is still room for more types of posts that the brand hasn’t tried yet, such as campaigns with IGTV.