# **Optimizing Instagram with Data Insights**

Exploratory Data Analysis using Python

In [2]:
import pandas as pd

This imports the pandas library, which is used to load and inspect the dataset.

In [3]:
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"

The code imports three modules from the Plotly visualization package:
- plotly.express (px) for easy, interactive charts
- plotly.graph_objects (go) for more detailed chart customization
- plotly.io (pio) for managing chart display and output.

It then sets the default theme for all charts to "plotly_white", giving them a clean look with a white background.

In [4]:
data = pd.read_csv("/content/sample_data/Instagram data.csv", encoding='latin-1')

In [5]:
data.head()

| | Impressions | From Home | From Hashtags | From Explore | From Other | Saves | Comments | Shares | Likes | Profile Visits | Follows | Caption | Hashtags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3920 | 2586 | 1028 | 619 | 56 | 98 | 9 | 5 | 162 | 35 | 2 | Here are some of the most important data visua... | #finance #money #business #investing #investme... |
| 1 | 5394 | 2727 | 1838 | 1174 | 78 | 194 | 7 | 14 | 224 | 48 | 10 | Here are some of the best data science project... | #healthcare #health #covid #data #datascience ... |
| 2 | 4021 | 2085 | 1188 | 0 | 533 | 41 | 11 | 1 | 131 | 62 | 12 | Learn how to train a machine learning model an... | #data #datascience #dataanalysis #dataanalytic... |
| 3 | 4528 | 2700 | 621 | 932 | 73 | 172 | 10 | 7 | 213 | 23 | 8 | Here's how you can write a Python program to d... | #python #pythonprogramming #pythonprojects #py... |
| 4 | 2518 | 1704 | 255 | 279 | 37 | 96 | 5 | 4 | 123 | 8 | 0 | Plotting annotations while visualizing your da... | #datavisualization #datascience #data #dataana... |


Now let’s have a look at all the columns the dataset contains:

In [6]:
data.columns

Index(['Impressions', 'From Home', 'From Hashtags', 'From Explore',
       'From Other', 'Saves', 'Comments', 'Shares', 'Likes', 'Profile Visits',
       'Follows', 'Caption', 'Hashtags'],
      dtype='object')

Now let’s have a look at the column info:

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Impressions     119 non-null    int64 
 1   From Home       119 non-null    int64 
 2   From Hashtags   119 non-null    int64 
 3   From Explore    119 non-null    int64 
 4   From Other      119 non-null    int64 
 5   Saves           119 non-null    int64 
 6   Comments        119 non-null    int64 
 7   Shares          119 non-null    int64 
 8   Likes           119 non-null    int64 
 9   Profile Visits  119 non-null    int64 
 10  Follows         119 non-null    int64 
 11  Caption         119 non-null    object
 12  Hashtags        119 non-null    object
dtypes: int64(11), object(2)
memory usage: 12.2+ KB


Next, summary statistics for the numeric columns of the dataset:

In [8]:
data.describe()

| | Impressions | From Home | From Hashtags | From Explore | From Other | Saves | Comments | Shares | Likes | Profile Visits | Follows |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 |
| mean | 5703.991597 | 2475.789916 | 1887.512605 | 1078.10084 | 171.092437 | 153.310924 | 6.663866 | 9.361345 | 173.781513 | 50.621849 | 20.756303 |
| std | 4843.780105 | 1489.386348 | 1884.361443 | 2613.026132 | 289.431031 | 156.317731 | 3.544576 | 10.089205 | 82.378947 | 87.088402 | 40.92158 |
| min | 1941.0 | 1133.0 | 116.0 | 0.0 | 9.0 | 22.0 | 0.0 | 0.0 | 72.0 | 4.0 | 0.0 |
| 25% | 3467.0 | 1945.0 | 726.0 | 157.5 | 38.0 | 65.0 | 4.0 | 3.0 | 121.5 | 15.0 | 4.0 |
| 50% | 4289.0 | 2207.0 | 1278.0 | 326.0 | 74.0 | 109.0 | 6.0 | 6.0 | 151.0 | 23.0 | 8.0 |
| 75% | 6138.0 | 2602.5 | 2363.5 | 689.5 | 196.0 | 169.0 | 8.0 | 13.5 | 204.0 | 42.0 | 18.0 |
| max | 36919.0 | 13473.0 | 11817.0 | 17414.0 | 2547.0 | 1095.0 | 19.0 | 75.0 | 549.0 | 611.0 | 260.0 |


Checking the count of null values in each column:

In [9]:
data.isnull().sum()

| Column | Null count |
|---|---|
| Impressions | 0 |
| From Home | 0 |
| From Hashtags | 0 |
| From Explore | 0 |
| From Other | 0 |
| Saves | 0 |
| Comments | 0 |
| Shares | 0 |
| Likes | 0 |
| Profile Visits | 0 |


In [10]:
fig = px.histogram(data,
                   x='Impressions',
                   nbins=10,
                   title='Distribution of Impressions')
fig.show()

The code creates a histogram to visualize the distribution of 'Impressions' from an Instagram dataset. It uses the plotly.express library to generate the histogram with 10 bins and sets the title to 'Distribution of Impressions'. Finally, it displays the histogram using fig.show().
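The summary statistics reported earlier hint at what the histogram will show: the mean of 'Impressions' (about 5,704) sits well above the median (4,289), which suggests a right-skewed distribution with a few very high-impression posts. As a rough sketch, the quartiles from `data.describe()` can be plugged into the common 1.5 × IQR rule to see how far the maximum lies from the bulk of the posts. The variable names are illustrative, and the 1.5 multiplier is a convention rather than something the original analysis uses.

```python
# Quartiles and maximum of 'Impressions', copied from the data.describe() output
q1, median, q3, maximum = 3467.0, 4289.0, 6138.0, 36919.0

# Interquartile range and the conventional 1.5 * IQR upper fence (an assumed rule of thumb)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

print(f"IQR: {iqr}, upper fence: {upper_fence}")
print("Maximum is beyond the fence:", maximum > upper_fence)
```

On the full DataFrame, `data['Impressions'].quantile([0.25, 0.75])` should return the same quartiles without copying them by hand.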

Let’s have a look at the number of impressions on each post over time:


In [11]:
fig = px.line(data, x=data.index,
              y='Impressions',
              title='Impressions Over Time')
fig.show()

This code creates a line chart of impressions per post using plotly.express, the high-level Plotly interface for building interactive charts. The x-axis is the DataFrame index (the dataset has no date column), so the chart reads as a timeline only if the rows are in posting order.
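Post-to-post impressions can be noisy, so a rolling mean is one way to make the underlying trend easier to see. The sketch below is not part of the original analysis: it uses only the five 'Impressions' values shown in `data.head()`, and the window size of 3 is an arbitrary assumption.

```python
import pandas as pd

# 'Impressions' for the five posts shown in data.head(); the full column has 119 values
impressions = pd.Series([3920, 5394, 4021, 4528, 2518], name="Impressions")

# 3-post rolling mean; the first two entries are NaN because the window is not yet full
rolling_mean = impressions.rolling(window=3).mean()

print(rolling_mean)
```

Applied to the real data, the same call on `data['Impressions']` would produce a smoothed series that could be added to the line chart as a second trace.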

Now let’s have a look at all the metrics like Likes, Saves, and Follows from each post over time:

In [12]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=data.index, y=data['Likes'], name='Likes'))
fig.add_trace(go.Scatter(x=data.index, y=data['Saves'], name='Saves'))
fig.add_trace(go.Scatter(x=data.index, y=data['Follows'], name='Follows'))

fig.update_layout(title='Metrics Over Time',
                  xaxis_title='Post Index',  # x is data.index; the dataset has no date column
                  yaxis_title='Count')

fig.show()

The code creates an interactive line chart showing Instagram 'Likes', 'Saves', and 'Follows' over time.
It uses Plotly to plot these metrics from a dataset, customizing the chart for clarity.

Finally, fig.show() displays the chart, allowing users to analyze the engagement trends.
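The three metrics live on different scales (per `data.describe()`, 'Likes' averages about 174 per post while 'Follows' averages about 21), so the smaller series can look flat when plotted on a shared axis. One possible workaround, sketched here on the five rows from `data.head()`, is min-max scaling before plotting. This is an optional extra rather than a step from the original notebook.

```python
import pandas as pd

# First five posts from data.head(); the full dataset has 119 rows
metrics = pd.DataFrame({
    "Likes": [162, 224, 131, 213, 123],
    "Saves": [98, 194, 41, 172, 96],
    "Follows": [2, 10, 12, 8, 0],
})

# Min-max scaling: each column is mapped onto the range [0, 1]
scaled = (metrics - metrics.min()) / (metrics.max() - metrics.min())

print(scaled.round(2))
```

Scaled values show the shape of each trend but no longer carry the original counts, so the y-axis label would need to change accordingly.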

Now let’s have a look at the distribution of reach from different sources:

In [13]:
reach_sources = ['From Home', 'From Hashtags', 'From Explore', 'From Other']
reach_counts = [data[source].sum() for source in reach_sources]

colors = ['#FFB6C1', '#87CEFA', '#90EE90', '#FFDAB9']

fig = px.pie(names=reach_sources,  # labels and totals are passed directly, so no data_frame is needed
             values=reach_counts,
             title='Reach from Different Sources',
             color_discrete_sequence=colors)
fig.show()

This code sums the reach from each source across all posts and uses Plotly to draw a pie chart showing each source's share of the total reach.
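The shares that the pie chart displays can also be worked out numerically. The sketch below uses the column means from the `data.describe()` output instead of the raw data; because every column has 119 rows, shares of the means equal shares of the sums. The dictionary and variable names are illustrative.

```python
# Mean reach per post for each source, copied from the data.describe() output.
# Every column has 119 rows, so shares of the means equal shares of the sums.
mean_reach = {
    "From Home": 2475.789916,
    "From Hashtags": 1887.512605,
    "From Explore": 1078.10084,
    "From Other": 171.092437,
}

total = sum(mean_reach.values())
shares = {source: 100 * value / total for source, value in mean_reach.items()}

for source, pct in shares.items():
    print(f"{source}: {pct:.1f}%")
```

On these figures the home feed accounts for roughly 44% of reach, hashtags about 34%, Explore about 19%, and other sources about 3%.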

Let’s have a look at the relationship between the number of profile visits and follows:


In [14]:
fig = px.scatter(data,
                 x='Profile Visits',
                 y='Follows',
                 trendline='ols',
                 title='Profile Visits vs. Follows')
fig.show()

This code creates a scatter plot in which each point is one post, with its 'Profile Visits' value on the horizontal axis and its 'Follows' value on the vertical axis. It then adds an ordinary least squares (OLS) trendline to show the general relationship between the two metrics; this option requires the statsmodels package to be installed. Finally, it displays the plot.
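To put numbers on the trendline, the same kind of straight-line fit can be computed directly with NumPy. This is a minimal sketch that uses only the five rows shown in `data.head()`, so the slope and correlation it prints describe those five posts and not the full dataset; `np.polyfit` is used here as a stand-in for the fit that Plotly delegates to statsmodels.

```python
import numpy as np

# First five posts from data.head(); the full dataset has 119 rows
profile_visits = np.array([35, 48, 62, 23, 8], dtype=float)
follows = np.array([2, 10, 12, 8, 0], dtype=float)

# Degree-1 least-squares fit: follows ~ slope * profile_visits + intercept
slope, intercept = np.polyfit(profile_visits, follows, deg=1)

# Pearson correlation coefficient between the two metrics
r = np.corrcoef(profile_visits, follows)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

The slope can be read as the additional follows associated with one more profile visit within the sample being fitted.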

Now let’s have a look at the type of hashtags used in the posts using a word cloud:

In [15]:
from wordcloud import WordCloud

hashtags = ' '.join(data['Hashtags'].astype(str))
wordcloud = WordCloud().generate(hashtags)

fig = px.imshow(wordcloud.to_array(), title='Hashtags Word Cloud')  # pass the rendered image as an array
fig.show()

This code joins all hashtag strings into a single text, lets WordCloud count how often each word occurs, and renders the result as a word cloud in which more frequent hashtags appear larger. This makes it easy to spot the most prominent topics and themes in the posts.

Let’s have a look at the distribution of engagement sources:

In [16]:
engagement_metrics = ['Saves', 'Comments', 'Shares', 'Likes']
engagement_counts = [data[metric].sum() for metric in engagement_metrics]

colors = ['#FFB6C1', '#87CEFA', '#90EE90', '#FFDAB9']

fig = px.pie(names=engagement_metrics,  # labels and totals are passed directly, so no data_frame is needed
             values=engagement_counts,
             title='Engagement Sources',
             color_discrete_sequence=colors)
fig.show()

This code sums each type of interaction (saves, comments, shares, and likes) across all posts and presents the totals as a pie chart using Plotly. The chart shows how overall engagement is split between the different interaction types.
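The pie chart shows totals across all posts; a per-post view can complement it. The sketch below computes total interactions per post and divides by impressions, using the five rows from `data.head()`. The "engagement rate" defined here (interactions per impression) is an assumption made for illustration, not a metric from the original analysis.

```python
import pandas as pd

# First five posts from data.head(); the full dataset has 119 rows
posts = pd.DataFrame({
    "Impressions": [3920, 5394, 4021, 4528, 2518],
    "Saves": [98, 194, 41, 172, 96],
    "Comments": [9, 7, 11, 10, 5],
    "Shares": [5, 14, 1, 7, 4],
    "Likes": [162, 224, 131, 213, 123],
})

engagement_metrics = ["Saves", "Comments", "Shares", "Likes"]

# Total interactions per post, then interactions per impression
# ("Engagement Rate" is a hypothetical definition used only in this sketch)
posts["Engagement"] = posts[engagement_metrics].sum(axis=1)
posts["Engagement Rate"] = posts["Engagement"] / posts["Impressions"]

print(posts[["Engagement", "Engagement Rate"]].round(4))
```

Sorting by the rate column would highlight posts that drew many interactions relative to how often they were shown.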

Now let’s explore the hashtags column in detail. Each post contains different combinations of hashtags, which impacts reach on Instagram. So let’s have a look at the distribution of hashtags to see which hashtag is used the most in all the posts:


In [17]:
# Create a list to store all hashtags
all_hashtags = []

# Iterate through each row in the 'Hashtags' column
for row in data['Hashtags']:
    hashtags = str(row).split()
    hashtags = [tag.strip() for tag in hashtags]
    all_hashtags.extend(hashtags)

# Create a pandas DataFrame to store the hashtag distribution
hashtag_distribution = pd.Series(all_hashtags).value_counts().reset_index()
hashtag_distribution.columns = ['Hashtag', 'Count']

fig = px.bar(hashtag_distribution, x='Hashtag',
             y='Count', title='Distribution of Hashtags')
fig.show()

This code snippet extracts hashtags from the Instagram data, counts how often each one occurs, and creates a bar chart of the distribution, ordered from the most to the least used. This can be useful for understanding trends and optimizing hashtag usage.
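The loop above treats `#Python` and `#python` as different hashtags and counts a tag twice if a post repeats it. Whether that matters depends on the data, but a possible variant that normalizes case and counts each tag once per post could look like the sketch below. The hashtag strings are hypothetical stand-ins for `data['Hashtags']`.

```python
from collections import Counter

# Hypothetical hashtag strings standing in for data['Hashtags']
hashtag_rows = [
    "#Data #datascience #python",
    "#python #Python #machinelearning",
    "#data #datascience",
]

counts = Counter()
for row in hashtag_rows:
    # Lowercase each tag and use a set so a tag counts once per post
    tags = {tag.lower() for tag in row.split()}
    counts.update(tags)

# Five most common hashtags with the number of posts that use them
print(counts.most_common(5))
```

The resulting counts can be turned into a DataFrame with `pd.DataFrame(counts.most_common(), columns=['Hashtag', 'Count'])` and plotted the same way as before.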

Now let’s have a look at the distribution of likes and impressions received from the presence of each hashtag on the post:

In [18]:
# Create a dictionary to store the likes and impressions for each hashtag
hashtag_likes = {}
hashtag_impressions = {}

# Iterate through each row in the dataset
for index, row in data.iterrows():
    hashtags = str(row['Hashtags']).split()
    for hashtag in hashtags:
        hashtag = hashtag.strip()
        if hashtag not in hashtag_likes:
            hashtag_likes[hashtag] = 0
            hashtag_impressions[hashtag] = 0
        hashtag_likes[hashtag] += row['Likes']
        hashtag_impressions[hashtag] += row['Impressions']

# Create a DataFrame for likes distribution
likes_distribution = pd.DataFrame(list(hashtag_likes.items()), columns=['Hashtag', 'Likes'])

# Create a DataFrame for impressions distribution
impressions_distribution = pd.DataFrame(list(hashtag_impressions.items()), columns=['Hashtag', 'Impressions'])

fig_likes = px.bar(likes_distribution, x='Hashtag', y='Likes',
                   title='Likes Distribution for Each Hashtag')

fig_impressions = px.bar(impressions_distribution, x='Hashtag',
                         y='Impressions',
                         title='Impressions Distribution for Each Hashtag')

fig_likes.show()
fig_impressions.show()

This code segment aggregates likes and impressions by hashtag to compare how posts carrying each hashtag performed. Because the values are totals, hashtags used in many posts rank higher regardless of per-post performance. The code works by:

- Summing likes and impressions for each hashtag across all posts that use it.
- Organizing the totals into DataFrames.
- Creating bar charts of total likes and total impressions per hashtag.
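Since totals grow with how often a hashtag is used, per-post averages can give a fairer comparison. The following sketch shows one way to compute them with `explode` and `groupby`; the three posts are hypothetical stand-ins for the real dataset, and the output column names are illustrative.

```python
import pandas as pd

# Hypothetical posts standing in for the real dataset
posts = pd.DataFrame({
    "Hashtags": ["#data #python", "#python", "#data #ml"],
    "Likes": [100, 300, 200],
    "Impressions": [1000, 5000, 3000],
})

# One row per (post, hashtag) pair
exploded = posts.assign(Hashtag=posts["Hashtags"].str.split()).explode("Hashtag")

# Post count plus mean likes and mean impressions per hashtag
per_hashtag = (
    exploded.groupby("Hashtag")
    .agg(
        Posts=("Likes", "size"),
        MeanLikes=("Likes", "mean"),
        MeanImpressions=("Impressions", "mean"),
    )
    .sort_values("MeanLikes", ascending=False)
)

print(per_hashtag)
```

Keeping the post count next to the averages makes it easier to spot hashtags whose average rests on only one or two posts.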

# **Conclusion**

This project analyzed Instagram data with a focus on key metrics: impressions, reach, engagement, and hashtags. Using the Plotly library, we visualized these aspects through histograms, line charts, pie charts, scatter plots, bar charts, and a word cloud. This visual approach helped surface patterns across posts and identify potential strategies for optimization.

# **Key Findings**

- Impressions and Reach: Analysis revealed the distribution of impressions over time and across various sources. The identification of primary reach sources can guide content strategy and targeting efforts.
- Engagement: Examining metrics like likes, saves, comments, and shares highlighted the overall engagement levels. Understanding the relationship between these metrics and content types can inform future posting strategies.
- Hashtags: A deep dive into hashtags, including their distribution and the total likes and impressions of the posts that use them, showed which hashtags are associated with the strongest results. This information helps optimize hashtag usage and expand content reach.
- Correlation: Exploring the correlation between profile visits and follows using a scatter plot provided insights into the relationship between these two metrics, suggesting potential strategies to convert profile visits into follows.

# **Recommendations**

Based on the analysis, the following recommendations can be made to improve Instagram performance:

- Optimize Content Strategy: Focus on creating content that aligns with peak engagement times and resonates with the target audience.
- Leverage High-Performing Hashtags: Incorporate the most effective hashtags into posts to enhance discoverability and reach.
- Diversify Content Formats: Experiment with different content formats to cater to audience preferences and maintain engagement levels.
- Track and Analyze Performance: Regularly monitor key metrics and adjust strategies based on the insights gained.

By implementing these recommendations, users can leverage data-driven insights to optimize their Instagram presence and achieve their desired goals. This project demonstrates the value of data analysis in understanding audience behavior and making informed decisions to enhance social media performance.
