**Instagram Post Analysis**

In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from wordcloud import WordCloud
pio.templates.default = "plotly_white"

In [11]:
df = pd.read_csv("/content/Instagram.csv", encoding='latin-1')
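
The `encoding='latin-1'` argument matters because the file is apparently not valid UTF-8. As a minimal sketch (the byte string below is a made-up sample, not taken from the CSV), here is how a Windows-1252 apostrophe byte behaves under three codecs. Latin-1 never fails, but it maps that byte to an invisible control character, which may be why one caption in the `df.head()` preview further down reads "Heres". If the file was actually saved as Windows-1252, `encoding='cp1252'` would be worth trying instead.

```python
# Hypothetical sample bytes: 0x92 is the curly apostrophe in Windows-1252.
raw = b"Here\x92s how"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0x92 is not a valid UTF-8 start byte, so strict decoding fails.
    utf8_ok = False

as_latin1 = raw.decode("latin-1")  # always succeeds; 0x92 becomes U+0092 (a control char)
as_cp1252 = raw.decode("cp1252")   # 0x92 becomes U+2019, the right single quotation mark

print(utf8_ok, repr(as_latin1), repr(as_cp1252))
```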

**Q.1: Show column names and have a look at their info.**

In [13]:
print("Column Names:")
print(df.columns)


Column Names:
Index(['Impressions', 'From Home', 'From Hashtags', 'From Explore',
       'From Other', 'Saves', 'Comments', 'Shares', 'Likes', 'Profile Visits',
       'Follows', 'Caption', 'Hashtags'],
      dtype='object')


In [14]:
# Display column info (df.info() prints directly and returns None, so it is not wrapped in print)
print("\nColumn Info:")
df.info()


Column Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Impressions     119 non-null    int64 
 1   From Home       119 non-null    int64 
 2   From Hashtags   119 non-null    int64 
 3   From Explore    119 non-null    int64 
 4   From Other      119 non-null    int64 
 5   Saves           119 non-null    int64 
 6   Comments        119 non-null    int64 
 7   Shares          119 non-null    int64 
 8   Likes           119 non-null    int64 
 9   Profile Visits  119 non-null    int64 
 10  Follows         119 non-null    int64 
 11  Caption         119 non-null    object
 12  Hashtags        119 non-null    object
dtypes: int64(11), object(2)
memory usage: 12.2+ KB


In [15]:
df.head()

| | Impressions | From Home | From Hashtags | From Explore | From Other | Saves | Comments | Shares | Likes | Profile Visits | Follows | Caption | Hashtags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3920 | 2586 | 1028 | 619 | 56 | 98 | 9 | 5 | 162 | 35 | 2 | Here are some of the most important data visua... | #finance #money #business #investing #investme... |
| 1 | 5394 | 2727 | 1838 | 1174 | 78 | 194 | 7 | 14 | 224 | 48 | 10 | Here are some of the best data science project... | #healthcare #health #covid #data #datascience ... |
| 2 | 4021 | 2085 | 1188 | 0 | 533 | 41 | 11 | 1 | 131 | 62 | 12 | Learn how to train a machine learning model an... | #data #datascience #dataanalysis #dataanalytic... |
| 3 | 4528 | 2700 | 621 | 932 | 73 | 172 | 10 | 7 | 213 | 23 | 8 | Heres how you can write a Python program to d... | #python #pythonprogramming #pythonprojects #py... |
| 4 | 2518 | 1704 | 255 | 279 | 37 | 96 | 5 | 4 | 123 | 8 | 0 | Plotting annotations while visualizing your da... | #datavisualization #datascience #data #dataana... |


**Q.2: Show the descriptive statistics of the data.**

In [16]:
df.describe()

| | Impressions | From Home | From Hashtags | From Explore | From Other | Saves | Comments | Shares | Likes | Profile Visits | Follows |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 |
| mean | 5703.991597 | 2475.789916 | 1887.512605 | 1078.10084 | 171.092437 | 153.310924 | 6.663866 | 9.361345 | 173.781513 | 50.621849 | 20.756303 |
| std | 4843.780105 | 1489.386348 | 1884.361443 | 2613.026132 | 289.431031 | 156.317731 | 3.544576 | 10.089205 | 82.378947 | 87.088402 | 40.92158 |
| min | 1941.0 | 1133.0 | 116.0 | 0.0 | 9.0 | 22.0 | 0.0 | 0.0 | 72.0 | 4.0 | 0.0 |
| 25% | 3467.0 | 1945.0 | 726.0 | 157.5 | 38.0 | 65.0 | 4.0 | 3.0 | 121.5 | 15.0 | 4.0 |
| 50% | 4289.0 | 2207.0 | 1278.0 | 326.0 | 74.0 | 109.0 | 6.0 | 6.0 | 151.0 | 23.0 | 8.0 |
| 75% | 6138.0 | 2602.5 | 2363.5 | 689.5 | 196.0 | 169.0 | 8.0 | 13.5 | 204.0 | 42.0 | 18.0 |
| max | 36919.0 | 13473.0 | 11817.0 | 17414.0 | 2547.0 | 1095.0 | 19.0 | 75.0 | 549.0 | 611.0 | 260.0 |


**Q.3: Check if your data contains any missing values**

In [17]:
print("\nMissing Values Check:")
print(df.isnull().sum())


Missing Values Check:
Impressions       0
From Home         0
From Hashtags     0
From Explore      0
From Other        0
Saves             0
Comments          0
Shares            0
Likes             0
Profile Visits    0
Follows           0
Caption           0
Hashtags          0
dtype: int64


**Q.4: When you start exploring your data, always start by exploring the main feature of your data. For
example, as we are working on a dataset based on Instagram Reach, we should start by exploring the
feature that contains data about reach. In our data, the Impressions column contains the data about the
reach of an Instagram post. So let’s have a look at the distribution of the Impressions:**

In [18]:
# Plotting the distribution of Impressions
fig = px.histogram(df, x='Impressions', title='Distribution of Impressions')
fig.update_layout(bargap=0.1)  # Adjusts the gap between bars
fig.show()
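
A quick numeric cross-check can accompany the histogram. The sketch below reuses the Impressions statistics printed by `df.describe()` above (hard-coded here so the snippet runs on its own) and applies the common 1.5 × IQR rule of thumb, which is an assumed convention rather than anything the dataset prescribes.

```python
# Impressions statistics copied from the df.describe() output above.
mean, median = 5703.991597, 4289.0
q1, q3, maximum = 3467.0, 6138.0, 36919.0

# A mean well above the median suggests a right-skewed distribution.
right_skewed = mean > median

# Tukey's rule of thumb: values above Q3 + 1.5 * IQR are flagged as high outliers.
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
has_high_outliers = maximum > upper_fence

print(f"IQR={iqr}, upper fence={upper_fence}, right skewed={right_skewed}, "
      f"high outliers={has_high_outliers}")
```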

**Q.5: Have a look at the number of impressions on each post over time as shown below**

In [19]:
# The dataset has no date column, so the row index is used as a proxy for time
fig = px.line(df, x=df.index, y='Impressions', title='Number of Impressions Over Time')
fig.show()

**Q.6: Have a look at all the metrics like Likes, Saves, and Follows from each post over time as
shown below**

In [24]:
# Create a line plot with multiple metrics over time
fig = go.Figure()

# Adding traces for each metric
fig.add_trace(go.Scatter(x=df.index, y=df['Likes'], mode='lines', name='Likes'))
fig.add_trace(go.Scatter(x=df.index, y=df['Saves'], mode='lines', name='Saves'))
fig.add_trace(go.Scatter(x=df.index, y=df['Follows'], mode='lines', name='Follows'))

# Update layout
fig.update_layout(title='Metrics Over Time',
                  xaxis_title='Post Index',  # x is the row index; the data has no date column
                  yaxis_title='Metrics',
                  legend_title='Metrics')

fig.show()

**Q.7: Have a look at the distribution of reach from different sources as shown below**

In [27]:
reach_sources = ['From Home', 'From Hashtags', 'From Explore', 'From Other']
reach_counts = [df[source].sum() for source in reach_sources]

colors = ['#FFB6C1', '#87CEFA', '#90EE90', '#FFDAB9']

# names and values are plain lists of totals, so no data_frame argument is needed
fig = px.pie(names=reach_sources,
             values=reach_counts,
             title='Reach from Different Sources',
             color_discrete_sequence=colors)
fig.show()
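
A pie chart shows proportions visually; the same shares can also be computed directly. This is a minimal sketch that rebuilds only the five rows shown by `df.head()` so it runs on its own. With the full data, the notebook's `df` can be used in place of the hypothetical `sample`, and the percentages will differ.

```python
import pandas as pd

# Stand-in for df: the five rows shown by df.head() above.
sample = pd.DataFrame({
    "From Home": [2586, 2727, 2085, 2700, 1704],
    "From Hashtags": [1028, 1838, 1188, 621, 255],
    "From Explore": [619, 1174, 0, 932, 279],
    "From Other": [56, 78, 533, 73, 37],
})

reach_sources = ["From Home", "From Hashtags", "From Explore", "From Other"]

# Sum each source over all posts, then express it as a percentage of total reach.
totals = sample[reach_sources].sum()
shares = (totals / totals.sum() * 100).round(1)
print(shares.sort_values(ascending=False))
```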

**Q.8: Have a look at the distribution of engagement sources as shown below**

In [28]:
engagement_metrics = ['Saves', 'Comments', 'Shares', 'Likes']
engagement_counts = [df[metric].sum() for metric in engagement_metrics]

colors = ['#FFB6C1', '#87CEFA', '#90EE90', '#FFDAB9']

# names and values are plain lists of totals, so no data_frame argument is needed
fig = px.pie(names=engagement_metrics,
             values=engagement_counts,
             title='Engagement Sources',
             color_discrete_sequence=colors)
fig.show()

**Q.9: Have a look at the relationship between the number of profile visits and follows as shown
below**

In [29]:
fig = px.scatter(df,
                 x='Profile Visits',
                 y='Follows',
                 trendline = 'ols',
                 title='Profile Visits vs. Follows')
fig.show()

**Q.10: Have a look at the type of hashtags used in the posts using a wordcloud as shown below**

In [31]:
from wordcloud import WordCloud

hashtags = ' '.join(df['Hashtags'].astype(str))
wordcloud = WordCloud().generate(hashtags)

# Convert the WordCloud object to an explicit image array before plotting
fig = px.imshow(wordcloud.to_array(), title='Hashtags Word Cloud')
fig.show()

**Q.11: Have a look at the correlation between all the features as shown below**

In [32]:
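# Correlate numeric columns only; Caption and Hashtags are text and raise an error in pandas 2.x
corr_matrix = df.select_dtypes(include='number').corr()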

fig = go.Figure(data=go.Heatmap(z=corr_matrix.values,
                               x=corr_matrix.columns,
                               y=corr_matrix.index,
                               colorscale='RdBu',
                               zmin=-1,
                               zmax=1))

fig.update_layout(title='Correlation Matrix',
                  xaxis_title='Features',
                  yaxis_title='Features')

fig.show()
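
A heatmap over many columns can be hard to read at a glance. One option is to pull out a single column of the correlation matrix and sort it. The sketch below uses a hypothetical `sample` frame built from a few columns of the `df.head()` rows, plus a placeholder text column to show why numeric columns are selected before calling `corr()`.

```python
import pandas as pd

# Stand-in for df: a few columns from the rows shown by df.head() above.
sample = pd.DataFrame({
    "Impressions": [3920, 5394, 4021, 4528, 2518],
    "Likes": [162, 224, 131, 213, 123],
    "Saves": [98, 194, 41, 172, 96],
    "Follows": [2, 10, 12, 8, 0],
    "Caption": ["a", "b", "c", "d", "e"],  # placeholder text, not real captions
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = sample.select_dtypes(include="number").corr()

# Rank the other features by their correlation with Impressions.
ranked = corr["Impressions"].drop("Impressions").sort_values(ascending=False)
print(ranked)
```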





**Q.12: Have a look at the distribution of hashtags to see which hashtag is used the most in all the
posts as shown below**

In [33]:
# Create a list to store all hashtags
all_hashtags = []

# Iterate through each row in the 'Hashtags' column
for row in df['Hashtags']:
    hashtags = str(row).split()
    hashtags = [tag.strip() for tag in hashtags]
    all_hashtags.extend(hashtags)

# Create a pandas DataFrame to store the hashtag distribution
hashtag_distribution = pd.Series(all_hashtags).value_counts().reset_index()
hashtag_distribution.columns = ['Hashtag', 'Count']

fig = px.bar(hashtag_distribution, x='Hashtag',
             y='Count', title='Distribution of Hashtags')
fig.show()
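
With many distinct hashtags the bar chart gets crowded, so it can help to look at only the most frequent ones. A minimal sketch using `collections.Counter` from the standard library; the hashtag strings are made-up examples in the same space-separated format as the `Hashtags` column.

```python
from collections import Counter

# Made-up rows in the same format as df['Hashtags'] (space-separated tags).
rows = [
    "#data #datascience #python",
    "#python #machinelearning #data",
    "#data #dataanalysis",
]

# str.split() with no argument splits on any whitespace and drops empty strings,
# so no extra strip() is needed.
counts = Counter(tag for row in rows for tag in row.split())

# The three most frequent hashtags as (hashtag, count) pairs.
top = counts.most_common(3)
print(top)
```

With the notebook's data, `rows` would be `df['Hashtags'].astype(str)`.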

**Q.13: Have a look at the distribution of likes and impressions received from the presence of each
hashtag on the post as shown below**

In [35]:
# Create a dictionary to store the likes and impressions for each hashtag
hashtag_likes = {}
hashtag_impressions = {}

# Iterate through each row in the dataset
for index, row in df.iterrows():
    hashtags = str(row['Hashtags']).split()
    for hashtag in hashtags:
        hashtag = hashtag.strip()
        if hashtag not in hashtag_likes:
            hashtag_likes[hashtag] = 0
            hashtag_impressions[hashtag] = 0
        hashtag_likes[hashtag] += row['Likes']
        hashtag_impressions[hashtag] += row['Impressions']
# Create a DataFrame for likes distribution
likes_distribution = pd.DataFrame(list(hashtag_likes.items()), columns=['Hashtag', 'Likes'])

# Create a DataFrame for impressions distribution
impressions_distribution = pd.DataFrame(list(hashtag_impressions.items()), columns=['Hashtag', 'Impressions'])

fig_likes = px.bar(likes_distribution, x='Hashtag', y='Likes',
                   title='Likes Distribution for Each Hashtag')

fig_impressions = px.bar(impressions_distribution, x='Hashtag',
                         y='Impressions',
                         title='Impressions Distribution for Each Hashtag')
fig_likes.show()
fig_impressions.show()
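
The loop above can also be written without `iterrows`, which tends to be slow on larger frames. A minimal sketch of the same aggregation with `str.split`, `explode`, and `groupby`, run on a made-up three-row frame (the `sample` name and its values are illustrative only):

```python
import pandas as pd

# Made-up stand-in for df with the three columns the aggregation needs.
sample = pd.DataFrame({
    "Hashtags": ["#data #python", "#python #ml", "#data"],
    "Likes": [100, 50, 30],
    "Impressions": [1000, 400, 300],
})

# One row per (post, hashtag) pair: split on whitespace, then explode the lists.
exploded = sample.assign(
    Hashtag=sample["Hashtags"].astype(str).str.split()
).explode("Hashtag")

# Sum likes and impressions over every post that used each hashtag.
per_hashtag = (
    exploded.groupby("Hashtag")[["Likes", "Impressions"]]
    .sum()
    .sort_values("Impressions", ascending=False)
    .reset_index()
)
print(per_hashtag)
```

Sorting before plotting also makes the bar charts easier to read, since the tallest bars appear on the left.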

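**Q.14: Write a summary based on your observations**

**Summary:**
Exploratory data analysis (EDA) is the practice of examining a dataset to discover patterns, trends, and relationships within the data. It helps us understand what the dataset contains and guides informed decisions and strategies for solving real business problems. In this dataset of 119 posts with no missing values, impressions are right-skewed (a mean of about 5,704 against a median of 4,289, with a maximum of 36,919). On average, the home feed is the largest source of reach, followed by hashtags, Explore, and other sources. Likes and saves account for most engagement, well ahead of shares and comments.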
