# Final Project: Data in Hand

## Project Question
For my final project, I plan to analyze how different types of social media content affect engagement rates on Instagram. I am interested in whether images, videos, or carousel posts lead to higher engagement and how different hashtags may be associated with engagement levels (likes and comments).



## Data Source and Collection
My data comes from a publicly available Instagram dataset that includes post-level information such as impressions, likes, comments, saves, shares, caption text, and hashtags. Each row represents a single Instagram post. I filtered the original dataset to include only posts from [describe account type or topic, if relevant], and I removed any rows that were missing both likes and comments. The dataset was downloaded as a CSV file named `Instagram_data.csv` and stored in the same folder as this notebook. No private account credentials or API keys are included in this notebook, in line with course guidelines about anonymizing personal information before posting to GitHub.
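The filtering step described above (dropping rows where both likes and comments are missing) could look roughly like the sketch below. It assumes pandas and the column names `Likes` and `Comments`; the small `raw` frame is made up for illustration and is not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only (not from the real dataset).
raw = pd.DataFrame({
    "Likes": [162, np.nan, 131, np.nan],
    "Comments": [9, np.nan, np.nan, 10],
})

# how="all" drops a row only when BOTH listed columns are missing,
# so rows with just one missing value are kept.
cleaned = raw.dropna(subset=["Likes", "Comments"], how="all")

print(len(raw), "->", len(cleaned))
```

Using `how="any"` instead would also drop rows that are missing only one of the two values, which is stricter than what I described.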


In [16]:
import os
os.listdir()


['OneDrive - Kent State University',
 '.config',
 'Music',
 '.condarc',
 'Untitled1.ipynb',
 '.DS_Store',
 'Instagram_data.csv',
 '.CFUserTextEncoding',
 '.xonshrc',
 'anaconda_projects',
 'Untitled3.ipynb',
 'Untitled.ipynb',
 'Bradley_SystemsCheck.ipynb',
 '.zshrc',
 'Untitled4.ipynb',
 'DataInHand.ipynb',
 'Pictures',
 'Sept15.ipynb',
 '.zsh_history',
 'Untitled2.ipynb',
 '.ipython',
 'Desktop',
 'Library',
 '.matplotlib',
 'Sept8.ipynb',
 'Likesreport.ipynb',
 '.cricut-design-space',
 'Public',
 '.tcshrc',
 '.anaconda',
 'Movies',
 'Applications',
 'sept22.ipynb',
 '.Trash',
 'json_load.ipynb',
 '.ipynb_checkpoints',
 '.jupyter',
 'Documents',
 '.vscode',
 '.bash_profile',
 'Pandas_Demo_Aug_27.ipynb',
 'Downloads',
 '.continuum',
 '.zsh_sessions',
 '.conda']

In [17]:
import pandas as pd
# File name matches the directory listing above; latin1 avoids decode errors in the captions
df = pd.read_csv("Instagram_data.csv", encoding="latin1")
df.head()

| | Impressions | From Home | From Hashtags | From Explore | From Other | Saves | Comments | Shares | Likes | Profile Visits | Follows | Caption | Hashtags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3920 | 2586 | 1028 | 619 | 56 | 98 | 9 | 5 | 162 | 35 | 2 | Here are some of the most important data visua... | #finance #money #business #investing #investme... |
| 1 | 5394 | 2727 | 1838 | 1174 | 78 | 194 | 7 | 14 | 224 | 48 | 10 | Here are some of the best data science project... | #healthcare #health #covid #data #datascience ... |
| 2 | 4021 | 2085 | 1188 | 0 | 533 | 41 | 11 | 1 | 131 | 62 | 12 | Learn how to train a machine learning model an... | #data #datascience #dataanalysis #dataanalytic... |
| 3 | 4528 | 2700 | 621 | 932 | 73 | 172 | 10 | 7 | 213 | 23 | 8 | Here's how you can write a Python program to d... | #python #pythonprogramming #pythonprojects #py... |
| 4 | 2518 | 1704 | 255 | 279 | 37 | 96 | 5 | 4 | 123 | 8 | 0 | Plotting annotations while visualizing your da... | #datavisualization #datascience #data #dataana... |


In [18]:
df.info()
df.shape
df.columns.tolist()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Impressions     119 non-null    int64 
 1   From Home       119 non-null    int64 
 2   From Hashtags   119 non-null    int64 
 3   From Explore    119 non-null    int64 
 4   From Other      119 non-null    int64 
 5   Saves           119 non-null    int64 
 6   Comments        119 non-null    int64 
 7   Shares          119 non-null    int64 
 8   Likes           119 non-null    int64 
 9   Profile Visits  119 non-null    int64 
 10  Follows         119 non-null    int64 
 11  Caption         119 non-null    object
 12  Hashtags        119 non-null    object
dtypes: int64(11), object(2)
memory usage: 12.2+ KB


['Impressions',
 'From Home',
 'From Hashtags',
 'From Explore',
 'From Other',
 'Saves',
 'Comments',
 'Shares',
 'Likes',
 'Profile Visits',
 'Follows',
 'Caption',
 'Hashtags']

In [19]:
df.head(10)

| | Impressions | From Home | From Hashtags | From Explore | From Other | Saves | Comments | Shares | Likes | Profile Visits | Follows | Caption | Hashtags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3920 | 2586 | 1028 | 619 | 56 | 98 | 9 | 5 | 162 | 35 | 2 | Here are some of the most important data visua... | #finance #money #business #investing #investme... |
| 1 | 5394 | 2727 | 1838 | 1174 | 78 | 194 | 7 | 14 | 224 | 48 | 10 | Here are some of the best data science project... | #healthcare #health #covid #data #datascience ... |
| 2 | 4021 | 2085 | 1188 | 0 | 533 | 41 | 11 | 1 | 131 | 62 | 12 | Learn how to train a machine learning model an... | #data #datascience #dataanalysis #dataanalytic... |
| 3 | 4528 | 2700 | 621 | 932 | 73 | 172 | 10 | 7 | 213 | 23 | 8 | Here's how you can write a Python program to d... | #python #pythonprogramming #pythonprojects #py... |
| 4 | 2518 | 1704 | 255 | 279 | 37 | 96 | 5 | 4 | 123 | 8 | 0 | Plotting annotations while visualizing your da... | #datavisualization #datascience #data #dataana... |
| 5 | 3884 | 2046 | 1214 | 329 | 43 | 74 | 7 | 10 | 144 | 9 | 2 | Here are some of the most important soft skill... | #data #datascience #dataanalysis #dataanalytic... |
| 6 | 2621 | 1543 | 599 | 333 | 25 | 22 | 5 | 1 | 76 | 26 | 0 | Learn how to analyze a candlestick chart as a ... | #stockmarket #investing #stocks #trading #mone... |
| 7 | 3541 | 2071 | 628 | 500 | 60 | 135 | 4 | 9 | 124 | 12 | 6 | Here are some of the best books that you can f... | #python #pythonprogramming #pythonprojects #py... |
| 8 | 3749 | 2384 | 857 | 248 | 49 | 155 | 6 | 8 | 159 | 36 | 4 | Here are some of the best data analysis projec... | #dataanalytics #datascience #data #machinelear... |
| 9 | 4115 | 2609 | 1104 | 178 | 46 | 122 | 6 | 3 | 191 | 31 | 6 | Here are two best ways to count the number of ... | #python #pythonprogramming #pythonprojects #py... |


## Data Structure

The dataset contains **119 rows** and **13 columns**. Each row represents one Instagram post.

### Variables

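- `Impressions` (integer): Number of times the post was shown.
- `From Home` (integer): Impressions that came from the home feed.
- `From Hashtags` (integer): Impressions that came from hashtags.
- `From Explore` (integer): Impressions that came from the Explore page.
- `From Other` (integer): Impressions that came from other sources.
- `Saves` (integer): Number of times the post was saved.
- `Comments` (integer): Number of comments the post received.
- `Shares` (integer): Number of times the post was shared.
- `Likes` (integer): Number of likes the post received.
- `Profile Visits` (integer): Number of profile visits that came from the post.
- `Follows` (integer): Number of new follows that came from the post.
- `Caption` (string): Text of the post caption.
- `Hashtags` (string): Hashtags used in the post, stored together as one string.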
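The structure can also be checked in code instead of by eye. Below is a minimal sketch, assuming pandas; `sample` is a hand-typed copy of the numeric columns in the first three rows of the `df.head()` output, standing in for the full CSV, and `expected_columns` is a helper list I made up for the check.

```python
import pandas as pd

# Hand-typed copy of the first three rows from df.head() (numeric columns only).
sample = pd.DataFrame({
    "Impressions": [3920, 5394, 4021],
    "From Home": [2586, 2727, 2085],
    "From Hashtags": [1028, 1838, 1188],
    "From Explore": [619, 1174, 0],
    "From Other": [56, 78, 533],
    "Saves": [98, 194, 41],
    "Comments": [9, 7, 11],
    "Shares": [5, 14, 1],
    "Likes": [162, 224, 131],
    "Profile Visits": [35, 48, 62],
    "Follows": [2, 10, 12],
})

# Hypothetical helper list: the numeric columns reported by df.info().
expected_columns = [
    "Impressions", "From Home", "From Hashtags", "From Explore", "From Other",
    "Saves", "Comments", "Shares", "Likes", "Profile Visits", "Follows",
]

# Columns that should be present but are not.
missing = [c for c in expected_columns if c not in sample.columns]

# Row/column counts and a check that every column holds integers.
n_rows, n_cols = sample.shape
all_integer = all(pd.api.types.is_integer_dtype(t) for t in sample.dtypes)

print(n_rows, n_cols, missing, all_integer)
```

On the real `df`, the same check would also need to allow for the two text columns, `Caption` and `Hashtags`.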


In [25]:
df[['Impressions',
    'Likes',
    'Comments',
    'Saves',
    'Shares',
    'Profile Visits',
    'Follows']].describe()


| | Impressions | Likes | Comments | Saves | Shares | Profile Visits | Follows |
|---|---|---|---|---|---|---|---|
| count | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 | 119.0 |
| mean | 5703.991597 | 173.781513 | 6.663866 | 153.310924 | 9.361345 | 50.621849 | 20.756303 |
| std | 4843.780105 | 82.378947 | 3.544576 | 156.317731 | 10.089205 | 87.088402 | 40.92158 |
| min | 1941.0 | 72.0 | 0.0 | 22.0 | 0.0 | 4.0 | 0.0 |
| 25% | 3467.0 | 121.5 | 4.0 | 65.0 | 3.0 | 15.0 | 4.0 |
| 50% | 4289.0 | 151.0 | 6.0 | 109.0 | 6.0 | 23.0 | 8.0 |
| 75% | 6138.0 | 204.0 | 8.0 | 169.0 | 13.5 | 42.0 | 18.0 |
| max | 36919.0 | 549.0 | 19.0 | 1095.0 | 75.0 | 611.0 | 260.0 |


In [26]:
df[['From Home', 'From Hashtags', 'From Explore', 'From Other']].describe()

| | From Home | From Hashtags | From Explore | From Other |
|---|---|---|---|---|
| count | 119.0 | 119.0 | 119.0 | 119.0 |
| mean | 2475.789916 | 1887.512605 | 1078.10084 | 171.092437 |
| std | 1489.386348 | 1884.361443 | 2613.026132 | 289.431031 |
| min | 1133.0 | 116.0 | 0.0 | 9.0 |
| 25% | 1945.0 | 726.0 | 157.5 | 38.0 |
| 50% | 2207.0 | 1278.0 | 326.0 | 74.0 |
| 75% | 2602.5 | 2363.5 | 689.5 | 196.0 |
| max | 13473.0 | 11817.0 | 17414.0 | 2547.0 |


In [27]:
df.isna().sum()

Impressions       0
From Home         0
From Hashtags     0
From Explore      0
From Other        0
Saves             0
Comments          0
Shares            0
Likes             0
Profile Visits    0
Follows           0
Caption           0
Hashtags          0
dtype: int64

In [30]:
df['Engagement'] = df['Likes'] + df['Comments'] + df['Saves'] + df['Shares']
# Summarize it
df['Engagement'].describe()


count     119.000000
mean      343.117647
std       238.849012
min       104.000000
25%       202.500000
50%       288.000000
75%       379.500000
max      1721.000000
Name: Engagement, dtype: float64
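The `Engagement` column above is a raw count, so posts that were shown more often will tend to score higher. One possible way to put posts on a common scale is to divide by impressions. This is my own assumption, not a standard Instagram metric, and the column name `Engagement per 100 impressions` is made up for this sketch. The rows are the first three posts from the `df.head()` output.

```python
import pandas as pd

# First three posts from the df.head() output (only the columns needed here).
sample = pd.DataFrame({
    "Impressions": [3920, 5394, 4021],
    "Likes": [162, 224, 131],
    "Comments": [9, 7, 11],
    "Saves": [98, 194, 41],
    "Shares": [5, 14, 1],
})

# Same definition as in the notebook: likes + comments + saves + shares.
sample["Engagement"] = sample[["Likes", "Comments", "Saves", "Shares"]].sum(axis=1)

# Assumed scaling: interactions per 100 impressions.
sample["Engagement per 100 impressions"] = (
    100 * sample["Engagement"] / sample["Impressions"]
)

print(sample[["Impressions", "Engagement", "Engagement per 100 impressions"]].round(2))
```

With this scaling, the second post still ranks highest, but the gap between posts is smaller than the raw totals suggest.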

The dataset includes 119 Instagram posts, each with information about how often the content was seen and how people interacted with it. On average, each post received about 5,700 impressions, but the range is wide: the lowest post had 1,941 impressions and the highest had 36,919. The median post got 151 likes and 6 comments, but engagement varies a lot, especially for saves, where the median is 109 and the maximum is 1,095. Shares are usually lower, with a median of 6 per post. Looking at where views come from, most impressions come from home feeds (about 2,480 per post on average) and hashtags (about 1,890), with fewer coming from the Explore page (about 1,080). Posts also led to about 51 profile visits and 21 new follows on average, although the medians (23 and 8) are much lower, which means a few high-performing posts pull the averages up. There are no missing values in the dataset, so it is complete and ready for analyzing how different posts and hashtags relate to engagement.
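The breakdown of where impressions come from can be turned into percentages. Below is a minimal sketch, assuming pandas and using the first three posts from the `df.head()` output rather than all 119 rows, so the percentages it prints are for those three posts only.

```python
import pandas as pd

sources = ["From Home", "From Hashtags", "From Explore", "From Other"]

# First three posts from the df.head() output (source columns only).
sample = pd.DataFrame({
    "From Home": [2586, 2727, 2085],
    "From Hashtags": [1028, 1838, 1188],
    "From Explore": [619, 1174, 0],
    "From Other": [56, 78, 533],
})

# Total impressions per source across the sample posts.
totals = sample[sources].sum()

# Share of each source out of the four sources combined. The four source
# columns do not add up exactly to the Impressions column in these rows,
# so the shares are relative to their own sum, not to Impressions.
shares = (totals / totals.sum() * 100).round(1)

print(shares.sort_values(ascending=False))
```

Running the same steps on the full `df` would give the overall split for the account.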





This dataset is a good fit for my project because it includes several important engagement metrics that can help me understand how different Instagram posts perform. Each post has information about impressions, likes, comments, saves, and shares, which gives a full picture of audience interaction. The data also shows where impressions came from, including home feeds, hashtags, and the Explore page, which can help me understand how posts are discovered. The dataset includes captions and hashtags, so I can analyze how certain hashtags or text might relate to higher engagement. Since there are no missing values, I can start my analysis right away without having to handle missing data.

However, there are some limitations to keep in mind. The dataset does not include media type (image, video, or carousel), which means I can't directly compare different post formats. It also does not show follower count, so I can't calculate engagement rates based on audience size, only raw totals such as likes and comments. The data comes from a single account, so the results may not represent all Instagram content or other types of users. Finally, while captions and hashtags are included, they may need text cleaning if I want to do deeper hashtag analysis. Even with these limits, the dataset still allows me to explore patterns in engagement and discover which hashtags or content ideas seem to perform well.
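The hashtag cleaning mentioned above could start with something like the sketch below. It assumes pandas and that each `Hashtags` value is one string of `#tags` separated by spaces. The hashtag strings are only the fully visible tags from the truncated `df.head()` output, and the `Engagement` values are likes + comments + saves + shares for those same three posts. The column name `Tag` is made up for this sketch.

```python
import pandas as pd

# Visible (untruncated) hashtags from the first three rows of df.head(),
# with Engagement = Likes + Comments + Saves + Shares for the same rows.
sample = pd.DataFrame({
    "Hashtags": [
        "#finance #money #business #investing",
        "#healthcare #health #covid #data #datascience",
        "#data #datascience #dataanalysis",
    ],
    "Engagement": [274, 439, 184],
})

# Lowercase the text, pull out every #tag, then give each tag its own row.
tags = (
    sample.assign(Tag=sample["Hashtags"].str.lower().str.findall(r"#\w+"))
    .explode("Tag")
)

# Number of posts using each tag and the mean engagement of those posts.
by_tag = (
    tags.groupby("Tag")["Engagement"]
    .agg(["count", "mean"])
    .sort_values("count", ascending=False)
)

print(by_tag.head())
```

Tags that appear in only one or two posts give unreliable averages, so on the full dataset it would make sense to compare only tags used in several posts.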