# Instagram Reach Analysis and Prediction

## 1. Introduction

This notebook analyzes Instagram reach and predicts future reach based on historical data. We will go through the following steps:
1. Data Loading and Preprocessing
2. Exploratory Data Analysis (EDA)
3. Feature Engineering
4. Model Training and Evaluation
5. Conclusion

## 2. Data Loading and Preprocessing

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

df = pd.read_excel('Instagram-Reach-Analysis.xlsx')
df.head()

In [None]:
df.info()

In [None]:
df.isnull().sum()

We have one missing value in the 'Caption' and 'Hashtags' columns. We will fill these with empty strings.

In [None]:
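# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about and may not apply to df.
df['Caption'] = df['Caption'].fillna('')
df['Hashtags'] = df['Hashtags'].fillna('')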

## 3. Exploratory Data Analysis (EDA)

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(df['Reach'], bins=30, kde=True)
plt.title('Distribution of Instagram Reach')
plt.xlabel('Reach')
plt.ylabel('Frequency')
plt.show()

In [None]:
plt.figure(figsize=(12, 8))
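# numeric_only=True skips text columns such as 'Caption' and 'Hashtags',
# which would otherwise raise an error in pandas 2.0 and later.
correlation_matrix = df.corr(numeric_only=True)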
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix of Numerical Features')
plt.show()
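
A heatmap can be hard to scan when only one target matters. The sketch below ranks features by their correlation with Reach. The rows in `toy` are made up for illustration and are not taken from the dataset; `numeric_only=True` (available in recent pandas versions) leaves text columns out of the calculation.

```python
import pandas as pd

# Made-up rows for illustration only; not values from the dataset.
toy = pd.DataFrame({
    'Reach': [1000, 2500, 1800, 4000],
    'Likes': [90, 210, 160, 350],
    'Caption': ['a', 'b', 'c', 'd'],  # text column, skipped by numeric_only=True
})

# Correlation of every numeric column with Reach, strongest first.
reach_corr = (
    toy.corr(numeric_only=True)['Reach']
    .drop('Reach')
    .sort_values(ascending=False)
)
print(reach_corr)
```

Applied to the notebook's `df`, the same expression would give a ranked list that complements the heatmap.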

## 4. Feature Engineering

We derive two new features from the text columns: the caption length in characters and the number of whitespace-separated hashtags.

In [None]:
df['caption_length'] = df['Caption'].apply(len)
df['hashtags_count'] = df['Hashtags'].apply(lambda x: len(x.split()))
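
To make the two features concrete, here is a minimal sketch on hypothetical rows. The `sample` frame and its contents are assumptions for illustration; it assumes hashtags are separated by whitespace, as the `split()` call above implies.

```python
import pandas as pd

# Hypothetical rows; the real values come from the Caption and Hashtags columns.
sample = pd.DataFrame({
    'Caption': ['Learn pandas in 10 minutes', ''],
    'Hashtags': ['#python #pandas #datascience', ''],
})

# Number of characters in each caption.
sample['caption_length'] = sample['Caption'].apply(len)
# split() with no argument splits on runs of whitespace, so '' yields zero tags.
sample['hashtags_count'] = sample['Hashtags'].apply(lambda x: len(x.split()))

print(sample[['caption_length', 'hashtags_count']])
```

The empty-string row shows why missing values are filled with `''` beforehand: both features then come out as 0 instead of raising an error.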

## 5. Model Training and Evaluation

In [None]:
features = ['Likes', 'Saves', 'Comments', 'Shares', 'Profile Visits', 'Follows', 'caption_length', 'hashtags_count']
target = 'Reach'

X = df[features]
y = df[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

print('MAE:', mean_absolute_error(y_test, y_pred))
print('MSE:', mean_squared_error(y_test, y_pred))
print('R2 Score:', r2_score(y_test, y_pred))
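
To show what the three metrics measure, the sketch below computes them by hand with NumPy and checks them against scikit-learn. The arrays `y_true` and `y_hat` are made-up numbers, not model output from this notebook. RMSE is included as an extra because it is expressed in the same units as Reach.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values for illustration only; not taken from the dataset.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 330.0, 380.0])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))   # average absolute error
mse = np.mean(errors ** 2)      # average squared error, penalizes large misses
rmse = np.sqrt(mse)             # square root of MSE, back in the target's units
# R2: share of the target's variance explained by the predictions.
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# The hand-computed values should agree with scikit-learn's implementations.
assert np.isclose(mae, mean_absolute_error(y_true, y_hat))
assert np.isclose(mse, mean_squared_error(y_true, y_hat))
assert np.isclose(r2, r2_score(y_true, y_hat))

print(f'MAE={mae}, MSE={mse}, RMSE={rmse:.2f}, R2={r2:.2f}')
```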

## 6. Conclusion

In this notebook, we analyzed the Instagram reach dataset and built a linear regression model that predicts a post's reach from its engagement metrics, caption length, and hashtag count. The MAE, MSE, and R2 scores reported above describe how well the model performs on the held-out 20% test set. The model could be improved further with more advanced models and additional feature engineering.
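
One possible next step is to compare the linear model with a more flexible one under cross-validation. The sketch below is an illustration, not part of the original analysis: it uses synthetic data from `make_regression` as a stand-in for `X` and `y`, and the choice of `RandomForestRegressor` is an assumption about what "more advanced" could mean here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the notebook's X and y; replace with the real data.
X_demo, y_demo = make_regression(
    n_samples=120, n_features=8, noise=10.0, random_state=42
)

candidates = {
    'linear': LinearRegression(),
    'random_forest': RandomForestRegressor(n_estimators=100, random_state=42),
}

# 5-fold cross-validated R2 for each candidate model.
cv_r2 = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_demo, y_demo, cv=5, scoring='r2')
    cv_r2[name] = scores.mean()
    print(f'{name}: mean R2 = {scores.mean():.3f} (+/- {scores.std():.3f})')
```

Cross-validation averages over several splits, so the comparison depends less on one particular train/test partition than a single `train_test_split` does.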