# AI-Based Social Media Post & Reel Performance Predictor
## Using Multimodal Machine Learning (Text + Image + Video + Audio)

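This project predicts Instagram post/reel performance from multimodal features. It is adapted to your dataset (columns: image_path, caption, hashtags, likes), so only text and image features are used. The target design is three classes (High/Medium/Low); the labeling code in Step 1 currently merges Low and Medium into a single class.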

**Platform**: Instagram
**Dashboard**: Streamlit (code provided at the end)
**Dataset**: `Projects/Untitled Folder/instagram_dataset.csv` (columns: image_path, caption, hashtags, likes)

Install dependencies: `pip install torch torchvision transformers scikit-learn pandas numpy opencv-python librosa streamlit pillow requests`

In [38]:
import sys
print(sys.executable)

C:\Users\Sid\anaconda3\python.exe


In [39]:
!pip install pillow
!pip install opencv-python
!pip install librosa
!pip install torch
!pip install transformers
!pip install scikit-learn



In [60]:
import pandas as pd
import numpy as np
import requests
from PIL import Image
import cv2
import librosa
import torch
from transformers import BertTokenizer, BertModel, CLIPProcessor, CLIPModel
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import os
import warnings
warnings.filterwarnings('ignore')

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## Step 1: Load and Explore Dataset
Load your dataset with columns: image_path, caption, hashtags, likes. Supplement with open datasets if needed.

In [65]:
# Load your dataset
df = pd.read_csv('instagram_dataset.csv')
print(df.head())
print(df.info())

# Use likes as the engagement score (the only engagement signal in this dataset)
df['engagement'] = df['likes']

# Quartiles of likes, kept for reference when choosing bin edges
# (the binning below uses a fixed cut point, not these values)
thresholds = df['likes'].quantile([0.25, 0.75])

# Two-class split at 500 likes: <= 500 is 'Low/Medium', > 500 is 'High'.
# Adjust the cut point using df['likes'].describe().
df['performance'] = pd.cut(df['likes'], bins=[-np.inf, 500, np.inf], labels=['Low/Medium', 'High'])


# Sanity-check the label distribution
print("Likes summary:", df['likes'].describe())  # Shows min, max, average likes
print("Performance counts:", df['performance'].value_counts())  # Shows how many posts fall in each class

# Note: No reels in this dataset; assuming all are posts (images)

   id   image_path                caption                           hashtags  \
0   1  image_1.jpg  Small moments matter.    #goodday #motivation #lifestyle   
1   2  image_2.jpg      Adventure awaits.  #travelgram #vacation #wanderlust   
2   3  image_3.jpg   Latest product drop!        #reviews #brand #onlineshop   
3   4  image_4.jpg        Must-have item!  #onlineshop #newproduct #shopping   
4   5  image_5.jpg  Sweat. Smile. Repeat.   #healthyliving #fitlife #workout   

   likes  
0   4442  
1   3551  
2   1700  
3   1758  
4   1479  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          500 non-null    int64 
 1   image_path  500 non-null    object
 2   caption     500 non-null    object
 3   hashtags    500 non-null    object
 4   likes       500 non-null    int64 
dtypes: int64(2), object(3)
memory usage: 19.7+ KB
None
Likes summary
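
The cell above computes the 25th and 75th percentiles of likes but labels posts with a fixed cut at 500. As a minimal sketch of the three-class (Low/Medium/High) variant, the quantiles can be used directly as bin edges. The helper name `label_by_quantiles` and the toy `demo_likes` values are illustrative assumptions, not part of the original notebook. Note that `pd.cut` raises an error if the two quantiles coincide, which can happen when many posts share the same like count.

```python
import numpy as np
import pandas as pd

def label_by_quantiles(likes: pd.Series, low_q: float = 0.25, high_q: float = 0.75) -> pd.Series:
    """Hypothetical helper: bin likes into Low/Medium/High using quantile edges."""
    lo, hi = likes.quantile([low_q, high_q])
    # Intervals are right-closed: (-inf, lo], (lo, hi], (hi, inf)
    return pd.cut(likes, bins=[-np.inf, lo, hi, np.inf], labels=['Low', 'Medium', 'High'])

# Toy data standing in for df['likes']
demo_likes = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])
demo_labels = label_by_quantiles(demo_likes)
print(demo_labels.value_counts().sort_index())
```

With quantile edges, roughly a quarter of the posts land in Low, half in Medium, and a quarter in High, so no class ends up nearly empty the way a fixed threshold can leave it.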

## Step 2: Feature Extraction
Extract features from text and images (no videos/audio in your dataset).

In [66]:
# Initialize models
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertModel.from_pretrained('bert-base-uncased').to(device)
clip_processor = CLIPProcessor.from_pretrained('openai/clip-vit-base-patch32')
clip_model = CLIPModel.from_pretrained('openai/clip-vit-base-patch32').to(device)

def extract_text_features(text):
    inputs = bert_tokenizer(text, return_tensors='pt', truncation=True, padding=True, max_length=512).to(device)
    with torch.no_grad():
        outputs = bert_model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze().cpu().numpy()

def extract_image_features(image_path):
    """Return the CLIP image embedding, or a zero vector if the image cannot be loaded."""
    try:
        if image_path.startswith('http'):  # If URL
            # timeout keeps a stalled download from hanging the whole apply()
            image = Image.open(requests.get(image_path, stream=True, timeout=10).raw).convert('RGB')
        else:  # Local path
            image = Image.open(image_path).convert('RGB')
        inputs = clip_processor(images=image, return_tensors='pt').to(device)
        with torch.no_grad():
            features = clip_model.get_image_features(**inputs)
        return features.squeeze().cpu().numpy()
    except Exception:  # Missing file, bad URL, or unreadable image
        return np.zeros(512)  # CLIP image feature dim

# Apply to dataset
df['text'] = df['caption'] + ' ' + df['hashtags']
df['text_features'] = df['text'].apply(extract_text_features)
df['image_features'] = df['image_path'].apply(extract_image_features)

# Combine features (text + image only)
def combine_features(row):
    text_feat = row['text_features']
    img_feat = row['image_features']
    return np.concatenate([text_feat, img_feat])

df['combined_features'] = df.apply(combine_features, axis=1)
X = np.array(df['combined_features'].tolist())
y = df['performance'].astype('category').cat.codes  # Encode labels
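
Because `extract_image_features` silently falls back to a zero vector when an image cannot be loaded, it may help to check the combined matrix before training. The sketch below uses random placeholder arrays (an assumption, standing in for the real embeddings) to show the expected shape, 768 BERT dimensions plus 512 CLIP dimensions, and one way to count rows whose image block is all zeros.

```python
import numpy as np

TEXT_DIM = 768   # hidden size of bert-base-uncased
IMAGE_DIM = 512  # image embedding size of openai/clip-vit-base-patch32

# Placeholder features for 4 posts; real values would come from the extractors above
rng = np.random.default_rng(0)
fake_text = rng.normal(size=(4, TEXT_DIM))
fake_image = rng.normal(size=(4, IMAGE_DIM))
fake_image[2] = 0.0  # mimic the zero-vector fallback for one unreadable image

combined = np.hstack([fake_text, fake_image])

# Rows whose image block is entirely zero are posts where image loading failed
image_block = combined[:, TEXT_DIM:]
n_failed = int((~image_block.any(axis=1)).sum())
print(combined.shape, "rows with missing image features:", n_failed)
```

If most rows turn out to have an all-zero image block (for example because the paths in `image_path` are relative to a different folder), the model is effectively training on text alone.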

## Step 3: Train Model
Train a classifier on the combined features.

In [67]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression classifier (class_weight='balanced' offsets the class imbalance)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(class_weight='balanced', random_state=42)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))


# Save model
import joblib
joblib.dump(model, 'performance_predictor.pkl')

              precision    recall  f1-score   support

           0       0.10      0.10      0.10        10
           1       0.90      0.90      0.90        90

    accuracy                           0.82       100
   macro avg       0.50      0.50      0.50       100
weighted avg       0.82      0.82      0.82       100



['performance_predictor.pkl']
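
The report above shows 82% accuracy but a macro F1 of only 0.50, with just 10 of the 100 test samples in class 0. On a 90/10 split, always predicting the majority class would already score 90% accuracy, so accuracy alone says little here. The following is a minimal sketch, on synthetic data (an assumption, not the notebook's features), of a stratified split plus a majority-class baseline to compare against.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 500 samples with a 90/10 class imbalance
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 8))
y_demo = np.array([1] * 450 + [0] * 50)

# stratify keeps the class ratio identical in the train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy='most_frequent').fit(X_tr, y_tr)
base_pred = baseline.predict(X_te)

baseline_accuracy = accuracy_score(y_te, base_pred)
baseline_balanced = balanced_accuracy_score(y_te, base_pred)
print(f"baseline accuracy: {baseline_accuracy:.2f}, balanced accuracy: {baseline_balanced:.2f}")
```

A trained model is only informative if it beats this baseline on balanced accuracy or macro F1, which weight both classes equally.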

## Step 4: Prediction Function
Function to predict for new content (image path, caption, hashtags).

In [68]:
def predict_performance(caption, hashtags, image_path=None):
    text = caption + ' ' + hashtags
    text_feat = extract_text_features(text)
    img_feat = extract_image_features(image_path) if image_path else np.zeros(512)
    features = np.concatenate([text_feat, img_feat]).reshape(1, -1)
    pred = model.predict(features)[0]
    labels = ['Low/Medium', 'High']  # Order matches the category codes from Step 1 (0, 1)
    return labels[pred]

# Example
print(predict_performance('Amazing sunset!', '#sunset #photography', 'path/to/image.jpg'))  # Replace with actual path

High
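
A single label hides how sure the classifier is. As a hedged sketch, the prediction can be paired with the probability from `predict_proba`, and the label looked up through the model's `classes_` array instead of by list position. The function name `predict_with_confidence`, the `CLASS_NAMES` mapping, and the one-feature toy model are assumptions for illustration; in the notebook the input would be the 1280-dimensional combined feature vector.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed mapping from category code to class name (codes as produced in Step 1)
CLASS_NAMES = {0: 'Low/Medium', 1: 'High'}

def predict_with_confidence(clf, features):
    """Hypothetical helper: return (class name, probability of that class)."""
    proba = clf.predict_proba(features.reshape(1, -1))[0]
    best = int(np.argmax(proba))
    code = int(clf.classes_[best])  # classes_ gives the code for each probability column
    return CLASS_NAMES[code], float(proba[best])

# Toy one-feature model standing in for the trained classifier
X_toy = np.array([[0.0], [0.2], [0.8], [1.0]])
y_toy = np.array([0, 0, 1, 1])
toy_model = LogisticRegression().fit(X_toy, y_toy)

label, confidence = predict_with_confidence(toy_model, np.array([0.95]))
print(label, round(confidence, 2))
```

Looking the name up by class code keeps the label correct even if the set of classes changes between training runs.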
