<a href="https://colab.research.google.com/github/jessmiramontes/instagram_sotries_views/blob/imagesdataset/instagram_stories_step1v3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Analyze images of my IG stories, create a DataFrame, and export it to a CSV file so I can combine this dataset with the analytics provided by Meta.

**First attempt:** All values in the "dominant color" column are Unknown.
I am also not sure that the recognized objects are correct.

**Second attempt:** Instead of translating colors to names, I decided to leave them as hex values.

**Next step:** See if the object recognition can be improved, because on the first attempt objects were recognized but not all of them were correct. I used ResNet50 for object recognition; next I will try YOLOv5.

**Update Jan 24:** YOLOv5 did a better job at recognition, but the main object in every image is a person, so that is not helpful for prediction. I will combine ResNet50 and YOLOv5, plus MobileNet, to detect the activity and clothing. I added error handling for when a file can't be read: the loop adds the file name to the DataFrame and NA in the other fields. Result: no activities or clothes were detected, so I will remove that. I wonder what else can be done with YOLOv5, but I will leave the dataset as is and move on to combining it with the analytics provided by Meta.



In [71]:
# Install required libraries
!pip install tensorflow
!pip install opencv-python-headless
!pip install deepface
!pip install pytesseract
!pip install pandas
!pip install Pillow
!pip install yolov5
!wget -O mobilenet_iter_73000.caffemodel https://github.com/chenyuntc/simple-faster-rcnn-pytorch/raw/master/models/mobilenet_iter_73000.caffemodel
!wget -O deploy.prototxt https://github.com/chenyuntc/simple-faster-rcnn-pytorch/raw/master/models/MobileNetSSD_deploy.prototxt



--2025-01-25 01:24:34--  https://github.com/chenyuntc/simple-faster-rcnn-pytorch/raw/master/models/mobilenet_iter_73000.caffemodel
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2025-01-25 01:24:34 ERROR 404: Not Found.

--2025-01-25 01:24:34--  https://github.com/chenyuntc/simple-faster-rcnn-pytorch/raw/master/models/MobileNetSSD_deploy.prototxt
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2025-01-25 01:24:34 ERROR 404: Not Found.



In [72]:
# Import libraries
import os
import pandas as pd
import cv2
from PIL import Image as PILImage, UnidentifiedImageError  # Import UnidentifiedImageError
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import torch
from deepface import DeepFace
import pytesseract
from google.colab import drive


In [30]:
# Mount Google Drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [57]:
# Load pre-trained YOLOv5 and ResNet50 models
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
resnet_model = ResNet50(weights='imagenet')
#net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'mobilenet_iter_73000.caffemodel')  # Load MobileNet model

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2025-1-25 Python-3.11.11 torch-2.5.1+cu121 CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


In [None]:
# Initialize the ResNet50 model
#model = ResNet50(weights='imagenet')

In [45]:
# Function to extract date from filename
def extract_date(filename):
    date_str = filename.split('_')[1]
    return pd.to_datetime(date_str, format='%Y%m%d')
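
`extract_date` assumes every filename has the date as its second underscore-separated field. A minimal sketch of a more forgiving variant is below. It uses only the standard library, and the filename layout (`<prefix>_<YYYYMMDD>_...`) and the helper name `extract_date_safe` are assumptions made for illustration, not part of the notebook. It returns `None` instead of raising when no valid date is found.

```python
import re
from datetime import datetime

# Assumed (hypothetical) filename layout: <prefix>_<YYYYMMDD>_<anything>.<ext>
DATE_PATTERN = re.compile(r'_(\d{8})(?:_|\.)')

def extract_date_safe(filename):
    """Return the date embedded in the filename, or None if absent or invalid."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), '%Y%m%d').date()
    except ValueError:
        # Eight digits that do not form a real calendar date
        return None

print(extract_date_safe('story_20250124_001.jpg'))  # a valid date
print(extract_date_safe('cover.jpg'))               # no date in the name
```

Returning `None` lets the processing loop record a missing date for one file without aborting the whole run.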

In [58]:
# Load YOLOv5 and ResNet50 models
## model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
## resnet_model = ResNet50(weights='imagenet')

# Object detection function using YOLOv5 with error handling
def detect_object_yolo(img_path):
    try:
        results = model(img_path)
        # results.xyxy[0] has one row per detection: x1, y1, x2, y2, confidence, class id
        detections = results.xyxy[0]
        if len(detections) == 0:
            return 'Unknown'
        # results.names lists every class the model knows (entry 0 is 'person'),
        # so look up the class id of the most confident detection instead
        best = detections[detections[:, 4].argmax()]
        return results.names[int(best[5])]
    except UnidentifiedImageError:
        # PIL could not identify the file as an image
        return 'Unknown'
    except Exception:
        # Any other failure while processing the file also maps to 'Unknown'
        return 'Unknown'


# Function to use ResNet50 for animal and specific object detection
def detect_resnet_objects(img_path):
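    img = PILImage.open(img_path).convert('RGB')  # force 3 channels; RGBA or grayscale input would not match ResNet50's expected shape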
    img = img.resize((224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    preds = resnet_model.predict(x)
    return decode_predictions(preds, top=1)[0][0][1]

# Function to combine YOLOv5 and ResNet50
def detect_hybrid_objects(img_path):
    yolo_obj = detect_object_yolo(img_path)
    resnet_obj = detect_resnet_objects(img_path)

    if yolo_obj.lower() != 'person' and yolo_obj != 'Unknown':
        return yolo_obj
    elif resnet_obj.lower() != 'unknown':
        return resnet_obj
    else:
        return 'Object unknown'  # Fallback value if both return "Unknown"
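
Since a person appears in nearly every story, one way to get more out of the YOLOv5 output is to skip `person` and keep the next most confident label. The sketch below works on plain Python rows laid out like `results.xyxy[0]` (x1, y1, x2, y2, confidence, class id). The helper name `top_non_person_label`, the `min_conf` threshold, and the sample detections are all assumptions for illustration.

```python
def top_non_person_label(detections, names, min_conf=0.25):
    """Pick the highest-confidence label that is not 'person'.

    detections: rows of (x1, y1, x2, y2, confidence, class_id)
    names: mapping from class_id to label
    """
    best_label, best_conf = 'Unknown', min_conf
    for *_box, conf, class_id in detections:
        label = names[int(class_id)]
        if label != 'person' and conf >= best_conf:
            best_label, best_conf = label, conf
    return best_label

# Made-up detections, for illustration only
names = {0: 'person', 16: 'dog', 41: 'cup'}
rows = [
    (10, 20, 200, 400, 0.91, 0),
    (30, 40, 90, 120, 0.62, 41),
    (50, 60, 150, 220, 0.48, 16),
]
print(top_non_person_label(rows, names))  # cup
```

With the real model, the rows could come from `results.xyxy[0].tolist()` and the mapping from `results.names`.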


In [69]:
# Load the MobileNet model once; net stays None if the model files are missing
net = None

try:
    net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'mobilenet_iter_73000.caffemodel')
except Exception as e:
    print(f"Error loading MobileNet model: {e}")

def recognize_activity(img_path):
    # Without a loaded model there is nothing to run
    if net is None:
        return 'Unknown'
    try:
        img = cv2.imread(img_path)
        if img is None:
            print(f"Error: Unable to read image file {img_path}")
            return 'Unknown'

        blob = cv2.dnn.blobFromImage(cv2.resize(img, (224, 224)), 0.007843, (224, 224), 127.5)
        net.setInput(blob)
        detections = net.forward()

        if detections.shape[2] > 0:
            # For an SSD-style output, index 1 is a numeric class id and index 2 its confidence
            activity = detections[0, 0, 0, 1]
            conf_score = detections[0, 0, 0, 2]
            return activity if conf_score > 0.5 else 'Unknown'
    except Exception:
        # Any failure during inference maps to 'Unknown'
        return 'Unknown'

    return 'Unknown'



In [67]:
def detect_clothing(img_path):
    try:
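        results = model(img_path)
        # Map the class id of each detection (column 5) to its label;
        # results.names alone is the model's full class list, not what was found in the image
        labels = [results.names[int(cls)] for cls in results.xyxy[0][:, 5]]
        # Note: the COCO labels used by the stock yolov5s weights contain no 'clothing' class
        clothing_items = [label for label in labels if 'clothing' in label]
        return clothing_items if clothing_items else ['Unknown']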
    except Exception as e:
        print(f"Error during clothing detection for {img_path}: {e}")
        return ['Unknown']
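
The stock COCO labels include no class with "clothing" in its name, so a substring filter on that word cannot match anything. A rough alternative is to filter the detected labels against a small set of wearable or carried items that COCO does include. The set below and the helper name `filter_wearables` are assumptions for illustration; it is a sketch, not a real clothing detector.

```python
# Assumption: these COCO labels are the closest the stock weights get to
# "clothing"; the set is illustrative, not exhaustive.
WEARABLE_LABELS = {'tie', 'backpack', 'handbag', 'umbrella', 'suitcase'}

def filter_wearables(detected_labels):
    """Return the sorted, de-duplicated wearable labels, or ['Unknown']."""
    found = sorted(set(detected_labels) & WEARABLE_LABELS)
    return found if found else ['Unknown']

print(filter_wearables(['person', 'tie', 'handbag', 'tie']))  # ['handbag', 'tie']
print(filter_wearables(['person', 'dog']))                    # ['Unknown']
```

Detecting actual garments would likely need a model trained on a fashion dataset, which is outside what this notebook loads.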



In [61]:
# Function to convert RGB to hex color code
def rgb_to_hex(rgb_array):
    return '#%02x%02x%02x' % tuple(int(c) for c in rgb_array)

# Function to analyze dominant color with error handling, returning a hex value
def dominant_color(img_path):
    img = cv2.imread(img_path)
    if img is None:
        print(f"Error: Unable to read image file {img_path}")
        return 'Unknown'

    data = np.reshape(img, (-1, 3))
    data = np.float32(data)
    _, _, centers = cv2.kmeans(data, 1, None, (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2), 10, cv2.KMEANS_RANDOM_CENTERS)
    # cv2.imread returns pixels in BGR order, so reverse the channels to get RGB
    dominant_color_rgb = centers[0][::-1].astype(int)
    dominant_color_hex = rgb_to_hex(dominant_color_rgb)
    # print(f"Dominant RGB values for {img_path}: {dominant_color_rgb}")  # Debugging statement
    # print(f"Dominant hex value for {img_path}: {dominant_color_hex}")  # Debugging statement
    return dominant_color_hex
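
OpenCV stores pixels as blue, green, red, so the channel order matters when a color is formatted as a hex string. The standalone sketch below shows the conversion without needing an image. The helper name `bgr_to_hex` is hypothetical, and the clamping to 0..255 is an added safety step rather than something the notebook does.

```python
def bgr_to_hex(bgr):
    """Convert a (blue, green, red) triple, as OpenCV stores pixels, to '#rrggbb'."""
    # Round and clamp each channel so float cluster centers format cleanly
    blue, green, red = (max(0, min(255, int(round(c)))) for c in bgr)
    return '#%02x%02x%02x' % (red, green, blue)

print(bgr_to_hex((255, 0, 0)))  # pure blue in BGR order -> #0000ff
print(bgr_to_hex((0, 0, 255)))  # pure red in BGR order  -> #ff0000
```

If the channels are not swapped, reddish images are reported as bluish hex values and vice versa.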


In [62]:
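# Function to detect the dominant emotion. With enforce_detection=False DeepFace does not raise when no face is found,
# so 'No face detected' is returned only if DeepFace raises a ValueError for another reason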
def analyze_emotion(img_path):
    try:
        result = DeepFace.analyze(img_path, actions=['emotion'], enforce_detection=False)
        #print(result)  # Print the result to understand its structure
        if isinstance(result, list) and len(result) > 0:
            return result[0]['dominant_emotion'] if 'dominant_emotion' in result[0] else None
        return None
    except ValueError as e:
        print(f"Error analyzing emotion in {img_path}: {e}")
        return 'No face detected'  # Or any other default value you prefer



In [63]:
# Directory of images in Google Drive
image_directory = '/content/drive/MyDrive/Colab Notebooks/stories_archive'

In [64]:
# List to store results
image_data = []

In [77]:
# Process each image in the directory
for filename in os.listdir(image_directory):
    if filename.endswith(('.jpg', '.webp')):  # Adjust for your image extensions
        # print(f"Processing {filename}")
        img_path = os.path.join(image_directory, filename)

        try:
            date = extract_date(filename)
            obj = detect_hybrid_objects(img_path)
            color = dominant_color(img_path)
            emotion = analyze_emotion(img_path)
            activity = recognize_activity(img_path)
            clothing = detect_clothing(img_path)

            image_data.append({
                'nombre_archivo': filename,
                'fecha': date,
                'objeto_principal': obj,
                'color_dominante': color,
                'emocion_primaria': emotion,
                'actividad_detectada': activity,
                'ropa_detectada': clothing
            })

        except UnidentifiedImageError:
            # Handle the error and append 'NA' to image_data list
            image_data.append({
                'nombre_archivo': filename,
                'fecha': 'NA',
                'objeto_principal': 'NA',
                'color_dominante': 'NA',
                'emocion_primaria': 'NA',
                'actividad_detectada': 'NA',
                'ropa_detectada': 'NA'
            })
            print(f"Skipped {filename} due to unidentifiable image.")

print("Finished processing all images.")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 187ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 190ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 187ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 189ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 205ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 189ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 183ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 349ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 190ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 186ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 335ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 186ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 202ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m 

In [78]:
# Create DataFrame
df = pd.DataFrame(image_data)

# Export DataFrame to CSV
df.to_csv('/content/drive/MyDrive/Colab Notebooks/stories_archive/imagedatajan24.csv', index=False)

print("Data saved successfully.")

Data saved successfully.
