# Code structure
### Data Preprocessing
- Load and preprocess image data
- Preprocess text data
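A minimal sketch of the text-preprocessing step. It assumes a common convention: lowercase each caption, drop punctuation and digits, and wrap it in `startseq`/`endseq` markers so the decoder learns where a caption begins and ends. The marker names, `clean_caption`, `build_vocab` and the `min_count` threshold are illustrative choices, not something the dataset prescribes.

```python
import re
from collections import Counter

def clean_caption(caption: str) -> str:
    """Lowercase, replace non-letters with spaces, and add start/end markers (assumed names)."""
    caption = caption.lower()
    caption = re.sub(r"[^a-z ]", " ", caption)  # drop punctuation and digits
    return "startseq " + " ".join(caption.split()) + " endseq"

def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an index; 0 stays reserved for padding."""
    counts = Counter(w for c in captions for w in c.split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i + 1 for i, w in enumerate(words)}

captions = [clean_caption("A child in a pink dress is climbing up a set of stairs."),
            clean_caption("A girl going into a wooden building.")]
vocab = build_vocab(captions)
print(captions[0])
print(len(vocab))
```

The notebook already imports Keras' `Tokenizer`, whose `fit_on_texts` and `word_index` cover the same ground as `build_vocab`. The hand-written version just makes the index-0-for-padding convention explicit.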

### Feature Extraction
- Load pre-trained CNN model
- Extract features from images
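For a 224×224 input, VGG16's convolutional base (`include_top=False`) returns a 7×7×512 feature map. A decoder can use it in two usual ways: 49 region vectors for an attention model, or one pooled 512-dimensional vector for a plain encoder-decoder. A small NumPy sketch of both follows. The function names are hypothetical, and a random array stands in for a real VGG16 output.

```python
import numpy as np

def to_region_vectors(feature_map: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, C) conv feature map into (H*W, C) region vectors for attention."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)

def to_global_vector(feature_map: np.ndarray) -> np.ndarray:
    """Average over the spatial positions to get a single C-dimensional vector."""
    return feature_map.mean(axis=(0, 1))

fmap = np.random.rand(7, 7, 512).astype(np.float32)  # stand-in for one VGG16 output
regions = to_region_vectors(fmap)
pooled = to_global_vector(fmap)
print(regions.shape, pooled.shape)  # (49, 512) (512,)
```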

### Caption Generation Model
- Design a sequence-based model
- Define an embedding layer
- Implement an attention mechanism (if needed)
- Train the model using the training data
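If the attention step above is included, one common choice is additive (Bahdanau-style) attention. At each decoding step it scores every image region against the current decoder state and feeds the weighted sum of regions to the decoder. The NumPy sketch below covers a single step. The weight names `W_f`, `W_h`, `v` and all dimensions are assumptions for illustration, and the weights are random rather than learned.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def additive_attention(regions, hidden, W_f, W_h, v):
    """Bahdanau-style attention over image regions.
    regions: (L, D) region features, hidden: (H,) decoder state,
    W_f: (D, A), W_h: (H, A), v: (A,). Returns (context (D,), weights (L,))."""
    scores = np.tanh(regions @ W_f + hidden @ W_h) @ v  # (L,) one score per region
    weights = softmax(scores)                            # non-negative, sums to 1
    context = weights @ regions                          # (D,) weighted sum of regions
    return context, weights

rng = np.random.default_rng(0)
L, D, H, A = 49, 512, 256, 128  # 49 = 7x7 VGG16 regions; H and A are arbitrary
regions = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
W_f = rng.standard_normal((D, A)) * 0.01
W_h = rng.standard_normal((H, A)) * 0.01
v = rng.standard_normal(A)
context, weights = additive_attention(regions, hidden, W_f, W_h, v)
print(context.shape, weights.shape)
```

In a Keras model these matrices would be trainable `Dense` layers. The arithmetic per step is the same.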

### Training
- Split data into train, validation, and test sets
- Train the model
- Validate the model
- Evaluate the model using BLEU scores
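Two details of the training setup are easy to get wrong. Each Flickr8k image has five captions, so the split should be made over unique image IDs. Splitting individual captions would let the same image appear in both training and test sets. Sequence models are also usually trained with teacher forcing: every caption prefix is an input and the following token is the target. The sketch below shows both in plain Python. The function names, the 80/10/10 proportions and the seed are assumptions.

```python
import random

def split_by_image(image_ids, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle unique image IDs and split them so all captions of an image share one set."""
    ids = sorted(set(image_ids))
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test = set(ids[:n_test])
    val = set(ids[n_test:n_test + n_val])
    train = set(ids[n_test + n_val:])
    return train, val, test

def teacher_forcing_pairs(token_ids):
    """Turn one encoded caption into (prefix, next_token) training pairs."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

train, val, test = split_by_image([f"img{i}" for i in range(100)])
print(len(train), len(val), len(test))  # 80 10 10
print(teacher_forcing_pairs([1, 7, 4, 2]))
# [([1], 7), ([1, 7], 4), ([1, 7, 4], 2)]
```

With Keras, the prefixes would then be padded with `pad_sequences` and the targets one-hot encoded with `to_categorical`, both already imported in the notebook. Applying `train_test_split` to the list of unique image IDs gives the same leakage-free split.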

### Evaluation
- Calculate BLEU scores
- Visually inspect the generated captions
- Compare the results with expert and crowd-sourced human judgments
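BLEU compares the n-grams of a generated caption with those of the reference captions. It clips each n-gram count at the maximum count found in any single reference, takes the geometric mean of the 1- to 4-gram precisions, and multiplies by a brevity penalty for captions shorter than the references. The sketch below is a from-scratch corpus-level BLEU with uniform weights and no smoothing, written only to show the computation. In practice, NLTK's `nltk.translate.bleu_score.corpus_bleu` (which also offers smoothing functions) is the usual choice.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with uniform weights and no smoothing.
    references: one list of reference token lists per hypothesis; hypotheses: token lists."""
    clipped, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # effective reference length: the one closest to the hypothesis length (shorter on ties)
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max of reference counts
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(clipped) == 0:
        return 0.0  # some n-gram order has no match; unsmoothed BLEU is zero
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)

ref = "a dog runs across the grass".split()
print(corpus_bleu([[ref]], [ref]))  # 1.0
```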

### Hyperparameter Tuning
- Experiment with different architectures and hyperparameters
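A simple way to organize these experiments is an exhaustive grid search that trains one model per configuration and keeps the one with the best validation score. Selecting on the test set would bias the final evaluation. The search space below is hypothetical, and `toy_evaluate` is a stand-in for a real function that trains a model and returns, say, validation BLEU-4.

```python
import itertools

# Hypothetical search space; names and values are illustrative only
search_space = {
    "embedding_dim": [128, 256],
    "lstm_units": [256, 512],
    "dropout": [0.3, 0.5],
    "use_attention": [False, True],
}

def grid(space):
    """Yield every combination of the search space as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def grid_search(space, evaluate):
    """Return (best_score, best_config); evaluate(config) -> validation score, higher is better."""
    best = None
    for config in grid(space):
        score = evaluate(config)
        if best is None or score > best[0]:  # strict '>' keeps the first config on ties
            best = (score, config)
    return best

# Stand-in scoring function so the sketch runs without training anything
def toy_evaluate(config):
    return config["lstm_units"] + (10 if config["use_attention"] else 0) - config["dropout"]

best_score, best_config = grid_search(search_space, toy_evaluate)
print(best_config)
```

A grid grows multiplicatively (16 runs here). With more dimensions, random search over the same space is a common cheaper alternative.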

### Model Deployment
- Deploy the model for inference
- Provide a user-friendly interface
- Monitor the model's performance
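At inference time there is no reference caption to feed in, so the model must generate one token at a time from its own predictions. The sketch below shows greedy decoding: start from `startseq`, always take the most probable next word, and stop at `endseq` or after `max_len` tokens. `predict_next` is a hypothetical callable. In the notebook it would wrap `model.predict` on the image features plus the padded prefix. A scripted toy model replaces it here.

```python
def greedy_decode(predict_next, vocab, max_len=20, start="startseq", end="endseq"):
    """Generate a caption greedily.
    predict_next(seq) -> list of probabilities indexed by token id (hypothetical interface)."""
    id_to_word = {i: w for w, i in vocab.items()}
    seq = [vocab[start]]
    for _ in range(max_len):
        probs = predict_next(seq)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if next_id == vocab[end] or next_id not in id_to_word:  # stop at end marker or padding
            break
        seq.append(next_id)
    return " ".join(id_to_word[i] for i in seq[1:])

# Toy "model": the next token id depends only on the prefix length
vocab = {"startseq": 1, "a": 2, "dog": 3, "endseq": 4}
script = {1: 2, 2: 3, 3: 4}

def toy_predict(seq):
    probs = [0.0] * 5
    probs[script[len(seq)]] = 1.0
    return probs

print(greedy_decode(toy_predict, vocab))  # a dog
```

Beam search, which keeps the k most probable prefixes instead of one, often produces better captions at a higher decoding cost.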


# Code starts here 

```python
import os
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Define paths to dataset files
dataset_dir = "dataset"
image_dir = os.path.join(dataset_dir, "Flicker8k_Dataset")
caption_file = os.path.join(dataset_dir, "Flickr8k_text", "Flickr8k.token.txt")

# Load captions into a DataFrame. Each line has the form "<image>.jpg#<n>\t<caption>".
# QUOTE_NONE keeps pandas from treating quote characters inside captions as field delimiters.
captions_df = pd.read_csv(caption_file, sep="\t", header=None,
                          names=["image_id", "caption"], quoting=csv.QUOTE_NONE)
# Drop the ".jpg#<n>" suffix so the IDs match the image-feature keys built below
captions_df["image_id"] = captions_df["image_id"].str.split(".").str[0]
```

## Preprocessing

We resize each image to 224×224 pixels, the input size VGG16 was trained with, normalize it with VGG16's `preprocess_input`, and pass it through the pre-trained VGG16 convolutional base (`include_top=False`), which yields a 7×7×512 feature map per image.
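For reference, VGG16's `preprocess_input` uses Keras' "caffe"-style normalization: it flips the channels from RGB to BGR and subtracts the ImageNet mean of each channel, without rescaling to 0-1. The NumPy sketch below reproduces that behaviour for illustration only. `vgg16_preprocess` is a hypothetical name, and the notebook should keep calling the library function.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by Keras' "caffe" preprocessing mode
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess(rgb_image: np.ndarray) -> np.ndarray:
    """Mimic VGG16 preprocessing for an (H, W, 3) RGB array with values in 0-255."""
    bgr = rgb_image[..., ::-1].astype(np.float32)  # RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN                 # zero-center each channel

pixel = np.array([[[255.0, 0.0, 0.0]]])  # one pure-red pixel, shape (1, 1, 3)
print(vgg16_preprocess(pixel))
```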

```python
# Load one image, resize it to 224x224 and apply VGG16-specific normalization
def load_and_preprocess_image(image_path, target_size=(224, 224)):
    img = load_img(image_path, target_size=target_size)
    img_array = img_to_array(img)  # float32 array of shape (224, 224, 3)
    return preprocess_input(img_array)

# Pre-trained VGG16 convolutional base without the classifier head
vgg_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Extract features in batches rather than keeping every preprocessed image in memory
# (about 4.9 GB of float32 for ~8,000 images) and calling predict() once per image
img_files = sorted(f for f in os.listdir(image_dir) if f.lower().endswith(".jpg"))
batch_size = 64
image_features = {}
for start in range(0, len(img_files), batch_size):
    batch_files = img_files[start:start + batch_size]
    batch = np.stack([load_and_preprocess_image(os.path.join(image_dir, f)) for f in batch_files])
    features = vgg_model.predict(batch, verbose=0)  # shape (batch, 7, 7, 512)
    for img_file, feat in zip(batch_files, features):
        image_features[img_file.split('.')[0]] = feat
```


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5