
Image Caption Generation

Image Caption Generation is a project that automatically generates textual descriptions of images using deep learning techniques. Given an image as input, the model produces a natural language description, making the visual content easier to interpret and understand.

This project combines a Convolutional Neural Network (CNN) to extract features from images with a Recurrent Neural Network (RNN) to generate the text description. The model follows the widely used encoder-decoder architecture with an attention mechanism, which has been shown to perform well on image caption generation tasks.
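
The repository's exact layer code isn't reproduced here, but the sketch below shows a minimal TensorFlow 2 version of such an encoder-decoder with attention: pre-extracted CNN features (e.g. from InceptionV3) are projected by the encoder, and a GRU decoder attends over them at every step. All names here (CNNEncoder, RNNDecoder, embed_dim, units) are illustrative, not taken from this codebase.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.Model):
    """Additive attention over the encoder's spatial feature map."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, num_regions, embed_dim); hidden: (batch, units)
        hidden_with_time = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        weights = tf.nn.softmax(scores, axis=1)               # (batch, num_regions, 1)
        context = tf.reduce_sum(weights * features, axis=1)   # (batch, embed_dim)
        return context, weights

class CNNEncoder(tf.keras.Model):
    """Projects pre-extracted CNN features into the word-embedding space."""
    def __init__(self, embed_dim):
        super().__init__()
        self.fc = tf.keras.layers.Dense(embed_dim, activation="relu")

    def call(self, x):
        return self.fc(x)

class RNNDecoder(tf.keras.Model):
    """GRU decoder that attends to the image features at each time step."""
    def __init__(self, embed_dim, units, vocab_size):
        super().__init__()
        self.units = units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
        self.fc1 = tf.keras.layers.Dense(units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)
        self.attention = BahdanauAttention(units)

    def call(self, token, features, hidden):
        context, weights = self.attention(features, hidden)
        x = self.embedding(token)                               # (batch, 1, embed_dim)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x, initial_state=hidden)
        logits = self.fc2(self.fc1(tf.reshape(output, (-1, output.shape[2]))))
        return logits, state, weights
```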

Requirements

  • Python 3
  • TensorFlow 2
  • NumPy
  • Matplotlib
  • Pillow

Getting Started

  • Clone the repository: git clone https://github.com/psychopass-me/CodeClause_Image_Caption_Generator.git
  • Install the required dependencies: pip install -r requirements.txt
  • Download the dataset: You can download a pre-processed dataset such as COCO (Common Objects in Context) or Flickr30k.
  • Train the model: Run python train.py --data_path path/to/dataset --model_path path/to/save/model
  • Generate captions: Run python generate.py --image_path path/to/image --model_path path/to/model

Dataset

This project can be used with any image caption dataset. The model has been tested on the COCO and Flickr30k datasets, which are commonly used for image caption generation tasks. The dataset should include images and corresponding textual descriptions.
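
For illustration, a loader for the tab-separated Flickr8k/Flickr30k annotation format ("image.jpg#0<TAB>caption") might look like the sketch below. That format is an assumption here, and COCO distributes its annotations as JSON, which would need a different parser.

```python
from collections import defaultdict

def load_captions(captions_file):
    """Parse a Flickr-style captions file into {image_filename: [captions]}."""
    mapping = defaultdict(list)
    with open(captions_file, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            image_id, caption = line.split("\t", 1)
            # Drop the "#0".."#4" caption-index suffix if present.
            image_id = image_id.split("#")[0]
            mapping[image_id].append(caption.lower())
    return mapping
```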

Training

To train the model, run python train.py with the following arguments:

  • --data_path: Path to the directory containing the pre-processed dataset.

  • --model_path: Path to save the trained model.

  • --epochs: Number of epochs to train for.

  • --batch_size: Batch size for training.

  • --learning_rate: Learning rate for training.

  • --save_every: Number of epochs between saving the model.

Inference

To generate captions for an image, run python generate.py with the following arguments:

  • --image_path: Path to the image for which to generate a caption.

  • --model_path: Path to the trained model.

  • --max_length: Maximum length of the generated caption.

  • --beam_size: Beam size for beam search decoding.

  • --temperature: Temperature for sampling words during decoding (a decoding sketch follows below).
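
As a sketch of how --max_length and --temperature typically interact during decoding, the loop below samples one token per step until the end marker, using the hypothetical RNNDecoder from the architecture sketch above. This is illustrative, not the repository's actual generate.py; beam search (--beam_size > 1) would instead keep the k most probable partial captions at each step rather than sampling a single path.

```python
import tensorflow as tf

def sample_token(logits, temperature):
    """Pick the next word id from a (1, vocab_size) logits tensor.

    temperature < 1 sharpens the distribution (safer, more generic words);
    temperature > 1 flattens it (more varied captions).
    """
    scaled = logits / temperature
    return int(tf.random.categorical(scaled, num_samples=1)[0, 0])

def generate_caption(decoder, features, start_id, end_id,
                     max_length, temperature=1.0):
    """Decode one caption token-by-token until <end> or max_length."""
    hidden = tf.zeros((1, decoder.units))
    token = tf.expand_dims([start_id], 0)   # shape (1, 1)
    caption = []
    for _ in range(max_length):
        logits, hidden, _ = decoder(token, features, hidden)
        next_id = sample_token(logits, temperature)
        if next_id == end_id:
            break
        caption.append(next_id)
        token = tf.expand_dims([next_id], 0)
    return caption
```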

Evaluation

To evaluate the performance of the model, metrics such as BLEU (Bilingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and METEOR (Metric for Evaluation of Translation with Explicit ORdering) can be used. BLEU and METEOR are available in Python's nltk library; ROUGE is provided by third-party packages such as rouge-score.
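
For example, BLEU can be computed with nltk as follows; the tokenized captions here are made-up placeholders:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each reference set is the list of tokenized ground-truth captions for one
# image; each hypothesis is the model's tokenized caption for that image.
references = [[["a", "dog", "runs", "on", "the", "beach"],
               ["a", "dog", "is", "running", "along", "the", "shore"]]]
hypotheses = [["a", "dog", "runs", "along", "the", "beach"]]

smooth = SmoothingFunction().method1  # avoids zero scores on short captions
print("BLEU-1: %.3f" % corpus_bleu(references, hypotheses,
                                   weights=(1, 0, 0, 0)))
print("BLEU-4: %.3f" % corpus_bleu(references, hypotheses,
                                   weights=(0.25, 0.25, 0.25, 0.25),
                                   smoothing_function=smooth))
```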

License

This project is licensed under the MIT License. See the LICENSE file for more information.
