Image Caption Generation is a project that automatically generates textual descriptions of images using deep learning techniques. Given an image as input, the model produces a natural language description of the image, allowing for easier interpretation and understanding of the visual content.
This project combines a Convolutional Neural Network (CNN), which extracts visual features from the input image, with a Recurrent Neural Network (RNN), which generates the textual description. The model follows the widely used encoder-decoder architecture with an attention mechanism, which has been shown to perform well on image caption generation tasks.
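As a minimal sketch of this idea in TensorFlow 2 (the layer sizes, the GRU cell, and Bahdanau-style attention are illustrative assumptions, not necessarily the exact configuration used by this repository's training code):

```python
import tensorflow as tf

class Encoder(tf.keras.Model):
    """Projects pretrained CNN feature maps into the decoder's space."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.fc = tf.keras.layers.Dense(embed_dim, activation="relu")

    def call(self, cnn_features):
        # cnn_features: (batch, num_regions, cnn_dim), e.g. InceptionV3
        # maps reshaped from (batch, 8, 8, 2048) to (batch, 64, 2048).
        return self.fc(cnn_features)

class BahdanauAttention(tf.keras.layers.Layer):
    """Scores each spatial image feature against the decoder state."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, num_regions, embed_dim); hidden: (batch, units)
        hidden = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden)))
        weights = tf.nn.softmax(scores, axis=1)
        context = tf.reduce_sum(weights * features, axis=1)
        return context, weights

class Decoder(tf.keras.Model):
    """Generates the caption one word at a time, attending to the image."""
    def __init__(self, vocab_size, embed_dim=256, units=512):
        super().__init__()
        self.attention = BahdanauAttention(units)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.gru = tf.keras.layers.GRU(units, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, word_ids, features, hidden):
        # word_ids: (batch, 1) previous word; hidden: (batch, units)
        context, _ = self.attention(features, hidden)
        x = self.embedding(word_ids)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x, initial_state=hidden)
        return self.fc(output), state
```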
The project requires:
- Python 3
- TensorFlow 2
- NumPy
- Matplotlib
- Pillow
To set up and use the project:
- Clone the repository:
git clone https://github.com/psychopass-me/CodeClause_Image_Caption_Generator.git
- Install the required dependencies:
pip install -r requirements.txt
- Download the dataset: You can download a pre-processed dataset such as COCO (Common Objects in Context) or Flickr30k.
- Train the model: Run
python train.py --data_path path/to/dataset --model_path path/to/save/model
- Generate captions: Run
python generate.py --image_path path/to/image --model_path path/to/model
This project can be used with any image caption dataset. The model has been tested on the COCO and Flickr30k datasets, which are commonly used for image caption generation tasks. The dataset should include images and corresponding textual descriptions.
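As a minimal sketch of loading such a dataset (the file layout, a captions.txt with one "image_name<TAB>caption" pair per line, and the load_captions helper are assumptions for illustration; adapt them to the annotation format of COCO or Flickr30k):

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

def load_captions(captions_file: str) -> Dict[str, List[str]]:
    """Map each image filename to its list of reference captions."""
    captions = defaultdict(list)
    for line in Path(captions_file).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        image_name, caption = line.split("\t", maxsplit=1)
        # Wrap every caption with start/end tokens for the decoder.
        captions[image_name].append(f"<start> {caption.strip().lower()} <end>")
    return dict(captions)
```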
To train the model, run python train.py with the following arguments:
- --data_path: Path to the directory containing the pre-processed dataset.
- --model_path: Path to save the trained model.
- --epochs: Number of epochs to train for.
- --batch_size: Batch size for training.
- --learning_rate: Learning rate for training.
- --save_every: Number of epochs between saving the model.
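For example (the numeric values here are illustrative, not project defaults):
python train.py --data_path path/to/dataset --model_path path/to/save/model --epochs 20 --batch_size 64 --learning_rate 0.001 --save_every 5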
Inference

To generate captions for an image, run python generate.py with the following arguments:
- --image_path: Path to the image for which to generate a caption.
- --model_path: Path to the trained model.
- --max_length: Maximum length of the generated caption.
- --beam_size: Beam size for beam search decoding.
- --temperature: Temperature for sampling words during decoding.
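The --temperature flag controls how the next word is drawn from the decoder's predicted distribution. As a minimal sketch of temperature sampling (the vocabulary, probabilities, and sample_with_temperature helper below are illustrative, not part of generate.py):

```python
import numpy as np

def sample_with_temperature(probs: np.ndarray, temperature: float) -> int:
    """Sample a token index after rescaling the distribution by temperature.

    temperature < 1.0 sharpens the distribution (closer to greedy),
    temperature > 1.0 flattens it (more diverse captions).
    """
    logits = np.log(probs + 1e-9) / temperature
    scaled = np.exp(logits - np.max(logits))  # numerically stable softmax
    scaled /= scaled.sum()
    return int(np.random.choice(len(probs), p=scaled))

# Illustrative next-word distribution over a 4-word vocabulary.
vocab = ["a", "dog", "runs", "<end>"]
probs = np.array([0.1, 0.6, 0.2, 0.1])

np.random.seed(0)
print(vocab[sample_with_temperature(probs, temperature=0.5)])  # usually "dog"
print(vocab[sample_with_temperature(probs, temperature=2.0)])  # more varied
```

Beam search (--beam_size) is the deterministic alternative: at each step it keeps the k highest-scoring partial captions rather than sampling a single word.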
To evaluate the performance of the model, metrics such as BLEU (Bilingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and METEOR (Metric for Evaluation of Translation with Explicit ORdering) can be used. The nltk library in Python provides implementations of BLEU and METEOR; ROUGE is available from separate packages such as rouge-score.
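As a minimal sketch of scoring a generated caption against its references with NLTK's BLEU implementation (the captions below are illustrative only):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a dog runs across the grass".split(),
    "a brown dog is running on a lawn".split(),
]
candidate = "a dog is running on the grass".split()

# Smoothing avoids zero scores when a higher-order n-gram has no match.
smooth = SmoothingFunction().method1
score = sentence_bleu(references, candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```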
This project is licensed under the MIT License. See the LICENSE file for more information.