This project provides an automatic image captioning system using a pre-trained image captioning model. It generates captions for images uploaded through a FastAPI-based web service. Additionally, it suggests similar sentences from a dataset based on the generated caption.
The project consists of two Python files:

### `main.py`

This file contains the main code for the FastAPI web service and the image captioning functionality.
**Imports:**

- `io`
- `fastapi`
- `fastapi.responses`
- `transformers`
- `PIL`
- `similarity` (custom module)
**Endpoints:**

- `root()` (GET `/`): Root endpoint that returns an HTML response with the contents of the `static/index.html` file.
- `/caption` (POST): Accepts an uploaded image file and generates a caption for it using the pre-trained image captioning model, then finds the top 5 most similar sentences from a dataset based on the generated caption.
The app can be run using the command `uvicorn main:app --host 0.0.0.0 --port 8080`.
### `similarity.py`

This file contains helper functions for finding the closest sentences to a given input sentence from a dataset.
**Imports:**

- `spacy`
- `csv`
- `sklearn.metrics.pairwise`
**Functions:**

- `find_closest_sentences(input_sentence, sentences, top_k=5)`: Finds the closest sentences to the given input sentence from a list of sentences using cosine similarity. Returns the `top_k` closest sentences.
- `read_sentences_from_csv(file_path, column_name)`: Reads sentences from a CSV file and returns a list of the values in the specified column.
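A self-contained sketch of these two helpers. The project embeds sentences with spaCy vectors; TF-IDF from scikit-learn is substituted here so the example runs without downloading a spaCy model:

```python
import csv

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def read_sentences_from_csv(file_path, column_name):
    """Return the values of one column of a CSV file as a list."""
    with open(file_path, newline="", encoding="utf-8") as f:
        return [row[column_name] for row in csv.DictReader(f)]


def find_closest_sentences(input_sentence, sentences, top_k=5):
    """Rank `sentences` by cosine similarity to `input_sentence`.

    TF-IDF stands in for the project's spaCy embeddings; the ranking
    logic (cosine similarity, top_k cutoff) is the same.
    """
    vectors = TfidfVectorizer().fit_transform([input_sentence] + sentences)
    scores = cosine_similarity(vectors[0:1], vectors[1:]).ravel()
    ranked = sorted(zip(sentences, scores), key=lambda p: p[1], reverse=True)
    return [sentence for sentence, _ in ranked[:top_k]]
```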
To use the image captioning system:

1. Install the required dependencies listed in `main.py` using `pip`.
2. Download the pre-trained image captioning model by Salesforce: `Salesforce/blip-image-captioning-base`.
3. Place the `static/index.html` file in the appropriate location.
4. Run the `main.py` script using the provided command.
5. Access the web service by opening the root endpoint in a web browser.
6. Upload an image file to the `/caption` endpoint to generate captions.
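Once the server is running, the endpoint can also be exercised from the command line. This assumes the handler reads the upload from a multipart field named `file`; `photo.jpg` is an illustrative filename:

```shell
# POST a local image to the running service and print the JSON response.
curl -X POST "http://localhost:8080/caption" \
  -F "file=@photo.jpg"
```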
If desired, you can use the OpenAI API or another large language model (LLM) instead of the dataset similarity approach. With an LLM, caption suggestions are generated directly by the model rather than retrieved from a predefined dataset. You would need to modify the code accordingly to integrate the chosen LLM.
Please note that using an LLM may require additional configuration and API access credentials. Refer to the documentation of the specific LLM or OpenAI API for instructions on how to integrate it into the project.
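One way this swap could look, sketched against the OpenAI Python client (`pip install openai`). The model name and prompt wording are illustrative, and an `OPENAI_API_KEY` environment variable is assumed:

```python
def suggest_captions(caption, n=5, model="gpt-4o-mini"):
    """Ask an LLM for n alternative captions instead of searching a dataset.

    Replaces find_closest_sentences() in the /caption endpoint; the
    model name and prompt are placeholders, not part of the project.
    """
    # Imported lazily so the rest of the app works without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You rewrite image captions in an Instagram-ready style."},
            {"role": "user",
             "content": f"Suggest {n} alternative captions for: {caption}"},
        ],
    )
    return response.choices[0].message.content
```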
This README provides an overview of the Automatic Instagram Ready Image Captioning project and instructions on how to set it up and use it. Feel free to customize it further based on your specific project requirements.