Image-to-Text Tool - Release Notes
Version: 1.0.0
This release introduces a tool that processes images and generates textual descriptions of them using pretrained machine learning models.
Features:
- Multiple Model Support: Supports the BLIP and UForm models, so users can choose the model that best fits their image-processing needs.
- Docker Integration: Includes a Dockerfile that provides a consistent environment across different systems. The image is built on the NVIDIA CUDA base image to provide GPU support.
- Flexible Execution: Users can process images with a single model or with all available models by adjusting the flags passed to the script.
- Input and Output Management: Images are placed in an input folder, and the generated descriptions are saved in JSON format, in a separate output file for each model.
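The model-selection behaviour described under Flexible Execution could look roughly like the following Python sketch. It is an illustration only: the --model flag, its choices, and the parse_models helper are hypothetical names, not the actual interface of run.py, which these notes do not document.

```python
import argparse

# Hypothetical model identifiers; the real flag names used by run.py
# are not documented in these release notes.
AVAILABLE_MODELS = ("blip", "uform")


def parse_models(argv=None):
    """Return the models to run, based on a hypothetical --model flag."""
    parser = argparse.ArgumentParser(description="Image-to-text sketch")
    parser.add_argument(
        "--model",
        choices=[*AVAILABLE_MODELS, "all"],
        default="all",
        help="model to use, or 'all' to run every available model",
    )
    args = parser.parse_args(argv)
    # "all" expands to every registered model; otherwise run just one.
    if args.model == "all":
        return list(AVAILABLE_MODELS)
    return [args.model]
```

For example, parse_models(["--model", "blip"]) returns ["blip"], while omitting the flag or passing --model all returns both models. Treating "all" as one more choice keeps the single-model and all-models cases on the same code path.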
Installation:
- Installation scripts are provided for Unix-based systems (Linux, macOS) and for Windows.
- For Docker users, a Dockerfile is provided for building and running the tool in a containerized environment, with CUDA support for GPU acceleration.
Usage:
- Place the images in the input folder and run the run.py script with the desired model flags.
- For Docker, use the provided commands to build and run the tool in a Docker container.
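The input/output workflow can be sketched in Python as below, assuming a flat input folder and one JSON file per model that maps each image filename to its description. The describe_image placeholder, the set of supported extensions, and the <model>_descriptions.json naming scheme are all hypothetical; a real run would call BLIP or UForm inside describe_image.

```python
import json
from pathlib import Path

# Assumed image extensions; the release notes do not list supported formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def describe_image(model_name, image_path):
    """Hypothetical placeholder for a real BLIP/UForm captioning call."""
    return f"[{model_name}] description of {image_path.name}"


def process_folder(input_dir, output_dir, models):
    """Describe every image in input_dir and write one JSON file per model."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Keep only regular files with an image extension, in a stable order.
    images = sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    written = []
    for model in models:
        results = {p.name: describe_image(model, p) for p in images}
        # Hypothetical naming scheme: one output file per model.
        out_file = output_dir / f"{model}_descriptions.json"
        out_file.write_text(json.dumps(results, indent=2), encoding="utf-8")
        written.append(out_file)
    return written
```

Sorting the image list makes the output order deterministic, and keying results by filename lets the JSON files from different models be compared side by side.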