This script automates the process of generating a transparent PNG image from a given input image using the MobileSAM model. It includes steps for loading the image, generating masks ✂️, applying the largest mask, and saving the output ✨. Additionally, it copies the result into the ComfyUI input directory for further processing.
- 🐍 Python 3.x
- ⚡ PyTorch
- OpenCV
- NumPy
- PIL (Pillow)
- 🧠 MobileSAM library
- Install Python 3.x if not already installed.
- Install the required libraries using pip: `pip install torch torchvision opencv-python numpy pillow`
- Install the MobileSAM library (ensure it is compatible with your Python version): `pip install mobile-sam`
- Place the input image in the `inputs/` 📂 directory.
- Ensure the MobileSAM checkpoint file is in the `checkpoints/` 🔑 directory.
- Run the script: `python script_name.py`
- The script will generate the following outputs:
  - `outputs/product_mask.png`: The generated mask 🎭.
  - `outputs/product_transparent.png`: The transparent PNG image 💨.
- The transparent PNG image will also be copied to `ComfyUI/input/product.png`.
- `INPUT_PATH`: ➡️ Path to the input image.
- `CHECKPOINT_PATH`: 🔑 Path to the MobileSAM checkpoint file.
- `OUTPUT_MASK`: 💾 Path to save the generated mask.
- `OUTPUT_TRANSPARENT`: 💾 Path to save the transparent PNG image.
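As a rough sketch, these four settings might be declared near the top of the script as follows. The directory names and the two output file names come from this README; the input file name `product.jpg` and the checkpoint file name `mobile_sam.pt` are placeholders assumed for illustration.

```python
from pathlib import Path

# Hypothetical file names: this README only states the directories.
INPUT_PATH = Path("inputs") / "product.jpg"
CHECKPOINT_PATH = Path("checkpoints") / "mobile_sam.pt"

# Output locations as listed in the usage steps.
OUTPUT_MASK = Path("outputs") / "product_mask.png"
OUTPUT_TRANSPARENT = Path("outputs") / "product_transparent.png"
```

Using `pathlib.Path` rather than plain strings keeps the paths portable across operating systems.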
The script checks for GPU availability and uses it if available 🔥. If no GPU is detected, it falls back to CPU 🧊.
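A minimal sketch of that fallback, assuming the standard `torch.cuda.is_available()` check. The helper name `pick_device` is made up for illustration, and the `ImportError` guard is only there so the snippet also runs on a machine without PyTorch.

```python
def pick_device(cuda_available: bool) -> str:
    """Return the device string to use: GPU when available, otherwise CPU."""
    return "cuda" if cuda_available else "cpu"


try:
    import torch

    DEVICE = pick_device(torch.cuda.is_available())
except ImportError:
    # Assumption for this sketch only: without PyTorch, report CPU.
    DEVICE = pick_device(False)

print(f"Using device: {DEVICE}")
```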
- The input image is loaded and converted to RGB format.
- The image is resized if its dimensions exceed the maximum allowed size (800 pixels).
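The resize rule can be sketched as a small size calculation. This assumes the limit applies to the longer side and that the aspect ratio is preserved, which this README does not state explicitly; `fit_within` is a hypothetical helper name.

```python
MAX_DIM = 800  # maximum allowed size stated in this README


def fit_within(width: int, height: int, max_dim: int = MAX_DIM) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # already small enough, leave untouched
    scale = max_dim / longest
    # Keep each side at least 1 pixel after rounding.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1600, 1200))
print(fit_within(640, 480))
```

The resulting pair can be passed to `cv2.resize(image, (new_width, new_height))`, which expects the target size in width-then-height order.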
The MobileSAM model is loaded and moved to the GPU (if available). The model is then set to evaluation mode.
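A hedged sketch of the loading step, following the usage pattern in the MobileSAM project README (`sam_model_registry` with the `"vit_t"` model type). The function name `load_mobile_sam` is hypothetical, and the import is placed inside the function so that defining it does not require the library to be installed.

```python
def load_mobile_sam(checkpoint_path: str, device: str = "cpu"):
    """Load MobileSAM weights, move the model to the device, and switch to eval mode."""
    # Lazy import: only needed when the function is actually called.
    from mobile_sam import sam_model_registry

    # "vit_t" is the model type used in the MobileSAM README (assumed here).
    model = sam_model_registry["vit_t"](checkpoint=checkpoint_path)
    model.to(device=device)
    model.eval()  # inference only: disables training-time behaviour such as dropout
    return model
```

A call such as `load_mobile_sam(str(CHECKPOINT_PATH), DEVICE)` would then return a model ready to be handed to the mask generator.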
- Masks are generated using the `SamAutomaticMaskGenerator` with specified parameters.
- The largest mask is selected based on its area.
- The selected mask is saved as a PNG image.
- The original image is resized to match the mask dimensions.
- The mask is applied to the original image to create a transparent PNG image.
- The transparent PNG image is saved and copied to the ComfyUI input directory.
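The selection and compositing steps above can be sketched with NumPy alone. The sketch assumes the mask generator returns a list of dicts carrying a boolean `segmentation` array and an `area` value, which is the record format used by SAM-style automatic mask generators. The helper names and the tiny synthetic data are invented for illustration.

```python
import numpy as np


def largest_mask(masks):
    """Return the boolean segmentation of the mask record with the largest area."""
    if not masks:
        raise ValueError("no masks were generated")
    best = max(masks, key=lambda m: m["area"])
    return np.asarray(best["segmentation"], dtype=bool)


def to_rgba(rgb, mask):
    """Attach the mask as an alpha channel: opaque inside, transparent outside."""
    if rgb.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")
    alpha = np.where(mask, 255, 0).astype(np.uint8)
    return np.dstack([rgb, alpha])  # shape (H, W, 4)


# Synthetic 2x2 example standing in for real generator output.
small = np.array([[True, False], [False, False]])
large = np.array([[True, True], [True, False]])
masks = [
    {"segmentation": small, "area": int(small.sum())},
    {"segmentation": large, "area": int(large.sum())},
]
image = np.full((2, 2, 3), 200, dtype=np.uint8)
rgba = to_rgba(image, largest_mask(masks))
```

An array of this shape and dtype can be written out with Pillow via `Image.fromarray(rgba).save(path)`, which stores it as an RGBA PNG.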
The script automatically copies the generated transparent PNG image to the ComfyUI/input/ directory ➡️, making it ready for further processing in ComfyUI 🎨.
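A minimal sketch of the copy step using only the standard library. The destination `ComfyUI/input/product.png` is taken from this README; the function name `copy_to_comfyui` and the choice to create the directory when it is missing are assumptions.

```python
import shutil
from pathlib import Path


def copy_to_comfyui(src, comfy_input_dir="ComfyUI/input", name="product.png"):
    """Copy the transparent PNG into the ComfyUI input directory and return its new path."""
    dest_dir = Path(comfy_input_dir)
    # Assumption: create the directory rather than fail if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / name
    shutil.copyfile(src, dest)
    return dest
```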
The script clears GPU memory if a GPU was used during execution.
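One common way to do this in PyTorch is `torch.cuda.empty_cache()`, sketched below. Whether the script uses exactly this call is an assumption, and `release_gpu_memory` is a hypothetical helper name.

```python
import gc


def release_gpu_memory(device: str) -> bool:
    """Free cached GPU memory when a CUDA device was used; return True if it did so."""
    gc.collect()  # drop unreachable Python objects first so their tensors can be freed
    if not device.startswith("cuda"):
        return False  # nothing to clear on CPU
    import torch  # imported lazily: only needed on the GPU path

    torch.cuda.empty_cache()
    return True
```

Any references to the model and generated masks should be deleted before calling it, since memory still held by live tensors is not released.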
The provided JSON configuration includes settings for various nodes in ComfyUI, such as loading checkpoints, applying LoRA, encoding text prompts, and handling image inputs and outputs.
- KSampler: Configures sampling parameters.
- CheckpointLoaderSimple: Loads the specified checkpoint.
- LoraLoader: Applies LoRA to the model.
- LoadImage: Loads input images.
- CLIPTextEncode: Encodes positive and negative prompts.
- EmptyLatentImage: Creates an empty latent image.
- ImageToMask: Converts an image to a binary mask.
- GrowMask: Expands the mask edges.
- SetLatentNoiseMask: Applies the inpaint mask.
- LatentUpscale: Resizes the latent image to the target width and height.
- VAEDecode: Decodes the latent image.
- PreviewImage: Previews the final image.
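To illustrate how such nodes are wired together, here is a hypothetical three-node fragment in ComfyUI's API-style workflow format, written as a Python dict. It is not the configuration shipped with this script: the node IDs, checkpoint file name, and prompt text are invented. The structure (a `class_type` plus `inputs`, with links written as `[node_id, output_index]`) is assumed from ComfyUI's API workflow export.

```python
import json

# Hypothetical fragment; values are placeholders, not the script's real settings.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "example_model.safetensors"},
    },
    "2": {
        "class_type": "LoadImage",
        "inputs": {"image": "product.png"},
    },
    "3": {
        "class_type": "CLIPTextEncode",
        # ["1", 1] links to output 1 of node "1", assumed to be the CLIP output.
        "inputs": {"text": "studio photo of a product", "clip": ["1", 1]},
    },
}


def node_types(wf):
    """List the node classes used in a workflow, sorted for stable output."""
    return sorted(node["class_type"] for node in wf.values())


print(json.dumps(workflow, indent=2))
```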
- Ensure the input image and checkpoint paths are correct.
- Adjust the maximum image dimensions (`MAX_DIM`) if needed.
- Verify that the MobileSAM checkpoint file is compatible with your model.
This script is provided under the MIT License. Feel free to modify and distribute it as needed.
Happy coding! 🧑‍💻