Pixel Brain is a project that uses machine learning models to process and classify images automatically.
It includes modules for image Q&A with GPT-4 Vision, image clustering using embedding models and vector search, image classification with models such as ResNet, preprocessing modules for different models, and a database for storing and retrieving processed data.
All the modules are composable and extendable.
The project also includes pre-built apps for purposes such as people identification.
To install Pixel Brain, use pip to install directly from the GitHub repository. Run the following commands:
# install libgl (if not installed)
sudo apt-get install libgl-dev
# install mongodb and start it (if not using mongodb atlas)
sudo apt-get install -y mongodb
sudo systemctl start mongodb
pip install git+https://github.com/omerhac/pixel-brain.git
# to use GroundedSAMDectectionModule:
pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git
export OPENAI_KEY=your_openai_key # for using gpt4 modules
export MONGODB_ATLAS_KEY=your_mongodb_atlas_key # if remote db is used
# pre-built identity-tagger application
tag_identity --data_path /path/to/your/data --export /path/to/export.csv
tag_identity -h # for more options

This is an interface for preprocessing a batch of images for a certain model. It is an abstract base class and needs to be subclassed for specific preprocessing methods.
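The subclassing pattern described above can be sketched roughly as follows. This is an illustration only: the names `Preprocessor`, `ScaleToUnit`, and the `__call__` signature are hypothetical and are not taken from the Pixel Brain codebase, and images are modelled as nested lists to keep the example dependency-free.

```python
from abc import ABC, abstractmethod
from typing import List

# Hypothetical names throughout; this is not Pixel Brain's actual API.
Image = List[List[float]]  # toy grayscale image: rows of pixel values


class Preprocessor(ABC):
    """Interface: turn a batch of images into model-ready inputs."""

    @abstractmethod
    def __call__(self, image_batch: List[Image]) -> List[Image]:
        ...


class ScaleToUnit(Preprocessor):
    """Example subclass: rescale 0-255 pixel values to the 0-1 range."""

    def __call__(self, image_batch: List[Image]) -> List[Image]:
        return [[[px / 255.0 for px in row] for row in img] for img in image_batch]


# One 2x2 image in a batch of size one.
batch = [[[0.0, 255.0], [51.0, 102.0]]]
scaled = ScaleToUnit()(batch)
```

Because the base class declares `__call__` as abstract, instantiating `Preprocessor` directly raises a `TypeError`; only concrete subclasses can be used.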
The DataLoader class loads and decodes images either from disk or S3. It can be configured to load images in batches and optionally decode the images.
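To make the batching and optional-decoding behaviour concrete, here is a minimal sketch for the local-disk case. The function `iter_batches` and its parameters are assumptions made for illustration, not the real `DataLoader` interface; S3 loading is left out, and `decode` is any callable that turns raw bytes into an image object.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List, Optional, Tuple


def iter_batches(
    data_dir: str,
    batch_size: int,
    decode: Optional[Callable[[bytes], object]] = None,
) -> Iterator[Tuple[List[str], List[object]]]:
    """Yield (file names, contents) in batches; hypothetical helper."""
    paths = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        raw = [p.read_bytes() for p in chunk]
        # Decoding is optional: without a decoder the raw bytes are returned.
        items = [decode(b) for b in raw] if decode else raw
        yield [p.name for p in chunk], items


# Demo with five dummy files; `len` stands in for a real image decoder.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        (Path(tmp) / f"img_{i}.bin").write_bytes(bytes([i]))
    batches = list(iter_batches(tmp, batch_size=2, decode=len))

batch_sizes = [len(names) for names, _ in batches]
```

The last batch is simply shorter when the number of files is not a multiple of `batch_size`.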
The Database class is used to interact with the MongoDB database. It can store fields, query vector fields, find images, and perform other database operations.
This module processes images with GPT-4 Vision: it asks the model a question about each image and stores the answer in a specified field in the database.
This module classifies images into one of the ImageNet classes and stores the class in a database. It can receive a list of classes to choose from (a subset of ImageNet classes), out of which it will pick the one with the largest probability.
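Restricting the prediction to a subset of classes amounts to taking the argmax over only the allowed entries of the model's output. The sketch below shows that selection step on made-up logits; the function name `classify` and the four class names are illustrative assumptions, and no real ResNet model is involved.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    logits: List[float],
    class_names: List[str],
    allowed: Optional[List[str]] = None,
) -> str:
    """Return the most probable class, optionally limited to `allowed`."""
    probs = softmax(logits)
    allowed_set = set(class_names if allowed is None else allowed)
    candidates = [i for i, name in enumerate(class_names) if name in allowed_set]
    if not candidates:
        raise ValueError("none of the allowed classes are known to the model")
    best = max(candidates, key=lambda i: probs[i])
    return class_names[best]


# Made-up scores for a toy four-class model.
class_names = ["tabby cat", "golden retriever", "sports car", "pizza"]
logits = [2.0, 3.5, 0.1, 1.0]

unrestricted = classify(logits, class_names)
restricted = classify(logits, class_names, allowed=["tabby cat", "pizza"])
```

Here the unrestricted prediction is "golden retriever", while limiting the choice to "tabby cat" and "pizza" yields "tabby cat", the higher-scoring of the two.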
This module embeds images using the FaceNet model. It crops faces out of the images, embeds them, and stores the embeddings in a vector database (ChromaDB).
This module is used to identify people in images. The module processes the images and assigns identities to them based on the embeddings stored in the database.
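One common way to assign identities from stored embeddings is nearest-neighbour matching with a similarity threshold. The sketch below illustrates that idea with cosine similarity over plain lists; it is not necessarily how this module works, and `assign_identity`, the `threshold` value, and the example vectors are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_identity(
    query: List[float],
    known: Dict[str, List[float]],
    threshold: float = 0.8,  # assumed cutoff, chosen only for this example
) -> Optional[str]:
    """Return the closest known identity, or None if nothing is similar enough."""
    best_name, best_score = None, -1.0
    for name, embedding in known.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy 3-dimensional embeddings; real face embeddings are much larger.
known = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}

match = assign_identity([0.9, 0.1, 0.0], known)
no_match = assign_identity([0.0, 0.0, 1.0], known)
```

The first query lies close to the stored "alice" vector and is matched to her; the second is orthogonal to every stored embedding, so it falls below the threshold and is left unassigned.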