Compute clustering on your data in a visual, intuitive way with FiftyOne and Sklearn!

jacobmarks/clustering-plugin

Clustering Plugin for FiftyOne

[Image: GPT4V_labels]

This plugin adds operators to the FiftyOne App that let you cluster your dataset using several scikit-learn algorithms, including K-means, BIRCH, and agglomerative clustering.

It also serves as a proof of concept for adding new "types" of runs to FiftyOne!

Installation

fiftyone plugins download https://github.com/jacobmarks/clustering-plugin

You will also need to have scikit-learn installed:

pip install -U scikit-learn

Usage

Clustering

Once you have the plugin installed, you can generate clusters for your dataset using the compute_clusters operator:

[Image: compute_clusters_from_scratch]

The specific arguments depend on the method you choose: kmeans, birch, or agglomerative.

In the example above, the clusters are generated at the same time as the embeddings, but you can also generate clusters from existing embeddings:

[Image: compute_clusters_from_embeddings]
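To give a rough idea of what clustering existing embeddings with these three methods involves, here is a minimal scikit-learn sketch. It is not the plugin's implementation: the function name `cluster_embeddings` and its parameters are hypothetical, and it assumes the embeddings are already available as a 2D NumPy array with one row per sample.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans


def cluster_embeddings(embeddings, method="kmeans", n_clusters=3, seed=0):
    """Return one integer cluster label per row of `embeddings`.

    Hypothetical helper for illustration only; the plugin's own
    operator may expose different arguments for each method.
    """
    if method == "kmeans":
        # n_init is set explicitly so behavior does not depend on the
        # default of the installed scikit-learn version
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    elif method == "birch":
        model = Birch(n_clusters=n_clusters)
    elif method == "agglomerative":
        model = AgglomerativeClustering(n_clusters=n_clusters)
    else:
        raise ValueError(f"Unsupported method: {method!r}")

    # All three estimators implement fit_predict()
    return model.fit_predict(np.asarray(embeddings))
```

The returned labels can then be stored alongside your samples, for example in a field on each sample, so that they can be filtered and visualized in the App.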

You can generate clusters for:

  • Your entire dataset
  • A view of your dataset
  • Currently selected samples in the App

Additionally, you can run the operator:

  • In real time, or
  • In the background, as a delegated operation

Once you have generated clusters, you can view information about the clusters in the App with the get_clustering_run_info operator:

[Image: get_cluster_info]

Visualizing Clusters

It can be insightful to use clustering in conjunction with compute_visualization to visualize the clusters:

[Image: visualize_clusters]

Labeling Clusters

Once you have generated clusters, you can also use the magic of multimodal AI to automatically assign a short description, or label, to each cluster!

This is achieved by randomly selecting a few samples from each cluster, and prompting GPT-4V to generate a description for the cluster from the samples.
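The sampling step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's code: the helper name `sample_per_cluster`, the default of five samples per cluster, and the seed parameter are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_cluster(sample_ids, cluster_labels, k=5, seed=None):
    """Pick up to `k` random sample IDs from each cluster.

    Hypothetical helper: `sample_ids` and `cluster_labels` are parallel
    sequences, one entry per sample.
    """
    rng = random.Random(seed)

    # Group sample IDs by the cluster they were assigned to
    by_cluster = defaultdict(list)
    for sample_id, label in zip(sample_ids, cluster_labels):
        by_cluster[label].append(sample_id)

    # Sample without replacement; small clusters contribute all members
    return {
        label: rng.sample(ids, min(k, len(ids)))
        for label, ids in by_cluster.items()
    }
```

The samples chosen for each cluster would then be sent to the model together with a prompt asking for a short description of what they have in common.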

To use this functionality, you must have an API key for OpenAI's GPT-4V API, and you must set it in your environment as OPENAI_API_KEY.

export OPENAI_API_KEY=your-api-key
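If you want to confirm from Python that the key is visible to the current process before running the labeling operator, a small check like the one below can help. The helper name `get_openai_api_key` is hypothetical and is not part of the plugin.

```python
import os


def get_openai_api_key(env=None):
    """Return the OPENAI_API_KEY value, or raise if it is missing or empty.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before labeling clusters"
        )
    return key
```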

Then, you can label the clusters using the label_clusters_with_gpt4v operator. This might take a minute or so, depending on the number of clusters, but it is worth it! We recommend delegating the execution of this operation and then launching it with:

fiftyone delegated launch

Then you can view the labels in the App!

[Image: GPT4V_labels]
