Meerkat is an open-source Python library that helps teams index unstructured data with foundation models.
We recommend installing Meerkat in a virtual environment:
pip install meerkat-ml
GPU Install: If you want to use Meerkat with a GPU, you will need to install PyTorch with GPU support. See the PyTorch installation instructions for more details.
Optional Dependencies: Some parts of Meerkat rely on optional dependencies, e.g. audio processing may rely on utilities from torchaudio. We leave it up to you to install necessary dependencies when required. As a convenience, we provide bundles of optional dependencies that you can install, e.g. pip install meerkat-ml[text] for text dependencies. See setup.py for a full list of optional dependencies.
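Before running a workflow that needs an optional dependency, it can help to check whether the package is importable. The sketch below uses only the Python standard library; the helper name missing_optional and the package list are illustrative assumptions, not part of Meerkat.

```python
import importlib.util


def missing_optional(packages):
    """Return the names in `packages` that cannot be imported in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Hypothetical list: adjust it to the optional packages your workflow needs.
missing = missing_optional(["torchaudio"])
if missing:
    print("Install before use:", ", ".join(missing))
```

The check only looks up top-level package names and does not import them, so it stays cheap even for heavy packages.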
Then try one of our demos:
mk demo tutorial-image-gallery --copy
Explore the code for this demo in
To see a full list of demos, use mk demo --help. (If this didn't work for you, we'd appreciate it if you could open an issue and let us know.)
Next Steps. Check out our Getting Started page and our documentation to start building with Meerkat. As we work to make the documentation more comprehensive, please feel free to open an issue or reach out if you have any questions.
Meerkat is an open-source Python library, designed to help technical teams interactively wrangle images, videos, text documents and more with foundation models.
Our goal is to make foundation models a more reliable software abstraction for processing unstructured datasets. Read our blogpost to learn more.
Meerkat’s approach is based on two pillars:
(1) Heterogeneous data frames with an extended API. At the heart of Meerkat is a data frame that can store structured fields (e.g. numbers, strings, and dates) alongside complex objects (e.g. images, web pages, audio) and their tensor representations (e.g. embeddings, logits) in a single table. Meerkat's data frame API goes beyond structured data analysis libraries like Pandas by providing a set of unstructured data operations backed by foundation models (FMs).
import meerkat as mk

# Load the table, then add image and embedding columns.
df = mk.from_csv("paintings.csv")
df["img"] = mk.files("img_path")
df["embedding"] = mk.embed(df["img"], encoder="clip")
df
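To make the idea in (1) concrete, here is a toy, pure-Python illustration of a single table that holds structured fields, lazily loaded complex objects, and embedding vectors side by side. It is a sketch under stated assumptions, not Meerkat's implementation: the names LazyColumn, ToyFrame, and fake_load are hypothetical, and the data is made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class LazyColumn:
    """Column that stores cheap references (e.g. file paths) and loads objects on access."""

    refs: List[str]
    loader: Callable[[str], Any]

    def __len__(self):
        return len(self.refs)

    def __getitem__(self, i):
        # Load only the requested row, not the whole column.
        return self.loader(self.refs[i])


class ToyFrame:
    """Toy table: all columns share one length, but each may hold any element type."""

    def __init__(self, columns):
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self.columns = dict(columns)

    def row(self, i):
        return {name: col[i] for name, col in self.columns.items()}


def fake_load(path):
    """Stand-in loader; a real one would decode an image file from disk."""
    return f"<image loaded from {path}>"


frame = ToyFrame({
    "title": ["Water Lilies", "The Starry Night"],      # structured field
    "img": LazyColumn(["a.jpg", "b.jpg"], fake_load),   # complex object, loaded lazily
    "embedding": [[0.1, 0.9], [0.8, 0.2]],              # tensor-like representation
})
print(frame.row(1))
```

Keeping references in the column and loading on access means the table stays small in memory even when the underlying objects are large.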
(2) Interactivity in Python. Meerkat provides interactive data frame visualizations that allow you to control foundation models as they process your data. Meerkat visualizations are implemented in Python, so they can be composed and customized in notebooks or data scripts. Because labeling is critical for instructing and validating foundation models, labeling GUIs are a priority in Meerkat.
match = mk.gui.Match(df, against="embedding", engine="clip")
sorted_df = mk.sort(df, by=match.criterion.name, ascending=False)
gallery = mk.gui.Gallery(sorted_df)
mk.gui.html.div([match, gallery])
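The match-then-sort pattern above amounts to scoring every row against a query and ordering the rows by score. Here is a minimal pure-Python sketch, assuming cosine similarity between a query embedding and each row's embedding. The snippet above does not state which scoring function Meerkat uses, and the helper names and vectors here are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(rows, query, key="embedding"):
    """Return rows sorted from most to least similar to `query`."""
    return sorted(rows, key=lambda r: cosine(r[key], query), reverse=True)


# Illustrative two-dimensional embeddings; real CLIP embeddings have hundreds of dimensions.
rows = [
    {"title": "A", "embedding": [1.0, 0.0]},
    {"title": "B", "embedding": [0.0, 1.0]},
    {"title": "C", "embedding": [0.7, 0.7]},
]
ranked = rank_by_similarity(rows, query=[0.9, 0.1])
print([r["title"] for r in ranked])
```

Sorting in descending order of similarity mirrors ascending=False in the snippet above, so the closest matches appear first in the gallery.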
Meerkat is being built by Machine Learning PhD students in the Hazy Research lab at Stanford. We're excited to build for a future where models will make it easier for teams to sift and reason through large volumes of data. We have varied research backgrounds and have done research that touches all parts of the machine learning process: we've created new model architectures, studied model robustness and evaluation, and worked on applications ranging from audio generation to medical imaging.
Please reach out to kgoel [at] cs [dot] stanford [dot] edu, eyuboglu [at] stanford [dot] edu, and arjundd [at] stanford [dot] edu if you would like to use Meerkat for a project or at your company, or if you have any questions.