## Importing Data

This notebook contains two examples.

***

The first example shows how to load data that uses Chroma's default embedding function (SentenceTransformers).

In [None]:
! pip install chromadb --quiet
! pip install chroma_datasets --quiet

In [3]:
import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

chroma_client = chromadb.Client()
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)
result = collection.query(query_texts=["The United States of America"], n_results=1)
print(result)

Loaded 41 documents into the collection named: StateOfTheUnion
{'ids': [['40']], 'embeddings': None, 'documents': [['Now is our moment to meet and overcome the challenges of our time.\nAnd we will, as one people.\nOne America.\nThe United States of America.\nMay God bless you all. May God protect our troops.']], 'metadatas': [[None]], 'distances': [[1.186147928237915]]}
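
The printed result is column-oriented: each key holds one inner list per query text, and each inner list holds one entry per returned hit. A minimal sketch of reshaping it into per-hit records follows. The helper name `flatten_query_result` is hypothetical (it is not part of `chromadb`), and the sample dictionary is a shortened copy of the output above.

```python
from typing import Any, Dict, List


def flatten_query_result(result: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Reshape a column-oriented query result into one list of hits per query.

    Hypothetical helper for illustration; assumes the result carries the
    'ids', 'documents', 'metadatas', and 'distances' keys shown above.
    """
    hits_per_query = []
    for query_index, ids in enumerate(result["ids"]):
        hits = []
        for rank, doc_id in enumerate(ids):
            hits.append(
                {
                    "id": doc_id,
                    "document": result["documents"][query_index][rank],
                    "metadata": result["metadatas"][query_index][rank],
                    "distance": result["distances"][query_index][rank],
                }
            )
        hits_per_query.append(hits)
    return hits_per_query


# Shortened copy of the result printed above (document text abbreviated).
result = {
    "ids": [["40"]],
    "embeddings": None,
    "documents": [["The United States of America.\nMay God bless you all. May God protect our troops."]],
    "metadatas": [[None]],
    "distances": [[1.186147928237915]],
}

top_hit = flatten_query_result(result)[0][0]
print(top_hit["id"], top_hit["distance"])
```

With one query text and `n_results=1`, the outer list has one element and its inner list has one hit.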


The second example shows how to load data that was embedded with OpenAI embeddings. You must pass an `OpenAIEmbeddingFunction` configured with your API key, because querying the collection embeds the query text through the OpenAI API.
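
Since the embedding function needs an API key, one option is to read the key from an environment variable rather than writing it into the notebook. The sketch below is only an illustration: the helper name `require_env` is hypothetical, and the variable name `OPENAI_API_KEY` is an assumed convention, not something this notebook defines.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing fast if it is unset.

    Hypothetical helper for illustration; not part of chromadb or chroma_datasets.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value


# Example usage (assumes the key was exported in the shell that started Jupyter):
# api_key = require_env("OPENAI_API_KEY")
```

Failing before the import step makes a missing key obvious immediately, instead of surfacing later as an authentication error from the embedding call.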

In [1]:
import os

import chromadb
from chromadb.utils import embedding_functions
from chroma_datasets import HubermanPodcast
from chroma_datasets.utils import import_into_chroma

chroma_client = chromadb.Client()
# Read the key from the environment instead of hardcoding a secret in the notebook.
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-ada-002"
)
sotu_coll = import_into_chroma(chroma_client=chroma_client, dataset=HubermanPodcast, embedding_function=openai_ef)
print(sotu_coll.count())


Dataset parquet downloaded and prepared to /Users/jeff/.cache/huggingface/datasets/dexaai___parquet/dexaai--huberman_on_exercise-0477fa3eb690ed4c/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7. Subsequent calls will reuse this data.
Loaded 293 documents into the collection named: HubermanPodcast
293


In [2]:
print(sotu_coll.query(query_texts=["How important is lifting weights?"], n_results=1))

{'ids': [['chunk_57977']], 'embeddings': None, 'documents': [["Peter Attia: Yeah, if you look at the literature on this, it's going to tell you it's going to differentiate power lifting from weightlifting. In other words. Yeah, you do need to be kind of moving against a very heavy load. Now, again, that can look very different depending on your level of experience. Like, I really like deadlifting. Now. I can count the number of days left in my life when I'm going to want to do sets over £400, but I'll pick and choose the days that I do. But I grew up doing those things. I'm comfortable with those movements. If I had a 60 year old woman who's never lifted weights in her life, who we now have to get lifting, I mean, we could get her to Deadlift, but I think I wouldn't make perfect the enemy of good. I'd be happy to put her on a leg press machine and just get her doing that. It's not as pure a movement as a deadlift, but who cares, right? We can still put her at a heavy load for her and d