In [1]:
!pip install nomic pandas langchain openai



### Load a demo dataset of 25k news articles

In [2]:
from nomic import AtlasProject
import pandas

#load a demo dataset of 25k news articles
news_articles = pandas.read_csv('https://raw.githubusercontent.com/nomic-ai/maps/main/data/ag_news_25k.csv').to_dict('records')

#use only the first 10k
news_articles = news_articles[:10_000]
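`to_dict('records')` turns each DataFrame row into a plain dict keyed by column name. That is why `news_articles[0].keys()` lists the CSV's columns. The tiny in-memory sketch below shows that shape and the list slicing used to keep only the first records. The rows are made up and stand in for the real CSV, reusing its column names.

```python
import pandas

# Made-up rows standing in for the real ag_news CSV (same column names).
df = pandas.DataFrame({
    'id': [0, 1, 2],
    'text': ['Stocks rally on earnings', 'New rover lands on Mars', 'Team wins the cup'],
    'label': ['Business', 'Sci/Tech', 'Sports'],
})

# One dict per row, keyed by column name.
records = df.to_dict('records')
print(records[0])

# Slicing the list keeps the first N records, as news_articles[:10_000] does.
first_two = records[:2]
print(len(first_two))
```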

In [3]:
print(news_articles[0].keys())

# Create a project in the Atlas Embedding Database.
# unique_id_field='id' names the column that uniquely identifies each record.
# modality='embedding' tells Atlas that you will upload your own precomputed embeddings.
project = AtlasProject(name='10k News Articles', unique_id_field='id', modality='embedding')

dict_keys(['id', 'text', 'label'])


2023-03-20 13:53:53.442 | INFO     | nomic.project:__init__:856 - Loading existing project `10k News Articles` from organization `andriy`.


In [4]:
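from langchain.embeddings import OpenAIEmbeddings

openai_key = ''  # paste your OpenAI API key here

# Named embedding_model rather than openai to avoid shadowing the openai package
embedding_model = OpenAIEmbeddings(openai_api_key=openai_key, model='text-embedding-ada-002')
# Embed the news articles with OpenAI (one vector per article)
embeddings = embedding_model.embed_documents(texts=[article['text'] for article in news_articles])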
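`add_embeddings` receives the vectors and the metadata as two separate arguments, so they presumably have to line up one-to-one by position. A small check before uploading can catch a mismatch early. `check_embeddings` below is a hypothetical helper, not part of nomic or LangChain, and the tiny vectors in the usage are made up.

```python
import numpy as np

def check_embeddings(embeddings, records):
    """Hypothetical pre-upload sanity check; returns the embeddings as a 2-D float32 array."""
    # dtype=float32 makes ragged input (vectors of different lengths) raise ValueError here.
    arr = np.asarray(embeddings, dtype=np.float32)
    # Expect one vector per record: a 2-D array of shape (n_records, dim).
    if arr.ndim != 2:
        raise ValueError(f'expected a 2-D array, got shape {arr.shape}')
    if arr.shape[0] != len(records):
        raise ValueError(f'{arr.shape[0]} embeddings for {len(records)} records')
    # NaN or inf values would corrupt any distance computation downstream.
    if not np.isfinite(arr).all():
        raise ValueError('embeddings contain NaN or inf values')
    return arr

records = [{'id': 0, 'text': 'a'}, {'id': 1, 'text': 'b'}]
arr = check_embeddings([[0.1, 0.2, 0.3], [0.0, 1.0, 0.0]], records)
print(arr.shape)
```

In this notebook it would be called as `check_embeddings(embeddings, news_articles)` just before `project.add_embeddings`.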

In [None]:
import numpy as np

#add your OpenAI embeddings and metadata to the Atlas DB project
project.add_embeddings(
    embeddings=np.array(embeddings),
    data=news_articles
)
project.create_index(name=project.name, build_topic_model=True, topic_label_field='text')
print(project.maps[0])

Running the next cell displays the Atlas Embedding DB inspector (Atlas calls it the Map). Points that are close to each other are semantically similar, so the map gives you a sense of the kinds of search results your app will return for a given query.
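"Close" on the map roughly reflects closeness in the original embedding space. A common way to measure that closeness for text embeddings is cosine similarity. The sketch below computes it with NumPy on made-up 3-dimensional vectors, while real `text-embedding-ada-002` vectors have 1536 dimensions. It illustrates the idea only and does not reproduce Atlas's own metric or 2-D layout.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors: the two "sports" articles point in similar directions.
sports_1 = [0.9, 0.1, 0.0]
sports_2 = [0.8, 0.2, 0.1]
business = [0.0, 0.2, 0.9]

print(round(cosine_similarity(sports_1, sports_2), 3))
print(round(cosine_similarity(sports_1, business), 3))
```

OpenAI documents its embeddings as normalized to length 1, so for those vectors cosine similarity reduces to a plain dot product.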

In [8]:
project.maps[0]

# Semantic Search in your app
To get semantic search running in your app, simply paste your Atlas Embedding DB project name (in this case '10k News Articles') into the corresponding environment variable in the FastAPI app's settings.py!
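Conceptually, serving a query means embedding the query text with the same model used for the articles, then returning the stored articles whose vectors are nearest to it. In this setup Atlas performs that lookup on its side. The brute-force NumPy version below only illustrates the idea. `top_k_search` is a hypothetical helper and the vectors are made up.

```python
import numpy as np

def top_k_search(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most cosine-similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    # Normalize so that a dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ q
    # argsort is ascending; reverse it so the highest scores come first.
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]

doc_vecs = [
    [0.9, 0.1, 0.0],  # sports article
    [0.0, 0.2, 0.9],  # business article
    [0.8, 0.3, 0.1],  # another sports article
]
query_vec = [1.0, 0.0, 0.0]  # stands in for an embedded query such as "football results"

for idx, score in top_k_search(query_vec, doc_vecs, k=2):
    print(idx, round(score, 3))
```

In a real app, `query_vec` would come from the same embedding model as the articles, for example LangChain's `OpenAIEmbeddings.embed_query`.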