## Visualizing OpenAI Embeddings in Atlas

Atlas enables you to:

- Store, update and organize multi-million point datasets of unstructured text, images and embeddings.
- Visually interact with embeddings of your data from a web browser.
- Operate over unstructured data and embeddings with topic modeling, semantic duplicate clustering and semantic search.
- Generate high-dimensional and two-dimensional embeddings of your data.

Use Atlas to:

- Visualize, interact with, collaborate on and share large datasets of text and embeddings.
- Collaboratively curate your unstructured datasets (clean, tag and label).
- Build high-availability apps powered by semantic search.
- Understand and debug the latent space of the AI models you train.

Read about how Atlas works at the link below, or get started with the notebook that follows.

https://docs.nomic.ai/index.html

In [1]:
#!pip install nomic

In [2]:
import datetime
import numpy as np
import nomic
import pandas as pd

from nomic import atlas

In [3]:
print(f"Today is {datetime.datetime.today().strftime('%d-%b-%Y %H:%M:%S')}")

Today is 09-Nov-2023 10:42:26


In [4]:
nomic.login('7xDPkYXSYDc1_ErdTPIcoAR9RNd8YDlkS3nVNXcVoIMZ6') # This is a public demo account

In [5]:
!wget https://huggingface.co/spaces/ThirdEyeData/Semantic-Search/resolve/main/fine_food_reviews_with_embeddings_1k.csv

--2023-11-09 10:42:27--  https://huggingface.co/spaces/ThirdEyeData/Semantic-Search/resolve/main/fine_food_reviews_with_embeddings_1k.csv
Resolving huggingface.co (huggingface.co)... 52.85.242.8, 52.85.242.35, 52.85.242.84, ...
Connecting to huggingface.co (huggingface.co)|52.85.242.8|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/56/ee/56eec609e6e674d4ddc29404618b1577b6380cfddc0941873c0c018dd15096e1/0acc913f3deda7b91fcfb73e86a8780d490a54e33f2d2b9b6343078c45f0501b?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27fine_food_reviews_with_embeddings_1k.csv%3B+filename%3D%22fine_food_reviews_with_embeddings_1k.csv%22%3B&response-content-type=text%2Fcsv&Expires=1699785748&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5OTc4NTc0OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy81Ni9lZS81NmVlYzYwOWU2ZTY3NGQ0ZGRjMjk0MDQ2MThiMTU3N2I2MzgwY2ZkZGMwOTQx

In [6]:
csvfile = "fine_food_reviews_with_embeddings_1k.csv"

df = pd.read_csv(csvfile)
df

Unnamed: 0.1,Unnamed: 0,ProductId,UserId,Score,Summary,Text,combined,n_tokens,embedding
0,0,B003XPF9BO,A3R7JR3FMEBXQB,5,where does one start...and stop... with a tre...,Wanted to save some to bring to my Chicago fam...,Title: where does one start...and stop... wit...,52,"[0.007018072064965963, -0.02731654793024063, 0..."
1,297,B003VXHGPK,A21VWSCGW7UUAR,4,"Good, but not Wolfgang Puck good","Honestly, I have to admit that I expected a li...","Title: Good, but not Wolfgang Puck good; Conte...",178,"[-0.003140551969408989, -0.009995664469897747,..."
2,296,B008JKTTUA,A34XBAIFT02B60,1,Should advertise coconut as an ingredient more...,"First, these should be called Mac - Coconut ba...",Title: Should advertise coconut as an ingredie...,78,"[-0.01757248118519783, -8.266511576948687e-05,..."
3,295,B000LKTTTW,A14MQ40CCU8B13,5,Best tomato soup,I have a hard time finding packaged food of an...,Title: Best tomato soup; Content: I have a har...,111,"[-0.0013932279543951154, -0.011112828738987446..."
4,294,B001D09KAM,A34XBAIFT02B60,1,Should advertise coconut as an ingredient more...,"First, these should be called Mac - Coconut ba...",Title: Should advertise coconut as an ingredie...,78,"[-0.01757248118519783, -8.266511576948687e-05,..."
...,...,...,...,...,...,...,...,...,...
995,623,B0000CFXYA,A3GS4GWPIBV0NT,1,Strange inflammation response,Truthfully wasn't crazy about the taste of the...,Title: Strange inflammation response; Content:...,110,"[0.00011091353371739388, -0.00466986745595932,..."
996,624,B0001BH5YM,A1BZ3HMAKK0NC,5,My favorite and only MUSTARD,You've just got to experience this mustard... ...,Title: My favorite and only MUSTARD; Content:...,80,"[-0.020869314670562744, -0.013138455338776112,..."
997,625,B0009ET7TC,A2FSDQY5AI6TNX,5,My furbabies LOVE these!,Shake the container and they come running. Eve...,Title: My furbabies LOVE these!; Content: Shak...,47,"[-0.009749102406203747, -0.0068712360225617886..."
998,619,B007PA32L2,A15FF2P7RPKH6G,5,got this for the daughter,all i have heard since she got a kuerig is why...,Title: got this for the daughter; Content: all...,50,"[-0.00521062919870019, 0.0009606690146028996, ..."


In [8]:
from ast import literal_eval  # parses literals only, unlike eval

embeddings = np.array(df.embedding.apply(literal_eval).to_list())
embeddings

array([[ 7.01807206e-03, -2.73165479e-02,  1.05734831e-02, ...,
        -7.01120170e-03, -2.18614824e-02, -3.75671238e-02],
       [-3.14055197e-03, -9.99566447e-03, -3.48033849e-03, ...,
        -9.74494778e-03, -2.39829952e-03, -9.20392852e-03],
       [-1.75724812e-02, -8.26651158e-05, -1.15222773e-02, ...,
        -1.39020244e-02, -3.90170924e-02, -2.35151257e-02],
       ...,
       [-9.74910241e-03, -6.87123602e-03, -5.70622832e-03, ...,
        -3.00459806e-02, -8.14515445e-03, -1.95114054e-02],
       [-5.21062920e-03,  9.60669015e-04,  2.82862745e-02, ...,
        -5.38039953e-03, -1.33138765e-02, -2.71892995e-02],
       [-6.05782261e-03, -1.50158405e-02, -2.07575737e-03, ...,
        -2.90671214e-02, -1.41164539e-02, -2.28756946e-02]])
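The `embedding` column is stored in the CSV as text, so each cell has to be parsed back into a list of floats before the rows can be stacked into one array. A minimal sketch of that step on a toy frame follows; the two rows and their values are made up for illustration. It parses with `ast.literal_eval`, which accepts only Python literals and is therefore a safer choice than `eval` for content read from a file.

```python
from ast import literal_eval

import numpy as np
import pandas as pd

# Toy stand-in for the review data; the values are illustrative only.
toy = pd.DataFrame({
    "id": ["0", "1"],
    "embedding": ["[0.1, -0.2, 0.3]", "[0.0, 0.5, -0.5]"],
})

# Parse each stringified list, then stack the rows into a 2-D float array.
toy_embeddings = np.array(toy["embedding"].apply(literal_eval).to_list())

print(toy_embeddings.shape)
print(toy_embeddings.dtype)
```

The resulting array has one row per record and one column per embedding dimension, which is the layout the later upload cell passes as `embeddings`.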

In [9]:
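# The vectors now live in their own array, so drop the string column.
df = df.drop('embedding', axis=1)
# Use the unnamed CSV column as the record ID and store it as a string.
df = df.rename(columns={'Unnamed: 0': 'id'})
df['id'] = df['id'].astype(str)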

df.head()

Unnamed: 0,id,ProductId,UserId,Score,Summary,Text,combined,n_tokens
0,0,B003XPF9BO,A3R7JR3FMEBXQB,5,where does one start...and stop... with a tre...,Wanted to save some to bring to my Chicago fam...,Title: where does one start...and stop... wit...,52
1,297,B003VXHGPK,A21VWSCGW7UUAR,4,"Good, but not Wolfgang Puck good","Honestly, I have to admit that I expected a li...","Title: Good, but not Wolfgang Puck good; Conte...",178
2,296,B008JKTTUA,A34XBAIFT02B60,1,Should advertise coconut as an ingredient more...,"First, these should be called Mac - Coconut ba...",Title: Should advertise coconut as an ingredie...,78
3,295,B000LKTTTW,A14MQ40CCU8B13,5,Best tomato soup,I have a hard time finding packaged food of an...,Title: Best tomato soup; Content: I have a har...,111
4,294,B001D09KAM,A34XBAIFT02B60,1,Should advertise coconut as an ingredient more...,"First, these should be called Mac - Coconut ba...",Title: Should advertise coconut as an ingredie...,78


In [10]:
df.shape

(1000, 8)
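Before uploading, it can help to confirm that the embedding matrix and the metadata line up: one embedding row per record and a distinct ID per record. The sketch below is one way to check this locally. The helper `check_alignment` and the toy data are hypothetical and are not part of the nomic library, and the uniqueness check is an assumption about what an ID field should satisfy rather than a documented Atlas requirement.

```python
import numpy as np
import pandas as pd


def check_alignment(embeddings: np.ndarray, frame: pd.DataFrame, id_field: str) -> None:
    """Hypothetical pre-upload check; not part of the nomic API."""
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {embeddings.ndim} dimensions")
    if embeddings.shape[0] != len(frame):
        raise ValueError(f"{embeddings.shape[0]} embeddings but {len(frame)} records")
    if not frame[id_field].is_unique:
        raise ValueError(f"values in {id_field!r} are not unique")
    if not np.isfinite(embeddings).all():
        raise ValueError("embeddings contain NaN or infinite values")


# Toy data standing in for the real metadata and vectors.
toy_frame = pd.DataFrame({"id": ["0", "1", "2"], "Score": [5, 4, 1]})
toy_vectors = np.zeros((3, 4))

check_alignment(toy_vectors, toy_frame, id_field="id")  # passes silently
```

In the notebook itself, the equivalent call would take `embeddings`, `df` and `'id'`.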

In [11]:
project = atlas.map_embeddings(embeddings=embeddings,
                               data=df.to_dict('records'),
                               id_field='id',
                               colorable_fields=['Score'])
map = project.maps[0]

2023-11-09 10:42:43.197 | INFO     | nomic.project:_create_project:790 - Creating project `obsolete-occasion` in organization `Atlas Demo`
2023-11-09 10:42:49.226 | INFO     | nomic.atlas:map_embeddings:107 - Uploading embeddings to Atlas.
1it [00:11, 11.88s/it]
2023-11-09 10:43:01.607 | INFO     | nomic.project:_add_data:1422 - Upload succeeded.
2023-11-09 10:43:01.608 | INFO     | nomic.atlas:map_embeddings:126 - Embedding upload succeeded.
2023-11-09 10:43:03.141 | INFO     | nomic.project:create_index:1132 - Created map `obsolete-occasion` in project `obsolete-occasion`: https://atlas.nomic.ai/map/47c5d6b0-ed07-4708-9a36-c8b214a61b76/3662a65f-ff10-4096-9e08-0438505464e8
2023-11-09 10:43:03.142 | INFO     |

In [12]:
map

<img src="capture.png">
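For a rough local preview of how embeddings spread out in two dimensions, the vectors can also be projected on your own machine. The sketch below uses plain PCA via NumPy's SVD on random stand-in vectors. PCA is an assumption chosen here for simplicity; it is not the projection Atlas computes for its maps, and the function name `pca_2d` is hypothetical.

```python
import numpy as np


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project row vectors onto their top two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
stand_in = rng.normal(size=(100, 16))  # random stand-in for real embeddings
points = pca_2d(stand_in)

print(points.shape)
```

Each row of `points` is an (x, y) coordinate that could be passed to any scatter plot; with the notebook's data, `embeddings` would take the place of `stand_in`.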