# Objective

We want to support three common methods across vector DB interfaces:

(1) `delete by ids`: Delete documents by their IDs.

(2) `update by ids`: Update documents by their IDs.

(3) `add by ids`: Add documents with their IDs.
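
As a rough illustration of the intended semantics, here is a minimal sketch using a toy in-memory store. The class `InMemoryStore` is hypothetical and keeps raw text only (no embeddings); the method names mirror the ones exercised later in this notebook. Whether `update` should fail on unknown IDs is an assumption made here for clarity, not something every backend enforces.

```python
from typing import Dict, Sequence


class InMemoryStore:
    """Hypothetical toy store: maps document ID -> text, no embeddings."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def add_documents_by_id(self, documents: Sequence[str], ids: Sequence[str]) -> None:
        # Each document is stored under the caller-supplied ID.
        if len(documents) != len(ids):
            raise ValueError("documents and ids must have the same length")
        for doc_id, doc in zip(ids, documents):
            self.docs[doc_id] = doc

    def update_documents_by_id(self, documents: Sequence[str], ids: Sequence[str]) -> None:
        # Assumption: updating an ID that was never added is an error.
        missing = [doc_id for doc_id in ids if doc_id not in self.docs]
        if missing:
            raise KeyError(f"unknown ids: {missing}")
        self.add_documents_by_id(documents, ids)

    def delete_by_id(self, ids: Sequence[str]) -> None:
        # Deleting an unknown ID is silently ignored in this sketch.
        for doc_id in ids:
            self.docs.pop(doc_id, None)


store = InMemoryStore()
store.add_documents_by_id(["foo", "bar", "baz"], ids=["1", "2", "3"])
store.update_documents_by_id(["biz"], ids=["3"])
store.delete_by_id(ids=["1"])
print(store.docs)  # {'2': 'bar', '3': 'biz'}
```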


## Example Text

Example text from the Karpathy-GPT app, available [here](https://github.com/rlancemartin/karpathy-gpt/tree/main/eval).

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
course_txt = open('/Users/31treehaus/Desktop/AI/karpathy-gpt/eval/karpathy_course_all.txt').read()
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)
splits = text_splitter.split_text(course_txt)

In [2]:
# Full course
len(splits)

1360

In [3]:
# Test
splits = splits[0:100]

## Pinecone

**Create index** 

* Use Pinecone console to create a new index with `index_name`
 
---
 
**Pinecone python client:**

(1) `delete by ids`: Delete documents by their IDs.

* [`Delete`](https://docs.pinecone.io/reference/delete_post) by ID:
```
pinecone.Index(index_name).delete(ids=ids_to_delete)
```

(2) `update by ids`: Update documents by their IDs.

* [`Upsert`](https://docs.pinecone.io/reference/upsert) by ID (an existing ID is overwritten):

```
# upsert expects (id, values) pairs, so pair each ID with its embedding
pinecone.Index(index_name).upsert(vectors=list(zip(ids, vectors)))
```

(3) `add by ids`: Add documents with their IDs.

* [`Upsert`](https://docs.pinecone.io/reference/upsert) by ID (a new ID is inserted):

```
# upsert expects (id, values) pairs, so pair each ID with its embedding
pinecone.Index(index_name).upsert(vectors=list(zip(ids, vectors)))
```
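
Add and update map to the same upsert operation, so the only difference is whether the ID already exists. A minimal sketch of that behaviour, using a plain dict as a stand-in for the index (the `upsert` helper here is hypothetical and is not the Pinecone client):

```python
from typing import Dict, List, Tuple


def upsert(index: Dict[str, List[float]], vectors: List[Tuple[str, List[float]]]) -> int:
    """Insert each (id, values) pair, overwriting any entry with the same id."""
    for vec_id, values in vectors:
        index[vec_id] = values
    return len(vectors)


index: Dict[str, List[float]] = {}
upsert(index, [("1", [0.1, 0.2]), ("2", [0.3, 0.4])])  # IDs are new: acts as "add"
upsert(index, [("2", [0.9, 0.9])])                     # ID exists: acts as "update"
print(index)  # {'1': [0.1, 0.2], '2': [0.9, 0.9]}
```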

---

**Langchain:**

(1) `delete by ids`: Delete documents by their IDs.

* Create new method

(2) `update by ids`: Update documents by their IDs.

* `add_texts` is using `upsert` with IDs optionally supplied
* Create a new method that calls `add_texts`

(3) `add by ids`: Add documents with their IDs.

* `add_texts` is using `upsert` with IDs optionally supplied
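
One way the three methods could be layered on top of an upserting `add_texts` is sketched below. `Doc` and `FakeUpsertStore` are hypothetical stand-ins so the example runs without Pinecone or LangChain; the stubbed `add_texts` only imitates the "upsert with optional IDs" behaviour noted above and is not the real wrapper.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeUpsertStore:
    """Stub store whose add_texts upserts, with IDs optionally supplied."""

    def __init__(self) -> None:
        self.rows: Dict[str, Doc] = {}

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        ids: Optional[List[str]] = None,
    ) -> List[str]:
        texts = list(texts)
        ids = ids or [str(uuid.uuid4()) for _ in texts]  # generate IDs if none given
        metadatas = metadatas or [{} for _ in texts]
        for doc_id, text, meta in zip(ids, texts, metadatas):
            self.rows[doc_id] = Doc(text, meta)  # upsert: overwrite on same ID
        return ids

    def add_documents_by_id(self, documents: List[Doc], ids: List[str]) -> List[str]:
        texts = [d.page_content for d in documents]
        metadatas = [d.metadata for d in documents]
        return self.add_texts(texts, metadatas, ids)

    def update_documents_by_id(self, documents: List[Doc], ids: List[str]) -> List[str]:
        # With an upserting backend, update is the same call as add.
        return self.add_documents_by_id(documents, ids)

    def delete_by_id(self, ids: List[str]) -> None:
        for doc_id in ids:
            self.rows.pop(doc_id, None)


fake = FakeUpsertStore()
fake.add_documents_by_id([Doc("foo"), Doc("bar"), Doc("baz")], ids=["1", "2", "3"])
fake.update_documents_by_id([Doc("biz")], ids=["3"])
print(fake.rows["3"].page_content)  # biz
fake.delete_by_id(ids=["1", "2", "3"])
print(len(fake.rows))  # 0
```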

In [None]:
! pip install pinecone-client

In [4]:
import os
import pinecone
from langchain.vectorstores import Pinecone
from langchain.docstore.document import Document
from langchain.embeddings.openai import OpenAIEmbeddings

# Auth
pinecone.init(
    api_key="xxx",  
    environment="us-east1-gcp"  
)

# Create index
embeddings = OpenAIEmbeddings()
index_name = "karpathy-gpt"
# vectorstore_new = Pinecone.from_texts(splits, embeddings, index_name=index_name)

In [8]:
# Read index
vectorstore_pinecone = Pinecone.from_existing_index(index_name=index_name,embedding=embeddings)

In [9]:
# Add by IDs
docs=[
Document(page_content='foo',metadata={"source":'/docs'}),
Document(page_content='bar',metadata={"source":'/docs'}),
Document(page_content='baz',metadata={"source":'/docs'}),
]
vectorstore_pinecone.add_documents_by_id(documents=docs,ids=["1","2","3"])
vectorstore_pinecone.similarity_search("Foo Bar Baz",k=3)

Upserted vectors:   0%|          | 0/3 [00:00<?, ?it/s]

[Document(page_content='baz', metadata={'source': '/docs'}),
 Document(page_content='foo', metadata={'source': '/docs'}),
 Document(page_content='bar', metadata={'source': '/docs'})]

In [10]:
# Update by IDs 
docs=[
Document(page_content='foo',metadata={"source":'/docs'}),
Document(page_content='bar',metadata={"source":'/docs'}),
Document(page_content='biz',metadata={"source":'/docs'}),
]
vectorstore_pinecone.update_documents_by_id(documents=docs,ids=["1","2","3"])
vectorstore_pinecone.similarity_search("Foo Bar Baz",k=3)

Upserted vectors:   0%|          | 0/3 [00:00<?, ?it/s]

[Document(page_content='foo', metadata={'source': '/docs'}),
 Document(page_content='bar', metadata={'source': '/docs'}),
 Document(page_content='biz', metadata={'source': '/docs'})]

In [11]:
# Delete by IDs
vectorstore_pinecone.delete_by_id(ids=["1","2","3"])
vectorstore_pinecone.similarity_search("Foo Bar Baz",k=3)

[Document(page_content="but what I did also is I printed 10,000 characters, so a lot more, and I wrote them to a file. And so here we see some of the outputs. So it's a lot more recognizable as the input text file. So the input text file, just for reference, looked like this. So there's always someone speaking in this manner. And our predictions now take on that form. Except, of course, they're nonsensical when you actually read them. So it is, every crimpty be house. Oh, those prepation. We give heed. You know. Oh, ho, sent me you mighty lord. Anyway, so you can read through this. It's nonsensical, of course, but this is just a transformer trained on the character level for 1 million characters that come from Shakespeare. So there's sort of like blabbers on in Shakespeare-like manner, but it doesn't, of course, make sense at this scale. But I think still a pretty good demonstration of what's possible. So now I think that kind of concludes the programming section of this video. We basi

## Supabase

**Create index** 

* Create a new project in the [Supabase dashboard](https://supabase.com/dashboard/project/xhbejgrankzufmczyqil).
* In the project, go to the SQL editor on the left.
* We need to create a table to store our embeddings.
* We will use `pgvector`, an extension for PostgreSQL that allows you to both store and query vector embeddings.
* Create the table in the SQL editor with [this code](https://supabase.com/docs/guides/ai/langchain), modified below for our table name `karpathy_gpt`:

```
-- Enable the pgvector extension to work with embedding vectors
-- create extension vector;

-- Create a table to store your documents
create table karpathy_gpt (
  id bigserial primary key,
  content text, -- corresponds to Document.pageContent
  metadata jsonb, -- corresponds to Document.metadata
  embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed
);

-- Create a function to search for documents
create or replace function match_documents (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb default '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
#variable_conflict use_column
begin
  return query
  select
    id,
    content,
    metadata,
    1 - (karpathy_gpt.embedding <=> query_embedding) as similarity
  from karpathy_gpt
  where metadata @> filter
  order by karpathy_gpt.embedding <=> query_embedding
  limit match_count;
end;
$$;
```

* Now, the table is created!
* In the project, you can find `SUPABASE_URL` and `SUPABASE_SERVICE_KEY`, which we will use to connect to this table.
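
The ranking inside `match_documents` can be read as: keep rows whose metadata contains the filter (`@>`), order by cosine distance (`<=>`), and report `1 - distance` as the similarity. Below is a minimal pure-Python sketch of that logic, assuming in-memory rows shaped like the `karpathy_gpt` table. The helper names are hypothetical, and the metadata check only compares top-level keys, which is a simplification of jsonb containment.

```python
import math
from typing import Dict, List, Optional


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance, the quantity pgvector's <=> operator computes (non-zero vectors assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def match_documents_py(
    rows: List[Dict],
    query_embedding: List[float],
    match_count: Optional[int] = None,
    metadata_filter: Optional[Dict] = None,
) -> List[Dict]:
    metadata_filter = metadata_filter or {}
    # Simplified stand-in for `metadata @> filter`: top-level keys only.
    candidates = [
        r for r in rows
        if all(k in r["metadata"] and r["metadata"][k] == v for k, v in metadata_filter.items())
    ]
    # Smallest distance first, as in `order by embedding <=> query_embedding`.
    ranked = sorted(candidates, key=lambda r: cosine_distance(r["embedding"], query_embedding))
    if match_count is not None:
        ranked = ranked[:match_count]
    return [
        {
            "id": r["id"],
            "content": r["content"],
            "metadata": r["metadata"],
            "similarity": 1.0 - cosine_distance(r["embedding"], query_embedding),
        }
        for r in ranked
    ]


rows = [
    {"id": 1, "content": "foo", "metadata": {"source": "/docs"}, "embedding": [1.0, 0.0]},
    {"id": 2, "content": "bar", "metadata": {"source": "/docs"}, "embedding": [0.0, 1.0]},
    {"id": 3, "content": "baz", "metadata": {"source": "/other"}, "embedding": [1.0, 1.0]},
]
hits = match_documents_py(rows, [1.0, 0.0], match_count=2, metadata_filter={"source": "/docs"})
print([h["id"] for h in hits])  # [1, 2]
```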

---

**Python client:**

(1) `delete by ids`: Delete documents by their IDs.

```
client = create_client(supabase_url, supabase_key)
response = client.table(table).delete().eq('id', 'your_id').execute()
```

(2) `update by ids`: Update documents by their IDs.

```
client = create_client(supabase_url, supabase_key)
# Select the row with .eq(); the request is only sent by .execute()
response = client.table(table).update(data).eq('id', 'your_id').execute()
```

(3) `add by ids`: Add documents with their IDs.

* [`Insert`](https://supabase.com/docs/reference/python/insert) by ID:

```
client = create_client(supabase_url, supabase_key)
data = {'id': 'custom_id', 'name': 'John Doe', 'age': 30}
response = client.table(table).insert(data).execute()
```

---

**Langchain:**

(1) `delete by ids`: Delete documents by their IDs.

* Create new method

(2) `update by ids`: Update documents by their IDs.

* Create new method using `update`

(3) `add by ids`: Add documents with their IDs.

* `add_texts` calls `insert` but, as far as I can tell, does not accept IDs:

```
result = client.from_(table_name).insert(chunk).execute()
```
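
To support caller-supplied IDs, the rows passed to `insert` would need an explicit `id` field. A minimal sketch of how such rows might be built and batched is shown below. `build_rows` and `chunked` are hypothetical helpers, the column names follow the `karpathy_gpt` table above, and since that table declares `id` as `bigserial`, IDs are assumed to be integer-valued.

```python
from typing import Dict, Iterator, List, Sequence


def build_rows(
    texts: Sequence[str],
    embeddings: Sequence[List[float]],
    metadatas: Sequence[dict],
    ids: Sequence[str],
) -> List[Dict]:
    """Pair each text with its embedding, metadata, and caller-supplied ID."""
    if not (len(texts) == len(embeddings) == len(metadatas) == len(ids)):
        raise ValueError("texts, embeddings, metadatas and ids must have the same length")
    return [
        {"id": doc_id, "content": text, "embedding": emb, "metadata": meta}
        for doc_id, text, emb, meta in zip(ids, texts, embeddings, metadatas)
    ]


def chunked(rows: List[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield successive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows_with_ids = build_rows(
    texts=["foo", "bar", "baz"],
    embeddings=[[0.1], [0.2], [0.3]],  # placeholder vectors, not real embeddings
    metadatas=[{"source": "/docs"}] * 3,
    ids=["1000", "1002", "1003"],
)
batches = list(chunked(rows_with_ids, size=2))
print([len(b) for b in batches])  # [2, 1]
```

Each batch could then be passed to the `insert(chunk)` call shown above.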

In [None]:
! pip install supabase

In [2]:
from langchain.docstore.document import Document
from langchain.vectorstores import SupabaseVectorStore
from langchain.embeddings.openai import OpenAIEmbeddings
from supabase.client import Client, create_client

# Auth
supabase_url = "https://xhbejgrankzufmczyqil.supabase.co"
supabase_key = "xxx"
supabase: Client = create_client(supabase_url, supabase_key)

In [3]:
# Create new index
table_name="karpathy_gpt"
embeddings = OpenAIEmbeddings()
# vectorstore_new = SupabaseVectorStore.from_texts(splits,embeddings,client=supabase,table_name=table_name)

In [4]:
# Read from index
vectorstore_supabase = SupabaseVectorStore(client=supabase,embedding=embeddings,table_name=table_name)

In [8]:
# Add by IDs
docs=[
Document(page_content='foo',metadata={"source":'/docs'}),
Document(page_content='bar',metadata={"source":'/docs'}),
Document(page_content='baz',metadata={"source":'/docs'}),
]
vectorstore_supabase.add_documents_by_id(documents=docs,ids=["1000","1002","1003"])
vectorstore_supabase.similarity_search("Foo Bar Baz",k=3)

[Document(page_content='baz', metadata={'source': '/docs'}),
 Document(page_content='foo', metadata={'source': '/docs'}),
 Document(page_content='bar', metadata={'source': '/docs'})]

In [5]:
# Update by IDs 
docs=[
Document(page_content='foo',metadata={"source":'/docs'}),
Document(page_content='bar',metadata={"source":'/docs'}),
Document(page_content='biz',metadata={"source":'/docs'}),
]
vectorstore_supabase.update_documents_by_id(documents=docs,ids=["1000","1002","1003"])
vectorstore_supabase.similarity_search("Foo Bar Baz",k=3)

[Document(page_content='foo', metadata={'source': '/docs'}),
 Document(page_content='bar', metadata={'source': '/docs'}),
 Document(page_content='biz', metadata={'source': '/docs'})]

In [5]:
# Delete by IDs
vectorstore_supabase.delete_by_id(ids=["1000","1002","1003"])
vectorstore_supabase.similarity_search("Foo Bar Baz",k=3)

[Document(page_content="the left here. So this is A, B creating E. And then E plus C creates D, just like we have it here. And finally, let's make this expression just one layer deeper. So D will not be the final output node. Instead, after D, we are going to create a new value object called F. We're going to start running out of variables soon. F will be negative 2.0. And its label will, of course, just be F. And then L, capital L, will be the output of our graph. And L will be T times F. OK. So L will be negative 8,", metadata={}),
 Document(page_content="save it in each node. And then here, we're going to do label as A, label as B, label as C. And then let's create a special E equals A times B. And E dot label will be E. It's kind of naughty. And E will be E plus C. And a D dot label will be B. OK. So nothing really changes. I just added this new E function, new E variable. And then here, when we are printing this, I'm going to print the label here. So this will be a percent S bar. 