# Objective

Test that we can both write to and read from an index across several production vectorstores.

In particular, the existing documentation often involves simply writing to an `index` (e.g., `from_documents`).

But, there are two gaps:

1) We don't confirm that we can also read an existing index, which appears to be a problem w/ Weaviate today.

2) We also don't confirm integration w/ hosted instances in all cases (e.g., many are just local). 
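
The write-then-read check described above can be sketched generically. This is only a minimal sketch, not part of LangChain: `round_trip_check`, `write_fn`, and `read_fn` are hypothetical names, and the only assumption about the store is that it exposes `similarity_search(query, k=...)`, as the stores used below do.

```python
from typing import Any, Callable, List


def round_trip_check(
    write_fn: Callable[[], Any],
    read_fn: Callable[[], Any],
    query: str,
    k: int = 1,
) -> List[Any]:
    """Write to an index, reconnect to it, and confirm a query returns results.

    write_fn: hypothetical callable that creates and populates the index
              (for example, a wrapper around `from_texts`).
    read_fn:  hypothetical callable that connects to the existing index
              without writing to it.
    """
    write_fn()            # step 1: write
    store = read_fn()     # step 2: fresh handle on the existing index
    docs = store.similarity_search(query, k=k)
    if not docs:
        raise AssertionError(f"no documents returned for query {query!r}")
    return docs
```

Keeping the write and read steps as separate callables makes it explicit that the query runs against a newly constructed handle rather than the object returned by the write.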


## Example Text

Example from Karpathy-GPT app [here](https://github.com/rlancemartin/karpathy-gpt/tree/main/eval).

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
course_txt = open('example_data/karpathy_course_all.txt').read()
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)
splits = text_splitter.split_text(course_txt)

In [None]:
# Full course
len(splits)

In [None]:
# Test
splits = splits[0:100]

## Pinecone

**Create index** 

* Use Pinecone console to create a new index with `index_name`
 
 ---
 
**Pinecone python client:**

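* [`Insert`](https://docs.pinecone.io/reference/upsert) by ID (each vector is passed as an `(id, values)` tuple):

```
pinecone.Index(index_name).upsert(vectors=list(zip(ids, vectors)))
```

* `Update` an existing entry by ID is also done with `upsert`: if the ID already exists in the index, the stored vector is overwritten; otherwise a new entry is inserted.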

* [`Delete`](https://docs.pinecone.io/reference/delete_post) by ID:

```
pinecone.Index(index_name).delete(ids=ids_to_delete)
```

---

**Langchain:**

`Write / Update`

* `from_texts` and `add_texts` both use `upsert`
* IDs can be supplied

```
ids = ids or [str(uuid.uuid4()) for _ in texts]
docs.append((ids[i], embedding, metadata))
self._index.upsert(vectors=docs, namespace=namespace, batch_size=batch_size)
```
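
The excerpt above pairs each text with an ID and an embedding before calling `upsert`. A minimal sketch of that pairing in isolation, with no Pinecone call, is shown here. `build_upsert_tuples` is a hypothetical helper, and storing the text in the metadata under `text_key` is an assumption of this sketch rather than something shown in the excerpt.

```python
import uuid
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[str, List[float], Dict[str, str]]


def build_upsert_tuples(
    texts: Sequence[str],
    embeddings: Sequence[List[float]],
    ids: Optional[Sequence[str]] = None,
    text_key: str = "text",  # assumed metadata key for the raw text
) -> List[Vector]:
    """Build (id, embedding, metadata) tuples, one per text."""
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    # Fall back to random UUIDs when the caller supplies no IDs.
    ids = list(ids) if ids else [str(uuid.uuid4()) for _ in texts]
    if len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")
    return [
        (ids[i], list(embeddings[i]), {text_key: text})
        for i, text in enumerate(texts)
    ]
```

Supplying stable IDs means a second run overwrites the same entries; with the random-UUID fallback, every run adds new entries.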

`Read`

* Supported from an existing index

`Delete`

* **Need support for delete**

`Update`

[Docs](https://python.langchain.com/en/latest/reference/modules/vectorstores.html#langchain.vectorstores.Pinecone)

In [None]:
! pip install pinecone-client

In [None]:
import os
import pinecone
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings

# Auth
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY'),  
    environment="us-east1-gcp"  
)

# Create new index
embeddings = OpenAIEmbeddings()
index_name = "karpathy-gpt"
vectorstore_new = Pinecone.from_texts(splits, embeddings, index_name=index_name)

In [None]:
# Read from index
vectorstore_pinecone = Pinecone.from_existing_index(index_name=index_name,embedding=embeddings)

In [None]:
# Query
query = "What is micrograd?"
matched_docs = vectorstore_pinecone.similarity_search(query)
matched_docs[0]

## Supabase

**Create index** 

* Create a new project in [Supabase dashboard](https://supabase.com/dashboard/project/xhbejgrankzufmczyqil).
* In the project, go to the SQL editor on the left.
* We need to create a table to store our embeddings.
* We will use `pgvector`, an extension for PostgreSQL that allows you to both store and query vector embeddings.
* Create the table in the SQL editor with [this code](https://supabase.com/docs/guides/ai/langchain), modified below for our table name `karpathy_gpt`:

```
-- Enable the pgvector extension to work with embedding vectors
-- create extension vector;

-- Create a table to store your documents
create table karpathy_gpt (
  id bigserial primary key,
  content text, -- corresponds to Document.pageContent
  metadata jsonb, -- corresponds to Document.metadata
  embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed
);

-- Create a function to search for documents
CREATE OR REPLACE function match_documents (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb DEFAULT '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
#variable_conflict use_column
begin
  return query
  select
    id,
    content,
    metadata,
    1 - (karpathy_gpt.embedding <=> query_embedding) as similarity
  from karpathy_gpt
  where metadata @> filter
  order by karpathy_gpt.embedding <=> query_embedding
  limit match_count;
end;
$$;
```

* Now, the table is created!
* In the project, you can find `SUPABASE_URL` and `SUPABASE_SERVICE_KEY`, which we will use to connect to this table.
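
In the `match_documents` function above, pgvector's `<=>` operator is cosine distance, so `1 - (embedding <=> query_embedding)` is cosine similarity. A minimal pure-Python sketch of the same score, useful for sanity-checking values returned by the function (the helper names are hypothetical):

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Mirror of `1 - (embedding <=> query_embedding)` in match_documents."""
    return 1.0 - cosine_distance(a, b)
```

Identical directions score 1, orthogonal vectors score 0, and opposite directions score -1, so ordering by distance ascending is the same as ordering by similarity descending.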

---

**Python client:**

* [`Insert`](https://supabase.com/docs/reference/python/insert) by ID:

```
from supabase import create_client

client = create_client(supabase_url, supabase_key)
data = {'id': 'custom_id', 'name': 'John Doe', 'age': 30}
response = client.table(table).insert(data).execute()
```

* `Update` an existing entry by ID:
```
client = create_client(supabase_url, supabase_key)
response = client.table(table).update(data).eq('id', 'your_id').execute()
```

* `Delete` by ID:
```
response = client.table(table).delete().eq('id', 'your_id').execute()
```

---

**Langchain:**

`Write / Update` 

* `from_texts` and `add_texts` both use `insert`
* **Need support for ID-wise write and update**
```
result = client.from_(table_name).insert(chunk).execute()
```

`Read` 

* Supported from an existing index


`Delete`

* **Need support for delete**


In [None]:
! pip install supabase

In [None]:
import os
from langchain.vectorstores import SupabaseVectorStore
from langchain.embeddings.openai import OpenAIEmbeddings
from supabase.client import Client, create_client
# Auth (uses the SUPABASE_URL and SUPABASE_SERVICE_KEY values from the project settings)
supabase_url = os.environ.get('SUPABASE_URL')
supabase_key = os.environ.get('SUPABASE_SERVICE_KEY')
supabase: Client = create_client(supabase_url, supabase_key)

In [None]:
# Create new index
table_name="karpathy_gpt"
embeddings = OpenAIEmbeddings()
vectorstore_new = SupabaseVectorStore.from_texts(splits,embeddings,client=supabase,table_name=table_name)

In [None]:
# Read from index
vectorstore_supabase = SupabaseVectorStore(client=supabase,embedding=embeddings,table_name=table_name)

In [None]:
# Query
query = "What is micrograd?"
matched_docs = vectorstore_supabase.similarity_search(query,k=1)
matched_docs

## Elastic

**Create index** 

* Log into Elastic Cloud console at https://cloud.elastic.co
* Create deployment
* Go to the deployment page and `copy endpoint`

---

**Python client:**

* [`Bulk`](https://elasticsearch-py.readthedocs.io/en/7.x/helpers.html) to add or update documents by specifying the document ID in the request dictionary

* `Delete` by ID:

```
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
for document_id in document_ids:
    es.delete(index=index_name, id=document_id)
```
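
The `Bulk` helper mentioned above takes an iterable of action dictionaries, where `_op_type`, `_index`, and `_id` are metadata fields and the remaining keys form the document body. A minimal sketch that only builds those dictionaries, without contacting a cluster, might look like this. `index_actions` and `delete_actions` are hypothetical helper names.

```python
from typing import Dict, Iterable, Iterator


def index_actions(
    index_name: str, ids: Iterable[str], docs: Iterable[Dict]
) -> Iterator[Dict]:
    """Yield one `index` action per document, keyed by an explicit `_id`."""
    for _id, doc in zip(ids, docs):
        yield {"_op_type": "index", "_index": index_name, "_id": _id, **doc}


def delete_actions(index_name: str, ids: Iterable[str]) -> Iterator[Dict]:
    """Yield one `delete` action per document ID."""
    for _id in ids:
        yield {"_op_type": "delete", "_index": index_name, "_id": _id}
```

Either generator can be passed to `elasticsearch.helpers.bulk(es, actions)`. Because the `_id` is explicit, an `index` action for an ID that already exists replaces that document instead of adding a duplicate.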

---

**Langchain:**

`Write / Update` 

* `from_texts` and `add_texts` both use `bulk()` with an ID passed

```
for i, text in enumerate(texts):
    metadata = metadatas[i] if metadatas else {}
    _id = str(uuid.uuid4())
    request = {
        "_op_type": "index",
        "_index": self.index_name,
        "vector": embeddings[i],
        "text": text,
        "metadata": metadata,
        "_id": _id,
    }
    ids.append(_id)
    requests.append(request)
bulk(self.client, requests)
```

`Read` 

* Supported from an existing index
 
`Delete`

* **Need support for delete**

In [None]:
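# Auth (the password is read from an environment variable; the variable name is illustrative)
import os
elastic_endpoint = "langchain-test.es.us-central1.gcp.cloud.es.io"
elasticsearch_url = f"https://elastic:{os.environ['ELASTIC_PASSWORD']}@{elastic_endpoint}:9243"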

In [None]:
from langchain import ElasticVectorSearch
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
index_name = "karpathy-gpt"

# Create new index
vectorstore_new = ElasticVectorSearch.from_texts(splits, embeddings, 
                                                 elasticsearch_url=elasticsearch_url,
                                                 index_name=index_name)

In [None]:
# Check
query = "What is micrograd?"
matched_docs = vectorstore_new.similarity_search(query,k=1)
matched_docs

In [None]:
# Read from index
vectorstore_estc = ElasticVectorSearch(elasticsearch_url=elasticsearch_url, index_name=index_name, embedding=embeddings)

In [None]:
# Query
matched_docs = vectorstore_estc.similarity_search(query,k=1)
matched_docs

## Weaviate

**Create index** 

* Create a new cluster in [Weaviate dashboard](https://console.weaviate.cloud/dashboard).
* This gives you a url: https://langchain-test-l73n8vle.weaviate.network
* `text_key` is the name of the text property in your Weaviate schema where the text of your documents is stored. 
* It's used to find documents that are similar to a text query.

A few notes:

* Index names [must be capitalized](https://github.com/weaviate/weaviate/issues/3132#event-9524209890)
* Be sure to pass `by_text=False` in the client [when connecting to an existing index](https://github.com/weaviate/weaviate/issues/3142#event-9541172186)
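
The capitalization note above can be enforced before an index is created. A minimal sketch, assuming only the first character matters; `to_weaviate_class_name` is a hypothetical helper and does not validate the remaining characters:

```python
def to_weaviate_class_name(index_name: str) -> str:
    """Return index_name with its first character upper-cased."""
    if not index_name:
        raise ValueError("index_name must be non-empty")
    # str.capitalize() would also lower-case the rest of the name,
    # so only the first character is changed here.
    return index_name[0].upper() + index_name[1:]
```

For example, `to_weaviate_class_name("karpathy_gpt")` gives `"Karpathy_gpt"`, the index name used in the cells below.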

---

**Python client:**

* `add_data_object` is used to add a data object (either creating a new one or updating an existing one based on its ID)
* `delete` is easily handled by ID

```
from weaviate import Client

weaviate_url = "http://your-weaviate-url:8080"
client = Client(weaviate_url)
client.data_object.delete(uuid=data_object_uuid)
```

**Langchain:**

* `from_texts` and `add_texts` both use `add_data_object()` with an ID passed

```
# If the UUID of one of the objects already exists
# then the existing object will be replaced by the new object.
if "uuids" in kwargs:
    _id = kwargs["uuids"][i]
else:
    _id = get_valid_uuid(uuid4())

# if an embedding strategy is not provided, we let
# weaviate create the embedding. Note that this will only
# work if weaviate has been installed with a vectorizer module
# like text2vec-contextionary for example
params = {
    "uuid": _id,
    "data_object": data_properties,
    "class_name": index_name,
}
if embeddings is not None:
    params["vector"] = embeddings[i]

batch.add_data_object(**params)
```
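
Because an object whose UUID already exists is replaced, IDs derived deterministically from the text make repeated writes idempotent. A minimal sketch using `uuid.uuid5` follows. The namespace value and the helper names are hypothetical, and the resulting list is intended for the `uuids` keyword read in the excerpt above.

```python
import uuid
from typing import List, Sequence

# Hypothetical project namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "karpathy-gpt.example")


def text_to_uuid(text: str) -> str:
    """Derive a stable UUID from the text content."""
    return str(uuid.uuid5(NAMESPACE, text))


def uuids_for(texts: Sequence[str]) -> List[str]:
    """One deterministic UUID per text, in the same order."""
    return [text_to_uuid(t) for t in texts]
```

One trade-off of this approach is that two identical chunks map to the same UUID, so they collapse into a single object.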

`Read` 

* Supported from an existing index
 
`Delete`

* **Need support for delete**

In [None]:
!pip install weaviate-client

In [None]:
import os
from weaviate import Client, auth
from langchain.vectorstores import Weaviate
from langchain.embeddings.openai import OpenAIEmbeddings

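# Auth (credentials are read from environment variables; the variable names are illustrative)
weaviate_url = "https://langchain-test-l73n8vle.weaviate.network"
client = Client(
    url=weaviate_url,
    auth_client_secret=auth.AuthClientPassword(
        os.environ["WEAVIATE_USERNAME"], os.environ["WEAVIATE_PASSWORD"]
    ),
)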

In [None]:
# Create and add texts
embeddings = OpenAIEmbeddings()
index_name = "Karpathy_gpt"
vectorstore_new = Weaviate.from_texts(
    splits, embeddings, client=client, index_name=index_name, text_key="text"
)

In [None]:
# Check
query = "What is micrograd?"
matched_docs = vectorstore_new.similarity_search(query,k=1)
matched_docs

In [None]:
# Read from index
vectorstore_weviate = Weaviate(
    client=client,
    index_name=index_name,
    text_key="text",
    by_text=False,
    embedding=embeddings,
)

In [None]:
# Query
query = "What is micrograd?"
matched_docs = vectorstore_weviate.similarity_search(query,k=1)
matched_docs

## Redis

< TO FINISH > 

Cloud -

* Create database in Redis public cloud, which has endpoint: `redis-16792.c302.asia-northeast1-1.gce.cloud.redislabs.com:16792`
* **Need**: Documentation on how to [connect](https://docs.redis.com/latest/rs/references/client_references/client_python/) to this because we still get auth errors.
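
One thing worth ruling out when debugging connection URLs is unescaped credentials: characters such as `@`, `#`, `%`, or `:` in a username or password must be percent-encoded inside a URL. The sketch below builds a `redis://` URL that way. `build_redis_url` is a hypothetical helper, and this is not a confirmed fix for the auth errors noted above.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: int, password: str, username: str = "") -> str:
    """Build a redis:// URL with percent-encoded credentials."""
    # safe="" makes quote() escape every reserved character, including "/".
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"redis://{user}:{pwd}@{host}:{port}"
```

Parsing the result with `urllib.parse.urlsplit` and unquoting the `username` and `password` fields is a quick way to confirm that the credentials survive the round trip.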

--- 

Local - 
  
```
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install redis
brew services start redis
```

In [None]:
! pip install redis

In [None]:
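import os
from urllib.parse import quote
from langchain.vectorstores.redis import Redis

# Credentials come from environment variables (the variable names are illustrative).
# Percent-encode them: a character such as "@" in a username would otherwise break URL parsing.
username = quote(os.environ["REDIS_USERNAME"], safe="")
password = quote(os.environ["REDIS_PASSWORD"], safe="")
public_endpoint = "redis-18547.c1.us-central1-2.gce.cloud.redislabs.com:18547"
redis_url = f"redis://{username}:{password}@{public_endpoint}"
vectorstore_new = Redis.from_texts(splits, embeddings, redis_url=redis_url, index_name='link')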

In [None]:
### AUTH ERROR
import os
import urllib.parse
# The password is read from an environment variable (name is illustrative) and percent-encoded
password_encoded = urllib.parse.quote(os.environ["REDIS_PASSWORD"], safe="")
redis_url = f'redis://:{password_encoded}@redis-16792.c302.asia-northeast1-1.gce.cloud.redislabs.com:16792'