# Objective

Test that we can both write to and read from an index across several production vectorstores.

In particular, the existing documentation often involves simply writing to an `index` (e.g., `from_documents`).

But, there are two gaps:

1) We don't confirm that we can also read from an existing index, which appears to be a problem with Weaviate today.

2) We also don't confirm integration with hosted instances in all cases (e.g., many examples are just local).
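
The write-then-read check that each section below performs can be sketched generically. This is only an illustration of the test pattern: `InMemoryStore` and `round_trip_ok` are hypothetical names, not LangChain APIs, and the in-memory class stands in for a hosted vectorstore (it matches by substring rather than by embedding).

```python
class InMemoryStore:
    """Hypothetical stand-in for a hosted index (substring match, no embeddings)."""

    _indexes = {}  # index_name -> list of texts, shared like a remote service

    def __init__(self, index_name):
        self.index_name = index_name

    @classmethod
    def from_texts(cls, texts, index_name):
        # "Write" path: store the texts under the index name
        cls._indexes[index_name] = list(texts)
        return cls(index_name)

    def similarity_search(self, query, k=1):
        # "Read" path: return up to k texts containing the query string
        texts = self._indexes.get(self.index_name, [])
        return [t for t in texts if query.lower() in t.lower()][:k]


def round_trip_ok(store_cls, texts, index_name, query):
    """Write with one handle, then query through a freshly constructed one."""
    store_cls.from_texts(texts, index_name=index_name)
    reader = store_cls(index_name)  # new handle onto the existing index
    return len(reader.similarity_search(query, k=1)) > 0


print(round_trip_ok(InMemoryStore, ["micrograd is an autograd engine"], "demo", "micrograd"))
```

The point of constructing a second handle is that a query through the object returned by `from_texts` only exercises the write path; the gap described above is about the read path.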


## Example Text

The example text comes from the Karpathy-GPT app, available [here](https://github.com/rlancemartin/karpathy-gpt/tree/main/eval).

In [1]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the course transcript and split it into chunks of at most 500 characters
with open('example_data/karpathy_course_all.txt') as f:
    course_txt = f.read()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_text(course_txt)

In [2]:
# Full course
len(splits)

1360

In [3]:
# Keep only the first 100 chunks for testing
splits = splits[0:100]

## Pinecone

* Use Pinecone console to create a new index with `index_name`
* Both [writing to and reading from](https://python.langchain.com/en/latest/reference/modules/vectorstores.html#langchain.vectorstores.Pinecone) `index_name` are supported

In [None]:
! pip install pinecone-client

In [42]:
import os
import pinecone
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings

# Auth
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY'),  
    environment="us-east1-gcp"  
)

# Write the splits to the index created in the Pinecone console
embeddings = OpenAIEmbeddings()
index_name = "karpathy-gpt"
vectorstore_new = Pinecone.from_texts(splits, embeddings, index_name=index_name)

In [43]:
# Read from index
vectorstore_pinecone = Pinecone.from_existing_index(index_name=index_name,embedding=embeddings)

In [49]:
# Query
query = "What is micrograd?"
matched_docs = vectorstore_pinecone.similarity_search(query)
matched_docs[0]

Document(lc_kwargs={'page_content': "Hello, my name is Andrej and I've been training deep neural networks for a bit more than a decade. And in this lecture I'd like to show you what neural network training looks like under the hood. So in particular we are going to start with a blank Jupyter notebook and by the end of this lecture we will define and train a neural net and you'll get to see everything that goes on under the hood and exactly sort of how that works on an intuitive level. Now specifically what I would like to do is I would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd

## Supabase


* Create a new project in [Supabase dashboard](https://supabase.com/dashboard/project/xhbejgrankzufmczyqil).
* In the project, go to the SQL editor on the left.
* We need to create a table to store our embeddings.
* We will use `pgvector`, an extension for PostgreSQL that allows you to both store and query vector embeddings.
* Create the table in the SQL editor with [this code](https://supabase.com/docs/guides/ai/langchain), modified below for our table name `karpathy_gpt`:

```
-- Enable the pgvector extension to work with embedding vectors
-- create extension vector;

-- Create a table to store your documents
create table karpathy_gpt (
  id bigserial primary key,
  content text, -- corresponds to Document.pageContent
  metadata jsonb, -- corresponds to Document.metadata
  embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed
);

-- Create a function to search for documents
CREATE OR REPLACE function match_documents (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb DEFAULT '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
#variable_conflict use_column
begin
  return query
  select
    id,
    content,
    metadata,
    1 - (karpathy_gpt.embedding <=> query_embedding) as similarity
  from karpathy_gpt
  where metadata @> filter
  order by karpathy_gpt.embedding <=> query_embedding
  limit match_count;
end;
$$;
```

* Now, the table is created!
* In the project, you can find `SUPABASE_URL` and `SUPABASE_SERVICE_KEY`, which we will use to connect to this table.
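
The `match_documents` function above orders rows with pgvector's `<=>` operator, which returns cosine distance, and reports `1 - distance` as `similarity`. The same arithmetic in plain Python, as an illustrative sketch (the helper names and the toy 2-dimensional vectors are made up for this example; real OpenAI embeddings have 1536 dimensions):

```python
import math


def cosine_distance(a, b):
    """Cosine distance, the quantity pgvector's `<=>` operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity(a, b):
    """Mirrors `1 - (embedding <=> query_embedding)` in match_documents."""
    return 1.0 - cosine_distance(a, b)


query = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Smallest distance first, like `order by embedding <=> query_embedding`
ranked = sorted(rows, key=lambda name: cosine_distance(query, rows[name]))
for name in ranked:
    print(name, round(similarity(query, rows[name]), 3))
```

Because cosine distance ignores vector length, `[2.0, 0.0]` is a perfect match for the query even though it is twice as long.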

In [None]:
! pip install supabase

In [36]:
import os
from langchain.vectorstores import SupabaseVectorStore
from langchain.embeddings.openai import OpenAIEmbeddings
from supabase.client import Client, create_client

# Auth
supabase_url = os.environ.get('supabase_url')
supabase_key = os.environ.get('supabase_key')
supabase: Client = create_client(supabase_url, supabase_key)

In [37]:
# Create new index
table_name="karpathy_gpt"
embeddings = OpenAIEmbeddings()
vectorstore_new = SupabaseVectorStore.from_texts(splits,embeddings,client=supabase,table_name=table_name)

In [38]:
# Read from index
vectorstore_supabase = SupabaseVectorStore(client=supabase,embedding=embeddings,table_name=table_name)

In [40]:
# Query
query = "What is micrograd?"
matched_docs = vectorstore_supabase.similarity_search(query,k=1)
matched_docs

[Document(lc_kwargs={'metadata': {}, 'page_content': "would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd is short for automatic gradient"}, page_content="would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basica

## Elastic

* Log into Elastic Cloud console at https://cloud.elastic.co
* Create deployment
* Go to the deployment page and `copy endpoint`

In [55]:
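# Auth (password read from the environment; ELASTIC_PASSWORD is an assumed variable name)
import os
elastic_endpoint = "langchain-test.es.us-central1.gcp.cloud.es.io"
elastic_password = os.environ.get('ELASTIC_PASSWORD')
elasticsearch_url = f"https://elastic:{elastic_password}@{elastic_endpoint}:9243"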

In [None]:
from langchain import ElasticVectorSearch
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
index_name = "karpathy-gpt"

# Create new index
vectorstore_new = ElasticVectorSearch.from_texts(splits, embeddings, 
                                                 elasticsearch_url=elasticsearch_url,
                                                 index_name=index_name)

In [58]:
# Check
query = "What is micrograd?"
matched_docs = vectorstore_new.similarity_search(query,k=1)
matched_docs

[Document(lc_kwargs={'page_content': "would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd is short for automatic gradient", 'metadata': {}}, page_content="would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basica

In [None]:
# Read from index
vectorstore_estc = ElasticVectorSearch(elasticsearch_url=elasticsearch_url, index_name=index_name, embedding=embeddings)

In [60]:
# Query
matched_docs = vectorstore_estc.similarity_search(query,k=1)
matched_docs

[Document(lc_kwargs={'page_content': "would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd is short for automatic gradient", 'metadata': {}}, page_content="would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basica

## Redis

Cloud:

* Create a database in Redis public cloud; ours has the endpoint `redis-16792.c302.asia-northeast1-1.gce.cloud.redislabs.com:16792`
* **Need**: Documentation on how to [connect](https://docs.redis.com/latest/rs/references/client_references/client_python/) to this database, because we still get auth errors.

--- 

Local:

```
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install redis
brew services start redis
```

In [25]:
! pip install redis



**Docker is needed to run `RediSearch` locally:**

`docker run -p 6379:6379 redislabs/redisearch:latest`

In [None]:
from langchain.vectorstores.redis import Redis

In [None]:
## Redis cannot be used as a vector database without RediSearch
redis_local = Redis.from_texts(splits,embeddings,redis_url="redis://localhost:6379")

`Cloud`

**Need to sort out auth errors and get crisp documentation here.**

In [None]:
### AUTH ERROR
import redis
r = redis.Redis(
    host='redis-16792.c302.asia-northeast1-1.gce.cloud.redislabs.com',
    port=16792, 
    password=os.environ.get('redis_pw'))

r.set('foo', 'bar')
value = r.get('foo')
print(value)

In [None]:
### AUTH ERROR
redis_url="redis://redis-16792.c302.asia-northeast1-1.gce.cloud.redislabs.com:16792"
vectorstore_new = Redis.from_texts(splits,embeddings,redis_url=redis_url,index_name='link')
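
One thing that may be worth checking for the second auth error: the `redis_url` above carries no credentials. redis-py accepts URLs of the form `redis://[username]:[password]@host:port`. The sketch below builds such a URL, assuming the password lives in the `redis_pw` environment variable used earlier and that the database uses the `default` user (an assumption about this instance). `build_redis_url` is a hypothetical helper, and this has not been verified against the hosted database.

```python
import os
from urllib.parse import quote


def build_redis_url(host, port, password, username="default"):
    """Return a redis:// URL with percent-encoded credentials."""
    # Encoding keeps characters such as '@' or ':' in the password from breaking the URL
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return f"redis://{user}:{pw}@{host}:{port}"


redis_url = build_redis_url(
    "redis-16792.c302.asia-northeast1-1.gce.cloud.redislabs.com",
    16792,
    os.environ.get("redis_pw", ""),  # empty string if the variable is unset
)
```

The resulting `redis_url` could then be passed to `Redis.from_texts(...)` in place of the credential-free URL.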

## Weaviate

* Create a new cluster in the [Weaviate dashboard](https://console.weaviate.cloud/dashboard).
* This gives you a URL: https://langchain-test-l73n8vle.weaviate.network
* `text_key` is the name of the text property in your Weaviate schema where the text of your documents is stored.
* It's used to find documents that are similar to a text query.

**Bug: https://github.com/hwchase17/langchain/issues/6121**

In [None]:
!pip install weaviate-client

In [None]:
import os
from weaviate import Client, auth
from langchain.vectorstores import Weaviate
from langchain.embeddings.openai import OpenAIEmbeddings

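# Auth (credentials read from the environment; the variable names are assumed)
weaviate_url = "https://langchain-test-l73n8vle.weaviate.network"
weaviate_auth = auth.AuthClientPassword(os.environ.get("WEAVIATE_USERNAME"), os.environ.get("WEAVIATE_PASSWORD"))
client = Client(url=weaviate_url, auth_client_secret=weaviate_auth)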

In [17]:
# Create and add texts
### Added logging to show the auto-generated index_name and text_key ###
embeddings = OpenAIEmbeddings()
vectorstore_new = Weaviate.from_texts(splits,embeddings,client=client)

Index!
LangChain_0b289ac1f5dd4543b76b9f9bdf047ef6
Text key: text_key!
text


In [18]:
query = "What is micrograd?"
matched_docs = vectorstore_new.similarity_search(query,k=1)
matched_docs

Index Name in similarity_search_by_vector
LangChain_0b289ac1f5dd4543b76b9f9bdf047ef6
Result
{'data': {'Get': {'LangChain_0b289ac1f5dd4543b76b9f9bdf047ef6': [{'text': "would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd is short for automatic gradient"}]}}}


[Document(lc_kwargs={'page_content': "would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd is short for automatic gradient", 'metadata': {}}, page_content="would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basica

In [21]:
# Connect to our index using the index_name and text_key we obtained from logging
index_name = "LangChain_0b289ac1f5dd4543b76b9f9bdf047ef6"
text_key = "text"
vectorstore_weviate = Weaviate(client=client,index_name=index_name,text_key=text_key)

In [22]:
matched_docs = vectorstore_weviate.similarity_search(query,k=1)
matched_docs

ValueError: Error during query: [{'locations': [{'column': 58, 'line': 1}], 'message': 'Unknown argument "nearText" on field "LangChain_0b289ac1f5dd4543b76b9f9bdf047ef6" of type "GetObjectsObj". Did you mean "nearVector" or "nearObject"?', 'path': None}]

In [None]:
# Let's try to supply an index name on initialization
index_name = "karpathy_gpt"
embeddings = OpenAIEmbeddings()
vectorstore_new = Weaviate.from_texts(splits, embeddings, client=client, index_name=index_name)

In [24]:
query = "What is micrograd?"
matched_docs = vectorstore_new.similarity_search(query,k=1)
matched_docs

Index Name in similarity_search_by_vector
karpathy_gpt
Result
{'data': {'Get': {'Karpathy_gpt': [{'text': "would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd is short for automatic gradient"}]}}}


KeyError: 'karpathy_gpt'
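
The `KeyError` above is consistent with Weaviate capitalizing the first letter of class names: the index was requested as `karpathy_gpt`, but the logged result is keyed by `Karpathy_gpt`. A possible workaround, sketched here with a hypothetical helper (`weaviate_class_name` is not part of LangChain or the Weaviate client), is to normalize the name before passing it as `index_name`. The `result` dictionary below is an abbreviated mock of the logged response, and this has not been tested against the cluster.

```python
def weaviate_class_name(index_name):
    """Upper-case only the first character, leaving the rest untouched."""
    return index_name[:1].upper() + index_name[1:]


# Abbreviated mock of the response shape logged above
result = {"data": {"Get": {"Karpathy_gpt": [{"text": "would like to take you through building of micrograd."}]}}}

index_name = weaviate_class_name("karpathy_gpt")
docs = result["data"]["Get"][index_name]  # no KeyError with the normalized name
print(index_name, len(docs))
```

The auto-generated name `LangChain_0b289ac1f5dd4543b76b9f9bdf047ef6` already starts with an upper-case letter, which would explain why the first query on `vectorstore_new` did not hit this error.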