## Creating an index and populating it with documents using Redis

A simple example showing how to ingest PDF documents, then web page content, into a Redis vector store.

Requirements:
- A Redis cluster
- A Redis database with at least 2 GB of memory (to match the initial index cap)

### Base parameters: Redis connection info

In [1]:
redis_url = "redis://default:O4jGUYsT@my-doc-headless.redis.svc.cluster.local:14845"
index_name = "docs"
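The connection URL packs the user, password, host, and port into a single string. As a minimal sketch, the helper below splits such a URL with the standard library so its parts can be checked before connecting. The `describe_redis_url` helper, the `REDIS_URL` environment variable name, and the placeholder fallback URL are assumptions made for illustration; none of them is part of this notebook or of LangChain.

```python
import os
from urllib.parse import urlparse


def describe_redis_url(url):
    """Split a Redis URL into its parts without echoing the password."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "has_password": parts.password is not None,
        "host": parts.hostname,
        "port": parts.port,
    }


# REDIS_URL is a hypothetical variable name; the fallback is a placeholder value.
example_url = os.environ.get("REDIS_URL", "redis://default:changeme@localhost:6379")
print(describe_redis_url(example_url))
```

Reading the URL from the environment is one way to keep the password out of the notebook itself.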

#### Imports

In [2]:
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores.redis import Redis

## Initial index creation and document ingestion

#### Document loading from a folder containing PDFs

In [3]:
pdf_folder_path = 'pdf'

loader = PyPDFDirectoryLoader(pdf_folder_path)
docs = loader.load()

#### Split documents into chunks with some overlap

In [4]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024,
                                               chunk_overlap=40)
all_splits = text_splitter.split_documents(docs)
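To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch using fixed-size character windows. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which first tries to split on separators such as paragraph and line breaks; the `chunk_with_overlap` function is a hypothetical helper written only to show how consecutive chunks share text.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows; each window repeats the last
    `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

With these toy values, each chunk starts with the last three characters of the previous one. The overlap gives a sentence cut at a chunk boundary a chance to appear whole in at least one chunk.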

#### Create the index and ingest the documents

In [5]:
embeddings = HuggingFaceEmbeddings()
rds = Redis.from_documents(all_splits,
                           embeddings,
                           redis_url=redis_url,
                           index_name=index_name)

#### Write the schema to a YAML file so the index can be reopened later

In [6]:
rds.write_schema("redis_schema.yaml")

## Ingesting new documents

#### Example with web pages

In [7]:
from langchain.document_loaders import WebBaseLoader

In [8]:
loader = WebBaseLoader(["https://ai-on-openshift.io/getting-started/openshift/",
                        "https://ai-on-openshift.io/getting-started/opendatahub/",
                        "https://ai-on-openshift.io/getting-started/openshift-data-science/",
                        "https://ai-on-openshift.io/odh-rhods/configuration/",
                        "https://ai-on-openshift.io/odh-rhods/custom-notebooks/",
                        "https://ai-on-openshift.io/odh-rhods/nvidia-gpus/",
                        "https://ai-on-openshift.io/odh-rhods/custom-runtime-triton/",
                        "https://ai-on-openshift.io/odh-rhods/openshift-group-management/",
                        "https://ai-on-openshift.io/tools-and-applications/minio/minio/"
                       ])

In [9]:
data = loader.load()

In [10]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024,
                                               chunk_overlap=40)
all_splits = text_splitter.split_documents(data)

In [11]:
embeddings = HuggingFaceEmbeddings()
rds = Redis.from_existing_index(embeddings,
                                redis_url=redis_url,
                                index_name=index_name,
                                schema="redis_schema.yaml")
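Opening the existing index relies on the `redis_schema.yaml` file written during the initial ingestion. As a small sketch, a check like the one below can confirm the file is present before the index is opened, for example when this section is run in a fresh session. The `require_schema` helper is hypothetical and uses only the standard library.

```python
from pathlib import Path


def require_schema(path):
    """Return the schema path as a string, or raise if the file is missing."""
    schema_path = Path(path)
    if not schema_path.is_file():
        raise FileNotFoundError(
            f"Schema file not found: {schema_path}. "
            "Run the initial index creation section first."
        )
    return str(schema_path)


try:
    schema_file = require_schema("redis_schema.yaml")
    print(f"Using schema file: {schema_file}")
except FileNotFoundError as err:
    print(err)
```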

In [12]:
rds.add_documents(all_splits)

['doc:docs:5645b4efff844473abbf501127c7e98c',
 'doc:docs:81ff8f2ecc3a4d42914fdcf8aa9e3175',
 'doc:docs:2f0d7dc5339b457383f09e6314ddd27b',
 'doc:docs:24e5c60808924f6e803c5aeffa1e7f31',
 'doc:docs:f3abe712e80a4a5ba06ed55ad161d334',
 'doc:docs:711f6160204d49249fe0bb4ecaeec400',
 'doc:docs:dca3edf7ff51405d99683f451601f328',
 'doc:docs:5838903243224268961c120b32c3e83f',
 'doc:docs:4468b4eb5f714587bf082b9a2b4a6145',
 'doc:docs:181c35339c8e42c499f11070e9b41183',
 'doc:docs:58070eb2f1b44bd5af6c34ecf32e8c09',
 'doc:docs:af017b927aac423e8b85c5248320c108',
 'doc:docs:90a9c5d9a00e4c8aaa09624a225c6b01',
 'doc:docs:b2193c4b49fc4d6c9b1016d4d73acfe1',
 'doc:docs:fd3a5cf7b6c3445291d9ad0a2d089e79',
 'doc:docs:c1c3e17b4c204ec09f1615ac4b944b32',
 'doc:docs:73b8779a9aef435a8858f198988870ec',
 'doc:docs:1e839168053f44c08226d4ad47096c83',
 'doc:docs:8ee44137bd1b4b52922f93b28ab4b8c5',
 'doc:docs:bfa2681a04fb4e8bb3c327a65cbbfca8',
 'doc:docs:f5292114c01047ee9ac28aeafce38a55',
 'doc:docs:35bba0b3061c457e8423d85