## Document loaders with LangChain
https://python.langchain.com/docs/modules/data_connection/document_loaders/json

`pip install jq`

The course shows the pip installs you would need to set up these packages on your own machine.
They are already installed on this platform, so the commands should not be run again.

! pip install pypdf

In [2]:
from langchain.document_loaders import PyPDFLoader
import os

In [3]:
span_doc = os.listdir('./qa-schedule-an-appointment/spa-data/')

In [4]:
span_doc

['CRANIAL-FACIAL-MASSAGE.pdf',
 'UNLOADING-SPORTS-MASSAGE.pdf',
 'LYMPHATIC-DRAINAGE.pdf',
 'SCAR-MASSAGE.pdf',
 'ABDOMINAL-MASSAGE.pdf',
 'PERINEAL-MASSAGE.pdf',
 'RESPIRATORY-PHYSIOTHERAPY .pdf',
 'FOOT-MASSAGE.pdf']

In [6]:
loaders = []
for n in span_doc:
    file = "./qa-schedule-an-appointment/spa-data/" + n
    loader = PyPDFLoader(file)
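    pages = loader.load()  # parse the PDF once; returns one Document per page
    loaders.append(pages)  # note: despite its name, `loaders` holds the loaded pages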
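
The loop above builds each path by string concatenation and assumes every entry in the folder is a PDF. A minimal sketch of a more defensive listing step is shown below. The helper name `list_pdf_paths` is hypothetical (it is not part of LangChain), and it only uses the standard library.

```python
import os


def list_pdf_paths(directory):
    """Return sorted full paths of the .pdf files directly inside `directory`.

    Hypothetical helper for illustration; not part of LangChain.
    """
    paths = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        # Skip sub-directories and anything that is not a PDF file.
        if os.path.isfile(full_path) and name.lower().endswith(".pdf"):
            paths.append(full_path)
    return paths


# Possible usage with the folder from this notebook (assumes it exists):
# pdf_paths = list_pdf_paths('./qa-schedule-an-appointment/spa-data/')
```

Each returned path could then be passed to `PyPDFLoader` exactly as in the loop above. File names with unusual spacing, such as `'RESPIRATORY-PHYSIOTHERAPY .pdf'`, are kept unchanged.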

## Embeddings
https://github.com/aws-samples/rag-using-langchain-amazon-bedrock-and-opensearch/blob/main/ask-bedrock-with-rag.py

In [8]:
import boto3
from langchain.embeddings import BedrockEmbeddings

bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

bedrock_embedding_model_id = 'amazon.titan-embed-text-v1'

In [9]:
def create_langchain_vector_embedding_using_bedrock(bedrock_client, bedrock_embedding_model_id):
    bedrock_embeddings_client = BedrockEmbeddings(
        client=bedrock_client,
        model_id=bedrock_embedding_model_id)
    return bedrock_embeddings_client

In [10]:
bedrock_embeddings_client = create_langchain_vector_embedding_using_bedrock(bedrock_client, bedrock_embedding_model_id)

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 150
)

In [12]:
# One list of chunks per PDF, so `splits` is a list of lists of Documents.
splits = []
for doc in loaders:
    splits.append(text_splitter.split_documents(doc))
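
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is only an illustration of the two parameters: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this sketch does not do. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=150):
    """Split `text` into fixed-size character windows that overlap.

    Simplified illustration only; not the algorithm LangChain uses.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample_text = "".join(chr(65 + i % 26) for i in range(1200))
sample_chunks = sliding_window_chunks(sample_text)
print([len(c) for c in sample_chunks])
# The tail of one chunk is repeated at the head of the next one.
print(sample_chunks[0][-150:] == sample_chunks[1][:150])
```

With the settings used in this notebook, each window advances by 350 characters, so consecutive chunks share 150 characters of context.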

## Vectorstores

https://python.langchain.com/docs/integrations/vectorstores/chroma

! pip install chromadb

In [14]:
from langchain.vectorstores import Chroma

In [15]:
persist_directory = './qa-schedule-an-appointment/vectordb/docs/chroma/'

In [16]:
!rm -rf ./qa-schedule-an-appointment/vectordb/docs/chroma/  # remove old database files if any (same path as persist_directory)
for split in splits:
    vectordb = Chroma.from_documents(
        documents=split,
        embedding=bedrock_embeddings_client,
        persist_directory=persist_directory
    )
    # print(vectordb)
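
The loop above calls `Chroma.from_documents` once per PDF, with every call writing to the same `persist_directory`. One possible alternative, sketched below under the assumption that a single call is acceptable for this data size, is to flatten the per-PDF lists first. The strings stand in for `Document` objects, and `flatten` is a hypothetical helper.

```python
from itertools import chain


def flatten(nested):
    """Turn a list of lists into one flat list, preserving order."""
    return list(chain.from_iterable(nested))


# Stand-in data: in the notebook, `splits` holds lists of Document objects.
nested_splits = [["chunk-a1", "chunk-a2"], ["chunk-b1"], []]
all_splits = flatten(nested_splits)
print(all_splits)

# With the real objects from this notebook, the store could then be built
# in one call (left as a comment because it needs Bedrock credentials):
# vectordb = Chroma.from_documents(
#     documents=flatten(splits),
#     embedding=bedrock_embeddings_client,
#     persist_directory=persist_directory,
# )
```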

### Test the vectordb
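
Before querying the store, it may help to see what a top-k similarity search does conceptually: the query is embedded, compared against every stored vector, and the closest `k` entries are returned. The sketch below uses cosine similarity on hand-made three-dimensional vectors. The vectors, the names `cosine_similarity` and `top_k`, and the choice of cosine similarity are all assumptions for illustration; the metric and index Chroma actually uses depend on its configuration, and real Titan embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k vectors most similar to `query_vec`."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors invented for this example; they are not real embeddings.
toy_vectors = {
    "cranial-facial": [0.9, 0.1, 0.0],
    "foot": [0.1, 0.9, 0.0],
    "scar": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
print(top_k(toy_query, toy_vectors, k=2))
```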

In [18]:
query = "Which massage is the best for relieving headache?"
vector_result = vectordb.similarity_search(query,k=2)
vector_result

[Document(page_content='CRANIAL FACIAL MASSAGE  You probably think about going to the physical therapist to treat the muscles of the back, legs, feet, arms... but what about the muscles of the face and skull?  These muscles also need attention, and techniques specifically aimed at this musculature are necessary to regulate tone, and thereby release tensions that can lead to migraines, headaches or even bruxism.  Massage therapy provides all its benefits to this part of the cupero, also highlighting the oxygenation of', metadata={'page': 0, 'source': './qa-schedule-an-appointment/spa-data/CRANIAL-FACIAL-MASSAGE.pdf'}),
 Document(page_content='to migraines, headaches or even bruxism.  Massage therapy provides all its benefits to this part of the cupero, also highlighting the oxygenation of the tissues, the stimulation of the muscles and the reactivation of blood flow, with which we manage to restore firmness and elasticity to the skin.  With all this, this massage has not only a therapeu