## LangChain demo

* Loads a local corpus of documents
* Answers questions based on that corpus

## Quickstart notes
On a Windows PC, start Jupyter from WSL with the API key in its environment:
```
$ wsl -d ubuntu
$ env OPENAI_API_KEY='[API_KEY_HERE]' jupyter notebook
```

In [None]:
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

### Load your data

In [2]:
loader = UnstructuredPDFLoader("../data/field-guide-to-data-science.pdf")
# loader = OnlinePDFLoader("https://wolfpaulus.com/wp-content/uploads/2017/05/field-guide-to-data-science.pdf")

In [3]:
data = loader.load()

### Chunk data into smaller documents

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)
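`RecursiveCharacterTextSplitter` tries coarse separators first (blank lines, then newlines, then spaces, then single characters) and only falls back to a finer one when a piece is still longer than `chunk_size`. The pure-Python sketch below illustrates that idea; `recursive_split` is a hypothetical helper, not LangChain's implementation, and it ignores `chunk_overlap` (0 in this notebook anyway).

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Simplified sketch of recursive splitting: try the coarsest separator
    first and recurse with finer separators only for pieces that are
    still too long. Keep "" last so single characters are the fallback.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *finer = separators
    # "" as a separator means "split into single characters".
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            # Greedily merge neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb\n\nccc ddd eee", 8))
```

The paragraph break is honoured first; only the second paragraph, which is too long on its own, gets split on spaces.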

### Create embeddings of documents for semantic search

In [1]:
from langchain.vectorstores import Chroma, Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
#import pinecone
import os

In [2]:
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter

# pip install unstructured
# pip install openai
# pip install chromadb tiktoken  # used by Chroma and OpenAIEmbeddings


In [3]:
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
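`os.environ['OPENAI_API_KEY']` raises a bare `KeyError` if Jupyter was started without the variable. A small hypothetical helper, `require_env`, that fails with a more descriptive message instead:

```python
import os


def require_env(name="OPENAI_API_KEY"):
    """Return a non-empty environment variable or raise a readable error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; start Jupyter with "
            f"`env {name}='...' jupyter notebook`."
        )
    return value
```

With it, the cell above would read `OPENAI_API_KEY = require_env()`.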

In [4]:
loader = DirectoryLoader('./', glob='data/*.txt')
docs = loader.load()
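`DirectoryLoader('./', glob='data/*.txt')` matches the glob relative to the given directory and turns each file into a document whose metadata records its `source` path. By default it parses files with `UnstructuredFileLoader`, which is why `unstructured` must be installed. The stdlib sketch below mimics only the globbing and bookkeeping and reads plain text directly; `Doc` and `load_directory` are hypothetical names, not LangChain classes.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_directory(root, pattern):
    """Read every file under root matching a glob pattern into a Doc."""
    docs = []
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            docs.append(Doc(page_content=text, metadata={"source": str(path)}))
    return docs
```

For example, `load_directory('./', 'data/*.txt')` would return one `Doc` per `.txt` file in `data/`.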

In [None]:
docs


In [11]:
!pwd

/mnt/e/2_github/game-langchain


In [5]:
char_text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
doc_texts = char_text_splitter.split_documents(docs)
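Unlike the recursive splitter used for the PDF, `CharacterTextSplitter` splits on a single separator (`"\n\n"` by default) and then merges the resulting paragraphs until the next one would exceed `chunk_size`. A paragraph longer than `chunk_size` is kept whole, so some chunks can be larger than the limit. A rough sketch of that merge step, with `merge_paragraphs` as a hypothetical helper:

```python
def merge_paragraphs(text, chunk_size, separator="\n\n"):
    """Greedy single-separator splitting, loosely modelled on CharacterTextSplitter."""
    parts = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for part in parts:
        candidate = current + separator + part if current else part
        if current and len(candidate) > chunk_size:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = part
        else:
            # Fits, or nothing buffered yet: an oversized paragraph stays whole.
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(merge_paragraphs("one\n\ntwo\n\n" + "x" * 20 + "\n\nfour", 10))
```

The 20-character paragraph ends up as its own chunk, twice the requested size.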

In [None]:
doc_texts

In [6]:
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [7]:
vStore = Chroma.from_documents(doc_texts, embeddings)

Using embedded DuckDB without persistence: data will be transient
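`Chroma.from_documents` embeds every chunk and keeps the vectors in an index (here in memory only, as the log line says); `similarity_search` later embeds the query and returns the `k` closest chunks. The brute-force sketch below shows the same idea with a toy bag-of-words "embedding" and cosine similarity. `embed`, `cosine`, and `InMemoryStore` are hypothetical; real OpenAI embeddings are dense float vectors, not word counts.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' standing in for OpenAIEmbeddings."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Brute-force vector store: embed on insert, rank by cosine on query."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.items = []  # list of (vector, text) pairs

    @classmethod
    def from_texts(cls, texts, embed_fn):
        store = cls(embed_fn)
        for text in texts:
            store.items.append((embed_fn(text), text))
        return store

    def similarity_search(self, query, k=4):
        q = self.embed_fn(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


texts = ["syd and max walk the shiba", "the weather was rainy", "max laughs with syd"]
store = InMemoryStore.from_texts(texts, embed)
print(store.similarity_search("are syd and max friends", k=2))
```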


### Query docs

In [8]:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

In [9]:
llm = OpenAI(temperature=0.5, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type="stuff")
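With `chain_type="stuff"`, all retrieved chunks are placed into a single prompt together with the question, and the LLM is called once. A minimal sketch of that prompt assembly; `stuff_prompt` is hypothetical and the template wording is illustrative, not LangChain's exact default prompt.

```python
def stuff_prompt(docs, question, template=None):
    """Concatenate every retrieved chunk into one prompt ("stuff" strategy)."""
    template = template or (
        "Use the following pieces of context to answer the question.\n\n"
        "{context}\n\nQuestion: {question}\nAnswer:"
    )
    # Chunks are joined with blank lines so the model sees them as separate passages.
    context = "\n\n".join(docs)
    return template.format(context=context, question=question)


print(stuff_prompt(["Syd met Max.", "They hiked."], "are syd and max friends?"))
```

Because everything goes into one call, `k=10` with `chunk_size=1000` can produce up to roughly 10,000 characters of context, which has to fit in the model's context window alongside the answer.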

## Ask any question about *A Summer with the Shiba Inu*

In [20]:
query = "are syd and max friends?"

In [21]:
docs = vStore.similarity_search(query, include_metadata=True, k=10)

In [22]:
chain.run(input_documents=docs, question=query)

' Yes, Syd and Max are friends. It is clear from their conversation that they care deeply for each other and are willing to support each other in difficult times.'

## Query with `VectorDBQA`

In [140]:
from langchain import OpenAI, VectorDBQA

In [172]:
model = VectorDBQA.from_chain_type(llm=OpenAI(temperature=0.1), chain_type='stuff', vectorstore=vStore)


In [210]:
dby_preamble = "so is shorthand for Sophie. al is shorthand for Alice. ap is shorthand for Apolline. da is shorthand for Daniella. mc is shorthand for Sidney. [mcName] is also shorthand for Sidney. mc_t is shorthand for Sidney thinking. Daniella, Sophie, Alice, Apolline, Sidney, are all female and use she/her pronouns."

In [215]:
query = dby_preamble + " Question: " + "what is the main lesson of the story?"  # leading space keeps "pronouns." and "Question:" apart

In [216]:
model.run(query)

' The main lesson of the story is that in order to not let the important things in life pass without a trace, one can accept inner peace and enjoy the moment.'
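Keeping the speaker-tag mapping in a dict makes it easier to reuse for other corpora and guarantees a separator before `Question:`. The helper `build_query` and the constants below are hypothetical; the mapping itself is the one spelled out in `dby_preamble`. Note that `VectorDBQA` should pass the whole query string, preamble included, to the similarity search as well as to the LLM, so a long preamble can also shift which chunks are retrieved.

```python
DBY_SHORTHAND = {
    "so": "Sophie",
    "al": "Alice",
    "ap": "Apolline",
    "da": "Daniella",
    "mc": "Sidney",
    "[mcName]": "Sidney",
    "mc_t": "Sidney thinking",
}
DBY_NOTES = [
    "Daniella, Sophie, Alice, Apolline, and Sidney are all female "
    "and use she/her pronouns."
]


def build_query(question, shorthand, extra_notes=()):
    """Prefix a question with tag definitions, keeping a clear separator."""
    definitions = [f"{tag} is shorthand for {name}." for tag, name in shorthand.items()]
    preamble = " ".join(definitions + list(extra_notes))
    return f"{preamble}\n\nQuestion: {question}"


query = build_query("what is the main lesson of the story?", DBY_SHORTHAND, DBY_NOTES)
print(query)
```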

## Webnovel corpus test

In [133]:
loader = DirectoryLoader('./', glob='data-webnovel/*.txt')
docs = loader.load()

In [164]:
loader = DirectoryLoader('./', glob='data-dby/*.txt')
docs = loader.load()
# Re-run the splitting and Chroma.from_documents cells so that vStore indexes these docs.

## References

* https://shweta-lodha.medium.com/use-your-locally-stored-files-to-get-response-from-gpt-like-chatgpt-a50ea6a520df

In [3]:
# Iris on rainy day