https://app.pinecone.io/organizations/-NUaJiYvEB6uARCsouWz/projects/us-west1-gcp-free:828f5ed/indexes/python-index
https://docs.pinecone.io/docs/python-client
https://community.pinecone.io/t/error-on-query-endpoint-typeerror-replace-argument-2-must-be-str-not-none/674
https://docs.pinecone.io/docs/langchain-retrieval-augmentation

docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)
# if you already have an index, you can load it like this
# docsearch = Pinecone.from_existing_index(index_name, embeddings)

In [1]:
#!pip install langchain --upgrade
# Version: 0.0.164

#!pip install pypdf

In [2]:
# PDF Loaders. If unstructured gives you a hard time, try PyPDFLoader
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

from langchain.text_splitter import RecursiveCharacterTextSplitter
import os

### Load your data

In [3]:
loader = PyPDFLoader("../data/summary_strategy.pdf")

## Other options for loaders 
# loader = UnstructuredPDFLoader("../data/summary_strategy.pdf")
# loader = OnlinePDFLoader("...")

In [4]:
# load your data
data = loader.load()

In [5]:
# Note: If you're using PyPDFLoader then it will split by page for you already
print(f'You have {len(data)} document(s) in your data')
print(f'There are {len(data[0].page_content)} characters in your first document')

You have 18 document(s) in your data
There are 3659 characters in your first document


### Chunk your data up into smaller documents

In [50]:
# Note: If you're using PyPDFLoader then we'll be splitting for the 2nd time.
# This is optional, test out on your own data.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

In [51]:
print(f'Now you have {len(texts)} documents')

Now you have 41 documents
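`chunk_size=1000, chunk_overlap=0` caps each chunk at 1,000 characters, with no text shared between neighbouring chunks. As a rough mental model only, here is a simplified fixed-window character chunker. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on separators such as paragraph breaks, newlines and spaces before falling back to single characters. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    fixed-window chunker, not LangChain's separator-aware recursive splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# With this simplified chunker, a 3659-character string and no overlap
# gives windows of 1000 + 1000 + 1000 + 659 characters.
page = "x" * 3659
print([len(c) for c in chunk_text(page)])  # [1000, 1000, 1000, 659]

# With overlap, each chunk repeats the tail of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

A non-zero overlap helps when an answer straddles a chunk boundary, at the cost of storing and embedding some text twice.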


### Create embeddings of your documents to get ready for semantic search

In [52]:
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
import pinecone

In [53]:
# import your API keys from a file called credentials.py
from credentials import OPENAI_API_KEY, PINECONE_API_KEY, PINECONE_API_ENV

In [37]:
# you can also store the keys in your environment variables
# OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY', 'sk-...')
#
# PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY', '...')
# PINECONE_API_ENV = os.environ.get('PINECONE_API_ENV', '...')
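With the placeholder defaults above (`'sk-...'`, `'...'`), a missing variable silently becomes an invalid key and only fails later, at the first API call. A minimal sketch that fails fast instead; `require_env` is a hypothetical helper, and the variable names are the ones this notebook already uses.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:  # treat unset and empty the same way
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage, once the variables are exported in your shell:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# PINECONE_API_KEY = require_env("PINECONE_API_KEY")
# PINECONE_API_ENV = require_env("PINECONE_API_ENV")
```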

In [54]:
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [55]:
# initialize pinecone; this has to run before any other pinecone call
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = "python-index" # put in the name of your pinecone index here

In [56]:
# Create the index only if it doesn't exist yet. dimension=1536 matches the
# length of OpenAI's text-embedding-ada-002 vectors. On the free tier the
# project quota is 1 pod, so trying to create a second index fails with a 400:
# "The index exceeds the project quota of 1 pods by 1 pods."
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536, metric="cosine")

In [57]:
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
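The index was created with `metric="cosine"` and `dimension=1536`, the vector length of OpenAI's `text-embedding-ada-002`, which we assume is the model `OpenAIEmbeddings` uses by default in the LangChain version pinned above. Cosine similarity compares the direction of two vectors and ignores their length. A minimal pure-Python sketch of the formula `dot(a, b) / (|a| * |b|)`:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0: same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0: orthogonal
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ~1.0: scaling doesn't change direction
```

The dimension check mirrors why the index dimension must match the embedding model: vectors of different lengths cannot be compared at all.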

In [58]:
query = "Who was Von Clausewitz?"
docs = docsearch.similarity_search(query)
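`similarity_search` embeds the query with the same `embeddings` object, asks Pinecone for the closest stored vectors, and returns the matching chunks as `Document`s (LangChain's default is `k=4`). Conceptually that is a top-k ranking by similarity score. The brute-force, in-memory sketch below illustrates the idea; the `top_k` helper and the toy 2-dimensional vectors are hypothetical, and a real vector database avoids scanning every stored vector.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 4) -> list[tuple[str, float]]:
    """Return up to k (id, score) pairs most similar to query_vec, best first."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in store.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy "embeddings" for three chunks.
store = {
    "clausewitz": [0.9, 0.1],
    "jomini": [0.7, 0.3],
    "marketing": [0.1, 0.9],
}
for doc_id, score in top_k([1.0, 0.0], store, k=2):
    print(doc_id, round(score, 3))  # clausewitz first, then jomini
```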

In [59]:
docs

[Document(page_content='9RQ\x03&ODXVHZLW]  Historic perspectives can still be seen in business perspective today. Clausewitz his perspective seems timeless. He is in the same timeframe as Jomini. French was in a revolution; Eu was ruled by royal houses and the Napoleonic war was going on. When was 12 he joined the army and when he turned 21 he joined the military academy as a scholar. There he met Gerhard von Scharnhorst (lecturer) and Marie von Bruhl (married Clausewitz). The was a grave and was Clausewitz his ticket to the higher circles in Prussia. She played an important role in Clausewitz his career progress and development of perspective in strategy. Von Clausewitz served in an old-fashioned army, Clausewitz was captured by the French and Prussia was conquered. After the release Clausewitz joined the Russian army and fought Napoleon. They defeated Napoleon. Clausewitz tried to finish a book but died of Cholera. 7KH\x03ERRN\x03KDV\x03WKH\x03WLWOH\x03µ¶RQ\x03ZDU¶¶\x11\x03', metadat

In [60]:
# Here's an example of the first document that was returned
print(docs[0].page_content[:450])

9RQ&ODXVHZLW]  Historic perspectives can still be seen in business perspective today. Clausewitz his perspective seems timeless. He is in the same timeframe as Jomini. French was in a revolution; Eu was ruled by royal houses and the Napoleonic war was going on. When was 12 he joined the army and when he turned 21 he joined the military academy as a scholar. There he met Gerhard von Scharnhorst (lecturer) and Marie von Bruhl (married Clausewitz).
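The start of this chunk (`9RQ\x03&ODXVHZLW]`) is not random noise: each character sits exactly 29 code points below the intended one (`9` → `V`, `R` → `o`, `\x03` → space), most likely because of a font-encoding quirk in the PDF that the loader could not resolve. To inspect such a span, a hypothetical `unshift` helper can reverse the offset. It assumes the offset is constant for this PDF, must not be applied to the correctly extracted text around the span, and does not map every symbol back cleanly (the quote marks in the book title do not decode to quotes). Trying another loader, such as the `UnstructuredPDFLoader` mentioned above, may be the better fix.

```python
SHIFT = 29  # offset observed in the garbled span above (assumed constant for this PDF)


def unshift(garbled: str, shift: int = SHIFT) -> str:
    """Undo a constant code-point offset, e.g. '9RQ' -> 'Von'."""
    return "".join(chr(ord(ch) + shift) for ch in garbled)


print(unshift("9RQ\x03&ODXVHZLW]"))  # Von Clausewitz
print(unshift("7KH\x03ERRN\x03KDV\x03WKH\x03WLWOH"))  # The book has the title
```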


### Query those docs to get your answer back

In [61]:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

In [62]:
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type="stuff")
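With `chain_type="stuff"`, the chain puts every retrieved document into a single prompt together with the question and makes one LLM call, so it only works while the retrieved chunks fit in the model's context window. The prompt-building step can be sketched in plain Python as follows; the template wording and the `build_stuff_prompt` helper are hypothetical, not LangChain's actual prompt.

```python
def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Concatenate all retrieved chunks into one prompt (the idea behind the "stuff" chain)."""
    # Hypothetical template; LangChain ships its own prompt for this chain.
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    ["Clausewitz joined the army at 12.", "He died of cholera."],
    "Who was Von Clausewitz?",
)
print(prompt)
```

The other chain types (such as `map_reduce` or `refine`) exist for the case where the combined chunks are too long for one call.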

In [63]:
query = "Who was Von Clausewitz?"
docs = docsearch.similarity_search(query)

In [64]:
chain.run(input_documents=docs, question=query)

' Von Clausewitz was a Prussian military theorist and soldier who lived in the same timeframe as Jomini. He joined the army at age 12 and the military academy as a scholar at age 21. He was married to Marie von Bruhl and served in an old-fashioned army. He was captured by the French and Prussia was conquered. After his release, he joined the Russian army and fought Napoleon. He tried to finish a book but died of Cholera.'
