# Pinecone and LangChain

## MVP 1 - Perform Simple Pipeline

### Load documents

In [32]:
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

loader = TextLoader(r"G:\My Drive\Profissional & Acadêmico\Mestrados\DTU\5_thesis\dev_thesis\data\de_data\pinecone\questions_tobeIndexed.txt")
documents = loader.load()
documents

[Document(page_content='What was the total revenue generated yesterday?\nHow many items were sold in the last month?\nCan you provide the average daily sales volume for the previous month?\nHow many different customers we had last week?\nHow many orders were processed two days ago?\nWhat was the sales last Friday?\nCan you report the total number of orders placed during the last weekend?\nWhat was the average order value in the previous week?\nHow many units were sold last year?\nHow many orders we had last year?\nWhat was the average basket size for purchases made last Thursday?\nHow many different customers made purchases in the last month?\nWhat was the average sales amount for customers in the last week?\nCan you provide the total units sold during the year?\nWhat was the average basket value for all orderslast week?\nWhat was the most popular product category sold last month?\nWhat is the most sold Product Type yesterday?\nHow many units of the most sold product marketing category

### Split Documents

In [33]:
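# separator='\n' with chunk_size=0 forces one chunk per line (one question per chunk);
# every line exceeds the 0-character limit, so the splitter logs the warnings below.
text_splitter = CharacterTextSplitter(chunk_size=0, chunk_overlap=0, separator='\n')
docs = text_splitter.split_documents(documents)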

len(docs)

Created a chunk of size 47, which is longer than the specified 0
Created a chunk of size 43, which is longer than the specified 0
Created a chunk of size 70, which is longer than the specified 0
Created a chunk of size 46, which is longer than the specified 0
Created a chunk of size 44, which is longer than the specified 0
Created a chunk of size 31, which is longer than the specified 0
Created a chunk of size 73, which is longer than the specified 0
Created a chunk of size 54, which is longer than the specified 0
Created a chunk of size 35, which is longer than the specified 0
Created a chunk of size 33, which is longer than the specified 0
Created a chunk of size 66, which is longer than the specified 0
Created a chunk of size 62, which is longer than the specified 0
Created a chunk of size 65, which is longer than the specified 0
Created a chunk of size 53, which is longer than the specified 0
Created a chunk of size 58, which is longer than the specified 0
Created a chunk of size 5

60
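Because `chunk_size=0` is smaller than every line, the splitter emits each newline-separated line as its own chunk, which is why the cell logs one warning per question. Below is a plain-Python sketch of the same one-question-per-chunk split. It assumes the file holds one question per line, and the sample lines come from the loaded document. `split_lines` is a hypothetical helper, not a LangChain function.

```python
def split_lines(text: str) -> list[str]:
    """Return one chunk per non-empty line, with surrounding whitespace stripped.

    Hypothetical helper approximating CharacterTextSplitter(separator="\n",
    chunk_size=0) without the "longer than the specified 0" warnings.
    """
    return [line.strip() for line in text.split("\n") if line.strip()]


sample = (
    "What was the total revenue generated yesterday?\n"
    "How many items were sold in the last month?\n"
    "\n"  # blank lines are dropped
    "How many orders were processed two days ago?\n"
)
chunks = split_lines(sample)
print(len(chunks))  # 3
```

The resulting plain strings could likely be indexed with `PineconeVectorStore.from_texts` instead of `from_documents`, at the cost of losing the `source` metadata that `TextLoader` attaches.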

### Indexing

In [31]:
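import os

from pinecone import Pinecone

# Read the key from the environment instead of hard-coding it in the notebook.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("text-to-sql")

# Remove every vector from the index before re-indexing (irreversible).
index.delete(delete_all=True)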

{}

In [5]:
# import os
# from pinecone import Pinecone, ServerlessSpec

# pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# pc.create_index(
#     name="nl2sql",
#     dimension=1536,  # must equal the output size of text-embedding-3-small
#     metric="cosine",  # similarity metric used to rank query results
#     spec=ServerlessSpec(
#         cloud="aws",
#         region="us-east-1"
#     )
# )


In [6]:
# index = pc.Index("nl2sql")
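`dimension=1536` matches the output size of `text-embedding-3-small`, and `metric="cosine"` decides how stored vectors are ranked against a query vector. The toy in-memory class below is a hypothetical stand-in, not Pinecone's client API. It only sketches those two ideas plus the `delete_all` reset used in the Indexing cell, using 3-dimensional vectors to keep it readable.

```python
import math


class ToyIndex:
    """Hypothetical in-memory index: fixed dimension, cosine ranking."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        self.vectors: dict[str, list[float]] = {}

    def upsert(self, vec_id: str, values: list[float]) -> None:
        # Reject vectors whose length differs from the index dimension.
        if len(values) != self.dimension:
            raise ValueError(f"expected {self.dimension} dims, got {len(values)}")
        self.vectors[vec_id] = values  # insert a new id or overwrite an existing one

    def delete_all(self) -> None:
        # Same idea as index.delete(delete_all=True): drop every stored vector.
        self.vectors.clear()

    def query(self, values: list[float], top_k: int = 4) -> list[tuple[str, float]]:
        # Score every stored vector by cosine similarity, highest first.
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        scored = [(vid, cosine(values, v)) for vid, v in self.vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


toy = ToyIndex(dimension=3)
toy.upsert("q1", [1.0, 0.0, 0.0])
toy.upsert("q2", [0.7, 0.7, 0.0])
toy.upsert("q3", [0.0, 0.0, 1.0])

top = toy.query([1.0, 0.1, 0.0], top_k=2)
print([vid for vid, _ in top])  # ['q1', 'q2']

toy.delete_all()
after_reset = toy.query([1.0, 0.1, 0.0])
print(after_reset)  # []
```

With real embeddings each vector would have 1536 entries, and Pinecone rejects an upsert whose vector length does not match the index dimension, much like the `ValueError` here.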

### Embed Documents

In [34]:
# from langchain_pinecone import PineconeVectorStore
# embeddings = OpenAIEmbeddings(model='text-embedding-3-small')
# docsearch = PineconeVectorStore.from_documents(docs, embeddings, index_name='text-to-sql')

In [45]:
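from langchain_pinecone import PineconeVectorStore

embeddings = OpenAIEmbeddings(model='text-embedding-3-small')
# Connect to the existing 'text-to-sql' index without re-uploading documents;
# the API key is read from the PINECONE_API_KEY environment variable.
docsearch = PineconeVectorStore(index_name='text-to-sql', embedding=embeddings)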

### Search

In [47]:
query = "What market had the highest WoW growth measured in units sold?"
results = docsearch.similarity_search(query)  # returns k=4 matches by default
for doc in results:
    print(doc.page_content)

What was the WoW growth of last week's sales for Headwear in Japan?
Can you show the WoW change in units sold for Headwear in the past month?
Give me the WoW growth in sales of the week from 25th of March to 31st of March 2024 sales for Headwear in Japan
What was the WoW sales performance of Running Wear in the UK and Germany during the last three months?
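`similarity_search` embeds the query with the same model used for the documents and returns the `k` chunks (4 by default, hence the four lines above) whose vectors score highest under the index metric. The offline sketch below mirrors that flow. It swaps `text-embedding-3-small` for a hypothetical bag-of-words count vector, which only captures shared words, so it shows the mechanics rather than the semantic quality of the real model. The four questions come from this notebook's outputs.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts: list[str]) -> list[str]:
    # A fixed vocabulary gives every vector the same dimension.
    return sorted({tok for t in texts for tok in tokenize(t)})


def embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in embedding: word counts over `vocab`."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query: str, texts: list[str], k: int = 4) -> list[str]:
    """Brute-force top-k: embed everything, rank by cosine similarity."""
    vocab = build_vocab(texts)
    q = embed(query, vocab)
    ranked = sorted(texts, key=lambda t: cosine(q, embed(t, vocab)), reverse=True)
    return ranked[:k]


questions = [
    "What was the total revenue generated yesterday?",
    "How many orders were processed two days ago?",
    "What was the WoW growth of last week's sales for Headwear in Japan?",
    "Can you show the WoW change in units sold for Headwear in the past month?",
]
hits = similarity_search("WoW growth in units sold", questions, k=2)
for h in hits:
    print(h)
```

A real vector index avoids scoring every stored vector on each query; the brute-force loop here just makes the ranking rule explicit.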


#### Nearest Neighbors

- Goal: use product metadata so that a search for Hiking Classic Socks does not return unrelated products such as Men's Merino T-Shirt.
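A metadata filter restricts the candidate set before ranking, so a query about Hiking Classic Socks can only return chunks tagged with that product. `TextLoader` attaches only a `source` field, so the chunks in this notebook carry no product metadata yet. The `product` field and `score` values below are hypothetical. In Pinecone, the equivalent would likely be a filter such as `{"product": {"$eq": "Hiking Classic Socks"}}` passed through the `filter` argument of `PineconeVectorStore.similarity_search`. The pure-Python sketch only mimics exact-equality matching.

```python
def matches(metadata: dict, flt: dict) -> bool:
    # Equality-only filter: every key in `flt` must match the metadata exactly.
    return all(metadata.get(key) == value for key, value in flt.items())


def filtered_top_k(records: list[dict], flt: dict, k: int = 4) -> list[str]:
    # 1) drop records whose metadata fails the filter, 2) rank the rest by score.
    candidates = [r for r in records if matches(r["metadata"], flt)]
    candidates.sort(key=lambda r: r["score"], reverse=True)
    return [r["text"] for r in candidates[:k]]


# Hypothetical records: `product` tags and `score` values are made up for illustration.
records = [
    {"text": "What was the YoY growth in units of Hiking Classic Socks in the last month?",
     "metadata": {"product": "Hiking Classic Socks"}, "score": 0.81},
    {"text": "Men's Merino T-Shirt sales last month (hypothetical record)",
     "metadata": {"product": "Men's Merino T-Shirt"}, "score": 0.90},
    {"text": "How many orders were placed containing Hiking Classic Socks last 2 days?",
     "metadata": {"product": "Hiking Classic Socks"}, "score": 0.77},
]
hits = filtered_top_k(records, {"product": "Hiking Classic Socks"}, k=2)
for h in hits:
    print(h)
```

The T-shirt record is excluded even though it has the highest score, which is the behavior the goal above asks for.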

In [39]:
query = "Tell me about the sales in the American market yesterday."
# MMR: fetch the 4 most similar chunks, then keep 3 that are relevant but diverse
found_docs = docsearch.max_marginal_relevance_search(query, k=3, fetch_k=4)
for i, doc in enumerate(found_docs):
    print(f"{i + 1}.", doc.page_content, "\n")
    # print("Type:", doc.type, "\n")
    # print("Metadata:", doc.metadata, "\n")

1. What was the sales last Friday? 

2. How did Compression Socks sales perform in the USA market in the previous month? 

3. Can you provide the average daily sales volume for the previous month? 
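`max_marginal_relevance_search` first fetches the `fetch_k` most similar chunks, then greedily keeps `k` of them. At each step it chooses the candidate that best balances relevance to the query against similarity to the chunks already kept, and `lambda_mult` (0.5 by default in LangChain) weights that trade-off. The sketch below implements the greedy rule in plain Python on hypothetical 2-D unit vectors, where the cosine between two vectors is the cosine of the angle between them.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec: list[float], cand_vecs: list[list[float]],
        k: int = 3, lambda_mult: float = 0.5) -> list[int]:
    """Greedy Maximal Marginal Relevance; returns indices into cand_vecs."""
    selected: list[int] = []
    remaining = list(range(len(cand_vecs)))

    def score(i: int) -> float:
        relevance = cosine(query_vec, cand_vecs[i])
        # Redundancy = similarity to the closest already-selected candidate.
        redundancy = max((cosine(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def unit(deg: float) -> list[float]:
    # 2-D unit vector pointing `deg` degrees from the x-axis.
    return [math.cos(math.radians(deg)), math.sin(math.radians(deg))]


query_vec = unit(0)
candidates = [unit(10), unit(12), unit(-40)]  # candidates 0 and 1 are near-duplicates

by_similarity = sorted(range(3), key=lambda i: cosine(query_vec, candidates[i]), reverse=True)[:2]
diverse = mmr(query_vec, candidates, k=2)
print(by_similarity)  # [0, 1]
print(diverse)        # [0, 2]
```

With `lambda_mult=1.0` the rule reduces to plain similarity ranking, which makes it easy to compare the diverse selection against a pure nearest-neighbor result.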



#### MMR Retriever

In [16]:
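# search_type="mmr" calls max_marginal_relevance_search with its defaults
# (k=4, fetch_k=20, lambda_mult=0.5); override them via search_kwargs.
# Note: this cell ran as In [16], before `query` was reassigned above,
# so the output below reflects an earlier query.
retriever = docsearch.as_retriever(search_type="mmr")
matched_docs = retriever.invoke(query)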
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)


## Document 0

What was the YoY growth in units of Hiking Classic Socks in the last month?

## Document 1

How many orders were placed containing Hiking Classic Socks last 2 days?

## Document 2

What was the WoW units sold change for Hiking Classic Socks in the France during the last month?

## Document 3

How was the sales of Anti-Friction Race Socks 2-pack MC (1xBlack, 1xOrange) 43-47 in the previous 4 days?


## MVP 2 - Somehow Map Metrics