<a href="https://colab.research.google.com/github/sanislearning/LangChainProjects/blob/main/SemanticSearchEngine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [30]:
#Goal: get familiar with LangChain's document loader, embedding and vector store abstractions

In [31]:
import pandas as pd
import numpy as np
from google.colab import userdata

In [32]:
import getpass
import os

os.environ["LANGSMITH_TRACING"]='true'
os.environ['LANGSMITH_API_KEY']=userdata.get('LANGSMITH_API_KEY')

In [33]:
if not os.environ.get("GOOGLE_API_KEY"):
  os.environ['GOOGLE_API_KEY']=userdata.get('GOOGLE_API_KEY')
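`getpass` is imported above but never used. A common pattern is to try several sources for a secret in turn. The sketch below uses a hypothetical helper, `get_secret`, that is not part of LangChain. It checks `os.environ` first, then Colab's `userdata` (only importable inside Colab), and finally prompts with `getpass.getpass`. The lookup order is an assumption.

```python
import getpass
import os


def get_secret(name: str) -> str:
    """Return a secret from the environment, Colab secrets, or an interactive prompt.

    Hypothetical helper: the lookup order is an assumption, not a LangChain API.
    """
    # 1. Already exported in the environment (e.g. set in an earlier cell)
    value = os.environ.get(name)
    if value:
        return value
    # 2. Colab's secrets manager, which can only be imported inside Colab
    try:
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:  # not in Colab, secret missing, or notebook access not granted
        value = None
    # 3. Last resort: prompt without echoing the typed value
    if not value:
        value = getpass.getpass(f"Enter {name}: ")
    # Cache it so later lookups (and libraries reading os.environ) can see it
    os.environ[name] = value
    return value


# Usage in this notebook:
# os.environ["LANGSMITH_TRACING"] = "true"
# get_secret("LANGSMITH_API_KEY")
# get_secret("GOOGLE_API_KEY")
```

Writing the value back into `os.environ` matters here because integrations such as `langchain_google_genai` read `GOOGLE_API_KEY` from the environment.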

#Documents

In [34]:
from langchain_core.documents import Document

In [35]:
from langchain_community.document_loaders import PyPDFLoader
from google.colab import drive
drive.mount('/content/drive')
file_path="/content/drive/MyDrive/PData/nke-10k-2023.pdf"

loader=PyPDFLoader(file_path)
#PyPDFLoader loads one Document object per PDF page
#A Document object contains a unit of text and associated metadata
docs=loader.load()
print(len(docs))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
107


In [36]:
print(f"{docs[0].page_content}")

Table of Contents
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
FORM 10-K
(Mark One)
☑  ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934
FOR THE FISCAL YEAR ENDED MAY 31, 2023
OR
☐  TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934
FOR THE TRANSITION PERIOD FROM                         TO                         .
Commission File No. 1-10635
NIKE, Inc.
(Exact name of Registrant as specified in its charter)
Oregon 93-0584541
(State or other jurisdiction of incorporation) (IRS Employer Identification No.)
One Bowerman Drive, Beaverton, Oregon 97005-6453
(Address of principal executive offices and zip code)
(503) 671-6453
(Registrant's telephone number, including area code)
SECURITIES REGISTERED PURSUANT TO SECTION 12(B) OF THE ACT:
Class B Common Stock NKE New York Stock Exchange
(Title of each class) (Trading symbol) (Name of each exchange on which registered)
SECURITIES REGISTERED PURSU

In [37]:
print(docs[0].metadata)

{'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': '/content/drive/MyDrive/PData/nke-10k-2023.pdf', 'total_pages': 107, 'page': 0, 'page_label': '1'}
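A LangChain `Document` is essentially a piece of text plus a metadata dict. The metadata printed above shows that `page` is 0-indexed, while `page_label` is the page's printed label, stored as a string. The sketch below uses a hypothetical stand-in dataclass, `PageDoc`, rather than LangChain's class, to mimic the one-record-per-page shape. It assumes labels simply count from 1, but real PDFs can define their own page labels (for example, roman numerals).

```python
from dataclasses import dataclass, field


@dataclass
class PageDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def pages_to_docs(pages: list[str], source: str) -> list[PageDoc]:
    """Wrap each page's text in its own record, one record per page.

    Assumption: page labels count from 1; real PDFs may define other labels.
    """
    return [
        PageDoc(
            page_content=text,
            metadata={
                "source": source,
                "total_pages": len(pages),
                "page": i,                 # 0-indexed position in the file
                "page_label": str(i + 1),  # human-facing label (assumed)
            },
        )
        for i, text in enumerate(pages)
    ]


docs_demo = pages_to_docs(["FORM 10-K ...", "PART I ..."], source="demo.pdf")
print(docs_demo[0].metadata)
```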


#Splitting

In [41]:
#A whole page is often too coarse a unit for retrieval: the end goal is to retrieve Document objects that answer an input query
#Nearby text on the same page shouldn't wash out the meaning of the relevant portions, so we use text splitters
#Here we split the documents into chunks of 1000 characters with a 200-character overlap

In [43]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter=RecursiveCharacterTextSplitter(
    chunk_size=1000,chunk_overlap=200,add_start_index=True
    #add_start_index records the character offset where each chunk starts in the original Document (saved as metadata "start_index")
)
all_splits=text_splitter.split_documents(docs)
len(all_splits)

516
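The following simplified fixed-window splitter illustrates what `chunk_size`, `chunk_overlap` and `start_index` mean. It does not reproduce how `RecursiveCharacterTextSplitter` works. That class first tries to split on separators such as paragraph breaks, newlines and spaces, and falls back to smaller units only when a piece is still too long. The size, overlap and offset bookkeeping, however, is analogous. The function name `split_fixed` is hypothetical.

```python
def split_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[dict]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap.

    Simplified sketch: windows are cut at fixed character positions, not at
    separators as RecursiveCharacterTextSplitter does.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    # Stop once the remaining text would lie entirely inside the previous window's overlap
    for start in range(0, max(len(text) - chunk_overlap, 1), step):
        chunks.append({"text": text[start:start + chunk_size], "start_index": start})
    return chunks


sample = "abcdefghijklmnopqrstuvwxy"  # 25 characters
for c in split_fixed(sample, chunk_size=10, chunk_overlap=4):
    print(c["start_index"], c["text"])
# 0 abcdefghij
# 6 ghijklmnop
# 12 mnopqrstuv
# 18 stuvwxy
```

Each window shares its last `chunk_overlap` characters with the start of the next one. This overlap prevents a sentence that straddles a boundary from being lost in both chunks.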

#Embedding and Vectorizing

In [44]:
#Vector search is a common way to store and search over unstructured data
#VectorStore contains methods for adding text and Document objects to the store
#and querying them using various similarity metrics

In [45]:
from langchain.chat_models import init_chat_model
llm=init_chat_model("gemini-2.0-flash",model_provider='google_genai')

In [46]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings #LangChain wrapper for Google's Gemini embedding models
embeddings=GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [47]:
#Creates an in-memory vector store (documents and their vectors are kept in a Python dictionary) that ranks matches by cosine similarity
from langchain_core.vectorstores import InMemoryVectorStore
vector_store=InMemoryVectorStore(embeddings)
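Conceptually, an in-memory vector store is a dict that maps IDs to (text, vector) pairs, combined with a brute-force nearest-neighbour scan. The toy class below is a sketch of that idea, not `InMemoryVectorStore`'s actual code. It accepts any `embed` function; here that is a hypothetical letter-count embedding, so the example runs offline. Vectors are normalized when they are added, which reduces cosine similarity to a plain dot product at query time.

```python
import math
import uuid
from typing import Callable


class ToyVectorStore:
    """Minimal in-memory vector store: dict storage plus brute-force cosine search."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        self.store: dict[str, tuple[str, list[float]]] = {}

    @staticmethod
    def _normalize(vec: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        return [x / norm for x in vec]

    def add_texts(self, texts: list[str]) -> list[str]:
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())
            self.store[doc_id] = (text, self._normalize(self.embed(text)))
            ids.append(doc_id)
        return ids

    def similarity_search_with_score(self, query: str, k: int = 4) -> list[tuple[str, float]]:
        q = self._normalize(self.embed(query))
        # For unit-length vectors the dot product equals cosine similarity
        scored = [(text, sum(a * b for a, b in zip(q, vec))) for text, vec in self.store.values()]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        return scored[:k]


def letter_counts(text: str) -> list[float]:
    """Hypothetical toy embedding: the count of each letter a-z."""
    text = text.lower()
    return [float(text.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]


store = ToyVectorStore(letter_counts)
store.add_texts(["running shoes", "quarterly revenue", "shoe running"])
print(store.similarity_search_with_score("shoes for running", k=2))
```

Scanning every stored vector is fine for a few hundred chunks like the 516 here. Larger collections usually call for an approximate nearest-neighbour index instead.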

In [50]:
vector1=embeddings.embed_query(all_splits[0].page_content)
vector2=embeddings.embed_query(all_splits[1].page_content)
assert len(vector1)==len(vector2)
print(f"Generated vectors of length {len(vector1)}\n")
print(vector1[:10])

Generated vectors of length 768

[0.003303560661152005, -0.01885664090514183, -0.023528870195150375, 0.013265608809888363, 0.04694835841655731, 0.04489297419786453, 0.030707117170095444, 0.017642803490161896, 0.0011852466268464923, 0.028473228216171265]
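The `assert` above checks that both embeddings have the same length. Comparing two vectors only makes sense when they share that length. A minimal pure-Python sketch of cosine similarity, the metric the in-memory store ranks by, shows why: the metric is a dot product divided by the two vector norms. The helper name `cosine_similarity` is our own, not a LangChain function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|), which ranges from -1 to 1."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # raises ZeroDivisionError for an all-zero vector


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0 (orthogonal)
# With the notebook's embeddings: cosine_similarity(vector1, vector2)
```

Because the norms are divided out, only the direction of each vector matters, not its magnitude.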


In [52]:
ids=vector_store.add_documents(documents=all_splits) #all_splits is the variable that holds the split Documents

#Usage

In [57]:
results=vector_store.similarity_search(
    "Are nike shoes for athletes ?"
)
print(results[0])

page_content='We also sell sports apparel, which features the same trademarks and are sold predominantly through the same marketing and distribution channels as athletic footwear.
Our sports apparel, similar to our athletic footwear products, is designed primarily for athletic use, although many of the products are worn for casual or leisure purposes,
and demonstrates our commitment to innovation and high-quality construction. Our Men's and Women's apparel products currently lead in apparel sales and we expect
them to continue to do so. We often market footwear, apparel and accessories in "collections" of similar use or by category. We also market apparel with licensed college
and professional team and league logos.
We sell a line of performance equipment and accessories under the NIKE Brand name, including bags, socks, sport balls, eyewear, timepieces, digital devices, bats,' metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creati

In [58]:
#Async Query
result=await vector_store.asimilarity_search("When was Nike created?")
print(result[0])

page_content='Table of Contents
PART I
ITEM 1. BUSINESS
GENERAL
NIKE, Inc. was incorporated in 1967 under the laws of the State of Oregon. As used in this Annual Report on Form 10-K (this "Annual Report"), the terms "we," "us," "our,"
"NIKE" and the "Company" refer to NIKE, Inc. and its predecessors, subsidiaries and affiliates, collectively, unless the context indicates otherwise.
Our principal business activity is the design, development and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories and services. NIKE is
the largest seller of athletic footwear and apparel in the world. We sell our products through NIKE Direct operations, which are comprised of both NIKE-owned retail stores
and sales through our digital platforms (also referred to as "NIKE Brand Digital"), to retail accounts and to a mix of independent distributors, licensees and sales' metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'cr
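The async variant pays off when several queries are issued at once. With the notebook's store you could write `await asyncio.gather(*(vector_store.asimilarity_search(q) for q in queries))`. The runnable sketch below shows the same pattern with a hypothetical `fake_asearch` coroutine in place of the store, using a short sleep to stand in for the embedding API call. Colab/Jupyter already runs an event loop, which is why the cell above can use `await` at the top level. In a plain script, wrap the call in `asyncio.run(...)` as shown; inside a notebook, use `await search_many(...)` instead.

```python
import asyncio
import time


async def fake_asearch(query: str) -> list[str]:
    """Hypothetical stand-in for vector_store.asimilarity_search."""
    await asyncio.sleep(0.1)  # simulate the latency of a remote embedding call
    return [f"top hit for: {query}"]


async def search_many(queries: list[str]) -> list[list[str]]:
    # gather runs the searches concurrently and returns results in input order
    return await asyncio.gather(*(fake_asearch(q) for q in queries))


start = time.perf_counter()
results = asyncio.run(search_many(["When was Nike created?", "Where is Nike based?"]))
elapsed = time.perf_counter() - start
print(results)
print(f"{elapsed:.2f}s")  # should be close to 0.1s rather than 0.2s, since the waits overlap
```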

In [59]:
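#Scores are provider-specific. InMemoryVectorStore returns cosine similarity, so a higher score means a closer match; stores that return a distance instead treat lower scores as closer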

results=vector_store.similarity_search_with_score("What was Nike's revenue in 2023?")
doc,score=results[0]
print(f'Score: {score}\n')
print(doc)

Score: 0.7798891596101759

page_content='Table of Contents
FISCAL 2023 NIKE BRAND REVENUE HIGHLIGHTSThe following tables present NIKE Brand revenues disaggregated by reportable operating segment, distribution channel and major product line:
FISCAL 2023 COMPARED TO FISCAL 2022
• NIKE, Inc. Revenues were $51.2 billion in fiscal 2023, which increased 10% and 16% compared to fiscal 2022 on a reported and currency-neutral basis, respectively.
The increase was due to higher revenues in North America, Europe, Middle East & Africa ("EMEA"), APLA and Greater China, which contributed approximately 7, 6,
2 and 1 percentage points to NIKE, Inc. Revenues, respectively.
• NIKE Brand revenues, which represented over 90% of NIKE, Inc. Revenues, increased 10% and 16% on a reported and currency-neutral basis, respectively. This
increase was primarily due to higher revenues in Men's, the Jordan Brand, Women's and Kids' which grew 17%, 35%,11% and 10%, respectively, on a wholesale
equivalent basis.' metad
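Scores cannot be compared across vector stores. Some stores return a similarity, where higher means closer, and others return a distance, where lower means closer. For cosine, the two are related by distance = 1 - similarity, so the same ranking reads in opposite directions. The sketch below converts between the two using made-up scores chosen only for illustration, and shows that the order of results is unchanged.

```python
def similarity_to_distance(similarity: float) -> float:
    """Cosine distance = 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    return 1.0 - similarity


# Illustrative, made-up cosine similarities for three chunks
sims = {"chunk_a": 0.9, "chunk_b": 0.6, "chunk_c": 0.3}

by_similarity = sorted(sims, key=sims.get, reverse=True)  # higher is better
dists = {name: similarity_to_distance(s) for name, s in sims.items()}
by_distance = sorted(dists, key=dists.get)                # lower is better

print(by_similarity)  # ['chunk_a', 'chunk_b', 'chunk_c']
print(by_distance)    # ['chunk_a', 'chunk_b', 'chunk_c']
```

In practice, check which convention a store uses before applying a score threshold.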

In [61]:
#Querying the vector store via a vectorized query
embedding=embeddings.embed_query("How were Nike's margins impacted in 2023?")
results=vector_store.similarity_search_by_vector(embedding)
print(results[0])

page_content='Table of Contents
GROSS MARGIN
FISCAL 2023 COMPARED TO FISCAL 2022
For fiscal 2023, our consolidated gross profit increased 4% to $22,292 million compared to $21,479 million for fiscal 2022. Gross margin decreased 250 basis points to
43.5% for fiscal 2023 compared to 46.0% for fiscal 2022 due to the following:
*Wholesale equivalent
The decrease in gross margin for fiscal 2023 was primarily due to:
• Higher NIKE Brand product costs, on a wholesale equivalent basis, primarily due to higher input costs and elevated inbound freight and logistics costs as well as
product mix;
• Lower margin in our NIKE Direct business, driven by higher promotional activity to liquidate inventory in the current period compared to lower promotional activity in
the prior period resulting from lower available inventory supply;
• Unfavorable changes in net foreign currency exchange rates, including hedges; and
• Lower off-price margin, on a wholesale equivalent basis.
This was partially offset by:'

#Retrievers

In [63]:
#A Runnable is a unit of work that can be invoked, batched, streamed, transformed and composed
#VectorStore objects do not subclass Runnable, so LangChain provides Retrievers, which are Runnables
#Retrievers can be constructed from vector stores

In [64]:
from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import chain

@chain  #chain is a decorator that turns a function into a Runnable
def retriever(query:str)->List[Document]:
  return vector_store.similarity_search(query,k=1) #k is the number of documents returned; k=1 keeps only the closest chunk

retriever.batch(
    [
        "How many distribution centers does Nike have in the US?",
        "When was Nike incorporated?",
    ],
)

[[Document(id='9b61c8b5-c14d-4739-b5bf-436250fd72cb', metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': '/content/drive/MyDrive/PData/nke-10k-2023.pdf', 'total_pages': 107, 'page': 26, 'page_label': '27', 'start_index': 804}, page_content='operations. We also lease an office complex in Shanghai, China, our headquarters for our Greater China geography, occupied by employees focused on implementing our\nwholesale, NIKE Direct and merchandising strategies in the region, among other functions.\nIn the United States, NIKE has eight significant distribution centers. Five are located in or near Memphis, Tennessee, two of which
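The `@chain` decorator is what gives the plain `retriever` function `.invoke` and `.batch`. The class below is a toy, hypothetical `MiniRunnable`, not LangChain's implementation, that shows the core idea: `batch` maps `invoke` over a list of inputs. It uses a thread pool because retrieval calls are mostly I/O-bound. LangChain's default `batch` also runs `invoke` in parallel threads, but treat this as an approximation of the real interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class MiniRunnable(Generic[In, Out]):
    """Hypothetical, minimal Runnable-like wrapper with invoke and batch."""

    def __init__(self, func: Callable[[In], Out]):
        self.func = func

    def invoke(self, value: In) -> Out:
        return self.func(value)

    def batch(self, values: list[In], max_workers: int = 4) -> list[Out]:
        # executor.map runs calls in worker threads but returns results in input order
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(self.invoke, values))


def mini_chain(func: Callable[[In], Out]) -> MiniRunnable[In, Out]:
    """Decorator in the spirit of langchain_core.runnables.chain."""
    return MiniRunnable(func)


@mini_chain
def toy_retriever(query: str) -> list[str]:
    # Stand-in for vector_store.similarity_search(query, k=1)
    return [f"best chunk for {query!r}"]


print(toy_retriever.batch(["When was Nike incorporated?", "How many distribution centers?"]))
```

Since the decorated name is now an object rather than a function, it can be passed anywhere a Runnable-style `invoke`/`batch` interface is expected.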

In [66]:
#The same retriever, built with VectorStore.as_retriever instead of a decorated function

retriever=vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k":1},
)

retriever.batch(
    [
        "How many distribution centers does Nike have in the US?",
        "When was Nike incorporated?",
    ]
)

[[Document(id='9b61c8b5-c14d-4739-b5bf-436250fd72cb', metadata={'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'creator': 'EDGAR Filing HTML Converter', 'creationdate': '2023-07-20T16:22:00-04:00', 'title': '0000320187-23-000039', 'author': 'EDGAR Online, a division of Donnelley Financial Solutions', 'subject': 'Form 10-K filed on 2023-07-20 for the period ending 2023-05-31', 'keywords': '0000320187-23-000039; ; 10-K', 'moddate': '2023-07-20T16:22:08-04:00', 'source': '/content/drive/MyDrive/PData/nke-10k-2023.pdf', 'total_pages': 107, 'page': 26, 'page_label': '27', 'start_index': 804}, page_content='operations. We also lease an office complex in Shanghai, China, our headquarters for our Greater China geography, occupied by employees focused on implementing our\nwholesale, NIKE Direct and merchandising strategies in the region, among other functions.\nIn the United States, NIKE has eight significant distribution centers. Five are located in or near Memphis, Tennessee, two of which