# Azure AI Search

This notebook showcases some additional steps and services needed to use Azure AI Search:

- Loading Documents with LangChain: PDF and CSV
- Splitting Documents with LangChain

In [12]:
import os
from os.path import dirname
from dotenv import load_dotenv

# Load environment variables
current_dir = os.path.abspath(".")
root_dir = dirname(current_dir)
env_file = os.path.join(current_dir, '.env')
load_dotenv(env_file, override=True)

True
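
The `True` returned by `load_dotenv` signals that the file was read and set at least one variable; it does not confirm that every setting needed later is present. A minimal sketch of an explicit check follows. The helper `missing_env_vars` and the variable names are hypothetical placeholders, not names taken from this notebook's `.env` file.

```python
import os

# Hypothetical names: replace with the keys your own .env file defines.
REQUIRED_VARS = ["AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_KEY", "OPENAI_API_KEY"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv` surfaces a misnamed or missing key early, rather than as an authentication error in a later cell.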

In [13]:
import openai
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import AzureSearch

## Loading Documents with LangChain: PDF and CSV

In [10]:
from langchain.document_loaders import PyPDFLoader

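# The PyPDFLoader loads each page into a document
loader = PyPDFLoader("../literature/Lewis_RAG_2021.pdf")
# load_and_split() also runs a text splitter (RecursiveCharacterTextSplitter by default),
# so unlike .load() it may break pages longer than the splitter's chunk size into several documents
pages = loader.load_and_split()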

In [11]:
pages

[Document(metadata={'source': '../literature/Lewis_RAG_2021.pdf', 'page': 0}, page_content='Retrieval-Augmented Generation for\nKnowledge-Intensive NLP Tasks\nPatrick Lewis†‡, Ethan Perez⋆,\nAleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,\nMike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†\n†Facebook AI Research; ‡University College London; ⋆New York University;\nplewis@fb.com\nAbstract\nLarge pre-trained language models have been shown to store factual knowledge\nin their parameters, and achieve state-of-the-art results when ﬁne-tuned on down-\nstream NLP tasks. However, their ability to access and precisely manipulate knowl-\nedge is still limited, and hence on knowledge-intensive tasks, their performance\nlags behind task-speciﬁc architectures. Additionally, providing provenance for their\ndecisions and updating their world knowledge remain open research problems. Pre-\ntrained models with a differentiable access 

In [8]:
print(pages[0].page_content)

Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks
Patrick Lewis†‡, Ethan Perez⋆,
Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,
Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†
†Facebook AI Research; ‡University College London; ⋆New York University;
plewis@fb.com
Abstract
Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when ﬁne-tuned on down-
stream NLP tasks. However, their ability to access and precisely manipulate knowl-
edge is still limited, and hence on knowledge-intensive tasks, their performance
lags behind task-speciﬁc architectures. Additionally, providing provenance for their
decisions and updating their world knowledge remain open research problems. Pre-
trained models with a differentiable access mechanism to explicit non-parametric
memory have so far been only investigated for extractive downstream t
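
The extracted text above keeps PDF artifacts such as the ligature `ﬁ` and words hyphenated across line breaks (`down-` / `stream`). If these matter for search quality, a light cleanup could look like the sketch below. The helper `clean_pdf_text` is hypothetical and not part of LangChain, and the de-hyphenation rule is a heuristic that would also join genuinely hyphenated compounds split at a line end, such as `Pre-` / `trained`.

```python
import re
import unicodedata


def clean_pdf_text(text):
    """Normalize ligatures and rejoin words hyphenated at line ends (heuristic)."""
    # NFKC maps compatibility characters such as the 'fi' ligature (U+FB01) to 'fi'.
    text = unicodedata.normalize("NFKC", text)
    # Join 'down-\nstream' into 'downstream' when a lowercase letter follows the break.
    return re.sub(r"(\w)-\n([a-z])", r"\1\2", text)


sample = "when \ufb01ne-tuned on down-\nstream NLP tasks"
print(clean_pdf_text(sample))
```

Such a function could be applied to each `page_content` before splitting or indexing; whether that is worthwhile depends on how the text is searched.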

In [17]:
from langchain.document_loaders import CSVLoader

# The CSVLoader loads each row into a separate document, similar to a page in a PDF
loader = CSVLoader("./azure-rag/wine-ratings.csv")
rows = loader.load()

In [None]:
rows

[Document(metadata={'source': './azure-rag/wine-ratings.csv', 'row': 0}, page_content=': 0\nname: 1000 Stories Bourbon Barrel Aged Batch Blue Carignan 2016\ngrape: \nregion: Mendocino, California\nvariety: Red Wine\nrating: 91.0\nnotes: This is a very special, limited release of 1000 Stories Bourbon Barrel-Aged Carignan, their first-ever release of Carignan as a single varietal. Classic and rustic with a little edge. Look for notes of brilliantly racy red and black fruits set to a rich backdrop of toast, herbs, and cocoa.'),
 Document(metadata={'source': './azure-rag/wine-ratings.csv', 'row': 1}, page_content=': 1\nname: 1000 Stories Bourbon Barrel Aged Gold Rush Red 2016\ngrape: \nregion: California\nvariety: Red Wine\nrating: 89.0\nnotes: The California Gold Rush was a period of courage, bravado and curiosity. It was with these characteristics in mind that we crafted Gold Rush Red. Grapes chosen from the golden hills and valleys of California were blended to create this bold, adventu
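
Each row's `page_content` above is a series of `column: value` lines, and it begins with `: 0` because the first column of this CSV has an empty header (likely an unnamed index column). The standard-library sketch below mimics that formatting to make the structure visible. It is an illustration, not LangChain's actual implementation; `row_to_text` and the sample data are made up for the example.

```python
import csv
import io


def row_to_text(row):
    """Format one CSV row as 'column: value' lines, one per column."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# Made-up sample with an empty first header, like the wine CSV's index column.
sample_csv = ",name,rating\n0,Example Red 2016,91.0\n1,Example White 2017,89.0\n"

reader = csv.DictReader(io.StringIO(sample_csv))
docs = [
    {"page_content": row_to_text(row), "metadata": {"row": i}}
    for i, row in enumerate(reader)
]
print(docs[0]["page_content"])
```

Because every column ends up in the text that is later embedded, dropping or renaming an unnamed index column before loading may be worth considering.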

## Splitting Documents with LangChain