.gitignore: 1 addition, 0 deletions
@@ -141,3 +141,4 @@ logs/
*.classpath
*.project
.settings/
.wallet/
notebooks/oracle_langchain_example.ipynb: 288 additions, 0 deletions
@@ -0,0 +1,288 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "14b2675c",
"metadata": {},
"source": [
"# RAG Application using Oracle 26ai and LangChain\n",
"\n",
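"This notebook demonstrates the retrieval half of a simple RAG application built on Oracle 26ai's vector storage capabilities: it embeds a small set of documents, stores them in an Oracle vector store, and retrieves the ones most relevant to a user's question.\n"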
]
},
{
"cell_type": "markdown",
"id": "0eaa562f",
"metadata": {},
"source": [
"## Install necessary packages\n",
"\n",
"Before running the notebook, ensure you have the following packages installed:\n",
"\n",
"* `langchain-oracledb`: LangChain integration for Oracle Database, including the `OracleVS` vector store.\n",
"* `langchain-community`: Provides the `DistanceStrategy` enum imported below.\n",
"* `langchain-huggingface`: LangChain integration for Hugging Face embedding models.\n",
"* `sentence-transformers`: Runs the embedding model locally.\n",
"* `python-dotenv`: Loads environment variables from a `.env` file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b73bbf40",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain-oracledb langchain-community langchain-huggingface sentence-transformers python-dotenv"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "c6c10075",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import oracledb\n",
"from dotenv import load_dotenv\n",
"from langchain_oracledb.vectorstores import oraclevs\n",
"from langchain_oracledb.vectorstores.oraclevs import OracleVS\n",
"from langchain_community.vectorstores.utils import DistanceStrategy\n",
"from langchain_core.documents import Document\n",
"from langchain_huggingface import HuggingFaceEmbeddings"
]
},
{
"cell_type": "markdown",
"id": "7b960988",
"metadata": {},
"source": [
"## Setting up the database connection\n",
"\n",
"For this notebook to work, we need to have the following environment variables set in a `.env` file or in your system environment:\n",
"\n",
"* `ORACLE_USERNAME`: Your Oracle database username.\n",
"* `ORACLE_PASSWORD`: Your Oracle database password.\n",
"\n",
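"You also need the Oracle Wallet for your database, which you can download from your Oracle Cloud account. Extract the wallet files into a `.wallet` directory at the repository root (the notebook reads them from `../.wallet`) and reference that location when making the connection. The code below assumes the wallet password is the same as the database password, and `sample_low` is a network service alias defined in the wallet's `tnsnames.ora`; replace it with an alias for your own database."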
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "245ac7b7",
"metadata": {},
"outputs": [],
"source": [
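"# Load ORACLE_USERNAME and ORACLE_PASSWORD from a .env file, if one exists\n",
"load_dotenv()\n",
"\n",
"# Directory holding the extracted wallet files (relative to the notebooks/ folder)\n",
"WALLET_DIRECTORY = \"../.wallet\"\n",
"# Assumes the wallet password is the same as the database password\n",
"WALLET_PASSWORD = os.getenv(\"ORACLE_PASSWORD\")\n",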
"\n",
"connection = oracledb.connect(\n",
" user=os.getenv(\"ORACLE_USERNAME\"),\n",
" password=os.getenv(\"ORACLE_PASSWORD\"),\n",
" dsn=\"sample_low\",\n",
" config_dir=WALLET_DIRECTORY,\n",
" wallet_location=WALLET_DIRECTORY,\n",
" wallet_password=WALLET_PASSWORD,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f09eea2f",
"metadata": {},
"source": [
"## Preparing the knowledge base\n",
"\n",
"We want to create a knowledge base that the RAG application can query. For this example, we'll use a simple list of documents.\n",
"\n",
"We'll use LangChain's `Document` class to represent each document along with its metadata (here, an `id` and a source `link`)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "a9f058c0",
"metadata": {},
"outputs": [],
"source": [
"documents_json_list = [\n",
" {\n",
" \"id\": \"creating-flows\",\n",
" \"text\": \"Metaflow follows the dataflow paradigm which models a program as a directed graph of operations. This is a natural paradigm for expressing data processing pipelines, machine learning in particular. We call the graph of operations a flow. You define the operations, called steps, which are nodes of the graph and contain transitions to the next steps, which serve as edges. Metaflow sets some constraints on the structure of the graph. For starters, every flow needs a step called start and a step called end. An execution of the flow, which we call a run, starts at start. The run is successful if the final end step finishes successfully. What happens between start and end is up to you. You can construct the graph in between using an arbitrary combination of the following three types of transitions supported by Metaflow:\",\n",
" \"link\": \"https://docs.metaflow.org/metaflow/basics\",\n",
" },\n",
" {\n",
" \"id\": \"authoring-flows\",\n",
" \"text\": \"With Metaflow, you might start with a simple stub, perhaps just a step to load data, and then gradually add more @steps, say, for data transformation, model training, and beyond, testing the flow at each iteration. To enable a smooth development experience, these iterations should run quickly, with minimal waiting - much like the familiar workflow in a notebook, where you build results one cell at a time.\",\n",
" \"link\": \"https://docs.metaflow.org/metaflow/authoring-flows/introduction\",\n",
" },\n",
" {\n",
" \"id\": \"running-flows\",\n",
" \"text\": \"The Runner API allows you to start and manage Metaflow runs and other operations programmatically, for instance, to run flows in a script. The Runner class exposes a blocking API, which waits for operations to complete, as well as a non-blocking (asynchronous) APIs, prefixed with async which execute operations in the background. This document provides an overview of common patterns. For detailed API documentation, see the Runner API reference.\",\n",
" \"link\": \"https://docs.metaflow.org/metaflow/managing-flows/runner\",\n",
" },\n",
" {\n",
" \"id\": \"metaflow\",\n",
" \"text\": \"This is another document that also covers information about the Runner API and how it exposes a blocking API.\",\n",
" \"link\": \"https://docs.metaflow.org/\",\n",
" },\n",
"]\n",
"\n",
"documents = []\n",
"\n",
"for doc in documents_json_list:\n",
" metadata = {\"id\": doc[\"id\"], \"link\": doc[\"link\"]}\n",
" document = Document(page_content=doc[\"text\"], metadata=metadata)\n",
" documents.append(document)"
]
},
{
"cell_type": "markdown",
"id": "e2c5041a",
"metadata": {},
"source": [
"## Preparing the embedding model\n",
"\n",
"We need an embedding for each document before we can store it in the Oracle vector database. For this example, we'll load the pre-trained `sentence-transformers/all-mpnet-base-v2` model from Hugging Face through LangChain's `HuggingFaceEmbeddings` class; it maps each text to a 768-dimensional vector."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "9560eb7a",
"metadata": {},
"outputs": [],
"source": [
"model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-mpnet-base-v2\")"
]
},
{
"cell_type": "markdown",
"id": "8347e4e6",
"metadata": {},
"source": [
"## Ingesting documents into Oracle\n",
"\n",
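"We can now ingest our documents into the Oracle vector database, using cosine distance to compare vectors. `OracleVS.from_documents` embeds each document with the model above and stores the text, metadata, and embedding in the `vector_store` table.\n",
"\n",
"You don't need to create the table yourself: the vector store and its `vector_store` table are created automatically the first time you run this notebook.\n"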
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "a863751f",
"metadata": {},
"outputs": [],
"source": [
"vector_store = OracleVS.from_documents(\n",
" documents,\n",
" model,\n",
" client=connection,\n",
" table_name=\"vector_store\",\n",
" distance_strategy=DistanceStrategy.COSINE,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "437b8613",
"metadata": {},
"source": [
"## Creating the search index\n",
"\n",
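"Next, we create a vector index so that similarity searches don't have to compare the query against every stored vector.\n",
"\n",
"We'll create an HNSW (Hierarchical Navigable Small World) index, a graph-based index for approximate nearest-neighbor search, built with a degree of parallelism of 16 and a target accuracy of 97 percent. With only four documents the index isn't needed for speed, but the same call applies to larger knowledge bases."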
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "9f5ef45d",
"metadata": {},
"outputs": [],
"source": [
"oraclevs.create_index(\n",
" connection,\n",
" vector_store,\n",
" params={\n",
" \"idx_name\": \"vector_store_hnsw_idx2\",\n",
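" \"idx_type\": \"HNSW\",  # graph-based approximate nearest-neighbor index\n",
" \"accuracy\": 97,  # target accuracy, in percent\n",
" \"parallel\": 16,  # degree of parallelism used to build the index\n",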
" },\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e871dd1d",
"metadata": {},
"source": [
"## Querying the vector store\n",
"\n",
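"Finally, we can query the vector store to retrieve the documents most relevant to a user's question. `similarity_search` embeds the question with the same model and returns the `k` documents whose embeddings are closest to it by cosine distance."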
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "bd498a2e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Top 2 relevant documents:\n",
"\n",
"0 This is another document that also covers information about the Runner API and how it exposes a blocking API. {'id': 'metaflow', 'link': 'https://docs.metaflow.org/'}\n",
"1 The Runner API allows you to start and manage Metaflow runs and other operations programmatically, for instance, to run flows in a script. The Runner class exposes a blocking API, which waits for operations to complete, as well as a non-blocking (asynchronous) APIs, prefixed with async which execute operations in the background. This document provides an overview of common patterns. For detailed API documentation, see the Runner API reference. {'id': 'running-flows', 'link': 'https://docs.metaflow.org/metaflow/managing-flows/runner'}\n"
]
}
],
"source": [
"query = \"What is exposed by the Runner API?\"\n",
"result = vector_store.similarity_search(query, k=2)\n",
"\n",
"print(\"Top 2 relevant documents:\\n\")\n",
"\n",
"for index, doc in enumerate(result):\n",
" print(index, doc.page_content, doc.metadata)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "oracle-ai-developer-hub (3.12.9)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
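For readers who want to see what the cosine-distance query in the notebook computes, here is a minimal, dependency-free sketch. It ranks stored vectors by cosine distance (1 minus cosine similarity) to a query vector and keeps the `k` smallest, which is conceptually what `similarity_search(query, k=2)` with `DistanceStrategy.COSINE` asks the database for. This is an exact brute-force search, not the HNSW index or langchain-oracledb's implementation. The helper names `cosine_distance` and `top_k` and the 3-dimensional vectors are hypothetical and made up for illustration; the real model produces 768-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], store: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbor search by cosine distance."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = most similar
    return scored[:k]


# Hypothetical toy "embeddings" keyed by the notebook's document ids.
# The values are invented; they are not outputs of all-mpnet-base-v2.
store = {
    "creating-flows": [0.9, 0.1, 0.0],
    "authoring-flows": [0.7, 0.3, 0.1],
    "running-flows": [0.1, 0.9, 0.2],
    "metaflow": [0.0, 1.0, 0.1],
}
query = [0.05, 0.95, 0.1]  # hypothetical embedding of the question

for doc_id, distance in top_k(query, store, k=2):
    print(doc_id, round(distance, 4))
```

An exact scan like this touches every stored vector, which is fine for four documents; the HNSW index created in the notebook avoids the full scan in exchange for approximate results, which is what the target accuracy setting controls.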