# Demo: Azure Table Storage as a Docstore

This guide shows you how to use our `AzureDocumentStore` and `AzureIndexStore` abstractions which are backed by Azure Table Storage. By putting nodes in the docstore, this allows you to define multiple indices over the same underlying docstore, instead of duplicating data across indices.

<a href="https://colab.research.google.com/drive/1qtGtyxoIM6rnqxxrTsfixoez8fZy6T2_?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

In [None]:
%pip install matplotlib
%pip install llama-index
%pip install llama-index-embeddings-azure-openai
%pip install llama-index-llms-azure-openai
%pip install llama-index-storage-kvstore-azure
%pip install llama-index-storage-docstore-azure
%pip install llama-index-storage-index-store-azure

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [None]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
import logging
import sys
import os

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
logging.getLogger("azure.core.pipeline.policies.http_logging_policy").setLevel(
    logging.WARNING
)

In [None]:
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex, SimpleKeywordTableIndex
from llama_index.core import SummaryIndex
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core.response.notebook_utils import display_response
from llama_index.core import Settings
from llama_index.storage.kvstore.azure.base import ServiceMode

#### Download Data

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-05-08 23:47:52--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2024-05-08 23:47:52 (6.63 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



#### Load Documents

In [None]:
reader = SimpleDirectoryReader("./data/paul_graham/")
documents = reader.load_data()

#### Parse into Nodes

In [None]:
from llama_index.core.node_parser import SentenceSplitter

nodes = SentenceSplitter().get_nodes_from_documents(documents)

#### Add to Docstore

In [None]:
from llama_index.storage.docstore.azure import AzureDocumentStore
from llama_index.storage.index_store.azure import AzureIndexStore

The AzureDocumentStore and AzureIndexStore classes provide several helper methods `from_connection_string`, `from_account_and_key`, `from_sas_token`, `from_aad_token`... to simplify connecting to our Azure Table Storage service.

In [None]:
storage_context = StorageContext.from_defaults(
    docstore=AzureDocumentStore.from_account_and_key(
        "",
        "",
        service_mode=ServiceMode.STORAGE,
    ),
    index_store=AzureIndexStore.from_account_and_key(
        "",
        "",
        service_mode=ServiceMode.STORAGE,
    ),
)

In [None]:
storage_context.docstore.add_documents(nodes)

If we navigate to our Azure Table Storage, we should now be able to see our documents in the table.

# Define our models

In staying with the Azure theme, let's define our Azure OpenAI embedding and LLM models.

In [None]:
Settings.embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    api_key="",
    azure_endpoint="",
    api_version="2024-03-01-preview",
)
Settings.llm = AzureOpenAI(
    model="gpt-4",
    deployment_name="gpt-4",
    api_key="",
    azure_endpoint="",
    api_version="2024-03-01-preview",
)

#### Define Multiple Indexes

Each index uses the same underlying Nodes.

In [None]:
summary_index = SummaryIndex(nodes, storage_context=storage_context)

We should now be able to see our `summary_index` in Azure Table Storage.

In [None]:
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

We should now see an entry for our `vector_index` in Azure Table Storage.

In [None]:
keyword_table_index = SimpleKeywordTableIndex(
    nodes, storage_context=storage_context
)

We should now see an entry our `keyword_table_index` in Azure Table Storage

In [None]:
# NOTE: the docstore still has the same nodes
len(storage_context.docstore.docs)

22

#### Test out saving and loading

In [None]:
# NOTE: docstore and index_store are persisted in Azure Table Storage.
# NOTE: This call is only needed to persist the in-memory `SimpleVectorStore`, created by `VectorStoreIndex`, to disk.
storage_context.persist()

In [None]:
# note down index IDs
list_id = summary_index.index_id
vector_id = vector_index.index_id
keyword_id = keyword_table_index.index_id

In [None]:
from llama_index.core import load_index_from_storage

# re-create storage context
storage_context = StorageContext.from_defaults(
    persist_dir="./storage",
    docstore=AzureDocumentStore.from_account_and_key(
        "",
        "",
        service_mode=ServiceMode.STORAGE,
    ),
    index_store=AzureIndexStore.from_account_and_key(
        "",
        "",
        service_mode=ServiceMode.STORAGE,
    ),
)

# load indices
summary_index = load_index_from_storage(
    storage_context=storage_context, index_id=list_id
)
vector_index = load_index_from_storage(
    storage_context=storage_context, index_id=vector_id
)
keyword_table_index = load_index_from_storage(
    storage_context=storage_context, index_id=keyword_id
)

INFO:llama_index.core.indices.loading:Loading indices with ids: ['cc88721d-b03e-4ecf-8a3d-8eba23af2f12']
Loading indices with ids: ['cc88721d-b03e-4ecf-8a3d-8eba23af2f12']
INFO:llama_index.core.indices.loading:Loading indices with ids: ['399b94e3-8661-4aef-9962-739952206466']
Loading indices with ids: ['399b94e3-8661-4aef-9962-739952206466']
INFO:llama_index.core.indices.loading:Loading indices with ids: ['f69b0db4-25c2-419a-bcab-75e4c35db96b']
Loading indices with ids: ['f69b0db4-25c2-419a-bcab-75e4c35db96b']


#### Test out some Queries

In [None]:
query_engine = summary_index.as_query_engine()
list_response = query_engine.query("What is a summary of this document?")

In [None]:
display_response(list_response)

**`Final Response:`** This document is an extensive reflection by Paul Graham on his multifaceted career, spanning from his initial forays into programming and art to his influential role in the startup ecosystem through the creation of Y Combinator (YC). Graham narrates his early fascination with computers, leading to significant contributions in programming, particularly with Lisp, and his unexpected journey into entrepreneurship with the founding of Viaweb, one of the first online store builders. This venture not only marked a pivotal moment in e-commerce but also set the stage for Graham's deeper involvement in the tech startup world.

The narrative delves into the inception of Y Combinator, highlighting its innovative approach to startup funding and support through the batch model and the Summer Founders Program, which aimed to nurture new startups by providing seed funding and mentorship. Graham shares insights into the challenges and successes of YC, including its role in funding notable startups like Reddit and Twitch, and discusses the personal growth and realizations that led him to eventually step down from YC to pursue other interests, including a return to writing and programming.

Throughout the essay, Graham reflects on the intersections between his interests in technology, writing, and art, and how these have influenced his career decisions and entrepreneurial ventures. He also touches on personal moments, such as the illness and passing of his mother, which prompted introspection and shifts in his professional focus. The document concludes with Graham's continued exploration of programming languages and his decision to work on Lisp again, underscoring a lifelong commitment to learning, creating, and contributing to the fields of technology and entrepreneurship.

In [None]:
query_engine = vector_index.as_query_engine()
vector_response = query_engine.query("What did the author do growing up?")

In [None]:
display_response(vector_response)

**`Final Response:`** Growing up, the author engaged in writing and programming outside of school. Initially, they wrote short stories, which they described as lacking in plot but filled with characters that had strong feelings. Their first attempts at programming were on an IBM 1401, using an early version of Fortran, where they encountered challenges due to the limitations of the technology at the time. Later, with the advent of microcomputers, the author's programming activities expanded, leading them to write simple games, a program to predict the flight of model rockets, and a word processor that was used by their father.

In [None]:
query_engine = keyword_table_index.as_query_engine()
keyword_response = query_engine.query(
    "What did the author do after his time at YC?"
)

In [None]:
display_response(keyword_response)

**`Final Response:`** After leaving Y Combinator (YC), the author decided to pursue painting, wanting to see how good he could get if he really focused on it. He spent most of the rest of the year painting, achieving a level of skill that, while not as high as he hoped, was better than before. However, in November, he lost interest in painting and stopped. Subsequently, he resumed writing essays, producing a number of new ones over the following months, including some that were not about startups. In March 2015, he began working on Lisp again, focusing on its core as a language defined by writing an interpreter in itself.