# LlamaIndex: Introduction and High-Level Concepts

This demo walks through the four stages of **Retrieval-Augmented Generation** (RAG), *loading, indexing, storing, and querying*, using **llama-index** on Wikipedia's article on Twitter.


![RAG stages](stages.png) 

## Initialization

Install the `llama-index` package with pip:

In [None]:
!pip install llama-index

Set your OpenAI API key:

In [None]:
import os
import openai

os.environ["OPENAI_API_KEY"] = "sk-..."   # Add your OpenAI API key here
openai.api_key = os.environ["OPENAI_API_KEY"]

## Loading data

In this section, the Wikipedia article on Twitter is downloaded and saved as a .txt file, then loaded with the **SimpleDirectoryReader** class.

In [2]:
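# Download the article and save it in the `data` directory.
from pathlib import Path
import requests

# Request the plain-text extract of the "Twitter" page from the MediaWiki API.
response = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "format": "json",
        "titles": "Twitter",
        "prop": "extracts",
        # "exintro": True,  # uncomment to fetch only the introduction
        "explaintext": True,
    },
    timeout=30,
).json()
# The API keys pages by page ID, so take the first (and only) entry.
page = next(iter(response["query"]["pages"].values()))
wiki_text = page["extract"]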

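# Create the output directory if it does not exist yet.
data_path = Path("data")
data_path.mkdir(exist_ok=True)

# Write as UTF-8 explicitly, since the article contains non-ASCII characters.
with open(data_path / "Twitter.txt", "w", encoding="utf-8") as fp:
    fp.write(wiki_text)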
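
The download cell assumes the API response always contains the page and its `extract`. As a minimal sketch, the hypothetical helper `extract_page_text` below (not part of the original notebook) shows one way to fail with a clear message when the title does not exist. It relies on the MediaWiki convention of marking nonexistent pages with a `missing` key; the two sample dictionaries are hand-written stand-ins, not real API output.

```python
def extract_page_text(api_response: dict) -> str:
    """Return the plain-text extract of the first page in a MediaWiki query response."""
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("response contains no pages")
    # Pages are keyed by page ID, so take the first entry.
    page = next(iter(pages.values()))
    if "missing" in page:
        raise ValueError(f"page not found: {page.get('title')!r}")
    extract = page.get("extract", "")
    if not extract:
        raise ValueError(f"page has no text extract: {page.get('title')!r}")
    return extract


# Hand-written stand-ins for API responses (illustrative only).
ok_response = {"query": {"pages": {"123": {"title": "Twitter", "extract": "Twitter is ..."}}}}
missing_response = {"query": {"pages": {"-1": {"title": "Twiter", "missing": ""}}}}

print(extract_page_text(ok_response))
try:
    extract_page_text(missing_response)
except ValueError as err:
    print(err)
```

In the notebook, `wiki_text = extract_page_text(response)` could stand in for the two lines that index into `response` directly.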

In [3]:
# Load the files in the `data` directory.
from llama_index import SimpleDirectoryReader

# load_data() returns a list of Document objects
documents = SimpleDirectoryReader("data").load_data()

## Indexing

In this section, a vector index is built from the loaded documents with **VectorStoreIndex**.

In [4]:
from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
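
Building a vector index involves splitting each document into smaller chunks and computing an embedding for each one. The sketch below illustrates only the chunking step, in plain Python. It is a toy word-based splitter, not LlamaIndex's own splitter, and the function name `chunk_words` and its parameters `chunk_size` and `overlap` are assumptions made for illustration.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `chunk_size` words; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some words twice.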


## Storing

Now that the index has been created, it is saved to disk for future use, then reloaded for demonstration purposes.

In [None]:
from llama_index import (
    StorageContext,
    load_index_from_storage,
)

index.storage_context.persist('data/index')
storage_context = StorageContext.from_defaults(persist_dir='data/index')
index = load_index_from_storage(storage_context)
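
The reason to persist an index is to avoid rebuilding it, and therefore recomputing embeddings, on every run. The sketch below shows that "load if present, otherwise build and save" pattern with a toy JSON file. The helper `load_or_build`, the file name `toy_index.json`, and the record layout are all hypothetical and do not reflect LlamaIndex's on-disk format.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(persist_dir: Path, build) -> list:
    """Load records from persist_dir if present; otherwise call build() and save the result."""
    index_file = persist_dir / "toy_index.json"
    if index_file.exists():
        return json.loads(index_file.read_text(encoding="utf-8"))
    records = build()
    persist_dir.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(records), encoding="utf-8")
    return records


build_calls = 0


def build_toy_index() -> list:
    """Stand-in for an expensive indexing step; counts how often it runs."""
    global build_calls
    build_calls += 1
    return [{"text": "X was formerly known as Twitter.", "vector": [0.1, 0.9]}]


with tempfile.TemporaryDirectory() as tmp:
    persist_dir = Path(tmp) / "index"
    first = load_or_build(persist_dir, build_toy_index)   # builds and saves
    second = load_or_build(persist_dir, build_toy_index)  # loads from disk

print(build_calls, first == second)
```

In the notebook, the same check could guard the `VectorStoreIndex.from_documents` call so that it runs only when `data/index` does not exist yet.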


## Querying


The query stage uses OpenAI's GPT-3.5-turbo model.

In [None]:
from llama_index.llms import OpenAI
llm = OpenAI(model='gpt-3.5-turbo')


Before querying through the index, we first prompt the model directly about the social media platform X.

We then ask the same question again, this time through the index.

In [15]:
from llama_index.llms import ChatMessage

messages = [
    ChatMessage(role="user", content="tell me about the social media X"),
]
resp = llm.chat(messages)
print(resp)

assistant: Social media X is a hypothetical social media platform that does not exist in reality. As such, there is no specific information available about its features, purpose, or user base. However, we can discuss some general aspects of social media platforms.

Social media platforms are online platforms that allow users to create and share content, connect with others, and engage in various activities. They have become an integral part of modern communication and have revolutionized the way people interact and share information.

Typically, social media platforms offer features such as creating a profile, posting updates, sharing photos and videos, following other users, and engaging in conversations through comments and direct messages. They also provide tools for liking, sharing, and reposting content, which helps in spreading information quickly.

Social media platforms can be used for various purposes, including personal networking, professional networking, content creation, e

In [14]:
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("tell me about the social media X")
print(response)

X is a social media website that was formerly known as Twitter. It is based in the United States and is one of the largest social networks in the world, with over 500 million users. Users can share text messages, images, and videos called "tweets" on the platform. X also offers features such as direct messaging, video and audio calling, bookmarks, lists and communities, and Spaces, which is a social audio feature. The platform allows users to vote on context added by approved users using the Community Notes feature. X is owned by X Corp., which is the successor of Twitter, Inc. The service was originally created in 2006 by Jack Dorsey, Noah Glass, Biz Stone, and Evan Williams. It gained popularity for its requirement of brief message posts, initially limited to 140 characters and later expanded to 280 characters. X has faced criticism for the spread of disinformation and hate speech, but it has also been praised for its approach to content moderation. The platform was acquired by Elon 

As the results show, the model alone did not recognize X and answered in generic terms. Once the index supplied context from the article, the model correctly identified X as the platform formerly known as Twitter.
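
The difference between the two answers comes from retrieval: the query engine finds the chunks most similar to the question and passes them to the model as context. The sketch below mimics that flow in plain Python under strong simplifying assumptions. A bag-of-words `Counter` stands in for a real embedding model, the three chunks are toy examples, and the prompt wording in `build_prompt` is illustrative rather than LlamaIndex's actual template.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that places the retrieved context before the question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Context information:\n{context}\n\nUsing only the context above, answer: {query}"


chunks = [
    "X is a social media website that was formerly known as Twitter.",
    "Paris is famous for its museums and cafes.",
    "Users can share text messages, images, and videos on X.",
]
query = "tell me about the social media X"
top = retrieve(query, chunks, top_k=2)
prompt = build_prompt(query, top)
print(prompt)
```

The unrelated chunk about Paris shares no words with the question, so it is ranked last and never reaches the prompt.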

## Conclusion

This notebook gives an overview of the key concepts behind LlamaIndex and Retrieval-Augmented Generation (RAG). It demonstrates how RAG enables a model to answer questions about data it was not trained on, without retraining.
