# Getting Started with LlamaCloud

This notebook shows how to get started with LlamaCloud by building a simple RAG pipeline and then querying it.

## Build RAG Pipeline from LlamaCloud Index

The LlamaCloud index is built over the 2021 Lyft and Uber 10K documents.

To create the index:
1. Download the documents ([Uber 10K](https://www.dropbox.com/s/te0a2w227v27iag/uber_2021.pdf?dl=1), [Lyft 10K](https://www.dropbox.com/s/qctkz6nxhm0y5qe/lyft_2021.pdf?dl=1)).
2. Follow the instructions at `https://cloud.llamaindex.ai/` to sign up for an account, then create a pipeline by uploading these documents.

In [None]:
!pip install llama-index-indices-llama-cloud

In [3]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex(
  name="<index_name>", 
  project_name="<project_name>",
  api_key="llx-..."
)
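
Hard-coding the API key in a notebook makes it easy to leak. Below is a minimal sketch that collects the same three constructor arguments used above while reading the key from the environment. It assumes the key is stored in an environment variable named `LLAMA_CLOUD_API_KEY`; the helper `llama_cloud_kwargs` is hypothetical and not part of the library.

```python
import os


def llama_cloud_kwargs(name, project_name, env_var="LLAMA_CLOUD_API_KEY"):
    """Collect the keyword arguments passed to LlamaCloudIndex above.

    Hypothetical helper: the environment variable name is an assumption.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set {env_var} before creating the index.")
    return {"name": name, "project_name": project_name, "api_key": api_key}


# Usage (requires the package installed above):
# index = LlamaCloudIndex(**llama_cloud_kwargs("<index_name>", "<project_name>"))
```

Keeping the key out of the cell source means the notebook can be shared or committed without editing it first.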

## Try out an Example Query! 

Now we can try out an example query against the index.

If you want an out-of-the-box query engine, call `index.as_query_engine()` (similar to our VectorStoreIndex).

If you want a retriever that you can plug into a custom workflow, call `index.as_retriever()`.

In [4]:
query = "Tell me about the risk factors for uber"
response = index.as_query_engine().query(query)

In [5]:
print(str(response))

The risk factors for Uber include challenges related to operational factors such as forecasting revenue, managing expenses, attracting and retaining drivers and riders, complying with laws and regulations, responding to macroeconomic changes, maintaining brand reputation, managing growth, and developing new platform features. Additionally, risks include changes in pricing strategies, regulatory scrutiny, potential reclassification of drivers as employees, and evolving and increasingly regulated industry dynamics that could impact business operations, financial condition, and results of operations.


In [6]:
response.source_nodes[0]

NodeWithScore(node=TextNode(id_='9c71055d-44df-4d99-84e5-d3e0db8564cb', embedding=None, metadata={'file_name': 'lyft_2021.pdf', 'file_path': 'lyft_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1440303, 'llx_platform_pipeline_id': 'edf2f149-b8b9-4b5b-9806-b5d4a182fe86', 'llx_platform_loaded_file_id': 'f877de9c-2087-46c9-9ba7-9e788ec1038e', 'pipeline_id': 'edf2f149-b8b9-4b5b-9806-b5d4a182fe86'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='d2ba4de0-71e1-4b0d-8e51-15ae8fe4063a', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='fecf57192485718939536e0c3afdfc50ff84299a05437a5fbe36505969ec00ea'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='7fe186ba-9718-4204-9c4d-d5b50cbacbda', node_type=<ObjectType.TEXT: '1'>, metadata={'file_name': 'lyft_2021.pdf', 'file_path': 'lyft_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1440303, 'llx_platform_pipeline_id': 'edf2f149

Now let's try using a standalone retriever.

In [7]:
query = "Tell me about the risk factors for uber"
nodes = index.as_retriever().retrieve(query)

In [9]:
print(str(nodes[0].get_content()))

For example, inflation has broadly impacted the auto service industry, which has increased our insurance costs. If general economic conditions deteriorate in the United States or in other markets where we operate, discretionary spending may decline and demand for ridesharing may be reduced. An economic downturn resulting in a prolonged recessionary period may have a further adverse effect on our revenue.
---
 Risks Related to Operational **Factors**

Our limited operating history and our evolving business make it difficult to evaluate our future prospects and the risks and challenges we may encounter. While we have primarily focused on ridesharing since our ridesharing marketplace launched in 2012, our business continues to evolve. We regularly expand our platform features, offerings and services and change our pricing methodologies. In recent periods, we have also reevaluated and changed our cost structure and focused our business model. Our evolving business, industry and markets mak
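
Printing one node at a time makes it hard to see which file each result came from. The sketch below condenses a list of retrieved nodes into one row per node. It assumes each item exposes `.score`, `.node.metadata`, and `.get_content()`, as the `NodeWithScore` objects shown above do; `summarize_nodes` is a hypothetical helper name.

```python
def summarize_nodes(nodes, preview_chars=80):
    """Return one (score, file_name, preview) tuple per retrieved node."""
    rows = []
    for n in nodes:
        # 'file_name' appears in the node metadata printed earlier.
        file_name = n.node.metadata.get("file_name", "<unknown>")
        # Flatten newlines so each preview fits on one line.
        preview = n.get_content()[:preview_chars].replace("\n", " ")
        rows.append((n.score, file_name, preview))
    return rows


# Usage with the retriever output above:
# for score, file_name, preview in summarize_nodes(nodes):
#     print(score, file_name, preview)
```

A quick scan of the file names shows whether a query about Uber is actually pulling chunks from the Uber filing or from the Lyft one.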

Now let's plug the retriever into a standard RAG query pipeline (this is basically what `index.as_query_engine()` does under the hood).

In [13]:
from llama_index.core.query_pipeline import QueryPipeline as QP, InputComponent
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
summarizer = TreeSummarize(llm=llm)

p = QP(verbose=True)
p.add_modules(
    {
        "input": InputComponent(),
        "retriever": index.as_retriever(),
        "summarizer": summarizer,
    }
)
p.add_link("input", "retriever")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("retriever", "summarizer", dest_key="nodes")

In [14]:
output = p.run(input="what are the main expenditures for Lyft?")

[1;3;38;2;155;135;227m> Running module input with input: 
input: what are the main expenditures for Lyft?

[0m[1;3;38;2;155;135;227m> Running module retriever with input: 
input: what are the main expenditures for Lyft?

[0m[1;3;38;2;155;135;227m> Running module summarizer with input: 
query_str: what are the main expenditures for Lyft?
nodes: [NodeWithScore(node=TextNode(id_='b25d667a-2dfb-4d54-92a5-e4a4d71881fd', embedding=None, metadata={'file_name': 'lyft_2021.pdf', 'file_path': 'lyft_2021.pdf', 'file_type': 'application/pdf', 'file_siz...

[0m

In [15]:
print(str(output))

The main expenditures for Lyft include research and development expenses, sales and marketing expenses, general and administrative expenses, interest expenses, and other income (expense), net.
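
The query pipeline above links two steps: retrieve nodes for the query, then pass the query and nodes to the summarizer. The same flow can be sketched as a plain function. This assumes the retriever exposes `retrieve(query)` and the synthesizer exposes `synthesize(query, nodes)`; `rag_query` is a hypothetical name, not a library function.

```python
def rag_query(retriever, synthesizer, query_str):
    """Retrieve nodes for the query, then synthesize an answer from them."""
    nodes = retriever.retrieve(query_str)
    return synthesizer.synthesize(query_str, nodes)


# Usage with the objects defined above:
# output = rag_query(
#     index.as_retriever(), summarizer, "what are the main expenditures for Lyft?"
# )
# print(str(output))
```

Writing the steps out by hand leaves an obvious place to add filtering or reranking between retrieval and synthesis.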
