[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/safevideo/autollm/blob/main/examples/quickstart.ipynb)

## 0. Preparation

- Install the latest version of autollm and the packages required for this tutorial:

In [None]:
!pip install autollm gradio gitpython nbconvert -Uqq pypdf


- Import required modules:

In [None]:
# import the required functions and classes
from autollm import AutoQueryEngine
from autollm.utils.document_reading import read_github_repo_as_documents, read_files_as_documents
import os
import gradio as gr

- Set your OpenAI API key to use the default gpt-3.5-turbo model:

In [None]:
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
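Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, using only the standard library's `getpass`, reads the key from the environment and prompts for it only when it is missing. `ensure_openai_key` is a hypothetical helper, not part of autollm:

```python
import os
from getpass import getpass


def ensure_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, prompting only if it is missing (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed key, so it does not end up in the saved notebook output
        key = getpass(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key
    return key
```

Calling `ensure_openai_key()` in place of the hard-coded assignment leaves `OPENAI_API_KEY` set for the rest of the session.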

## 1. Read Files as Documents

- Reading from a GitHub repo:

In [None]:
git_repo_url = "https://github.com/ultralytics/ultralytics.git"
relative_folder_path = "docs"   # relative path from the repo root to the folder containing documents
required_exts = [".md"]    # specify the extensions of the documents to be read

documents = read_github_repo_as_documents(git_repo_url=git_repo_url, relative_folder_path=relative_folder_path, required_exts=required_exts)

Temporary directory created at autollm/temp


INFO:autollm.utils.document_reading:Found 221 'documents'.
INFO:autollm.utils.document_reading:Operations complete, deleting temporary directory autollm/temp..


- Reading from a local folder:

In [None]:
required_exts = [".pdf"]  # read only the PDF files in the folder
documents = read_files_as_documents(input_dir="/content/Documents", required_exts=required_exts)

**Note**: If you want to read specific file types, adjust the `required_exts` parameter. By default, our functions will read all [supported file types](https://github.com/run-llama/llama_index/blob/main/llama_index/readers/file/base.py#L19-L34) from the specified source.
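The idea behind `required_exts` (walk a folder and keep only files with the listed extensions) can be sketched with `pathlib`. `collect_files` is a hypothetical helper, not autollm's implementation; it assumes matching is done on the file suffix, case-insensitively, and accepts either a single extension string or a list:

```python
from pathlib import Path
from typing import Iterable, List, Union


def collect_files(root: Union[str, Path], required_exts: Union[str, Iterable[str]]) -> List[Path]:
    """Recursively list files under `root` whose suffix is in `required_exts` (hypothetical helper)."""
    # Accept a single extension such as ".pdf" as well as a list like [".md", ".pdf"]
    if isinstance(required_exts, str):
        required_exts = [required_exts]
    wanted = {ext.lower() for ext in required_exts}
    # rglob("*") walks every subfolder; keep regular files with a matching suffix
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

For example, `collect_files("/content/Documents", [".pdf"])` would list every PDF below that folder, including those in subfolders.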

## 2. Configuration of AutoQueryEngine

### Basic Usage

- You can skip configuration entirely (see Advanced Usage below) if the default settings work for you.

- 🌟 **pro tip**: autollm defaults to LanceDB as the vector store because it is lightweight, scales from development to production and, according to the autollm authors, is up to 100x cheaper than alternatives!

In [None]:
query_engine = AutoQueryEngine.from_parameters(documents=documents)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Parsing documents into nodes:   0%|          | 0/17 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/17 [00:00<?, ?it/s]
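The two progress bars correspond to the two indexing steps: documents are split into nodes (chunks), then each chunk is embedded. The size of each chunk is governed by `chunk_size`, which Advanced Usage sets explicitly. LlamaIndex measures chunk size in tokens; the sketch below uses whitespace-separated words as a rough stand-in, and its overlap value is an illustrative assumption. `chunk_words` is a hypothetical helper, not autollm's splitter:

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 1024, chunk_overlap: int = 20) -> List[str]:
    """Split text into windows of at most `chunk_size` words that overlap by `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks; larger chunks mean fewer embeddings but more text per retrieved hit.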

### Advanced Usage

- You can also configure AutoQueryEngine to fit your needs:

In [None]:
system_prompt = "You are a friendly AI assistant that helps users find the most relevant and accurate answers to their questions based on the documents you have access to. When answering questions, rely mostly on the information in the documents."

query_wrapper_prompt = '''
The document information is below.
---------------------
{context_str}
---------------------
Using the document information and mostly relying on it,
answer the query.
Query: {query_str}
Answer:
'''

enable_cost_calculator = True

# llm params
model = "gpt-3.5-turbo"

# vector store params
vector_store_type = "LanceDBVectorStore"
# specific params for LanceDBVectorStore
uri = "tmp/lancedb"
table_name = "vectors"

# service context params
chunk_size = 1024

# query engine params
similarity_top_k = 5
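The two placeholders in `query_wrapper_prompt` are what make it a template: `{context_str}` receives the retrieved document chunks and `{query_str}` the user's question. A minimal illustration with Python's `str.format`, using a condensed template and a made-up context string (how autollm and LlamaIndex assemble the final prompt internally may differ):

```python
# A condensed version of the query_wrapper_prompt template (same placeholders)
template = (
    "The document information is below.\n"
    "{context_str}\n"
    "Query: {query_str}\n"
    "Answer:"
)

# Hypothetical retrieved context; at query time this text comes from the vector store
context = "Fibromyalgia is a long-term condition that causes pain and tenderness throughout the body."

filled = template.format(context_str=context, query_str="What is fibromyalgia?")
print(filled)
```

If the template is filled with `str.format`-style formatting, as assumed here, any literal braces you add to your own template must be doubled (`{{` and `}}`) so they are not mistaken for placeholders.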

In [None]:
llm_params = {"model": model}
vector_store_params = {"vector_store_type": vector_store_type, "uri": uri, "table_name": table_name}
service_context_params = {"chunk_size": chunk_size}
query_engine_params = {"similarity_top_k": similarity_top_k}

query_engine = AutoQueryEngine.from_parameters(
    documents=documents, system_prompt=system_prompt, query_wrapper_prompt=query_wrapper_prompt,
    enable_cost_calculator=enable_cost_calculator, llm_params=llm_params,
    vector_store_params=vector_store_params, service_context_params=service_context_params,
    query_engine_params=query_engine_params,
)
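`similarity_top_k = 5` means each query retrieves the five stored chunks whose embeddings are most similar to the query's embedding, and only those chunks are passed to the LLM as context. A brute-force sketch of that retrieval step with cosine similarity over toy 2-D vectors follows; real embeddings have many more dimensions, and LanceDB performs this search itself (possibly with a different distance metric), so `cosine` and `top_k` are purely illustrative:

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 5) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy "embeddings" for four chunks; the query points in the same direction as chunk 0
chunks = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(top_k([1.0, 0.0], chunks, k=2))
```

A larger `similarity_top_k` gives the model more context to draw on, at the cost of more prompt tokens per query.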

## 3. Ask Anything to Your Documents

In [None]:
response = query_engine.query("What is Fibromyalgia")

LLM Prompt Token Usage: 820
LLM Completion Token Usage: 124
LLM Total Token Cost: $0.001478
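The reported cost is the token counts multiplied by per-token prices. Assuming gpt-3.5-turbo rates of $0.0015 per 1K prompt tokens and $0.002 per 1K completion tokens (these rates are an assumption, but they reproduce the $0.001478 logged above), the arithmetic is:

```python
# Assumed gpt-3.5-turbo prices in USD per 1K tokens at the time this notebook ran;
# check OpenAI's pricing page for current values.
PROMPT_RATE_PER_1K = 0.0015
COMPLETION_RATE_PER_1K = 0.002


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one LLM call from its token counts."""
    return (prompt_tokens * PROMPT_RATE_PER_1K + completion_tokens * COMPLETION_RATE_PER_1K) / 1000


print(f"${estimate_cost(820, 124):.6f}")  # → $0.001478
```

That is 820 × 0.0015 / 1000 = $0.00123 for the prompt plus 124 × 0.002 / 1000 = $0.000248 for the completion. Most of the cost comes from the prompt, which grows with `similarity_top_k` and `chunk_size`.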


In [None]:
print(response.response)

Fibromyalgia is a long-term condition that causes pain and tenderness throughout the body. It is not related to problems with joints, bones, or muscles, but is thought to be caused by the nervous system in the brain and spine being unable to control or process pain signals from other parts of the body. It is also associated with symptoms such as poor sleep, difficulty concentrating or remembering things, constant fatigue, and various other symptoms that can affect different parts of the body. Fibromyalgia can be physically and mentally overwhelming, but with proper management, its impact on one's life can be reduced.


- Or play with it in a Gradio app 🚀

In [None]:
import gradio as gr

def greet(query):
    return query_engine.query(query).response

demo = gr.Interface(fn=greet, inputs="text", outputs="text")

demo.launch(share=True)  # share=True creates a temporary public link (required for queuing in Colab)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://39f09752b62ec8d2dc.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
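The `greet` handler passes any input straight to the engine, including empty strings, and an exception would surface as a generic error in the UI. A slightly more defensive handler is sketched below; `answer` is a hypothetical name, and the engine is passed in explicitly so the function can be tested without an API key. It only assumes the engine's `query()` result has a `.response` attribute, as used above:

```python
def answer(query: str, engine) -> str:
    """Return the engine's answer text, or a readable message for blank input or errors."""
    query = (query or "").strip()
    if not query:
        return "Please enter a question about your documents."
    try:
        return str(engine.query(query).response)
    except Exception as exc:  # show the failure in the app instead of a generic error
        return f"Query failed: {exc}"
```

To use it in the app, pass `fn=lambda q: answer(q, query_engine)` to `gr.Interface` in place of `greet`.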




### If you found this project useful, [give it a ⭐️ on GitHub](https://github.com/safevideo/autollm) to show your support!