In [None]:
# | hide
from onprem.core import *

# Talk to Your Documents

This example of [OnPrem.LLM](https://github.com/amaiya/onprem) demonstrates retrieval-augmented generation (RAG).
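
At a high level, RAG retrieves the stored text chunks most similar to a question and passes them to the LLM as context. The sketch below shows only that idea. It is not OnPrem.LLM's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helpers with an assumed prompt wording.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words term-count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, context_chunks):
    """Stuff the retrieved chunks into a prompt (hypothetical template)."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


chunks = [
    "ktrain is a low-code library for machine learning.",
    "The cafeteria opens at noon on weekdays.",
    "Each document is split into small chunks before indexing.",
]
top = retrieve("What is ktrain?", chunks, k=2)
prompt = build_prompt("What is ktrain?", top)
print(prompt)
```

A real system replaces `embed` with a neural embedding model and the linear scan with a vector store, but the retrieve-then-prompt flow is the same.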

In these examples, we will accelerate inference using a GPU. We use an NVIDIA Titan V GPU with a modest 12GB of VRAM.
For GPU acceleration, make sure you have installed `llama-cpp-python` with CUBLAS support, as [described here](https://amaiya.github.io/onprem/#speeding-up-inference-using-a-gpu).

After that, you just need to supply the `n_gpu_layers` argument to `LLM` for GPU-accelerated responses.

We will also supply `use_larger=True` to `LLM` to use the slightly larger default model.
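
`n_gpu_layers` sets how many of the model's layers are offloaded to the GPU. As a rough way to pick a value, the arithmetic below estimates how many layers fit in a given amount of VRAM. This is a hypothetical heuristic, not part of OnPrem.LLM: it assumes the quantized weights are spread evenly across layers and that some VRAM (`reserve_gb`, an assumed value) stays free for the context cache and scratch buffers. The model's total layer count is reported as `n_layer` in the llama.cpp load log further down.

```python
def estimate_gpu_layers(model_size_gb: float, n_layers: int, vram_gb: float,
                        reserve_gb: float = 1.5) -> int:
    """Hypothetical heuristic for choosing n_gpu_layers (not an OnPrem.LLM API).

    Assumes weights are spread evenly across layers and that reserve_gb of VRAM
    must stay free for the context cache and scratch buffers.
    """
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never offload more layers than the model actually has.
    return min(n_layers, int(usable_gb // per_layer_gb))


# Illustrative numbers only: a 10 GB model file with 40 layers on a 6 GB GPU.
print(estimate_gpu_layers(model_size_gb=10.0, n_layers=40, vram_gb=6.0, reserve_gb=1.0))  # → 20
```

In practice, treat the estimate as a starting point: if loading fails with an out-of-memory error, lower `n_gpu_layers`; if VRAM is left over, raise it.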

## Set up the `LLM` instance

In [None]:
# | notest
from onprem import LLM
import tempfile

vectordb_path = tempfile.mkdtemp()

llm = LLM(use_larger=True, n_gpu_layers=35, vectordb_path=vectordb_path)

In [None]:
# | notest
llm.ingest("./sample_data/")

Creating new vectorstore at /tmp/tmpsmcnzlzp
Loading documents from ./sample_data/


Loading new documents: 100%|██████████████████████| 3/3 [00:00<00:00, 23.79it/s]


Loaded 12 new documents from ./sample_data/
Split into 153 chunks of text (max. 500 chars each)
Creating embeddings. May take some minutes...
Ingestion complete! You can now query your documents using the LLM.ask method
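
The ingestion log reports that the documents were split into chunks of at most 500 characters before embedding. A minimal illustration of that idea is a greedy splitter that packs whole words into chunks up to a size limit. This is a simplified sketch, not OnPrem.LLM's splitter, which may differ (for example, in the separators it prefers or whether consecutive chunks overlap).

```python
def split_text(text, chunk_size=500):
    """Pack whitespace-separated words into chunks of at most chunk_size characters.

    Simplified illustration only; a word longer than chunk_size is hard-split.
    """
    chunks, current = [], ""
    for word in text.split():
        # Hard-split any single word that cannot fit in one chunk.
        while len(word) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:chunk_size])
            word = word[chunk_size:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            # Current chunk is full: emit it and start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


pieces = split_text("word " * 300, chunk_size=500)
print(len(pieces), max(len(p) for p in pieces))
```

Smaller chunks make retrieval more precise but give the LLM less surrounding context per chunk, so the chunk size is a trade-off.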


### Asking Questions to Your Documents

In [None]:
# | notest

result = llm.ask("What is ktrain?")

ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA TITAN V, compute capability 7.0
  Device 1: NVIDIA TITAN V, compute capability 7.0
llama.cpp: loading model from /home/amaiya/onprem_data/wizardlm-13b-v1.2.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA TITAN V) as main device
llama_model_load_internal: mem required  = 3074.87 MB (+ 1608.00 MB p

 Ktrain is a low-code library for augmented machine learning that facilitates the full machine learning workflow from data curating to model application, but allows users to make choices that best fit their unique application requirements. It is intended to democratize machine learning by enabling beginners and domain experts with minimal programming or data science experience to use ML platforms more effectively."

The answer is stored in `result['answer']`. The documents retrieved from the vector store and used to generate the answer are stored in `result['source_documents']`.
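
Judging from the printed output below, each entry in `result['source_documents']` exposes a `page_content` string and a `metadata` dict that, for PDFs, includes `source` and `page` keys. The sketch below collects the distinct (file, page) pairs behind an answer. `Doc` is a hypothetical stand-in for the real document objects, and `cite_sources` is a hypothetical helper, not an OnPrem.LLM function.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def cite_sources(result):
    """Return unique (source, page) pairs, in retrieval order, for an answer's documents."""
    seen = []
    for doc in result["source_documents"]:
        key = (doc.metadata.get("source"), doc.metadata.get("page"))
        if key not in seen:
            seen.append(key)
    return seen


# Fake result shaped like the one returned by llm.ask (illustrative data only).
path = "./sample_data/1/ktrain_paper.pdf"
fake_result = {
    "answer": "...",
    "source_documents": [
        Doc("chunk A", {"source": path, "page": 1}),
        Doc("chunk B", {"source": path, "page": 1}),
        Doc("chunk C", {"source": path, "page": 0}),
    ],
}
for source, page in cite_sources(fake_result):
    print(f"{source} (page {page})")
```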

In [None]:
# | notest

print(result["source_documents"][0])

page_content='lection (He et al., 2019). By contrast, ktrain places less emphasis on this aspect of au-\ntomation and instead focuses on either partially or fully automating other aspects of the\nmachine learning (ML) workﬂow. For these reasons, ktrain is less of a traditional Au-\n2' metadata={'author': '', 'creationDate': "D:20220406214054-04'00'", 'creator': 'LaTeX with hyperref', 'file_path': './sample_data/1/ktrain_paper.pdf', 'format': 'PDF 1.4', 'keywords': '', 'modDate': "D:20220406214054-04'00'", 'page': 1, 'producer': 'dvips + GPL Ghostscript GIT PRERELEASE 9.22', 'source': './sample_data/1/ktrain_paper.pdf', 'subject': '', 'title': '', 'total_pages': 9, 'trapped': ''}


### Chatting with Your Documents

Unlike `LLM.ask`, the `LLM.chat` method retains conversational memory at the expense of a larger context and an extra call to the LLM.
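
One common way to implement this kind of memory is to rewrite each follow-up question, together with the chat history, into a standalone question before retrieval. The outputs below appear consistent with that pattern, since the second `chat` call first prints a rewritten question. The sketch is a conceptual illustration under that assumption, not OnPrem.LLM's code: `ToyChat`, `fake_llm`, and the prompt wording are hypothetical, and `retrieve` is a stub.

```python
from typing import Callable, Dict, List, Tuple


class ToyChat:
    """Conceptual sketch of retrieval-augmented chat with memory (not OnPrem.LLM's code).

    `llm` maps a prompt string to a completion string, and `retrieve` maps a
    question to a list of context strings; both are hypothetical stand-ins.
    """

    def __init__(self, llm: Callable[[str], str], retrieve: Callable[[str], List[str]]):
        self.llm = llm
        self.retrieve = retrieve
        self.history: List[Tuple[str, str]] = []  # (user question, answer) pairs

    def chat(self, question: str) -> Dict[str, str]:
        if self.history:
            # Extra LLM call: condense history + follow-up into a standalone question.
            transcript = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.history)
            standalone = self.llm(
                f"Chat history:\n{transcript}\n\nFollow-up: {question}\n"
                "Rewrite the follow-up as a standalone question:"
            ).strip()
        else:
            # First turn: nothing to condense, so the question is used as-is.
            standalone = question
        context = "\n\n".join(self.retrieve(standalone))
        answer = self.llm(f"Context:\n{context}\n\nQuestion: {standalone}\nAnswer:").strip()
        self.history.append((question, answer))
        return {"question": standalone, "answer": answer}


# Usage with a fake LLM that records every prompt it receives.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if "standalone question" in prompt:
        return "Does ktrain support image classification?"
    return "stub answer"


bot = ToyChat(fake_llm, lambda q: ["ktrain is a low-code library."])
bot.chat("What is ktrain?")                                # 1 LLM call
turn = bot.chat("Does it support image classification?")   # 2 LLM calls
print(len(calls), turn["question"])
```

The sketch makes the trade-off concrete: after the first turn, every question costs one extra LLM call, and the condensing prompt grows with the chat history.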

In [None]:
# | notest

result = llm.chat("What is ktrain?")

 
ktrain is a low-code library designed to facilitate the full machine learning workow from curating and preprocessing inputs (i.e., ground-truth-labeled training data) to training, tuning, troubleshooting, and applying models. It's intended to democratize machine learning by enabling beginners and domain experts with minimal programming or data science experience to leverage the power of ML in their work. ktrain uses automation to augment and complement human engineers rather than replacing them, thereby exploiting the strengths of both humans and machines for better results. It is inspired by low-code (and no-code) open-source ML libraries such as fastai and ludwig, with custom models and data formats being supported as well.

In [None]:
# | notest

result = llm.chat("Does it support image classification?")


Does ktrain support image classification?
Yes, ktrain supports image classification. It can be used with any machine learning model implemented in TensorFlow Keras (tf.keras) for this purpose.

In [None]:
# | notest

print(result["answer"])


Yes, ktrain supports image classification. It can be used with any machine learning model implemented in TensorFlow Keras (tf.keras) for this purpose.
