# LLMSearch Google Colab Demo

This notebook was tested with the following 13B model: https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF

If you run into memory errors, lower `n_gpu_layers` in the model configuration so that some layers stay on the CPU, or try a smaller model.

## Instructions

* Upload or generate some documents in the `sample_docs` folder (see README.md for the supported formats).
    * Alternatively, use the provided sample PDF book, Pro Git: https://git-scm.com/book/en/v2
* Run the notebook.
* Optional: edit the configuration file to point to a different model.


### Prepare configuration and download the model

In [29]:
%%shell

# Make folder structure
mkdir -p llm/embeddings llm/cache llm/models llm/config sample_docs

# Download sample book
wget -P sample_docs https://github.com/progit/progit2/releases/download/2.1.413/progit.pdf




In [None]:
%%shell

# Generate sample configuration

cat << EOF > llm/config/config.yaml

cache_folder: /content/llm/cache

embeddings:
  embeddings_path: /content/llm/embeddings
  embedding_model: # Optional embedding model specification, default is e5-large-v2. Swap to a smaller model if out of CUDA memory
    type: sentence_transformer # other supported types - "huggingface" and "instruct"
    model_name: "intfloat/e5-large-v2"
  chunk_sizes:
    - 1024
  document_settings:
  - doc_path: sample_docs/
    scan_extensions:
      - md
      - pdf
    additional_parser_settings:
      md:
        skip_first: True
        merge_sections: True
        remove_images: True

semantic_search:
  search_type: similarity # mmr
  max_char_size: 3096

  reranker:
    enabled: True
    model: "marco" # "marco" selects cross-encoder/ms-marco-MiniLM-L-6-v2; BAAI/bge-reranker-base is the other supported reranker
EOF
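
Before building the index, it can help to confirm which files the `doc_path` and `scan_extensions` settings above would cover. The following is a minimal sketch, not part of llmsearch: the helper name `find_documents` is hypothetical, and it assumes the folder is scanned recursively and that extensions are matched case-insensitively.

```python
from pathlib import Path


def find_documents(doc_path, extensions):
    """Return files under doc_path whose suffix matches one of the extensions.

    Hypothetical helper: recursive, case-insensitive matching is an assumption,
    not a statement about how llmsearch scans the folder.
    """
    root = Path(doc_path)
    if not root.is_dir():
        return []
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )


if __name__ == "__main__":
    # Values mirror `doc_path` and `scan_extensions` from config.yaml.
    for path in find_documents("sample_docs", ["md", "pdf"]):
        print(path)
```

An empty listing usually means the documents were uploaded to a different folder or use an extension that is not listed under `scan_extensions`.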

In [None]:
%%shell


cat << EOF > llm/config/model.yaml
# Generate sample model configuration for llama-cpp
llm:
 type: llamacpp
 params:
   model_path: /content/llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf
   prompt_template: |
         ### Instruction:
         Use the following pieces of context to provide a detailed answer to the question at the end. If the answer isn't in the context, say that you don't know; don't try to make up an answer.

         ### Context:
         ---------------
         {context}
         ---------------

         ### Question: {question}
         ### Response:
   model_init_params:
     n_ctx: 1024
     n_batch: 512
     n_gpu_layers: 43

   model_kwargs:
     max_tokens: 512
     top_p: 0.1
     top_k: 40
     temperature: 0.2

EOF
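
If the model does not fit in GPU memory, the `n_gpu_layers` value written above can be lowered without retyping the whole file. The following is a minimal sketch that edits the YAML as plain text with the standard library only. The helper name `set_gpu_layers` and the value `30` are illustrative assumptions, not recommended settings.

```python
import re
from pathlib import Path


def set_gpu_layers(yaml_text, n_layers):
    """Return yaml_text with the integer after `n_gpu_layers:` replaced.

    Hypothetical helper: plain-text substitution, so indentation and the
    rest of the file are left untouched.
    """
    pattern = re.compile(r"^([ \t]*n_gpu_layers:[ \t]*)\d+", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{n_layers}", yaml_text)
    if count != 1:
        raise ValueError(f"expected exactly one n_gpu_layers entry, found {count}")
    return new_text


if __name__ == "__main__":
    config = Path("llm/config/model.yaml")
    if config.exists():
        # 30 is an arbitrary example; pick a value that fits the available GPU memory.
        config.write_text(set_gpu_layers(config.read_text(), 30))
```

Fewer GPU layers reduce GPU memory use at the cost of slower generation, since the remaining layers run on the CPU.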

In [19]:
%%shell

# Download the model
# Sample model - https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF/tree/main
# Optionally download a smaller model to test...


cd llm/models
wget https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF/resolve/main/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf





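A quick way to confirm that the download completed and that the file is in GGUF format is to check its first four bytes, which are the ASCII characters `GGUF`. The following is a minimal sketch; the helper name `looks_like_gguf` is hypothetical, and the path is the `model_path` used in `model.yaml`.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes


def looks_like_gguf(path):
    """Return True if the file exists and starts with the GGUF magic bytes."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        return handle.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    model = "/content/llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf"
    print(f"{model}: GGUF header found = {looks_like_gguf(model)}")
```

This only checks the header. It does not detect a download that was cut off partway through, so comparing the file size with the one listed on the model page is still worthwhile.
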

### Enable building with CUDA

In [2]:
%env CMAKE_ARGS=-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DLLAMA_CUBLAS=ON
%env FORCE_CMAKE=1

env: CMAKE_ARGS="-DLLAMA_CUBLAS=on"
env: FORCE_CMAKE=1
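
The `%env` magics above only work inside a notebook. When running the same setup from a plain Python script, the variables can be passed to the install step explicitly. The following is a minimal sketch; the helper name `cuda_build_env` is hypothetical, and the flag values are copied from the cell above.

```python
import os


def cuda_build_env(nvcc_path="/usr/local/cuda/bin/nvcc", base_env=None):
    """Return a copy of the environment with the CUDA build variables set.

    Hypothetical helper: mirrors the two %env lines of the notebook cell.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CMAKE_ARGS"] = f"-DCMAKE_CUDA_COMPILER={nvcc_path} -DLLAMA_CUBLAS=ON"
    env["FORCE_CMAKE"] = "1"
    return env


if __name__ == "__main__":
    env = cuda_build_env()
    print("CMAKE_ARGS =", env["CMAKE_ARGS"])
    print("FORCE_CMAKE =", env["FORCE_CMAKE"])
    # To install with these settings from a script, one option is:
    #   subprocess.run([sys.executable, "-m", "pip", "install", "pyllmsearch"],
    #                  env=env, check=True)
```

Returning a copy rather than modifying `os.environ` keeps the build flags scoped to the install command.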


In [None]:

# Install torch and torchvision
# 
# !pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

In [4]:
# !pip install --no-cache-dir git+https://github.com/snexus/llm-search
!pip install pyllmsearch

Collecting git+https://github.com/snexus/llm-search
  Cloning https://github.com/snexus/llm-search to /tmp/pip-req-build-6snvel4k
  Running command git clone --filter=blob:none --quiet https://github.com/snexus/llm-search /tmp/pip-req-build-6snvel4k
  Resolved https://github.com/snexus/llm-search to commit 7207a1674f83aae1b1a7aadba5bb3cc10555ba2f
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting chromadb==0.3.26 (from llmsearch==0.1.dev74+g7207a16.d20230802)
  Using cached chromadb-0.3.26-py3-none-any.whl (123 kB)
Collecting langchain==0.0.219 (from llmsearch==0.1.dev74+g7207a16.d20230802)
  Using cached langchain-0.0.219-py3-none-any.whl (1.2 MB)
Collecting llama-index==0.6.9 (from llmsearch==0.1.dev74+g7207a16.d20230802)
  Using cached llama_index-0.6.9-py3-none-any.whl (403 kB)
Collecting tokenizers

In [23]:
! llmsearch index create -c llm/config/config.yaml

ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla T4, compute capability 7.5
2023-08-02 11:51:21.484 | INFO     | llmsearch.config:validate_params:95 - Loading model paramaters in configuration class LlamaModelConfig
2023-08-02 11:51:21.484 | INFO     | llmsearch.cli:set_cache_folder:54 - Setting SENTENCE_TRANSFORMERS_HOME folder: /content/llm/cache
2023-08-02 11:51:21.485 | INFO     | llmsearch.cli:set_cache_folder:57 - Setting TRANSFORMERS_CACHE folder: /content/llm/cache/transformers
2023-08-02 11:51:21.485 | INFO     | llmsearch.cli:set_cache_folder:58 - Setting HF_HOME: /content/llm/cache/hf_home
2023-08-02 11:51:21.485 | INFO     | llmsearch.cli:set_cache_folder:59 - Setting MODELS_CACHE_FOLDER: /content/llm/cache

In [None]:
%%shell

llmsearch interact llm -c llm/config/config.yaml -m llm/config/model.yaml

ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla T4, compute capability 7.5
2023-08-02 12:02:38.937 | INFO     | llmsearch.config:validate_params:95 - Loading model paramaters in configuration class LlamaModelConfig
2023-08-02 12:02:38.937 | INFO     | llmsearch.cli:set_cache_folder:54 - Setting SENTENCE_TRANSFORMERS_HOME folder: /content/llm/cache
2023-08-02 12:02:38.937 | INFO     | llmsearch.cli:set_cache_folder:57 - Setting TRANSFORMERS_CACHE folder: /content/llm/cache/transformers
2023-08-02 12:02:38.937 | INFO     | llmsearch.cli:set_cache_folder:58 - Setting HF_HOME: /content/llm/cache/hf_home
2023-08-02 12:02:38.937 | INFO     | llmsearch.cli:set_cache_folder:59 - Setting MODELS_CACHE_FOLDER: /content/llm/cache