# LLMSearch on a local MacBook (M-series chip)

If you run into memory errors, lower `n_gpu_layers` in the model configuration so that some layers stay on the CPU, or try a smaller model.

## Instructions

* Upload or generate some documents in the `sample_docs` folder (check the supported formats in README.md).
    * Or use the sample PDF book downloaded below, Pro Git: https://git-scm.com/book/en/v2
* Run the notebook.
* Optional: tweak the model configuration file to point to a different model.


### Prepare configuration and download the model

In [None]:
%%bash

# Make folder structure
mkdir -p llm/embeddings llm/cache llm/models llm/config sample_docs

# Download sample book
wget -P sample_docs https://github.com/progit/progit2/releases/download/2.1.413/progit.pdf
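
Before indexing, it can help to confirm that `sample_docs` actually contains files the indexer will pick up. The sketch below is a minimal check, not part of LLMSearch: `find_documents` is a hypothetical helper, the extension list mirrors `scan_extensions` from the sample configuration, and it assumes the folder is scanned recursively.

```python
from pathlib import Path


def find_documents(doc_path, extensions):
    """Return files under doc_path whose suffix matches one of the extensions."""
    root = Path(doc_path)
    if not root.is_dir():
        return []
    # Normalise "pdf" and ".pdf" to the same form and ignore case.
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


# Extensions mirror `scan_extensions` in the sample configuration.
documents = find_documents("sample_docs", ["md", "pdf"])
print(f"Found {len(documents)} document(s) to index")
for path in documents:
    print(" -", path)
```

If the count is zero, the download above probably failed or the files use an extension that is not listed in `scan_extensions`.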


In [None]:
%%bash

# Generate sample configuration

cat << EOF > llm/config/config.yaml

cache_folder: ./llm/cache

embeddings:
  embeddings_path: ./llm/embeddings
  chunk_sizes:
    - 1024
  document_settings:
  - doc_path: sample_docs/
    scan_extensions:
      - md
      - pdf
    additional_parser_settings:
      md:
        skip_first: True
        merge_sections: True
        remove_images: True

semantic_search:
  search_type: similarity # mmr
  max_char_size: 3096

  reranker:
    enabled: True
    model: "marco" # "bge" for BAAI/bge-reranker-base, or "marco" for cross-encoder/ms-marco-MiniLM-L-6-v2
EOF
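
The `max_char_size` setting limits how much retrieved text is passed to the model, so it interacts with the `n_ctx` and `max_tokens` values in the model configuration used in this notebook. The sketch below is a rough sanity check only. It assumes about 4 characters per token, a common rule of thumb for English text rather than a property of any particular tokenizer, and the helper names are hypothetical.

```python
import math

# Values copied from the configuration files in this notebook.
MAX_CHAR_SIZE = 3096   # semantic_search.max_char_size
N_CTX = 1024           # model_init_params.n_ctx
MAX_TOKENS = 512       # model_kwargs.max_tokens

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(num_chars, chars_per_token=CHARS_PER_TOKEN):
    """Rough token estimate from a character count."""
    return math.ceil(num_chars / chars_per_token)


def fits_context(context_chars, max_new_tokens, n_ctx):
    """True if the estimated context plus the reply budget fits in n_ctx."""
    return estimate_tokens(context_chars) + max_new_tokens <= n_ctx


needed = estimate_tokens(MAX_CHAR_SIZE) + MAX_TOKENS
print(f"Estimated tokens needed: {needed} (n_ctx = {N_CTX})")
print("Fits:", fits_context(MAX_CHAR_SIZE, MAX_TOKENS, N_CTX))
```

If the estimate exceeds `n_ctx`, lowering `max_char_size` or raising `n_ctx` (memory permitting) are the two obvious knobs to try. The estimate ignores the prompt template and the question, so the real requirement is somewhat higher.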

In [None]:
%%bash


cat << EOF > llm/config/model.yaml
# Generate sample model configuration for llama-cpp
llm:
 type: llamacpp
 params:
   model_path: ./llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf
   prompt_template: |
         ### Instruction:
         Use the following pieces of context to provide a detailed answer to the question at the end. If the answer isn't in the context, say that you don't know, don't try to make up an answer.

         ### Context:
         ---------------
         {context}
         ---------------

         ### Question: {question}
         ### Response:
   model_init_params:
     n_ctx: 1024
     n_batch: 512
     n_gpu_layers: 43

   model_kwargs:
     max_tokens: 512
     top_p: 0.1
     top_k: 40
     temperature: 0.2

EOF
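
The `prompt_template` above contains two placeholders, `{context}` and `{question}`. The sketch below illustrates how such a template could be filled using plain `str.format`. It is an illustration under assumptions, not LLMSearch's actual implementation: `build_prompt` is a hypothetical helper, the instruction text is shortened, and the chunks are placeholders for retrieved text.

```python
# Shortened copy of the template structure from model.yaml.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Use the following pieces of context to answer the question at the end.\n"
    "\n"
    "### Context:\n"
    "---------------\n"
    "{context}\n"
    "---------------\n"
    "\n"
    "### Question: {question}\n"
    "### Response:"
)


def build_prompt(chunks, question, template=PROMPT_TEMPLATE):
    """Join retrieved chunks and substitute both placeholders."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return template.format(context=context, question=question)


# Hypothetical chunks standing in for retrieved document text.
example_chunks = [
    "Git is a distributed version control system.",
    "Branches are cheap in Git.",
]
print(build_prompt(example_chunks, "What is Git?"))
```

Ending the prompt with `### Response:` leaves the model to continue directly with its answer.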

In [None]:
%%bash

# Download the model
# Sample model - https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF/tree/main
# Optionally download a smaller model to test...


cd llm/models
wget https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF/resolve/main/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf
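
A large download can be interrupted or replaced by an HTML error page, so a quick check of the file is worthwhile before loading it. GGUF files begin with the four bytes `GGUF`. The sketch below tests only that header and reports the file size; `looks_like_gguf` is a hypothetical helper and does not validate the rest of the file.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def looks_like_gguf(path):
    """Return True if the file exists and starts with the GGUF magic bytes."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        return handle.read(4) == GGUF_MAGIC


model_file = Path("llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf")
if looks_like_gguf(model_file):
    size_gib = model_file.stat().st_size / 2**30
    print(f"{model_file.name}: GGUF header found, {size_gib:.1f} GiB")
else:
    print(f"{model_file} is missing or is not a GGUF file")
```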


In [None]:

# Install torch and torchvision
%pip install torch torchvision

In [None]:
%pip install --no-cache-dir git+https://github.com/tghattas/llm-search
%pip install -U sqlalchemy
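
After the installs finish, a short check can confirm that the packages this notebook relies on are importable in the current kernel. The sketch below uses only the standard library. The module names (`torch`, `llama_cpp`, `llmsearch`) are taken from the cells and logs in this notebook, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


required = ["torch", "llama_cpp", "llmsearch"]
missing = missing_modules(required)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, restarting the kernel after `%pip install` is usually the first thing to try.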

In [2]:
!llmsearch index create -c llm/config/config.yaml

  _torch_pytree._register_pytree_node(
2024-03-03 19:03:08.017 | INFO     | llmsearch.utils:set_cache_folder:43 - Setting SENTENCE_TRANSFORMERS_HOME folder: llm/cache
2024-03-03 19:03:08.017 | INFO     | llmsearch.utils:set_cache_folder:46 - Setting TRANSFORMERS_CACHE folder: llm/cache/transformers
2024-03-03 19:03:08.017 | INFO     | llmsearch.utils:set_cache_folder:47 - Setting HF_HOME: llm/cache/hf_home
2024-03-03 19:03:08.017 | INFO     | llmsearch.utils:set_cache_folder:48 - Setting MODELS_CACHE_FOLDER: llm/cache
2024-03-03 19:03:08.017 | INFO     | llmsearch.embeddings:get_embedding_model:67 - Embedding model config: type=<EmbeddingModelType.instruct: 'instruct'> model_name='hkunlp/instructor-large' additional_kwargs={}
load

In [1]:
%%bash

llmsearch interact llm -c llm/config/config.yaml -m llm/config/model.yaml

  _torch_pytree._register_pytree_node(
2024-02-25 23:10:16.771 | INFO     | llmsearch.config:load_yaml_file:233 - Loading doc config from a file: llm/config/config.yaml
2024-02-25 23:10:16.773 | INFO     | llmsearch.config:load_yaml_file:233 - Loading doc config from a file: llm/config/model.yaml
2024-02-25 23:10:16.774 | INFO     | llmsearch.config:validate_params:175 - Loading model paramaters in configuration class LlamaModelConfig
2024-02-25 23:10:16.774 | INFO     | llmsearch.utils:set_cache_folder:43 - Setting SENTENCE_TRANSFORMERS_HOME folder: llm/cache
2024-02-25 23:10:16.774 | INFO     | llmsearch.utils:set_cache_folder:46 - Setting TRANSFORMERS_CACHE folder: llm/cache/transformers
2024-02-25 23:10:16.774 | INFO     | llmsearch.utils:set_cache_folder:47 - Setting HF_HOME: llm/cache/hf_home
2024-02-25 23:10:16.774 | INFO     | llmsearch.utils:set_cache_folder:48 - Setting MODELS_CACHE_FOLDER: llm/cache
2024-02-25 23:10:16.774 | INFO     | llmsearch.models.llama:model:139 - Load

load INSTRUCTOR_Transformer


  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(


max_seq_length  512


2024-02-25 23:10:19.033 | INFO     | llmsearch.ranking:__init__:20 - Initializing Reranker...
2024-02-25 23:10:19.627 | INFO     | llmsearch.ranking:__init__:23 - Initialized MS-MARCO Reranker
2024-02-25 23:10:19.627 | INFO     | llmsearch.splade:__init__:37 - Setting device to cpu
2024-02-25 23:10:20.502 | INFO     | llmsearch.splade:load:113 - SPLADE: Got 0 labels.
2024-02-25 23:10:20.502 | INFO     | llmsearch.splade:load:119 - Loaded sparse (SPLADE) embeddings from ./llm/embeddings/splade/splade_embeddings.npz
2024-02-25 23:10:20.502 | INFO     | llmsearch.utils:get_hyde_chain:116 - Creating HyDE chain...
2024-02-25 23:10:20.502 | INFO     | llmsearch.utils:get_multiquery_chain:127 - Creating MultiQUery chain...

Aborted!

ENTER QUESTION >> 

CalledProcessError: Command 'b'\nllmsearch interact llm -c llm/config/config.yaml -m llm/config/model.yaml\n'' returned non-zero exit status 1.

In [None]:
from llama_cpp import Llama
model = Llama(model_path="./llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf")



In [None]:
print(model(
      "Q: Name the planets in the solar system? A: ", # Prompt
      max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
      stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
      echo=True # Echo the prompt back in the output
))
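
The call above prints the whole response dictionary. llama-cpp-python returns completions in an OpenAI-style structure, with the generated text under `choices[0]["text"]`. The sketch below pulls out just that field. `completion_text` is a hypothetical helper, and the example dictionary is a hand-written placeholder, not real model output.

```python
def completion_text(response):
    """Return the generated text of the first choice in a completion response."""
    return response["choices"][0]["text"]


# Placeholder response showing only the fields used here (not real output).
example_response = {
    "choices": [
        {"text": "<generated text goes here>", "finish_reason": "stop"},
    ],
}
print(completion_text(example_response))
```

With the loaded model, the same helper could be applied to the result of `model(...)` instead of the placeholder.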