Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, llama2-7b-chat uses about 8GB of VRAM.
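As a rough sanity check on that figure, the sketch below estimates the memory taken by the model weights alone at different precisions. It assumes roughly 7 billion parameters and ignores everything else that occupies VRAM at runtime (activations, the KV cache, the CUDA context, the embedding model), so treat the numbers as a lower bound rather than a measurement. `weight_memory_gib` and `N_PARAMS` are illustrative names, not part of any library.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: the "7b" model has roughly 7 billion parameters.
N_PARAMS = 7e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

The gap between the 4-bit weight estimate and the observed usage is the runtime overhead listed above.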

In [None]:
# The imports below use the pre-0.10 llama_index package layout, so pin the version.
!pip install "llama-index<0.10" transformers accelerate bitsandbytes

In [None]:
%%sh
# test wiki (the %%sh cell magic must be the first line of the cell)
git clone https://github.com/learn-anything/seed.git

In [None]:
from llama_index import ObsidianReader, VectorStoreIndex
documents = ObsidianReader(
    "seed/wiki/nikita"
).load_data()

In [None]:
# Hugging Face API token for downloading Llama 2
hf_token = "hf_..."

In [None]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    query_wrapper_prompt=PromptTemplate("""<s> [INST] <<SYS>>
        You extract information from the given context and answer questions about it only, if the question is not answered in the context answer "NA". The author is named nikita. Answer shortly and omit urls.
        <</SYS>> {query_str} [/INST] """),
    context_window=3900,
    model_kwargs={"token": hf_token, "quantization_config": quantization_config},
    tokenizer_kwargs={"token": hf_token},
    device_map="auto",
)
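To see what the model actually receives, here is a minimal sketch of how a wrapper prompt like the one above fills its `{query_str}` placeholder. It uses plain `str.format` instead of `PromptTemplate`, and `LLAMA2_TEMPLATE` and `wrap_query` are hypothetical names introduced only for this illustration. The spacing mirrors the template in this notebook.

```python
# Hypothetical stand-in for the wrapper prompt above, built with plain str.format.
LLAMA2_TEMPLATE = "<s> [INST] <<SYS>>\n{system_prompt}\n<</SYS>> {query_str} [/INST] "


def wrap_query(query_str: str, system_prompt: str) -> str:
    """Place the system prompt and the user query inside the Llama 2 chat markers."""
    return LLAMA2_TEMPLATE.format(
        system_prompt=system_prompt.strip(),
        query_str=query_str,
    )


prompt = wrap_query(
    "What do you use for nodejs?",
    'Answer from the given context only; otherwise answer "NA".',
)
print(prompt)
```

Printing the wrapped prompt is a quick way to check that the `[INST]` and `<<SYS>>` markers are balanced before spending time on a slow local generation.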

In [None]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")

### Index Setup

In [None]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)
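Conceptually, a vector index embeds each chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The toy sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity. It is not how `VectorStoreIndex` is implemented; the file names, vectors, and the `top_k` helper are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy embeddings (assumption: real embeddings have hundreds of dimensions).
toy_chunks = {
    "nodejs.md": [0.9, 0.1, 0.0],
    "cloudflare.md": [0.1, 0.9, 0.1],
    "cooking.md": [0.0, 0.1, 0.9],
}
toy_query = [0.8, 0.3, 0.0]

print(top_k(toy_query, toy_chunks, k=2))
```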

In [None]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)

### Helpful Imports / Logging

In [None]:
from llama_index.response.notebook_utils import display_response

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)

In [None]:
query_engine = vector_index.as_query_engine(response_mode="compact")

response = query_engine.query("What do you use for nodejs?")

display_response(response)

**`Final Response:`** Based on the information provided in the context, the author uses a variety of tools and libraries for Node.js development, including:
* Bun: The author prefers using Bun instead of Node.
* Deno: The author finds Deno interesting too.
* H3: The author finds the H3 library nice for HTTP requests.
* Node.js: The author mentions using Node.js and mentions several resources related to Node.js, including Node.Dev, the original presentation by Ryan Dahl, and Node.js Best Practices.
* Fastify: The author mentions Fastify as a fast and low-overhead web framework for Node.js.
* Ndb: The author mentions improved debugging experience for Node.js enabled by Chrome DevTools.
* Ncc: The author mentions a simple CLI for compiling a Node.js module into a single file.
* TestCafe: The author mentions a tool for automating end-to-end web testing.
* Redbird: The author mentions a modern reverse proxy for Node.js.
* ndb: The author mentions improved debugging experience for Node.js enabled by Chrome DevTools.
* Fastify-plugin: The author mentions a plugin
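The `compact` mode packs the retrieved chunks into as few prompts as will fit in the context window before running the answer-then-refine loop, which reduces the number of LLM calls. The sketch below illustrates only the packing step, using a character budget as a stand-in for a token budget. `pack_chunks` and `max_chars` are hypothetical names, and the library's own packing logic may differ.

```python
from typing import List


def pack_chunks(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily concatenate chunks so each packed block stays within max_chars.

    A single chunk that is longer than max_chars is kept as its own block.
    """
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            # The next chunk does not fit: close the current block, start a new one.
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


chunks = ["a" * 40, "b" * 40, "c" * 40]
blocks = pack_chunks(chunks, max_chars=100)
print(f"{len(chunks)} chunks packed into {len(blocks)} prompts")
```

With a slow local model, fewer prompts means fewer generations, which is the main reason to prefer `compact` over `refine` here.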

### Refine

In [None]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("Do you use cloudflare to deploy web services?")

display_response(response)

**`Final Response:`** Based on the new context provided, the author does not use Cloudflare for deploying web services. The author prefers Cloudflare Workers for APIs, Fly and Railway are also mentioned as options, and they also use Vercel and Netlify for serving websites. Additionally, they mention that they are looking into Cloudflare Pages as their API is already on Cloudflare Workers.
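The `refine` mode walks through the retrieved chunks one at a time: the first chunk produces an initial answer, and each later chunk is shown to the model together with the existing answer so it can revise it. The phrase "Based on the new context provided" in the response above is consistent with that kind of refine prompt. The loop below is a toy illustration with a fake LLM, not the library's implementation; the prompt wording and the names `refine_answer` and `fake_llm` are assumptions.

```python
from typing import Callable, List, Optional


def refine_answer(
    question: str, chunks: List[str], llm: Callable[[str], str]
) -> Optional[str]:
    """Answer from the first chunk, then refine the answer with each later chunk."""
    answer: Optional[str] = None
    for chunk in chunks:
        if answer is None:
            prompt = f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:"
        else:
            prompt = (
                f"Question: {question}\n"
                f"Existing answer: {answer}\n"
                f"New context:\n{chunk}\n"
                "Refine the existing answer if the new context helps; "
                "otherwise repeat it."
            )
        answer = llm(prompt)  # one LLM call per chunk
    return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: records the prompt and returns a numbered draft."""
    calls.append(prompt)
    return f"draft {len(calls)}"


final = refine_answer(
    "Do you use cloudflare to deploy web services?",
    ["chunk A", "chunk B", "chunk C"],
    fake_llm,
)
print(final, "after", len(calls), "LLM calls")
```

Because every chunk costs one generation, this mode is the slowest of the three on a local model.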

### Tree Summarize

In [None]:
query_engine = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine.query("What do you use for nodejs?")

display_response(response)

**`Final Response:`** Based on the information provided in the context, the most commonly used tools and technologies for Node.js are:
* Bun: A lightweight and fast HTTP library.
* Deno: An interesting alternative to Node.js.
* Tao of Node: A great talk on JavaScript promises.
* Fastify: A fast and low-overhead web framework for Node.js.
* Ndb: An improved debugging experience for Node.js enabled by Chrome DevTools.
* ndb-cli: A simple CLI for compiling a Node.js module into a single file.
* TestCafe: A Node.js tool for automating end-to-end web testing.
* Node.js Best Practices: A guide for writing efficient and scalable server-side applications.
* Stability first: A guide for writing stable Node.js applications.
* David: A Node.js module that tells you when your package dependencies are out of date.
* Httpie: A Node.js HTTP client as easy as pie.
* Bull: A premium queue package for handling distributed jobs and messages in NodeJS.
* Depcheck: A tool for checking your npm module for unused dependencies.
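The `tree_summarize` mode answers the question over groups of chunks, then treats those answers as new chunks and repeats until a single answer remains. The sketch below shows that bottom-up recursion with a fixed group size and a fake LLM. The real library groups chunks by what fits in the context window rather than by a fixed count, so `fan_in`, `tree_summarize`, and `fake_llm` here are illustrative assumptions.

```python
from typing import Callable, List


def tree_summarize(
    question: str, chunks: List[str], llm: Callable[[str], str], fan_in: int = 2
) -> str:
    """Summarize groups of chunks level by level until one answer is left."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    while True:
        groups = [level[i:i + fan_in] for i in range(0, len(level), fan_in)]
        level = [
            llm("Context:\n" + "\n\n".join(group) + f"\n\nQuestion: {question}\nAnswer:")
            for group in groups
        ]
        if len(level) == 1:
            return level[0]


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: records the prompt and returns a numbered summary."""
    calls.append(prompt)
    return f"summary{len(calls)}"


result = tree_summarize(
    "What do you use for nodejs?",
    [f"chunk {i}" for i in range(5)],
    fake_llm,
)
print(result, "after", len(calls), "LLM calls")
```

In this toy run, five chunks with a group size of two take three levels of calls (3, then 2, then 1).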

## References

- [Llama-index](https://colab.research.google.com/drive/14N-hmJ87wZsFqHktrw40OU6sVcsiSzlQ?usp=sharing) notebook