# RAG

### Retrieval-augmented generation using `llama_index` with a local LLM and a local embedding model

In [1]:
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core import Settings

# define llm model
Settings.llm = LlamaCPP(
    model_path="gpt4all-falcon/gpt4all-falcon-newbpe-q4_0.gguf",
    # note: larger than the falcon.context_length of 2048 reported in the load log
    context_window=3200,
    max_new_tokens=256,
    model_kwargs={'n_gpu_layers': -1},
    verbose=True
)

llama_model_loader: loaded meta data with 18 key-value pairs and 196 tensors from gpt4all-falcon/gpt4all-falcon-newbpe-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = falcon
llama_model_loader: - kv   1:                               general.name str              = Falcon
llama_model_loader: - kv   2:                      falcon.context_length u32              = 2048
llama_model_loader: - kv   3:                  falcon.tensor_data_layout str              = jploski
llama_model_loader: - kv   4:                    falcon.embedding_length u32              = 4544
llama_model_loader: - kv   5:                 falcon.feed_forward_length u32              = 18176
llama_model_loader: - kv   6:                         falcon.block_count u32              = 32
llama_model
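The load log above reports `falcon.context_length = 2048`, while the cell sets `context_window=3200` and `max_new_tokens=256`. The following is a minimal sketch of the token budget this implies, in plain Python. `fits_context` is a hypothetical helper written for illustration; it is not part of `llama_index` or `llama-cpp-python`.

```python
def fits_context(prompt_tokens: int, max_new_tokens: int, context_length: int) -> bool:
    """Return True if the prompt plus the generation budget fits in the context."""
    return prompt_tokens + max_new_tokens <= context_length


# Values taken from this notebook: the context length reported in the
# model metadata and the generation budget passed to LlamaCPP.
TRAINED_CONTEXT = 2048
MAX_NEW_TOKENS = 256

# Largest prompt that still leaves room for a full-length completion.
max_prompt_tokens = TRAINED_CONTEXT - MAX_NEW_TOKENS
print(max_prompt_tokens)
```

Under these assumptions, a prompt longer than `max_prompt_tokens` can push generation past the context length the model metadata reports, which may degrade output quality.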

In [2]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# define embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="UAE-Large-V1")

In [3]:
from transformers import AutoTokenizer

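# use the tokenizer that matches the LLM defined above;
# Settings.tokenizer expects a callable that maps a string to a list of tokens, hence .encode
Settings.tokenizer = AutoTokenizer.from_pretrained(
    "gpt4all-falcon"
).encode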

In [5]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# load data and build index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
)
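Conceptually, a vector index stores one embedding per text chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The sketch below illustrates that idea with cosine similarity in plain Python. It is not how `VectorStoreIndex` is implemented; the function names and the toy three-dimensional vectors are made up for illustration, and real embeddings would come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


# Toy "embeddings" (hypothetical values, one per chunk).
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.05, 0.0]

best = top_k(query_vec, chunk_vecs, k=2)
print(best)
```

Here the third chunk is orthogonal to the query, so only the first two are returned.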

In [6]:
# query your data
query_engine = index.as_query_engine()

In [10]:
response = query_engine.query("How can silicon dioxide be deposited?")
print(response)

Llama.generate: prefix-match hit

llama_print_timings:        load time =    8969.69 ms
llama_print_timings:      sample time =      38.91 ms /   143 runs   (    0.27 ms per token,  3674.96 tokens per second)
llama_print_timings: prompt eval time =   13088.30 ms /  1998 tokens (    6.55 ms per token,   152.66 tokens per second)
llama_print_timings:        eval time =    7948.49 ms /   142 runs   (   55.98 ms per token,    17.87 tokens per second)
llama_print_timings:       total time =   22166.90 ms /  2140 tokens


"Silicon dioxide can be deposited using a combination of silicon precursor gasses like dichlorosilane or silane and oxygen precursors, typically at pressures from a few millitorr to a few torr. Plasma-deposited silicon nitride, formed from silane and ammonia or nitrogen, is also widely used, although it is important to note that it is not possible to deposit a pure nitride in this fashion. Plasma nitrides always contain a large amount of hydrogen, which can be bonded to silicon (Si-H) or nitrogen (Si-NH); this hydrogen has an important influence on IR and UV absorption, stability, mechanical stress, and electrical conductivity."
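The timing log above counts 1998 prompt tokens for a one-sentence question, which is consistent with retrieved text being placed in the prompt ahead of the question. The sketch below shows roughly how such a prompt could be assembled. `build_rag_prompt` and its template wording are assumptions for illustration, not the exact template `llama_index` uses, and the chunk strings are placeholders rather than real retrieved text.

```python
def build_rag_prompt(question, chunks):
    """Join retrieved chunks into a context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer: "
    )


# Placeholder chunks standing in for whatever the retriever returns.
chunks = ["<retrieved chunk 1>", "<retrieved chunk 2>"]
prompt = build_rag_prompt("How can silicon dioxide be deposited?", chunks)
print(prompt)
```

Because the model only sees what is in this prompt, the quality of the answer depends on which chunks were retrieved.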


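Using the same LLM without the retrieved context, we can ask the same question. The answer is more general: it lists deposition methods from the model's own knowledge rather than drawing on the indexed documents.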

In [12]:
# response without RAG
llm = Settings.llm
res = llm.complete("How can silicon dioxide be deposited?")

Llama.generate: prefix-match hit

llama_print_timings:        load time =    8969.69 ms
llama_print_timings:      sample time =      44.68 ms /   241 runs   (    0.19 ms per token,  5393.79 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    9202.00 ms /   241 runs   (   38.18 ms per token,    26.19 tokens per second)
llama_print_timings:       total time =   10349.91 ms /   242 tokens


In [14]:
print(res.text)


Silicon dioxide, also known as silica, can be deposited through various methods such as:

1. Chemical vapor deposition (CVD): In this method, a gas containing silicon and oxygen is introduced into a chamber where it reacts with the surface of a substrate to form a thin film of silicon dioxide.

2. Physical vapor deposition (PVD): In this method, a plasma of silicon and oxygen is generated in a vacuum chamber and directed towards the substrate. The plasma reacts with the surface of the substrate to form a thin film of silicon dioxide.

3. Sol-gel process: In this method, a sol-gel solution containing silica and other chemicals is applied to a substrate and then heated to form a thin film of silicon dioxide.

4. Spray deposition: In this method, a solution containing silica and other chemicals is sprayed onto a substrate and then heated to form a thin film of silicon dioxide.

5. Reactive sputtering: In this method, a plasma of silicon and oxygen is generated in a vacuum chamber and dir