## Installation

In [None]:
%%capture --no-stderr

!pip install langchain_community
!pip install chromadb==0.5.0
!pip install pypdf
!pip install langchain_weaviate
!pip install rank_bm25
!pip install bitsandbytes
!pip install transformers datasets accelerate nvidia-ml-py3
!pip install numpy==1.24.4
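
Because `%%capture` hides pip's output and two packages are pinned to exact versions, it can help to confirm what actually got installed. The helper below is a minimal sketch using only the standard library; `installed_versions` is a hypothetical name, not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the install cell above.
report = installed_versions(["chromadb", "numpy", "transformers", "rank_bm25"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

If a pinned package reports a different version than the one requested, restarting the runtime after installation is usually the first thing to try.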


## Library

In [None]:
import torch
import weaviate
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.llms import HuggingFacePipeline, OpenAI
from langchain_community.retrievers import WeaviateHybridSearchRetriever
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextStreamer,
    pipeline,
)


## LLM model loading

In [None]:
model_name = "instruction-pretrain/finance-Llama3-8B"
def load_quantized_model(model_name: str):
    """
    model_name: Name or path of the model to be loaded.
    return: Loaded quantized model.
    """
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        quantization_config=bnb_config,
    )
    return model

# initializing tokenizer
def initialize_tokenizer(model_name: str):
    """
    model_name: Name or path of the model for tokenizer initialization.
    return: Initialized tokenizer.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name, return_token_type_ids=False)
    # Keep the tokenizer's own BOS token: Llama 3 checkpoints define it, and id 1 is not their BOS id.
    return tokenizer
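
The 4-bit NF4 configuration above mainly exists to shrink the memory taken by the weights. The arithmetic below is a rough sketch of that saving. It assumes about 8 billion parameters (an approximation for an 8B model) and ignores quantization constants, layers that stay in higher precision, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 8e9  # assumption: rough parameter count of an 8B model

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"bf16 weights:  {bf16_gib:.1f} GiB")
print(f"4-bit weights: {nf4_gib:.1f} GiB")
```

The fourfold reduction is what makes it plausible to fit the model on a single consumer or Colab GPU.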


In [None]:
model_name = "gradientai/Llama-3-8B-Instruct-Gradient-1048k"
tokenizer = initialize_tokenizer(model_name)
model = load_quantized_model(model_name)

Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

`low_cpu_mem_usage` was None, now set to True since model is quantized.

## Model definition

In [None]:
streamer = TextStreamer(tokenizer, skip_prompt=True)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    use_cache=True,
    device_map="auto",
    max_length=2048,
    do_sample=True,
    top_k=5,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,  # this tokenizer has no pad token, so reuse EOS explicitly
    return_full_text=False,
    # streamer=streamer
)
llm = HuggingFacePipeline(pipeline=pipe)
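
The pipeline above samples with `do_sample=True` and `top_k=5`. As a conceptual illustration only (this is not the `transformers` implementation), the sketch below shows what top-k sampling does: keep the `k` highest-scoring tokens, renormalize with a softmax, and draw one of them. The logits are made-up values.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token index from the k highest logits (softmax-weighted)."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits, shifted by the max for numerical stability.
    max_logit = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - max_logit) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 0.5, 1.5, -1.0, 0.1, 3.0, -2.0]  # made-up scores for 7 tokens
samples = [top_k_sample(logits, k=3, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only indices among the three highest logits can appear
```

A small `k` keeps generation focused on likely tokens while still allowing some variation between runs.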

## Retriever

In [None]:

import os

WEAVIATE_URL = "https://7nqntzuktg2lbtg7c2kieg.c0.asia-southeast1.gcp.weaviate.cloud"
# Read credentials from the environment instead of hard-coding them in the notebook.
# The environment variable names are assumptions; use whatever your setup defines.
WEAVIATE_API_KEY = os.environ["WEAVIATE_API_KEY"]
HF_TOKEN = os.environ["HF_TOKEN"]

client = weaviate.Client(
    url=WEAVIATE_URL,
    auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY),
    additional_headers={"X-HuggingFace-Api-Key": HF_TOKEN},
)
retriever = WeaviateHybridSearchRetriever(
    alpha=0.5,                  # 0.5 weights keyword and semantic (vector) search equally
    client=client,              # the Weaviate client instance to query
    index_name="Book_summary",  # name of the index to use
    text_key="content",         # name of the property that holds the text
    attributes=[],              # extra attributes to return in the results
    create_schema_if_missing=True,
    k=3,                        # number of documents to return
)
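
To make the role of `alpha` concrete, here is a simplified sketch of score fusion. It is not Weaviate's actual fusion algorithm: it assumes min-max normalized scores combined as `alpha * vector + (1 - alpha) * keyword`, so `alpha=1` is pure vector search and `alpha=0` is pure keyword search. The function names and scores are hypothetical.

```python
def min_max(scores):
    """Rescale a non-empty {doc: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5, k=3):
    """Rank documents by a weighted sum of normalized keyword and vector scores."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    docs = set(kw) | set(vec)
    # A document missing from one result list contributes 0 for that signal.
    fused = {d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    # Sort by descending score, breaking ties by document name for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))[:k]


keyword_scores = {"doc_a": 7.2, "doc_b": 1.1, "doc_c": 3.0}    # made-up BM25-style scores
vector_scores = {"doc_b": 0.91, "doc_c": 0.80, "doc_d": 0.35}  # made-up similarity scores

for alpha in (0.0, 0.5, 1.0):
    print(alpha, hybrid_rank(keyword_scores, vector_scores, alpha=alpha))
```

With these made-up numbers, a document that scores moderately on both signals wins at `alpha=0.5`, while the extremes favor whichever document leads on a single signal.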


## LLM Model doc compressor

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.retrievers.document_compressors import LLMChainFilter



In [None]:
# compressor = LLMChainExtractor.from_llm(llm)
# compression_retriever = ContextualCompressionRetriever(
#     base_compressor=compressor, base_retriever=retriever
# )

# compressed_docs = compression_retriever.invoke(
#     "what is Hypothesis testing ? how we perform it ?"
# )


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 

Null hypothesis: The null hypothesis is a model of the system based on the assumption that the apparent effect was actually due to chance. 
p-value: The p-value is the probability of the apparent effect under the null hypothesis. 
Interpretation: Based on the p-value, we conclude that the effect is either statistically significant, or not. 

7.1 Testing a difference in means
One of the easiest hypotheses to test is an apparent difference in mean between two groups. In the NSFG data, we saw that the mean pregnancy length for first babies is slightly longer, and the mean weight at birth is slightly smaller. Now we will see if those effects are significant. 
For these examples, the null hypothesis is that the distributions for the two groups are the same, and that the apparent difference is due to chance. To compute p-values, we find the pooled distribution for all live births (first babies and others), generate random samples that are the same size as the observed samples, and compute t

KeyboardInterrupt: 

In [None]:
_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=retriever
)
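
Conceptually, an LLM-based filter asks the model a yes/no relevance question for each retrieved document and keeps only those judged relevant. The sketch below illustrates that idea with a stand-in model; the prompt wording, `filter_documents`, and `keyword_stub_llm` are assumptions for illustration and do not reproduce LangChain's actual prompt or parsing.

```python
def filter_documents(documents, query, llm):
    """Keep only the documents for which the model answers YES."""
    kept = []
    for doc in documents:
        prompt = (
            f"Question: {query}\n"
            f"Context: {doc}\n"
            "Is the context relevant to the question? Answer YES or NO."
        )
        answer = llm(prompt).strip().upper()
        if answer.startswith("YES"):
            kept.append(doc)
    return kept


def keyword_stub_llm(prompt):
    """Stand-in for a real model: says YES when the context mentions 'hypothesis'."""
    context = prompt.split("Context:", 1)[1]
    return "YES" if "hypothesis" in context.lower() else "NO"


docs = [
    "Hypothesis testing asks whether an apparent effect could be due to chance.",
    "Chapter 3 covers cumulative distribution functions.",
]
relevant = filter_documents(docs, "what is Hypothesis testing ?", keyword_stub_llm)
print(relevant)
```

This design depends on the model answering with a clean YES or NO; a model that rambles or echoes the prompt makes the filter unreliable.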



In [None]:
compressed_docs = compression_retriever.invoke(
     "what is Hypothesis testing ? how we perform it ?"
)


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



In [None]:
print(compressed_docs)
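
Printing the list directly shows raw object representations, which are hard to read for long passages. A small formatting helper can make the results easier to inspect. This is a sketch: `format_docs` is a hypothetical helper, and the sample objects below only mimic the `page_content` attribute of LangChain documents.

```python
from types import SimpleNamespace


def format_docs(docs, max_chars=200):
    """Return one numbered, whitespace-normalized line per document."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        text = " ".join(doc.page_content.split())  # collapse newlines and repeated spaces
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        lines.append(f"[{i}] {text}")
    return "\n".join(lines)


# Stand-in objects with a page_content attribute, used here instead of real results.
sample_docs = [
    SimpleNamespace(page_content="Null hypothesis:\n  a model of the system."),
    SimpleNamespace(page_content="p-value: the probability of the apparent effect."),
]
print(format_docs(sample_docs))
```

In the notebook, the same helper could be called as `print(format_docs(compressed_docs))`.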