In [1]:
%pip install llama-index
%pip install llama-index-embeddings-huggingface

Note: you may need to restart the kernel to use updated packages.


In [None]:
# loading, embedding, indexing, and storing
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

# retrieval stage (VectorStoreIndex is already imported above)
from llama_index.core import get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

In [2]:
# import any embedding model on HF hub (https://huggingface.co/spaces/mteb/leaderboard)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Settings.embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-large") # alternative model
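Settings.llm = None  # no LLM inside LlamaIndex; it is used for retrieval only
Settings.chunk_overlap = 25  # tokens shared between neighbouring chunks
Settings.chunk_size = 256  # maximum tokens per chunk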

LLM is explicitly disabled. Using MockLLM.
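
To make the `chunk_size` and `chunk_overlap` settings above more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified illustration, not LlamaIndex's own splitter (which, as far as I know, also tries to respect sentence boundaries); the function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=256, chunk_overlap=25):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical helper for illustration only; not part of LlamaIndex.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks


# Stand-in "document" of 600 integer tokens
chunks = chunk_tokens(list(range(600)))
print(len(chunks), [len(c) for c in chunks])
```

Each chunk starts `chunk_size - chunk_overlap` tokens after the previous one, so the last 25 tokens of one chunk reappear as the first 25 tokens of the next.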


In [3]:
primetime = SimpleDirectoryReader("./PrimeTime").load_data()

In [None]:
index = VectorStoreIndex.from_documents(primetime, show_progress=True)
# see https://docs.llamaindex.ai/en/stable/understanding/storing/storing/ for persisting the index to local disk,
# so the documents don't need to be re-embedded and re-indexed on every run
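
The comment above points at persisting the index. A possible load-or-build pattern is sketched below, assuming a local directory such as `./storage` (a hypothetical path). The helper `load_or_build_index` is not a LlamaIndex function; the LlamaIndex calls it would wrap (`index.storage_context.persist`, `StorageContext.from_defaults`, `load_index_from_storage`) appear in the commented usage at the end.

```python
import os


def load_or_build_index(persist_dir, build_index, load_index):
    """Reuse a persisted index if persist_dir has content, else build and persist one.

    Hypothetical helper: build_index() returns a new index,
    load_index(persist_dir) restores one from disk.
    """
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load_index(persist_dir)
    index = build_index()
    # LlamaIndex indexes expose their storage context, which can write itself to disk
    index.storage_context.persist(persist_dir=persist_dir)
    return index


# Usage with the objects from this notebook (commented out: needs llama-index):
# from llama_index.core import StorageContext, load_index_from_storage
# index = load_or_build_index(
#     "./storage",
#     build_index=lambda: VectorStoreIndex.from_documents(primetime, show_progress=True),
#     load_index=lambda d: load_index_from_storage(StorageContext.from_defaults(persist_dir=d)),
# )
```

When loading a persisted index, `Settings.embed_model` should be set to the same embedding model that was used to build it, otherwise query embeddings will not match the stored vectors.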

In [8]:
# set the number of relevant chunks to retrieve
top_k = 5

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=top_k,
)

In [9]:
# configure response synthesizer
response_synthesizer = get_response_synthesizer()
# configure query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
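
The query engine above combines two filters: the retriever keeps the `top_k` most similar chunks, and `SimilarityPostprocessor` then drops any of those whose score falls below `similarity_cutoff`. The sketch below reproduces that two-step logic in plain Python on toy 2-D vectors. It is an illustration under simplifying assumptions (cosine similarity, in-memory lists), not LlamaIndex's implementation, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, top_k=5, similarity_cutoff=0.7):
    """Return (chunk_index, score) pairs: top_k by similarity, then filtered by cutoff."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)      # highest similarity first
    top = scored[:top_k]           # step 1: keep the top_k candidates
    return [(i, score) for score, i in top if score >= similarity_cutoff]  # step 2: cutoff


# Toy embeddings standing in for real chunk vectors
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]]
for index, score in retrieve([1.0, 0.0], chunk_vecs, top_k=3, similarity_cutoff=0.8):
    print(index, round(score, 3))
```

Because the cutoff is applied after the top-k selection, a query can return fewer than `top_k` chunks, or none at all if nothing clears the threshold.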

In [10]:
# query documents
query = "how to set a false path exception from REGA/CP  to REGB/D"
response = query_engine.query(query)
print(response)

Context information is below.
---------------------
page_label: 126
file_path: /Users/jaxing/python_projects/RAG_nvidia/RAG/PrimeTime/Timing Constraints and Optimization.pdf

 Declare all paths from REG2 to be false. 
prompt> set_false_path -from [get_pins REG2[*]/CP]  
This method is similar to the previous one. The tool must keep track of all paths 
originating from each clock pin of REG2, a total of 256 paths. 
In summary, look at the root cause that is  making the exceptions necessary and find the 
simplest way to control the timing analysis for the affected paths. Before using false paths, 
consider using case analysis ( set_case_analysis ), declaring an exclusive relationship 
between clocks ( set_clock_groups ), or disabling analysis of part of the design 
(set_disable_timing ). These alternatives can be more efficient than using the  
set_false_path  command. If you must set false paths, avoid specifying a large number of 
paths using the  -through argument, by using wildcards

In [11]:
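context = "Context:\n"
# the similarity cutoff can leave fewer than top_k nodes, so iterate over what was actually returned
for node in response.source_nodes[:top_k]:
    context = context + node.text + "\n\n"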

print(context)

Context:
 Declare all paths from REG2 to be false. 
prompt> set_false_path -from [get_pins REG2[*]/CP]  
This method is similar to the previous one. The tool must keep track of all paths 
originating from each clock pin of REG2, a total of 256 paths. 
In summary, look at the root cause that is  making the exceptions necessary and find the 
simplest way to control the timing analysis for the affected paths. Before using false paths, 
consider using case analysis ( set_case_analysis ), declaring an exclusive relationship 
between clocks ( set_clock_groups ), or disabling analysis of part of the design 
(set_disable_timing ). These alternatives can be more efficient than using the  
set_false_path  command. If you must set false paths, avoid specifying a large number of 
paths using the  -through argument, by using wildcards, or  by listing the paths one at a 
time.

To restore the default timing constraints on a path, use the  
reset_path command. 
False Path Exceptions
A false path is 

In [12]:
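# load the GPTQ-quantized Mistral-7B-Instruct model from the HF Hub
# (device_map="auto" requires the accelerate package; see the ImportError below)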
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install 'accelerate>=0.26.0'`
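
The ImportError above says that `device_map="auto"` needs the `accelerate` package. One way to catch this before the slow model load is a quick importability check, sketched below. The helper name `missing_packages` and the `required` list are assumptions for illustration; a GPTQ checkpoint may need further packages beyond the two listed here.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed minimum for the cell above; extend the list as your setup requires
required = ["transformers", "accelerate"]
missing = missing_packages(required)
if missing:
    print("Install before loading the model: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```

After installing a missing package, restart the kernel so the running session picks it up.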

In [29]:
# prompt (no context)
instructions_string = """ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. \
ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.

Please respond to the following comment.
"""
prompt_template = lambda comment: f'''[INST] {instructions_string} \n{comment} \n[/INST]'''

In [31]:
comment = "What is fat-tailedness?"

prompt = prompt_template(comment)
print(prompt)

[INST] ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, thus keeping the interaction natural and engaging.

Please respond to the following comment.
 
What is fat-tailedness? 
[/INST]


In [32]:
model.eval()

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)

print(tokenizer.batch_decode(outputs)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<s> [INST] ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, thus keeping the interaction natural and engaging.

Please respond to the following comment.
 
What is fat-tailedness? 
[/INST]
Great question!

Fat-tailedness is a statistical property of a distribution. In simple terms, it refers to the presence of extreme outliers or heavy tails in the distribution.

For instance, consider the distribution of heights in a population. A normal distribution would have most people clustered around an average height with a few people deviating slightly from the mean. However, in a fat-tailed distribution, you would observe a larger number of people being

In [33]:
# prompt (with context)
prompt_template_w_context = lambda context, comment: f"""[INST]ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. \
ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.

{context}
Please respond to the following comment. Use the context above if it is helpful.

{comment}
[/INST]
"""

In [34]:
prompt = prompt_template_w_context(context, comment)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)

print(tokenizer.batch_decode(outputs)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST]ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, thus keeping the interaction natural and engaging.

Context:
Some of the controversy might be explained by the observation that log-
normal distributions behave like Gaussian for low sigma and like Power Law
at high sigma [2].
However, to avoid controversy, we can depart (for now) from whether some
given data fits a Power Law or not and focus instead on fat tails.
Fat-tailedness — measuring the space between Mediocristan
and Extremistan
Fat Tails are a more general idea than Pareto and Power Law distributions.
One way we can think about it is that “fat-tailedness” is the degree to which
