In [1]:
# display every bare expression in a cell (not just the last one), so print() is rarely needed
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
# suppress warning messages
import warnings
warnings.filterwarnings('ignore')

In [3]:
import os
import pyarrow as pa
import pyarrow.compute as pc
from nn_rag import Knowledge, Retrieval

### Chroma Vector Params

        In-memory URI example:
            uri = "chromadb:///<collection>?reference=<name>"

        Params:
            collection: the name of the collection
            reference: a prefix name used to reference the document vectors

        Environment variables:
            CHROMA_EMBEDDING_QUANTIZE
            CHROMA_QUERY_SEARCH_LIMIT
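To illustrate how the documented URI parts map to values, here is a minimal standard-library sketch. `parse_chroma_uri` is a hypothetical helper, not part of `nn_rag`, and the collection name `guides` and reference `general` are made-up example values. The docstring shows a `chromadb:///` scheme while the cells below pass `chroma:///`, so the sketch accepts either.

```python
from urllib.parse import urlparse, parse_qs

def parse_chroma_uri(uri: str) -> dict:
    """Split a Chroma-style URI into its parts (hypothetical helper, not nn_rag's parser)."""
    parsed = urlparse(uri)
    # The collection name is the path without its leading slash; empty means "not given"
    collection = parsed.path.lstrip("/") or None
    # parse_qs returns a list of values per key; keep the first one if present
    reference = parse_qs(parsed.query).get("reference", [None])[0]
    return {"scheme": parsed.scheme, "collection": collection, "reference": reference}

print(parse_chroma_uri("chromadb:///guides?reference=general"))
print(parse_chroma_uri("chroma:///"))
```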



### Instantiate capability

In [4]:
kn = Knowledge.from_memory()

In [5]:
tbl = kn.set_source_uri("./hadron/source/llama-Responsible-Use-Guide.pdf").load_source_canonical()
kn.set_persist_uri('chroma:///')

### Clean text

In [6]:
doc = kn.tools.replace_on_pattern(tbl)
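The defaults of `replace_on_pattern` are not shown here, so the following is only a sketch of pattern-based cleaning with `re.sub`, assuming a whitespace-collapsing default. The second call targets PDF line-break hyphenation such as the `task- specific` that still appears in the retrieved text further down; both patterns are illustrative assumptions.

```python
import re

def clean_text(text: str, pattern: str = r"\s+", replacement: str = " ") -> str:
    """Replace every match of `pattern` with `replacement`, then trim both ends."""
    return re.sub(pattern, replacement, text).strip()

# Collapse runs of spaces, tabs and newlines into a single space
print(clean_text("Responsible  Use\nGuide \t for LLMs "))
# Re-join words split by PDF hyphenation, e.g. "task- specific" -> "task-specific"
print(clean_text("a task- specific dataset", r"(\w)- (\w)", r"\1-\2"))
```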


### Sentence Chunking

In [7]:
sentences = kn.tools.text_to_sentences(doc, disable_progress_bar=True)
sentence_join = kn.tools.filter_on_join(sentences, chunk_size=500)
print(f"Max Sentence size {pc.max(sentence_join['char_count'])}")
print(f"Mean Sentence size {pc.mean(sentence_join['char_count'])}")
print(f"Min Sentence size {pc.min(sentence_join['char_count'])}")

Max Sentence size 872
Mean Sentence size 405.49
Min Sentence size 0
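The maximum (872) is larger than `chunk_size=500`. One plausible explanation, and it is only an assumption about how `filter_on_join` works, is that adjacent sentences are merged greedily up to `chunk_size` while a single over-long sentence is kept whole. The sketch below uses a naive regex splitter in place of `text_to_sentences`, which likely uses a proper sentence tokenizer.

```python
import re
from statistics import mean

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def join_sentences(sentences: list[str], chunk_size: int = 500) -> list[str]:
    """Greedily merge adjacent sentences while the joined text fits in chunk_size.

    A single sentence longer than chunk_size is kept whole, so a group
    can still exceed chunk_size.
    """
    joined: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                joined.append(current)
            current = sentence
    if current:
        joined.append(current)
    return joined

text = "First sentence here. Second one is short! Is this the third? Yes."
sentences = split_sentences(text)
groups = join_sentences(sentences, chunk_size=40)
sizes = [len(g) for g in groups]
print(groups)
print(f"Max {max(sizes)}, Mean {mean(sizes):.2f}, Min {min(sizes)}")
```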


In [8]:
chunks = kn.tools.text_to_chunks(sentence_join, char_chunk_size=500)
print(f"Max chunk size {pc.max(chunks['char_count'])}")
print(f"Min chunk size {pc.min(chunks['char_count'])}")

Max chunk size 500
Min chunk size 52
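A maximum of exactly 500 and a much smaller minimum are consistent with fixed-size character windows, where the final piece of each input is whatever is left over. The sketch below assumes that behaviour; `chunk_chars` is a hypothetical stand-in for `text_to_chunks`, and unlike a more careful implementation it can cut through the middle of a word.

```python
def chunk_chars(texts: list[str], char_chunk_size: int = 500) -> list[str]:
    """Cut each text into consecutive windows of at most char_chunk_size characters."""
    chunks: list[str] = []
    for text in texts:
        for start in range(0, len(text), char_chunk_size):
            # Strip edge whitespace and drop windows that end up empty
            piece = text[start:start + char_chunk_size].strip()
            if piece:
                chunks.append(piece)
    return chunks

chunks = chunk_chars(["a" * 1200, "b" * 52], char_chunk_size=500)
print([len(c) for c in chunks])  # → [500, 500, 200, 52]
```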


### Embedding

In [9]:
kn.save_persist_canonical(chunks)
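Conceptually, persisting the chunks means turning each one into a vector and storing it next to its text under an id. This sketch swaps in a toy hashed bag-of-words embedding for a real sentence-embedding model. The `<reference>_<position>` id format is an assumption inspired by the `general_104`-style ids in the query results below, and `embed`/`build_index` are hypothetical names.

```python
import hashlib
import math
import re

DIM = 64  # toy size; real sentence-embedding models typically use hundreds of dimensions

def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy embedding: hash each lowercase word into a bucket, then L2-normalise."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        # md5 keeps buckets stable across runs (built-in hash() is salted per process)
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def build_index(chunks: list[str], reference: str = "general") -> dict[str, dict]:
    """Store each chunk's vector and text under an id of the form '<reference>_<position>'."""
    return {
        f"{reference}_{i}": {"embedding": embed(chunk), "document": chunk}
        for i, chunk in enumerate(chunks)
    }

index = build_index(["Fine-tune for product.", "Determine use case."])
print(list(index))  # → ['general_0', 'general_1']
```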

----------------
## Chroma Vector DB

### Query

In [10]:
import textwrap

def print_wrapped(text, wrap_length=80):
    # wrap text to at most wrap_length characters per line and print it
    print(textwrap.fill(text, wrap_length))

In [11]:
import random

questions = [
    "1. What are the core principles of responsible AI mentioned in the guide?",
    "2. How does Meta's open approach contribute to AI innovation?",
    "3. What are the stages of responsible LLM product development according to the guide?",
    "4. What are some examples of product-specific fine-tuning for LLMs?",
    "5. What considerations should be taken into account when defining content policies for LLMs?",
    "6. What are the benefits of democratizing access to large language models, as stated in the guide?"
]

query = random.choice(questions)

### Model Answers
1. **Core principles of responsible AI:**
   The guide outlines core principles of responsible AI, which include fairness and inclusion, robustness and safety, privacy and security, and transparency and control. Additionally, it emphasizes the importance of governance and accountability mechanisms to ensure these principles are upheld throughout the development and deployment of AI systems.

2. **Meta's open approach and AI innovation:**
   Meta's open approach to AI innovation involves open-sourcing code and datasets, contributing to the AI community's infrastructure, and making large language models available for research. This approach fosters a vibrant AI-innovation ecosystem, driving breakthroughs in various sectors and enabling exploratory research and large-scale production deployment. It also draws upon the collective wisdom and diversity of the AI community to improve and democratize AI technology.

3. **Stages of responsible LLM product development:**
   The guide identifies four stages of responsible LLM product development: determining the use case, fine-tuning for the product, addressing input- and output-level risks, and building transparency and reporting mechanisms in user interactions. Each stage involves specific considerations and mitigation strategies to ensure the safe and effective deployment of LLM-powered products.

4. **Examples of product-specific fine-tuning:**
   Product-specific fine-tuning examples provided in the guide include text summarization, question answering, and sentiment analysis. For instance, a pretrained language model can be fine-tuned on a dataset of long-form documents and summaries for text summarization, on a Q&A dataset for answering questions, and on labeled text reviews for sentiment analysis. These examples demonstrate how fine-tuning can tailor a model's capabilities to specific use cases, enhancing performance and applicability.

5. **Considerations for defining content policies:**
   When defining content policies for LLMs, developers should consider the intended use and audience of their product, legal and safety limitations, and the needs of specific user communities. Content policies should outline allowable content and safety limitations, which will guide data annotation and safety fine-tuning. It is also important to address potential biases in human feedback and data annotation processes to ensure fairness and objectivity.

6. **Benefits of democratizing access to large language models:**
   Democratizing access to large language models, as discussed in the guide, reduces barriers to entry for small businesses and fosters innovation across various sectors. By making these models widely available, small organizations can leverage advanced AI technology without incurring prohibitive costs, leading to economic growth and a more level playing field. This approach also promotes collaboration and collective improvement of AI models, ensuring that advancements benefit a broader range of users and applications.


In [12]:
rag = Retrieval.from_memory()
rag.set_source_uri('chroma:///')


In [13]:
print(f"Query: {query}\n")

answer = rag.tools.query_similarity(query, limit=5)
rag.table_report(answer)


Query: 4. What are some examples of product-specific fine-tuning for LLMs?



| # | id | distance | source |
|---|---|---|---|
| 0 | general_104 | 1.3518 | nd control to users, which can lead to greater satisfaction and trust in the feature. |
| 1 | general_41 | 1.3536 | This section outlines the considerations and mitigation strategies available at each stage of product development and deployment. At a high level these stages include: 1. Determine use case 2. Fine-tune for product 3. Address input- and output-level risks 4. Build transparency and reporting mechanisms in user interactions Responsible LLM product development stages 1 Determine use case An important decision in the development process is which use case(s) to focus on. |
| 2 | general_30 | 1.37 | At various points in the product development lifecycle, developers make decisions that shape the objectives and functionality of the feature, which can introduce potential risks. These decision points also provide opportunities to mitigate potential risks. It is critical that developers examine each layer of the product to determine which potential risks may arise based on the product objectives and design, and implement mitigation strategies accordingly. |
| 3 | general_113 | 1.3724 | For example, a user could select or reject outputs from a list of multiple options. Offering editing capabilities can also enhance a user’s sense of agency over outputs, and developers should consider education flows that can set a user up for success, such as offering prompt suggestions or explanations of how to improve an output. |
| 4 | general_12 | 1.383 | community to realize the benefits of this technology. |
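The table is ordered by ascending `distance`, so smaller means more similar. Chroma's default distance is squared L2; for unit-length vectors that equals 2 − 2·cosine. The following sketch ranks documents the same way, reusing a toy hashed bag-of-words embedding as a stand-in for the real model. `top_k_by_distance` and the example documents are hypothetical.

```python
import hashlib
import math
import re

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalised (stand-in for a real model)."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k_by_distance(query: str, documents: dict[str, str], limit: int = 5) -> list[tuple[str, float]]:
    """Return the `limit` document ids closest to the query, nearest first."""
    q = embed(query)
    scored = [(doc_id, squared_l2(q, embed(text))) for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:limit]

docs = {
    "general_0": "fine-tuning a model for sentiment analysis",
    "general_1": "transparency and reporting mechanisms",
    "general_2": "fine-tuning for question answering",
}
for doc_id, dist in top_k_by_distance("examples of fine-tuning", docs, limit=2):
    print(doc_id, round(dist, 4))
```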


In [14]:
print(f"Query: {query}\n")

answer = rag.tools.query_reranker(query)
rag.table_report(answer, headers='distance', drop=True)


Query: 4. What are some examples of product-specific fine-tuning for LLMs?



| # | id | cross-encoder_score | source |
|---|---|---|---|
| 0 | general_52 | 0.5478 | By training the model on this task- specific dataset, it can learn to predict sentiment in text accurately. These examples showcase how fine-tuning an LLM can be used to specialize the model’s capabilities for specific use cases, improving its performance and making it more suitable for specific applications. The choice of the foundation model and the task- specific dataset plays a crucial role in achieving the desired results. |
| 1 | general_30 | 0.5128 | At various points in the product development lifecycle, developers make decisions that shape the objectives and functionality of the feature, which can introduce potential risks. These decision points also provide opportunities to mitigate potential risks. It is critical that developers examine each layer of the product to determine which potential risks may arise based on the product objectives and design, and implement mitigation strategies accordingly. |
| 2 | general_18 | 0.4918 | Decisions to implement best practices should be evaluated based on the jurisdiction where your products will be deployed and should follow your company’s internal legal and risk management processes. How to use this guide This guide is a resource for developers that outlines common approaches to building responsibly at each level of an LLM-powered product. It covers best practices and considerations that developers should evaluate in the context of their specific use case and market. |
| 3 | general_41 | 0.4916 | This section outlines the considerations and mitigation strategies available at each stage of product development and deployment. At a high level these stages include: 1. Determine use case 2. Fine-tune for product 3. Address input- and output-level risks 4. Build transparency and reporting mechanisms in user interactions Responsible LLM product development stages 1 Determine use case An important decision in the development process is which use case(s) to focus on. |
| 4 | general_40 | 0.4908 | It may also be necessary to establish new terms of service and policies specific to LLMs, or notify users about how their data or feedback provided will be used in fine-tuning. Development of the foundation model 6 JULY 2023 Developers will identify a specific product use case for the released model, and are responsible for assessing risks associated with that use case and applying best practices to ensure safety. |
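The `cross-encoder_score` column suggests that a cross-encoder scores each (query, passage) pair jointly and the results are sorted highest first. `general_52` does not appear in the top 5 by distance above, so the reranker presumably scores a wider candidate set. In the sketch below, a toy word-overlap scorer stands in for the model; with `sentence-transformers` you would pass a `CrossEncoder`'s `predict` output instead. `overlap_score`, `rerank` and the shortened passages are hypothetical.

```python
import re
from typing import Callable

def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words that occur in the passage."""
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p) / len(q) if q else 0.0

def rerank(query: str, candidates: dict[str, str],
           scorer: Callable[[str, str], float] = overlap_score,
           top_n: int = 5) -> list[tuple[str, float]]:
    """Score every (query, passage) pair and return the best top_n, highest score first."""
    scored = [(doc_id, scorer(query, text)) for doc_id, text in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

candidates = {
    "general_30": "developers make decisions that shape the objectives of the feature",
    "general_52": "fine-tuning an LLM can specialize the model for specific use cases",
}
print(rerank("examples of fine-tuning for LLMs", candidates))
```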


### Tidy up

In [15]:
rag.remove_embedding()
