# Common NLP tasks using DSPy Signatures and Modules

<img src="images/dspy_img.png" height="35%" width="65%">

### Quick overview of the DSPy programming model
DSPy is a framework for optimizing Language Model (LM) prompts and weights in complex systems, especially when using LMs multiple times within a pipeline. 

The process of using LMs without DSPy involves breaking down problems into steps, prompting the LM effectively for each step, adjusting steps to work together, generating synthetic examples for tuning, and finetuning smaller LMs to reduce costs. This process is currently challenging and messy, requiring frequent changes to prompts or finetuning steps whenever the pipeline, LM, or data are altered. 

DSPy addresses these issues by separating program flow from LM parameters and introducing new optimizers that tune LM prompts and/or weights based on a desired metric. DSPy can train powerful models like GPT-3.5 and GPT-4, as well as smaller models such as T5-base or Llama2-13b, to perform more reliably at tasks by optimizing their prompts and weights. 

DSPy optimizers generate custom instructions, few-shot prompts, and weight updates for each LM, creating a new paradigm where LMs and their prompts are treated as optimizable components of a larger learning system. 

In summary, DSPy enables less prompting, higher scores, and a more systematic approach to solving complex tasks using Language Models.

**Summary**: 
1. DSPy is a framework for optimizing LM prompts and weights in complex systems.
2. Using LMs without DSPy requires breaking down problems into steps, prompting effectively, adjusting steps, generating synthetic examples, and finetuning smaller LMs.
3. This process is challenging due to frequent changes needed when altering pipelines, LMs, or data.
4. DSPy separates program flow from LM parameters and introduces new optimizers that tune prompts and weights based on a metric.
5. DSPy can train powerful and smaller models to perform more reliably at tasks by optimizing their prompts and weights.
6. DSPy optimizers generate custom instructions, few-shot prompts, and weight updates for each LM.
7. This new paradigm treats LMs and their prompts as optimizable components of a larger learning system.
8. Less prompting is required with DSPy, leading to higher scores.
9. The approach is more systematic and addresses the challenges of using LMs in complex systems.
10. DSPy enables a new way to train and utilize Language Models effectively.

### Signature

"A signature is a declarative specification of input/output behavior of a DSPy module. Signatures allow you to tell the LM what it needs to do, rather than specify how we should ask the LM to do it," according to the DSPy docs.

A Signature is composed of three parts:
 * Task description
 * Input
 * Output

<img src="images/dspy_signature.png">

A Signature class abstracts the above and lets you express your task together with its
input and output (response). Internally, the framework converts a Signature class
into a prompt. Because they specify the task declaratively, Signatures define and dictate the behavior of any module we use in DSPy. The implementation details of all the
Signatures this notebook uses to carry out common NLP tasks are in the [DSPy Utils file](./dspy_utils.py).

<img src="images/class_based_prompt_creation.png">

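To make "the framework converts a Signature class into a prompt" concrete, here is a minimal, pure-Python sketch of how a task description plus named input and output fields could be rendered into prompt text. It does not use DSPy: `ToySignature` and `render_prompt` are hypothetical names, and the layout is only loosely modeled on DSPy 2.x prompts, not DSPy's exact template.

```python
from dataclasses import dataclass, field


@dataclass
class ToySignature:
    """Hypothetical stand-in for a DSPy Signature: a task plus named fields."""
    instructions: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def render_prompt(sig: ToySignature, **values: str) -> str:
    """Turn the declarative spec plus concrete input values into prompt text."""
    # 1. The task description becomes the instructions at the top.
    lines = [sig.instructions, "", "---", "", "Follow the following format.", ""]
    # 2. Every field is listed once as a placeholder so the LM sees the layout.
    for name in sig.inputs + sig.outputs:
        lines.append(f"{name.capitalize()}: ${{{name}}}")
    lines += ["", "---", ""]
    # 3. Inputs are filled in; the first output is left open for the LM to complete.
    for name in sig.inputs:
        lines.append(f"{name.capitalize()}: {values[name]}")
    lines.append(f"{sig.outputs[0].capitalize()}:")
    return "\n".join(lines)


summarize_spec = ToySignature(
    instructions="Summarize the given text in a few sentences.",
    inputs=["text"],
    outputs=["summary"],
)
print(render_prompt(summarize_spec, text="DSPy separates program flow from LM parameters."))
```

The point of the design is that you only declare *what* goes in and out; the rendering (and, in DSPy, later optimization of instructions and demos) is handled by the framework. Compare this with the real prompt printed in the "Inspect the Prompt generated for the LLM" cell.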

## Natural language processing (NLP) LLM Tasks

The tasks explored in this notebook, using sophisticated DSPy declarative
signatures, show *how-to* code examples for common natural language understanding capabilities of a generalized LLM, such as ChatGPT, Mistral, and the Llama 3 series (for example, run locally with Ollama):

 * Text generation or completion
 * Text summarization
 * Text extraction
 * Text classification or sentiment analysis
 * Text categorization
 * Text transformation and translation
 * Simple and complex reasoning

**Note**: 
To run this notebook, you will need to install Ollama on your local
laptop or use an LLM-hosted provider service: OpenAI, Anyscale Endpoints, Anthropic, etc.


In [26]:
import dspy, json, pickle
import pandas as pd

In [2]:
_file_ = "04_dspy_common_nlp_llm_tasks.ipynb"

In [None]:
from dspy_utils import TextCompletion, SummarizeText, \
    SummarizeTextAndExtractKeyTheme, TranslateText, \
    TextTransformationAndCorrection, TextCorrection, GenerateJSON, \
    ClassifyEmotion, TextCategorizationAndSentimentAnalsysis, \
    TranslateTextToLanguage, SimpleAndComplexReasoning, WordMathProblem, \
    BOLD_BEGIN, BOLD_END

In [None]:
BOLD_BEGIN = "<b>"
BOLD_END = "</b>"


### Set up the LLM client and configure DSPy

In [4]:
%env OPENAI_API_KEY=
openai_api_key = %env OPENAI_API_KEY

env: OPENAI_API_KEY=


In [5]:
from columbus_api import Columbus
columbus = Columbus()
llm = columbus.get_llm_for_DSPy("gpt-3.5-turbo", openai_api_key=openai_api_key)
llm.kwargs['max_tokens']=1500

2024-06-15 03:45:47,654 - INFO - Start
2024-06-15 03:45:49,583 - ERROR - Error! can not get Columbus access token. Chech columbus_api.json.
2024-06-15 03:45:49,600 - INFO - Columbus class ready
2024-06-15 03:45:50,513 - ERROR - Columbus access token error: check your evns.
2024-06-15 03:45:50,516 - INFO - get_llm_for_DSPy modelname=gpt-3.5-turbo, apikey=''
2024-06-15 03:45:50,518 - INFO - get_llm_for_DSPy OpenAI


In [6]:
# llm.kwargs['extra_headers']['Authorization'] = f"Bearer {columbus.get_access_token()}"    
llm("Hello World!")

['Hello! How can I assist you today?']

In [7]:
dspy.settings.configure(lm=llm)

## NLP Task 1: Text Generation and Completion
Use class signatures for text completion

In this simple task, we use an LLM to generate text by completing partial user content provided in the prompt, for example, an incomplete prompt such as "On a cold winter night, the stray dog ...". 

Let's try a few text generation or completion tasks by providing partial prompts in the user content. You will be surprised at the fluency and coherence of the generated text.

In [None]:
PROMPTS = ["On cold winter nights, the wolves in Siberia ...",
           "On the day Benjamin Franklin realized his passion for printing, ...",
           "During the 1998 World Cup final, when France beat Brazil in Paris, ...",
           "Isaac Newton sat under a tree when an apple fell..."
          ]

In [None]:
print("NLP Task 1: Text Generation and Completion")
# Create a Predict module from the TextCompletion signature
complete = dspy.Predict(TextCompletion)
# loop over all prompts
for prompt in PROMPTS:
    response = complete(in_text=prompt)
    print(f"{BOLD_BEGIN}Prompt:{BOLD_END}")
    print(prompt)
    print(f"{BOLD_BEGIN}Completion: {BOLD_END}")
    print(response.out_text)
    print("-------------------")

### Inspect the Prompt generated for the LLM

In [None]:
# Print the prompt history
print("Prompt history:")
# print(ollama_mistral.history[0]["prompt"])
print(llm.history[0]["prompt"])
print("-------------------")

## NLP Task 2: Text Summarization
Use a Signature class module for summarization

A common task in natural language processing is text summarization. A common use case
is condensing long articles or documents into quick, easy-to-absorb summaries.

You can instruct the LLM to generate the response in a preferred style and level of comprehensibility: for example, use simple language aimed at a certain grade level, keep the original style of the article, or use different sentence styles (as we have done in a few examples in this notebook and the previous one).

Let's try a few examples.

In [None]:
user_prompts = [
    """ The emergence of large language models (LLMs) has marked a significant 
         breakthrough in natural language processing (NLP), leading to remarkable 
         advancements in text understanding and generation. 
         
         Nevertheless, alongside these strides, LLMs exhibit a critical tendency 
         to produce hallucinations, resulting in content that is inconsistent with 
         real-world facts or user inputs. This phenomenon poses substantial challenges 
         to their practical deployment and raises concerns over the reliability of LLMs 
         in real-world scenarios, which attracts increasing attention to detect and 
         mitigate these hallucinations. In this survey, we aim to provide a thorough and 
         in-depth  overview of recent advances in the field of LLM hallucinations. 
         
         We begin with an innovative taxonomy of LLM hallucinations, then delve into the 
         factors contributing to hallucinations. Subsequently, we present a comprehensive
         overview of hallucination detection methods and benchmarks. 
         Additionally, representative approaches designed to mitigate hallucinations 
         are introduced accordingly. 
         
         Finally, we analyze the challenges that highlight the current limitations and 
         formulate open questions, aiming to delineate pathways for future  research on 
         hallucinations in LLMs.""",
    """  Can a Large Language Model (LLM) solve simple abstract reasoning problems?
         We explore this broad question through a systematic analysis of GPT on the 
         Abstraction and Reasoning Corpus (ARC), a representative benchmark of abstract 
         reasoning ability from limited examples in which solutions require some 
         "core knowledge" of concepts such as objects, goal states, counting, and 
         basic geometry. GPT-4 solves only 13/50 of the most straightforward ARC 
         tasks when using textual encodings for their two-dimensional input-output grids. 
         Our failure analysis reveals that GPT-4's capacity to identify objects and 
         reason about them is significantly influenced by the sequential nature of 
         the text that represents an object within a text encoding of a task. 
         To test this hypothesis, we design a new benchmark, the 1D-ARC, which 
         consists of one-dimensional (array-like) tasks that are more conducive 
         to GPT-based reasoning, and where it indeed performs better than on 
         the (2D) ARC. To alleviate this issue, we propose an object-based 
         representation that is obtained through an external tool, resulting in 
         nearly doubling the performance on solved ARC tasks and near-perfect scores 
         on the easier 1D-ARC. Although the state-of-the-art GPT-4 is unable to 
         "reason" perfectly within non-language domains such as the 1D-ARC or a 
         simple ARC subset, our study reveals that the use of object-based representations 
         can significantly improve its reasoning ability. Visualizations, GPT logs, and 
         data are available at this https URL.""",
    """ DSPy is a framework for optimizing Language Model (LM) prompts and weights in 
        complex systems, especially when using LMs multiple times within a pipeline. 
        The process of using LMs without DSPy involves breaking down problems into steps, 
        prompting the LM effectively for each step, adjusting steps to work together, 
        generating synthetic examples for tuning, and finetuning smaller LMs to reduce costs.
        This process is currently challenging and messy, requiring frequent changes to prompts
        or finetuning steps whenever the pipeline, LM, or data are altered. DSPy addresses 
        these issues by separating program flow from LM parameters and introducing new 
        optimizers that tune LM prompts and/or weights based on a desired metric. 
        DSPy can train powerful models like GPT-3.5 and GPT-4, as well as smaller 
        models such as T5-base or Llama2-13b, to perform more reliably at tasks 
        by optimizing their prompts and weights. DSPy optimizers generate custom 
        instructions, few-shot prompts, and weight updates for each LM, creating a 
        new paradigm where LMs and their prompts are treated as optimizable components
        of a larger learning system. In summary, DSPy enables less prompting, higher 
        scores, and a more systematic approach to solving complex tasks using 
        Language Models.
    """
]

In [None]:
print("NLP Task 2: Text Summarization")
# Create a Predict module from the SummarizeText signature
summarize = dspy.Predict(SummarizeText)
for prompt in user_prompts:
    print(f"{BOLD_BEGIN}Summarization of text response:{BOLD_END}")
    response = summarize(text=prompt)
    print(response.summary)
    print("-------------------")

## NLP Task 3: Text Summarization and Key Theme Extraction
Use Signature class module for summarization and key theme extraction

Another natural language capability, similar to summarization or text completion, is extracting key ideas or information from an article, blog, or paragraph. For example,
given a passage of text, you can ask the LLM to extract key ideas, topics, or subjects. Or,
even better, ask it to enumerate key takeaways for you, saving time if you are in a hurry.

### Task
 * Given a passage from an article, extract the main theme of the passage and label it as the `Subjects`; if there is more than one, separate them by commas.
 * Identify three key takeaways and enumerate them in simple sentences.

In [None]:
SUMMARY_THEME = """Isaac Newton sat under a tree when an apple fell, an event that, 
                according to popular legend, led to his contemplation of the forces
                of gravity. Although this story is often regarded as apocryphal or at 
                least exaggerated, it serves as a powerful symbol of Newton's insight 
                into the universal law that governs celestial and earthly bodies alike. 
                His formulation of the law of universal gravitation was revolutionary, 
                as it provided a mathematical explanation for both the motion of planets 
                and the phenomena observed on Earth. Newton's work in physics, captured 
                in his seminal work Philosophiæ Naturalis Principia Mathematica, laid the 
                groundwork for classical mechanics. His influence extended beyond his own 
                time, shaping the course of scientific inquiry for centuries to come.
                """

In [None]:
print("NLP Task 3: Text Summarization and Key Theme Extraction")
summarize_theme = dspy.Predict(SummarizeTextAndExtractKeyTheme)
print(f"{BOLD_BEGIN}Summarization:{BOLD_END}")
response = summarize_theme(text=SUMMARY_THEME)
print(response.summary)
print(f"{BOLD_BEGIN}Key Themes:{BOLD_END}")
response.key_themes = response.key_themes.split("\n")
print(response.key_themes)
print(f"{BOLD_BEGIN}Takeaways:{BOLD_END}")
print(response.takeaways)
print("-------------------")
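The cell above splits `response.key_themes` on newlines only, while the task asks for comma-separated subjects, so the resulting list depends on how the model happens to format its answer. A small helper can normalize either format. `split_themes` is a hypothetical helper, not part of `dspy_utils`, and it assumes the model returns plain text, possibly with bullets, numbering, or an echoed `Subjects:` label.

```python
import re


def split_themes(raw: str) -> list[str]:
    """Split an LLM's key-theme string into a clean list of subjects.

    Handles comma- or newline-separated output and strips common list markers
    such as "-", "*", "•", "1." or "2)" as well as a leading "Subjects:" label.
    """
    # Drop an optional "Subjects:" prefix the model may echo back.
    raw = re.sub(r"^\s*subjects\s*:\s*", "", raw, flags=re.IGNORECASE)
    themes = []
    for part in re.split(r"[,\n]", raw):
        # Remove a leading bullet, or a number followed by "." or ")" and a space.
        cleaned = re.sub(r"^\s*(?:[-*•]\s*|\d+[.)]\s+)", "", part).strip()
        if cleaned:
            themes.append(cleaned)
    return themes


print(split_themes("Subjects: Isaac Newton, gravity\n- classical mechanics"))
```

In the cells above you could use `response.key_themes = split_themes(response.key_themes)` instead of the `.split("\n")` call.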

### Task
Let's try another example to extract more than one subject or topic discussed
in the text, and enumerate three takeaways.

(Incidentally, I'm reading Walter Isaacson's biography of Benjamin Franklin, and all this seems to align with his career path and passion.)

In [None]:
user_stories = ["""
'The Printer'
    'He that has a Trade has an Office of Profit and Honour' (Poor Richard's Almanack)
Benjamin Franklin had an affinity with print and books throughout his life. 
Apprenticed as a child to his brother James, a printer, he mastered all aspects of
the trade, from typesetting to engraving, learning the latest techniques during his
first visit to London.  An avid reader, Franklin saved money to buy books by 
temporarily turning vegetarian and, once settled in Philadelphia, founded the 
Library Company, the first subscription library in the colonies.  As an elder
statesman, he even bought type and kept a press during his stay in France. 
After working as a printer's journeyman, he set up his own Philadelphian printing 
office in 1728.  His success with the Pennsylvania Gazette and Poor Richard's
Almanack helped to provide Franklin with the financial means to retire from
business, retaining a stake in his print shop and founding others throughout the 
colonies.  Print also gave him a public voice: Franklin preferred the printed word, 
rather than public rhetoric, influencing political and public opinion as a brilliant
journalist and pamphleteer.

'Silence Dogood and the New-England Courant'
    When James Franklin lost the contract to print the Boston Gazette, he determined
to begin his own newspaper, launching the New-England Courant in 1721.
Benjamin, who had been indentured secretly to James, helped to print the weekly 
paper.  One night he slipped a composition under the door, beginning the series
of 'Silence Dogood' letters, the purported epistles of a vocal widow, with strong 
opinions on drunks, clergymen, foolish fashions and Boston nightlife. Owing no
little debt to the satire of the London Spectator, the letters represented a 
remarkable literary achievement for the 16-year-old.  The British Library's copy has 
been uniquely annotated in what appears to be Franklin's hand. The first 
'Dogood' letter appears on the bottom right.

'The Main Design of the Weekly Paper will be to Entertain the Town'
    Benjamin's brother, James, began the New-England Courant in the face of
opposition from the Boston Establishment.  He soon irritated them with his squibs
and satires on the great and the good, attacking the influential clergyman Cotton
Mather's pet project of small pox inoculation and the authorities' weak response 
to piracy. Twice arrested, James temporarily left the paper in Benjamin's hands, and 
then continued to publish it under Benjamin's name to escape a ban on
publication.  This issue is the first printed item to carry the imprint 'B. Franklin' (on
the rear).  Franklin announces his intention to 'Entertain the Town' on this page.
"""]

In [None]:
print("NLP Task 3: Text Summarization and Key Theme Extraction")

# Iterate over stories
for story in user_stories:
    summarize_theme = dspy.Predict(SummarizeTextAndExtractKeyTheme)
    print(f"{BOLD_BEGIN}Summarization:{BOLD_END}")
    response = summarize_theme(text=story)
    print(response.summary)
    print(f"{BOLD_BEGIN}Key Themes:{BOLD_END}")
    response.key_themes = response.key_themes.split("\n")
    print(response.key_themes)
    print(f"{BOLD_BEGIN}Takeaways:{BOLD_END}")
    print(response.takeaways)
    print("-------------------")

## NLP Task 4: Text classification or sentiment analysis

Unlike classical machine learning, where you have to collect data, label it, and train a supervised model for hours (depending on how much data you have), classifying text with an LLM is simple.

With the classical approach, you would have to build an ML model that understands text and classifies its sentiment as positive, negative, or neutral. 

With an LLM, this onerous task is easily done via clever prompting. 

Let's see what I mean in this *how-to* on identifying sentiment in text. But first, let's 
generate some sentiments as our ground truth and supply them to the LLM to observe whether it
identifies them correctly. This bit is not strictly needed; I'm just curious.

*Positive*: "This movie is a true cinematic gem, blending an engaging plot with superb performances and stunning visuals. A masterpiece that leaves a lasting impression."

*Negative*: "Regrettably, the film failed to live up to expectations, with a convoluted storyline, lackluster acting, and uninspiring cinematography. A disappointment overall."

*Neutral*: "The movie had its moments, offering a decent storyline and average performances. While not groundbreaking, it provided an enjoyable viewing experience."

*Positive*: "This city is a vibrant tapestry of culture, with friendly locals, historic landmarks, and a lively atmosphere. An ideal destination for cultural exploration."

*Negative*: "The city's charm is overshadowed by traffic congestion, high pollution levels, and a lack of cleanliness. Not recommended for a peaceful retreat."

*Neutral*: "The city offers a mix of experiences, from bustling markets to serene parks. An interesting but not extraordinary destination for exploration."

*Positive*: "This song is a musical masterpiece, enchanting listeners with its soulful lyrics, mesmerizing melody, and exceptional vocals. A timeless classic."

*Negative*: "The song fails to impress, featuring uninspiring lyrics, a forgettable melody, and lackluster vocals. It lacks the creativity to leave a lasting impact."

*Neutral*: "The song is decent, with a catchy tune and average lyrics. While enjoyable, it doesn't stand out in the vast landscape of music."

*Positive*: "A delightful cinematic experience that seamlessly weaves together a compelling narrative, strong character development, and breathtaking visuals."

*Negative*: "This film, unfortunately, falls short with a disjointed plot, subpar performances, and a lack of coherence. A disappointing viewing experience."

*Neutral*: "While not groundbreaking, the movie offers a decent storyline and competent performances, providing an overall satisfactory viewing experience."

*Positive*: "This city is a haven for culture enthusiasts, boasting historical landmarks, a rich culinary scene, and a welcoming community. A must-visit destination."

*Negative*: "The city's appeal is tarnished by overcrowded streets, noise pollution, and a lack of urban planning. Not recommended for a tranquil getaway."

*Neutral*: "The city offers a diverse range of experiences, from bustling markets to serene parks. An intriguing destination for those seeking a mix of urban and natural landscapes."

In [None]:
user_sentiments = [ "This movie is a true cinematic gem, blending an engaging plot with superb performances and stunning visuals. A masterpiece that leaves a lasting impression.",
                    "Regrettably, the film failed to live up to expectations, with a convoluted storyline, lackluster acting, and uninspiring cinematography. A disappointment overall.",
                    "The movie had its moments, offering a decent storyline and average performances. While not groundbreaking, it provided an enjoyable viewing experience.",
                    "This city is a vibrant tapestry of culture, with friendly locals, historic landmarks, and a lively atmosphere. An ideal destination for cultural exploration.",
                    "The city's charm is overshadowed by traffic congestion, high pollution levels, and a lack of cleanliness. Not recommended for a peaceful retreat.",
                    "The city offers a mix of experiences, from bustling markets to serene parks. An interesting but not extraordinary destination for exploration.",
                    "This song is a musical masterpiece, enchanting listeners with its soulful lyrics, mesmerizing melody, and exceptional vocals. A timeless classic.",
                    "The song fails to impress, featuring uninspiring lyrics, a forgettable melody, and lackluster vocals. It lacks the creativity to leave a lasting impact.",
                    "The song is decent, with a catchy tune and average lyrics. While enjoyable, it doesn't stand out in the vast landscape of music.",
                    "A delightful cinematic experience that seamlessly weaves together a compelling narrative, strong character development, and breathtaking visuals.",
                    "This film, unfortunately, falls short with a disjointed plot, subpar performances, and a lack of coherence. A disappointing viewing experience.",
                    "While not groundbreaking, the movie offers a decent storyline and competent performances, providing an overall satisfactory viewing experience.",
                    "This city is a haven for culture enthusiasts, boasting historical landmarks, a rich culinary scene, and a welcoming community. A must-visit destination.",
                    "The city's appeal is tarnished by overcrowded streets, noise pollution, and a lack of urban planning. Not recommended for a tranquil getaway.",
                    "The city offers a diverse range of experiences, from bustling markets to serene parks. An intriguing destination for those seeking a mix of urban and natural landscapes.",
                    "xxxyyyzzz was curious and dubious"
]

In [None]:
# Create an instance of ClassifyEmotion signature class
# module
print("NLP Task 4: Text classification or sentiment analysis")
classify = dspy.Predict(ClassifyEmotion)

# Iterate over list of sentiments
for sentiment in user_sentiments:
    print(f"\n{BOLD_BEGIN}Sentiment:{BOLD_END} {sentiment}")
    response = classify(sentence=sentiment)
    print(f"\n{BOLD_BEGIN}Label    :{BOLD_END}")
    print(response.sentiment)
    print("---" * 10)
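Because the markdown above lists the intended label for each of the first 15 sentences (positive, negative, neutral, repeated five times), you can score the model instead of eyeballing its output. This is a hypothetical sketch, assuming the `sentiment` field contains one of the words positive, negative, or neutral; `normalize_label`, `accuracy`, and `EXPECTED` are illustrative names, not part of `dspy_utils`.

```python
import re

LABELS = ("positive", "negative", "neutral")
# Match a whole label word anywhere in the model's answer.
LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r")\b")


def normalize_label(answer: str) -> str:
    """Map a free-form answer to one of LABELS, or 'unknown' if zero or several appear."""
    found = set(LABEL_RE.findall(answer.lower()))
    return found.pop() if len(found) == 1 else "unknown"


def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of predictions whose normalized label equals the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(normalize_label(p) == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Ground truth for the first 15 entries of user_sentiments, in the order listed above.
# The final "xxxyyyzzz" sentence has no intended label, so it is excluded.
EXPECTED = ["positive", "negative", "neutral"] * 5
```

To use it, append `response.sentiment` to a `predictions` list inside the loop above, then call `accuracy(predictions[:15], EXPECTED)`. Answers that mention no label or several labels count as misses.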
    

## NLP Task 5: Text categorization
Like sentiment analysis, given a query, an LLM can identify from its context how to classify and route customer queries to the respective departments. Note also that the LLM can detect foul language and respond politely. Text categorization can be employed to automate the handling of customers' online queries.

Let's look at how we can achieve that with DSPy, without hand-crafted prompts.

In [None]:
customer_queries = ["""My modem has stop working. I tried to restart but the orange light keep flashing. It never turns green.""",
                    """I just moved into town, and I need Internet service""",
                    """Why does my bill include an extra $20 a month for cable TV when I don't use a television?""",
                    """I need to change my user name and password since someone is using my credentials. I cannot access my account.""",
                    """What days this week are we having a general upgrades to the cable models?""",
                    """What day is the best day to call customer service so that I can avoid talking to a bot!""",
                    """Your company is full of incompetent morons and fools!""",
                    """I hate your worthless services. Cancel my stupid account or else I'll sue you!"""
                   ]
                    

In [None]:
# Create an instance of TextCategorizationAndSentimentAnalsysis 
# signature class module
print("NLP Task 5: Text categorization and sentiment analysis of the user queries")
categorize = dspy.Predict(TextCategorizationAndSentimentAnalsysis)
for query in customer_queries:
    response = categorize(text=query)
    print(f"{BOLD_BEGIN}Query   :{BOLD_END} {query}")
    print(f"{BOLD_BEGIN}Route to:{BOLD_END} {response.category}")
    print(f"{BOLD_BEGIN}Sentiment:{BOLD_END} {response.sentiment}")
    print("-----" * 10)
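Once the module returns a category, routing the query to a department is ordinary Python. The sketch below is hypothetical: the keys in `ROUTES` and the queue names are invented for illustration, because the actual category labels come from the `TextCategorizationAndSentimentAnalsysis` signature in `dspy_utils.py`, which is not shown here.

```python
# Hypothetical category keywords mapped to hypothetical department queues.
ROUTES = {
    "technical support": "tech-support-queue",
    "billing": "billing-queue",
    "new service": "sales-queue",
    "account management": "accounts-queue",
}
DEFAULT_QUEUE = "human-review-queue"


def route(category: str) -> str:
    """Return the queue whose keyword appears in the predicted category text."""
    text = category.lower()
    for keyword, queue in ROUTES.items():
        if keyword in text:
            return queue
    # Unrecognized or empty categories go to a person rather than being dropped.
    return DEFAULT_QUEUE
```

Substring matching tolerates answers like "Billing inquiry." at the cost of precision; with a fixed label set in the signature, an exact dictionary lookup would be stricter. In the loop above, `route(response.category)` would give the destination queue.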

## NLP Task 6: Text translation and transformation

Language translation is one of the most common use cases for natural language processing. 
We saw its early uses in Google Translate, but with the emergence of multilingual LLMs, this task can be achieved simply with precise prompting. 

In this section, we'll explore how to use LLMs for text translation, language identification, text transformation, spelling and grammar checking, tone adjustment, and format conversion.

### Task 1:
 * Given an English text, translate it into French, Spanish, and German.
 * Given a foreign-language text, identify the language and translate it into English.

In [None]:
english_texts = [""" Welcome to New York for the United Nations General Council Meeting. Today
is a special day for us to celebrate all our achievements since this global institute's formation.
But more importantly, we want to address how we can mitigate global conflict with conversation
and promote deterrence, detente, and discussion."""
]

In [None]:
print("NLP Task 6: Text Translation")
translate = dspy.Predict(TranslateText)
for text in english_texts: 
    response = translate(text=text)
    print(f"{BOLD_BEGIN}Language Text:{BOLD_END} {response.language}")
    print(f"{BOLD_BEGIN}Translated Text:{BOLD_END}")
    print(response.translated_text)
    print("---" * 10)

Given text in a foreign language, identify the language and translate it into English.

This is the reverse of the above.

In [None]:
languages_texts = ["""Bienvenidos a Nueva York para la Reunión del Consejo General de las Naciones Unidas. Hoy
es un día especial para celebrar todos nuestros logros desde la formación de este instituto global.
Pero más importante aún, queremos abordar cómo podemos mitigar el conflicto global con conversaciones
y promover la disuasión, la distensión y el diálogo.""",
            """Willkommen in New York zur Sitzung des Allgemeinen Rates der Vereinten Nationen. Heute
ist ein besonderer Tag für uns, um all unsere Errungenschaften seit der Gründung dieses globalen Instituts zu feiern.
Aber wichtiger ist, dass wir ansprechen möchten, wie wir globale Konflikte durch Gespräche mildern können
und Abschreckung, Entspannung und Diskussion fördern.""",
                  """Bienvenue à New York pour la réunion du Conseil Général des Nations Unies. Aujourd'hui,
c'est un jour spécial pour nous pour célébrer toutes nos réalisations depuis la formation de cette institution mondiale.
Mais plus important encore, nous voulons aborder comment nous pouvons atténuer les conflits mondiaux grâce à la conversation
et promouvoir la dissuasion, la détente et la discussion.""",
                  """欢迎来到纽约参加联合国大会议。今天对我们来说是一个特别的日子，我们将庆祝自该全球机构成立以来取得的所有成就。但更重要的是，我们想要讨论如何通过对话来缓解全球冲突，并促进遏制、缓和和讨论。
"""]


In [None]:
print("NLP Task 6: Text Translation")
translate = dspy.Predict(TranslateTextToLanguage)
for text in languages_texts:
    response = translate(text=text)
    print(f"{BOLD_BEGIN}Language Text:{BOLD_END} {response.language}")
    print(f"{BOLD_BEGIN}Translated Text:{BOLD_END}")
    print(response.translated_text)
    print("-------------------")

### Task 2

 * Text Correction for Grammatical Errors
 * Given an English text, proofread it and correct any grammatical and usage errors.
 * Given a Pirate text, correct its tone to standard English.


In [None]:
bad_english_texts = ["""I don't know nothing about them big words and grammar rules. Me and my friend, we was talking, and he don't agree with me. We ain't never gonna figure it out, I reckon. His dog don't listen good, always running around and don't come when you call.""",
                     """Yesterday, we was at the park, and them kids was playing. She don't like the way how they acted, but I don't got no problem with it. We seen a movie last night, and it was good, but my sister, she don't seen it yet. Them books on the shelf, they ain't interesting to me.""",
                     """Arrr matey! I be knowin' nuthin' 'bout them fancy words and grammatical rules. Me and me heartie, we be chattin', and he don't be agreein' with me. We ain't never gonna figure it out, I reckon. His scallywag of a dog don't be listenin' well, always runnin' around and not comin' when ye call."""
                    ]

In [None]:
print("NLP Task 6: Text Correction for Grammatical Errors")
correct = dspy.Predict(TextTransformationAndCorrection)
for bad_text in bad_english_texts:
    response = correct(text=bad_text)
    print(f"{BOLD_BEGIN}Incorrect Text:{BOLD_END}")
    print(bad_text)
    print(f"{BOLD_BEGIN}Corrected Text:{BOLD_END}")
    print(response.corrected_text)
    print("-------------------")
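To see exactly what the model changed, you can diff the original and corrected text word by word. This sketch uses the standard library's `difflib.SequenceMatcher`; `word_changes` is a hypothetical helper, not part of `dspy_utils`, and because it splits on whitespace, punctuation stays attached to its word.

```python
import difflib


def word_changes(original: str, corrected: str) -> list[tuple[str, str]]:
    """List (old, new) word spans that differ between two versions of a text."""
    a, b = original.split(), corrected.split()
    changes = []
    matcher = difflib.SequenceMatcher(a=a, b=b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Opcodes are 'equal', 'replace', 'delete', or 'insert'; keep only edits.
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


print(word_changes("he don't agree with me", "he doesn't agree with me"))
```

Inside the loop above, `word_changes(bad_text, response.corrected_text)` would list each edit the model made; an empty old or new span marks a pure insertion or deletion.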

## NLP Task 7: Generate JSON Output
* Given some text in a particular format, convert it into JSON format.
* For example, we ask the LLM to produce the names of five top shoes, but we want each product and its details in JSON format. This JSON output can be fed downstream into another application for further processing.

Let's have a go at it.


In [None]:
# NLP Task 7: Generate JSON Output
# Use class signatures for JSON output generation
print("NLP Task 7: Generate JSON Output")
generate_json = dspy.Predict(GenerateJSON)
response = generate_json()
print(f"{BOLD_BEGIN}Generated JSON Output:{BOLD_END}")
print(response.json_text)
print("-------------------")

## NLP Task 8: Simple and complex reasoning 

An important characteristic of LLMs is that they are not only a general repository of compressed parametric knowledge garnered from a large corpus of text, but can also be employed as a reasoning engine for both simple and complex problems.

With a precise DSPy signature, you can instruct the LLM to think through a problem step by step.

Let's look at some tasks as examples.
 * **Task 1**: given a list of numbers identify the prime numbers, add the prime numbers and check if the sum is even or odd.
 * **Task 2**: given an hourly rate of wages, compute your yearly income if you work 30 hours a week

#### Task 1: simple math problem to identify prime numbers

In [None]:
# NLP Task 8: Simple and Complex Reasoning
# Use class signatures for simple and complex reasoning
print("NLP Task 8: Simple and Complex Reasoning")
reasoning = dspy.Predict(SimpleAndComplexReasoning)
response = reasoning(numbers=[4, 8, 9, 11, 13, 17, 19, 23, 24, 29, 31, 37, 41, 42, 43, 47, 53, 59, 61, 67, 71, 74])
print(f"{BOLD_BEGIN}Prime numbers:{BOLD_END} {response.prime_numbers}")
print(f"{BOLD_BEGIN}Sum of Prime numbers:{BOLD_END} {response.sum_of_prime_numbers}")
print(f"{BOLD_BEGIN}Sum is:{BOLD_END} {response.sum_is_even_or_odd}")
print(f"{BOLD_BEGIN}Reasoning:{BOLD_END}")
print(response.reasoning)
print("-------------------")
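LLMs can slip on arithmetic even when their reasoning reads well, so a deterministic reference answer is useful for checking the output above. The plain-Python sketch below uses trial division on the same list of numbers. `is_prime` and `prime_sum_parity` are illustrative helpers, not part of DSPy.

```python
def is_prime(n: int) -> bool:
    """Trial division: n is prime if no odd divisor up to sqrt(n) divides it."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def prime_sum_parity(numbers):
    primes = [n for n in numbers if is_prime(n)]
    total = sum(primes)
    return primes, total, "even" if total % 2 == 0 else "odd"

numbers = [4, 8, 9, 11, 13, 17, 19, 23, 24, 29, 31, 37, 41, 42, 43, 47, 53, 59, 61, 67, 71, 74]
primes, total, parity = prime_sum_parity(numbers)
print(total, parity)  # 622 even
```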

#### Task 2: simple math word problem


In [None]:
MATH_PROBLEM = """
    If my hourly rate is $117.79 per hour and I work 30 hours a week, 
    what is my yearly income?
"""

In [None]:
# NLP Task 8, Task 2: Word Math Problem
# Use class signatures for word math problem
print("NLP Task 8, Task 2: Word Math Problem")
word_math = dspy.Predict(WordMathProblem)
response = word_math(problem=MATH_PROBLEM)
print(f"{BOLD_BEGIN}Word Math Problem:{BOLD_END}")
print(MATH_PROBLEM)
print(f"{BOLD_BEGIN}Explanation:{BOLD_END}")
print(response.explanation)
print("-------------------")
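For comparison with the model's explanation, the expected answer is $117.79 \times 30 \times 52$. The sketch assumes pay for all 52 weeks of the year with no overtime or unpaid leave (an assumption the problem statement does not spell out), and uses `Decimal` to avoid binary floating-point rounding on currency.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("117.79")
HOURS_PER_WEEK = 30
WEEKS_PER_YEAR = 52  # assumption: paid every week of the year

weekly = HOURLY_RATE * HOURS_PER_WEEK
yearly = weekly * WEEKS_PER_YEAR
print(f"weekly: ${weekly:,.2f}, yearly: ${yearly:,.2f}")  # weekly: $3,533.70, yearly: $183,752.40
```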

## All this is amazing! 😜 Feel the wizardry, declarative DSPy power 🧙‍♀️

# Hubble2 root cause evaluator v4

In [14]:
class RootCauseEvaluation(dspy.Signature):
    """
    You are tasked with evaluating technical descriptions written by 
    engineers whose first language is not English. The goal is to 
    determine if their documentation adequately and succinctly explains 
    the root cause of technical issues in notebook computer design and 
    manufacturing to a quality of 'yes' or 'no'.

    The following explains the jargon and abbreviations that you 
    will see:

    List of Abbreviations
    ---------------------
    Note: A leading * specifies the standard form of the term.
    1. win10, W10, *Windows 10.
    1. win11, W11, *Windows 11.
    1. bt, *bluetooth.
    1. bsod, blue screen.
    1. YB, Yellow mark, Yellow exclamation, yellow band, *yellow bang.
    1. PM, power management.
    1. ME, CSME, Management Engine.
    1. CND, Can not duplicate, Can not reproduce the bug.
    1. WNF, Will not fix.
    1. VNP, Verified Not Presented, formally confirmed can not reproduce the bug.
    1. EB, Expected Behavior.
    1. WAD, Working as Designed.
    1. ATS process, Approved To Ship process. It means that the bug has been accepted.
    1. frozen, Hard hang, dead, hangup, hang up, freeze.
    1. MS, s0i3, *Modern standby
    1. kb, keyboard.
    1. type-c, typec, *USB Type-C.
    1. TC, QT TC, *Test Case.
    1. fp, finger print, finger printer, *fingerprint.
    1. CB, cold boot, s5.
    1. WB, warm boot.
    1. WU, Windows Update.
    1. RTS, Ready to ship.
    1. wifi, wireless, wlan.
    1. DFU, Dock Firmware Upgrade.
    1. QT, Quality Test department.
    1. CE, Component Engineering department.
    1. ES, CPU Engineering Sample.
    1. QS, CPU Qualification Sample.
    1. EVT or LAB, Laboratory stage of a project.
    1. DVT or ENG, Engineering stage of a project, after LAB. Also DVT2, DVT3, etc.
    1. PVT, Project stage that occurs after DVT and before MP.
    1. MP, the mass production stage of a project.

    General rules 
    -------------
    Do not accept the cases listed below; their quality is 'no'.
    1. Human error and poor assembly.
    1. CND, Can not duplicate, Can not reproduce the bug.
    1. WNF, Will not fix.
    1. VNP, Verified Not Presented.
    1. EB, Expected Behavior.
    1. WAD, Working as Designed.
    1. Limitations.
    1. Incorrect steps and improper settings.
    1. Duplicate/duplicated.
    1. Already resolved; refer to elsewhere for reference.
    1. Not yet developed; not yet implemented.
    1. Not a bug but a tracking issue.
    1. Clarify/specify spec (specification).
    1. Functionality not supported.
    1. Explaining or testing adverse phenomena that are unrelated to the explanation and testing purpose(s).
    1. Clarify behavior. 
    1. Reference to other bug ID is NOT valuable, we want the info right here.
    1. Only mentioning the BIOS, driver or app version that fixed the bug without explaining the root cause.
    """
    text        = dspy.InputField()
    summary     = dspy.OutputField(desc = "synonyms and abbreviation expansion")
    check_rules = dspy.OutputField(desc = "check the summary with general rules, no match is allowed")
    component   = dspy.OutputField(desc = "indicate the component in trouble or N/A if not mentioned")
    cause       = dspy.OutputField(desc = "if component exists then explain the mistake happened on the component that caused the bug, otherwise N/A")
    conclusion  = dspy.OutputField(desc = "conclude the quality. cause must not be N/A and there must be no general rule violation. Ask yourself: so what is the root cause?")
    output      = dspy.OutputField(desc = 'JSON key-value pairs e.g. {"quality":"yes"} or {"quality":"no"}')    
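The `summary` field asks the LM to expand abbreviations. One option is to also run a deterministic pre-pass over the input text so the model sees standard terms. The sketch below covers only a subset of the docstring's abbreviation table, matches whole words case-sensitively, and uses the hypothetical helper name `expand_abbreviations`. It is not part of DSPy or of the original evaluator.

```python
import re

# A subset of the abbreviation table from the RootCauseEvaluation docstring,
# mapped to a readable form taken from the same table.
ABBREVIATIONS = {
    "win10": "Windows 10", "W10": "Windows 10",
    "win11": "Windows 11", "W11": "Windows 11",
    "bt": "bluetooth", "bsod": "blue screen", "YB": "yellow bang",
    "kb": "keyboard", "fp": "fingerprint", "typec": "USB Type-C",
    "CB": "cold boot", "WB": "warm boot", "WU": "Windows Update",
}

# Longest keys first so longer tokens win; \b restricts matches to whole words,
# so e.g. "bt" inside "debt" is left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(ABBREVIATIONS, key=len, reverse=True))) + r")\b"
)

def expand_abbreviations(text: str) -> str:
    """Replace known abbreviations with their expanded form in a single pass."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)

print(expand_abbreviations("bsod after WB on win11"))
```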


In [28]:
pred = dspy.Predict(RootCauseEvaluation)
pred

Predict(RootCauseEvaluation(text -> summary, check_rules, component, cause, conclusion, output
    instructions="\n    You are tasked with evaluating technical descriptions written by \n    engineers whose first language is not English. The goal is to \n    determine if their documentation adequately and succinctly explains \n    the root cause of technical issues in notebook computer design and \n    manufacturing to a quality of 'yes' or 'no'.\n\n    Following are explanations of jargons and abbreviations that you \n    will see:\n\n    List of Abbreviations\n    ---------------------\n    Note: A leading * specifies the standard form of the term.\n    1. win10, W10, *Windows 10.\n    1. win11, W11, *Windows 11.\n    1. bt, *bluetooth.\n    1. bsod, blue screen.\n    1. YB, Yello wmark, Yellow exlamation, yellow band, *yellow bang.\n    1. PM, power management.\n    1. ME, CSME, Management Engine.\n    1. CND, Can not dupliate, Can not reproduce the bug.\n    1. WNF, Will not fix.\n 

In [29]:
%%time
response = pred(text="The sample(s) which weren't scheduled for presentation and it causes a number of frames being dropped.")
response

CPU times: total: 0 ns
Wall time: 956 µs


Prediction(
    summary='Samples not scheduled for presentation causing dropped frames',
    check_rules='VNP, Expected Behavior',
    component='N/A',
    cause='N/A',
    conclusion='The explanation does not violate any general rules, but it does not provide a clear root cause of the technical issue.',
    output='{"quality":"no"}'
)

In [None]:
root_cause_samples = [(0, "this is EB/WAD. the 3.5mm input on WL7024 only supports audio listening.","no"),
    ( 1, "The sample(s) which weren't scheduled for presentation and it causes a number of frames being dropped.","no"),
    ( 2, "OS UI issue and fixed on 26074.","yes"),
    ( 3, "Please verify latest BIOS 1.5.1 by enable preOS bluetooth\r\n","yes"),
    ( 4, "RC: Battery service on headset is not implemented.(This solution will be added to next formal WL3024 headset FW v2.5)\r\n\r\nNewer WL3024 headset FW will be ready on Jan 25th 2024 , web post/DPeM ready on Jan 25th 2024 via DPeM app.\r\nHeadset fw version v2.5\r\nEnd user can to upgrade headset FW via DPeM app/standalone updater to update newer headset FW.","no"),
    ( 5, "This looks to be an Intel issue. Information shared with Dawid Kwiatkowski at Intel on how to address this issue in the Intel USB sideband audio driver.","yes"),
    ( 6, "Please verify on Sign BIOS + fused EC on BIOS 0.2.2\r\nUse chipsec-1.12.4-p0072_20231123.7zto have a retest on DVT2 + closemnf machine + BIOS 0.2.3 or later.\r\nnew cmd update","no"),
    ( 7, "Driver WU is in progress and not completed yet. MDAVT tool will check WU and make sure driver WU is ready on server.\r\n\r\nNvidia GFX is new part and not RTS yet.\r\nNvidia will submit WU after embargo date. Target on 3/5.\r\n","no"),
    ( 8, "Plan cut in BIOS 1.2.0","no"),
    ( 9, "By design.","no"),
    (10, "03/01/2024, Wistron BIOS Tony, Animation logo solution modify InitWelcomePage() location when hotkey press, it cause non-function key message be skip.","no"),
    (11, "Plan cut in BIOS 1.2.0","no"),
    (12, "03/01/2024, Wistron BIOS Tony, Animation logo solution modify InitWelcomePage() location when hotkey press, it cause non-function key message be skip.","no"),
    (13, "MSFT confirm It is a new design in 24H2.\r\nHello won't be triggered when user lock or logged out. Since in the most time, user won't sign-in back immediately. You could turn off display and turn on again or press any key to enter LogonUI to trigger Hello. Close it as by design.","no"),
    (14, "Plan cut in BIOS 1.2.0","no"),
    (15, "It is a new design in 24H2.\r\nHello won't be triggered when user lock or logged out.\r\nSince in the most time, user won't sign-in back immediately.\r\nYou could turn off display and turn on again or press any key to enter LogonUI to trigger Hello.","no"),
    (16, "DO team feedback: This is the current design. DO app doesn't support resize.\r\nWe'll work with DO EDG to see if we can enhance it in the future release.\r\n","no"),
    (17, "The sample(s) which weren't scheduled for presentation and it causes a number of frames being dropped. ","no"),
    (18, "EC already cut in the CR on formal EC.\r\nPlease refer to the ticket, CEP-12069 for more details","no"),
    (19, "RD and QT manual test with PCI dump and compare pass.","no"),
    (20, "N/A","no"),
    (21, "Tracking CPSE-20383, after using the latest test scripts and tool, it can get positive results. The root cause is test scripts have fixed the failed items.","no"),
    (22, "Per power team requested, implement code into BIOS 0.3.12 to fix DDE system error","no"),
    (23, "Implemented in BIOS 0.3.8.","no"),
    (24, "After adding CBDCIT-2269 code, DiabloMtl can get positive results. This code change will be add in BIOS 1.3.0.\r\nPlease use BIOS 1.3.0 to verify this issue.","no"),
    (25, "Driver list v32 implement","no"),
    (26, "Fixed in BIOS 0.3.4","no"),
    (27, "Add in BIOS 0.3.2\r\n\r\nSHA-1: 6945f9e56e509a897d7b669916515a51daf0f965\r\n\r\n* DiabloMtl: PIMS-238364 Follow NV define sequence to fine tune S4/CB sequence\r\n","no"),
    (28, "Cancel the CR due to 50Mhz solution has been added.","no"),
    (29, "Fixed in BIOS 99.01.03\r\n\r\nSHA-1: f39b421d8ff1b4c1edd577f21795245011106683\r\n* DiabloMtl: [CR][PIMS-219502] Update XML for TBT setting\r\n","no"),
    (30, "Base on Intel command to add EC WA as short term to cover the issue.WA detail : First time reference SLP_A# if level is Low then drop WLAN 3.3v power directly,If level is High then read againg after 300ms. If SLP_A# still High(AMT user) then keep WLAN 3.3 power when read SLP_A# second times.","no"),
    (31, "TP vendor has been fixed TP test tool.","no"),
    (32, "RD side got pass result on DVT2 system","no"),
    (33, "Already add this code in BIOS 0.2.4 (DVT2 SMT).","no"),
    (34, "Already add this code in BIOS 0.2.4 (DVT2 SMT).","no"),
    (35, "DTD owner plan to fix in next version v6.1 launching on Feb 06, 2024.","no"),
    (36, "DPeM 1.7.2 ","no"),
    (37, "Issue related to DA305 FW update. Going to propose ATS.","no"),
    (38, "Monitor FW M2T105 has been release on Website on 2/28.","no"),
    (39, "DA305 FW changed","no"),
    (40, "Refer Dell AgS team confirm,\r\nThis is Expected Behavior, because the system is scanning/pairing the devices execution is taking place in background.\u00a0\r\nPlease close this issue.","no"),
    (41, "Import Bluetooth solution in formal BIOS 1.3.1\r\n\r\nPlease verify this issue with formal BIOS 1.3.1","no"),
    (42, "CEP-10467\r\n[PIMS-214776][Scorpio] Charger AC PROCHOT setting","no"),
    (43, "CEP-12798\r\n[PIMS-257508][Scorpio] cTGP and D-notify common issue","no"),
    (44, "CEP-10655\r\n[PIMS-216784][Scorpio] Enable CATERR feature","no"),
    (45, "CEP-12434\r\n[PIMS-250136][Scorpio] Revert PCR e-diag beep sound by EC","no"),
    (46, "Update into latest TI PD Patch","no"),
    (47, "CEP-12687\r\n[PIMS-255218][Scorpio] Release Failsafe EC for A-rev","no"),
    (48, "CEP-12503\r\n[PIMS-251889][Scorpio] Re-build failsafe EC base on latest formal EC","no"),
    (49, "Need add P-MOSFET to correct for EC_ACAV_IN_N behavior change.\r\nMSFT Bug#919303: tracking MSFT behavior.\r\n","no"),
    (50, "CEP-11552\r\n[PIMS-229873][Scorpio 16] Re-build failsafe EC base on new version","no"),
    (51, "CEP-11184\r\n[PIMS-225230][Scorpio] Release first version of failsafe EC","no"),
    (52, "CEP-11235\r\n[PIMS-225652][Scorpio] Modify HwQuery list maximum size","no"),
    (53, "Add filed colume to meet Narrator requirement.","no"),
    (54, "MEFW update between 1.0.0 & 1.2.0","no"),
    (55, "Monitor FW issue.","no"),
    (56, "CEP-13326","no"),
    (57, "3DMark application timeout and caused application stopped.\r\nPort Royal v1.3.1.1 fixed some issues where the test would sometimes fail without presenting an error.","no"),
    (58, "Please verify on Sign BIOS + fused EC on BIOS 0.2.2\r\nUse chipsec-1.12.4-p0072_20231123.7zto have a retest on DVT2 + closemnf machine + BIOS 0.2.3 or later.\r\nnew cmd update","no"),
    (59, "RC: Battery service on headset is not implemented.(This solution will be added to next formal WL3024 headset FW v2.5)\r\n\r\nNewer WL3024 headset FW will be ready on Jan 25th 2024 , web post/DPeM ready on Jan 25th 2024 via DPeM app.\r\nHeadset fw version v2.5\r\nEnd user can to upgrade headset FW via DPeM app/standalone updater to update newer headset FW.","no"),
    (60, "This looks to be an Intel issue. Information shared with Dawid Kwiatkowski at Intel on how to address this issue in the Intel USB sideband audio driver.","no"),
    (61, "Please verify latest BIOS 1.5.1 by enable preOS bluetooth\r\n","no"),
    (62, "Please verify latest BIOS 1.5.1 by enable preOS bluetooth\r\n","no"),
    (63, "New firmware with improvement in boomless mic.\r\nthe mic path takes around 1-2sec upon activation. this is common behavior also seen in a number of headsets such as Jabra Evolve 2 85","yes"),
    (64, "Provided new Firmware with the issue fixed, and has been verified","yes"),
    (65, "previous input gain for 3.5mm was set too high, this has been corrected in new FW and verified fixed","yes"),
    (66, "HW change to delay power cut to device.","yes"),
    (67, "Duplicate issue of PIMS-220905 about CATERR(1A8W).","yes"),
    (68, "Duplicate issue of PIMS-220905 about CATERR(1A8W).","yes"),
    (69, "Intel analyze log and find OS does not send \"Set brightness\" event to GFX driver.\r\nMSFT had comment as SV3 known issue that service stuck at wrong level, and cannot increase the brightness via hotkey.\r\nThe feature team planning fix the issue on 2024 2D, propose ATS-P2. MSFT bug#926869\r\n","yes"),
    (70, "Intel analyze log and find OS does not send \"Set brightness\" event to GFX driver.\r\nMSFT had comment as SV3 known issue that service stuck at wrong level, and cannot increase the brightness via hotkey.\r\nThe feature team planning fix the issue on 2024 2D, propose ATS-P2. MSFT bug#926869\r\n","yes"),
    (71, "For the TAS(Time average SAR) feature, an OS timer will be activated, which will cause problems in FW's calculation of CPU time compensation confirmation. As a result, when FW confirms CPU time compensation after PG comes back, it thinks that the time to be compensated is wrong and triggers an assertion.","yes"),
    (72, "Change the USB2 driving for SIV\r\n\r\n213170 (T3 board):\r\nPort6 High speed Rise and Fall time : +17%\r\nPort6 High speed DC voltage level : +16%\r\n\r\n213171 (T4 board):\r\nPort6 High speed Rise and Fall time : +17%\r\nPort6 High speed DC voltage level : +20%","yes"),
    (73, "Per MSR requested, we updated ME FW to 18.0.5.2077v2.1","yes"),
    (74, "Change VCCGT FVM Itrip value and disable VCCSA FVM","yes"),
    (75, "Please verify with BIOS 99.0.12.","yes"),
    (76, "The DTD process dying while running as a critical process \r\nNewer DTD version changes the DTD service no longer a critical process to prevent this issue happen\r\nRoot cause : The DTD process dying while running as a critical process \r\nSolution : Newer DTD version changes the DTD service no longer a critical process to prevent this issue happen\r\n","yes"),
    (77, "DUP installation failure in the path/folder on L10N OS.\r\nDUP in itself was never supported localization.","yes"),
    (78, "Add back to back p mosfet to reduce inrush current.","yes"),
    (79, "System is under heavy loading trigger ac prochot.","yes"),
    (80, "CEP-11655\r\n[PIMS-231862][Scorpio 16] Change GPIO 063 to PRIM_VR_EN","yes"),
    (81, "Update ISST driver to enhanced record function.","yes"),
    (82, "WBT will record the checksum for specified BIOS version.","yes"),
    (83, "CEP-10401\r\n[PIMS-213813][Scorpio] Add charger(ISL95522) setting parameters with two level AC current limit into common code ","yes"),
    (84, "Based on MSFT's feedback, the following behavior are by design.\r\n1. Video will lag/black screen when drag video window from LCD to monitor on Extend mode if \u00a0the external monitor and the internal LCD are displayed by different GPUs (Ex. internal LCD via intel + external monitor via Nvidia)\r\n2. While the Movie Clip is playing on the External Display, close lid (set \"Do nothing\") and open lid, the Movie Clip will always lag about 2s","yes"),
    (85, "Intel GPU trigger a TDR (Timeout detection Recovery) event when GPU loading too high or need to adjust GPU frequency.\r\nTimeout Detection and Recovery (TDR) is a process that the Windows operating system uses to improve the user experience when the GPU is busy processing intensive graphics operations. It is not indicative of a GPU or driver failure.\r\n\r\nIntel confirm root cause with intel VGA OCL related function , need intel provide fix plan","yes"),
    (86, "HW change to delay power cut to device.","yes")]
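Some texts in `root_cause_samples` occur more than once with different labels. For example, ids 3, 61 and 62 share the same "Please verify latest BIOS 1.5.1 by enable preOS bluetooth" text, but 3 is labelled "yes" and the other two "no"; ids 5 and 60 disagree in the same way. Such conflicts limit the agreement any evaluator can reach, so it may be worth surfacing them before training. `find_label_conflicts` below is a hypothetical helper that groups samples by their whitespace-stripped text.

```python
from collections import defaultdict

def find_label_conflicts(samples):
    """Map each text that carries more than one label to the sorted ids sharing it."""
    labels_by_text = defaultdict(set)
    ids_by_text = defaultdict(list)
    for idx, text, label in samples:
        key = text.strip()  # ignore trailing spaces / CRLFs when comparing texts
        labels_by_text[key].add(label)
        ids_by_text[key].append(idx)
    return {key: sorted(ids_by_text[key])
            for key, labels in labels_by_text.items() if len(labels) > 1}
```

Running `find_label_conflicts(root_cause_samples)` should list the groups mentioned above for manual review.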


### Read the DataFrame back from the saved dataset pickle file.

In [30]:
pathname = f"{github}\\insightbridge\\IBTools\\Hubble2 RootCause quality training 83 TRs dataset for DSPy.pkl" 

In [31]:
df = pd.read_pickle(pathname)

In [32]:
# Convert DataFrame to DSPy format
train_data = [dspy.Example(text=row['text'], quality=row['quality']).with_inputs('text') for _, row in df.iterrows()]
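The whole of `train_data` goes to the optimizer below, and the later evaluation loop runs over `root_cause_samples`. These two sets may overlap: the pickle's filename mentions 83 TRs, while the in-notebook list has 87 entries with similar texts. In that case the score partly measures fit to training data. A seeded hold-out split is one way to get a fairer estimate. `train_dev_split` is a hypothetical helper, and the 20% dev fraction is an arbitrary choice.

```python
import random

def train_dev_split(items, dev_fraction=0.2, seed=0):
    """Shuffle a copy of `items` with a fixed seed and split it into (train, dev)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]
```

For example, `trainset, devset = train_dev_split(train_data)` would let you compile on `trainset` and score `devset` with `validate_quality`.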

In [41]:
class RootCauseModel(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict(RootCauseEvaluation)
    
    def forward(self, text):
        prediction = self.predict(text=text)
        return prediction
    #   Prediction(
    #       summary='BIOS version 1.23 fixed',
    #       check_rules='Only mention BIOS version that fixed the bug without explaining the root cause',
    #       component='N/A',
    #       cause='N/A',
    #       conclusion='The documentation only mentions the fixed BIOS version without explaining the root cause of the technical issue. Violates the rule.',
    #       output='{"quality":"no"}'
    #   )
    

In [42]:
root_cause_evaluator = RootCauseModel()

In [43]:
root_cause_evaluator("bios v1.23 fixed")

Prediction(
    summary='BIOS version 1.23 fixed',
    check_rules='Only mention BIOS version that fixed the bug without explaining the root cause',
    component='N/A',
    cause='N/A',
    conclusion='The documentation only mentions the fixed BIOS version without explaining the root cause of the technical issue. Violates the rule.',
    output='{"quality":"no"}'
)

In [44]:
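import json
def validate_quality(example, pred, trace=None):
    # Unparsable or missing JSON in pred.output counts as a miss instead of raising.
    try:
        return example['quality'] == json.loads(pred.output).get('quality')
    except (ValueError, TypeError, AttributeError):
        return False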

In [45]:
from dspy.teleprompt import BootstrapFewShot

# Create a teleprompter with the validation logic
teleprompter = BootstrapFewShot(metric=validate_quality)

In [46]:
%%time
# Compile the model
compiled_model = teleprompter.compile(RootCauseModel(), trainset=train_data)

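# Compilation is quick here: BootstrapFewShot stops once it has collected
# max_bootstrapped_demos (4 by default) passing traces, and repeated LM
# calls may be answered from DSPy's cache.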

  7%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç                                                                                                                                                                                                                               | 6/83 [00:00<00:00, 1256.28it/s]
2024-06-15 04:28:15,439 - INFO - 2024-06-14T20:28:15.439170Z [info     ] Bootstrapped 4 full traces after 7 examples in round 0. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=127


CPU times: total: 0 ns
Wall time: 34.7 ms
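The log line "Bootstrapped 4 full traces after 7 examples" reflects the core idea of `BootstrapFewShot`. It runs the program on training examples, keeps the traces whose prediction passes the metric as few-shot demos, and stops once it has `max_bootstrapped_demos` of them. The toy loop below is a simplified, hypothetical illustration of that idea in plain Python. It is not DSPy's implementation, which also handles teacher settings, labeled demos, rounds and error limits.

```python
def bootstrap_demos(trainset, predict, metric, max_demos=4):
    """Collect up to max_demos (example, prediction) pairs that pass the metric."""
    demos = []
    n_seen = 0
    for example in trainset:
        n_seen += 1
        prediction = predict(example["text"])
        if metric(example, prediction):  # keep only traces the metric accepts
            demos.append((example, prediction))
            if len(demos) >= max_demos:
                break
    return demos, n_seen
```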


In [53]:
# Example usage
example_text = "Hinge v1.23 fix. Fan speed 2 added to 9"
results = compiled_model(example_text)
results

2024-06-15 04:43:52,919 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Prediction(
    summary='Hinge v1.23 fix, fan speed 2 added to 9',
    check_rules='No match',
    component='Hinge',
    cause='N/A',
    conclusion='The root cause of the technical issue is the need for a fix in Hinge v1.23 and the addition of fan speed 2 to 9.',
    output='{"quality":"yes"}'
)
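In this prediction the model answers `{"quality":"yes"}` while `cause` is 'N/A', even though the `conclusion` field's description requires a cause that is not N/A. LM outputs can ignore such instructions, so one option is a deterministic guard after the model. `enforce_cause_rule` is a hypothetical post-processing helper, not part of DSPy. It treats a missing cause or unparsable output as "no".

```python
import json

def enforce_cause_rule(cause: str, output: str) -> str:
    """Return the final quality label, forcing 'no' when no cause was identified."""
    if cause.strip().upper() == "N/A":
        return "no"
    try:
        quality = json.loads(output).get("quality", "no")
    except (ValueError, TypeError, AttributeError):
        return "no"
    return quality if quality in ("yes", "no") else "no"
```

Applied to the prediction above, `enforce_cause_rule(results.cause, results.output)` returns "no".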

In [48]:
llm.inspect_history(n=1)




You are tasked with evaluating technical descriptions written by 
    engineers whose first language is not English. The goal is to 
    determine if their documentation adequately and succinctly explains 
    the root cause of technical issues in notebook computer design and 
    manufacturing to a quality of 'yes' or 'no'.

    Following are explanations of jargons and abbreviations that you 
    will see:

    List of Abbreviations
    ---------------------
    Note: A leading * specifies the standard form of the term.
    1. win10, W10, *Windows 10.
    1. win11, W11, *Windows 11.
    1. bt, *bluetooth.
    1. bsod, blue screen.
    1. YB, Yello wmark, Yellow exlamation, yellow band, *yellow bang.
    1. PM, power management.
    1. ME, CSME, Management Engine.
    1. CND, Can not dupliate, Can not reproduce the bug.
    1. WNF, Will not fix.
    1. VNP, Verified Not Presented, formally confirmed can not reproduce the bug.
    1. EB, Expected Behavior.
    1. WAD, Working as Des

'\n\n\nYou are tasked with evaluating technical descriptions written by \n    engineers whose first language is not English. The goal is to \n    determine if their documentation adequately and succinctly explains \n    the root cause of technical issues in notebook computer design and \n    manufacturing to a quality of \'yes\' or \'no\'.\n\n    Following are explanations of jargons and abbreviations that you \n    will see:\n\n    List of Abbreviations\n    ---------------------\n    Note: A leading * specifies the standard form of the term.\n    1. win10, W10, *Windows 10.\n    1. win11, W11, *Windows 11.\n    1. bt, *bluetooth.\n    1. bsod, blue screen.\n    1. YB, Yello wmark, Yellow exlamation, yellow band, *yellow bang.\n    1. PM, power management.\n    1. ME, CSME, Management Engine.\n    1. CND, Can not dupliate, Can not reproduce the bug.\n    1. WNF, Will not fix.\n    1. VNP, Verified Not Presented, formally confirmed can not reproduce the bug.\n    1. EB, Expected Behavi

In [None]:
flags = [None] * len(root_cause_samples)

In [None]:
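%%time
import json

for idx, text, expected in root_cause_samples:
    if flags[idx] is not None:
        continue  # resume support: skip samples already evaluated in an earlier run
    print("\n" + "-" * 10 + "\n", idx, text, expected, end=" ")
    response = compiled_model(text=text)
    # The signature returns quality as JSON in `output`, e.g. {"quality":"yes"}
    try:
        quality = json.loads(response.output).get("quality")
    except (ValueError, TypeError, AttributeError):
        quality = None
    print(quality)
    flags[idx] = (idx, quality == expected)
    if not flags[idx][1]:
        print(" *** alarm ***")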
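When the loop finishes, or after an interrupted run (already-evaluated samples keep their flags), the per-sample `(id, matched)` flags can be reduced to an accuracy figure and a list of mismatched ids to review. `summarize_flags` is a hypothetical helper for this sketch.

```python
def summarize_flags(flags):
    """Return (accuracy, mismatched_ids) over the samples evaluated so far."""
    done = [f for f in flags if f is not None]  # skip samples not yet evaluated
    mismatched = [idx for idx, ok in done if not ok]
    accuracy = (len(done) - len(mismatched)) / len(done) if done else 0.0
    return accuracy, mismatched
```

For example, `accuracy, mismatched = summarize_flags(flags)` would give the agreement rate with the hand labels and the ids that triggered an alarm.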