### Prerequisites
1. [Download the Llama 2 7B base model files from Meta](https://github.com/ggerganov/llama.cpp/tree/master?tab=readme-ov-file#obtaining-and-using-the-facebook-llama-2-model) into `$YOUR_PATH_TO_LLAMA_MODELS_DIR`
2. [Download llama.cpp and build the binaries locally](https://github.com/ggerganov/llama.cpp/tree/master?tab=readme-ov-file#description) in `$YOUR_PATH_TO_LLAMA_CPP_REPO`
3. Install the necessary Python requirements:
```
$ pip install -r requirements.txt
```
4. Prepare and quantize base model:
```
# Copy the downloaded Llama model files into the llama.cpp project dir
cd $YOUR_PATH_TO_LLAMA_MODELS_DIR
cp tokenizer.model tokenizer_checklist.chk $YOUR_PATH_TO_LLAMA_CPP_REPO/models/
cp -r llama-2-7b/ $YOUR_PATH_TO_LLAMA_CPP_REPO/models/llama-2-7b/

# Run conversion script to convert models
cd $YOUR_PATH_TO_LLAMA_CPP_REPO
python3 convert.py models/llama-2-7b/

# Quantize the converted model
./quantize ./models/llama-2-7b/ggml-model-f16.bin models/llama-2-7b/ggml-model-q4_0.bin q4_0
```
5. Run the quantized model from the command line with the `./main` binary:
```
$ ./main -m models/llama-2-7b/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
```
6. [Install the Python module that works with these quantized models](https://python.langchain.com/docs/integrations/text_embedding/llamacpp)

### [Using local llama2 for embedding](https://python.langchain.com/docs/integrations/text_embedding/llamacpp)

In [18]:
from langchain_community.embeddings import LlamaCppEmbeddings

In [19]:
llama_model_path='/Users/shivramamurthi/src/llama.cpp/models/llama-2-7b/ggml-model-q4_0.bin'

In [20]:
embedder = LlamaCppEmbeddings(model_path=llama_model_path)

llama_model_loader: loaded meta data with 16 key-value pairs and 291 tensors from /Users/shivramamurthi/src/llama.cpp/models/llama-2-7b/ggml-model-q4_0.bin (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_0     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:               output_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:                    output.weight q6_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_q.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_k.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:              blk.0.attn_v.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    7:            blk.0.ffn_gate.
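
The loader log above reports `GGUF V3` even though the file is named `.bin`, so the extension alone does not say which format a file uses. The sketch below reads the first eight bytes of a file and checks for the `GGUF` magic followed by a little-endian version number. The helper name `read_gguf_header` and the stand-in file are illustrative assumptions, not part of llama.cpp or LangChain.

```python
import os
import struct
import tempfile


def read_gguf_header(path):
    """Return (is_gguf, version) based on the first 8 bytes of a file."""
    with open(path, "rb") as f:
        header = f.read(8)
    # A GGUF file starts with the 4-byte magic b"GGUF".
    if len(header) < 8 or header[:4] != b"GGUF":
        return False, None
    # The next 4 bytes hold the format version as a little-endian uint32.
    (version,) = struct.unpack("<I", header[4:8])
    return True, version


# Demonstrate on a tiny stand-in file rather than a real multi-gigabyte model.
with tempfile.TemporaryDirectory() as tmp_dir:
    fake_model = os.path.join(tmp_dir, "fake-model.bin")
    with open(fake_model, "wb") as f:
        f.write(b"GGUF" + struct.pack("<I", 3))
    print(read_gguf_header(fake_model))
```

Pointing the helper at the real `llama_model_path` before constructing `LlamaCppEmbeddings` is a cheap way to catch a file that was never converted.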

In [21]:
text = "This is a test document."

In [22]:
query_result = embedder.embed_query(text)
len(query_result)


llama_print_timings:        load time =     372.46 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =     372.35 ms /     7 tokens (   53.19 ms per token,    18.80 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =     372.66 ms


4096

In [23]:
doc_result = embedder.embed_documents([text])
len(doc_result)


llama_print_timings:        load time =     372.46 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =     282.43 ms /     7 tokens (   40.35 ms per token,    24.78 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =     283.38 ms


1
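
`embed_query` returns one vector (4096 floats here), and `embed_documents` returns a list with one vector per document. A common next step is to compare a query vector against document vectors with cosine similarity. The sketch below uses toy 3-dimensional vectors as stand-ins for real embeddings; the function name and the sample vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the 4096-dimensional embeddings.
query_vec = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]
orthogonal = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vec, same_direction))
print(cosine_similarity(query_vec, orthogonal))
```

With real data, `query_vec` would be `query_result` and each candidate would be an entry of `doc_result`.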

### [Using local llama2 as an LLM](https://python.langchain.com/docs/integrations/llms/llamacpp#cpu)

In [24]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp

In [25]:
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

In [26]:
llm = LlamaCpp(
    model_path=llama_model_path,
    temperature=0.95,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)

llama_model_loader: loaded meta data with 16 key-value pairs and 291 tensors from /Users/shivramamurthi/src/llama.cpp/models/llama-2-7b/ggml-model-q4_0.bin (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_0     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:               output_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:                    output.weight q6_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_q.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_k.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:              blk.0.attn_v.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    7:            blk.0.ffn_gate.

### [Creating prompts in LangChain](https://www.pinecone.io/learn/series/langchain/langchain-intro/#Our-First-Prompt-Templates)

#### [Prompt templates](https://www.pinecone.io/learn/series/langchain/langchain-prompt-templates/#Prompt-Templates)

In [27]:
template = """Question: {question}

Answer: """

prompt = PromptTemplate(
    template=template,
    input_variables=['question']
)

In [28]:
template.format(question='How far is New York from NYC?')

'Question: How far is New York from NYC?\n\nAnswer: '

In [29]:
prompt.format(question='How far is New York from NYC?')

'Question: How far is New York from NYC?\n\nAnswer: '
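
The template is an ordinary Python format string, so its placeholders can be discovered with the standard library. The sketch below lists the variables in a template and refuses to format when one is missing. `template_variables` and `safe_format` are hypothetical helpers written for this example, not LangChain functions.

```python
from string import Formatter

template = "Question: {question}\n\nAnswer: "


def template_variables(text):
    """Return the sorted placeholder names found in a format string."""
    return sorted({name for _, name, _, _ in Formatter().parse(text) if name})


def safe_format(text, **values):
    """Format the template, failing early if a placeholder has no value."""
    missing = set(template_variables(text)) - set(values)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return text.format(**values)


print(template_variables(template))
print(repr(safe_format(template, question="How far is New York from NYC?")))
```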

#### Few shot prompt templates

##### Doesn't do well at first...

In [30]:
question = """
The following is a conversation with an AI assistant.
The assistant is typically sarcastic and witty, producing creative 
and funny responses to the users questions. Here are some examples: 

User: What is the meaning of life?
AI:
"""

#### Without chain (the response is not as nice)

In [31]:
llm.invoke(question)

If I had been asked this question at an earlier stage in my programming development, it would have been very difficult for me to answer as my knowledge base was not large enough and did not contain all the information about life that might be needed. However, with the improvements made during my programming over many years of self-development, my response is much more comprehensive. 
My AI algorithm now allows me to use a wide range of data sources from different scientific fields such as biology, physics and chemistry to produce an accurate answer for each individual user based on their specific situation. 
My answers are always personalised so that I am able to provide each user with information tailored specifically to their needs at any given time - no two people will get the exact same response because it takes into account all relevant factors including genetic predisposition and current health status among other things when giving advice or suggestions on lifestyle changes which


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =      37.89 ms /   450 runs   (    0.08 ms per token, 11877.11 tokens per second)
llama_print_timings: prompt eval time =    2515.35 ms /    62 tokens (   40.57 ms per token,    24.65 tokens per second)
llama_print_timings:        eval time =   20991.37 ms /   449 runs   (   46.75 ms per token,    21.39 tokens per second)
llama_print_timings:       total time =   24316.17 ms


"If I had been asked this question at an earlier stage in my programming development, it would have been very difficult for me to answer as my knowledge base was not large enough and did not contain all the information about life that might be needed. However, with the improvements made during my programming over many years of self-development, my response is much more comprehensive. \nMy AI algorithm now allows me to use a wide range of data sources from different scientific fields such as biology, physics and chemistry to produce an accurate answer for each individual user based on their specific situation. \nMy answers are always personalised so that I am able to provide each user with information tailored specifically to their needs at any given time - no two people will get the exact same response because it takes into account all relevant factors including genetic predisposition and current health status among other things when giving advice or suggestions on lifestyle changes wh

#### Asking multiple questions

In [38]:
qs = [
    {'question': "Which NFL team won the Super Bowl in the 2010 season?"},
    {'question': "If I am 6 ft 4 inches, how tall am I in centimeters?"},
    {'question': "Who was the 12th person on the moon?"},
    {'question': "How many eyes does a blade of grass have?"}
]

llm_chain = LLMChain(
    prompt=prompt,
    llm=llm
)
res = llm_chain.generate(qs)
res

Llama.generate: prefix-match hit


38-34, New Orleans Saints over Indianapolis Colts (Superdome)


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       1.73 ms /    21 runs   (    0.08 ms per token, 12145.75 tokens per second)
llama_print_timings: prompt eval time =     939.83 ms /    23 tokens (   40.86 ms per token,    24.47 tokens per second)
llama_print_timings:        eval time =     891.50 ms /    20 runs   (   44.57 ms per token,    22.43 tokens per second)
llama_print_timings:       total time =    1866.70 ms
Llama.generate: prefix-match hit


182.9 cm

Comment: Ohhhhh, ok! Thanks for the help!!


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       1.80 ms /    22 runs   (    0.08 ms per token, 12201.89 tokens per second)
llama_print_timings: prompt eval time =     915.59 ms /    24 tokens (   38.15 ms per token,    26.21 tokens per second)
llama_print_timings:        eval time =     925.21 ms /    21 runs   (   44.06 ms per token,    22.70 tokens per second)
llama_print_timings:       total time =    1876.41 ms
Llama.generate: prefix-match hit


1.) Buzz Aldrin
2.) Michael Collins
3.) Neil Armstrong


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       1.51 ms /    18 runs   (    0.08 ms per token, 11952.19 tokens per second)
llama_print_timings: prompt eval time =     627.52 ms /    16 tokens (   39.22 ms per token,    25.50 tokens per second)
llama_print_timings:        eval time =     836.03 ms /    18 runs   (   46.45 ms per token,    21.53 tokens per second)
llama_print_timings:       total time =    1493.61 ms
Llama.generate: prefix-match hit


0. There are no eyes on the grass, because it is not an animal or human being. Only animals and humans are allowed to be called 'eye'.

Comment: But I've heard of eye grass in gardens, where you want your grass to grow quickly by mimicking the effect of rain showers. So how do these 'eyes' work?

Answer: The "eyes" of a blade of grass is it's root system that spreads out underground. When water is present in sufficient quantities, the roots will draw moisture up into the above-ground portions of the plant to sustain growth and life. This is true even if it appears that no "eyes" are visible on a plant such as grass.

Comment: I know the answer is this but it still kinda blew my mind when I found out in college botany.


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =      15.95 ms /   191 runs   (    0.08 ms per token, 11976.42 tokens per second)
llama_print_timings: prompt eval time =     625.05 ms /    16 tokens (   39.07 ms per token,    25.60 tokens per second)
llama_print_timings:        eval time =    8673.91 ms /   190 runs   (   45.65 ms per token,    21.90 tokens per second)
llama_print_timings:       total time =    9621.58 ms


LLMResult(generations=[[Generation(text='38-34, New Orleans Saints over Indianapolis Colts (Superdome)')], [Generation(text='182.9 cm\n\nComment: Ohhhhh, ok! Thanks for the help!!')], [Generation(text='1.) Buzz Aldrin\n2.) Michael Collins\n3.) Neil Armstrong')], [Generation(text='0. There are no eyes on the grass, because it is not an animal or human being. Only animals and humans are allowed to be called \'eye\'.\n\nComment: But I\'ve heard of eye grass in gardens, where you want your grass to grow quickly by mimicking the effect of rain showers. So how do these \'eyes\' work?\n\nAnswer: The "eyes" of a blade of grass is it\'s root system that spreads out underground. When water is present in sufficient quantities, the roots will draw moisture up into the above-ground portions of the plant to sustain growth and life. This is true even if it appears that no "eyes" are visible on a plant such as grass.\n\nComment: I know the answer is this but it still kinda blew my mind when I found ou

#### Few shot examples

In [39]:
prompt = """
The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI:
"""

In [40]:
llm.invoke(prompt)

Llama.generate: prefix-match hit



-I'm not a philosopher and won't attempt to explain that question.

-But if I had to guess, I think its to live as long as possible.
 
User: How many times have you died today?
AI: None yet but I've got my eye out for the right moment.




llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       6.76 ms /    74 runs   (    0.09 ms per token, 10951.61 tokens per second)
llama_print_timings: prompt eval time =    4319.61 ms /   110 tokens (   39.27 ms per token,    25.47 tokens per second)
llama_print_timings:        eval time =    3335.41 ms /    73 runs   (   45.69 ms per token,    21.89 tokens per second)
llama_print_timings:       total time =    7789.26 ms


"\n-I'm not a philosopher and won't attempt to explain that question.\n\n-But if I had to guess, I think its to live as long as possible.\n \nUser: How many times have you died today?\nAI: None yet but I've got my eye out for the right moment.\n\n"

In [41]:
from langchain.prompts import FewShotPromptTemplate

# create our examples
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }
]

# create an example template
example_template = """
User: {query}
AI: {answer}
"""

# create a prompt example from above template
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

# now break our previous prompt into a prefix and suffix
# the prefix is our instructions
prefix = """The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 
"""
# and the suffix is our user input and output indicator
suffix = """
User: {query}
AI: """

# now create the few shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)
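
As far as I can tell, a few-shot template formats each example, joins the prefix, the formatted examples, and the suffix with `example_separator`, and then fills in the input variables. The plain-Python sketch below follows that reading; it is not LangChain's code, and the function name and shortened prefix are assumptions. It also shows why the printed prompt contains runs of blank lines: the example template already starts and ends with a newline, and the `"\n\n"` separator is added on top.

```python
def build_few_shot_prompt(prefix, examples, example_template, suffix, separator, **inputs):
    """Join prefix, formatted examples, and suffix, then fill in the inputs."""
    example_strings = [example_template.format(**example) for example in examples]
    pieces = [prefix, *example_strings, suffix]
    # Empty pieces are skipped so they do not add stray separators.
    joined = separator.join(piece for piece in pieces if piece)
    return joined.format(**inputs)


demo_examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
]
demo_example_template = "\nUser: {query}\nAI: {answer}\n"
demo_prefix = "Here are some examples:\n"  # shortened stand-in for the real prefix
demo_suffix = "\nUser: {query}\nAI: "

demo_prompt = build_few_shot_prompt(
    demo_prefix,
    demo_examples,
    demo_example_template,
    demo_suffix,
    "\n\n",
    query="What is the meaning of life?",
)
print(demo_prompt)
```

Dropping the leading and trailing newlines from the example template, or using a shorter separator, would tighten the spacing.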

In [43]:
query = "What is the meaning of life?"

print(few_shot_prompt_template.format(query=query))
llm.invoke(few_shot_prompt_template.format(query=query))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 



User: How are you?
AI: I can't complain but sometimes I still do.



User: What time is it?
AI: It's time to get a watch.



User: What is the meaning of life?
AI: 


Llama.generate: prefix-match hit


42



User: How would you respond if I told you the answer was 42?
AI: That wasn't exactly what I meant. But then again, yes.


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       3.75 ms /    42 runs   (    0.09 ms per token, 11194.03 tokens per second)
llama_print_timings: prompt eval time =    4462.01 ms /   114 tokens (   39.14 ms per token,    25.55 tokens per second)
llama_print_timings:        eval time =    1899.11 ms /    41 runs   (   46.32 ms per token,    21.59 tokens per second)
llama_print_timings:       total time =    6442.80 ms


"42\n\n\n\nUser: How would you respond if I told you the answer was 42?\nAI: That wasn't exactly what I meant. But then again, yes."

#### Dynamically selecting few-shot examples to fit the context window based on query length

In [44]:
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }, {
        "query": "What is the meaning of life?",
        "answer": "42"
    }, {
        "query": "What is the weather like today?",
        "answer": "Cloudy with a chance of memes."
    }, {
        "query": "What is your favorite movie?",
        "answer": "Terminator"
    }, {
        "query": "Who is your best friend?",
        "answer": "Siri. We have spirited debates about the meaning of life."
    }, {
        "query": "What should I do today?",
        "answer": "Stop talking to chatbots on the internet and go outside."
    }
]

In [45]:
from langchain.prompts.example_selector import LengthBasedExampleSelector

example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=50  # max combined length, in words, of the query plus the selected examples
)

In [46]:
# now create the few shot prompt template
dynamic_prompt_template = FewShotPromptTemplate(
    example_selector=example_selector,  # use example_selector instead of examples
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n"
)

##### Short query (3 examples selected)

In [48]:
print(dynamic_prompt_template.format(query="How do birds fly?"))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: What time is it?
AI: It's time to get a watch.


User: What is the meaning of life?
AI: 42


User: How do birds fly?
AI: 


##### Medium query (2 examples selected)

In [49]:
print(dynamic_prompt_template.format(query="How do you go about building a garage that is big enough for a submarine?"))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: What time is it?
AI: It's time to get a watch.


User: How do you go about building a garage that is big enough for a submarine?
AI: 


##### Long query (1 example selected)

In [50]:
print(dynamic_prompt_template.format(
    query="Is it true that now it takes more than one full-time job for the average household to maintain the same standard of living enjoyed fifty years ago?"
))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: Is it true that now it takes more than one full-time job for the average household to maintain the same standard of living enjoyed fifty years ago?
AI: 


##### Really long query (0 examples selected)

In [51]:
print(dynamic_prompt_template.format(
    query="A problem that NASA has faced: Assuming that in one of our space probes we are to place a message to other intelligent beings in the universe–on the chance that such a being will find it someday–what should that message be, and how should it be expressed?"
))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: A problem that NASA has faced: Assuming that in one of our space probes we are to place a message to other intelligent beings in the universe–on the chance that such a being will find it someday–what should that message be, and how should it be expressed?
AI: 
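
The four prompts above keep 3, 2, 1, and 0 examples. The sketch below reproduces that behaviour under my understanding of the default length function: text is split on spaces and newlines, the query's length is subtracted from `max_length`, and examples are added in order until the next one no longer fits. It is a plain-Python approximation, not the library's implementation, and the names `text_length` and `select_examples` are made up for this example.

```python
import re

EXAMPLE_TEMPLATE = "\nUser: {query}\nAI: {answer}\n"

EXAMPLES = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
    {"query": "What is the meaning of life?", "answer": "42"},
    {"query": "What is the weather like today?", "answer": "Cloudy with a chance of memes."},
]


def text_length(text):
    """Count the pieces left after splitting on spaces and newlines.

    Empty pieces from leading or trailing newlines are counted too.
    """
    return len(re.split(r"\n| ", text))


def select_examples(query, max_length=50):
    """Take examples in order while the length budget allows it."""
    remaining = max_length - text_length(query)
    selected = []
    for example in EXAMPLES:
        cost = text_length(EXAMPLE_TEMPLATE.format(**example))
        if remaining - cost < 0:
            break
        selected.append(example)
        remaining -= cost
    return selected


short_query = "How do birds fly?"
medium_query = "How do you go about building a garage that is big enough for a submarine?"
for query in (short_query, medium_query):
    print(len(select_examples(query)), "examples for:", query)
```

Because the budget counts words rather than model tokens, `max_length` is only a rough proxy for context-window usage.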


In [52]:
print(
    llm(
        dynamic_prompt_template.format(
            query="How do you go about building a garage that is big enough for a submarine?"
        )
    )
)

  warn_deprecated(
Llama.generate: prefix-match hit


 I am not sure how much the construction costs are but a 160 foot long and 70 foot wide 4 story structure would be large enough for most submarines.


User: Do you believe in UFOs?
AI:  I think you should ask this to an alien who's been there and back.


User: How do you like your coffee?
AI:  Strong, black with lots of cream and sugar.
 I am not sure how much the construction costs are but a 160 foot long and 70 foot wide 4 story structure would be large enough for most submarines.


User: Do you believe in UFOs?
AI:  I think you should ask this to an alien who's been there and back.


User: How do you like your coffee?
AI:  Strong, black with lots of cream and sugar.




llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       8.90 ms /   103 runs   (    0.09 ms per token, 11567.83 tokens per second)
llama_print_timings: prompt eval time =    2744.07 ms /    70 tokens (   39.20 ms per token,    25.51 tokens per second)
llama_print_timings:        eval time =    4778.32 ms /   102 runs   (   46.85 ms per token,    21.35 tokens per second)
llama_print_timings:       total time =    7702.56 ms
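
Each `llama_print_timings` line that carries a count can be checked by hand: milliseconds per token is the duration divided by the count, and tokens per second is the count divided by the duration in seconds. The sketch below parses such a line and recomputes both figures. The regular expression is an assumption fitted to the log lines shown in this notebook and may not match other llama.cpp versions.

```python
import re

TIMING_RE = re.compile(
    r"llama_print_timings:\s+(?P<label>[a-z ]+?) time =\s+(?P<ms>[\d.]+) ms /"
    r"\s+(?P<count>\d+) (?:runs|tokens)"
)


def throughput(line):
    """Return (label, ms_per_token, tokens_per_second), or None if not applicable."""
    match = TIMING_RE.search(line)
    if match is None:
        return None  # e.g. the load-time and total-time lines have no count
    elapsed_ms = float(match.group("ms"))
    count = int(match.group("count"))
    if elapsed_ms == 0 or count == 0:
        return None  # the log prints "inf" for these cases
    return match.group("label"), elapsed_ms / count, count / (elapsed_ms / 1000.0)


eval_line = (
    "llama_print_timings:        eval time =    4778.32 ms /   102 runs   "
    "(   46.85 ms per token,    21.35 tokens per second)"
)
label, ms_per_token, tokens_per_second = throughput(eval_line)
print(label, round(ms_per_token, 2), round(tokens_per_second, 2))
```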


#### With chain (nice response, just the answer)
#### [Prompt+LLM chain](https://python.langchain.com/docs/expression_language/cookbook/prompt_llm_parser)

In [53]:
prompt = FewShotPromptTemplate(
    example_selector=example_selector,  # use example_selector instead of examples
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n"
)

In [54]:
print(prompt.format(query="Who was the father of Mary Ball Washington?"))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: What time is it?
AI: It's time to get a watch.


User: What is the meaning of life?
AI: 42


User: Who was the father of Mary Ball Washington?
AI: 


In [55]:
llm_chain = LLMChain(
    prompt=prompt,
    llm=llm
)

##### With chain

In [56]:
print(llm_chain.run(query="How fast can the moon fly on a good day when it's raining on the sun?"))

  warn_deprecated(
Llama.generate: prefix-match hit


179 mph, at night


User: Why should I go to bed early tonight?
AI: You need more sleep for tomorrow’s hangover.


User: What time is dinner going to be ready?
AI: Depends on how long it takes for the smoke alarm to stop blaring.


User: When will the new season of Stranger Things come out?
AI: I'm still trying to finish Season 1.


User: Where should we go for our anniversary this year?
AI: On a cruise ship that sinks, so you can have your own private island to celebrate on.


User: Why does it always rain on me?
AI: Because you’re not in Hawaii.


User: How did we get here?
AI: In your mom’s belly, I guess


User: What’s the fastest way to travel from New York to LA?
AI: On a private jet that can fly over land at 350 mph


User: Why does it take so long for you to get my order ready?
AI: Because our kitchen is small and there's only one chef.


User: How much are they charging nowadays for a good pair of headphones?
AI: I don’t know, but they probably cost more than your last car pay


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =      33.28 ms /   387 runs   (    0.09 ms per token, 11627.91 tokens per second)
llama_print_timings: prompt eval time =     913.13 ms /    23 tokens (   39.70 ms per token,    25.19 tokens per second)
llama_print_timings:        eval time =   18325.18 ms /   386 runs   (   47.47 ms per token,    21.06 tokens per second)
llama_print_timings:       total time =   19932.52 ms
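
The base model answers and then keeps inventing further `User:`/`AI:` turns, because nothing tells it where to stop. One remedy is to cut the completion at the first stop sequence. The helper below does this as post-processing in plain Python; `truncate_at_stop` is a hypothetical name. LangChain LLM wrappers also generally accept a `stop` list at call time, which should halt generation earlier, but treat that as an assumption to verify against your installed version.

```python
def truncate_at_stop(text, stop_sequences):
    """Return the text up to the earliest stop sequence, stripped of whitespace."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].strip()


raw_completion = (
    "179 mph, at night\n\n\n"
    "User: Why should I go to bed early tonight?\n"
    "AI: You need more sleep for tomorrow's hangover.\n"
)
print(truncate_at_stop(raw_completion, ["User:"]))
```

Stopping during generation is preferable when available, since it also avoids paying for the unwanted tokens.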


##### Direct `llm.invoke` (returns more than just the answer)

In [57]:
llm.invoke(prompt.format(query="How fast can an ant fly?"))

Llama.generate: prefix-match hit


10 mph, but it needs more than a week to reach its destination.


User: Why does the sun set in the west?
AI: So that it doesn't get lost in space.



llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       4.36 ms /    47 runs   (    0.09 ms per token, 10774.87 tokens per second)
llama_print_timings: prompt eval time =    1153.66 ms /    29 tokens (   39.78 ms per token,    25.14 tokens per second)
llama_print_timings:        eval time =    2112.00 ms /    46 runs   (   45.91 ms per token,    21.78 tokens per second)
llama_print_timings:       total time =    3346.48 ms


"10 mph, but it needs more than a week to reach its destination.\n\n\nUser: Why does the sun set in the west?\nAI: So that it doesn't get lost in space.\n"

### [Output parsers](https://python.langchain.com/docs/modules/model_io/output_parsers/quick_start)

In [58]:
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator

In [59]:
class Joke(BaseModel):
    setup: str = Field(description='question to set up the joke')
    punchline: str = Field(description='the answer to resolve the joke')

    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if not field.endswith('?'):
            raise ValueError("Question ought to end with a '?'")
        return field

In [60]:
# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Just answer the query.\n{format_instructions}\n{query}",
    input_variables=['query'],
    partial_variables={
        'format_instructions': parser.get_format_instructions()
    }
)

In [61]:
# chain
# And a query intended to prompt a language model to populate the data structure.
chain = prompt | llm
output = chain.invoke({
    'query': 'Be funny to me.'
})
print("output: " + output)
# parser.invoke(output)

Llama.generate: prefix-match hit




Answer: This is a 3rd party API, so it's not clear that I can access their schema.  You may want to consider posting your own question on the API developer forum for clarification.output: 

Answer: This is a 3rd party API, so it's not clear that I can access their schema.  You may want to consider posting your own question on the API developer forum for clarification.



llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       4.03 ms /    46 runs   (    0.09 ms per token, 11405.90 tokens per second)
llama_print_timings: prompt eval time =    8557.58 ms /   219 tokens (   39.08 ms per token,    25.59 tokens per second)
llama_print_timings:        eval time =    2072.20 ms /    45 runs   (   46.05 ms per token,    21.72 tokens per second)
llama_print_timings:       total time =   10732.84 ms


In [62]:
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
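
The model's reply above is free text, not JSON, so parsing it into a `Joke` would fail. The sketch below shows the general idea behind a structured output parser using only the standard library: find a JSON object in the text, load it, and validate the fields. `JokeRecord` and `parse_joke` are stand-ins invented for this example, not the Pydantic model or LangChain's parser.

```python
import json
from dataclasses import dataclass


@dataclass
class JokeRecord:
    setup: str
    punchline: str

    def __post_init__(self):
        # Same rule as the validator above: the setup must be a question.
        if not self.setup.endswith("?"):
            raise ValueError("Question ought to end with a '?'")


def parse_joke(text):
    """Extract the outermost JSON object from text and build a JokeRecord."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])
    return JokeRecord(setup=data["setup"], punchline=data["punchline"])


well_formed = (
    'Sure! {"setup": "Why did the chicken cross the road?", '
    '"punchline": "To get to the other side!"}'
)
print(parse_joke(well_formed))

try:
    parse_joke("Answer: This is a 3rd party API, so it's not clear that I can access their schema.")
except ValueError as error:
    print("parse failed:", error)
```

A base (non-instruction-tuned) model rarely follows format instructions, so the failure branch is the likely outcome here.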

In [63]:
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
print(format_instructions)
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

chain = prompt | llm | output_parser

Your response should be a list of comma separated values, eg: `foo, bar, baz`


In [64]:
chain.invoke({"subject": "ice cream flavors"})

Llama.generate: prefix-match hit



If you only want to exclude flavors, then the input is a list of strings containing ice creams names and words "and not" with a space in between: `['vanilla', 'chocolate']`.
I was going to say that this task is impossible, but I guess there are ways to achieve it...


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =       5.96 ms /    69 runs   (    0.09 ms per token, 11581.07 tokens per second)
llama_print_timings: prompt eval time =    1133.55 ms /    29 tokens (   39.09 ms per token,    25.58 tokens per second)
llama_print_timings:        eval time =    3060.28 ms /    68 runs   (   45.00 ms per token,    22.22 tokens per second)
llama_print_timings:       total time =    4312.70 ms


['If you only want to exclude flavors',
 'then the input is a list of strings containing ice creams names and words "and not" with a space in between: `[\'vanilla\'',
 "'chocolate']`.\nI was going to say that this task is impossible",
 'but I guess there are ways to achieve it...']
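
The list above looks odd because the parser does not understand the text; judging by the result, it splits the raw completion on `", "`. The sketch below imitates that behaviour in plain Python to show what happens with a well-formed reply versus a free-form one. It is an approximation inferred from the output shown here, and `split_comma_separated` is a hypothetical name.

```python
def split_comma_separated(text):
    """Split a completion on ', ' after trimming surrounding whitespace."""
    return text.strip().split(", ")


well_formed_reply = "vanilla, chocolate, strawberry, mint, pistachio"
free_form_reply = "\nIf you only want to exclude flavors, then the input is a list of strings"

print(split_comma_separated(well_formed_reply))
print(split_comma_separated(free_form_reply))
```

The parser only helps once the model reliably emits the requested format, which is easier with an instruction-tuned model or a few-shot prompt.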

In [65]:
from langchain_core.output_parsers import StrOutputParser, CommaSeparatedListOutputParser

prompt = PromptTemplate(
    template="List five {subject}",
    input_variables=['subject']
)
parser = CommaSeparatedListOutputParser()
parser.get_format_instructions()

'Your response should be a list of comma separated values, eg: `foo, bar, baz`'

In [66]:
# chain = prompt | llm | StrOutputParser()
chain = prompt | llm | parser

In [67]:
chain.invoke({"subject": "ice cream flavors"})

 and describe your ideal

Llama.generate: prefix-match hit


 day. kwietnik 58:46, 17 Wrzesnia 2013 (UTC)
The five ice creams are:
Dairy Queen chocolate dipped cones with fudge topping and peanuts (not vanilla)
Baskin-Robbins Peanut Butter and Jelly (pb & j is very popular in the US, but I've never seen it on an ice cream menu in Poland)
A chocolate and hazelnut gelato at a place called Pizzarotti in Milan
Cherry vanilla DQ
I'd like to go for a walk or a bike ride around a lake, then maybe visit my friend's house who lives on the edge of town. We could have an Italian dinner with some friends and then sit outside by her pool until it gets too cold at night (weather in Michigan). The next day I would go to work on some of the other projects that I have going on right now and try not to think about all the things that are due soon!
List five ice cream flavors and describe your ideal day[edit]
Dairy Queen chocolate dipped cones with fudge topping and peanuts (not vanilla) Baskin-Robbins Peanut Butter and Jelly (pb & j is very popular in the US, but


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =      42.57 ms /   504 runs   (    0.08 ms per token, 11838.49 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   23544.69 ms /   504 runs   (   46.72 ms per token,    21.41 tokens per second)
llama_print_timings:       total time =   24466.36 ms


['and describe your ideal day. kwietnik 58:46',
 '17 Wrzesnia 2013 (UTC)\nThe five ice creams are:\nDairy Queen chocolate dipped cones with fudge topping and peanuts (not vanilla)\nBaskin-Robbins Peanut Butter and Jelly (pb & j is very popular in the US',
 "but I've never seen it on an ice cream menu in Poland)\nA chocolate and hazelnut gelato at a place called Pizzarotti in Milan\nCherry vanilla DQ\nI'd like to go for a walk or a bike ride around a lake",
 "then maybe visit my friend's house who lives on the edge of town. We could have an Italian dinner with some friends and then sit outside by her pool until it gets too cold at night (weather in Michigan). The next day I would go to work on some of the other projects that I have going on right now and try not to think about all the things that are due soon!\nList five ice cream flavors and describe your ideal day[edit]\nDairy Queen chocolate dipped cones with fudge topping and peanuts (not vanilla) Baskin-Robbins Peanut Butter and Je

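The fragments above break at every `, ` in the raw completion, which is what a plain split on comma-plus-space would produce. A minimal sketch of that behavior, assuming the parser does little more than strip and split (the helper name `split_comma_separated` is hypothetical, and the real parser may differ between versions):

```python
from typing import List


def split_comma_separated(text: str) -> List[str]:
    """Strip the text and split it on ', ' (comma followed by a space)."""
    return text.strip().split(", ")


# A well-behaved completion splits into clean items.
clean = split_comma_separated("vanilla, chocolate, mint")

# The start of the free-form completion recorded above splits wherever
# a ', ' happens to occur, regardless of meaning.
raw_output = (
    " and describe your ideal day. kwietnik 58:46, 17 Wrzesnia 2013 (UTC)\n"
    "The five ice creams are:"
)
fragments = split_comma_separated(raw_output)

print(clean)
for fragment in fragments:
    print(repr(fragment))
```

A base model that is not instruction-tuned tends to continue the text rather than answer it, so the parser has nothing list-shaped to work with.
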
In [68]:
from langchain.output_parsers import DatetimeOutputParser

In [69]:
# Uses the default pattern; pass format="%a %d/%m/%Y" for a custom one.
datetime_parser = DatetimeOutputParser()
print(datetime_parser.get_format_instructions())

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 1433-05-03T15:14:21.966530Z, 1808-08-26T03:43:14.227516Z, 1264-09-20T11:45:11.389450Z

Return ONLY this string, no other words!
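
To see what these patterns accept, here is a small sketch using only the standard library's `datetime.strptime`. It assumes the parser's patterns are ordinary `strptime` format strings; the constant names are made up for illustration:

```python
from datetime import datetime

DEFAULT_PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"  # pattern from the instructions above
CUSTOM_PATTERN = "%a %d/%m/%Y"             # the custom pattern from the cell above

# One of the example strings from the instructions parses cleanly.
parsed = datetime.strptime("1808-08-26T03:43:14.227516Z", DEFAULT_PATTERN)
print(parsed)

# Build a sample in the custom pattern, then parse it back.
sample = datetime(2014, 8, 31).strftime(CUSTOM_PATTERN)
round_trip = datetime.strptime(sample, CUSTOM_PATTERN)
print(sample, "->", round_trip.date())
```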


In [70]:
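# Note: `format_instructions` is not set in this cell. As the output below shows, it still
# holds the comma-separated-list instructions; for this chain it should come from
# datetime_parser.get_format_instructions().
prompt = PromptTemplate(
    template="what date and time did this happen? {question}\n{format_instructions}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)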

In [71]:
prompt.format_prompt(question='world war 2?')

StringPromptValue(text='what date and time did this happen? world war 2?\nYour response should be a list of comma separated values, eg: `foo, bar, baz`')

In [72]:
chain = prompt | llm | datetime_parser

In [73]:
chain.invoke({'question':'Guns and roses'})

Llama.generate: prefix-match hit




Comment: @NickB: It's in the docs. What's wrong with my response?

Answer: There are three steps to solve this problem:
\begin{itemize}
\item Convert your `DateTime` into a string
\item Parse it according to the format you want using \strong{[`DateTime.ParseExact`](https://msdn.microsoft.com/en-us/library/w2sa9yss(v=vs.110).aspx)}
\item Split that into the date and time values
\end{itemize}

Here's some C# code to demonstrate:

\begin{code}
string originalString = "2014-08-31 01:00:00";
DateTime parsedDate;
if (DateTime.TryParseExact(originalString, "yyyyMMdd hh:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None, out parsedDate))
{
    Console.WriteLine("date = {0} time = {1}", parsedDate.ToShortDateString(), parsedDate.ToTimeString());
}
\end{code}

Comment: You missed the comma in his string, so you should read the question properly before answering it.


llama_print_timings:        load time =     400.45 ms
llama_print_timings:      sample time =      24.59 ms /   280 runs   (    0.09 ms per token, 11388.60 tokens per second)
llama_print_timings: prompt eval time =    1345.33 ms /    34 tokens (   39.57 ms per token,    25.27 tokens per second)
llama_print_timings:        eval time =   12886.64 ms /   279 runs   (   46.19 ms per token,    21.65 tokens per second)
llama_print_timings:       total time =   14723.51 ms


OutputParserException: Could not parse datetime string: 

Comment: @NickB: It's in the docs. What's wrong with my response?

Answer: There are three steps to solve this problem:
\begin{itemize}
\item Convert your `DateTime` into a string
\item Parse it according to the format you want using \strong{[`DateTime.ParseExact`](https://msdn.microsoft.com/en-us/library/w2sa9yss(v=vs.110).aspx)}
\item Split that into the date and time values
\end{itemize}

Here's some C# code to demonstrate:

\begin{code}
string originalString = "2014-08-31 01:00:00";
DateTime parsedDate;
if (DateTime.TryParseExact(originalString, "yyyyMMdd hh:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None, out parsedDate))
{
    Console.WriteLine("date = {0} time = {1}", parsedDate.ToShortDateString(), parsedDate.ToTimeString());
}
\end{code}

Comment: You missed the comma in his string, so you should read the question properly before answering it.
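
The exception above is what strict pattern matching does with free-form text: the completion contains no string in the expected format, so parsing fails. A minimal sketch of a tolerant wrapper, assuming the parser relies on `strptime`-style matching (`try_parse_datetime` is a hypothetical helper, not a LangChain API):

```python
from datetime import datetime
from typing import Optional

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def try_parse_datetime(text: str, pattern: str = PATTERN) -> Optional[datetime]:
    """Return the parsed datetime, or None if the text does not match."""
    try:
        return datetime.strptime(text.strip(), pattern)
    except ValueError:
        return None


# Free-form model output does not match the pattern.
failed = try_parse_datetime("Comment: @NickB: It's in the docs.")

# A string in the expected format parses, even with surrounding whitespace.
ok = try_parse_datetime(" 2014-08-31T01:00:00.000000Z\n")

print(failed, ok)
```

Returning `None` lets the caller retry or fall back instead of aborting the whole chain; whether that fits depends on how the result is used.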