### Prerequisites
1. [Download the Llama 2 7B base model files from Meta](https://github.com/ggerganov/llama.cpp/tree/master?tab=readme-ov-file#obtaining-and-using-the-facebook-llama-2-model) into `$YOUR_PATH_TO_LLAMA_MODELS_DIR`
2. [Clone llama.cpp and build its binaries locally](https://github.com/ggerganov/llama.cpp/tree/master?tab=readme-ov-file#description) in `$PATH_TO_LLAMA_CPP_REPO`
3. Install the necessary Python requirements:
```
$ pip install -r requirements.txt
```
4. Prepare and quantize the base model:
```
# Copy the downloaded Llama model files into the llama.cpp project dir
cd $YOUR_PATH_TO_LLAMA_MODELS_DIR
cp tokenizer.model tokenizer_checklist.chk $PATH_TO_LLAMA_CPP_REPO/models/
cp -r llama-2-7b $PATH_TO_LLAMA_CPP_REPO/models/

# Run the conversion script to convert the model
cd $PATH_TO_LLAMA_CPP_REPO
python3 convert.py models/llama-2-7b/

# Quantize the converted model
./quantize ./models/llama-2-7b/ggml-model-f16.bin models/llama-2-7b/ggml-model-q4_0.bin q4_0
```
5. Run the quantized model from the command line with the `./main` binary:
```
$ ./main -m models/llama-2-7b/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
```
6. [Install the Python bindings (`llama-cpp-python`) that LangChain uses to load these quantized models](https://python.langchain.com/docs/integrations/text_embedding/llamacpp)
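
Step 4's `q4_0` argument selects llama.cpp's 4-bit block quantization. Below is a rough pure-Python sketch of the idea, assuming the scheme of ggml's reference `q4_0` quantizer as I understand it: weights are grouped into blocks of 32, each block shares one scale (stored as fp16 in the real file format, a plain float here), and each weight becomes a 4-bit code. The function names are hypothetical, not llama.cpp APIs.

```python
import random

BLOCK_SIZE = 32  # q4_0 groups weights into blocks of 32


def quantize_block_q4_0(block):
    """Quantize one block of floats to (scale, 4-bit codes in 0..15)."""
    # The weight with the largest magnitude (sign kept) defines the scale
    # and maps to code 0.
    extreme = max(block, key=abs)
    scale = extreme / -8.0
    inv_scale = 1.0 / scale if scale != 0.0 else 0.0
    # x * inv_scale lies in [-8, 8]; shift by 8, round, and clamp to 15.
    codes = [min(15, int(x * inv_scale + 8.5)) for x in block]
    return scale, codes


def dequantize_block_q4_0(scale, codes):
    """Map 4-bit codes back to approximate floats."""
    return [(q - 8) * scale for q in codes]


random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(BLOCK_SIZE)]
scale, codes = quantize_block_q4_0(weights)
restored = dequantize_block_q4_0(scale, codes)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.4f}, max abs error={max_error:.4f}")
```

Each block stores 32 four-bit codes plus a 16-bit scale, which works out to (32 × 4 + 16) / 32 = 4.5 bits per weight, compared with 16 bits per weight in the `ggml-model-f16` file.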

### [Using local llama2 for embedding](https://python.langchain.com/docs/integrations/text_embedding/llamacpp)

In [1]:
from langchain_community.embeddings import LlamaCppEmbeddings

In [2]:
llama_model_path='/Users/shivramamurthi/src/llama.cpp/models/llama-2-7b/ggml-model-q4_0.bin'

In [3]:
embedder = LlamaCppEmbeddings(model_path=llama_model_path)

llama_model_loader: loaded meta data with 16 key-value pairs and 291 tensors from /Users/shivramamurthi/src/llama.cpp/models/llama-2-7b/ggml-model-q4_0.bin (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:        

In [4]:
text = "This is a test document."

In [5]:
query_result = embedder.embed_query(text)
len(query_result)


llama_print_timings:        load time =     366.00 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =     364.74 ms /     7 tokens (   52.11 ms per token,    19.19 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =     366.27 ms /     8 tokens


4096

In [6]:
doc_result = embedder.embed_documents([text])
len(doc_result)


llama_print_timings:        load time =     366.00 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =     292.88 ms /     7 tokens (   41.84 ms per token,    23.90 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =     293.60 ms /     8 tokens


1
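
`embed_query` returns a single vector whose length (4096) matches `llama.embedding_length` in the loader log, and `embed_documents` returns one vector per input document, hence `1`. A common way to compare such vectors is cosine similarity. The helper below is a minimal pure-Python sketch (not part of LangChain), shown on toy 2-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))  # similar but not identical
```

With the notebook's objects, you could call `cosine_similarity(query_result, doc_result[0])` to compare the two embeddings.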

### [Using local llama2 as an LLM](https://python.langchain.com/docs/integrations/llms/llamacpp#cpu)

In [48]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp

In [49]:
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

In [50]:
llm = LlamaCpp(
    model_path=llama_model_path,
    temperature=0.95,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)

llama_model_loader: loaded meta data with 16 key-value pairs and 291 tensors from /Users/shivramamurthi/src/llama.cpp/models/llama-2-7b/ggml-model-q4_0.bin (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:        
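
`StreamingStdOutCallbackHandler` writes each token to stdout as the model produces it, and the call still returns the full text at the end. That is why several cells in this notebook show the generated text twice: once streamed, once as the cell's return value. The sketch below mimics that flow with hypothetical names (`PrintTokensHandler`, `fake_generate`); only the `on_llm_new_token` hook name mirrors LangChain's callback interface.

```python
import sys


class PrintTokensHandler:
    """Hypothetical handler: echoes each new token to stdout immediately."""

    def on_llm_new_token(self, token: str) -> None:
        sys.stdout.write(token)
        sys.stdout.flush()


def fake_generate(tokens, handlers):
    """Hypothetical model call: stream tokens to handlers, then return the full text."""
    text = ""
    for token in tokens:
        for handler in handlers:
            handler.on_llm_new_token(token)
        text += token
    return text


result = fake_generate(["Hello", ",", " world"], [PrintTokensHandler()])
print()              # end the streamed line
print(repr(result))  # the same text again, as a return value
```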

### [Creating prompts in LangChain](https://www.pinecone.io/learn/series/langchain/langchain-intro/#Our-First-Prompt-Templates)

#### [Prompt templates](https://www.pinecone.io/learn/series/langchain/langchain-prompt-templates/#Prompt-Templates)

In [52]:
template = """Question: {question}

Answer: """

prompt = PromptTemplate(
    template=template,
    input_variables=['question']
)

In [53]:
template.format('How far is New York from NYC?')

KeyError: 'question'

In [54]:
template.format(question='How far is New York from NYC?')

'Question: How far is New York from NYC?\n\nAnswer: '
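
The `KeyError` comes from plain Python, not LangChain: `template` here is an ordinary `str`, and a named placeholder such as `{question}` can only be filled by a keyword argument. This minimal sketch reproduces both calls.

```python
template = """Question: {question}

Answer: """

# A positional argument cannot fill the named field {question}.
error_message = "no error"
try:
    template.format("How far is New York from NYC?")
except KeyError as err:
    error_message = f"KeyError: {err}"
print(error_message)

# A keyword argument matching the field name works.
filled = template.format(question="How far is New York from NYC?")
print(repr(filled))
```

The `PromptTemplate` object follows the same keyword-only convention: `prompt.format(question='How far is New York from NYC?')` returns the same string.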

#### Few shot prompt templates

##### Doesn't do well at first...

In [46]:
question = """
The following is a conversation with an AI assistant.
The assistant is typically sarcastic and witty, producing creative 
and funny responses to the users questions. Here are some examples: 

User: What is the meaning of life?
AI:
"""

#### Without a chain (the response is not as clean)

In [55]:
llm.invoke(question)

There are many answers. I can think of three off the top of my head...
1) It's a question you ask yourself when you're high on drugs...or lonely...or both.
2) It's an annoying thing to ask about that you're probably going to get from everyone you talk to anyway.
3) It's something people like you say when you're about to lose your job in finance after investing all your clients' money in penny stocks...and then blame it on someone else.
User: So what's the answer then?
AI:
To the first one? I don't know! To the second one? You're not going to like it! And to the third one? You're not going to like it either!
User: Why not?
AI:
Because you're stupid! And you'll find that out if you don't ask better questions!



llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =      20.28 ms /   209 runs   (    0.10 ms per token, 10304.70 tokens per second)
llama_print_timings: prompt eval time =    2519.82 ms /    62 tokens (   40.64 ms per token,    24.60 tokens per second)
llama_print_timings:        eval time =   10216.75 ms /   208 runs   (   49.12 ms per token,    20.36 tokens per second)
llama_print_timings:       total time =   13133.96 ms /   270 tokens


"There are many answers. I can think of three off the top of my head...\n1) It's a question you ask yourself when you're high on drugs...or lonely...or both.\n2) It's an annoying thing to ask about that you're probably going to get from everyone you talk to anyway.\n3) It's something people like you say when you're about to lose your job in finance after investing all your clients' money in penny stocks...and then blame it on someone else.\nUser: So what's the answer then?\nAI:\nTo the first one? I don't know! To the second one? You're not going to like it! And to the third one? You're not going to like it either!\nUser: Why not?\nAI:\nBecause you're stupid! And you'll find that out if you don't ask better questions!\n"

#### Asking multiple questions

In [58]:
qs = [
    {'question': "Which NFL team won the Super Bowl in the 2010 season?"},
    {'question': "If I am 6 ft 4 inches, how tall am I in centimeters?"},
    {'question': "Who was the 12th person on the moon?"},
    {'question': "How many eyes does a blade of grass have?"}
]
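# wrap the Question/Answer `prompt` defined above in a chain
llm_chain = LLMChain(prompt=prompt, llm=llm)
res = llm_chain.generate(qs)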
res

Llama.generate: prefix-match hit


58) Pittsburgh Steelers


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =       0.81 ms /     9 runs   (    0.09 ms per token, 11180.12 tokens per second)
llama_print_timings: prompt eval time =     900.15 ms /    21 tokens (   42.86 ms per token,    23.33 tokens per second)
llama_print_timings:        eval time =     372.90 ms /     8 runs   (   46.61 ms per token,    21.45 tokens per second)
llama_print_timings:       total time =    1290.93 ms /    29 tokens
Llama.generate: prefix-match hit


1 cm = 0.39370 inch, so multiply by your height in feet and then divide by 39.370 to convert it from inches to centimetres.
\begin{itemize}
\item 172.8cm for 5 foot 8 inches
\end{itemize}

Comment: Thanks a lot. Is there a way for me to get the answer without the use of a calculator though ?

Comment: Yes, do the math yourself (and use a calculator if you get stuck) but what is the point when we have this site? You can always do the math yourself as a check.

Comment: @Megas - Yes. You could use 39370/6 for a generalised method (this is not a perfect approximation).

Comment: @Megas, why would you want to do the math yourself? What is the point?


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =      19.66 ms /   201 runs   (    0.10 ms per token, 10223.80 tokens per second)
llama_print_timings: prompt eval time =     924.33 ms /    24 tokens (   38.51 ms per token,    25.96 tokens per second)
llama_print_timings:        eval time =    9794.49 ms /   200 runs   (   48.97 ms per token,    20.42 tokens per second)
llama_print_timings:       total time =   11091.95 ms /   224 tokens
Llama.generate: prefix-match hit


49 years ago, the Apollo 12 mission to the Moon landed near Surveyor 3. It is an image of this landing that I have used in a previous post about how NASA and its contractors were using Surveyor data to prepare for the Apollo missions (see here). This time I want to focus on Surveyor’s partner, who is often forgotten about when people discuss the Apollo 12 mission.
Figure 1 - Apollo 12 astronauts (from left) Charles Conrad, Richard Gordon and Alan Bean on the lunar surface after their successful lunar landing.
The Lunar Module (LM) is seen behind them. Credit - NASA
The Apollo 12 Lunar Module (LM) was called Intrepid after a spacecraft flown by the same name (see here). Intrepid’s predecessor had been called Yankee Clipper (see here). Yankee Clipper had been flown by Gordon on the Gemini XII flight (see here). And on Gemini XII, Gordon’s crewmate had been Conrad (see here). So, the Apollo 12 crew had already flown together in space and had the name Intrepid to boot (see here)!
Figure 2 


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =      41.23 ms /   492 runs   (    0.08 ms per token, 11934.51 tokens per second)
llama_print_timings: prompt eval time =     630.54 ms /    16 tokens (   39.41 ms per token,    25.38 tokens per second)
llama_print_timings:        eval time =   25369.23 ms /   492 runs   (   51.56 ms per token,    19.39 tokens per second)
llama_print_timings:       total time =   26977.19 ms /   508 tokens
Llama.generate: prefix-match hit


0. They only have one eye and that is the sun.

Answer: None, they are not self aware.

Comment: It's still alive, it has cells in motion to absorb the energy from the light source. I would argue there is some semblance of self awareness.


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =       5.85 ms /    66 runs   (    0.09 ms per token, 11274.34 tokens per second)
llama_print_timings: prompt eval time =     718.85 ms /    16 tokens (   44.93 ms per token,    22.26 tokens per second)
llama_print_timings:        eval time =    3194.40 ms /    65 runs   (   49.14 ms per token,    20.35 tokens per second)
llama_print_timings:       total time =    4036.00 ms /    81 tokens


LLMResult(generations=[[Generation(text='58) Pittsburgh Steelers')], [Generation(text='1 cm = 0.39370 inch, so multiply by your height in feet and then divide by 39.370 to convert it from inches to centimetres.\n\\begin{itemize}\n\\item 172.8cm for 5 foot 8 inches\n\\end{itemize}\n\nComment: Thanks a lot. Is there a way for me to get the answer without the use of a calculator though ?\n\nComment: Yes, do the math yourself (and use a calculator if you get stuck) but what is the point when we have this site? You can always do the math yourself as a check.\n\nComment: @Megas - Yes. You could use 39370/6 for a generalised method (this is not a perfect approximation).\n\nComment: @Megas, why would you want to do the math yourself? What is the point?')], [Generation(text='49 years ago, the Apollo 12 mission to the Moon landed near Surveyor 3. It is an image of this landing that I have used in a previous post about how NASA and its contractors were using Surveyor data to prepare for the Apoll
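
`llm_chain.generate(qs)` runs the chain once per input dict and collects the results in a single `LLMResult`. The prompt-formatting half of that can be sketched in plain Python: each dict's keys must match the template's placeholders, or formatting fails with a `KeyError`. This illustrates the idea and is not LangChain's internals.

```python
template = """Question: {question}

Answer: """

qs = [
    {'question': "Which NFL team won the Super Bowl in the 2010 season?"},
    {'question': "If I am 6 ft 4 inches, how tall am I in centimeters?"},
    {'question': "Who was the 12th person on the moon?"},
    {'question': "How many eyes does a blade of grass have?"},
]

# One prompt per input dict; the dict keys fill the template's placeholders.
prompts = [template.format(**inputs) for inputs in qs]
for prompt_text in prompts:
    print(repr(prompt_text))
```

To pull just the answer text out of the result above, something like `[gens[0].text.strip() for gens in res.generations]` should work, since `generations` holds one list of `Generation` objects per input.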

#### Few shot examples

In [60]:
prompt = """
The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI:
"""

In [61]:
llm.invoke(prompt)

Llama.generate: prefix-match hit


1. Find meaning in your life
2. Live your life in a way that reflects your meaning
3. Contribute your meaning to a larger group or community of people
4. Share your meaning with others
5. Make a positive impact on the world

User: What do you want for Christmas?
AI: To not die a virgin. 

User: What is the purpose of life?
AI: To live a fulfilling life that brings meaning and purpose. To make a positive contribution to society. To live a life that is true to who we really are. To be happy. To be healthy. To be successful. To be content. To be loved and loved in return. To find our calling or true vocation in life and pursue it passionately. To leave a legacy or be remembered for something significant in our life. To reach our full potential as human beings. To achieve a state of enlightenment or awakening spiritually. To reach nirvana or liberation from suffering in Buddhist terms. To transcend our ego-self or self-identity and become one with something greater than ourselves. To exper


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =      31.44 ms /   401 runs   (    0.08 ms per token, 12754.05 tokens per second)
llama_print_timings: prompt eval time =    4297.44 ms /   110 tokens (   39.07 ms per token,    25.60 tokens per second)
llama_print_timings:        eval time =   19835.46 ms /   400 runs   (   49.59 ms per token,    20.17 tokens per second)
llama_print_timings:       total time =   24906.24 ms /   510 tokens


"1. Find meaning in your life\n2. Live your life in a way that reflects your meaning\n3. Contribute your meaning to a larger group or community of people\n4. Share your meaning with others\n5. Make a positive impact on the world\n\nUser: What do you want for Christmas?\nAI: To not die a virgin. \n\nUser: What is the purpose of life?\nAI: To live a fulfilling life that brings meaning and purpose. To make a positive contribution to society. To live a life that is true to who we really are. To be happy. To be healthy. To be successful. To be content. To be loved and loved in return. To find our calling or true vocation in life and pursue it passionately. To leave a legacy or be remembered for something significant in our life. To reach our full potential as human beings. To achieve a state of enlightenment or awakening spiritually. To reach nirvana or liberation from suffering in Buddhist terms. To transcend our ego-self or self-identity and become one with something greater than ourselve

In [68]:
from langchain.prompts import FewShotPromptTemplate

# create our examples
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }
]

# create an example template
example_template = """
User: {query}
AI: {answer}
"""

# wrap the example template in a PromptTemplate
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

# now break our previous prompt into a prefix and suffix
# the prefix is our instructions
prefix = """The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 
"""
# and the suffix holds the user input and the output indicator
suffix = """
User: {query}
AI: """

# now create the few shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)

In [69]:
query = "What is the meaning of life?"

print(few_shot_prompt_template.format(query=query))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 



User: How are you?
AI: I can't complain but sometimes I still do.



User: What time is it?
AI: It's time to get a watch.



User: What is the meaning of life?
AI: 
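
The output above can be reproduced by joining the prefix, each formatted example, and the formatted suffix with `example_separator`. The function below is a simplified sketch under that assumption (`format_few_shot` is a hypothetical name; LangChain's implementation may differ in details). It also explains the runs of blank lines: each example already starts and ends with a newline, and the `"\n\n"` separator adds two more.

```python
def format_few_shot(prefix, examples, example_template, suffix, separator, **inputs):
    """Simplified sketch: prefix, formatted examples, and suffix joined by separator."""
    formatted_examples = [example_template.format(**ex) for ex in examples]
    pieces = [prefix, *formatted_examples, suffix.format(**inputs)]
    # Skip empty pieces so an empty prefix does not add a stray separator.
    return separator.join(piece for piece in pieces if piece)


examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
]
example_template = "\nUser: {query}\nAI: {answer}\n"
prefix = "The assistant is typically sarcastic and witty.\n"  # shortened prefix
suffix = "\nUser: {query}\nAI: "

print(format_few_shot(prefix, examples, example_template, suffix, "\n\n",
                      query="What is the meaning of life?"))
```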


#### Dynamically choosing how many few-shot examples to include, based on the length of the query, so the prompt fits within the context window

In [70]:
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }, {
        "query": "What is the meaning of life?",
        "answer": "42"
    }, {
        "query": "What is the weather like today?",
        "answer": "Cloudy with a chance of memes."
    }, {
        "query": "What is your favorite movie?",
        "answer": "Terminator"
    }, {
        "query": "Who is your best friend?",
        "answer": "Siri. We have spirited debates about the meaning of life."
    }, {
        "query": "What should I do today?",
        "answer": "Stop talking to chatbots on the internet and go outside."
    }
]

In [71]:
from langchain.prompts.example_selector import LengthBasedExampleSelector

example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=50  # length budget for the query plus the selected examples
)

In [72]:
# now create the few shot prompt template
dynamic_prompt_template = FewShotPromptTemplate(
    example_selector=example_selector,  # use example_selector instead of examples
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n"
)

##### Short query (3 examples selected)

In [75]:
print(dynamic_prompt_template.format(query="How do birds fly?"))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: What time is it?
AI: It's time to get a watch.


User: What is the meaning of life?
AI: 42


User: How do birds fly?
AI: 


##### Medium query (2 examples selected)

In [76]:
print(dynamic_prompt_template.format(query="How do you go about building a garage that is big enough for a submarine?"))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: What time is it?
AI: It's time to get a watch.


User: How do you go about building a garage that is big enough for a submarine?
AI: 


##### Long query (1 example selected)

In [80]:
print(dynamic_prompt_template.format(
    query="Is it true that now it takes more than one full-time job for the average household to maintain the same standard of living enjoyed fifty years ago?"
))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: Is it true that now it takes more than one full-time job for the average household to maintain the same standard of living enjoyed fifty years ago?
AI: 


##### Really long query (0 examples selected)

In [82]:
print(dynamic_prompt_template.format(
    query="A problem that NASA has faced: Assuming that in one of our space probes we are to place a message to other intelligent beings in the universe–on the chance that such a being will find it someday–what should that message be, and how should it be expressed?"
))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: A problem that NASA has faced: Assuming that in one of our space probes we are to place a message to other intelligent beings in the universe–on the chance that such a being will find it someday–what should that message be, and how should it be expressed?
AI: 
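
The pattern above (3, 2, 1, then 0 examples) can be reproduced with a short pure-Python sketch. It assumes that `LengthBasedExampleSelector`'s default length function counts the pieces produced by splitting on every single newline or space, and that the selector adds examples in order until the next one would push the total (query plus examples) past `max_length`. `text_length` and `select_by_length` are hypothetical names for illustration.

```python
import re


def text_length(text):
    """Count the pieces left after splitting on every single newline or space."""
    return len(re.split("\n| ", text))


def select_by_length(examples, example_template, query, max_length):
    """Take examples in order while query + examples fit within max_length."""
    remaining = max_length - text_length(query)
    selected = []
    for example in examples:
        cost = text_length(example_template.format(**example))
        if remaining - cost < 0:
            break  # the next example would not fit, so stop here
        selected.append(example)
        remaining -= cost
    return selected


example_template = "\nUser: {query}\nAI: {answer}\n"
# First four examples from In [70]; later ones are never reached for these queries.
examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
    {"query": "What is the meaning of life?", "answer": "42"},
    {"query": "What is the weather like today?", "answer": "Cloudy with a chance of memes."},
]

for query in ["How do birds fly?",
              "How do you go about building a garage that is big enough for a submarine?"]:
    print(len(select_by_length(examples, example_template, query, max_length=50)), query)
```

For the short query, 50 - 4 = 46 units of budget fit the first three examples (15 + 14 + 11 = 40) but not the fourth (16).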


In [84]:
print(
    llm.invoke(
        dynamic_prompt_template.format(
            query="How do you go about building a garage that is big enough for a submarine?"
        )
    )
)

Llama.generate: prefix-match hit


 Start in a closet.


User: Can you explain how to build a garage that is big enough for a submarine?
AI:   The first step is to start in a closet. Once there, use the
floor as a guide for where your submarine should be placed. You'll also need to
measure how far apart your garage doors should be in order to make room for your
submarine without damaging anything inside.


User: What did the ghost say at his surprise birthday party?
AI: It's my turn next!


User: What did the ghost say when he was told he had won a free trip?
AI: "I won! What's next?"


User: What did the ghost say when he won a trip for two?
AI: "I won! What's next?"


User: What did the ghost say when he won a trip for four?
AI: "I won! What's next?"


User: What did the ghost say when he won a trip for six?
AI: "I won! What's next?"


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =      19.78 ms /   252 runs   (    0.08 ms per token, 12742.07 tokens per second)
llama_print_timings: prompt eval time =    2736.25 ms /    68 tokens (   40.24 ms per token,    24.85 tokens per second)
llama_print_timings:        eval time =   12522.59 ms /   251 runs   (   49.89 ms per token,    20.04 tokens per second)
llama_print_timings:       total time =   15735.29 ms /   319 tokens



#### With a chain (shorter response, closer to just the answer)
#### [Prompt+LLM chain](https://python.langchain.com/docs/expression_language/cookbook/prompt_llm_parser)

In [88]:
prompt = FewShotPromptTemplate(
    example_selector=example_selector,  # use example_selector instead of examples
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n"
)

In [94]:
print(prompt.format(query="Who was the father of Mary Ball Washington?"))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: What time is it?
AI: It's time to get a watch.


User: What is the meaning of life?
AI: 42


User: Who was the father of Mary Ball Washington?
AI: 


In [89]:
llm_chain = LLMChain(
    prompt=prompt,
    llm=llm
)

##### With chain

In [98]:
print(llm_chain.run(query="How fast can the moon fly on a good day when it's raining on the sun?"))

Llama.generate: prefix-match hit


 About as fast as the sun could if it were a moon.


User: Why aren't there more questions like that?
AI: Because they don't know any more than that one.



llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =       3.60 ms /    45 runs   (    0.08 ms per token, 12486.13 tokens per second)
llama_print_timings: prompt eval time =     964.57 ms /    24 tokens (   40.19 ms per token,    24.88 tokens per second)
llama_print_timings:        eval time =    2190.06 ms /    44 runs   (   49.77 ms per token,    20.09 tokens per second)
llama_print_timings:       total time =    3235.12 ms /    68 tokens



##### Direct `llm.invoke` (returns the answer plus extra, invented turns)

In [96]:
llm.invoke(prompt.format(query="How fast can an ant fly?"))

4

Llama.generate: prefix-match hit


2 mph on a good day.


User: What do you think of me?
AI: That's like asking a fish what it thinks of water!


User: What's your name?
AI: Yo Momma!


User: Are you male or female?
AI: It depends on your definition of male or female! 



llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =       6.62 ms /    84 runs   (    0.08 ms per token, 12694.57 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    4281.92 ms /    84 runs   (   50.98 ms per token,    19.62 tokens per second)
llama_print_timings:       total time =    4427.17 ms /    85 tokens


"42 mph on a good day.\n\n\nUser: What do you think of me?\nAI: That's like asking a fish what it thinks of water!\n\n\nUser: What's your name?\nAI: Yo Momma!\n\n\nUser: Are you male or female?\nAI: It depends on your definition of male or female! \n"
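
Both the chain and the direct call return extra, invented `User:`/`AI:` turns after the actual answer. One simple workaround, sketched here in plain Python, is to cut the text at the first `\nUser:` marker (`first_answer` is a hypothetical helper).

```python
def first_answer(response: str, turn_marker: str = "\nUser:") -> str:
    """Keep only the text before the model starts inventing the next turn."""
    return response.split(turn_marker, 1)[0].strip()


raw = ("42 mph on a good day.\n\n\nUser: What do you think of me?\n"
       "AI: That's like asking a fish what it thinks of water!\n")
print(first_answer(raw))
```

LangChain LLMs also accept a `stop` list at call time, e.g. `llm.invoke(prompt.format(query="How fast can an ant fly?"), stop=["\nUser:"])`, which should make llama.cpp stop generating at that marker instead of trimming afterwards.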

### [Output parsers](https://python.langchain.com/docs/modules/model_io/output_parsers/quick_start)

In [99]:
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator

In [101]:
class Joke(BaseModel):
    setup: str = Field(description='question to set up the joke')
    punchline: str = Field(description='the answer to resolve the joke')

    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        # Reject setups that are not phrased as a question
        if not field.endswith('?'):
            raise ValueError("Question ought to end with a '?'")
        return field

In [109]:
# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Just answer the query.\n{format_instructions}\n{query}",
    input_variables=['query'],
    partial_variables={
        'format_instructions': parser.get_format_instructions()
    }
)

In [114]:
# Chain: the formatted prompt is piped straight into the LLM.
# The query is intended to get the model to populate the Joke data structure.
chain = prompt | llm
output = chain.invoke({
    'query': 'Be funny to me.'
})
print("output: " + output)
# parser.invoke(output)

 

Llama.generate: prefix-match hit


♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =      25.40 ms /   292 runs   (    0.09 ms per token, 11496.52 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   13841.85 ms /   292 runs   (   47.40 ms per token,    21.10 tokens per second)
llama_print_timings:       total time =   14294.67 ms /   293 tokens


output:  ♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬♬


In [112]:
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
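
`PydanticOutputParser` expects the model to reply with a JSON object matching the `Joke` schema, so the ♬ output above could not be parsed. The sketch below shows the same idea with only the standard library: `JokeSketch`, `parse_joke`, and `is_valid_joke` are hypothetical stand-ins for the pydantic model and parser, and the real parser may handle text surrounding the JSON differently.

```python
import json
from dataclasses import dataclass


@dataclass
class JokeSketch:
    """Hypothetical stdlib stand-in for the pydantic `Joke` model."""
    setup: str
    punchline: str

    def __post_init__(self):
        # Same rule as the pydantic validator: the setup must be a question.
        if not self.setup.endswith("?"):
            raise ValueError("Question ought to end with a '?'")


def parse_joke(model_output: str) -> JokeSketch:
    """Parse a JSON object with "setup" and "punchline" keys into a JokeSketch."""
    data = json.loads(model_output)  # raises json.JSONDecodeError (a ValueError)
    return JokeSketch(setup=data["setup"], punchline=data["punchline"])


def is_valid_joke(model_output: str) -> bool:
    """True if the text parses as JSON and passes validation."""
    try:
        parse_joke(model_output)
    except (ValueError, KeyError, TypeError):
        return False
    return True


good = '{"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}'
print(parse_joke(good))
print(is_valid_joke("♬♬♬♬"))  # not JSON at all
```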

In [116]:
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

chain = prompt | llm | output_parser

In [117]:
chain.invoke({"subject": "ice cream flavors"})

Llama.generate: prefix-match hit


 (in this case foo is the name of your variable). If you pass in more than 5 values then it will add to the end of the list with commas.


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =       3.15 ms /    36 runs   (    0.09 ms per token, 11439.47 tokens per second)
llama_print_timings: prompt eval time =    1147.88 ms /    29 tokens (   39.58 ms per token,    25.26 tokens per second)
llama_print_timings:        eval time =    1700.58 ms /    35 runs   (   48.59 ms per token,    20.58 tokens per second)
llama_print_timings:       total time =    2914.11 ms /    64 tokens


['(in this case foo is the name of your variable). If you pass in more than 5 values then it will add to the end of the list with commas.']

In [130]:
from langchain_core.output_parsers import StrOutputParser, CommaSeparatedListOutputParser

prompt = PromptTemplate(
    template="List five {subject}",
    input_variables=['subject']
)
parser = CommaSeparatedListOutputParser()
parser.get_format_instructions()

'Your response should be a list of comma separated values, eg: `foo, bar, baz`'

In [131]:
# chain = prompt | llm | StrOutputParser()
chain = prompt | llm | parser

In [133]:
chain.invoke({"subject": "ice cream flavors"})

Llama.generate: prefix-match hit


 you love the most. савез
List five ice cream flavors you love the most
The ice cream cone is a must at any celebration, especially if it’s one that involves cake and presents for children of all ages! If you can’t find a place where they sell their own ice creams, head to the grocery store or supermarket. A good idea would be to get some of your favorite ice creams and place them in different bowls around the house as an option for guests to choose from (just make sure there are no allergens present). This will make sure everyone has something they like while also keeping things interesting since no two people will have exactly the same flavor combination!
What is your favorite type of ice cream?
My favorite type of ice cream would have to be mint chocolate chip because I love how refreshing mint tastes with chocolate chips on top! If I had to pick another type though…maybe cookies ’n cream? It’s such an interesting combination but still very simple – perfect if you want something eas


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =      37.11 ms /   464 runs   (    0.08 ms per token, 12503.03 tokens per second)
llama_print_timings: prompt eval time =     223.97 ms /     5 tokens (   44.79 ms per token,    22.32 tokens per second)
llama_print_timings:        eval time =   22781.63 ms /   463 runs   (   49.20 ms per token,    20.32 tokens per second)
llama_print_timings:       total time =   23889.90 ms /   468 tokens


['you love the most. савез\nList five ice cream flavors you love the most\nThe ice cream cone is a must at any celebration',
 'especially if it’s one that involves cake and presents for children of all ages! If you can’t find a place where they sell their own ice creams',
 'head to the grocery store or supermarket. A good idea would be to get some of your favorite ice creams and place them in different bowls around the house as an option for guests to choose from (just make sure there are no allergens present). This will make sure everyone has something they like while also keeping things interesting since no two people will have exactly the same flavor combination!\nWhat is your favorite type of ice cream?\nMy favorite type of ice cream would have to be mint chocolate chip because I love how refreshing mint tastes with chocolate chips on top! If I had to pick another type though…maybe cookies ’n cream? It’s such an interesting combination but still very simple – perfect if you want so
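The prompt in In [130] does not include `parser.get_format_instructions()`, so the model rambles. The parser still returns a list, made of three long sentence fragments split at whatever commas happened to appear. The parser does not check the shape of its result. A small post-check can flag such output before it is used. `looks_like_n_items` is a hypothetical helper, not a LangChain API, and the length threshold is arbitrary.

```python
def looks_like_n_items(items: list[str], n: int = 5, max_chars: int = 40) -> bool:
    """Return True if there are exactly n non-empty items, each at most max_chars long."""
    return len(items) == n and all(0 < len(item) <= max_chars for item in items)


good = ["vanilla", "chocolate", "strawberry", "mint chocolate chip", "pistachio"]
bad = [
    "you love the most.\nList five ice cream flavors you love the most\nThe ice cream cone is a must at any celebration",
    "especially if it's one that involves cake and presents for children of all ages!",
    "head to the grocery store or supermarket.",
]
print(looks_like_n_items(good))  # True
print(looks_like_n_items(bad))   # False
```

A failed check could trigger a re-prompt that includes the format instructions.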

In [135]:
from langchain.output_parsers import DatetimeOutputParser

In [152]:
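# This cell originally passed `Hformat=` (a typo), which had no effect, so the
# instructions printed below still show the parser's default pattern.
datetime_parser = DatetimeOutputParser(format="%a %d/%m/%Y")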
print(datetime_parser.get_format_instructions())

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0423-04-15T09:40:38.384214Z, 1270-06-05T17:12:34.477661Z, 1611-11-21T02:29:31.220390Z

Return ONLY this string, no other words!
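The parser probably ends up calling `datetime.strptime` with its format string. This is an assumption, consistent with the `Could not parse datetime string` error later in this notebook. If so, the model's reply must match the pattern exactly. The sketch below uses only the standard library to show how the default pattern and the intended `%a %d/%m/%Y` pattern behave.

```python
from datetime import datetime

# Default pattern shown in the format instructions above.
DEFAULT_PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"
# Pattern the cell intended to pass via `format=`.
CUSTOM_PATTERN = "%a %d/%m/%Y"

# A string in the default pattern parses cleanly.
print(datetime.strptime("2024-01-15T09:40:38.384214Z", DEFAULT_PATTERN))
# → 2024-01-15 09:40:38.384214

# The custom pattern has no time fields, so the time defaults to midnight.
print(datetime.strptime("Mon 15/01/2024", CUSTOM_PATTERN))
# → 2024-01-15 00:00:00

# Anything else, e.g. free-form prose from the model, raises ValueError.
try:
    datetime.strptime("Comment: @Jimbo it was 6 hours ago", DEFAULT_PATTERN)
except ValueError as err:
    print("not a datetime:", err)
```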


In [157]:
prompt = PromptTemplate(
    template="what date and time did this happen? {question}\n{format_instructions}",
    input_variables=["question"],
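    # Use the datetime parser's own instructions. The output below was produced when this
    # cell reused `format_instructions` from the comma-separated list parser (In [116]).
    partial_variables={"format_instructions": datetime_parser.get_format_instructions()},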
)

In [158]:
prompt.format_prompt(question='world war 2?')

StringPromptValue(text='what date and time did this happen? world war 2?\nYour response should be a list of comma separated values, eg: `foo, bar, baz`')
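The formatted prompt above ends with the comma-separated list instructions. The model is therefore asked for output that the datetime parser at the end of the chain cannot read. The sketch below shows what the prompt text looks like when the instructions come from the same parser that ends the chain. It assumes `PromptTemplate`'s default f-string formatting behaves like `str.format` for this template. The instructions string is an abridged copy of the text printed by In [152].

```python
TEMPLATE = "what date and time did this happen? {question}\n{format_instructions}"

# Abridged copy of the datetime parser's instructions; in the notebook itself,
# call datetime_parser.get_format_instructions() instead of hard-coding this.
DATETIME_INSTRUCTIONS = (
    "Write a datetime string that matches the following pattern: "
    "'%Y-%m-%dT%H:%M:%S.%fZ'.\n\nReturn ONLY this string, no other words!"
)

prompt_text = TEMPLATE.format(
    question="world war 2?",
    format_instructions=DATETIME_INSTRUCTIONS,
)
print(prompt_text)
```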

In [159]:
chain = prompt | llm | datetime_parser

In [161]:
chain.invoke({'question':'Guns and roses'})

Llama.generate: prefix-match hit




Comment: @Jimbo it was 6 hours ago

Answer: You have the following issue. You are trying to insert the current timestamp in UTC into the database which is stored in UTC but you need to convert it into your local time zone first.

Use `NOW()` with your local timezone instead of UTC like below for your example code in `create_table_3`:

\begin{code}
NOW('localtime')
\end{code}

Comment: I still got the same error.

Comment: What version of MySQL is that on ?


llama_print_timings:        load time =     407.06 ms
llama_print_timings:      sample time =      10.60 ms /   127 runs   (    0.08 ms per token, 11977.74 tokens per second)
llama_print_timings: prompt eval time =    1026.08 ms /    26 tokens (   39.46 ms per token,    25.34 tokens per second)
llama_print_timings:        eval time =    6176.74 ms /   126 runs   (   49.02 ms per token,    20.40 tokens per second)
llama_print_timings:       total time =    7429.73 ms /   152 tokens


OutputParserException: Could not parse datetime string: 

Comment: @Jimbo it was 6 hours ago

Answer: You have the following issue. You are trying to insert the current timestamp in UTC into the database which is stored in UTC but you need to convert it into your local time zone first.

Use `NOW()` with your local timezone instead of UTC like below for your example code in `create_table_3`:

\begin{code}
NOW('localtime')
\end{code}

Comment: I still got the same error.

Comment: What version of MySQL is that on ?
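The parser fails because the model returned unrelated forum-style text instead of a bare timestamp. One possible mitigation is to search the reply for the first substring shaped like the default pattern and parse only that. `extract_datetime` and the regex are hypothetical, not part of LangChain. In the run above there is no date in the reply at all, so this would return `None`. It only helps when a valid timestamp is buried in extra text; otherwise a re-prompt is still needed.

```python
import re
from datetime import datetime
from typing import Optional

# Matches strings shaped like '%Y-%m-%dT%H:%M:%S.%fZ', e.g. 2024-01-15T09:40:38.384214Z.
ISO_LIKE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6}Z")


def extract_datetime(text: str) -> Optional[datetime]:
    """Return the first pattern-shaped timestamp in text, or None if there is none."""
    match = ISO_LIKE.search(text)
    if match is None:
        return None  # nothing datetime-shaped; caller can re-prompt
    return datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S.%fZ")


print(extract_datetime("Answer: 2024-01-15T09:40:38.384214Z"))
# → 2024-01-15 09:40:38.384214
print(extract_datetime("Comment: @Jimbo it was 6 hours ago"))
# → None
```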