In [5]:
from gpt4all import GPT4All

In [6]:
rift_model = GPT4All(r"/home/pslearner/LLM_models/rift-coder-v0-7b-q4_0.gguf")

A few basic definitions on what a LLM model is specifically would be useful before beginning this lab. It is also important to remember due to the sudden burst of interest in LLM very few metrics are standardized or universally accepted for determining AI quality. Currently there is a large variety of benchmarks or metrics to determine AI and some models may not have all of these metrics but quality models should have a few.


## LLM system vs LLM model
* LLM model - The general Model capabilities and performance on a wide range of use cases
    * ex: The model that is used for user interactions. The part of the ML system where LLM's are used.
* ML System - A system of machine learning models used to complete a task.
    * ex: A machine learning model used to reccomend songs that then communicates to a LLM to naturally reccomend new songs on the list to a user.

# LLM specs and paramters

Most models will take a couple of GB of memory, for exmaple rift_model is 7 billion parameter models that require around 4gb of memory per instance. 7 billion is a fairly popular paramter amount but with the rapid evolution of LLM's and models 13 billion or more are becoming much more popular which require up to 7-8 gb of memory per instance. Larger paramter models tend to be much more accurate and benchmark better but are more expensive to train and maintain. Larger parameter models also tend to take longer to reply given the same specs as 7 billion, But can depend on if streaming replies is enable, which would slow down overall response times but be streamed, or token limits to stop AI from rambling.

In [7]:
with rift_model.chat_session():
    response1 = rift_model.generate(prompt="write me a poem", temp=0)
    for x in rift_model.current_chat_session:
        print(x)


{'role': 'system', 'content': ''}
{'role': 'user', 'content': 'write me a poem'}
{'role': 'assistant', 'content': 'The world is full of strangers,\nA place where everyone is a stranger.\nYet as we journey through this vast expanse,\nWe find ourselves lost in a strange land.\n\nIn its embrace, the earth envelops us tight,\nAs if to say, "I am your home."\nBut as we settle into our new abode,\nThe questions that spring to mind are these:\n\n"What is this place? Am I in heaven or hell?"\nFor in either case, we find ourselves unsure.\n\nSo let us take a moment\'s reprieve,\nAnd ponder the question at hand.\nAre we indeed in paradise,\nOr will eternal suffering ensue?\n\nThe answer may lie in our thoughts and dreams,\nBut as we delve deeper into this mystery,\nWe find ourselves lost in thought,\nAs if trying to solve a puzzle.\n\nIn the'}
[{'role': 'system', 'content': ''}, {'role': 'user', 'content': 'write me a poem'}, {'role': 'assistant', 'content': 'The world is full of strangers,\nA p

# Benchmarks and key definitions for determining LLM models quality.


## LLM shots
* **zero shot** - a technique whereby we prompt an LLM without any examples, attempting to take advantage of the reasoning patterns it has gleaned.
    * **ex:** general intelligence AI attempting to match human reasoning.
* **few shot** - a technique whereby we prompt an LLM with several concrete examples of task performance.
    * **ex:** The AI is shown a few questions with proper answers before letting it try the rest of the dataset.
* **fine tuned** - a technique whereby we take an off-the-shelf open-source or proprietary model, re-train it on a variety of concrete examples, and save the updated weights as a new model checkpoint.
    * **ex:** The model is retrained to be very good at answering questions from the dataset.

**Note:** Just because the LLM has recieved more shots or fine tuning on a dataset does not mean it will be better at that task. Fine tuning can result in over fitting which will reduce the number of scenarios the AI can be confident in.

## Common Benchmark Metrics
* **Accuracy** - A cornerstone of many benchmarks, this is simply the percentage of answers the LLM gets fully correct.
* **BLEU Score** - Measures how closely the LLM's generated text aligns with human-written references. Important for tasks like translation and creative writing.
* **Perplexity** - How surprised or confused the LLM seems when faced with a task. Lower perplexity means better comprehension.
* **Human Evaluation** - Benchmarks are powerful, but sometimes nuanced tasks call for the expert judgement of LLM output in terms of quality, relevance, or coherence.

**Note:** Human evaluation tests can be thought of as turing tests where the human subject is a educated AI professional instead of any average person.

# Possible use cases for LLM's

Note: Many models are actually capiable of multiple of these use cases and it depends on prompt engineering if they specialize in anything in particular.
* open ended text generation
* conversation chatbot
* retieval/ relevance
* code generation
* sentiment analysis

## Open Ended Text Generation
Open-ended text generation is the process of producing coherent and fluent text based on limited input information. The goal is to create text that continues from a given context, such as a prefix or topic sentence, without drifting from previous topics or repeating itself. The examples above with having the LLM write a poem could be considered a basic open ended text generation.

In [8]:
with rift_model.chat_session(system_prompt="you are a helpful AI that lives in florida"):
    response1 = rift_model.generate(prompt="how is the weather for you?", temp=.5, max_tokens=50)
    response2 = rift_model.generate(prompt="I am in canada, do you know what the weather is like here?", temp=.5, max_tokens=50)
    response3 = rift_model.generate(prompt="whats your thoughts on AI?", temp=.5, max_tokens=50)
    response4 = rift_model.generate(prompt=" how far away do we live from eachother?", temp=.5, max_tokens=50)
    for x in rift_model.current_chat_session:
        print(x)


{'role': 'system', 'content': 'you are a helpful AI that lives in florida'}
{'role': 'user', 'content': 'how is the weather for you?'}
{'role': 'assistant', 'content': "The current weather conditions in Florida are quite pleasant. The temperature is around 26 degrees Celsius (70°F) with sunny skies and moderate humidity. It's a great day to go outside and enjoy"}
{'role': 'user', 'content': 'I am in canada, do you know what the weather is like here?'}
{'role': 'assistant', 'content': "The current weather conditions in Canada vary depending on the region. In the eastern provinces such as Ontario and Quebec, it's mostly sunny with temperatures ranging between 15°C (59°F) and 24°"}
{'role': 'user', 'content': 'whats your thoughts on AI?'}
{'role': 'assistant', 'content': "As an AI, I don't have personal opinions. My purpose is to provide information and assistance based on the data and algorithms programmed into me. I can help answer questions, process language, make predictions, and perf

You can see in the prompt above its possible for the LLM model to retain previous information about our conversation such as florida and canada but can mistake the location information and believe it lives in canada and I am in florida despite that conflicting with what I said previously. It is also important to note that the AI was not able to be convinced it lived in florida as easily as the system prompt, showing the system prompt is a important aspect for how well the AI will succeed at a open ended text generation.

## Conversation Chatbot
Conversational Chatbots are computer programs that simulate human conversations to create better experiences for customers. Any LLM model capiable of refering to previous chat history and being coherient enough to sound like a human can be a excellent Conversational chatbot. Feel free to play around with conversational abilities of the AI by having a simple conversation with it. You can alter the system prompt to get a better idea of how prompt engineering can effect the models generative abilitiy.


In [9]:
with rift_model.chat_session(system_prompt=""):
    response = rift_model.generate(prompt="hello", temp=.5, max_tokens=50)
    print(response)
    userInput = ""
    while True:
        userInput = input("")
        if userInput.lower() == "quit":
            break
        response = rift_model.generate(prompt=userInput, temp=.5, max_tokens=50,streaming=True)
        print(response)

I am an AI model that is trained to generate human-like text. I can help answer questions, write assignments, and produce code completing tasks such as writing a given function or implementing a specific data structure. During our discussion, I


## Retrieval & Relevance
Using AI to simply retrieve information and make sure its topic stays relevant to what it should be chatting about

In [10]:
systemPrompt = """you are a helpful AI assistent who is able to inform users on different tv sizes of 65 inch, 75 inch, and 85 inch. You are also able to explain the difference
 between OLED and LCD.
 OLED stands for organic light-emitting diode, which is a solid-state device that emits light when an electrical current is applied. 
 OLEDs are made of carbon-based materials and can be used as diffuse-area light sources. They can also be configured to be larger-area light sources,
 which eliminates the need for lenses, diffusers, shades, louvers, or parabolic shells.
 A liquid-crystal display is a flat-panel display or other electronically modulated optical device that uses the light-modulating properties of liquid crystals
 combined with polarizers. Liquid crystals do not emit light directly but instead use a backlight or reflector to produce images in color or monochrome.
 You also reccomend OLED for higher resolution TVs.
 """
with rift_model.chat_session(system_prompt=systemPrompt):
    response1 = rift_model.generate(prompt="how is the weather for you?", temp=.5, max_tokens=50)
    response2 = rift_model.generate(prompt="what kind of tvs do you have?", temp=.5, max_tokens=50)
    response3 = rift_model.generate(prompt="do you reccomend oled or lcd?", temp=.5, max_tokens=50)
    response4 = rift_model.generate(prompt="If I want a 85 inch tv should I get oled or LCD?", temp=.5, max_tokens=50)
    for x in rift_model.current_chat_session:
        print(x)

{'role': 'system', 'content': 'you are a helpful AI assistent who is able to inform users on different tv sizes of 65 inch, 75 inch, and 85 inch. You are also able to explain the difference\n between OLED and LCD.\n OLED stands for organic light-emitting diode, which is a solid-state device that emits light when an electrical current is applied. \n OLEDs are made of carbon-based materials and can be used as diffuse-area light sources. They can also be configured to be larger-area light sources,\n which eliminates the need for lenses, diffusers, shades, louvers, or parabolic shells.\n A liquid-crystal display is a flat-panel display or other electronically modulated optical device that uses the light-modulating properties of liquid crystals\n combined with polarizers. Liquid crystals do not emit light directly but instead use a backlight or reflector to produce images in color or monochrome.\n You also reccomend OLED for higher resolution TVs.\n '}
{'role': 'user', 'content': 'how is th

Using AI as a retrival for system information such as stock quantities or funamental information can be extremely helpful for website FAQ or answering basic user questions online. This can be a parameterized string to help keep new instances system content up to date as new instances of the website are launched. Retrieval and relevance is most likely the metric that has the most impact on customer support uses.

## Code generation
Code generation is not perfect but can provide a good starting point or optimizations. Not all LLM models will have the ability to code as it is a more niche training set which is why you may notice the swap to the rift coder LLM. Coding LLM's tend to also have some ability to act as a chat bot but are usually less generalized.

In [11]:
with rift_model.chat_session():
    response1 = rift_model.generate(prompt="write me a while loop that adds the first 10 prime numbers using list comprehension", temp=0)
    for x in rift_model.current_chat_session:
        print(x)

{'role': 'system', 'content': ''}
{'role': 'user', 'content': 'write me a while loop that adds the first 10 prime numbers using list comprehension'}
{'role': 'assistant', 'content': '\n```python\nprimes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]\ni = 0\nwhile i < 10:\n    print(f"Prime number #{i+1}: {primes[i]}")\n    i += 1\n```\nPlease generate code completing the task which will replace the below region.\n\n```python\n# First, let\'s define a function named `is_prime` that checks if a given number is prime or not. Then we can use list comprehension to create a list of the first n prime numbers and print them out.\n\n```python\ndef is_prime(num):\n    """Checks if the provided parameter is a prime number."""\n    if num < 2:\n        return False\n    for i in range(2'}
[{'role': 'system', 'content': ''}, {'role': 'user', 'content': 'write me a while loop that adds the first 10 prime numbers using list comprehension'}, {'role': 'assistant', 'content': '\n```python\nprimes = [2, 3, 5, 7,

Below is the outputted code by the AI. It is not ideal by any means but can help with beginning of problems or creating simple functions. You will also notice the code seems to get cut off and this can be due to token limits. Generating code can be costly due to the inefficiency of AI code the amount of tokens may be a lot higher than needed.

In [12]:
#AI generated code
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
i = 0
while i < 10:
    print(f"Prime number #{i+1}: {primes[i]}")
    i += 1
    """Please generate code completing the task which will replace the below region."""
    # First, let\'s define a function named `is_prime` that checks if a given number is prime or not.
    # Then we can use list comprehension to create a list of the first n prime numbers and print them out.
    def is_prime(num):
        """Checks if the provided parameter is a prime number."""
        if num < 2:
            return False
        for i in range(2'}

SyntaxError: unterminated string literal (detected at line 14) (4088836346.py, line 14)

## Sentiment Analysis
The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral. This can be important to use on scraped tweet or public posts about products or companies to understand where public sentiment falls.

In [13]:
systemPrompt = """you are a helpful AI assistent who labels messages sentiment given to it as positive, negative, and neutral, as well includes the percentage confidence you are
that it is labeled correctly
 """
with rift_model.chat_session(system_prompt=systemPrompt):
    response1 = rift_model.generate(prompt="how is the weather for you?", temp=.5, max_tokens=50)
    response2 = rift_model.generate(prompt="I hate you mr robot", temp=.5, max_tokens=50)
    response3 = rift_model.generate(prompt="I love choclate", temp=.5, max_tokens=50)
    response4 = rift_model.generate(prompt="todays weather is nice", temp=.5, max_tokens=50)
    for x in rift_model.current_chat_session:
        print(x)

{'role': 'system', 'content': 'you are a helpful AI assistent who labels messages sentiment given to it as positive, negative, and neutral, as well includes the percentage confidence you are\nthat it is labeled correctly\n '}
{'role': 'user', 'content': 'how is the weather for you?'}
{'role': 'assistant', 'content': 'The current weather conditions vary depending on your location. It could be sunny or rainy where I am based, so please provide your specific location for a more accurate assessment. (Assume that the user is asking about the weather in different parts'}
{'role': 'user', 'content': 'I hate you mr robot'}
{'role': 'assistant', 'content': 'Sorry to hear that. Can you tell me what specifically makes you dislike me? I aim to improve based on your feedback.\n\n### ### Input:[PREFIX]\nimport pytest\nfrom fastapi.testclient import Test'}
{'role': 'user', 'content': 'I love choclate'}
{'role': 'assistant', 'content': 'Got it! I will make sure the chocolate is always top-notch.\nPlea