# Introduction to Automation with LangChain, Generative AI, and Python
**2.1: Foundation Models**
* Instructor: [Jeff Heaton](https://youtube.com/@HeatonResearch), WUSTL Center for Analytics and Business Insight (CABI), [Washington University in St. Louis](https://olin.wustl.edu/faculty-and-research/research-centers/center-for-analytics-and-business-insight/index.php)
* For more information visit the [class website](https://github.com/jeffheaton/cabi_genai_automation).

A foundation model is a base model that has been pre-trained on a broad range of data and can be adapted or fine-tuned for specific tasks or applications. For large language models (LLMs), these models are called "foundation" models because they provide a foundational layer of knowledge and capabilities upon which specialized functionality can be built.

Several prominent technology companies and research organizations provide large language models. Notable among them are OpenAI, with models like GPT (Generative Pre-trained Transformer); Google, with BERT (Bidirectional Encoder Representations from Transformers) and other variants; and Meta (formerly Facebook), which offers models such as RoBERTa (Robustly Optimized BERT Pretraining Approach).

Training a large language model from scratch involves significant computational resources and expertise. It requires extensive data collection, cleaning, and processing, along with access to high-powered computing infrastructure capable of handling immense datasets. The financial cost of training such models can run into millions of dollars, making it prohibitive for most individuals and even many organizations. Given these requirements, developing a large language model from scratch is generally beyond the scope of an academic course. Instead, courses may focus on teaching how to use and fine-tune existing models to solve specific problems or conduct research.

## How to Evaluate a Foundation Model

Evaluating foundation models is crucial to understand their capabilities, limitations, and suitability for specific tasks. Accurate evaluation ensures that the models are safe, reliable, and effective in real-world applications. It also helps in identifying potential biases and errors that could impact their performance.

### Open vs. Closed Weights

* **Open weights** refer to models where the trained parameters are publicly accessible. This transparency allows researchers and developers to understand the model's workings, replicate studies, and customize or improve the model further.
* **Closed weights** are proprietary models with restricted access to their parameters. These are typically offered by companies as part of commercial products or services, where revealing the weights might compromise business interests or user privacy.

### Number of Parameter Weights

The number of parameter weights in a model indicates its capacity or complexity. Parameter weights are typically expressed in millions (M), billions (B), or trillions (T). The higher the number of parameters, the greater the model’s theoretical ability to capture complex patterns and nuances in data. However, this is not a strict indicator of performance across all tasks but rather a general measure of potential.
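As a rough illustration of what these counts mean in practice, the sketch below converts the M/B/T shorthand into a number and estimates how much storage the weights alone would need. The helper names are hypothetical, and the figure of 2 bytes per parameter assumes 16-bit floating-point weights; actual deployments may use other precisions and need additional memory beyond the weights.

```python
# Multipliers for the M/B/T suffixes used when quoting parameter counts.
SUFFIXES = {"M": 10**6, "B": 10**9, "T": 10**12}


def parse_param_count(text: str) -> int:
    """Convert a string such as '7B' or '350M' into an integer count."""
    text = text.strip().upper()
    return int(float(text[:-1]) * SUFFIXES[text[-1]])


def weight_memory_gb(param_count: int, bytes_per_param: int = 2) -> float:
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes).

    Assumes 2 bytes per parameter (16-bit floats) by default; memory for
    activations and serving overhead is not included.
    """
    return param_count * bytes_per_param / 10**9


for name in ["7B", "70B"]:
    count = parse_param_count(name)
    print(f"{name}: {count:,} parameters, about {weight_memory_gb(count):.0f} GB of weights")
```

Under these assumptions a 70B model needs roughly ten times the weight storage of a 7B model, which is one reason larger models cost more to run.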

### Pros and Cons of More Weights

**Pros:**
* **Increased Learning Capacity:** Models with more parameters can learn more detailed and nuanced representations of data, potentially improving their accuracy and effectiveness across diverse tasks.
* **Better Generalization:** Larger models often generalize better to new, unseen datasets, provided they are trained appropriately.

**Cons:**
* **Computational Cost:** More parameters mean higher computational demands for training and inference, requiring more powerful hardware and longer processing times.
* **Risk of Overfitting:** Without proper regularization and training techniques, larger models might overfit the training data, performing well on seen data but poorly on new or varied datasets.
* **Environmental Impact:** Training larger models consumes more energy, contributing to larger carbon footprints.

### Importance of Context Window Size
The context window size of a model refers to the maximum number of tokens (words or pieces of words) the model can consider at one time when making predictions or generating text. This is crucial because:

* **Longer Context:** A larger window size allows the model to consider more information, which can lead to more coherent and contextually appropriate outputs. It's particularly important for tasks involving long documents or complex dependencies between parts of the text.
* **Handling Dependencies:** The ability to handle long-range dependencies within the text can dramatically improve performance in tasks like summarization, question answering, and conversation.

Understanding these aspects helps in making informed decisions about which model to use for a specific application and how to optimize its performance.

### Understanding Tokens

In the context of LLMs such as GPT, tokens are the basic units of text that the model processes. A token can be a word, part of a word, or even punctuation. Exactly what constitutes a token depends on the tokenizer used when the model was trained. For example, the word "smiling" might be a single token, or it could be split into smaller sub-word units like "smil" and "ing", depending on the tokenizer's vocabulary and rules.
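To make the idea of sub-word splitting concrete, here is a toy greedy longest-match tokenizer. It is a simplified stand-in, not the algorithm any particular model uses, and both vocabularies are hypothetical ones chosen only to show how the same word can yield one token or several.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word by repeatedly taking the longest matching vocabulary entry."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink until something matches.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            # A single character is always accepted as a last-resort fallback.
            if piece in vocab or end == start + 1:
                tokens.append(piece)
                start = end
                break
    return tokens


# Hypothetical vocabularies, chosen only to illustrate the two outcomes.
small_vocab = {"smil", "ing", "walk", "ed"}
large_vocab = small_vocab | {"smiling"}

print(greedy_tokenize("smiling", small_vocab))  # two sub-word tokens
print(greedy_tokenize("smiling", large_vocab))  # one whole-word token
```

The text is identical in both calls; only the vocabulary differs, which is why token counts for the same prompt can vary from one model to another.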

The cost of using an LLM to generate text is often calculated from the number of tokens processed. This includes both the tokens that make up the input and the tokens generated as output. Because the computational resources required to process each token are significant, especially in models with billions of parameters, the number of tokens directly influences the computational cost. Managing token usage efficiently is therefore crucial for controlling expenses when using LLMs.
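The arithmetic behind token-based billing can be sketched as follows. The function name and the per-1,000-token prices are placeholders for illustration only, not real rates; providers often price input and output tokens differently, so the sketch keeps them separate.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the cost of one request when billing is per 1,000 tokens."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Placeholder prices in dollars per 1,000 tokens (hypothetical, not real rates).
INPUT_PRICE = 0.001
OUTPUT_PRICE = 0.002

cost = estimate_cost(input_tokens=2000, output_tokens=500,
                     input_price_per_1k=INPUT_PRICE,
                     output_price_per_1k=OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.4f}")
```

Substitute the current prices from your provider's pricing page before relying on an estimate like this.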

The context window size of an LLM refers to the maximum number of tokens from the input that the model can consider at one time when making predictions. For example, if a model has a context window of 1,024 tokens, it can only consider the most recent 1,024 tokens of a given input. This limitation affects how much information the model can use at once. If the input exceeds this size, the model may not be able to refer back to earlier parts of the text, potentially reducing the coherence and relevance of its responses.
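A minimal sketch of that limit, assuming the simplest possible strategy of keeping only the most recent tokens: the function name is hypothetical, and splitting on whitespace stands in for a real tokenizer.

```python
def fit_to_window(tokens: list[str], window_size: int) -> list[str]:
    """Keep only the most recent `window_size` tokens, dropping the oldest."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]


text = "one two three four five six seven eight"
tokens = text.split()  # whitespace split stands in for a real tokenizer

print(fit_to_window(tokens, 3))
print(fit_to_window(tokens, 100))  # shorter inputs pass through unchanged
```

Anything dropped this way is invisible to the model, which is why earlier parts of a long conversation can be forgotten.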

The concept of converting tokens into "pages" is a useful metaphor for understanding how much content a model can handle or generate. While there's no universal standard for how many tokens equate to a "page" of text, a rough approximation is often used based on typical word counts per page in standard documents. For instance, if one page of text typically contains about 500 words, and assuming an average of 1.5 tokens per word, a page would roughly equate to 750 tokens.

This approximation can help users gauge how much text they can input or expect in output in more familiar terms, like pages, which can be particularly helpful in educational, professional, or literary contexts where traditional page counts are more commonly used as a measure of content volume.
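The conversion can be written out directly. This sketch uses the same assumptions as the text (500 words per page and 1.5 tokens per word); the constant and function names are illustrative, and a "K" of tokens is taken to mean 1,024.

```python
WORDS_PER_PAGE = 500   # assumed typical page length, from the text above
TOKENS_PER_WORD = 1.5  # assumed average, from the text above


def tokens_to_pages(tokens: int) -> float:
    """Convert a token count into an approximate number of pages."""
    return tokens / (WORDS_PER_PAGE * TOKENS_PER_WORD)


for label, tokens in [("4K", 4 * 1024), ("8K", 8 * 1024), ("128K", 128 * 1024)]:
    print(f"{label}: about {tokens_to_pages(tokens):.0f} pages")
```

With these assumptions, an 8K context window works out to roughly 11 pages. Changing either constant shifts every estimate proportionally.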

## Common Large Language Models

The following table provides key statistics for many popular LLMs. In this course, we will focus primarily on models available through Amazon Bedrock, which appear in the Anthropic, Meta, Mistral, and AWS rows of this table.

Name|Creator|Weights|Parameters|Context Window (tokens)
-|-|-|-|-
GPT-4|OpenAI|Closed|175B|8K (11 pages)
GPT-4 Turbo|OpenAI|Closed|21B|128K (175 pages)
GPT-3.5 Turbo|OpenAI|Closed|20B|4K (5 pages)
Claude 2|Anthropic|Closed|130B|100K (156 pages)
Claude 3|Anthropic|Closed|?|200K (273 pages)
LLaMA2|Meta|Open|70B|4K (6 pages)
Bard|Google|Closed|137B|4K (6 pages)
Mistral 7B|Mistral AI|Open|7B|32K (50 pages)
Titan|AWS|Closed|45B (reported)|8K/4K (12/6 pages)

In this course, we will make use of the Bedrock models, summarized here:

* **meta.llama2-70b-chat-v1** - An advanced 70B-parameter model.
* **amazon.titan-text-lite-v1** - An inexpensive model often used for fine-tuning.
* **anthropic.claude-3-sonnet-20240229-v1:0** - Perhaps the most powerful model on Bedrock.
* **mistral.mistral-7b-instruct-v0:2** - A powerful but small and inexpensive model.

These models vary in cost. I will provide guidelines on the most cost-effective model for each assignment. The following links summarize model specifications, costs, and capabilities.

* [AWS LLM Model Specs](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/models)
* [AWS Model Cost](https://aws.amazon.com/bedrock/pricing/)
* [LangChain Model Capabilities](https://python.langchain.com/docs/integrations/chat/)

## Understanding Temperature

The temperature setting, exposed by OpenAI's API and by most other LLM providers, controls the randomness or creativity in the model's responses. A lower temperature (e.g., 0.0) results in more deterministic and predictable outputs, where the model strongly favors the most likely responses. A higher temperature (e.g., 1.0) increases randomness, leading to more varied and sometimes more creative outputs. The allowed range varies by provider and model, but values between 0 and 1 are typical. Adjusting the temperature lets you balance coherence against creativity to suit a specific task.
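One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before converting them to probabilities with a softmax. The sketch below illustrates that idea with made-up scores for three candidate tokens. It is an assumption about the general technique, not a description of how any specific provider implements the setting, and it treats a temperature of 0 as always picking the top-scoring token.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat zero as greedy decoding: all probability on the best score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in [0.0, 0.25, 1.0]:
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As the temperature falls, more of the probability concentrates on the highest-scoring token, which is why low settings produce more predictable output.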

The following code allows you to try out system prompts, temperatures, models, and requests.




In [1]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory
from langchain_aws import ChatBedrock
from langchain_core.prompts import PromptTemplate
from IPython.display import display_markdown

#MODEL = 'amazon.titan-text-lite-v1'
MODEL = 'meta.llama3-70b-instruct-v1:0'
#MODEL = 'mistral.mistral-7b-instruct-v0:2'
TEMPERATURE = 0.25
TEMPLATE = """The following is a friendly conversation between a human and an
AI. Format answers in markdown, and provide truthful and accurate answers.

Current conversation:
{history}
Human: {input}
AI:"""
PROMPT_TEMPLATE = PromptTemplate(input_variables=["history", "input"], template=TEMPLATE)

def start_conversation():
    # Initialize Bedrock using the built-in role, passing the TEMPERATURE
    # constant defined above rather than a hard-coded value
    llm = ChatBedrock(
        model_id=MODEL,
        model_kwargs={"temperature": TEMPERATURE},
    )

    # Initialize memory and conversation
    memory = ConversationBufferWindowMemory()
    conversation = ConversationChain(
        prompt=PROMPT_TEMPLATE,
        llm=llm,
        memory=memory,
        verbose=False
    )

    return conversation

def query_llm(conversation, prompt):
    print("Model response:")
    output = conversation.invoke(prompt)
    output = output['response'].replace("$", "\\$")
    display_markdown(output, raw=True)


In [2]:
conversation = start_conversation()
query_llm(conversation, """
Produce a table of the top 5 most populous countries with population and area.
""")

Model response:


ValueError: Error raised by bedrock service: An error occurred (AccessDeniedException) when calling the InvokeModel operation: You don't have access to the model with the specified model ID.