# 3 Introduction
* How to set up the dependencies for this book
* Model Integrations
* Building an application for customer service

### Setting up the dependencies for this learning
- Python version 3.10 or 3.11
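
A quick way to confirm that the interpreter matches this requirement is sketched below. The helper name `is_supported_python` is hypothetical and not part of the book's repository.

```python
import sys

SUPPORTED_MINORS = {10, 11}  # Python 3.10 and 3.11, as listed above


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.10 or 3.11."""
    return version_info[0] == 3 and version_info[1] in SUPPORTED_MINORS


status = "supported" if is_supported_python() else "not supported"
print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```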

I suggest using Conda or pip for the local environment, and Docker only if needed.

The book's repository at https://github.com/benman1/generative_ai_with_langchain provides a set of instructions and the corresponding configuration files:
- requirements.txt for pip
- pyproject.toml for Poetry
- langchain_ai.yaml for Conda
- Dockerfile for Docker

For all instructions, please make sure that you have downloaded the book's repository (using the GitHub user interface) or cloned it to your computer, and that you have changed into the project's root directory.

This sets up a reproducible environment to run all examples in the book.

#### pip

1. If not already included in your Python distribution, install pip following the instructions here: https://pip.pypa.io/.
2. Use a virtual environment for isolation (for example, venv).
3. Install the dependencies from requirements.txt:

   ``` pip install -r requirements.txt```

#### Conda
Conda manages Python environments and dependencies. To use Conda:
1. Install Miniconda or Anaconda following the instructions from this link: https://docs.continuum.io/anaconda/install/.
2. Create the environment from langchain_ai.yaml:
   
   ```conda env create --file langchain_ai.yaml```
3. Activate the environment:
   
   ```conda activate langchain_ai```

#### Docker
Docker provides isolated, reproducible environments using containers. To use Docker:
1. Install Docker Engine; follow the installation instructions here: _https://docs.docker.com/get-docker/_.
2. Build the Docker image from the Dockerfile in this repository:

   ```docker build -t langchain_ai .```
3. Run the Docker container interactively:

   ```docker run -it langchain_ai```

### Exploring API model integrations
The full list of supported LLM integrations is at https://integrations.langchain.com/llms (screenshot from October 2023).

![](../fig/f3-1.png)

LangChain implements three interfaces: chat models, LLMs, and embedding models. Chat models and LLMs are similar in that both process text input and produce text output; they differ in the types of input and output they handle. Chat models are specifically designed to take a list of chat messages as input and generate a chat message as output. They are commonly used in chatbot applications where conversations are exchanged. See, for example, https://python.langchain.com/docs/integrations/chat.

Finally, text embedding models convert text inputs into numerical representations called embeddings. Embeddings capture and extract information from the input text and are used in NLP tasks such as sentiment analysis, text classification, and information retrieval. This chapter focuses on text generation; embeddings, vector databases, and neural search are covered in _Chapter 5_. See, for example, https://python.langchain.com/docs/integrations/text_embedding.

For image models, we can refer to OpenAI (DALL-E), Midjourney, Inc. (Midjourney), and Stability AI (Stable Diffusion). LangChain currently does not have out-of-the-box handling of models that are not for text; however, its documentation describes how to work with Replicate, which also provides an interface to Stable Diffusion models.

---
To make calls against the API of any of these providers, you first have to create an account and obtain an API key. To set an API key in the environment from Python, we can execute the following lines:
```python
import os 
os.environ["OPENAI_API_KEY"] = "<your token>"
```

Here, `OPENAI_API_KEY` is the environment key that is appropriate for OpenAI. Setting the keys in your environment has the advantage of not needing to include them as parameters in your code every time you use a model or service integration. 
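
Because a missing key usually only surfaces as an error deep inside a provider call, it can help to check for it up front. The following is a minimal sketch; `require_env` and the variable name `EXAMPLE_API_KEY` are hypothetical and only serve as an illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage with a placeholder value (not a real key)
os.environ["EXAMPLE_API_KEY"] = "<your token>"
print(require_env("EXAMPLE_API_KEY"))
```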

---
--_Settings in Linux and macOS_--

Alternatively, these variables can be exposed in the system environment from the terminal. In Linux and macOS, you can set a system environment variable from the terminal using the *export* command:
```
    export OPENAI_API_KEY=<your token>
```

To permanently set the environment variable in Linux or macOS, you would need to add the preceding line to the `~/.bashrc` or `~/.bash_profile` file, respectively, and then reload the shell using the command `source ~/.bashrc` or `source ~/.bash_profile`.

--_Settings in Windows_--

In Windows, you can set a system environment variable from the command prompt using the set command:
```
set OPENAI_API_KEY=<your token>
```
To permanently set the environment variable in Windows, the preceding line can be added to a batch script. Another option is to create a `config.py` file to store the keys and then import a function from this module that loads all these keys into the environment. The `config.py` file can look like this:

```python
import os

OPENAI_API_KEY = "..."
# Other keys can be added here


def set_environment():
    """Copy module-level variables whose names contain "API" or "ID" into os.environ."""
    variable_dict = globals().items()
    for key, value in variable_dict:
        if "API" in key or "ID" in key:
            os.environ[key] = value
```
This function loads the keys into the environment and can be called as follows:
```python
from config import set_environment
set_environment()
```
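
One caveat of filtering by name is that `os.environ` only accepts string values, so a non-string variable whose name happens to contain "API" or "ID" would raise a `TypeError`. The sketch below is a hypothetical variant (`set_environment_from` is not from the book) that takes an explicit dictionary and skips non-string values.

```python
import os


def set_environment_from(variables: dict) -> list:
    """Copy string-valued entries whose names contain "API" or "ID" into os.environ."""
    exported = []
    for key, value in variables.items():
        if ("API" in key or "ID" in key) and isinstance(value, str):
            os.environ[key] = value
            exported.append(key)
    return exported


# Placeholder names and values, for illustration only
example = {
    "EXAMPLE_API_KEY": "abc",
    "EXAMPLE_PROJECT_ID": "demo",
    "DEBUG": "1",       # skipped: name contains neither "API" nor "ID"
    "RETRY_ID": 3,      # skipped: value is not a string
}
print(set_environment_from(example))
```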

### Fake LLM
The fake LLM allows you to simulate LLM responses during testing without needing actual API calls. This is useful for rapid prototyping and unit testing agents. Using the `FakeListLLM` avoids hitting rate limits during testing. The fake LLM is only for testing purposes. The LangChain documentation has an example of tool use with LLMs.


In [9]:
from langchain.llms.fake import FakeListLLM
fake_llm =  FakeListLLM(responses=['Hello'])
fake_llm

FakeListLLM(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, responses=['Hello'], sleep=None, i=0)

In [19]:
from langchain.llms.fake import FakeListLLM
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

tools = load_tools(["python_repl"])
responses = ["Action: Python_REPL\nAction Input: print(2 + 2)", "Final Answer: 4"]
llm = FakeListLLM(responses=responses)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("whats 2+2")



> Entering new AgentExecutor chain...
Action: Python_REPL
Action Input: print(2 + 2)
Observation: 4

Thought:Final Answer: 4

> Finished chain.


'4'

The code above sets up an agent that makes decisions based on the ReAct strategy explained in _Chapter 2, LangChain for LLM Apps_ (`ZERO_SHOT_REACT_DESCRIPTION`). We run the agent with the question "whats 2+2".

We connect one tool, a Python Read-Eval-Print Loop (REPL), which is called depending on the output of the LLM. `FakeListLLM` gives two responses (`"Action: Python_REPL\nAction Input: print(2 + 2)"` and `"Final Answer: 4"`) that won't change based on the input.

We can also observe how the fake LLM output leads to a call to the Python interpreter, which returns 4. Please note that the action must match the `name` attribute of the tool, `PythonREPLTool`, which starts like this:

```python
class PythonREPLTool(BaseTool):
    """A tool for running python code in a REPL."""
    name = "Python_REPL"
    description = (
        "A Python shell. Use this to execute python commands. "
        "Input should be a valid python command. "
        "If you want to see the output of a value, you should print it out "
        "with `print(...)`."
    )
```
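
To make the mechanics of the run above more tangible, here is a simplified stand-in for the loop: a canned LLM replays fixed responses, the `Action` line is matched against a tool name, and the tool output is fed back as an observation. All names (`CannedLLM`, `python_repl`, `parse_action`, `run_agent`) are hypothetical and this is not LangChain's implementation; the real agent builds a full prompt and uses a more robust output parser.

```python
import contextlib
import io
import re


class CannedLLM:
    """Hypothetical stand-in for a fake LLM: replays fixed responses in order."""

    def __init__(self, responses):
        self.responses = list(responses)
        self.i = 0

    def __call__(self, prompt: str) -> str:
        response = self.responses[self.i % len(self.responses)]
        self.i += 1
        return response


def python_repl(code: str) -> str:
    """Run code and return what it printed (demo only: exec is unsafe on untrusted input)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def parse_action(text: str):
    """Extract (tool name, tool input) from an 'Action / Action Input' response."""
    match = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", text, re.DOTALL)
    return (match.group(1), match.group(2).strip()) if match else None


def run_agent(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    """Alternate between LLM output and tool calls until a final answer appears."""
    scratchpad = question
    for _ in range(max_steps):
        output = llm(scratchpad)
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        parsed = parse_action(output)
        if parsed is None or parsed[0] not in tools:
            raise ValueError(f"No tool matches response: {output!r}")
        name, tool_input = parsed
        observation = tools[name](tool_input)
        scratchpad += f"\n{output}\nObservation: {observation}"
    return "Agent stopped due to iteration limit."


responses = ["Action: Python_REPL\nAction Input: print(2 + 2)", "Final Answer: 4"]
answer = run_agent(CannedLLM(responses), {"Python_REPL": python_repl}, "whats 2+2")
print(answer)
```

If the tool were registered under a different key than `Python_REPL`, the lookup would fail, which mirrors the requirement that the action must match the tool's `name`.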

# 3.1 OpenAI
In this section, we will learn how to interact with OpenAI models using the LangChain and OpenAI Python client libraries. OpenAI also offers an __Embedding__ class for text embedding models.

When a prompt is sent to an LLM API, it processes the prompt word by word, breaking down (tokenizing) the text into individual tokens. The number of tokens directly correlates with the amount of text. When using commercial LLMs like GPT-3 and GPT-4 via APIs, each token has an associated cost based on factors like the LLM model and API pricing tiers. Token usage refers to how many tokens from the model's quota have been consumed to generate a response. Strategies like using smaller models, summarizing outputs, and preprocessing inputs help reduce the tokens required to get useful results. Being aware of token usage is key for optimizing productivity within budget constraints when leveraging commercial LLMs.
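
To make the link between token usage and cost concrete, here is a minimal sketch. The prices are placeholders rather than real OpenAI rates, and the rule of roughly four characters per token is only a coarse heuristic for English text; a tokenizer library would give exact counts. Both function names are hypothetical.

```python
def rough_token_count(text: str) -> int:
    """Very rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the cost of one request from token counts and per-1K-token prices."""
    input_cost = (prompt_tokens / 1000) * input_price_per_1k
    output_cost = (completion_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


prompt = "Summarize the following customer email in one sentence."
tokens_in = rough_token_count(prompt)
# Placeholder prices in USD per 1,000 tokens (assumptions, not real rates)
cost = estimate_cost(tokens_in, completion_tokens=50,
                     input_price_per_1k=0.002, output_price_per_1k=0.004)
print(f"~{tokens_in} prompt tokens, estimated cost ${cost:.6f}")
```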

To obtain an OpenAI API key, follow these steps:
1. Create a login at _https://platform.openai.com/_.
2. Set up your billing information.
3. You can see the API keys under __Personal | View API Keys__.
4. Click on __Create new secret key__ and give it a name.

![](../fig/f3-2.png)

Copy the generated key and either set it as an environment variable (`OPENAI_API_KEY`) or pass it as a parameter every time you construct a class for OpenAI calls.

We can use the OpenAI language model class to set up an LLM to interact with, and create an agent that calculates using this model.

In [4]:
# Load the OpenAI API key into the environment
import sys
sys.path.append('../')
from utils import set_environment
set_environment()

In [3]:
from langchain.llms import OpenAI
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

llm = OpenAI(temperature=0., model="gpt-3.5-turbo-instruct")
#llm = OpenAI(temperature=0., model="text-davinci-003")
tools = load_tools(["python_repl"])
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("How to know you are in love?")

"When you can't fall asleep because reality is finally better than your dreams."

In [9]:
agent.run("What is the current Singapore time?")

'Agent stopped due to iteration limit or time limit.'

For the process above, I was charged USD 5 on my OpenAI platform bill. Next up is the Hugging Face provider.

# 3.2 Hugging Face
Hugging Face is a prominent player in the NLP space and has considerable traction in open-source and hosting solutions. It develops tools for building machine learning applications, such as the Transformers Python library, which is used for NLP tasks, includes implementations of state-of-the-art and popular models like Mistral 7B, BERT, and GPT-2, and is compatible with PyTorch, TensorFlow, and JAX.

Hugging Face also provides the Hugging Face Hub, a platform for hosting Git-based code repositories, machine learning models, datasets, and web applications, which provides over 120k models, 20k datasets, and 50k demo apps (spaces) for machine learning. It is an online platform where people can collaborate and facilitate machine learning development.

The LangChain integrations allow users to load and use models, embeddings, and datasets from Hugging Face. The `HuggingFaceHub` integration, for example, provides access to different models for tasks like text generation and text classification. The `HuggingFaceEmbeddings` integration allows users to work with sentence-transformer models.

Hugging Face also offers various other libraries within their ecosystem, including `Datasets` for dataset processing, `Evaluate` for model evaluation, `Simulate` for simulation, and `Gradio` for machine learning demos.

In addition, Hugging Face has been involved in initiatives such as the BigScience Research Workshop, where they released an open LLM called BLOOM with 176 billion parameters. They have received a \$40 million Series B round and a Series C funding round led by Coatue and Sequoia at a \$2 billion valuation.

To use Hugging Face as a provider for your models, you can create an account and API keys at _https://huggingface.co/settings/profile_. Additionally, you can make the token available in your environment as `HUGGINGFACEHUB_API_TOKEN`.

The following example uses an open-source model developed by Google, the Flan-T5-XXL model:

In [5]:
from langchain.llms import HuggingFaceHub
llm = HuggingFaceHub(model_kwargs={"temperature":0.5, "max_length": 64}, repo_id="google/flan-t5-xxl")
prompt = "In which country is Tokyo?"
completion = llm(prompt)
print(completion)

ValueError: Error raised by inference API: Cannot override task for LLM models

The error above is due to not having a subscription to their inference service. Later, we will see how to run Hugging Face models on a local machine.

# 3.3 Google Cloud Platform
There are many models and functions available through Google Cloud Platform (GCP) and Vertex AI, GCP's machine learning platform. GCP provides access to LLMs like LaMDA, T5, and PaLM. Google has also updated the Google Cloud Natural Language (NL) API with a new LLM-based model for Content Classification. The updated version offers an expansive pre-trained classification taxonomy to help with ad targeting and content-based filtering. The NL API's improved v2 classification model is enhanced with over 1,000 labels and supports 11 languages with improved accuracy.

For models with GCP, you need to have the gcloud command-line interface (CLI) installed. The instructions are here: _https://cloud.google.com/sdk/docs/install_. 

Authenticate and print a key token with this command from the terminal:

```
gcloud auth application-default login
```

You also need to enable Vertex AI for your project. To enable Vertex AI, install the Google Vertex AI SDK with the following command:
```
pip install google-cloud-aiplatform
```
This should already be installed if you followed the package installation steps earlier.

There are different options for setting up the Google Cloud project ID:
* Using `gcloud config set project my-project`
* Passing a constructor argument when initializing the LLM
* Using `aiplatform.init()`
* Setting a GCP environment variable

You can find more details about these options in the Vertex documentation. The GCP environment variable works well with the config.py file created earlier (utils).

```python
from langchain.llms import VertexAI
from langchain import PromptTemplate, LLMChain
template = """Question: {question} Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm = VertexAI()
llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
llm_chain.run(question)
```

Vertex AI offers a range of models tailored for tasks such as following instructions, conversation, and code generation/assistance:
* __text-bison__ is fine-tuned to follow natural language instructions, with a max input of 8,192 tokens and an output of 1,024 tokens.
* __chat-bison__ is optimized for multi-turn conversation, with a max input of 4,096 tokens, an output of 1,024 tokens, and up to 2,500 turns.
* __code-bison__ generates code from natural language descriptions, with a max input of 4,096 tokens and an output of 2,048 tokens.
* __codechat-bison__ is a chatbot that is fine-tuned to help with code-related questions. It has an input limit of 4,096 tokens and an output limit of 2,048 tokens.
* __code-gecko__ suggests code completions. It has a max input length of 2,048 tokens and an output of 64 tokens.

These models also have different input/output limits and training data, and they are often updated. More detailed and up-to-date information about the models, including when they have been updated, can be found at _https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview_.
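
As a small illustration of how these limits can guide model choice, the sketch below encodes the token limits from the list above in a dictionary and filters the models that can handle a given request size. The numbers are copied from these notes and may be outdated; `MODEL_LIMITS` and `models_that_fit` are hypothetical names.

```python
# Token limits copied from the list above; check the Vertex AI docs for current values.
MODEL_LIMITS = {
    "text-bison": {"max_input": 8192, "max_output": 1024},
    "chat-bison": {"max_input": 4096, "max_output": 1024},
    "code-bison": {"max_input": 4096, "max_output": 2048},
    "codechat-bison": {"max_input": 4096, "max_output": 2048},
    "code-gecko": {"max_input": 2048, "max_output": 64},
}


def models_that_fit(input_tokens: int, output_tokens: int) -> list:
    """Return the models whose input and output limits cover the request."""
    return [
        name
        for name, limits in MODEL_LIMITS.items()
        if input_tokens <= limits["max_input"] and output_tokens <= limits["max_output"]
    ]


print(models_that_fit(input_tokens=5000, output_tokens=500))
print(models_that_fit(input_tokens=3000, output_tokens=1500))
```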

I am skipping the following API providers:
- Jina
- Azure
- Anthropic

The following section explores the usage of local models.

# 3.5 Exploring Local Models
The advantages of running models locally are complete control over the model and not sharing any data over the internet.
* Local models do not need an API token.

In this section, we will focus on Hugging Face's transformers, llama.cpp and GPT4All (also Mistral which has been installed previously). These tools provide huge power and are full of great functionality too broad to cover in this chapter. Thus, this chapter will focus on how we can run a model with the transformers library by Hugging Face.

### Hugging Face Transformers
A general recipe for setting up and running a pipeline:

In [14]:
# Install the following if you don't have all libraries installed
#!pip install transformers accelerate torch

In [1]:
from transformers import pipeline
import torch
generate_text = pipeline(
        model="aisquared/dlite-v1-355m",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto", 
        framework="pt"
    )
generate_text("In this chapter, we'll discuss first steps with generative AI in Python.")

"In this chapter, we'll discuss first steps with generative AI in Python. First, we'll discuss the best practice in the relevant language. Next, we'll briefly discuss the specifics of how generative AI works, focusing on areas like facial recognition, language modeling, and speech synthesis. Finally, we'll provide concrete examples to illustrate the work in practice."

In [2]:
generate_text("Why does Wei Yun wants a boyfriend? Weiyun is a girl")

'Weiyun wants a boyfriend because she is curious and wants to explore life on her own terms. She also values relationships and feels that having someone to share her life with is important for her success.'

Running the preceding code will download everything that's needed for the model, such as the tokenizer and model weights, from Hugging Face. This model is quite small (355 million parameters) but relatively performant and instruction-tuned for conversations. To plug this pipeline into a LangChain agent or chain, we can use it the same way that we've seen in the other examples in this chapter:

In [4]:
from langchain import PromptTemplate, LLMChain
template = """Question: {question}, Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=generate_text)
question = "What is electroencephalography?"
print(llm_chain.run(question))

ValidationError: 1 validation error for LLMChain
llm
  value is not a valid dict (type=type_error.dict)

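I followed the template code given in the book, but it does not work as written. A likely cause is that `generate_text` is a raw transformers pipeline rather than a LangChain LLM object, which is what `LLMChain` expects; the `HuggingFacePipeline` wrapper used later in this section is one way to bridge the two. I will skip this for now.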

### llama.cpp
Written and maintained by Georgi Gerganov, llama.cpp is a C++ toolkit that executes models with architectures based on or similar to LLaMA, one of the first large open-source models, which was released by Meta and spawned the development of many other models in turn. One of the main use cases of llama.cpp is to run models efficiently on the CPU; however, there are also some options for GPU.

Please note that you need to have an MD5 checksum tool installed. This is included by default in several Linux distributions such as Ubuntu. On macOS, you can install it with brew like this: `brew install md5sha1sum`.

First, download the llama.cpp repository from GitHub, or clone it with git from the terminal like this: `git clone https://github.com/ggerganov/llama.cpp.git`

The Python requirements can be installed with the pip package installer:
```bash
cd llama.cpp
pip install -r requirements.txt
```

If there is an error message at the end that a few libraries were missing, you might need to execute the following:
```bash
pip install 'blosc2==2.0.0' cython FuzzyTM
```
After that we will need to compile llama.cpp. This can be performed by parallelizing the build with 4 processes:
```bash
make -C . -j4 # runs make in subdir with 4 processes
```

To get the Llama model weights, you need to agree to the T&Cs and wait for a registration email from Meta. There are tools such as the `llama` model downloader in the pyllama project, but they might not conform to the license stipulations by Meta.

Alternatively, there are models with more permissive licensing, such as Falcon, Mistral, Vicuna, OpenLLaMA, or Alpaca. Assuming you downloaded the model weights and tokenizer model for OpenLLaMA 3B or Llama 2 7B, the model file should be about 6.8 GB and the tokenizer is much smaller. Then move the two files into the models/3B or models/7B directory.

Then we have to convert the model to the llama.cpp format, which is called ggml, using the convert script:
```bash
python3 convert.py models/3B/ --ctx 2048
```

Optionally the models can be quantized to save memory when doing inference. Quantization refers to reducing the number of bits that are used to store weight:
```bash
./quantize ./models/3B/ggml-model-f16.gguf ./models/3B/ggml-model-q4_0.bin q4_0
```

This last file is much smaller and will take up much less space in memory as well. With the chosen model, we can integrate it into an agent or a chain, for example:
```python
from langchain.llms import LlamaCpp  # requires the llama-cpp-python package
llm = LlamaCpp(model_path="./ggml-model-q4_0.bin", verbose=True)
```
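
To illustrate what "reducing the number of bits" means for the quantization step described above, here is a deliberately simplified sketch of symmetric 4-bit quantization with a single shared scale. It is not llama.cpp's actual `q4_0` scheme, which works on blocks of weights and differs in detail; the function names are hypothetical.

```python
def quantize_4bit(weights):
    """Map floats to integers in [-8, 7] using one shared scale (symmetric quantization)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.53, 0.98, -0.05, 0.31]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("quantized:", q)
print("max reconstruction error:", round(max_error, 4))
```

Each weight is stored in 4 bits instead of 16 or 32, at the price of a reconstruction error of at most half the scale per weight.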

In [4]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

## v2 models
model_path = 'openlm-research/open_llama_3b_v2'
# model_path = 'openlm-research/open_llama_7b_v2'

## v1 models
# model_path = 'openlm-research/open_llama_3b'
# model_path = 'openlm-research/open_llama_7b'
# model_path = 'openlm-research/open_llama_13b'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='cuda:0',
)

prompt = 'Q: What is the largest animal?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32
)
print(tokenizer.decode(generation_output[0]))

OutOfMemoryError: CUDA out of memory. Tried to allocate 196.00 MiB. GPU 0 has a total capacty of 7.79 GiB of which 10.56 MiB is free. Including non-PyTorch memory, this process has 7.76 GiB memory in use. Of the allocated memory 7.55 GiB is allocated by PyTorch, and 115.51 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
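
A back-of-the-envelope estimate helps put the out-of-memory error above into perspective. The sketch below computes the memory needed for the weights alone; the parameter count of 3.4 billion is an assumption for a "3B" model, chosen because it matches the 6.8 GB file size mentioned earlier at 16 bits per weight. Activations, the CUDA context, and anything else already on the GPU come on top of this, so a card with under 8 GiB leaves little headroom.

```python
def weight_memory_gb(n_parameters: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return n_parameters * bits_per_weight / 8 / 1e9


n_params = 3.4e9  # assumption: approximate parameter count of a "3B" model
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(n_params, bits):.1f} GB")
```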

### GPT4All

In [4]:
#from langchain.llms import GPT4All
from gpt4all import GPT4All

In [8]:
model = GPT4All("/media/WachnResearch/Projects/llm/langchain/models/mistral7b_gpt4all/mistral-7b-openorca.gguf2.04.0.gguf",device='gpu')
response = model.generate("We can run large language models locally for all kinds of applications")

In [22]:
model.generate("Ok, I will tell you a story of Monkey Bro aka Pang Chun Ho. It all begins")

' with the birth of this little monk in 1958 in Hong Kong. His parents named him after their favorite kung fu movie character at that time, which was Monkey.\n'

In [33]:
from langchain.llms import HuggingFacePipeline
hf = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
    device=0
)

In [None]:
# Alternative :
from langchain_huggingface.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10)
hf = HuggingFacePipeline(pipeline=pipe)

In [35]:
from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




Electricity is electrical energy, something that


In [8]:
from langchain import HuggingFaceHub
summarizer = HuggingFaceHub(repo_id="facebook/bart-large-cnn", model_kwargs={"temperature":0, "max_length":180})




In [10]:
customer_email = "I am writing to pour my heart out about the recent unfortunate experience I had with one of your coffee machines that arrived broken. I anxiously unwrapped the box containing my highly anticipated coffee machine. However, what I discovered within broke not only my spirit but also any semblance of confidence I had placed in your brand. Its once elegant exterior was marred by the scars of travel, resembling a war-torn soldier who had fought valiangtly on the fields of some espresso battlefield. This heartbreaking display of negligence shattered my dreams of indulging in daily coffee perfection, leaving me emotionally distraught an inconsolable"

In [11]:
def summarize(llm, text)-> str: 
    return llm(f"Summarize this: {text}!")
summarize(summarizer, customer_email)

'"I am writing to pour my heart out about the recent unfortunate experience I had with one of your coffee machines that arrived broken" "Its once elegant exterior was marred by the scars of travel, resembling a war-torn soldier who had fought valiangtly on the fields of some espresso battlefield. This heartbreaking display of negligence shattered my dreams of indulging in daily coffee perfection"'

In [15]:
summarize(summarizer, "Wearing a c up bra 我天天想着猴哥，但是我心里有好多顾虑，真不知所措")

"Wearing a c up bra is like wearing a c-cup bra. It's not a bra, it's a bra with a hole in the front. It looks like a bra that's supposed to be on the back of a woman's body. The bra is supposed to cover the top of the body."

This summary is just passable, but not very convincing: there is still a lot of rambling in it. Alternative models could be used, or we could ask an LLM with a prompt to summarize the content, which we will investigate in Chapter 4.

### VertexAI
- Before executing the following code, ensure that you have authenticated with GCP and set the GCP project, as described in the earlier section on Vertex AI.

```python
from langchain.llms import VertexAI
from langchain import PromptTemplate, LLMChain
template = """Given this text, decide what is the issue the customer is concerned about. Valid categories are these:
* product issues
* delivery problems
* missing or late orders
* wrong product
* cancellation request
* refund or exchange
* bad support experience
* no clear reason to be upset
Text: {email}
Category:
"""
prompt = PromptTemplate(template=template, input_variables = ["email"])
llm = VertexAI()
llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)
print(llm_chain.run(customer_email))
```
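
Since an LLM may return the category with extra whitespace, a bullet marker, or surrounding words, it can be useful to map the raw output back onto the valid categories from the prompt. The following is a minimal sketch; `normalize_category` is a hypothetical helper, and falling back to "no clear reason to be upset" for unrecognized output is an assumption, not something the book prescribes.

```python
VALID_CATEGORIES = [
    "product issues",
    "delivery problems",
    "missing or late orders",
    "wrong product",
    "cancellation request",
    "refund or exchange",
    "bad support experience",
    "no clear reason to be upset",
]


def normalize_category(raw_output: str, default: str = "no clear reason to be upset") -> str:
    """Map raw LLM output onto one of the valid categories, or return the default."""
    cleaned = raw_output.strip().lstrip("*").strip().lower().rstrip(".")
    # Prefer an exact match, then fall back to a substring match
    for category in VALID_CATEGORIES:
        if cleaned == category:
            return category
    for category in VALID_CATEGORIES:
        if category in cleaned:
            return category
    return default


print(normalize_category(" Product issues.\n"))
print(normalize_category("The category is refund or exchange"))
```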

# Summary
This chapter covered four distinct ways of installing LangChain and the other libraries needed in this book as an environment. It then introduced several providers of models for text and images.
- Developed an LLM app for text categorization (intent classification) and sentiment analysis in a customer service use case
- Q: How do you generate images with LangChain? For example, Stable Diffusion via the text-to-image integration with Replicate