# Introduction to Large Language Models and LangChain

This notebook walks you through some simple steps for downloading LLMs from `HuggingFace` and loading them into local pipelines for use with `LangChain`.

## Imports

Below are the basic imports required to run the models. The main packages are `torch`, `transformers`, and `langchain`. We will import more from these packages later, but these are what we need to get started.

In [None]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    GenerationConfig,
    LlamaForCausalLM,
    LlamaTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    pipeline,
)
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline

Something I like to do is keep a re-runnable cell that calls `nvidia-smi`. This shows which type of GPU we are currently running on and how much memory is still available, which matters if we want to load several LLMs to compare with each other.

In [None]:
!nvidia-smi
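
If you would rather read the same information from Python (for example, to decide whether another model will fit), the sketch below parses `nvidia-smi`'s CSV output. The helper names `parse_gpu_memory` and `query_gpus` are made up for this illustration, and the code assumes `nvidia-smi` is on the `PATH`; if it is not, it simply returns an empty list.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'name, total MiB, used MiB' CSV rows into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma in the GPU name cannot break parsing.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        try:
            total_mib, used_mib = int(total), int(used)
        except ValueError:
            continue  # Skip rows where memory is reported as something like [N/A].
        gpus.append({"name": name, "total_mib": total_mib, "free_mib": total_mib - used_mib})
    return gpus


def query_gpus():
    """Return per-GPU memory info, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


for gpu in query_gpus():
    print(f"{gpu['name']}: {gpu['free_mib']} MiB free of {gpu['total_mib']} MiB")
```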

Next we want to show which version of LangChain we are running. This package is under constant development, so knowing the exact version helps when we run into version incompatibilities.

In [None]:
!pip show langchain
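
The same check can be done from Python with the standard library, which is handy if you want to log versions alongside results. This is only a sketch: `installed_version` is a helper name invented here, and the package list is my guess at what this notebook relies on.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages this notebook appears to depend on (an assumption, adjust as needed).
for name in ("torch", "transformers", "langchain", "accelerate", "bitsandbytes"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```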

## Load in the Models through HuggingFace Pipelines

Now that we have the initial imports, we can start downloading `HuggingFace` models to run locally using the HuggingFace `transformers` package. This gives us access to the `Auto` classes, such as `AutoTokenizer` and `AutoModelForCausalLM`. We will need these to load our text generation models as LLM pipelines.

Below I am providing an example model with commented steps for importing the model as well as loading it into `LangChain`. Then I will provide a few more models that we may want to further explore.

### EXAMPLE: facebook/opt-13b

In [None]:
model_id = 'facebook/opt-13b'  # The model ID as listed on HuggingFace (https://huggingface.co/facebook/opt-13b).
cache_path = '../Models'  # Relative path to the directory in which we will store the model weights.

# Here we load the tokenizer for the opt-13b model. AutoTokenizer.from_pretrained() downloads the tokenizer files from HuggingFace.
opt_tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=cache_path)

# Now we load the actual LLM. We do this in a similar way to the tokenizer, but use AutoModelForCausalLM.from_pretrained(). We also specify that we want to load the model in 8-bit with load_in_8bit=True.
opt_model = AutoModelForCausalLM.from_pretrained(model_id, 
                                                 load_in_8bit=True,
                                                 device_map='auto',
                                                 cache_dir=cache_path)
# IMPORTANT: For this to work, you need bitsandbytes and accelerate installed in your environment.

# Now that we have a model and tokenizer loaded into memory, we can tie the two together using a HuggingFace pipeline.
# We set the task to text-generation and pass in some generation arguments. There are more arguments than just max_length, but for simple model runs this is sufficient.
opt_pipe = pipeline('text-generation',
                 model=opt_model,
                 tokenizer=opt_tokenizer,
                 max_length=512)

# Once our model is built into a pipe, we can use LangChain's HuggingFacePipeline class to wrap the HuggingFace pipeline we created for use in LangChain.
opt_llm = HuggingFacePipeline(pipeline=opt_pipe)
opt_llm.model_id = model_id
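
To see why `load_in_8bit=True` matters for a model of this size, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It assumes a nominal 13 billion parameters and ignores activations, the generation cache, and any quantization overhead, so treat the numbers as rough lower bounds rather than measurements.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory for the weights only, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


# Nominal parameter count for opt-13b (an assumption for illustration).
N_PARAMS = 13e9

for label, bits in (("fp32", 32), ("fp16", 16), ("int8", 8)):
    print(f"opt-13b weights in {label}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Comparing these figures with the free memory reported by `nvidia-smi` gives a quick sanity check before starting a long download.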

### chavinlo/alpaca-native

In [None]:
model_id = "chavinlo/alpaca-native"
alpaca_tokenizer = LlamaTokenizer.from_pretrained(model_id, cache_dir='../Models')
alpaca_model = LlamaForCausalLM.from_pretrained(model_id,
                                              load_in_8bit=True,
                                              device_map='auto',
                                              cache_dir='../Models/')
alpaca_pipe = pipeline('text-generation',
                 model=alpaca_model,
                 tokenizer=alpaca_tokenizer,
                 max_length=512)
alpaca_llm = HuggingFacePipeline(pipeline=alpaca_pipe)
alpaca_llm.model_id = model_id

### chavinlo/alpaca-13b

In [None]:
model_id = "chavinlo/alpaca-13b"
alpaca13b_tokenizer = LlamaTokenizer.from_pretrained(model_id, cache_dir='../Models')
alpaca13b_model = LlamaForCausalLM.from_pretrained(model_id,
                                              load_in_8bit=True,
                                              device_map='auto',
                                              cache_dir='../Models/')
alpaca13b_pipe = pipeline('text-generation',
                 model=alpaca13b_model,
                 tokenizer=alpaca13b_tokenizer,
                 max_length=512)
alpaca13b_llm = HuggingFacePipeline(pipeline=alpaca13b_pipe)
alpaca13b_llm.model_id = model_id

### facebook/galactica-6.7b

In [None]:
model_id = "facebook/galactica-6.7b"

galactica_tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir='../Models/')

galactica_model = AutoModelForCausalLM.from_pretrained(model_id,
                                                       load_in_8bit=True,
                                                       device_map='auto',
                                                       cache_dir='../Models/'
                                                      )

galactica_pipe = pipeline('text-generation',
                       model=galactica_model,
                       tokenizer=galactica_tokenizer,
                       max_length=512
                      )

galactica_llm = HuggingFacePipeline(pipeline=galactica_pipe)
galactica_llm.model_id = model_id

### GeorgiaTechResearchInstitute/galpaca-6.7b

In [None]:
model_id = "GeorgiaTechResearchInstitute/galpaca-6.7b"

galpaca_tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir='../Models/')

galpaca_model = AutoModelForCausalLM.from_pretrained(model_id,
                                                      load_in_8bit=True,
                                                      device_map='auto',
                                                      cache_dir='../Models/'
                                                      )

galpaca_pipe = pipeline('text-generation',
                       model=galpaca_model,
                       tokenizer=galpaca_tokenizer,
                       max_length=512
                       )

galpaca_llm = HuggingFacePipeline(pipeline=galpaca_pipe)
galpaca_llm.model_id = model_id
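
The cells above all repeat the same four steps: tokenizer, model, pipeline, LangChain wrapper. One possible way to fold them into a single helper is sketched below. `build_llm` and `MODEL_SPECS` are names invented for this sketch; the calls inside mirror the cells above, and the heavy imports are deferred so that defining the helper costs nothing. Calling it still downloads weights and needs a GPU with `bitsandbytes` and `accelerate` installed.

```python
def build_llm(model_id, cache_dir="../Models",
              tokenizer_cls="AutoTokenizer", model_cls="AutoModelForCausalLM",
              max_length=512):
    """Load a model in 8-bit and wrap it for LangChain (same steps as the cells above)."""
    # Imported lazily so that defining this helper does not require the packages.
    import transformers
    from langchain.llms import HuggingFacePipeline

    tokenizer = getattr(transformers, tokenizer_cls).from_pretrained(
        model_id, cache_dir=cache_dir)
    model = getattr(transformers, model_cls).from_pretrained(
        model_id, load_in_8bit=True, device_map="auto", cache_dir=cache_dir)
    pipe = transformers.pipeline("text-generation", model=model,
                                 tokenizer=tokenizer, max_length=max_length)
    llm = HuggingFacePipeline(pipeline=pipe)
    llm.model_id = model_id
    return llm


# The models used in this notebook; the Llama-based ones need the Llama classes.
LLAMA = {"tokenizer_cls": "LlamaTokenizer", "model_cls": "LlamaForCausalLM"}
MODEL_SPECS = {
    "opt": {"model_id": "facebook/opt-13b"},
    "alpaca": {"model_id": "chavinlo/alpaca-native", **LLAMA},
    "alpaca13b": {"model_id": "chavinlo/alpaca-13b", **LLAMA},
    "galactica": {"model_id": "facebook/galactica-6.7b"},
    "galpaca": {"model_id": "GeorgiaTechResearchInstitute/galpaca-6.7b"},
}

# Example (downloads weights and needs a GPU, so it is left commented out):
# galactica_llm = build_llm(**MODEL_SPECS["galactica"])
```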

## Running Basic Prompts through the LLMs

Now that we have a few LLMs to choose from, we can use them for basic prompting. For this we will use LangChain's `PromptTemplate` along with LangChain's `LLMChain` to chain together the prompt template with the LLM. We use this for simple repeatable runs of the models.

In [None]:
from langchain.chains import LLMChain

template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""

prompt = PromptTemplate(template=template, input_variables=["instruction"])

llm_chain = LLMChain(llm=alpaca13b_llm, prompt=prompt)


In [None]:
instruction = "What is the capital of England?"
text = llm_chain.run(instruction)
print(text)  # Show the generated answer.
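
To make the role of the prompt template concrete, the sketch below fills a template with plain Python string formatting, which behaves much like `PromptTemplate` does for simple `{variable}` placeholders. The short template and the `fill_template` helper are illustrative stand-ins, not the ones used by the chain above.

```python
# A shortened, illustrative template with a single {instruction} placeholder.
ILLUSTRATIVE_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:"


def fill_template(template, **variables):
    """Substitute named variables into the template, like a simple prompt template."""
    return template.format(**variables)


# The same template can be reused for any number of instructions.
for question in ("What is the capital of England?", "Name three noble gases."):
    print(fill_template(ILLUSTRATIVE_TEMPLATE, instruction=question))
    print("-" * 40)
```

The filled-in string is what the model actually receives, so printing it is a cheap way to debug unexpected answers.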

## Using Tools and Agents

Another use of LLMs is to run them as `Agents` with access to `Tools`. Tools let the LLM return data that is beyond the training scope of the model itself, and the agent loop lets the LLM plan and execute a path towards solving the given input.

For this, I will use an example of querying the LLM for the number of heavy (non-hydrogen) atoms present in a given compound. If I ask a question such as `How many heavy atoms are in caffeine`, the LLM may not know the answer, but if I have provided it with a tool to find the answer, it can use that instead.
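
As a quick illustration of what the tool is expected to return, the sketch below counts heavy atoms directly from a molecular formula. It is a standalone helper written for this explanation (`heavy_atoms_from_formula` is not part of any library) and assumes a flat formula without parentheses, charges, or isotope labels.

```python
import re


def heavy_atoms_from_formula(formula):
    """Count all atoms except hydrogen in a flat molecular formula such as C8H10N4O2."""
    total = 0
    # Each match is an element symbol followed by an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element != "H":
            total += int(count) if count else 1
    return total


print(heavy_atoms_from_formula("C8H10N4O2"))    # caffeine -> 14
print(heavy_atoms_from_formula("C33H35FN2O5"))  # atorvastatin -> 41
```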

In [None]:
from langchain.agents import Tool, initialize_agent
from langchain.chains import LLMChain

import pubchempy  # Python client for the PubChem database.

In [None]:
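def heavy_atom_counter(compound: str) -> int:
    '''
    Look up a compound by name on PubChem and
    return its heavy atom count as an integer.

    parameters:
        compound: a chemical compound name

    returns:
        heavy atom count (number of non-hydrogen atoms)
    '''
    # Strip whitespace and quotes that an LLM may wrap around the tool input.
    name = str(compound).strip().strip('"\'')
    matches = pubchempy.get_compounds(name, 'name')
    if not matches:
        raise ValueError(f"No PubChem compound found for name: {name!r}")

    # Use the first match returned by PubChem.
    return matches[0].heavy_atom_count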

Heavy_Atom = Tool(
    name="heavy_atom_counter",
    func=heavy_atom_counter,
    #description="""helps to retrieve the number of heavy atoms present in a compound. input should be json in the following format: `{{"compound": '<compound_name>'}}`"""
    description="helps to retrieve the number of heavy atoms present in a compound. Input should be a string."
)

In [None]:
prompt = PromptTemplate(
    input_variables=[],
    template="Answer the following questions as best you can."
)

# A plain LLMChain without tools, kept as a baseline. The agent below builds its own prompt and does not use this chain.
llm_atom_chain = LLMChain(
    llm=alpaca13b_llm,
    prompt=prompt,
    verbose=True
)

tools = [
    Heavy_Atom
]

# Build a zero-shot ReAct agent, which chooses tools based on their descriptions.
agent = initialize_agent(
    tools,
    alpaca13b_llm,
    agent="zero-shot-react-description",
    verbose=True
)

agent.run("How many heavy atoms in Atorvastatin?")
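
To give a feel for what happens inside `agent.run`, here is a heavily simplified sketch of a ReAct-style loop. It is not LangChain's implementation: `run_react_loop`, the scripted stand-in for the LLM, and the stub tool are all hypothetical and exist only to show the cycle of thought, action, observation, and final answer without loading a model.

```python
import re

# Matches the "Action:" and "Action Input:" lines that a ReAct-style prompt asks for.
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")


def run_react_loop(llm, tools, question, max_steps=5):
    """Alternate between LLM output and tool calls until a final answer appears."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(scratchpad)
        scratchpad += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_PATTERN.search(output)
        if match is None:
            return None  # The output could not be parsed, so give up.
        tool_name = match.group(1).strip()
        tool_input = match.group(2).strip().strip('"')
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"{tool_name} is not a valid tool"
        # Feed the tool result back so the next LLM call can use it.
        scratchpad += f"Observation: {observation}\n"
    return None


def scripted_llm(prompt):
    """Stand-in for a real LLM: asks for the tool once, then answers."""
    if "Observation:" not in prompt:
        return ("Thought: I should look this up.\n"
                "Action: heavy_atom_counter\n"
                "Action Input: Atorvastatin")
    return "Thought: I now know the answer.\nFinal Answer: 41"


# Stub tool returning a fixed value so the example runs offline.
stub_tools = {"heavy_atom_counter": lambda name: 41}
print(run_react_loop(scripted_llm, stub_tools, "How many heavy atoms in Atorvastatin?"))
```

With a real model the weak point is usually the parsing step: if the LLM does not follow the expected `Action:` / `Action Input:` layout, the loop cannot call the tool, which is why instruction-tuned models tend to work better as agents.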