# LangChain Lab: Models, Prompts and Output Parsers

This notebook is a short introduction to the most basic LangChain concepts. We make direct calls to OpenAI through LangChain and look at examples of prompts, models and output parsers.

## OpenAI API key

Because we are using the OpenAI API in this lab, you need an API key if you want to run the lab yourself. You can get one [here](https://platform.openai.com/account/api-keys). The current prices for the two `gpt-3.5-turbo` models listed below are:

| Model                   | Input                | Output               |
|-------------------------|----------------------|----------------------|
| gpt-3.5-turbo-1106      | \$0.0010 / 1K tokens  | \$0.0020 / 1K tokens  |
| gpt-3.5-turbo-instruct  | \$0.0015 / 1K tokens  | \$0.0020 / 1K tokens  |


Running all the examples in this and the following notebooks will cost you less than $1. For some examples you should be able to swap in an open-source LLM without any changes. Others will need some adjustment.
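
To get a feel for these numbers, the per-token prices in the table can be turned into a quick estimate. This is only a minimal sketch: the `estimate_cost` helper and the token counts in the example are made up for illustration, and the prices are copied from the table above, so they may be out of date.

```python
# Prices in USD per 1K tokens, copied from the table above (may be outdated).
PRICES_PER_1K = {
    "gpt-3.5-turbo-1106": {"input": 0.0010, "output": 0.0020},
    "gpt-3.5-turbo-instruct": {"input": 0.0015, "output": 0.0020},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one call (hypothetical helper)."""
    price = PRICES_PER_1K[model]
    input_cost = (input_tokens / 1000) * price["input"]
    output_cost = (output_tokens / 1000) * price["output"]
    return input_cost + output_cost


# Made-up token counts: 2,000 prompt tokens and 500 completion tokens.
cost = estimate_cost("gpt-3.5-turbo-1106", input_tokens=2000, output_tokens=500)
print(f"Estimated cost: {cost:.4f} USD")
```

Actual usage depends on how long your prompts and the model's answers are, so treat the result as a rough guide.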

## Setup

In [1]:
# Install the needed libraries

!pip install --quiet python-dotenv
!pip install --quiet openai
!pip install --quiet pandas
!pip install --quiet langchain


In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import openai
import os

# You can save your secret keys in your colab account
# (look for the key symbol in the panel to the left)

from google.colab import userdata
openai_api_key = userdata.get('TestKey') #use the name of your colab key

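# Store the key under the environment variable name that the OpenAI client reads.
# Never paste the key itself as the variable name or commit it to a notebook.
os.environ['OPENAI_API_KEY'] = openai_api_key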

# Alternatively, you just set the key directly in your notebook.
# But be careful not to share it with anyone.

# os.environ['OPENAI_API_KEY'] = 'YOUR KEY HERE'

# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"
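
The date check above can also be wrapped in a small function, which makes the cut-off easy to test. This is a sketch only: `pick_model` and `TARGET_DATE` are hypothetical names, while the date and model names are taken from the cell above.

```python
import datetime

# Cut-off date taken from the cell above.
TARGET_DATE = datetime.date(2024, 6, 12)


def pick_model(today: datetime.date) -> str:
    """Return the model name to use on a given date (hypothetical helper)."""
    if today > TARGET_DATE:
        return "gpt-3.5-turbo"
    return "gpt-3.5-turbo-0301"


llm_model = pick_model(datetime.date.today())
print(llm_model)
```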

In [4]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
import pandas as pd
import os

## Load dataset

In [7]:
df = pd.read_csv('shap_df_testset.csv')

In [8]:
df.head()

Unnamed: 0,ChurnBaseValue,SeniorCitizen,tenure,MonthlyCharges,TotalCharges,gender_Male,Partner_Yes,Dependents_Yes,PhoneService_Yes,MultipleLines_No phone service,...,StreamingTV_No internet service,StreamingTV_Yes,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_One year,Contract_Two year,PaperlessBilling_Yes,PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
0,0.262382,0.0,-0.03179,-0.012948,-0.004579,0.0,-0.002024,0.001736,0.0,-0.001683,...,0.0,0.0,-0.021643,0.0,0.002244,-0.029851,0.0,0.001552,-0.008365,0.0
1,0.262382,0.0,-0.009395,-0.012948,-0.004579,0.0,0.001972,0.001736,0.0,-0.001683,...,0.0,0.0,-0.021643,0.0,0.002244,0.012195,0.0,0.001552,-0.013855,0.0
2,0.262382,0.0,0.01875,0.011162,-0.000835,0.0,-0.000383,0.003975,0.0,-0.000601,...,0.0,0.0,0.004006,0.0,0.005052,0.015528,0.0,-0.013351,-0.011725,0.0
3,0.262382,0.0,-0.040295,-0.012948,-0.000835,0.0,-0.000383,0.003975,0.0,-0.001683,...,0.0,0.0,0.006089,0.0,0.005052,0.011822,0.0,0.00354,-0.006235,0.0
4,0.262382,0.0,0.076218,-0.015844,-0.000835,0.0,0.000374,0.003975,0.0,0.007161,...,0.0,0.0,0.006089,0.0,0.005052,0.008283,0.0,0.00354,-0.011725,0.0


## LLM Chain

In [9]:
llm = ChatOpenAI(temperature=0.1, model=llm_model, openai_api_key=openai_api_key)



In [10]:
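# Adjacent string literals are concatenated, which avoids embedding the
# indentation of the second line in the prompt text.
prompt = ChatPromptTemplate.from_template(
    "What is the most significant {feature} for "
    "the first observation?"
)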

In [11]:
chain = LLMChain(llm=llm, prompt=prompt)

In [15]:
feature = df
chain.invoke(feature)

{'feature':       ChurnBaseValue  SeniorCitizen    tenure  MonthlyCharges  TotalCharges  \
 0           0.262382            0.0 -0.031790       -0.012948     -0.004579   
 1           0.262382            0.0 -0.009395       -0.012948     -0.004579   
 2           0.262382            0.0  0.018750        0.011162     -0.000835   
 3           0.262382            0.0 -0.040295       -0.012948     -0.000835   
 4           0.262382            0.0  0.076218       -0.015844     -0.000835   
 ...              ...            ...       ...             ...           ...   
 1402        0.262382            0.0  0.076075       -0.015844      0.035507   
 1403        0.262382            0.0  0.064135        0.022029     -0.000835   
 1404        0.262382            0.0 -0.040295       -0.014339     -0.000835   
 1405        0.262382            0.0 -0.048265        0.021060     -0.000835   
 1406        0.262382            0.0 -0.031790       -0.012948     -0.004579   
 
       gender_Male  Partner
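
Passing the whole DataFrame, as above, substitutes its printed representation into `{feature}`. A smaller input is to pass only the first observation as text. The sketch below uses a plain dictionary with a few of the first-row values shown in the `df.head()` output. The `format_row` helper is hypothetical, and with the real data the dictionary would presumably come from something like `df.iloc[0].to_dict()`.

```python
# A few SHAP values for the first observation, copied from the df.head() output.
first_row = {
    "tenure": -0.03179,
    "MonthlyCharges": -0.012948,
    "TotalCharges": -0.004579,
    "Contract_Two year": -0.029851,
}


def format_row(row: dict) -> str:
    """Render one observation as 'name=value' pairs, one per line."""
    return "\n".join(f"{name}={value}" for name, value in row.items())


row_text = format_row(first_row)
print(row_text)
```

The resulting `row_text` string could then be passed to the chain in place of `df`.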

## Router Chain
The idea is to add context about the data, the ML model, and the desired output for the end user.

In [16]:
machinelearning_template = """You are a Machine Learning expert. \
You are great at answering questions about Machine Learning outputs in a concise \
and easy-to-understand manner. \
You are able to easily interpret local SHAP values from Machine Learning outputs.

The end user is sales personnel, and they are supposed to \
be able to act on the output you provide. Make it actionable and easy to understand.

Here is a question:
{input}"""

In [17]:
prompt_infos = [
    {
        "name": "machine learning",
        "description": "Good for answering questions about Machine Learning",
        "prompt_template": machinelearning_template
    }
]

In [18]:
from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.prompts import PromptTemplate

In [27]:
llm = ChatOpenAI(temperature=0.1, model=llm_model, openai_api_key=openai_api_key)

In [21]:
destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = ChatPromptTemplate.from_template(template=prompt_template)
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain

destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)

In [22]:
default_prompt = ChatPromptTemplate.from_template("{input}")
default_chain = LLMChain(llm=llm, prompt=default_prompt)

In [23]:
MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
language model select the model prompt best suited for the input. \
You will be given the names of the available prompts and a \
description of what the prompt is best suited for. \
You may also revise the original input if you think that revising \
it will ultimately lead to a better response from the language model.

<< FORMATTING >>
Return a markdown code snippet with a JSON object formatted to look like:
```json
{{{{
    "destination": string \ name of the prompt to use or "DEFAULT"
    "next_inputs": string \ a potentially modified version of the original input
}}}}
```

REMEMBER: "destination" MUST be one of the candidate prompt \
names specified below OR it can be "DEFAULT" if the input is not \
well suited for any of the candidate prompts.
REMEMBER: "next_inputs" can just be the original input \
if you don't think any modifications are needed.

<< CANDIDATE PROMPTS >>
{destinations}

<< INPUT >>
{{input}}

<< OUTPUT (remember to include the ```json)>>"""
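
The template above asks the model for a fenced JSON object with two keys. To illustrate what a parser has to do with such a reply, here is a standard-library sketch. It is not LangChain's `RouterOutputParser` implementation; `parse_router_reply` and the sample reply are made up for illustration.

```python
import json
import re

# Three backticks, built programmatically so this example stays readable.
FENCE = "`" * 3

# A made-up model reply in the format the router template asks for.
reply = (
    "Here is my choice:\n"
    + FENCE + "json\n"
    + '{"destination": "machine learning", "next_inputs": "What is SHAP?"}\n'
    + FENCE
)


def parse_router_reply(text: str) -> dict:
    """Extract and decode the first fenced JSON object in text (sketch only)."""
    pattern = FENCE + r"json\s*(.*?)" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no fenced JSON block found")
    parsed = json.loads(match.group(1))
    for key in ("destination", "next_inputs"):
        if key not in parsed:
            raise ValueError(f"missing key: {key}")
    return parsed


result = parse_router_reply(reply)
print(result["destination"])
```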

In [24]:
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(
    destinations=destinations_str
)
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)

router_chain = LLMRouterChain.from_llm(llm, router_prompt)
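
The router template is formatted twice: once by `MULTI_PROMPT_ROUTER_TEMPLATE.format(...)` and, assuming the default f-string-style template format, once more when `PromptTemplate` fills in `{input}`. That is why the JSON braces are written as `{{{{` and the input placeholder as `{{input}}`. The following plain-Python sketch mimics the two passes with `str.format` on a shortened, made-up template.

```python
# Shortened stand-in for the router template (illustration only).
ROUTER_SKETCH = (
    "Reply with:\n"
    "{{{{\n"
    '    "destination": string\n'
    "}}}}\n"
    "Candidates:\n"
    "{destinations}\n"
    "Input:\n"
    "{{input}}"
)

# First pass: fills {destinations}; each doubled brace collapses to a single one.
stage_one = ROUTER_SKETCH.format(
    destinations="machine learning: Good for answering questions about Machine Learning"
)

# Second pass: fills {input}; the remaining doubled braces become literal braces.
stage_two = stage_one.format(input="What is SHAP?")

print(stage_one)
print("-----")
print(stage_two)
```

After the first pass the text still contains `{input}` and doubled JSON braces. After the second pass the braces are literal and the question is in place.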

In [25]:
chain = MultiPromptChain(router_chain=router_chain,
                         destination_chains=destination_chains,
                         default_chain=default_chain, verbose=True
                        )

In [29]:
chain.invoke("What is the most important feature for the first observation?")



> Entering new MultiPromptChain chain...
machine learning: {'input': 'What is the most important feature for the first observation?'}
> Finished chain.


{'input': 'What is the most important feature for the first observation?',
 'text': 'Based on the local SHAP-values, the most important feature for the first observation is [insert feature name]. This means that this feature has the highest impact on the predicted outcome for this specific observation. It is recommended that the sales personnel pay close attention to this feature when making decisions related to this observation.'}
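
The answer contains the placeholder `[insert feature name]`, most likely because the question passed to the chain did not include any SHAP values, so the model had nothing to rank. One possible fix is to rank the values in Python and put the result into the question. A minimal sketch, using a few first-row values from the `df.head()` output (columns hidden behind `...` are not included, so the real ranking may differ); `top_features` is a hypothetical helper.

```python
# A few SHAP values for the first observation, copied from the df.head() output.
shap_values = {
    "tenure": -0.03179,
    "MonthlyCharges": -0.012948,
    "TotalCharges": -0.004579,
    "StreamingMovies_No internet service": -0.021643,
    "Contract_Two year": -0.029851,
}


def top_features(values: dict, k: int = 3) -> list:
    """Return the k (name, value) pairs with the largest absolute SHAP value."""
    return sorted(values.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


ranked = top_features(shap_values)

# Build a question that carries the data the model needs to answer.
question = (
    "These are the largest local SHAP values for one customer: "
    + ", ".join(f"{name} ({value:+.4f})" for name, value in ranked)
    + ". Which feature matters most, and what should sales do about it?"
)
print(question)
```

The resulting `question` string could then be passed to `chain.invoke(question)` in place of the original question.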