"Models" refer to (Large) Language Models.
"Prompts" refer to the style of creating inputs to pass to the models.
"Parsers" refer to taking the output of the models and parsing it into a more structured format that can be used downstream.

When you build an application on top of an LLM, you reuse the same model over and over.
We repeatedly prompt the model and parse its output.
`LangChain` provides a simple set of abstractions for this loop.
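To make the three roles concrete, here is a plain-Python sketch of that loop with no LangChain involved. `fake_model` and `run_pipeline` are hypothetical names, and the "model" is a keyword check standing in for a real LLM call so the example runs offline.

```python
import json
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call, so the sketch runs offline."""
    if "flew off" in prompt:
        return '{"sentiment": "negative"}'
    return '{"sentiment": "positive"}'


def run_pipeline(template: str, model: Callable[[str], str], **inputs: str) -> dict:
    """One pass of the loop: build the prompt, call the model, parse the reply."""
    prompt = template.format(**inputs)  # prompt: fill in the placeholders
    raw = model(prompt)  # model: returns a plain string
    return json.loads(raw)  # parser: string -> Python dict


template = "Return the sentiment of this review as JSON: {review}"
for review in ["The lid flew off!", "Works great."]:
    print(run_pipeline(template, fake_model, review=review))
```

LangChain's value is in standardizing each of these three pieces so they can be swapped and reused.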

In [3]:
# !pip install openai

# OpenAI

In [4]:
!pip show openai

Name: openai
Version: 1.2.3
Summary: The official Python library for the openai API
Home-page: 
Author: 
Author-email: OpenAI <support@openai.com>
License: 
Location: C:\Users\A2644752\Sandbox\PRIVATE-RAI\env\Lib\site-packages
Requires: anyio, distro, httpx, pydantic, tqdm, typing-extensions
Required-by: instructor


Version 1.0.0 of the `openai` Python library introduced breaking API changes.
See [here](https://github.com/openai/openai-python/discussions/742) for more details.

In [5]:
import os

from openai.lib.azure import AzureOpenAI
from openai.types.chat.chat_completion import ChatCompletion

Available API versions can be found [here](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference).
- [Preview](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview)
- [Stable](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable)

In [6]:
api_version = "2023-12-01-preview"

In [7]:
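# AzureOpenAI reads AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT from the environment
# when they are not passed explicitly. Note that this name shadows the `openai` package.
openai = AzureOpenAI(api_version=api_version)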

Azure OpenAI routes requests to a deployment, so the `model` argument takes the deployment name rather than the model name.
In my account that means `model: str = "gpt-35-turbo-16k"` instead of `model: str = "gpt-3.5-turbo"`.

In `openai>=1.0.0`, `openai.ChatCompletion.create` was replaced with:
- `AzureOpenAI(...).chat.completions.create` if using AzureOpenAI
- `OpenAI(...).chat.completions.create` if using OpenAI

In [8]:
def get_completion(prompt: str, model: str = "gpt-35-turbo-16k") -> str:
    """Given a prompt and model, return the content of the response."""
    messages = [{"role": "user", "content": prompt}]
    response: ChatCompletion = openai.chat.completions.create(
        messages=messages,
        model=model,
        temperature=0,
    )
    return response.choices[0].message.content

In [9]:
get_completion("What is 1+1?")

'1+1 equals 2.'

To motivate LangChain's abstractions for models, prompts, and parsers, suppose you receive an email from a customer written in a language or dialect other than standard English.

In [10]:
email = """
Arr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

We can translate the message by asking the LLM to rewrite it in a given style.

In [11]:
style = """American English \
in a calm and respectful tone
"""

In [12]:
prompt = f"""Translate the text \
that is delimited by triple backticks
into a style that is {style}.
text: ```{email}```"""

In [13]:
print(prompt)

Translate the text that is delimited by triple backticks
into a style that is American English in a calm and respectful tone
.
text: ```
Arr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse, the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!
```
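The stray `.` on its own line in the printed prompt comes from the trailing newline at the end of `style`. A small sketch, using only the style string from above, showing where the newline ends up and how `str.strip()` removes it:

```python
style = """American English \
in a calm and respectful tone
"""

# The backslash joins the first two lines, but the final newline is kept,
# so it lands right before the period in the prompt.
sentence = f"into a style that is {style}."
print(repr(sentence))  # 'into a style that is American English in a calm and respectful tone\n.'

# Stripping surrounding whitespace keeps the sentence on one line.
clean_sentence = f"into a style that is {style.strip()}."
print(clean_sentence)  # into a style that is American English in a calm and respectful tone.
```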


In [14]:
response = get_completion(prompt)

In [15]:
response

"I'm really frustrated that my blender lid flew off and made a mess of my kitchen walls with smoothie! And to make things even worse, the warranty doesn't cover the cost of cleaning up my kitchen. I could really use your help at this moment, my friend."

We can do this in a more convenient way with `LangChain`.

# LangChain

## Model

In [16]:
from langchain.chat_models.azure_openai import AzureChatOpenAI

In [17]:
chat = AzureChatOpenAI(model="gpt-35-turbo-16k", temperature=0, api_version=api_version)
chat

AzureChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x00000211053F98D0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x0000021105419C90>, model_name='gpt-35-turbo-16k', temperature=0.0, openai_api_key='<redacted>', openai_proxy='', azure_endpoint='https://a3d0dvexroai01t.openai.azure.com/', openai_api_version='2023-12-01-preview', openai_api_type='azure')

In [18]:
type(chat)

langchain_community.chat_models.azure_openai.AzureChatOpenAI

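`AzureChatOpenAI` is a `LangChain` chat model. It wraps the `openai` client (see the `client` attribute above) and exposes LangChain's common chat-model interface, so it can be combined with prompt templates and output parsers.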

## Prompt Template

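We reuse the same prompt text, but as a plain string rather than an `f-string`, so `{style}` and `{email}` stay as placeholders to be filled in later.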

In [19]:
template = """Translate the text \
that is delimited by triple backticks
into a style that is {style}.
text: ```{email}```"""

In [20]:
from langchain.prompts import ChatPromptTemplate

`ChatPromptTemplate` provides `classmethod`s for constructing a template; `from_template` builds one from a single string.

In [21]:
prompt_template = ChatPromptTemplate.from_template(template=template)

In [22]:
prompt_template

ChatPromptTemplate(input_variables=['email', 'style'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['email', 'style'], template='Translate the text that is delimited by triple backticks\ninto a style that is {style}.\ntext: ```{email}```'))])

In [23]:
type(prompt_template)

langchain_core.prompts.chat.ChatPromptTemplate

In [24]:
prompt_template.messages[0].prompt

PromptTemplate(input_variables=['email', 'style'], template='Translate the text that is delimited by triple backticks\ninto a style that is {style}.\ntext: ```{email}```')

We can see that the `prompt_template` has identified two input variables, `email` and `style`.
These were surrounded by braces (`{}`) in the template string.

In [25]:
prompt_template.messages[0].input_variables

['email', 'style']
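`input_variables` lists the placeholder names in alphabetical order. A rough stdlib sketch of how such names can be found in an f-string-style template uses `string.Formatter().parse`; this illustrates the idea and is not necessarily how LangChain implements it. `template_variables` is a hypothetical helper.

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """Return the sorted, de-duplicated {placeholder} names in a template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out.
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)


template = "Translate the text into a style that is {style}.\ntext: {email}"
print(template_variables(template))  # ['email', 'style']
```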

We fill in the input variables with `format_messages`.
If a required input variable is missing, a `KeyError` is raised.

In [26]:
prompt_template.format_messages()

KeyError: 'style'
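Plain `str.format` behaves the same way, raising `KeyError` on the first missing name. If you would rather see every missing variable at once, a small wrapper can check before formatting; `format_checked` is a hypothetical helper, not part of LangChain.

```python
from string import Formatter


def format_checked(template: str, **values: str) -> str:
    """Fill a template, reporting every missing variable at once (hypothetical helper)."""
    required = {field for _, field, _, _ in Formatter().parse(template) if field}
    missing = sorted(required - set(values))
    if missing:
        raise KeyError(f"missing input variables: {missing}")
    return template.format(**values)


template = "Translate the text into a style that is {style}.\ntext: {email}"
try:
    format_checked(template)
except KeyError as err:
    print(err.args[0])  # missing input variables: ['email', 'style']
```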

In [27]:
messages = prompt_template.format_messages(style=style, email=email)

In [28]:
messages

[HumanMessage(content="Translate the text that is delimited by triple backticks\ninto a style that is American English in a calm and respectful tone\n.\ntext: ```\nArr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse, the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!\n```")]

In [29]:
type(messages)

list

In [30]:
type(messages[0])

langchain_core.messages.human.HumanMessage

In [31]:
chat(messages=messages)

AIMessage(content="I'm really frustrated that my blender lid flew off and made a mess of my kitchen walls with smoothie! And to make things even worse, the warranty doesn't cover the cost of cleaning up my kitchen. I could really use your help at this moment, my friend.")

We can translate a reply back into the customer's pirate style with the same template.

In [32]:
reply = """Hey there customer, \
the warranty does not cover \
cleaning expenses for your kitchen \
because it's your fault that \
you misused your blender \
by forgetting to put the lid on before \
starting the blender. \
Tough luck! See ya!"""

In [33]:
reply_style = """
A polite tone \
that speaks in English Pirate"""

In [34]:
reply_messages = prompt_template.format_messages(style=reply_style, email=reply)

In [35]:
print(reply_messages[0].content)

Translate the text that is delimited by triple backticks
into a style that is 
A polite tone that speaks in English Pirate.
text: ```Hey there customer, the warranty does not cover cleaning expenses for your kitchen because it's your fault that you misused your blender by forgetting to put the lid on before starting the blender. Tough luck! See ya!```


In [36]:
response = chat(reply_messages)
print(response.content)

Arr, me hearty customer, the warranty be not coverin' yer cleaning expenses fer yer galley 'cause 'tis yer fault ye misused yer blender by forgettin' to put the lid on afore startin' the blender. Tis a tough luck, matey! Fare thee well!


As prompts grow longer and more complex, templates are easier to maintain and reuse than `f-strings`: the same `prompt_template` translated both the customer's email and the service reply, just with different inputs.

## Output Parsers

We can also use the LLM to extract specific information from text and format it as JSON, aiming for a dictionary like this:

In [37]:
{
    "gift": False,
    "delivery_days": 5,
    "price_value": "pretty affordable!"
}

{'gift': False, 'delivery_days': 5, 'price_value': 'pretty affordable!'}

In [38]:
review = """
This leaf blower is pretty amazing. It has four settings \
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversay present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

In [39]:
review_template = """
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price, \
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""

In [40]:
from langchain.prompts import ChatPromptTemplate

In [41]:
prompt_template = ChatPromptTemplate.from_template(template=review_template)

In [42]:
prompt_template.input_variables

['text']

In [43]:
messages = prompt_template.format_messages(text=review)

In [44]:
chat = AzureChatOpenAI(model="gpt-35-turbo-16k", temperature=0, api_version=api_version)
response = chat(messages=messages)
response

AIMessage(content='{\n  "gift": false,\n  "delivery_days": 2,\n  "price_value": ["It\'s slightly more expensive than the other leaf blowers out there, but I think it\'s worth it for the extra features."]\n}')

By default, the LLM returns the response content as a string, so dictionary methods such as `.get` fail on it.
An `OutputParser` can convert the string into another Python object.

In [45]:
# This raises an error because the response.content is a string.
response.content.get("gift")

AttributeError: 'str' object has no attribute 'get'
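Because this particular reply happens to be bare JSON, the standard library's `json.loads` can already turn it into a dictionary. Nothing in the prompt guarantees that format, though (a model may wrap the JSON in a markdown code fence or add prose around it), which is where an output parser helps.

```python
import json

# The response.content string shown earlier, copied as a Python string.
content = (
    '{\n  "gift": false,\n  "delivery_days": 2,\n  "price_value": '
    '["It\'s slightly more expensive than the other leaf blowers out there, '
    'but I think it\'s worth it for the extra features."]\n}'
)

review_data = json.loads(content)  # JSON text -> dict (JSON false -> Python False)
print(review_data.get("gift"))           # False
print(review_data.get("delivery_days"))  # 2
```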

## Parse the LLM output string into a Python dictionary

In [46]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

We define what we want the response to look like using the `ResponseSchema`.
This is similar to a `pydantic.Field` object.

In [47]:
gift_schema = ResponseSchema(
    name="gift",
    description="Was the item purchased as a gift for someone else? \
    Answer True if yes, False if not or unknown.",
)

In [48]:
delivery_days_schema = ResponseSchema(
    name="delivery_days",
    description="How many days did it take for the product \
    to arrive? If this information is not found, output -1.",
)

In [49]:
price_value_schema = ResponseSchema(
    name="price_value",
    description="Extract any sentences about the value or price, \
    and output them as a comma separated Python list.",
)

In [50]:
response_schemas = [
    gift_schema,
    delivery_days_schema,
    price_value_schema,
]

We create a `StructuredOutputParser` using the `response_schemas`.

In [51]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas=response_schemas)

In [52]:
type(output_parser)

langchain.output_parsers.structured.StructuredOutputParser

`get_format_instructions` renders the response schemas as instructions for the LLM, which we can include in the prompt.

In [53]:
format_instructions = output_parser.get_format_instructions()

In [54]:
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"gift": string  // Was the item purchased as a gift for someone else?     Answer True if yes, False if not or unknown.
	"delivery_days": string  // How many days did it take for the product     to arrive? If this information is not found, output -1.
	"price_value": string  // Extract any sentences about the value or price,     and output them as a comma separated Python list.
}
```
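The instructions are ordinary text built from each schema's `name`, `type` (which defaults to `string`), and `description`. A stdlib sketch that reproduces the shape of the printout above; `Schema` and `format_instructions` are hypothetical stand-ins for `ResponseSchema` and `get_format_instructions`, and LangChain's actual wording may change between versions.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    """Hypothetical stand-in for ResponseSchema: field name, description, type hint."""
    name: str
    description: str
    type: str = "string"


def format_instructions(schemas: list[Schema]) -> str:
    """Render one '"name": type  // description' line per field inside a json block."""
    fence = "`" * 3  # built at runtime so this example can sit inside a code block
    body = "\n".join(f'\t"{s.name}": {s.type}  // {s.description}' for s in schemas)
    return (
        "The output should be a markdown code snippet formatted in the following schema, "
        f'including the leading and trailing "{fence}json" and "{fence}":\n\n'
        f"{fence}json\n{{\n{body}\n}}\n{fence}"
    )


schemas = [
    Schema("gift", "Was the item purchased as a gift for someone else?"),
    Schema("delivery_days", "How many days did it take for the product to arrive?"),
]
print(format_instructions(schemas))
```

Since every field here is declared as `string`, the LLM is told to return strings, which explains values like `"delivery_days": "2"` later on.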


In [55]:
review_template_2 = """
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price, \
and output them as a comma separated Python list.

text: {text}

{format_instructions}"""

In [56]:
prompt = ChatPromptTemplate.from_template(template=review_template_2)

In [58]:
messages = prompt.format_messages(text=review, format_instructions=format_instructions)

In [59]:
print(messages[0].content)


For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price, and output them as a comma separated Python list.

text: 
This leaf blower is pretty amazing. It has four settings candle blower, gentle breeze, windy city, and tornado. It arrived in two days, just in time for my wife's anniversay present. I think my wife liked it so much she was speechless. So far I've been the only one using it, and I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.


The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```

In [60]:
response = chat(messages=messages)

In [61]:
print(response.content)

```json
{
	"gift": false,
	"delivery_days": "2",
	"price_value": "It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."
}
```


The LLM follows the requested format much more closely, but the response is still a string.
`output_parser.parse` extracts the JSON from the fenced `json` code block and parses it into a Python dictionary.
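In spirit, the parse step finds the fenced block, loads the JSON inside it, and checks that every schema name is present. A minimal sketch under that assumption; `extract_fenced_json` is a hypothetical name, and LangChain's own parser raises its own exception type rather than `ValueError`.

```python
import json
import re

# Matches a block fenced by three backticks, optionally tagged "json";
# written as `{3} so no literal fence appears in this snippet.
FENCED_JSON = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def extract_fenced_json(text: str, expected_keys: list[str]) -> dict:
    """Load the JSON inside a fenced block (or bare JSON) and check the expected keys."""
    match = FENCED_JSON.search(text)
    payload = match.group(1) if match else text  # fall back to bare JSON
    data = json.loads(payload)
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys in model output: {missing}")
    return data


fence = "`" * 3
reply = fence + 'json\n{\n\t"gift": false,\n\t"delivery_days": "2"\n}\n' + fence
print(extract_fenced_json(reply, ["gift", "delivery_days"]))  # {'gift': False, 'delivery_days': '2'}
```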

In [62]:
output_dict = output_parser.parse(text=response.content)

In [63]:
output_dict

{'gift': False,
 'delivery_days': '2',
 'price_value': "It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."}

In [64]:
type(output_dict)

dict

In [65]:
output_dict.get("delivery_days")

'2'
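`delivery_days` comes back as the string `'2'` because every field in these format instructions was declared as `string`, the default type. One option is to coerce values after parsing; `to_int` below is a hypothetical helper that falls back to the prompt's `-1` sentinel when the value cannot be converted. (Passing a `type` to `ResponseSchema`, as in the next example, changes the hint shown to the LLM, but the model may still not comply.)

```python
output_dict = {
    "gift": False,
    "delivery_days": "2",
    "price_value": "It's slightly more expensive than the other leaf blowers out there, "
    "but I think it's worth it for the extra features.",
}


def to_int(value, default: int = -1) -> int:
    """Convert a parsed value to int, falling back to the prompt's -1 sentinel."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


delivery_days = to_int(output_dict.get("delivery_days"))
print(delivery_days + 1)  # 3
```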

# Example: Glassdoor

Extract information about the job, such as:
- Job title
- Job requirements
- Salary
- etc.

Job Posting: [EpochGeo Senior Geospatial Data Scientist](https://www.glassdoor.com/job-listing/senior-geospatial-data-scientist-epochgeo-JV_IC1154429_KO0,32_KE33,41.htm?jl=1009032709165&src=GD_JOB_AD&uido=8B6EC23126E2A08BC7C88CF7E97323BD&ao=1110586&jrtk=5-yul1-1-1hj5n71lqirpp801-a82b63c926f1e7c6---6NYlbfkN0D54-4lvN4MbGSrnJofeUamWKLLgEBO_tp8_vrY9hHPomI097Pi-K_0gT_S7OouJqmg_XOLI1XleosxlDnZhQaFqDHNuTHvSSjyRL5izvOeA3W_H0PV6ptDqSGfjTkCTKFD-wDd8Us1dGp1AZTgt0V8a8iirU-VZFNAUfPZnsAAvqIXTZC5Y3-fTYqCB8WUWOgXWn1GXD_fvOePVmxxqSKYjhaVAril0jdaZ2rLWwPJdKUQUx79FTYO7gUexzNPwHCDo9Gbqg-AM5__cdImSRuMR0EjO3b9XKciAjuOqbfzcMM0tqd8ZWkG3abWtkUy46TDqIHiHzKV1cfhhx-6mDOokV6Ju0AlgYGpt9RKetfqr1Xd9UKO0bp8UIE9JIw9fV7AgfsqJmTGQ6whHcKbtTKBMN62m0YOPrsyn_3VfnEflZL38nBTP7rlSI-Gx9TfXlNitYSkZYTF7w12q1ENWOj9t8AdYIojeL-lpHVQsdUlF0l9t6mrgstBCUNwyoBv7Vwisin6h-KN1cDMCZRvnbTu9Aqwor5rz6LfYhIAgcoGQYbh58BtM94BpdDNEOi1huaJE3ZbmpF-9Z1omv5BPPN4APm0jm2jQX-_2UyXHkds_uRZzW3I_6rm&cs=1_3269eb64&s=353&t=REC_JOBS&pos=101&cpc=983919718F9DC6F6&guid=0000018ccb73867ebc29def00db18b26&jobListingId=1009032709165&ea=1&vt=w&cb=1704220395502&ctt=1704221634537)

In [66]:
posting = """Senior Geospatial Data Scientist

EpochGeo is looking for a senior level Geospatial Data Scientist in Tampa, Florida to support one of the most innovative, exciting, and growing offices that is rooted in data rich projects in the national security space.

Are you a technical analyst who loves to work with data, find patterns in it, and then turn it into actionable intelligence? Are you a data scientist who loves to draw out insights from data, and work with analysts and engineers to turn those insights into automated workflows? If so, please reach out to us at EpochGeo to chat.

Location-based big data can be overwhelming, inaccessible, and noisy. EpochGeo is a data services firm specializing in supporting the full spectrum data cycle: beginning with scalable data storage and ending in producing impactful analytics. Our developers, analysts, and data scientists have a proven track record applying open-source innovative technology, data science, and actionable analytics to inform customers’ data driven decisions.

Your background likely includes:

10+ years of experience working in US DoD and IC environments, preferably with CENTCOM or other COCOM elements.
Education in Geography, Data Science, Economics, Computer Science, etc.
Military and/or DoD/IC experience in any of the following disciplines: data science, computer vision, target development, geospatial, or statistical analysis.
Experience with spatial data and map based visualization tools along with data science fundamentals such as data engineering, coding in python, SQL, data structures and familiarity with Keras, Tensorflow, Neo4j, or PyTorch frameworks preferred.
Defining critical intelligence topics, initiating comprehensive research efforts, and analyzing intelligence information to assess developments, trends, and threat implications.
Familiarity with machine learning and statistical methods applications to solve complex problem sets and achieve successful outcomes.
Some experience querying, creating, and integrating data flows from an Elasticsearch instance to include building dashboards and presenting information via Kibana or Plotly Dash.
Applied GIS work and familiarity with ESRI products and/or QGIS.
What you will be doing:

Communicating with the project team, user community, and high level client leadership to understand requirements and demonstrate iterative progress.
Exploiting multiple data and information environments for insights that can be automated at scale with developer assistance.
Working with a multi-discipline team to build automated tipping and cueing and/or alert generation tools.
Preparing Intelligence, data visualization products and finished intelligence and presenting them to a variety of audiences including senior leadership.
Exploiting multiple geospatial big data environments for insights that can be automated at scale with developer assistance.
Scripting in python to automate geospatial workflow, connecting to existing APIs to feed algorithms and/or front-end UIs.
Conceiving and constructing working prototypes and first of their kind proof of concepts to push the art of the possible against real world problem sets.
Working with data to enable software engineers to build upon automated tipping and cueing, alert generation tools, while developing trends to demonstrate states of readiness and forecasting future values.
Working in a dynamic environment with ever evolving requirements.
Bonus:

Master’s degree in Data Science, Geography, Statistics, Machine Learning, or related technical field
Additional:

Must hold a TS/SCI clearance and be willing to submit for a CI Poly
Benefits Include:

100% health care premiums covered, FSA, HSA
401k: 6% match, immediate vesting
14 holidays: All 11 Federal holidays, day after Thanksgiving, 24 DEC, 31 DEC
PTO: 4 weeks annually
3 week sabbatical every 3 years with the company
Job Type: Full-time

Pay: From $160,000.00 per year

Benefits:

401(k)
401(k) matching
Dental insurance
Flexible schedule
Flexible spending account
Health insurance
Health savings account
Life insurance
Paid time off
Parental leave
Professional development assistance
Referral program
Tuition reimbursement
Vision insurance
Compensation package:

Bonus opportunities
Experience level:

10 years
Schedule:

8 hour shift
Application Question(s):

Please list years of experience scripting in python:
Security clearance:

Top Secret (Required)
Ability to Commute:

Tampa, FL 33621 (Required)
Work Location: In person"""

In [68]:
template = """
For the following text, extract the following information:

job_title: The title of the job posting.

job_requirements: The minimum requirements to apply for the job.

salary: The expected salary for the job.

text: {posting}

{format_instructions}
"""

In [69]:
prompt = ChatPromptTemplate.from_template(template=template)

In [70]:
prompt

ChatPromptTemplate(input_variables=['format_instructions', 'posting'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['format_instructions', 'posting'], template='\nFor the following text, extract the following information:\n\njob_title: The title of the job posting.\n\njob_requirements: The minimum requirements to apply for the job.\n\nsalary: The expected salary for the job.\n\ntext: {posting}\n\n{format_instructions}\n'))])

In [71]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

In [76]:
job_title_schema = ResponseSchema(
    name="job_title",
    description="The title of the job posting.",
    type="string",
)

job_requirements_schema = ResponseSchema(
    name="job_requirements",
    description="The minimum requirements to apply for the job.",
    type="list[string]",
)

salary_schema = ResponseSchema(
    name="salary",
    description="The expected salary for the job.",
    type="float",
)
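The `type` strings are hints written into the format instructions; as far as I can tell, `StructuredOutputParser.parse` checks that the keys are present but does not validate value types. A minimal sketch of checking them yourself after parsing; `check_job_output` is hypothetical, and the sample dicts are illustrative rather than actual model output.

```python
def check_job_output(job: dict) -> list[str]:
    """Return type problems in a parsed job dict; an empty list means it looks right."""
    problems = []
    if not isinstance(job.get("job_title"), str):
        problems.append("job_title should be a string")
    reqs = job.get("job_requirements")
    if not (isinstance(reqs, list) and all(isinstance(r, str) for r in reqs)):
        problems.append("job_requirements should be a list of strings")
    salary = job.get("salary")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(salary, bool) or not isinstance(salary, (int, float)):
        problems.append("salary should be a number")
    return problems


# Illustrative inputs, not actual model output.
good = {
    "job_title": "Senior Geospatial Data Scientist",
    "job_requirements": ["Education in Geography, Data Science, Economics, Computer Science, etc."],
    "salary": 160000.0,
}
bad = {**good, "salary": "From $160,000.00 per year"}
print(check_job_output(good))  # []
print(check_job_output(bad))   # ['salary should be a number']
```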

In [77]:
response_schemas = [
    job_title_schema,
    job_requirements_schema,
    salary_schema,
]

In [78]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas=response_schemas)

In [82]:
type(output_parser)

langchain.output_parsers.structured.StructuredOutputParser

In [84]:
format_instructions = output_parser.get_format_instructions()

In [88]:
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"job_title": string  // The title of the job posting.
	"job_requirements": list[string]  // The minimum requirements to apply for the job.
	"salary": float  // The expected salary for the job.
}
```


In [86]:
messages = prompt.format_messages(posting=posting, format_instructions=format_instructions)

In [87]:
messages

[HumanMessage(content='\nFor the following text, extract the following information:\n\njob_title: The title of the job posting.\n\njob_requirements: The minimum requirements to apply for the job.\n\nsalary: The expected salary for the job.\n\ntext: Senior Geospatial Data Scientist\n\nEpochGeo is looking for a senior level Geospatial Data Scientist in Tampa, Florida to support one of the most innovative, exciting, and growing offices that is rooted in data rich projects in the national security space.\n\nAre you a technical analyst who loves to work with data, find patterns in it, and then turn it into actionable intelligence? Are you a data scientist who loves to draw out insights from data, and work with analysts and engineers to turn those insights into automated workflows? If so, please reach out to us at EpochGeo to chat.\n\nLocation-based big data can be overwhelming, inaccessible, and noisy. EpochGeo is a data services firm specializing in supporting the full spectrum data cycle:

In [89]:
response = chat(messages=messages)

In [91]:
print(response.content)

```json
{
	"job_title": "Senior Geospatial Data Scientist",
	"job_requirements": [
		"10+ years of experience working in US DoD and IC environments, preferably with CENTCOM or other COCOM elements.",
		"Education in Geography, Data Science, Economics, Computer Science, etc.",
		"Military and/or DoD/IC experience in any of the following disciplines: data science, computer vision, target development, geospatial, or statistical analysis.",
		"Experience with spatial data and map based visualization tools along with data science fundamentals such as data engineering, coding in python, SQL, data structures and familiarity with Keras, Tensorflow, Neo4j, or PyTorch frameworks preferred.",
		"Defining critical intelligence topics, initiating comprehensive research efforts, and analyzing intelligence information to assess developments, trends, and threat implications.",
		"Familiarity with machine learning and statistical methods applications to solve complex problem sets and achieve successful

In [92]:
job_output = output_parser.parse(text=response.content)

In [93]:
job_output

{'job_title': 'Senior Geospatial Data Scientist',
 'job_requirements': ['10+ years of experience working in US DoD and IC environments, preferably with CENTCOM or other COCOM elements.',
  'Education in Geography, Data Science, Economics, Computer Science, etc.',
  'Military and/or DoD/IC experience in any of the following disciplines: data science, computer vision, target development, geospatial, or statistical analysis.',
  'Experience with spatial data and map based visualization tools along with data science fundamentals such as data engineering, coding in python, SQL, data structures and familiarity with Keras, Tensorflow, Neo4j, or PyTorch frameworks preferred.',
  'Defining critical intelligence topics, initiating comprehensive research efforts, and analyzing intelligence information to assess developments, trends, and threat implications.',
  'Familiarity with machine learning and statistical methods applications to solve complex problem sets and achieve successful outcomes.',
