#### Setting up langchain

In [36]:
!pip install 'langchain[openai]'

Collecting tiktoken<0.4.0,>=0.3.2 (from langchain[openai])
  Downloading tiktoken-0.3.3-cp311-cp311-macosx_11_0_arm64.whl (706 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 706.4/706.4 kB 8.7 MB/s eta 0:00:00
Installing collected packages: tiktoken
  Attempting uninstall: tiktoken
    Found existing installation: tiktoken 0.4.0
    Uninstalling tiktoken-0.4.0:
      Successfully uninstalled tiktoken-0.4.0
Successfully installed tiktoken-0.3.3


In [37]:
!pip install openai



#### Setting up OpenAI model for inference

In [71]:
import os
from pprint import pprint

from langchain.llms import OpenAI
from dotenv import load_dotenv
load_dotenv()
llm = OpenAI(openai_api_key=os.environ['OPENAI_APIKEY'])


#### Let's play around

In [82]:
# Calling the LLM directly with a prompt
llm.predict("What would be a good company name for a company that makes colorful socks?")

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')).


'\n\nBrightSox.'

In [87]:
# To automate this flow and reuse one prompt with different inputs,
# we can create a PromptTemplate.

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template("What would be a good company name for a company that makes {product}?")
prompt = prompt_template.format(product="colorful candies")
print(prompt)
print("--- Approach 1 ---")
print(llm.predict(prompt))
print("--- Approach 2 ---")

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt_template)
print(chain.run("colourful tees"))


What would be a good company name for a company that makes colorful candies?
--- Approach 1 ---


Sweet Splashes Candy Co.
--- Approach 2 ---


Colorful Tees Co.


#### But what are these chains?

`Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components. Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components.` 
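
To make the idea concrete, here is a minimal sketch of what the simplest chain does conceptually: format a template, then pass the result to a model. This is not LangChain's implementation. `make_chain` and `fake_llm` are hypothetical names introduced only for illustration, and `fake_llm` just echoes the prompt so the example runs without an API key.

```python
from typing import Callable


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[..., str]:
    """Return a function that formats the template, then calls the model."""

    def run(**variables: str) -> str:
        prompt = template.format(**variables)  # step 1: fill in the template
        return llm(prompt)                     # step 2: pass the prompt to the model

    return run


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model: it only echoes the prompt it received.
    return f"[model saw] {prompt}"


name_chain = make_chain(
    "What is a good name for {company} that makes {product}?", fake_llm
)
print(name_chain(company="ABC Startup", product="colorful socks"))
```

Swapping `fake_llm` for any callable that takes a prompt string and returns text keeps the rest of the sketch unchanged, which is the composability that chains are meant to provide.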


In [88]:
prompt = PromptTemplate(
    input_variables=["company", "product"],
    template="What is a good name for {company} that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run({
    'company': "ABC Startup",
    'product': "colorful socks"
    }))

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')).




CozyCrayons.


#### A few drawbacks


LLMs are wonderful tools for text generation. They help us generate articles and summaries, and can even help with creative ideas. The challenge is that an LLM's knowledge is frozen at the time it was trained, so it handles newer information poorly and can sometimes hallucinate.

In [72]:
# The LangChain framework was created after this OpenAI model was trained.
pprint(llm.predict("what is langchain?"))

('\n'
 '\n'
 'Langchain is a blockchain-based language learning platform that enables '
 'users to learn a foreign language by earning tokens as rewards for '
 'completing language-related tasks. The platform uses gamification elements '
 'to encourage users to interact with and learn from one another. Langchain '
 'also provides users with a marketplace to buy and sell language-related '
 'services.')


Even with these weaknesses, LLMs are still very powerful. We can supply them with inputs and information to base their answers on, which helps us get the desired output.

For our LangChain example, it is important to note that the OpenAI model was last trained on data up to September 2021, while LangChain was launched in October 2022.
Let us see what happens if we give the LLM some context.

In [76]:
langchain_para ="LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). As a language model integration framework, LangChain's use-cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.[1]\
            LangChain was launched in October 2022 as an open source project by Harrison Chase, while working at machine learning startup Robust Intelligence. The project quickly garnered popularity, with improvements from hundreds of contributors on GitHub, trending discussions on Twitter, lively activity on the project's Discord server, many YouTube tutorials, and meetups in San Francisco and London. In April 2023, the new startup raised over $20 million in funding at a valuation of at least $200 million from venture firm Sequoia Capital, a week after announcing a $10 million seed investment from Benchmark.[2][3]"

pprint(f"Here is the information about langchain from wiki:\n{langchain_para}")

print("------------------")
pprint(
    llm.predict(
        f"what is langchain?\
            Answer from below paragraph:\
            {langchain_para}\
            "
        )
    )

('Here is the information about langchain from wiki:\n'
 'LangChain is a framework designed to simplify the creation of applications '
 'using large language models (LLMs). As a language model integration '
 "framework, LangChain's use-cases largely overlap with those of language "
 'models in general, including document analysis and summarization, chatbots, '
 'and code analysis.[1]            LangChain was launched in October 2022 as '
 'an open source project by Harrison Chase, while working at machine learning '
 'startup Robust Intelligence. The project quickly garnered popularity, with '
 'improvements from hundreds of contributors on GitHub, trending discussions '
 "on Twitter, lively activity on the project's Discord server, many YouTube "
 'tutorials, and meetups in San Francisco and London. In April 2023, the new '
 'startup raised over $20 million in funding at a valuation of at least $200 '
 'million from venture firm Sequoia Capital, a week after announcing a $10 '
 'milli
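
One detail is visible in the output above: the run of spaces between `[1]` and `LangChain`. A backslash at the end of a line inside a string literal removes the newline but keeps the indentation of the following line, so that indentation ends up inside the prompt. The sketch below shows the effect and one possible way to build the same kind of grounded prompt without it. `build_grounded_prompt` is a hypothetical helper, not part of LangChain.

```python
def build_grounded_prompt(question: str, context: str) -> str:
    """Join the question, an instruction, and the context with single newlines."""
    return "\n".join([question, "Answer from the paragraph below:", context.strip()])


# Backslash continuation inside a string: the newline is dropped,
# but the leading spaces of the next line become part of the string.
continued = "first\
    second"
print(repr(continued))

context = (
    "LangChain is a framework designed to simplify the creation of "
    "applications using large language models (LLMs)."
)
print(build_grounded_prompt("what is langchain?", context))
```

The extra spaces did not stop the model from answering here, but they do count toward the prompt's length, so trimming them is a cheap habit.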

In [77]:
import pandas as pd
df = pd.read_csv('data/Resume/Resume.csv')
first_resume = df.iloc[0]['Resume_str']
pprint(first_resume)

('         HR ADMINISTRATOR/MARKETING ASSOCIATE\n'
 '\n'
 'HR ADMINISTRATOR       Summary     Dedicated Customer Service Manager with '
 '15+ years of experience in Hospitality and Customer Service Management.   '
 'Respected builder and leader of customer-focused teams; strives to instill a '
 'shared, enthusiastic commitment to customer service.         '
 'Highlights         Focused on customer satisfaction  Team management  '
 'Marketing savvy  Conflict resolution techniques     Training and '
 'development  Skilled multi-tasker  Client relations specialist           '
 'Accomplishments      Missouri DOT Supervisor Training Certification  '
 'Certified by IHG in Customer Loyalty and Marketing by Segment   Hilton '
 'Worldwide General Manager Training Certification  Accomplished Trainer for '
 'cross server hospitality systems such as    Hilton OnQ  ,   Micros    Opera '
 'PMS   , Fidelio    OPERA    Reservation System (ORS) ,   Holidex    '
 'Completed courses and seminars in custo

In [79]:
pprint(
    llm.predict(
        f"Summarise from the experience of this resume in 2-3 lines\
            Resume content: {first_resume}\
            "
        )
    )

('\n'
 '\n'
 'This resume highlights the experience of a HR Administrator and Marketing '
 'Associate with 15+ years of experience in Hospitality and Customer Service '
 'Management. The individual has certification in customer loyalty and '
 'marketing and has expertise in conflict resolution, training and '
 'development, and client relations.')


WOW! This is very helpful.
But we will soon run into scale issues if we are working with a really large resume whose text goes beyond the context window. Likewise, if we are looking for candidates with particular skills, we can't send all of the available resumes to the LLM.

To deal with this we resort to chunking: we split each resume into chunks of a fixed size, with some overlap between adjacent chunks.
Then, at query time, we find the most relevant chunks and pass only those to the LLM.
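
As a rough illustration of size-based chunking with overlap, here is a plain-Python sketch that slides a fixed-size window over the text. It is a simplification and not how LangChain's splitters work internally. `chunk_with_overlap` is a hypothetical helper name.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_with_overlap(sample, chunk_size=10, overlap=3):
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk repeats the last three characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk, provided it is no longer than the overlap.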

#### Diving into chunking

In [81]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text = df.iloc[0]['Resume_str']
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20, separators=["."])
chunks = splitter.split_text(text)

print(len(chunks))
pprint(chunks[0])
print("-----------")
pprint(chunks[1])
print("-----------")
pprint(chunks[2])


68
('HR ADMINISTRATOR/MARKETING ASSOCIATE\n'
 '\n'
 'HR ADMINISTRATOR       Summary     Dedicated Customer')
-----------
('Dedicated Customer Service Manager with 15+ years of experience in '
 'Hospitality and Customer Servic')
-----------
('and Customer Service Management.   Respected builder and leader of '
 'customer-focused teams; strives')
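
Once the text is chunked, the next step is to pick the chunks most relevant to a query. How relevance is scored is not specified at this point, so the sketch below uses plain word overlap purely as an illustrative stand-in. `tokenize` and `top_chunks` are hypothetical helpers, and the sample chunks are short excerpts of the resume text shown earlier.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct word-like tokens."""
    return set(re.findall(r"[a-z0-9+]+", text.lower()))


def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks that share the most words with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(chunk)), chunk) for chunk in chunks]
    # Highest score first; sorted() is stable, so ties keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked[:k] if score > 0]


resume_chunks = [
    "Dedicated Customer Service Manager with 15+ years of experience",
    "Respected builder and leader of customer-focused teams",
    "Missouri DOT Supervisor Training Certification",
]
for chunk in top_chunks("customer service experience", resume_chunks):
    print(chunk)
```

Only the selected chunks would then be placed into the prompt, which keeps the request within the context window.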


#### Let's get started

We start by setting up our data.
Please follow the steps in `1_postgres_setup.ipynb`.