In [3]:
from google.colab import drive
drive.mount('/content/drive/')

%cd /content/drive/MyDrive/apziva-residency-projects/llm-prompting_potential-talents/

Mounted at /content/drive/
/content/drive/MyDrive/apziva-residency-projects/llm-prompting_potential-talents


In [5]:
# install libraries
!pip install langchain
!pip install langchain_community



# Introduction

This notebook complements `llm-prompting_potential-talents.ipynb` in the same folder and explores an alternative way to prompt LLMs that avoids the computational issues shown there. When I loaded pretrained LLMs manually and prompted them with the `transformers` package, I ran into a number of errors, both locally and remotely, that could only be solved by purchasing compute units in Google Colab. Here, I bypass the computational part of the workflow by delegating it to `langchain`, whose `HuggingFaceHub` wrapper sends the prompt to models hosted on Hugging Face instead of loading them in the notebook.

This notebook has the same goal as its companion. Once again, I work with the dataset from the Apziva project "Potential talents" ([github](https://github.com/robpetrosino/c0vEM5oxUa6ndKp8)). The goal of that project was to streamline the first selection round by ranking candidates according to the semantic similarity between their job title and one or more keywords such as “full-stack software engineer”, “engineering manager”, or “aspiring human resources”.
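
To make "semantic similarity" concrete, here is a minimal sketch of a cosine-similarity ranking over plain word-count vectors. It is an illustrative baseline under my own assumptions: the helper names `tokenize`, `cosine_similarity`, and `rank_titles` are hypothetical, and word counts only capture lexical overlap, so this is not a description of what the LLM does internally.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only words present in both strings contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_titles(titles, search_term):
    """Return (title, score) pairs sorted by descending similarity."""
    scored = [(t, round(cosine_similarity(t, search_term), 2)) for t in titles]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# A few job titles taken from the dataset, for illustration only.
example_titles = [
    "Aspiring Human Resources Professional",
    "HR Senior Specialist",
    "Native English Teacher at EPIK (English Program in Korea)",
]
ranking = rank_titles(example_titles, "aspiring human resources")
for title, score in ranking:
    print(f"{score:.2f}  {title}")
```

Note that a count-based baseline like this scores "HR Senior Specialist" at zero because it shares no words with the search term, which is exactly the kind of case where a semantic model is expected to do better.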

In [6]:
# retrieving token from colab secrets
from google.colab import userdata
token = userdata.get('token')

# libraries
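from langchain_community.llms import HuggingFaceHub  # community integrations live in langchain_community
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain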
from langchain.chains import ConversationChain


# Mistral-7B-Instruct-v0.3

In [16]:
mistral = "mistralai/Mistral-7B-Instruct-v0.3"

# HuggingFaceHub (from the langchain package) sends the prompt to the model hosted on Hugging Face,
# so the model is never loaded with my own computational resources
mistral_model = HuggingFaceHub(
    repo_id = mistral,
    huggingfacehub_api_token = token,
    model_kwargs = {
        'max_new_tokens': 29000, # upper bound on generated tokens, sized for this dataset; adjust it for other datasets
        'return_full_text': False,
        'temperature': 0.2
    }
)

With the model set up through LangChain, I use the **exact** same prompt I attempted in the previous notebook. As the output shows, the prompt gives reasonable-looking results, which suggests that the prompt engineering itself was sound.

In [17]:
import pandas as pd

jobs_df = pd.read_csv("./data/raw/potential-talents_aspiring-humanresources_seeking-human-resources.csv")
jobs = list(jobs_df.job_title)
search_term = 'aspiring human resources'

# the f prefix makes this an f-string, so {jobs} and {search_term} are interpolated into the prompt
prompt = f"""

You will be provided with the list called {jobs} and the single string called {search_term}.
For each string contained in {jobs}, follow the steps below:

1. Tokenize the string.
2. Convert it into a vector and call it job_title.
3. Tokenize the string {search_term}.
4. Convert it into a vector and call it search_term.
5. Calculate the cosine similarity between search_term and job_title with four digit precision.
6. Round the cosine similarity value to 2 decimal digits.

Your response must be in json format with the first key "job_title" being job title for each candidate, and the second key "similarity" being cosine similarity score for each candidate.

Do not add any explanation, note, comment, reference, or breakdown in your response.
Before providing the response, sort each line by cosine similarity value in descending order.
"""

convo_mistral = ConversationChain(llm=mistral_model, verbose=False)
response_mistral = convo_mistral.predict(input=prompt)
print(response_mistral)



```json
[
    {
        "job_title": "SVP, CHRO, Marketing & Communications, CSR Officer | ENGIE | Houston | The Woodlands | Energy | GPHR | SPHR",
        "similarity": 0.99
    },
    {
        "job_title": "Aspiring Human Resources Professional",
        "similarity": 0.98
    },
    {
        "job_title": "2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional",
        "similarity": 0.98
    },
    {
        "job_title": "HR Senior Specialist",
        "similarity": 0.97
    },
    {
        "job_title": "People Development Coordinator at Ryan",
        "similarity": 0.97
    },
    {
        "job_title": "Seeking Human Resources HRIS and Generalist Positions",
        "similarity": 0.97
    },
    {
        "job_title": "Student at Humber College and Aspiring Human Resources Generalist",
        "similarity": 0.97
    },
    {
        "job_title": "Aspiring Human Resources Specialist",
        "similarity": 0.96
    },
    {
   

Very neat, and also very fast, even on the free CPU runtime (rather than the costly GPU accelerator)!

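This is quite impressive. The free tier is rate-limited, though: the same kind of request to Phi-3-mini (itself a smaller model, at roughly 3.8B parameters) hit the hourly free usage limit, and going beyond that limit requires a paid subscription to Hugging Face: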


# Phi

In [8]:
phi = "microsoft/Phi-3-mini-128k-instruct"

# HuggingFaceHub (from the langchain package) sends the prompt to the model hosted on Hugging Face,
# so the model is never loaded with my own computational resources
phi_model = HuggingFaceHub(
    repo_id = phi,
    huggingfacehub_api_token = token,
    model_kwargs = {
        'max_new_tokens': 200,
        'return_full_text': False,
        'temperature': 0.2
    }
)



In [10]:
import pandas as pd

jobs_df = pd.read_csv("./data/raw/potential-talents_aspiring-humanresources_seeking-human-resources.csv")
jobs = list(jobs_df.job_title)
search_term = 'aspiring human resources'

prompt_phi = [
    {"role": "system", "content": "You are an AI assistant specialized in NLP tasks."},
    {"role": "user", "content":
f"""You will be provided with the list called {jobs} and the single string called {search_term}.
For each string contained in {jobs}, follow the steps below:

1. Tokenize the string.
2. Convert it into a vector and call it job_title.
3. Tokenize the string {search_term}.
4. Convert it into a vector and call it search_term.
5. Calculate the cosine similarity between search_term and job_title.
6. Round the cosine similarity value to 2 decimal digits.

Your response must stick to the following format: "Job title of the candidate: job. Similarity with the search term: cosine similarity value."

Do not add any explanation, note, comment, reference, or breakdown in your response.
Before providing the response, sort each line by cosine similarity value in descending order.
"""}
]

convo_phi = ConversationChain(llm=phi_model, verbose=True)
response_phi = convo_phi.predict(input=prompt_phi)
print(response_phi)





> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: [{'role': 'system', 'content': 'You are an AI assistant specialized in NLP tasks.'}, {'role': 'user', 'content': 'You will be provided with the list called [\'2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional\', \'Native English Teacher at EPIK (English Program in Korea)\', \'Aspiring Human Resources Professional\', \'People Development Coordinator at Ryan\', \'Advisory Board Member at Celal Bayar University\', \'Aspiring Human Resources Specialist\', \'Student at Humber College and Aspiring Human Resources Generalist\', \'HR Senior Specialist\', \'Student at Humber College and Aspiring

HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-128k-instruct (Request ID: vZzfphbQv41Suls253C40)

Rate limit reached. You reached free usage limit (reset hourly). Please subscribe to a plan at https://huggingface.co/pricing to use the API at this rate
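
Apart from the rate limit, the trace above shows that the list of role/content dictionaries was pasted into the `Human:` slot as a Python literal, since the chain formats its input into a plain-text prompt. A minimal sketch of one way to avoid this is to flatten the messages into a single string first. The helper name `flatten_messages` and the `System:`/`User:` labels are my own illustrative assumptions, not the model's official chat template.

```python
def flatten_messages(messages):
    """Join chat-style messages into one plain-text prompt string.

    The "System:"/"User:" labels are an illustrative assumption,
    not the official chat template of any particular model.
    """
    labels = {"system": "System", "user": "User", "assistant": "Assistant"}
    parts = []
    for message in messages:
        # Fall back to a capitalized role name for roles not listed above.
        label = labels.get(message["role"], message["role"].capitalize())
        parts.append(f"{label}: {message['content'].strip()}")
    return "\n\n".join(parts)


# Shortened stand-in for the prompt_phi list used in the cell above.
example_messages = [
    {"role": "system", "content": "You are an AI assistant specialized in NLP tasks."},
    {"role": "user", "content": "Rank the job titles by similarity to the search term."},
]
flat_prompt = flatten_messages(example_messages)
print(flat_prompt)
```

The resulting string could then be passed as `input=` to `predict`, so that the model sees readable instructions rather than the printed representation of a list.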

# Concluding remarks

The project ran into a number of computational issues that I could not solve, either locally or remotely, without paying for compute. A possible workaround is the `langchain` library, whose `HuggingFaceHub` wrapper hands the computation over to Hugging Face's hosted inference servers and returns impressive results to the notebook.

Prompting Mistral this way was smooth and flexible, and the model returned its results in the desired format, `json`, which can then be processed according to the current needs.
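
As a rough illustration of how the `json` reply could be put to use, the sketch below pulls the complete records out of a reply string. It assumes the reply looks like the Mistral output above (flat objects with the keys `job_title` and `similarity`, possibly wrapped in a Markdown fence and possibly cut off mid-list). The function name `parse_ranking` and the sample reply are hypothetical.

```python
import json
import re


def parse_ranking(reply):
    """Extract complete job_title/similarity records from a model reply.

    Each flat object is decoded on its own, so a surrounding Markdown
    fence or a reply that stops mid-list does not break the parsing.
    Assumes job titles contain no curly braces.
    """
    records = []
    # Flat objects have no nested braces, so this pattern matches one object at a time.
    for chunk in re.findall(r"\{[^{}]*\}", reply):
        try:
            record = json.loads(chunk)
        except json.JSONDecodeError:
            continue  # skip fragments that are not valid JSON
        if "job_title" in record and "similarity" in record:
            records.append(record)
    return records


fence = "`" * 3  # builds the fence marker without writing it literally here
sample_reply = (
    fence + "json\n[\n"
    '    {"job_title": "Aspiring Human Resources Professional", "similarity": 0.98},\n'
    '    {"job_title": "HR Senior Specialist", "similarity": 0.97},\n'
    "    {\n"  # reply cut off, as in the output shown earlier
)
parsed = parse_ranking(sample_reply)
print(parsed)
```

From there, something like `pd.DataFrame(parsed)` would turn the records into a table that can be sorted or filtered like the original dataset.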