<a href="https://www.kaggle.com/code/christiansamo/capstone-project-gen-ai-samo-christian?scriptVersionId=246074585" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# **Capstone project: building a toy HR LLM assistant**

### Key functionalities:

* Provide the list of available jobs to prospective applicants.
* Give applicants relevant details on job opportunities upon request, through a set of frequently asked questions (FAQ).


### Goals:
* The main goal is to ease the user experience for job seekers visiting the careers website.
  

# Overview of the steps to build our HR assistant 

1. Create a fake company along with fictitious careers data.
2. Populate a vector database and build the assistant.


## 1. Creating a fake company along with fictitious careers data.

Our fake company will be **Global Logistics LLC.**

Context:
Global Logistics LLC. is a logistics company specializing in international freight forwarding and transportation solutions. It offers a comprehensive set of supply chain management services worldwide, ensuring reliable and efficient logistics support for businesses.


In [1]:
!pip uninstall -qqy jupyterlab

In [2]:
# install Google's Python SDK to get access to the Gemini models
!pip install -Uq "google-genai==1.7.0" 

from google import genai
genai.__version__

'1.7.0'

In [3]:
# import useful libraries
from IPython.display import HTML, Markdown, display

from google.genai import types

# retry automatically when the API returns 429 (quota reached) or 503 (service unavailable)
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

genai.models.Models.generate_content = retry.Retry(
    predicate=is_retriable)(genai.models.Models.generate_content)

# read the API key stored in Kaggle secrets
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
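
The cell above wraps `generate_content` so that calls failing with a 429 or 503 error are retried. As a rough illustration of the idea only, here is a minimal predicate-driven retry decorator. It is not how `google.api_core` implements `retry.Retry`; `FakeAPIError`, `retry_on`, and the delay values are hypothetical stand-ins chosen for this sketch.

```python
import functools
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(exc):
    # Same idea as the notebook's predicate: retry only on 429 and 503.
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


def retry_on(predicate, max_attempts=5, initial_delay=0.01, multiplier=2.0):
    """Retry the wrapped function with growing delays while predicate(exc) is true."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on non-retriable errors or when attempts run out.
                    if not predicate(exc) or attempt == max_attempts:
                        raise
                    time.sleep(delay)
                    delay *= multiplier

        return wrapper

    return decorator


calls = {"count": 0}


@retry_on(is_retriable)
def flaky_generate():
    # Simulated call: fails twice with a quota error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAPIError(429)
    return "ok"


result = flaky_generate()
```

Errors that the predicate rejects (for example a 400) are re-raised immediately, which is why the predicate is kept narrow.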

In [4]:
# System prompt to generate synthetic data for six departments in our company.


descpt1 = '''You are a Human Resources assistant for the company Global Logistics LLC, and you will generate six departments for that company. 
             Each department should feature two sections: 
             the first section will be "dept_title", where you place the name of the current department.
             the second section will be "job_title", where you list the types of jobs that department needs to run normally. 
             Enumerate at least five jobs for each department, and include job titles only, no descriptions.
             '''
llm = "gemini-2.0-flash" 

In [5]:
# create a class to be used as response schema in the model
import typing_extensions as typing

class Dept(typing.TypedDict):
    dept_title: str
    job_title: list[str]

In [6]:
# the json library will be used to convert the model's output into a Python list of dicts
import json

# run the model to generate department names and job titles
message = "Apply system instructions" 

client = genai.Client(api_key=GOOGLE_API_KEY)

config1 = types.GenerateContentConfig(system_instruction=descpt1,
                                     response_mime_type="application/json",
                                      response_schema=list[Dept],
                                     )

response1 = client.models.generate_content(model= llm,
                                          contents= message,
                                          config=config1,
                                          )
data1_list = response1.text
data1_list_of_dict = json.loads(response1.text)
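
Even with a response schema, it can be worth checking the parsed result before the later cells rely on it. The sketch below is one possible check, assuming the `dept_title` / `job_title` shape of the `Dept` class above; `validate_departments` and the sample string are hypothetical and written for illustration only.

```python
import json


def validate_departments(raw_json, min_jobs=5):
    """Parse the model's JSON text and check that it matches the expected Dept shape."""
    departments = json.loads(raw_json)
    if not isinstance(departments, list):
        raise ValueError("expected a JSON list of departments")
    for dept in departments:
        if not isinstance(dept, dict):
            raise ValueError(f"expected an object, got {dept!r}")
        title = dept.get("dept_title")
        jobs = dept.get("job_title")
        if not isinstance(title, str) or not title:
            raise ValueError(f"missing or invalid dept_title in {dept!r}")
        if not isinstance(jobs, list) or not all(isinstance(j, str) for j in jobs):
            raise ValueError(f"job_title must be a list of strings for {title!r}")
        if len(jobs) < min_jobs:
            raise ValueError(f"{title!r} lists only {len(jobs)} jobs")
    return departments


# Sample text shaped like the model output (hand-written here, not a real response).
sample = """[
  {"dept_title": "Finance",
   "job_title": ["Chief Financial Officer", "Accountant", "Financial Analyst",
                 "Payroll Specialist", "Auditor"]}
]"""
departments = validate_departments(sample)
```

In the notebook this could be called as `validate_departments(response1.text)` in place of the bare `json.loads`.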


In [7]:
print('type of: data1_list ', type(data1_list))
print('type of: data1_list_of_dict ', type(data1_list_of_dict))

type of: data1_list  <class 'str'>
type of: data1_list_of_dict  <class 'list'>


In [8]:
# department names generated by Gemini
for i in range(len(data1_list_of_dict)):
    print(data1_list_of_dict[i]['dept_title'])


Human Resources
Operations
Finance
Sales and Marketing
Information Technology
Customer Service


In [9]:
# print the raw JSON text without the enclosing square brackets
print(data1_list[1:-1])


  {
    "dept_title": "Human Resources",
    "job_title": [
      "HR Manager",
      "Recruitment Specialist",
      "Benefits Administrator",
      "Training and Development Coordinator",
      "Employee Relations Specialist"
    ]
  },
  {
    "dept_title": "Operations",
    "job_title": [
      "Operations Manager",
      "Logistics Coordinator",
      "Warehouse Supervisor",
      "Supply Chain Analyst",
      "Transportation Planner"
    ]
  },
  {
    "dept_title": "Finance",
    "job_title": [
      "Chief Financial Officer",
      "Accountant",
      "Financial Analyst",
      "Payroll Specialist",
      "Auditor"
    ]
  },
  {
    "dept_title": "Sales and Marketing",
    "job_title": [
      "Sales Manager",
      "Marketing Coordinator",
      "Business Development Manager",
      "Account Executive",
      "Market Research Analyst"
    ]
  },
  {
    "dept_title": "Information Technology",
    "job_title": [
      "IT Manager",
      "Network Administrator",
      "System

In [10]:
# System prompt (one-shot) to generate a detailed job description
# for each job title listed in the departments.

descpt2 = '''\
You are a Human Resources assistant for the company Global Logistics LLC.
You will generate a detailed job description for each pair you are given: a department title and a job title.
Apply the same structure as the following example in your response.
Note that the contents of the fields "About Us", "Why Join Us?", "Equal Opportunity Employer",
"Assessment", and "Ready to Apply?" must not differ semantically from those provided in the example. 
However, you are free to rephrase them in another style.
All the other fields should be tailored specifically to the job title provided.
You can also change the "Location" field to any city in the United States of America.
            
            
EXAMPLE:

### Department Title: Information and technology

### Job Title: Software Engineer, Backend (Platform Team) 

### Location: New York, NY (Hybrid Option Available) or Fully Remote (US-Based)

### About Us: 
Global Logistics LLC. is logistics company, specializing in international freight forwarding and transportation solutions. Global Logistics LLC offers a comprehensive set of supply chain management services worldwide, ensuring reliable and efficient logistics support for businesses.At Global Logistics LLC, . We believe in collaboration, innovation, user-satisfaction. We're a passionate team of 500+ people dedicated to making a difference, and we're looking for talented individuals to join us on our journey.

### Opportunity:
We're seeking a creative and driven Backend Software Engineer to join our growing Platform Team. 
You'll play a crucial role in designing, building, and scaling the core services that power our business.
This isn't just about writing code; it's about solving complex problems, building resilient systems, 
and directly impacting the experience of our customers. If you're excited about building robust infrastructure 
and collaborating with a talented team, this is the role for you!

### Responsabilities:
* Design, develop, test, deploy, maintain, and improve backend services and APIs.
* Collaborate closely with product managers, designers, and other engineers to define features and build powerful, scalable solutions.
* Write clean, maintainable, and well-tested code.
* Participate in code reviews, providing and receiving constructive feedback.
* Troubleshoot and debug production issues, ensuring system stability and performance.
* Contribute to architectural decisions and help shape the future of our platform.
* Mentor junior engineers and contribute to a culture of technical excellence.

### Qualifications:
Required:
* 3+ years of professional software development experience, focusing on backend systems.
* Proficiency in at least one modern backend language (e.g., Python, Go, Java, Node.js, Ruby).
* Experience designing and building RESTful APIs.
* Solid understanding of database technologies (SQL and/or NoSQL, e.g., PostgreSQL, MongoDB).
* Experience with cloud platforms (AWS, GCP, or Azure).
* Familiarity with version control systems (e.g., Git).
* Strong problem-solving skills and ability to work independently and collaboratively.
* Excellent communication skills.

Preferred:
* Experience with containerization and orchestration (Docker, Kubernetes).
* Knowledge of microservices architecture.
* Experience with message queues (e.g., Kafka, RabbitMQ).
* Experience with CI/CD pipelines.
* Contributions to open-source projects.
* Bachelor's degree in Computer Science or a related field (or equivalent practical experience).
 
### Why Join Us?
Impact: Work on meaningful projects that directly affect our users and business.
Growth: Opportunities for professional development, learning stipends, and career progression.
Culture: A collaborative, supportive, and inclusive environment where your voice is heard.
Benefits: Competitive salary, comprehensive health insurance (medical, dental, vision), generous paid time off (PTO), 401(k) plan with company match, parental leave.
Perks: Flexible work arrangements (hybrid/remote), team events etc ...

### Compensation:
* The expected salary range for this position is $120,000 - $160,000 USD annually, 
depending on experience level and location. This role may also be eligible for bonus if applicable.

### Equal Opportunity Employer:
Global Logistics LLC. is an equal opportunity employer. We celebrate diversity and are committed to
creating an inclusive environment for all employees. We encourage applications from all qualified 
individuals regardless of race, religion, color, national origin, gender, sexual orientation, age, 
marital status, veteran status, or disability status.

### Assessment 
Evaluation of qualified candidates for this position may include a substantive assessment, 
such as a written test, which will be followed by a competency-based interview by phone/teleconference or 
face-to-face.

### Ready to Apply?
If you're excited about this opportunity, please apply via Email Address: "hr_careers@globallogistics.com". 
We encourage you to include a cover letter briefly explaining why you're interested in this role and
We look forward to hearing from you!
 '''


In [11]:
# run the model to generate one job description per (department, job title) pair

config2 = types.GenerateContentConfig(system_instruction=descpt2,
                                     response_mime_type="application/json",
                                      )
data2_list = []

for dept_entry in data1_list_of_dict:
    dept = dept_entry['dept_title']
    for job in dept_entry['job_title']:
        message = f"Apply system instructions to the following pair: department title is: {dept} and job title is: {job}"
        response2 = client.models.generate_content(model=llm,
                                                   contents=message,
                                                   config=config2,
                                                  )
        data2_list.append(response2.text)
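
The nested loop above issues one request per (department, job title) pair. A small sketch of the same traversal without any API call can help confirm how many requests to expect; `iter_dept_job_pairs`, `build_message`, and the two-department sample are hypothetical and only mirror the structure shown earlier.

```python
def iter_dept_job_pairs(departments):
    """Yield one (department title, job title) pair per job."""
    for dept in departments:
        for job in dept["job_title"]:
            yield dept["dept_title"], job


def build_message(dept, job):
    """Build the user message sent alongside the system instructions."""
    return (
        "Apply system instructions to the following pair: "
        f"department title is: {dept} and job title is: {job}"
    )


# Two departments copied from the output shown earlier in the notebook.
sample_departments = [
    {"dept_title": "Human Resources",
     "job_title": ["HR Manager", "Recruitment Specialist", "Benefits Administrator",
                   "Training and Development Coordinator",
                   "Employee Relations Specialist"]},
    {"dept_title": "Operations",
     "job_title": ["Operations Manager", "Logistics Coordinator",
                   "Warehouse Supervisor", "Supply Chain Analyst",
                   "Transportation Planner"]},
]

pairs = list(iter_dept_job_pairs(sample_departments))
messages = [build_message(dept, job) for dept, job in pairs]
```

With six departments of five jobs each, the same traversal yields the 30 requests reported in the next cell.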
        


In [12]:
print('Number of job descriptions generated: ', len(data2_list))

# Strip the first and last character (the enclosing braces) from each JSON string;
# the resulting list of strings is our fake HR data.
documentss = [doc[1:-1] for doc in data2_list]

Number of job descriptions generated:  30
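
Each entry of `documentss` is the body of a JSON object with its outer braces removed, so rendering it with `Markdown` shows raw key/value pairs. If a more readable view is wanted, one option is to put the braces back, parse the object, and emit one heading per field. This is a sketch under the assumption that each response is a single flat JSON object with string values; `job_to_markdown` is a hypothetical helper.

```python
import json


def job_to_markdown(stripped_body):
    """Rebuild the JSON object from a brace-stripped body and render it as Markdown."""
    fields = json.loads("{" + stripped_body + "}")
    sections = [f"### {name}\n{text}" for name, text in fields.items()]
    return "\n\n".join(sections)


# Hand-written sample in the same shape as one generated job description.
raw = '{"Job Title": "HR Manager", "Location": "Chicago, IL"}'
body = raw[1:-1]  # same slicing the notebook applies to each response
markdown_text = job_to_markdown(body)
```

In the notebook this could be used as `Markdown(job_to_markdown(documentss[0]))`.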


In [13]:
# inspect one of them
Markdown(documentss[0])


  "Department Title": "Human Resources",
  "Job Title": "HR Manager",
  "Location": "Chicago, IL",
  "About Us": "Global Logistics LLC. is logistics company, specializing in international freight forwarding and transportation solutions. Global Logistics LLC offers a comprehensive set of supply chain management services worldwide, ensuring reliable and efficient logistics support for businesses.At Global Logistics LLC, . We believe in collaboration, innovation, user-satisfaction. We're a passionate team of 500+ people dedicated to making a difference, and we're looking for talented individuals to join us on our journey.",
  "Opportunity": "We are seeking a dynamic and experienced HR Manager to lead our Human Resources department. The HR Manager will be responsible for overseeing all aspects of HR practices and processes, including talent acquisition, employee relations, performance management, training and development, and compliance. This role requires a strategic thinker with excellent leadership and communication skills, capable of driving HR initiatives that support our business objectives.",
  "Responsabilities": "* Develop and implement HR strategies and initiatives aligned with the overall business strategy.\n* Manage the full cycle recruitment process, including sourcing, interviewing, and hiring qualified candidates.\n* Oversee employee relations, addressing grievances, and resolving conflicts effectively.\n* Develop and implement performance management systems to drive employee engagement and productivity.\n* Design and deliver training and development programs to enhance employee skills and knowledge.\n* Ensure compliance with all applicable labor laws and regulations.\n* Manage employee benefits programs, including health insurance, retirement plans, and paid time off.\n* Maintain accurate employee records and HR documentation.\n* Provide guidance and support to managers and employees on HR-related matters.\n* Conduct regular audits of HR practices to identify areas for improvement.",
  "Qualifications": "Required:\n* Bachelor's degree in Human Resources, Business Administration, or a related field.\n* 5+ years of progressive HR experience, with at least 2 years in a management role.\n* Strong knowledge of HR principles, practices, and labor laws.\n* Excellent communication, interpersonal, and conflict resolution skills.\n* Proven ability to develop and implement HR strategies and initiatives.\n* Experience with HRIS systems and Microsoft Office Suite.\n* SHRM-CP/SCP or HRCI certification preferred.\n\nPreferred:\n* Master's degree in Human Resources or a related field.\n* Experience in the logistics or transportation industry.\n* Knowledge of compensation and benefits administration.\n* Experience with union negotiations.",
  "Why Join Us?": "Impact: Work on meaningful projects that directly affect our users and business.\nGrowth: Opportunities for professional development, learning stipends, and career progression.\nCulture: A collaborative, supportive, and inclusive environment where your voice is heard.\nBenefits: Competitive salary, comprehensive health insurance (medical, dental, vision), generous paid time off (PTO), 401(k) plan with company match, parental leave.\nPerks: Flexible work arrangements (hybrid/remote), team events etc ...",
  "Compensation": "* The expected salary range for this position is $90,000 - $120,000 USD annually, depending on experience level and location. This role may also be eligible for bonus if applicable.",
  "Equal Opportunity Employer": "Global Logistics LLC. is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We encourage applications from all qualified individuals regardless of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.",
  "Assessment": "Evaluation of qualified candidates for this position may include a substantive assessment, such as a written test, which will be followed by a competency-based interview by phone/teleconference or face-to-face.",
  "Ready to Apply?": "If you're excited about this opportunity, please apply via Email Address: \"hr_careers@globallogistics.com\". We encourage you to include a cover letter briefly explaining why you're interested in this role and We look forward to hearing from you!"


## 2. Populate a vector database and build the assistant.

In [14]:
!pip install -Uq "chromadb==0.6.3"
import chromadb
chromadb.__version__

'0.6.3'

In [15]:
for m in client.models.list():
    if "embedContent" in m.supported_actions:
        print(m.name)

models/embedding-001
models/text-embedding-004
models/gemini-embedding-exp-03-07
models/gemini-embedding-exp


In [16]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

from google.genai import types


# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})


class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]

In [17]:
DB_NAME = "job_description_db"

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=documentss, ids=[str(i) for i in range(len(documentss))])

In [18]:
db.count()

30
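
To make clear what the collection does with the documents, here is a toy in-memory analogue: it stores one vector per document and ranks documents by cosine similarity to the query vector. Everything in it is a stand-in for illustration. `toy_embed` is a bag-of-words counter, not a Gemini embedding, and `ToyCollection` is not Chroma's API.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyCollection:
    """Minimal in-memory analogue of a vector collection with add/count/query."""

    def __init__(self):
        self._docs = {}

    def add(self, documents, ids):
        for doc_id, doc in zip(ids, documents):
            self._docs[doc_id] = (doc, toy_embed(doc))

    def count(self):
        return len(self._docs)

    def query(self, query_text, n_results=1):
        query_vec = toy_embed(query_text)
        ranked = sorted(
            self._docs.values(),
            key=lambda item: cosine(query_vec, item[1]),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:n_results]]


collection = ToyCollection()
docs = [
    "Job Title: Accountant. Department Title: Finance.",
    "Job Title: Warehouse Supervisor. Department Title: Operations.",
    "Job Title: Network Administrator. Department Title: Information Technology.",
]
collection.add(documents=docs, ids=[str(i) for i in range(len(docs))])
top = collection.query("warehouse jobs in operations", n_results=1)
```

Here `top` holds the warehouse posting, because it is the only document sharing words with the query. A real embedding model also captures meaning beyond exact word overlap, which this toy version does not.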

In [19]:
# Switch to query mode when generating embeddings.
embed_fn.document_mode = False

In [20]:
# Uncomment the input() line below to ask the HR assistant your own question.

user_query = ''
# user_query = input("Ask your question to the Global Logistics HR assistant: ")

query_list = ["create a set of five frequently asked questions faq for job opportunities currently open at Global Logistics LLC. ",
              'how many job opportunities pay a salary above 150 000 USD?']
query_list.append(user_query)
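
This cell appends the user's question unconditionally and the next one removes it again when it is empty. A possible alternative, sketched here with a hypothetical `build_queries` helper, is to append the question only when it contains text:

```python
def build_queries(default_queries, user_query=""):
    """Return the default queries plus the user's query if it is not blank."""
    queries = list(default_queries)
    cleaned = user_query.strip()
    if cleaned:
        queries.append(cleaned)
    return queries


defaults = ["how many job opportunities pay a salary above 150 000 USD?"]
without_user = build_queries(defaults, "")
with_user = build_queries(defaults, "  Is remote work possible?  ")
```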

In [21]:
if query_list[-1]=='':
    query_list.pop()
print(query_list)

['create a set of five frequently asked questions faq for job opportunities currently open at Global Logistics LLC. ', 'how many job opportunities pay a salary above 150 000 USD?']
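
The loop in the next cell builds one prompt per query: instructions first, then the question, then every retrieved passage on its own `PASSAGE:` line. The same assembly can be sketched as a standalone function, which makes it easy to inspect a prompt without calling the model. `build_prompt` and the shortened instructions are hypothetical and used for illustration only.

```python
def build_prompt(query, passages, instructions):
    """Assemble a single-string prompt: instructions, question, then passages."""
    # Collapse newlines and repeated spaces so each item stays on one line.
    question = " ".join(query.split())
    lines = [instructions.strip(), "", f"QUESTION: {question}"]
    for passage in passages:
        one_line = " ".join(passage.split())
        lines.append(f"PASSAGE: {one_line}")
    return "\n".join(lines)


instructions = "Answer the question using only the reference passages below."
passages = [
    '"Job Title": "HR Manager",\n  "Location": "Chicago, IL"',
    '"Job Title": "Accountant"',
]
prompt = build_prompt("Which jobs are\nopen in Chicago?", passages, instructions)
```

Keeping each passage on a single line makes the boundaries between passages unambiguous in the final prompt.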


In [22]:
for query in query_list:
    
    result = db.query(query_texts=[query], n_results=30)
    [all_passages] = result["documents"]
    query_oneline = query.replace("\n", " ")
    
    # This prompt is where you can specify guidance on tone and on which topics the model should stick to or avoid.
    prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passages included below. 
    Be sure to respond in complete sentences, being comprehensive and including all relevant background information. 
    However, you are talking to a non-technical audience, so be sure to break down complicated concepts and 
    strike a friendly and conversational tone. If a passage is irrelevant to the answer, you may ignore it.
    If you can't find the requested information in the passages you received, you can answer: 
    "there are no job opportunities for your query", or you may rephrase that answer in a different style.   
    
    QUESTION: {query_oneline}
    """
    
    # Add the retrieved documents to the prompt.
    for passage in all_passages:
        passage_oneline = passage.replace("\n", " ")
        prompt += f"PASSAGE: {passage_oneline}\n"
    answer = client.models.generate_content(model="gemini-2.0-flash",
                                        contents=prompt)
    display(Markdown(answer.text))
    
    
    #Markdown(all_passages[0])

Okay, here are five frequently asked questions (FAQ) that job seekers might have about opportunities at Global Logistics LLC:

**FAQ**

*   **Q: What kind of company is Global Logistics LLC?**
    **A:** Global Logistics LLC is a logistics company specializing in international freight forwarding and transportation solutions, providing comprehensive supply chain management services worldwide to ensure reliable and efficient logistics support for businesses.

*   **Q: What is the work culture like at Global Logistics LLC?**
    **A:** At Global Logistics LLC, the work culture is collaborative, supportive, and inclusive, where every team member's voice is valued, and the team believes in collaboration, innovation, and user satisfaction, so it is a passionate team of over 500 people dedicated to making a difference.

*   **Q: What benefits does Global Logistics LLC offer its employees?**
    **A:** Global Logistics LLC provides a competitive salary, comprehensive health insurance that includes medical, dental, and vision coverage, generous paid time off (PTO), a 401(k) plan with a company match, and parental leave. They also provide perks such as flexible work arrangements (hybrid/remote), and team events.

*   **Q: How does Global Logistics LLC ensure a fair and inclusive workplace?**
    **A:** Global Logistics LLC is an equal opportunity employer, celebrating diversity and creating an inclusive environment for all employees by encouraging applications from all qualified individuals, regardless of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

*   **Q: How can I apply for a job at Global Logistics LLC?**
    **A:** If you are excited about an opportunity at Global Logistics LLC, you can apply by sending your application to the email address "hr\_careers@globallogistics.com", and it is recommended to include a cover letter briefly explaining your interest in the role.


Okay, let's see if we can find some high-paying opportunities!

Based on the information I have, there is one job opening with a salary above $150,000 USD:
The Chief Financial Officer position has an expected salary range of $250,000 - $400,000 USD annually.
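
A counting question like this one can also be cross-checked without the model by parsing the salary ranges directly. The sketch below assumes every "Compensation" field follows the `$low - $high` pattern seen in the example prompt; `parse_salary_range` and `count_jobs_above` are hypothetical helpers, and the three sample strings are shortened from compensation text shown in this notebook.

```python
import re

# Matches ranges such as "$90,000 - $120,000".
SALARY_RANGE = re.compile(r"\$(\d{1,3}(?:,\d{3})*)\s*-\s*\$(\d{1,3}(?:,\d{3})*)")


def parse_salary_range(compensation_text):
    """Return (low, high) in USD from text like '$90,000 - $120,000', or None."""
    match = SALARY_RANGE.search(compensation_text)
    if match is None:
        return None
    low, high = (int(part.replace(",", "")) for part in match.groups())
    return low, high


def count_jobs_above(compensations, threshold):
    """Count postings whose whole salary range lies above the threshold."""
    count = 0
    for text in compensations:
        salary = parse_salary_range(text)
        if salary is not None and salary[0] > threshold:
            count += 1
    return count


compensations = [
    "The expected salary range for this position is $90,000 - $120,000 USD annually.",
    "The expected salary range for this position is $250,000 - $400,000 USD annually.",
    "The expected salary range for this position is $120,000 - $160,000 USD annually.",
]
above = count_jobs_above(compensations, 150_000)
```

The comparison uses the lower bound of each range, so a posting at $120,000 - $160,000 is not counted. Comparing against the upper bound instead would be an equally defensible reading of "above 150 000 USD".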
