# Description:

This notebook aims to automate parts of the recruitment process by filtering potential job applicants based on specific criteria outlined in a job description.

Initially, the notebook sets up the necessary tools and files by installing required packages. The core logic consists of various utility functions to filter candidates based on criteria like CTC (Cost to Company or salary expectations), location, and notice period. Candidates' information is randomly generated, and the resultant shortlisted candidates are saved as a JSON file.

# Learning Objectives:
Efficiently filter candidate data based on CTC, location, and notice period using Python together with OpenAI's GPT-3.5 model.
This shows how the API can handle free-form inputs that would otherwise require many hand-written NLP rules.

Upload the `.env` file containing the `OPENAI_API_KEY` to the `/content/` directory.

In [None]:
# Libraries Installation
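# The notebook calls openai.ChatCompletion, which was removed in openai 1.0, so pin an older release
!pip install "openai<1.0"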

We set up our environment to use OpenAI's API for extracting information from job descriptions (JDs). Python is the primary language, and the `openai` library handles the interaction with OpenAI's services.


Read the "OPENAI_API_KEY" from the .env file

In [1]:
# Export your API Key to environment variable
# Upload the .env file to the directory "/content/"
!pip install python-dotenv
from dotenv import load_dotenv
load_dotenv()



True

In [2]:
import openai
import os
# Retrieve the API key from environment variable
openai_api_key = os.getenv("OPENAI_API_KEY")

# Set the API key for OpenAI
openai.api_key = openai_api_key
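
Because `os.getenv` returns `None` when a variable is missing, a small guard can surface a misconfigured `.env` file before the first API call. The following is a minimal sketch; the helper name `require_env` is hypothetical and not part of the notebook.

```python
import os

def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check that the .env file was uploaded")
    return value

# Example with an explicit mapping instead of the real environment
print(require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-example"}))
```

In the notebook, the call would be `require_env("OPENAI_API_KEY")`, which reads from the real environment.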

Upload the JSON file containing the job requirements, which was generated in Assignment 1.

In [5]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving requirements_output.json to requirements_output.json
User uploaded file "requirements_output.json" with length 1191 bytes


Download the zip file containing all the resumes from Google Drive. The code below saves it as `Webinar_resumes.zip`.

In [6]:
import requests
# https://drive.google.com/file/d/17V_o0Snt-Lj0FmegENPQ_rXpvWTWlZgQ/view?usp=sharing
def download_file_from_google_drive(file_id, destination):
    base_url = "https://drive.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(base_url, params={'id': file_id}, stream=True)
    token = get_confirm_token(response)

    if token:
        params = {'id': file_id, 'confirm': token}
        response = session.get(base_url, params=params, stream=True)

    save_response_content(response, destination)

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk:
                f.write(chunk)
# Example Usage
# file_id = '1HaM3IeK2-iqyZzeQmCnAzKLcF9NF-mSo'  # Replace with your file's ID
# destination = 'resume_data.zip'
file_id = '17V_o0Snt-Lj0FmegENPQ_rXpvWTWlZgQ'
destination = 'Webinar_resumes.zip'  # Replace with your desired file name and extension
download_file_from_google_drive(file_id, destination)
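
Google Drive can return an HTML page instead of the file (for example when sharing settings or quotas block the download), so it may be worth confirming that the saved file is a real zip archive before extracting it. A minimal sketch using the standard library; the helper name `looks_like_zip` is hypothetical.

```python
import os
import tempfile
import zipfile

def looks_like_zip(path):
    """Return True if the file exists and has a valid zip structure."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)

# Self-contained demonstration with temporary files
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.zip")
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("resume.pdf", b"dummy")
    bad = os.path.join(tmp, "bad.zip")
    with open(bad, "w") as f:
        f.write("<html>not a zip</html>")
    results = (looks_like_zip(good), looks_like_zip(bad))
    print(results)
```

In the notebook, the check would be applied to `destination` right after the download.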

# **Random Data Generation**

# `ResumeProcessor` Class Definition
#### **__init__** Method:
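Initializes an instance of the class and sets `resume_counter` to 1. The counter tracks how many resumes have been processed so far; the zip file path is passed later to `random_data_resumes`.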

#### **get_ctc_bounds** Method:
Uses the GPT-3.5 Turbo model to extract the lower and upper salary bounds from a string given in the format "XX-YY". It returns these bounds as floats.

```python
def get_ctc_bounds(self, ctc_range):
        messages = [
            {"role": "system", "content": "You are an assistant to a recruiter. You will be given a range consisting of lower and upper salary amount in dollars. \
            Return the upper and lower values in dollars in expanded form. No currency should be mentioned. Output should be in JSON like {'lower':lower salary, 'upper': upper salary}"},
            {"role": "user", "content": f"The salary range is {ctc_range}."},
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=1000
        )
        generated_texts = [choice.message["content"].strip() for choice in response["choices"]]
        # print("iniital output", generated_texts)

        pattern = r'(\d{1,3}(?:,\d{3})*)\s*(dollars)?'

        matches = re.findall(pattern, generated_texts[0])
        # print("matches", matches)
        if matches and len(matches) == 2:
            # re.findall returns (amount, unit) tuples because the pattern has two groups;
            # take the amount from each match and remove the commas
            lower_salary = float(matches[0][0].replace(',', ''))
            upper_salary = float(matches[1][0].replace(',', ''))
            return lower_salary, upper_salary
        # The reply did not contain exactly two amounts
        return None
```
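
Since the prompt asks for JSON, the reply can also be parsed structurally before falling back to a regular expression. The sketch below is an illustration, not the notebook's implementation; `parse_ctc_reply` is a hypothetical helper, and it assumes the reply is a dict-like string with `lower` and `upper` keys.

```python
import ast
import re

def parse_ctc_reply(reply):
    """Parse a reply such as "{'lower': 100000, 'upper': 150000}" into two floats."""
    try:
        # ast.literal_eval accepts both single- and double-quoted dict literals
        data = ast.literal_eval(reply)
        return (float(str(data["lower"]).replace(",", "")),
                float(str(data["upper"]).replace(",", "")))
    except (ValueError, SyntaxError, KeyError, TypeError):
        pass
    # Fallback: take the first two numbers, allowing thousands separators
    numbers = re.findall(r"\d+(?:,\d{3})*(?:\.\d+)?", reply)
    if len(numbers) >= 2:
        return tuple(float(n.replace(",", "")) for n in numbers[:2])
    return None

print(parse_ctc_reply("{'lower': 100000, 'upper': 150000}"))
print(parse_ctc_reply("lower: 100,000 dollars, upper: 150,000 dollars"))
```

Returning `None` when nothing can be parsed keeps the same contract as `get_ctc_bounds`, so callers should check for it before unpacking.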

#### **convert_notice_period_to_days** Method:
Also uses the GPT-3.5 Turbo model, this time to convert a notice period given in various formats into an integer number of days.

```python
def convert_notice_period_to_days(self, jd_notice):
        messages = [
            {"role": "system", "content": "You're assisting a recruiter. Convert the provided notice period into days. \
            1 month is typically 30 days, and 1 year is 365 days. The final output should be 'Days': Number of days."},
            {"role": "user", "content": f"Given notice period: {jd_notice}. Return the output in json with the field 'Days'"}
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=1000
        )
        generated_texts = [choice.message["content"].strip() for choice in response["choices"]]
        # print(generated_texts, generated_texts[0])
        return int(re.findall(r'\d+', generated_texts[0])[0])
```


#### **extract_and_rename** Method:
Extracts the contents of the zip file to a specified directory, and renames directories that contain spaces to use underscores. It returns the path where the resumes are located.
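
The renaming step can be illustrated on a temporary directory. This is a minimal sketch that uses `os.rename` rather than the copy-and-delete approach in the notebook; the function name `rename_spaced_dirs` is hypothetical.

```python
import os
import tempfile

def rename_spaced_dirs(root):
    """Rename direct subdirectories of root so that spaces become underscores."""
    renamed = []
    for item in os.listdir(root):
        old_path = os.path.join(root, item)
        if os.path.isdir(old_path) and " " in item:
            new_path = os.path.join(root, item.replace(" ", "_"))
            if not os.path.exists(new_path):
                os.rename(old_path, new_path)
                renamed.append(new_path)
    return renamed

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "Webinar resumes"))
    result = [os.path.basename(p) for p in rename_spaced_dirs(tmp)]
    print(result)
```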

#### **generate_random_data** Method:
This method randomly generates data for a given resume based on the given job requirements. For instance:

1. It decides a candidate's current and expected CTC based on a distribution logic.

2. It assigns a current location from a predefined list of major cities.

3. It determines the notice period (in days) for the candidate.
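
With the thresholds used in the notebook's implementation (80% for CTC, 90% for relocation, 10% for notice period), the branch a candidate falls into depends only on its position in the list. The sketch below isolates that decision without any randomness; `profile_for_position` is a hypothetical helper.

```python
def profile_for_position(position, total_resumes):
    """Describe which branch each attribute takes for a 1-based resume position."""
    return {
        # First 80% get a current CTC inside the budget range
        "ctc_in_range": position < 0.80 * total_resumes,
        # First 90% are willing to relocate
        "willing_to_relocate": position < 0.90 * total_resumes,
        # First 10% get a notice period longer than the job allows
        "notice_too_long": position < 0.10 * total_resumes,
    }

for position in (1, 5, 9):
    print(position, profile_for_position(position, 10))
```

Because the comparison is strict, a batch of 10 resumes produces no candidate with an overly long notice period; a batch of 20 produces one.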

#### **random_data_resumes** Method:
This is the main function of the class:

1. Extracts the resumes from the zip file.
2. Reads the job requirements from an external JSON file.
3. Converts the notice period requirement into days.
4. Extracts the lower and upper CTC bounds from the job requirements.
5. Loops through each resume file.
6. Generates a pseudo email address from the resume's filename.
7. Invokes the generate_random_data method to generate random data for the current resume.
8. Appends this information to an 'applications' list.
9. Saves the combined resume information for all candidates to a JSON file named "all_applications.json".
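
Each entry written to `all_applications.json` has the shape sketched below. The values are made up for illustration, and `email_from_filename` is a hypothetical helper that approximates the filename-based email generation.

```python
import json
import os

def email_from_filename(filename):
    """Build a placeholder email from a resume filename, dropping only the extension."""
    stem, _ext = os.path.splitext(filename)
    return stem + "@example.com"

application = {
    "current_ctc": 120000.0,
    "expected_ctc": 140000.0,
    "willing_to_relocate": "yes",
    "current_location": "San Diego",
    "notice_period": "15 days",
    "email_id": email_from_filename("jane_doe.pdf"),
    "resume_path": "jane_doe.pdf",
}
print(json.dumps([application], indent=4))
```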

### **Code Execution**
Finally, an instance of the ResumeProcessor class is created, and the random_data_resumes method is called on it with the path to the zip file containing the resumes. This processes the resumes and generates the JSON output.

In summary, this code automates the process of extracting resumes from a zip file, generating associated random data based on job requirements, and saving this data in a structured JSON format.

In [9]:
# Required Libraries
import os
import openai
import json
import random
import zipfile
import re
import shutil
from collections import OrderedDict

class ResumeProcessor:
    def __init__(self):
        self.resume_counter = 1
    # Function to extract the CTC bounds from a given string in the format "XX-YY".
    def get_ctc_bounds(self, ctc_range):
        messages = [
            {"role": "system", "content": "You are an assistant to a recruiter. You will be given a range consisting of lower and upper salary amount in dollars. \
            Return the upper and lower values in dollars in expanded form. No currency should be mentioned. Output should be in JSON like {'lower':lower salary, 'upper': upper salary}"},
            {"role": "user", "content": f"The salary range is {ctc_range}."},
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=1000
        )
        generated_texts = [choice.message["content"].strip() for choice in response["choices"]]
        # print("iniital output", generated_texts)

        pattern = r'(\d{1,3}(?:,\d{3})*)\s*(dollars)?'

        matches = re.findall(pattern, generated_texts[0])
        # print("matches", matches)
        if matches and len(matches) == 2:
            # re.findall returns (amount, unit) tuples because the pattern has two groups;
            # take the amount from each match and remove the commas
            lower_salary = float(matches[0][0].replace(',', ''))
            upper_salary = float(matches[1][0].replace(',', ''))
            return lower_salary, upper_salary
        # The reply did not contain exactly two amounts
        return None

    def convert_notice_period_to_days(self, jd_notice):
        messages = [
            {"role": "system", "content": "You're assisting a recruiter. Convert the provided notice period into days. \
            1 month is typically 30 days, and 1 year is 365 days. The final output should be 'Days': Number of days."},
            {"role": "user", "content": f"Given notice period: {jd_notice}. Return the output in json with the field 'Days'"}
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=1000
        )
        generated_texts = [choice.message["content"].strip() for choice in response["choices"]]
        # print(generated_texts, generated_texts[0])
        return int(re.findall(r'\d+', generated_texts[0])[0])
    def extract_and_rename(self, zip_file_path):
        # Specify the directory where the files will be extracted.
        extract_path = "extracted_files"

        # Open the zip file for reading.
        with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
            # Extract all files/directories in the zip to the specified directory.
            zip_ref.extractall(extract_path)

        # Start by assuming the path for the resumes is the extraction directory.
        resume_path = extract_path

        # Loop over each item (file or directory) in the extraction directory.
        for item in os.listdir(extract_path):
            # Construct the full path for the item.
            item_path = os.path.join(extract_path, item)

            # Check if the current item is a directory and if its name contains spaces.
            if os.path.isdir(item_path) and ' ' in item:
                # Replace spaces in the directory name with underscores.
                new_name = item.replace(' ', '_')
                # Construct the new path for the directory after renaming.
                new_path = os.path.join(extract_path, new_name)

                # If a directory with the new name doesn't already exist, create one.
                if not os.path.exists(new_path):
                    os.makedirs(new_path)

                # Copy each file/sub-item from the old directory (with spaces in the name) to the new directory.
                for sub_item in os.listdir(item_path):
                    shutil.copy2(os.path.join(item_path, sub_item), new_path)

                # Remove the old directory (with spaces in the name).
                shutil.rmtree(item_path)
                # Update the resume path to point to the new directory.
                resume_path = new_path
            elif os.path.isdir(item_path):
                # A directory without spaces in its name is used as-is.
                resume_path = item_path
            # Loose files at the top level leave resume_path unchanged.

        # Return the directory where the resumes are located.
        return resume_path

    def generate_random_data(self, upper_bound, lower_bound, total_resumes, max_notice_period_days):
        # Define a list of Californian cities.
        californian_cities = ['San Francisco', 'San Diego', 'Sacramento', 'Oakland']

        # Position-based thresholds that skew the generated data towards favorable candidates.
        threshold_80_percent = 0.80 * total_resumes  # candidates with a current CTC inside the budget
        threshold_90_percent = 0.90 * total_resumes  # candidates willing to relocate
        threshold_10_percent = 0.10 * total_resumes  # candidates with a notice period above the limit

        # Skew CTC generation to be within the desired range for 80% of candidates.
        if self.resume_counter < threshold_80_percent:
            current_ctc = round(random.uniform(lower_bound, upper_bound), 1)
            expected_ctc = round(random.uniform(current_ctc + 1, upper_bound), 1)
        else:
            current_ctc = round(random.uniform(0.5 * lower_bound, lower_bound - 1), 1)
            expected_ctc = round(random.uniform(current_ctc + 1, upper_bound), 1)

        # 90% candidates are willing to relocate.
        if self.resume_counter < threshold_90_percent:
            willing_to_relocate = "yes"
        else:
            willing_to_relocate = "no"

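        # The first 10% get a notice period above the limit; the rest stay within half of it.
        if self.resume_counter < threshold_10_percent:
            notice_period = f"{random.randint(max_notice_period_days + 1, max_notice_period_days + 30)} days"
        else:
            # Guard the lower bound so randint does not fail when the limit is under 20 days.
            half_limit = max_notice_period_days // 2
            notice_period = f"{random.randint(min(10, half_limit), half_limit)} days"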

        current_location = random.choice(californian_cities)
        self.resume_counter += 1

        return current_ctc, expected_ctc, willing_to_relocate, current_location, notice_period

    def random_data_resumes(self, zip_file_path):
        # Extract resumes from the zip file and rename them if necessary.
        resume_path = self.extract_and_rename(zip_file_path)

        # List out all the resume files present in the extracted path with specific extensions (.pdf, .doc, .docx).
        resume_files = [f for f in os.listdir(resume_path) if f.endswith(('.pdf', '.doc', '.docx'))]
        applications = []

        # Read the job requirements from the "requirements_output.json" file.
        with open("requirements_output.json", "r") as f:
            job_req = json.load(f)
        # Convert the notice period (from job requirements) to days.
        notice_period_criteria = self.convert_notice_period_to_days(job_req.get("notice_period", ""))
        # Get the lower and upper bounds of the CTC from the job requirements (a single API call).
        ctc_bounds = self.get_ctc_bounds(job_req.get("CTC", ""))
        print(ctc_bounds)
        lower_bound, upper_bound = ctc_bounds

        # Iterate over each resume file.
        for filename in resume_files:
            resume_file_path = os.path.join(resume_path, filename)
            # Generate an email ID from the filename without its extension.
            email_id = os.path.splitext(filename)[0] + "@example.com"
            # Generate random data for the current resume.
            current_ctc, expected_ctc, willing_to_relocate, current_location, notice_period = self.generate_random_data(upper_bound, lower_bound, len(resume_files), notice_period_criteria)

            # Append the generated data for the current resume to the applications list.
            applications.append({
                'current_ctc': current_ctc,
                'expected_ctc': expected_ctc,
                'willing_to_relocate': willing_to_relocate,
                'current_location': current_location,
                'notice_period': notice_period,
                'email_id': email_id,
                'resume_path': filename
            })

        # Save the entire applications list to a JSON file.
        with open("all_applications.json", "w") as f:
            json.dump(applications, f, indent=4)

# Using the class:
processor = ResumeProcessor()
processor.random_data_resumes("/content/Webinar_resumes.zip")

(100000.0, 150000.0)


**Note:**
In the above cell, the method **get_ctc_bounds** uses the OpenAI API because the input CTC range can take many forms, which simple hand-written heuristics would not cover reliably. For example:
1. For Input get_ctc_bounds("12k-1.5 milllion") Output: (12000.0, 1500000.0)
2. For Input get_ctc_bounds("12k-15k") Output: (12000.0, 15000.0)
3. For Input get_ctc_bounds("3-20") Output: (3.0, 20.0)

Another method that uses the OpenAI API is **convert_notice_period_to_days**, which converts notice periods into a number of days.
For example:
1. For Input convert_notice_period_to_days("0.5 months") Output: 15
2. For Input convert_notice_period_to_days("9.8 months") Output: 294
3. For Input convert_notice_period_to_days("months 2") Output: 60
4. For Input convert_notice_period_to_days("months_2.5") Output: 75
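
The expected outputs above follow from the 30-days-per-month convention stated in the prompt. As a quick arithmetic check in plain Python, with no API call:

```python
DAYS_PER_MONTH = 30  # convention stated in the system prompt

for months in (0.5, 9.8, 2, 2.5):
    print(f"{months} months -> {round(months * DAYS_PER_MONTH)} days")
```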

In [None]:
# Testing various formats for CTC:
result = processor.get_ctc_bounds("100k-150k")
print(f"Input '100k-150k', Output: {result} \n")

result = processor.get_ctc_bounds("100k to 150k")
print(f"Input '100k to 150k', Output: {result} \n")

result = processor.get_ctc_bounds("100k and 150k")
print(f"Input '100k and 150k', Output: {result} \n")

result = processor.get_ctc_bounds("min: 100k & max: 150k")
print(f"Input 'min: 100k & max: 150k', Output: {result} \n")

result = processor.get_ctc_bounds("250,000 to 1M")
print(f"Input '250,000 to 1M', Output: {result} \n")

Input '100k-150k', Output: (100000.0, 150000.0) 

Input '100k to 150k', Output: (100000.0, 150000.0) 

Input '100k and 150k', Output: (100000.0, 150000.0) 

Input 'min: 100k & max: 150k', Output: (100000.0, 150000.0) 

Input '250,000 to 1M', Output: (250000.0, 1000000.0) 



In [None]:
# Testing various formats for Notice period:
result = processor.convert_notice_period_to_days("1 month")
print(f"Input '1 month', Output: {result} \n")

result = processor.convert_notice_period_to_days("1 Month")
print(f"Input '1 Month', Output: {result} \n")

result = processor.convert_notice_period_to_days("33 days")
print(f"Input '33 days', Output: {result} \n")

result = processor.convert_notice_period_to_days("Immideate")
print(f"Input 'Immideate', Output: {result} \n")

result = processor.convert_notice_period_to_days("3600 hours")
print(f"Input '3600 hours', Output: {result} \n")

Input '1 month', Output: 30 

Input '1 Month', Output: 30 

Input '33 days', Output: 33 

Input 'Immideate', Output: 0 

Input '3600 hours', Output: 150 



# Filtration using CTC

The code below defines a **FilterCTC** class, which filters candidates' resumes by comparing their current and expected CTC (Cost to Company) against the company's budget range.
# **Key Components of the FilterCTC class:**
1. **Initialization**:
The `__init__` method creates an instance of the **ResumeProcessor** class and assigns it to the **processor** attribute. **FilterCTC** relies on it mainly for the **get_ctc_bounds** method, which takes a CTC range as input and returns its lower and upper limits.
2. **CTC Check**:
The **get_ctc_check** method uses OpenAI's GPT-3.5 model to decide whether a candidate fits the company's budget. It returns True if the current CTC is greater than the budget minimum and the expected CTC is less than the budget maximum; otherwise it returns False.
```python
def get_ctc_check(self, budget_min, budget_max, cr_ctc, exp_ctc):
        """
        Check if the candidate's current and expected CTC are within the company's budget range.

        :param budget_min: Minimum budget of the company for CTC
        :param budget_max: Maximum budget of the company for CTC
        :param cr_ctc: Candidate's current CTC
        :param exp_ctc: Candidate's expected CTC

        :return: True if candidate's CTC is within budget, else False
        """
        messages = [
            {"role": "system", "content": "You are an assistant to a recruiter. You will be given the budget of the company (a range), candidate's current total compensation, and expected total compensation. \
            Return 'yes' if the current compensation is greater than the budget minimum and the expected total compensation is less than the maximum budget. Both conditions should be met for a 'yes'. For all other cases, return 'no'."},
            {"role": "user", "content": f"Company budget minimum: {budget_min}, company budget maximum: {budget_max}, candidate current total compensation: {cr_ctc}, candidate total expected compensation: {exp_ctc}"},
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=1000
        )
        result = response.choices[0].message["content"].strip().lower()
        return result == "yes"
```
3. **Filtering Resumes Based on CTC**:
The **filter_CTC_resumes** method provides the primary functionality.
It loads the list of applications (resumes) and the job requirements from two separate JSON files.
It retrieves the CTC bounds (lower and upper limits) from the job requirements.
It then filters the applications based on the current and expected CTC, keeping those that fall within the bounds. After that, two new JSON files are created: **filtered_applications_ctc.json**, containing the applications that meet the CTC criteria, and **removed_resume_ctc.json**, containing the applications that do not.
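
When the budget and the candidate values are already numeric, as they are in `all_applications.json`, the same rule can be expressed as a plain comparison. The sketch below is an alternative illustration, not the notebook's GPT-based check; `ctc_within_budget` and `partition_by_ctc` are hypothetical names.

```python
def ctc_within_budget(budget_min, budget_max, current_ctc, expected_ctc):
    """Apply the rule from the prompt: current above the minimum, expected below the maximum."""
    return current_ctc > budget_min and expected_ctc < budget_max

def partition_by_ctc(applications, budget_min, budget_max):
    """Split applications into (kept, removed) lists based on the CTC rule."""
    kept, removed = [], []
    for app in applications:
        ok = ctc_within_budget(budget_min, budget_max, app["current_ctc"], app["expected_ctc"])
        (kept if ok else removed).append(app)
    return kept, removed

apps = [
    {"email_id": "a@example.com", "current_ctc": 120000.0, "expected_ctc": 140000.0},
    {"email_id": "b@example.com", "current_ctc": 120000.0, "expected_ctc": 170000.0},
]
kept, removed = partition_by_ctc(apps, 100000.0, 150000.0)
print([a["email_id"] for a in kept], [a["email_id"] for a in removed])
```

The GPT-based check remains useful when the values arrive as free-form strings such as "120k" or "0.14M".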

# **Execution:**
The code concludes by creating an instance of the **FilterCTC** class and invoking the **filter_CTC_resumes** method, which triggers the resume filtering process based on the CTC criteria.

In [10]:
import openai
import json
from collections import OrderedDict
import re

class FilterCTC:
    def __init__(self):
        # Create an instance of the resume processor class
        self.processor = ResumeProcessor()

    def get_ctc_check(self, budget_min, budget_max, cr_ctc, exp_ctc):
        """
        Check if the candidate's current and expected CTC are within the company's budget range.

        :param budget_min: Minimum budget of the company for CTC
        :param budget_max: Maximum budget of the company for CTC
        :param cr_ctc: Candidate's current CTC
        :param exp_ctc: Candidate's expected CTC

        :return: True if candidate's CTC is within budget, else False
        """
        messages = [
            {"role": "system", "content": "You are an assistant to a recruiter. You will be given the budget of the company (a range), candidate's current total compensation, and expected total compensation. \
            Return 'yes' if the current compensation is greater than the budget minimum and the expected total compensation is less than the maximum budget. Both conditions should be met for a 'yes'. For all other cases, return 'no'."},
            {"role": "user", "content": f"Company budget minimum: {budget_min}, company budget maximum: {budget_max}, candidate current total compensation: {cr_ctc}, candidate total expected compensation: {exp_ctc}"},
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=1000
        )
        result = response.choices[0].message["content"].strip().lower()
        return result == "yes"

    def filter_CTC_resumes(self):
        """
        Load resumes and job requirements, and filter out resumes based on CTC.
        The filtered resumes are saved to 'filtered_applications_ctc.json' and the ones which didn't meet criteria to 'removed_resume_ctc.json'.
        """
        # Load all applications from the JSON file
        with open("all_applications.json", "r") as f:
            applications = json.load(f)

        # Load job requirements from the JSON file
        with open("requirements_output.json", "r") as f:
            job_req = json.load(f)

        # Get the CTC bounds from the job requirements via the ResumeProcessor helper
        lower_bound, upper_bound = self.processor.get_ctc_bounds(job_req.get("CTC", ""))

        # Filter the applications based on the current and expected CTC criteria
        filtered_applications_ctc = [app for app in applications if self.get_ctc_check(lower_bound, upper_bound, app['current_ctc'], app['expected_ctc'])]

        # Identify the applications that were removed due to not meeting the CTC criteria
        removed_due_to_ctc = [app for app in applications if app not in filtered_applications_ctc]

        # Save the filtered applications to a new JSON file
        with open("filtered_applications_ctc.json", "w") as f:
            json.dump(filtered_applications_ctc, f, indent=4)

        # Save the applications that didn't meet the criteria to a separate JSON file
        with open("removed_resume_ctc.json", "w") as f:
            json.dump(removed_due_to_ctc, f, indent=4)

# Using the class:
filterctc = FilterCTC()
filterctc.filter_CTC_resumes()

**Note**:
The **FilterCTC** class uses the method **get_ctc_check**, which calls the OpenAI API because the input values can take many forms that simple hand-written heuristics would not cover reliably.
For example:
1. For Input get_ctc_check("15.99k", "230.99k", "160,000", "220k") Output: True
2. For Input get_ctc_check("15.99k", "220k", "160,000", "1.1 mil") Output: False
3. For Input get_ctc_check("1.99k", "230000", "2000", "220k") Output: True

In [12]:
# Testing various formats for CTC check:
budget_min = "100k"
budget_max = "150k"
cr_ctc = "120,000"
exp_ctc = "140,000"
result = filterctc.get_ctc_check(budget_min, budget_max, cr_ctc, exp_ctc)
print(f"Input '{budget_min, budget_max, cr_ctc, exp_ctc}', Output: {result} \n")

budget_min = "100k"
budget_max = "150000"
cr_ctc = "120k"
exp_ctc = "0.14M"
result = filterctc.get_ctc_check(budget_min, budget_max, cr_ctc, exp_ctc)
print(f"Input '{budget_min, budget_max, cr_ctc, exp_ctc}', Output: {result} \n")

budget_min = "100k"
budget_max = "150k"
cr_ctc = "120,000"
exp_ctc = "170,000"
result = filterctc.get_ctc_check(budget_min, budget_max, cr_ctc, exp_ctc)
print(f"Input '{budget_min, budget_max, cr_ctc, exp_ctc}', Output: {result} \n")

budget_min = "100k"
budget_max = "150000"
cr_ctc = "120k"
exp_ctc = "1.4 Million"
result = filterctc.get_ctc_check(budget_min, budget_max, cr_ctc, exp_ctc)
print(f"Input '{budget_min, budget_max, cr_ctc, exp_ctc}', Output: {result} \n")

Input '('100k', '150k', '120,000', '140,000')', Output: True 

Input '('100k', '150000', '120k', '0.14M')', Output: True 

Input '('100k', '150k', '120,000', '170,000')', Output: False 

Input '('100k', '150000', '120k', '1.4 Million')', Output: False 




# **FilterCity Class Description**:

### **Initialization**:

The class is initialized with paths to two JSON files:
1. One containing the job requirements (job_requirements_path).
2. One containing the applications previously filtered by CTC (filtered_ctc_applications_path).

Upon initialization, it reads and loads these JSON files into the respective instance variables.

### **check_city Method**:

1. This method interacts with the OpenAI GPT-3.5-turbo model.
2. Given a candidate's current city and the job's city, the method checks if they are the same (even considering minor variations or typos) or if they're within a default 1-hour drive of each other.
3. The model answers "yes" or "no", and the method returns True or False accordingly.
```python
def check_city(self, current_city, job_city, drive_hour = "1-hour"):
        messages = [
            {"role": "system", "content": f"You're assisting a recruiter. Determine if the candidate's current city ({current_city}) and the prospective job city ({job_city})\
            are the same (even if the names vary or there may be spelling mistakes) or whether both the cities are within a {drive_hour} drive by car from each other, in that case you should \
            return 'yes', otherwise return 'no'."},
            {"role": "user", "content": f"Candidate's current city: {current_city}. Job's city: {job_city}."}
        ]

        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=1000
        )
        generated_texts = [choice.message["content"].strip() for choice in response["choices"]]
        # Accept replies such as "Yes", "yes" or "Yes." by checking how the first reply starts
        return generated_texts[0].lower().startswith("yes")
```
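
Models sometimes answer with extra punctuation or words (for example `Yes.`), so the reply may need normalizing before it is compared. A minimal sketch; `reply_is_yes` is a hypothetical helper, not part of the notebook.

```python
import re

def reply_is_yes(reply):
    """Return True if the first word of the reply is 'yes', ignoring case and punctuation."""
    words = re.findall(r"[a-z]+", reply.lower())
    return bool(words) and words[0] == "yes"

for text in ("yes", "Yes.", "No", "no, they are far apart"):
    print(repr(text), reply_is_yes(text))
```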

### **filter_by_willing_to_relocate_and_city Method**:

1. Filters the candidates based on their current city and their willingness to relocate.
2. If a candidate's current city is the same as the job's city or if they are willing to relocate, they are added to the filtered list.
3. For candidates not in the same city and not willing to relocate, it checks with the check_city method to determine if the two cities are close enough. If they are, the candidate is added to the filtered list.
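
The decision order described above can be sketched with the proximity check passed in as a function, which makes the logic testable without an API call. `keep_candidate` and the stub `always_far` are hypothetical.

```python
def keep_candidate(application, job_city, cities_are_close):
    """Keep a candidate who is local, willing to relocate, or within driving distance."""
    same_city = application["current_location"].lower() == job_city.lower()
    willing = application["willing_to_relocate"].lower() == "yes"
    if same_city or willing:
        return True
    # Only ask about proximity when the cheaper checks fail
    return cities_are_close(application["current_location"], job_city)

def always_far(current_city, job_city):
    """Stub standing in for the GPT-based check_city call."""
    return False

local = {"current_location": "san diego", "willing_to_relocate": "no"}
remote = {"current_location": "Oakland", "willing_to_relocate": "no"}
print(keep_candidate(local, "San Diego", always_far))
print(keep_candidate(remote, "San Diego", always_far))
```

Ordering the checks this way means the model is only consulted for candidates who would otherwise be rejected.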

### **save_filtered_and_removed Method:**

This method saves two sets of data:

1. The final list of filtered applicants based on city constraints.
2. The list of candidates removed due to not meeting the city criteria.
Both lists are saved as separate JSON files.
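
A minimal sketch of the split-and-save step, written against a temporary directory so it can be run in isolation. The function name `save_split` is hypothetical; the output file names match those used in the notebook.

```python
import json
import os
import tempfile

def save_split(all_apps, kept_apps, out_dir):
    """Write kept and removed applications to two JSON files and return the removed list."""
    removed = [app for app in all_apps if app not in kept_apps]
    with open(os.path.join(out_dir, "filtered_applications_city.json"), "w") as f:
        json.dump(kept_apps, f, indent=4)
    with open(os.path.join(out_dir, "removed_due_to_city.json"), "w") as f:
        json.dump(removed, f, indent=4)
    return removed

apps = [{"email_id": "a@example.com"}, {"email_id": "b@example.com"}]
with tempfile.TemporaryDirectory() as tmp:
    removed = save_split(apps, apps[:1], tmp)
    with open(os.path.join(tmp, "removed_due_to_city.json")) as f:
        on_disk = json.load(f)
print(removed)
```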

### **process_filtering Method**:
This is the main execution method for the class. It calls the filtering function and then the saving function to process and store the results.

Finally, after defining the class, an instance of the **FilterCity** class is created and the **process_filtering** method is called, which triggers the whole filtering and saving operation.

In [11]:
import openai
import json

class FilterCity:
    # Initialize the object with job requirements and filtered ctc applications
    def __init__(self, job_requirements_path, filtered_ctc_applications_path):
        # Load the job requirements from the given file
        with open(job_requirements_path, "r") as f:
            self.job_req = json.load(f)

        # Load the previously filtered applications from the given file
        with open(filtered_ctc_applications_path, "r") as f:
            self.filtered_applications_ctc = json.load(f)

    def check_city(self, current_city, job_city, drive_hour = "1-hour"):
        messages = [
            {"role": "system", "content": f"You're assisting a recruiter. \
            Determine if the candidate's current city ({current_city}) and the prospective job city ({job_city})\
            are the same (even if the names vary or there may be spelling mistakes) then return 'yes'.\
            If both the cities {current_city} & {job_city} are within a {drive_hour} drive by car from each other, in that case you should \
            return 'yes', otherwise return 'no'."},
            {"role": "user", "content": f"Candidate's current city: {current_city}. Job's city: {job_city}."}
        ]

        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=100
        )
        generated_texts = [choice.message["content"].strip() for choice in response["choices"]]
        # Accept replies such as "Yes", "yes" or "Yes." by checking how the first reply starts
        return generated_texts[0].lower().startswith("yes")
    # Filter applicants based on their willingness to relocate or if they are in the same city
    def filter_by_willing_to_relocate_and_city(self):
        filtered_applications = []
        job_city = self.job_req.get("City", "")

        # Iterate through each application
        for application in self.filtered_applications_ctc:
            # If the 'willing_to_relocate' field is 'n.a.', set it to 'yes'
            if application['willing_to_relocate'].lower() == "n.a.":
                application['willing_to_relocate'] = 'yes'

            # Check if the applicant's current location matches the job city
            same_city = application['current_location'].lower() == job_city.lower()

            # Check if the applicant is willing to relocate
            willing_to_relocate = application['willing_to_relocate'].lower() == 'yes'

            # If applicant is in the same city or willing to relocate, append to the filtered list
            if same_city or willing_to_relocate:
                filtered_applications.append(application)
                continue
            # Otherwise, ask the GPT model whether the applicant's city is effectively the job city
            # (the same place under a different spelling, or within driving distance)
            if self.check_city(application['current_location'], job_city):
                filtered_applications.append(application)

        return filtered_applications

    # Save the filtered and removed applications to separate files
    def save_filtered_and_removed(self, filtered_applications):
        # Find the applications that were removed due to city constraints
        removed_due_to_city = [app for app in self.filtered_applications_ctc if app not in filtered_applications]

        # Save the filtered applications
        with open("filtered_applications_city.json", "w") as f:
            json.dump(filtered_applications, f, indent=4)

        # Save the removed applications
        with open("removed_due_to_city.json", "w") as f:
            json.dump(removed_due_to_city, f, indent=4)

    # Process the filtering and saving operations
    def process_filtering(self):
        filtered_applications = self.filter_by_willing_to_relocate_and_city()
        self.save_filtered_and_removed(filtered_applications)

# Create an instance of FilterCity and process the applications
applicant_filter = FilterCity("requirements_output.json", "filtered_applications_ctc.json")
applicant_filter.process_filtering()

**Note**:
The **FilterCity** class relies on the **check_city** method, which calls the OpenAI API, because city names arrive in too many variations (abbreviations, misspellings, neighbouring cities) for simple string-matching heuristics to handle.
For example:
1. For Input check_city("Oakland", "San Francisco") Output: True
2. For Input check_city("Bnglre", "Bengaluru") Output: True
3. For Input check_city("San diego", "San Francisco") Output: False
4. For Input check_city("Manhattan", "New york") Output: True
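
Exact matches and small typos do not need an API call. The sketch below is one possible local pre-check, not part of the original notebook: `normalize_city`, `cities_probably_same`, and the 0.85 threshold are hypothetical choices, and only pairs that fail this check would be passed on to **check_city**.

```python
import difflib

def normalize_city(name):
    """Lowercase and collapse whitespace so 'San  Francisco ' matches 'san francisco'."""
    return " ".join(name.lower().split())

def cities_probably_same(current_city, job_city, threshold=0.85):
    """Return True when the names are identical or near-identical after normalization.

    The threshold is an assumed value; tune it on real data.
    """
    a, b = normalize_city(current_city), normalize_city(job_city)
    if a == b:
        return True
    # SequenceMatcher.ratio() is 1.0 for identical strings and 0.0 for no overlap
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

print(cities_probably_same("San Fransisco", "San Francisco"))
print(cities_probably_same("Oakland", "San Francisco"))
```

A pair such as "Oakland" and "San Francisco" fails this check by design, since their proximity is geographic rather than textual, so it would still go to the model.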

In [None]:
# Testing our code with different variations:
current_city = "San Francisco"
job_city = "san francisco"
drive_distance = "1-hour"
result = applicant_filter.check_city(current_city, job_city, drive_distance)
print(f"Applicant location: {current_city}, Job location: {job_city}, Drive distance: {drive_distance}")
print(f"Result: {result} \n")

current_city = "San Francisco"
job_city = "Oakland"
drive_distance = "1-hour"
result = applicant_filter.check_city(current_city, job_city, drive_distance)
print(f"Applicant location: {current_city}, Job location: {job_city}, Drive distance: {drive_distance}")
print(f"Result: {result} \n")

current_city = "Santa Cruz"
job_city = "Sacramento"
drive_distance = "1 hour"
result = applicant_filter.check_city(current_city, job_city, drive_distance)
print(f"Applicant location: {current_city}, Job location: {job_city}, Drive distance: {drive_distance}")
print(f"Result: {result} \n")

current_city = "Santa Cruz"
job_city = "Sacramento"
drive_distance = "2-hour"
result = applicant_filter.check_city(current_city, job_city, drive_distance)
print(f"Applicant location: {current_city}, Job location: {job_city}, Drive distance: {drive_distance}")
print(f"Result: {result} \n")

Applicant location: San Francisco, Job location: san francisco, Drive distance: 1-hour
Result: True 

Applicant location: San Francisco, Job location: Oakland, Drive distance: 1-hour
Result: True 

Applicant location: Santa Cruz, Job location: Sacramento, Drive distance: 1 hour
Result: False 

Applicant location: Santa Cruz, Job location: Sacramento, Drive distance: 2-hour
Result: True 



# **Filtration using notice period**
The following code defines a class **FilterNotice** that is utilized to further filter job applications based on the notice period of the candidates in relation to the notice period requirement of the job. Here's a breakdown:

# **FilterNotice Class Description**:

### **check_notice Method:**

This method compares the notice period required by the job (as given in the job description) with the candidate's notice period.
Both values are passed through `convert_notice_period_to_days`, and the GPT-3.5-turbo model of OpenAI is then prompted to compare them in days, so inputs expressed in months, years, or days can be handled.
It returns 'yes' if the candidate's notice period in days is less than the notice period required by the job, otherwise 'no'. As the prompt is worded, equal notice periods also return 'no'.
```python
def check_notice(self, jd_notice, can_notice):
        """
        Convert notice periods given in months, years, or days to days, and check if the candidate's
        notice period is less than the job's notice period.

        :param jd_notice: Job's notice period
        :param can_notice: Candidate's notice period

        :return: 'yes' if the candidate's notice period is less than the job's notice period, otherwise 'no'
        """
        jd_notice = self.processor.convert_notice_period_to_days(jd_notice)
        can_notice = self.processor.convert_notice_period_to_days(can_notice)
        messages = [
            {"role": "system", "content": "You're assisting a recruiter. Convert the provided notice periods into days. \
            1 month is typically 30 days, and 1 year is 365 days. If the candidate's notice period in days is less than the job's notice period in days, then return 'yes'. \
            Otherwise, reply 'no'. The output must be 'yes' or 'no'"},
            {"role": "user", "content": f"Job's notice period: {jd_notice}. Candidate's notice period: {can_notice}."}
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=1000
        )
        generated_texts = [choice.message["content"].strip().lower() for choice in response["choices"]]
        # print("initial output", generated_texts)
        return generated_texts[0]
```
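
Once both notice periods are plain day counts, the comparison itself can also be done locally. The following is a minimal sketch under the assumption that `convert_notice_period_to_days` returns a numeric value; `notice_fits` and its `inclusive` flag are hypothetical names introduced here, not part of the notebook.

```python
def notice_fits(jd_days, candidate_days, inclusive=False):
    """Compare two notice periods that are already expressed in days.

    inclusive=False mirrors the prompt's 'less than' wording;
    inclusive=True accepts candidates whose notice period equals the job's.
    """
    jd_days = float(jd_days)
    candidate_days = float(candidate_days)
    if inclusive:
        return "yes" if candidate_days <= jd_days else "no"
    return "yes" if candidate_days < jd_days else "no"

print(notice_fits(30, 29))                  # yes
print(notice_fits(30, 30))                  # no
print(notice_fits(30, 30, inclusive=True))  # yes
```

The `inclusive` flag makes the boundary case explicit, which is the one case where the wording of the prompt decides the outcome.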

### **filter_and_save Method:**

This method loads the job requirements, extracts the notice period criterion, and then filters the applications that already passed the city filter.
Only the candidates whose notice periods satisfy the job's notice period criteria are kept in the filtered_applications_notice list.
It then saves this final filtered list to a new JSON file named "filtered_applications.json".
The applications that were removed during this notice period filtering step are saved in another file named "removed_resume_final.json".
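
The keep/remove split described above can be sketched as a single pass over the applications. This is an illustrative alternative, not the notebook's implementation: `partition` is a hypothetical helper, and the sample records and the `notice_days` field are made up for the example.

```python
def partition(applications, keep):
    """Split applications into (kept, removed) using one predicate call per item."""
    kept, removed = [], []
    for app in applications:
        (kept if keep(app) else removed).append(app)
    return kept, removed

# Hypothetical sample data; the real records come from filtered_applications_city.json
sample = [
    {"name": "A", "notice_days": 15},
    {"name": "B", "notice_days": 45},
]
kept, removed = partition(sample, lambda app: app["notice_days"] < 30)
print([app["name"] for app in kept])     # ['A']
print([app["name"] for app in removed])  # ['B']
```

Building both lists in the same loop avoids a second scan with `app not in ...` and guarantees that every application lands in exactly one of the two output files.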

# **After the Class Definition:**


An instance of the **FilterNotice** class is created and named **filterer**.
Finally, the filtering based on notice period is executed by calling the **filter_and_save** method on the filterer instance.
In essence, the code's primary goal is to ensure that candidates' notice periods align with the job's requirements. It does this by using the OpenAI model for conversions and then filtering the applications accordingly.

In [12]:
import json
import openai

class FilterNotice:

    def __init__(self):
        # Initialize the FilterNotice class with an instance of the ResumeProcessor class
        self.processor = ResumeProcessor()

    def check_notice(self, jd_notice, can_notice):
        """
        Convert notice periods given in months, years, or days to days, and check if the candidate's
        notice period is less than the job's notice period.

        :param jd_notice: Job's notice period
        :param can_notice: Candidate's notice period

        :return: 'yes' if the candidate's notice period is less than the job's notice period, otherwise 'no'
        """
        jd_notice = self.processor.convert_notice_period_to_days(jd_notice)
        can_notice = self.processor.convert_notice_period_to_days(can_notice)
        messages = [
            {"role": "system", "content": "You're assisting a recruiter. Convert the provided notice periods into days. \
            1 month is typically 30 days, and 1 year is 365 days. If the candidate's notice period in days is less than the job's notice period in days, then return 'yes'. \
            Otherwise, reply 'no'. The output must be 'yes' or 'no'"},
            {"role": "user", "content": f"Job's notice period: {jd_notice}. Candidate's notice period: {can_notice}."}
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0,
            max_tokens=100
        )
        generated_texts = [choice.message["content"].strip().lower() for choice in response["choices"]]
        # print("initial output", generated_texts, jd_notice, can_notice, "jd_notice, can_notice")
        return generated_texts[0]

    def filter_and_save(self):
        # Load the job requirements
        with open("requirements_output.json", "r") as f:
            job_req = json.load(f)

        # Get the job's notice period criteria
        notice_period_criteria = job_req.get("notice_period", "")

        # Load the previously filtered applications based on city
        with open("filtered_applications_city.json", "r") as f:
            filtered_applications_city = json.load(f)

        # Filter the previously filtered applications based on the notice period
        filtered_applications_notice = [app for app in filtered_applications_city if self.check_notice(notice_period_criteria, app['notice_period']) == 'yes']
        removed_due_to_notice = [app for app in filtered_applications_city if app not in filtered_applications_notice]

        # Save the final filtered applications to a new JSON file
        with open("filtered_applications.json", "w") as f:
            json.dump(filtered_applications_notice, f, indent=4)

        # Save the resumes that were removed in the final filtering to another JSON file
        with open("removed_resume_final.json", "w") as f:
            json.dump(removed_due_to_notice, f, indent=4)

# Create the filter and run it (the ResumeProcessor class is assumed to be defined earlier in the notebook)

filterer = FilterNotice()
filterer.filter_and_save()

initial output ['no'] 30 34 jd_notice, can_notice
initial output ['yes'] 30 11 jd_notice, can_notice
initial output ['yes'] 30 12 jd_notice, can_notice
initial output ['yes'] 30 12 jd_notice, can_notice
initial output ['yes'] 30 13 jd_notice, can_notice
initial output ['yes'] 30 13 jd_notice, can_notice
initial output ['yes'] 30 14 jd_notice, can_notice
initial output ['yes'] 30 14 jd_notice, can_notice
initial output ['yes'] 30 12 jd_notice, can_notice
initial output ['yes'] 30 13 jd_notice, can_notice
initial output ['yes'] 30 11 jd_notice, can_notice
initial output ['yes'] 30 14 jd_notice, can_notice
initial output ['yes'] 30 11 jd_notice, can_notice
initial output ['yes'] 30 14 jd_notice, can_notice


**Note**:
The **FilterNotice** class relies on the **check_notice** method, which calls the OpenAI API, because notice periods are written in too many formats for simple heuristics to handle reliably.
For example:
1. For Input check_notice("1.5 months", "46 days") Output: 'no'
2. For Input check_notice("months_1.5", "50 days") Output: 'no'
3. For Input check_notice("months_2", "50 days") Output: 'yes'
4. For Input check_notice("2-months", "50 days") Output: 'yes'
5. For Input check_notice("2_months", "50 days") Output: 'yes'
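
To sanity-check the model's answers on inputs like these, a small reference parser can supply expected day counts for the simple number-plus-unit formats. The sketch below is an assumption-laden helper (`parse_notice_days` and `DAYS_PER_UNIT` are hypothetical names); it uses the same 30-day month and 365-day year as the prompt, and it deliberately returns `None` for anything it does not recognize, such as spelled-out numbers.

```python
import re

# Same conventions as the prompt: 1 month = 30 days, 1 year = 365 days
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

def parse_notice_days(text):
    """Return the notice period in days, or None when the format is not recognized."""
    cleaned = text.lower().replace("_", " ").replace("-", " ")
    number = re.search(r"\d+(?:\.\d+)?", cleaned)
    unit = re.search(r"(day|week|month|year)", cleaned)
    if not number or not unit:
        return None
    return float(number.group()) * DAYS_PER_UNIT[unit.group(1)]

for value in ["1.5 months", "months_1.5", "2-months", "50 days", "One Month"]:
    print(value, "->", parse_notice_days(value))
```

Inputs that come back as `None` (for example "One Month") are exactly the ones where the model is still needed.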

In [13]:
# Testing various cases:
notice_period_jd = "One Month"
notice_period_candidate = "29 days"
result = filterer.check_notice(notice_period_jd, notice_period_candidate)
print(f"Output for JD Notice period: {notice_period_jd} & Candidate notice period: {notice_period_candidate} is {result} \n")

notice_period_jd = "1_Month"
notice_period_candidate = "2-months"
result = filterer.check_notice(notice_period_jd, notice_period_candidate)
print(f"Output for JD Notice period: {notice_period_jd} & Candidate notice period: {notice_period_candidate} is {result} \n")

notice_period_jd = "1 month"
notice_period_candidate = "1 Month"
result = filterer.check_notice(notice_period_jd, notice_period_candidate)
print(f"Output for JD Notice period: {notice_period_jd} & Candidate notice period: {notice_period_candidate} is {result} \n")

notice_period_jd = "720 hours"
notice_period_candidate = "43,000 minutes"
result = filterer.check_notice(notice_period_jd, notice_period_candidate)
print(f"Output for JD Notice period: {notice_period_jd} & Candidate notice period: {notice_period_candidate} is {result} \n")

initial output ['yes'] 30 29 jd_notice, can_notice
Output for JD Notice period: One Month & Candidate notice period: 29 days is yes 

initial output ['no'] 30 60 jd_notice, can_notice
Output for JD Notice period: 1_Month & Candidate notice period: 2-months is no 

initial output ['no'] 30 30 jd_notice, can_notice
Output for JD Notice period: 1 month & Candidate notice period: 1 Month is no 

initial output ['yes'] 30 29 jd_notice, can_notice
Output for JD Notice period: 720 hours & Candidate notice period: 43,000 minutes is yes 



In [15]:
import json

# Define a function to count and return the number of entries in a given JSON file.
def count_entries_in_json(json_filepath):
    # Open and read the content of the JSON file.
    with open(json_filepath, "r") as f:
        data = json.load(f)  # Parse and load the JSON content into a variable.
        return len(data)  # Return the number of top-level entries in the loaded data.

# A list of file paths containing JSON data that we want to process.
files = [
    "/content/all_applications.json",
    "/content/filtered_applications.json",
    "/content/removed_due_to_city.json",
    "/content/removed_resume_ctc.json",
    "/content/removed_resume_final.json"
]

# Loop through each file in the list, count its entries, and print the result.
for file in files:
    count = count_entries_in_json(file)  # Use the function to count entries for the current file.
    print(f"{file}: {count} entries")  # Display the file path along with its count of entries.

/content/all_applications.json: 16 entries
/content/filtered_applications.json: 13 entries
/content/removed_due_to_city.json: 1 entries
/content/removed_resume_ctc.json: 1 entries
/content/removed_resume_final.json: 1 entries
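
A quick consistency check on these counts: every application should end up either in the final shortlist or in exactly one of the removed files. The sketch below hard-codes the counts printed above; `counts_are_consistent` is a hypothetical helper added for illustration.

```python
# Entry counts copied from the output above
counts = {
    "all_applications": 16,
    "filtered_applications": 13,
    "removed_due_to_city": 1,
    "removed_resume_ctc": 1,
    "removed_resume_final": 1,
}

def counts_are_consistent(counts):
    """Check that kept plus removed applications equals the total."""
    removed = sum(n for name, n in counts.items() if name.startswith("removed_"))
    return counts["all_applications"] == counts["filtered_applications"] + removed

print(counts_are_consistent(counts))  # True: 16 == 13 + 1 + 1 + 1
```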


Download the final JSON file containing information about the filtered resumes.

In [14]:
from google.colab import files

# List of file paths that you want to download
file_paths = [
    "/content/filtered_applications.json",
    "/content/all_applications.json"

]

# Download each file to your local system
for path in file_paths:
    files.download(path)

In this notebook, we use the job description requirements produced in Assignment1 to streamline resume filtering. The criteria cover the candidate's expected and current CTC, their geographic location, their willingness to relocate, and their notice period. By applying these filters in sequence, we shortlist potential candidates efficiently. In the next assignments, we will look at how to extract meaningful information from the shortlisted resumes.
