# Description:
This notebook uses the OpenAI API to automate skill assessment. Given a resume, the goal is to score each 'must-have' skill listed in a job description. The scoring is binary: '0' means the skill is absent and '1' means it is present. We develop the prompt iteratively over four versions, refining it until the output is consistent and structured.
# Learning Objectives:
Work with the OpenAI API and practice writing effective prompts. Learn to extract the relevant details from resumes and to represent them as plain text and as JSON.

Install the libraries required to run this notebook.

In [None]:
!pip install openai
!pip install PyMuPDF
!pip install textract
!pip install python-docx
!pip install tiktoken

Collecting openai
  Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0
Collecting PyMuPDF
  Downloading PyMuPDF-1.23.3-cp310-none-manylinux2014_x86_64.whl (4.3 MB)
Collecting PyMuPDFb==1.23.3 (from PyMuPDF)
  Downloading PyMuPDFb-1.23.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (30.6 MB)
Installing collected packages: PyMuPDFb, PyMuPDF
Successfully installed PyMuPDF-1.23.3 PyMuPDFb-1.23.3
Collecting textract
  Downloading textract-1.6.5-py3-none-any.whl (23 kB)
Collecting argcomplete~=1.10.0 (from textract)
  Downloading argcomplete-

Collecting python-docx
  Downloading python-docx-0.8.11.tar.gz (5.6 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: python-docx
  Building wheel for python-docx (setup.py) ... done
  Created wheel for python-docx: filename=python_docx-0.8.11-py3-none-any.whl size=184487 sha256=1efb060c8a09f39ae9f05ab1303062ce7a10d34f12e876f0a05980a6927f2c58
  Stored in directory: /root/.cache/pip/wheels/80/27/06/837436d4c3bd989b957a91679966f207bfd71d358d63a8194d
Successfully built python-docx
Installing collected packages: python-docx
Successfully installed python-docx-0.8.11
Collecting tiktoken
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing c

In [None]:
# Required Libraries
import openai
import json
import os
from collections import OrderedDict


Upload the .env file containing the "OPENAI_API_KEY" to the directory `/content/`.

We set up our environment to use OpenAI's API for extracting information from Job Descriptions (JD). We use Python as our primary language and the `openai` library to interact with OpenAI's services.


Read the "OPENAI_API_KEY" from the .env file

In [None]:
# Export your API Key to environment variable
# Upload the .env file to the directory "/content/"
!pip install python-dotenv
from dotenv import load_dotenv
load_dotenv()

Collecting python-dotenv
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.0


True

In [None]:
import openai
# Retrieve the API key from environment variable
openai_api_key = os.getenv("OPENAI_API_KEY")


# Set the API key for OpenAI
openai.api_key = openai_api_key
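
If the `.env` file was not uploaded, `os.getenv` returns `None` and the problem only surfaces at the first API call. A minimal sketch of a fail-fast check is shown below. The helper name `require_env` is hypothetical; it is not part of this notebook or of any library.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    `require_env` is a hypothetical helper used for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Upload the .env file to /content/ and re-run load_dotenv()."
        )
    return value
```

With this helper, the assignment above could be written as `openai.api_key = require_env("OPENAI_API_KEY")`.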

Upload two JSON files: `filtered_applications_summary.json`, which holds the filtered resumes and their summaries generated in Assignment3, and `requirements_output.json`, which holds the job requirements generated in Assignment1.

In [None]:
from google.colab import files

# Upload the first file
print("Please upload the first file (filtered_applications_summary.json):")
uploaded1 = files.upload()

# Check to ensure a file was uploaded. If not, prompt again.
while len(uploaded1) == 0:
    print("No file uploaded. Please upload the first file (filtered_applications_summary.json) again:")
    uploaded1 = files.upload()

# Upload the second file
print("Please upload the second file (requirements_output.json):")
uploaded2 = files.upload()

# Check to ensure a file was uploaded. If not, prompt again.
while len(uploaded2) == 0:
    print("No file uploaded. Please upload the second file (requirements_output.json) again:")
    uploaded2 = files.upload()

# Merge the dictionaries to have all uploaded files in one
uploaded = {**uploaded1, **uploaded2}

# Print details of uploaded files
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))


Please upload the first file (filtered_applications_summary.json):


Saving filtered_applications_summary.json to filtered_applications_summary.json
Please upload the second file (requirements_output.json):


Saving requirements_output.json to requirements_output.json
User uploaded file "filtered_applications_summary.json" with length 26784 bytes
User uploaded file "requirements_output.json" with length 764 bytes


Now download the `Webinar_resumes.zip` file which contains all the resumes

In [None]:
import requests

def download_file_from_google_drive(file_id, destination):
    base_url = "https://drive.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(base_url, params={'id': file_id}, stream=True)
    token = get_confirm_token(response)

    if token:
        params = {'id': file_id, 'confirm': token}
        response = session.get(base_url, params=params, stream=True)

    save_response_content(response, destination)

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk:
                f.write(chunk)
# Example Usage
# file_id = '1HaM3IeK2-iqyZzeQmCnAzKLcF9NF-mSo'  # Replace with your file's ID
# destination = 'resume_data.zip'  # Replace with your desired file name and extension
file_id = '17V_o0Snt-Lj0FmegENPQ_rXpvWTWlZgQ'
destination = 'Webinar_resumes.zip'  # Replace with your desired file name and extension
download_file_from_google_drive(file_id, destination)


The following code defines functions that read JSON files and extract text from .docx, .doc, .pdf, .xls, and .xlsx documents. It uses python-docx for .docx files, textract for .doc files, PyMuPDF (imported as `fitz`) for .pdf files, and pandas for spreadsheets. A further function, `check_and_trim`, truncates resume text to a token limit (1500 by default) so that each request stays within the limits of the OpenAI GPT ChatCompletion model.

In [None]:
# Importing necessary libraries and modules
from docx import Document
import textract
import fitz  # PyMuPDF
import openai
import json
import os
from collections import OrderedDict
import re
import pandas as pd
import math
import tiktoken


def read_requirements(file_path):
    """
    Read the job requirements from a given JSON file.

    Args:
    - file_path (str): Path to the JSON file.

    Returns:
    - dict: Job requirements if successfully read, otherwise None.
    """
    try:
        with open(file_path, 'r') as f:
            data = json.load(f)
        return data
    except Exception as e:
        print(f"Error reading requirements JSON: {e}")
        return None

def read_json(file_path):
    """
    Read data from a given JSON file.

    Args:
    - file_path (str): Path to the JSON file.

    Returns:
    - dict: Data from the JSON file.
    """
    with open(file_path, 'r') as f:
        data = json.load(f)
    return data

def read_document(file_path):
    """
    Read and extract text from various document types (.docx, .doc, .pdf, .xls, .xlsx).

    Args:
    - file_path (str): Path to the document file.

    Returns:
    - str: Extracted text from the document.
    """
    file_path = str(file_path)
    _, file_extension = os.path.splitext(file_path)
    file_extension = file_extension.lower()  # match extensions case-insensitively (e.g. ".DOCX")
    text = ""
    if file_extension == '.docx':
        doc = Document(file_path)
        for para in doc.paragraphs:
            text = text + para.text + " "
    elif file_extension == '.doc':
        text = textract.process(file_path).decode()
    elif file_extension.lower() == '.pdf':
        doc = fitz.open(file_path)
        for page_number in range(len(doc)):
            page = doc[page_number]
            text = text + page.get_text() + " "
    elif file_extension.lower() in ['.xls', '.xlsx']:
        data = pd.read_excel(file_path)
        text = data.to_string(index=False)
    else:
        print(f"Unsupported file type: {file_extension}")

    return text

def check_and_trim(resume_text, max_tokens=1500):
    """
    Trim the text to a specified number of tokens if it exceeds the limit.

    Args:
    - resume_text (str): Text to be trimmed.
    - max_tokens (int, optional): Maximum number of tokens allowed. Defaults to 1500.

    Returns:
    - str: Trimmed text.
    - int: Original number of tokens.
    - int: Number of tokens after trimming.
    """
    # tokens = nltk.word_tokenize(resume_text)
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(resume_text)
    old_len = len(tokens)
    if len(tokens) > max_tokens:
        tokens = tokens[:max_tokens]
        resume_text = enc.decode(tokens)
    return resume_text, old_len, len(tokens)
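
The trimming step keeps the first `max_tokens` tokens and reports the count before and after. The sketch below isolates that logic, using whitespace-separated words as a stand-in for tiktoken tokens. This is an assumption made so the example runs without downloading an encoding; real `cl100k_base` token counts will differ. `trim_words` is a hypothetical name.

```python
def trim_words(text, max_tokens=1500):
    """Keep at most `max_tokens` whitespace-separated words of `text`.

    Words stand in for tiktoken tokens here (illustration only).
    Returns (trimmed_text, original_count, trimmed_count), mirroring check_and_trim.
    """
    words = text.split()
    original_count = len(words)
    if original_count > max_tokens:
        words = words[:max_tokens]
        text = " ".join(words)
    return text, original_count, len(words)


trimmed, before, after = trim_words("python pytorch tensorflow keras opencv", max_tokens=3)
print(trimmed, before, after)  # python pytorch tensorflow 5 3
```

Text that is already within the limit is returned unchanged, and both counts are equal.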


The following code re-imports the required libraries and defines the main function, `initial_score`. It takes the resume text and a prompt, sends the prompt as the system message and the text as the user message to the `gpt-3.5-turbo-16k` model through the chat-based interface, and returns the model's reply as a string.

In [None]:
import openai
import json
import os
from collections import OrderedDict
import re
from docx import Document
import textract
import fitz  # PyMuPDF
import pandas as pd
import math

def initial_score(text, prompt):
    """
    Use OpenAI's model to generate a response based on the given prompt and text.

    Args:
    - text (str): The content to be processed.
    - prompt (str): The instruction or question for the model to guide its response.

    Returns:
    - str: Model-generated text based on the input text and prompt.
    """

    # Specify the model and token limits
    model = "gpt-3.5-turbo-16k"
    max_tokens = 2000

    # Create a list of messages to simulate a conversation with the model.
    # The system starts with a prompt, and the user provides the input text.
    messages = [
            {"role": "system", "content": f"{prompt}"},
            {"role": "user", "content": text},
        ]

    # Make a request to the OpenAI API for the generated response.
    response = openai.ChatCompletion.create(
            model=model,
            messages=messages,
            temperature=1,
            max_tokens=max_tokens
        )

    # Extract the generated text from the model's response
    generated_texts = [
        choice.message["content"].strip() for choice in response["choices"]
    ]

    return generated_texts[0]


**Note:**
In the following cells we iteratively develop a prompt for the function **initial_score()** so that it performs binary classification: for each must-have skill, is the skill present in the resume or not?


The provided code allows a user to select a desired number of resumes to process from a total set, with a default of 2 resumes if no input is given. The **user_select_number_of_resumes** function prompts the user for their choice, ensures valid input, and returns the selected number. The main execution block reads the **filtered_applications_summary** data from a JSON file, queries the user for their desired number of resumes using the aforementioned function, and then randomly selects the specified number of resumes from the total set, storing the result in the **selected_applications** variable.







In [None]:
import json
import random


def user_select_number_of_resumes(total_resumes, default=2):
    """
    Allow the user to input a number of resumes to process.
    If no input is given, the default value is returned.

    Args:
    - total_resumes (int): Total number of resumes available.
    - default (int): The default number to return if no input.

    Returns:
    - int: The number of resumes the user wants to process.
    """
    print(f"Total resumes available: {total_resumes}")
    user_input = input(f"How many resumes do you want to process? (Default is {default}): ")

    # If the user doesn't provide any input, return the default value.
    if not user_input:
        return default

    try:
        # Convert user input to an integer and ensure it's within the range.
        selected_num = int(user_input)
        if 1 <= selected_num <= total_resumes:
            return selected_num
        else:
            print(f"Please select a number between 1 and {total_resumes}.")
            return user_select_number_of_resumes(total_resumes, default)
    except ValueError:
        # If the user provides non-numeric input, prompt them again.
        print("Please enter a valid number.")
        return user_select_number_of_resumes(total_resumes, default)

# Read the filtered_applications_summary data from the JSON file
json_data = read_json('/content/filtered_applications_summary.json')

# Display total resumes and get the user's choice
n = user_select_number_of_resumes(len(json_data))

# Randomly select n resumes
selected_applications = random.sample(json_data, n)

Total resumes available: 12
How many resumes do you want to process? (Default is 2): 3
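
The function above re-prompts by calling itself. A loop gives the same behavior without growing the call stack on repeated invalid input. The sketch below is one way to write it. The name `select_count` and the `ask` parameter are hypothetical: `ask` defaults to the built-in `input` and exists only so the function can be exercised without a keyboard. The sketch also caps the default at the number of available resumes, which the original does not do.

```python
def select_count(total, default=2, ask=input):
    """Loop until the user enters a number between 1 and `total`.

    Empty input returns the default, capped at `total` so that
    random.sample is never asked for more items than exist.
    """
    while True:
        raw = ask(f"How many resumes do you want to process? (Default is {default}): ").strip()
        if not raw:
            return min(default, total)
        try:
            count = int(raw)
        except ValueError:
            print("Please enter a valid number.")
            continue
        if 1 <= count <= total:
            return count
        print(f"Please select a number between 1 and {total}.")
```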


After reading the job requirements from a JSON file, the code extracts and reorganizes the files in a zip archive. The **extract_and_rename** function unzips the specified archive (here "**Webinar_resumes.zip**") into a directory ("**extracted_files**" by default). If the directory doesn't exist, it's created; if it already contains files, extraction is skipped. After extraction, the function scans the contents and, for any directory whose name contains spaces, builds a new name with the spaces replaced by underscores. It creates the new directory if it doesn't already exist, copies the files from the old directory into it, and then deletes the old directory. The function finally returns the path of the reorganized or main content directory. The main execution block calls this function with the zip file path and stores the result in the **resume_path** variable.

In [None]:
import zipfile
import shutil
job_requirements = read_requirements('/content/requirements_output.json')
must_have_skills = job_requirements["must_have_skills"]
zip_file_path = "/content/Webinar_resumes.zip" # For example give the path to resume_data.zip

def extract_and_rename(zip_file_path, extract_path="extracted_files"):
    """
    Extract files from a zip archive to a specified directory.
    Rename directories containing spaces to use underscores instead.

    Args:
    - zip_file_path (str): The path to the zip file to be extracted.
    - extract_path (str, optional): The path where the zip file content should be extracted to.
                                    Defaults to "extracted_files".

    Returns:
    - str: Path to the resume or directory.
    """
    # Check if extract_path exists, if not, create it
    if not os.path.exists(extract_path):
        os.makedirs(extract_path)

    # If extract_path is not empty, skip extraction
    if not os.listdir(extract_path):
        with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
            zip_ref.extractall(extract_path)

    resume_path = extract_path
    for item in os.listdir(extract_path):
        item_path = os.path.join(extract_path, item)

        # Check if the current item is a directory and if it has spaces in its name
        if os.path.isdir(item_path) and ' ' in item:
            new_name = item.replace(' ', '_')
            new_path = os.path.join(extract_path, new_name)

            # If the new directory name doesn't already exist, create it
            if not os.path.exists(new_path):
                os.makedirs(new_path)

            # Copying contents from the old directory to the new one
            for sub_item in os.listdir(item_path):
                shutil.copy2(os.path.join(item_path, sub_item), new_path)

            # Removing the old directory
            shutil.rmtree(item_path)
            resume_path = new_path
        else:
            resume_path = item_path

    return resume_path
resume_path = extract_and_rename(zip_file_path)



# Basic Evaluation of the Resume

## *Goal*: Extract projects in which a particular skill has been applied.
## *Reason*: It serves as an introductory step to understand the extent of a model's capability in extracting skill-specific projects.
# Prompt Version 1 :

In [None]:
prompt_version_1 = f'''You are an assistant to a recruiter. Your job is to evaluate a resume for a particular skill. The skills for which you need to do
evaluation are {must_have_skills}. You need to find the projects in which a particular skill from this list has been applied.'''

for application in selected_applications:
    if 'resume_path' in application and 'email_id' in application:

        # Extract resume text
        resume_text = read_document(os.path.join(resume_path, application['resume_path']))
        resume_text, _, _ = check_and_trim(resume_text)

        # Directly assign the resume_summary without json.loads()
        resume_summary = application['resume_summary']

        # Score the resume against job requirements
        initial_score_output = initial_score(resume_text, prompt_version_1)

        print(f'''[Matching Request] for {resume_summary["name_of_candidate"]} ''', initial_score_output)

[Matching Request] for Soso Sukhitashvili  Based on the information provided in the resume, the projects in which the skills have been applied are:

- Computer Vision: 
  - Face recognition
  - Image similarity search
  - Object tracking
  - Object detection
  - Object segmentation
  - OCR (Optical Character Recognition)

- Natural Language Processing:
  - Sentiment analysis
  - Machine translation
  - Text classification
  - Named entity recognition
  - Speech recognition

It is important to note that the resume does not specifically mention the use of TensorFlow, Keras, or PyTorch for these projects, but the skills in Python and deep learning suggest that these frameworks could have been used.
[Matching Request] for Derrick I.C. VAN FRAUSUM  Based on the given resume, here are the projects in which the mentioned skills have been applied:

1. Computer Vision:
   - Estimate pose & predict sport movements; remove background from images
   - Automatic detection of intro, video content, a


# Output:


```
[Matching Request] for Ayşin Sancı  Based on the information provided, the projects in which the required skills have been applied are as follows:

1. Altinay Robot Technologies – Senior Software Engineer:
   - Built a Computer Vision application to communicate with Robotics Systems using Python and Cloud.
   - Applied AI, Machine Learning, and Computer Vision on Robotics project using Python, Java, and C++ on Linux.
   - Created Connect4 game that humans play vs robot using AI algorithms such as MinMax and Alpha-Beta pruning.
   - Created an Augmented Reality application that displays 3D CAD images of the Robots.
   
2. Event Gates – Software Engineer:
   - Created Computer Vision algorithm using Pytorch and AWS Cloud to ensure the reliability of gate security system.
   - Built Deep Learning models with Python, Pytorch.
   
3. Projects:
   - Speech Recognition - github.com/Ayshine/AI - Speech Recognition
   - Machine Translation - github.io/Ayshine/NLP - Machine Translation
[Matching Request] for Abhilash Babu  Projects in which Abhilash Babu has applied the skills 'TensorFlow', 'Keras', 'PyTorch', 'Computer Vision' are:

1. Jan 2020 - Feb 2022: Implemented Deep learning model to detect facial attributes like open-closed eye, face occlusion, etc using Tensorflow2. Also, explored model interpretation using GradCAM, GradCAM++. This project involved the use of the TensorFlow framework for computer vision tasks.

2. Aug 2016 - Dec 2019: Developed machine learning models for defect classification using Logistic Regression and Naive Bayes model. This project involved the use of machine learning techniques and frameworks such as TensorFlow and Keras for computer vision tasks.

3. Jan 2013 - Jul 2016: Implemented machine vision algorithms and workflows to inspect LTCC and HTCC substrates based on the Golden Template method. Also, developed custom inspection scripts using Halcon Machine Vision library. This project involved the use of computer vision techniques and libraries, including TensorFlow and Keras.

Overall, Abhilash Babu has experience applying TensorFlow, Keras, and PyTorch for computer vision tasks in different projects throughout their career.
```



# Introduction of Structured Output

## *Goal*: Return the model's response in a JSON format.
## *Reason*: The answers are very descriptive and read like a short essay with bullet points, so we instruct the model to give its output in JSON format. A structured response is easier to assess and use programmatically, and JSON is a widely accepted format for structured data.
# Prompt Version 2 :

In [None]:
prompt_version_2 = f'''You are an assistant to a recruiter. Your job is to evaluate a resume for a particular skill.  The skills for which you need to do
evaluation are {must_have_skills}.  You need to find the projects in which a particular skill from this list has been applied. Return your response as a JSON'''

for application in selected_applications:
    if 'resume_path' in application and 'email_id' in application:

        # Extract resume text
        resume_text = read_document(os.path.join(resume_path, application['resume_path']))
        resume_text, _, _ = check_and_trim(resume_text)

        # Directly assign the resume_summary without json.loads()
        resume_summary = application['resume_summary']

        # Score the resume against job requirements
        initial_score_output = initial_score(resume_text, prompt_version_2)

        print(f'''[Matching Request] for {resume_summary["name_of_candidate"]} ''', initial_score_output)


[Matching Request] for Soso Sukhitashvili  {
  'TensorFlow': [
    'Computer vision',
    'Natural language processing'
  ],
  'Keras': [],
  'PyTorch': [
    'Computer vision'
  ],
  'Computer Vision': [
    'Face recognition',
    'Image similarity search',
    'Object tracking',
    'Object detection',
    'Object segmentation'
  ]
}
[Matching Request] for Derrick I.C. VAN FRAUSUM  {
  "TensorFlow": "Computer Vision: estimate pose & predict sport movements; remove background from images",
  "Keras": "",
  "PyTorch": "",
  "Computer Vision": "Computer Vision: estimate pose & predict sport movements; remove background from images"
}
[Matching Request] for Naren Sadhwani  {
  "projects": [
    {
      "title": "Machine Learning Intern",
      "skills": [
        "Tensorflow"
      ]
    },
    {
      "title": "Deep Learning Specialization",
      "skills": [
        "Tensorflow",
        "Pytorch"
      ]
    },
    {
      "title": "Machine Learning /AI Engineer Path",
      "skills"

# Output:


```
[Matching Request] for Joseph Adeola  {
  "TensorFlow": ["iToBos project"],
  "Keras": ["iToBos project"],
  "PyTorch": [],
  "Computer Vision": ["iToBos project", "Feature Tracker using ICP Algorithm for Event-based Pose Estimation on DAVIS346 Camera Sensor", "Stereo Visual Odometry Using UTIAS Dataset", "Camera Calibration, Pose Estimation, and Augmented Reality using Aruco Markers", "Underwater Image Analysis and Registration", "Epipolar Geomerty and Stereo", "Facial Expression Recognition using Transfer Learning with RESNET-18 on Nvidia Jetson Nano"]
}
[Matching Request] for Soso Sukhitashvili  {
  "TensorFlow": [
    "None"
  ],
  "Keras": [
    "None"
  ],
  "PyTorch": [
    "Deep Learning Engineer / Algorithm Developer - worked with PyTorch in computer vision projects such as object detection, object tracking, and image segmentation"
  ],
  "Computer Vision": [
    "Deep Learning Engineer / Algorithm Developer - worked on computer vision projects including object detection, object tracking, image similarity search, and OCR"
  ]
}

```
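
The replies above are not always strict JSON: one run returned single-quoted keys, which `json.loads` rejects, and another was cut off mid-object. A minimal sketch of a tolerant parser follows, assuming the reply is either valid JSON or a Python-style literal, possibly surrounded by extra text. `parse_model_json` is a hypothetical helper, not part of this notebook.

```python
import ast
import json


def parse_model_json(reply):
    """Best-effort conversion of a model reply into a dict; returns None on failure."""
    # Keep only the span from the first "{" to the last "}" to drop surrounding text.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = reply[start:end + 1]
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        try:
            # Fall back to Python literal syntax (handles single-quoted strings).
            parsed = ast.literal_eval(candidate)
        except (ValueError, SyntaxError, TypeError):
            return None
    return parsed if isinstance(parsed, dict) else None


print(parse_model_json("{'Keras': [], 'PyTorch': ['Computer vision']}"))
# {'Keras': [], 'PyTorch': ['Computer vision']}
```

Returning `None` instead of raising lets the calling loop skip or retry a resume whose reply could not be parsed.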



# Binary Scoring of Skills

## *Goal*: Mark each skill as '1' (present) or '0' (absent) in the resume.
## *Reason*: The problem with the above is that the projects the assistant selects for each must-have skill vary a lot each time we run the cell. OpenAI GPT-3.5 is good at binary classification, so we ask the assistant to give a score of 1 or 0 depending on whether a must-have skill is present in the resume. Binary classification is straightforward, which lets human evaluators quickly see whether a candidate possesses a particular skill.

# Prompt Version 3 :


In [None]:

prompt_version_3 = f'''You are an assistant to a recruiter. Your job is to evaluate a resume for a particular skill.  The skills for which you need to do
evaluation are {must_have_skills}. You need to find the projects in which a particular skill from this list has been applied. For each skill, \
mark it 1 if there are projects related to the skill in the resume otherwise mark it 0. Now do this for all the must have skills. \
Return your response as a JSON'''

for application in selected_applications:
    if 'resume_path' in application and 'email_id' in application:

        # Extract resume text
        resume_text = read_document(os.path.join(resume_path, application['resume_path']))
        resume_text, _, _ = check_and_trim(resume_text)

        # Directly assign the resume_summary without json.loads()
        resume_summary = application['resume_summary']

        # Score the resume against job requirements
        initial_score_output = initial_score(resume_text, prompt_version_3)

        print(f'''[Matching Request] for {resume_summary["name_of_candidate"]} ''', initial_score_output)


[Matching Request] for Soso Sukhitashvili  {
  "TensorFlow": 0,
  "Keras": 0,
  "PyTorch": 1,
  "Computer Vision": 1
}
[Matching Request] for Derrick I.C. VAN FRAUSUM  {
  "TensorFlow": 1,
  "Keras": 0,
  "PyTorch": 1,
  "Computer Vision": 1
}
[Matching Request] for Naren Sadhwani  {
  "TensorFlow": 1,
  "Keras": 0,
  "PyTorch": 1,
  "Computer Vision": 1
}


# Output


```
[Matching Request] for Ayşin Sancı  {
  "TensorFlow": 0,
  "Keras": 0,
  "PyTorch": 1,
  "Computer Vision": 1
}
[Matching Request] for Abhilash Babu  {
  "TensorFlow": 1,
  "Keras": 1,
  "PyTorch": 1,
  "Computer Vision": 1
}
```
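
Because later steps read these scores programmatically, it helps to check that a parsed reply has exactly one 0/1 entry per must-have skill. A minimal sketch follows. `validate_binary_scores` is a hypothetical helper, and the skill list in the usage lines is taken from the keys that appear in the outputs above.

```python
def validate_binary_scores(scores, must_have_skills):
    """Return a list of problems found in `scores`; an empty list means the reply is usable."""
    problems = []
    for skill in must_have_skills:
        if skill not in scores:
            problems.append(f"missing skill: {skill}")
        # bool is a subclass of int in Python, so exclude True/False explicitly.
        elif isinstance(scores[skill], bool) or scores[skill] not in (0, 1):
            problems.append(f"score for {skill} is not 0 or 1: {scores[skill]!r}")
    for key in scores:
        if key not in must_have_skills:
            problems.append(f"unexpected key: {key}")
    return problems


skills = ["TensorFlow", "Keras", "PyTorch", "Computer Vision"]
reply = {"TensorFlow": 0, "Keras": 0, "PyTorch": 1, "Computer Vision": 1}
print(validate_binary_scores(reply, skills))  # []
```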



# Justification for Each Skill Score

## *Goal*: Provide a summary or justification for each skill's score.
## *Reason*: A score alone isn't enough. A recruiter or hiring manager wants to know the basis on which a score was assigned, and a summary gives insight into the candidate's proficiency in a particular skill. The output above looks great, but we also need a justification: a summary of the projects behind each score. So for each skill we ask the assistant to return two pieces of information, "score" and "summary".

# Prompt Version 4 :

In [None]:
prompt_version_4 = f'''You are an assistant to a recruiter. Your job is to evaluate a resume for a particular skill.  The skills for which you need to do
evaluation are {must_have_skills}. You need to find the projects in which a particular skill from this list has been applied. For each skill, \
use the technical skill as the key and this key will further have 2 more keys "summary" and "score". "score" is 0 if there is no project using this \
particular skill and "summary" is empty and "score" is 1 if you find a project with the particular technical skill, then use "summary" to \
explain the project. Now do this for all the must have skills. \
Return your response as a JSON'''

for application in selected_applications:
    if 'resume_path' in application and 'email_id' in application:

        # Extract resume text
        resume_text = read_document(os.path.join(resume_path, application['resume_path']))
        resume_text, _, _ = check_and_trim(resume_text)

        # Directly assign the resume_summary without json.loads()
        resume_summary = application['resume_summary']

        # Score the resume against job requirements
        initial_score_output = initial_score(resume_text, prompt_version_4)

        print(f'''[Matching Request] for {resume_summary["name_of_candidate"]} ''', initial_score_output)

[Matching Request] for Soso Sukhitashvili  {
  "TensorFlow": {
    "summary": "",
    "score": 0
  },
  "Keras": {
    "summary": "",
    "score": 0
  },
  "PyTorch": {
    "summary": "I have experience working with PyTorch in my role as a deep learning engineer. I have used PyTorch for projects such as object detection, image similarity search, and sentiment analysis.",
    "score": 1
  },
  "Computer Vision": {
    "summary": "I have extensive experience working on computer vision projects, including face recognition, image similarity search, object tracking, object detection, and object segmentation.",
    "score": 1
  }
}
[Matching Request] for Derrick I.C. VAN FRAUSUM  {
  "TensorFlow": {
    "summary": "Computer Vision: estimate pose & predict sport movements; remove background from images",
    "score": 1
  },
  "Keras": {
    "summary": "",
    "score": 0
  },
  "PyTorch": {
    "summary": "",
    "score": 0
  },
  "Computer Vision": {
    "summary": "Computer Vision: estimate 

# Output:


```
[Matching Request] for Ayşin Sancı  {
  "TensorFlow": {
    "summary": "",
    "score": 0
  },
  "Keras": {
    "summary": "",
    "score": 0
  },
  "PyTorch": {
    "summary": "Created Computer Vision algorithm using Pytorch and AWS Cloud to ensure the reliability of gate security system.",
    "score": 1
  },
  "Computer Vision": {
    "summary": "Built a Computer Vision application to communicate with Robotics Systems using Python and Cloud.",
    "score": 1
  }
}
[Matching Request] for Abhilash Babu  {
  "TensorFlow": {
    "summary": "Implemented Deep learning model to detect facial attributes like open-closed eye, face occlusion etc using Tensorflow2. Exploration of model interpretation using GradCAM, GradCAM++. Implemented Object detection using YOLO family models. Inference of pre-trained models using ONNX runtime.",
    "score": 1
  },
  "Keras": {
    "summary": "",
    "score": 0
  },
  "PyTorch": {
    "summary": "Developed and deployed machine learning solutions for a variety of applications, including object detection, image classification, image segmentation. Possesses a deep understanding of classical computer vision techniques as well as the latest advancements in deep learning frameworks such as TensorFlow and PyTorch.",
    "score": 1
  },
  "Computer Vision": {
    "summary": "Implemented Deep learning model to detect facial attributes like open-closed eye, face occlusion etc using Tensorflow2. Exploration of model interpretation using GradCAM, GradCAM++. Implemented Object detection using YOLO family models. Evaluation of background elimination (video matting) models like Bodypix, MODNet etc. Implemented machine vision algorithms and workflows in C++ to inspect documents like Passport, ID cards etc. Developed machine learning models for defect classification using Logistic Regression and Naive Bayes model. Developed GUI application using Qt framework for creating Machine vision workflows for inspection, measurement and system calibration. Integrated GigE and USB Cameras into the software using corresponding vendor SDKs. Integrated 3D Depth sensors into the software for inspecting seam of the passport. Implemented machine vision algorithms and workflows to inspect LTCC and HTCC substrates based on Golden Template method. Integrated Cameras, Frame grabbers, strobe controllers and motor controllers into the software using vendor SDKs. Developed GUI application using .NET WPF technology to configure and create inspection workflows and to log inspection results in MySQL database. Implemented custom inspection scripts using Halcon Machine Vision library, which could then be used in the inspection software as plugins.",
    "score": 1
  }
}
```
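
The prompt asks for an empty "summary" whenever "score" is 0 and an explanation whenever it is 1. A minimal sketch of checking that contract on a parsed reply follows; `check_justifications` is a hypothetical helper.

```python
def check_justifications(reply):
    """Return the skills whose entry breaks the "score"/"summary" contract of prompt version 4."""
    inconsistent = []
    for skill, entry in reply.items():
        if not isinstance(entry, dict):
            inconsistent.append(skill)
            continue
        score = entry.get("score")
        summary = entry.get("summary")
        if not isinstance(summary, str) or score not in (0, 1):
            inconsistent.append(skill)
        elif score == 1 and not summary.strip():
            inconsistent.append(skill)  # scored present but no justification
        elif score == 0 and summary.strip():
            inconsistent.append(skill)  # scored absent but a justification was given
    return inconsistent


reply = {
    "Keras": {"summary": "", "score": 0},
    "PyTorch": {"summary": "Built Deep Learning models with Python, Pytorch.", "score": 1},
}
print(check_justifications(reply))  # []
```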



The primary objective of this notebook is to create a prompt that classifies each "must-have" skill as present or absent in a candidate's resume. To integrate with subsequent assignments, the output is structured as JSON, and the prompt specifies the JSON keys so that they stay consistent and the data can be accessed and evaluated efficiently in later steps. This approach streamlines the initial assessment of candidates and sets the stage for more advanced evaluation.

To evaluate the compatibility of a resume with the job description (JD), we'll employ a two-tiered scoring system:

`Binary Scoring`: This method assesses the presence or absence of `must-have` skills from the JD in the resume. Each skill is given a score of '1' if present or '0' if absent.

`Project-based Scoring`: For each of the "must-have" skills identified in the resume, candidates will receive a score ranging from 0 to 5, based on the number of projects where they've demonstrated the skill.

The final score for each skill is the product of the scores from these two criteria. The binary score acts as a filter: if a skill is absent from the resume (a binary score of '0'), the final score for that skill is '0' even if the project-based evaluation gave it a non-zero score. This ensures that only the skills that are truly relevant, as indicated by the JD, are taken into account when gauging a candidate's proficiency.

We have implemented the `Binary Scoring` system in this notebook; `Project-based Scoring` will be implemented in later assignments. There we will also compare GPT-3.5 with GPT-4 for `Project-based Scoring`. GPT-3.5 is more cost effective and already gives good results for `Binary Scoring`, but `Project-based Scoring` is more complicated, so we need to compare the performance of both versions of GPT.
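
The multiplication described above can be sketched as follows. The function name and the example numbers are hypothetical; project-based scores are not produced in this notebook.

```python
def final_skill_scores(binary_scores, project_scores):
    """Multiply each skill's binary score (0 or 1) by its project-based score (0 to 5).

    Skills missing from `project_scores` are treated as 0 (an assumption for this sketch).
    """
    final = {}
    for skill, present in binary_scores.items():
        project_score = project_scores.get(skill, 0)
        if present not in (0, 1) or not 0 <= project_score <= 5:
            raise ValueError(f"score out of range for {skill}")
        final[skill] = present * project_score
    return final


# Hypothetical values for illustration only.
binary = {"TensorFlow": 0, "PyTorch": 1, "Computer Vision": 1}
projects = {"TensorFlow": 2, "PyTorch": 3, "Computer Vision": 5}
print(final_skill_scores(binary, projects))
# {'TensorFlow': 0, 'PyTorch': 3, 'Computer Vision': 5}
```

Here TensorFlow ends at 0 despite a project score of 2, because the binary filter marked it absent.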