# Job Description Processor

This notebook showcases a program that helps recruiters in processing and extracting vital information from a job description. It uses the GPT-3.5-turbo model to extract details such as years of experience, CTC, notice period, educational requirements, technical skills, soft skills, and more.

# Learning Objectives:
1. Understand the challenges in extracting structured information from unstructured job descriptions.
2. Iterate and optimize GPT-3.5-turbo model prompts to improve the quality and format of extracted information.
3. Transform model outputs into a JSON format, ensuring consistency and usability for downstream applications.




Upload the `.env` file, which contains the `OPENAI_API_KEY`, to the directory `/content/`.

In [3]:
# Libraries Installation
!pip install openai
# Required Libraries
import openai
import json
import os
from collections import OrderedDict


Collecting openai
  Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0


We set up our environment to use OpenAI's API for extracting information from Job Descriptions (JD). We'll use Python as our primary language and leverage the OpenAI library to interact with OpenAI's services.


Read the "OPENAI_API_KEY" from the .env file

In [4]:
# Export your API Key to environment variable
# Upload the .env file to the directory "/content/"
!pip install python-dotenv
from dotenv import load_dotenv
load_dotenv()

In [5]:
import openai
# Retrieve the API key from environment variable
openai_api_key =  os.getenv("OPENAI_API_KEY")

# Set the API key for OpenAI
openai.api_key = openai_api_key
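
If the `.env` file is missing or was not loaded, `os.getenv` silently returns `None` and the failure only shows up later, at the first API call. A minimal sketch of a fail-fast check is shown below; the helper name `require_api_key` is an assumption for illustration and is not part of the notebook.

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank.

    `require_api_key` is a hypothetical helper, not part of the original notebook.
    """
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; check that the .env file was uploaded and loaded"
        )
    return key

# Possible usage after load_dotenv():
# openai.api_key = require_api_key()
```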

The following class, `RequirementProcessor`, processes job descriptions (JDs) and extracts relevant information. On initialization it selects the `gpt-3.5-turbo-16k` model, the GPT-3.5 variant with a 16k-token context window. The primary method, `process_requirement`, takes a job description and a system prompt and builds a two-message list for the OpenAI API: a `system` message carrying the prompt and a `user` message carrying the JD. The request sets `temperature`, which controls the randomness of the output, and `max_tokens`, which caps the response length. The method returns the model's reply with leading and trailing whitespace stripped.

In [6]:
# Defining a class called RequirementProcessor
class RequirementProcessor:
    def __init__(self):
        # During initialization, we set the model to a specific GPT model name.
        self.model = "gpt-3.5-turbo-16k"

    def process_requirement(self, requirement_description, system_prompt):
        """
        Generates a response based on the given requirement description and system prompt using the OpenAI API.

        Parameters:
        - requirement_description (str): The description of the requirement.
        - system_prompt (str): The initial system prompt for context.

        Returns:
        - str: The generated text based on the provided input.
        """

        # Create a list of messages to send to the OpenAI API.
        # The first message is from the system for context and the second is the user's requirement.
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": requirement_description},
        ]

        # Set the temperature (affects randomness of output) and maximum token limit for the response.
        temperature = 1
        max_tokens = 2000

        # Call the OpenAI API's ChatCompletion method with the defined model and parameters.
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens
        )

        # Extract and strip the generated content from the API response.
        generated_text = response["choices"][0].message["content"].strip()

        # Return the processed/generated text.
        return generated_text


#### **Job Description (JD)**
#### **Client**: BIGVISION AI
#### **Website**: https://bigvision.ai/

#### **Position Name**: Software Engineer - Computer Vision
#### **Location**: San Francisco
#### **Experience Required**: Intern to 3 Years (0-3) years
#### **Salary Budget(Min - Max)**:  (100k-150k) dollars p.a

#### **Description**:
Big Vision AI is a renowned consulting firm specializing in Computer Vision, Deep Learning, Machine Learning, and Artificial Intelligence (AI) research and development. Our areas of expertise span:

1. Image recognition
2. Object detection and tracking
3. Image segmentation
4. Human body pose estimation
5. Optical character recognition (OCR)
6. Face detection and recognition
7. Augmented reality
8. 3D reconstruction
9. Medical image processing, and many more.

With a strong foundation in libraries like OpenCV, Dlib, as well as frameworks like PyTorch and TensorFlow/Keras, we not only focus on training AI models but also emphasize their performance optimization for deployment on diverse platforms, including the cloud or edge devices such as Raspberry Pi.

Role Responsibilities:

1. Engage in both research and the productization of new computer vision/ML features.
2. Continuously enhance and optimize existing software modules.

Key Skills Required:

1. Significant experience in the image/video domain.
2. Proficiency in writing production-quality Python code.
3. Hands-on experience with OpenCV for developing Computer Vision applications.
4. Acquaintance with C++ and MATLAB is a plus.
5. Experience with prevalent learning frameworks, including TensorFlow, PyTorch, and CAFFE.
6. A strong understanding of Algorithms and Data-structures.
7. Familiarity with code optimization techniques for memory and performance utilizing profiling tools, is an advantage.

Other Desirable Qualities:

1. Effective communication skills.
2. Ability to be an efficient team player.
3. Capability to work independently on assigned tasks.
4. Critical thinking abilities.
5. Zeal for acquiring and implementing new skills.

#### **Must Have Skills**:

TensorFlow, Keras, PyTorch, Computer Vision



The following cell prompts the user for the job description (JD) details, starting with the position name. The user then enters the description line by line and ends it with the keyword `STOP` (or `stop`), after which the cell prints a confirmation that the description was received. Next it collects the client name, experience range, salary budget, job location, notice period, and mandatory skills. Finally, it combines all of these into a single `requirement_description` string that is passed to the model.

In [7]:
processor = RequirementProcessor()
position_name = input("What is the name of the position? ")

print("[Analyse JD] Important descriptions for the job (Enter 'STOP' when you are done):")
desc_content = []
while True:
    line = input()
    if line.strip().lower() == 'stop':
        break
    desc_content.append(line)
desc_content = '\n'.join(desc_content)
print("[Analyse JD] Description uploaded")

client_name = input("Name of the client: ")
min_exp = input("Minimum experience in years: ")
max_exp = input("Maximum experience in years: ")
min_budget = input("Minimum salary budget (p.a): ")
max_budget = input("Maximum salary budget (p.a): ")
location_info = input("Location information: ")
notice_period = input("Notice period: ")
mandatory_skills = input("Enter the mandatory or must have skills for the job requirement: ")
requirement_description = f'''Position name: {position_name}, description of the job and requirements: {desc_content}, Name of the client: {client_name}, \
Minimum experience required: {min_exp}, Maximum experience required: {max_exp}, Minimum budget for the salary p.a: {min_budget}, Maximum budget for the salary p.a: {max_budget}, \
Location of the job: {location_info}, Notice period: {notice_period}, Mandatory or Must have skills required for this job: {mandatory_skills}'''
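
When the JD fields are already available in code, for example when processing many JDs in a batch, the same string can be built without `input()`. The sketch below is one way to do that; the function name `build_requirement_description` and the dictionary layout are assumptions for illustration, while the labels mirror those used in the cell above.

```python
def build_requirement_description(fields):
    """Join labelled JD fields into one string for the model.

    Hypothetical helper: `fields` maps the notebook's variable names to
    string values; missing or empty fields are skipped.
    """
    labels = [
        ("position_name", "Position name"),
        ("desc_content", "description of the job and requirements"),
        ("client_name", "Name of the client"),
        ("min_exp", "Minimum experience required"),
        ("max_exp", "Maximum experience required"),
        ("min_budget", "Minimum budget for the salary p.a"),
        ("max_budget", "Maximum budget for the salary p.a"),
        ("location_info", "Location of the job"),
        ("notice_period", "Notice period"),
        ("mandatory_skills", "Mandatory or Must have skills required for this job"),
    ]
    parts = [
        f"{label}: {fields[key]}"
        for key, label in labels
        if fields.get(key) not in (None, "")
    ]
    return ", ".join(parts)

example = build_requirement_description({
    "position_name": "Software Engineer - Computer Vision",
    "client_name": "BIGVISION AI",
    "location_info": "San Francisco",
})
print(example)
```

Building the string from a list of labelled fields keeps every separator in one place, so no field can run into the next one.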


# Output:


```
What is the name of the position? Software Engineer - Computer Vision
[Analyse JD] Important descriptions for the job (Enter 'STOP' when you are done):
Big Vision AI is a renowned consulting firm specializing in Computer Vision, Deep Learning, Machine Learning, and Artificial Intelligence (AI) research and development. Our areas of expertise span:  Image recognition Object detection and tracking Image segmentation Human body pose estimation Optical character recognition (OCR) Face detection and recognition Augmented reality 3D reconstruction Medical image processing, and many more. With a strong foundation in libraries like OpenCV, Dlib, as well as frameworks like PyTorch and TensorFlow/Keras, we not only focus on training AI models but also emphasize their performance optimization for deployment on diverse platforms, including the cloud or edge devices such as Raspberry Pi. Role Responsibilities:  Engage in both research and the productization of new computer vision/ML features. Continuously enhance and optimize existing software modules. Key Skills Required:  Significant experience in the image/video domain. Proficiency in writing production-quality Python code. Hands-on experience with OpenCV for developing Computer Vision applications. Acquaintance with C++ and MATLAB is a plus. Experience with prevalent learning frameworks, including TensorFlow, PyTorch, and CAFFE. A strong understanding of Algorithms and Data-structures. Familiarity with code optimization techniques for memory and performance utilizing profiling tools, is an advantage. Other Desirable Qualities:  Effective communication skills. Ability to be an efficient team player. Capability to work independently on assigned tasks. Critical thinking abilities. Zeal for acquiring and implementing new skills.
stop
[Analyse JD] Description uploaded
Name of the client: BIGVISION AI
Minimum experience in years: 0
Maximum experience in years: 3
Minimum salary budget (p.a): 100k dollars
Maximum salary budget (p.a): 150k dollars
Location information: San Francisco
Notice period: 1 month
Enter the mandatory or must have skills for the job requirement: TensorFlow, Keras, PyTorch, Computer Vision
```



# **Iterative refinement of the model prompt for optimal JD extraction**

In [9]:
system_prompt_v1 = '''Give information about year of experience, CTC, notice period and educational requirement if available in the description. \
If not mark it as none. Also, summarize the technical skills needed.\
Summarize other needed skills like soft skills. \
Also, add must have skills - these are the most important requirements from the job description, clearly mentioned as must have skills. \
location is the place where the person will be posted. \
'''

extracted_info = processor.process_requirement(requirement_description, system_prompt_v1)
print("extracted information", extracted_info)



## *Begin with a simple extraction. Check basic details like years of experience, CTC, and notice period*.
## *As a starting point, we need to establish the fundamental requirements without overloading the model.*
## Prompt Version 1:

system_prompt_v1 = '''Give information about year of experience, CTC, notice period and educational requirement if available in the description. If not mark it as none. Also, summarize the technical skills needed. Summarize other needed skills like soft skills. Also, add must have skills - these are the most important requirements from the job description, clearly mentioned as must have skills. location is the place where the person will be posted.'''


## Understanding the prompt:
1. **Year of Experience**:
    - **Explanation**: This is asking for the number of years of professional experience that is required or preferred for the job position. It is common to find phrases such as "2-3 years of experience in XYZ" in job descriptions. We are instructing the model to locate such information from the job description and specify it. If the information is not available, mark it as "none".

2. **CTC (Cost to Company)**:
    - **Explanation**: This refers to the total salary package offered by the company which includes the gross salary along with other benefits and bonuses. In the job description, it might be mentioned explicitly as the "CTC" or might be referred to in terms such as "salary package" or "compensation". If such details are available, list them; otherwise, indicate it as "none".

3. **Notice Period**:
    - **Explanation**: This is the period an employee must serve from the time they announce their resignation until their last working day. It might be mentioned in terms such as "available to start in X weeks/months" or "notice period of X months". If this information is in the description, note it down; if not, mark it as "none".

4. **Educational Requirement**:
    - **Explanation**: This pertains to the necessary educational background that the job demands. It may involve having a specific degree, diploma, or certification. If the information is available in the description, provide details; otherwise, mark it as "none".

5. **Technical Skills**:
    - **Explanation**: These are the skills directly related to the job’s core functions, e.g., programming languages, machinery operation, etc. Summarize the list of technical skills mentioned in the job description into a concise list or description.

6. **Soft Skills**:
    - **Explanation**: Soft skills are non-technical skills that relate to how you work. They include skills such as communication, critical thinking, and leadership. Summarize any mentioned soft skills required for the job from the job description into a concise list or description.

7. **Must-Have Skills**:
    - **Explanation**: These are the non-negotiable skills that are critical for performing the job successfully. They are clearly labeled as "must-have" in the job description. List down all such skills mentioned in the job description.

8. **Location**:
    - **Explanation**: This is the geographical location where the job role will be based. Look for details regarding the job location in the job description and mention it. If it is not specified, you should mark it as "none".

In summary, the prompt asks the model to extract and summarize the essential details of a job description: required experience, compensation, notice period, educational prerequisites, technical and soft skills, must-have skills, and job location. Any detail missing from the job description should be reported as "none". The result is a structured view of the job requirements.





**Note:**
The output of the cell above is inconsistent: its wording and layout vary from run to run, partly because the prompt does not fix an output format and partly because the request uses `temperature = 1`. To make the result uniform and predictable, the next step is to constrain the output format.
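
Besides constraining the format, lowering the sampling temperature is a common way to reduce run-to-run variation, although it does not guarantee identical output. The sketch below builds the request arguments with `temperature=0.0`; the helper name `build_chat_request` is an assumption for illustration, and the notebook itself keeps `temperature = 1`.

```python
def build_chat_request(model, system_prompt, user_text, temperature=0.0, max_tokens=2000):
    """Assemble keyword arguments for a chat completion request.

    Hypothetical helper: it only builds the dictionary and does not call the API.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "temperature": temperature,  # lower values make sampling less random
        "max_tokens": max_tokens,
    }

request = build_chat_request(
    "gpt-3.5-turbo-16k",
    "Extract the notice period from the description.",
    "Notice period: 1 month",
)

# With openai 0.28, as installed above, the request would be sent as:
# response = openai.ChatCompletion.create(**request)
```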

# Output:


```
extracted information Year of Experience: 0-3 years
CTC: $100k-$150k per year
Notice Period: 1 month
Educational Requirement: None mentioned

Technical Skills Needed:
- Significant experience in the image/video domain
- Proficiency in writing production-quality Python code
- Hands-on experience with OpenCV for developing Computer Vision applications
- Acquaintance with C++ and MATLAB is a plus
- Experience with prevalent learning frameworks, including TensorFlow, PyTorch, and CAFFE
- Strong understanding of Algorithms and Data-structures
- Familiarity with code optimization techniques for memory and performance utilizing profiling tools

Other Skills:
- Effective communication skills
- Ability to be an efficient team player
- Capability to work independently on assigned tasks
- Critical thinking abilities
- Zeal for acquiring and implementing new skills

Must Have Skills:
- TensorFlow, Keras, PyTorch, Computer Vision

Location: San Francisco
```



# *Convert the extracted information into a structured JSON format for easy data manipulation.*
# *The first output may be in plain text. Structured data in JSON format will be more useful for further data processing tasks.*

# Prompt Version 2:
system_prompt_v2 = '''Give information about year of experience, CTC, notice period and educational requirement if available in the description. \
    If not mark it as none. Also, summarize the technical skills needed.\
    Summarize other needed skills like soft skills. \
    Also, add must have skills - these are the most important requirements from the job description, clearly mentioned as must have skills. \
    location is the place where the person will be posted. The output must be in JSON.\
    '''

## Understanding the Prompt:
This prompt asks the model to extract the same details from the job description as Version 1 and to present them in JSON format:

1. **Year of Experience**:
    - **Explanation**: You need to extract details regarding the required or preferred years of professional experience mentioned in the job description. Find phrases or sentences that specify the number of years of experience needed. If such information is not available, mark this field as "none".

2. **CTC (Cost to Company)**:
    - **Explanation**: CTC refers to the total annual salary package that an employee is offered, including all the monetary benefits. Extract details from the job description regarding the salary package or CTC. If it is not mentioned, denote this category as "none".

3. **Notice Period**:
    - **Explanation**: The notice period is the duration an employee is expected to serve before officially leaving the job, following their resignation. Locate information regarding the notice period in the job description, if mentioned. If not, label it as "none".

4. **Educational Requirement**:
    - **Explanation**: Here you need to find and mention the minimum educational qualifications required for the job as stated in the description. This might include specific degrees, diplomas, or certifications. If not found, write "none".

5. **Technical Skills**:
    - **Explanation**: Identify and summarize the list of technical skills mentioned in the job description. Technical skills are skills required to perform the specific tasks related to the job, often involving the use of specialized tools or knowledge. Summarize these into a concise list.

6. **Soft Skills**:
    - **Explanation**: Soft skills refer to personal attributes and interpersonal skills that facilitate good communication and cooperation with others. Summarize any mentioned soft skills required for the job from the job description.

7. **Must-Have Skills**:
    - **Explanation**: These are skills that are emphasized as being essential for the role in the job description, often mentioned using phrases such as "must-have", "required", "essential", etc. Identify and list all such skills highlighted in the job description.

8. **Location**:
    - **Explanation**: This refers to the geographical place where the job role is based or where the employee will be posted. Find the details about the job location in the job description and mention it in the output.

9. **Output in JSON**:
    - **Explanation**: JSON (JavaScript Object Notation) is a format for structuring data. It is primarily used to transmit data between a server and web applications as a way to encode data structures. In this case, the extracted details must be presented in a structured JSON format. It would look something like:
      ```json
      {
          "Experience Required": "3-5 years",
          "CTC": "none",
          "Notice Period": "2 months",
          "Educational Requirement": "Bachelor's degree in Computer Science",
          "Technical Skills": "Java, Python, SQL",
          "Soft Skills": "Communication, Teamwork",
          "Must Have Skills": "Java, Python",
          "Location": "San Francisco, CA"
      }
      ```
   
Using this prompt, one can extract the necessary details from the job description and represent them in the required JSON format. It facilitates a structured and organized way to present the details extracted from the job descriptions.
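
Once the model replies in JSON, the text still has to be turned into a Python dictionary before downstream code can use it. The sketch below is one way to do this with the standard `json` module; the helper name `parse_json_reply` is an assumption for illustration, and the slicing step is there because a reply may carry extra text around the JSON object (as with the `extracted information` prefix printed by the cells here).

```python
import json

def parse_json_reply(reply_text):
    """Parse the first JSON object found in a model reply.

    Hypothetical helper: takes the text between the first "{" and the last "}"
    and raises ValueError if no object is present or the JSON is invalid.
    """
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the model reply")
    return json.loads(reply_text[start:end + 1])

reply = 'extracted information {"CTC": "none", "Location": "San Francisco, CA"}'
record = parse_json_reply(reply)
print(record["Location"])
```

`json.JSONDecodeError` is a subclass of `ValueError`, so a caller can catch `ValueError` to handle both a missing object and malformed JSON.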


In [10]:
system_prompt_v2 = '''Give information about year of experience, CTC, notice period and educational requirement if available in the description. \
    If not mark it as none. Also, summarize the technical skills needed.\
    Summarize other needed skills like soft skills. \
    Also, add must have skills - these are the most important requirements from the job description, clearly mentioned as must have skills. \
    location is the place where the person will be posted. The output must be in JSON.\
    '''
extracted_info = processor.process_requirement(requirement_description, system_prompt_v2)
print("extracted information", extracted_info)


**Note:** In the output above, the JSON keys change from run to run, so the next task is to stabilise them.

# Output:


```
extracted information {
  "Position": "Software Engineer - Computer Vision",
  "Description": "Big Vision AI is a renowned consulting firm specializing in Computer Vision, Deep Learning, Machine Learning, and Artificial Intelligence (AI) research and development. Our areas of expertise span: Image recognition, Object detection and tracking, Image segmentation, Human body pose estimation, Optical character recognition (OCR), Face detection and recognition, Augmented reality, 3D reconstruction, Medical image processing, and many more. With a strong foundation in libraries like OpenCV, Dlib, as well as frameworks like PyTorch and TensorFlow/Keras, we not only focus on training AI models but also emphasize their performance optimization for deployment on diverse platforms, including the cloud or edge devices such as Raspberry Pi.",
  "Requirements": [
    "Significant experience in the image/video domain.",
    "Proficiency in writing production-quality Python code.",
    "Hands-on experience with OpenCV for developing Computer Vision applications.",
    "Acquaintance with C++ and MATLAB is a plus.",
    "Experience with prevalent learning frameworks, including TensorFlow, PyTorch, and CAFFE.",
    "A strong understanding of Algorithms and Data-structures.",
    "Familiarity with code optimization techniques for memory and performance utilizing profiling tools is an advantage."
  ],
  "Other Skills": [
    "Effective communication skills.",
    "Ability to be an efficient team player.",
    "Capability to work independently on assigned tasks.",
    "Critical thinking abilities.",
    "Zeal for acquiring and implementing new skills."
  ],
  "Must Have Skills": [
    "TensorFlow",
    "Keras",
    "PyTorch",
    "Computer Vision"
  ],
  "Client": "BIGVISION AI",
  "Minimum Experience": "0",
  "Maximum Experience": "3",
  "Minimum Salary": "100k dollars",
  "Maximum Salary": "150k dollars",
  "Location": "San Francisco",
  "Notice Period": "1 month"
}
```



# *Ensure the JSON keys are consistent across different runs to make the output predictable.*
# *The previous prompt might have inconsistent key naming. Consistent keys help in automated parsing without surprises.*
# Prompt Version 3:
system_prompt_v3 = '''Give information about "year_of_experience", "CTC", "notice_period" and "educational_requirement" if available in the description. \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON.'''

## Understanding the prompt:
The prompt instructs the model (here, accessed through the OpenAI API) to extract and summarize certain details from a provided job description and to return them in JSON format under explicitly named keys such as `"year_of_experience"` and `"technical_summary"`.

All the information needs to be structured in a JSON format, which is a standard data format used to store and transport data in a structured manner. It essentially means that the output should be a text with field names and their corresponding values enclosed in curly braces.

The intention behind the prompt is to have a structured output containing all the necessary details extracted from a job description, which makes it easier to quickly understand the key requirements and details of the job role.
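
Because this version of the prompt names its keys, the parsed output can be checked against them automatically. The sketch below lists which of the eight keys from the prompt are absent from a parsed record; the names `EXPECTED_KEYS` and `missing_keys` are assumptions for illustration.

```python
# The eight keys named in Prompt Version 3.
EXPECTED_KEYS = [
    "year_of_experience",
    "CTC",
    "notice_period",
    "educational_requirement",
    "technical_summary",
    "soft_skills_summary",
    "must_have_skills",
    "location",
]

def missing_keys(record, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from `record`, in prompt order.

    Hypothetical helper; `record` is a dict parsed from the model's JSON reply.
    """
    return [key for key in expected if key not in record]

partial = {"CTC": "$100k - $150k", "location": "San Francisco"}
print(missing_keys(partial))
```

An empty list means the reply contains every requested key; a non-empty list is a signal to retry the request or flag the record for review.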


In [11]:
system_prompt_v3 = '''Give information about "year_of_experience", "CTC", "notice_period" and "educational_requirement" if available in the description. \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON.\
'''
extracted_info = processor.process_requirement(requirement_description, system_prompt_v3)
print("extracted information", extracted_info)


**Note**: The output of the above cell contains unwanted JSON keys and values, such as the following:

```
  "position_name": "Software Engineer - Computer Vision",
  "description": "Big Vision AI is a renowned consulting firm specializing in Computer Vision, Deep Learning, Machine Learning, and Artificial Intelligence (AI) research and development. Our areas of expertise span:  Image recognition, Object detection and tracking, Image segmentation, Human body pose estimation, Optical character recognition (OCR), Face detection and recognition, Augmented reality, 3D reconstruction, Medical image processing, and many more. With a strong foundation in libraries like OpenCV, Dlib, as well as frameworks like PyTorch and TensorFlow/Keras, we not only focus on training AI models but also emphasize their performance optimization for deployment on diverse platforms, including the cloud or edge devices such as Raspberry Pi.",
  "role_responsibilities": "Engage in both research and the productization of new computer vision/ML features. Continuously enhance and optimize existing software modules.",
```
Therefore, we need to rectify this in the next iteration.
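
Independently of the prompt change, unwanted keys can also be dropped after parsing. The sketch below keeps only a given list of keys, in a fixed order, and fills any missing key with `"none"` to match the prompt's convention; the helper name `keep_expected_keys` is an assumption for illustration.

```python
from collections import OrderedDict

def keep_expected_keys(record, expected):
    """Return an OrderedDict holding only the `expected` keys, in that order.

    Hypothetical helper: keys missing from `record` are filled with "none",
    and any key not listed in `expected` is dropped.
    """
    return OrderedDict((key, record.get(key, "none")) for key in expected)

raw = {
    "position_name": "Software Engineer - Computer Vision",
    "CTC": "$100k - $150k",
    "location": "San Francisco",
}
clean = keep_expected_keys(raw, ["CTC", "notice_period", "location"])
print(dict(clean))
```

Filtering after the fact makes downstream parsing predictable even when the model adds extra fields, but it does not reduce the tokens spent generating them, so tightening the prompt remains worthwhile.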


# Output:


```
extracted information {
  "position_name": "Software Engineer - Computer Vision",
  "description": "Big Vision AI is a renowned consulting firm specializing in Computer Vision, Deep Learning, Machine Learning, and Artificial Intelligence (AI) research and development. Our areas of expertise span:  Image recognition, Object detection and tracking, Image segmentation, Human body pose estimation, Optical character recognition (OCR), Face detection and recognition, Augmented reality, 3D reconstruction, Medical image processing, and many more. With a strong foundation in libraries like OpenCV, Dlib, as well as frameworks like PyTorch and TensorFlow/Keras, we not only focus on training AI models but also emphasize their performance optimization for deployment on diverse platforms, including the cloud or edge devices such as Raspberry Pi.",
  "role_responsibilities": "Engage in both research and the productization of new computer vision/ML features. Continuously enhance and optimize existing software modules.",
  "key_skills": "Significant experience in the image/video domain. Proficiency in writing production-quality Python code. Hands-on experience with OpenCV for developing Computer Vision applications. Acquaintance with C++ and MATLAB is a plus. Experience with prevalent learning frameworks, including TensorFlow, PyTorch, and CAFFE. A strong understanding of Algorithms and Data-structures. Familiarity with code optimization techniques for memory and performance utilizing profiling tools, is an advantage.",
  "other_desirable_qualities": "Effective communication skills. Ability to be an efficient team player. Capability to work independently on assigned tasks. Critical thinking abilities. Zeal for acquiring and implementing new skills.",
  "must_have_skills": [
    "TensorFlow",
    "Keras",
    "PyTorch",
    "Computer Vision"
  ],
  "technical_summary": "Computer Vision, Deep Learning, Machine Learning, Image recognition, Object detection and tracking, Image segmentation, Human body pose estimation, Optical character recognition (OCR), Face detection and recognition, Augmented reality, 3D reconstruction, Medical image processing, OpenCV, PyTorch, TensorFlow, Keras",
  "soft_skills_summary": "Effective communication, Team player, Independent work, Critical thinking, Continuous learning",
  "year_of_experience": "0-3",
  "CTC": "$100k - $150k",
  "notice_period": "1 month",
  "educational_requirement": "None",
  "location": "San Francisco"
}
```



# *Direct the model explicitly to provide output in a desired JSON format*
# *The last version may produce ambiguous outputs. Direct commands reduce uncertainty, ensuring the output aligns with our format requirements.*

# Prompt Version 4:
system_prompt_v4 = '''Return to me in json format information about "year_of_experience", "CTC", "notice_period" and "educational_requirement" if available in the description. \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON.'''

## Understanding the prompt:

This latest version of the prompt is designed to guide the extraction of specific information from a job description and to organize that data into a JSON format. Here’s the summary of each part of the instruction:

1. **Return the Following Details in JSON Format**:
    - **Explanation**: The prompt begins with a clear directive to return the gathered information structured in JSON format. JSON format organizes data in key-value pairs, making it easy to read and process.

2. **"year_of_experience"**:
    - **Explanation**: Look for information about the required years of experience in the job description. If the information is not available, it should be reported as "none".

3. **"CTC" (Cost to Company)**:
    - **Explanation**: Find and report the CTC details available in the job description. If such details are not mentioned, it should be reported as "none".

4. **"notice_period"**:
    - **Explanation**: Note the notice period details from the job description. If this information is absent, it should be marked as "none".

5. **"educational_requirement"**:
    - **Explanation**: Find and state the educational requirements mentioned in the job description. If this information is absent, it should be reported as "none".

6. **"technical_summary"**:
    - **Explanation**: Summarize the technical skills required for the job role as indicated in the job description, consolidating them into a "technical_summary" field.

7. **"soft_skills_summary"**:
    - **Explanation**: Find and summarize the soft skills specified in the job description, and compile them into a "soft_skills_summary" field.

8. **"must_have_skills"**:
    - **Explanation**: Identify and list the skills clearly labeled as essential or "must-have" in the job description, noting them in the "must_have_skills" field.

9. **"location"**:
    - **Explanation**: Find the job location from the job description and mention it in the "location" field. If not specified, it should be mentioned as "none".

10. **Instruction on Output Format**:
    - **Explanation**: The output should strictly be in JSON format, reinforcing the initial directive.

This refined prompt states the output format explicitly: it opens with "Return to me in json format" and closes with "The output must be in JSON." Stating the requirement at both ends sets a clear expectation, narrows the room for ambiguous output, and steers the GPT model toward a structured, key-value representation of the details extracted from the job description.
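
As a rough illustration of why a fixed set of field names matters downstream, the sketch below checks a reply for the fields listed above. The names `EXPECTED_KEYS`, `missing_fields`, and `sample_reply` are hypothetical helpers introduced for this example, not part of the notebook, and the sample reply is made up.

```python
import json

# Field names requested by the prompt (copied from the prompt text).
EXPECTED_KEYS = {
    "year_of_experience", "CTC", "notice_period", "educational_requirement",
    "technical_summary", "soft_skills_summary", "must_have_skills", "location",
}

def missing_fields(reply_text):
    """Return the sorted list of requested fields absent from a JSON reply."""
    parsed = json.loads(reply_text)
    return sorted(EXPECTED_KEYS - set(parsed))

# Made-up, deliberately incomplete reply used only for illustration.
sample_reply = '{"year_of_experience": "0-3", "CTC": "none", "location": "San Francisco"}'
print(missing_fields(sample_reply))
```

A non-empty result would suggest the prompt needs another iteration or the call should be retried.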

In [12]:
system_prompt_v4 = '''Return to me in json format information about "year_of_experience", "CTC", "notice_period" and "educational_requirement" if available in the description. \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON.\
'''
extracted_info = processor.process_requirement(requirement_description, system_prompt_v4)
print("extracted information", extracted_info)

extracted information {
  "year_of_experience": "0-3",
  "CTC": "100k-150k dollars",
  "notice_period": "1 month",
  "educational_requirement": "none",
  "technical_summary": [
    "OpenCV",
    "Python",
    "C++",
    "MATLAB",
    "TensorFlow",
    "PyTorch",
    "CAFFE",
    "Algorithms",
    "Data-structures"
  ],
  "soft_skills_summary": [
    "Effective communication",
    "Team player",
    "Independent work",
    "Critical thinking",
    "Zeal for learning"
  ],
  "must_have_skills": [
    "TensorFlow",
    "Keras",
    "PyTorch",
    "Computer Vision"
  ],
  "location": "San Francisco"
}


**Note**:
We will use `json.loads()` to process the outputs in later assignments. `json.loads()` raises
`json.JSONDecodeError` if its input is not a valid JSON string. Examples of invalid input:
1. Missing double quotes around property names:
  '{year_of_experience: "0-3", CTC: "12-18"}'
2. Trailing comma:
  '{"year_of_experience": "0-3", "CTC": "12-18",}'
3. Single quotes for property names or values:
  "{'year_of_experience': '0-3', 'CTC': '12-18'}"
4. Unescaped double quote inside a property value:
  '{"technical_summary": "Uses the library "OpenCV" extensively."}'
5. Unexpected characters, e.g., double colons:
  '{"year_of_experience":: "0-3", "CTC": "12-18"}'
6. Missing closing brace:
  '{"year_of_experience": "0-3", "CTC": "12-18"'
7. Extra comma between array elements:
  '{"technical_summary": ["image recognition", "object detection",, "OCR"]}'

That is why the next iteration of the prompt states that the output will be parsed with `json.loads()`.
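
A minimal sketch of how such failures surface in code is shown below. The helper `try_parse` and the `malformed` list are hypothetical names used only for this illustration; `json.loads` and `json.JSONDecodeError` (with its `msg`, `lineno`, and `colno` attributes) are from the standard library.

```python
import json

# A few malformed strings of the kinds a model might return.
malformed = [
    '{year_of_experience: "0-3"}',     # unquoted property name
    '{"year_of_experience": "0-3",}',  # trailing comma
    "{'year_of_experience': '0-3'}",   # single quotes
    '{"year_of_experience": "0-3"',    # missing closing brace
]

def try_parse(text):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, f"{err.msg} (line {err.lineno}, column {err.colno})"

for text in malformed:
    parsed, error = try_parse(text)
    print(repr(text), "->", error)
```

Catching the exception lets the calling code retry the request or log the raw reply instead of crashing.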






# *Ensure the output can be easily parsed by subsequent code, like using `json.loads()`*
# *The prior prompt's response might not be perfectly compatible with `json.loads()`. By specifying the downstream application, we aim for a response that can be parsed without hiccups.*
# Prompt Version 5:
system_prompt_v5 = '''Return to me in json format information about "year_of_experience", "CTC", "notice_period" and "educational_requirement" if available in the description. \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON. I will use json.loads to convert it to a dictionary.'''

## Understanding the prompt:

In this prompt, you instruct the GPT-3.5-turbo model to extract the specified details from a job description, such as "year_of_experience" and "CTC", and to present them as JSON that can later be parsed into a Python dictionary with `json.loads()`. Naming the downstream parser tells the model that the output must be strictly valid JSON, not just JSON-like text, so that it can be processed programmatically in later steps.



In [13]:
system_prompt_v5 = '''Return to me in json format information about "year_of_experience", "CTC", "notice_period" and "educational_requirement" if available in the description. \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON. I will use json.loads to convert it to a dictionary.\
'''
extracted_info = processor.process_requirement(requirement_description, system_prompt_v5)
print("extracted information", extracted_info)

extracted information {
  "year_of_experience": {
    "minimum": 0,
    "maximum": 3
  },
  "CTC": {
    "minimum": 100000,
    "maximum": 150000
  },
  "notice_period": "1 month",
  "educational_requirement": "none",
  "technical_summary": "Computer Vision, Deep Learning, Machine Learning, OpenCV, PyTorch, TensorFlow/Keras",
  "soft_skills_summary": "Effective communication skills, ability to be an efficient team player, capability to work independently, critical thinking abilities, zeal for acquiring and implementing new skills",
  "must_have_skills": [
    "TensorFlow",
    "Keras",
    "PyTorch",
    "Computer Vision"
  ],
  "location": "San Francisco"
}


**Note**:
Running the above cell multiple times shows that the values of `technical_summary` and `soft_skills_summary` are sometimes returned as a list and sometimes as a string, which can cause problems in later stages. So, we need to fix this in the next prompt iteration.
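
The notebook addresses this through the prompt itself. As a complementary, purely illustrative safeguard, downstream code could also coerce both shapes into a list. The helper `as_list` below is a hypothetical name, and splitting strings on commas is an assumption about how the model formats string summaries.

```python
def as_list(value):
    """Coerce a summary field into a list of strings.

    Assumes a string summary is comma-separated; lists pass through unchanged.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return list(value)

print(as_list("Computer Vision, Deep Learning, OpenCV"))
print(as_list(["TensorFlow", "PyTorch"]))
```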




# *Guide the model to list technical skills as separate points for clarity*
# *Earlier responses may have clumped skills together as a string. Enumerating skills provides easier and quicker scanning for recruiters or HR systems*
# Prompt Version 6:
system_prompt_v6 = '''Return to me in json format information about "year_of_experience", "CTC", "notice_period" and "educational_requirement" if available in the description. \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field. Provide this technical summary as a list of points.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON. I will use json.loads to convert it to a dictionary.'''

## Understanding the prompt:

In this updated prompt, you ask the GPT-3.5-turbo model to extract the specified details from a job description and return them as JSON, with the technical skills given as a list of individual points. A list is easier for recruiters and HR systems to scan than a single clumped string, and it keeps the type of `technical_summary` predictable once the output has been parsed with `json.loads()`.

In [14]:
system_prompt_v6 = '''Return to me in json format information about "year_of_experience", "CTC", "notice_period" and "educational_requirement" if available in the description. \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field. Provide this technical summary as a list of points.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON. I will use json.loads to convert it to a dictionary.\
'''
extracted_info = processor.process_requirement(requirement_description, system_prompt_v6)
print("extracted information", extracted_info)

extracted information {
  "year_of_experience": "0-3",
  "CTC": {
    "minimum": "100k",
    "maximum": "150k"
  },
  "notice_period": "1 month",
  "educational_requirement": "none",
  "technical_summary": [
    "Python",
    "OpenCV",
    "C++",
    "MATLAB",
    "TensorFlow",
    "PyTorch",
    "CAFFE",
    "Algorithms",
    "Data-structures",
    "Profiling tools",
    "Cloud deployment",
    "Edge devices"
  ],
  "soft_skills_summary": [
    "Effective communication",
    "Efficient team player",
    "Ability to work independently",
    "Critical thinking",
    "Zeal for learning and implementing new skills"
  ],
  "must_have_skills": [
    "TensorFlow",
    "Keras",
    "PyTorch",
    "Computer Vision"
  ],
  "location": "San Francisco"
}


**Note**:
In some outputs, we've observed that fields like `year_of_experience` and `CTC` occasionally include undesired keys such as `minimum`, `maximum`, `min`, or `max`. To ensure consistency and accuracy, these keys need to be removed from the output.
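
The next prompt version tackles this at the source. For illustration only, the sketch below shows how such a nested value could be collapsed into a single `lower-upper` string after parsing. The function `to_range_string` is a hypothetical helper, and the key names it checks are the ones mentioned in the note above.

```python
def to_range_string(value):
    """Collapse {'minimum': x, 'maximum': y}-style dicts into 'x-y'.

    Values that are not dicts (for example "0-3" or "none") are returned as-is.
    """
    if isinstance(value, dict):
        lower = value.get("minimum", value.get("min"))
        upper = value.get("maximum", value.get("max"))
        if lower is None or upper is None:
            raise ValueError(f"Cannot find both bounds in {value!r}")
        return f"{lower}-{upper}"
    return value

print(to_range_string({"minimum": 0, "maximum": 3}))
print(to_range_string({"min": "100k", "max": "150k"}))
print(to_range_string("0-3"))
```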




# *Extract experience and CTC in a specific format (lower-upper) for consistency*
# *The previous prompt might produce varied formats for experience and CTC like "year_of_experience": {"minimum": 0, "maximum": 3}, "CTC": {"minimum": 12, "maximum": 18}. Uniform data formats simplify comparison across job descriptions and streamline analytics tasks*

# Prompt Version 7:
system_prompt_v7 = '''You are an assistant to the recruiter. \
You need to be very truthful since this information will be used in recruiting a person. \
Return to me in json format, "year_of_experience", "CTC", "notice_period", "educational_requirement" if available in the description, "year_of_experience", "CTC" should only show the lower and upper limit like (lower-upper). \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field for the recruiter to be clear. \
Provide this technical summary as a list of points.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON. I will use json.loads to convert it to a dictionary.'''

## Understanding the prompt:

In this version of the prompt, you guide the GPT-3.5-turbo model to act as a recruiter's assistant: it extracts the key details from a job description and structures them as JSON. The prompt asks for "year_of_experience" and "CTC" in a specific lower-upper format, which keeps the values consistent and makes them easier to compare across job descriptions. It still asks for the technical skills as a list of individual points and for output that is compatible with `json.loads()`. Finally, it stresses truthfulness, since the extracted information will be used in real recruiting decisions.
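
A small check like the one below could confirm that a returned value follows the lower-upper shape. This is a sketch under assumptions: `RANGE_PATTERN` and `parse_range` are hypothetical names, and the pattern tolerates an optional `$` prefix, a `k` suffix, and trailing text such as a currency word, which is a guess at the variations the model may produce.

```python
import re

# Matches values such as "0-3", "100k-150k dollars", or "$100k - $150k".
RANGE_PATTERN = re.compile(
    r"^\s*\$?(\d+(?:\.\d+)?[kK]?)\s*-\s*\$?(\d+(?:\.\d+)?[kK]?)"
)

def parse_range(text):
    """Return (lower, upper) as strings, or None if the text is not a range."""
    if not isinstance(text, str):
        return None
    match = RANGE_PATTERN.match(text)
    return match.groups() if match else None

for value in ["0-3", "100k-150k dollars", "none"]:
    print(value, "->", parse_range(value))
```

Values that return `None` here (including nested dictionaries) would indicate that the format instruction was not followed for that run.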

In [15]:
system_prompt_v7 = '''You are an assistant to the recruiter. \
You need to be very truthful since this information will be used in recruiting a person. \
Return to me in json format, "year_of_experience", "CTC", "notice_period", "educational_requirement" if available in the description, "year_of_experience", "CTC" should only show the lower and upper limit like (lower-upper). \
If not mark it as none. Also, summarize the technical skills needed in the "technical_summary" field for the recruiter to be clear. \
Provide this technical summary as a list of points.\
Summarize other needed skills like soft skills needed in the "soft_skills_summary" field. \
Also, add "must_have_skills" - these are the most important requirements from the job description, clearly mentioned as must have skills. \
"location" is the place where the person will be posted. The output must be in JSON. I will use json.loads to convert it to a dictionary.\
'''
extracted_info = processor.process_requirement(requirement_description, system_prompt_v7)
print("extracted information", extracted_info)

extracted information {
  "year_of_experience": "0-3",
  "CTC": "100k-150k dollars",
  "notice_period": "1 month",
  "educational_requirement": null,
  "technical_summary": [
    "Significant experience in the image/video domain",
    "Proficiency in writing production-quality Python code",
    "Hands-on experience with OpenCV for developing Computer Vision applications",
    "Experience with prevalent learning frameworks including TensorFlow, PyTorch, and CAFFE",
    "Understanding of Algorithms and Data-structures"
  ],
  "soft_skills_summary": [
    "Effective communication skills",
    "Ability to be an efficient team player",
    "Capability to work independently on assigned tasks",
    "Critical thinking abilities",
    "Zeal for acquiring and implementing new skills"
  ],
  "must_have_skills": [
    "TensorFlow",
    "Keras",
    "PyTorch",
    "Computer Vision"
  ],
  "location": "San Francisco"
}





In [18]:
import json

def save_to_json(data, filename='data.json'):
    """
    Save a Python data structure to a JSON file.

    Args:
    - data (dict or str): The Python data structure to be saved. Can be a string (that can be loaded as JSON) or a dictionary.
    - filename (str): The name of the JSON file.

    Returns:
    - None
    """
    # Check if the data is already a string and try to load it into a Python object.
    # If it's already a Python object (like a dictionary or list), then pass.
    if isinstance(data, str):
        try:
            data = json.loads(data)
        except json.JSONDecodeError:
            raise ValueError("The provided string is not valid JSON.")
    elif not isinstance(data, (dict, list)):
        raise TypeError("The data should either be a valid JSON string, dictionary, or list.")

    with open(filename, 'w') as f:
        json.dump(data, f)

save_to_json(extracted_info, '/content/requirements_output.json')  # Saves the data to '/content/requirements_output.json'.
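
To confirm that a saved file can be read back, a round trip such as the following can be used. This sketch writes to a temporary directory instead of `/content/` so that it runs anywhere; the `record` dictionary is made-up sample data, not the notebook's actual output.

```python
import json
import os
import tempfile

# Made-up sample record standing in for the extracted information.
record = {"year_of_experience": "0-3", "location": "San Francisco"}

# Write the record to a JSON file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "requirements_output.json")
with open(path, "w") as f:
    json.dump(record, f)

# Read it back and check that nothing was lost.
with open(path) as f:
    loaded = json.load(f)

print(loaded == record)
```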


Download the file `requirements_output.json` which will be used in the next assignments.

In [17]:
from google.colab import files

# Path of the file that you want to download
file_path = "/content/requirements_output.json"

# Download the file to your local system
files.download(file_path)


In this notebook, we explored techniques for crafting prompts that extract pertinent information from job descriptions, using the position of 'Software Engineer - Computer Vision' as the example. We also iterated on the prompt to stabilize the output format and reduce inaccuracies and hallucinations. In subsequent assignments, we will use this extracted information to filter and evaluate resumes.