<a href="https://colab.research.google.com/github/k4scode/nlp-gpt-3/blob/main/AI_based_recruitment_platform_semantic_search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AI-based recruitment platform

Have you ever wanted a sourcing platform where you could start a recruitment workflow with a text like this:


**"Startup is looking for a founder engineer with experience on Ethereum and smart contracts. Experience in frontEnd development with Angular is also desired."**



The system would then return a ranked list of candidates such as:

CATEGORY: Blockchain
RESUME: Skills Strong CS fundamentals and problem solving Ethereum, Smart Contracts, Solidity skills Golang,…

For the highest-ranked candidate, the system would then automatically ask the following questions and answer them from the resume:

1 - What are your main technical skills?
2 - What was your major at school?
3 - Please list all the SQL databases you have worked with.
4 - Can you list all your work experience?

Looks interesting, right? In this notebook, I will show a quick approach to building an AI-based recruitment platform with the features described above: semantic search over resumes, followed by question answering on the top-ranked match.

Install the Transformers, Datasets, and Evaluate libraries, plus FAISS, Sentence Transformers, and the OpenAI client, to run this notebook.

In [None]:
!pip install datasets evaluate transformers[sentencepiece]
!pip install faiss-gpu

In [None]:
!pip install -U sentence-transformers

In [None]:
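# The OpenAI cells below use the legacy `openai.Completion` interface, which was removed in openai 1.0.
!pip install "openai<1"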

In [3]:
from transformers.pipelines import pipeline

question_answerer = pipeline('question-answering')

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [4]:
from datasets import load_dataset

resume_dataset = load_dataset("csv", data_files='UpdatedResumeDataSet.csv', split="train")
resume_dataset



Downloading and preparing dataset csv/default to /root/.cache/huggingface/datasets/csv/default-2ff440ee4e9ba2c3/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a...


Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-2ff440ee4e9ba2c3/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a. Subsequent calls will reuse this data.


Dataset({
    features: ['Category', 'Resume'],
    num_rows: 962
})

In [5]:
resume_dataset.set_format("pandas")
df = resume_dataset[:]

In [6]:
df.head()

|   | Category | Resume |
|---|---|---|
| 0 | Data Science | Skills * Programming Languages: Python (pandas... |
| 1 | Data Science | Education Details \r\nMay 2013 to May 2017 B.E... |
| 2 | Data Science | Areas of Interest Deep Learning, Control Syste... |
| 3 | Data Science | Skills â¢ R â¢ Python â¢ SAP HANA â¢ Table... |
| 4 | Data Science | Education Details \r\n MCA YMCAUST, Faridab... |


In [7]:
from datasets import Dataset

resume_dataset = Dataset.from_pandas(df)
resume_dataset

Dataset({
    features: ['Category', 'Resume'],
    num_rows: 962
})

In [8]:
resume_dataset = resume_dataset.map(
    lambda x: {"resume_length": len(x["Resume"].split())}
)

In [9]:
resume_dataset = resume_dataset.filter(lambda x: x["resume_length"] > 15)
resume_dataset

Dataset({
    features: ['Category', 'Resume', 'resume_length'],
    num_rows: 962
})

In [10]:
from transformers import AutoTokenizer, AutoModel

model_ckpt = "sentence-transformers/multi-qa-mpnet-base-dot-v1"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModel.from_pretrained(model_ckpt)

In [None]:
import torch

# Fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

In [12]:
def cls_pooling(model_output):
    return model_output.last_hidden_state[:, 0]

In [13]:
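def get_embeddings(text_list):
    # Tokenize, padding to the longest text and truncating to the model's maximum length.
    encoded_input = tokenizer(
        text_list, padding=True, truncation=True, return_tensors="pt"
    )
    # Move the input tensors to the same device as the model.
    encoded_input = {k: v.to(device) for k, v in encoded_input.items()}
    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        model_output = model(**encoded_input)
    return cls_pooling(model_output)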

In [14]:
embedding = get_embeddings(resume_dataset["Resume"][0])
embedding.shape

torch.Size([1, 768])
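
A note on `truncation=True` in `get_embeddings`: any resume longer than the model's maximum input length is cut off, so only its beginning contributes to the embedding. One possible workaround, which this notebook does not use, is to split each resume into overlapping word windows and embed every window separately. The sketch below only shows the splitting step; `split_into_windows`, `window_size`, and `overlap` are hypothetical names and values chosen for illustration.

```python
def split_into_windows(text, window_size=200, overlap=50):
    """Split text into overlapping windows of whitespace-separated words."""
    if overlap >= window_size:
        raise ValueError("overlap must be smaller than window_size")
    words = text.split()
    if not words:
        return []
    step = window_size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window_size]))
        # Stop once the current window already reaches the end of the text.
        if start + window_size >= len(words):
            break
    return windows


# Toy resume of 450 made-up "words" so the windowing is easy to inspect.
toy_resume = " ".join(f"w{i}" for i in range(450))
chunks = split_into_windows(toy_resume, window_size=200, overlap=50)
print(len(chunks))
print([len(chunk.split()) for chunk in chunks])
```

Each window could then be passed to `get_embeddings`, with the window scores combined per resume (for example by keeping the best window). That combination step is a design choice left open here.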

In [15]:
embeddings_dataset = resume_dataset.map(
    lambda x: {"embeddings": get_embeddings(x["Resume"]).detach().cpu().numpy()[0]}
)

In [16]:
embeddings_dataset.add_faiss_index(column="embeddings")

Dataset({
    features: ['Category', 'Resume', 'resume_length', 'embeddings'],
    num_rows: 962
})

In [17]:
question = '''
Startup is looking for a founder engineer with experience on Blockchain and smart contracts.
Experience on frontEnd development with Angular is also desired.
'''
question_embedding = get_embeddings([question]).cpu().detach().numpy()
question_embedding.shape

(1, 768)

In [21]:
scores, samples = embeddings_dataset.get_nearest_examples(
    "embeddings", question_embedding, k=10
)
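
A word on how to read `scores`. As far as I can tell, `add_faiss_index` builds a flat index that uses L2 distance when no `metric_type` is passed, so each score would be a distance and a smaller value would mean a closer match. The pure-Python sketch below reproduces that kind of brute-force search on toy 2-D vectors and compares it with a dot-product ranking. All function names and the toy data are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_by_l2(query, vectors, k):
    """Return (distance, index) pairs for the k closest vectors, closest first."""
    scored = [(squared_l2(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored)[:k]


def nearest_by_dot(query, vectors, k):
    """Return (score, index) pairs for the k highest dot products, best first."""
    scored = [(dot(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: -pair[0])[:k]


# Toy 2-D "embeddings" (made up for illustration).
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [3.0, 0.0]]
toy_query = [1.0, 0.0]

l2_hits = nearest_by_l2(toy_query, toy_vectors, k=2)
dot_hits = nearest_by_dot(toy_query, toy_vectors, k=2)
print(l2_hits)   # smallest distance first
print(dot_hits)  # largest dot product first
```

The two rankings disagree on this toy data: the long vector at index 3 wins on dot product but is far away in L2 terms. The checkpoint name ends in `dot`, which suggests it was tuned for dot-product scoring, so it may be worth passing `metric_type=faiss.METRIC_INNER_PRODUCT` to `add_faiss_index` and treating larger scores as better. I have not verified that variant in this notebook.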

In [22]:
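import pandas as pd

samples_df = pd.DataFrame.from_dict(samples)
samples_df["scores"] = scores
# With the default FAISS index (no metric_type passed) the score is an L2 distance,
# so smaller means closer and this descending sort lists the farthest of the 10 matches first.
# Row labels are kept by the sort, so label 0 still refers to the nearest match.
samples_df.sort_values("scores", ascending=False, inplace=True)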

In [23]:
for _, row in samples_df.iterrows():
    print(f"CATEGORY: {row.Category}")
    print(f"SCORE: {row.scores}")
    print(f"RESUME: {row.Resume}")
    print("=" * 50)
    print()

CATEGORY: Blockchain
SCORE: 33.69643783569336
RESUME: SOFTWARE SKILLS: Languages: C, C++ & java Operating Systems: Windows XP, 7, Ubuntu RDBMS: Oracle (SQL) Database, My SQL, PostgreSQL Markup & Scripting: HTML, JavaScript & PHP, CSS, JQuery, Angular js. Framework: Struts, Hibernate, spring, MVC Web Server: Tomcat and Glassfish. Web Services: REST AND SOAP TRAINING DETAIL Duration: 4 months From: - United Telecommunication Limited Jharnet project (Place - Ranchi, Jharkhand) Networking Requirements: Elementary configuration of router and switch, IP and MAC addressing, Lease Line, OSI Layers, Routing protocols. Status: - Network Designer.Education Details 
    2 High School
 Diploma Government Women Ranchi, Jharkhand The Institution
Blockchain Engineer 

Blockchain Engineer - Auxledger
Skill Details 
JAVA- Exprience - 19 months
CSS- Exprience - 12 months
HTML- Exprience - 12 months
JAVASCRIPT- Exprience - 12 months
C++- Exprience - 6 monthsCompany Details 
company - Auxledger

In [45]:
print(samples_df['Resume'][0])

Skills Strong CS fundamentals and problem solving Ethereum, Smart Contracts, Solidity skills Golang, Node, Angular, React Culturally fit for startup environment MongoDB, PostGresql, MySql Enthusiastic to learn new technologies AWS, Docker, Microservices Blockchain, Protocol, ConsensusEducation Details 
January 2014 M.Tech Computer Engineering Jaipur, Rajasthan Malaviya National Institute Of Technology Jaipur
January 2011 B.E. Computer Science And Engg Kolhapur, Maharashtra Shivaji University
Blockchain Engineer 

Blockchain Engineer - XINFIN Orgnization
Skill Details 
MONGODB- Exprience - 16 months
CONTRACTS- Exprience - 12 months
MYSQL- Exprience - 9 months
AWS- Exprience - 6 months
PROBLEM SOLVING- Exprience - 6 monthsCompany Details 
company - XINFIN Orgnization
description - Xinfin is a global open source Hybrid Blockchain protocol.
Rolled out multiple blockchain based pilot projects on different use cases for various clients. Eg.
Tradefinex (Supply chain Management)

# Let's ask questions about the top-ranked profile

In [26]:
answer = question_answerer(question='What are your main technical skills?', context=samples_df['Resume'][0])
print(answer)

{'score': 0.7134349346160889, 'start': 2238, 'end': 2312, 'answer': 'bug fixing, DB operations, Feature customisation and writing API endpoints'}


In [28]:
answer = question_answerer(question='What was your major at school?', context=samples_df['Resume'][0])
print(answer)

{'score': 0.00600675493478775, 'start': 2475, 'end': 2486, 'answer': 'IT Services'}
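
The two results above differ sharply in `score` (about 0.71 versus 0.006), and the second answer, 'IT Services', is clearly not a school major. In both results `start` and `end` are character offsets into the context. A minimal sketch of how low-confidence answers could be discarded is shown below. `accept_answer` and the 0.3 threshold are assumptions of mine, not part of the pipeline, and the toy context and result dictionaries are hand-written to mimic the output format.

```python
def accept_answer(result, context, min_score=0.3):
    """Return the answer text if it matches its offsets and is confident enough, else None."""
    span = context[result["start"]:result["end"]]
    if span != result["answer"]:
        return None
    if result["score"] < min_score:
        return None
    return result["answer"]


# Toy context and hand-written results in the same shape as the pipeline output.
toy_context = "Worked on bug fixing and API endpoints at an IT Services company."


def make_result(answer, score):
    """Build a result dictionary whose offsets point at `answer` inside the toy context."""
    start = toy_context.find(answer)
    return {"score": score, "start": start, "end": start + len(answer), "answer": answer}


confident = make_result("bug fixing", 0.71)
doubtful = make_result("IT Services", 0.006)

print(accept_answer(confident, toy_context))
print(accept_answer(doubtful, toy_context))
```

The right threshold depends on the model and the data, so it would need to be tuned on real resumes rather than taken from this sketch.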


Skills Strong CS fundamentals and problem solving Ethereum, Smart Contracts, Solidity skills Golang, Node, Angular, React Culturally fit for startup environment MongoDB, PostGresql, MySql Enthusiastic to learn new technologies AWS, Docker, Microservices Blockchain, Protocol, ConsensusEducation Details 
January 2014 M.Tech Computer Engineering Jaipur, Rajasthan Malaviya National Institute Of Technology Jaipur
January 2011 B.E. Computer Science And Engg Kolhapur, Maharashtra Shivaji University
Blockchain Engineer 

Blockchain Engineer - XINFIN Orgnization
Skill Details 
MONGODB- Exprience - 16 months
CONTRACTS- Exprience - 12 months
MYSQL- Exprience - 9 months
AWS- Exprience - 6 months
PROBLEM SOLVING- Exprience - 6 monthsCompany Details 
company - XINFIN Orgnization
description - Xinfin is a global open source Hybrid Blockchain protocol.
Rolled out multiple blockchain based pilot projects on different use cases for various clients. Eg.
Tradefinex (Supply chain Management), Land Registry (Govt of MH), inFactor (Invoice Factoring)
Build a secure and scalable hosted wallet based on ERC 20 standards for XINFIN Network.
Working on production level blockchain use cases.
Technology: Ethereum Blockchain, Solidity, Smart Contracts, DAPPs, Nodejs
company - ORO Wealth
description - OroWealth is a zero commision online investment platform, currently focused on direct mutual funds
Build various scalable web based products (B2B and B2C) based on MEAN stack technology and integrated  with multiple finance applications/entities. eg. Integration KYC and MF Entities.
Technology: Node.js, Angular.js, MongoDB, Express
company - YallaSpree
description - Hyderabad, Telangana
Yallaspree is a largest digital shopping directory in U.A.E with over 22K stores.
Own the responsibility to develop and maintain following modules:
- Admin and Vendor interface       - Database operations
- Writing Webservices                      - Complete Notification system
- Events  and Offers Page
Technology: CakePHP (PHP Framework), JQuery, MySql
company - RailTiffin.com
description - Mumbai, Maharashtra
RailTiffin.com is an e-commerce platform to serve food to railway passengers.
Worked on multiple roles like bug fixing, DB operations, Feature customisation and writing API endpoints.
Technology: OpenCart (Ecommerce Framework), JQuery, MySql
company - Accolite Software India Private Limited
description - Bengaluru, KA
Accolite is a global IT Services company headquartered in Dallas, USA with offices in India.
Worked on Birst Analytics Tool to develop, deploy and maintain reports

Answer the following questions:

1 - What are your main  technical skills?

2 - What was your major at schools?

3 - What is your experience with databases?

4 - Can you List all your work experience?

Answers:

1 - My main technical skills are CS fundamentals and problem solving, Ethereum, Smart Contracts, Solidity skills, Golang, Node, Angular, React.

2 - I have M.Tech in Computer Engineering from Jaipur, Rajasthan Malaviya National Institute Of Technology Jaipur.

3 - I have B.E. in Computer Science and Engineering from Kolhapur, Maharashtra Shivaji University.

4 - I have worked in various roles like bug fixing, DB operations, Feature customisation and writing API endpoints.

In [38]:
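import os
import openai

# Legacy completions model, called through the pre-1.0 `openai` SDK interface.
COMPLETIONS_MODEL = "text-curie-001"
# Prefer the OPENAI_API_KEY environment variable over pasting the key into the notebook.
openai.api_key = os.environ.get("OPENAI_API_KEY", "your key here")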

In [33]:
openai_prompt = samples_df['Resume'][0]
openai_prompt += '''

Answer the following questions:

1 - What are your main technical skills?

2 - What was your major at schools?

3 - What is your experience with databases?

4 - Can you List all your work experience?

Answers: 
'''
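
The prompt above is assembled by hand for one resume. A small helper could produce the same layout for any resume and any list of questions. This is only a sketch: `build_prompt` is a hypothetical name, and the toy resume text and questions are placeholders.

```python
def build_prompt(resume_text, questions):
    """Append numbered questions and an 'Answers:' cue to a resume."""
    numbered = "\n\n".join(
        f"{number} - {question}"
        for number, question in enumerate(questions, start=1)
    )
    return (
        f"{resume_text}\n\n"
        "Answer the following questions:\n\n"
        f"{numbered}\n\n"
        "Answers: \n"
    )


# Placeholder inputs for illustration only.
toy_questions = [
    "What are your main technical skills?",
    "What is your experience with databases?",
]
toy_prompt = build_prompt("Skills: Solidity, Angular", toy_questions)
print(toy_prompt)
```

Keeping the questions in a list also makes it easy to reuse the same set for every candidate returned by the search.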

In [34]:
print(openai_prompt)

Skills Strong CS fundamentals and problem solving Ethereum, Smart Contracts, Solidity skills Golang, Node, Angular, React Culturally fit for startup environment MongoDB, PostGresql, MySql Enthusiastic to learn new technologies AWS, Docker, Microservices Blockchain, Protocol, ConsensusEducation Details 
January 2014 M.Tech Computer Engineering Jaipur, Rajasthan Malaviya National Institute Of Technology Jaipur
January 2011 B.E. Computer Science And Engg Kolhapur, Maharashtra Shivaji University
Blockchain Engineer 

Blockchain Engineer - XINFIN Orgnization
Skill Details 
MONGODB- Exprience - 16 months
CONTRACTS- Exprience - 12 months
MYSQL- Exprience - 9 months
AWS- Exprience - 6 months
PROBLEM SOLVING- Exprience - 6 monthsCompany Details 
company - XINFIN Orgnization
description - Xinfin is a global open source Hybrid Blockchain protocol.
Rolled out multiple blockchain based pilot projects on different use cases for various clients. Eg.
Tradefinex (Supply chain Management)

In [36]:
openai.Completion.create(
    prompt=openai_prompt,
    temperature=0,
    max_tokens=100,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    model=COMPLETIONS_MODEL
)["choices"][0]["text"].strip(" \n")

'1 - I have strong CS fundamentals and problem solving skills in Ethereum, Smart Contracts, Solidity. \n2 - I am a Computer Engineering graduate from Jaipur, Rajasthan Malaviya National Institute Of Technology. \n3 - I have experience with databases such as MongoDB, PostgreSQL, MySql. \n4 - I have worked in various roles such as Blockchain Engineer, Ethereum Developer, Web Developer, and Database Administrator.'
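
The completion comes back as a single string with one `N - answer` line per question. To use the answers programmatically, they could be split into a dictionary keyed by question number. The sketch below assumes the model keeps that line format, which is not guaranteed; `parse_numbered_answers` is a hypothetical helper and the sample text is a shortened, hand-written stand-in for a real completion.

```python
import re


def parse_numbered_answers(completion_text):
    """Map each answer number to its text for lines shaped like 'N - answer'."""
    answers = {}
    for line in completion_text.splitlines():
        match = re.match(r"\s*(\d+)\s*-\s*(.+)", line)
        if match:
            answers[int(match.group(1))] = match.group(2).strip()
    return answers


# Shortened, hand-written sample in the same shape as the completion above.
sample_completion = (
    "1 - I have strong CS fundamentals. \n"
    "2 - I am a Computer Engineering graduate. "
)
parsed = parse_numbered_answers(sample_completion)
print(parsed)
```

Note that `max_tokens=100` caps the length of the completion, so the last answer may be cut short when the answers are long.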