<a href="https://colab.research.google.com/github/layafakher/Resume_Parsing_Using_Llama2/blob/main/Llama2_FineTuning_For_ResumeParsing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install datasets
!pip install peft
!pip install trl



In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer



# Model configuration

In [None]:
# Model from Hugging Face hub
base_model = "NousResearch/Llama-2-7b-chat-hf"

# Resume dataset
resume_dataset = "gautamsabba/training_data_llama2_resume_distiller"

# Fine-tuned model
new_model = "llama-2-7b-resume-parser"

# Loading dataset, model, and tokenizer

In [None]:
dataset = load_dataset(resume_dataset, split="train")



In [None]:
!pip install bitsandbytes



# 8-bit quantization configuration

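Loading the base model with 8-bit weights and training only small LoRA adapters on top of it makes it possible to fine-tune a 7-billion-parameter LLM on a single consumer-grade or Colab GPU, because 8-bit weights take half the memory of float16 ones. This is the same recipe as QLoRA, except that QLoRA quantizes the base model to 4 bits (NF4) instead of 8. Lowering the hardware bar this way makes fine-tuning much more accessible for real-world applications.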
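The memory saving can be checked with back-of-the-envelope arithmetic. The sketch below estimates the memory taken by the frozen weights alone at different precisions. It assumes a rounded count of 7 billion parameters and ignores activations, optimizer state, adapter weights and quantization constants, so real GPU usage is higher.

```python
# Rough estimate of weight memory only. Assumption: it ignores activations,
# optimizer state, LoRA adapters and per-block quantization constants.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 7e9  # assumed, rounded parameter count for a "7B" model
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: ~{weight_memory_gb(n, dtype):.1f} GB")
```

Halving the bytes per parameter halves the weight footprint, which is why 8-bit loading fits on GPUs where a float16 copy of the same model would not.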

The pre-trained language model is quantized to 8 bits and its parameters are frozen. A small number of trainable low-rank adapter (LoRA) layers are then added to the model.
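To make quantizing weights to 8 bits concrete, here is a minimal pure-Python sketch of symmetric absmax quantization of one row of weights to int8 and back. It is a simplification: bitsandbytes' LLM.int8() quantizes vector-wise and additionally keeps outlier features in 16-bit, which this sketch does not model. The example weights are made up.

```python
from typing import List, Tuple


def absmax_quantize(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric absmax quantization of floats to int8 codes in [-127, 127]."""
    absmax = max(abs(x) for x in row)
    scale = 127.0 / absmax if absmax > 0 else 1.0
    codes = [int(round(x * scale)) for x in row]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [c / scale for c in codes]


if __name__ == "__main__":
    weights = [0.12, -0.50, 0.33, 0.01]  # made-up example weights
    codes, scale = absmax_quantize(weights)
    approx = dequantize(codes, scale)
    print(codes)
    print([round(a, 3) for a in approx])
```

Each value is stored as one byte plus a shared scale per row, and the round-trip error per value is at most half a quantization step (0.5 / scale).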

During fine-tuning, gradients are backpropagated through the frozen 8-bit model into the LoRA layers only, so the pre-trained weights stay fixed while only the adapters are updated. Quantization is not lossless, but its effect on model quality is usually small.
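The frozen-weights-plus-adapter idea can be sketched in a few lines of plain Python: the weight matrix W is never modified, and the adapter adds a low-rank correction (alpha / r) * B(Ax). This is an illustration of the LoRA forward pass with toy sizes (d = 3, r = 1), not PEFT's implementation. B starts at zero, as in the LoRA paper, so the adapted layer initially behaves exactly like the frozen one.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """h = W x + (alpha / r) * B (A x); only A and B would receive gradients."""
    base = matvec(W, x)               # frozen path (quantized in practice)
    update = matvec(B, matvec(A, x))  # adapter path: d -> r -> d
    scaling = alpha / r
    return [h + scaling * u for h, u in zip(base, update)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy frozen weights
    A = [[0.5, -0.5, 1.0]]              # r x d = 1 x 3
    B_init = [[0.0], [0.0], [0.0]]      # d x r, zero-initialised
    x = [1.0, 2.0, 3.0]
    print(lora_forward(W, A, B_init, x, alpha=16, r=1))     # same as W x
    B_trained = [[0.1], [0.0], [-0.1]]  # hypothetical values after training
    print(lora_forward(W, A, B_trained, x, alpha=16, r=1))
```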

In [None]:
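# Load the frozen base model with 8-bit (LLM.int8()) weights.
# Note: BitsAndBytesConfig has no bnb_8bit_* options. NF4 quantization, a
# separate compute dtype and double quantization are 4-bit settings
# (QLoRA proper), configured e.g. as:
#   BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
#                      bnb_4bit_compute_dtype=torch.float16,
#                      bnb_4bit_use_double_quant=False)
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
)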

In [None]:
!pip install accelerate
!pip install -i https://test.pypi.org/simple/ bitsandbytes



# Loading the Llama 2 model

Note: the first time you run this notebook, restart the runtime after the installs above and before loading the Llama 2 model, so that the newly installed `accelerate` package is picked up. Otherwise, loading the model may fail with an error.

In [None]:
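model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,  # load the frozen weights quantized
    device_map={"": 0},                # put the whole model on GPU 0
)
model.config.use_cache = False   # the key/value cache only helps generation, not training
model.config.pretraining_tp = 1  # 1 = standard linear layers (>1 emulates pretraining tensor parallelism)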




# Loading the tokenizer

In [None]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # use </s> as the padding token
tokenizer.padding_side = "right"           # pad after the text during training
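The cell reuses the end-of-sequence token for padding (a common choice for Llama 2 models) and pads on the right. The sketch below shows what right-padding a batch does, together with the attention mask that tells the model to ignore padded positions. The token ids are made up, and `EOS_ID = 2` is an assumption (the id of `</s>` in the Llama 2 vocabulary); in the notebook the tokenizer does this itself when called with `padding=True`.

```python
from typing import List, Tuple

EOS_ID = 2  # assumed id of "</s>", reused as the padding id


def pad_right(batch: List[List[int]],
              pad_id: int = EOS_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token id sequences to equal length and build attention masks."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)             # padding after the text
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 0 = ignore position
    return input_ids, attention_mask


if __name__ == "__main__":
    ids, mask = pad_right([[1, 450, 4996], [1, 3148]])  # hypothetical token ids
    print(ids)
    print(mask)
```

Because the pad id equals the EOS id, anything that masks positions by comparing against the pad id (for example a language-modeling collator that sets padded labels to -100) will also mask genuine EOS tokens.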


# PEFT parameters

In [None]:
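peft_params = LoraConfig(
    lora_alpha=16,         # adapter output is scaled by lora_alpha / r
    lora_dropout=0.1,      # dropout applied to the adapter input
    r=64,                  # rank of the low-rank update matrices
    bias="none",           # do not train bias terms
    task_type="CAUSAL_LM",
)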
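With `r=64`, the adapter for a d_out × d_in linear layer adds r·(d_in + d_out) trainable weights (A is r × d_in, B is d_out × r), and its output is scaled by lora_alpha / r = 16 / 64 = 0.25. The arithmetic below estimates the total number of trainable parameters under two assumptions not stated in the config: that PEFT falls back to its default Llama target modules (`q_proj` and `v_proj`) because `target_modules` is not set, and that Llama 2 7B has 32 decoder layers with 4096 × 4096 query and value projections.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable weights added by one LoRA adapter: A (r x d_in) plus B (d_out x r)."""
    return r * d_in + d_out * r


if __name__ == "__main__":
    r, alpha = 64, 16
    hidden, n_layers = 4096, 32  # assumed Llama 2 7B dimensions
    targets_per_layer = 2        # assumed default targets: q_proj and v_proj
    total = n_layers * targets_per_layer * lora_params(hidden, hidden, r)
    print(f"scaling alpha/r = {alpha / r}")
    print(f"trainable LoRA params ~ {total:,}")
```

Under these assumptions the adapters hold about 33.5 million weights, roughly half a percent of a 7-billion-parameter model.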

# Training parameters

In [None]:
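training_params = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,   # effective batch size = 4 x 1 per GPU
    optim="paged_adamw_32bit",       # paged AdamW from bitsandbytes; absorbs memory spikes
    save_steps=25,                   # write a checkpoint every 25 steps
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,               # gradient clipping
    max_steps=-1,                    # -1: train for num_train_epochs
    warmup_ratio=0.03,               # ramp the learning rate up over the first 3% of steps
    group_by_length=True,            # batch examples of similar length to reduce padding
    lr_scheduler_type="constant_with_warmup",  # plain "constant" ignores warmup_ratio
    report_to="tensorboard",
)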
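To see what these settings imply, the sketch below computes the number of optimizer steps for one epoch and the learning-rate multiplier of a constant-with-warmup schedule (linear ramp, then flat), intended to mirror transformers' `get_constant_schedule_with_warmup`. It assumes the 3,067 training rows reported when SFTTrainer maps the dataset, a single GPU, and `ceil(ratio * steps)` warmup steps. With the plain `"constant"` scheduler the warmup ratio has no effect and the multiplier is 1.0 from the first step.

```python
import math


def num_update_steps(n_examples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Optimizer steps on one device when the last partial batch is kept."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    return (batches_per_epoch // grad_accum) * epochs


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps derived from a ratio of the total."""
    return math.ceil(total_steps * warmup_ratio)


def constant_with_warmup(step: int, n_warmup: int) -> float:
    """LR multiplier: linear ramp from 0 to 1 over n_warmup steps, then constant."""
    if step < n_warmup:
        return step / max(1, n_warmup)
    return 1.0


if __name__ == "__main__":
    total = num_update_steps(3067, batch_size=4, grad_accum=1, epochs=1)
    n_warm = warmup_steps(total, 0.03)
    base_lr = 2e-4
    print(total, n_warm)
    for step in (0, 12, n_warm, total - 1):
        print(step, base_lr * constant_with_warmup(step, n_warm))
```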

In [None]:
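# These keyword arguments follow the older trl SFTTrainer API. Recent trl
# releases move the dataset and sequence-length options into SFTConfig and
# rename tokenizer to processing_class, so pin an older trl or adapt this call.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,     # SFTTrainer wraps the model with these LoRA adapters
    dataset_text_field="text",   # column holding the full training text
    max_seq_length=None,         # None: use trl's default maximum length
    tokenizer=tokenizer,
    args=training_params,
    packing=False,               # one example per sequence, no concatenation
)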
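`packing=False` means each example is padded and truncated on its own. With packing enabled, SFTTrainer instead concatenates tokenized examples, separated by an end-of-sequence token, and cuts the stream into fixed-length blocks so no compute is spent on padding. The pure-Python sketch below illustrates that idea with made-up token ids and a tiny block length; it is a simplified illustration, not TRL's code, and here the incomplete tail is simply dropped.

```python
from typing import List


def pack(examples: List[List[int]], eos_id: int, block_len: int) -> List[List[int]]:
    """Concatenate examples with an EOS separator and split into equal-length blocks.

    In this sketch, a trailing piece shorter than block_len is dropped.
    """
    stream: List[int] = []
    for ids in examples:
        stream.extend(ids + [eos_id])
    n_blocks = len(stream) // block_len
    return [stream[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]


if __name__ == "__main__":
    examples = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]  # hypothetical token ids
    for block in pack(examples, eos_id=2, block_len=4):
        print(block)
```

A packed block can mix tokens from unrelated examples (the second block here starts one example and ends inside the next), which may be undesirable for short, self-contained extraction examples.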




In [None]:
trainer.train()

# Saving the fine-tuned model

In [None]:
# Saves the LoRA adapter weights and config (not the full base model), plus the tokenizer.
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)


('llama-2-7b-resume-parser/tokenizer_config.json',
 'llama-2-7b-resume-parser/special_tokens_map.json',
 'llama-2-7b-resume-parser/tokenizer.json')

# Evaluation

In [None]:
logging.set_verbosity(logging.CRITICAL)

prompt = "what does a person who is computer engineer do?"

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=700)
result = pipe(f"<s>[INST] {prompt} [/INST]\n")
print(result[0]['generated_text'])
print(result)



<s>[INST] what does a person who is computer engineer do? [/INST]
A computer engineer is responsible for designing, developing, and testing computer hardware and software systems. Their work involves a wide range of tasks, including:

1. Designing and developing computer hardware, such as processors, memory devices, and input/output devices.
2. Writing and testing software programs to control and interact with computer hardware.
3. Troubleshooting and repairing computer systems and networks.
4. Ensuring that computer systems and networks are secure and functioning properly.
5. Collaborating with other engineers and professionals to design and develop new technologies and products.
6. Staying up-to-date with the latest developments in computer engineering and technology.
7. Documenting and communicating technical information to other team members and stakeholders.
8. Working with cross-functional teams to design and develop new products and features.
9. Conducting research and developme
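As the output above shows, the `text-generation` pipeline returns the prompt followed by the continuation. A small helper can wrap a question in the same `<s>[INST] ... [/INST]` template used in these cells and strip the echoed prompt from `generated_text`; alternatively, the pipeline call accepts `return_full_text=False` to return only the new text. The helper names below are hypothetical, and only the string handling is shown, so it runs without a GPU.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the instruction template used in this notebook."""
    return f"<s>[INST] {question} [/INST]\n"


def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the model's continuation if the output echoes the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


if __name__ == "__main__":
    p = build_prompt("what does a person who is computer engineer do?")
    fake_output = p + "A computer engineer is responsible for ..."  # stand-in for pipeline output
    print(strip_prompt(fake_output, p))
```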

In [None]:
logging.set_verbosity(logging.CRITICAL)

prompt = "what skills should a computer engineer have?"

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=700)
result = pipe(f"<s>[INST] {prompt} [/INST]\n")
print(result[0]['generated_text'])
print(result)

<s>[INST] what skills should a computer engineer have? [/INST]
As a computer engineer, you will be responsible for designing, developing, and testing computer hardware and software systems. Here are some key skills that you should possess:

1. Programming skills: As a computer engineer, you should have strong programming skills in languages such as C, C++, Java, Python, and MATLAB.
2. Data structures and algorithms: You should be proficient in data structures such as arrays, linked lists, stacks, and queues, and algorithms such as sorting, searching, and graph traversal.
3. Computer architecture: You should have a good understanding of computer architecture, including the design and organization of computer systems, including the von Neumann model, the Harvard model, and RISC and CISC architectures.
4. Digital logic: You should be familiar with digital logic circuits, including Boolean algebra, logic gates, flip-flops, and counters.
5. Microprocessors: You should have knowledge of micr
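The extraction prompt in the next cell asks for five labelled fields. If the model answers with one `Label: value` line per field (an assumption; the fine-tuned model's exact reply format is not shown here), a parser like the hypothetical `parse_fields` below turns the reply into a dictionary and splits the comma-separated fields into lists, which makes results easier to compare or store. The sample reply is invented for illustration.

```python
from typing import Dict, List, Union

# Field labels copied from the extraction prompt.
FIELDS = [
    "Name of the candidate",
    "Contact Details",
    "Skills (comma separated)",
    "Companies worked in (comma separated)",
    "Total Years of Experience",
]


def parse_fields(reply: str) -> Dict[str, Union[str, List[str]]]:
    """Parse 'Label: value' lines for the fields requested in the prompt."""
    result: Dict[str, Union[str, List[str]]] = {}
    for line in reply.splitlines():
        label, sep, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if not sep or label not in FIELDS:
            continue  # skip lines that are not one of the requested fields
        if "(comma separated)" in label:
            result[label] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            result[label] = value
    return result


if __name__ == "__main__":
    # Hypothetical reply, not actual output from the fine-tuned model.
    reply = (
        "Name of the candidate: Jane Doe\n"
        "Contact Details: jane@example.com\n"
        "Skills (comma separated): Angular, TypeScript, HTML\n"
        "Companies worked in (comma separated): Acme Corp\n"
        "Total Years of Experience: 5 years\n"
    )
    print(parse_fields(reply))
```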

In [None]:
logging.set_verbosity(logging.CRITICAL)

prompt = (
    "Please extract the following details from the resume text provided below :\n"
    " Name of the candidate:\n"
    " Contact Details:\n"
    " Skills (comma separated):\n"
    " Companies worked in (comma separated):\n"
    " Total Years of Experience:\n"
    " Resume Text: ARATI DANANE Sr Software Engineer at Itarium technologies India Pvt Ltd Pune Since 2017 to Present SKILL SUMMARY SKILLS Proven work experience as a  Operating Systems Windows 10 Frontend developer Hands on  Languages Typescript JavaScript HTML CSS experience on Web and Angular  Frameworks Angular 2 5 technologies  Tools Visual Studio Code Notepad Postman WinSCP JIRA Jenkins GitHub Desktop Spinnaker mobaXterm TOTAL EXPERIENCE  Web Development Angular 810 Angular Material Design HTML CSS JavaScript Typescript Bootstrap 5 years  Version Control Bitbucket SVN GIT  Server Deployment WinSCP mobaXterm Spinnaker EDUCATION ROLES AND RESPONSIBILITIES Bachelor of Engineering Computer  Responsible for creating basic web pages using HTML CSS Science and Bootstrap College  ADCET Ashta  Using SCSS making web pages more attractive Year20132017  In Angular creating components Modules Directives Routings  Using lazy loading angular feature making our web pages on CONTACT demand load so initial load time decreases  Integrating REST API using JSON into Angular application to Mobile 8530820052 communicate with backend and store data into database  Write business logic as per client requirement and suggest Email aratidanane93@gmailcom some additional feature to make web application more user friendly using typescript  Responsible to communicate with client to understand the change and implement that change  Used SVN Git code commit to version control  Fixed the Functional security and code related issue raised in SonarQube and VAPT testing tool  Used JIRA to maintain task assigned to team  Also responsible to manage security while sending important data to backend or storing data in sessionlocal storage  Code review refactoring the code and adding "
    "comments for required methods logic etc  Reducing build size by removing unwanted code imports assets etc  Also responsible to build and deploy the code on server using WinSCP maxterm Using Kubernetes copy commands Spinnaker Project Details  BMC Product Oct 2020  Present   Role  Sr Product Developer  Staining Atlas Front End Jun 2020  Oct 2020  Technology Front End Angular10 Typescript HTML5 CSS3 Bootstrap Material Design  Role Developer  Project Details In this project we are providing an imaging solution to medical professionals They can zoom the highresolution 500MB cell image at micro level to figure out the disease or issue Also at any zoom level we can annotate the field and download that image file as PDF We used OpenSeadragon library to achieve this  Responsibilities  Create Angular project folder structure  Design pages  Create the forms Data binding  Perform CRUD Operation and apply business logic  API integration  Code review and fixes  Bug fixing from SonarQube and VAPT testing report  Deployment  BMC AO Atrium Orchestrator Tool Jan 2020  June 2020   Role Team Member  Project Details In this project I was responsible to analyze the tickets already existing workflows triggers and provide a solution to reduce the count of ticket of that type by modifying the triggers workflow or creating new triggers or workflows  ERP Enterprise Resource Planning Front End Sept 2019  Dec 2019  Technology Front End Angular 6 and Angular Material Design Typescript HTML5 CSS3 Bootstrap  Role Developer  Project Details ERP is business process management software that allows an organization to use a system of integrated applications to manage the business and automate many backoffice functions related to technology services and human resource It is a system for managing supplier and customer data creating sales order purchase order quotations generating invoices and reports It involves managing inventory of stock items and products System can generate monthly and yearly "
    "reports in excel and graphs  Responsibilities  Implement basic page design  Create the forms Data binding  Perform CRUD Operation and apply business logic  API integration  Manage task list in JIRA and assign the task  Deployment  Learning Management System Front End June 2020  Oct 2020 Technology Front End Angular 2 Material JavaScript HTML5 CSS3 Bootstrap  Role Developer  Project Details LMS is a learning management system where employee can selflearn and attempt tests on various machines and tools It uses SCORM packages to load training content Admin can create users create course assign courses and batches to user in multiple languages with SCORM for each language user can download certificate in pdf format  Responsibilities  Implement design changes  Few additional feature implementation  Bug fixing from SonarQube and VAPT testing report  Deployment  Guided Support System Front End May 2019  Aug 2019  Technology  Front End Angular 8 Typescript HTML5 CSS3 Bootstrap  Admin Template  Sing Admin Template  Role Developer  Project Details It is a webbased troubleshoot support system Using this application user can be able troubleshoot the problem by means of preconfigured guided questionnaire  Responsibilities  Create Angular project folder structure  Design pages as per the wireframes  Create the forms Data binding  Perform CRUD Operation and apply business logic  API integration  Deployment  Website Clients intranet  Tax Buddy Front End Jan 2018  May 2019Technology Front End Angular5 Typescript HTML5 CSS3 Bootstrap Material Design  Role Developer  Project Details Tax Buddy is tax related application that help the user to file their ITR by collecting required information from user etc  Responsibilities  Design pages  Create the forms Data binding CRUD Operation  API integration  Payment gateway front end design  Deployment  Website wwwtaxbuddycom  Internship  Period  Sept 2017 to Dec 2017 Company  the Itarium Technologies India Pvt Ltd [/INST] Role  Front End "
    "Developer  Project Details  Itarium is a webbased application that helps to increase the efficiency of the business process It allows to manage the employee data and track the performance of the employee  Responsibilities  Design pages  Create the forms Data binding  Perform CRUD Operation and apply business logic  API integration  Deployment"
)


pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=2000)
result = pipe(f"<s>[INST] {prompt} [/INST]\n")
print(result[0]['generated_text'])


<s>[INST] Please extract the following details from the resume text provided below :
 Name of the candidate:
 Contact Details:
 Skills (comma separated):
 Companies worked in (comma separated):
 Total Years of Experience:
 Resume Text: ARATI DANANE Sr Software Engineer at Itarium technologies India Pvt Ltd Pune Since 2017 to Present SKILL SUMMARY SKILLS Proven work experience as a  Operating Systems Windows 10 Frontend developer Hands on  Languages Typescript JavaScript HTML CSS experience on Web and Angular  Frameworks Angular 2 5 technologies  Tools Visual Studio Code Notepad Postman WinSCP JIRA Jenkins GitHub Desktop Spinnaker mobaXterm TOTAL EXPERIENCE  Web Development Angular 810 Angular Material Design HTML CSS JavaScript Typescript Bootstrap 5 years  Version Control Bitbucket SVN GIT  Server Deployment WinSCP mobaXterm Spinnaker EDUCATION ROLES AND RESPONSIBILITIES Bachelor of Engineering Computer  Responsible for creating basic web pages using HTML CSS Science and Bootstrap Col