## Load Resume Dataset from Kaggle
https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset/data

In [2]:
# load data
import pandas as pd
df = pd.read_csv("Resume.csv")
print(df.head())

         ID                                         Resume_str  \
0  16852973           HR ADMINISTRATOR/MARKETING ASSOCIATE\...   
1  22323967           HR SPECIALIST, US HR OPERATIONS      ...   
2  33176873           HR DIRECTOR       Summary      Over 2...   
3  27018550           HR SPECIALIST       Summary    Dedica...   
4  17812897           HR MANAGER         Skill Highlights  ...   

                                         Resume_html Category  
0  <div class="fontsize fontface vmargins hmargin...       HR  
1  <div class="fontsize fontface vmargins hmargin...       HR  
2  <div class="fontsize fontface vmargins hmargin...       HR  
3  <div class="fontsize fontface vmargins hmargin...       HR  
4  <div class="fontsize fontface vmargins hmargin...       HR  


## Preprocessing 

In [3]:
df.shape

(2484, 4)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2484 entries, 0 to 2483
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   ID           2484 non-null   int64 
 1   Resume_str   2484 non-null   object
 2   Resume_html  2484 non-null   object
 3   Category     2484 non-null   object
dtypes: int64(1), object(3)
memory usage: 77.8+ KB


In [5]:
df['Category'].unique()

array(['HR', 'DESIGNER', 'INFORMATION-TECHNOLOGY', 'TEACHER', 'ADVOCATE',
       'BUSINESS-DEVELOPMENT', 'HEALTHCARE', 'FITNESS', 'AGRICULTURE',
       'BPO', 'SALES', 'CONSULTANT', 'DIGITAL-MEDIA', 'AUTOMOBILE',
       'CHEF', 'FINANCE', 'APPAREL', 'ENGINEERING', 'ACCOUNTANT',
       'CONSTRUCTION', 'PUBLIC-RELATIONS', 'BANKING', 'ARTS', 'AVIATION'],
      dtype=object)

In [6]:
df['Category'].value_counts()

Category
INFORMATION-TECHNOLOGY    120
BUSINESS-DEVELOPMENT      120
ADVOCATE                  118
CHEF                      118
ENGINEERING               118
ACCOUNTANT                118
FINANCE                   118
FITNESS                   117
AVIATION                  117
SALES                     116
BANKING                   115
HEALTHCARE                115
CONSULTANT                115
CONSTRUCTION              112
PUBLIC-RELATIONS          111
HR                        110
DESIGNER                  107
ARTS                      103
TEACHER                   102
APPAREL                    97
DIGITAL-MEDIA              96
AGRICULTURE                63
AUTOMOBILE                 36
BPO                        22
Name: count, dtype: int64

In [7]:
# drop column 'Resume_html'
df = df.drop(columns=["Resume_html"])

In [8]:
# show full column width
pd.set_option('display.max_colwidth', None)
# show the full text of the first resume
df["Resume_str"].iloc[0]

"         HR ADMINISTRATOR/MARKETING ASSOCIATE\n\nHR ADMINISTRATOR       Summary     Dedicated Customer Service Manager with 15+ years of experience in Hospitality and Customer Service Management.   Respected builder and leader of customer-focused teams; strives to instill a shared, enthusiastic commitment to customer service.         Highlights         Focused on customer satisfaction  Team management  Marketing savvy  Conflict resolution techniques     Training and development  Skilled multi-tasker  Client relations specialist           Accomplishments      Missouri DOT Supervisor Training Certification  Certified by IHG in Customer Loyalty and Marketing by Segment   Hilton Worldwide General Manager Training Certification  Accomplished Trainer for cross server hospitality systems such as    Hilton OnQ  ,   Micros    Opera PMS   , Fidelio    OPERA    Reservation System (ORS) ,   Holidex    Completed courses and seminars in customer service, sales strategies, inventory control, loss pr

In [15]:
import re

# Create a boolean column that marks resumes containing HTML tags
df["has_html"] = df["Resume_str"].apply(lambda x: bool(re.search(r"<.*?>", str(x))))

# Count how many resumes have HTML tags
print("Resumes with HTML tags:", df["has_html"].sum())
print("Resumes without HTML tags:", len(df) - df["has_html"].sum())



Resumes with HTML tags: 10
Resumes without HTML tags: 2474


In [16]:
# Strip HTML tags and collapse extra spaces and line breaks; store the result in a new column, Resume_clean (Resume_str is left unchanged)
import re

def clean_text(text):
    text = re.sub(r'<.*?>', ' ', str(text)) 
    text = re.sub(r'\s+', ' ', text) 
    return text.strip()

df["Resume_clean"] = df["Resume_str"].apply(clean_text)
df["Resume_clean"].head(2)


0                                                                                                                                                      HR ADMINISTRATOR/MARKETING ASSOCIATE HR ADMINISTRATOR Summary Dedicated Customer Service Manager with 15+ years of experience in Hospitality and Customer Service Management. Respected builder and leader of customer-focused teams; strives to instill a shared, enthusiastic commitment to customer service. Highlights Focused on customer satisfaction Team management Marketing savvy Conflict resolution techniques Training and development Skilled multi-tasker Client relations specialist Accomplishments Missouri DOT Supervisor Training Certification Certified by IHG in Customer Loyalty and Marketing by Segment Hilton Worldwide General Manager Training Certification Accomplished Trainer for cross server hospitality systems such as Hilton OnQ , Micros Opera PMS , Fidelio OPERA Reservation System (ORS) , Holidex Completed courses and seminars in cu
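To see what `clean_text` does on edge cases, here is a minimal sketch that applies the same two substitutions to two made-up strings (the `samples` dictionary is illustrative and not taken from the dataset). It also shows a caveat of the `<.*?>` pattern: any text between a `<` and a later `>` on the same line is removed, even when it is not an HTML tag, so the rows flagged by `has_html` may be worth inspecting by hand.

```python
import re

def clean_text(text):
    text = re.sub(r"<.*?>", " ", str(text))  # replace anything tag-like with a space
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace and line breaks
    return text.strip()

# Hypothetical inputs, not from Resume.csv
samples = {
    "html": "<div class='x'>HR   MANAGER</div>\n\nSummary",
    "comparison": "Managed budgets <$5M and teams >10 people",
}

cleaned = {name: clean_text(value) for name, value in samples.items()}
for name, value in cleaned.items():
    print(f"{name}: {value}")
```

In the second sample the span `<$5M and teams >` is treated as a tag and dropped. If that matters for the extraction step, a stricter pattern or an HTML parser would be a safer choice.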

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2484 entries, 0 to 2483
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   ID            2484 non-null   int64 
 1   Resume_str    2484 non-null   object
 2   Category      2484 non-null   object
 3   Resume_clean  2484 non-null   object
 4   has_html      2484 non-null   bool  
dtypes: bool(1), int64(1), object(3)
memory usage: 80.2+ KB


In [18]:
# drop the helper column 'has_html'
df = df.drop(columns=["has_html"])

### Entity Extraction (Use Case: Skills, Experience, Education)

In [21]:
# Information Extraction Prompt Generator
def create_extraction_prompt(resume_text):
    prompt = f"""
You are an experienced HR specialist. Extract the following information from the candidate's resume:

- Skills
- Experiences (with years if mentioned)
- Certifications
- Education

Resume:
{resume_text}

Respond ONLY in JSON format like this:
{{
  "skills": [],
  "experiences": [],
  "certifications": [],
  "education": []
}}
"""
    return prompt.strip()

# Prompt Dataset
df["extraction_prompt"] = df["Resume_clean"].apply(create_extraction_prompt)

# Preview
df[["Category", "extraction_prompt"]].head(3)


Unnamed: 0,Category,extraction_prompt
0,HR,"You are an experienced HR specialist. Extract the following information from the candidate's resume:\n\n- Skills\n- Experiences (with years if mentioned)\n- Certifications\n- Education\n\nResume:\nHR ADMINISTRATOR/MARKETING ASSOCIATE HR ADMINISTRATOR Summary Dedicated Customer Service Manager with 15+ years of experience in Hospitality and Customer Service Management. Respected builder and leader of customer-focused teams; strives to instill a shared, enthusiastic commitment to customer service. Highlights Focused on customer satisfaction Team management Marketing savvy Conflict resolution techniques Training and development Skilled multi-tasker Client relations specialist Accomplishments Missouri DOT Supervisor Training Certification Certified by IHG in Customer Loyalty and Marketing by Segment Hilton Worldwide General Manager Training Certification Accomplished Trainer for cross server hospitality systems such as Hilton OnQ , Micros Opera PMS , Fidelio OPERA Reservation System (ORS) , Holidex Completed courses and seminars in customer service, sales strategies, inventory control, loss prevention, safety, time management, leadership and performance assessment. Experience HR Administrator/Marketing Associate HR Administrator Dec 2013 to Current Company Name － City , State Helps to develop policies, directs and coordinates activities such as employment, compensation, labor relations, benefits, training, and employee services. Prepares employee separation notices and related documentation Keeps records of benefits plans participation such as insurance and pension plan, personnel transactions such as hires, promotions, transfers, performance reviews, and terminations, and employee statistics for government reporting. Advises management in appropriate resolution of employee relations issues. Administers benefits programs such as life, health, dental, insurance, pension plans, vacation, sick leave, leave of absence, and employee assistance. 
Marketing Associate Designed and created marketing collateral for sales meetings, trade shows and company executives. Managed the in-house advertising program consisting of print and media collateral pieces. Assisted in the complete design and launch of the company's website in 2 months. Created an official company page on Facebook to facilitate interaction with customers. Analyzed ratings and programming features of competitors to evaluate the effectiveness of marketing strategies. Advanced Medical Claims Analyst Mar 2012 to Dec 2013 Company Name － City , State Reviewed medical bills for the accuracy of the treatments, tests, and hospital stays prior to sanctioning the claims. Trained to interpret the codes (ICD-9, CPT) and terminology commonly used in medical billing to fully understand the paperwork that is submitted by healthcare providers. Required to have organizational and analytical skills as well as computer skills, knowledge of medical terminology and procedures, statistics, billing standards, data analysis and laws regarding medical billing. Assistant General Manager Jun 2010 to Dec 2010 Company Name － City , State Performed duties including but not limited to, budgeting and financial management, accounting, human resources, payroll and purchasing. Established and maintained close working relationships with all departments of the hotel to ensure maximum operation, productivity, morale and guest service. Handled daily operations and reported directly to the corporate office. Hired and trained staff on overall objectives and goals with an emphasis on high customer service. Marketing and Advertising, working on public relations with the media, government and local businesses and Chamber of Commerce. Executive Support / Marketing Assistant Jul 2007 to Jun 2010 Company Name － City , State Provided assistance to various department heads - Executive, Marketing, Customer Service, Human Resources. 
Managed front-end operations to ensure friendly and efficient transactions. Ensured the swift resolution of customer issues to preserve customer loyalty while complying with company policies. Exemplified the second-to-none customer service delivery in all interactions with customers and potential clients. Reservation & Front Office Manager Jun 2004 to Jul 2007 Company Name － City , State Owner/ Partner Dec 2001 to May 2004 Company Name － City , State Price Integrity Coordinator Aug 1999 to Dec 2001 Company Name － City , State Education N/A , Business Administration 1999 Jefferson College － City , State Business Administration Marketing / Advertising High School Diploma , College Prep. studies 1998 Sainte Genevieve Senior High － City , State Awarded American Shrubel Leadership Scholarship to Jefferson College Skills Accounting, ads, advertising, analytical skills, benefits, billing, budgeting, clients, Customer Service, data analysis, delivery, documentation, employee relations, financial management, government relations, Human Resources, insurance, labor relations, layout, Marketing, marketing collateral, medical billing, medical terminology, office, organizational, payroll, performance reviews, personnel, policies, posters, presentations, public relations, purchasing, reporting, statistics, website.\n\nRespond ONLY in JSON format like this:\n{\n ""skills"": [],\n ""experiences"": [],\n ""certifications"": [],\n ""education"": []\n}"
1,HR,"You are an experienced HR specialist. Extract the following information from the candidate's resume:\n\n- Skills\n- Experiences (with years if mentioned)\n- Certifications\n- Education\n\nResume:\nHR SPECIALIST, US HR OPERATIONS Summary Versatile media professional with background in Communications, Marketing, Human Resources and Technology. Experience 09/2015 to Current HR Specialist, US HR Operations Company Name － City , State Managed communication regarding launch of Operations group, policy changes and system outages Designed standard work and job aids to create comprehensive training program for new employees and contractors Audited job postings for old, pending, on-hold and draft positions. Audited union hourly, non-union hourly and salary background checks and drug screens Conducted monthly new hire benefits briefing to new employees across all business units Served as a link between HR Managers and vendors by handling questions and resolving system-related issues Provide real-time process improvement feedback on key metrics and initiatives Successfully re-branded US HR Operations SharePoint site Business Unit project manager for RFI/RFP on Background Check and Drug Screen vendor 01/2014 to 05/2015 IT, Marketing and Communications Co-op Company Name － City , State Posted new articles, changes and updates to corporate SharePoint site including graphics and visual communications. Researched and drafted articles and feature stories to promote company activities and programs. Co-edited and developed content for quarterly published newsletter. Provided communication support for internal and external events. Collaborated with Communication team, media professionals and vendors to determine program needs for print materials, web design and digital communications. Entrusted to lead product, service and software launches for Digital Asset Management tool, Marketing Toolkit website and Executive Tradeshows Calendar. 
Created presentations for management and executive approval to ensure alignment with corporate guidelines and branding. Maintained the MySikorsky SharePoint site and provided timely solutions to mitigate issues. Created story board and produced video for annual IT All Hands meeting. 10/2012 to 01/2014 Relationship Coordinator/Marketing Specialist Company Name － City , State Partnered with vendor to manage the in-house advertising program consisting of print and media collateral pieces. Coordinated pre-show and post-show activities at trade shows. Managed marketing campaigns to generate new business and to support partner and sales teams. Ordered marketing collateral for meetings, trade shows and advisors. Improved, administered and modified marketing programs to increase product awareness. Assisted in preparing internal promotional publications, managed marketing material inventory and supervised distribution of publications to ensure high quality product output. Coordinated marketing materials including brochures, promotional materials and products. Partnered with graphic designers to develop appropriate materials and branding for brochures. Used tracking and reporting systems for sales leads and appointments. 09/2009 to 10/2012 Assistant Head Teller Company Name － City , State Received an internal audit score of 100 %. Performed daily and monthly audits of ATM machines and tellers. Educated customers on a variety of retail products and available credit options. 
Consistently met or exceeded quarterly sales goals Promoted products and services to customers while maintaining company brand identity · Implemented programs to achieve and exceed customer and company participation goals Organized company sponsored events on campus resulting in increased brand awareness · Coached peers on the proper use of programs to improve work flow efficiency Utilized product knowledge to successfully sell to and refer clients based on individual needs Promoted marketing the grand opening of new branch locations to strengthen company brand affinity · Organized company sponsored events resulting in increased brand awareness and improved sales · Coached peers on the proper use of programs to increase work flow efficiency Senior Producer - 2014 SHU Media Exchange Company Name － City , State Planned and executed event focusing on Connecticut's creative corridor, growth of industry and opportunities that come with development. A panel of industry professionals addressed topics related to media and hosted a question and answer session for approximately 110 attendees. Following the forum, guests were invited to engage in networking and conversation at a post-event reception. 
Education 2014 Master of Arts : Corporate Communication & Public Relations Sacred Heart University － City , State 2013 Bachelor of Arts : Relational Communication Western Connecticut State University － City , State Skills Adobe Photoshop, ADP, Asset Management, branding, brochures, content, Customer Care, Final Cut Pro, graphics, graphic, HR, Illustrator, InDesign, Innovation, inventory, Lotus Notes, marketing, marketing materials, marketing material, materials, Microsoft Office, SharePoint, newsletter, presentations, process improvement, Project Management, promotional materials, publications, Quality, real-time, Recruitment, reporting, RFP, sales, stories, Employee Development, video, web design, website, articles\n\nRespond ONLY in JSON format like this:\n{\n ""skills"": [],\n ""experiences"": [],\n ""certifications"": [],\n ""education"": []\n}"
2,HR,"You are an experienced HR specialist. Extract the following information from the candidate's resume:\n\n- Skills\n- Experiences (with years if mentioned)\n- Certifications\n- Education\n\nResume:\nHR DIRECTOR Summary Over 20 years experience in recruiting, 15 plus years in Human Resources Executive Management, 5 years of HRIS development and maintenance 4 years working in a Healthcare Enviroment Skills Recruiting FMLA/EEO/FLSA HRIS Development Benefit Administration Policy Development Web Page Development Accomplishments Kansas Health Institute -Health Outcomes for the State of Kansas -1999 Memberships and Accolades: Project Management Institute Member, SHRM, Chamber of Commerce, 1999 Friends University President's Honor Roll, 1997 Friends University Dean's Honor Roll, Student Liaison for Friends University Topeka (member of Mother-To-Mother, member of the Topeka Advertising Federation, several production pieces created nominated for ADDY Awards, received recognition for outstanding customer service assistance by the State of Kansas Travel and Tourism Department., ASHHRA, KAHHR, ACM. Additional Information: Leading Change -I have been instrumental in development and implementation of the Adjutant General's Retention Research project, involving survey development and analyzing the results of the surveys to present to the Adjutant General to help retain the qualified talent of the Departments. I have been tasked with working with the Federal Security Manager for the Joint Forces Headquarters in developing policies, procedures and processes to ensure that all current and new State Employees have the appropriate security clearances for the position held per the Federal Government Requirements. While at LMIS, I lead the Job Vacancy Project and was able to produce results in less time and man hours than in years before with staff that were inexperienced in the JVC process and procedures. 
I have been responsible to develop, plan and implement database programs, for the last three positions I have held. These were designed to cultivate, involve, renew contact, and promote active and potential employees. These databases were used for reporting FTE usage, budget management and turnover reports. While working in the healthcare field, I took the initiative in creating a local website that was used to receive and respond to requests for information and assistance in marketing and promoting the healthcare facility along with recruitment of potential employees. As Human Resource Coordinator in the healthcare field, I managed the front office personnel, reduced contract labor costs for nursing staff and implemented a unique pay structure to increase PRN staff utilization. I continually think ""Outside-the-box"" to create and develop strategies to resolve issues faced in my work environment. In my current position, I have met and exceeded all hiring goals for the firm. This resulted in our client increasing the business transferred to our location. Experience HR Director 09/2016 to Current Company Name City , State Developed New Website for Agency, payroll processing changes, and implementation of new HRIS System. Oversaw the employment process taking the lead role in clinician, physician and management team recruitment initiatives. Developed, maintain and interpret HR policy. Authored the Employee Handbook. Provided coaching and support to management and supervisors on performance management and other related issues. Maintained in-depth knowledge of legal requirements related to day-to-day management of employees, reducing legal risks and ensuring regulatory compliance. Evaluated and recommend changes to the employee benefits plan. Oversaw day to day administration of benefits. HR Director 04/2009 to 09/2016 Company Name City , State Develped Supervisory Education, SHRO Website, SHRO HRIS System and Automation of payroll processing. 
Established and directed a comprehensive statewide human resource program for both classified and unclassified State employees/positions in the Adjutant General's Department. Resolved non-routine HR related issues associated as they arose; reviewed documents and approved all hires and promotions; reviews and approved, modified and/or disapproved wage/salary requests to hire individuals above the pre-established minimum classified or unclassified pay rate, Reviewed any significant changes to position descriptions and determined if reclassification should be pursued; reviewed position descriptions for new positions and determined the appropriate wage range based on comparable classified position (if they exist), Consulted with the TAG and both military and state manager/ supervisors in order to provide technical and common sense guidance on properly addressing sensitive or complex employee and organizational issues; aided them in achieving their ever changing program goals and provided innovative ideas for staffing; Served as the administration's management representative in labor negotiations with the local KAPE unit at the 190th Fire Department. HR Manager/Sr. Recruiter 10/2003 to 06/2006 Company Name City , State Created an HRIS tracking system used for recruitment. Responsible for Ramp up and hiring of all customer service agents, and other positions as needed. Placement and development of all advertising Met and exceeded all hiring goals. Responsible for Hiring Senior Management to cover such duties as: Team Managers, Payroll, Quality Control. Responsible for recruitment of 950 new employees, meeting and exceeded goals set. Coordinated and facilitated manpower planning, recruitment and retention, career development and training, staff relations, compensation and benefits, compliance with local, state and federal statutory regulations, public programs, and regulatory audit procedures. 
Served as a resource person to administration, mid-level management and staff regarding HR related. Human Resources Coordinator 03/1996 to 02/2000 Company Name City , State Developed HRIS database from ground up for employee records and monitoring. Instrumental in reducing the use of Agency Staffing needs for hospital. Coordinated and facilitates manpower planning, recruitment and retention, career development and training, staff relations, compensation and benefits, compliance with local, state and federal statutory regulations, public programs, and regulatory audit procedures. Served as the HR resource source for administration, mid-level management and staff. Coordinated hiring procedures, appraisals, pay increases, promotions, transfers, terminations, job postings, and all corrective actions; One Person Office, responsible for all OSHA, Work Comp, Benefits, payroll, etc. Education and Training Master's Degree : Information Management Systems 05/2005 Friends University City , State , United States 3.5 Credits Earned: 62 Semester hours Information Management Systems Bachelor of Science : Organizational Management 05/2000 Friends University City , State , United States 4.0 Credits Earned: 62 Semester hours Activities and Honors Topeka Chamber of Commerce -Ambassador Kansas Hospital Association -Health Care Human Resources Member SHRM -Legislative Liaison Skills Desktop Publishing, Newsletter productions, DATABASE Management, Leadership Training, OSHA, FMLA, Workers Compensation. PageMaker, Agency Automation, back-up, Benefits, Budget management, Corel Suite, Harvard Graphics, Access, Excel, Microsoft Publisher, MS Word, Quark Express, Quattro Pro, Strategic Planning, Web page development, WordPerfect\n\nRespond ONLY in JSON format like this:\n{\n ""skills"": [],\n ""experiences"": [],\n ""certifications"": [],\n ""education"": []\n}"


In [22]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2484 entries, 0 to 2483
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   ID                 2484 non-null   int64 
 1   Resume_str         2484 non-null   object
 2   Category           2484 non-null   object
 3   Resume_clean       2484 non-null   object
 4   extraction_prompt  2484 non-null   object
dtypes: int64(1), object(4)
memory usage: 97.2+ KB


In [23]:
# Sample 5 resumes from each category (24 categories x 5 = 120 rows); this is a fixed-size sample per category, not a proportional one
df_sample = df.groupby("Category", group_keys=False).apply(lambda x: x.sample(min(len(x), 5), random_state=42))
df_sample.shape




(120, 5)
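The per-category sampling can also be written without `groupby(...).apply`, which in recent pandas versions may emit a deprecation warning when the applied function touches the grouping column. The sketch below uses a small made-up frame (`toy` and `n_per_category` are illustrative names, not part of the notebook) and keeps the same `min(len(group), n)` cap so that small categories are kept whole.

```python
import pandas as pd

# Hypothetical stand-in for the resume frame, not data from Resume.csv
toy = pd.DataFrame({
    "ID": range(12),
    "Category": ["HR"] * 6 + ["CHEF"] * 4 + ["BPO"] * 2,
})

n_per_category = 2

# Sample each group separately, capping at the group size
parts = [
    group.sample(n=min(len(group), n_per_category), random_state=42)
    for _, group in toy.groupby("Category")
]
toy_sample = pd.concat(parts)

# Sanity check: every category contributes the same number of rows
counts = toy_sample["Category"].value_counts().to_dict()
print(counts)
```

Checking `value_counts()` on the sample is a quick way to confirm that the result is balanced across categories rather than proportional to the original category sizes.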

In [25]:
df_sample.to_csv("resume_sample.csv", index=False)
df_sample.info()

<class 'pandas.core.frame.DataFrame'>
Index: 120 entries, 1864 to 379
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   ID                 120 non-null    int64 
 1   Resume_str         120 non-null    object
 2   Category           120 non-null    object
 3   Resume_clean       120 non-null    object
 4   extraction_prompt  120 non-null    object
dtypes: int64(1), object(4)
memory usage: 9.7+ KB


In [None]:
# gather output from LLM
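Because the prompt asks for JSON only, the replies will need to be parsed once they are gathered. A minimal sketch of that parsing step follows, assuming the model returns text that contains one JSON object, possibly wrapped in extra words. The function name `parse_extraction_response` and the sample reply are hypothetical, and no particular LLM client is assumed.

```python
import json
import re

EXPECTED_KEYS = ("skills", "experiences", "certifications", "education")

def parse_extraction_response(raw):
    """Return a dict with the four expected list fields, or None if no valid JSON object is found."""
    # Take the span from the first "{" to the last "}" in the reply
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Missing or non-list fields fall back to empty lists
    return {key: data[key] if isinstance(data.get(key), list) else [] for key in EXPECTED_KEYS}

# Hypothetical model reply, for illustration only
raw_reply = 'Sure, here is the JSON: {"skills": ["Recruiting"], "education": []} Hope this helps.'
parsed = parse_extraction_response(raw_reply)
print(parsed)
```

Returning `None` for unparseable replies makes it easy to count failures and re-run only those rows.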

In [None]:
# annotate a sample (5 categories, 3-4 per category, 15-20 annotated resumes)
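One possible way to pick the resumes for annotation is sketched below, assuming a frame shaped like `df_sample` with 5 rows per category. The frame `toy_sample`, the category list, and the choice of 4 rows per category are illustrative assumptions, not the notebook's actual selection.

```python
import pandas as pd

# Hypothetical stand-in for df_sample: 6 categories with 5 rows each
categories = ["HR", "CHEF", "ARTS", "SALES", "BANKING", "FITNESS"]
toy_sample = pd.DataFrame({
    "ID": range(30),
    "Category": [c for c in categories for _ in range(5)],
})

# Pick 5 categories reproducibly
chosen = (
    pd.Series(sorted(toy_sample["Category"].unique()))
    .sample(n=5, random_state=42)
    .tolist()
)

# Then draw 4 resumes from each chosen category (20 rows to annotate)
to_annotate = (
    toy_sample[toy_sample["Category"].isin(chosen)]
    .groupby("Category")
    .sample(n=4, random_state=42)
)
print(to_annotate["Category"].value_counts())
```

Fixing `random_state` keeps the annotation set stable across reruns, so the annotated rows can be matched to the model output by `ID`.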