In [4]:
%pip install pandas spacy langchain-community sentence-transformers transformers faiss-cpu scipy tqdm --quiet

Note: you may need to restart the kernel to use updated packages.


In [8]:
%pip install -U langchain-huggingface


Defaulting to user installation because normal site-packages is not writeable
Collecting langchain-huggingface
  Downloading langchain_huggingface-0.2.0-py3-none-any.whl.metadata (941 bytes)
Collecting huggingface-hub>=0.30.2 (from langchain-huggingface)
  Downloading huggingface_hub-0.32.4-py3-none-any.whl.metadata (14 kB)
Downloading langchain_huggingface-0.2.0-py3-none-any.whl (27 kB)
Downloading huggingface_hub-0.32.4-py3-none-any.whl (512 kB)
Installing collected packages: huggingface-hub, langchain-huggingface

  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.29.3
    Uninstalling huggingface-hub-0.29.3:
      Successfully uninstalled huggingface-hub-0.29.3

In [1]:
# ✅ Imports
import pandas as pd
import numpy as np
import spacy
import re
from tqdm.notebook import tqdm
from scipy.spatial.distance import cosine
from langchain_huggingface import HuggingFaceEmbeddings


In [9]:
# ✅ Load spaCy NER model
# If the model is missing, install it first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")


In [2]:
embedder = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    # model_kwargs={"device": "cpu"},  # optional; use "cuda" if a GPU is available
    encode_kwargs={"normalize_embeddings": True},  # return unit-length vectors
)




In [10]:

def clean_text(text):
    text = text.replace("\u00a0", " ")
    text = re.sub(r"\s+", " ", text).strip()
    return text

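def extract_skills(text):
    # Use spaCy named entities as a rough proxy for skills and tools.
    doc = nlp(text)
    keep_labels = {"ORG", "PRODUCT", "LANGUAGE", "WORK_OF_ART", "NORP", "GPE"}
    skills = {ent.text.strip() for ent in doc.ents if ent.label_ in keep_labels}
    # Sort so the joined text (and therefore its embedding) is the same on every run;
    # iteration order of a set of strings is not stable across Python processes.
    return sorted(s for s in skills if len(s) > 1 and not s.isdigit())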

def embed(text):
    return np.array(embedder.embed_query(text), dtype=np.float32)

def cosine_score(vec1, vec2):
    return 1 - cosine(vec1, vec2)

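def rank_resumes(df, job_description, top_k=5):
    """Return the top_k rows of df ranked by similarity to the job description."""
    # Embed the job description once; fall back to the full text if NER finds nothing.
    jd_clean = clean_text(job_description)
    jd_skills = extract_skills(jd_clean)
    jd_text = " ; ".join(jd_skills) if jd_skills else jd_clean
    jd_vec = embed(jd_text)

    results = []
    for idx, resume in tqdm(df["Resume"].items(), total=len(df)):
        # Skip missing (NaN) or blank résumés.
        if not isinstance(resume, str) or resume.strip() == "":
            continue

        # Only the first 4,000 characters of each résumé are analysed.
        resume_clean = clean_text(resume)[:4000]
        res_skills = extract_skills(resume_clean)
        res_text = " ; ".join(res_skills) if res_skills else resume_clean
        res_vec = embed(res_text)

        score = cosine_score(res_vec, jd_vec)
        row = df.loc[idx].to_dict()
        row["match_score"] = round(score, 4)
        results.append(row)

    # Without this guard, sort_values raises KeyError when no résumé was scored.
    if not results:
        return pd.DataFrame(columns=[*df.columns, "match_score"])
    return pd.DataFrame(results).sort_values("match_score", ascending=False).head(top_k)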
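
Because the embedder is configured with normalize_embeddings=True, the cosine similarity computed by cosine_score should reduce to a plain dot product. The sketch below checks that identity on synthetic vectors (random numbers, not real model output; the dimension of 384 is only illustrative) and shows how one matrix-vector product could score many résumés at once. It is an illustration under those assumptions, not a change to the pipeline above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for embeddings (not real model output):
# five "résumé" vectors and one "job description" vector.
raw_resumes = rng.normal(size=(5, 384))
raw_jd = rng.normal(size=384)

def full_cosine(u, v):
    """Cosine similarity with explicit norms, i.e. 1 - cosine distance."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

reference = np.array([full_cosine(v, raw_jd) for v in raw_resumes])

# L2-normalise, mirroring encode_kwargs={"normalize_embeddings": True}.
resume_vecs = raw_resumes / np.linalg.norm(raw_resumes, axis=1, keepdims=True)
jd_vec = raw_jd / np.linalg.norm(raw_jd)

# For unit vectors, one matrix-vector product scores every résumé at once.
scores = resume_vecs @ jd_vec

print(np.allclose(scores, reference))
```

If the identity holds for the real embeddings as well, the per-row SciPy call could be swapped for a single `@` over a stacked matrix of résumé vectors.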

In [None]:
df = pd.read_csv("../data/resumes.csv")

df.head()


       Category                                             Resume
0  Data Science  Skills * Programming Languages: Python (pandas...
1  Data Science  Education Details \r\nMay 2013 to May 2017 B.E...
2  Data Science  Areas of Interest Deep Learning, Control Syste...
3  Data Science      Skills â¢ R â¢ Python â¢ SAP HANA â¢ Table...
4  Data Science     Education Details \r\n MCA YMCAUST, Faridab...
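
Row 3 of the preview contains "â¢" sequences, which look like bullet characters (U+2022) that were decoded with the wrong codec somewhere upstream. clean_text does not touch them, so they reach the NER step as-is. A minimal sketch of an extra cleaning step follows; the helper name strip_bullet_debris is hypothetical, and the assumption that every "â¢" is a damaged bullet should be checked against the actual CSV before relying on it.

```python
import re

# Assumption: "â¢" is a UTF-8 bullet (bytes E2 80 A2) decoded with a single-byte
# codec. The middle byte may survive as "\x80" or "€", or be lost entirely,
# so the pattern treats it as optional.
_BULLET_DEBRIS = re.compile("â[\x80€]?¢")

def strip_bullet_debris(text):
    """Replace mis-decoded bullet markers with a space, then collapse whitespace."""
    text = _BULLET_DEBRIS.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

print(strip_bullet_debris("Skills â¢ R â¢ Python â¢ SAP HANA"))
```

Such a helper could be called from clean_text before the whitespace normalisation, so the markers never reach extract_skills.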


In [4]:
job_description = """
About the job
Software Developer



Department: Software development team
Starting date: Immediately upon Selection
Report to: Software Development Manager
Location: Elsyca HQ, Wijgmaal, Leuven
Employment: Full time


The Opportunity

Elsyca is seeking a passionate and driven Software Developer to join our innovative software development team working on the numerical modeling of electrochemical processes. This is a unique opportunity to work on advanced simulation and modeling tools that support various industries including automotive, aerospace, electronics, medical and energy.
In this role, you will contribute to the ongoing digital transformation of diverse industries by building cutting-edge desktop applications that enhance operational efficiency and performance.


The Role

As a Software Developer, you will be instrumental in developing high-quality desktop applications that empower our users in optimizing their manufacturing and operational processes. You will work closely with a talented team of developers and domain experts to ensure our software solutions are robust, intuitive, and efficient.
Key responsibilities include implementing new features, resolving technical challenges, and collaborating on innovative tools that address the needs of this highly technical and impactful industry.
This is an exciting opportunity to play a key role in delivering Elsyca’s cutting-edge solutions while growing your expertise in software development.


Key Responsibilities

The successful candidate will be part of the software development team. His (her) task will be to:
Design, develop and test software programs that are leading the way in this industry.
Maintain the software programs once they are up and running.
Communicate and collaborate as team member/teammate in the software development team.
Interact with the application and consulting & engineering team to discuss new specifications.
Contribute to DevOps practices to streamline software development, deployment, and maintenance processes.
Report to the software development manager.


About you

BS, MS or PhD in Computer Science or programming, or equivalent by experience.
Experience in software development and architecture on Windows and/or Linux.
Very good understanding of object-oriented design, data structures and algorithms.
Experience with TDD, Unit Testing and Agile software development.
Good understanding on the design of graphical user interfaces.
Experience in the creation and management of a product suite architecture.
Monitor system performance, logs, and alerts to detect and resolve issues proactively.
Excellent problem-solving skills.
Fluent English (knowledge of Dutch and/or German, French is favorable).
Have good communications skills.
Be able to work as an individual and as part of a team.
Be able to work to tight deadlines.
Work in a logical manner.
Demonstrate good attention to detail.


Bonus Skills:
Experience with C++, Qt / Qml (https://www.qt.io/developers).
Familiarity with scientific calculations, like Finite Element Methods.
Familiarity with AI concepts, tools, or frameworks, or an eagerness to gain experience in this field.
Knowledge of database design, Sql, usage, or a willingness to develop expertise in this area.
Automation of software deployment, testing, and monitoring to improve efficiency and reliability.
Experience with Python, Jenkins, GitHub, OpenProject.


What we offer

Elsyca offers you a challenging and varied job content.
Full-time employment in a dynamic and highly ambitious company.
The opportunity to contribute to the expansion of a trusted global brand in technical excellence.
Work autonomy with a results-oriented culture.
Access to professional development and training opportunities.
An attractive salary aligned with your experience and impact.


About Elsyca

Founded in 1997, Elsyca is the culmination of research efforts in numerical modeling of electrochemical processes. Its name is derived from 'electrochemical system calculations'.
Today, Elsyca is active in the markets of corrosion design & engineering, cathodic protection & AC mitigation, surface finishing, and electrochemical manufacturing within a variety of industries – such as automotive, aerospace, electronics, medical and energy. The combination of the practical engineering knowledge, the in-house developed family of engineering simulation tools, and the continuous focus on R&D and innovation has established Elsyca as a trustworthy and appreciated partner for many clients across the globe.
Elsyca’s reliable and innovative solutions meet market expectations by reducing the cost of manufacturing goods or ownership of assets. Everyday Elsyca’s employees work to keep pipelines operating safely, to advance renewable offshore energy or to produce goods with less natural resources.


Contact

If you have any questions or if you would like to apply, send an email to recruitment@elsyca.com.
"""


In [11]:
top_matches = rank_resumes(df, job_description, top_k=10)



  0%|          | 0/962 [00:00<?, ?it/s]
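
rank_resumes calls embed_query once per résumé, which means 962 separate model calls here. One possible speed-up is to embed the texts in batches through embed_documents, which HuggingFaceEmbeddings also provides. The sketch below shows the batching logic only: score_batch and FakeEmbedder are hypothetical names, the fake embedder exists so the example runs without downloading a model, and it assumes embed_documents returns unit-length vectors that are comparable to those from embed_query (worth verifying for the installed version).

```python
import numpy as np

def score_batch(embedder, texts, jd_vec, batch_size=64):
    """Embed texts in batches and return their dot-product scores against jd_vec.

    Assumes embedder.embed_documents(list_of_str) returns one unit-length
    vector per input text.
    """
    scores = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vecs = np.asarray(embedder.embed_documents(batch), dtype=np.float32)
        scores.append(vecs @ jd_vec)
    return np.concatenate(scores) if scores else np.empty(0, dtype=np.float32)


class FakeEmbedder:
    """Hypothetical stand-in so the sketch runs without a real model."""

    def embed_documents(self, texts):
        # Map each text to a deterministic 2-D unit vector based on its length.
        angles = [len(t) * 0.1 for t in texts]
        return [[float(np.cos(a)), float(np.sin(a))] for a in angles]


jd_vec = np.array([1.0, 0.0], dtype=np.float32)
texts = ["python", "c++ qt qml", "", "jenkins github"]
scores = score_batch(FakeEmbedder(), texts, jd_vec, batch_size=2)
print(scores.round(3))
```

With the real embedder, the texts passed in would be the joined skill strings that rank_resumes currently embeds one at a time.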

In [20]:
top_matches.iloc[1]

Category                                          Java Developer
Resume         Education Details \r\nJanuary 2016 B.E Informa...
match_score                                               0.7176
Name: 329, dtype: object

In [None]:
# The column is named "Resume" (capital R); "resume" raises KeyError.
top_matches.drop(columns=["Resume"]).reset_index(drop=True)

In [None]:
# View the full text of the best-matched résumé
print(top_matches.iloc[0]["Resume"])
