<h1>Assignment 1 - Intelligent Job Matching Recommendation System</h1>

<h3>Importing Modules</h3>

In [5]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

<h3>Step 1: Load the dataset</h3>

In [8]:
jobs_df = pd.read_csv("Downloads/jobs_skills(1).csv")
jobs_df.head()

Unnamed: 0,Job_Title,Skills
0,Data Scientist,"Python, Machine Learning, Statistics, SQL"
1,Web Developer,"HTML, CSS, JavaScript, React, Bootstrap"
2,Software Engineer,"Java, Python, Algorithms, Data Structures"
3,Data Analyst,"Excel, Data Visualization, SQL, Power BI"
4,DevOps Engineer,"Linux, Docker, Jenkins, Kubernetes"


In [31]:
print("Loaded dataset:")
print(jobs_df.head())

Loaded dataset:
           Job_Title                                     Skills  Cluster
0     Data Scientist  Python, Machine Learning, Statistics, SQL        2
1      Web Developer    HTML, CSS, JavaScript, React, Bootstrap        2
2  Software Engineer  Java, Python, Algorithms, Data Structures        1
3       Data Analyst   Excel, Data Visualization, SQL, Power BI        1
4    DevOps Engineer         Linux, Docker, Jenkins, Kubernetes        2


<h3>Step 2: Data Cleaning</h3>

In [20]:
# Step 2: Data Cleaning
# Remove rows with missing skills or job titles
jobs_df.dropna(subset=['Job_Title', 'Skills'], inplace=True)
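
Note that dropna only removes rows whose values are actually missing (NaN/None); a Skills cell that contains an empty or whitespace-only string survives it. The following is a minimal sketch of one way to catch those rows as well, run on a small made-up frame. The names toy_jobs, blank and cleaned are hypothetical and are not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data: one valid row, one whitespace-only Skills cell,
# and one row with a missing Job_Title.
toy_jobs = pd.DataFrame({
    "Job_Title": ["Data Analyst", "QA Engineer", None],
    "Skills": ["Excel, SQL", "   ", "Python"],
})

# Flag Skills cells that are empty once surrounding whitespace is removed.
blank = toy_jobs["Skills"].fillna("").str.strip().eq("")

# Drop the blank rows first, then apply the same dropna as in the notebook.
cleaned = toy_jobs[~blank].dropna(subset=["Job_Title", "Skills"])
print(cleaned)
```

Only the first row should remain, since the other two fail one of the two checks.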

<h3>Step 3: Preprocess the data</h3>

In [21]:
# Convert skills into a TF-IDF matrix
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(jobs_df['Skills'])
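
With its default settings, TfidfVectorizer tokenizes on word boundaries, so a multi-word skill such as "Machine Learning" is split into the separate tokens "machine" and "learning", and single-character tokens are dropped by the default token_pattern. The sketch below shows one possible alternative, assuming each comma-separated entry should count as a single skill. The helper split_skills and the names skill_vectorizer and sample_skills are hypothetical, not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def split_skills(text):
    """Split a comma-separated skills string into stripped, lowercased skills."""
    return [s.strip().lower() for s in text.split(",") if s.strip()]

# Two rows copied from the dataset preview above.
sample_skills = [
    "Python, Machine Learning, Statistics, SQL",
    "Excel, Data Visualization, SQL, Power BI",
]

# tokenizer= replaces the default word-level tokenization;
# token_pattern=None because the default pattern is not used with a custom tokenizer;
# lowercase=False because split_skills already lowercases each skill.
skill_vectorizer = TfidfVectorizer(
    tokenizer=split_skills, token_pattern=None, lowercase=False
)
skill_matrix = skill_vectorizer.fit_transform(sample_skills)

vocab = sorted(skill_vectorizer.vocabulary_)
print(vocab)
print(skill_matrix.shape)
```

If this approach were adopted, the same fitted vectorizer would also have to be used to transform the user's input, so that both sides share one vocabulary.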


<h3>Step 4: Apply K-Means clustering</h3>

In [22]:
# Define the number of clusters (this can be adjusted based on dataset size)
num_clusters = 5
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
kmeans.fit(X)
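
The comment above notes that the number of clusters can be adjusted. One common heuristic for comparing candidate values is the silhouette score, where higher is better. The sketch below is an illustration on a small made-up list of skill strings, not the method used in this notebook; toy_skills, toy_X, scores and best_k are hypothetical names.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy data in the same format as the Skills column.
toy_skills = [
    "Python, Machine Learning, Statistics, SQL",
    "Python, TensorFlow, Scikit-learn, Deep Learning",
    "HTML, CSS, JavaScript, React, Bootstrap",
    "JavaScript, HTML, CSS, Python",
    "Linux, Docker, Jenkins, Kubernetes",
    "Linux, Docker, AWS, Kubernetes",
]
toy_X = TfidfVectorizer().fit_transform(toy_skills)

# Fit K-Means for each candidate k and record its silhouette score.
# The score needs at least 2 clusters and fewer clusters than samples.
scores = {}
for k in range(2, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(toy_X)
    scores[k] = silhouette_score(toy_X, model.labels_)

best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette score:", best_k)
```

On the real dataset, the range of candidate values would need to be chosen to suit the number of rows.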

<h3>Step 5: Add cluster labels to the DataFrame</h3>

In [23]:


jobs_df['Cluster'] = kmeans.labels_
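
In the dataset preview printed earlier, Data Scientist and Web Developer share cluster 2, so it can be worth checking which titles end up grouped together before relying on the clusters. A minimal sketch of such a check, using three rows copied from that preview; toy_df and cluster_summary are hypothetical names.

```python
import pandas as pd

# Three rows taken from the printed preview, including their cluster labels.
toy_df = pd.DataFrame({
    "Job_Title": ["Data Scientist", "Web Developer", "Software Engineer"],
    "Skills": [
        "Python, Machine Learning, Statistics, SQL",
        "HTML, CSS, JavaScript, React, Bootstrap",
        "Java, Python, Algorithms, Data Structures",
    ],
    "Cluster": [2, 2, 1],
})

# Collect the job titles assigned to each cluster label.
cluster_summary = toy_df.groupby("Cluster")["Job_Title"].agg(list)
print(cluster_summary)
```

The same groupby could be run on jobs_df itself once the Cluster column has been added.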


<h3>Step 6: Function to recommend jobs based on the user's skills</h3>

In [25]:
# Step 6: Function to recommend jobs based on user's skills
def recommend_jobs(user_skills, top_n=5):
    # Step 6.1: Transform user input skills to match the model's input format
    user_vector = vectorizer.transform([user_skills])
    
    # Step 6.2: Predict cluster for the user input
    user_cluster = kmeans.predict(user_vector)[0]
    
    # Step 6.3: Filter jobs from the same cluster
    # (.copy() so that adding a column below does not trigger SettingWithCopyWarning)
    cluster_jobs = jobs_df[jobs_df['Cluster'] == user_cluster].copy()
    
    # Step 6.4: Calculate similarity scores between the user skills and job skills
    cluster_vectors = vectorizer.transform(cluster_jobs['Skills'])
    similarity_scores = cosine_similarity(user_vector, cluster_vectors).flatten()
    
    # Step 6.5: Rank the jobs by similarity score
    cluster_jobs['Similarity_Score'] = similarity_scores
    recommended_jobs = cluster_jobs.sort_values(by='Similarity_Score', ascending=False).head(top_n)
    
    return recommended_jobs[['Job_Title', 'Skills', 'Similarity_Score']]
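
If none of the words the user enters appear in the fitted vocabulary, vectorizer.transform returns an all-zero vector: every similarity score is then 0, while kmeans.predict still assigns some cluster. The sketch below shows one possible guard that could be called before recommend_jobs. The helper has_known_skills and the toy demo_vectorizer are hypothetical and are not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical vectorizer fitted on a tiny vocabulary, for illustration only.
demo_vectorizer = TfidfVectorizer().fit(["Python, SQL", "HTML, CSS"])

def has_known_skills(fitted_vectorizer, user_skills):
    """Return True if at least one token of user_skills is in the fitted vocabulary."""
    # .nnz is the number of stored non-zero entries in the sparse result.
    return fitted_vectorizer.transform([user_skills]).nnz > 0

known = has_known_skills(demo_vectorizer, "python, excel")
unknown = has_known_skills(demo_vectorizer, "welding, carpentry")
print(known, unknown)
```

A caller could print a short message and skip the recommendation step when the check returns False.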

<h3>Step 7: Interactive input from the user</h3>

In [27]:

def get_user_input():
    print("Please enter your skills (comma-separated):")
    user_input_skills = input()  # e.g., "Python, Data Analysis, Machine Learning"
    return user_input_skills

<h3>Step 8: Display the recommendations</h3>

In [28]:
def display_recommendations(recommended_jobs):
    print("\nRecommended Jobs for your skills:\n")
    for index, row in recommended_jobs.iterrows():
        print(f"Job Title: {row['Job_Title']}")
        print(f"Required Skills: {row['Skills']}")
        print(f"Similarity Score: {row['Similarity_Score']:.2f}")
        print("-" * 40)


<h3>Step 9: Main Logic</h3>

In [29]:
if __name__ == "__main__":
    user_input_skills = get_user_input()
    
    # Get top 5 job recommendations
    recommended_jobs = recommend_jobs(user_input_skills, top_n=5)
    
    # Display recommendations
    display_recommendations(recommended_jobs)

    # Optional: Save the recommendations to a CSV file
    recommended_jobs.to_csv('recommended_jobs.csv', index=False)

Please enter your skills (comma-separated):
 python,java,Machine Learning

Recommended Jobs for your skills:

Job Title: Data Scientist
Required Skills: Python, Machine Learning, Statistics, SQL
Similarity Score: 0.67
----------------------------------------
Job Title: Machine Learning Engineer
Required Skills: Python, TensorFlow, Scikit-learn, Deep Learning
Similarity Score: 0.31
----------------------------------------
Job Title: Java Developer
Required Skills: Java, Spring, Hibernate, Web Development
Similarity Score: 0.24
----------------------------------------
Job Title: Full Stack Developer
Required Skills: JavaScript, HTML, CSS, Python
Similarity Score: 0.19
----------------------------------------
Job Title: Python Developer
Required Skills: Python, Django, Flask, Web Development
Similarity Score: 0.16
----------------------------------------


