# Final Resume Classification Model

This notebook trains and evaluates the Final Resume Classification Model for our project.

*   The goal is to predict the job category of a resume from its text.
*   We use the hyperparameters selected during our tuning process.

## Hyperparameters Used


*   Model: Linear SVC (`LinearSVC`)
*   C: 1.0
*   TF-IDF max_features: 10,000
*   TF-IDF ngram_range: (1, 2)
*   Stop words: English
*   Also set in the model cell: class_weight='balanced', max_iter=5000
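The settings above can also be bundled into a single scikit-learn `Pipeline`, so the TF-IDF vocabulary is always learned from whatever data the model is fitted on (useful if the model is later cross-validated). This is a sketch, not how this notebook is structured: the notebook fits the vectorizer and the classifier separately, and `build_resume_classifier` is a hypothetical helper name. `max_iter` and `class_weight` are copied from the notebook's model cell.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_resume_classifier() -> Pipeline:
    """Hypothetical helper: TF-IDF + LinearSVC with the hyperparameters listed above."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(
            max_features=10_000,   # keep the 10,000 most frequent terms
            ngram_range=(1, 2),    # unigrams and bigrams
            stop_words="english",  # scikit-learn's built-in English stop-word list
        )),
        ("svc", LinearSVC(
            C=1.0,
            max_iter=5000,            # same value as the notebook's model cell
            class_weight="balanced",  # same value as the notebook's model cell
        )),
    ])


# Usage (assumes X_train, y_train, X_test from the notebook's split):
# clf = build_resume_classifier().fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```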

In [None]:
# Colab only: upload Cleaned_Resume.csv into the session's working directory
from google.colab import files
uploaded = files.upload()

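The upload cell only works inside Google Colab and waits for a file to be chosen before it finishes. A minimal sketch of a guard that skips the upload when `Cleaned_Resume.csv` is already present and only opens the upload widget in Colab. `running_in_colab` and `ensure_dataset` are hypothetical helper names, and checking `sys.modules` for `google.colab` is a commonly used heuristic, not an official Colab API.

```python
import sys
from pathlib import Path

DATA_PATH = Path("Cleaned_Resume.csv")


def running_in_colab() -> bool:
    # Heuristic (assumption): Colab imports google.colab when the kernel starts
    return "google.colab" in sys.modules


def ensure_dataset(path: Path = DATA_PATH) -> Path:
    """Hypothetical helper: return the CSV path, uploading it first in Colab if needed."""
    if path.exists():
        return path  # already uploaded or stored locally; skip the widget
    if running_in_colab():
        from google.colab import files
        files.upload()  # opens the browser file picker; uploads land in the working directory
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; upload it or place it next to the notebook")
    return path


# Usage: resumes_df = pd.read_csv(ensure_dataset())
```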

In [44]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

In [45]:
resumes_df = pd.read_csv('Cleaned_Resume.csv')
resumes_df.head()

Unnamed: 0,Resume_str,Category,Cleaned_Resume,Lower_Only,Skill_NGrams,Resume_Length
0,HR ADMINISTRATOR/MARKETING ASSOCIATE\...,HR,administratormarketing associate administrator...,hr administrator/marketing associate\...,"activity employment compensation, ad advertisi...",489
1,"HR SPECIALIST, US HR OPERATIONS ...",HR,specialist operation summary versatile medium ...,"hr specialist, us hr operations ...","asset management, background check, background...",515
2,HR DIRECTOR Summary Over 2...,HR,director summary year experience recruiting pl...,hr director summary over 2...,"adjutant general, administration midlevel, adm...",691
3,HR SPECIALIST Summary Dedica...,HR,specialist summary dedicated driven dynamic ye...,hr specialist summary dedica...,"access outlook powerpoint, action taken, actio...",258
4,HR MANAGER Skill Highlights ...,HR,manager skill highlight skill department start...,hr manager skill highlights ...,"assist creation, benefit administration, best ...",833


In [46]:
print("Shape:", resumes_df.shape)
print("Categories", resumes_df['Category'].unique())

Shape: (2484, 6)
Categories ['HR' 'DESIGNER' 'INFORMATION-TECHNOLOGY' 'TEACHER' 'ADVOCATE'
 'BUSINESS-DEVELOPMENT' 'HEALTHCARE' 'FITNESS' 'AGRICULTURE' 'BPO' 'SALES'
 'CONSULTANT' 'DIGITAL-MEDIA' 'AUTOMOBILE' 'CHEF' 'FINANCE' 'APPAREL'
 'ENGINEERING' 'ACCOUNTANT' 'CONSTRUCTION' 'PUBLIC-RELATIONS' 'BANKING'
 'ARTS' 'AVIATION']
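With 24 categories, it is worth checking how evenly the resumes are spread before training; the classification report at the end includes categories with as few as 4 test examples (BPO). A small pandas sketch; `class_distribution` is a hypothetical helper, and it is not run on the real data here.

```python
import pandas as pd


def class_distribution(labels: pd.Series) -> pd.DataFrame:
    """Count and share of each category, most frequent first."""
    counts = labels.value_counts()  # sorted in descending order by default
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Usage in the notebook: class_distribution(resumes_df["Category"])
```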


In [47]:
# Features (cleaned resume text) and labels (job category)
X = resumes_df['Cleaned_Resume']
y = resumes_df['Category']

In [48]:
print("Missing values in X:", X.isna().sum())

Missing values in X: 1


In [49]:
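# Replace the single missing resume with an empty string; TfidfVectorizer raises an error on NaN documents
X = X.fillna("")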
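Filling the missing resume with an empty string keeps it in the data as an all-zero TF-IDF row that still carries a category label. An alternative is to drop that row from `X` and `y` together so features and labels stay aligned; with one missing value out of 2,484 rows, either choice should have little effect. A sketch with a hypothetical helper `drop_missing_text`:

```python
import pandas as pd


def drop_missing_text(X: pd.Series, y: pd.Series):
    """Hypothetical helper: drop rows whose text is missing or blank from X and y together."""
    has_text = X.notna() & X.fillna("").str.strip().ne("")
    return X[has_text], y[has_text]


# Usage (alternative to X.fillna("")):
# X, y = drop_missing_text(resumes_df["Cleaned_Resume"], resumes_df["Category"])
```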

In [52]:
# Stratified 80/20 train/test split (fixed random_state for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
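`stratify=y` makes the 80/20 split preserve each category's share in both sets, which matters here because some categories are small. A toy illustration with made-up labels (not the resume data): 10 `HR` and 5 `BPO` examples split with the same `test_size` and `random_state`.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Made-up toy data: 15 "resumes", two categories in a 2:1 ratio
texts = [f"resume {i}" for i in range(15)]
labels = ["HR"] * 10 + ["BPO"] * 5

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# The 3 test examples keep the 2:1 ratio: 2 HR and 1 BPO
print(Counter(y_te))
```

Without `stratify`, a small category can end up with few or even no test examples, which makes its per-class scores unreliable.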

In [53]:
# TF-IDF with tuned hyperparameters (vocabulary and IDF learned from the training split only)
tfidf = TfidfVectorizer(
    max_features=10000,
    ngram_range=(1, 2),
    stop_words='english'
)

X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
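To make the TF-IDF step concrete: with scikit-learn's defaults (`smooth_idf=True`, `norm='l2'`, raw counts as term frequency), each term's weight is its count times idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of training documents and df(t) the number containing t; each document vector is then scaled to unit length. The IDF values are learned by `fit_transform` on the training split and reused by `transform` on the test split. `max_features=10000` keeps only the 10,000 terms with the highest total count across the training corpus, and `ngram_range=(1, 2)` adds bigrams. The pure-Python sketch below reproduces the weighting on whitespace-split tokens; it omits stop-word removal, `max_features`, and scikit-learn's own tokenizer, so its vocabulary will not match `TfidfVectorizer` exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Unigrams and bigrams, mirroring ngram_range=(1, 2)."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs):
    """Sketch of TF-IDF with smooth idf and L2 normalization (scikit-learn's defaults)."""
    counts = [Counter(ngrams(doc.lower().split())) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for c in counts:
        raw = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        # Empty documents (such as the filled-in "") stay all-zero
        vectors.append({t: v / norm for t, v in raw.items()} if norm else {})
    return vectors, idf


vectors, idf = tfidf(["data analyst sql", "data engineer sql", "chef kitchen"])
# "data" appears in 2 of 3 docs: ln(4/3) + 1; "chef" appears in 1: ln(4/2) + 1
```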

In [54]:
# Linear SVM with the tuned C; class_weight='balanced' reweights the uneven categories
model = LinearSVC(
    C=1.0,
    max_iter=5000,
    class_weight='balanced'
)
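`class_weight='balanced'` is set in this cell alongside the tuned `C`. Per the scikit-learn documentation, it weights each class by `n_samples / (n_classes * count(class))`, so mistakes on rare categories cost more in the training loss. A pure-Python sketch of that formula; `balanced_class_weights` is a hypothetical name, since scikit-learn computes these weights internally. Reweighting does not guarantee a rare class is recovered: in the report below, BPO still has 0.00 recall on its 4 test resumes.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count), as in class_weight='balanced'."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy example with made-up counts: 6 HR resumes and 2 BPO resumes
weights = balanced_class_weights(["HR"] * 6 + ["BPO"] * 2)
print(weights)  # {'HR': 0.6666666666666666, 'BPO': 2.0}
# Each BPO example weighs 3x as much as an HR example
```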

In [55]:
# Fit model
model.fit(X_train_tfidf, y_train)
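After fitting, new resumes must go through the same fitted `tfidf` object with `transform` (not `fit_transform`), so they are mapped onto the training vocabulary. A minimal sketch with a hypothetical `predict_category` helper; it assumes the input has already been cleaned the same way as the `Cleaned_Resume` column, since the cleaning code is not part of this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def predict_category(texts, vectorizer: TfidfVectorizer, model: LinearSVC):
    """Hypothetical helper: map already-cleaned resume texts to predicted categories."""
    # transform() reuses the training vocabulary and IDF weights; refitting would change the feature space
    features = vectorizer.transform(texts)
    return model.predict(features)


# Usage with the notebook's fitted objects:
# predict_category(["certified public accountant audit tax reconciliation"], tfidf, model)
```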

In [56]:
# Evaluate on test set
y_pred = model.predict(X_test_tfidf)

print("Final Test Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n")
print(classification_report(y_test, y_pred, zero_division=0))

Final Test Accuracy: 0.7283702213279678

Classification Report:

                        precision    recall  f1-score   support

            ACCOUNTANT       0.70      0.88      0.78        24
              ADVOCATE       0.65      0.71      0.68        24
           AGRICULTURE       0.80      0.62      0.70        13
               APPAREL       0.57      0.42      0.48        19
                  ARTS       0.56      0.43      0.49        21
            AUTOMOBILE       0.60      0.43      0.50         7
              AVIATION       0.90      0.75      0.82        24
               BANKING       0.78      0.78      0.78        23
                   BPO       0.00      0.00      0.00         4
  BUSINESS-DEVELOPMENT       0.65      0.92      0.76        24
                  CHEF       0.86      0.75      0.80        24
          CONSTRUCTION       0.82      0.82      0.82        22
            CONSULTANT       0.79      0.48      0.59        23
              DESIGNER       0.86     