
## Personality System

Five Personality Traits (OCEAN)

- Openness to experience (inventive/curious vs. consistent/cautious)
- Conscientiousness (efficient/organized vs. easy-going/careless)
- Extroversion (outgoing/energetic vs. solitary/reserved)
- Agreeableness (friendly/compassionate vs. challenging/detached)
- Neuroticism (sensitive/nervous vs. secure/confident)

Resources: 
- [Wikipedia](https://en.wikipedia.org/wiki/Big_Five_personality_traits)
- [ipip.ori.org](https://ipip.ori.org/newBigFive5broadKey.htm)
- [How Accurately Can You Describe Yourself?](https://ipip.ori.org/new_ipip-50-item-scale.htm)
- [Dataset](https://www.kaggle.com/datasets/tunguz/big-five-personality-test)


# Personality System

#### read Codebook

In [1]:
# libraries
import numpy as np
import pandas as pd
import glob
import random

# for plotting
import matplotlib.pyplot as plt
import seaborn as sns

# clustering, plus a visualizer for the elbow method
from sklearn.cluster import MiniBatchKMeans
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# For ease of calculation, scale all the values between 0-1 
from sklearn.preprocessing import MinMaxScaler

import warnings
warnings.filterwarnings('ignore')

In [2]:
# df = pd.read_csv(r'dataset\\data-final.csv', delimiter='\t')
# df

In [3]:
# dataframe_list = [pd.read_csv(df_path) for df_path in glob.glob(r'dataset\data_final\*.csv')]
# print(dataframe_list)

In [None]:
df = pd.concat([pd.read_csv(df_path) for df_path in glob.glob(r'dataset\data_final\*.csv')])
df.reset_index(drop=True, inplace=True)
df

In [None]:
columns = df.columns
print(columns)

for c in columns:
    print(c)

In [None]:
X = df[df.columns[0:50]]
X

In [None]:
X.info()

In [None]:
round(X.describe(),2)

In [None]:
# plot missing values

# def plot_nas(df: pd.DataFrame):
#     if df.isnull().sum().sum() != 0:
#         na_df = (df.isnull().sum() / len(df)) * 100      
#         na_df = na_df.drop(na_df[na_df == 0].index).sort_values(ascending=False)
#         missing_data = pd.DataFrame({'Missing Ratio %' :na_df})
#         missing_data.plot(kind = "barh")
#         plt.show()
#     else:
#         print('No NAs found')
# plot_nas(df)

# https://dev.to/tomoyukiaota/visualizing-the-patterns-of-missing-value-occurrence-with-python-46dj

In [None]:
# No NaN values are expected; fill any that do appear with 0 as a safeguard
X = X.fillna(0)

In [None]:
# to find number of clusters
columns = list(X.columns)
scaler = MinMaxScaler(feature_range=(0,1))
data = scaler.fit_transform(X)
data = pd.DataFrame(data, columns=columns)
df_sample = data[:10000]
df_sample

In [None]:
wcss = [] 
for i in range(1, 11): 
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 101)
    kmeans.fit(df_sample) 
    wcss.append(kmeans.inertia_)
    print(f"{i} : {kmeans.inertia_}")

In [None]:
plt.plot(range(1, 11), wcss, 'bo-')
plt.xlabel('Values of K')
plt.ylabel('Distortion')
plt.title('The Elbow Method using Distortion')
plt.show()

In [None]:
# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(1,13))

visualizer.fit(df_sample) # Fit the data to the visualizer
visualizer.show() # Finalize and render the figure

In [None]:
# MiniBatchKMeans clustering (an unsupervised learning algorithm)

kmeans = MiniBatchKMeans(n_clusters=10, random_state=0, batch_size=1000, max_iter=100).fit(data)

# n_clusters : number of personality types (10 here, matching the ten cluster centers read below; change as needed)
# random_state : seed that makes the centroid initialization reproducible
# batch_size : number of rows fed to the model in each mini-batch
# max_iter : maximum number of passes over the complete dataset (100 here)
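
To see what these parameters do in isolation, here is a minimal sketch on synthetic data. The array `toy_answers`, its shape, and the choice of three clusters are assumptions for illustration only; they are not the survey data or the settings used in this notebook.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in for scaled survey answers: 300 rows, 4 features in [0, 1).
rng = np.random.default_rng(0)
toy_answers = rng.random((300, 4))

# Fit in mini-batches of 50 rows, for at most 20 passes over the data.
toy_model = MiniBatchKMeans(n_clusters=3, random_state=0, batch_size=50,
                            max_iter=20, n_init=3)
toy_model.fit(toy_answers)

# One center per cluster, one coordinate per feature.
print(toy_model.cluster_centers_.shape)

# Each row is assigned the index of its nearest center.
toy_labels = toy_model.predict(toy_answers)
print(np.unique(toy_labels))
```

The number of rows in `cluster_centers_` always equals `n_clusters`, so any code that indexes the centers has to stay within that range.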

In [None]:
# check the number of clusters after training the model

len(kmeans.cluster_centers_)

In [None]:
# personality types: each cluster center is the typical answer pattern of one type

one = kmeans.cluster_centers_[0]
two = kmeans.cluster_centers_[1]
three = kmeans.cluster_centers_[2]
four = kmeans.cluster_centers_[3]
five = kmeans.cluster_centers_[4]
six = kmeans.cluster_centers_[5]
seven = kmeans.cluster_centers_[6]
eight = kmeans.cluster_centers_[7]
nine = kmeans.cluster_centers_[8]
ten = kmeans.cluster_centers_[9]

In [None]:
all_types = {'one': one, 'two': two, 'three': three, 'four': four, 'five': five, 'six': six, 'seven': seven, 'eight': eight,
             'nine': nine, 'ten': ten}

all_types_scores = {}

for name, personality_type in all_types.items():
    # each cluster center holds 50 values, 10 per trait, in column order EXT, EST, AGR, CSN, OPN
    ext = personality_type[0:10]
    est = personality_type[10:20]
    agr = personality_type[20:30]
    csn = personality_type[30:40]
    opn = personality_type[40:50]

    personality_trait = {}

    # signed sums: reverse-keyed items are subtracted
    personality_trait['extroversion_score'] = ext[0] - ext[1] + ext[2] - ext[3] + ext[4] - ext[5] + ext[6] - ext[7] + ext[8] - ext[9]
    personality_trait['neuroticism_score'] = est[0] - est[1] + est[2] - est[3] + est[4] + est[5] + est[6] + est[7] + est[8] + est[9]
    personality_trait['agreeableness_score'] = -agr[0] + agr[1] - agr[2] + agr[3] - agr[4] - agr[5] + agr[6] - agr[7] + agr[8] + agr[9]
    personality_trait['conscientiousness_score'] = csn[0] - csn[1] + csn[2] - csn[3] + csn[4] - csn[5] + csn[6] - csn[7] + csn[8] + csn[9]
    personality_trait['openness_score'] = opn[0] - opn[1] + opn[2] - opn[3] + opn[4] - opn[5] + opn[6] + opn[7] + opn[8] + opn[9]

    all_types_scores[name] = personality_trait

In [None]:
all_types_scores
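
The signed sums can also be written as one dot product per trait. The sketch below assumes the 50 values are ordered EXT, EST, AGR, CSN, OPN (10 items each), as in the column list used later in this notebook; the names `TRAIT_SIGNS` and `score_traits` are hypothetical helpers, and the signs are copied from the sums in this notebook.

```python
import numpy as np

# Sign of each of the 10 items per trait (+1 = added, -1 = subtracted).
TRAIT_SIGNS = {
    'extroversion':      [1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
    'neuroticism':       [1, -1, 1, -1, 1, 1, 1, 1, 1, 1],
    'agreeableness':     [-1, 1, -1, 1, -1, -1, 1, -1, 1, 1],
    'conscientiousness': [1, -1, 1, -1, 1, -1, 1, -1, 1, 1],
    'openness':          [1, -1, 1, -1, 1, -1, 1, 1, 1, 1],
}

def score_traits(answers):
    """Return one signed sum per trait for a 50-value answer vector."""
    answers = np.asarray(answers, dtype=float)
    if answers.shape != (50,):
        raise ValueError("expected exactly 50 answers")
    scores = {}
    for i, (trait, signs) in enumerate(TRAIT_SIGNS.items()):
        block = answers[i * 10:(i + 1) * 10]  # the 10 items of this trait
        scores[trait] = float(np.dot(signs, block))
    return scores

# With every answer equal to 1, each score is (#added items - #subtracted items).
example_scores = score_traits(np.ones(50))
print(example_scores)
```

Keeping the signs in one table makes it easier to check that every trait reads its own block of ten columns.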

In [None]:
all_extroversion = []
all_neuroticism = []
all_agreeableness = []
all_conscientiousness = []
all_openness = []

for personality_type, personality_trait in all_types_scores.items():
    all_extroversion.append(personality_trait['extroversion_score'])
    all_neuroticism.append(personality_trait['neuroticism_score'])
    all_agreeableness.append(personality_trait['agreeableness_score'])
    all_conscientiousness.append(personality_trait['conscientiousness_score'])
    all_openness.append(personality_trait['openness_score'])
    

In [None]:
# min-max normalize each trait across clusters; np.array makes the arithmetic element-wise (a plain list would raise TypeError)
all_extroversion_normalized = (np.array(all_extroversion)-min(all_extroversion))/(max(all_extroversion)-min(all_extroversion))
all_neuroticism_normalized = (np.array(all_neuroticism)-min(all_neuroticism))/(max(all_neuroticism)-min(all_neuroticism))
all_agreeableness_normalized = (np.array(all_agreeableness)-min(all_agreeableness))/(max(all_agreeableness)-min(all_agreeableness))
all_conscientiousness_normalized = (np.array(all_conscientiousness)-min(all_conscientiousness))/(max(all_conscientiousness)-min(all_conscientiousness))
all_openness_normalized = (np.array(all_openness)-min(all_openness))/(max(all_openness)-min(all_openness))
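
The min-max step maps the smallest score of a trait to 0 and the largest to 1. A minimal sketch of that formula as a reusable function follows; `min_max_normalize` is a hypothetical helper name, and the guard for a zero range is an added assumption rather than part of the notebook's logic.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    arr = np.asarray(values, dtype=float)
    span = arr.max() - arr.min()
    if span == 0:
        # All values are equal: return zeros instead of dividing by zero.
        return np.zeros_like(arr)
    return (arr - arr.min()) / span

normalized = min_max_normalize([2.0, 4.0, 6.0])
print(normalized)  # [0.  0.5 1. ]
```

Because the scaling is relative to the other clusters, a normalized score of 0 means "lowest among the clusters", not "absent".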

In [None]:
all_extroversion_normalized, len(all_extroversion_normalized)

In [None]:
counter = 0

normalized_all_types_scores ={}

for personality_type, personality_trait in all_types_scores.items():
    normalized_personality_trait ={}
    normalized_personality_trait['extroversion_score'] = all_extroversion_normalized[counter]
    normalized_personality_trait['neuroticism_score'] = all_neuroticism_normalized[counter]
    normalized_personality_trait['agreeableness_score'] = all_agreeableness_normalized[counter]
    normalized_personality_trait['conscientiousness_score'] = all_conscientiousness_normalized[counter]
    normalized_personality_trait['openness_score'] = all_openness_normalized[counter]
    
    normalized_all_types_scores[personality_type] = normalized_personality_trait
    
    counter+=1

In [None]:
normalized_all_types_scores

In [None]:
for k in all_types.keys():

    plt.figure(figsize=(15,5))
    plt.ylim(0, 1)
    plt.bar(list(normalized_all_types_scores[k].keys()), normalized_all_types_scores[k].values(), color='b')
    plt.title(f"Score of personality type '{k.upper()}' ", size=18)
    plt.show()

In [None]:
X

In [None]:
data

In [None]:
Y = kmeans.predict(data)

In [None]:
Y, len(Y)

In [None]:
df_with_labels = pd.DataFrame(X, columns=X.columns)
df_with_labels['personality_type'] = Y
df_with_labels

In [None]:
# Visualizing the Cluster Predictions

# To visualize the clusters in a 2D graph I will use PCA
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca_fit = pca.fit_transform(X)

df_pca = pd.DataFrame(data=pca_fit, columns=['PCA1', 'PCA2'])
df_pca['Clusters'] = Y
df_pca.head()

In [None]:
plt.figure(figsize=(10,10))
sns.scatterplot(data=df_pca, x='PCA1', y='PCA2', hue='Clusters', palette='Set1', alpha=0.8)
plt.title('Personality Clusters after PCA');

In [None]:
# Visualizing the Cluster Predictions

# To visualize the clusters in a 3D graph I will use PCA with three components
from sklearn.decomposition import PCA

pca = PCA(n_components=3)
pca_fit = pca.fit_transform(X)

df_pca = pd.DataFrame(data=pca_fit, columns=['PCA1', 'PCA2', 'PCA3'])
df_pca['Clusters'] = Y
df_pca.head()

In [None]:
df_pca_1 = df_pca[:5000]

In [None]:
import plotly.express as px
fig = px.scatter_3d(df_pca_1, x='PCA1', y='PCA2', z='PCA3', color='Clusters')
fig.show()

In [None]:
# Implementing the Model to See My Personality
columns = ['EXT1', 'EXT2', 'EXT3', 'EXT4', 'EXT5', 'EXT6', 'EXT7', 'EXT8', 'EXT9', 'EXT10', 
           'EST1', 'EST2', 'EST3', 'EST4', 'EST5', 'EST6', 'EST7', 'EST8', 'EST9', 'EST10', 
           'AGR1', 'AGR2', 'AGR3', 'AGR4', 'AGR5', 'AGR6', 'AGR7', 'AGR8', 'AGR9', 'AGR10', 
           'CSN1', 'CSN2', 'CSN3', 'CSN4', 'CSN5', 'CSN6', 'CSN7', 'CSN8', 'CSN9', 'CSN10', 
           'OPN1', 'OPN2', 'OPN3', 'OPN4', 'OPN5', 'OPN6', 'OPN7', 'OPN8', 'OPN9', 'OPN10']

val1 = [random.randint(0,5) for ind in range(10)]
val2 = [random.randint(0,5) for ind in range(10)]
val3 = [random.randint(0,5) for ind in range(10)]
val4 = [random.randint(0,5) for ind in range(10)]
val5 = [random.randint(0,5) for ind in range(10)]
val = val1+val2+val3+val4+val5
len(val)

In [None]:
my_data = pd.DataFrame(data=[val], columns=columns)
my_data

In [None]:
my_data1 = scaler.transform(my_data)
my_data1

In [None]:
my_personality = kmeans.predict(my_data1)
print('My Personality Type Cluster is : ', my_personality)

In [None]:
# Sum up my question groups
col_list = list(my_data)

# ext = col_list[0:10]
# est = col_list[10:20]
# agr = col_list[20:30]
# csn = col_list[30:40]
# opn = col_list[40:50]


ext = list(my_data1[0][0:10])
est = list(my_data1[0][10:20])
agr = list(my_data1[0][20:30])
csn = list(my_data1[0][30:40])
opn = list(my_data1[0][40:50])

extroversion = round(ext[0] - ext[1] + ext[2] - ext[3] + ext[4] - ext[5] + ext[6] - ext[7] + ext[8] - ext[9], 2)
neurotic = round(est[0] - est[1] + est[2] - est[3] + est[4] + est[5] + est[6] + est[7] + est[8] + est[9], 2)
agreeable = round(-agr[0] + agr[1] - agr[2] + agr[3] - agr[4] - agr[5] + agr[6] - agr[7] + agr[8] + agr[9], 2)
conscientious = round(csn[0] - csn[1] + csn[2] - csn[3] + csn[4] - csn[5] + csn[6] - csn[7] + csn[8] + csn[9], 2)
open_ = round(opn[0] - opn[1] + opn[2] - opn[3] + opn[4] - opn[5] + opn[6] + opn[7] + opn[8] + opn[9], 2)

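# use a NumPy array so the subtraction is element-wise (a plain list would raise TypeError)
li = np.array([extroversion, neurotic, agreeable, conscientious, open_])
scaled_data = (li - min(li)) / (max(li) - min(li))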

my_sums = pd.DataFrame([scaled_data], 
                       columns=['extroversion', 'neurotic', 'agreeable', 'conscientious', 'open'])

my_sums['cluster'] = my_personality

print('Sum of my question groups')
my_sums

In [None]:
plt.figure(figsize=(15,5))
# plt.ylim(0, 1)
x_ax = my_sums.columns[:-1]
y_ax = my_sums.values[0][:-1]
plt.bar(x_ax, y_ax, color='b')
plt.title("Score of your personality type", size=18)
plt.show()

In [None]:
import pickle

# save the MinMaxScaler to disk (the with-block closes the file after writing)
filename = 'MinMaxScaler_for_personality_type.pkl'
with open(filename, 'wb') as f:
    pickle.dump(scaler, f)

In [None]:
# save the model to disk
filename = 'personality_type_model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(kmeans, f)
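
A quick way to gain confidence in a saved object is a round trip: dump it, load it back, and compare outputs. The sketch below does this for a small scaler in a temporary directory; `toy_scaler`, the file name, and the 0 to 5 fitting range are assumptions for illustration, not the notebook's fitted objects.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit a scaler on a toy column whose values range from 0 to 5.
toy_scaler = MinMaxScaler(feature_range=(0, 1)).fit(np.array([[0.0], [5.0]]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_scaler.pkl')
    # Context managers close the file handles even if an error occurs.
    with open(path, 'wb') as f:
        pickle.dump(toy_scaler, f)
    with open(path, 'rb') as f:
        restored_scaler = pickle.load(f)

sample = np.array([[2.5]])
print(restored_scaler.transform(sample))  # [[0.5]]
```

Pickled scikit-learn objects are generally only safe to load with the same library version that wrote them, and pickle files from untrusted sources should not be loaded at all.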

In [None]:
with open('MinMaxScaler_for_personality_type.pkl', 'rb') as f:
    loaded_scaler = pickle.load(f)
with open('personality_type_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# Implementing the Model to See My Personality
columns = ['EXT1', 'EXT2', 'EXT3', 'EXT4', 'EXT5', 'EXT6', 'EXT7', 'EXT8', 'EXT9', 'EXT10', 
           'EST1', 'EST2', 'EST3', 'EST4', 'EST5', 'EST6', 'EST7', 'EST8', 'EST9', 'EST10', 
           'AGR1', 'AGR2', 'AGR3', 'AGR4', 'AGR5', 'AGR6', 'AGR7', 'AGR8', 'AGR9', 'AGR10', 
           'CSN1', 'CSN2', 'CSN3', 'CSN4', 'CSN5', 'CSN6', 'CSN7', 'CSN8', 'CSN9', 'CSN10', 
           'OPN1', 'OPN2', 'OPN3', 'OPN4', 'OPN5', 'OPN6', 'OPN7', 'OPN8', 'OPN9', 'OPN10']

val1 = [random.randint(0,2) for ind in range(10)]
val2 = [random.randint(0,2) for ind in range(10)]
val3 = [random.randint(0,2) for ind in range(10)]
val4 = [random.randint(0,2) for ind in range(10)]
val5 = [random.randint(0,2) for ind in range(10)]
val = val1+val2+val3+val4+val5
print('length of val : ',len(val), val)

my_data = pd.DataFrame(data=[val], columns=columns)
my_data1 = loaded_scaler.transform(my_data)

my_personality = loaded_model.predict(my_data1)
print('My Personality Type Cluster is : ', my_personality)

# Sum up my question groups
col_list = list(my_data)

ext = list(my_data1[0][0:10])
est = list(my_data1[0][10:20])
agr = list(my_data1[0][20:30])
csn = list(my_data1[0][30:40])
opn = list(my_data1[0][40:50])

extroversion = ext[0] - ext[1] + ext[2] - ext[3] + ext[4] - ext[5] + ext[6] - ext[7] + ext[8] - ext[9]
neurotic = est[0] - est[1] + est[2] - est[3] + est[4] + est[5] + est[6] + est[7] + est[8] + est[9]
agreeable = -agr[0] + agr[1] - agr[2] + agr[3] - agr[4] - agr[5] + agr[6] - agr[7] + agr[8] + agr[9]
conscientious = csn[0] - csn[1] + csn[2] - csn[3] + csn[4] - csn[5] + csn[6] - csn[7] + csn[8] + csn[9]
open_ = opn[0] - opn[1] + opn[2] - opn[3] + opn[4] - opn[5] + opn[6] + opn[7] + opn[8] + opn[9]

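# use a NumPy array so the subtraction is element-wise (a plain list would raise TypeError)
li = np.array([extroversion, neurotic, agreeable, conscientious, open_])
scaled_data = (li - min(li)) / (max(li) - min(li))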

my_sums = pd.DataFrame([scaled_data], 
                       columns=['extroversion', 'neurotic', 'agreeable', 'conscientious', 'open'])

my_sums['cluster'] = my_personality

print('Sum of my question groups')
my_sums

In [None]:
plt.figure(figsize=(15,5))
# plt.ylim(0, 1)
x_ax = my_sums.columns[:-1]
y_ax = my_sums.values[0][:-1]
plt.bar(x_ax, y_ax, color='b')
plt.title("Score of your personality type", size=18)
plt.show()

# Resume Classification

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier

from sklearn import metrics
from sklearn.metrics import accuracy_score
from pandas.plotting import scatter_matrix

from sklearn.neighbors import KNeighborsClassifier

import warnings
warnings.filterwarnings('ignore')

In [None]:
resumeDataSet = pd.read_csv('dataset/UpdatedResumeDataSet.csv' ,encoding='utf-8')
resumeDataSet['cleaned_resume'] = ''
resumeDataSet.head()

In [None]:
print ("Displaying the distinct categories of resume -")
print (resumeDataSet['Category'].unique())

In [None]:
print ("Displaying the distinct categories of resume and the number of records belonging to each category -")
print (resumeDataSet['Category'].value_counts())

In [None]:
import seaborn as sns
plt.figure(figsize=(15,15))
plt.xticks(rotation=90)
sns.countplot(y="Category", data=resumeDataSet)

In [None]:
# clean the resume text with regular expressions

import re
import string

def cleanResume(resumeText):
    resumeText = re.sub(r'http\S+\s*', ' ', resumeText)  # remove URLs
    resumeText = re.sub(r'\b(?:RT|cc)\b', ' ', resumeText)  # remove standalone RT and cc tokens only
    resumeText = re.sub(r'#\S+', '', resumeText)  # remove hashtags
    resumeText = re.sub(r'@\S+', '  ', resumeText)  # remove mentions
    resumeText = re.sub('[%s]' % re.escape(string.punctuation), ' ', resumeText)  # remove punctuation
    resumeText = re.sub(r'[^\x00-\x7f]', ' ', resumeText)  # remove non-ASCII characters
    resumeText = re.sub(r'\s+', ' ', resumeText)  # collapse runs of whitespace
    return resumeText
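
One thing to watch with an alternation such as `RT|cc`: without word boundaries it also matches inside longer words. The small check below compares the two variants on a made-up string; `text`, `naive`, and `bounded` are illustrative names only.

```python
import re

text = 'cc account access RT'

# Unanchored alternation: also matches the "cc" inside "account" and "access".
naive = re.sub('RT|cc', ' ', text)

# Word boundaries restrict the match to standalone tokens.
bounded = re.sub(r'\b(?:RT|cc)\b', ' ', text)

print(naive.split())    # ['a', 'ount', 'a', 'ess']
print(bounded.split())  # ['account', 'access']
```

Splitting words this way changes the tokens that reach the vectorizer, so the bounded form is usually the safer choice for resume text.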

In [None]:
resumeDataSet['cleaned_resume'] = resumeDataSet.Resume.apply(lambda x: cleanResume(x))
print (resumeDataSet['cleaned_resume'][0])

In [None]:
# !pip install nltk
# !pip install wordcloud
import nltk
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
import string
from wordcloud import WordCloud

oneSetOfStopWords = set(stopwords.words('english')+['``',"''"])
totalWords =[]
Sentences = resumeDataSet['Resume'].values
print('Sentences : \n',Sentences[0],'\n')
cleanedSentences = ""
for i in range(0,160):
    cleanedText = cleanResume(Sentences[i])
    cleanedSentences += cleanedText
    requiredWords = nltk.word_tokenize(cleanedText)
    for word in requiredWords:
        if word not in oneSetOfStopWords and word not in string.punctuation:
            totalWords.append(word)
    
wordfreqdist = nltk.FreqDist(totalWords)
mostcommon = wordfreqdist.most_common(50)
print('mostcommon : ', mostcommon,'\n')
print('cleanedSentences : ', cleanedSentences,'\n')

wc = WordCloud().generate(cleanedSentences)
print('wc : ', wc,'\n')

plt.figure(figsize=(15,15))
plt.imshow(wc, interpolation='spline16') # bilinear
# 'antialiased', 'none', 'nearest', 'bilinear', 'bicubic', 'spline16', 'spline36', 
# 'hanning', 'hamming', 'hermite', 'kaiser', 'quadric', 'catrom', 'gaussian', 'bessel', 
# 'mitchell', 'sinc', 'lanczos', 'blackman'
plt.axis("off")
plt.show()

In [None]:
# Transform target variable using label encoding

from sklearn.preprocessing import LabelEncoder

var_mod = ['Category']
le = LabelEncoder()
for i in var_mod:
    resumeDataSet[i] = le.fit_transform(resumeDataSet[i])
print ("CONVERTED THE CATEGORICAL VARIABLES INTO NUMERICALS")
print('resumeDataSet : ')
resumeDataSet

In [None]:
resumeDataSet.to_csv('resumeDataSet.csv', index=False)

In [None]:
# Text representation using TFIDF

from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse import hstack

requiredText = resumeDataSet['cleaned_resume'].values
requiredTarget = resumeDataSet['Category'].values
print('requiredText : ',requiredText,'\n')
print('requiredTarget : ',requiredTarget,'\n')

word_vectorizer = TfidfVectorizer(
    sublinear_tf=True,
    stop_words='english',
    max_features=1500)
word_vectorizer.fit(requiredText)
WordFeatures = word_vectorizer.transform(requiredText)
print('WordFeatures : ',WordFeatures,'\n')

print ("Feature completed .....")
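
A fitted TF-IDF vectorizer fixes its vocabulary, so new text must be passed through `transform` on the same object rather than through a freshly fitted one; otherwise the feature columns would no longer line up with what the classifier was trained on. A minimal sketch with made-up documents (`train_docs`, `new_docs`, and `toy_vectorizer` are illustrative names, not the resume data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = [
    'python machine learning data analysis',
    'java spring backend services',
    'sql database administration oracle',
]
new_docs = ['python data pipelines sql']

toy_vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words='english',
                                 max_features=1500)
train_features = toy_vectorizer.fit_transform(train_docs)

# Reuse the fitted vocabulary: same number of columns as the training matrix.
new_features = toy_vectorizer.transform(new_docs)
print(train_features.shape, new_features.shape)

# Words never seen during fitting are silently dropped.
print('pipelines' in toy_vectorizer.vocabulary_)
```

This is why the later cells that score unseen resumes call `transform` on the existing vectorizer and leave the refit commented out.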


In [None]:
# Split the data into training and test sets

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(WordFeatures,requiredTarget,random_state=0, test_size=0.2,stratify=requiredTarget)
print(X_train.shape)
print(X_test.shape)


In [None]:
clf = OneVsRestClassifier(KNeighborsClassifier())
clf.fit(X_train, y_train)
prediction = clf.predict(X_test)

print('Accuracy of KNeighbors Classifier on training set: {:.2f}'.format(clf.score(X_train, y_train)))
print('Accuracy of KNeighbors Classifier on test set: {:.2f}'.format(clf.score(X_test, y_test)))
print("\n Classification report for classifier %s:\n%s\n" % (clf, metrics.classification_report(y_test, prediction)))
#print("Confusion matrix:\n%s" % metrics.confusion_matrix(y_test, prediction))

In [None]:
# Using our CVs to test the model

Test_CVs = pd.read_csv('dataset/resume_dataset.csv' ,encoding='utf-8')
Test_CVs

In [None]:
Test_CVs['cleaned_resume'] = ''
Test_CVs['cleaned_resume']= Test_CVs.Resume.apply(lambda x: cleanResume(x))
Test_CVs

In [None]:
print(Test_CVs['cleaned_resume'][0],'\n', len(Test_CVs['cleaned_resume'][0]))

In [None]:
test_text = Test_CVs['cleaned_resume'].values

# word_vectorizer = TfidfVectorizer(
#     sublinear_tf=True,
#     stop_words='english',
#     max_features=1500)
# word_vectorizer.fit(test_text)

WordFeatures = word_vectorizer.transform(test_text)
print('WordFeatures : ', WordFeatures)

In [None]:
WordFeatures.shape

In [None]:
X_test2=WordFeatures

In [None]:
y_pred2 = clf.predict(X_test2)

In [None]:
y_pred2

In [None]:
prediction= le.inverse_transform(y_pred2)

In [None]:
prediction

In [None]:
test_resume = '''
Mohamed Abdelghani Mohamed Mobile 20 0101 024 336 5 mohamedabdelghani1511 Objective Strong analytical thinker with high 
problem solving and communication skills Seeking an opportunity in data science and machine learning field to utilize my 
data science skills to transform data into business value Education 9 month Diploma Information Technology Institute Smart 
Village Oct 2020 Present Track Date Science Intake 41 Graduation Project Resume Ranking using NLP for a multinational 
telecommunication company Bachelor of Engineering Ain Shams University Year of graduation 2017 Department Electrical 
Engineering Grade Very Good Work Experience Senior technical support engineer at Orange Business Services Sep 2018 Dec 
2020 Providing a professional technical point of contact for customers for different services including L3 VPN solution 
VPN remote access Z scalar proxy Skype for business Acting as an escalation manager for chronic and complex problems and 
incidents Awarded many local awards for performance excellence Acting as a shift leader and conduct trainings to the new 
comers Intern Orange Business Services got introduced to Orange Support functions August 2017 Intern Schneider Electric 
got introduced to Schneider products and services July 2016 Intern ABB got assigned to the tendering department August 
2015 Skills Technical Skills Concepts Machine Learning Cloud computing Fundamentals Business Statistics Data warehouse 
fundamentals and Data modeling Deep learning Agile Methodologies Visualization and story telling Big Data fundamentals 
Optimization and Simulation methods Modeling and Operations research OOP Systems Thinking Tools Programming Languages 
Python R SAS Bash Java Database SQL Oracle PL SQL Analytical SQL Tools Excel Hadoop Spark Linux Languages Skills Fluent in 
spoken and written English Arabic as a native speaker Very good command of written and spoken French Extracurricular 
Activities Business Development Moderator ACES Aug 2016 Jun 2017 Prepared and conducted 30 hrs Business development 
workshop Participant in Exxon Mobil case study MECA Academy Sep 2016 Jun 2017 Awarded special mention perseverance in 
difficult circumstances from Exxon Mobil Academic committee member Pirates Egypt Aug 2014 Jun 2015 Prepared and conducted 
30 hrs CCNA workshop Projects Recommender system using XGBoost for Airbnb use case Energy consumption prediction using 
LightGBM Movie rating prediction using Random Forest Analyzing ticketing system performance using R Certificates Machine 
learning course offered by Stanford University Coursera Feb 2021 ITILV3 Dec 2019 Six Sigma Yellow Belt July 2019
'''

In [None]:
test_text = [test_resume] # Test_CVs['cleaned_resume'].values

# word_vectorizer = TfidfVectorizer(
#     sublinear_tf=True,
#     stop_words='english',
#     max_features=1500)
# word_vectorizer.fit(test_text)

WordFeatures = word_vectorizer.transform(test_text)
print('WordFeatures : ', WordFeatures)

In [None]:
WordFeatures.shape

In [None]:
X_test2=WordFeatures
y_pred2 = clf.predict(X_test2)

In [None]:
y_pred2 

In [None]:
prediction= le.inverse_transform(y_pred2)
prediction

In [None]:
# Save model
import pickle

# save the model to disk
filename = 'Resume_Classification_KNN.pkl'
with open(filename, 'wb') as f:
    pickle.dump(clf, f)

In [None]:
with open('Resume_Classification_KNN.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

y_pred = loaded_model.predict(X_test2)
y_pred

In [None]:
prediction= le.inverse_transform(y_pred)
prediction