<h3> Approach Followed </h3>

<h5>1. I preprocessed the data for analysis:</h5>

    Substituted the blank 'Name' column values with values derived from the row index
    
    Removed rows with missing values in 'Performance_10', 'Performance_12' or 'Performance_UG'
    
    Cleaned the 'Other skills' column by removing leading/trailing spaces 
    
    Converted 'Performance_10', 'Performance_12', 'Performance_UG', 'Performance_PG' columns to percentage values
    
    Filled missing values in the 'Performance_PG' column with a placeholder value (-1)
    
<h5> 2. I defined the selection criteria for filtering the resumes: </h5>

     Whether the candidate holds one of the desired degrees
     
     Whether the candidate is available for 3 months starting immediately
     
     Whether the performance percentages are at least 75
     
<h5> 3. Then I filtered the resumes based on the selection criteria.</h5>

<h5> 4. Then I conducted the technical round: </h5>

     Calculated the average technical score from the Python, ML and NLP scores

<h5> 5. Then I conducted the soft-skills round: </h5>

     Counted specific soft skills listed in the 'Other skills' column to calculate the soft skills score

<h5> 6. Then I conducted the final evaluation: </h5>

     Assigned weights to different features, i.e. technical score and soft skills score
     
     Combined the weighted scores into a total score for each candidate
     
     Sorted the filtered resumes by the total score in descending order
     
     Selected the top 5 candidates
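
The scoring in steps 4 to 6 amounts to a weighted sum followed by a sort. Here is a minimal sketch of that idea on a made-up table: the three candidates and their scores are hypothetical, while the weights (0.7 and 0.3) and the column names are the ones this notebook uses in its evaluation cell.

```python
import pandas as pd

# Hypothetical candidates; only the column names and weights come from this notebook.
candidates = pd.DataFrame({
    "Name": [1, 2, 3],
    "Technical Score": [3.0, 2.0, 1.0],
    "Soft Skills Score": [0, 2, 1],
})

technical_weight = 0.7
soft_skills_weight = 0.3

# Weighted sum of the two round scores
candidates["Total Score"] = (
    candidates["Technical Score"] * technical_weight
    + candidates["Soft Skills Score"] * soft_skills_weight
)

# Highest total first, then keep the best two (the notebook keeps five)
ranked = candidates.sort_values("Total Score", ascending=False)
top_two = ranked.head(2)
print(top_two)
```

With these weights, a candidate with a perfect technical score and no listed soft skills still ranks just ahead of a candidate with a technical score of 2 and both soft skills, which shows how strongly the technical round dominates.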

In [105]:
# Importing the libraries

import pandas as pd

In [106]:
# Load the dataset
data = pd.read_csv('Applications_for_Machine_Learning_internship_edited.csv')

In [107]:
data.head()

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,,6.50/7,,
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,,8.90/10,,
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,,,,
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),,2024,,,85.60/85.60,10.00/10.00
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,,8.10/10,93.40/93.40,10.00/10.00


<h3>Preprocessing the data</h3>

In [108]:
# Since the "Name" column is blank for every row, use the row index (starting from 1) as a substitute

data['Name'] = data.index + 1

In [109]:
data.head()

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
0,1,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,,6.50/7,,
1,2,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,,8.90/10,,
2,3,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,,,,
3,4,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),,2024,,,85.60/85.60,10.00/10.00
4,5,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,,8.10/10,93.40/93.40,10.00/10.00


In [110]:
# Drop rows with missing values in specific columns
data.dropna(subset=['Performance_10', 'Performance_12', 'Performance_UG'], inplace=True)

In [111]:
data.head()

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
4,5,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,,8.10/10,93.40/93.40,10.00/10.00
7,8,2,2,0,1,"Atmel AVR, Data Analytics, Data Science, MS-Ex...","Yes, I am available for 3 months starting imme...",PGP,Data Science And Machine Learning,2023,,6.80/10,75.20/75.20,8.80/8.80
8,9,2,1,1,0,"C++ Programming, Data Analytics, Data Structur...","Yes, I am available for 3 months starting imme...",B.Tech,Information and Communication Technology,2023,,7.71/10,89.90/89.90,10.00/10.00
10,11,2,0,0,0,"MS-Excel, MS-PowerPoint, Power BI, Python, R P...","Yes, I am available for 3 months starting imme...",MBA,Analytics And Finance,2023,7.45/8,83.65/100,74.50/74.50,9.60/9.60
11,12,2,2,0,1,"MS-Excel, Artificial Intelligence, Data Analyt...","Yes, I am available for 3 months starting imme...",Bachelor of Technology (B.Tech),Information Technology,2023,,7.85/10,76.80/76.80,81.00/81.00


In [112]:
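# Fill missing values in the 'Performance_PG' column with a placeholder value (-1).
# Note: the percentage conversion further down turns this placeholder back into NaN,
# so the fill is repeated after the conversion.
data['Performance_PG'] = data['Performance_PG'].fillna(-1)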

In [113]:
data.head()

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
4,5,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,-1,8.10/10,93.40/93.40,10.00/10.00
7,8,2,2,0,1,"Atmel AVR, Data Analytics, Data Science, MS-Ex...","Yes, I am available for 3 months starting imme...",PGP,Data Science And Machine Learning,2023,-1,6.80/10,75.20/75.20,8.80/8.80
8,9,2,1,1,0,"C++ Programming, Data Analytics, Data Structur...","Yes, I am available for 3 months starting imme...",B.Tech,Information and Communication Technology,2023,-1,7.71/10,89.90/89.90,10.00/10.00
10,11,2,0,0,0,"MS-Excel, MS-PowerPoint, Power BI, Python, R P...","Yes, I am available for 3 months starting imme...",MBA,Analytics And Finance,2023,7.45/8,83.65/100,74.50/74.50,9.60/9.60
11,12,2,2,0,1,"MS-Excel, Artificial Intelligence, Data Analyt...","Yes, I am available for 3 months starting imme...",Bachelor of Technology (B.Tech),Information Technology,2023,-1,7.85/10,76.80/76.80,81.00/81.00


In [114]:
# Clean the 'Other skills' column by removing leading/trailing spaces
data['Other skills'] = data['Other skills'].str.strip()

In [115]:
# Reset the index after dropping rows
data.reset_index(drop=True, inplace=True)


In [116]:
data.head()

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
0,5,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,-1,8.10/10,93.40/93.40,10.00/10.00
1,8,2,2,0,1,"Atmel AVR, Data Analytics, Data Science, MS-Ex...","Yes, I am available for 3 months starting imme...",PGP,Data Science And Machine Learning,2023,-1,6.80/10,75.20/75.20,8.80/8.80
2,9,2,1,1,0,"C++ Programming, Data Analytics, Data Structur...","Yes, I am available for 3 months starting imme...",B.Tech,Information and Communication Technology,2023,-1,7.71/10,89.90/89.90,10.00/10.00
3,11,2,0,0,0,"MS-Excel, MS-PowerPoint, Power BI, Python, R P...","Yes, I am available for 3 months starting imme...",MBA,Analytics And Finance,2023,7.45/8,83.65/100,74.50/74.50,9.60/9.60
4,12,2,2,0,1,"MS-Excel, Artificial Intelligence, Data Analyt...","Yes, I am available for 3 months starting imme...",Bachelor of Technology (B.Tech),Information Technology,2023,-1,7.85/10,76.80/76.80,81.00/81.00


In [117]:
# Convert 'Performance_10', 'Performance_12', 'Performance_UG', 'Performance_PG' columns to percentage values
percentage_columns = ['Performance_10', 'Performance_12', 'Performance_UG', 'Performance_PG']

for col in percentage_columns:
    data[col] = data[col].astype(str)
    split_values = data[col].str.split('/', expand=True)
    numerator = pd.to_numeric(split_values[0], errors='coerce')
    denominator = pd.to_numeric(split_values[1], errors='coerce')
    data[col] = numerator / denominator * 100
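
To see what this conversion does to individual values, here is a small sketch on a hand-made Series. The helper name `to_percentage` and the sample values are my own for illustration, and it uses `str.partition` instead of `str.split` so that the denominator column always exists, even if no value contains a '/'.

```python
import pandas as pd

def to_percentage(column):
    """Convert 'score/maximum' strings to percentages; anything else becomes NaN."""
    # partition always yields three columns: before '/', the '/' itself, after '/'
    parts = column.astype(str).str.partition("/")
    numerator = pd.to_numeric(parts[0], errors="coerce")
    denominator = pd.to_numeric(parts[2], errors="coerce")
    return numerator / denominator * 100

# Two fraction strings, the -1 placeholder, and a missing value
sample = pd.Series(["8.10/10", "7.45/8", "-1", None])
percentages = to_percentage(sample)
print(percentages)
```

The placeholder "-1" has no denominator, so it comes out as NaN. That matches the empty 'Performance_PG' values in the next output and is why the column is filled with -1 a second time afterwards.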

In [118]:
data.head()

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
0,5,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,,81.0,100.0,100.0
1,8,2,2,0,1,"Atmel AVR, Data Analytics, Data Science, MS-Ex...","Yes, I am available for 3 months starting imme...",PGP,Data Science And Machine Learning,2023,,68.0,100.0,100.0
2,9,2,1,1,0,"C++ Programming, Data Analytics, Data Structur...","Yes, I am available for 3 months starting imme...",B.Tech,Information and Communication Technology,2023,,77.1,100.0,100.0
3,11,2,0,0,0,"MS-Excel, MS-PowerPoint, Power BI, Python, R P...","Yes, I am available for 3 months starting imme...",MBA,Analytics And Finance,2023,93.125,83.65,100.0,100.0
4,12,2,2,0,1,"MS-Excel, Artificial Intelligence, Data Analyt...","Yes, I am available for 3 months starting imme...",Bachelor of Technology (B.Tech),Information Technology,2023,,78.5,100.0,100.0


In [119]:
# Fill missing values in the 'Performance_PG' column with the placeholder value (-1) again,
# because the conversion above turned the earlier placeholder into NaN
data['Performance_PG'] = data['Performance_PG'].fillna(-1)
data.head()

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
0,5,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,-1.0,81.0,100.0,100.0
1,8,2,2,0,1,"Atmel AVR, Data Analytics, Data Science, MS-Ex...","Yes, I am available for 3 months starting imme...",PGP,Data Science And Machine Learning,2023,-1.0,68.0,100.0,100.0
2,9,2,1,1,0,"C++ Programming, Data Analytics, Data Structur...","Yes, I am available for 3 months starting imme...",B.Tech,Information and Communication Technology,2023,-1.0,77.1,100.0,100.0
3,11,2,0,0,0,"MS-Excel, MS-PowerPoint, Power BI, Python, R P...","Yes, I am available for 3 months starting imme...",MBA,Analytics And Finance,2023,93.125,83.65,100.0,100.0
4,12,2,2,0,1,"MS-Excel, Artificial Intelligence, Data Analyt...","Yes, I am available for 3 months starting imme...",Bachelor of Technology (B.Tech),Information Technology,2023,-1.0,78.5,100.0,100.0


<h3>Filtering the resumes</h3>

In [120]:
# Define the selection criteria
desired_education = ['B.Tech', 'M.Tech', 'Bachelor of Technology', 'Master of Technology']
available_for_3_months = 'Yes, I am available for 3 months starting immediately for a full-time internship.'
minimum_performance_10 = 75
minimum_performance_12 = 75
minimum_performance_UG = 75
minimum_performance_PG = 75
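
One thing to keep in mind about the degree list: it is joined with '|' and passed to `str.contains`, which treats the result as a regular expression, so the '.' in 'B.Tech' matches any character. A hedged sketch of escaping the names first follows; the `degrees` Series is made up for illustration.

```python
import re
import pandas as pd

desired_education = ['B.Tech', 'M.Tech', 'Bachelor of Technology', 'Master of Technology']

# Escape each name so that '.' is matched literally
pattern = '|'.join(re.escape(degree) for degree in desired_education)

# Hypothetical degree values, including a missing one
degrees = pd.Series(['B.Tech', 'BxTech', 'MBA', None, 'Master of Technology (M.Tech)'])

escaped_matches = degrees.str.contains(pattern, case=False, na=False)
unescaped_matches = degrees.str.contains('|'.join(desired_education), case=False, na=False)
print(pd.DataFrame({'degree': degrees, 'escaped': escaped_matches, 'unescaped': unescaped_matches}))
```

On real degree names the difference is unlikely to matter much, but escaping keeps the filter from accepting unexpected strings.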

In [121]:
# Filter resumes based on the selection criteria
filtered_resumes = data[
    (data['Are you available for 3 months, starting immediately, for a full-time work from home internship? '].str.strip() == available_for_3_months) &
    (data['Degree'].str.contains('|'.join(desired_education), case=False, na=False)) &
    (data['Performance_10'] >= minimum_performance_10) &
    (data['Performance_12'] >= minimum_performance_12)
]

# Apply performance criteria based on degree
filtered_resumes.loc[filtered_resumes['Degree'].str.contains('Master'), 'Performance'] = filtered_resumes['Performance_PG']
filtered_resumes.loc[filtered_resumes['Degree'].str.contains('Bachelor'), 'Performance'] = filtered_resumes['Performance_UG']

# Filter resumes based on performance criteria
filtered_resumes = filtered_resumes[
    (filtered_resumes['Performance'] >= minimum_performance_UG)
]

# Reset the index of the DataFrame
filtered_resumes.reset_index(drop=True, inplace=True)

# Print the filtered resumes
print(filtered_resumes)

    Name  Python (out of 3)  Machine Learning (out of 3)  \
0     12                  2                            2   
1     28                  2                            1   
2    131                  2                            1   
3    143                  2                            2   
4    185                  2                            2   
5    215                  0                            0   
6    258                  2                            2   
7    278                  2                            1   
8    439                  2                            2   
9    462                  2                            2   
10   503                  1                            0   
11   510                  3                            2   
12   561                  3                            2   
13   587                  2                            3   
14   611                  2                            0   
15   640                  2             

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered_resumes.loc[filtered_resumes['Degree'].str.contains('Master'), 'Performance'] = filtered_resumes['Performance_PG']
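
The warning above appears because `filtered_resumes` is a slice of `data` and a new column is then assigned to it. Taking an explicit `.copy()` of the filtered frame is the usual way to avoid it. While looking at this step I also noticed that a degree written only as 'B.Tech' contains neither 'Master' nor 'Bachelor', so its 'Performance' value stays NaN and the row appears to be dropped by the final filter. The sketch below shows one possible way to handle both points on a toy frame; the data, the postgraduate pattern and the variable names are assumptions of mine, not part of the original analysis.

```python
import pandas as pd

# Hypothetical rows using the same column names as the notebook
toy = pd.DataFrame({
    "Degree": ["B.Tech", "Bachelor of Technology (B.Tech)", "Master of Technology (M.Tech)"],
    "Performance_UG": [81.0, 78.5, 70.0],
    "Performance_PG": [-1.0, -1.0, 88.0],
})

# .copy() makes the filtered frame independent of the original,
# so adding a column to it does not raise SettingWithCopyWarning
subset = toy[toy["Performance_UG"] > 0].copy()

# Assumption: postgraduate degrees are recognised by 'Master' or 'M.Tech'
is_postgraduate = subset["Degree"].str.contains(r"Master|M\.Tech", case=False, na=False)

# Keep the UG percentage unless the degree is postgraduate, then use the PG percentage
subset["Performance"] = subset["Performance_UG"].where(~is_postgraduate, subset["Performance_PG"])
print(subset[["Degree", "Performance"]])
```

Applying this to the real data would change which candidates pass the filter, so the outputs shown in this notebook correspond to the original code, not to this sketch.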


<h3> Technical Round </h3>

In [122]:
# Conduct a technical interview
def conduct_technical_interview(candidate):
    
    python_score = candidate['Python (out of 3)']
    ml_score = candidate['Machine Learning (out of 3)']
    nlp_score = candidate['Natural Language Processing (NLP) (out of 3)']

    # Calculate the average technical score
    technical_score = (python_score + ml_score + nlp_score) / 3

    # Return the technical score
    return technical_score
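
The same average can also be computed for all candidates at once instead of row by row. This is only a sketch of an equivalent vectorized form; the two sample rows are made up.

```python
import pandas as pd

score_columns = [
    "Python (out of 3)",
    "Machine Learning (out of 3)",
    "Natural Language Processing (NLP) (out of 3)",
]

# Hypothetical candidates with their three self-reported scores
sample = pd.DataFrame([[3, 3, 3], [2, 2, 1]], columns=score_columns)

# Row-wise mean of the three score columns
sample["Technical Score"] = sample[score_columns].mean(axis=1)
print(sample["Technical Score"])
```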

<h3> Soft Skills Round </h3>

In [123]:
def conduct_soft_skills_evaluation(candidate):

    soft_skills_score = 0

    # Check if the candidate has good communication skills
    if 'Effective communication' in candidate['Other skills']:
        soft_skills_score += 1

    # Check if the candidate has spoken English proficiency
    if 'English Proficiency (Spoken)' in candidate['Other skills']:
        soft_skills_score += 1

    # Return the soft skills score
    return soft_skills_score
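
The function above does a substring check, which raises a TypeError if 'Other skills' is missing (NaN) for a candidate. A more defensive variant, as a sketch only: it splits the comma-separated text into individual skills and compares them case-insensitively. The function name `count_soft_skills` and the `SOFT_SKILLS` set are names I introduced for illustration; the two skill names are the ones checked above.

```python
import pandas as pd

# The two skills checked in this notebook, lower-cased for comparison
SOFT_SKILLS = {"effective communication", "english proficiency (spoken)"}

def count_soft_skills(other_skills):
    """Count how many of SOFT_SKILLS appear in a comma-separated skills string."""
    if pd.isna(other_skills):
        return 0  # no skills listed
    listed = {skill.strip().lower() for skill in str(other_skills).split(",")}
    return len(SOFT_SKILLS & listed)

print(count_soft_skills("Python, Effective communication , English Proficiency (Spoken)"))
print(count_soft_skills(float("nan")))
```

Unlike the substring check, this only counts exact skill names, so it can give a different score when a skill appears as part of a longer entry.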


<h3>Evaluation</h3>

In [124]:
# Weighting features: assign weights to the technical and soft skills scores
technical_weight = 0.7
soft_skills_weight = 0.3

# Iterate through the filtered resumes for the technical round and soft skills evaluation
for index, candidate_row in filtered_resumes.iterrows():
    technical_score = conduct_technical_interview(candidate_row)
    soft_skills_score = conduct_soft_skills_evaluation(candidate_row)

    # Store the technical and soft skills scores in the candidate's row
    filtered_resumes.at[index, 'Technical Score'] = technical_score
    filtered_resumes.at[index, 'Soft Skills Score'] = soft_skills_score

# Combine the weighted technical and soft skills scores into a total score
filtered_resumes['Total Score'] = (filtered_resumes['Technical Score'] * technical_weight) + (
        filtered_resumes['Soft Skills Score'] * soft_skills_weight)

# Sort the filtered resumes by the total score in descending order
filtered_resumes = filtered_resumes.sort_values('Total Score', ascending=False)

# Select the top candidates based on your desired number or threshold
top_candidates = filtered_resumes.head(5)  # Select top 5 candidates

# Print the top candidates
print(top_candidates)

    Name  Python (out of 3)  Machine Learning (out of 3)  \
30   958                  3                            3   
17   657                  3                            2   
13   587                  2                            3   
21   739                  2                            3   
3    143                  2                            2   

    Natural Language Processing (NLP) (out of 3)  Deep Learning (out of 3)  \
30                                             3                         2   
17                                             2                         2   
13                                             2                         2   
21                                             2                         1   
3                                              1                         1   

                                         Other skills  \
30  Data Analytics, Machine Learning, Natural Lang...   
17  Data Science, MS-Excel, Python, R Programming,...   