# Course-To-Career: Coding Implementation

The following modules and examples show how a user's inputs, which consist of the user's skills and classes taken, result in a list of a website's job/internship listings ordered by best match. Our process is broken up into these steps:

1. Scrape a list of courses from a given university (UC Berkeley, etc.).
2. Acquire a list of job/internship listings from a given job search website (Freelancer, Indeed, etc.).
3. Input the technical skills and university courses taken by the user.
4. Match the words in each listing's description against the words in the user's skills and course descriptions. Points are allotted for every match and for the diversity of matches.
5. Generate a list of job/internship listings ordered from highest to lowest point total.
6. Update the results as the user adds technical skills or courses.

__Note:__ For this example, let's introduce Joe, a fourth-year undergraduate Computer Science student at UC Berkeley who has several technical skills and has taken many CS classes.

In [1]:
# Load required modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import requests 
import bs4 as bs
import re
from collections import Counter

# Scraping Courses

Here, we scrape courses from a specified university course site, specifically their course names and course descriptions. In this example, we scrape courses and their descriptions from the EECS Course website.

In [2]:
# Collect course links into a dictionary of {course title: course URL}
def get_courses_in_field(field_site):
    source = requests.get(field_site)  # use the argument, not the global `site`
    soup = bs.BeautifulSoup(source.content, features='lxml')
    prelinks = soup.find_all('a')

    courses_in_field = {}
    disclude = []  # Words that, if present in the link text, mark the link as not a course link

    for prelink in prelinks:
        href = prelink.get('href') or ''  # some anchors have no href attribute
        if 'CS' in prelink.text and 'Courses' in href:
            if not any(k in prelink.text for k in disclude):
                link = 'https://www2.eecs.berkeley.edu' + href
                courses_in_field[prelink.text] = link
    return courses_in_field

# Fetch a course page and return its catalog description
def get_description(site):
    source = requests.get(site)
    soup = bs.BeautifulSoup(source.content, features='lxml')
    description = ''  # stays empty if the page has no catalog description
    for p in soup.find_all('p'):
        if 'Catalog Description' in p.text:
            description = p.text
    # Append every class-less <li> element as raw HTML
    # (this also picks up the page's navigation links)
    for item in soup.find_all('li'):
        if not item.get('class'):
            description += str(item)
    return description

In [3]:
#Data initialization
course_and_description = {}

In [4]:
site = 'https://www2.eecs.berkeley.edu/Courses/CS/'
# site = 'https://www2.eecs.berkeley.edu/Courses/EE/'
courses_in_field = get_courses_in_field(site)
#first_five = {k: courses_in_field[k] for k in list(courses_in_field)[:]}
for course in courses_in_field:
    course_link = courses_in_field[course]
    description = get_description(course_link)
    course_and_description[course] = description

In [5]:
len(courses_in_field)

213

In [6]:
len(course_and_description)

213

In [7]:
print(course_and_description)

{'CS C8. Foundations of Data Science': 'Catalog Description: Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership.<li><a href="https://eecs.berkeley.edu/academics">Academics</a></li><li><a href="https://eecs.berkeley.edu/academics/courses">Courses</a></li><li>\n<a href="https://eecs.berkeley.edu/privacy-policy"><span>Privacy Policy</span></a>\n</li>', 'CS C8R. Introduction to Computational Thinking with Data': 'Catalog Description: An introduc
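
The output above shows that raw `<li>` markup from the page ends up inside each description. Since the descriptions are later searched for skill names, it may help to strip the markup first. The following is a minimal sketch using only the standard library; `strip_markup` and the shortened `sample` string are hypothetical names introduced here for illustration, not part of the original notebook.

```python
import html
import re

TAG_RE = re.compile(r'<[^>]+>')

def strip_markup(description):
    """Remove HTML tags, unescape entities, and collapse whitespace."""
    text = TAG_RE.sub(' ', description)   # replace each tag with a space
    text = html.unescape(text)            # e.g. '&amp;' becomes '&'
    return re.sub(r'\s+', ' ', text).strip()

# Shortened, hypothetical sample in the same shape as the scraped descriptions
sample = ('Catalog Description: Foundations of data science.'
          '<li><a href="https://eecs.berkeley.edu/academics">Academics</a></li>')
print(strip_markup(sample))
```

Note that this removes only the tags; link text such as "Academics" remains in the description, so it can still produce matches unless the navigation items are filtered out before they are appended.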

# Scraping Jobs

Here, we scrape three things from the job listings on the specified job site (Freelancer): the URL of each listing (__get_jobsites__), the skills listed under each listing (__get_jobskills__), and the title of each listing (__get_title__).

__Note:__ This code works only for Freelancer.  Other sites may not work with the following code.

In [8]:
def get_jobsites(generalsite):
    source = requests.get(generalsite)
    soup = bs.BeautifulSoup(source.content, features='lxml')
    prelinks = soup.find_all('a')

    jobsites = []
    for p in prelinks:
        # Job links carry 'JobSearchCard-ctas-btn' as their first CSS class
        classes = p.get('class')
        if classes and classes[0] == 'JobSearchCard-ctas-btn':
            jobsites.append('https://www.freelancer.com' + p.get('href'))
    return jobsites

def get_jobskills(jobsite):
    source = requests.get(jobsite)
    soup = bs.BeautifulSoup(source.content, features='lxml')
    prelinks = soup.find_all('a')
    jobskills = []
    for prelink in prelinks:
        if prelink.get('class'):
            if 'skill-navigation-link' in prelink.get('class'):
                # The fixed offsets trim the padding around the skill name in the link text
                jobskills.append(prelink.text[33:][:-29])
    return jobskills

def get_title(joblink):
    source = requests.get(joblink)
    soup = bs.BeautifulSoup(source.content, features='lxml')
    title = soup.find('title')
    # Keep the text before the first '|'; the whole title is kept if there is no '|'
    return title.text.partition('|')[0].rstrip()

def get_sites_name_skills(jobsites, slice_number=None):
    sliced_jobsites = jobsites[:slice_number]
    jobnames = []
    jobskills = []
    for jobsite in sliced_jobsites:
        jobnames.append(get_title(jobsite))
        jobskills.append(get_jobskills(jobsite))
    return sliced_jobsites, jobnames, jobskills
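
The fixed offsets in __get_jobskills__ depend on the exact amount of padding around each skill name, so they can break if the page layout changes. The sketch below shows a more tolerant way to clean both strings, assuming the padding is plain whitespace and that page titles put the job name before a `|` separator. Neither assumption has been checked against Freelancer's current markup, and `clean_skill`, `title_before_pipe`, and the sample strings are hypothetical.

```python
def clean_skill(raw_text):
    """Trim whitespace padding of any length around a skill name."""
    return raw_text.strip()

def title_before_pipe(page_title):
    """Return the part of a page title before the first '|', if any."""
    head, _sep, _tail = page_title.partition('|')
    return head.strip()

# Hypothetical inputs for illustration only
print(clean_skill('\n        Python\n      '))
print(title_before_pipe('Build me a chatbot | Freelancer'))
print(title_before_pipe('A title without a separator'))
```

Using `str.partition` means a title without a `|` is returned whole rather than losing its last character.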

We will initialize the data scraped from job sites. (We will make an empty list of jobs.)

In [105]:
# Initialize data
jobsites = []
jobnames = []
jobskills = []

Rerun the portion below with other links as __generalsite__ to add to the total job data.

In [109]:
# Choose a job site you would like to scrape.
#generalsite = 'https://www.freelancer.com/jobs/python/'
generalsite = 'https://www.freelancer.com/jobs/programming/'
# generalsite = 'https://www.freelancer.com/jobs/engineering/'

# Adding the new job data from the website specified above.
all_jobsites = get_jobsites(generalsite)
temp_sites, temp_names, temp_skills = get_sites_name_skills(all_jobsites)
jobsites += temp_sites
jobnames += temp_names
jobskills += temp_skills
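
Because the cell above appends to the running lists, rerunning it with the same __generalsite__ (or with categories that share listings) adds the same job more than once. The following is a small sketch of one way to drop repeats while keeping the three parallel lists aligned; `dedupe_jobs` and the example URLs are hypothetical and not part of the original notebook.

```python
def dedupe_jobs(sites, names, skills):
    """Keep the first occurrence of each job URL across three parallel lists."""
    seen = set()
    out_sites, out_names, out_skills = [], [], []
    for site, name, skill_list in zip(sites, names, skills):
        if site in seen:
            continue  # this listing was already collected
        seen.add(site)
        out_sites.append(site)
        out_names.append(name)
        out_skills.append(skill_list)
    return out_sites, out_names, out_skills

# Hypothetical data: the first listing was scraped twice
sites = ['https://example.com/job/a', 'https://example.com/job/b', 'https://example.com/job/a']
names = ['Job A', 'Job B', 'Job A']
skills = [['Python'], ['PHP', 'HTML'], ['Python']]

sites, names, skills = dedupe_jobs(sites, names, skills)
print(len(sites), names)
```

Deduplicating on the URL rather than the title keeps distinct listings that happen to share a name.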

In [110]:
# Total number of jobs scraped so far for the dataframe.
len(jobsites)

100

Run the following code to generate a dataframe containing the job name, list of skills, and website for each listing scraped.

In [112]:
jobsdf = pd.DataFrame(index=jobnames) 
jobsdf.index.name = "Jobs"
jobsdf["List of Skills"] = jobskills
jobsdf["Website"] = jobsites
jobsdf

| Jobs | List of Skills | Website |
|---|---|---|
| Google Price Scraper | [] | https://www.freelancer.com/projects/php/google... |
| Expert in Data mining of Image data | [] | https://www.freelancer.com/projects/python/exp... |
| We need an experienced python/django web developer who can do everything | [Django, HTML, Javascript, PHP, Python] | https://www.freelancer.com/projects/php/need-e... |
| Script for Microsoft Word | [] | https://www.freelancer.com/projects/python/scr... |
| Django Social Media/Tinder like app fixes | [Django, HTML, Javascript, PHP, Python] | https://www.freelancer.com/projects/php/django... |
| odoo configuration | [PHP, PostgreSQL, Python] | https://www.freelancer.com/projects/php/odoo-c... |
| jenkins or gitlab CI/CD expert to help deploy microservices in the cloud | [] | https://www.freelancer.com/projects/software-a... |
| I would like to hire a Python Developer | [] | https://www.freelancer.com/projects/python/wou... |
| Need to build a query parser for search engine | [] | https://www.freelancer.com/projects/python/nee... |
| I need a python developer for small work | [Javascript, PHP, Python, Software Architectur... | https://www.freelancer.com/projects/php/need-p... |


# Job Matching

Here, we match a user's profile of classes and skills to the best-fit job listings based on a scoring system. The function __get_matches__ takes the inputs __user_courses__, __user_jobs__, and __user_defined_skills__ (lists of courses, previous jobs, and skills, respectively), collects the skill matches for every listing, and scores each listing as follows:
1. One point is added each time one of the listing's skills matches a user's course description, previous job, or expressed skill.
2. One additional point is added for each unique matched skill.
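
As a worked example of this scoring rule, the sketch below applies the same formula that __get_matches__ uses (`len(temp_skills) + len(set(temp_skills))`) to the matched-skills list shown for Joe's best match later in this notebook. The helper name `rate` is hypothetical; in the notebook the rating is computed inline.

```python
def rate(matched_skills):
    """Total matches plus the number of distinct matched skills."""
    return len(matched_skills) + len(set(matched_skills))

matched = ['C Programming', 'Graphic Design', 'Programming',
           'Programming', 'Programming', 'Programming', 'Programming']

print(len(matched))       # 7 matches in total
print(len(set(matched)))  # 3 unique skills
print(rate(matched))      # 7 + 3 = 10
```

The second term rewards breadth: a listing matched on three different skills scores higher than one matched on a single skill the same number of times.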

In [19]:
def find_text(word, text):
    # Case-insensitive substring search; returns a match object or None
    return re.search(re.escape(word), text, re.IGNORECASE)

def get_matches(user_courses, user_jobs, user_defined_skills):
    jobmatches = pd.DataFrame(columns=['Matched Skills', 'Rating', 'Website'])

    for j, row in jobsdf.iterrows():

        temp_skills = []
        # skill is a skill needed for job j
        for skill in row['List of Skills']:
            # match against the descriptions of the courses the user has taken
            for course in user_courses:
                description = course_and_description.get(course, '')
                if find_text(skill, description):
                    temp_skills.append(skill)
            # match against the skills needed for the user's previous jobs
            for user_job in user_jobs:
                for jskill in jobsdf.loc[user_job]['List of Skills']:
                    if skill == jskill:
                        temp_skills.append(skill)
            # match against the skills the user says they possess
            for defined_skill in user_defined_skills:
                if skill == defined_skill:
                    temp_skills.append(skill)
        # Add up the points: one per match, plus one per unique matched skill
        rating = len(temp_skills) + len(set(temp_skills))
        if rating > 0:
            temp = pd.Series({'Matched Skills': temp_skills, 'Rating': rating, 'Website': row['Website']})
            jobmatches.loc[j] = temp
    jobmatches = jobmatches.sort_values(by='Rating', ascending=False)
    return jobmatches
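
One property of __find_text__ is worth keeping in mind: it is a substring search, so a short skill name can match inside a longer word. The sketch below contrasts it with a whole-word variant. `find_whole_word` is a hypothetical alternative, not something the notebook uses; it relies on lookarounds rather than `\b` so that skill names ending in symbols, such as `C++`, still match.

```python
import re

def find_text(word, text):
    # Same substring search as in the notebook
    return re.search(re.escape(word), text, re.IGNORECASE)

def find_whole_word(word, text):
    # Require that the match is not glued to other word characters
    pattern = r'(?<!\w)' + re.escape(word) + r'(?!\w)'
    return re.search(pattern, text, re.IGNORECASE)

text = 'Experience with Javascript and web frameworks.'

print(bool(find_text('Java', text)))              # True: 'Java' is inside 'Javascript'
print(bool(find_whole_word('Java', text)))        # False
print(bool(find_whole_word('javascript', text)))  # True
print(bool(find_whole_word('C++', 'Fluent in C++ and Java')))  # True
```

Whether the stricter match is preferable depends on the data: substring matching is more forgiving of plurals and compound terms, at the cost of false positives like the one above.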

In [113]:
class User:
    def __init__(self, user_courses=None, user_jobs=None, expressed_skills=None, course_reviews=None, jobs_reviews=None):
        # Copy the inputs so that instances never share a mutable default list
        # self.courses is empty by default
        self.courses = list(user_courses or [])
        for course in self.courses:
            # course_and_description is a dict: register unknown courses with an empty description
            course_and_description.setdefault(course, '')
        # self.jobs is empty by default
        self.jobs = list(user_jobs or [])
        # self.expressed_skills is empty by default
        self.expressed_skills = list(expressed_skills or [])

        self.match_user()

    def match_user(self):
        # match jobs
        self.jobmatches = get_matches(self.courses, self.jobs, self.expressed_skills)

        # best match: first row of the rating-sorted matches, or None if there are none
        self.best_match_name, self.best_match_series = None, None
        iterator = self.jobmatches.iterrows()
        try:
            self.best_match_name, self.best_match_series = next(iterator)
        except StopIteration:
            print('User has no job matches')

    # add_course: run this code to add one more course (a string) to the user
    def add_course(self, course):
        course_and_description.setdefault(course, '')
        self.courses.append(course)
        self.match_user()  # remember to refer back to self

    def review_course(self, course, feedback):
        # feedback is a STRING: skills that the user believes he learned from the course
        # check that user has taken this course
        assert course in self.courses, 'Our system has not logged that you have taken this course'
        course_and_description[course] += ' ' + feedback + ' '
        self.match_user()

    # add_expressed_skills: run this code to add more skills (a list) to the same user
    def add_expressed_skills(self, skills):
        self.expressed_skills += skills
        self.match_user()

    def add_jobs(self, job, feedback=None):
        # get_matches reads a previous job's skills from jobsdf, so register unknown jobs there
        if job not in jobsdf.index:
            jobsdf.loc[job] = pd.Series({'List of Skills': list(feedback or []), 'Website': ''})
        self.jobs.append(job)
        self.match_user()

    def review_job(self, job, feedback):
        # feedback is a LIST: skills that the user believes he learned from the job
        # check that user has had this job
        assert job in self.jobs, 'Our system has not logged that you have had this job'
        jobsdf.at[job, 'List of Skills'].extend(feedback)
        self.match_user()

    def __str__(self):
        return ('User: {0}, {1}, {2}, {3}, {4}'.format(self.courses, self.jobs, self.expressed_skills, self.best_match_name, self.best_match_series))

# Using the System

As mentioned before, Joe serves as our example: he defines which classes he has taken and which skills he possesses. Joe's coursework is collected in the list __user_courses__, and his skills are collected in the list __expressed_skills__.

In [114]:
# Inputting Joe's coursework and skills.
user_courses = ['CS C8. Foundations of Data Science', 'CS 160. User Interface Design and Development', 'CS 189. Introduction to Machine Learning', 'EECS 127. Optimization Models in Engineering', 'CS 9A. Matlab for Programmers', 'CS 9A-001. Self-paced courses', 'CS 9C. C for Programmers']
expressed_skills = ['Java', 'Django', 'HTML', 'Linux', 'MySQL', 'PHP', 'BeautifulSoup']
Joe = User(user_courses=user_courses, expressed_skills=expressed_skills)

In [115]:
print("Because of your skills {0}, we have matched you with this job opportunity titled, {1}, which can be found at {2}".format(set(Joe.best_match_series['Matched Skills']), Joe.best_match_name, Joe.best_match_series['Website']))

Because of your skills {'C Programming', 'Graphic Design', 'Programming'}, we have matched you with this job opportunity titled, design your own t-shirt platform, which can be found at https://www.freelancer.com/projects/graphic-design/design-your-own-shirt-platform/


# Updating Course Data Based on User Feedback of Courses

A user can return to Course-To-Career at any time to add more skills and coursework to their profile. For example, Joe can add feedback to a class description if he wants to stress a particular skill that the description does not mention. Consider the description of CS C8:

In [99]:
course_and_description['CS C8. Foundations of Data Science']

'Catalog Description: Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership.<li><a href="https://eecs.berkeley.edu/academics">Academics</a></li><li><a href="https://eecs.berkeley.edu/academics/courses">Courses</a></li><li>\n<a href="https://eecs.berkeley.edu/privacy-policy"><span>Privacy Policy</span></a>\n</li>'

If Joe wants to add a skill he learned in that course, or to stress a particular skill (to give it more "weight") when running a job search, he can append to the course description using __review_course__. Running the following line of code adds 'data science' to the course description.

In [100]:
Joe.review_course('CS C8. Foundations of Data Science', 'data science')

Following the addition, the words "data science" are appended to the end of the description:

In [101]:
course_and_description['CS C8. Foundations of Data Science']

'Catalog Description: Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership.<li><a href="https://eecs.berkeley.edu/academics">Academics</a></li><li><a href="https://eecs.berkeley.edu/academics/courses">Courses</a></li><li>\n<a href="https://eecs.berkeley.edu/privacy-policy"><span>Privacy Policy</span></a>\n</li> data science '
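
How much this feedback changes the results depends on how matching works. __find_text__, as defined earlier, only checks whether a term occurs in the description at all, so feedback affects the score when it introduces a term the description did not already contain. The sketch below illustrates this on a shortened, hypothetical description; it re-declares `find_text` so that it runs on its own, and 'Python' is used purely as an example skill.

```python
import re

def find_text(word, text):
    # Same case-insensitive substring search as in the notebook
    return re.search(re.escape(word), text, re.IGNORECASE)

# Shortened, hypothetical stand-in for a scraped course description
description = 'Catalog Description: Foundations of data science from three perspectives.'

# 'data science' is already present, so appending it again adds no new match
already_matches = bool(find_text('data science', description))

# 'Python' is absent before the review and present after it
before = bool(find_text('Python', description))
description += ' ' + 'Python' + ' '   # what review_course appends
after = bool(find_text('Python', description))

print(already_matches, before, after)
```

Under this matching rule, a review is most useful for recording skills that the catalog description leaves out.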

# What Are All Those Instance Attributes?

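Printing a __User__ object shows its instance attributes: courses, previous jobs, expressed skills, and the best match. Each attribute can also be inspected on its own.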

In [116]:
print(Joe)

User: ['CS C8. Foundations of Data Science', 'CS 160. User Interface Design and Development', 'CS 189. Introduction to Machine Learning', 'EECS 127. Optimization Models in Engineering', 'CS 9A. Matlab for Programmers', 'CS 9A-001. Self-paced courses', 'CS 9C. C for Programmers'], [], ['Java', 'Django', 'HTML', 'Linux', 'MySQL', 'PHP', 'BeautifulSoup'], design your own t-shirt platform, Matched Skills    [C Programming, Graphic Design, Programming, P...
Rating                                                           10
Website           https://www.freelancer.com/projects/graphic-de...
Name: design your own t-shirt platform, dtype: object


In [120]:
Joe.jobmatches

|  | Matched Skills | Rating | Website |
|---|---|---|---|
| design your own t-shirt platform | [C Programming, Graphic Design, Programming, P... | 10 | https://www.freelancer.com/projects/graphic-de... |
| I need a Java Coder | [Java, PHP, Programming, Programming, Programm... | 10 | https://www.freelancer.com/projects/php/need-j... |
| Xbox Gamertag Claimer | [C Programming, Java, Programming, Programming... | 10 | https://www.freelancer.com/projects/java/xbox-... |
| Build me a chatbot | [Machine Learning, Machine Learning, Programmi... | 9 | https://www.freelancer.com/projects/machine-le... |
| Max of a row | [Mathematics, Mathematics, Programming, Progra... | 9 | https://www.freelancer.com/projects/matlab-mat... |
| Help me make this fun and helpful desktop assistant! | [PHP, Programming, Programming, Programming, P... | 8 | https://www.freelancer.com/projects/php/help-m... |
| Build Email Verification System | [Linux, Programming, Programming, Programming,... | 8 | https://www.freelancer.com/projects/python/bui... |
| API Expert Required | [PHP, Programming, Programming, Programming, P... | 8 | https://www.freelancer.com/projects/programmin... |
| Build Discord Bot in Python (need fast delivery) | [Java, Programming, Programming, Programming, ... | 8 | https://www.freelancer.com/projects/java/build... |
| Simple GoLang Code | [C Programming, Programming, Programming, Prog... | 8 | https://www.freelancer.com/projects/data-proce... |


In [95]:
Joe.courses

['CS C8. Foundations of Data Science',
 'CS 160. User Interface Design and Development',
 'CS 189. Introduction to Machine Learning',
 'EECS 127. Optimization Models in Engineering',
 'CS 9A. Matlab for Programmers',
 'CS 9A-001. Self-paced courses',
 'CS 9C. C for Programmers']

In [117]:
Joe.jobs

[]

In [118]:
Joe.expressed_skills

['Java', 'Django', 'HTML', 'Linux', 'MySQL', 'PHP', 'BeautifulSoup']

In [96]:
Joe.best_match_name

'Person re-identification using triplet loss'

In [119]:
Joe.best_match_series['Matched Skills']

['C Programming',
 'Graphic Design',
 'Programming',
 'Programming',
 'Programming',
 'Programming',
 'Programming']

In [98]:
Joe.best_match_series['Rating']

8