# Fake Job Postings Analysis

## by Justin Sierchio

In this analysis, we examine job postings and assess how legitimate they are. Ideally, we would like to answer the following questions:

<ul>
    <li>Which job postings are fake?</li>
    <li>Can we predict if a job posting is fake or not?</li>
    <li>What other conclusions might we be able to draw from this analysis?</li>
</ul>

The dataset is a .csv file available from Kaggle at: https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction/download. More information about the dataset can be found at: https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction.

## Notebook Initialization

In [1]:
# Import Relevant Libraries
import numpy as np 
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
import re 

print('Initial libraries loaded into workspace!')

Initial libraries loaded into workspace!


Because we will be working with natural-language text, we also need to import some specialized libraries.

In [2]:
# Import scikit-learn libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV

# Import NLTK libraries
import nltk
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

# Import Additional Libraries
import joblib
import xgboost as xgb
import string
import imblearn
from imblearn.combine import SMOTETomek

print('Specialized Libraries loaded in workspace!')

Specialized Libraries loaded in workspace!


Finally, we need to download the NLTK stop words and set up the stemmer and tokenizer.

In [3]:
# Download the NLTK "stopwords" corpus
nltk.download('stopwords')

# Initialize the stop-word list, stemmer, and tokenizer
default_stopwords = stopwords.words('english')
stop_words = set(default_stopwords)  # set form for fast membership checks
default_stemmer = PorterStemmer()
default_tokenizer = RegexpTokenizer(r"\w+")

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\jmsie\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
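
As a rough illustration of how the tokenizer, stop-word set, and stemmer could fit together, the sketch below tokenizes a string, drops stop words, and stems what remains. The function name `clean_text` and the small `demo_stop_words` set are hypothetical and not taken from this notebook; in practice the `stop_words` set built from NLTK's English list would be passed in instead.

```python
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import RegexpTokenizer


# Hypothetical helper for illustration; not part of the original notebook.
def clean_text(text, stop_words, stemmer=None, tokenizer=None):
    """Lowercase, tokenize on word characters, drop stop words, and stem."""
    stemmer = stemmer or PorterStemmer()
    tokenizer = tokenizer or RegexpTokenizer(r"\w+")
    tokens = tokenizer.tokenize(text.lower())
    kept = [tok for tok in tokens if tok not in stop_words]
    return " ".join(stemmer.stem(tok) for tok in kept)


# Tiny stand-in stop-word set so the example runs without downloading a corpus.
demo_stop_words = {"the", "are", "for", "a", "of"}
cleaned = clean_text("The jobs are REAL!", demo_stop_words)
print(cleaned)
```

Passing the stop words as an argument keeps the helper independent of any global state, which makes it easier to apply column by column later on.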


In [4]:
# Load the dataset for study
df_JOBS = pd.read_csv("fake_job_postings.csv")

print('Datasets uploaded!')

Datasets uploaded!


In [5]:
# Display the first 5 rows of the Fake Job Postings dataset
df_JOBS.head()

Unnamed: 0,job_id,title,location,department,salary_range,company_profile,description,requirements,benefits,telecommuting,has_company_logo,has_questions,employment_type,required_experience,required_education,industry,function,fraudulent
0,1,Marketing Intern,"US, NY, New York",Marketing,,"We're Food52, and we've created a groundbreaki...","Food52, a fast-growing, James Beard Award-winn...",Experience with content management systems a m...,,0,1,0,Other,Internship,,,Marketing,0
1,2,Customer Service - Cloud Video Production,"NZ, , Auckland",Success,,"90 Seconds, the worlds Cloud Video Production ...",Organised - Focused - Vibrant - Awesome!Do you...,What we expect from you:Your key responsibilit...,What you will get from usThrough being part of...,0,1,0,Full-time,Not Applicable,,Marketing and Advertising,Customer Service,0
2,3,Commissioning Machinery Assistant (CMA),"US, IA, Wever",,,Valor Services provides Workforce Solutions th...,"Our client, located in Houston, is actively se...",Implement pre-commissioning and commissioning ...,,0,1,0,,,,,,0
3,4,Account Executive - Washington DC,"US, DC, Washington",Sales,,Our passion for improving quality of life thro...,THE COMPANY: ESRI – Environmental Systems Rese...,"EDUCATION: Bachelor’s or Master’s in GIS, busi...",Our culture is anything but corporate—we have ...,0,1,0,Full-time,Mid-Senior level,Bachelor's Degree,Computer Software,Sales,0
4,5,Bill Review Manager,"US, FL, Fort Worth",,,SpotSource Solutions LLC is a Global Human Cap...,JOB TITLE: Itemization Review ManagerLOCATION:...,QUALIFICATIONS:RN license in the State of Texa...,Full Benefits Offered,0,1,1,Full-time,Mid-Senior level,Bachelor's Degree,Hospital & Health Care,Health Care Provider,0
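
A five-row preview does not show how many postings are labeled fraudulent or how sparse each column is. A minimal sketch of those two checks follows. The helper `summarize_labels` and the toy `demo` frame are hypothetical (the third row is made up for illustration), and the code assumes only that `fraudulent` is a 0/1 column, as it appears in the preview.

```python
import pandas as pd


# Hypothetical helper for illustration; not part of the original notebook.
def summarize_labels(df, label_col="fraudulent"):
    """Return class counts for the label and the missing-value share per column."""
    class_counts = df[label_col].value_counts().sort_index()
    missing_share = df.isna().mean().sort_values(ascending=False)
    return class_counts, missing_share


# Toy frame standing in for df_JOBS so the example runs without the CSV file.
demo = pd.DataFrame({
    "title": ["Marketing Intern", "Bill Review Manager", "Data Entry Clerk"],
    "department": ["Marketing", None, None],
    "fraudulent": [0, 0, 1],
})
counts, missing = summarize_labels(demo)
print(counts)
print(missing)
```

With the real data, the same call would be `summarize_labels(df_JOBS)`.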
