This notebook queries the USAJOBS API (the federal jobs database) to retrieve current or archived job postings by keyword and/or date range. **Before running it, make sure a folder called "collected_data" exists in the same directory as this notebook.** The collected datasets are saved to that folder.
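
If you would rather not create the folder by hand, a cell along these lines could do it. This is a minimal sketch and not part of the original notebook; `ensure_output_dir` is a hypothetical helper name.

```python
import os

def ensure_output_dir(path="collected_data"):
    """Create the output folder if it is missing and return its path."""
    # exist_ok=True makes the call a no-op when the folder already exists
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `ensure_output_dir()` once before the save step should be enough.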

To use it, set your parameters in the INPUT PARAMETERS section below, then open the "Cell" menu and choose "Run All".

API Documentation: https://developer.usajobs.gov/API-Reference/GET-api-Search

### FILENAME CONVENTIONS:

See examples below:

**datascience_N_346_20171223111701.csv** for current jobs:

* Keyword = datascience
* Archive flag = N (Y = Archived, N = Current)
* Number of results = 346
* Current Datetime stamp = 20171223111701 (2017-12-23 at 11:17am)

**geospatial_Y_4_20171223112709_12012016_12302016.csv** for archived jobs:

* Keyword = geospatial
* Archive flag = Y (Y = Archived, N = Current)
* Number of results = 4
* Current Datetime stamp = 20171223112709 (2017-12-23 at 11:27am)
* From Archived Date = 12012016 (12/01/2016)
* To Archived Date = 12302016 (12/30/2016)
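
To make the convention concrete, here is a rough sketch of how such a file name could be split back into its parts. `parse_filename` and the dictionary keys are hypothetical names chosen for illustration, and the sketch assumes the keyword itself never collides with the `_Y_` / `_N_` pattern.

```python
from datetime import datetime

def parse_filename(file_name):
    """Split a collected-data file name into its labelled parts."""
    stem = file_name[:-4] if file_name.endswith(".csv") else file_name
    parts = stem.split("_")
    # archived names end in flag, count, stamp, from-date, to-date (5 fields)
    if len(parts) >= 6 and parts[-5] == "Y":
        n_tail = 5
    # current names end in flag, count, stamp (3 fields)
    elif len(parts) >= 4 and parts[-3] == "N":
        n_tail = 3
    else:
        raise ValueError("unrecognised file name: " + file_name)
    tail = parts[-n_tail:]
    info = {
        # everything before the tail is the keyword, underscores included
        "keyword": "_".join(parts[:-n_tail]),
        "archive": tail[0],
        "results": int(tail[1]),
        "collected": datetime.strptime(tail[2], "%Y%m%d%H%M%S"),
    }
    if n_tail == 5:
        info["from_date"] = datetime.strptime(tail[3], "%m%d%Y").date()
        info["to_date"] = datetime.strptime(tail[4], "%m%d%Y").date()
    return info
```

For the first example above, `parse_filename("datascience_N_346_20171223111701.csv")["results"]` would give `346`.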




## INPUT PARAMETERS:

In [497]:
# API Key 
# Request one at:
# https://developer.usajobs.gov/APIRequest/Index

apiKey = "REQUEST YOUR API KEY AND INSERT IT HERE"

In [498]:
# specifies whether results are current job postings or archived job postings
# Y = Archived posts
# N = Current posts

archive = "N"

In [499]:
# specifies the number of results to retrieve
# only for current searches

resultsPerPage = 500

In [500]:
# date range
# ONLY FOR ARCHIVED JOB POSTINGS

# format: MM/DD/YYYY

startDate = "01/01/2016"
endDate = "12/31/2016"

In [501]:
# Position Title

title = "analysis"

## CODE BELOW:

In [502]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
import time
import datetime
import os

In [503]:
urlSearch = "https://data.usajobs.gov/api/search?Keyword=" + title + "&KeywordFilter=All"

In [504]:
urlArchive = "https://data.usajobs.gov/api/Archive?PositionTitle=" + title

In [505]:
if archive == "Y":
    # archived postings are filtered by posting date range
    url = urlArchive + "&PostingStartDate=" + startDate + "&PostingEndDate=" + endDate
else:
    # current postings: set the number of results requested per page
    url = urlSearch + "&ResultsPerPage=" + str(resultsPerPage)
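
The string concatenation above works for simple keywords, but a title containing spaces or `&` would produce a malformed URL. One possible alternative, sketched here with the standard library's `urlencode`, percent-encodes every value. `build_url` and `BASE_URL` are hypothetical names; the parameter names are the ones already used in this notebook, and whether the API accepts the encoded date slashes is an assumption to verify.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.usajobs.gov/api/"

def build_url(title, archive, results_per_page=500, start_date=None, end_date=None):
    """Build the GET URL with every parameter value percent-encoded."""
    if archive == "Y":
        endpoint = "Archive"
        params = {
            "PositionTitle": title,
            "PostingStartDate": start_date,
            "PostingEndDate": end_date,
        }
    else:
        endpoint = "search"
        params = {
            "Keyword": title,
            "KeywordFilter": "All",
            "ResultsPerPage": results_per_page,
        }
    # urlencode escapes spaces, slashes, ampersands, etc. in the values
    return BASE_URL + endpoint + "?" + urlencode(params)
```

With the default parameters in this notebook, `build_url("analysis", "N")` yields the same URL as the concatenation cell.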

In [506]:
print("API GET URL CALL:")
print(url)

API GET URL CALL:
https://data.usajobs.gov/api/search?Keyword=analysis&KeywordFilter=All&ResultsPerPage=500


In [507]:
headers = {"Authorization-Key": apiKey}
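
The USAJOBS developer documentation also describes `Host` and `User-Agent` request headers, with the e-mail address used when requesting the key sent as the user agent. The run recorded in this notebook returned a 200 with the key alone, so treat the following as an optional sketch to check against the API documentation. `build_headers` and `user_agent_email` are hypothetical names.

```python
def build_headers(api_key, user_agent_email):
    """Return request headers in the shape shown in the USAJOBS docs (assumed)."""
    return {
        "Host": "data.usajobs.gov",
        # assumption: the e-mail address registered with the API key
        "User-Agent": user_agent_email,
        "Authorization-Key": api_key,
    }
```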

In [508]:
# certificate verification is left on (the requests default); the timeout avoids hanging indefinitely
res = requests.get(url, headers=headers, timeout=30)



In [509]:
print("API Response code: ", res.status_code)

API Response code:  200


In [510]:
response = res.json()

In [511]:
def parseResults(searchResults):
    
    ApplicationCloseDate = []
    ApplyURI = []
    JobGrade = []
    PositionStartDate = []
    PositionEndDate = []
    PositionID = []
    PositionLocation = []
    PositionLocationDisplay = []
    QualificationSummary = []
    OrganizationName = []
    DepartmentName = []
    OfferingType = []
    MinPay = []
    MaxPay = []
    PayType = []
    PositionTitle = []
    JobSummary = []
    
    for r in searchResults:
        if 'MatchedObjectDescriptor' in r:            
            rr =  r.get("MatchedObjectDescriptor", {})   
            
            r_ApplicationCloseDate = rr.get("ApplicationCloseDate", None)
            r_ApplyURI = rr.get("PositionURI", None)
            # "or [{}]" guards against a missing, null, or empty JobGrade list
            r_JobGrade = (rr.get("JobGrade") or [{}])[0].get("Code")
            r_PositionStartDate = rr.get("PositionStartDate", None)
            r_PositionEndDate = rr.get("PositionEndDate", None)
            r_PositionID = rr.get("PositionID", None)
            r_PositionLocationDisplay = rr.get("PositionLocationDisplay", None)
            r_QualificationSummary = rr.get("QualificationSummary", None)            
            r_OrganizationName = rr.get("OrganizationName", None)
            r_DepartmentName = rr.get("DepartmentName", None)    
            r_PositionLocation = rr.get("PositionLocation", None)       
            r_PositionTitle = rr.get("PositionTitle", None)
            
            # join the city names of all listed locations with "|"
            locations = []
            for loc in (r_PositionLocation or []):
                locations.append(loc.get("CityName", None))
            locations = '|'.join(str(city) for city in locations)
            
            r_OfferingType = (rr.get("PositionOfferingType") or [{}])[0].get("Name")
            # pay details come from the first remuneration entry, if there is one
            r_Pay = (rr.get("PositionRemuneration") or [{}])[0]
            r_MinPay = r_Pay.get("MinimumRange")
            r_MaxPay = r_Pay.get("MaximumRange")
            r_PayType = r_Pay.get("RateIntervalCode")
            r_JobSummary = rr.get("UserArea", {}).get("Details", {}).get("JobSummary", None)
            
            ApplicationCloseDate.append(r_ApplicationCloseDate)
            ApplyURI.append(r_ApplyURI)
            JobGrade.append(r_JobGrade)
            PositionEndDate.append(r_PositionEndDate)
            PositionID.append(r_PositionID)
            PositionLocationDisplay.append(r_PositionLocationDisplay)
            QualificationSummary.append(r_QualificationSummary)
            OrganizationName.append(r_OrganizationName)
            DepartmentName.append(r_DepartmentName)
            PositionLocation.append(locations)
            OfferingType.append(r_OfferingType)
            MinPay.append(r_MinPay)
            MaxPay.append(r_MaxPay)
            PayType.append(r_PayType)
            PositionStartDate.append(r_PositionStartDate)
            PositionTitle.append(r_PositionTitle)
            JobSummary.append(r_JobSummary)

    return pd.DataFrame({
            "PositionID": PositionID,
            "ApplicationCloseDate": ApplicationCloseDate,
            "JobGrade": JobGrade,
            "PositionEndDate": PositionEndDate,   
            "OrganizationName": OrganizationName, 
            "DepartmentName": DepartmentName,
            "QualificationSummary": QualificationSummary,
            "URI": ApplyURI,
            "PositionLocation": PositionLocation,
            "OfferingType": OfferingType,
            "MinPay": MinPay,
            "MaxPay": MaxPay,
            "PayType": PayType,
            "PositionStartDate": PositionStartDate,
            "PositionTitle": PositionTitle,
            "JobSummary": JobSummary
    })
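
As a design alternative to the parallel lists above, each search result could be flattened into one dictionary, and the list of dictionaries passed straight to `pd.DataFrame`. The sketch below covers only a subset of the fields; `first` and `flatten_item` are hypothetical helper names, and the key names are the ones `parseResults` already reads.

```python
def first(items, key):
    """Return items[0][key], or None when the list is missing or empty."""
    return (items or [{}])[0].get(key)

def flatten_item(item):
    """Flatten one search result item into a single row dictionary."""
    d = item.get("MatchedObjectDescriptor", {})
    pay = d.get("PositionRemuneration")
    return {
        "PositionID": d.get("PositionID"),
        "PositionTitle": d.get("PositionTitle"),
        "JobGrade": first(d.get("JobGrade"), "Code"),
        "OfferingType": first(d.get("PositionOfferingType"), "Name"),
        "MinPay": first(pay, "MinimumRange"),
        "MaxPay": first(pay, "MaximumRange"),
        "PayType": first(pay, "RateIntervalCode"),
        # city names of all listed locations, joined with "|"
        "PositionLocation": "|".join(
            str(loc.get("CityName")) for loc in d.get("PositionLocation") or []
        ),
    }
```

The DataFrame would then be built with `pd.DataFrame([flatten_item(r) for r in searchResults])`, which keeps each column next to the key it comes from.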

In [512]:
def convertToDataframe(res):
    
    response = res.json()
    print("==============================================================")
    if ('SearchResult' in response):
        
        if (archive != "Y"):
            searchResultNumber = response.get("SearchResult", {}).get("SearchResultCountAll", 0)
        else:
            searchResultNumber = response.get("SearchResult", {}).get("SearchResultCount", 0)
            
        print(searchResultNumber, "results found in API response...")        
        
        if (searchResultNumber > 0):
            searchResults = response.get("SearchResult", {}).get("SearchResultItems", None)
            
            if (len(searchResults) > 0):
                t0 = time.time()
                print("Parsing in progress...")
                df = parseResults(searchResults).reset_index(drop = True)
                t1 = time.time()
                print("Parse complete. \nDuration: ", round(t1-t0, 5), " seconds.")
                print("Number of records: ", len(df))
                return df
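            else:
                print("No Search Results.")
        else:
            print("Search Result Number = 0.")
    else:
        print("No SearchResult found in json response.")

    # no usable results: return an empty DataFrame so the later cells (len(df), to_csv) still run
    return pd.DataFrame()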

In [513]:
df = convertToDataframe(res)

1107 results found in API response...
Parsing in progress...
Parse complete. 
Duration:  0.00903  seconds.
Number of records:  500
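
The output above reports 1107 matches but only 500 records, because a single request returns one page of results. Assuming the search endpoint accepts a `Page` query parameter alongside `ResultsPerPage` (worth confirming in the API reference linked at the top), the remaining pages could be requested one by one. The sketch below only computes the page URLs; `page_numbers` and `page_urls` are hypothetical helper names.

```python
import math

def page_numbers(total_results, results_per_page=500):
    """Return the 1-based page numbers needed to cover total_results."""
    if total_results <= 0:
        return []
    return list(range(1, math.ceil(total_results / results_per_page) + 1))

def page_urls(base_url, total_results, results_per_page=500):
    """Append an assumed Page parameter to base_url, one URL per page."""
    return [
        base_url + "&Page=" + str(page)
        for page in page_numbers(total_results, results_per_page)
    ]
```

For 1107 results at 500 per page this gives three pages; each page's DataFrame could then be combined with `pd.concat`.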


In [514]:
now = datetime.datetime.now()

# timestamp in YYYYMMDDHHMMSS form, e.g. 20171223141842
nowString = now.strftime("%Y%m%d%H%M%S")

searchDates = ""

if archive == "Y":
    searchDates = "_" + str(startDate) + "_" + str(endDate)
    searchDates = searchDates.replace("/", "")

fileName = title.replace(";", "") + "_" + archive + "_" + str(len(df)) + "_" + nowString + searchDates + ".csv"

print(fileName)

analysis_N_500_20171223141842.csv


In [515]:
df.to_csv(os.path.join("collected_data", fileName), index = False)

In [516]:
df.head().transpose()

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| ApplicationCloseDate | 2018-03-31 | 2018-03-31 | 2018-03-31 | 2017-12-29 | 2018-11-09 |
| DepartmentName | Department of the Air Force | Department of the Air Force | Department of the Air Force | Other Agencies and Independent Organizations | Department of the Air Force |
| JobGrade | GS | GS | GS | CU | GS |
| JobSummary | The mission of the United States Air Force is ... | The mission of the United States Air Force is ... | The mission of the United States Air Force is ... | At NCUA, differences make a difference. We val... | The mission of the United States Air Force is ... |
| MaxPay | 155073.0000 | 155073.0000 | 155073.0000 | 190357.0000 | 134776.0000 |
| MinPay | 32844.0000 | 32844.0000 | 32844.0000 | 121246.0000 | 18526.0000 |
| OfferingType | Multiple Appointment Types | Multiple Appointment Types | Multiple Appointment Types | 4 years | Multiple Appointment Types |
| OrganizationName | Air Force Personnel Center | Air Force Personnel Center | Air Force Personnel Center | National Credit Union Administration | Air Force Materiel Command |
| PayType | Per Year | Per Year | Per Year | Per Year | Per Year |
| PositionEndDate | 2018-03-31 | 2018-03-31 | 2018-03-31 | 2017-12-29 | 2018-11-09 |
