In [1]:
import re
import requests
from pprint import pp
import pandas as pd
import json
import math

The US_Cities file contains a list of all the US cities that can be passed as a location parameter to TheMuse's API. The list contains nearly 2,400 cities, so it's saved in a separate file rather than cluttering this notebook. The process used to generate the list is included in the US_Cities file.

In [5]:
import US_Cities as us

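This function makes a single call to TheMuse's jobs API. `location` can be one city or a list of cities; requests sends a list as repeated `location` query parameters. The function returns the decoded JSON for one page of results, which includes the listings on that page (`results`) and the number of pages (`page_count`) that need to be accessed to extract the full results.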

In [3]:
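def muse_api_call(category, location, sort_order="d", page=0):
    """Fetch one page of job listings from TheMuse's public jobs API.

    `location` may be a single city string or a list of cities.
    Returns the decoded JSON, including 'results' and 'page_count'.
    """
    url = 'https://www.themuse.com/api/public/jobs'
    headers = {"Content-type": "application/json"}

    # Sort direction is sent as the string "true" (descending) or "false"
    sort_value = "true" if sort_order in ["d", "desc", "descending"] else "false"

    params = {'category': category, 'location': location, 'page': page, 'descending': sort_value}
    resp = requests.get(url, params=params, headers=headers)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of in .json()

    return resp.json()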
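Because `muse_api_call` is later handed a slice of 600 cities, it helps to see the query string such a call produces. The sketch below uses a hypothetical helper, `build_muse_url`, that mirrors the parameters in `muse_api_call` and encodes them with the standard library's `urllib.parse.urlencode(..., doseq=True)`. We assume this matches the repeated `location=` keys that requests generates for list values. Nothing is sent over the network.

```python
from urllib.parse import urlencode

MUSE_JOBS_URL = "https://www.themuse.com/api/public/jobs"


def build_muse_url(category, location, sort_order="d", page=0):
    """Hypothetical helper: build the GET URL muse_api_call would request."""
    sort_value = "true" if sort_order in ["d", "desc", "descending"] else "false"
    params = {"category": category, "location": location,
              "page": page, "descending": sort_value}
    # doseq=True expands a list into repeated keys (location=A&location=B),
    # while a plain string is encoded as a single value
    return MUSE_JOBS_URL + "?" + urlencode(params, doseq=True)


url = build_muse_url("Data Science", ["Austin, TX", "Boston, MA"], "a", 2)
print(url)
# ...?category=Data+Science&location=Austin%2C+TX&location=Boston%2C+MA&page=2&descending=false
```

Printing the URL for a real chunk is a cheap way to check the request before spending an API call on it.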

Since the API can't handle the entire list of US cities in one request, we break the list into chunks of 600 cities. With 600 cities per request, the results ran to roughly 110-150 pages, depending on the portion of the list. Because we can't access pages past 99, we query each chunk in two orders, descending and ascending. The first half of the pages comes from the descending results and the second half from the ascending results. Since each order can reach pages 0-99, this covers every page as long as a chunk has no more than 200 pages.
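The arithmetic behind the chunking and the two-ended paging can be checked offline. This is a sketch under a labeled assumption: ascending page `i` is modeled as holding the listings of descending page `page_count - 1 - i`. The helpers `chunk_bounds` and `two_ended_plan` are hypothetical, and 2,400 is an illustrative city count.

```python
import math

MAX_PAGE = 99      # last page index we can access, per the notes above
CHUNK_SIZE = 600   # cities per request, as in get_all_muse_pages


def chunk_bounds(n_cities, size=CHUNK_SIZE):
    """(start, stop) slice bounds for each chunk of the city list."""
    return [(s, min(s + size, n_cities)) for s in range(0, n_cities, size)]


def two_ended_plan(page_count, max_page=MAX_PAGE):
    """Positions covered when reading half the pages in each sort order.

    Hypothetical model: ascending page i holds the listings of
    descending page page_count - 1 - i.
    """
    half = math.ceil(page_count / 2)
    if half - 1 > max_page:
        raise ValueError(f"{page_count} pages exceed what two orders can reach")
    desc_positions = list(range(half))
    # Translate each ascending request back to its descending position
    asc_positions = [page_count - 1 - p for p in range(half)]
    return desc_positions, asc_positions


print(chunk_bounds(2400))   # [(0, 600), (600, 1200), (1200, 1800), (1800, 2400)]
desc, asc = two_ended_plan(150)
print(max(desc), min(asc))  # 74 75
```

With 150 pages, each order only needs pages 0-74, well under the page-99 limit. When `page_count` is odd, the middle page is fetched twice, and the id-keyed dictionary absorbs the overlap.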

We store each job listing in a dictionary, with the listing's id as the key and the full listing as the value. A repeated id simply overwrites the existing entry, which eliminates duplicates, including listings returned by both the descending and the ascending queries.
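A minimal illustration of the id-keyed deduplication, using made-up listings in which one job appears in both the descending and the ascending results:

```python
# Made-up listings: job 2 comes back from both queries,
# as can happen where the two halves overlap
descending_page = [{"id": 1, "name": "Data Analyst"}, {"id": 2, "name": "BI Engineer"}]
ascending_page = [{"id": 2, "name": "BI Engineer"}, {"id": 3, "name": "Data Scientist"}]

results = {}
for page in (descending_page, ascending_page):
    for job in page:
        # Re-inserting an existing id overwrites the entry instead of adding a copy
        results[job["id"]] = job

print(sorted(results))  # [1, 2, 3]
```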

In [14]:
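def get_all_muse_pages(category, location, chunk_size=600):
    """Collect every listing for `category` across all cities in `location`.

    Listings are keyed by job id, so duplicates across chunks and orders are dropped.
    """
    results = {}

    # Query the city list in chunks the API can handle
    for start in range(0, len(location), chunk_size):
        chunk = location[start:start + chunk_size]
        pages = muse_api_call(category, chunk)['page_count']

        # Pages past 99 are unreachable, so read the first half of the pages
        # in descending order and the second half by sorting ascending
        for page in range(math.ceil(pages / 2)):
            for order in ['d', 'a']:
                resp = muse_api_call(category, chunk, order, page)
                for job in resp['results']:
                    results[job['id']] = job

    return results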

In [15]:
data_analytics_results = get_all_muse_pages('Data and Analytics', us.US_Cities)

In [7]:
data_science_results = get_all_muse_pages('Data Science', us.US_Cities)

The two cells below write the saved dictionaries to JSON files.

In [74]:
with open('data_analytics_results.json', 'w') as new_file:
    json.dump(data_analytics_results, new_file)

In [97]:
with open('data_science_results.json', 'w') as new_file:
    json.dump(data_science_results, new_file)
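One detail to keep in mind when these files are read back: JSON object keys are always strings. If the API's job ids are integers (an assumption here), they return as `str` keys after `json.load`. The sketch below uses made-up listings and a temporary file to show the round trip and one way to rebuild integer keys from each listing's own `id` field.

```python
import json
import os
import tempfile

# Made-up saved results keyed by integer job id
results = {101: {"id": 101, "name": "Data Analyst"},
           202: {"id": 202, "name": "Data Scientist"}}

path = os.path.join(tempfile.mkdtemp(), "results.json")
with open(path, "w") as f:
    json.dump(results, f)

with open(path) as f:
    loaded = json.load(f)

# JSON object keys are always strings, so the integer ids come back as str
print(list(loaded))  # ['101', '202']

# Rebuild integer keys from each listing's own id field
restored = {job["id"]: job for job in loaded.values()}
print(restored == results)  # True
```

From there, `pd.DataFrame.from_dict(restored, orient='index')` gives one row per listing for analysis with pandas.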