.gitignore
README.md
ala_job_list.json
ala_page_scrape.py
ala_url_scrape.py
ala_urls.json
asist_job_list.json
asist_linkedin_urls.json
asist_page_scrape.py
asist_url_scrape.py
asist_urls.json
job_join.py
master_job_list.json
saa_job_list.json
saa_page_scrape.py
saa_url_scrape.py
saa_urls.json


Scraping for Library Jobs

For this project, I used Python to attempt to scrape job listings from three popular professional associations' websites:

  1. Society of American Archivists (SAA)
  2. American Library Association (ALA)
  3. Association for Information Science and Technology (ASIS&T)

I first scraped each website for links to detailed job postings and stored the URLs in a JSON file. Then I scraped each detailed job page (with different code to suit each website's format) and saved the job data in a JSON file. I successfully scraped the ASIS&T and ALA websites but could not scrape the SAA website, so my dataset includes jobs from ALA and ASIS&T only. Lastly, I compiled the ASIS&T and ALA jobs into one JSON file (master_job_list.json).
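The final compile step could look roughly like the sketch below. It is not the contents of job_join.py; it assumes each per-site file holds a JSON array of job records, and the function name `join_job_lists` is hypothetical.

```python
import json


def join_job_lists(input_paths, output_path):
    """Concatenate several JSON job lists into one file and return the result."""
    combined = []
    for path in input_paths:
        # Assumption: every input file contains a JSON array of job records.
        with open(path, encoding="utf-8") as f:
            combined.extend(json.load(f))

    # Write the merged list as a single JSON array.
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(combined, f, indent=2)
    return combined
```

Under those assumptions, calling `join_job_lists(["asist_job_list.json", "ala_job_list.json"], "master_job_list.json")` would produce the combined file. Records are appended in input order, with no de-duplication.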

For this project I used four modules:

  1. Requests (An HTTP library that provides a shortcut for making HTTP requests)
  2. Beautiful Soup (Helps with searching and pulling data from websites structured with HTML or XML)
  3. JSON (Used to read and write .json files)
  4. Time (Used to add a delay between executions of the code)
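As a rough illustration of how these four modules can fit together in the URL-scraping step, here is a minimal sketch. It is not the code from the scrape scripts in this repository: the CSS selector `a.job-link`, the function names, and the one-second delay are all assumptions made for the example, and each real site would need its own selector.

```python
import json
import time

import requests
from bs4 import BeautifulSoup


def extract_job_links(html, link_selector="a.job-link"):
    """Return the href of every anchor matching a (hypothetical) CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select(link_selector) if a.has_attr("href")]


def fetch(url, delay_seconds=1.0):
    """Download one page, then pause so consecutive requests are spaced out."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop early on HTTP errors
    time.sleep(delay_seconds)    # assumed delay; tune per site
    return response.text


def save_json(data, path):
    """Write any JSON-serializable object to a file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
```

With a listing-page URL in hand (not shown here), `save_json(extract_job_links(fetch(listing_url)), "ala_urls.json")` would chain the three helpers. Keeping the parsing in its own function means it can be checked against saved HTML without touching the network.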