Scraping for Library Jobs

For this project, I used Python to attempt to scrape job listings from three popular professional associations' websites:

Society of American Archivists (SAA)
American Library Association (ALA)
Association for Information Science and Technology (ASIS&T)

I first scraped each website for links to detailed job postings and stored the URLs in a JSON file. Then I scraped each detailed job page (with different code to suit each website's format) and saved this data in a JSON file. I successfully scraped the ASIS&T and ALA websites but couldn't scrape the SAA website, so my dataset only includes jobs from the latter two associations. Lastly, I compiled the ASIS&T and ALA jobs into one JSON file (master_job_list.json).
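The final compile step can be sketched roughly as follows. This is a minimal illustration, not the code in job_join.py: it assumes each per-site file holds a JSON list of job records (dictionaries), and the function names and the added "source" field are hypothetical.

```python
import json


def load_jobs(path):
    """Read a JSON file that is assumed to hold a list of job records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def merge_job_lists(paths, source_labels):
    """Combine several job lists, tagging each record with where it came from."""
    merged = []
    for path, label in zip(paths, source_labels):
        for job in load_jobs(path):
            # Build a new dict so the loaded record is not modified.
            # The "source" key is a hypothetical addition for illustration.
            merged.append({**job, "source": label})
    return merged


def save_jobs(jobs, path):
    """Write the combined list to a single JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(jobs, f, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    jobs = merge_job_lists(
        ["ala_job_list.json", "asist_job_list.json"],
        ["ALA", "ASIS&T"],
    )
    save_jobs(jobs, "master_job_list.json")
```

Tagging each record with its source is one way to keep the combined file traceable back to the site it was scraped from.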

For this project I used four modules:

Requests (An HTTP library that simplifies making HTTP requests)
Beautiful Soup (Helps with searching and pulling data from HTML or XML documents)
JSON (Used to read and write JSON data and .json files)
Time (Used to add a pause between executions of the scraping code)
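As a rough sketch of how these four modules can work together in the URL-collection step, consider the example below. It is an illustration under stated assumptions, not the code in this repository: the CSS selector "a.job-link", the function names, and the two-second delay are all hypothetical, and each real site needs its own selector. The standard-library urllib.parse helper is an extra used here only to turn relative links into absolute URLs.

```python
import json
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_job_links(html, base_url, css_selector="a.job-link"):
    """Return absolute URLs for every matching link in one listing page.

    The default selector is a placeholder, not taken from any real site.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.select(css_selector):
        href = anchor.get("href")
        if href:
            # Resolve relative links such as "/jobs/1" against the page URL.
            links.append(urljoin(base_url, href))
    return links


def collect_job_urls(listing_urls, delay_seconds=2.0):
    """Fetch each listing page, gather job links, and pause between requests."""
    all_links = []
    for url in listing_urls:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # Stop early on HTTP errors.
        all_links.extend(extract_job_links(response.text, url))
        time.sleep(delay_seconds)  # Be polite to the server.
    return all_links


def save_urls(urls, path):
    """Store the collected URLs as a JSON list."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(urls, f, indent=2)
```

Keeping the parsing in its own function means it can be checked against a saved HTML snippet without making any network requests.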

Repository files:

.gitignore
README.md
ala_job_list.json
ala_page_scrape.py
ala_url_scrape.py
ala_urls.json
asist_job_list.json
asist_linkedin_urls.json
asist_page_scrape.py
asist_url_scrape.py
asist_urls.json
job_join.py
master_job_list.json
saa_job_list.json
saa_page_scrape.py
saa_url_scrape.py
saa_urls.json

emillikendetro/LibraryJobs
