PDF link scraper/tester for Schroedinger's Code paper
LICENSE
README.md
check_links.py
db.py
link_data.py
links.sqlite
process_pdfs.py


ascl:1711.018

About LExTeS

LExTeS: Link Extraction and Testing Suite is a series of scripts written for the paper Schroedinger's Code: A Preliminary Study on Research Source Code Availability and Link Persistence in Astrophysics (ApJS, in press).

Requirements

Python 2.7 with the pyPdf library. The sqlite3 module is also required; it ships with the Python standard library.

Instructions

process_pdfs.py foo/*.pdf scrapes all links from the PDFs in foo/, filters out domains and protocols irrelevant to our paper, and stores the remaining links in links.sqlite, along with information about which paper each link is from.
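The filter-and-store step described above can be sketched as follows. This is an illustration only, not the code in process_pdfs.py: it starts from already-extracted text instead of reading PDFs with pyPdf, it is written against the Python 3 standard library, and the names `ALLOWED_SCHEMES`, `IGNORED_DOMAINS`, `extract_links`, `is_relevant`, `store_links`, and the `links(paper, url)` table layout are all assumptions made for the example.

```python
import re
import sqlite3
from urllib.parse import urlparse

# Hypothetical filter lists; the lists used by the real script are not given in this README.
ALLOWED_SCHEMES = {"http", "https", "ftp"}
IGNORED_DOMAINS = {"doi.org", "arxiv.org"}

# Matches URL-like strings; deliberately simple and not a full URL grammar.
URL_PATTERN = re.compile(r"(?:https?|ftp|mailto):[^\s<>()\[\]]+")


def extract_links(text):
    """Return URL-like strings found in text, with trailing punctuation trimmed."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(text)]


def is_relevant(url):
    """Keep a link only if its protocol is allowed and its domain is not ignored."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    host = (parsed.hostname or "").lower()
    return not any(host == d or host.endswith("." + d) for d in IGNORED_DOMAINS)


def store_links(conn, paper, text):
    """Store the relevant links from one paper's text, remembering which paper they came from."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (paper TEXT, url TEXT, UNIQUE(paper, url))"
    )
    kept = [url for url in extract_links(text) if is_relevant(url)]
    conn.executemany(
        "INSERT OR IGNORE INTO links (paper, url) VALUES (?, ?)",
        [(paper, url) for url in kept],
    )
    conn.commit()
    return kept
```

Calling `store_links(sqlite3.connect("links.sqlite"), "foo/paper1.pdf", page_text)` once per paper would fill the table; the `UNIQUE(paper, url)` constraint keeps a link that is repeated within one paper from being stored twice.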

check_links.py tests each link in the links.sqlite database and stores each result in the checks column of that database.
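A minimal sketch of one checking pass is shown here, assuming a `links` table with `url` and `checks` columns and assuming the check history is kept as a JSON list; the real format of the checks column is not described in this README. The helper names `fetch_status` and `check_all` are hypothetical, and the code targets the Python 3 standard library instead of Python 2.7.

```python
import json
import sqlite3
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or an error label if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except Exception as err:
        # DNS failures, timeouts, refused connections, malformed URLs, and so on.
        return type(err).__name__


def check_all(conn, fetch=fetch_status):
    """Test every stored link once and append the result to its check history."""
    rows = conn.execute("SELECT rowid, url, checks FROM links").fetchall()
    for rowid, url, checks in rows:
        history = json.loads(checks) if checks else []
        history.append(fetch(url))
        conn.execute(
            "UPDATE links SET checks = ? WHERE rowid = ?",
            (json.dumps(history), rowid),
        )
    conn.commit()
```

Appending to a per-link history, instead of overwriting a single value, is what lets repeated runs distinguish links that always work from links that work only some of the time. The `fetch` parameter exists so the loop can be exercised without network access.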

link_data.py displays information about the results of the check_links.py runs: how many links worked consistently, how many only worked in some checks, how many didn't work in any checks, etc.
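The kind of tally described above could be computed roughly as follows. This sketch assumes each link's check history is a JSON list of results and treats only a status of 200 as "worked"; both are assumptions for illustration, and `classify` and `summarize` are hypothetical names, not functions from link_data.py.

```python
import json
from collections import Counter


def classify(history, ok=lambda status: status == 200):
    """Label one link by how many of its checks succeeded."""
    if not history:
        return "unchecked"
    passed = sum(1 for status in history if ok(status))
    if passed == len(history):
        return "always"
    if passed == 0:
        return "never"
    return "sometimes"


def summarize(check_columns):
    """Count links per label, given the raw contents of each link's checks column."""
    return Counter(
        classify(json.loads(checks) if checks else []) for checks in check_columns
    )
```

The input would be the values of the checks column, for example `[row[0] for row in conn.execute("SELECT checks FROM links")]`.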

Citation

If you use this code, please cite as: http://adsabs.harvard.edu/abs/2017ascl.soft11018R