Update the pacer_url property on PACER documents to use show_case_doc URLs #774

Closed
mlissner opened this issue Nov 20, 2017 · 1 comment

Two things here. Per @johnhawkinson's discovery in freelawproject/recap#214, we have a way of looking up a doc1 ID from a document_number/attachment_number/case_number triplet.

We can use this to:

  1. Update every document we currently have to add the doc1 ID. This will be a marvelous improvement for the historical data from before doc1 IDs were prevalent.

  2. Tweak the pacer_url property so that if we don't have the pacer_doc_id value, we can take the user directly to the document URL instead of taking them to the docket, as we do presently.

Item 2 will be easy. Item 1 will take a bit of work, but should be fairly easy too.
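
The fallback described in item 2 could look roughly like the sketch below. This is not CourtListener's actual code: the function name, the parameter names, and in particular the comma-separated query layout of the show_case_doc URL are assumptions made for illustration. The thread only says that the link takes a document number, attachment number, and case number.

```python
from typing import Optional


def build_pacer_url(
    court_id: str,
    pacer_case_id: str,
    document_number: int,
    attachment_number: Optional[int] = None,
    pacer_doc_id: Optional[str] = None,
) -> str:
    """Return the most specific PACER URL we can build for a document.

    Hypothetical helper: names and URL layouts are illustrative assumptions.
    """
    base = f"https://ecf.{court_id}.uscourts.gov"
    if pacer_doc_id:
        # We already know the doc1 ID, so link straight to the document.
        return f"{base}/doc1/{pacer_doc_id}"
    # No doc1 ID: fall back to show_case_doc rather than the docket.
    # ASSUMPTION: the order of the comma-separated fields is a guess.
    attachment = "" if attachment_number is None else str(attachment_number)
    return (
        f"{base}/cgi-bin/show_case_doc?"
        f"{document_number},{pacer_case_id},{attachment}"
    )
```

The point of the design is that a missing pacer_doc_id no longer drops the user at the docket; they land on a URL that identifies the specific document.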

mlissner added a commit to freelawproject/juriscraper that referenced this issue Dec 8, 2017
PACER has a way of taking a document number and case number and getting a doc1 URL in response.
This code lands an API to use that system as carefully as possible.

freelawproject/courtlistener#774
mlissner added a commit that referenced this issue Dec 8, 2017
pacer_doc_ids are available via a link we recently discovered called show_case_doc. The input for
the link is the document number, attachment number, and case number, and in return it gives you the
pacer_doc_id.

The code in this commit sets us up to get about 3.5M of those IDs that we're currently missing.
Partially addresses #774

mlissner commented Dec 8, 2017

OK, there's a mega scrape now happening to get the pacer_doc_id values that we're currently lacking. That'll only work on non-bankruptcy PACER courts, but it'll still be a huge improvement.

Any item that lacks a pacer_doc_id is now updated to have a better URL (this works even on bankruptcy courts, it's just that the bankruptcy courts don't do the nice, scrapable redirection we get in normal district courts).
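
As a rough illustration of why the district-court redirect is "scrapable": if the show_case_doc request redirects to a doc1 URL, the pacer_doc_id can be read out of the redirect target. The helper below is a hypothetical sketch, not the juriscraper API, and it assumes the redirect target contains a `/doc1/<digits>` path segment.

```python
import re
from typing import Optional

# ASSUMPTION: the redirect target looks like ".../doc1/<digits>".
DOC1_RE = re.compile(r"/doc1/(\d+)")


def doc1_id_from_redirect(location: Optional[str]) -> Optional[str]:
    """Extract a pacer_doc_id from a redirect URL, or return None.

    `location` is the value of the redirect's Location header. Courts
    that do not redirect (bankruptcy courts, per the comment above)
    simply yield None here.
    """
    if not location:
        return None
    match = DOC1_RE.search(location)
    return match.group(1) if match else None
```

The ID is returned as a string rather than an integer so that any leading zeros are preserved.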
