Write scraper for PACER NEF emails that can pull out the links and anything else #381

mlissner · 2021-04-19T21:10:59Z

One of the first things we're going to need to do is start parsing the NEF emails for links and other metadata that we can grab. I'll attach a few example emails in a sec.

mlissner · 2021-04-20T05:11:36Z

OK, getting these emails mostly anonymized took a while, sorry for the delay. Here are two HTML emails (in a single file). I'm told there's also an option for plaintext emails, but I suspect few people actually want that, so I guess we can put it off until we see somebody actually turn it on (we'll get failing examples of those eventually if people are using it).
nef-examples.mbox.txt
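
For the link-extraction part, a minimal sketch using only the standard library might look like the following. The names `LinkCollector` and `extract_links` are hypothetical, not existing juriscraper APIs, and it assumes the links we care about are plain `<a href>` anchors in the HTML part of the message:

```python
import email
from email import policy
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(raw_message):
    """Return all anchor hrefs found in the HTML part of one raw email.

    Hypothetical helper: returns an empty list when the message has no
    HTML part (e.g. a plaintext-only NEF).
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("html",))
    if body is None:
        return []
    collector = LinkCollector()
    collector.feed(body.get_content())
    return collector.links
```

For the attached file, iterating `mailbox.mbox(path)` and passing each message's `as_string()` to `extract_links` should be enough to try this out, assuming the file is a valid mbox.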

mlissner · 2021-04-20T05:16:46Z

As far as fields for these go, I'd start by looking at the field names in the test assets directories, where we have dockets as HTML parsed to JSON. For example, these test fixtures probably have most if not all of the field names you need:

https://github.com/freelawproject/juriscraper/tree/master/tests/examples/pacer/dockets/district

And you can see how those are usually used, here:

https://github.com/freelawproject/juriscraper/blob/master/tests/local/test_DocketParseTest.py#L115
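
As a rough sketch of that pattern (not the code in the linked test), the idea is to run the parser over each example file and compare the result with a sibling JSON file of expected output. `check_fixtures` and its parameters are hypothetical names, and it assumes each example sits next to a `.json` file with the same stem:

```python
import json
from pathlib import Path


def check_fixtures(fixture_dir, parse, suffix=".html"):
    """Run `parse` on each example file and compare with its sibling .json.

    Returns the names of the examples whose parsed output differs from the
    expected JSON, so an empty list means every example matched.
    """
    failures = []
    for example in sorted(Path(fixture_dir).glob(f"*{suffix}")):
        expected_path = example.with_suffix(".json")
        if not expected_path.exists():
            continue  # no expected output recorded for this example yet
        expected = json.loads(expected_path.read_text(encoding="utf-8"))
        actual = parse(example.read_text(encoding="utf-8"))
        if actual != expected:
            failures.append(example.name)
    return failures
```

Here `parse` is whatever callable turns the raw file text into a JSON-compatible dict; for the NEF emails that would be the new parser.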

The parser itself might live in juriscraper.pacer.nef_email.py.

tewen self-assigned this May 14, 2021

tewen mentioned this issue May 14, 2021

NEF email parsers, v1. #384

Merged

tewen closed this as completed Sep 2, 2021

albertisfu mentioned this issue Feb 11, 2022

Hand off recap.email to @albertisfu freelawproject/courtlistener#1901

Closed
