Add all "optional" attributes getters to OpinionSiteLinear, and correct failing sources #1028

Open
grossir opened this issue May 9, 2024 · 3 comments

Comments

@grossir
Contributor

grossir commented May 9, 2024

This surfaced in #1020, where the _get_docket_document_numbers getter was needed for the new OpinionSiteLinear scraper to keep the old behavior. However, adding this and other getters to the base OpinionSiteLinear class breaks the tests, because some fields that were supposed to be scraped were never actually passed to the scrape caller: they were present in site.cases[0], but not in site[0]. With the getter change, they are now added to the example tests. For example, in fla:

"disposition": disposition.text_content().strip(),
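To illustrate why a field can exist in site.cases[0] yet be missing from site[0], here is a minimal sketch. It assumes a getter-driven export, which is how the issue describes the behavior; MiniSite, MiniSiteWithGetter, the `exported` list and the sample case are hypothetical names and values made up for this example, not juriscraper's actual code.

```python
class MiniSite:
    """Hypothetical stand-in for a getter-driven scraper base class."""

    # Attributes exported to the caller; each one needs a _get_<name> method.
    exported = ["case_names", "dispositions"]

    def __init__(self, cases):
        self.cases = cases

    def _get_case_names(self):
        return [case["name"] for case in self.cases]

    def __getitem__(self, i):
        item = {}
        for attr in self.exported:
            getter = getattr(self, f"_get_{attr}", None)
            if getter is None:
                continue  # no getter: the field is silently dropped
            item[attr] = getter()[i]
        return item


class MiniSiteWithGetter(MiniSite):
    """Same scraper, with the missing getter implemented."""

    def _get_dispositions(self):
        return [case.get("disposition", "") for case in self.cases]


# Made-up sample data for illustration only.
cases = [{"name": "Doe v. Roe", "disposition": "Affirmed"}]
without_getter = MiniSite(cases)[0]
with_getter = MiniSiteWithGetter(cases)[0]
```

In this sketch, `without_getter` has no "dispositions" entry even though the value is sitting in `cases[0]`, while `with_getter` exposes it. That is the same kind of change that made the example files start picking up "disposition".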

grossir added a commit to grossir/juriscraper that referenced this issue May 10, 2024
Solves freelawproject#1028

All updated example files were picking up the "disposition" field, which matches OpinionCluster.disposition. This was not passed to CL, since the getter was not implemented.
@grossir
Contributor Author

grossir commented May 10, 2024

It may be worth checking freelawproject/courtlistener#4042; maybe we can get rid of some of the unused attributes in this PR.

@grossir
Contributor Author

grossir commented May 13, 2024

I found another type of silent failure for OpinionSiteLinear.
For example, in haw, we collect a "lower_courts" key. However, this won't appear in the final dictionary used in CL, nor in the example file, because the OpinionSiteLinear getter expects a "lower_court" key. So it fails silently:

"lower_courts": lower_court.text_content(),

A similar instance is in mo.py, where both the "judge" and "judges" keys are collected, but only the first one is used:

"judge": author,
"judges": vote.split(".", 1)[1].strip(),

I will write a test to catch this problem of unused keys in OpinionSiteLinear, and will also correct the straightforward ones.
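A small sketch of how a mismatched key can fail without any error. It assumes the getters look up one fixed key per case and fall back to a default when the key is absent; the function names, the fallback behavior and the sample values are assumptions for illustration, not the real OpinionSiteLinear implementation.

```python
def get_lower_courts(cases):
    """Hypothetical getter: reads the singular "lower_court" key."""
    return [case.get("lower_court", "") for case in cases]


def get_judges(cases):
    """Hypothetical getter: reads "judge" and never looks at "judges"."""
    return [case.get("judge", "") for case in cases]


# Made-up case dictionaries mirroring the key names quoted above.
haw_case = {"name": "State v. Doe", "lower_courts": "Circuit Court"}
mo_case = {"name": "Doe v. Roe", "judge": "Smith", "judges": "All concur"}

lower_courts = get_lower_courts([haw_case])  # the "lower_courts" value is lost
judges = get_judges([mo_case])  # only the "judge" value survives
```

Nothing raises here: the scraper collects the data, the getter simply never reads it, and the example file ends up without the field.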

@grossir
Contributor Author

grossir commented May 14, 2024

I have extended AbstractSite._check_sanity for OpinionSiteLinear to check for "wrong" case key strings, that is, keys that were collected by a scraper but not used by OpinionSiteLinear. I found and corrected these:

KeyError: "Invalid key 'author' for case dictionary juriscraper.opinions.united_states.federal_appellate.ca7"
KeyError: "Invalid key 'cite' for case dictionary juriscraper.opinions.united_states.federal_special.armfor"
KeyError: "Invalid key 'lower_courts' for case dictionary juriscraper.opinions.united_states.state.haw"
KeyError: "Invalid key 'judges' for case dictionary juriscraper.opinions.united_states.territories.nmariana"
KeyError: "Invalid key 'dispostion' for case dictionary juriscraper.opinions.united_states.state.mo" (this one is a typo, should be 'disposition')
KeyError: "Invalid key 'judges' for case dictionary juriscraper.opinions.united_states.state.mo"
KeyError: "Invalid key 'judges' for case dictionary juriscraper.opinions.united_states.state.tenn"
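The kind of check that produces errors like the ones above can be sketched as follows. This is not the actual _check_sanity extension: the function name check_case_keys, the VALID_CASE_KEYS allow-list and the module name "example.module" are hypothetical, and the project's real set of accepted keys may differ.

```python
# Hypothetical allow-list; the project's real set of accepted keys may differ.
VALID_CASE_KEYS = {
    "name",
    "url",
    "date",
    "docket",
    "judge",
    "lower_court",
    "disposition",
}


def check_case_keys(cases, module_name):
    """Raise KeyError on the first case key that is not in the allow-list."""
    for case in cases:
        for key in case:
            if key not in VALID_CASE_KEYS:
                raise KeyError(
                    f"Invalid key '{key}' for case dictionary {module_name}"
                )


# Made-up sample cases; the second one misspells "disposition".
good = [{"name": "Doe v. Roe", "disposition": "Affirmed"}]
bad = [{"name": "Doe v. Roe", "dispostion": "Affirmed"}]

check_case_keys(good, "example.module")  # passes without raising

try:
    check_case_keys(bad, "example.module")
    error_message = None
except KeyError as exc:
    error_message = exc.args[0]
```

Failing loudly on an unknown key turns both the plural/singular mix-ups and plain typos into test failures instead of silently dropped fields.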

grossir added a commit to grossir/juriscraper that referenced this issue May 14, 2024
Solves freelawproject#1028

- Fixed scrapers: ca7, armfor, haw, mo, tenn, nmariana, virginislands had invalid keys that were not used by any getter, and corrected their example files
- extend _check_sanity for OpinionSiteLinear to validate key names
- add support for more optional fields: authors, joined_by, per_curiam, Opinion.type