-
-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(OpinionSiteLinear): implement all optional attribute getters #1029
fix(OpinionSiteLinear): implement all optional attribute getters #1029
Conversation
Solves freelawproject#1028 All updated example files were picking up the "disposition" field, which matches OpinionCluster.disposition . This was not passed to CL, since the getter was not implemented
Solves freelawproject#1028 - Fixed scrapers: ca7, armfor, haw, mo, tenn, nmariana, virginislands had invalid keys that were not used by any getter, and corrected their example files - extend _check_sanity for OpinionSiteLinear to validate key names - add support for more optional fields: authors, joined_by, per_curiam, Opinion.type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failing:
ERROR: test_scrape_opinion_state_example_files (test_ScraperExampleTest.ScraperExampleTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/juriscraper/juriscraper/tests/local/test_ScraperExampleTest.py", line 201, in test_scrape_opinion_state_example_files
self.run_tests_on_module_str(
File "/home/runner/work/juriscraper/juriscraper/tests/local/test_ScraperExampleTest.py", line 96, in run_tests_on_module_str
site.parse()
File "/home/runner/work/juriscraper/juriscraper/juriscraper/AbstractSite.py", line 154, in parse
self._check_sanity()
File "/home/runner/work/juriscraper/juriscraper/juriscraper/OpinionSiteLinear.py", line 161, in _check_sanity
raise KeyError(
KeyError: "Invalid key 'divisions' for case dictionary juriscraper.opinions.united_states.state.calctapp_1st"
@quevon24 tests are passing now, please give it another check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks great.
The implementation of the new getters is correct
All the updated scrapers are correct (i tested one by one, there are not too many)
All the updated examples works correctly (test suite passes successfully)
I think the implementation of the _check_sanity method was a good idea. This makes it easier to identify if there are any implementation errors in the OpinionSiteLinear type scraper.
I think it's ready to merge. Do you have any feedback @flooie?
@@ -6,6 +6,7 @@ | |||
"precedential_statuses": "Published", | |||
"blocked_statuses": false, | |||
"date_filed_is_approximate": false, | |||
"docket_document_numbers": "456", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we have this field now but im not where it's used in our db model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that problem is bigger than this single ticket, I am just trying to standardize what already exists.
@@ -7,8 +7,14 @@ | |||
"blocked_statuses": false, | |||
"date_filed_is_approximate": false, | |||
"docket_numbers": "", | |||
"judges": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we get judges here and judge elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a single return interface, defined by AbstractSite. We should be getting "judges" as a key everywhere, because Courtlistener never uses a "judge" key
Apart from that, it shouldn't return a list, but I wasn't checking values, only checking key names. The list type comes from way back
2ec36d8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, so this is a mistake (presumably I made) here but it shouldn't affect this pr
@@ -7,6 +7,7 @@ | |||
"blocked_statuses": false, | |||
"date_filed_is_approximate": false, | |||
"docket_numbers": "SCT-CIV-2022-0046", | |||
"judges": "Hodge, Rhys S.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and how is this a string and the other an array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my answer to your comment on nmariana_example.compare.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by some of the name changes here and how they are implemented. Can you explain some of the nuance here.
I answered to your questions, which I think mostly have to do with other problems. In this PR I am just trying to standardize naming and enforce that standard, so that we don't return names that won't be used anyway, which makes it seem we collect data when we don't I think this PR should be merged if it fullfils that purpose, and the related problems be addressed somewhere else, otherwise we will try to solve everything at once but that usually takes a lot more time |
Thanks @grossir - looks like this is going to make us stronger. |
self.cases.append( | ||
{ | ||
"url": item["link"], | ||
"docket": docket, | ||
"date": item["published"], | ||
"name": name, | ||
"status": "Published", | ||
"author": item["summary"].split("{")[1].split("}")[0], | ||
"judge": author, | ||
"author": author, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
author!!!
Solves #1028