Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(OpinionSiteLinear): implement all optional attribute getters #1029

Merged

Conversation

grossir
Copy link
Contributor

@grossir grossir commented May 10, 2024

Solves #1028

  • Some updated example files were picking up the "disposition" field, which matches OpinionCluster.disposition . This was not passed to CL, since the getter was not implemented
  • Fixed scrapers: ca7, armfor, haw, mo, tenn, nmariana, virginislands had invalid keys that were not used by any getter, and corrected their example files
  • extend _check_sanity for OpinionSiteLinear to validate key names
  • add support for more optional fields: authors, joined_by, per_curiam, Opinion.type

Solves freelawproject#1028

All updated example files were picking up the "disposition" field, which matches OpinionCluster.disposition . This was not passed to CL, since the getter was not implemented
@grossir grossir requested a review from flooie May 10, 2024 01:17
Solves freelawproject#1028

- Fixed scrapers: ca7, armfor, haw, mo, tenn, nmariana, virginislands had invalid keys that were not used by any getter, and corrected their example files
- extend _check_sanity for OpinionSiteLinear to validate key names
- add support for more optional fields: authors, joined_by, per_curiam, Opinion.type
@grossir grossir requested a review from quevon24 June 8, 2024 02:22
Copy link
Member

@quevon24 quevon24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing:


ERROR: test_scrape_opinion_state_example_files (test_ScraperExampleTest.ScraperExampleTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/juriscraper/juriscraper/tests/local/test_ScraperExampleTest.py", line 201, in test_scrape_opinion_state_example_files
    self.run_tests_on_module_str(
  File "/home/runner/work/juriscraper/juriscraper/tests/local/test_ScraperExampleTest.py", line 96, in run_tests_on_module_str
    site.parse()
  File "/home/runner/work/juriscraper/juriscraper/juriscraper/AbstractSite.py", line 154, in parse
    self._check_sanity()
  File "/home/runner/work/juriscraper/juriscraper/juriscraper/OpinionSiteLinear.py", line 161, in _check_sanity
    raise KeyError(
KeyError: "Invalid key 'divisions' for case dictionary juriscraper.opinions.united_states.state.calctapp_1st"

@grossir
Copy link
Contributor Author

grossir commented Jun 12, 2024

@quevon24 tests are passing now, please give it another check

@grossir grossir requested a review from quevon24 June 12, 2024 18:40
Copy link
Member

@quevon24 quevon24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks great.

The implementation of the new getters is correct
All the updated scrapers are correct (i tested one by one, there are not too many)
All the updated examples works correctly (test suite passes successfully)

I think the implementation of the _check_sanity method was a good idea. This makes it easier to identify if there are any implementation errors in the OpinionSiteLinear type scraper.

I think it's ready to merge. Do you have any feedback @flooie?

@@ -6,6 +6,7 @@
"precedential_statuses": "Published",
"blocked_statuses": false,
"date_filed_is_approximate": false,
"docket_document_numbers": "456",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we have this field now but im not where it's used in our db model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that problem is bigger than this single ticket, I am just trying to standardize what already exists.

@@ -7,8 +7,14 @@
"blocked_statuses": false,
"date_filed_is_approximate": false,
"docket_numbers": "",
"judges": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we get judges here and judge elsewhere

Copy link
Contributor Author

@grossir grossir Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a single return interface, defined by AbstractSite. We should be getting "judges" as a key everywhere, because Courtlistener never uses a "judge" key

Apart from that, it shouldn't return a list, but I wasn't checking values, only checking key names. The list type comes from way back
2ec36d8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, so this is a mistake (presumably I made) here but it shouldn't affect this pr

@@ -7,6 +7,7 @@
"blocked_statuses": false,
"date_filed_is_approximate": false,
"docket_numbers": "SCT-CIV-2022-0046",
"judges": "Hodge, Rhys S.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and how is this a string and the other an array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my answer to your comment on nmariana_example.compare.json

Copy link
Contributor

@flooie flooie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by some of the name changes here and how they are implemented. Can you explain some of the nuance here.

@grossir
Copy link
Contributor Author

grossir commented Jul 3, 2024

I answered to your questions, which I think mostly have to do with other problems. In this PR I am just trying to standardize naming and enforce that standard, so that we don't return names that won't be used anyway, which makes it seem we collect data when we don't

I think this PR should be merged if it fullfils that purpose, and the related problems be addressed somewhere else, otherwise we will try to solve everything at once but that usually takes a lot more time

@grossir grossir requested a review from flooie July 3, 2024 22:49
@flooie
Copy link
Contributor

flooie commented Jul 8, 2024

Thanks @grossir - looks like this is going to make us stronger.

self.cases.append(
{
"url": item["link"],
"docket": docket,
"date": item["published"],
"name": name,
"status": "Published",
"author": item["summary"].split("{")[1].split("}")[0],
"judge": author,
"author": author,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

author!!!

@flooie flooie merged commit 0f75250 into freelawproject:main Jul 8, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants