dcd is failing with InsanityException #1019

Closed
sentry-io bot opened this issue May 4, 2024 · 1 comment

Comments

sentry-io bot commented May 4, 2024

Sentry Issue: COURTLISTENER-702

InsanityException: juriscraper.opinions.united_states.federal_district.dcd: Scraped meta data fields have differing lengths: {'case_dates': 619, 'case_names': 620, 'download_urls': 619, 'precedential_statuses': 620, 'blocked_statuses': 620, 'date_filed_is_approximate': 620, 'docket_document_numbers': 619, 'docket_numbers': 619, 'judges': 619, 'nature_of_suit': 619, 'case_name_shorts': 620}
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 387, in handle
    self.parse_and_scrape_site(mod, options["full_crawl"])
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 350, in parse_and_scrape_site
    site = mod.Site().parse()
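
For readers unfamiliar with this error, the message describes parallel lists that no longer line up. The following is a minimal sketch of that kind of consistency check, not the project's actual implementation: `InsanityException` here is a local stand-in class and `check_lengths` is a hypothetical helper. The counts are copied from the report above.

```python
class InsanityException(Exception):
    """Local stand-in for the exception named in the report (assumption)."""


def check_lengths(fields):
    """Raise if the per-field lists do not all have the same length.

    Hypothetical helper: it only illustrates the invariant that the
    error message describes.
    """
    lengths = {name: len(values) for name, values in fields.items()}
    if len(set(lengths.values())) > 1:
        raise InsanityException(
            f"Scraped meta data fields have differing lengths: {lengths}"
        )
    return lengths


# Counts taken from the Sentry report; the list contents are placeholders.
reported = {
    "case_dates": 619,
    "case_names": 620,
    "download_urls": 619,
    "docket_numbers": 619,
}
fields = {name: [None] * count for name, count in reported.items()}

try:
    check_lengths(fields)
except InsanityException as exc:
    print(exc)
```

Because each field is collected independently, a single stray value in one column (here `case_names`) is enough to trip the check for the whole page.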

The problem is caused by this line, which creates an extra case name:

<td>Civil Action No. 2014-1904<br>NYAMBAL v. ALLIED BARTON SECURITY SERVICES, LLC (<b><font color="red">LEAVE OF COURT REQUIRED FOR FILINGS. </font></b>)</td>
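
One plausible way this cell yields an extra name is sketched below, assuming the scraper selects the cell's direct text nodes (the actual selector is not shown in this issue). The inline `<b><font>` markup splits the case name, so the cell has three direct text nodes instead of the usual two. `CellText` is a hypothetical helper built on the standard library's `html.parser`, used here only to avoid extra dependencies.

```python
from html.parser import HTMLParser

CELL = (
    "<td>Civil Action No. 2014-1904<br>"
    "NYAMBAL v. ALLIED BARTON SECURITY SERVICES, LLC "
    '(<b><font color="red">LEAVE OF COURT REQUIRED FOR FILINGS. </font></b>)</td>'
)


class CellText(HTMLParser):
    """Collect a cell's text two ways: direct text nodes, and <br>-separated lines."""

    def __init__(self):
        super().__init__()
        self.direct = []       # text nodes that are direct children of <td>
        self.segments = [[]]   # all text fragments, grouped per <br>-separated line
        self.depth = 0         # nesting depth below <td>

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.segments.append([])   # a <br> starts a new visual line
        elif tag != "td":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in ("td", "br"):
            self.depth -= 1

    def handle_data(self, data):
        self.segments[-1].append(data)
        if self.depth == 0 and data.strip():
            self.direct.append(data.strip())


parser = CellText()
parser.feed(CELL)
parser.close()

# Naive: treat every direct text node after the docket number as a case name.
naive_names = parser.direct[1:]

# More robust: join all fragments on each line and normalize whitespace.
lines = [" ".join("".join(parts).split()) for parts in parser.segments]

print(len(naive_names))  # 2 candidate names for a single row
print(lines[1])
```

Joining the fragments per `<br>`-separated line gives exactly one docket line and one name line for the row, regardless of inline markup inside the name.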

Filed by @grossir

grossir added a commit to grossir/juriscraper that referenced this issue May 4, 2024
Solves freelawproject#1019

Updating base class to OpinionSiteLinear makes code shorter and cleaner, also solves the following bug:

InsanityException caused by unexpected extra line of text in case name cell: "LEAVE OF COURT REQUIRED FOR FILINGS.", which inflated the case_name count by 1
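
To illustrate why a row-at-a-time design avoids this class of bug, here is a minimal sketch. It is not the code from the commit and does not use juriscraper: `build_cases` and the input shape are hypothetical, and the second row is made-up placeholder data. The idea is that each table row produces exactly one record, so per-field counts cannot drift apart.

```python
def build_cases(rows):
    """Build one record per table row (hypothetical helper).

    Each row contributes exactly one dict, so the number of names always
    equals the number of dockets, however many text fragments a cell holds.
    """
    cases = []
    for row in rows:
        # Join every fragment of the name cell, then normalize whitespace.
        name = " ".join("".join(row["name_parts"]).split())
        cases.append({"docket": row["docket"], "name": name})
    return cases


rows = [
    {
        "docket": "Civil Action No. 2014-1904",
        # Fragments as they appear in the problematic cell from this issue.
        "name_parts": [
            "NYAMBAL v. ALLIED BARTON SECURITY SERVICES, LLC (",
            "LEAVE OF COURT REQUIRED FOR FILINGS. ",
            ")",
        ],
    },
    # Placeholder row, invented purely for illustration.
    {"docket": "Civil Action No. 0000-0000", "name_parts": ["EXAMPLE v. PLACEHOLDER"]},
]

cases = build_cases(rows)

# Columns derived from the records are equal in length by construction.
columns = {key: [case[key] for case in cases] for key in ("docket", "name")}
print({key: len(values) for key, values in columns.items()})
```

With column-at-a-time extraction, each field is a separate query over the whole page; with per-row records, a malformed cell can only affect its own row.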
grossir added a commit to grossir/juriscraper that referenced this issue May 6, 2024
Solves freelawproject#1019

Updating base class to OpinionSiteLinear makes code shorter and cleaner, also solves the following bug:

InsanityException caused by unexpected extra line of text in case name cell: "LEAVE OF COURT REQUIRED FOR FILINGS.", which inflated the case_name count by 1
Contributor

grossir commented Jun 12, 2024

This is working properly now, but we have a gap from the time it took us to fix the error.
Closing this issue; will track the gap in #929.

@grossir grossir closed this as completed Jun 12, 2024