feat(coloctapp): dynamic backscraper #1011

Open
wants to merge 2 commits into
base: main

Conversation

grossir
Contributor

@grossir grossir commented Apr 19, 2024

Helps solve #979

Since old opinions for coloctapp are inside PDFs, this script scrapes the new https://research.coloradojudicial.gov/ search interface, which has a vLex backend.

grossir and others added 2 commits April 19, 2024 14:01
@quevon24 quevon24 self-requested a review June 1, 2024 01:13
Member

@quevon24 quevon24 left a comment

I think the command to fill the gaps is incorrect:

docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.coloctapp --backscrape-start=09/28/2021 --backscrape-end=02/01/2022

The correct command should use the united_states_backscrapers module path and add the --backscrape flag:

docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states_backscrapers.state.coloctapp --backscrape --backscrape-start=09/28/2021 --backscrape-end=02/01/2022
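
The two commands differ in the dotted module path passed to --courts (and in the added --backscrape flag). A quick way to see which path resolves in a given environment is sketched below, assuming juriscraper is installed there. `module_resolves` is a hypothetical helper, not part of juriscraper or CourtListener, and whether each path resolves depends on the branch that is checked out.

```python
import importlib.util


def module_resolves(dotted_path: str) -> bool:
    """Return True if `dotted_path` can be located on the current import path.

    Note: find_spec imports the parent packages of a dotted path as a side effect.
    """
    try:
        return importlib.util.find_spec(dotted_path) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is missing.
        return False


if __name__ == "__main__":
    # The two module paths quoted in the commands above.
    for path in (
        "juriscraper.opinions.united_states.state.coloctapp",
        "juriscraper.opinions.united_states_backscrapers.state.coloctapp",
    ):
        print(path, "->", module_resolves(path))
```

This only checks that the module can be found; it does not confirm that the management command accepts it.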

I tried running the scraper several times, but it always fails with the same error:

Traceback (most recent call last):
  File "/home/quevon24/PycharmProjects/juriscraper/sample_caller.py", line 253, in main
    site.parse()
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/AbstractSite.py", line 145, in parse
    self.__setattr__(attr, getattr(self, f"_get_{attr}")())
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/OpinionSiteLinear.py", line 29, in _get_case_dates
    return [convert_date_string(case["date"]) for case in self.cases]
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/OpinionSiteLinear.py", line 29, in <listcomp>
    return [convert_date_string(case["date"]) for case in self.cases]
KeyError: 'date'

Let me know if you want me to try anything special 👍
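
The traceback shows `_get_case_dates` in OpinionSiteLinear indexing `case["date"]` for every entry in `self.cases`, so the KeyError means at least one case dict was built without a `date` key. Below is a minimal sketch for locating such entries, assuming the cases are plain dicts. The helper name and the sample data are hypothetical and not part of juriscraper.

```python
from typing import Any, Dict, List, Tuple


def cases_missing_key(
    cases: List[Dict[str, Any]], key: str = "date"
) -> List[Tuple[int, Dict[str, Any]]]:
    """Return (index, case) pairs for every case dict that lacks `key`."""
    return [(i, case) for i, case in enumerate(cases) if key not in case]


# Hypothetical sample data standing in for a scraper's `self.cases`.
cases = [
    {"name": "People v. Doe", "date": "2021-09-30"},
    {"name": "People v. Roe"},  # no "date" key: would trigger the KeyError
]

missing = cases_missing_key(cases)
for index, case in missing:
    print(f"case #{index} has no 'date' key: {case}")
```

Running a check like this on the scraper's case list before `parse()` reaches `_get_case_dates` would show which search results come back without a date field.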

Projects
Status: 👀 In review
2 participants