feat(mass, massappct): backscraper for masscases.com #1001

Open
wants to merge 2 commits into
base: main

grossir
Contributor

@grossir grossir commented Apr 16, 2024

Helps solve #984

@grossir
Contributor Author

grossir commented Apr 16, 2024

I have put the backscraper in the same file as the scraper, even though it targets a different source. After finishing it, I actually think the backscraper should be in its own file, because it targets a different site and uses its own extract_from_text. What do you think? Maybe juriscraper/opinions/united_states/state/mass_backscraper.py?

I normally wouldn't use juriscraper/opinions/united_states_backscrapers/mass.py since it is awkward to have two folders for what is basically the same thing (scraping opinions past or present), but in this case it seems it could also be a proper place for the scripts...

@mlissner
Member

One of the reasons I liked the united_states_backscrapers directory is that you can put one-off scripts in there and not worry too much if they go stale or stop working.

@flooie
Contributor

flooie commented Jul 2, 2024

I tested this today and it crashed:

Traceback (most recent call last):
  File "/Users/Palin/Code/juriscraper/sample_caller.py", line 246, in main
    for site in site_yielder(
  File "/Users/Palin/Code/juriscraper/juriscraper/lib/importer.py", line 79, in site_yielder
    site._download_backwards(i)
  File "/Users/Palin/Code/juriscraper/juriscraper/opinions/united_states_backscrapers/state/mass.py", line 103, in _download_backwards
    self._process_html()
  File "/Users/Palin/Code/juriscraper/juriscraper/opinions/united_states_backscrapers/state/mass.py", line 55, in _process_html
    _, date_filed_str, name = row.xpath("td/text()")
ValueError: too many values to unpack (expected 3)

This happened on the live site after a few iterations, when the backscraper reached this page:

2024-07-02 12:24:55,029 - INFO: Now downloading case page at: http://masscases.com/425-449.html
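
The error comes from unpacking `row.xpath("td/text()")` into exactly three names. Below is a minimal sketch of one way to make that step tolerant. It assumes the failing rows simply yield more than three text nodes and that the extra nodes belong to the case name. The markup of the failing page was not inspected here, so that is a guess. `parse_row_texts` is a hypothetical helper, not part of juriscraper, and the sample strings are placeholders.

```python
from typing import List, Optional, Tuple


def parse_row_texts(texts: List[str]) -> Optional[Tuple[str, str]]:
    """Return (date_filed_str, name) from a row's text nodes, or None.

    Hypothetical helper. Assumes the first non-empty node is ignored,
    the second is the date string, and every remaining node is part
    of the case name.
    """
    # Drop whitespace-only nodes, which often appear between tags.
    cleaned = [t.strip() for t in texts if t.strip()]
    if len(cleaned) < 3:
        # Not enough cells to form a case row; let the caller skip it.
        return None
    # Starred unpacking absorbs any extra text nodes into the name.
    _, date_filed_str, *name_parts = cleaned
    return date_filed_str, " ".join(name_parts)


# Placeholder data: a row with four text nodes instead of three.
row_texts = ["CITE", "DATE", "NAME PART 1 ", "NAME PART 2"]
result = parse_row_texts(row_texts)
```

In `_process_html`, the call could take `row.xpath("td/text()")` as its argument and skip rows where it returns `None`. Whether skipping is right, or whether such rows should be logged, depends on what the page actually contains.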


3 participants