Use Pydantic #1068

Merged
merged 15 commits into from
Apr 12, 2024

Conversation

jpmckinney
Member

@jpmckinney jpmckinney commented Apr 11, 2024

Builds on #1066

#995

Some differences:

  • Pydantic validates on instantiation (and we opt in to also validate on assignment). This means the full item needs to be instantiated at once, instead of being composed over several statements. That is compatible with how we already build items. The main effect is that a programming error is raised early, rather than once an item reaches the pipeline.
  • Pydantic casts values unless strict is set to true. We use strict where possible, but there is no strict option for dicts, so we add our own validator.
  • There was a weird bug where, if a Pydantic item is returned instead of yielded, some code somewhere iterates over it, which turns it into tuples (iterating over a Pydantic model produces (field name, value) pairs). In any case, we should always yield.
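
To illustrate the three differences above, here is a minimal sketch using Pydantic v2. It is not the code from this PR: `ExampleItem`, its fields, and the `data_must_be_dict` validator are hypothetical names chosen for the example, and the real item classes may be configured differently.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class ExampleItem(BaseModel):
    # Hypothetical item class, for illustration only.
    # strict=True turns off type coercion where Pydantic supports it.
    # validate_assignment=True re-runs validation when an attribute is set.
    model_config = ConfigDict(strict=True, validate_assignment=True)

    file_name: str
    data: dict

    # A "before" validator sees the raw input, so it can reject anything
    # that is not already a dict before Pydantic processes the value.
    @field_validator("data", mode="before")
    @classmethod
    def data_must_be_dict(cls, value: Any) -> Any:
        if not isinstance(value, dict):
            raise ValueError("data must be a dict")
        return value


def build(**kwargs: Any) -> "ExampleItem | None":
    """Return an item, or None if validation fails at instantiation."""
    try:
        return ExampleItem(**kwargs)
    except ValidationError:
        return None


def assignment_is_rejected(item: ExampleItem, name: str, value: Any) -> bool:
    """Return True if assigning value to item.<name> raises ValidationError."""
    try:
        setattr(item, name, value)
    except ValidationError:
        return True
    return False


item = build(file_name="a.json", data={"id": 1})

# Iterating over a model yields (field name, value) tuples, which is
# consistent with a returned (not yielded) item being turned into tuples.
pairs = list(item)
```

In this sketch, `build(file_name="a.json")` returns `None` because `data` is missing, which is the "instantiate the full item at once" constraint: the error surfaces where the item is built, not later in the pipeline.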

@jpmckinney jpmckinney requested a review from yolile April 11, 2024 22:33
@jpmckinney
Member Author

Tests pass and I also ran:

scrapy crawlall --sample 1 --loglevel=ERROR

I temporarily added this to the crawlall command, to skip the spiders that require authentication and the many broken Digiwhist spiders:

if spider_name.startswith(('openopps', 'paraguay')) or spider_name.endswith('digiwhist'):
    continue

@yolile
Member

yolile commented Apr 12, 2024

Ah, nice and clear!

@yolile yolile merged commit 3087eec into main Apr 12, 2024
10 checks passed
@yolile yolile deleted the 995-pydantic branch April 12, 2024 00:13

2 participants