[MRG+1] Add set serialization to ScrapyJSONEncoder #2058
Conversation
Current coverage is 83.47% (diff: 100%)
I think this change is fine. The problem with set serialization is that you get a list back, which has a different time complexity for lookups. In your example, a duplicate check will run in O(1) before the change and in O(n) after. But for items export it looks fine.
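To illustrate the complexity point above (a sketch, not code from the PR): once a set has passed through JSON it comes back as a list, and membership tests on a list are linear scans rather than hash lookups.

```python
import json

# A set gives average O(1) membership tests via hashing.
seen = {"a", "b", "c"}

# JSON has no set type, so after a round trip we only get a list back,
# where `x in restored` is an O(n) scan.
restored = json.loads(json.dumps(sorted(seen)))

print(type(restored).__name__)   # list
print("b" in seen, "b" in restored)  # same answer, different cost
```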
@@ -14,7 +14,9 @@ class ScrapyJSONEncoder(json.JSONEncoder):
    TIME_FORMAT = "%H:%M:%S"

    def default(self, o):
        if isinstance(o, datetime.datetime):
        if isinstance(o, set):
            return list(o)
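The hunk above only shows a fragment of the encoder. A minimal self-contained sketch of the same idea (the class name and the exact datetime format here are assumptions, not Scrapy's actual code):

```python
import datetime
import json

class SetFriendlyEncoder(json.JSONEncoder):
    # Hypothetical stand-in for ScrapyJSONEncoder: convert sets to
    # lists so the base encoder can handle them, format datetimes,
    # and defer everything else to the base class.
    DATE_FORMAT = "%Y-%m-%d"
    TIME_FORMAT = "%H:%M:%S"

    def default(self, o):
        if isinstance(o, set):
            return list(o)
        if isinstance(o, datetime.datetime):
            return o.strftime(f"{self.DATE_FORMAT} {self.TIME_FORMAT}")
        return super().default(o)

# Single-element set, so the output order is deterministic.
print(json.dumps({"tags": {"a"}}, cls=SetFriendlyEncoder))
# {"tags": ["a"]}
```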
Do you know if ScrapyJSONEncoder.default is applied for each element of a set? E.g. is it possible to serialize a set of datetime.datetime objects? It'd be nice to have a test for that.
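For what it's worth, `json.JSONEncoder` does re-encode whatever `default` returns, so a set of datetimes works: the set becomes a list, and each element is then passed through `default` again. A small demo of that behavior (the encoder here is a sketch, not the PR's code):

```python
import datetime
import json

class Enc(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, set):
            return list(o)  # the encoder then visits each element
        if isinstance(o, datetime.datetime):
            return o.strftime("%Y-%m-%d %H:%M:%S")
        return super().default(o)

dt = datetime.datetime(2010, 1, 2, 10, 11, 12)
print(json.dumps({dt}, cls=Enc))
# ["2010-01-02 10:11:12"]
```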
Will check that.
@redapple Yes, I just haven't had much spare time recently. @kmike In a more general sense, there's no way to serialize a set to JSON and get a set back when deserializing, which is probably why this feature isn't already in the json module. I added this PR because using a set to eliminate duplicates is a common Python pattern, with items export as the intended use case.
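The asymmetry described above can be shown in a couple of lines: serialization is lossy with respect to the container type, so callers who need a set back must rebuild it themselves.

```python
import json

# JSON has no set type, so a serialized set always deserializes as a
# list; restoring the set is the caller's responsibility.
original = {"a"}
restored = json.loads(json.dumps(list(original)))
print(restored)        # ["a"]
print(set(restored) == original)  # True
```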
Updated the PR
s = {'foo'}
ss = ['foo']
dt_set = {dt}
dt_sets = ["2010-01-02 10:11:12"]
shouldn't it be [dts]?
It's the same, but I'll change it to reuse the dts variable.
Thanks @dalleng!
Tried using sets a few times as a way to get a list that avoids duplicates in field values, only to remember that neither Python's json module nor Scrapy's encoder could serialize them.