Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Add set serialization to ScrapyJSONEncoder #2058

Merged
merged 1 commit into from Aug 12, 2016

Conversation

dalleng
Copy link
Contributor

@dalleng dalleng commented Jun 17, 2016

Tried using sets a few times as a way to get a list that avoid duplicates in field values. Only to remember that neither Python or Scrapy's encoder can serialize them.

@codecov-io
Copy link

codecov-io commented Jun 17, 2016

Current coverage is 83.47% (diff: 100%)

Merging #2058 into master will increase coverage by <.01%

Powered by Codecov. Last update ec1c615...e17fdd7

@kmike
Copy link
Member

kmike commented Jul 11, 2016

I think this change is fine.

The problem with set serialization is that you get a list back, which has a different time complexity at lookups. In your example after duplicate check will run in O(1) before the change, and in O(N) after. But for items export it looks fine.

@@ -14,7 +14,9 @@ class ScrapyJSONEncoder(json.JSONEncoder):
TIME_FORMAT = "%H:%M:%S"

def default(self, o):
if isinstance(o, datetime.datetime):
if isinstance(o, set):
return list(o)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if ScrapyJSONEncoder.default is applied for each element of a set? E.g. is it possible to serialize a set of datetime.datetime objects? It'd be nice to have a test for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will check that.

@redapple
Copy link
Contributor

hi @dalleng , have you have time to check @kmike 's comment?

@dalleng
Copy link
Contributor Author

dalleng commented Jul 22, 2016

@redapple Yes, just haven't had much spare time recently.

@kmike On a more general sense there's no way to serialize to JSON and get a set back when deserializing, so that's probably why this feature is not already in the json module. I added this PR because using a set to eliminate duplicates is a common python pattern and thinking of items export as the intended use case.

@dalleng
Copy link
Contributor Author

dalleng commented Jul 22, 2016

Updated the PR

s = {'foo'}
ss = ['foo']
dt_set = {dt}
dt_sets = ["2010-01-02 10:11:12"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be [dts]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same. But I'll change to reuse the dts variable.

@kmike kmike changed the title Add set serialization to ScrapyJSONEncoder [MRG+1] Add set serialization to ScrapyJSONEncoder Aug 11, 2016
@redapple redapple merged commit 9a734e6 into scrapy:master Aug 12, 2016
@kmike
Copy link
Member

kmike commented Aug 12, 2016

Thanks @dalleng!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants