Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Add set serialization to ScrapyJSONEncoder #2058

Merged
merged 1 commit into from Aug 12, 2016

Conversation

@dalleng
Copy link
Contributor

@dalleng dalleng commented Jun 17, 2016

Tried using sets a few times as a way to get a list that avoid duplicates in field values. Only to remember that neither Python or Scrapy's encoder can serialize them.

@codecov-io
Copy link

@codecov-io codecov-io commented Jun 17, 2016

Current coverage is 83.47% (diff: 100%)

Merging #2058 into master will increase coverage by <.01%

Powered by Codecov. Last update ec1c615...e17fdd7

@kmike
Copy link
Member

@kmike kmike commented Jul 11, 2016

I think this change is fine.

The problem with set serialization is that you get a list back, which has a different time complexity at lookups. In your example after duplicate check will run in O(1) before the change, and in O(N) after. But for items export it looks fine.

@@ -14,7 +14,9 @@ class ScrapyJSONEncoder(json.JSONEncoder):
TIME_FORMAT = "%H:%M:%S"

def default(self, o):
if isinstance(o, datetime.datetime):
if isinstance(o, set):
return list(o)

This comment has been minimized.

@kmike

kmike Jul 11, 2016
Member

Do you know if ScrapyJSONEncoder.default is applied for each element of a set? E.g. is it possible to serialize a set of datetime.datetime objects? It'd be nice to have a test for that.

This comment has been minimized.

@dalleng

dalleng Jul 22, 2016
Author Contributor

Will check that.

@redapple
Copy link
Contributor

@redapple redapple commented Jul 22, 2016

hi @dalleng , have you have time to check @kmike 's comment?

@dalleng
Copy link
Contributor Author

@dalleng dalleng commented Jul 22, 2016

@redapple Yes, just haven't had much spare time recently.

@kmike On a more general sense there's no way to serialize to JSON and get a set back when deserializing, so that's probably why this feature is not already in the json module. I added this PR because using a set to eliminate duplicates is a common python pattern and thinking of items export as the intended use case.

@dalleng dalleng force-pushed the dalleng:serialize_set branch from b1078f8 to 336e28f Jul 22, 2016
@dalleng
Copy link
Contributor Author

@dalleng dalleng commented Jul 22, 2016

Updated the PR

s = {'foo'}
ss = ['foo']
dt_set = {dt}
dt_sets = ["2010-01-02 10:11:12"]

This comment has been minimized.

@kmike

kmike Jul 22, 2016
Member

shouldn't it be [dts]?

This comment has been minimized.

@dalleng

dalleng Jul 22, 2016
Author Contributor

It's the same. But I'll change to reuse the dts variable.

@dalleng dalleng force-pushed the dalleng:serialize_set branch from 336e28f to e17fdd7 Jul 22, 2016
@kmike kmike changed the title Add set serialization to ScrapyJSONEncoder [MRG+1] Add set serialization to ScrapyJSONEncoder Aug 11, 2016
@redapple redapple merged commit 9a734e6 into scrapy:master Aug 12, 2016
2 checks passed
2 checks passed
codecov/patch 100% of diff hit (target 100%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@kmike
Copy link
Member

@kmike kmike commented Aug 12, 2016

Thanks @dalleng!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.