define global var + remove set() [slow] #45

Merged
merged 1 commit into from Feb 29, 2016

Projects

None yet

3 participants

@sciunto
sciunto commented Jan 17, 2016

See #44

This makes some speed improvements on datetimes.

@swistakm swistakm commented on the diff Feb 23, 2016
feedparser/datetimes/rfc822.py
@@ -13,6 +13,12 @@
'm': -12, 'y': 12,
'met': 1, 'mest': 2,
}
+daynames = ('mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun')
@swistakm
swistakm Feb 23, 2016

How does using tuple instead of set makes the performance better?

@sciunto
sciunto Feb 23, 2016

When set() is called, the content is transformed (potentially) to make each element unique. A tuple does not do that.
Since daynames is supposed to be immutable, a tuple is more appropriated here.

@swistakm
swistakm Feb 23, 2016

Elements in daynames are already unique and since it moved to the global namespace the set still would be better. It is only used for checking if a date part is contained in this collection (line 45) so for tuple this would mean unnecessary iteration because it is still sequence. In case of miss this is a scan over whole tuple content - O(n). Sets in Python allow for fast lookups and efficient membership testing using in operator - O(1). See PythonSpeed as a reference.

I assume that redundant set creation on every call was reducing the performance but once it is moved to the global scope there is a substantial gain of using it instead of list or tuple. Mutability is not a problem here. Also, you can move further and use its immutable counterpart - frozenset. I don't think if it can give any time performance advantage here but for sure it is more memory efficient because it does not require excessive overallocation of underlying hash table structure.

@sciunto
sciunto Feb 23, 2016

Ok, we can just move the declaration outside the function if you prefer.

@kurtmckee kurtmckee merged commit 6f54b5f into kurtmckee:develop Feb 29, 2016
@kurtmckee
Owner

François, thanks for this patch! I added you to the list of contributors and modified the daynames to use a set. Mutability is irrelevant because monkey patching is possible (something that feedparser used to do, breaking other libraries in larger applications that depended on sgmllib). =)

@sciunto
sciunto commented Feb 29, 2016

Great! Thanks.

@sciunto sciunto deleted the sciunto:datetimes branch Feb 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment