define global var + remove set() [slow] #45

merged 1 commit into from Feb 29, 2016


None yet

3 participants

sciunto commented Jan 17, 2016

See #44

This makes some speed improvements on datetimes.

@swistakm swistakm commented on the diff Feb 23, 2016
@@ -13,6 +13,12 @@
'm': -12, 'y': 12,
'met': 1, 'mest': 2,
+daynames = ('mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun')
swistakm Feb 23, 2016

How does using tuple instead of set makes the performance better?

sciunto Feb 23, 2016

When set() is called, the content is transformed (potentially) to make each element unique. A tuple does not do that.
Since daynames is supposed to be immutable, a tuple is more appropriated here.

swistakm Feb 23, 2016

Elements in daynames are already unique and since it moved to the global namespace the set still would be better. It is only used for checking if a date part is contained in this collection (line 45) so for tuple this would mean unnecessary iteration because it is still sequence. In case of miss this is a scan over whole tuple content - O(n). Sets in Python allow for fast lookups and efficient membership testing using in operator - O(1). See PythonSpeed as a reference.

I assume that redundant set creation on every call was reducing the performance but once it is moved to the global scope there is a substantial gain of using it instead of list or tuple. Mutability is not a problem here. Also, you can move further and use its immutable counterpart - frozenset. I don't think if it can give any time performance advantage here but for sure it is more memory efficient because it does not require excessive overallocation of underlying hash table structure.

sciunto Feb 23, 2016

Ok, we can just move the declaration outside the function if you prefer.

@kurtmckee kurtmckee merged commit 6f54b5f into kurtmckee:develop Feb 29, 2016

François, thanks for this patch! I added you to the list of contributors and modified the daynames to use a set. Mutability is irrelevant because monkey patching is possible (something that feedparser used to do, breaking other libraries in larger applications that depended on sgmllib). =)

sciunto commented Feb 29, 2016

Great! Thanks.

@sciunto sciunto deleted the sciunto:datetimes branch Feb 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment