New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

define global var + remove set() [slow] #45

Merged
merged 1 commit into from Feb 29, 2016

Conversation

Projects
None yet
3 participants
@sciunto

sciunto commented Jan 17, 2016

See #44

This makes some speed improvements on datetimes.

@@ -13,6 +13,12 @@
'm': -12, 'y': 12,
'met': 1, 'mest': 2,
}
daynames = ('mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun')

This comment has been minimized.

@swistakm

swistakm Feb 23, 2016

How does using tuple instead of set makes the performance better?

@swistakm

swistakm Feb 23, 2016

How does using tuple instead of set makes the performance better?

This comment has been minimized.

@sciunto

sciunto Feb 23, 2016

When set() is called, the content is transformed (potentially) to make each element unique. A tuple does not do that.
Since daynames is supposed to be immutable, a tuple is more appropriated here.

@sciunto

sciunto Feb 23, 2016

When set() is called, the content is transformed (potentially) to make each element unique. A tuple does not do that.
Since daynames is supposed to be immutable, a tuple is more appropriated here.

This comment has been minimized.

@swistakm

swistakm Feb 23, 2016

Elements in daynames are already unique and since it moved to the global namespace the set still would be better. It is only used for checking if a date part is contained in this collection (line 45) so for tuple this would mean unnecessary iteration because it is still sequence. In case of miss this is a scan over whole tuple content - O(n). Sets in Python allow for fast lookups and efficient membership testing using in operator - O(1). See PythonSpeed as a reference.

I assume that redundant set creation on every call was reducing the performance but once it is moved to the global scope there is a substantial gain of using it instead of list or tuple. Mutability is not a problem here. Also, you can move further and use its immutable counterpart - frozenset. I don't think if it can give any time performance advantage here but for sure it is more memory efficient because it does not require excessive overallocation of underlying hash table structure.

@swistakm

swistakm Feb 23, 2016

Elements in daynames are already unique and since it moved to the global namespace the set still would be better. It is only used for checking if a date part is contained in this collection (line 45) so for tuple this would mean unnecessary iteration because it is still sequence. In case of miss this is a scan over whole tuple content - O(n). Sets in Python allow for fast lookups and efficient membership testing using in operator - O(1). See PythonSpeed as a reference.

I assume that redundant set creation on every call was reducing the performance but once it is moved to the global scope there is a substantial gain of using it instead of list or tuple. Mutability is not a problem here. Also, you can move further and use its immutable counterpart - frozenset. I don't think if it can give any time performance advantage here but for sure it is more memory efficient because it does not require excessive overallocation of underlying hash table structure.

This comment has been minimized.

@sciunto

sciunto Feb 23, 2016

Ok, we can just move the declaration outside the function if you prefer.

@sciunto

sciunto Feb 23, 2016

Ok, we can just move the declaration outside the function if you prefer.

@kurtmckee kurtmckee merged commit 6f54b5f into kurtmckee:develop Feb 29, 2016

@kurtmckee

This comment has been minimized.

Show comment
Hide comment
@kurtmckee

kurtmckee Feb 29, 2016

Owner

François, thanks for this patch! I added you to the list of contributors and modified the daynames to use a set. Mutability is irrelevant because monkey patching is possible (something that feedparser used to do, breaking other libraries in larger applications that depended on sgmllib). =)

Owner

kurtmckee commented Feb 29, 2016

François, thanks for this patch! I added you to the list of contributors and modified the daynames to use a set. Mutability is irrelevant because monkey patching is possible (something that feedparser used to do, breaking other libraries in larger applications that depended on sgmllib). =)

@sciunto

This comment has been minimized.

Show comment
Hide comment
@sciunto

sciunto Feb 29, 2016

Great! Thanks.

sciunto commented Feb 29, 2016

Great! Thanks.

@sciunto sciunto deleted the sciunto:datetimes branch Feb 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment