New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching database structure has primary key at wrong field #2163

Closed
creativedutchmen opened this Issue Aug 13, 2014 · 12 comments

Comments

Projects
None yet
3 participants
@creativedutchmen
Member

creativedutchmen commented Aug 13, 2014

The current tbl_cache table has an id, hash, creation, expiry and data field.

The id here is completely redundant, as it's the hash that's used as the identifier in the database caching provider.

For this reason the primary key should have been on the hash, not on the id. Going even further, the id field could be completely removed - although there may be people using it for some reason, of course.

@brendo

This comment has been minimized.

Show comment
Hide comment
@brendo

brendo Aug 14, 2014

Member

Nice find. Thankfully we still have a KEY on the hash field so I don't think performance will be affected too much... infact, I think using the int(14) will be more performant than caching a VARCHAR field :)

Member

brendo commented Aug 14, 2014

Nice find. Thankfully we still have a KEY on the hash field so I don't think performance will be affected too much... infact, I think using the int(14) will be more performant than caching a VARCHAR field :)

@creativedutchmen

This comment has been minimized.

Show comment
Hide comment
@creativedutchmen

creativedutchmen Aug 14, 2014

Member

Thankfully we still have a KEY on the hash field

Yes, but it does not enforce uniqueness. And therefore it is useless as an identifier. The point here is not really performance but semantics. We are using the hash field as our primary key (all lookups and references are using the hash) so it should also be the primary key.

infact, I think using the int(14) will be more performant than caching a VARCHAR field :)

If we actually used the id, sure. Now the id's index is just using memory for nothing, as we still use the VARCHAR field for all our lookups.

Member

creativedutchmen commented Aug 14, 2014

Thankfully we still have a KEY on the hash field

Yes, but it does not enforce uniqueness. And therefore it is useless as an identifier. The point here is not really performance but semantics. We are using the hash field as our primary key (all lookups and references are using the hash) so it should also be the primary key.

infact, I think using the int(14) will be more performant than caching a VARCHAR field :)

If we actually used the id, sure. Now the id's index is just using memory for nothing, as we still use the VARCHAR field for all our lookups.

@brendo

This comment has been minimized.

Show comment
Hide comment
@brendo

brendo Aug 14, 2014

Member

Both very good points.

Member

brendo commented Aug 14, 2014

Both very good points.

@nitriques

This comment has been minimized.

Show comment
Hide comment
@nitriques

nitriques Aug 15, 2014

Member

I would also add that adding a hash index on that primary key would also speed things up. I know a lot more on that topic on Oracle DB, but I think the same can be acheived with mysql (http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html)

Normally indexes are B+ trees. Records are then stored on this randomly but can be filtered (>, <, =) easily since the index records are physically stored (on disk) in order. Filtering now cost O(n log n) instead of O(n).

But hashes are a special case. If you never need to sort them and only need the equality filter, you can now stored them using their hash as the location pointer on disk and achieve reads of O(1).

I will have to read more on mysql in order to find that but I think this would bring faster reads.

Member

nitriques commented Aug 15, 2014

I would also add that adding a hash index on that primary key would also speed things up. I know a lot more on that topic on Oracle DB, but I think the same can be acheived with mysql (http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html)

Normally indexes are B+ trees. Records are then stored on this randomly but can be filtered (>, <, =) easily since the index records are physically stored (on disk) in order. Filtering now cost O(n log n) instead of O(n).

But hashes are a special case. If you never need to sort them and only need the equality filter, you can now stored them using their hash as the location pointer on disk and achieve reads of O(1).

I will have to read more on mysql in order to find that but I think this would bring faster reads.

@brendo

This comment has been minimized.

Show comment
Hide comment
@brendo

brendo Aug 15, 2014

Member

Epic! That level of insight would be incredible. Look forward to what you
discover! I wonder what real world impact this change would make!
On 15 Aug 2014 11:50, "Nicolas Brassard" notifications@github.com wrote:

I would also add that adding a hash index on that primary key would also
speed things up. I know a lot more on that topic on Oracle DB, but I think
the same can be acheived with mysql (
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html)

Normally indexes are B+-trees. Records are then stored on this randomly
but can be filtered (>, <, =) easiely since they are physically stored (on
disk) in order. Filtering now cost O(n log n) instead of O(n).

But hashes are a special case. If you never need to sort them and only
need the equality filter, you can now stored them usign their hash as the
location pointer on disk an acheive reads of O(1).

I will have to read more on mysql in order to find that but I think this
would bring faster reads.


Reply to this email directly or view it on GitHub
#2163 (comment)
.

Member

brendo commented Aug 15, 2014

Epic! That level of insight would be incredible. Look forward to what you
discover! I wonder what real world impact this change would make!
On 15 Aug 2014 11:50, "Nicolas Brassard" notifications@github.com wrote:

I would also add that adding a hash index on that primary key would also
speed things up. I know a lot more on that topic on Oracle DB, but I think
the same can be acheived with mysql (
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html)

Normally indexes are B+-trees. Records are then stored on this randomly
but can be filtered (>, <, =) easiely since they are physically stored (on
disk) in order. Filtering now cost O(n log n) instead of O(n).

But hashes are a special case. If you never need to sort them and only
need the equality filter, you can now stored them usign their hash as the
location pointer on disk an acheive reads of O(1).

I will have to read more on mysql in order to find that but I think this
would bring faster reads.


Reply to this email directly or view it on GitHub
#2163 (comment)
.

@nitriques

This comment has been minimized.

Show comment
Hide comment
@nitriques

nitriques Aug 16, 2014

Member

Well after digging a little bit, seems like hash indexes are only supported in MEMORY/HEAP engine storage (see http://dev.mysql.com/doc/refman/5.0/en/create-index.html). But mysql does physically stores the data according to its primary key order in innoDB

From: http://dev.mysql.com/doc/refman/5.0/en/create-index.html

With the InnoDB storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns.

I did not found any information regarding MyISAM tho.

Member

nitriques commented Aug 16, 2014

Well after digging a little bit, seems like hash indexes are only supported in MEMORY/HEAP engine storage (see http://dev.mysql.com/doc/refman/5.0/en/create-index.html). But mysql does physically stores the data according to its primary key order in innoDB

From: http://dev.mysql.com/doc/refman/5.0/en/create-index.html

With the InnoDB storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns.

I did not found any information regarding MyISAM tho.

@brendo

This comment has been minimized.

Show comment
Hide comment
@brendo

brendo Aug 16, 2014

Member

Well if the cache is storing non permanent data, I wonder if we should
switch it to a MEMORY or better table type?
On 16 Aug 2014 10:50, "Nicolas Brassard" notifications@github.com wrote:

Well after digging a little bit, seems like hash indexes are only
supported in MEMORY/HEAP engine storage (see
http://dev.mysql.com/doc/refman/5.0/en/create-index.html). But mysql does
physically stores the data according to its primary key order in innoDB

From: http://dev.mysql.com/doc/refman/5.0/en/create-index.html

With the InnoDB storage engine, the table data is physically organized to
do ultra-fast lookups and sorts based on the primary key column or columns.

I did not found any information regarding MyISAM tho.


Reply to this email directly or view it on GitHub
#2163 (comment)
.

Member

brendo commented Aug 16, 2014

Well if the cache is storing non permanent data, I wonder if we should
switch it to a MEMORY or better table type?
On 16 Aug 2014 10:50, "Nicolas Brassard" notifications@github.com wrote:

Well after digging a little bit, seems like hash indexes are only
supported in MEMORY/HEAP engine storage (see
http://dev.mysql.com/doc/refman/5.0/en/create-index.html). But mysql does
physically stores the data according to its primary key order in innoDB

From: http://dev.mysql.com/doc/refman/5.0/en/create-index.html

With the InnoDB storage engine, the table data is physically organized to
do ultra-fast lookups and sorts based on the primary key column or columns.

I did not found any information regarding MyISAM tho.


Reply to this email directly or view it on GitHub
#2163 (comment)
.

@nitriques

This comment has been minimized.

Show comment
Hide comment
@nitriques

nitriques Aug 19, 2014

Member

Well if the cache is storing non permanent data, I wonder if we should
switch it to a MEMORY or better table type?

Cache can grow very large ... I wonder how many bytes it would take to stored my biggest cache table... I check on that.

But on the other hand, we would loose any information when rebooting the server... This may be considered both a feature and bug at the same time 😄

Member

nitriques commented Aug 19, 2014

Well if the cache is storing non permanent data, I wonder if we should
switch it to a MEMORY or better table type?

Cache can grow very large ... I wonder how many bytes it would take to stored my biggest cache table... I check on that.

But on the other hand, we would loose any information when rebooting the server... This may be considered both a feature and bug at the same time 😄

@nitriques

This comment has been minimized.

Show comment
Hide comment
@nitriques

nitriques Aug 19, 2014

Member

...and I am always mixed up on what part of the system uses what cache provider. Can't wait to look at what @creativedutchmen started...

Member

nitriques commented Aug 19, 2014

...and I am always mixed up on what part of the system uses what cache provider. Can't wait to look at what @creativedutchmen started...

@brendo

This comment has been minimized.

Show comment
Hide comment
@brendo

brendo Aug 19, 2014

Member

Its easy. Symphony has no core caching :)
On 19 Aug 2014 11:03, "Nicolas Brassard" notifications@github.com wrote:

...and I am always mixed up on what part of the system uses what cache
provider. Can't wait to look at what @creativedutchmen
https://github.com/creativedutchmen started...


Reply to this email directly or view it on GitHub
#2163 (comment)
.

Member

brendo commented Aug 19, 2014

Its easy. Symphony has no core caching :)
On 19 Aug 2014 11:03, "Nicolas Brassard" notifications@github.com wrote:

...and I am always mixed up on what part of the system uses what cache
provider. Can't wait to look at what @creativedutchmen
https://github.com/creativedutchmen started...


Reply to this email directly or view it on GitHub
#2163 (comment)
.

@brendo

This comment has been minimized.

Show comment
Hide comment
@brendo

brendo Nov 30, 2014

Member

Fixed. A Unique key constraint has been added to the hash column. In 3.0 we can drop the id column if required.

Member

brendo commented Nov 30, 2014

Fixed. A Unique key constraint has been added to the hash column. In 3.0 we can drop the id column if required.

@brendo brendo closed this Nov 30, 2014

@brendo brendo self-assigned this Nov 30, 2014

@brendo brendo added this to the 2.6.0 milestone Nov 30, 2014

@nitriques

This comment has been minimized.

Show comment
Hide comment
@nitriques

nitriques Nov 30, 2014

Member

Thanks Brendan!

Member

nitriques commented Nov 30, 2014

Thanks Brendan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment