New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use BASE58 for has_secure_token #20133

Open
kenn opened this Issue May 13, 2015 · 12 comments

Comments

Projects
None yet
9 participants
@kenn
Contributor

kenn commented May 13, 2015

With this commit 47316fe , SecureRandom.hex is replaced with SecureRandom.base58 but I'd point out problems with it.

  • By default, MySQL uses case-insensitive collation like utf8_general_ci. Which means, 4kUgL2pdQMSCQtjE and 4KUGL2PDQMSCQTJE are considered the same token even with a unique index.
    • Which not only means that added entropy by case-sensitivity is useless (it is now equivalent to base35), but that could potentially bring in an attack vector.
    • If we want to support any case-sensitive encodings (base58, base62, etc.), we need to specify t.column :token, 'VARCHAR(24) CHARACTER SET utf8 COLLATE utf8_bin' in migrations, which I don't think is realistic.
  • Base58 works best when it is a "public" address that can be read by human and orally spelled. I don't think secure_token is or should be. If the goal is to make the best use of string length, there are less controversial options like Base62 or Base64, but I don't think Base58 makes sense.

I would revert to SecureRandom.hex for compatibility with MySQL.

Let me know if I should create a PR for this.

@rafaelfranca

This comment has been minimized.

Show comment
Hide comment
@matthewd

This comment has been minimized.

Show comment
Hide comment
@matthewd

matthewd May 13, 2015

Member

there are less controversial options like Base62 or Base64, but I don't think Base58 makes sense

Base64 was undesirable because it contains URL characters, as I recall. #18347 (comment) was where we then switched from Base62 to Base58.

How is Base58 more controversial than Base64/Base62?


I'm also confused about how this creates an attack vector. As far as I can see, we end up with less density than we otherwise expect (though still far more than hexadecimal), but that's all: if you're storing it in a case insensitive data store, then you're effectively just using Base35.

However, if you see something I'm missing, please do report it: http://rubyonrails.org/security/

Member

matthewd commented May 13, 2015

there are less controversial options like Base62 or Base64, but I don't think Base58 makes sense

Base64 was undesirable because it contains URL characters, as I recall. #18347 (comment) was where we then switched from Base62 to Base58.

How is Base58 more controversial than Base64/Base62?


I'm also confused about how this creates an attack vector. As far as I can see, we end up with less density than we otherwise expect (though still far more than hexadecimal), but that's all: if you're storing it in a case insensitive data store, then you're effectively just using Base35.

However, if you see something I'm missing, please do report it: http://rubyonrails.org/security/

@kenn

This comment has been minimized.

Show comment
Hide comment
@kenn

kenn May 13, 2015

Contributor

Base35 gives you 123 bits of entropy instead of 140 bits of Base58 when stored in MySQL. For a feature like has_secure_token, bit strength should be a part of the specification, not something that "depends where you store it". Plus it is misleading as most MySQL users aren't even aware that they are using case-insensitive collation and they would be surprised when they learn 4kUgL2pdQMSCQtjE is the same thing as 4KUGL2PDQMSCQTJE.

Let me reiterate my point that case sensitive string key in general is problematic on MySQL unless you carefully configure otherwise, so Base58 / Base62 / Base64 are all in the same camp. I would recommend hex (96 bit), but if we really want to get the most out of it while maintaining case insensitivity, Base36 (124 bit) would be an option.

Contributor

kenn commented May 13, 2015

Base35 gives you 123 bits of entropy instead of 140 bits of Base58 when stored in MySQL. For a feature like has_secure_token, bit strength should be a part of the specification, not something that "depends where you store it". Plus it is misleading as most MySQL users aren't even aware that they are using case-insensitive collation and they would be surprised when they learn 4kUgL2pdQMSCQtjE is the same thing as 4KUGL2PDQMSCQTJE.

Let me reiterate my point that case sensitive string key in general is problematic on MySQL unless you carefully configure otherwise, so Base58 / Base62 / Base64 are all in the same camp. I would recommend hex (96 bit), but if we really want to get the most out of it while maintaining case insensitivity, Base36 (124 bit) would be an option.

@rails-bot

This comment has been minimized.

Show comment
Hide comment
@rails-bot

rails-bot Aug 25, 2015

This issue has been automatically marked as stale because it has not been commented on for at least
three months.

The resources of the Rails team are limited, and so we are asking for your help.

If you can still reproduce this error on the 4-2-stable, 4-1-stable branches or on master,
please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

This issue has been automatically marked as stale because it has not been commented on for at least
three months.

The resources of the Rails team are limited, and so we are asking for your help.

If you can still reproduce this error on the 4-2-stable, 4-1-stable branches or on master,
please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

@rails-bot rails-bot added the stale label Aug 25, 2015

@kuboon

This comment has been minimized.

Show comment
Hide comment
@kuboon

kuboon Nov 22, 2015

@kenn

4kUgL2pdQMSCQtjE and 4KUGL2PDQMSCQTJE are considered the same token

It's not true. even in "_ci" collation, where clause is always case sensitive.
"_ci" collation only affect to LIKE and GROUP BY keywords.

kuboon commented Nov 22, 2015

@kenn

4kUgL2pdQMSCQtjE and 4KUGL2PDQMSCQTJE are considered the same token

It's not true. even in "_ci" collation, where clause is always case sensitive.
"_ci" collation only affect to LIKE and GROUP BY keywords.

@kenn

This comment has been minimized.

Show comment
Hide comment
@kenn

kenn Nov 23, 2015

Contributor

@kuboon @chancancode have you guys actually tested it?

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | utf8mb4_general_ci |
| collation_server     | utf8mb4_general_ci |
+----------------------+--------------------+
CREATE TABLE `collations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `token` varchar(128) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `index_collations_on_token` (`token`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO collations (token) VALUES ('someToken');

SELECT * FROM collations WHERE token = 'someToken'; # => found
SELECT * FROM collations WHERE token = 'someTOKEN'; # => found
SELECT * FROM collations WHERE BINARY token = 'someTOKEN'; # => no record

DROP TABLE `collations`;

Notice the use of BINARY to make it behave as case sensitive. It will change the behavior of the query optimizer - still use the index, but the performance will suffer as you'll have to scan a wide range of b-tree blocks (or the entire index, even!) instead of simply traversing the tree down to a leaf node.

mysql> explain SELECT * FROM collations WHERE token = 'someTOKEN';
+----+-------------+------------+-------+---------------------------+---------------------------+---------+-------+------+-------------+
| id | select_type | table      | type  | possible_keys             | key                       | key_len | ref   | rows | Extra       |
+----+-------------+------------+-------+---------------------------+---------------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | collations | const | index_collations_on_token | index_collations_on_token | 514     | const |    1 | Using index |
+----+-------------+------------+-------+---------------------------+---------------------------+---------+-------+------+-------------+

mysql> explain SELECT * FROM collations WHERE BINARY token = 'someTOKEN';
+----+-------------+------------+-------+---------------+---------------------------+---------+------+------+--------------------------+
| id | select_type | table      | type  | possible_keys | key                       | key_len | ref  | rows | Extra                    |
+----+-------------+------------+-------+---------------+---------------------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | collations | index | NULL          | index_collations_on_token | 514     | NULL |    1 | Using where; Using index |
+----+-------------+------------+-------+---------------+---------------------------+---------+------+------+--------------------------+

In any case, how you select a record is irrelevant with my argument - my point is that the value space is significantly smaller on MySQL.

In other words, the real problem is that you can't insert a new record with different cases as it's considered as collision, which kills the whole point of using case-sensitive encodings like BASE58.

INSERT INTO collations (token) VALUES ('someToken');
INSERT INTO collations (token) VALUES ('someTOKEN'); # => Duplicate error
Contributor

kenn commented Nov 23, 2015

@kuboon @chancancode have you guys actually tested it?

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | utf8mb4_general_ci |
| collation_server     | utf8mb4_general_ci |
+----------------------+--------------------+
CREATE TABLE `collations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `token` varchar(128) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `index_collations_on_token` (`token`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO collations (token) VALUES ('someToken');

SELECT * FROM collations WHERE token = 'someToken'; # => found
SELECT * FROM collations WHERE token = 'someTOKEN'; # => found
SELECT * FROM collations WHERE BINARY token = 'someTOKEN'; # => no record

DROP TABLE `collations`;

Notice the use of BINARY to make it behave as case sensitive. It will change the behavior of the query optimizer - still use the index, but the performance will suffer as you'll have to scan a wide range of b-tree blocks (or the entire index, even!) instead of simply traversing the tree down to a leaf node.

mysql> explain SELECT * FROM collations WHERE token = 'someTOKEN';
+----+-------------+------------+-------+---------------------------+---------------------------+---------+-------+------+-------------+
| id | select_type | table      | type  | possible_keys             | key                       | key_len | ref   | rows | Extra       |
+----+-------------+------------+-------+---------------------------+---------------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | collations | const | index_collations_on_token | index_collations_on_token | 514     | const |    1 | Using index |
+----+-------------+------------+-------+---------------------------+---------------------------+---------+-------+------+-------------+

mysql> explain SELECT * FROM collations WHERE BINARY token = 'someTOKEN';
+----+-------------+------------+-------+---------------+---------------------------+---------+------+------+--------------------------+
| id | select_type | table      | type  | possible_keys | key                       | key_len | ref  | rows | Extra                    |
+----+-------------+------------+-------+---------------+---------------------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | collations | index | NULL          | index_collations_on_token | 514     | NULL |    1 | Using where; Using index |
+----+-------------+------------+-------+---------------+---------------------------+---------+------+------+--------------------------+

In any case, how you select a record is irrelevant with my argument - my point is that the value space is significantly smaller on MySQL.

In other words, the real problem is that you can't insert a new record with different cases as it's considered as collision, which kills the whole point of using case-sensitive encodings like BASE58.

INSERT INTO collations (token) VALUES ('someToken');
INSERT INTO collations (token) VALUES ('someTOKEN'); # => Duplicate error
@kuboon

This comment has been minimized.

Show comment
Hide comment
@kuboon

kuboon Nov 24, 2015

@chancancode sorry, my comment was wrong. I think this issue should be opened again.

kuboon commented Nov 24, 2015

@chancancode sorry, my comment was wrong. I think this issue should be opened again.

@chancancode chancancode reopened this Nov 24, 2015

@rafaelfranca rafaelfranca added pinned and removed stale labels Dec 23, 2015

@rafaelfranca

This comment has been minimized.

Show comment
Hide comment
Member

rafaelfranca commented Dec 23, 2015

@sblackstone

This comment has been minimized.

Show comment
Hide comment
@sblackstone

sblackstone Jan 5, 2016

Contributor

Perhaps another idea would be to add a new primitive column type for secure tokens, e.g.

create_table :users do
  t.secure_token :token
end

Then per database support this column type correctly and have AR be aware of the correct way to do lookups.

Contributor

sblackstone commented Jan 5, 2016

Perhaps another idea would be to add a new primitive column type for secure tokens, e.g.

create_table :users do
  t.secure_token :token
end

Then per database support this column type correctly and have AR be aware of the correct way to do lookups.

@kenn

This comment has been minimized.

Show comment
Hide comment
@kenn

kenn Jan 6, 2016

Contributor

@sblackstone I think it unnecessarily confuses people who don't use MySQL, as it's just a regular string column on other databases. IMO database-dependent settings should be an option on the column method, like :precision or :scale, like so:

create_table :users do
  t.string :token, case_sensitive: true
end

=> `token` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL

I think the feature is very useful for other use cases, too, but realistically for Rails 5, I would suggest revert to SecureRandom.hex for now. When we are ready, we can easily enhance this feature to support BASE58 when we have the case_sensitive option as it's upper compatible. (BASE58 ⊃ Hex)

Contributor

kenn commented Jan 6, 2016

@sblackstone I think it unnecessarily confuses people who don't use MySQL, as it's just a regular string column on other databases. IMO database-dependent settings should be an option on the column method, like :precision or :scale, like so:

create_table :users do
  t.string :token, case_sensitive: true
end

=> `token` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL

I think the feature is very useful for other use cases, too, but realistically for Rails 5, I would suggest revert to SecureRandom.hex for now. When we are ready, we can easily enhance this feature to support BASE58 when we have the case_sensitive option as it's upper compatible. (BASE58 ⊃ Hex)

@sudhirj

This comment has been minimized.

Show comment
Hide comment
@sudhirj

sudhirj May 8, 2016

I'd actually suggest turning on case sensitivity on all MySQL statements - this really is the default expected behaviour for everyone except those who are particularly tuned to and aware of MySQL problems. Having the columns case sensitive by default will probably avoid other gotchas as well.

sudhirj commented May 8, 2016

I'd actually suggest turning on case sensitivity on all MySQL statements - this really is the default expected behaviour for everyone except those who are particularly tuned to and aware of MySQL problems. Having the columns case sensitive by default will probably avoid other gotchas as well.

@kenn

This comment has been minimized.

Show comment
Hide comment
@kenn

kenn May 9, 2016

Contributor

@sudhirj I'm sentimentally 80% in agreement with you, but there's always a catch.

On PostgreSQL, you would do this to create a case-insensitive unique index on email:

CREATE UNIQUE INDEX index_users_on_lower_email ON users (LOWER(email))

But on MySQL, as its default is case-insensitive, t.index :email, unique: true has been working just fine. If we change the default behavior, there's going to be a backward problem - since MySQL doesn't have indexes on expressions, we suddenly need to change the collation on email column to a case-insensitive one.

Also, see #20000 as to why changing the default collation is a bad idea. Collation is much more than case sensitivity - CJKV languages would be screwed if it's done half-assed. I'd recommend to read Unicode Collation Algorithm while you're interested in this topic.

Contributor

kenn commented May 9, 2016

@sudhirj I'm sentimentally 80% in agreement with you, but there's always a catch.

On PostgreSQL, you would do this to create a case-insensitive unique index on email:

CREATE UNIQUE INDEX index_users_on_lower_email ON users (LOWER(email))

But on MySQL, as its default is case-insensitive, t.index :email, unique: true has been working just fine. If we change the default behavior, there's going to be a backward problem - since MySQL doesn't have indexes on expressions, we suddenly need to change the collation on email column to a case-insensitive one.

Also, see #20000 as to why changing the default collation is a bad idea. Collation is much more than case sensitivity - CJKV languages would be screwed if it's done half-assed. I'd recommend to read Unicode Collation Algorithm while you're interested in this topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment