Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: rewrite Data Types #14505

Merged
merged 159 commits into from Nov 6, 2022
Merged

feat: rewrite Data Types #14505

merged 159 commits into from Nov 6, 2022

Conversation

ephys
Copy link
Member

@ephys ephys commented May 13, 2022

PR Description

This PR migrates DataTypes to TypeScript.
It continues the work that was started in the previous DataType PRs by @allawesome497

Closes #14240 - @allawesome497's refactor
Closes #14239 - @allawesome497's refactor
Closes #14238 - @allawesome497's refactor
Closes #14220 - @allawesome497's refactor
Closes #14081 - @allawesome497's refactor
Closes #13981 - @allawesome497's refactor
Closes #13970 - @allawesome497's refactor
Closes #13799 - @allawesome497's refactor
Closes #12941 - initial refactor
Closes #14438
Closes #9786
Closes #8356
Closes #8963
Closes #13065
Closes #11828
Closes #11550
Closes #11266
Closes #11177
Closes #10946
Closes #10493
Closes #10383
Closes #8702
Closes #7930
Closes #10982

I'm doing the whole migration in one PR.

Notable Changes

  • There is a lot less special casing. A lot of logic has been moved to DataTypes, including ENUM name generation, validation, and escaping.
  • Model validation now runs .validate on all of its DataTypes instead of just a single hardcoded one for strings.
  • Dialect-specific DataType implementations are not exported anymore. Users cannot access DataTypes.postgres.JSON directly.
  • DataTypes are now fully responsible for properly escaping values that are injected in queries. This has been done because:
  • Typing has been simplified. DataTypes only need to specify their AcceptedType. This was done because the previous 3 types added a lot of complexity with little to no benefit over just using unknown. Especially because we can't guarantee what we will be receiving from connectors. It's better to use unknown and be defensive when treating the input.
  • Removed the escape key altogether. All DataTypes must properly escape their values themselves. No magic.
  • Removed JSONTYPE. It was just an alias for JSON, and should not be used.
  • Removed NUMERIC, it was an alias for DECIMAL
  • Removed DOUBLE PRECISION, it was an alias for DOUBLE
  • DataTypes can only be assigned to one and only one attribute. This was done because the DataType receives some extra context, such as on which attribute it is being used. By doing this, DataTypes.ENUM is able to generate its SQL type name.
  • DataType integration tests have been rewritten https://github.com/sequelize/sequelize/tree/ephys/ts-data-types/test/integration/data-types
  • Added support for TypedArrays in binary (blob, binary string) datatypes
  • Added support for specifying the precision in DataTypes.TIME
  • Fixed SQL injection vulnerabilities in Geometry/Geography datatypes in MySQL/MariaDB
  • Deprecated DataTypes.REAL. DataTypes.FLOAT is always single-precision floating point. DataTypes.DOUBLE is always double-precision floating point. FLOAT & REAL are redundant, but we did not normalize so users needed all 3:
    image
  • Added basic JSON support to all dialects. Basic means that the JSON value will be serialized/deserialized into a text column. No query support.
  • Added ability to specify precision of DataTypes.DATE in mssql & postgres

Merge Squash Commit Message Body (for breaking changes)

Don't forget the empty line at the start of the body


This is a major rewrite of DataTypes. See PR https://github.com/sequelize/sequelize/pull/14505 to learn more.
Added support for specifying a precision in `DataTypes.TIME`
BREAKING CHANGE: The DataTypes API has changed. Custom DataTypes will need to be rewritten.
BREAKING CHANGE: `DataTypes.ENUM`: Specifying the list of values through `{ type: DataTypes.ENUM, values: ['a', 'b'] }` is no longer supported. Use `{ type: DataTypes.ENUM(['a', 'b']) }` instead.
BREAKING CHANGE: Int DataTypes will reject number values if they are outside of the safe integer range. Use JS bigints or strings for big values.
BREAKING CHANGE: The `typeValidation` option is now `noTypeValidation` and is false by default, meaning type validation of values passed to `Model.create` and `Model.update` are now checked.
BREAKING CHANGE: `DataTypes.NUMERIC` has been removed, use `DataTypes.DECIMAL`.
BREAKING CHANGE: `DataTypes.NUMBER` has been removed (no alternative, internal).
BREAKING CHANGE: `DataTypes['DOUBLE PRECISION']` has been removed, use `DataTypes.DOUBLE`.
BREAKING CHANGE: `DataTypes.JSONTYPE` has been removed, use `DataTypes.JSON`.
BREAKING CHANGE: In raw queries that do not specify the JS DataType of a column, Dates are now returned as ISO 8601 strings instead of `Date` objects. This has been done in preparation for the arrival of Temporal.
BREAKING CHANGE: `DataTypes.BOOLEAN` only accepts JS booleans. Strings, numbers, etc… will no longer be cast to boolean.
BREAKING CHANGE: `DataTypes.STRING`, `DataTypes.CITEXT`, `DataTypes.TEXT`, and `DataTypes.CHAR` will only accept strings. Buffers & numbers will not be stringified anymore.
BREAKING CHANGE: `DataTypes.DECIMAL` with scale & precision left unspecified is now interpreted as an unconstrained decimal, and throws in dialects that do not support unconstrained decimals.
BREAKING CHANGE: BigInt attributes will be returned as strings instead of JavaScript numbers, because they cannot be safely represented using that type.
BREAKING CHANGE: `DataTypes.FLOAT(precision)` is no longer accepted. `FLOAT` is now always be single-precision floating point. `DOUBLE` is now always be double-precision floating point.
BREAKING CHANGE: `DataTypes.JSON` & `DataTypes.JSONB` throw in dialects that do not support JSON.
BREAKING CHANGE: `DataTypes.CHAR.BINARY` and `DataTypes.STRING.BINARY` now mean "chars with a binary collation" and throw in dialects that do not support collations.
BREAKING CHANGE: `DataTypes.FLOAT` now maps to `REAL` instead of `FLOAT` in all dialects except SQLite and MySQL
BREAKING CHANGE: **mssql** - `DataTypes.UUID` now maps to `UNIQUEIDENTIFIER` instead of `CHAR(36)`.

Some explanations for above breaking change

  • DataTypes have completely been rewritten. The goal is to have a stable API that is TypeScript friendly.
  • To reduce special-casing, and improve typing, I removed the ability to specify values outside of ENUM. i.e. the following is no longer possible:
    sequelize.define('User', {
      status: {
        type: DataTypes.ENUM,
        values: ['active', 'inactive'],
      }
    });
    instead, it must be written like this:
    sequelize.define('User', {
      status: { 
        type: DataTypes.ENUM(['active', 'inactive']) 
      },
    });
  • Dates in raw queries are returned as a string, not a Date object if no JS DataType is specified.
    • This was changed because more than one JS DataType could map to the same Database DataType. e.g. DataTypes.DATE, DataTypes.DATETIME.ZONED would both map to the Postgres DataType timestamptz. We do a minimal amount of parsing based on the DB type, and if a JS DataType is specified, do extra parsing (such as converting to a Date object for DataTypes.DATE, and a temporal object for DataTypes.DATETIME.ZONED)
  • DataTypes.DECIMAL has very different meaning from one dialect to the other. In MySQL, it's defaulted to DataTypes.DECIMAL(10, 0) (meaning a 10 digit number, with no decimal part). In postgres, it means "unconstrained". There are no way to have an unconstrained decimal any other way, so DataTypes.DECIMAL is now interpreted as "unconstrained", and dialects that do not support unconstrained decimals now throw when no parameter is specified. This will require updating https://sequelize.org/docs/v7/other-topics/other-data-types/#exact-decimal-numbers

TODO

  • Initial Dialect Migration
    • Abstract
    • db2
    • ibmi
    • mariadb
    • mssql
    • mysql
    • postgres
    • snowflake
    • sqlite
  • Unit Tests pass
    • Abstract
    • db2
    • ibmi
    • mariadb
    • mssql
    • mysql
    • postgres
    • snowflake
    • sqlite
  • Integration tests pass
    • Abstract
    • db2
    • ibmi impossible to know, no DB available
    • mariadb
    • mssql
    • mysql
    • postgres
    • snowflake impossible to know, no DB available
    • sqlite
  • Add test to ensure the ENUM breaking change is properly explained to the user
  • Test how each DB DataType is parsed by default when the JS DataType is unspecified (raw query)
  • Allow scientific notation on FLOAT, REAL, & DOUBLE + re-add validation test
  • Re-enable the data-type warn log test

Things to extract to other PRs

To reduce the weight of this one

Blocking PRs

Follow Up Tasks

@ephys ephys self-assigned this May 13, 2022
@ephys ephys added the type: typescript For issues and PRs. Things that involve typescript, such as typings and intellisense. label May 13, 2022
@WikiRik WikiRik mentioned this pull request Jun 20, 2022
6 tasks
WikiRik added a commit that referenced this pull request Oct 12, 2022
WikiRik added a commit that referenced this pull request Oct 12, 2022
@WikiRik WikiRik mentioned this pull request Oct 12, 2022
5 tasks
WikiRik added a commit that referenced this pull request Oct 12, 2022
WikiRik added a commit that referenced this pull request Oct 14, 2022
* meta: update unit tests from #14505

* meta: update typing tests

* meta: update integration tests from #14505

* meta: fix ibmi/db2 unit test

* meta: fix failing integration tests

* meta: undo integration/utils changes

* meta: undo index order

* meta: remove incorrect mention of bigint

* meta: fix index order test
Copy link
Member

@WikiRik WikiRik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've gone through all tests and most of the implementation as well and I think that once I resolve all my comments we should be good to merge. Like you mentioned before there are indeed still a lot of TODOs but let's not make this PR even larger.

Also; I hope that we won't have any PRs of this size again soon

package.json Outdated Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
test/integration/associations/has-one.test.js Outdated Show resolved Hide resolved
test/unit/sql/insert.test.js Show resolved Hide resolved
test/integration/sequelize.test.js Show resolved Hide resolved
test/integration/data-types/data-types.test.ts Outdated Show resolved Hide resolved
test/unit/data-types/string-types.test.ts Outdated Show resolved Hide resolved
test/unit/data-types/string-types.test.ts Outdated Show resolved Hide resolved
test/unit/data-types/string-types.test.ts Show resolved Hide resolved
@ephys
Copy link
Member Author

ephys commented Nov 6, 2022

Also; I hope that we won't have any PRs of this size again soon

I can't agree more. This was one hell of a PR. Thankfully even files like query-generator.js should be possible to migrate slowly by doing the same thing we're doing for sequelize.js and model.js

@WikiRik
Copy link
Member

WikiRik commented Nov 6, 2022

Only three conversations left and then we should be good to merge!
image

@ephys
Copy link
Member Author

ephys commented Nov 6, 2022

All 3 are done. All that remains is sorting out the failing integration tests (dialects trying to use the "timezone" option but it's unsupported)

Copy link
Member

@WikiRik WikiRik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests agree I think we should be good to go! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment