Proper implementation of Value Types in Table #6073

Merged
84 commits merged into develop from wip/radeusgd/value-types-1 on Mar 31, 2023

Conversation

@radeusgd (Member) commented Mar 24, 2023

Pull Request Description

This is the first part of the #5158 umbrella task. It closes #5158; the follow-up tasks are listed as a comment in that issue.

  • Updates all prototype methods dealing with Value_Type with a proper implementation.
  • Adds a more precise mapping from in-memory storage to Value_Type.
  • Adds a dialect-dependent mapping between SQL_Type and Value_Type (see the sketch after this list).
  • Removes obsolete methods and constants on SQL_Type that were not portable.
  • Ensures that in the Database backend, operation result types are determined by what the Database intends to return (by asking the Database about the expected type of each operation).
    • But also ensures that the result types are sane.
      • While SQLite does not officially support a BOOLEAN affinity, we add a set of type overrides to our operations to ensure that Boolean operations will return Boolean values and will not be changed to integers as SQLite would suggest.
      • Some methods in SQLite fall back to a NUMERIC affinity unnecessarily, so operations like max(text, text) will keep the text type instead of falling back to numeric as SQLite would suggest.
  • Adds ability to use custom fetch / builder logic for various types, so that we can support vendor specific types (for example, Postgres dates).
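
The following is a purely illustrative sketch of the idea behind the dialect-dependent mapping and the per-operation overrides; the names are hypothetical stand-ins and are not the API added by this PR:

from Standard.Base import all

## Illustrative only - `Example_Value_Type` and the functions below are
   hypothetical, not the actual types introduced by this PR.
type Example_Value_Type
    Boolean
    Integer
    Text
    Unsupported name

## Each dialect translates its native SQL type names into value types.
example_sqlite_mapping sql_type_name =
    if sql_type_name == "INTEGER" then Example_Value_Type.Integer else
        if sql_type_name == "TEXT" then Example_Value_Type.Text else
            Example_Value_Type.Unsupported sql_type_name

## Per-operation overrides: SQLite reports comparison results with the
   INTEGER affinity, so a dialect override forces them back to Boolean.
example_operation_type operation_name inferred_type =
    comparisons = ["==", "!=", "<", "<=", ">", ">="]
    if comparisons.contains operation_name then Example_Value_Type.Boolean else inferred_type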

Important Notes

  • There are some TODOs left in the code. I'm still aligning the follow-up tasks; once that is done, I will add references to the relevant tasks in those TODOs.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
    • Check Table.info in GUI.
    • Check SQL visualization in GUI.
  • All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If the GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd self-assigned this on Mar 24, 2023
@radeusgd force-pushed the wip/radeusgd/value-types-1 branch 2 times, most recently from 43bf16e to e113c2c on March 28, 2023 08:12
@radeusgd marked this pull request as ready for review on March 28, 2023 08:15
@radeusgd added the "CI: Clean build required" label (CI runners will be cleaned before and after this PR is built) on Mar 28, 2023

@radeusgd (Member, Author) commented Mar 28, 2023

  1. Verified that the SQL visualization still works.
  2. Added a to_js_object override to ensure that the Value Type is visualized nicely in Table.info displayed in the IDE.

[screenshot: Table.info and the SQL query visualization in the IDE]

(Note how the color of the interpolated parameter in the SQL query visualization matches the color of its Enso type (Integer and Text) - no changes here, it's the thing we did almost 2 years ago, but I just like it a lot 😁)

## Creates a new lazy value.
new : Any -> Lazy
new ~lazy_computation =
    builder _ = lazy_computation

Contributor:
Just curious, is this dummy parameter necessary to avoid early evaluation of lazy_computation?

@radeusgd (Member, Author):

Tbh I did not test it, but I'm pretty sure that fields are always stored 'eagerly' in an Atom - and the builder will be stored in an Atom a few lines later.

[screenshot: the surrounding code, where the builder is stored in an Atom field a few lines later]

So to store a 'thunk' I need to wrap it in a lambda.
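
A minimal self-contained sketch of that trick (hypothetical names, not the actual Lazy implementation):

from Standard.Base import all

## Hypothetical sketch: atom fields are stored eagerly, so the suspended
   computation is wrapped in a one-argument lambda before being stored.
type Thunk_Box
    Value builder

## `~computation` is a suspended argument; capturing it inside a lambda
   does not force it.
make_thunk_box ~computation =
    Thunk_Box.Value (_ -> computation)

## The computation is forced only here, when the stored lambda is called.
force box =
    f = box.builder
    f Nothing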

Member:

Fields will always be stored "eagerly" in the Atom until we implement lazy atom fields.

Would you benefit from having "lazy atom fields"? Ask and we'll be more than glad to implement it!

@JaroslavTulach (Member) left a comment:

Frgaal upgrade is certainly OK.

project/FrgaalJavaCompiler.scala (thread resolved)

@JaroslavTulach (Member) left a comment:

I'd prefer an implementation of "lazy/cached atom fields" in the engine rather than adding yet another low-level support type (in addition to Ref) into Standard.Base.

import project.Panic.Panic
import project.Runtime.Ref.Ref

## Holds a value that is computed on first access.

Member:

I believe there should be no Lazy type in the libraries; the support should come from the engine instead:

  • Provide metadata for widgets lazily #5085
  • we have a prototype that shows how it all can be done
  • the API will be simpler - just ~ in front of the atom field (sketched below)
  • the access to the field will be transparent - no difference between lazy/cached/eager field
  • the performance will be better if this is implemented in the engine
  • we'll use less memory if this is done in the engine, as we eliminate delegation once the field is materialized/cached

Can we get a request from libraries to implement "lazy atom fields"?
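
For illustration, the `~` field syntax mentioned above might look roughly as follows once #5085 lands; this is hypothetical, as the feature did not exist at the time of this discussion:

## Hypothetical future syntax from the #5085 proposal - not valid Enso at
   the time of this PR. The `~` would make the field lazily computed and
   cached on first access, with no wrapper type needed.
type Cached_Metadata
    Value ~expensive_metadata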

@radeusgd (Member, Author):

Ok, sounds all good!

I implemented this in pure Enso because I could do it quickly (it took about 1-2 hours) and I didn't want to wait for an engine implementation, as I wanted to be able to rely on this code straight away.

It also felt simpler to do it that way, and at least for now we don't need this to be super performant (it's not used on any very hot paths), so I didn't want to take the engine team's time for something non-critical.

But if we can get it replaced with a more efficient engine solution, I will be really happy. The benefits sound cool.

@radeusgd (Member, Author):

However, I'd like to proceed with this PR without waiting for this implementation. So I would just keep the Lazy type for now. But once it is implemented in the engine, I'm more than happy to have it replaced with the "native" solution.

from project.Errors import Invalid_Value_Type

## Type to represent the different sizes of integer or float possible within a database.

Member:

Am I right that this is the core part of the PR? At least the file name Table/**/Value_Type.enso matches the title "..Value Types in Table..".

Shouldn't there be an example of how the Value_Type is supposed to be used from an end user perspective?

  • I can envision some form of conversions from/to Enso types.
  • TruffleObject provides special support for converting objects between systems/languages/types
  • Wrapping the values into Enso Value_Type is unlikely to be the most effective way
  • but I'd need to understand the use-case more...

@radeusgd (Member, Author):

Yep, you are right about the file being important :)

As for examples, I guess more will be available once we implement functions like cast. For now you can see how it's used in Table_Spec or for example Postgres_Spec.

Currently we are not performing conversions to/from Enso types, at least not directly; the concepts are somewhat parallel. We could add these, but there has been no use-case yet.

As for TruffleObject and mappings, I'm not sure we need this either. It sounds interesting and could be useful if we find a use-case.

Currently, Value_Type is meant to provide metadata about what kind of values may be stored in a Table Column. It is tailored to work well with both in-memory and Database columns, so the type system is built around how SQL types tend to work.
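
To give a rough feel of the end-user perspective, a sketch like the one below is the kind of thing this metadata enables. It assumes the in-memory backend; `value_type` is the accessor this PR wires up, and the exact output wording is illustrative:

from Standard.Base import all
from Standard.Table import all

## Illustrative only: print the value type reported for each column.
inspect_value_types table =
    table.columns.map column->
        IO.println (column.name + ": " + column.value_type.to_text)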

@jdunkerley (Member) left a comment:

This looks like a fantastic job - I'm through 86 of the files, and will review the rest tomorrow.

new RetypeInfo(String.class, Constants.STRING),
// TODO [RW] I think BigDecimals should not be coerced to floats, we should add Decimal
// support to in-memory tables at some point
// new RetypeInfo(BigDecimal.class, StorageType.FLOAT_64),

Member:

While I agree with the TODO, shouldn't the line be enabled?

@radeusgd (Member, Author):

Without it, BigDecimal will just be stored as an object and the column will get a Mixed type. Won't that be better until we get proper support?

@radeusgd (Member, Author):

We don't have a dedicated storage type for BigDecimal anyway. Nor for BigInteger, which is more problematic: there is a real "bug" here, because BigIntegers are natively supported by Enso (an Enso Integer can be a long or a BigInteger).

I'm not sure the retyping will work correctly if I just enable it; it would require additional work. Let's keep it as-is for now. If we want support for BigDecimal in Table, let's add a ticket and do it properly.

@radeusgd (Member, Author):

For the record, this is what currently happens if I put a BigInteger into a Table. It's not bad: since that type is unrecognized, it is just treated as a Mixed column:

[screenshot: Table.info showing the BigInteger column reported with a Mixed value type]

Not ideal, because we cannot do arithmetic operations on it directly (since it is not of a numeric type). But at least there's no data corruption.

@jdunkerley (Member) left a comment:

LGTM - a great piece of work.
A few things to look over and think on but generally awesome.

@mergify (bot) merged commit 6f86115 into develop on Mar 31, 2023
@mergify (bot) deleted the wip/radeusgd/value-types-1 branch on March 31, 2023 16:16

Labels
  • CI: Clean build required (CI runners will be cleaned before and after this PR is built)
  • CI: Ready to merge (This PR is eligible for automatic merge)

Development
Successfully merging this pull request may close these issues:
  • Add Value_Type to In-Memory Column

4 participants