Proper implementation of Value Types in Table #6073

Merged
84 commits merged into develop from wip/radeusgd/value-types-1 on Mar 31, 2023

Conversation

@radeusgd (Member) commented Mar 24, 2023

Pull Request Description

This is the first part of the #5158 umbrella task. It closes #5158; the follow-up tasks are listed as a comment in that issue.

  • Updates all prototype methods dealing with Value_Type with a proper implementation.
  • Adds a more precise mapping from in-memory storage to Value_Type.
  • Adds a dialect-dependent mapping between SQL_Type and Value_Type (see the sketch after this list).
  • Removes obsolete methods and constants on SQL_Type that were not portable.
  • Ensures that in the Database backend, operation result types are determined by what the Database intends to return (by asking the Database about the expected type of each operation).
    • But also ensures that the result types are sane.
      • While SQLite does not officially support a BOOLEAN affinity, we add a set of type overrides to our operations to ensure that Boolean operations will return Boolean values and will not be changed to integers as SQLite would suggest.
      • Some methods in SQLite fall back to a NUMERIC affinity unnecessarily, so operations like max(text, text) will keep the text type instead of falling back to numeric as SQLite would suggest.
  • Adds ability to use custom fetch / builder logic for various types, so that we can support vendor specific types (for example, Postgres dates).
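
The following is a purely illustrative sketch of the idea behind the dialect-dependent mapping and the per-operation overrides; the names are hypothetical stand-ins and are not the API added by this PR:

from Standard.Base import all

## Illustrative only - `Example_Value_Type` and the functions below are
   hypothetical, not the actual types introduced by this PR.
type Example_Value_Type
    Boolean
    Integer
    Text
    Unsupported name

## Each dialect translates its native SQL type names into value types.
example_sqlite_mapping sql_type_name =
    if sql_type_name == "INTEGER" then Example_Value_Type.Integer else
        if sql_type_name == "TEXT" then Example_Value_Type.Text else
            Example_Value_Type.Unsupported sql_type_name

## Per-operation overrides: SQLite reports comparison results with the
   INTEGER affinity, so a dialect override forces them back to Boolean.
example_operation_type operation_name inferred_type =
    comparisons = ["==", "!=", "<", "<=", ">", ">="]
    if comparisons.contains operation_name then Example_Value_Type.Boolean else inferred_type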

Important Notes

  • There are some TODOs left in the code. I'm still aligning the follow-up tasks; once that is done, I will add references to the relevant tasks in those TODOs.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
    • Check Table.info in GUI.
    • Check SQL visualization in GUI.
  • All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If the GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd self-assigned this on Mar 24, 2023
@radeusgd force-pushed the wip/radeusgd/value-types-1 branch 2 times, most recently from 43bf16e to e113c2c on March 28, 2023 08:12
@radeusgd marked this pull request as ready for review on March 28, 2023 08:15
@radeusgd added the "CI: Clean build required" label (CI runners will be cleaned before and after this PR is built) on Mar 28, 2023

@radeusgd (Member, Author) commented Mar 28, 2023

  1. Verified that the SQL visualization still works.
  2. Added a to_js_object override to ensure that the Value Type is visualized nicely in Table.info displayed in the IDE.

[screenshot: Table.info and the SQL query visualization in the IDE]

(Note how the color of the interpolated parameter in the SQL query visualization matches the color of its Enso type (Integer and Text) - no changes here, it's the thing we did almost 2 years ago, but I just like it a lot 😁)

## Creates a new lazy value.
new : Any -> Lazy
new ~lazy_computation =
    builder _ = lazy_computation

Contributor:
Just curious, is this dummy parameter necessary to avoid early evaluation of lazy_computation?

@radeusgd (Member, Author):

Tbh I did not test it, but I'm pretty sure that fields are always stored 'eagerly' in an Atom - and the builder will be stored in an Atom a few lines later.

[screenshot: the surrounding code, where the builder is stored in an Atom field a few lines later]

So to store a 'thunk' I need to wrap it in a lambda.
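
A minimal self-contained sketch of that trick (hypothetical names, not the actual Lazy implementation):

from Standard.Base import all

## Hypothetical sketch: atom fields are stored eagerly, so the suspended
   computation is wrapped in a one-argument lambda before being stored.
type Thunk_Box
    Value builder

## `~computation` is a suspended argument; capturing it inside a lambda
   does not force it.
make_thunk_box ~computation =
    Thunk_Box.Value (_ -> computation)

## The computation is forced only here, when the stored lambda is called.
force box =
    f = box.builder
    f Nothing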

Member:

Fields will always be stored "eagerly" in the Atom until we implement lazy atom fields.

Would you benefit from having "lazy atom fields"? Ask and we'll be more than glad to implement it!

@JaroslavTulach (Member) left a comment:

Frgaal upgrade is certainly OK.

project/FrgaalJavaCompiler.scala (thread resolved)

@JaroslavTulach (Member) left a comment:

I'd prefer an implementation of "lazy/cached atom fields" in the engine rather than adding yet another low-level support type (in addition to Ref) into Standard.Base.

import project.Panic.Panic
import project.Runtime.Ref.Ref

## Holds a value that is computed on first access.

Member:

I believe there should be no Lazy type in the libraries; the support should come from the engine instead:

  • Provide metadata for widgets lazily #5085
  • we have a prototype that shows how it all can be done
  • the API will be simpler - just ~ in front of the atom field (sketched below)
  • the access to the field will be transparent - no difference between lazy/cached/eager field
  • the performance will be better if this is implemented in the engine
  • we'll use less memory if this is done in the engine, as we eliminate delegation once the field is materialized/cached

Can we get a request from libraries to implement "lazy atom fields"?
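
For illustration, the `~` field syntax mentioned above might look roughly as follows once #5085 lands; this is hypothetical, as the feature did not exist at the time of this discussion:

## Hypothetical future syntax from the #5085 proposal - not valid Enso at
   the time of this PR. The `~` would make the field lazily computed and
   cached on first access, with no wrapper type needed.
type Cached_Metadata
    Value ~expensive_metadata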

@radeusgd (Member, Author):

Ok, sounds all good!

I implemented this in pure Enso because I could do it quickly (it took about 1-2 hours) and I didn't want to wait for an engine implementation, as I wanted to be able to rely on this code straight away.

It also felt simpler to do it that way, and at least for now we don't need this to be super performant (it's not used on any very hot paths), so I didn't want to take the engine team's time for something non-critical.

But if we can get it replaced with a more efficient engine solution, I will be really happy. The benefits sound cool.

@radeusgd (Member, Author):

However, I'd like to proceed with this PR without waiting for this implementation. So I would just keep the Lazy type for now. But once it is implemented in the engine, I'm more than happy to have it replaced with the "native" solution.

from project.Errors import Invalid_Value_Type

## Type to represent the different sizes of integer or float possible within a database.

Member:

Am I right that this is the core part of the PR? At least the file name Table/**/Value_Type.enso matches the title "..Value Types in Table..".

Shouldn't there be an example of how the Value_Type is supposed to be used from an end user perspective?

  • I can envision some form of conversions from/to Enso types.
  • TruffleObject provides special support for converting objects between systems/languages/types
  • Wrapping the values into Enso Value_Type is unlikely to be the most effective way
  • but I'd need to understand the use-case more...

@radeusgd (Member, Author):

Yep, you are right about the file being important :)

As for examples, I guess more will be available once we implement functions like cast. For now you can see how it's used in Table_Spec or for example Postgres_Spec.

Currently we are not performing conversions to/from Enso types, at least not directly; the concepts are somewhat parallel. We could add these, but there has been no use-case yet.

As for TruffleObject and mappings, I'm not sure we need this either. It sounds interesting and could be useful if we find a use-case.

Currently, Value_Type is meant to provide metadata about what kind of values may be stored in a Table Column. It is tailored to work well with both in-memory and Database columns, so the type system is built around how SQL types tend to work.
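
To give a rough feel of the end-user perspective, a sketch like the one below is the kind of thing this metadata enables. It assumes the in-memory backend; `value_type` is the accessor this PR wires up, and the exact output wording is illustrative:

from Standard.Base import all
from Standard.Table import all

## Illustrative only: print the value type reported for each column.
inspect_value_types table =
    table.columns.map column->
        IO.println (column.name + ": " + column.value_type.to_text)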

@jdunkerley (Member) left a comment:

This looks like a fantastic job - I'm through 86 of the files, and will review the rest tomorrow.

new RetypeInfo(String.class, Constants.STRING),
// TODO [RW] I think BigDecimals should not be coerced to floats, we should add Decimal
// support to in-memory tables at some point
// new RetypeInfo(BigDecimal.class, StorageType.FLOAT_64),

Member:

While I agree with the TODO, shouldn't the line be enabled?

@radeusgd (Member, Author):

Without it, BigDecimal will just be stored as an object and the column will get a Mixed type. Won't that be better until we get proper support?

@radeusgd (Member, Author):

We don't have a dedicated storage type for BigDecimal anyway. Nor for BigInteger, which is more problematic: there is a real "bug" here, because BigIntegers are natively supported by Enso (an Enso Integer can be a long or a BigInteger).

I'm not sure the retyping will work correctly if I just enable it; it would require additional work. Let's keep it as-is for now. If we want support for BigDecimal in Table, let's add a ticket and do it properly.

@radeusgd (Member, Author):

For the record, this is what currently happens if I put a BigInteger into a Table. It's not bad: since that type is unrecognized, it is just treated as a Mixed column:

[screenshot: Table.info showing the BigInteger column reported with a Mixed value type]

Not ideal, because we cannot do arithmetic operations on it directly (since it is not of a numeric type). But at least there's no data corruption.

@jdunkerley (Member) left a comment:

LGTM - a great piece of work.
A few things to look over and think on but generally awesome.

@mergify (bot) merged commit 6f86115 into develop on Mar 31, 2023
@mergify (bot) deleted the wip/radeusgd/value-types-1 branch on March 31, 2023 16:16

Labels
  • CI: Clean build required (CI runners will be cleaned before and after this PR is built)
  • CI: Ready to merge (This PR is eligible for automatic merge)

Development
Successfully merging this pull request may close these issues:
  • Add Value_Type to In-Memory Column

4 participants