Fixing operations on Mixed types #7368
@@ -989,9 +1003,9 @@ type Column
 result = case self.value_type of
 Value_Type.Char _ _ -> self.is_empty
 Value_Type.Float _ ->
- if treat_nans_as_blank then self.is_nothing.iif True (self.internal_is_nan on_missing=False) else self.is_nothing
+ if treat_nans_as_blank then self.is_nothing.iif True self.internal_is_nan else self.is_nothing
Is this computing self.is_nothing twice?
Good observation! Yes, apparently it is.
Well, if it's a LongStorage, then it is an O(1) operation, because it just reuses the bitset that marks missing values. But I guess it's better to cache it just to be sure.
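To illustrate why is_nothing can be O(1) on a storage like LongStorage, here is a minimal Java sketch. It assumes a hypothetical storage class that keeps a java.util.BitSet of missing rows next to the primitive values; the class and method names are illustrative and are not the actual std-bits API. Because the null mask already exists, is_nothing can hand it back without scanning the data.

```java
import java.util.BitSet;

// Hypothetical long storage (not the real LongStorage): primitive values
// plus a bitset whose set bits mark missing (null) rows.
final class LongStorageSketch {
    private final long[] data;
    private final BitSet isMissing;
    private final int size;

    LongStorageSketch(long[] data, BitSet isMissing, int size) {
        this.data = data;
        this.isMissing = isMissing;
        this.size = size;
    }

    // Returns the existing null mask directly: no pass over the data is needed.
    // The BitSet is shared, so callers must treat it as read-only.
    BitSet isNothing() {
        return isMissing;
    }

    // Reads a value; only meaningful when the row is not missing.
    long getItem(int i) {
        return data[i];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        BitSet missing = new BitSet();
        missing.set(1);
        LongStorageSketch storage = new LongStorageSketch(new long[] {10, 0, 30}, missing, 3);
        System.out.println(storage.isNothing()); // prints {1}
    }
}
```

Returning the shared mask is what makes this constant-time; a defensive copy would instead cost time proportional to the number of rows.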
No, sorry, I'm reading the code again and realise it's not the case.
See how it's bracketed:
if treat_nans_as_blank
then self.is_nothing.iif True self.internal_is_nan
else self.is_nothing
So we can see that in each branch, is_nothing is computed only once.
If only we had multiline if to make stuff like this more readable...
The only thing we could optimize here is to compute self.is_nothing.iif True self.internal_is_nan in a single pass, by creating a variant of internal_is_nan that returns True for null values. But that seems out of scope for this PR.
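The single-pass idea can be sketched in Java as follows, assuming a hypothetical double storage represented by a double[] plus a java.util.BitSet of missing rows (these names are illustrative, not the std-bits API). The composed version builds a separate NaN mask and then merges it with the null mask, mirroring is_nothing.iif True internal_is_nan; the fused version produces the same "missing or NaN" mask in one loop.

```java
import java.util.BitSet;

// Illustrative comparison of the composed and fused "is nothing or NaN" masks.
final class NothingOrNaN {
    // Composed version: build a NaN mask for non-missing rows, then OR it with the null mask.
    static BitSet composed(double[] data, BitSet missing, int size) {
        BitSet nan = new BitSet(size);
        for (int i = 0; i < size; i++) {
            if (!missing.get(i) && Double.isNaN(data[i])) {
                nan.set(i);
            }
        }
        BitSet result = (BitSet) missing.clone();
        result.or(nan);
        return result;
    }

    // Fused single-pass version: a row is blank if it is missing OR holds NaN.
    static BitSet singlePass(double[] data, BitSet missing, int size) {
        BitSet result = new BitSet(size);
        for (int i = 0; i < size; i++) {
            if (missing.get(i) || Double.isNaN(data[i])) {
                result.set(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] data = {1.0, Double.NaN, 0.0, 3.0};
        BitSet missing = new BitSet();
        missing.set(2);
        System.out.println(singlePass(data, missing, data.length)); // prints {1, 2}
    }
}
```

The fused loop avoids allocating the intermediate NaN mask; both methods return the same set of rows.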
On occasion of limiting when the
After reminding myself of why it was added in the first place, I conclude that it is needed, and it will be needed whenever we are accepting date-time values, at least as long as we keep the architecture we currently have.
The Graal polyglot values essentially can report
If we wanted to get rid of
Alternatively, hypothetically, to make it 'faster' we could try getting rid of
All-in-all I don't think we can avoid
…t uniform inferred_precise_value_type)
…h map/zip to use Java-to-Enso callbacks, which benchmarks have shown to usually be the most efficient for these
…age instead of casting to Java Objects
Looks good, and great to have this more sorted.
One small naming fix, please.
set = Set.from_vector values
set.contains
Shouldn't this use the implementation of contains within Vector?
There is one, but using a Set will be faster.
This way we construct the Set (an O(N) operation) once and save it in the closure of the generated function, so we can then use quick (~O(1), hashing-based) membership checks.
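The same build-once, capture-in-closure pattern looks like this in Java. It is a minimal sketch, not the PR's Enso code: the class and method names are hypothetical, and java.util.HashSet stands in for Enso's Set. The set is built once when the predicate is created, and every later check is an expected O(1) hash lookup instead of an O(N) scan of the list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class IsInSketch {
    // Builds the set once (O(N)); the returned predicate captures it in its closure,
    // so each membership check is an expected O(1) hash lookup.
    static <T> Predicate<T> isIn(List<T> values) {
        Set<T> set = new HashSet<>(values);
        return set::contains;
    }

    // Naive alternative for comparison: every check scans the whole list (O(N) per row).
    static <T> Predicate<T> isInLinear(List<T> values) {
        return values::contains;
    }

    public static void main(String[] args) {
        Predicate<String> p = isIn(List.of("a", "b", "c"));
        System.out.println(p.test("b")); // prints true
        System.out.println(p.test("z")); // prints false
    }
}
```

Filtering a column of M rows against N values then costs roughly O(N + M) instead of O(N * M).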
std-bits/table/src/main/java/org/enso/table/data/column/operation/map/BinaryMapOperation.java
std-bits/table/src/main/java/org/enso/table/data/column/operation/map/MapOpStorage.java
std-bits/table/src/main/java/org/enso/table/data/column/operation/map/UnaryMapOperation.java
std-bits/table/src/main/java/org/enso/table/data/column/storage/Storage.java
Config supports_case_sensitive_columns=True order_by=True natural_ordering=False case_insensitive_ordering=True order_by_unicode_normalization_by_default=False case_insensitive_ascii_only=False take_drop=True allows_mixed_type_comparisons=True supports_unicode_normalization=False is_nan_and_nothing_distinct=True distinct_returns_first_row_from_group_if_ordered=True date_time=True fixed_length_text_columns=False supports_decimal_type=False supports_time_duration=False supports_nanoseconds_in_time=False
- supports_mixed_columns: Specifies if the backend supports mixed-type columns.
Config supports_case_sensitive_columns=True order_by=True natural_ordering=False case_insensitive_ordering=True order_by_unicode_normalization_by_default=False case_insensitive_ascii_only=False take_drop=True allows_mixed_type_comparisons=True supports_unicode_normalization=False is_nan_and_nothing_distinct=True distinct_returns_first_row_from_group_if_ordered=True date_time=True fixed_length_text_columns=False supports_decimal_type=False supports_time_duration=False supports_nanoseconds_in_time=False supports_mixed_columns=False
This is getting very long...
I suggest we drop the supports_ prefix from the names. Not necessarily in this PR!
Pull Request Description
- Set for Filter_Condition.Is_In for better performance.
- Migrated Column.map and Column.zip to use the Java-to-Enso callbacks.
- Column.map and Column.zip to Java-to-Enso callbacks #7371
- Trying to avoid conversions when calling Enso functions from Java.
Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
./run ide build.