-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dtype, schema inference and implicit casting #1269
Conversation
ibis/expr/datatypes.py
Outdated
|
|
||
| @castable.register(Interval, Interval) | ||
| def can_cast_intervals(source, target): | ||
| return castable(source.value_type, target.value_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this is tricky because there are a few cases where casting between intervals becomes ambiguous, like casting anything longer than a week (that isn't a week itself) to weeks, e.g., casting 1 month to weeks. Some months may have more weeks than others.
Casting seconds to months is also fraught with annoyances.
I think we will need to check the units to make sure these are valid casts.
|
@kszucs Looks like you need to rebase. |
ibis/expr/operations.py
Outdated
| self._assert_can_compare() | ||
| return rules.shape_like_args(self.args, 'boolean') | ||
| if not rules.comparable(self.left, self.right): | ||
| raise TypeError('Arguments are not comparable') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show self.left.type() and self.right.type() in the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
ibis/expr/tests/test_schema.py
Outdated
| ([pd.Timedelta('1 days'), | ||
| pd.Timedelta('-1 days 2 min 3us'), | ||
| pd.Timedelta('-2 days +23:57:59.999997')], dt.Interval('ns')), | ||
| # (pd.Categorical(['a', 'b', 'c', 'a']), 'category') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's mark this as pytest.mark.xfail with a specific exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually It passes locally. How should I create a categorical column working in each build?
ibis/util.py
Outdated
| from ibis.config import options | ||
|
|
||
|
|
||
| @contextmanager | ||
| def ignoring(*exceptions): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exists in the standard library as contextlib.suppress since Python 3.4.
How about something inibis/compat.py like:
try:
from contextlib import suppress
except ImportError:
import contextlib
@contextlib.contextmanager
def suppress(...):
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great :) I wasn't aware.
| ('string', [], self.all_cols[:7] + self.all_cols[8:]), | ||
| ('timestamp', [], self.all_cols[:-1]), | ||
| ('decimal', [], self.all_cols[:4] + self.all_cols[7:]) | ||
| ('decimal', self.all_cols[:7], self.all_cols[7:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should clean up this test, it's quite hard to understand what's being tested here. I'll make an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we should port these to the new test suite
| def issubtype(self, parent): | ||
| return issubtype(self, parent) | ||
|
|
||
| def castable(self, target): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a docstring here indicating that castable here means implicitly castable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I might be more verbose.
ibis/expr/datatypes.py
Outdated
| return True | ||
|
|
||
|
|
||
| @castable.register(UnsignedInteger, UnsignedInteger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can chain decorators here so you don't have to repeat the code for signed and unsigned integers, like this:
@castable.register(UnsignedInteger, UnsignedInteger)
@castable.register(SignedInteger, SignedInteger)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, any of these implementations that take the same number of arguments and return True can all be chained into one function to avoid repetition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
| return TypeParser(value).parse() | ||
|
|
||
|
|
||
| infer = Dispatcher('infer') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a docstring to this (using the doc= argument to Dispatcher)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: from reading the code it looks like dtype is for mapping types from other systems (e.g., pandas and numpy) into ibis's type system, and infer is for getting the ibis schema or ibis type of data values such as literals, lists and DataFrames. Is that correct?
Can you make sure to clarify the distinction between these two functions in their respective docstrings. Some examples in the docstrings will likely help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would You please convert this to an issue?
| }) | ||
|
|
||
|
|
||
| dtype = Dispatcher('dtype') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring here
ibis/expr/tests/test_datatypes.py
Outdated
| assert customers.schema() == expected | ||
| @pytest.mark.parametrize(('pandas_dtype', 'ibis_dtype'), [ | ||
| (DatetimeTZDtype(tz='US/Eastern', unit='ns'), dt.Timestamp('US/Eastern')), | ||
| # (CategoricalDtype(['a', 'b']), dt.Category(2)) compat problem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make this pytest.mark.xfail with a specific exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing with the categorical column, how can I create a categorical dtype?
|
LGTM, merging. |
|
@kszucs Thank you! |
Relates to #1221