Dtype, schema inference and implicit casting #1269

Closed
wants to merge 36 commits into
base: master
from

2 participants
@kszucs
Member

kszucs commented Dec 15, 2017

Relates to #1221

@kszucs kszucs changed the title from [WIP] Dtype, schema inference to [WIP] Dtype, schema inference and implicit casting Dec 15, 2017

@castable.register(Interval, Interval)
def can_cast_intervals(source, target):
    return castable(source.value_type, target.value_type)

@cpcloud

cpcloud Dec 15, 2017

Member

Hm, this is tricky because there are a few cases where casting between intervals becomes ambiguous, like casting anything longer than a week (that isn't a week itself) to weeks, e.g., casting 1 month to weeks. Some months may have more weeks than others.

Casting seconds to months is also fraught with annoyances.

I think we will need to check the units to make sure these are valid casts.
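One way such a unit check could look is sketched below. It is only an illustration, not the code from this PR: the function name `interval_units_castable` and the two unit groups are assumptions, and the unit codes follow the common pandas/numpy spellings. The idea is that units with a fixed conversion ratio to each other form a family, and casts across families (such as months to weeks, or seconds to months) are rejected.

```python
# Hypothetical unit families, not taken from the PR.
# Units inside one family convert to each other by a fixed ratio.
FIXED_LENGTH_UNITS = ('W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns')
CALENDAR_UNITS = ('Y', 'Q', 'M')


def interval_units_castable(source_unit, target_unit):
    """Return True when both units belong to the same family.

    Casts across families are ambiguous because the length of a
    month or a year in days or seconds depends on the calendar date.
    """
    for family in (FIXED_LENGTH_UNITS, CALENDAR_UNITS):
        if source_unit in family and target_unit in family:
            return True
    return False


print(interval_units_castable('D', 's'))  # days to seconds
print(interval_units_castable('M', 'W'))  # months to weeks
```

This sketch ignores whether a cast loses precision (for example seconds to weeks); a real rule would probably need to consider the direction of the cast as well.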

@cpcloud cpcloud added the enhancement label Dec 18, 2017

@cpcloud cpcloud added this to the 0.13 milestone Dec 18, 2017

@cpcloud

Member

cpcloud commented Dec 22, 2017

@kszucs Looks like you need to rebase.

@kszucs kszucs force-pushed the kszucs:infer_schema branch from e7534c0 to ef2439e Dec 24, 2017

kszucs added some commits Dec 24, 2017

self._assert_can_compare()
return rules.shape_like_args(self.args, 'boolean')
if not rules.comparable(self.left, self.right):
    raise TypeError('Arguments are not comparable')

@cpcloud

cpcloud Dec 26, 2017

Member

Can you show self.left.type() and self.right.type() in the error message?

@kszucs

kszucs Dec 27, 2017

Member

Sure!
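A minimal sketch of the requested message follows. The helper name `not_comparable_error` is hypothetical, and plain strings stand in for the results of `self.left.type()` and `self.right.type()`; the actual wording used in the PR may differ.

```python
def not_comparable_error(left_type, right_type):
    """Build a TypeError that names both operand types."""
    message = 'Arguments with types {} and {} are not comparable'.format(
        left_type, right_type
    )
    return TypeError(message)


# Stand-in type names; in the PR these would come from the expressions.
print(not_comparable_error('int64', 'string'))
```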

([pd.Timedelta('1 days'),
  pd.Timedelta('-1 days 2 min 3us'),
  pd.Timedelta('-2 days +23:57:59.999997')], dt.Interval('ns')),
# (pd.Categorical(['a', 'b', 'c', 'a']), 'category')

@cpcloud

cpcloud Dec 26, 2017

Member

Let's mark this as pytest.mark.xfail with a specific exception.

@kszucs

kszucs Dec 27, 2017

Member

Actually, it passes locally. How should I create a categorical column that works in every build?

from ibis.config import options
@contextmanager
def ignoring(*exceptions):

@cpcloud

cpcloud Dec 26, 2017

Member

This exists in the standard library as contextlib.suppress since Python 3.4.

How about something in ibis/compat.py like:

try:
    from contextlib import suppress
except ImportError:
    import contextlib


    @contextlib.contextmanager
    def suppress(...):
        ...

@kszucs

kszucs Dec 27, 2017

Member

Great :) I wasn't aware.

('string', [], self.all_cols[:7] + self.all_cols[8:]),
('timestamp', [], self.all_cols[:-1]),
('decimal', [], self.all_cols[:4] + self.all_cols[7:])
('decimal', self.all_cols[:7], self.all_cols[7:])

@cpcloud

cpcloud Dec 26, 2017

Member

We really should clean up this test, it's quite hard to understand what's being tested here. I'll make an issue.

@kszucs

kszucs Dec 27, 2017

Member

We should probably port these to the new test suite.

def issubtype(self, parent):
    return issubtype(self, parent)
def castable(self, target):

@cpcloud

cpcloud Dec 26, 2017

Member

Can you add a docstring here indicating that castable here means implicitly castable?

@kszucs

kszucs Dec 27, 2017

Member

Sure, I might be more verbose.

return True
@castable.register(UnsignedInteger, UnsignedInteger)

@cpcloud

cpcloud Dec 26, 2017

Member

You can chain decorators here so you don't have to repeat the code for signed and unsigned integers, like this:

@castable.register(UnsignedInteger, UnsignedInteger)
@castable.register(SignedInteger, SignedInteger)

@cpcloud

cpcloud Dec 26, 2017

Member

In fact, any of these implementations that take the same number of arguments and return True can all be chained into one function to avoid repetition.

@kszucs

kszucs Dec 27, 2017

Member

Cool!
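The reason stacking works can be shown with a small self-contained sketch. Everything here is a stand-in: `CastRegistry` is a hypothetical toy replacement for the real dispatcher, the integer classes are simplified, and the widening rule in the function body is an assumption made for illustration. The key point is that `register` returns the function unchanged, so a second `register` decorator can wrap the same function.

```python
class CastRegistry(object):
    """Hypothetical stand-in for a two-argument dispatcher."""

    def __init__(self):
        self._rules = []

    def register(self, source_cls, target_cls):
        def decorator(func):
            self._rules.append((source_cls, target_cls, func))
            return func  # returned unchanged, which allows stacking
        return decorator

    def __call__(self, source, target):
        for source_cls, target_cls, func in self._rules:
            if isinstance(source, source_cls) and isinstance(target, target_cls):
                return func(source, target)
        return False  # no registered rule: not implicitly castable


class Integer(object):
    def __init__(self, nbytes):
        self.nbytes = nbytes


class SignedInteger(Integer):
    pass


class UnsignedInteger(Integer):
    pass


castable = CastRegistry()


@castable.register(UnsignedInteger, UnsignedInteger)
@castable.register(SignedInteger, SignedInteger)
def can_cast_same_signedness(source, target):
    # Assumed rule for illustration: only widening casts are implicit.
    return target.nbytes >= source.nbytes


print(castable(SignedInteger(1), SignedInteger(8)))
print(castable(UnsignedInteger(8), UnsignedInteger(1)))
```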

return TypeParser(value).parse()
infer = Dispatcher('infer')

@cpcloud

cpcloud Dec 26, 2017

Member

Let's add a docstring to this (using the doc= argument to Dispatcher)

@cpcloud

cpcloud Dec 26, 2017

Member

To clarify: from reading the code it looks like dtype is for mapping types from other systems (e.g., pandas and numpy) into ibis's type system, and infer is for getting the ibis schema or ibis type of data values such as literals, lists and DataFrames. Is that correct?

Can you make sure to clarify the distinction between these two functions in their respective docstrings? Some examples in the docstrings will likely help here.

@kszucs

kszucs Dec 27, 2017

Member

Sure!

@kszucs

kszucs Dec 27, 2017

Member

Would you please convert this to an issue?
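The distinction described above can be illustrated with a toy sketch. It deliberately does not use ibis: the standard library's `functools.singledispatch` stands in for the dispatcher, the returned type names are plain illustrative strings, and the docstrings only show the kind of wording the review asks for.

```python
from functools import singledispatch


@singledispatch
def dtype(spec):
    """Map a type specification from another system to a type name."""
    raise TypeError('No type mapping for {!r}'.format(spec))


@dtype.register(str)
def dtype_from_string(spec):
    return spec  # already a type name in this toy example


@dtype.register(type)
def dtype_from_python_type(spec):
    # Assumed mapping, for illustration only.
    return {bool: 'boolean', int: 'int64', float: 'double', str: 'string'}[spec]


@singledispatch
def infer(value):
    """Infer the type name of a data value such as a literal or a list."""
    raise TypeError('Cannot infer the type of {!r}'.format(value))


@infer.register(bool)
def infer_bool(value):
    return 'boolean'


@infer.register(int)
def infer_int(value):
    return 'int64'


@infer.register(float)
def infer_float(value):
    return 'double'


@infer.register(list)
def infer_list(value):
    element_types = {infer(element) for element in value}
    if len(element_types) != 1:
        raise TypeError('Expected a non-empty list of one element type')
    return 'array<{}>'.format(element_types.pop())


print(dtype(int))         # a type specification goes in
print(infer([1.0, 2.0]))  # a data value goes in
```

In this sketch `dtype` never looks at data, only at descriptions of types, while `infer` only looks at data.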

})
dtype = Dispatcher('dtype')

@cpcloud

cpcloud Dec 26, 2017

Member

Docstring here

assert customers.schema() == expected
@pytest.mark.parametrize(('pandas_dtype', 'ibis_dtype'), [
    (DatetimeTZDtype(tz='US/Eastern', unit='ns'), dt.Timestamp('US/Eastern')),
    # (CategoricalDtype(['a', 'b']), dt.Category(2)) compat problem

@cpcloud

cpcloud Dec 26, 2017

Member

Let's make this pytest.mark.xfail with a specific exception.

@kszucs

kszucs Dec 27, 2017

Member

Same problem as with the categorical column: how can I create a categorical dtype?

kszucs added some commits Dec 27, 2017

@kszucs kszucs changed the title from [WIP] Dtype, schema inference and implicit casting to Dtype, schema inference and implicit casting Dec 28, 2017

kszucs added a commit to kszucs/ibis that referenced this pull request Dec 29, 2017

kszucs added a commit to kszucs/ibis that referenced this pull request Dec 29, 2017

@cpcloud

Member

cpcloud commented Jan 8, 2018

LGTM, merging.

@cpcloud

cpcloud approved these changes Jan 8, 2018

@cpcloud cpcloud closed this in 71fc552 Jan 8, 2018

@cpcloud

Member

cpcloud commented Jan 8, 2018

@kszucs Thank you!
