Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clickhouse: support group_concat operation #2839

Merged
merged 3 commits into from
Aug 4, 2021

Conversation

Lunaticus7
Copy link
Contributor

Added group_concat operation to Clickhouse.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lunaticus7 thanks can you add a release note

@jreback jreback requested a review from kszucs June 29, 2021 12:39
@jreback jreback added the clickhouse The ClickHouse backend label Jun 29, 2021
@jreback jreback added this to the Next release milestone Jun 29, 2021
@kszucs
Copy link
Member

kszucs commented Jun 29, 2021

LGTM, though the linter fails.

@jreback
Copy link
Contributor

jreback commented Jun 29, 2021

@Lunaticus7
Copy link
Contributor Author

Thanks for review, release note has been added.
@jreback @kszucs

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Lunaticus7, nice work. I added couple of comments, if you don't mind having a look.

docs/source/release/index.rst Show resolved Hide resolved
arg, sep, where = expr.op().args
if where is not None:
arg = where.ifelse(arg, ibis.NA)
arg = translator.translate(arg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? You're already applying translator.translate to arg in the return.

Do you mind adding another test that hits the code in the if please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right,it will cause a bug 😣

expr = t.string_col.group_concat(',')
expected = """arrayStringConcat(groupArray(`string_col`), ',')"""
assert translate(expr) == expected
assert len(con.execute(expr))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more specific on what this needs to return? Also, I guess the len is unnecessary, if len(...) != 0 I guess this will evaluate to true without the len. But would be better to test the value, or whatever possible, more than just saying that this should return something not empty, if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have add test for where param in group_concat .
emmmm,i'm new in ibis and haven't found the right way to load test data and run ci locally,so i have no ideal to specify the return value of the group_concat expr 😇.

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Lunaticus7, looks good, just couple of suggestions, in case they make sense to you.

Comment on lines 544 to 545
expected = """arrayStringConcat(groupArray(\
CASE WHEN `bool_col` = 0 THEN `string_col` ELSE Null END), '-')"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal, but I think this approach is more readable (maybe just to me):

Suggested change
expected = """arrayStringConcat(groupArray(\
CASE WHEN `bool_col` = 0 THEN `string_col` ELSE Null END), '-')"""
expected = ("arrayStringConcat(groupArray("
"CASE WHEN `bool_col` = 0 THEN `string_col` ELSE Null END), '-')")

Also, it'd probably be nice to have this as a separate test, so when it fails, we know which of the two asserts is failing faster.

@datapythonista
Copy link
Contributor

@Lunaticus7 tests are failing, not sure if you've seen it.

@Lunaticus7
Copy link
Contributor Author

@Lunaticus7 tests are failing, not sure if you've seen it.

Sorry for the delay.

I'm fighting with running CI locally 🤣(#2795 (comment)) and a little busy recently, maybe i could done this in the next week.

@datapythonista
Copy link
Contributor

I personally don't ever run the whole CI locally. I have the environment set up (the conda env), and then run the tests for specific backends. For example:

PYTEST_BACKENDS="pandas sqlite" python -m pytest ibis/tests ibis/backends/tests

For clickhouse you can run the same, but you need to run the docker image, and call the scripts to download and load the test data to it.

@datapythonista
Copy link
Contributor

@Lunaticus7 do you have time to fix the conflicts and the broken tests? This shouldn't be far from ready.

@pep8speaks
Copy link

pep8speaks commented Jul 24, 2021

Hello @Lunaticus7! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-08-04 08:11:31 UTC

@Lunaticus7
Copy link
Contributor Author

@Lunaticus7 do you have time to fix the conflicts and the broken tests? This shouldn't be far from ready.

Thanks for your tips of test, it works 🎉!

About the broken test, i found all the column in testing data is non-nullable, so when set predicate as where param in groupArray, it will cause an error:

IbisTypeError: Cannot compute precedence for string[non-nullable] and null types

So i remove the test case about where temporary.
Maybe it's worthy to add some nullable data in testing data?

@datapythonista
Copy link
Contributor

There are conflicts with master, can you merge master into your branch and fix them please.

For the tests, if one can't pass right now, it's better to xfail it than to delete it. I'm not sure what's the best for the nulls. Can you check what other backends are doing, and try to do the same.

@Lunaticus7
Copy link
Contributor Author

@datapythonista
I have xfailed the where case and rebased to master.

And there are some details about the null error

# for clickhouse backends and ibis-testing-data

import ibis

connection = ibis.clickhouse.connect(
    host='127.0.0.1',
    database='ibis_testing',
    port=9000
)
table_alltypes = connection.table('functional_alltypes')

expr = table_alltypes.string_col.group_concat(',', table_alltypes.bool_col == 1)

expr.compile()
# will cause:
# > IbisTypeError: Cannot compute precedence for string[non-nullable] and null types

# the expected sql of `expr`
sql = """
SELECT arrayStringConcat(groupArray(CASE WHEN `bool_col` = 1 THEN `string_col` ELSE Null END), '-') AS `tmp`
FROM ibis_testing.`functional_alltypes`
"""

connection.raw_sql(sql)
# can get right result

I also tested the same case on mysql backend,and got the same result.

It seems like a bug of ibis on computing precedence of dtypes, when target column(e.g. string_col) is string[non-nullable].
Should i open an issue about it?

@datapythonista
Copy link
Contributor

Sorry for the delay @Lunaticus7. Since the bug is general and unrelated to this change, I think we can move forward here, and open an issue for the general error. Does it make sense?

@Lunaticus7
Copy link
Contributor Author

@datapythonista Issue of the null error has moved to #2891
I have rebased my branch onto the newes master, is there anything else i should do 🤓?

@Lunaticus7
Copy link
Contributor Author

Lint fixed, I forgot pyproject.toml when black locally 🤕.

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Lunaticus7 just couple of small things, but look great.


@pytest.mark.parametrize(
# FIXME: The `where` param needs `Nullable` column
# but the all in testing data is not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to the xfail? You should be able to add a message to the xfail itself, and if for whatever reason that's not an option, don't make it a FIXME, which in general is for tasks that need to be done before it's merged.

In the xfail message, besides saying why the test is failing, you can add the issue number you created for it.

Comment on lines 557 to 560
if where_case is None:
where = None
else:
where = alltypes.bool_col == where_case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if where_case is None:
where = None
else:
where = alltypes.bool_col == where_case
where = None if where_case is None else alltypes.bool_col == where_case

@Lunaticus7
Copy link
Contributor Author

Thanks @Lunaticus7 just couple of small things, but look great.

Thanks for your patience, it's all done 🥳.

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Lunaticus7

@jreback jreback merged commit da24a97 into ibis-project:master Aug 4, 2021
@jreback
Copy link
Contributor

jreback commented Aug 4, 2021

thanks @Lunaticus7

@cpcloud cpcloud removed this from the Next release milestone Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clickhouse The ClickHouse backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants