Bug causes warning in h2o.group_by #7155

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments
@exalate-issue-sync

This issue is present in the documentation for h2o.group_by:
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/groupby.html

The second-to-last example there raises a warning. The function runs an `if` on a value it expects to be a single logical (one column), but several columns can be passed in, so the condition has more than one element and only the first is checked.

I would expect that a small amount of logic cleanup would resolve this.
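To make the cause concrete, here is a minimal sketch in plain R that needs no H2O cluster. The names `col.idx` and `is.na(col.idx)` come from the warning text quoted further down. Computing the index with `match()` and the `frame_cols` vector are assumptions for illustration, not the package's actual internals.

```r
# Hypothetical stand-in for the frame's column names (not fetched from H2O).
frame_cols <- c("Origin", "Dest", "IsArrDelayed", "IsDepDelayed")
cols <- c("Dest", "IsArrDelayed", "fake")

# Assumed lookup: one index per requested column, NA where no match exists.
col.idx <- match(cols, frame_cols)

# is.na() is vectorised, so this has one element per column. Used directly as
# an `if` condition, only the first element would be consulted (R 4.1 warns;
# newer R versions raise an error instead).
is.na(col.idx)

# A length-one condition that takes every element into account.
missing_cols <- character(0)
if (anyNA(col.idx)) {
  missing_cols <- cols[is.na(col.idx)]
  message("No column named ", paste(missing_cols, collapse = ", "))
}
```

If the lookup really does work this way, it would also explain the second failure in the transcript: an `NA` that is not in first position slips past the check and later reaches `if (x == 0)`, which cannot evaluate a missing value.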

{code:r}

library(data.table)
data.table 1.14.2 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
library(h2o)


Your next step is to start H2O:
> h2o.init()

For H2O package documentation, ask for help:
> ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit https://docs.h2o.ai


Attaching package: ‘h2o’

The following objects are masked from ‘package:data.table’:

hour, month, week, year

The following objects are masked from ‘package:stats’:

cor, sd, var

The following objects are masked from ‘package:base’:

%*%, %in%, &&, ||, apply, as.factor, as.numeric, colnames, colnames<-, ifelse, is.character, is.factor, is.numeric, log, log10, log1p, log2, round,
signif, trunc

devtools::session_info()

- Session info ---------------------------------------------------------------
 setting  value
 version  R version 4.1.2 (2021-11-01)
 os       Windows 10 x64 (build 22000)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/Chicago
 date     2022-01-08
 rstudio  2021.09.0+351 Ghost Orchid (desktop)
 pandoc   NA

- Packages -------------------------------------------------------------------
 package     * version  date (UTC) lib source
 bitops        1.0-7    2021-04-24 [1] CRAN (R 4.1.1)
 cachem        1.0.6    2021-08-19 [1] CRAN (R 4.1.2)
 callr         3.7.0    2021-04-20 [1] CRAN (R 4.1.2)
 cli           3.1.0    2021-10-27 [1] CRAN (R 4.1.2)
 crayon        1.4.2    2021-10-29 [1] CRAN (R 4.1.2)
 data.table  * 1.14.2   2021-09-27 [1] CRAN (R 4.1.1)
 desc          1.4.0    2021-09-28 [1] CRAN (R 4.1.2)
 devtools      2.4.3    2021-11-30 [1] CRAN (R 4.1.2)
 ellipsis      0.3.2    2021-04-29 [1] CRAN (R 4.1.1)
 fastmap       1.1.0    2021-01-25 [1] CRAN (R 4.1.2)
 fs            1.5.2    2021-12-08 [1] CRAN (R 4.1.2)
 glue          1.6.0    2021-12-17 [1] CRAN (R 4.1.2)
 h2o         * 3.36.0.1 2022-01-04 [1] CRAN (R 4.1.2)
 jsonlite      1.7.2    2020-12-09 [1] CRAN (R 4.1.1)
 lifecycle     1.0.1    2021-09-24 [1] CRAN (R 4.1.1)
 magrittr      2.0.1    2020-11-17 [1] CRAN (R 4.1.1)
 memoise       2.0.1    2021-11-26 [1] CRAN (R 4.1.2)
 pkgbuild      1.3.1    2021-12-20 [1] CRAN (R 4.1.2)
 pkgload       1.2.4    2021-11-30 [1] CRAN (R 4.1.2)
 prettyunits   1.1.1    2020-01-24 [1] CRAN (R 4.1.2)
 processx      3.5.2    2021-04-30 [1] CRAN (R 4.1.2)
 ps            1.6.0    2021-02-28 [1] CRAN (R 4.1.2)
 purrr         0.3.4    2020-04-17 [1] CRAN (R 4.1.1)
 R6            2.5.1    2021-08-19 [1] CRAN (R 4.1.1)
 RCurl         1.98-1.5 2021-09-17 [1] CRAN (R 4.1.1)
 remotes       2.4.2    2021-11-30 [1] CRAN (R 4.1.2)
 rlang         0.4.12   2021-10-18 [1] CRAN (R 4.1.2)
 rprojroot     2.0.2    2020-11-15 [1] CRAN (R 4.1.2)
 sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.1.2)
 testthat      3.1.1    2021-12-03 [1] CRAN (R 4.1.2)
 usethis       2.1.5    2021-12-09 [1] CRAN (R 4.1.2)
 withr         2.4.3    2021-11-30 [1] CRAN (R 4.1.2)

[1] C:/Users/donne/Documents/R/win-library/4.1
[2] C:/Program Files/R/R-4.1.2/library


h2o.init()

H2O is not running yet, starting it now...

Note: In case of errors look at the following log files:
C:\Users\donne\AppData\Local\Temp\Rtmp29jwbb\file1b206222ba/h2o_donne_started_from_r.out
C:\Users\donne\AppData\Local\Temp\Rtmp29jwbb\file1b2055776ea0/h2o_donne_started_from_r.err

java version "11.0.10" 2021-01-19 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.10+8-LTS-162)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.10+8-LTS-162, mixed mode)

Starting H2O JVM and connecting: Connection successful!

R is connected to the H2O cluster:
H2O cluster uptime: 3 seconds 84 milliseconds
H2O cluster timezone: America/Chicago
H2O data parsing timezone: UTC
H2O cluster version: 3.36.0.1
H2O cluster version age: 9 days
H2O cluster name: H2O_started_from_R_donne_yuv566
H2O cluster total nodes: 1
H2O cluster total memory: 1.97 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: Amazon S3, Algos, Infogram, AutoML, Core V3, TargetEncoder, Core V4
R Version: R version 4.1.2 (2021-11-01)

airlines_url <- "https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv"
airlines <- h2o.importFile(path = airlines_url)
|=================================================================================================================================================================| 100%

# NOTE: I removed the additional column subsetting from the docs example, which I believe is unnecessary.

# The results are valid, but a confusing warning is raised because of a bug.

cols <- c("Dest", "IsArrDelayed", "IsDepDelayed")
h2o.group_by(data = airlines,
             by = "Origin",
             sum(cols),
             gb.control = list(na.methods = "ignore", col.names = NULL))

  Origin sum_Dest sum_IsArrDelayed sum_IsDepDelayed
1    ABE     5884               40               30
2    ABQ    84505              545              370
3    ACY     3131                9                7
4    ALB     3646               49               50
5    AMA      317                4                6
6    ANC      100                0                1

[132 rows x 4 columns]
Warning message:
In if (is.na(col.idx)) stop("No column named ", col.name, " in ", :
the condition has length > 1 and only the first element will be used

# Add a column that doesn't exist to the end of the vector:
# we now get an error ahead of that warning.

cols <- c(cols, "fake")
h2o.group_by(data = airlines,
             by = "Origin",
             sum(cols),
             gb.control = list(na.methods = "ignore", col.names = NULL))

Error in if (x == 0) stop("Cannot select row or column 0") else if (x > :
missing value where TRUE/FALSE needed
In addition: Warning message:
In if (is.na(col.idx)) stop("No column named ", col.name, " in ", :
the condition has length > 1 and only the first element will be used

# Move the fake column to the front: we get a different error (and still get the warning).

cols <- c("fake", "Dest")
h2o.group_by(data = airlines,
             by = "Origin",
             sum(cols),
             gb.control = list(na.methods = "ignore", col.names = NULL))

Error in FUN(X[[i]], ...) : No column named fakeDest in data.
In addition: Warning message:
In if (is.na(col.idx)) stop("No column named ", col.name, " in ", :
the condition has length > 1 and only the first element will be used
{code}
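The `fakeDest` in the last error message looks like a side effect of the same vector handling: `stop()` pastes its arguments together with no separator, so a length-two `col.name` is run together. The sketch below reproduces that in plain R, then shows one possible pre-check that names every unknown column. `check_cols()` and the `available` vector are hypothetical, introduced only for illustration. They are not part of h2o and not necessarily how the linked fix works.

```r
# The stop() call from the warning text, given a length-two col.name.
col.name <- c("fake", "Dest")
msg <- tryCatch(
  stop("No column named ", col.name, " in ", "data", "."),
  error = function(e) conditionMessage(e)
)
print(msg)  # "No column named fakeDest in data."

# Hypothetical helper (not part of h2o): fail early and list every
# requested column that is absent from the frame.
check_cols <- function(cols, available) {
  unknown <- setdiff(cols, available)
  if (length(unknown) > 0) {
    stop("No column(s) named ", paste(unknown, collapse = ", "), " in data.",
         call. = FALSE)
  }
  invisible(cols)
}

# Stand-in for the frame's column names (assumed, not fetched from H2O).
available <- c("Origin", "Dest", "IsArrDelayed", "IsDepDelayed")

ok <- check_cols(c("Dest", "IsArrDelayed"), available)  # passes silently
res <- tryCatch(
  check_cols(c("fake", "Dest"), available),
  error = function(e) conditionMessage(e)
)
print(res)
```

Against a live cluster, `available` could presumably be taken from `colnames(airlines)` before calling `h2o.group_by`, which would give the same message wherever the bad name sits in the vector.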

@h2o-ops-ro
Collaborator

JIRA Issue Details

Jira Issue: PUBDEV-8507
Assignee: Tomas Fryda
Reporter: N/A
State: Resolved
Fix Version: 3.36.0.2
Attachments: N/A
Development PRs: Available

@h2o-ops-ro
Collaborator

Linked PRs from JIRA

#6004
