chore: clean functions_eager.R #503

Merged
merged 5 commits into from
Nov 12, 2023
2 changes: 2 additions & 0 deletions NEWS.md
@@ -3,6 +3,8 @@
## What's changed

- New methods `$write_json()` and `$write_ndjson()` for DataFrame (#502).
- Removed argument `name` in `pl$date_range()`, which was deprecated for a while
(#503).

# polars 0.10.1

116 changes: 43 additions & 73 deletions R/dataframe__frame.R
@@ -1,30 +1,37 @@
#' @title Inner workings of the DataFrame-class
#'
#' @name DataFrame_class
#' @description The `DataFrame`-class is simply two environments of respectively
#' the public and private methods/function calls to the polars rust side. The instantiated
#' `DataFrame`-object is an `externalptr` to a lowlevel rust polars DataFrame object.
#' The pointer address is the only statefullness of the DataFrame object on the R side.
#' Any other state resides on the rust side. The S3 method `.DollarNames.DataFrame`
#' exposes all public `$foobar()`-methods which are callable onto the object. Most methods return
#' another `DataFrame`-class instance or similar which allows for method chaining.
#' This class system in lack of a better name could be called "environment classes"
#' and is the same class system extendr provides, except here there is
#' both a public and private set of methods. For implementation reasons, the private methods are
#' external and must be called from `.pr$DataFrame$methodname()`, also all private methods
#' must take any self as an argument, thus they are pure functions. Having the private methods
#' as pure functions solved/simplified self-referential complications.
#'
#' @details Check out the source code in R/dataframe_frame.R how public methods are derived from
#' private methods. Check out extendr-wrappers.R to see the extendr-auto-generated methods. These
#' are moved to .pr and converted into pure external functions in after-wrappers.R. In zzz.R (named
#' zzz to be last file sourced) the extendr-methods are removed and replaced by any function
#' prefixed `DataFrame_`.
#' @description
#' The `DataFrame`-class is simply two environments of respectively the public
#' and private methods/function calls to the polars Rust side. The instantiated
#' `DataFrame`-object is an `externalptr` to a low-level Rust polars DataFrame
#' object.
#'
#' The S3 method `.DollarNames.DataFrame` exposes all public `$foobar()`-methods
#' which are callable onto the object. Most methods return another `DataFrame`-
#' class instance or similar which allows for method chaining. This class system
#' could be called "environment classes" (for lack of a better name) and is the
#' same class system `extendr` provides, except here there are both a public and
#' private set of methods. For implementation reasons, the private methods are
#' external and must be called from `.pr$DataFrame$methodname()`. Also, all
#' private methods must take any `self` as an argument, thus they are pure
#' functions. Having the private methods as pure functions solved/simplified
#' self-referential complications.
#'
#' @details
#' Check out the source code in [R/dataframe__frame.R](https://github.com/pola-rs/r-polars/blob/main/R/dataframe__frame.R)
#' to see how public methods are derived from private methods. Check out
#' [extendr-wrappers.R](https://github.com/pola-rs/r-polars/blob/main/R/extendr-wrappers.R)
#' to see the `extendr`-auto-generated methods. These are moved to `.pr` and
#' converted into pure external functions in [after-wrappers.R](https://github.com/pola-rs/r-polars/blob/main/R/after-wrappers.R). In [zzz.R](https://github.com/pola-rs/r-polars/blob/main/R/zzz.R)
#' (named `zzz` to be the last file sourced) the `extendr`-methods are removed and
#' replaced by any function prefixed `DataFrame_`.
#'
#' @keywords DataFrame
#' @return not applicable
#' @return Not applicable
#' @examples
#' # see all public exported method names (normally accessed via a class instance with $)
#' # see all public exported method names (normally accessed via a class
#' # instance with $)
#' ls(.pr$env$DataFrame)
#'
#' # see all private methods (not intended for regular use)
@@ -38,33 +45,35 @@
#' # use a public method/property
#' df$shape
#' df2 = df
#'
#' # use a private method, which has mutability
#' result = .pr$DataFrame$set_column_from_robj(df, 150:1, "some_ints")
#'
#' # column exists in both dataframes-objects now, as they are just pointers to the same object
#' # there are no public methods with mutability
#' # The column exists in both DataFrame objects now, as they are just pointers
#' # to the same object.
#' # There are no public methods with mutability.
#' df$columns
#' df2$columns
#'
#' # set_column_from_robj-method is fallible and returned a result which could be ok or an err.
#' # set_column_from_robj-method is fallible and returned a result which could
#' # be "ok" or an error.
#' # No public method or function will ever return a result.
#' # The `result` is very close to the same as output from functions decorated with purrr::safely.
#' # To use results on R side, these must be unwrapped first such that
#' # potentially errors can be thrown. unwrap(result) is a way to
#' # bridge rust not throwing errors with R. Extendr default behavior is to use panic!(s) which
#' # would case some unneccesary confusing and some very verbose error messages on the inner
#' # workings of rust. unwrap(result) #in this case no error, just a NULL because this mutable
#' # The `result` is very close to the same as output from functions decorated
#' # with purrr::safely.
#' # To use results on the R side, these must be unwrapped first such that
#' # potential errors can be thrown. `unwrap(result)` is a way to communicate
#' # errors happening on the Rust side to the R side. The default `extendr`
#' # behavior is to use `panic!`, which would cause unnecessarily confusing
#' # and very verbose error messages about the inner workings of Rust.
#' # In this case `unwrap(result)` gives no error, just a NULL, because this
#' # mutable method does not return any ok-value.
#'
#' # try unwrapping an error from polars due to unmatching column lengths
#' # Try unwrapping an error from polars due to unmatching column lengths
#' err_result = .pr$DataFrame$set_column_from_robj(df, 1:10000, "wrong_length")
#' tryCatch(unwrap(err_result, call = NULL), error = \(e) cat(as.character(e)))
DataFrame
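
The result/unwrap pattern described in the roxygen comments above can be illustrated without polars. The following is a minimal base R sketch under stated assumptions: `as_result_sketch()` and `unwrap_sketch()` are hypothetical names introduced here, and they are not the r-polars implementation of `result()` and `unwrap()`.

```r
# Hypothetical helpers; NOT the r-polars implementation of result()/unwrap().
# as_result_sketch() captures an error instead of throwing it.
as_result_sketch = function(expr) {
  tryCatch(
    list(ok = expr, err = NULL),
    error = function(e) list(ok = NULL, err = conditionMessage(e))
  )
}

# unwrap_sketch() turns a captured error back into an R error,
# otherwise it returns the ok-value.
unwrap_sketch = function(res, context = NULL) {
  if (!is.null(res$err)) {
    stop(paste(c(context, res$err), collapse = ": "), call. = FALSE)
  }
  res$ok
}

ok_res = as_result_sketch(sqrt(16))
err_res = as_result_sketch(stop("column lengths do not match"))

unwrap_sketch(ok_res)
tryCatch(
  unwrap_sketch(err_res, context = "in set_column"),
  error = \(e) cat(conditionMessage(e), "\n")
)
```

Keeping the error as a value until it is unwrapped lets the caller decide where the error surfaces and which context string to attach to it.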





#' @title auto complete $-access into a polars object
#' @description called by the interactive R session internally
#' @param x DataFrame
@@ -225,15 +234,9 @@ DataFrame_print = function() {
invisible(self)
}

## "Class methods"

# "properties"

## internal bookkeeping of methods which should behave as properties
DataFrame.property_setters = new.env(parent = emptyenv())



#' generic setter method
#' @noRd
#' @param self DataFrame
@@ -288,7 +291,6 @@ DataFrame.property_setters = new.env(parent = emptyenv())
pstop(err = paste("no setter method for", name))
}

# if(is.null(func)) pstop(err= paste("no setter method for",name)))
if (polars_optenv$strictly_immutable) self <- self$clone()
func = DataFrame.property_setters[[name]]
func(self, value)
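
The hunk above looks up a setter by name in an environment and fails when none is registered. A minimal base R sketch of that registry idea follows; `setters_sketch` and `set_property_sketch()` are hypothetical names, and the sketch operates on a plain list rather than a polars DataFrame.

```r
# Hypothetical registry, modelled on the idea of DataFrame.property_setters.
setters_sketch = new.env(parent = emptyenv())

# Register a setter for a property called "columns".
setters_sketch$columns = function(self, value) {
  names(self) = value
  self
}

# Look up a setter by name and fail with a clear message if none exists.
set_property_sketch = function(self, name, value) {
  func = setters_sketch[[name]]
  if (is.null(func)) stop(paste("no setter method for", name), call. = FALSE)
  func(self, value)
}

x = list(a = 1, b = 2)
x = set_property_sketch(x, "columns", c("u", "v"))
names(x)
```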
@@ -491,38 +493,6 @@ DataFrame_schema = method_as_property(function() {
})



#
DataFrameCompareToOtherDF = function(self, other, op) {
stop("not done yet")
# """Compare a DataFrame with another DataFrame."""
if (!identical(self$columns, other$columns)) stop("DataFrame columns do not match")
if (!identical(self$shape, other$shape)) stop("DataFrame dimensions do not match")

suffix = "__POLARS_CMP_OTHER"
other_renamed = other$select(pl$all()$suffix(suffix))
# combined = concat([self, other_renamed], how="horizontal")

# if op == "eq":
# expr = [pli.col(n) == pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "neq":
# expr = [pli.col(n) != pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "gt":
# expr = [pli.col(n) > pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "lt":
# expr = [pli.col(n) < pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "gt_eq":
# expr = [pli.col(n) >= pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "lt_eq":
# expr = [pli.col(n) <= pli.col(f"{n}{suffix}") for n in self.columns]
# else:
# raise ValueError(f"got unexpected comparison operator: {op}")
#
# return combined.select(expr)
}



#' Convert an existing DataFrame to a LazyFrame
#' @name DataFrame_lazy
#' @description Start a new lazy query from a DataFrame.
112 changes: 52 additions & 60 deletions R/functions__eager.R
@@ -60,17 +60,12 @@ pl$concat = function(
how = c("vertical", "horizontal", "diagonal"),
parallel = TRUE,
to_supertypes = FALSE) {
# unpack arg list
l = unpack_list(..., skip_classes = "data.frame")

# nothing becomes NULL
if (length(l) == 0L) {
return(NULL)
}

## Check inputs
how_args = c("vertical", "horizontal", "diagonal") # , "vertical_relaxed", "diangonal_relaxed")

how_args = c("vertical", "horizontal", "diagonal")
how = match.arg(how[1L], how_args) |>
result() |>
unwrap("in pl$concat()")
@@ -149,76 +144,73 @@ pl$concat = function(
}
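
The `how` check in `pl$concat()` relies on `match.arg()`. A minimal base R sketch of that validation step is shown below; `check_how_sketch()` is a hypothetical stand-in and leaves out the `result()` and `unwrap()` wrapping used in the package.

```r
# Hypothetical stand-in for the argument check in pl$concat().
check_how_sketch = function(how = c("vertical", "horizontal", "diagonal")) {
  how_args = c("vertical", "horizontal", "diagonal")
  # how[1L] picks the first choice when the caller passes nothing
  match.arg(how[1L], how_args)
}

check_how_sketch() # falls back to the first choice
check_how_sketch("diagonal")

# An unknown value raises an error that lists the valid choices
tryCatch(
  check_how_sketch("sideways"),
  error = \(e) cat(conditionMessage(e), "\n")
)
```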


#' new date_range
#' New date range
#' @name pl_date_range
#' @param start POSIXt or Date preferably with time_zone or double or integer
#' @param end POSIXt or Date preferably with time_zone or double or integer. If end is and
#' interval are missing, then single datetime is constructed.
#' @param interval string pl_duration or R difftime. Can be missing if end is missing also.
#' @param eager bool, if FALSE (default) return `Expr` else evaluate `Expr` to `Series`
#' @param closed option one of 'both'(default), 'left', 'none' or 'right'
#' @param name name of series
#' @param time_unit option string ("ns" "us" "ms") duration of one int64 value on polars side
#' @param time_zone optional string describing a timezone.
#' @param explode if TRUE (default) all created ranges will be "unlisted" into on column, if FALSE
#' output will be a list of ranges.
#' @param end POSIXt or Date preferably with time_zone or double or integer. If
#' `end` and `interval` are missing, then a single datetime is constructed.
#' @param interval String, a Polars `duration` or R [difftime()]. Can be missing
#' if `end` is missing also.
#' @param eager If `FALSE` (default), return an `Expr`. Otherwise, returns a
#' `Series`.
#' @param closed One of `"both"` (default), `"left"`, `"none"` or `"right"`.
#' @param time_unit String (`"ns"`, `"us"`, `"ms"`) or integer.
#' @param time_zone String describing a timezone. If `NULL` (default), `"GMT"` is
#' used.
#' @param explode If `TRUE` (default), all created ranges will be "unlisted"
#' into a column. Otherwise, output will be a list of ranges.
#'
#' @details
#' If param time_zone is not defined the Series will have no time zone.
#'
#' NOTICE: R POSIXt without defined timezones(tzone/tz), so called naive datetimes, are counter
#' intuitive in R. It is recommended to always set the timezone of start and end. If not output will
#' vary between local machine timezone, R and polars.
#' If param `time_zone` is not defined the Series will have no time zone.
#'
#' In R/r-polars it is perfectly fine to mix timezones of params time_zone, start and end.
#' Note that R POSIXt without defined timezones (tzone/tz), so-called naive
#' datetimes, are counterintuitive in R. It is recommended to always set the
#' timezone of start and end. If not, output will vary between local machine
#' timezone, R and polars.
#'
#' In R/r-polars it is perfectly fine to mix timezones of params `time_zone`,
#' `start` and `end`.
#'
#' @return a datetime
#' @keywords functions ExprDT
#' @return A datetime
#'
#' @examples
#'
#' # All in GMT, straight forward, no mental confusion
#' s_gmt = pl$date_range(
#' as.POSIXct("2022-01-01", tz = "GMT"),
#' as.POSIXct("2022-01-02", tz = "GMT"),
#' interval = "6h", time_unit = "ms", time_zone = "GMT"
#' )
#' s_gmt
#' s_gmt$to_r() # printed same way in R and polars becuase tagged with a time_zone/tzone
#' s_gmt$to_r()
#'
#' # polars assumes any input in GMT if time_zone = NULL, set GMT on start end to see same print
#' # polars uses "GMT" if time_zone = NULL
#' s_null = pl$date_range(
#' as.POSIXct("2022-01-01", tz = "GMT"),
#' as.POSIXct("2022-01-02", tz = "GMT"),
#' interval = "6h", time_unit = "ms", time_zone = NULL
#' )
#' s_null$to_r() # back to R POSIXct. R prints non tzone tagged POSIXct in local timezone.
#'
#' # back to R POSIXct. R prints non tzone tagged POSIXct in local timezone
#' s_null$to_r()
#'
#' # use of ISOdate
#' t1 = ISOdate(2022, 1, 1, 0) # preset GMT
#' t2 = ISOdate(2022, 1, 2, 0) # preset GMT
#' pl$date_range(t1, t2, interval = "4h", time_unit = "ms", time_zone = "GMT")$to_r()
#'
pl$date_range = function(
start, # : date | datetime |# for lazy pli.Expr | str,
end, # : date | datetime | pli.Expr | str,
interval, # : str | timedelta,
eager = FALSE, # : Literal[True],
closed = "both", # : ClosedInterval = "both",
name = NULL, # : str | None = None,
start,
end,
interval,
eager = FALSE,
closed = "both",
time_unit = "us",
time_zone = NULL, # : str | None = None
time_zone = NULL,
explode = TRUE) {
if (missing(end)) {
end = start
interval = "1h"
}

if (!is.null(name)) warning("arg name is deprecated use $alias() instead")
name = name %||% ""

f_eager_eval = \(lit) {
if (isTRUE(eager)) {
result(lit$lit_to_s())
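
The time zone advice in the `pl$date_range()` documentation can be checked in base R alone. The sketch below does not call polars; it only assumes base R's `seq()` and `as.POSIXct()`, and shows the difference between a time-zone-tagged datetime and a naive one.

```r
# Base R only; this does not call pl$date_range().
start = as.POSIXct("2022-01-01", tz = "GMT")
end = as.POSIXct("2022-01-02", tz = "GMT")

# Both ends included with a 6-hour step: 00:00, 06:00, 12:00, 18:00, 00:00
rng = seq(start, end, by = "6 hours")
length(rng)
attr(rng, "tzone")

# A naive datetime has no explicit time zone: tzone is "" and the value is
# interpreted in the local time zone of the machine.
naive = as.POSIXct("2022-01-01")
attr(naive, "tzone")
```

Because the naive value depends on the machine's local time zone, the same script can describe different instants on different machines, which is why setting `tz` on `start` and `end` is recommended.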
@@ -276,27 +268,28 @@ difftime_to_pl_duration = function(dft) {

#' Polars raw list
#' @description
#' create an "rpolars_raw_list", which is an R list where all elements must be an R raw or NULL.
#' Create an "rpolars_raw_list", which is an R list where all elements must be
#' an R raw or `NULL`.
#' @name pl_raw_list
#' @param ... elements
#' @param ... Elements
#' @details
#' In R raw can contain a binary sequence of bytes, and the length is the number of bytes.
#' In polars a Series of DataType [Binary][pl_dtypes] is more like a vector of vectors of bytes and missings
#' 'Nulls' is allowed, similar to R NAs in vectors.
#' In R, raw can contain a binary sequence of bytes, and the length is the number
#' of bytes. In polars a Series of DataType [Binary][pl_dtypes] is more like a
#' vector of vectors of bytes where missing values are allowed, similar to how
#' `NA`s can be present in vectors.
#'
#' To ensure correct round-trip conversion r-polars uses an R list where any elements must be
#' raw or NULL(encodes missing), and the S3 class is c("rpolars_raw_list","list").
#' To ensure correct round-trip conversion, r-polars uses an R list where any
#' elements must be raw or `NULL` (encoded as missing), and the S3 class is
#' `c("rpolars_raw_list","list")`.
#'
#' @return an R list where any elements must be raw, and the S3 class is
#' c("rpolars_raw_list","list").
#' @return An R list where any elements must be raw, and the S3 class is
#' `c("rpolars_raw_list","list")`.
#' @keywords functions
#'
#' @examples
#'
#' # craete a rpolars_raw_list
#' # create a rpolars_raw_list
#' raw_list = pl$raw_list(raw(1), raw(3), charToRaw("alice"), NULL)
#'
#'
#' # pass it to Series or lit
#' pl$Series(raw_list)
#' pl$lit(raw_list)
@@ -305,12 +298,12 @@ difftime_to_pl_duration = function(dft) {
#' pl$Series(raw_list)$to_r()
#'
#'
#' # NB a plain list of raws yield a polars Series of DateType [list[Binary]] which is not the same
#' # NB: a plain list of raws yields a polars Series of DataType [list[Binary]]
#' # which is not the same
#' pl$Series(list(raw(1), raw(2)))
#'
#' # to regular list, use as.list or unclass
#' as.list(raw_list)
#'
pl$raw_list = function(...) {
l = list2(...)
if (any(!sapply(l, is.raw) & !sapply(l, is.null))) {
@@ -322,11 +315,11 @@ pl$raw_list = function(...) {
}
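
The constructor above accepts only raw vectors or `NULL`. A minimal base R sketch of that validation follows; `raw_list_sketch()` is a hypothetical re-implementation and the error message is made up for illustration, since the real helpers and message are not fully shown in this diff.

```r
# Hypothetical re-implementation in base R; not the code of pl$raw_list().
raw_list_sketch = function(...) {
  l = list(...)
  # every element must be a raw vector or NULL (NULL encodes a missing value)
  is_ok = vapply(l, function(el) is.raw(el) || is.null(el), logical(1))
  if (!all(is_ok)) {
    stop("some elements were not raw or NULL", call. = FALSE)
  }
  class(l) = c("rpolars_raw_list", "list")
  l
}

rl = raw_list_sketch(raw(1), charToRaw("alice"), NULL)
class(rl)
length(rl)
```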


#' subset polars raw list
#' Subset polars raw list
#' @rdname pl_raw_list
#' @param x A `rpolars_raw_list` object created with `pl$raw_list()`
#' @param index Elements to select
#' @export
#' @param x rpolars_raw_list list
#' @param index elements to get subset
#' @examples
#' # subsetting preserves class
#' pl$raw_list(NULL, raw(2), raw(3))[1:2]
@@ -336,11 +329,10 @@ pl$raw_list = function(...) {
x
}
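
Class-preserving subsetting and coercion to a plain list can be sketched with two small S3 methods. The class name `raw_list_sketch` and the helper `new_raw_list_sketch()` are hypothetical, chosen so the example does not clash with the methods r-polars registers for `rpolars_raw_list`.

```r
# Hypothetical class and helper, used only for this illustration.
new_raw_list_sketch = function(l) {
  structure(l, class = c("raw_list_sketch", "list"))
}

# Subsetting: drop the class, subset as a plain list, then restore the class.
`[.raw_list_sketch` = function(x, index) {
  new_raw_list_sketch(unclass(x)[index])
}

# Coercion: the same as unclass(x).
as.list.raw_list_sketch = function(x, ...) unclass(x)

rl = new_raw_list_sketch(list(NULL, raw(2), raw(3)))
sub = rl[1:2]
class(sub)
class(as.list(rl))
```

Without the `[` method, single-bracket subsetting of a classed list would return a plain list and lose the class marker that identifies the object as a raw list.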

#' coerce polars raw list to list
#' Coerce polars raw list to R list
#' @rdname pl_raw_list
#' @param x A `rpolars_raw_list` object created with `pl$raw_list()`
#' @export
#' @details the same as unclass(x)
#' @param x rpolars_raw_list list
#' @examples
#' # to regular list, use as.list or unclass
#' pl$raw_list(NULL, raw(2), raw(3)) |> as.list()