chore: clean functions_eager.R (#503)
etiennebacher committed Nov 12, 2023
1 parent 67e6097 commit a44cd32
Showing 8 changed files with 190 additions and 208 deletions.
2 changes: 2 additions & 0 deletions NEWS.md
@@ -3,6 +3,8 @@
## What's changed

- New methods `$write_json()` and `$write_ndjson()` for DataFrame (#502).
- Removed argument `name` in `pl$date_range()`, which was deprecated for a while
(#503).

# polars 0.10.1

116 changes: 43 additions & 73 deletions R/dataframe__frame.R
@@ -1,30 +1,37 @@
#' @title Inner workings of the DataFrame-class
#'
#' @name DataFrame_class
#' @description The `DataFrame`-class is simply two environments of respectively
#' the public and private methods/function calls to the polars rust side. The instantiated
#' `DataFrame`-object is an `externalptr` to a lowlevel rust polars DataFrame object.
#' The pointer address is the only statefullness of the DataFrame object on the R side.
#' Any other state resides on the rust side. The S3 method `.DollarNames.DataFrame`
#' exposes all public `$foobar()`-methods which are callable onto the object. Most methods return
#' another `DataFrame`-class instance or similar which allows for method chaining.
#' This class system in lack of a better name could be called "environment classes"
#' and is the same class system extendr provides, except here there is
#' both a public and private set of methods. For implementation reasons, the private methods are
#' external and must be called from `.pr$DataFrame$methodname()`, also all private methods
#' must take any self as an argument, thus they are pure functions. Having the private methods
#' as pure functions solved/simplified self-referential complications.
#'
#' @details Check out the source code in R/dataframe_frame.R how public methods are derived from
#' private methods. Check out extendr-wrappers.R to see the extendr-auto-generated methods. These
#' are moved to .pr and converted into pure external functions in after-wrappers.R. In zzz.R (named
#' zzz to be last file sourced) the extendr-methods are removed and replaced by any function
#' prefixed `DataFrame_`.
#' @description
#' The `DataFrame`-class is simply two environments of respectively the public
#' and private methods/function calls to the polars Rust side. The instantiated
#' `DataFrame`-object is an `externalptr` to a low-level Rust polars DataFrame
#' object.
#'
#' The S3 method `.DollarNames.DataFrame` exposes all public `$foobar()`-methods
#' which are callable onto the object. Most methods return another `DataFrame`-
#' class instance or similar which allows for method chaining. This class system
#' could be called "environment classes" (in lack of a better name) and is the
#' same class system `extendr` provides, except here there are both a public and
#' private set of methods. For implementation reasons, the private methods are
#' external and must be called from `.pr$DataFrame$methodname()`. Also, all
#' private methods must take any `self` as an argument, thus they are pure
#' functions. Having the private methods as pure functions solved/simplified
#' self-referential complications.
#'
#' @details
#' Check out the source code in [R/dataframe__frame.R](https://github.com/pola-rs/r-polars/blob/main/R/dataframe__frame.R)
#' to see how public methods are derived from private methods. Check out
#' [extendr-wrappers.R](https://github.com/pola-rs/r-polars/blob/main/R/extendr-wrappers.R)
#' to see the `extendr`-auto-generated methods. These are moved to `.pr` and
#' converted into pure external functions in [after-wrappers.R](https://github.com/pola-rs/r-polars/blob/main/R/after-wrappers.R). In [zzz.R](https://github.com/pola-rs/r-polars/blob/main/R/zzz.R)
#' (named `zzz` to be last file sourced) the `extendr`-methods are removed and
#' replaced by any function prefixed `DataFrame_`.
#'
#' @keywords DataFrame
#' @return not applicable
#' @return Not applicable
#' @examples
#' # see all public exported method names (normally accessed via a class instance with $)
#' # see all public exported method names (normally accessed via a class
#' # instance with $)
#' ls(.pr$env$DataFrame)
#'
#' # see all private methods (not intended for regular use)
@@ -38,33 +45,35 @@
#' # use a public method/property
#' df$shape
#' df2 = df
#'
#' # use a private method, which has mutability
#' result = .pr$DataFrame$set_column_from_robj(df, 150:1, "some_ints")
#'
#' # column exists in both dataframes-objects now, as they are just pointers to the same object
#' # there are no public methods with mutability
#' # Column exists in both dataframes-objects now, as they are just pointers to
#' # the same object
#' # There are no public methods with mutability.
#' df$columns
#' df2$columns
#'
#' # set_column_from_robj-method is fallible and returned a result which could be ok or an err.
#' # set_column_from_robj-method is fallible and returned a result which could
#' # be "ok" or an error.
#' # No public method or function will ever return a result.
#' # The `result` is very close to the same as output from functions decorated with purrr::safely.
#' # To use results on R side, these must be unwrapped first such that
#' # potentially errors can be thrown. unwrap(result) is a way to
#' # bridge rust not throwing errors with R. Extendr default behavior is to use panic!(s) which
#' # would case some unneccesary confusing and some very verbose error messages on the inner
#' # workings of rust. unwrap(result) #in this case no error, just a NULL because this mutable
#' # The `result` is very close to the same as output from functions decorated
#' # with purrr::safely.
#' # To use results on the R side, these must be unwrapped first such that
#' # potentially errors can be thrown. `unwrap(result)` is a way to communicate
#' # errors happening on the Rust side to the R side. `Extendr` default behavior
#' # is to use `panic!`(s) which would cause some unnecessarily confusing and
#' # some very verbose error messages on the inner workings of rust.
#' # `unwrap(result)` in this case no error, just a NULL because this mutable
#' # method does not return any ok-value.
#'
#' # try unwrapping an error from polars due to unmatching column lengths
#' # Try unwrapping an error from polars due to unmatching column lengths
#' err_result = .pr$DataFrame$set_column_from_robj(df, 1:10000, "wrong_length")
#' tryCatch(unwrap(err_result, call = NULL), error = \(e) cat(as.character(e)))
DataFrame





#' @title auto complete $-access into a polars object
#' @description called by the interactive R session internally
#' @param x DataFrame
@@ -225,15 +234,9 @@ DataFrame_print = function() {
invisible(self)
}

## "Class methods"

# "properties"

## internal bookkeeping of methods which should behave as properties
DataFrame.property_setters = new.env(parent = emptyenv())



#' generic setter method
#' @noRd
#' @param self DataFrame
@@ -288,7 +291,6 @@ DataFrame.property_setters = new.env(parent = emptyenv())
pstop(err = paste("no setter method for", name))
}

# if(is.null(func)) pstop(err= paste("no setter method for",name)))
if (polars_optenv$strictly_immutable) self <- self$clone()
func = DataFrame.property_setters[[name]]
func(self, value)
@@ -491,38 +493,6 @@ DataFrame_schema = method_as_property(function() {
})



#
DataFrameCompareToOtherDF = function(self, other, op) {
stop("not done yet")
# """Compare a DataFrame with another DataFrame."""
if (!identical(self$columns, other$columns)) stop("DataFrame columns do not match")
if (!identical(self$shape, other$shape)) stop("DataFrame dimensions do not match")

suffix = "__POLARS_CMP_OTHER"
other_renamed = other$select(pl$all()$suffix(suffix))
# combined = concat([self, other_renamed], how="horizontal")

# if op == "eq":
# expr = [pli.col(n) == pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "neq":
# expr = [pli.col(n) != pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "gt":
# expr = [pli.col(n) > pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "lt":
# expr = [pli.col(n) < pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "gt_eq":
# expr = [pli.col(n) >= pli.col(f"{n}{suffix}") for n in self.columns]
# elif op == "lt_eq":
# expr = [pli.col(n) <= pli.col(f"{n}{suffix}") for n in self.columns]
# else:
# raise ValueError(f"got unexpected comparison operator: {op}")
#
# return combined.select(expr)
}



#' Convert an existing DataFrame to a LazyFrame
#' @name DataFrame_lazy
#' @description Start a new lazy query from a DataFrame.
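Note on the `result`/`unwrap()` pattern described in the roxygen examples of this file: the comments compare a `result` to the output of a function wrapped with `purrr::safely`, which is unwrapped on the R side so that an error can be thrown there. The base R sketch below only illustrates that idea. `safe_result()` and `unwrap_result()` are hypothetical names made up for this note; they are not the r-polars functions, and the real implementation may differ.

```r
# Hypothetical helpers, for illustration only (not the r-polars implementation).

# Evaluate an expression and capture either its value or its error message.
safe_result = function(expr) {
  tryCatch(
    list(ok = expr, err = NULL),
    error = function(e) list(ok = NULL, err = conditionMessage(e))
  )
}

# Return the ok-value, or throw a regular R error if the result holds an error.
unwrap_result = function(res, context = NULL) {
  if (!is.null(res$err)) {
    msg = if (is.null(context)) res$err else paste0(context, ": ", res$err)
    stop(msg, call. = FALSE)
  }
  res$ok
}

ok_res = safe_result(sqrt(16))
err_res = safe_result(stop("column lengths do not match"))

# The ok-value is returned as-is.
unwrap_result(ok_res)

# The captured error is only thrown when the result is unwrapped.
tryCatch(
  unwrap_result(err_res, context = "in set_column"),
  error = function(e) cat(conditionMessage(e), "\n")
)
```

Keeping the error as a value until it is unwrapped lets the caller decide where it is raised and what context is attached to the message.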
112 changes: 52 additions & 60 deletions R/functions__eager.R
@@ -60,17 +60,12 @@ pl$concat = function(
how = c("vertical", "horizontal", "diagonal"),
parallel = TRUE,
to_supertypes = FALSE) {
# unpack arg list
l = unpack_list(..., skip_classes = "data.frame")

# nothing becomes NULL
if (length(l) == 0L) {
return(NULL)
}

## Check inputs
how_args = c("vertical", "horizontal", "diagonal") # , "vertical_relaxed", "diangonal_relaxed")

how_args = c("vertical", "horizontal", "diagonal")
how = match.arg(how[1L], how_args) |>
result() |>
unwrap("in pl$concat()")
@@ -149,76 +144,73 @@ pl$concat = function(
}


#' new date_range
#' New date range
#' @name pl_date_range
#' @param start POSIXt or Date preferably with time_zone or double or integer
#' @param end POSIXt or Date preferably with time_zone or double or integer. If end is and
#' interval are missing, then single datetime is constructed.
#' @param interval string pl_duration or R difftime. Can be missing if end is missing also.
#' @param eager bool, if FALSE (default) return `Expr` else evaluate `Expr` to `Series`
#' @param closed option one of 'both'(default), 'left', 'none' or 'right'
#' @param name name of series
#' @param time_unit option string ("ns" "us" "ms") duration of one int64 value on polars side
#' @param time_zone optional string describing a timezone.
#' @param explode if TRUE (default) all created ranges will be "unlisted" into on column, if FALSE
#' output will be a list of ranges.
#' @param end POSIXt or Date preferably with time_zone or double or integer. If
#' `end` and `interval` are missing, then a single datetime is constructed.
#' @param interval String, a Polars `duration` or R [difftime()]. Can be missing
#' if `end` is missing also.
#' @param eager If `FALSE` (default), return an `Expr`. Otherwise, returns a
#' `Series`.
#' @param closed One of `"both"` (default), `"left"`, `"none"` or `"right"`.
#' @param time_unit String (`"ns"`, `"us"`, `"ms"`) or integer.
#' @param time_zone String describing a timezone. If `NULL` (default), `"GMT"` is
#' used.
#' @param explode If `TRUE` (default), all created ranges will be "unlisted"
#' into a column. Otherwise, output will be a list of ranges.
#'
#' @details
#' If param time_zone is not defined the Series will have no time zone.
#'
#' NOTICE: R POSIXt without defined timezones(tzone/tz), so called naive datetimes, are counter
#' intuitive in R. It is recommended to always set the timezone of start and end. If not output will
#' vary between local machine timezone, R and polars.
#' If param `time_zone` is not defined the Series will have no time zone.
#'
#' In R/r-polars it is perfectly fine to mix timezones of params time_zone, start and end.
#' Note that R POSIXt without defined timezones (tzone/tz), so-called naive
#' datetimes, are counter intuitive in R. It is recommended to always set the
#' timezone of start and end. If not output will vary between local machine
#' timezone, R and polars.
#'
#' In R/r-polars it is perfectly fine to mix timezones of params `time_zone`,
#' `start` and `end`.
#'
#' @return a datetime
#' @keywords functions ExprDT
#' @return A datetime
#'
#' @examples
#'
#' # All in GMT, straight forward, no mental confusion
#' s_gmt = pl$date_range(
#' as.POSIXct("2022-01-01", tz = "GMT"),
#' as.POSIXct("2022-01-02", tz = "GMT"),
#' interval = "6h", time_unit = "ms", time_zone = "GMT"
#' )
#' s_gmt
#' s_gmt$to_r() # printed same way in R and polars becuase tagged with a time_zone/tzone
#' s_gmt$to_r()
#'
#' # polars assumes any input in GMT if time_zone = NULL, set GMT on start end to see same print
#' # polars uses "GMT" if time_zone = NULL
#' s_null = pl$date_range(
#' as.POSIXct("2022-01-01", tz = "GMT"),
#' as.POSIXct("2022-01-02", tz = "GMT"),
#' interval = "6h", time_unit = "ms", time_zone = NULL
#' )
#' s_null$to_r() # back to R POSIXct. R prints non tzone tagged POSIXct in local timezone.
#'
#' # back to R POSIXct. R prints non tzone tagged POSIXct in local timezone
#' s_null$to_r()
#'
#' # use of ISOdate
#' t1 = ISOdate(2022, 1, 1, 0) # preset GMT
#' t2 = ISOdate(2022, 1, 2, 0) # preset GMT
#' pl$date_range(t1, t2, interval = "4h", time_unit = "ms", time_zone = "GMT")$to_r()
#'
pl$date_range = function(
start, # : date | datetime |# for lazy pli.Expr | str,
end, # : date | datetime | pli.Expr | str,
interval, # : str | timedelta,
eager = FALSE, # : Literal[True],
closed = "both", # : ClosedInterval = "both",
name = NULL, # : str | None = None,
start,
end,
interval,
eager = FALSE,
closed = "both",
time_unit = "us",
time_zone = NULL, # : str | None = None
time_zone = NULL,
explode = TRUE) {
if (missing(end)) {
end = start
interval = "1h"
}

if (!is.null(name)) warning("arg name is deprecated use $alias() instead")
name = name %||% ""

f_eager_eval = \(lit) {
if (isTRUE(eager)) {
result(lit$lit_to_s())
@@ -276,27 +268,28 @@ difftime_to_pl_duration = function(dft) {

#' Polars raw list
#' @description
#' create an "rpolars_raw_list", which is an R list where all elements must be an R raw or NULL.
#' Create an "rpolars_raw_list", which is an R list where all elements must be
#' an R raw or `NULL`.
#' @name pl_raw_list
#' @param ... elements
#' @param ... Elements
#' @details
#' In R raw can contain a binary sequence of bytes, and the length is the number of bytes.
#' In polars a Series of DataType [Binary][pl_dtypes] is more like a vector of vectors of bytes and missings
#' 'Nulls' is allowed, similar to R NAs in vectors.
#' In R, raw can contain a binary sequence of bytes, and the length is the number
#' of bytes. In polars a Series of DataType [Binary][pl_dtypes] is more like a
#' vector of vectors of bytes where missing values are allowed, similar to how
#' `NA`s can be present in vectors.
#'
#' To ensure correct round-trip conversion r-polars uses an R list where any elements must be
#' raw or NULL(encodes missing), and the S3 class is c("rpolars_raw_list","list").
#' To ensure correct round-trip conversion, r-polars uses an R list where any
#' elements must be raw or `NULL` (encoded as missing), and the S3 class is
#' `c("rpolars_raw_list","list")`.
#'
#' @return an R list where any elements must be raw, and the S3 class is
#' c("rpolars_raw_list","list").
#' @return An R list where any elements must be raw, and the S3 class is
#' `c("rpolars_raw_list","list")`.
#' @keywords functions
#'
#' @examples
#'
#' # craete a rpolars_raw_list
#' # create a rpolars_raw_list
#' raw_list = pl$raw_list(raw(1), raw(3), charToRaw("alice"), NULL)
#'
#'
#' # pass it to Series or lit
#' pl$Series(raw_list)
#' pl$lit(raw_list)
@@ -305,12 +298,12 @@ difftime_to_pl_duration = function(dft) {
#' pl$Series(raw_list)$to_r()
#'
#'
#' # NB a plain list of raws yield a polars Series of DateType [list[Binary]] which is not the same
#' # NB: a plain list of raws yields a polars Series of DataType [list[Binary]]
#' # which is not the same
#' pl$Series(list(raw(1), raw(2)))
#'
#' # to regular list, use as.list or unclass
#' as.list(raw_list)
#'
pl$raw_list = function(...) {
l = list2(...)
if (any(!sapply(l, is.raw) & !sapply(l, is.null))) {
@@ -322,11 +315,11 @@ pl$raw_list = function(...) {
}


#' subset polars raw list
#' Subset polars raw list
#' @rdname pl_raw_list
#' @param x A `rpolars_raw_list` object created with `pl$raw_list()`
#' @param index Elements to select
#' @export
#' @param x rpolars_raw_list list
#' @param index elements to get subset
#' @examples
#' # subsetting preserves class
#' pl$raw_list(NULL, raw(2), raw(3))[1:2]
@@ -336,11 +329,10 @@ pl$raw_list = function(...) {
x
}

#' coerce polars raw list to list
#' Coerce polars raw list to R list
#' @rdname pl_raw_list
#' @param x A `rpolars_raw_list` object created with `pl$raw_list()`
#' @export
#' @details the same as unclass(x)
#' @param x rpolars_raw_list list
#' @examples
#' # to regular list, use as.list or unclass
#' pl$raw_list(NULL, raw(2), raw(3)) |> as.list()
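Note on the raw list documentation in this file: it describes an R list whose elements are all raw or `NULL`, carrying an extra S3 class that survives subsetting and is dropped by `as.list()`. The base R sketch below mimics that behaviour without polars. The constructor `demo_raw_list()` and the class name `"demo_raw_list"` are made up for this note and stand in for `pl$raw_list()` and `"rpolars_raw_list"`; the package code may be written differently.

```r
# Hypothetical stand-in for pl$raw_list(), for illustration only.
demo_raw_list = function(...) {
  l = list(...)
  # every element must be a raw vector or NULL (NULL encodes a missing value)
  is_valid = vapply(l, function(x) is.raw(x) || is.null(x), logical(1))
  if (!all(is_valid)) stop("all elements must be raw or NULL")
  class(l) = c("demo_raw_list", "list")
  l
}

# Subsetting with `[` re-attaches the class to the selected elements.
`[.demo_raw_list` = function(x, index) {
  x = unclass(x)[index]
  class(x) = c("demo_raw_list", "list")
  x
}

# Coercion to a plain list is the same as unclass(x).
as.list.demo_raw_list = function(x, ...) unclass(x)

rl = demo_raw_list(NULL, raw(2), charToRaw("alice"))
class(rl[1:2])     # still a "demo_raw_list"
class(as.list(rl)) # plain "list"
```

The extra class is what would let a converter tell such a list apart from a plain list of raws, which the examples above say maps to a different polars type.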