
feat: add knit_print method for DataFrame (experimental) #125

Merged: 36 commits merged on Apr 27, 2023
36 commits:
838e839 feat: add knit_print method for DataFrame (eitsupi, Apr 18, 2023)
6842c79 test: add tests for knitr (eitsupi, Apr 18, 2023)
ef8413f fix: support gfm output (eitsupi, Apr 18, 2023)
6e49072 docs: update readme (eitsupi, Apr 18, 2023)
c9b7752 format: `<-` -> `=` (eitsupi, Apr 18, 2023)
91a8b9a format: `<-` -> `=` (eitsupi, Apr 18, 2023)
afd0550 format: `<-` -> `=` (eitsupi, Apr 18, 2023)
58ea15e format: `<-` -> `=` (eitsupi, Apr 18, 2023)
5d71f65 add knit_print doc + update snapshot (sorhawell, Apr 18, 2023)
f3e5e6c test: use a small data frame for snapshot tests (eitsupi, Apr 18, 2023)
8fcfcec fix: remove print prefix (eitsupi, Apr 19, 2023)
7f8d4ce feat: df_print for polars DataFrame (eitsupi, Apr 19, 2023)
675df16 Merge branch 'main' into knitr-print (eitsupi, Apr 20, 2023)
ae6d5e6 feat: add a new function to_html_table (eitsupi, Apr 20, 2023)
6fd0902 Merge branch 'main' into knitr-print (eitsupi, Apr 21, 2023)
49c0f62 Merge branch 'main' into knitr-print (eitsupi, Apr 22, 2023)
0d79e01 fix: support polars DataFrame |> to_html_table (eitsupi, Apr 22, 2023)
8709555 fix: update imports (eitsupi, Apr 22, 2023)
7109ec9 test: update snapshot (eitsupi, Apr 22, 2023)
578f1ee fix: update docs about html format and fix detect html table format (eitsupi, Apr 22, 2023)
b69e035 docs: update readme (eitsupi, Apr 22, 2023)
c68caaa docs: update news (eitsupi, Apr 22, 2023)
4921ee8 fix!: remove the format param from as.character.Series (eitsupi, Apr 22, 2023)
d2decc8 fix: to_html_table also works for POSIXlt class (eitsupi, Apr 22, 2023)
f93a881 docs: fix as.character document (eitsupi, Apr 22, 2023)
40719c9 docs: fix missing param (eitsupi, Apr 22, 2023)
1abc182 Merge branch 'main' into knitr-print (eitsupi, Apr 24, 2023)
da5d723 fix: should mark as raw html (eitsupi, Apr 24, 2023)
e8228ad chore: don't export to_html_table (eitsupi, Apr 26, 2023)
8d8a82c Merge branch 'main' into knitr-print (eitsupi, Apr 26, 2023)
6841bb6 chore: update Rd file (eitsupi, Apr 26, 2023)
4921884 Merge branch 'main' into knitr-print (eitsupi, Apr 27, 2023)
396ef66 fix: ensure TRUE or FALSE (eitsupi, Apr 27, 2023)
f3e67a9 fix: check knitr is installed (eitsupi, Apr 27, 2023)
811205e fix: also check pillar and fix error message (eitsupi, Apr 27, 2023)
0361c29 format: replace `<-` -> `=` (eitsupi, Apr 27, 2023)
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -27,6 +27,7 @@ Suggests:
bit64,
knitr,
tibble,
pillar,
rmarkdown,
withr
Config/Needs/website:
@@ -69,6 +70,7 @@ Collate:
'namespace.R'
'options.R'
'parquet.R'
'pkg-knitr.R'
'pkg-nanoarrow.R'
'rlang.R'
'rust_result.R'
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -135,6 +135,7 @@ export(GroupBy_agg)
export(GroupBy_to_data_frame)
export(LazyFrame_print)
export(csv_reader)
export(knit_print.DataFrame)
export(ncol.DataFrame)
export(nrow.DataFrame)
export(pl)
@@ -146,6 +147,7 @@ importFrom(stats,na.omit)
importFrom(utils,.DollarNames)
importFrom(utils,capture.output)
importFrom(utils,download.file)
importFrom(utils,getFromNamespace)
importFrom(utils,globalVariables)
importFrom(utils,head)
importFrom(utils,str)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -2,6 +2,8 @@

## What's changed
- `DataFrame` objects can be subsetted using brackets like standard R data frames: `pl$DataFrame(mtcars)[2:4, c("mpg", "hp")]` (#140 @vincentarelbundock)
- An experimental `knit_print()` method has been added to DataFrame that outputs HTML tables
(similar to py-polars' HTML output) (#125 @eitsupi)
- `Series` gains new methods: `$mean`, `$median`, `$std`, `$var` (#170 @vincentarelbundock)

# polars v0.5.0
177 changes: 177 additions & 0 deletions R/pkg-knitr.R
@@ -0,0 +1,177 @@
#' knit print polars DataFrame
#'
#' Mimics python-polars' NotebookFormatter
#' for HTML outputs.
#'
#' Outputs HTML tables if the output format is HTML
#' and the document's `df_print` option is not `"default"` or `"tibble"`.
#'
#' Or, the output format can be enforced with R's `options` function as follows:
#'
#' - `options(polars.df_print = "default")` for the default print method.
#' - `options(polars.df_print = "html")` for the HTML table.
#' @name knit_print.DataFrame
#' @param x a polars DataFrame to knit_print
#' @param ... additional arguments, not used
#' @keywords DataFrame
#' @export
knit_print.DataFrame = function(x, ...) {
  .print_opt = getOption("polars.df_print", "auto")
  .rmd_df_print = knitr::opts_knit$get("rmarkdown.df_print")
  if (isTRUE(.print_opt == "html") ||
    (isTRUE(.print_opt != "default") &&
      !isTRUE(.rmd_df_print %in% c("default", "tibble")) &&
      knitr::is_html_output())) {
    x |>
      to_html_table() |>
      knitr::raw_html() |>
      knitr::asis_output()
  } else {
    print(x)
  }
}
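The `if` condition in `knit_print.DataFrame` combines three inputs: the `polars.df_print` option, R Markdown's `df_print` setting, and whether knitr is rendering to HTML. As a minimal sketch for reasoning about that rule, the hypothetical helper `use_html_table()` below (not part of the PR) takes those three values as plain arguments instead of reading them from `getOption()` and knitr, so the truth table can be checked without rendering a document:

```r
# Hypothetical helper (not in the PR) mirroring the condition in
# knit_print.DataFrame: returns TRUE when an HTML table would be emitted.
use_html_table = function(print_opt = "auto", rmd_df_print = NULL, is_html = FALSE) {
  # options(polars.df_print = "html") forces HTML output unconditionally
  if (isTRUE(print_opt == "html")) {
    return(TRUE)
  }
  # Otherwise all three must hold: polars.df_print is not "default",
  # the R Markdown df_print option is neither "default" nor "tibble"
  # (NULL, i.e. unset, counts as neither), and the output is HTML
  isTRUE(print_opt != "default") &&
    !isTRUE(rmd_df_print %in% c("default", "tibble")) &&
    isTRUE(is_html)
}

use_html_table("auto", NULL, is_html = TRUE)       # TRUE
use_html_table("auto", "tibble", is_html = TRUE)   # FALSE
use_html_table("default", NULL, is_html = TRUE)    # FALSE
use_html_table("html", "tibble", is_html = FALSE)  # TRUE
```

Under this reading, `options(polars.df_print = "html")` overrides everything else, while a document with `df_print: tibble` (or `default`) keeps the plain `print()` output unless HTML is forced.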

#' Generate HTML table from a DataFrame
#' @param x DataFrame
#' @param max_cols an integer of maximum number of columns to display
#' @param max_rows an integer of maximum number of rows to display
#' @return A string of HTML code
#' @examples
#' to_html_table(mtcars, 3, 3)
#' @noRd
#' @importFrom utils getFromNamespace
to_html_table = function(x, max_cols = 75, max_rows = 40) {
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("Please install the `knitr` package to use `to_html_table`.")
  }

  escape_html = getFromNamespace("escape_html", "knitr")
  omit_chr = "&hellip;"

  .dim = dim(x)

  df_height = .dim[1]
  df_width = .dim[2]

  row_idx = .idx(max_rows, df_height)
  col_idx = .idx(max_cols, df_width)

  cols = names(x)[col_idx]
  col_names = cols |>
    escape_html()
  col_types = x |>
    .get_dtype_strings() |>
    escape_html()

  if (max_cols <= df_width) {
    col_names = .cut_off(col_names, max_cols, omit_chr)
    col_types = .cut_off(col_types, max_cols, omit_chr)
  }

  .header_names = col_names |>
    .tag("th") |>
    .tag("tr")

  .header_dtypes = col_types |>
    .tag("td") |>
    .tag("tr")

  .header_all = .header_names |>
    paste0(.header_dtypes) |>
    .tag("thead")

  .env_str_len = Sys.getenv("POLARS_FMT_STR_LEN")
  .str_len = ifelse(.env_str_len == "", 15, as.integer(.env_str_len))

  chr_mat = sapply(cols, \(col) as.character(x[row_idx, col, drop = TRUE], str_length = .str_len)) |>
    escape_html() |>
    matrix(nrow = length(row_idx))

  if (max_cols <= df_width) {
    .seq = seq_along(cols)
    chr_mat = cbind(
      chr_mat[, head(.seq, max_cols %/% 2)],
      omit_chr,
      chr_mat[, tail(.seq, max_cols %/% 2)]
    )
  }

  if (max_rows <= df_height) {
    .seq = seq_along(row_idx)
    chr_mat = rbind(
      chr_mat[head(.seq, max_rows %/% 2), ],
      omit_chr,
      chr_mat[tail(.seq, max_rows %/% 2), ]
    )
  }

  .body = chr_mat |>
    t() |>
    as.data.frame() |>
    sapply(\(x) .tag(x, "td")) |>
    .tag("tr") |>
    .tag("tbody")

  .style = "<style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
text-align: right;
}
</style>
"

  .shape = sprintf("<small>shape: (%s, %s)</small>", .dim[1], .dim[2])

  paste0(.header_all, .body) |>
    .tag("table", c(border = "1", class = "dataframe")) |>
    (\(x) (paste0(.style, .shape, x)))() |>
    .tag("div")
}

.idx = function(.max, .length) {
  if (.max <= .length) {
    out = c(seq_len(.max %/% 2), seq(.length - .max %/% 2 + 1L, .length))
  } else {
    out = seq_len(.length)
  }
  out
}

.cut_off = function(.vec, .max, omit_chr) {
  c(head(.vec, .max %/% 2), omit_chr, tail(.vec, .max %/% 2))
}
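`.idx()` and `.cut_off()` both halve the limit with integer division (`%/% 2`), so an odd `max_rows` or `max_cols` effectively rounds down to an even number of visible rows or columns. The snippet below copies the two helpers from the diff (indentation restored, comments added) and calls them directly to show what they return:

```r
# Copies of the two helpers from R/pkg-knitr.R, with comments added.
.idx = function(.max, .length) {
  if (.max <= .length) {
    # keep the first and the last .max %/% 2 positions
    out = c(seq_len(.max %/% 2), seq(.length - .max %/% 2 + 1L, .length))
  } else {
    # everything fits: keep all positions
    out = seq_len(.length)
  }
  out
}

.cut_off = function(.vec, .max, omit_chr) {
  # splice the ellipsis marker between the head and tail halves
  c(head(.vec, .max %/% 2), omit_chr, tail(.vec, .max %/% 2))
}

.idx(4, 10)   # 1 2 9 10
.idx(5, 10)   # 1 2 9 10 (an odd limit rounds down)
.idx(40, 3)   # 1 2 3
.cut_off(c("a", "b", "y", "z"), 4, "&hellip;")  # "a" "b" "&hellip;" "y" "z"
```

One consequence of the `<=` comparisons, as the code is written: with an even limit, a frame whose height equals `max_rows` (e.g. a 40-row frame with the default `max_rows = 40`) keeps every row, yet `to_html_table()` still splices an `&hellip;` row into the middle.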

#' @param .elements chr vector
#' @param .tag single chr
#' @param .attr named chr vector
#' @return single character
#' @examples
#' .tag(letters[1:2], "tr")
#' @noRd
.tag = function(.elements, .tag, .attr = NULL) {
  if (!is.null(.attr)) {
    .pre = paste0("<", .tag, " ", paste0(sprintf('%s="%s"', names(.attr), .attr), collapse = " "), ">")
  } else {
    .pre = c(paste0("<", .tag, ">"))
  }

  .post = paste0("</", .tag, ">")

  paste0(.pre, .elements, .post, collapse = "")
}
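`.tag()` wraps *each* element of its input in the tag and then collapses the results into one string, which is why `to_html_table()` can build a header row with two chained calls (`.tag("th")` then `.tag("tr")`) and the body with a per-row `.tag("td")` followed by `.tag("tr")`. The sketch below copies the helper (indentation restored, the redundant `c()` dropped) and rebuilds a small header, two body rows, and a tagged `<table>` opener; the cell values are made up for illustration:

```r
# Copy of the .tag helper from R/pkg-knitr.R, lightly tidied.
.tag = function(.elements, .tag, .attr = NULL) {
  if (!is.null(.attr)) {
    # render named attributes as key="value" pairs
    .pre = paste0("<", .tag, " ", paste0(sprintf('%s="%s"', names(.attr), .attr), collapse = " "), ">")
  } else {
    .pre = paste0("<", .tag, ">")
  }
  .post = paste0("</", .tag, ">")
  # each element is wrapped separately, then everything is collapsed
  paste0(.pre, .elements, .post, collapse = "")
}

# Header row, built the same way as .header_names in to_html_table()
header = c("mpg", "cyl") |>
  .tag("th") |>
  .tag("tr")
# header is "<tr><th>mpg</th><th>cyl</th></tr>"

# Body rows: transposing makes each data-frame column one table row
cells = matrix(c("21", "6", "22.8", "4"), nrow = 2, byrow = TRUE)
body = cells |>
  t() |>
  as.data.frame() |>
  sapply(\(x) .tag(x, "td")) |>
  .tag("tr")
# body is "<tr><td>21</td><td>6</td></tr><tr><td>22.8</td><td>4</td></tr>"

table_open = .tag("x", "table", c(border = "1", class = "dataframe"))
# table_open is '<table border="1" class="dataframe">x</table>'
```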

#' Get types of each column
#' @param df DataFrame like object
#' @return string vector of column type names
#' @noRd
.get_dtype_strings = function(df) {
  if (inherits(df, "DataFrame")) {
    df$dtype_strings()
  } else {
    if (!requireNamespace("pillar", quietly = TRUE)) {
      stop("Please install the `pillar` package to use `to_html_table` for non-polars objects.")
    }
    sapply(names(df), \(x) pillar::type_sum(df[, x, drop = TRUE])) |>
      unname()
  }
}
13 changes: 6 additions & 7 deletions R/s3_methods.R
@@ -171,20 +171,19 @@ max.LazyFrame = function(x, ...) x$max()
#' @noRd
as.vector.Series = function(x, mode) x$to_vector()

#' as.character for Series
#' as.character for polars Series
#' @param x Series
#' @param format a logical. If `TRUE`, the Series will be formatted.
#' @param str_length an integer. If `format = TRUE`,
#' @param str_length an integer. If specified,
#' @param ... Additional arguments are ignored.
#' utf8 or categorical type Series will be formatted to a string of this length.
#' @param ... Additional characters are ignored.
#' @examples
#' s = pl$Series(c("foo", "barbaz"))
#' as.character(s)
#' as.character(s, format = TRUE)
#' as.character(s, format = TRUE, str_length = 3)
#' as.character(s, str_length = 3)
#' @export
as.character.Series = function(x, ..., format = FALSE, str_length = 15) {
if (isTRUE(format)) {
as.character.Series = function(x, ..., str_length = NULL) {
if (is.numeric(str_length) && str_length > 0) {
.pr$Series$to_fmt_char(x, str_length = str_length)
} else {
x$to_vector() |>
1 change: 1 addition & 0 deletions R/zzz.R
@@ -137,6 +137,7 @@ pl$mem_address = mem_address
s3_register("nanoarrow::infer_nanoarrow_schema", "DataFrame")
s3_register("arrow::as_record_batch_reader", "DataFrame")
s3_register("arrow::as_arrow_table", "DataFrame")
s3_register("knitr::knit_print", "DataFrame")

pl$numeric_dtypes = pl$dtypes[substr(names(pl$dtypes),1,3) %in% c("Int","Flo")]

14 changes: 5 additions & 9 deletions man/as.character.Series.Rd


28 changes: 28 additions & 0 deletions man/knit_print.DataFrame.Rd

