Add the fmt_units() method #240

Merged
merged 57 commits into from
Jun 4, 2024
57 commits
364a112
Add the `generate_tokens_list()` util fn
rich-iannone Mar 9, 2024
b75982f
Define a class to store a single unit definition
rich-iannone Mar 9, 2024
feda9c0
Rename util function (prepend w/ `_`)
rich-iannone Mar 9, 2024
d3427c2
Import dataclass in order to define one
rich-iannone Mar 10, 2024
2fa1c3f
Define constructor as a data class
rich-iannone Mar 10, 2024
1566466
Add the UnitDefinitionList class
rich-iannone Mar 10, 2024
e436aad
Add util functions to transform (sub|super)scripts
rich-iannone Mar 10, 2024
cfbda80
Add the `_replace_units_symbol()` util fn
rich-iannone Mar 10, 2024
452aa42
Add `.to_html()` method
rich-iannone Mar 10, 2024
eb2aafb
Add the `_units_symbol_replacements()` util fn
rich-iannone Mar 10, 2024
7430812
Add the `fmt_units()` method
rich-iannone Mar 10, 2024
d5819b4
Add the `built` var in UnitDefinition
rich-iannone Mar 10, 2024
d59771a
Remove use of pd.na in `to_html()` method
rich-iannone Mar 10, 2024
286863b
Add docs for the `fmt_units()` method
rich-iannone Mar 10, 2024
1c3f90f
Simplify `_units_symbol_replacements()`
rich-iannone Mar 11, 2024
60d2540
Make correction to documentation
rich-iannone Mar 12, 2024
d15c584
Add example for the `fmt_units()` method
rich-iannone Mar 12, 2024
2d727f2
Make correction to example
rich-iannone Mar 12, 2024
14e2f8b
Update _utils_units_notation.py
rich-iannone Mar 12, 2024
0904ff8
Add tests for `fmt_units()`
rich-iannone Mar 13, 2024
1eed161
Merge branch 'main' into fmt-units
rich-iannone Mar 13, 2024
54bda69
Add several tests of util fns
rich-iannone Mar 13, 2024
02de345
Add more tests of units not'n util fns
rich-iannone Mar 13, 2024
b13e43e
Update example in `fmt_units()` docs
rich-iannone Mar 14, 2024
f47cc88
Add another example in the `fmt_units()` docs
rich-iannone Mar 14, 2024
c50896c
Merge branch 'main' into fmt-units
rich-iannone Mar 14, 2024
1efc9b5
Add fmt_units() to API docs via _quarto.yml
rich-iannone Mar 14, 2024
02fda7f
Move method definition to different location in file
rich-iannone Mar 14, 2024
823441b
Merge branch 'main' into fmt-units
rich-iannone Apr 22, 2024
a71a785
Use `is_na()` instead of `pd.isna()`
rich-iannone Apr 22, 2024
445e6dc
Merge branch 'main' into fmt-units
rich-iannone May 24, 2024
91b5050
Add `fmt_units` to import list
rich-iannone May 24, 2024
de0e5b4
Update type definition in `fmt_units()`
rich-iannone May 24, 2024
c338a64
Move `define_units()` into `helpers.py` file
rich-iannone May 24, 2024
b0f19de
Add `define_units()` to `_quarto.yml`
rich-iannone May 24, 2024
623b4a2
Modify import statement in `fmt_units()`
rich-iannone May 24, 2024
3d6b6eb
Add `define_units()` to init
rich-iannone May 24, 2024
92f35d3
Move tests of units notn to `test_helpers.py`
rich-iannone May 24, 2024
61c17d4
Add missing style rules to 'docs/styles.css'
rich-iannone May 24, 2024
e31beb9
Simplify the `assert_units_to_subscript()` test
rich-iannone May 24, 2024
db26289
Simplify the `assert_units_to_superscript()` test
rich-iannone May 24, 2024
5d0c453
Remove redundant tests from `test_fmt_units()`
rich-iannone May 24, 2024
0e98933
Simplify tests for util fns related to `fmt_units()`
rich-iannone May 24, 2024
6925f75
Improve `test_fmt_units()`
rich-iannone May 24, 2024
cac8262
Split and annotate tests in `test_fmt_units()`
rich-iannone May 24, 2024
e92d86f
Simplify tests in `assert_generate_tokens_list()`
rich-iannone May 24, 2024
ff272d6
Add explanatory text to `define_units()`
rich-iannone May 24, 2024
c32b226
Reorganize `test_fmt_units()`
rich-iannone May 24, 2024
e7c9022
Merge branch 'main' into fmt-units
rich-iannone May 30, 2024
f775dc5
Improve comments in `UnitDefinitionList` cls
rich-iannone May 30, 2024
4b83c48
Add the `from_token` class method
rich-iannone May 30, 2024
5cf6062
Refactor `to_html()` method; add missing line-height attrs
rich-iannone May 31, 2024
3d4a1f9
Update comments based on code review
rich-iannone Jun 3, 2024
ac0ba7f
refactor: wire up UnitDefinition.from_token, .to_html methods
machow Jun 3, 2024
30add11
Use improved definition of rules in example table
rich-iannone Jun 3, 2024
ab0b399
Ensure < and > inputs are escaped on HTML output
rich-iannone Jun 3, 2024
55d2fab
Add reference to the `define_units()` fn
rich-iannone Jun 3, 2024
2 changes: 2 additions & 0 deletions docs/_quarto.yml
@@ -113,6 +113,7 @@ quartodoc:
- GT.fmt_time
- GT.fmt_datetime
- GT.fmt_markdown
- GT.fmt_units
- GT.fmt_image
- GT.fmt_nanoplot
- GT.fmt
@@ -158,6 +159,7 @@ quartodoc:
- html
- from_column
- system_fonts
- define_units
- nanoplot_options
- title: Table options
desc: >
8 changes: 8 additions & 0 deletions docs/styles.css
@@ -56,6 +56,14 @@ p,h1,h2,h3,#toc-title,#toc-function-reference,.nav-link,.table {
  content: "()"
}

[id^=table-options] td a:after {
  content: "()"
}

[id^=export] td a:after {
  content: "()"
}

[id^=value-formatting] td a:after {
  content: "()"
}
14 changes: 13 additions & 1 deletion great_tables/__init__.py
@@ -12,7 +12,18 @@
from . import loc
from . import style
from ._styles import FromColumn as from_column
-from ._helpers import letters, LETTERS, px, pct, md, html, random_id, system_fonts, nanoplot_options
+from ._helpers import (
+    letters,
+    LETTERS,
+    px,
+    pct,
+    md,
+    html,
+    random_id,
+    system_fonts,
+    define_units,
+    nanoplot_options,
+)


__all__ = (
@@ -25,6 +36,7 @@
    "md",
    "html",
    "system_fonts",
    "define_units",
    "nanoplot_options",
    "random_id",
    "from_column",
163 changes: 160 additions & 3 deletions great_tables/_formats.py
@@ -2114,6 +2114,163 @@
    return fmt(self, fns=fmt_markdown_fn, columns=columns, rows=rows)


def fmt_units(
    self: GTSelf,
    columns: SelectExpr = None,
    rows: int | list[int] | None = None,
    pattern: str = "{x}",
) -> GTSelf:
    """
Format measurement units.

The `fmt_units()` method lets you better format measurement units in the table body. These must
conform to the **Great Tables** *units notation*; as an example of this, `"J Hz^-1 mol^-1"` can
be used to generate units for the *molar Planck constant*. The notation here provides several
conveniences for defining units, so as long as the values to be formatted conform to this
syntax, you'll obtain nicely-formatted inline units. Details pertaining to *units notation* can
be found in the section entitled *How to use units notation*.

Parameters
----------
columns
    The columns to target. Can either be a single column name or a series of column names
    provided in a list.
rows
    In conjunction with `columns=`, we can specify which of their rows should undergo
    formatting. The default is all rows, resulting in all rows in targeted columns being
    formatted. Alternatively, we can supply a list of row indices.
pattern
    A formatting pattern that allows for decoration of the formatted value. The formatted value
    is represented by the `{x}` (which can be used multiple times, if needed) and all other
    characters will be interpreted as string literals.

How to use units notation
-------------------------
The **Great Tables** units notation involves a shorthand of writing units that feels familiar
and is fine-tuned for the task at hand. Each unit is treated as a separate entity (parentheses
and other symbols included) and the addition of subscript text and exponents is flexible and
relatively easy to formulate. This is all best shown with examples:

- `"m/s"` and `"m / s"` both render as `"m/s"`
- `"m s^-1"` will appear with the `"-1"` exponent intact
- `"m /s"` gives the same result, as `"/<unit>"` is equivalent to `"<unit>^-1"`
- `"E_h"` will render an `"E"` with the `"h"` subscript
- `"t_i^2.5"` provides a `t` with an `"i"` subscript and a `"2.5"` exponent
- `"m[_0^2]"` will use overstriking to set both scripts vertically
- `"g/L %C6H12O6%"` uses a chemical formula (enclosed in a pair of `"%"` characters) as a unit
  partial, and the formula will render correctly with subscripted numbers
- Common units that are difficult to write using ASCII text may be implicitly converted to the
  correct characters (e.g., the `"u"` in `"ug"`, `"um"`, `"uL"`, and `"umol"` will be converted to
  the Greek *mu* symbol; `"degC"` and `"degF"` will render a degree sign before the temperature
  unit)
- We can transform shorthand symbol/unit names enclosed in `":"` (e.g., `":angstrom:"`,
  `":ohm:"`, etc.) into proper symbols
- Greek letters can be added by enclosing the letter name in `":"`; you can use lowercase letters
  (e.g., `":beta:"`, `":sigma:"`, etc.) and uppercase letters too (e.g., `":Alpha:"`, `":Zeta:"`,
  etc.)
- The components of a unit (unit name, subscript, and exponent) can be fully or partially
  italicized/emboldened by surrounding text with `"*"` or `"**"`

Returns
-------
GT
The GT object is returned. This is the same object that the method is called on so that we
can facilitate method chaining.

Examples
--------
Let's use the `illness` dataset and create a new table. The `units` column happens to contain
string values in *units notation* (e.g., `"x10^9 / L"`). Using the `fmt_units()` method here
will improve the formatting of those measurement units.

```{python}
from great_tables import GT, style, loc
from great_tables.data import illness

(
    GT(illness, rowname_col="test")
    .fmt_units(columns="units")
    .fmt_number(columns=lambda x: x.startswith("day"), decimals=2, drop_trailing_zeros=True)
    .tab_header(title="Laboratory Findings for the YF Patient")
    .tab_spanner(label="Day", columns=lambda x: x.startswith("day"))
    .tab_spanner(label="Normal Range", columns=lambda x: x.startswith("norm"))
    .cols_label(
        norm_l="Lower",
        norm_u="Upper",
        units="Units"
    )
    .opt_vertical_padding(scale=0.4)
    .opt_align_table_header(align="left")
    .tab_options(heading_padding="10px")
    .tab_style(
        locations=loc.body(columns="norm_l"),
        style=style.borders(sides="left")
    )
    .opt_vertical_padding(scale=0.5)
)
```

The `constants` dataset contains values for hundreds of fundamental physical constants. We'll
take a subset of values that have some molar basis and generate a new display table from that.
Like the `illness` dataset, this one has a `units` column so, again, the `fmt_units()` method
will be used to format those units. Here, the preference for typesetting measurement units is to
have positive and negative exponents (e.g., not `"<unit_1> / <unit_2>"` but rather
`"<unit_1> <unit_2>^-1"`).

```{python}
from great_tables.data import constants
import polars as pl
import polars.selectors as cs

constants_mini = (
    pl.from_pandas(constants)
    .filter(pl.col("name").str.contains("molar")).sort("value")
    .with_columns(
        name=pl.col("name")
        .str.to_titlecase()
        .str.replace("Kpa", "kpa")
        .str.replace("Of", "of")
    )
)

(
    GT(constants_mini)
    .cols_hide(columns=["uncert", "sf_value", "sf_uncert"])
    .fmt_units(columns="units")
    .fmt_scientific(columns="value", decimals=3)
    .tab_header(title="Physical Constants Having a Molar Basis")
    .tab_options(column_labels_hidden=True)
)
```

See Also
--------
The [`define_units()`](`great_tables.define_units`) function can be used as a standalone utility
for working with units notation. It can parse strings in *units notation* and can emit
formatted units with its `.to_html()` method.
"""

    def fmt_units_fn(
        x: str,
        pattern: str = pattern,
    ):
        # If the `x` value is a missing value, then return the same value
        if is_na(self._tbl_data, x):
            return x

Codecov / codecov/patch warning: added line #L2259 in great_tables/_formats.py was not covered by tests.

        from great_tables._helpers import define_units

        x_formatted = define_units(x).to_html()

        # Use a supplied pattern specification to decorate the formatted value
        if pattern != "{x}":
            x_formatted = pattern.replace("{x}", x_formatted)

Codecov / codecov/patch warning: added line #L2267 in great_tables/_formats.py was not covered by tests.

        return x_formatted

    return fmt(self, fns=fmt_units_fn, columns=columns, rows=rows)
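
The two Codecov warnings in this hunk sit directly beneath the missing-value early return and the custom-pattern replacement inside `fmt_units_fn`. The sketch below mirrors that control flow in isolation so both branches can be exercised without a table. It is not the library's code: `make_units_formatter` and `fake_to_html` are hypothetical names, `None` stands in for whatever `is_na()` treats as missing, and the stub renderer replaces `define_units(x).to_html()`.

```python
from typing import Callable, Optional


def make_units_formatter(
    to_html: Callable[[str], str],
    pattern: str = "{x}",
) -> Callable[[Optional[str]], Optional[str]]:
    """Build a per-cell formatter shaped like the `fmt_units_fn` closure."""

    def format_cell(x: Optional[str]) -> Optional[str]:
        # Assumption: `None` is the missing-value marker (the real code calls `is_na()`)
        if x is None:
            return x

        x_formatted = to_html(x)

        # Decorate the rendered value only when a non-default pattern is given;
        # `str.replace` substitutes every `{x}` occurrence
        if pattern != "{x}":
            x_formatted = pattern.replace("{x}", x_formatted)

        return x_formatted

    return format_cell


def fake_to_html(units: str) -> str:
    # Hypothetical stand-in for `define_units(units).to_html()`
    return f"<span>{units}</span>"


plain = make_units_formatter(fake_to_html)
bracketed = make_units_formatter(fake_to_html, pattern="[{x}]")
```

Calling `plain(None)` takes the early-return branch and `bracketed("m")` takes the pattern branch, which are the two paths the coverage report flags.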


def _value_to_decimal_notation(
    value: int | float,
    decimals: int = 2,
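
The subscript, exponent, and `"/<unit>"` rules listed in the `fmt_units()` docstring can be illustrated with a toy renderer. This is a simplified sketch and not how `define_units()` is implemented: `render_token` and `render_units` are hypothetical names, the plain `<sub>`/`<sup>` markup is an assumption (the library's actual HTML is not shown in this diff), and only the `name_sub^exp` and `/unit` forms are handled.

```python
import html
import re

# name, optional "_subscript", optional "^exponent"
TOKEN_RE = re.compile(r"^(?P<name>[^_^]+)(?:_(?P<sub>[^_^]+))?(?:\^(?P<exp>[^_^]+))?$")


def render_token(token: str) -> str:
    """Render one whitespace-delimited unit token as simple HTML."""
    # "/<unit>" is shorthand for "<unit>^-1" (only when no exponent is given)
    if token.startswith("/") and len(token) > 1 and "^" not in token:
        token = token[1:] + "^-1"

    match = TOKEN_RE.match(token)
    if match is None:
        # Unrecognized forms are passed through, escaped
        return html.escape(token)

    out = html.escape(match["name"])
    if match["sub"]:
        out += f"<sub>{html.escape(match['sub'])}</sub>"
    if match["exp"]:
        out += f"<sup>{html.escape(match['exp'])}</sup>"
    return out


def render_units(notation: str) -> str:
    """Render a units-notation string token by token."""
    return " ".join(render_token(token) for token in notation.split())


print(render_units("m /s"))     # same markup as "m s^-1"
print(render_units("t_i^2.5"))  # subscript followed by exponent
```

Escaping every piece with `html.escape` echoes the intent of the commit "Ensure < and > inputs are escaped on HTML output", though the real escaping logic may differ.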
@@ -3276,12 +3433,12 @@

Examples
--------
-Using a small portion of [`metro`] dataset, let's create a **gt** table. We will only include a
-few columns and rows from that table. The `lines` column has comma-separated listings of numbers
+Using a small portion of the `metro` dataset, let's create a new table. We will only include a few
+columns and rows from that table. The `lines` column has comma-separated listings of numbers
corresponding to lines served at each station. We have a directory of SVG graphics for all of
these lines in the package (the path for the image directory can be accessed via
`files("great_tables") / "data/metro_images"`, using the `importlib_resources` package). The
-filenames roughly corresponds to the data in the `lines` column. The `fmt_image()` function can
+filenames roughly correspond to the data in the `lines` column. The `fmt_image()` method can
be used with these inputs since the `path=` and `file_pattern=` arguments allow us to compose
complete and valid file locations. What you get from this are sequences of images in the table
cells, taken from the referenced graphics files on disk.