There are a few adorn_* functions from R's janitor that are not yet ported over to pyjanitor. Janitor docs here.
I'm specifically looking at:
adorn_totals: adds a "Total" row, a "Total" column, or both
adorn_percentages: converts the cell values into percentages, calculated along either axis or over the entire dataframe. In the R formulation, these are floats between 0 and 1, not the 0-100 percentages.
adorn_pct_formatting: formats the 0 to 1 values as 0 to 100 percentage values, with rounding/formatting options.
adorn_ns: adds the raw counts back into the cell values (meant to be run after adorn_percentages), so each cell has both percentage & count info, like "56 (24.3%)" for example.
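For reference, the two percentage steps (adorn_percentages and adorn_pct_formatting) boil down to dividing by row, column, or grand totals and then rendering the result as strings. Below is a minimal pandas sketch, assuming the input is a table of numeric counts. `proportions`, `pct_format` and the "row"/"col"/"all" labels are hypothetical names chosen for illustration, not the R janitor or pyjanitor API.

```python
import pandas as pd


def proportions(df: pd.DataFrame, denominator: str = "row") -> pd.DataFrame:
    """Hypothetical helper: convert counts into 0-1 proportions.

    "row": each row sums to 1
    "col": each column sums to 1
    "all": the whole table sums to 1
    """
    if denominator == "row":
        return df.div(df.sum(axis=1), axis=0)  # divide each row by its row total
    if denominator == "col":
        return df.div(df.sum(axis=0), axis=1)  # divide each column by its column total
    if denominator == "all":
        return df / df.to_numpy().sum()  # divide everything by the grand total
    raise ValueError(f"unknown denominator: {denominator!r}")


def pct_format(df: pd.DataFrame, digits: int = 1) -> pd.DataFrame:
    """Hypothetical helper: render 0-1 proportions as 0-100 percentage strings."""
    return df.apply(lambda col: col.map(lambda v: f"{v * 100:.{digits}f}%"))


counts = pd.DataFrame({"a": [6, 2], "b": [2, 6]}, index=["x", "y"])
print(pct_format(proportions(counts, "row")))
```

Keeping the numeric step separate from the formatting step mirrors the R split between adorn_percentages and adorn_pct_formatting.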
I imagine these might be particularly useful for those doing data reporting.
These should go into the functions module.
Example API
In pyjanitor, I don't think having four separate functions would work (how would we enforce that adorn_ns comes after adorn_percentages? And where would we get the counts required for adorn_ns?).
Perhaps we could just have an adorn_totals and an adorn_percentages, with the latter also encapsulating the behaviour of adorn_pct_formatting and adorn_ns, controlled via function parameters.
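To make the counts problem concrete: once a frame holds proportions, the original counts can no longer be recovered from it, so a standalone adorn_ns-style step would have to be handed the untouched counts as a second input. A minimal pandas sketch; `add_ns` is a hypothetical name, not an existing pyjanitor function.

```python
import pandas as pd

# Raw counts, as a user would have them before any adorning.
counts = pd.DataFrame({"a": [6, 2], "b": [2, 6]}, index=["x", "y"])

# Step 1 of a hypothetical separate-function chain: counts -> 0-1 proportions per column.
props = counts / counts.sum()

# The counts cannot be recovered from `props` alone: halving every count
# yields exactly the same proportions.
halved = counts // 2
assert props.equals(halved / halved.sum())


def add_ns(props: pd.DataFrame, counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical adorn_ns-style step: it needs the original counts as well."""
    return counts.astype(str) + " (" + (props * 100).round(1).astype(str) + "%)"


print(add_ns(props, counts))
```

A single function that starts from the raw counts sidesteps both the ordering question and the extra argument.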
adorn_totals
This function should mirror the R function almost 1-1.
>>> df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")}); df
     a  b
0  6.0  x
1  NaN  y
2  2.5  z
>>> df.adorn_totals(
...     subset=None,  # or a list of index/column names; ideally also ranges like `slice("col_a", "col_d")`, since `.loc` supports them
...     axis="col",  # "index"/0/"row", "columns"/1/"col", or "both"
...     fill_value="-",
...     name="Total",
... )
         a  b
0      6.0  x
1      NaN  y
2      2.5  z
Total  8.5  -
A few points where I (tentatively) disagree with the R implementation:
I'm thinking that NaN values will be treated as 0 here by default, so totals won't be affected by the presence of NaN: sum(6, NaN, 2.5) = 8.5. The R janitor function has an na.rm parameter for this, but I'm not convinced it's necessary.
In the R implementation, the where parameter dictates whether to add a Totals "row" or "col", as opposed to specifying the direction of the summation. Under the latter interpretation, "row" would add a new column containing the totals across each row (which feels more natural to me). I'm calling this parameter axis here.
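A minimal pandas sketch of the proposed adorn_totals, written as a plain function so it runs without pyjanitor. It follows the conventions described above: NaN is skipped during summation, and axis names the direction of summation, so axis="col" totals each column (what pandas calls `df.sum(axis=0)`) and appends a row. Two assumptions are not in the proposal: subset selects columns only, and with subset=None every numeric column is totalled while the other columns get fill_value.

```python
import numpy as np
import pandas as pd


def adorn_totals(
    df: pd.DataFrame,
    subset=None,
    axis: str = "col",
    fill_value: str = "-",
    name: str = "Total",
) -> pd.DataFrame:
    """Hypothetical sketch of the proposed adorn_totals; not an existing pyjanitor function.

    axis="col":  sum down each column and append a `name` row
    axis="row":  sum across each row and append a `name` column
    axis="both": do both (the corner cell becomes the grand total)
    NaN is skipped, i.e. treated as 0, as in pandas' default skipna=True.
    """
    if axis not in ("col", "row", "both"):
        raise ValueError("axis must be 'col', 'row' or 'both'")
    # Columns to total: all numeric columns by default, else the requested subset.
    cols = (
        list(df.select_dtypes("number").columns)
        if subset is None
        else list(df.loc[:, subset].columns)
    )
    out = df.copy()
    if axis in ("row", "both"):
        out[name] = df[cols].sum(axis=1)  # row totals
        cols.append(name)  # include it so the grand total is computed below
    if axis in ("col", "both"):
        totals = out[cols].sum(axis=0)  # column totals
        # Build the new row; non-totalled columns get fill_value.
        total_row = pd.DataFrame(
            {c: [totals[c] if c in cols else fill_value] for c in out.columns},
            index=[name],
        )
        out = pd.concat([out, total_row])
    return out


df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")})
print(adorn_totals(df))
```

Appending the total row with pd.concat (rather than assigning through .loc) keeps the numeric column's float dtype intact.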
adorn_percentages
TBD. Let me have a little think about this over the weekend; I decided against my own implementation idea while writing out the example API. ><
Original idea
>>> df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")}); df
     a  b
0  6.0  x
1  NaN  y
2  2.5  z
>>> df.adorn_percentages(
...     subset=None,  # similar to `adorn_totals`
...     axis="col",  # similar to `adorn_totals`
...     adorn_count=True,
...     count_position="front",  # ignored if adorn_count=False
...     count_format=0,  # ignored if adorn_count=False
...     percentage_format=2,
... )
            a  b
0  6 (70.59%)  x
1         nan  y
2  3 (29.41%)  z
Parameters:
count_position: whether the count goes in front, "56 (23.4%)", or at the back, "23.4% (56)".
count_format / percentage_format: if an int, the number of decimal places to round to; otherwise a string format specification like ':,.2f' or similar.
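A minimal sketch of the "original idea" signature in plain pandas, to show that all three behaviours (percentages, percentage formatting, counts) can be driven from the raw counts in one pass. Everything here is hypothetical: the function is not an existing pyjanitor API, `_fmt` is an illustrative helper, axis="all" (percentages over the whole frame) is an assumed name, and subset is assumed to select columns only.

```python
import numpy as np
import pandas as pd


def _fmt(value: float, spec) -> str:
    """int spec -> that many decimal places; str spec -> a format spec such as ':,.2f'."""
    if isinstance(spec, int):
        return f"{value:.{spec}f}"
    return format(value, spec.lstrip(":"))  # accept both ':,.2f' and ',.2f'


def adorn_percentages(
    df: pd.DataFrame,
    subset=None,
    axis: str = "col",
    adorn_count: bool = True,
    count_position: str = "front",
    count_format=0,
    percentage_format=2,
) -> pd.DataFrame:
    """Hypothetical sketch of the 'original idea' API; not an existing pyjanitor function."""
    cols = (
        list(df.select_dtypes("number").columns)
        if subset is None
        else list(df.loc[:, subset].columns)
    )
    counts = df[cols]
    if axis == "col":  # share of each column's total (NaN skipped)
        props = counts / counts.sum(axis=0)
    elif axis == "row":  # share of each row's total
        props = counts.div(counts.sum(axis=1), axis=0)
    elif axis == "all":  # share of the grand total
        props = counts / np.nansum(counts.to_numpy())
    else:
        raise ValueError("axis must be 'col', 'row' or 'all'")

    out = df.copy()
    for col in cols:
        cells = []
        for n, p in zip(counts[col], props[col]):
            if pd.isna(n):
                cells.append("nan")  # keep missing cells visibly missing
                continue
            pct = _fmt(p * 100, percentage_format) + "%"
            if not adorn_count:
                cells.append(pct)
            elif count_position == "front":
                cells.append(f"{_fmt(n, count_format)} ({pct})")
            else:  # "back"
                cells.append(f"{pct} ({_fmt(n, count_format)})")
        out[col] = cells
    return out


df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")})
print(adorn_percentages(df))
```

With this sketch, the 2.5 count renders as "2" rather than the "3" in the mock-up, because Python's float formatting rounds exact ties to even; a real implementation would need to choose and document a rounding rule.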
I'm not that sold on this API yet; it doesn't look very clean or friendly to use. After all, it is an amalgamation of three different behaviours in one function 😅. Would be happy to hear comments or suggestions for improvement, if any.
@thatlittleboy your thoughts on encapsulation to enforce order sound like the right thing to do.
I'll admit I'm not especially well-versed in the adorn_* family of functions in janitor, so I'll hold off on commenting on their specific behaviour. That said, I am in favour of bringing janitor functionality into pyjanitor, and also of your way of organizing the functions sensibly. 😄