# How to style Buckaroo tables
Buckaroo had a major refactoring of the styling system and callbacks with the 0.6 release.

This notebook walks through:
1. Styling columns via the `displayer`
2. How to override columns
3. The variety of column `displayer`s available
4. How to add automatic styling methods that can be cycled through in the UI

In [None]:
import pandas as pd
import numpy as np
from buckaroo.dataflow_traditional import SimpleStylingAnalysis
from buckaroo.pluggable_analysis_framework.pluggable_analysis_framework import ColAnalysis
import polars as pl
from buckaroo.polars_buckaroo import PolarsBuckarooWidget

In [None]:
ROWS = 200
typed_df = pl.DataFrame({'int_col':np.random.randint(1,50, ROWS), 'float_col': np.random.randint(1,30, ROWS)/.7,
                         "str_col": ["foobar"]* ROWS
                        })
#typed_df = pl.from_pandas(typed_df)

In [None]:
PolarsBuckarooWidget(typed_df)

## Displayer
Changing the `displayer` is the most common way to customize the styling of a column. In the next example, we override the `column_config` for `float_col`.


In [None]:
bw2 = PolarsBuckarooWidget(
    typed_df, 
    debug=False,
    column_config_overrides={
        'float_col':
            {'displayer_args': { 'displayer': 'float', 'minimumFractionDigits':0, 'maximumFractionDigits':3}}})
bw2

The cell above forces `float_col` to be displayed with the `'float'` displayer.
Notice how the decimal points align, as opposed to the first table, where 10 is shown without a decimal portion.
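
The option names `minimumFractionDigits` and `maximumFractionDigits` match those of JavaScript's `Intl.NumberFormat`. As a rough illustration of what the two settings do, here is a minimal pure-Python sketch. The `format_float` helper is hypothetical and not part of Buckaroo; it ignores locale grouping, and its rounding may differ from the frontend's in edge cases.

```python
def format_float(value, minimum_fraction_digits=0, maximum_fraction_digits=3):
    """Approximate min/max fraction-digit formatting (hypothetical helper)."""
    # Round to the maximum number of fraction digits first.
    text = f"{value:.{maximum_fraction_digits}f}"
    if "." not in text:
        return text
    whole, frac = text.split(".")
    # Drop trailing zeros, then pad back up to the minimum.
    frac = frac.rstrip("0").ljust(minimum_fraction_digits, "0")
    return f"{whole}.{frac}" if frac else whole


# Compare a loose setting (0 to 3 digits) with a fixed one (exactly 3 digits).
for v in (10.0, 14.285714, 2.5):
    print(format_float(v, 0, 3), "|", format_float(v, 3, 3))
```

In this sketch, setting the minimum equal to the maximum pads every value to the same number of fraction digits, which is what keeps decimal points lined up in a column.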

Currently, the types are best viewed in their TypeScript definition, [DFWhole.ts](https://github.com/paddymul/buckaroo/blob/feat/dfviewer-config/js/components/DFViewerParts/DFWhole.ts).

The available displayers are
`ObjDisplayer`, `BooleanDisplayer`, `StringDisplayer`, `FloatDisplayer`,
`DatetimeDefaultDisplayer`, `DatetimeLocaleDisplayer`, `IntegerDisplayer`,
`HistogramDisplayer`, and `LinkifyDisplayer`.

Planned displayers include [HumanAbbreviationDisplayer](https://github.com/paddymul/buckaroo/issues/83), [LineChartDisplayer](https://github.com/paddymul/buckaroo/issues/210), [GoogleMapsLinkDisplayer](https://github.com/paddymul/buckaroo/issues/211), and [InlineMapDisplayer](https://github.com/paddymul/buckaroo/issues/212).


There is experimental work on building mirrored Pydantic types and on integrating this type checking into Buckaroo. A gallery of examples of the different options is also planned.

# Tooltip_config

Two kinds of `tooltip_config` are available: `simple` and `summary_series`.

Tooltips are helpful for adding extra context to cells, particularly for noting errors or values changed via auto-cleaning.

Notice that `column_config_overrides` is merged with the existing column config from Buckaroo, so every column still has a displayer.
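
One way to picture that merge is as a per-column dictionary update. The sketch below is an assumption for illustration only: `merge_column_config` is a hypothetical function, not Buckaroo's implementation, and it assumes a shallow merge in which keys from the override win and all other keys are kept.

```python
def merge_column_config(base_config, overrides):
    """Hypothetical sketch: overlay per-column overrides on a base config list."""
    merged = []
    for col in base_config:
        # Keys present in the override win; every other key is kept as-is.
        merged.append({**col, **overrides.get(col["col_name"], {})})
    return merged


# A made-up base config of the kind a styling analysis might produce.
base = [
    {"col_name": "int_col", "displayer_args": {"displayer": "obj"}},
    {"col_name": "str_col", "displayer_args": {"displayer": "obj"}},
]
overrides = {
    "str_col": {"tooltip_config": {"tooltip_type": "simple", "val_column": "int_col"}}
}

merged = merge_column_config(base, overrides)
# str_col gains a tooltip_config and keeps its original displayer_args.
print(merged[1])
```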

In [None]:
bw3 = PolarsBuckarooWidget(
    typed_df, 
    column_config_overrides={
        'str_col':
            {'tooltip_config': { 'tooltip_type':'simple', 'val_column': 'int_col'}}})
bw3

# color_map_config

`color_map_config` controls the coloring of columns.
* `color_map` uses the bins from the histogram to show a value's place in the distribution. With the `val_column` parameter, you can color one column based on another.
* `color_when_not_null` highlights a cell when another column is not null. This is meant for error highlighting; the other column can be hidden.
* `color_from_column` sets the color of a cell from the RGB value written to another column. It is the most generic coloring option.
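
To make the `color_map` idea concrete, the sketch below assigns a color to each value by finding which bin it falls into. It is an illustration only: the bin edges, the three-color palette, and the `color_for_value` helper are made up for this example and do not reflect Buckaroo's frontend code.

```python
from bisect import bisect_right

# Hypothetical palette, ordered from low values to high values.
PALETTE = ["#0000ff", "#ffffff", "#ff0000"]


def color_for_value(value, bin_edges, palette=PALETTE):
    """Return the palette color for the bin that contains value."""
    # bin_edges holds the interior boundaries, so
    # len(palette) == len(bin_edges) + 1.
    return palette[bisect_right(bin_edges, value)]


float_col = [3.0, 18.5, 40.0]
int_col = [7, 8, 9]
edges = [10.0, 30.0]  # assumed boundaries between the three bins

# Color each int_col cell using the matching float_col value,
# which is the idea behind the val_column parameter.
for shown, source in zip(int_col, float_col):
    print(shown, color_for_value(source, edges))
```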

In [None]:
bw3 = PolarsBuckarooWidget(
    typed_df, 
    column_config_overrides={
        'int_col': {'color_map_config': {
            'color_rule': 'color_map',
            'map_name': 'DIVERGING_RED_WHITE_BLUE',
            'val_column': 'float_col'
        }}})
bw3

# Hiding a column

You can hide a column with `merge_rule: 'hidden'`. This removes that column from the `column_config` array.

Hiding is useful for keeping data in the dataframe (and sent to the table widget) for use as a tooltip or color source, without displaying a column that would distract the user.
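
As a sketch of that pattern, the overrides below hide `int_col` while still using it as the tooltip source for `str_col`. The dictionary only uses keys that appear elsewhere in this notebook; the small `visible_columns` helper is hypothetical and simply mimics dropping hidden columns from a column list.

```python
overrides = {
    # Keep int_col in the data, but do not render it as a column.
    "int_col": {"merge_rule": "hidden"},
    # Show the hidden column's value when hovering over str_col.
    "str_col": {"tooltip_config": {"tooltip_type": "simple", "val_column": "int_col"}},
}


def visible_columns(columns, overrides):
    """Hypothetical helper: names of the columns not marked as hidden."""
    return [
        c for c in columns
        if overrides.get(c, {}).get("merge_rule") != "hidden"
    ]


print(visible_columns(["int_col", "float_col", "str_col"], overrides))
```

A dictionary shaped like `overrides` is what the cells in this notebook pass as `column_config_overrides`.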


In [None]:
bw_ = PolarsBuckarooWidget(
    typed_df, 
    column_config_overrides={
        'int_col': {'merge_rule': 'hidden'}})
bw_

In [None]:
class SimpleStylingAnalysis(ColAnalysis):
    pinned_rows = [
        # {'primary_key_val': 'dtype', 'displayer_args': { 'displayer': 'obj' } },
        # {'primary_key_val': 'histogram', 'displayer_args': { 'displayer': 'histogram' }, }
    ]

    @staticmethod
    def single_sd_to_column_config(col, sd):
        return {'col_name':str(col), 'displayer_args': {'displayer': 'obj'}}

    #what is the key for this in the df_display_args_dictionary
    df_display_name = "main"
    data_key = "main"
    summary_stats_key= 'all_stats'

    
    @classmethod
    def style_columns(kls, sd):
        ret_col_config = []

        if 'index' not in sd:
            ret_col_config.append({'col_name': 'index', 'displayer_args': {'displayer': 'obj'}})
            
        for col in sd.keys():
            ret_col_config.append(kls.single_sd_to_column_config(col, sd[col]))
        return {
            'pinned_rows': kls.pinned_rows,
            'column_config': ret_col_config}


In [None]:
def obj_(pkey):
    return {'primary_key_val': pkey, 'displayer_args': { 'displayer': 'obj' } }
def float_(pkey, digits=3):
    return {'primary_key_val': pkey, 
            'displayer_args': { 'displayer': 'float', 'minimumFractionDigits':digits, 'maximumFractionDigits':digits}}

# Subclass the SimpleStylingAnalysis defined in the previous cell
class SummaryStatsAnalysis(SimpleStylingAnalysis):
    pinned_rows = [
        obj_('dtype'),
        float_('min'),
        float_('mode'),
        float_('mean'),
        float_('max'),
        float_('unique_count', 0),
        float_('distinct_count', 0),
        float_('empty_count', 0)]
    df_display_name = "summary"
    data_key = "empty"
    summary_stats_key= 'all_stats'
base_a_klasses = PolarsBuckarooWidget.analysis_klasses.copy()
base_a_klasses.append(SummaryStatsAnalysis)
class SummaryBuckarooWidget(PolarsBuckarooWidget):
    analysis_klasses = base_a_klasses
sbw = SummaryBuckarooWidget(
    typed_df, 
    column_config_overrides=dict(index={'displayer_args': {'displayer': 'obj'}}))
# Also, let's do some hacking so that we start with the summary stats view
bstate = sbw.buckaroo_state.copy()
bstate['df_display'] = 'summary'
sbw.buckaroo_state= bstate
sbw

In [None]:
base_a_klasses = PolarsBuckarooWidget.analysis_klasses.copy()
base_a_klasses.extend([SimpleStylingAnalysis])
class SimpleBuckarooWidget(PolarsBuckarooWidget):
    analysis_klasses = base_a_klasses
    
bw3 = SimpleBuckarooWidget(
    typed_df, 
    column_config_overrides={
        'float_col': {'color_map_config': {
          'color_rule': 'color_map',
          'map_name': 'BLUE_TO_YELLOW',
        }}})
bw3

The `color_map_config` example earlier in this notebook colored `int_col` based on the range of `float_col`.

Let's hide a column. The data for a hidden column is still sent to the frontend, and it is still accessible for color maps and tooltips.
A note about hiding columns: it only makes sense to hide columns from functions with access to the whole dataframe.
The only reason to hide a column (as opposed to removing it from the dataframe) is to use its values for the tooltips or color maps of another column.

In [None]:
bw_ = PolarsBuckarooWidget(
    typed_df, 
    column_config_overrides={
        'int_col': {'merge_rule': 'hidden'}})
bw_

Let's look at `pinned_rows`. They can be modified by setting `pinned_rows` on Buckaroo instantiation.

In [None]:
bw = PolarsBuckarooWidget(
    typed_df, 
    pinned_rows=[
        { 'primary_key_val': 'dtype',     'displayer_args': { 'displayer': 'obj' } },
        { 'primary_key_val': 'histogram', 'displayer_args': { 'displayer': 'histogram' }},   
    ])
bw

Pinned rows read from the summary stats, based on `primary_key_val`. You can list all summary stats keys like this:

In [None]:
[x['index'] for x in bw.df_data_dict['all_stats']]
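
Judging from how it is indexed in this notebook, `all_stats` is a list of records, one per summary statistic, each with an `'index'` key plus one entry per column. The sketch below uses a hand-written stand-in with made-up values to show how a `primary_key_val` lookup could work; `stats_row` is a hypothetical helper, not a Buckaroo function.

```python
# Hypothetical stand-in for bw.df_data_dict['all_stats'].
# The values are invented for illustration.
all_stats = [
    {"index": "dtype", "int_col": "Int64", "float_col": "Float64"},
    {"index": "min", "int_col": 1, "float_col": 1.43},
    {"index": "max", "int_col": 49, "float_col": 41.43},
]


def stats_row(all_stats, primary_key_val):
    """Return the record whose 'index' matches primary_key_val, or None."""
    return next((row for row in all_stats if row["index"] == primary_key_val), None)


# The available keys, then the record a pinned row for 'min' would read.
print([row["index"] for row in all_stats])
print(stats_row(all_stats, "min"))
```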

You can even display histograms in regular cells, provided the 'histogram' value is properly constructed.

In [None]:
histogram_vals = [x for x in bw.df_data_dict['all_stats'] if x['index'] == 'histogram'][0]
histogram_vals

In [None]:
hist_df = pl.DataFrame({'a':[20, 30],  'hist_col':[  histogram_vals['int_col'], histogram_vals['float_col']]})
hist_bw = PolarsBuckarooWidget(hist_df, 
                                column_config_overrides={
                                              'hist_col': {'displayer_args' : {'displayer': 'histogram' }}})
hist_bw

# Adding alternate styling methods

Buckaroo encourages using many opinionated analyses that can be quickly cycled through.

Here we will add to the `pinned_rows` configs.

In [None]:
class SummaryStatsAnalysis(SimpleStylingAnalysis):
    pinned_rows = [
        { 'primary_key_val': 'dtype',     'displayer_args': { 'displayer': 'obj' } },
        { 'primary_key_val': 'histogram', 'displayer_args': { 'displayer': 'histogram' }},   
    ]
    df_display_name = "summary5"
    data_key = "empty"
    summary_stats_key= 'all_stats'
base_a_klasses = PolarsBuckarooWidget.analysis_klasses.copy()
base_a_klasses.append(SummaryStatsAnalysis)
class SummaryBuckarooWidget(PolarsBuckarooWidget):
    analysis_klasses = base_a_klasses
SummaryBuckarooWidget(typed_df)

In [None]:
# It's annoying to type out all of those pinned rows, so let's make some convenience functions
def obj_(pkey):
    return {'primary_key_val': pkey, 'displayer_args': { 'displayer': 'obj' } }

def float_(pkey, digits=3):
    return {'primary_key_val': pkey, 
            'displayer_args': { 'displayer': 'float', 'minimumFractionDigits':digits, 'maximumFractionDigits':digits}}

class SummaryStatsAnalysis1(SimpleStylingAnalysis):
    pinned_rows = [
        { 'primary_key_val': 'dtype',     'displayer_args': { 'displayer': 'obj' } },
        { 'primary_key_val': 'histogram', 'displayer_args': { 'displayer': 'histogram' }},   
    ]
    df_display_name = "summary1"
    data_key = "empty"
    summary_stats_key= 'all_stats'
class SummaryStatsAnalysis(SimpleStylingAnalysis):
    pinned_rows = [
        obj_('dtype'),
        float_('min'),
        #float_('median'),
        float_('mean'),
        float_('max'),
    ]
    df_display_name = "summary"
    data_key = "empty"
    summary_stats_key= 'all_stats'
base_a_klasses = PolarsBuckarooWidget.analysis_klasses.copy()
base_a_klasses.extend([SummaryStatsAnalysis1, SummaryStatsAnalysis])
class SummaryBuckarooWidget(PolarsBuckarooWidget):
    analysis_klasses = base_a_klasses
sbw = SummaryBuckarooWidget(typed_df)
# Also, let's do some hacking so that we start with the summary stats view
bstate = sbw.buckaroo_state.copy()
bstate['df_display'] = 'summary1'
sbw.buckaroo_state= bstate
sbw

In [None]:
class SummaryStatsAnalysis(SimpleStylingAnalysis):
    pinned_rows = [
        obj_('dtype'),
        float_('min'),
        #float_('median'),
        float_('mean'),
        float_('max'),
        float_('unique_count', 0),
        float_('distinct_count', 0),
        float_('empty_count', 0)
    ]
    df_display_name = "summary"
    data_key = "empty"
    summary_stats_key= 'all_stats'
base_a_klasses = PolarsBuckarooWidget.analysis_klasses.copy()
base_a_klasses.append(SummaryStatsAnalysis)
class SummaryBuckarooWidget(PolarsBuckarooWidget):
    analysis_klasses = base_a_klasses
sbw = SummaryBuckarooWidget(typed_df)
# Also, let's do some hacking so that we start with the summary stats view
bstate = sbw.buckaroo_state.copy()
bstate['df_display'] = 'summary'
sbw.buckaroo_state= bstate
sbw

# Let's add a post-processing method

In [None]:
from polars import functions as F
from buckaroo.pluggable_analysis_framework.polars_analysis_management import PolarsAnalysis

In [None]:
class ValueCountPostProcessing(PolarsAnalysis):
    @classmethod
    def post_process_df(kls, df):
        result_df = df.select(
            F.all().value_counts().implode().list.gather(pl.arange(0, 10), null_on_oob=True).explode().struct.rename_fields(['val', 'unused_count']).struct.field('val').prefix('val_'),
            F.all().value_counts().implode().list.gather(pl.arange(0, 10), null_on_oob=True).explode().struct.field('count').prefix('count_'))
        return [result_df, {}]
    post_processing_method = "value_counts"
class TransposeProcessing(ColAnalysis):
    @classmethod
    def post_process_df(kls, df):
        return [df.transpose(), {}]
    post_processing_method = "transpose"
base_a_klasses = PolarsBuckarooWidget.analysis_klasses.copy()
base_a_klasses.extend([SimpleStylingAnalysis, ValueCountPostProcessing, TransposeProcessing])
class VCBuckarooWidget(PolarsBuckarooWidget):
    analysis_klasses = base_a_klasses
vcb = VCBuckarooWidget(typed_df, debug=False)
vcb

In [None]:
class AdaptingStylingAnalysis(SimpleStylingAnalysis):
    requires_summary = ["histogram", "is_numeric", "dtype", "is_integer"]
    pinned_rows = [
        obj_('dtype'),
        {'primary_key_val': 'histogram', 'displayer_args': { 'displayer': 'histogram' }}]

    @staticmethod
    def single_sd_to_column_config(col, sd):
        digits = 3
        if sd['is_integer']:
            disp = {'displayer': 'float', 'minimumFractionDigits':0, 'maximumFractionDigits':0}
        elif sd['is_numeric']:
            disp = {'displayer': 'float', 'minimumFractionDigits':digits, 'maximumFractionDigits':digits}
        else:
            disp = {'displayer': 'obj'}
        return {'col_name':col, 'displayer_args': disp }

base_a_klasses = PolarsBuckarooWidget.analysis_klasses.copy()
base_a_klasses.extend([AdaptingStylingAnalysis, ValueCountPostProcessing])
class ABuckarooWidget(PolarsBuckarooWidget):
    analysis_klasses = base_a_klasses
acb = ABuckarooWidget(typed_df)
acb
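
The branching in `single_sd_to_column_config` can be exercised without a widget. The sketch below copies that rule into a standalone function, `displayer_for` (a hypothetical name), and feeds it hand-written summary dictionaries. Real summary dictionaries come from Buckaroo's summary stats and carry more keys than the two assumed here.

```python
def displayer_for(summary, digits=3):
    """Pick displayer_args from a column's summary dict (same rule as above)."""
    if summary["is_integer"]:
        return {"displayer": "float", "minimumFractionDigits": 0, "maximumFractionDigits": 0}
    if summary["is_numeric"]:
        return {"displayer": "float", "minimumFractionDigits": digits, "maximumFractionDigits": digits}
    return {"displayer": "obj"}


# Hypothetical summary dicts, reduced to the two keys the rule reads.
samples = {
    "int_col": {"is_integer": True, "is_numeric": True},
    "float_col": {"is_integer": False, "is_numeric": True},
    "str_col": {"is_integer": False, "is_numeric": False},
}

for col, sd in samples.items():
    print(col, displayer_for(sd))
```

The integer check comes first because an integer column would also satisfy the numeric check.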