### Story:

Let's imagine we investigate the effect X on a property Y for a series of molecules. 

We have calculated or measured the total effect X and several contributions to it. These contributions are meant to be corrections to **v1**, of increasing accuracy, that bring it towards a reference value, **vref**.

Additionally, we used two methods of calculation or measurement to do this.

In this notebook we will use [Altair](https://altair-viz.github.io/index.html) to plot the contributions to the effect X for the whole series and for both measurement methods.


### Data description:

* **df1** and **df2** collect the data from two measurements (read from the files data1.csv and data2.csv)
* **v1** and **vref** are reference values which we use to calculate the total effect:
    * **v1**: measured without the effect X
    * **vref**: measured with the full effect X
    * $\mathbf{v[total\ effect] = vref - v1}$
    
* **v2** and **v3** are approximations to **vref**. We then define:
    * $\mathbf{\Delta(1) = v2 - v1}$
    * $\mathbf{\Delta(2) = v3 - v2}$
    * $\mathbf{\Delta(3) = vref - v3}$
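
Because the three contributions telescope, their sum always recovers the total effect: $\mathbf{\Delta(1) + \Delta(2) + \Delta(3) = (v2 - v1) + (v3 - v2) + (vref - v3) = vref - v1}$. A quick arithmetic check with the Cr row of data1.csv (displayed further down) illustrates this; the helper name `contributions` is only for illustration.

```python
def contributions(v1, v2, v3, vref):
    """Return (delta1, delta2, delta3, total_effect) as defined above."""
    delta1 = v2 - v1
    delta2 = v3 - v2
    delta3 = vref - v3
    total_effect = vref - v1
    return delta1, delta2, delta3, total_effect


# Cr row of data1.csv: v1, v2, v3, vref
d1, d2, d3, total = contributions(-2944.2912, -2813.8701, -2796.4468, -2863.68)

print(round(d1, 4), round(d2, 4), round(d3, 4), round(total, 4))
# → 130.4211 17.4233 -67.2332 80.6112

# The contributions add up to the total effect (up to floating-point rounding)
assert abs((d1 + d2 + d3) - total) < 1e-9
```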
    
### Thanks:

The calls to Altair functions used in this notebook combine advice found (mostly) on Stack Overflow, which I have lost track of... Kudos to all the Altair experts out there.

In [1]:
import pandas as pd
import altair as alt

In [2]:
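def prep_data(df, name):
    """Return the contributions and the total effect, both in long format."""
    df = df.copy()  # avoid modifying the caller's DataFrame
    df['name'] = name
    df_te = df[['mol', 'name']].copy()

    # contributions Delta(1), Delta(2), Delta(3) as defined above
    df['delta1'] = df['v2'] - df['v1']
    df['delta2'] = df['v3'] - df['v2']
    df['delta3'] = df['vref'] - df['v3']

    # total effect
    df_te['total_effect'] = df['vref'] - df['v1']

    # wide -> long: one row per (mol, name, contribution);
    # value_vars keeps any other column (v1..vref, a saved index) out
    df = df.melt(id_vars=['mol', 'name'],
                 value_vars=['delta1', 'delta2', 'delta3'])
    df_te = df_te.melt(id_vars=['mol', 'name'])

    return df, df_te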
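
`prep_data` reshapes each wide table into long ("tidy") form, with one row per molecule, data set and contribution, which is the layout Altair uses to colour and stack the bars. The sketch below reproduces only the `melt` step on a one-row toy frame built from the Cr values; passing `value_vars` explicitly keeps any other column, such as a saved index, out of the melted result.

```python
import pandas as pd

# One-row toy frame with the delta columns already computed (Cr, set1)
wide = pd.DataFrame({
    'mol': ['Cr'],
    'name': ['set1'],
    'delta1': [130.4211],
    'delta2': [17.4233],
    'delta3': [-67.2332],
})

# Wide -> long: the defaults name the new columns 'variable'
# (the former column name) and 'value' (the number)
long = wide.melt(id_vars=['mol', 'name'],
                 value_vars=['delta1', 'delta2', 'delta3'])
print(long)
```

Each contribution becomes its own row, so `variable` can be mapped to colour and `value` to the bar height.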

In [3]:
df1=pd.read_csv('data1.csv')

Let's have a look at the dataframe:

In [4]:
df1

|   | mol | v1 | v2 | v3 | vref |
|---|-----|----|----|----|------|
| 0 | Cr | -2944.2912 | -2813.8701 | -2796.4468 | -2863.68 |
| 1 | Mn | -4259.8771 | -4221.8368 | -4215.8657 | -4279.8494 |
| 2 | Co | -6125.6798 | -4967.6189 | -4963.6009 | -4993.8201 |
| 3 | Zn | 1985.4178 | 1947.7289 | 1946.8668 | 1939.1123 |
| 4 | Mo | -403.3011 | -335.5684 | -315.6369 | -383.6767 |
| 5 | Tc | -1264.2636 | -1241.5628 | -1228.3957 | -1254.453 |
| 6 | Ru | -762.8891 | 69.6719 | 204.9638 | 74.8138 |
| 7 | Pd | -1768.4043 | -1550.3824 | -1533.235 | -1477.3337 |
| 8 | Ag | 4589.0627 | 4408.0483 | 4408.4061 | 4377.6442 |
| 9 | W | 4468.8709 | 4547.5992 | 4567.8561 | 4436.9428 |


In [5]:
df1,df1_te=prep_data(df1,'set1')

In [6]:
df2=pd.read_csv('data2.csv')

In [7]:
df2, df2_te=prep_data(df2,'set2')

In [8]:
# stack the two data sets on top of each other (fresh 0..n-1 index)
df_plot = pd.concat([df1, df2], ignore_index=True)
df_plot_te = pd.concat([df1_te, df2_te], ignore_index=True)

In [9]:
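# human-readable legend labels: delta1 -> Δ(1), delta2 -> Δ(2), delta3 -> Δ(3)
# (assign the result instead of using inplace=True on a column)
df_plot['variable'] = df_plot['variable'].replace({'delta1': '\u0394(1)',
                                                   'delta2': '\u0394(2)',
                                                   'delta3': '\u0394(3)'})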

In [10]:
df_plot_all = pd.merge(df_plot, df_plot_te, on=['mol','name'])
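
Both frames have `variable` and `value` columns, so `pd.merge` keeps them apart with its default suffixes `_x` (left frame, the contributions) and `_y` (right frame, the total effect). This is where the `variable_x`, `value_x` and `value_y` names used in the chart below come from. Since each (mol, name) pair has three contribution rows and a single total-effect row, the total effect is repeated on all three rows. A toy sketch with the Cr values:

```python
import pandas as pd

# contributions for one bar (long format, as returned by prep_data)
contrib = pd.DataFrame({
    'mol': ['Cr'] * 3,
    'name': ['set1'] * 3,
    'variable': ['\u0394(1)', '\u0394(2)', '\u0394(3)'],
    'value': [130.4211, 17.4233, -67.2332],
})

# total effect for the same bar
total = pd.DataFrame({
    'mol': ['Cr'],
    'name': ['set1'],
    'variable': ['total_effect'],
    'value': [80.6112],
})

merged = pd.merge(contrib, total, on=['mol', 'name'])
print(merged.columns.tolist())
# → ['mol', 'name', 'variable_x', 'value_x', 'variable_y', 'value_y']
```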

In [11]:
order_mol=['Cr', 'Mn', 'Co', 'Zn', 'Mo', 'Tc', 'Ru', 'Pd', 'Ag', 'W', 'Re', 'Pt']
order_where=['set1','set2']


bars=alt.Chart(df_plot_all).mark_bar(size=15).encode(     

    # which field to group columns on
    x=alt.X('name:O',
            axis=alt.Axis(grid=True,labelFontSize=8),
            sort=order_where,
            title=None),

    # which field to use as Y values (the contributions, stacked per bar)
    y=alt.Y('value_x:Q',
            axis=alt.Axis(grid=True,title=None)),

    # which field to color by & legend
    color=alt.Color('variable_x:N',
                    scale=alt.Scale(range=['#4381d1', '#47c488', '#ff6f69']),
                    legend=alt.Legend(title="Contributions",
                                      orient="right",
                                      direction="horizontal",
                                      offset=-200,
                                      titleFontSize=16,
                                      labelFontSize=14)),
                   
    # how to order the data on bars
    # how to order the data on bars (labels are strings, so ordinal, not quantitative)
    order=alt.Order('variable_x:O', sort='ascending'))


# draw the total effect as a separate layer of short horizontal ticks
rules = alt.Chart(df_plot_all).mark_tick(color='black',
                                         thickness=1.5,
                                         size=15).encode(
    x=alt.X('name:O', axis=alt.Axis(grid=True, title=None)),
    y=alt.Y('value_y:Q', axis=alt.Axis(grid=True, title=None)))


# layer bars and ticks, then facet into one column per molecule
alt.layer(bars, rules).properties(height=450, width=50).facet(
    column=alt.Column('mol:N',
                      sort=order_mol,
                      header=alt.Header(title='Plot title',
                                        orient='bottom',
                                        titleFontSize=24,
                                        labelFontSize=14,
                                        labelBaseline='line-top',
                                        labelAlign='center',
                                        labelAnchor='middle'))
).resolve_scale(x='independent').configure_view(strokeOpacity=0)

### Final note

The contributions $\mathbf{\Delta(1)}$, $\mathbf{\Delta(2)}$, $\mathbf{\Delta(3)}$ are defined at the beginning of this notebook.

In particular, $\mathbf{\Delta(3)}$ can be interpreted as the part of the effect X that is not captured by the approximations **v2** and **v3**, which is why it is shown in red.

The total effect is additionally marked by **horizontal black lines**.
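
The ticks matter because, when the contributions have opposite signs, the default zero-baseline stacking that Vega-Lite applies to these bars stacks positive values upwards and negative values downwards from zero separately. Neither end of such a bar equals the net total effect. The sketch below mimics that stacking rule in plain Python for the Cr values of set1; `stack_extent` is a hypothetical helper for illustration, not an Altair function.

```python
def stack_extent(values):
    """Mimic zero-baseline stacking for one bar.

    Positive values stack upwards from 0, negative values downwards.
    Returns (bottom, top, net), where net is what the black tick marks.
    """
    top = sum(v for v in values if v > 0)
    bottom = sum(v for v in values if v < 0)
    net = sum(values)
    return bottom, top, net


# Delta(1), Delta(2), Delta(3) for Cr (set1), from data1.csv
bottom, top, net = stack_extent([130.4211, 17.4233, -67.2332])
print(round(bottom, 4), round(top, 4), round(net, 4))
# → -67.2332 147.8444 80.6112
```

The bar spans roughly -67 to 148, while the total effect sits at about 81, inside the bar rather than at either end.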



### Improvements, questions:

* it would be nice to have the horizontal black lines (marking the total effect) added to the legend, which at this point is not straightforward to do