First, I'm importing the dataframe from the Python package I use to interact with my data (I've mentioned it [here](https://beepb00p.xyz/annotating.html#infra)).

The package is private at the moment, but it's pretty specific to my use cases, and the only interface to it in this post is a Pandas dataframe, so hopefully that won't confuse you.

In [None]:
from my.workouts.dataframes import endomondo
df = endomondo()

Some sample data:

In [None]:
display(df[df['dt'].apply(lambda dt: str(dt.date())) == '2019-04-21'])

The `error` column is a neat way of propagating exceptions from the data provider.

E.g. I only have HR data for the last couple of years or so, so the data provider doesn't have any HR points for the older Endomondo workouts. While I could filter out these entries in the data provider, they might still be useful for other plots and analysis pipelines (e.g. if I were only interested in kcals and didn't care about heartbeats).

Instead, I'm just being defensive and propagating exceptions up through the dataframe, leaving it up to the user to handle them.

In [None]:
display(df[df['dt'].apply(lambda dt: str(dt.date())).isin(['2015-03-06', '2018-05-28'])])
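
For illustration, here is a minimal sketch of how a data provider could fill such an `error` column. The real package is private, so everything here is an assumption: the `RAW` records, the `total_heartbeats` and `to_row` helpers, and the error message are all hypothetical and only mimic the columns used in this post.

```python
from datetime import datetime

import pandas as pd

# Hypothetical raw workouts; the second one has no heart rate data.
RAW = [
    {'dt': datetime(2018, 5, 28, 9, 0), 'sport': 'running', 'kcal': 500.0, 'heartbeats': 4500.0},
    {'dt': datetime(2015, 3, 6, 18, 30), 'sport': 'running', 'kcal': 420.0, 'heartbeats': None},
]


def total_heartbeats(raw):
    # Hypothetical helper: raises if the workout has no HR points.
    if raw['heartbeats'] is None:
        raise RuntimeError('no HR data')
    return raw['heartbeats']


def to_row(raw):
    # Keep the row even if a field fails; record the exception text instead.
    row = {'dt': raw['dt'], 'sport': raw['sport'], 'kcal': raw['kcal'],
           'heartbeats': None, 'error': None}
    try:
        row['heartbeats'] = total_heartbeats(raw)
    except Exception as e:
        row['error'] = str(e)
    return row


demo = pd.DataFrame([to_row(r) for r in RAW])
# The consumer decides what to do with the failed rows, e.g. drop them:
clean = demo[demo['error'].isnull()]
```

The point of this design is that a workout without HR data still shows up with its `kcal` value intact, so a pipeline that doesn't need heartbeats can simply ignore the `error` column.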

So, first we filter out the entries with errors:

In [None]:
df = df[df['error'].isnull()]

As well as sports with fewer than 10 entries, which would otherwise end up as outliers:

In [None]:
df = df.groupby(['sport']).filter(lambda grp: len(grp) >= 10) 
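
To see what `groupby(...).filter(...)` does here, a small sketch on made-up data may help. The `toy` dataframe and the `min_points` threshold of 2 are assumptions for the sake of a short example (the cell above uses 10).

```python
import pandas as pd

# Made-up data: three running workouts and a single skiing one.
toy = pd.DataFrame({
    'sport': ['running', 'running', 'running', 'skiing'],
    'kcal': [300, 450, 500, 900],
})

min_points = 2  # hypothetical threshold for this toy example
# filter() keeps or drops whole groups: every row of a sport survives
# only if that sport has at least min_points rows.
kept = toy.groupby('sport').filter(lambda grp: len(grp) >= min_points)
```

Here the lone skiing workout is dropped, while all running rows are kept unchanged.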

In [None]:
%matplotlib inline
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns

matplotlib.rc('font', size=17, weight='regular')

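# Group by a plain column name: grouping by a one-element list
# yields 1-tuple keys in recent pandas versions, which would not match the hue values
sports = {
    g: len(f) for g, f in df.groupby('sport')
}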

g = sns.lmplot(
    data=df,
    x='kcal',
    y='heartbeats',
    hue='sport', 
    hue_order=list(sports),
    legend_out=False,
    height=15,
)
g.set(
    title='Dependency between total heartbeats and Kcals (estimated by Endomondo)',
    
    xlim=(0, None), 
    xlabel='Kcal',
    
    ylim=(0, None),
    ylabel='Heartbeats, total'
)
# https://stackoverflow.com/a/55108651/706389
plt.legend(
    title='Sport',
    labels=[f'{s} ({cnt} points)' for s, cnt in sports.items()],
    loc='upper left',
    # fontsize='xx-large',
)
pass

In [None]:
# import plotly.express as px # type: ignore
# f = px.scatter(df, x='kcal', y='heartbeats', color='sport')
# display(f)