<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#1.-MVC-project-description" data-toc-modified-id="1.-MVC-project-description-1">1. MVC project description</a></span></li><li><span><a href="#2.-Setup" data-toc-modified-id="2.-Setup-2">2. Setup</a></span></li><li><span><a href="#3.-Get-the-data" data-toc-modified-id="3.-Get-the-data-3">3. Get the data</a></span><ul class="toc-item"><li><span><a href="#3.1.-From-matlab-to-dict" data-toc-modified-id="3.1.-From-matlab-to-dict-3.1">3.1. From matlab to dict</a></span></li><li><span><a href="#3.2.-From-dict-to-pandas-dataframe" data-toc-modified-id="3.2.-From-dict-to-pandas-dataframe-3.2">3.2. From dict to pandas dataframe</a></span></li></ul></li><li><span><a href="#4.-Data-analysis" data-toc-modified-id="4.-Data-analysis-4">4. Data analysis</a></span><ul class="toc-item"><li><span><a href="#4.1.-Muscles-by-dataset" data-toc-modified-id="4.1.-Muscles-by-dataset-4.1">4.1. Muscles by dataset</a></span></li><li><span><a href="#4.2.-Tests-by-dataset" data-toc-modified-id="4.2.-Tests-by-dataset-4.2">4.2. Tests by dataset</a></span></li><li><span><a href="#4.3.-Muscles-and-tests-count" data-toc-modified-id="4.3.-Muscles-and-tests-count-4.3">4.3. Muscles and tests count</a></span></li><li><span><a href="#4.4.-Max-for-each-test-(normalized-by-participant-number)" data-toc-modified-id="4.4.-Max-for-each-test-(normalized-by-participant-number)-4.4">4.4. Max for each test (normalized by participant number)</a></span></li><li><span><a href="#4.5.-Distribution" data-toc-modified-id="4.5.-Distribution-4.5">4.5. Distribution</a></span></li><li><span><a href="#Summary" data-toc-modified-id="Summary-4.6">Summary</a></span></li></ul></li></ul></div>

# 1. MVC project description

**Links**
- [github repo](https://github.com/romainmartinez/mvc)
- [plotly figures]()

**Todos**
- update readme, description
- data analysis
- plotly link
- one model by muscle

**Author**: _Romain Martinez._

# 2. Setup

In [1]:
# Common imports
import scipy.io as sio
import pandas as pd
import numpy as np

# Path
from pathlib import Path
PROJECT_PATH = Path('./')
DATA_PATH = PROJECT_PATH.joinpath('data')

# to make this notebook's output stable across runs
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Figures
OFFLINE = True
if OFFLINE:
    import plotly.offline as py
    py.init_notebook_mode(connected=True)
else:
    import plotly.plotly as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly import tools
BASE_LAYOUT = go.Layout(hovermode='closest', font=dict(size=14))

# 3. Get the data

## 3.1. From matlab to dict

In [2]:
def load_data(data_path, data_format, normalize=False):
    """Load every `*{data_format}.mat` file found in `data_path` into a tidy dict.

    If `normalize` is True, each value is expressed as a percentage of the
    participant's maximum for that muscle.
    """
    if not data_path.is_dir():
        raise ValueError('please provide a valid data path')

    mat = {}
    data = {key: [] for key in ('datasets', 'participants', 'muscles', 'tests', 'mvc')}
    count = -1  # zero-based participant index, unique across datasets
    dataset_names = []

    for ifile in data_path.iterdir():
        if ifile.name.endswith(f'{data_format}.mat'):
            dataset = ifile.name.replace(f'_{data_format}.mat', '').replace('MVE_Data_', '')

            if dataset not in dataset_names:
                dataset_names.append(dataset)
            # position in dataset_names, so files that do not match cannot shift the index
            idataset = dataset_names.index(dataset)

            # array of shape (participants, muscles, tests)
            mat[dataset] = sio.loadmat(ifile)['MVE']
            n_participants = mat[dataset].shape[0]
            print(f"project '{dataset}' ({n_participants} participants)")

            for iparticipant in range(n_participants):
                count += 1
                for imuscle in range(mat[dataset].shape[1]):
                    # NaN (with a RuntimeWarning) when the muscle was never recorded
                    max_mvc = np.nanmax(mat[dataset][iparticipant, imuscle, :])
                    for itest in range(mat[dataset].shape[2]):
                        data['participants'].append(count)
                        data['datasets'].append(idataset)
                        data['muscles'].append(imuscle)
                        data['tests'].append(itest)
                        if normalize:
                            data['mvc'].append(mat[dataset][iparticipant, imuscle, itest] * 100 / max_mvc)
                        else:
                            data['mvc'].append(mat[dataset][iparticipant, imuscle, itest])

    # note: `count` is the index of the last participant (number of participants minus one)
    print(f'\n\ttotal participants: {count}')
    return data, dataset_names

In [3]:
DATA_FORMAT = 'only_max'
data, DATASET_NAMES = load_data(data_path=DATA_PATH, data_format=DATA_FORMAT, normalize=False)

MUSCLES_NAMES = [
    'upper trapezius', 'middle trapezius', 'lower trapezius',
    'anterior deltoid', 'middle deltoid', 'posterior deltoid',
    'pectoralis major', 'serratus anterior', 'latissimus dorsi',
    'supraspinatus', 'infraspinatus', 'subscapularis'
]

project 'Landry2016' (15 participants)
project 'Landry2015_2' (11 participants)
project 'Landry2015_1' (14 participants)
project 'Violon' (10 participants)
project 'Yoann_2015' (22 participants)
project 'Landry2013' (21 participants)
project 'Landry2012' (18 participants)
project 'Tennis' (16 participants)
project 'Patrick_2013' (16 participants)
project 'Sylvain_2015' (10 participants)

	total participants: 152



All-NaN slice encountered
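
The `All-NaN slice encountered` warning comes from `np.nanmax` being called on a muscle that has no recorded value for a participant. The sketch below reproduces it on a made-up array with the same `(participants, muscles, tests)` layout. The values and the vectorized form are assumptions for illustration, not the notebook's data or its loop.

```python
import warnings

import numpy as np

# Toy array shaped (participants, muscles, tests); the values are made up.
mve = np.array([
    [[0.2, 0.4, np.nan],          # muscle 0: one missing test
     [np.nan, np.nan, np.nan]],   # muscle 1: never recorded
])

with warnings.catch_warnings():
    # np.nanmax warns "All-NaN slice encountered" for muscle 1 and returns NaN
    warnings.simplefilter('ignore', category=RuntimeWarning)
    max_mvc = np.nanmax(mve, axis=2, keepdims=True)

# Express each test as a percentage of the per-participant, per-muscle maximum
normalized = mve * 100 / max_mvc
print(normalized)
```

Rows that stay NaN after this step are the ones removed later by `dropna()`.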



## 3.2. From dict to pandas dataframe

In [4]:
df_tidy = pd.DataFrame({
    'participant': data['participants'],
    'dataset': data['datasets'],
    'muscle': data['muscles'],
    'test': data['tests'],
    'mvc': data['mvc']
}).dropna()

print(f'dataset shape = {df_tidy.shape}')
df_tidy.head()

dataset shape = (16456, 5)


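|   | dataset | muscle | mvc | participant | test |
|---|---------|--------|----------|-------------|------|
| 2 | 0 | 0 | 0.127825 | 0 | 2 |
| 3 | 0 | 0 | 0.124255 | 0 | 3 |
| 4 | 0 | 0 | 0.146927 | 0 | 4 |
| 5 | 0 | 0 | 0.041583 | 0 | 5 |
| 8 | 0 | 0 | 0.162206 | 0 | 8 |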


In [5]:
df_wide = df_tidy.pivot_table(
    index=['dataset', 'participant', 'muscle'],
    columns='test',
    values='mvc',
    fill_value=np.nan).reset_index()

df_wide = df_wide.drop(['dataset', 'participant'], axis=1)

print(f'dataset shape = {df_wide.shape}')
df_wide.head()

dataset shape = (1468, 17)


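| test | muscle | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|------|--------|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|
| 0 | 0 | NaN | NaN | 0.127825 | 0.124255 | 0.146927 | 0.041583 | NaN | NaN | 0.162206 | 0.017711 | 0.014369 | NaN | NaN | NaN | NaN | 0.036916 |
| 1 | 2 | NaN | NaN | 0.179864 | 0.294909 | 0.295846 | 0.107769 | NaN | NaN | 0.199097 | 0.20215 | 0.022668 | NaN | NaN | NaN | NaN | 0.07146 |
| 2 | 3 | NaN | NaN | 0.078753 | 0.244578 | 0.272709 | 0.010146 | NaN | NaN | 0.10649 | 0.0106 | 0.007517 | NaN | NaN | NaN | NaN | 0.238814 |
| 3 | 4 | NaN | NaN | 0.150353 | 0.104654 | 0.115272 | 0.057845 | NaN | NaN | 0.065429 | 0.05293 | 0.009894 | NaN | NaN | NaN | NaN | 0.079885 |
| 4 | 5 | NaN | NaN | 0.172669 | 0.124655 | 0.133114 | 0.196436 | NaN | NaN | 0.101393 | 0.187997 | 0.051396 | NaN | NaN | NaN | NaN | 0.050041 |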
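
To make the tidy-to-wide step easier to follow, here is a minimal sketch of the same `pivot_table` call on a handful of made-up rows. The frame `toy_tidy` and its values are hypothetical. Each (dataset, participant, muscle) combination becomes one row, each test becomes one column, and tests that were not performed show up as NaN.

```python
import numpy as np
import pandas as pd

# Made-up tidy rows: one line per (participant, muscle, test) measurement
toy_tidy = pd.DataFrame({
    'dataset':     [0, 0, 0, 0],
    'participant': [0, 0, 0, 0],
    'muscle':      [0, 0, 2, 2],
    'test':        [2, 3, 2, 5],
    'mvc':         [0.13, 0.12, 0.18, 0.11],
})

# One row per (dataset, participant, muscle), one column per test
toy_wide = toy_tidy.pivot_table(
    index=['dataset', 'participant', 'muscle'],
    columns='test',
    values='mvc').reset_index()

print(toy_wide)
```

Muscle 0 has no value for test 5 and muscle 2 has none for test 3, so those two cells are NaN in the wide frame.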


# 4. Data analysis

## 4.1. Muscles by dataset

In [6]:
def plot_count_by_dataset(d, values, index, columns, **kwargs):
    table = d.pivot_table(
        values,
        index,
        columns,
        aggfunc=lambda x: len(x) / x.nunique(),
        fill_value=0).astype(int)

    fig = ff.create_annotated_heatmap(
        z=np.array(table),
        x=kwargs.get('xlabel'),
        y=kwargs.get('ylabel'),
        showscale=True,
        colorscale='YlGnBu',
        colorbar=dict(title='Count', titleside='right'))

    fig['layout'].update(BASE_LAYOUT)
    fig['layout'].update(
        dict(
            title=kwargs.get('title'),
            xaxis=dict(title=kwargs.get('xtitle'), side='bottom'),
            yaxis=dict(title=kwargs.get('ytitle'), autorange='reversed'),
            margin=go.Margin(t=80, b=80, l=150, r=80, pad=0)))
    return fig

In [7]:
muscle_by_dataset = plot_count_by_dataset(
    df_tidy,
    values='test',
    index='dataset',
    columns='muscle',
    ylabel=DATASET_NAMES,
    xlabel=MUSCLES_NAMES,
    title='Muscles by dataset')
py.iplot(muscle_by_dataset, filename='mvc/muscles_by_dataset')
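
The aggregation `len(x) / x.nunique()` used by `plot_count_by_dataset` divides the number of rows in a cell by the number of distinct tests in it. When every participant performed the same tests, this equals the number of participants recorded for that muscle. The sketch below checks this on a made-up frame (`toy` is hypothetical, not the notebook's data).

```python
import pandas as pd

# Made-up data: dataset 0 has two participants who each did tests 0 and 1
# on muscle 0; dataset 1 has one participant who did both tests on muscle 1.
toy = pd.DataFrame({
    'dataset':     [0, 0, 0, 0, 1, 1],
    'participant': [0, 0, 1, 1, 2, 2],
    'muscle':      [0, 0, 0, 0, 1, 1],
    'test':        [0, 1, 0, 1, 0, 1],
})

table = toy.pivot_table(
    values='test',
    index='dataset',
    columns='muscle',
    aggfunc=lambda x: len(x) / x.nunique(),  # rows / distinct tests
    fill_value=0).astype(int)

print(table)
```

If participants within a dataset did different subsets of tests, the ratio is no longer an exact participant count, and `.astype(int)` truncates it.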

## 4.2. Tests by dataset

In [8]:
tests_by_dataset = plot_count_by_dataset(
    df_tidy,
    values='muscle',
    index='dataset',
    columns='test',
    ylabel=DATASET_NAMES,
    xtitle='Tests',
    title='Tests by dataset')
py.iplot(tests_by_dataset, filename='mvc/tests_by_dataset')

## 4.3. Muscles and tests count

In [9]:
def plot_count_bar(d, column, **kwargs):
    # sort by index so that position i matches label i in `ylabel`
    count = np.array(d[column].value_counts().sort_index())
    trace = go.Bar(
        x=count,
        y=kwargs.get('ylabel'),
        marker=dict(color='grey'),
        orientation='h')

    layout = BASE_LAYOUT.copy()
    layout.update(
        dict(
            title=kwargs.get('title'),
            xaxis=dict(
                title=kwargs.get('xtitle'), showline=True, linewidth=1.5),
            yaxis=dict(
                title=kwargs.get('ytitle'), showline=True, linewidth=1.5)))

    # adjust y axis
    layout['yaxis'].update(nticks=count.shape[0])
    layout.update(margin=go.Margin(t=80, b=80, l=150, r=80, pad=0))
    return dict(data=[trace], layout=layout)

In [10]:
test_count_bar = plot_count_bar(
    df_tidy, 'test', title='Tests count', xtitle='n', ytitle='Tests')
py.iplot(test_count_bar, filename='mvc/test_count_bar')

In [11]:
muscle_count_bar = plot_count_bar(
    df_tidy, 'muscle', ylabel=MUSCLES_NAMES, title='Muscles count', xtitle='n')
py.iplot(muscle_count_bar, filename='mvc/muscle_count_bar')
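
In these bar charts the counts are paired with their labels by position only, so the counts need to be ordered by muscle (or test) index. A minimal sketch with made-up muscle indices shows how sorting by index lines the counts up with a list of names. The series and the three names are illustrative assumptions.

```python
import pandas as pd

# Made-up muscle indices, deliberately not in ascending order of appearance
muscles = pd.Series([2, 0, 2, 1, 0, 2])
names = ['upper trapezius', 'middle trapezius', 'lower trapezius']

# Sorting by index guarantees that position i holds the count of muscle i
counts = muscles.value_counts().sort_index()

for name, n in zip(names, counts):
    print(f'{name}: {n}')
```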

## 4.4. Max for each test (normalized by participant number)

In [12]:
normalized, _ = load_data(
    data_path=DATA_PATH, data_format=DATA_FORMAT, normalize=True)

df_normalized = pd.DataFrame({
    'participant': normalized['participants'],
    'dataset': normalized['datasets'],
    'muscle': normalized['muscles'],
    'test': normalized['tests'],
    'mvc': normalized['mvc']
}).dropna()

project 'Landry2016' (15 participants)
project 'Landry2015_2' (11 participants)
project 'Landry2015_1' (14 participants)
project 'Violon' (10 participants)
project 'Yoann_2015' (22 participants)
project 'Landry2013' (21 participants)
project 'Landry2012' (18 participants)
project 'Tennis' (16 participants)
project 'Patrick_2013' (16 participants)
project 'Sylvain_2015' (10 participants)

	total participants: 152



All-NaN slice encountered



In [13]:
def plot_max_by_test(d, **kwargs):
    # compare with a tolerance: the normalized maximum may not be exactly 100.0
    maximum = d[np.isclose(d['mvc'], 100)].pivot_table(
        values='muscle',
        index='dataset',
        columns='test',
        aggfunc='count',
        fill_value=0)
    maximum = (maximum.div(maximum.sum(axis=1), axis=0) * 100).astype(int)

    fig = ff.create_annotated_heatmap(
        z=np.array(maximum),
        x=kwargs.get('xlabel'),
        y=kwargs.get('ylabel'),
        showscale=True,
        colorscale='YlGnBu',
        colorbar=dict(title='Percentage', titleside='right'))

    fig['layout'].update(BASE_LAYOUT)
    fig['layout'].update(
        dict(
            title=kwargs.get('title'),
            xaxis=dict(title=kwargs.get('xtitle'), side='bottom'),
            yaxis=dict(title=kwargs.get('ytitle'), autorange='reversed'),
            margin=go.Margin(t=80, b=80, l=150, r=80, pad=0)))
    return fig

In [14]:
max_by_test = plot_max_by_test(
    df_normalized,
    ylabel=DATASET_NAMES,
    xtitle='Tests',
    title='Max for each test (normalized by participant number)')

py.iplot(max_by_test, filename='mvc/max_by_test')
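
The heatmap values can be read as follows: for each dataset, count how often each test produced the maximum, then divide by the total number of maxima in that dataset so that each row sums to 100. The sketch below walks through this on a made-up normalized frame. The frame `toy` is hypothetical, and the use of `np.isclose` is an assumption chosen here to avoid exact float comparison.

```python
import numpy as np
import pandas as pd

# Made-up normalized data: mvc is a percentage of each muscle's maximum
toy = pd.DataFrame({
    'dataset': [0, 0, 0, 0, 1, 1],
    'muscle':  [0, 0, 1, 1, 0, 0],
    'test':    [0, 1, 0, 1, 0, 1],
    'mvc':     [100.0, 60.0, 80.0, 100.0, 40.0, 100.0],
})

# Keep only the rows where the test reached the maximum
is_max = np.isclose(toy['mvc'], 100)
counts = toy[is_max].pivot_table(
    values='muscle', index='dataset', columns='test',
    aggfunc='count', fill_value=0)

# Row-wise percentage: share of maxima obtained with each test
percent = counts.div(counts.sum(axis=1), axis=0) * 100
print(percent)
```

In dataset 0 each test produced one of the two maxima (50% each), while in dataset 1 the only maximum came from test 1.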

## 4.5. Distribution

In [15]:
def plot_mvc_distribution(d):
    for imuscle in d['muscle'].unique():
        pass  # TODO: per-muscle distribution plot (body missing in the original)

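|    | dataset | muscle | mvc | participant | test |
|----|---------|--------|------------|-------------|------|
| 2  | 0 | 0 | 78.803940  | 0 | 2  |
| 3  | 0 | 0 | 76.603025  | 0 | 3  |
| 4  | 0 | 0 | 90.580569  | 0 | 4  |
| 5  | 0 | 0 | 25.635654  | 0 | 5  |
| 8  | 0 | 0 | 100.000000 | 0 | 8  |
| 9  | 0 | 0 | 10.918734  | 0 | 9  |
| 10 | 0 | 0 | 8.858734   | 0 | 10 |
| 15 | 0 | 0 | 22.758820  | 0 | 15 |
| 34 | 0 | 2 | 60.796379  | 0 | 2  |
| 35 | 0 | 2 | 99.683129  | 0 | 3  |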
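
Since the distribution function is unfinished, one possible first step is a numeric summary of the normalized MVC per muscle before any plotting. This is only a sketch under assumptions: the frame `toy` holds made-up values, and `groupby(...).describe()` is a stand-in chosen here, not the notebook's intended method.

```python
import pandas as pd

# Made-up normalized values for two muscles
toy = pd.DataFrame({
    'muscle': [0, 0, 0, 2, 2, 2],
    'mvc':    [78.8, 100.0, 25.6, 60.8, 99.7, 100.0],
})

# Count, min, quartiles and max of the normalized MVC for each muscle
summary = toy.groupby('muscle')['mvc'].describe()
print(summary[['count', 'min', '50%', 'max']])
```

The same grouping could later feed one trace per muscle in a plot.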


## Summary