Improving feature names #42

Merged
merged 6 commits into from Sep 28, 2018

Conversation

jbschiratti
Collaborator

This PR aims to provide more meaningful feature names when extract_features is called with return_df=True.

…returns a dataframe with a meaningful multiindex).
elif _params['ratios'] == 'only':
    return ratios_names
else:
    return pow_names + ratios_names
Member

This is not really clean. These gymnastics should be handled by the pow_freq_bands function itself. Logic specific to one custom function has leaked into a class that is supposed to be function-agnostic. You should attach this logic to the pow_freq_bands callable.

Collaborator Author

I agree, this is not clean. This first commit was just to allow @l-omar-chehab to move on (and have meaningful feature names for compute_pow_freq_bands). Now we can think about making it nice.

@jbschiratti
Collaborator Author

Here is one idea on how the feature names could be "attached" to the feature functions.

If called with ratios=None, the feature function compute_pow_freq_bands will return an array with shape (n_channels * n_freq_bands,). In this case, possible [meaningful] feature names could be ch0_band0, ch0_band1, ch0_band2..., ch1_band0,... To "attach" this information to compute_pow_freq_bands we could use a decorator:

@with_feature_names('ch[0]_band[1]')
def compute_pow_freq_bands(sfreq, data, ...):
    ...

That way, in FeatureFunctionTransformer, self.func.feat_names_pattern gives 'ch[0]_band[1]'. Then, in the get_feature_names method of FeatureFunctionTransformer, we could have something like this:

pattern = self.func.feat_names_pattern  # 'ch[0]_band[1]'
feature_names = self._get_feature_names_helper(pattern, self.out_shape)
return feature_names

where out_shape corresponds to X_out.shape with X_out, the array returned by the feature function. For this to work, feature functions will need to return multidimensional ndarrays. This is not a big issue since the transform method of FeatureFunctionTransformer could be changed to return X_out.ravel() instead of X_out.

In the code above, _get_feature_names_helper would transform 'ch[0]_band[1]' into the list ['ch0_band0', 'ch0_band1', ..., 'ch1_band0', 'ch1_band1', ...] using the following rationale: [0] in the input string means iterating from 0 to out_shape[0] - 1, and [1] means iterating from 0 to out_shape[1] - 1. This way, 'ch[0]_band[1]' would be equivalent to:

['ch%s_band%s' % (i, j) for i in range(out_shape[0]) for j in range(out_shape[1])]

... you get the idea!

If a feature function does not have a feat_names_pattern attribute, nothing changes.
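
To make the idea concrete, here is a minimal sketch of how the decorator and the helper could fit together. Neither exists in the code base yet: with_feature_names, get_feature_names_helper and toy_pow_freq_bands are hypothetical names used only for illustration, and the toy function returns zeros because only the output shape matters here.

```python
import itertools
import re

import numpy as np


def with_feature_names(pattern):
    """Hypothetical decorator: store a name pattern on the feature function."""
    def decorator(func):
        func.feat_names_pattern = pattern
        return func
    return decorator


def get_feature_names_helper(pattern, out_shape):
    """Expand a pattern such as 'ch[0]_band[1]' over every index of out_shape."""
    # '[k]' becomes the positional format field '{k}'.
    template = re.sub(r'\[(\d+)\]', r'{\1}', pattern)
    # itertools.product varies the last axis fastest, matching ndarray.ravel().
    index_ranges = [range(n) for n in out_shape]
    return [template.format(*idx) for idx in itertools.product(*index_ranges)]


@with_feature_names('ch[0]_band[1]')
def toy_pow_freq_bands(data, n_bands=2):
    # Placeholder computation (assumption): a real function would compute power.
    return np.zeros((data.shape[0], n_bands))


X_out = toy_pow_freq_bands(np.zeros((2, 20)))
names = get_feature_names_helper(toy_pow_freq_bands.feat_names_pattern,
                                 X_out.shape)
print(names)
```

Because the names are generated in the same order as X_out.ravel(), the flattened feature vector and the list of names stay aligned.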

@agramfort
Member

I would do it like this:

import numpy as np


def pow_freq_bands(X, bands):
    return np.random.randn(X.shape[0], len(bands))


def _pow_freq_bands_feature_names(X, bands):
    return ['ch%s_band%s' % (i, j) for i in range(X.shape[0]) for j in range(len(bands))]


pow_freq_bands.get_feature_names = _pow_freq_bands_feature_names

if __name__ == '__main__':
    func = pow_freq_bands
    X = np.random.randn(2, 20)
    bands = [(8, 12), (18, 22)]
    if hasattr(func, 'get_feature_names'):
        feature_names = func.get_feature_names(X, bands)

    print(feature_names)

by adding an attribute to the function. func.get_feature_names should expect the same parameters as func to make things easy.

clear?

`_compute_pow_freq_bands_feat_names` in univariate.py + changes in
feature_extraction.py.
* Minor changes in tests and examples to get rid of several warnings
(issue mne-tools#44).
@codecov

codecov bot commented Sep 20, 2018

Codecov Report

Merging #42 into master will increase coverage by 0.38%.
The diff coverage is 95.29%.

@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
+ Coverage   92.74%   93.12%   +0.38%     
==========================================
  Files          10       10              
  Lines        1089     1164      +75     
==========================================
+ Hits         1010     1084      +74     
- Misses         79       80       +1
| Impacted Files | Coverage Δ |
|---|---|
| mne_features/tests/test_feature_extraction.py | 88.67% <100%> (ø) ⬆️ |
| mne_features/univariate.py | 97.26% <100%> (+0.48%) ⬆️ |
| mne_features/utils.py | 91.78% <83.33%> (-0.76%) ⬇️ |
| mne_features/feature_extraction.py | 95.27% <91.66%> (+1.01%) ⬆️ |
| mne_features/tests/test_univariate.py | 84.54% <94.11%> (+1.74%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 461a8a8...8748c87.

_feature_func = _get_python_func(self.func)
_params = self.get_params()
if hasattr(_feature_func, 'get_feature_names'):
    self.feature_names = _feature_func.get_feature_names(X, **_params)
Member

I don't like that calling transform affects the state of the object. Only fit is allowed to do this. Can you see a way out? Also, any attribute that is data-dependent should end with _.

Collaborator Author

I see. Then, what about

    def fit(self, X, y=None):
        """Fit the FeatureFunctionTransformer (does not extract features).
        
        Parameters
        ----------
        X : ndarray, shape (n_channels, n_times)
        
        y : ignored
        
        Returns
        -------
        self
        """
        self._check_input(X)
        _feature_func = _get_python_func(self.func)
        _params = self.get_params()
        if hasattr(_feature_func, 'get_feature_names'):
            self.feature_names_ = _feature_func.get_feature_names(X, **_params)
        return self

in FeatureFunctionTransformer?

Member

Yes, this is OK in terms of API, but _params = self.get_params() could be in the if block.
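
A minimal sketch of that arrangement, using a stand-in class rather than the real FeatureFunctionTransformer. ToyTransformer, toy_feature and the simplified get_params are assumptions made for illustration only. The parameters are fetched only when the feature function exposes get_feature_names, the data-dependent attribute ends with an underscore, and transform leaves the object's state untouched.

```python
import numpy as np


def toy_feature(X, scale=1.0):
    # Hypothetical feature function: one value per channel.
    return scale * X.mean(axis=-1)


def _toy_feature_names(X, scale=1.0):
    # Same signature as toy_feature, so it can be called with the same params.
    return ['ch%s' % i for i in range(X.shape[0])]


toy_feature.get_feature_names = _toy_feature_names


class ToyTransformer:
    """Stand-in for a function transformer (not the mne-features class)."""

    def __init__(self, func, **params):
        self.func = func
        self.params = params

    def get_params(self):
        return dict(self.params)

    def fit(self, X, y=None):
        # State is only modified in fit; params are fetched only when needed.
        if hasattr(self.func, 'get_feature_names'):
            _params = self.get_params()
            self.feature_names_ = self.func.get_feature_names(X, **_params)
        return self

    def transform(self, X):
        # No attribute is set here, so transform does not change the object.
        return self.func(X, **self.get_params())


X = np.ones((3, 10))
tr = ToyTransformer(toy_feature, scale=2.0).fit(X)
state_before = dict(vars(tr))
out = tr.transform(X)
print(tr.feature_names_, out)
```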

; only used to get feature names when this is possible).
@jbschiratti
Collaborator Author

@agramfort If you're OK and Travis is OK, it's good to go!

@agramfort
Member

There is no test to check that the feature names are correct. Please add one. Thanks.

@jbschiratti
Collaborator Author

Test added!

fb = np.array([[4., 8.], [30., 70.]])
ratios_col_names = ['ch0_0_1', 'ch0_1_0', 'ch1_0_1', 'ch1_1_0',
                    'ch2_0_1', 'ch2_1_0']
pow_col_names = ['ch0_0', 'ch0_1', 'ch1_0', 'ch1_1', 'ch2_0', 'ch2_1']
Member

Is there no way to have more explicit names like alpha, beta, etc.?

Collaborator Author

We could improve that but, at some point, the user would need to name the frequency bands they wish to use. What about allowing the freq_bands parameter in compute_pow_freq_bands to be a dict like the one below?

freq_bands = {'delta': [0.5, 4], 
              'theta': [4, 8], 
              'alpha': [8, 13], 
              'beta': [13, 30], 
              'low-gamma': [30, 70], 
              'high-gamma': [70, 100]}
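
As a rough sketch of what this could give for the names (band_feature_names is a hypothetical helper, and the exact naming scheme 'ch<i>_<band>' is an assumption, not something settled in this PR): when freq_bands is a dict, its keys label the bands; otherwise the band index is used, which reproduces the current names.

```python
import numpy as np


def band_feature_names(n_channels, freq_bands):
    """Hypothetical helper: build 'ch<i>_<band>' names for power features."""
    if isinstance(freq_bands, dict):
        # Dicts preserve insertion order (Python 3.7+), so keys follow the
        # order in which the user declared the bands.
        labels = list(freq_bands.keys())
    else:
        # Fall back to the band index when the bands are unnamed.
        labels = [str(j) for j in range(len(freq_bands))]
    return ['ch%s_%s' % (i, label)
            for i in range(n_channels) for label in labels]


named = band_feature_names(2, {'alpha': [8, 13], 'beta': [13, 30]})
unnamed = band_feature_names(3, np.array([[4., 8.], [30., 70.]]))
print(named)
print(unnamed)
```

With an array of band edges, the second call yields the same index-based names as pow_col_names in the test above.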

@agramfort
Member

agramfort commented Sep 27, 2018 via email

`compute_energy_freq_bands`) to be a dict with band names as keys. Added
feature names for `compute_energy_freq_bands` + updated tests.
@jbschiratti
Collaborator Author

jbschiratti commented Sep 28, 2018

The last commit improves the feature names when freq_bands is a dict such as:

freq_bands = {'delta': [0.5, 4], 
              'theta': [4, 8], 
              'alpha': [8, 13], 
              'beta': [13, 30], 
              'low-gamma': [30, 70], 
              'high-gamma': [70, 100]}

@agramfort agramfort merged commit 911d868 into mne-tools:master Sep 28, 2018
@agramfort
Member

Thanks

@jbschiratti jbschiratti deleted the feature_names branch September 28, 2018 17:06