Data dev #16

jdiedrichsen · 2019-10-15T02:06:07Z

Data set class definition -
I know that we talked about having the data set contain only a 2d-array.
However, considering how we designed the RDMs class, I think it would be consistent to also allow 3d-arrays, with the variable n_set containing the number of data sets.
This makes calculations on data sets with the same observation structure less awkward and generalizes nicely.
obs_descriptors and channel_descriptors are forced to be the same over all sets; otherwise you need to make a new Dataset object.
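
A minimal sketch of the 3d layout described here, assuming measurements are stored as (n_set, n_obs, n_channel) and that descriptors are plain dicts. The class body and the `_check` helper are hypothetical illustrations, not the code in this pull request.

```python
import numpy as np


class Dataset:
    """Hypothetical sketch: measurements stored as (n_set, n_obs, n_channel)."""

    def __init__(self, measurements, obs_descriptors=None,
                 channel_descriptors=None):
        measurements = np.asarray(measurements, dtype=float)
        if measurements.ndim == 2:
            # A single data set: add a leading axis so that n_set == 1.
            measurements = measurements[np.newaxis, :, :]
        if measurements.ndim != 3:
            raise ValueError("measurements must be a 2d or 3d array")
        self.measurements = measurements
        self.n_set, self.n_obs, self.n_channel = measurements.shape
        # Descriptors are shared by all sets, so they are checked against
        # n_obs and n_channel only, never against n_set.
        self.obs_descriptors = self._check(obs_descriptors, self.n_obs)
        self.channel_descriptors = self._check(channel_descriptors,
                                               self.n_channel)

    @staticmethod
    def _check(descriptors, expected_length):
        """Copy the descriptor dict and verify the length of every entry."""
        descriptors = dict(descriptors or {})
        for name, values in descriptors.items():
            if len(values) != expected_length:
                raise ValueError(
                    f"descriptor '{name}' has length {len(values)}, "
                    f"expected {expected_length}")
        return descriptors


data = Dataset(
    np.random.default_rng(0).normal(size=(4, 6, 10)),
    obs_descriptors={"condition": [0, 0, 1, 1, 2, 2]},
)
# One operation covers all sets at once: mean pattern per set.
set_means = data.measurements.mean(axis=1)
```

With this layout a 2d input is simply the n_set == 1 case, which is why the same code path can serve both.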

We may want an open discussion before this is merged into master

Indicator module in util
This is a collection of useful indicator matrices for coding linear models, pairwise contrasts, etc.

Data set class creator util.indicator is a module with 3 different functions that produce useful indicator matrices
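
To illustrate what such indicator matrices can look like, here is a small NumPy sketch. The thread does not name the three functions in util.indicator, so `identity_indicator` and `pairwise_contrast` are hypothetical names chosen for this example only.

```python
import numpy as np


def identity_indicator(index_vector):
    """One column per unique value; each row marks the value of that entry."""
    index_vector = np.asarray(index_vector)
    categories = np.unique(index_vector)
    return (index_vector[:, np.newaxis] == categories[np.newaxis, :]).astype(float)


def pairwise_contrast(index_vector):
    """One row per pair of unique values: +1 for the first, -1 for the second."""
    index_vector = np.asarray(index_vector)
    categories = np.unique(index_vector)
    rows = []
    for i in range(categories.size):
        for j in range(i + 1, categories.size):
            row = np.zeros(index_vector.size)
            row[index_vector == categories[i]] = 1.0
            row[index_vector == categories[j]] = -1.0
            rows.append(row)
    return np.array(rows)


conditions = np.array([1, 1, 2, 3])
design = identity_indicator(conditions)    # 4 entries x 3 categories
contrasts = pairwise_contrast(conditions)  # 3 pairs x 4 entries
```

The identity matrix can serve as a design matrix for a linear model with one regressor per condition, while each contrast row compares two conditions.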

Fixed importing of rsa-subpackages

doerlbh

I agree with you that a 3d Dataset can make it more flexible. I will integrate my updates into your initialized version and push it tonight.


doerlbh · 2019-10-15T03:22:28Z

Hi guys, I also added options that let people supply their own preprocessing function for arbitrary data formats (as the optional args rawdata and preprocess). They are used in neurodataset.py and exampledataset.py, where submodules can be implemented.
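
A rough sketch of how the rawdata and preprocess options could work, assuming preprocess is a callable that converts rawdata into a 2d array. Only the two argument names come from this comment; the constructor logic and the `rows_from_text` helper are hypothetical.

```python
import numpy as np


class Dataset:
    """Hypothetical sketch of the optional rawdata / preprocess arguments."""

    def __init__(self, measurements=None, rawdata=None, preprocess=None):
        if measurements is None:
            if rawdata is None or preprocess is None:
                raise ValueError(
                    "pass measurements, or both rawdata and preprocess")
            # The user-supplied callable turns any raw format into a 2d array.
            measurements = preprocess(rawdata)
        self.measurements = np.asarray(measurements, dtype=float)
        if self.measurements.ndim != 2:
            raise ValueError("measurements must be n_obs x n_channel")


def rows_from_text(text):
    """Example preprocess function: one observation per line of numbers."""
    return [[float(x) for x in line.split()]
            for line in text.strip().splitlines()]


data = Dataset(rawdata="1 2 3\n4 5 6", preprocess=rows_from_text)
```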


doerlbh · 2019-10-15T04:14:11Z

@jdiedrichsen Regarding the 3d-arrays, with the variable n_set containing the number of data sets, I now have a different opinion:

I like the additional flexibility of multiple sets of data. However, I believe this might place a considerable extra burden on the data structures of descriptor, obs_descriptor, and channel_descriptor: would they also need an additional dimension for n_set?

An alternative, as we briefly discussed on Oct 7, would be to have an additional class, called DatasetList or DatasetSequence, where the Dataset objects are collated together in a list. This would also make it easier to integrate my part on representational dynamics analysis, where RDM movies need to be created from sliding-window data, in this case a DatasetList/DatasetSequence.
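
A minimal sketch of the list-based alternative, assuming each Dataset holds one 2d array and that a sliding window averages over time. The `Dataset` stand-in, the `from_sliding_window` method, and its `window` and `step` parameters are hypothetical; only the name DatasetList comes from this comment.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Dataset:
    """Hypothetical stand-in: one 2d array (n_obs x n_channel) plus descriptors."""
    measurements: np.ndarray
    descriptors: dict = field(default_factory=dict)


class DatasetList(list):
    """Hypothetical container: a plain list whose items are Dataset objects."""

    @classmethod
    def from_sliding_window(cls, timeseries, window, step=1):
        """Average a (n_obs, n_channel, n_time) array within each time window."""
        n_time = timeseries.shape[2]
        datasets = cls()
        for start in range(0, n_time - window + 1, step):
            stop = start + window
            datasets.append(Dataset(
                measurements=timeseries[:, :, start:stop].mean(axis=2),
                descriptors={"window": (start, stop)},
            ))
        return datasets


timeseries = np.random.default_rng(0).normal(size=(5, 8, 20))
movie_input = DatasetList.from_sliding_window(timeseries, window=4, step=2)
```

Because every element carries its own descriptors, no extra n_set dimension is needed on the descriptor structures.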

What do you think?

Other than this, I believe the pull request is still valid, since the 3d/2d discussion doesn't affect the other modules in the current commits (which should already accommodate all 2d inputs, with 3d inputs as an option).

doerlbh · 2019-10-15T04:30:15Z

All four methods have been implemented for the Dataset class, accommodating 2d measurements as input:

split_obs(by='descriptor') returns list of Datasets
split_channel(by='descriptor') returns list of Datasets
subset_obs(by='descriptor', value='value') returns Dataset
subset_channel(by='descriptor', value='value') returns Dataset
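
The observation-wise pair of these methods can be sketched as follows, assuming descriptors are stored as a dict of arrays with one entry per observation. The method names and arguments follow the list above; the bodies are a hypothetical illustration, and the channel-wise versions would select columns instead of rows.

```python
import numpy as np


class Dataset:
    """Hypothetical 2d Dataset: measurements are (n_obs, n_channel)."""

    def __init__(self, measurements, obs_descriptors):
        self.measurements = np.asarray(measurements)
        self.obs_descriptors = {
            name: np.asarray(values)
            for name, values in obs_descriptors.items()
        }

    def subset_obs(self, by, value):
        """Return a Dataset with the observations where `by` equals `value`."""
        selection = self.obs_descriptors[by] == value
        descriptors = {
            name: values[selection]
            for name, values in self.obs_descriptors.items()
        }
        return Dataset(self.measurements[selection, :], descriptors)

    def split_obs(self, by):
        """Return one Dataset per unique value of the descriptor `by`."""
        # dict.fromkeys keeps first-appearance order; np.unique would sort.
        unique_values = dict.fromkeys(self.obs_descriptors[by].tolist())
        return [self.subset_obs(by, value) for value in unique_values]


data = Dataset(np.arange(12).reshape(4, 3),
               {"session": ["b", "a", "b", "a"]})
parts = data.split_obs(by="session")
```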

JasperVanDenBosch · 2019-10-15T11:11:13Z

> An alternative would be, as we also briefly chatted on Oct 7, is to have an additional class, called DatasetList or DatasetSequence, where the Dataset object are collated together in a list.

Agreed

JasperVanDenBosch · 2019-10-15T11:15:35Z

Can either of you have a look at the style issues on codeclimate? See the details link below next to CodeClimate. Many of them are just about whitespace. If you have a linter extension in your IDE it will mark these for you with squiggly lines, or alternatively you can automatically fix these with something like autopep8. Some other issues are repetitions of code, which can be fixed with a helper method.

JasperVanDenBosch · 2019-10-15T11:24:05Z

Also the unit tests are currently failing, see Travis:

  File "/home/travis/build/rsagroup/pyrsa/pyrsa/data/dataset.py", line 40, in DatasetBase
    def split_obs(self, by=descriptor):
NameError: name 'descriptor' is not defined

HeikoSchuett · 2019-10-15T14:38:10Z

I fixed the test problem in the rdm-dev branch. This should give an idea of how to fix it here: essentially, remove descriptor from by=descriptor, as descriptor is not a defined variable.
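
For context, Python evaluates default argument values when the `def` statement runs, which is why the class body fails before any test executes. The sketch below reproduces the error in isolation and shows one way to avoid it by making `by` a required argument; the placeholder method body and the `define_broken_class` helper are hypothetical.

```python
class DatasetBase:
    # Broken form from the traceback: the default value is evaluated when the
    # `def` statement runs, and no variable called `descriptor` exists then.
    #     def split_obs(self, by=descriptor): ...

    def split_obs(self, by):
        """Placeholder body: `by` is now a required descriptor name."""
        raise NotImplementedError


def define_broken_class():
    """Reproduce the error in isolation and return its message."""
    try:
        class Broken:
            def split_obs(self, by=descriptor):  # noqa: F821
                pass
    except NameError as error:
        return str(error)
    return None


message = define_broken_class()
```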

doerlbh · 2019-10-15T17:00:37Z

Thanks for the pointers on the unit tests and style issues! The unit test issues are now all fixed and committed. I will also work on the style in pyrsa.data later this week.


doerlbh · 2019-11-01T21:07:27Z

The implementation and unit tests for the class with its four main functions are now complete and pass all checks. The remarks and requested changes from @ilogue and @HeikoSchuett have all been addressed in the current version. Please review the code, let me know if anything else needs to be changed, and approve if you think it's ready. Thanks!

JasperVanDenBosch

Happy with this!

jdiedrichsen added 2 commits October 14, 2019 21:46

Dataset class and util.indicator

14d9ef7

Data set class creator util.indicator is a module with 3 different functions that produce useful indicator matrices

Fixes to test_data and test_indicator

c071694

Fixed importing of rsa-subpackages

jdiedrichsen requested review from iancharest, doerlbh and HeikoSchuett October 15, 2019 02:06

doerlbh approved these changes Oct 15, 2019


doerlbh added 3 commits October 14, 2019 22:26

[FEAT] Added Dataset class

5a0f51c

[FEAT] Added Dataset class

d96cce9

[FEAT] Added options for neural or other user-defined data inputs for Dataset classes

bfaa8ee

doerlbh requested a review from JasperVanDenBosch October 15, 2019 02:53

[DBG] fixed descriptions in Dataset methods

8569705

doerlbh requested a review from nikokri October 15, 2019 03:20

doerlbh added 2 commits October 14, 2019 23:58

[FEAT] Implemented split_channel and split_obs for 2d Dataset

044e5df

[DBG] fixed split methods and extend subset methods to include descriptor as inputs

4099692

[FEAT] Implemented subset functions for Dataset class

e010e15

[DBG] fixed neurodataset and exampledataset for extended Dataset class

2373b32

HeikoSchuett added this to the 3.0.1 milestone Oct 15, 2019

[DBG] fixed unit test issue on method arg for Dataset class

3e56ba8

doerlbh added 3 commits October 19, 2019 02:26

[DBG] style fix on neurodata

3b70d73

[DBG] style fix on data-dev

249c210

[DBG] more style fix in data-dev

70cd673

doerlbh added 15 commits November 1, 2019 15:14

[DBG] remove n_set from all formulation for consistency

3c4a38f

[DBG] fix test_data, clean styles

baed3c1

[FEAT] implemented check_dict_length, fix test_data to be array-based

d4779d3

[FEAT] implemented extract_dict, fix test_data

8db2449

[DBG] fix util import and test_data

251bd26

[DBG] more fixes on util import

8c26882

[DBG] fix data_utils

52bf954

[FEAT] add test for split_channel

b27d2af

[DBG] fix deep copy issue in extract_dict

9e66f9d

[FEAT] provide get_unique_unsorted to improve split_channel string handling

56bfff6

[DBG] fix minor typo

5b8c7ab

[DBG] fix test multi-element comparison issue

ee5bba9

[FEAT] Added test for subset_obs

307d314

[FEAT] Added test for subset_channel

734597c

[DBG] minor typo

cc3c513

doerlbh mentioned this pull request Nov 1, 2019

Implement DatasetList options (including sliding window operations) #24

Closed

doerlbh added 4 commits November 1, 2019 16:47

[DBG] style fixes

3d01963

[DBG] more style fixes

0f8c0c4

[DBG] more style fixes

5c884af

[DBG] minor bug fix

fdcf068

doerlbh requested review from JasperVanDenBosch and HeikoSchuett November 1, 2019 21:07

doerlbh added 3 commits November 1, 2019 17:24

[DBG] simplify code as suggested

3cf9b73

[DBG] minor bug fixes

0e7998c

[DBG] minor style fix

5049128

JasperVanDenBosch approved these changes Nov 2, 2019


JasperVanDenBosch merged commit f0e86d7 into master Nov 2, 2019

JasperVanDenBosch deleted the data-dev branch November 2, 2019 15:49
