Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data dev #16

Merged
merged 54 commits into from
Nov 2, 2019
Merged

Data dev #16

merged 54 commits into from
Nov 2, 2019

Conversation

jdiedrichsen
Copy link
Contributor

Data set class definition -
I know that we talked about having the data set only containing a 2d-array
However, considering how we designed the RDMs class, I think it would be consistent to allow also 3d-arrays - with the variable n_set containign the number of data sets
This makes calculations on data sets with the same observation structure less awkward and generalizes nices.
obs_descriptors and channel_descriptors are forced to be the same over all sets - otherwise you need to make a new Dataset object

We may want an open discussion before this is merged into master

Indicator module in util
This is a collection of useful indicator matrices for coding linear models, pairwise contrast, etc.

Data set class creator
util.indicator is a module with 3 different functions that produce useful indicator matrices
Fixed importing of rsa-subpackages
Copy link
Member

@doerlbh doerlbh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that having a 3d Dataset can make it more flexible. I will integrate my updates in your initialized version and pushed it tonight.

@doerlbh doerlbh requested a review from nikokri October 15, 2019 03:20
@doerlbh
Copy link
Member

doerlbh commented Oct 15, 2019

Hi guys, I also added the options where people can implement their own preprocessing function for arbitrary data format (as option args rawdata and preprocess) - they can be found useful in files neurodataset.py and exampledataset.py where submodules can be implemented there.

@doerlbh
Copy link
Member

doerlbh commented Oct 15, 2019

@jdiedrichsen Regarding the 3d-arrays, with the variable n_set containing the number of data sets, I now have a different opinion:

I like the additional flexibility of mulitiple sets of datasets. However, I believe this might create much additional burdens to the data structures of descriptor, obs_descriptor, and channel_descriptor - are they also need to have an additional dimension for the n_set?

An alternative would be, as we also briefly chatted on Oct 7, is to have an additional class, called DatasetList or DatasetSequence, where the Dataset object are collated together in a list. This also enables additional integration of my part on representational dynamics analysis, where RDM movies will be required to create from sliding window data, in this case, a DatasetList/DatasetSequence.

What do you think?

Other than this, I believe the pull request is still valid, since the 3d/2d discussion doesn't interact with other modules in this current commits (which should have already accommodated for all 2d inputs, with 3d inputs as an option).

@doerlbh
Copy link
Member

doerlbh commented Oct 15, 2019

All four methods have been implemented for Dataset class, accommodating 2d measurements as input:

split_obs(by=’descriptor’) returns list of Datasets
split_channel(by=’descriptor’) returns list of Datasets
subset_obs(by=‘descriptor’,value=’value’) returns Dataset
subset_channel(by=‘descriptor’,value=’value’) returns Dataset

@JasperVanDenBosch
Copy link
Contributor

JasperVanDenBosch commented Oct 15, 2019

An alternative would be, as we also briefly chatted on Oct 7, is to have an additional class, called DatasetList or DatasetSequence, where the Dataset object are collated together in a list.

Agreed

@JasperVanDenBosch
Copy link
Contributor

Can either of you have a look at the style issues on codeclimate? See the details link below next to CodeClimate. Many of them are just about whitespace. If you have a linter extension in your IDE it will mark these for you with squiggly lines, or alternatively you can automatically fix these with something like autopep8. Some other issues are repetitions of code, which can be fixed with a helper method.

@JasperVanDenBosch
Copy link
Contributor

Also the unit tests are currently failing, see Travis:

  File "/home/travis/build/rsagroup/pyrsa/pyrsa/data/dataset.py", line 40, in DatasetBase
    def split_obs(self, by=descriptor):
NameError: name 'descriptor' is not defined

@HeikoSchuett
Copy link
Contributor

I fixed the test problem in the rdm-dev branch. This should give an idea how to fix it here: essentially remove discriminator from by= discriminator as discriminator is not a defined variable

@HeikoSchuett HeikoSchuett added this to the 3.0.1 milestone Oct 15, 2019
@doerlbh
Copy link
Member

doerlbh commented Oct 15, 2019

Thanks for the pointers on the unit tests and style issues! Now the unit test issues are all fixed and committed. And will work on the style in pyrsa.data later this week as well.

@doerlbh
Copy link
Member

doerlbh commented Nov 1, 2019

Implementations and unit tests for the class with four main functions are all now complete and passing all criteria. Helpful remarks and requested changes from @ilogue and @HeikoSchuett are all fixed in the current version. Please review the code and let me know if anything else need to be changed, and approve if you think it's ready. Thanks!

Copy link
Contributor

@JasperVanDenBosch JasperVanDenBosch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with this!

@JasperVanDenBosch JasperVanDenBosch merged commit f0e86d7 into master Nov 2, 2019
@JasperVanDenBosch JasperVanDenBosch deleted the data-dev branch November 2, 2019 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants