-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data dev #16
Data dev #16
Conversation
Data set class creator util.indicator is a module with 3 different functions that produce useful indicator matrices
Fixed importing of rsa-subpackages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you that having a 3d Dataset can make it more flexible. I will integrate my updates in your initialized version and pushed it tonight.
Hi guys, I also added the options where people can implement their own preprocessing function for arbitrary data format (as option args rawdata and preprocess) - they can be found useful in files neurodataset.py and exampledataset.py where submodules can be implemented there. |
@jdiedrichsen Regarding the 3d-arrays, with the variable n_set containing the number of data sets, I now have a different opinion: I like the additional flexibility of mulitiple sets of datasets. However, I believe this might create much additional burdens to the data structures of descriptor, obs_descriptor, and channel_descriptor - are they also need to have an additional dimension for the n_set? An alternative would be, as we also briefly chatted on Oct 7, is to have an additional class, called DatasetList or DatasetSequence, where the Dataset object are collated together in a list. This also enables additional integration of my part on representational dynamics analysis, where RDM movies will be required to create from sliding window data, in this case, a DatasetList/DatasetSequence. What do you think? Other than this, I believe the pull request is still valid, since the 3d/2d discussion doesn't interact with other modules in this current commits (which should have already accommodated for all 2d inputs, with 3d inputs as an option). |
All four methods have been implemented for Dataset class, accommodating 2d measurements as input: split_obs(by=’descriptor’) returns list of Datasets |
Agreed |
Can either of you have a look at the style issues on codeclimate? See the details link below next to CodeClimate. Many of them are just about whitespace. If you have a linter extension in your IDE it will mark these for you with squiggly lines, or alternatively you can automatically fix these with something like |
Also the unit tests are currently failing, see Travis:
|
I fixed the test problem in the rdm-dev branch. This should give an idea how to fix it here: essentially remove discriminator from by= discriminator as discriminator is not a defined variable |
Thanks for the pointers on the unit tests and style issues! Now the unit test issues are all fixed and committed. And will work on the style in pyrsa.data later this week as well. |
Implementations and unit tests for the class with four main functions are all now complete and passing all criteria. Helpful remarks and requested changes from @ilogue and @HeikoSchuett are all fixed in the current version. Please review the code and let me know if anything else need to be changed, and approve if you think it's ready. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with this!
Data set class definition -
I know that we talked about having the data set only containing a 2d-array
However, considering how we designed the RDMs class, I think it would be consistent to allow also 3d-arrays - with the variable n_set containign the number of data sets
This makes calculations on data sets with the same observation structure less awkward and generalizes nices.
obs_descriptors and channel_descriptors are forced to be the same over all sets - otherwise you need to make a new Dataset object
We may want an open discussion before this is merged into master
Indicator module in util
This is a collection of useful indicator matrices for coding linear models, pairwise contrast, etc.