New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement dataset as a protocol #220
Conversation
Overall, I think it's confusing that Again, I'm not a maintainer so don't listen to me unless the maintainer chimes in. :) |
+1 to Jason's comments, esp. regarding rows and col-names in diff. ns and the P. |
Thank you very much for the feedback. Please let me explain my motivations for the current implementation. I don't see much guidance online for protocol naming conventions, but note the conventions in clojure.lang for interfaces. Interfaces that represent object types (eg, sequence) are prefixed with an "I" (eg, ISeq), and interfaces that represent mixins (eg, Seqable) do not receive a prefix. Likewise, abstract base classes are prefixed with an "A". In both cases, the prefix indicates that the type is not instantiable, so I feel we should continue the convention. I chose "P" to show it is a "P"rotocol, as opposed to an "I"nterface or "A"bstract class, and in line with core.matrix. I did not add a "rows" function to core because I see any use of the "rows" function as a code smell and abstraction violation. Datasets should be accessed through the API in core, not directly through the dataset module. This is because using protocols allows us to add new implementations, and for other implementation "rows" may be very inefficient (eg, getting a specific column from a column-major dataset). My next step, after this pull request is accepted, is to add a variety of accessor functions to minimize the need to call "rows". |
Thank you for the thorough justification! |
Sorry to nag, but were there any objections? Otherwise, can we merge this? Thanks. |
Bump. If this change is not acceptable, please give some indication why. |
I'd be willing to switch to (and help maintain?) a fork with this patch. On Mon, Apr 7, 2014 at 9:55 AM, Matthew Farrell notifications@github.comwrote:
|
No thanks, I'm not really interested in maintaining a fork. |
Hi all, I think this change may be superseded and/or conflict with the new core.matrix based Dataset changes in 1.9.0 SNAPSHOT branch (develop) |
Replaces the old dataset implementation using defrecord with a new implementation based on protocols and deftype.
Currently, this is a minimal implementation to reproduce the old functionality.
Several unit tests required updates to use the col-names function, rather than reading the :column-names field directly.
Discussion from a few months ago:
https://groups.google.com/forum/#!topic/incanter/OaBY1gSJgw8