Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dataset as a protocol #220

Closed
wants to merge 6 commits into from
Closed

Implement dataset as a protocol #220

wants to merge 6 commits into from

Conversation

farrellm
Copy link
Contributor

Replaces the old dataset implementation using defrecord with a new implementation based on protocols and deftype.
Currently, this is a minimal implementation to reproduce the old functionality.

Several unit tests required updates to use the col-names function, rather than reading the :column-names field directly.

Discussion from a few months ago:
https://groups.google.com/forum/#!topic/incanter/OaBY1gSJgw8

@eraserhd
Copy link

eraserhd commented Jan 7, 2014

Overall, I think it's confusing that col-names is in core while rows is in dataset. It seems like some things from core should be moved there, including dataset itself. (Though it's a breaking change and would require a minor version bump).

Again, I'm not a maintainer so don't listen to me unless the maintainer chimes in. :)

@holyjak
Copy link
Contributor

holyjak commented Jan 9, 2014

+1 to Jason's comments, esp. regarding rows and col-names in diff. ns and the P.

@farrellm
Copy link
Contributor Author

farrellm commented Jan 9, 2014

Thank you very much for the feedback. Please let me explain my motivations for the current implementation.

I don't see much guidance online for protocol naming conventions, but note the conventions in clojure.lang for interfaces. Interfaces that represent object types (eg, sequence) are prefixed with an "I" (eg, ISeq), and interfaces that represent mixins (eg, Seqable) do not receive a prefix. Likewise, abstract base classes are prefixed with an "A". In both cases, the prefix indicates that the type is not instantiable, so I feel we should continue the convention. I chose "P" to show it is a "P"rotocol, as opposed to an "I"nterface or "A"bstract class, and in line with core.matrix.

I did not add a "rows" function to core because I see any use of the "rows" function as a code smell and abstraction violation. Datasets should be accessed through the API in core, not directly through the dataset module. This is because using protocols allows us to add new implementations, and for other implementation "rows" may be very inefficient (eg, getting a specific column from a column-major dataset). My next step, after this pull request is accepted, is to add a variety of accessor functions to minimize the need to call "rows".

@holyjak
Copy link
Contributor

holyjak commented Jan 9, 2014

Thank you for the thorough justification!

@farrellm
Copy link
Contributor Author

Sorry to nag, but were there any objections? Otherwise, can we merge this? Thanks.

@farrellm
Copy link
Contributor Author

farrellm commented Apr 7, 2014

Bump. If this change is not acceptable, please give some indication why.

@eraserhd
Copy link

eraserhd commented Apr 7, 2014

I'd be willing to switch to (and help maintain?) a fork with this patch.
Especially if I could contribute the code I'm writing for all the
probability theory stuff I'm learning.

On Mon, Apr 7, 2014 at 9:55 AM, Matthew Farrell notifications@github.comwrote:

Bump. If this change is not acceptable, please give some indication why.

Reply to this email directly or view it on GitHubhttps://github.com//pull/220#issuecomment-39732207
.

@farrellm
Copy link
Contributor Author

No thanks, I'm not really interested in maintaining a fork.

@mikera
Copy link
Contributor

mikera commented Aug 22, 2014

Hi all, I think this change may be superseded and/or conflict with the new core.matrix based Dataset changes in 1.9.0 SNAPSHOT branch (develop)

@farrellm farrellm closed this Oct 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants