conj-rows doesn't work for datasets with different column order #78

avishn opened this Issue May 29, 2012 · 4 comments


None yet

4 participants

avishn commented May 29, 2012

(dataset [:a :b] [[1 10]])
(dataset [:b :a] [[10 1]]))

; [:a :b]
; [1 10]
; [10 1]

@davidjameshumphreys davidjameshumphreys added a commit to davidjameshumphreys/incanter that referenced this issue Apr 5, 2013
@davidjameshumphreys davidjameshumphreys reorder-columns
Added a new function `reorder-columns` for a dataset that will change the order of appearance of the datset columns.  It does not alter the row order.

This should help with the conj-rows issue in #78

I tried to look at changing the conj-rows function, but I was unable to alter it without causing more bugs (sel and $ don't return a dataset for just one row).

This function could be used within conj-rows:
(reorder-columns ba-dataset [:a :b]))

The conj-rows documentation should be updated to reflect the fact that it has some limitations with respect to column order.

alexott commented Jun 9, 2013

@avishn does this still issue for you?

alexott commented Sep 1, 2013

closed as implemented...

@alexott alexott closed this Sep 1, 2013

What's the rational not fixing conj-rows at it's root? It seems to me that it's possible to do so. For example, how about

(defn conj-rows
  "Returns a dataset created by combining the rows of the given datasets   and/or collections.
     (use '(incanter core datasets))
     (def cars (get-dataset :cars))
     (view (conj-rows (to-dataset (range 5)) (to-dataset (range 5 10))))
     (view (conj-rows cars cars))
     (view (conj-rows [[1 2] [3 4]] [[5 6] [7 8]]))
     (view (conj-rows [{:a 1 :b 2} {:a 3 :b 4}] [[5 6] [7 8]]))
     (view (conj-rows (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]) [[5 6] [7 8]]))
     (conj-rows (range 5) (range 5 10))
  ([& args]
     (reduce (fn [A B]
               (let [a (to-dataset A :transpose true)
                     the-column-names (:column-names a)
                     b (reorder-columns (to-dataset B :transpose true) the-column-names)]
                 (dataset the-column-names
                          (concat (to-list a) (to-list b)))))

I wasted several hours to track down to conj-rows' peculiar behavior, by looking at the source code.

Fixing conj-rows' documentation would be helpful, but I wish to remove the burden to user as much as possible. This is a typical incidental complexity, that the user should not bear, if possible.

If you think that there is no negative impact, I can take care of the regression test for the fix. (This is how I fix it for my application.)



Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment