Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blank rows/cells are omitted from sequences and from select-columns #19

Closed
kornysietsma opened this issue Jan 19, 2014 · 21 comments
Closed

Comments

@kornysietsma
Copy link
Contributor

This is basically the same as the issue my colleague nurous raised against clj-excel:
mebaran/clj-excel#5

The root problem is that iterators in POI skips blank rows and cells. As a result, a blank row won't be returned in a row-seq call, and a blank cell won't be returned in a cell-seq call. Worse, when running functions like select-columns the results can be quite wrong due to these problems.

(note that this is for truly blank cells - a spreadsheet seems to be implemented somewhere as a sparse array. Cells where you've entered data previously, and then deleted the data, may still be returned by these functions)

@mjul
Copy link
Owner

mjul commented Jan 28, 2014

Thanks for reporting this issue. If you have the time, please submit a fix.

@kornysietsma
Copy link
Contributor Author

Actually, on writing some tests for an upcoming patch, I found that select-columns actually mostly works - it does skip blank rows/cells, but as it uses .getColumnIndex to work out column mappings, it assigns the correct column labels anyway.

What is the preferred behaviour for select-columns on a blank row/column?

  • Should it return an empty map for a blank row? (I'm inclined to say yes)
  • Should it return a nil map entry for a blank cell? (I'm inclined to say no)

@mjul
Copy link
Owner

mjul commented Jan 31, 2014

With the Clojure semantics of maps, empty map for a blank row would have the same result as the latter solution in normal use cases, so I would support your recommendation, blank row = empty map.

@mjul
Copy link
Owner

mjul commented Jan 31, 2014

PS: please note that the repo has moved, the old master still works but for good measure consider changing your master to github.com/mjul/docjure (was: github.com/ative/docjure)

@dmichulke
Copy link

For all those with this problem, as a quick fix you can just use with-redefs without changing dependencies or cloning the repo:

(with-redefs [dk.ative.docjure.spreadsheet/row-seq (fn [^Sheet sheet]
                      (map #(.getRow sheet %) (range 0 (inc (.getLastRowNum sheet)))))]
    ;;load your excel file here 
)

@vkz
Copy link

vkz commented Aug 10, 2016

Just to check if the project is still alive. Ability to read sparse tables correctly is kind of a big deal and at least my experiments with this PR show it's legit and does the right thing. Be great to have it merged.

Big kudos to @kornysietsma for the fix!
Big thanks to creators and maintainers!

Note: with this fix rows of different lengths may require padding at the end. Not a big deal and probably not something you want by default but might trip someone. Only matters in cases where you deal with valid tables with some rows without data in their tail-cells.

@mjul
Copy link
Owner

mjul commented Aug 11, 2016

@vkz I agree. If someone would update the pull req (#22) from @kornysietsma to match the latest version and add some example spreadsheets with missing rows and cells to the test suite, I would be happy to merge it.

@kornysietsma
Copy link
Contributor Author

I'll try to update it - hopefully it shouldn't take long, I just need
prodding :-)

On 11 Aug 2016 8:52 a.m., "Martin Jul" notifications@github.com wrote:

@vkz https://github.com/vkz I agree. If someone would update the pull
req (#22 #22) from @kornysietsma
https://github.com/kornysietsma to match the latest version and add
some example spreadsheets with missing rows and cells to the test suite, I
would be happy to merge it.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#19 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AABFZTogQdEj-I-pHeWd68GJR-_R5HdKks5qetS3gaJpZM4Ba29W
.

@kornysietsma
Copy link
Contributor Author

I've been looking again at this patch, and I realised why I didn't get back
to it - it needs some decisions on how best to implement it, and my
preferred option may be a breaking change for some clients.

The problem is, ideally I'd want to just replace the existing row-seq and
cell-seq with the dense functions, so they always return nil for missing
rows / cells. However, this could easily be a breaking change if there are
clients that don't expect nil return values (for example, the docjure
'set-row-style!' function would need to be modified to ignore nil cells)

Would this be OK?

Alternatively, you could make this optional - pass an extra
missing-cell-policy flag, or set a local binding like missing-cell-policy
that specifies which approach to use, and add a macro like
(with-missing-cell-policy :return-nil ...) - but this might be needless
complication.

So, would you prefer:

  1. Two parallel sets of functions (as per my original patch) cell-seq and
    dense-cell-seq, row-seq and dense-row-seq, users can choose which they
    prefer by calling different functions
  2. Modify cell-seq and row-seq to return nils for blank cells/rows,
    possible breaking change
  3. Let the user choose sparse or dense mode
    3a. via an extra option parameter to lots of functions
    3b. via a local binding
  • Korny

On 11 August 2016 at 12:39, Kornelis Sietsma kornys@gmail.com wrote:

I'll try to update it - hopefully it shouldn't take long, I just need
prodding :-)

On 11 Aug 2016 8:52 a.m., "Martin Jul" notifications@github.com wrote:

@vkz https://github.com/vkz I agree. If someone would update the pull
req (#22 #22) from @kornysietsma
https://github.com/kornysietsma to match the latest version and add
some example spreadsheets with missing rows and cells to the test suite, I
would be happy to merge it.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#19 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AABFZTogQdEj-I-pHeWd68GJR-_R5HdKks5qetS3gaJpZM4Ba29W
.

Kornelis Sietsma korny at my surname dot com http://korny.info
.fnord { display: none !important; }

@kornysietsma
Copy link
Contributor Author

Hey - trying to close this off... :)

In the absence of responses to my previous over-long essay, I'm going to go with the slightly messy option of letting people choose which way docjure will work via a dynamic var and a wrapper macro.

That seems to be the only way to preserve current behaviour, without duplicating a whole bunch of functions, or adding extra parameters all over the place

So if you wrap code as follows:
(with-missing-values-as-nil (for [row (row-seq sheet) cell (cell-seq row)] ... )

then missing rows and cells will be returned as nil

I'm cleaning up the tests and docs, will probably get a pull request in soon.

@mjul
Copy link
Owner

mjul commented Sep 6, 2016

Hi @kornysietsma
I think your initial initial suggestion of always returning nils for blank rows makes more sense.
That it is not backwards compatible is not a problem: it is a bug-fix, after all.
Thanks for helping out.
Cheers,
Martin

@dmichulke
Copy link

dmichulke commented Sep 6, 2016

Not my decision but I'd also prefer option 2 - returning nil for blank rows/cells.
The reason is that removing nils from the dense version is easy with standard clojure functionality.
Adding nils to the non-dense version at the right places is impossible.

So the former is more general one but still requires only few workarounds to make it work as before.

Probably the changelog should contain the word BREAKING CHANGE and you might want to increase the major version.

@kornysietsma
Copy link
Contributor Author

Cool - I'll go down that route then, it'll keep the code and the tests much
simpler.

  • Korny

On 6 September 2016 at 08:54, Daniel Michulke notifications@github.com
wrote:

Not my decision but I'd also prefer option 2 - returning nil for blank
rows/cells.
The reason is that removing nils from the dense version is easy with
standard clojure functionality
Adding nils to the non-dense version at the right places is impossible.

So the former is more general one but still requires only few workarounds
to make it works a before.

Probably the changelog should contain the word BREAKING CHANGE and you
might want to increase the major version.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#19 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AABFZT97_KoVhiMUisrtzX3XhrwBq9yhks5qnRwhgaJpZM4Ba29W
.

Kornelis Sietsma korny at my surname dot com http://korny.info
.fnord { display: none !important; }

@kornysietsma
Copy link
Contributor Author

I've updated the pull request #22

@kinleyd
Copy link

kinleyd commented Sep 8, 2016

@mjul: Thanks for a great tool. All the others here too, esp. also @kornysietsma.

On topic: I think returning nil always for blank rows and cells is the right decision. It will make row-seq and cell-seq much more dependable.

@mjul
Copy link
Owner

mjul commented Sep 19, 2016

The fix for this issue from @kornysietsma has been included in version 1.11.0. Thanks for helping make the library better.

@mjul mjul closed this as completed Sep 19, 2016
@mjul
Copy link
Owner

mjul commented Sep 19, 2016

@kinleyd Thanks for the kind words. I was very happy to see that you are using it in Bhutan as it is one of my favourite countries. If you have any ideas or suggestions, don't hesitate to contribute to the project.

@mjul
Copy link
Owner

mjul commented Sep 19, 2016

@dmichulke Thanks for the thoughtful feedback. I decided not to upgrade the major version as I consider this a bug-fix (blank lines being skipped was a thing leaking through from the underlying POI library). Please keep your suggestions and pull requests coming.

@mjul
Copy link
Owner

mjul commented Sep 19, 2016

@vkz This has now been fixed, so please give it a whirl and submit feedback, suggestions and pull requests to make it even better.

@mjul
Copy link
Owner

mjul commented Sep 19, 2016

@kornysietsma Thanks a lot for helping out with this. It is really good work. Please send more pull requests in the future.

@kinleyd
Copy link

kinleyd commented Sep 19, 2016

@mjul: Nice to know you like Bhutan! docjure is a great library and I look
forward to making some contributions where I can. Keep up the good work.

On Mon, Sep 19, 2016 at 7:22 PM Martin Jul notifications@github.com wrote:

@kornysietsma https://github.com/kornysietsma Thanks a lot for helping
out with this. It is really good work. Please send more pull requests in
the future.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#19 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABWrvyNOCM-1J13C485_dTMtSG_OQO2Yks5qrox2gaJpZM4Ba29W
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants