Issue with sheet names containing multi-byte (UTF-8) characters #84

Closed
evanmiller opened this Issue Dec 22, 2016 · 4 comments

Comments

Projects
None yet
2 participants
@evanmiller

Sheet names longer than 32 bytes are disallowed by libxlsxwriter; however I believe the correct behavior is to disallow sheet names longer than 32 characters.

I am trying to debug a customer issue, and I believe the problem is that sanitized strings which are longer than 32 bytes but shorter than 32 characters are passing my program's internal checks, but then failing to be written out to the spreadsheet.

I'd suggest one of:

  • Using a UTF-8-aware strlen
  • Getting rid of the check entirely
  • Having a better failure mode, i.e. if the check fails use "SheetN" instead of returning NULL from workbook_add_worksheet

@jmcnamara jmcnamara self-assigned this Dec 22, 2016

@jmcnamara jmcnamara added the bug label Dec 22, 2016

@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara Dec 22, 2016

Owner

This is a bug that I had in the back of my mind and was on my TODO list to fix.

You are correct that the check should be on characters rather than bytes (like in the Perl and Python versions) and it was my intention to add a UTF-8 strlen. That same applies for most other length checks on user supplied data.

The worksheet name check also needs to disallow certain characters in the name: []:*?/\\.

And finally there should be suitable warning around this check as well.

Will fix.

Owner

jmcnamara commented Dec 22, 2016

This is a bug that I had in the back of my mind and was on my TODO list to fix.

You are correct that the check should be on characters rather than bytes (like in the Perl and Python versions) and it was my intention to add a UTF-8 strlen. That same applies for most other length checks on user supplied data.

The worksheet name check also needs to disallow certain characters in the name: []:*?/\\.

And finally there should be suitable warning around this check as well.

Will fix.

@jmcnamara

This comment has been minimized.

Show comment
Hide comment
@jmcnamara

jmcnamara Dec 23, 2016

Owner

I've pushed an initial pass at this to the utf8_strlen branch, see commit above, if you'd care to try it out.

You will still have to handle the NULL error condition in your own code.

Owner

jmcnamara commented Dec 23, 2016

I've pushed an initial pass at this to the utf8_strlen branch, see commit above, if you'd care to try it out.

You will still have to handle the NULL error condition in your own code.

@evanmiller

This comment has been minimized.

Show comment
Hide comment
@evanmiller

evanmiller Dec 23, 2016

Thanks @jmcnamara. Haven't tested but the patch looks fine to me; I'm looking forward to seeing it integrated.

I'll add NULL handling on my end so that is no problem. It might be useful to have a way to validate worksheet names without actually adding them, so that the error condition can be isolated in client code. lxw_worksheet_name_is_valid or something.

Thanks @jmcnamara. Haven't tested but the patch looks fine to me; I'm looking forward to seeing it integrated.

I'll add NULL handling on my end so that is no problem. It might be useful to have a way to validate worksheet names without actually adding them, so that the error condition can be isolated in client code. lxw_worksheet_name_is_valid or something.

jmcnamara added a commit that referenced this issue Dec 23, 2016

@evanmiller

This comment has been minimized.

Show comment
Hide comment
@evanmiller

evanmiller Dec 26, 2016

Tested master with client code; this bug is now resolved for me. Thanks @jmcnamara!

Tested master with client code; this bug is now resolved for me. Thanks @jmcnamara!

@evanmiller evanmiller closed this Dec 26, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment