Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excel 2010, the epoch, and me #274

Open
tism opened this issue Jan 8, 2014 · 10 comments
Open

Excel 2010, the epoch, and me #274

tism opened this issue Jan 8, 2014 · 10 comments

Comments

@tism
Copy link

tism commented Jan 8, 2014

I think there's something odd with how Excel 2010 (also Numbers on OSX) handle dates when the date is 01/01/1900. If I create a table with axlsx the output offset for 01/01/1900 is 2.0. If I open this file in Excel 2010, the displayed value is 1/2/1900. If I create a file in Excel 2010 with the date 01/01/1900, the output offset is 1. This then resets itself to match axlsx with what I understand is Excel thinking 1900 is a leap year(?). Here's what I got if I make a file with the same dates in it with both axlsx and Excel 2010:

Date axlsx Excel 2010
1/01/1900 2.0 1
2/01/1900 3.0 2
31/01/1900 32.0 31
31/12/1900 366.0 366
1/01/1901 367.0 367
1/01/1902 732.0 732
1/01/9999 2958101.0 2958101

I tried the same with the Excel files created by Numbers, but it only creates xls files and errors on dates before 1904. I've written this up but haven't check all the options, this issue is partially me making sure I'm not losing my mind.

Code used to generate file with axlsx:

Axlsx::Package.new do |p|
  p.workbook.add_worksheet(:name => "Test") do |sheet|
    sheet.add_row [Date.new(1900,1,1)]
    sheet.add_row [Date.new(1900,1,2)]
    sheet.add_row [Date.new(1900,1,31)]
    sheet.add_row [Date.new(1900,12,31)]
    sheet.add_row [Date.new(1901,1,1)]
    sheet.add_row [Date.new(1902,1,1)]
    sheet.add_row [Date.new(9999,1,1)]
  end
  p.serialize('axlsx.xlsx')
end
@jurriaan
Copy link
Collaborator

jurriaan commented Jan 8, 2014

A quick google shows this is done by design (for backwards compatibility): http://support.microsoft.com/kb/214058

@SheetJSDev
Copy link

It looks like axlsx does not correct for the year 1900 bug (excel treats February 29 1900 as a real date even though 1900 wasn't a leap year).

Fixing the date isn't too tricky (you have to shift the date back by one and fix the day of week) but you could obviate the issue by enabling 1904 mode (which starts counting from 1904, completely skipping over the problem areas)

@jurriaan
Copy link
Collaborator

jurriaan commented Jan 8, 2014

See #41, we didn't know about this issue when we chose for the 1900 mode.
1904 mode isn't that great either:

"When you use the 1904 date system and the WEEKDAY function to compute the day of the week for dates prior to January 1, 1904, the function returns a #VALUE! error."

@SheetJSDev
Copy link

@jurriaan that reflects excel's general behavior: negative dates are rejected (you can't assign an 1800 date in 1900 mode).

It's not too hard to work around

According to my notes:

  • February 29 1900 is treated as a Wednesday
  • Date 0 is January 0 1900 (not December 31 1899)
  • DOW is consistent if you work backwards (so the day of week is broken for date values below 60)

@tism
Copy link
Author

tism commented Jan 9, 2014

I've patched my project locally to subtract a day of the date is before March 1st, 1990, but I'm not sure this applies to all versions of the file format. In the tests for DateTimeConverter it indicates that the values it's asserting are from the spec.

Is this the case that versions before 2010 had this issue but expressed it in a different way, and 2010 has set about attempting to fix them?

It is something to note though that Excel 2010 still accepts 29/02/1900 as a valid date.

@SheetJSDev
Copy link

@tism https://github.com/randym/axlsx/blob/master/test/workbook/worksheet/tc_date_time_converter.rb#L21 is wrong: 1900-01-01 should be 1, not 2.

As for the spec, section 18.17.4.1 does not match Excel's behavior. The description matches Excel behavior for today, but does not match the February 29 1900 behavior (note how the spec doesn't discuss the bug).

@tism
Copy link
Author

tism commented Jan 9, 2014

18.17.4.1 - Date Conversion for Serial Date-Times
[Example: When using the 1900 date system, which has a base date of 30th December 1899, a serial date- time of 1.5 represents midday on the 31st December 1899 (serial date-time day 1), or 1899-12-31T12:00. A serial date-time of -4.25 represents 6 pm on the 25th December 1899, or 1899-12-25T18:00. end example]

Ok, I'm with you now. Numbers does the same thing when reading the dates, did anyone implement the spec right?

@SheetJSDev
Copy link

@tism you are asking the wrong question :) The spec is not always correct and the ecosystem is messy.

If you want to replicate Excel's behavior with the broken date: https://github.com/SheetJS/ssf does it in JS (and there are explicit workarounds for date 0 and date 60).

Note that not all spreadsheet systems replicate Excel. Google Docs preserve the excel codes for dates beyond the bad leap year but break dates before 2/29/1900 (try 1/1/1900 in number format -> 2)

The key takeaway is that you can't trust the spec and you have to decide how to handle an environment where different implementations do different things. IMHO the best approach is to preserve Excel's behavior, but @randym may have more insight regarding google docs and excel usage with axlsx.

@tism tism closed this as completed Jan 9, 2014
@tism tism reopened this Jan 9, 2014
@tism
Copy link
Author

tism commented Jan 9, 2014

Edit: That read wrong, sorry.

@SheetJSDev Is there a reference implementation? Or was the standard written based on what was 'supposed' to happen and everyone's double checked against Excel ever since?

The purist in me says that the spec should be followed and that 01/01/1900 is an offset of 2, but Excel is the main (it's possible to consider it the only) use so it's 1.

There doesn't appear to be a nice way to check if you're reading a standard file, or an Excel-style file.

@SheetJSDev
Copy link

@tism the spec was written long after the bug was introduced (and the text is very similar to the date description in MS-VBAL). The spec technically allows negative date codes (which would correspond to dates before 1900) but Excel does not support them. So I definitely wouldn't treat the spec as gospel.

Both approaches are correct wrt today's date code: They actually agree with all date codes above 60. The divergent behavior comes from date codes 0-60, from year 1900, and it's fair to say that a small percentage of excel/gdocs users actually care about those dates.

Normally, I would conclude that Excel's implementation is right and recommend that the "bug" should be replicated (and in fact, that's what I do in https://github.com/SheetJS/js-xls and https://github.com/SheetJS/js-xlsx). However, since Google Docs has some traction, the answer isn't clear-cut.

I suspect that more people (by a wide margin) use files from axlsx in Excel but @randym probably has more info on usage patterns. If it turns out that a significant percentage of users work with those files in google docs, it may make sense to keep the behavior as-is

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants