SAS date formats implementation #86
Formats
SAS has a huge number of temporal formats. Parso can parse most of them, but not all (the parsed result is represented as epoch seconds or as a Java Date object). Formatting dates is a trickier task, and for now Parso supports fewer output formats than it can parse. Here is a list of supported formats:
Date formats
Implemented:
Date-time formats
Implemented:
Not implemented:
Time formats
Implemented:
Not implemented:
How to use it
This change adds two new options to the OutputDateType enum:
The first one outputs dates using the full width specified in the format. The second one trims leading spaces (in the same way as SAS Universal Viewer shows dates by default or with the "Trim formatted values" option checked). Both options carry the "_EXPERIMENTAL" suffix to signal that this is not a final solution and that something may change in the future.
The reader can be created as before, but with one of the new options. For example: SasFileReader reader = new SasFileReaderImpl(is, null, SAS_FORMAT_EXPERIMENTAL); After that, dates are produced as formatted strings.
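To make the difference between the two options concrete, here is a minimal sketch that does not use Parso at all. It assumes the formatted value is right-aligned within the format width, which is what "trims leading spaces" suggests; the class name, the helper names and the sample value are hypothetical.

```java
final class WidthVsTrimSketch {

    /** Right-aligns a value in a field of the given width (full-width output). */
    static String padToWidth(String value, int width) {
        return String.format("%" + width + "s", value);
    }

    /** Removes leading spaces only, leaving the rest of the value untouched. */
    static String trimLeading(String padded) {
        int i = 0;
        while (i < padded.length() && padded.charAt(i) == ' ') {
            i++;
        }
        return padded.substring(i);
    }

    public static void main(String[] args) {
        // Hypothetical sample: a 9-character date shown in an 11-character field.
        String full = padToWidth("01JAN2020", 11);
        System.out.println("[" + full + "]");
        System.out.println("[" + trimLeading(full) + "]");
    }
}
```

The first line printed keeps the padding, the second one drops it, which mirrors the intent of the full-width and trimmed options described above.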
Test datasets
To have all possible formats at hand, a lot of datasets were created. Each dataset consists of:
These datasets are needed for testing purposes and are placed in the "src/test/resources/dates/sas" directory. Some formats have two datasets:
For now these datasets contain about one thousand date format variations (including width and precision combinations) and tens of thousands of date samples.
Rounding issues
There is a slight difference between SAS and Parso results in the rounding of fractional values. For example:
I researched this a lot, but for some formats I could not find a combination of arithmetic operations that gives exactly the same result as SAS. For some formats I found ways to get nearly the same result, but the resulting code was huge, non-obvious and hard to understand, so I rolled it back in favor of simpler source code. The implemented solution produces values that are more correct arithmetically, but the result sometimes differs from SAS.
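As an illustration of why two reasonable rounding strategies can disagree in the last digit, here is a small sketch. It does not reproduce what SAS or Parso actually do; it only contrasts rounding the exact binary value of a double with rounding its shortest decimal representation. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundingSketch {

    /** Rounds the exact binary value stored in the double. */
    static String roundExactBinary(double value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    /** Rounds the shortest decimal representation of the double (Double.toString). */
    static String roundShortestDecimal(double value, int scale) {
        return BigDecimal.valueOf(value).setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        // 1.005 is stored as a value slightly below 1.005, so the two approaches differ.
        System.out.println(roundExactBinary(1.005, 2));     // 1.00
        System.out.println(roundShortestDecimal(1.005, 2)); // 1.01
    }
}
```

Both results are defensible, and they differ by exactly one unit in the least significant position, which is the kind of discrepancy described above.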
Unit tests
All code created in the scope of this implementation is 100% covered by unit tests. The unit tests compare the original SAS-formatted dates with the Parso-formatted dates and check that they are equal. In the comment above I mentioned rounding issues. The unit tests try to find and bypass cases where the rounded result differs. Some unit tests can "skip" a rounding issue if the actual and expected numbers differ by no more than 1 in the least significant position. Tests report such cases:
Such values make up less than 1% of the samples, but for some users this may not be acceptable.
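The "no more than 1 in the least significant position" check can be sketched roughly as follows. This is not the code from the test sources; it is a simplified version that assumes the compared pieces are plain decimal numbers (for example, a seconds field with a fraction), and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class LastDigitTolerance {

    /**
     * Returns true when both strings are decimal numbers with the same number
     * of fractional digits and differ by at most one unit in the last digit.
     */
    static boolean equalWithinLastDigit(String expected, String actual) {
        BigDecimal e = new BigDecimal(expected.trim());
        BigDecimal a = new BigDecimal(actual.trim());
        if (e.scale() != a.scale()) {
            return false; // different precision is a real mismatch, not a rounding issue
        }
        BigInteger diff = e.unscaledValue().subtract(a.unscaledValue()).abs();
        return diff.compareTo(BigInteger.ONE) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(equalWithinLastDigit("12.34", "12.35")); // true
        System.out.println(equalWithinLastDigit("12.34", "12.36")); // false
    }
}
```

Comparing unscaled values keeps the check exact and also handles carries, such as 9.99 versus 10.00.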
I've also updated Rocket table to use this unreleased version of Parso. The new formatting can be enabled with the "--sas-date-format-type" command line option, for example: java -jar rocket-table.jar --sas-date-format-type=SAS_FORMAT_TRIM_EXPERIMENTAL This makes it possible to visually explore how Parso now formats dates. Later I will add a UI control to switch the format ON/OFF.
Fallback formats
As I mentioned in the first comment, not all declared formats are implemented. In such cases a fallback format is used to format dates:
There is also a QTRR SAS format used in the test sources, which is neither declared in the main Parso code nor implemented as a format. It is only used in unit tests, to check how Parso handles unknown formats.
Performance and thread-safety
Formats are declared as Java Enums:
Each enum element produces a format function for the given width and precision. This function is a kind of closure: it holds pre-calculated patterns, adjusted precision or other things that can be calculated once for the specific format. SasTemporalFormatter caches all these closures and uses them to format dates. Formatting is not as fast as presenting a date as a raw value, epoch seconds or a Java Date. The BigDecimal, DecimalFormat and DateTimeFormatter Java classes are involved in the formatting, so it has some overhead compared to plain arithmetic operations. Normally each instance of SasFileParser has its own instance of SasTemporalFormatter. SasFileParser itself is single-threaded, so no thread-safety issues are expected here.
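The enum-plus-cached-closure idea can be sketched as below. This is a toy model, not the actual SasTemporalFormatter: the enum, its single constant, the cache key and the counter are all hypothetical, and a simple fixed-point number format stands in for the real date formats.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class FormatCacheSketch {

    /** Each constant builds a formatting closure for a given width and precision. */
    enum ToyFormat {
        FIXED {
            @Override
            Function<Double, String> create(int width, int precision) {
                // The pattern is computed once and captured by the closure.
                String pattern = "%" + width + "." + precision + "f";
                return value -> String.format(Locale.ROOT, pattern, value);
            }
        };

        abstract Function<Double, String> create(int width, int precision);
    }

    // Plain HashMap: this sketch assumes one formatter instance per single-threaded parser.
    private final Map<String, Function<Double, String>> cache = new HashMap<>();

    /** Counts how many closures were actually built (for demonstration only). */
    int created = 0;

    String format(ToyFormat format, int width, int precision, double value) {
        String key = format.name() + ":" + width + ":" + precision;
        Function<Double, String> fn = cache.computeIfAbsent(key, k -> {
            created++;
            return format.create(width, precision);
        });
        return fn.apply(value);
    }
}
```

Repeated calls with the same format, width and precision reuse the cached closure, so the pattern is built only once. Because the cache is an unsynchronized HashMap, this sketch is safe only under the one-instance-per-parser, single-threaded usage described above.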
Looks like this is all I was going to say. Now it can be reviewed.
Cleaned it up a bit, removed commented-out unused lines.
This change adds SAS-like formatting of Dates, Times and DateTimes.
I'll put details in comments below.
Please, don't merge it until I describe here "What and Why".