Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header consistency, validation, and standards. #170

Merged
merged 42 commits into from Nov 1, 2021

Conversation

attipaci
Copy link
Collaborator

@attipaci attipaci commented Sep 30, 2021

Lots of fixes and improvements to the handling of headers:

Header Keywords

  • NEW Validating base keywords, for header cards created in code. If a header card is created with an invalid keyword, a HeaderCardException is thrown, with an informative description. The validation enforces the following properties:
    • Alphanumeric only, plus - and _, and max 8 characters for base keywords.
    • Printable standard ASCII characters (0x20 thru 0x7E) for HIERARCH style keys. (Optional switch for case-sensitive vs upper-case-only processing via IHierarchKeyFormatter.setCaseSensitive(boolean))
    • Maximum viable length for HIERARCH style keywords.
    • Blank keywords (spaces only) are not allowed with a non-null value.
  • NEW static HeaderCard.validateKey(String key) can be used to check the validity of FITS keywords from anywhere, any time.
  • HIERARCH-style keyword handling:
    • NEW IHierarchKeyFormatter allows to select between upper-case only (case-insensitive) hierarch components (default), or mixed-case (case-sensitive) components.
    • NEW IHierarchKeyFormatter is used to figure out exact space left after HIERARCH keys for the values. (So we don't breach the 80-character limit).
    • NEW HeaderCard.hasHierarchKey() can be used to check if the keyword is a long HIERARCH-style keyword (true only when FitsFactory.setUseHierarch() is enabled).
  • NonStandard.CONTINUE deprecated in favor of new Standard.CONTINUE since it is now part of the FITS 4.0 standard.

Header values

  • BUG Fix handling string values that themselves start with a quote (these threw an unexpected exception before).
  • NEW HeaderCard keeps track of the underlying value class, which it was created with (making it unnecessary to guess type from serialized value for cards constructed in code, rather than parsed from string/stream). This also eliminates to track 'nullability' separately internally.
  • NEW Header values are also checked for FITS standard ASCII (0x20 thru 0x7E), and a HeaderCardException is thrown if trying to construct a HeaderCard with an illegal value. Similarly, setValue(String) or setComment(String) will throw a runtime IllegalArgumentException is the argument contains characters not supported by FITS in headers.
  • NEW static HeaderCard.validateChars(String key) can be used to check the validity of FITS header values and comments, any time. HeaderCard.isValidChar(char) can be used to check if specific characters are allowed in FITS headers, and String HeaderCard.sanitize(String) can convert arbitrary strings for representation in FITS headers by replacing invalid characters with ? (I.e., making the changes by @olebole visible).
  • Consistent handling of null Header values for all types (String, Boolean, Number, or ComplexValue). FITS allows keyword assignments with no value field, regardless of what constructor or setValue() method was used.

Number values

  • NEW Support for reading/writing hexadecimal header values, which are part of the FITS standard, but not supported prior to this PR.
  • NEW Reading/writing complex number header values as per FITS 4.0 standard via the new ComplexValue class.
  • NEW Header card creation with NaN or Infinite values results in a HeaderCardException being thrown. FITS has no standard for representing such values in the header. Also, setValue(...) with NaN or Infinite values will throw an IllegalArgumentException
  • NEW FitsFactory.setUseExponentD(boolean) can be used to allow 'D' instead of 'E' for the exponent, when the value has more precision than can be supported by a 32-bit float. (FitsFactory.isUseExponentD() can be used to check the setting).
  • NEW When parsing values with 'D' as the exponent, the parsed type will automatically bump from Float to Double even if the number would otherwise fit in a 32-bit float.
  • NEW Formatting of all Number types for FITS headers is handled by the new FlexFormat class:
    • More predictable and consistent handling of decimal values. When precision is controlled explicitly by the user, decimal values will be shown using scientific (exponential) notation always, with up to the specified number of decimal places. Trailing zeroes will be ommitted.
    • If the precision is not explicitly set, the default (fail-safe) behavior is to try choose between fixed-format and scientific format, as appropriate (for decimal types and BigInteger only), automatically. Numbers are printed with their native precision for primitive types. For BigDecimal and BigInteger, precision may be reduced to fit the value in the available card space, but always preserving at least 16 decimal places after the leading figure. Tailing zeroes will be ommitted. A LongValueException is thrown only if there is not enough room to represent the values in the card space available. (This is similar to the earlier behavior but with more predictable results.)
  • NEW All non-string values must fit within the 80-character in header record. If the value cannot fit in the available space a LongValueException (runtime) is thrown.
  • NEW Header card creation and value changes have been consolidated to use Number, instead of separate int, long, float, double, BigInteger, BigDecimal cases. The simplification preserves the prior API, and expands it. It also makes it unnecessary to cast to/from BigDecimal when formatting/parsing number values.
  • NEW HeaderCard.isIntegerType() and isDecimalType() methods to easily determine whether the underlying value is some type of integer (byte, short, int, long, or BigInteger), or a decimal type (float, double, or BigDecimal).
  • Deprecated HeaderCardBuilder.scale(int) / noScale() methods in favor of the more meaningful and aptly named precision(int) / autoPrecision() methods.

Formatting Settings

  • NEW Separate package-level HeaderCardFormatter and HeaderCardParser classes to handle the formatting and parsing of header records. Both are more efficient than the previous implementations (FitsLineAppender and FitsSubString have been deprecated as a result).
  • NEW FitsFactory.setDefaults() method resets ALL settings to default.
  • All settings can now be set or checked via the public API of FitsFactory. (Some were hidden before.)
  • NEW Long comments with long string now fully written/read with internal spaces preserved.
  • NEW Added checks to prevent the creation of invalid cards, by throwing runtime exceptions as appropriate as soon as one tries to construct an invalid card or make changes to an existing card that would render it invalid.
  • More descriptive exception messages, and propagated underlying causes to aid debugging.

Other changes

  • Deprecated some HeaderCard constructors, which should not be public or are convoluted. (We'll remove them in a future release.)
  • NEW Creation of comment-style keys are now preferred via HeaderCard.createCommentStyleCard(String key, String comment), or the the even simpler createCommentCard(String comment) or createHistoryCard(String text). These replace an awfully confusing constructor that was used previously to create comment-style cards.
  • HeaderCard.saveNewHeaderCard(...) has been deprecated. Not only its name being misspelled ('save' instead of 'safe'), it is meant for internal use only, and should not be exposed to the public in future releases.
  • NEW HeaderCard gained a few public convenience methods to make life easier, such as boolean isCommentStyleCard() or int spaceForValue().
  • Much improved Javadoc for HeaderCard (and newly added classes), with more detailed descriptions, and cross-references.
  • Header.get...Value() is made more consistent, with or without default values or standard keys, for all. Especially getStringValue() with a default, which was a nagging omission before.
  • NEW All Header.addValue(...) and Header.insert...(...) methods now return the newly created card instead of void before. The caller may ignore it (same as before) or else use the new card directly instead of having to call Header.findCard() to locate it...
  • NEW Header.addValue(..., Complex, ...), and getComplexValue(...) convenience methods.
  • NEW Header.addHexValue(...), and getHexValue(...) convenience methods.
  • NEW Logging standard violations when parsing headers is disabled by default, but may be enabled via Header.setParserWarningsEnabled(boolean), and the setting may be checked via isParserWarningsEnabled(). When enabled, the parsing of FITS headers will report on any violation of strict FITS standards, even if it does not prevent the successful reading of the header. Developers can use the logged information (with Logger name "nom.tam.fits.HeaderCardParser"), to learn about any/all issues affecting 3rd party headers.
  • NEW Header.insertCommentStyleMultiline() method, which preserves long comments by wrapping them into multiple comment-style cards. It is now the default implementation used by Header.insertComment(String), insertHistory(String), and the new insertUnkeyedComment(String) methods, which now return the number of cards inserted.
  • NEW Added Header.ensureCardSpace(int) to allow to preallocate header space for a minimum number of cards. When writing to stream, the header will add blank cards as necessary to reserve the header space for future additions, as per FITS 4.0 specification. The method is called also when reading headers, s.t. they can stay rewritable with the same amount of reserved space. At the same time, the reading will not actually add trailing blank cards, such that new cards can be added as desired in the reserved preallocated space. Header.getMinimumSize() can be used to find the minimum byte size a header can have given the preallocated card space.
  • In line with the above, Header.resetOriginalSize() is deprecated in favor of using ensureCardSpace(), and Header.getOriginalSize() is tweaked to invariably return the size of the original header as was read (the fact that it did not before was a nasty 'feature'). As a result the header can stay rewritable() though a much broader range of manipulation, as long as it continues to fit within the original byte size.

Test cases have been adjusted as necessary or made more meaningful.

@attipaci attipaci added bug Erroneous behavior of the existing code enhancement A new feature and/or an improved capability API change Changes existing API breaking-potential Alters functionality in ways that may break prior application. style Better style and/or less bug-prone labels Sep 30, 2021
@attipaci attipaci added this to the 1.16.0 milestone Sep 30, 2021
@attipaci attipaci self-assigned this Sep 30, 2021
@attipaci attipaci mentioned this pull request Oct 5, 2021
@attipaci attipaci added deprecate Deprecates or removes API functionality standard Improved compliance to FITS standard labels Oct 11, 2021
@attipaci attipaci linked an issue Oct 13, 2021 that may be closed by this pull request
@attipaci
Copy link
Collaborator Author

attipaci commented Oct 13, 2021

This PR is a significant overhaul of the FITS header handling (mostly affecting HeaderCard and to a lesser degree Header also). It cleans up the organic mess that has grown over the years, with innumerable quirks and pitfalls, but more importantly, it addresses a number of bigger goals, such as:

  • Prevent the creation of invalid FITS headers from code (by throwing appropriate exceptions as soon as one tries to do something that FITS headers cannot support).
  • Support for the FITS 4.0 standard (such as complex values and hexadecimal integers in the header, long comments with long strings, preallocated blank header space...)
  • Consistent and predictable handling of decimal values in headers (with and without explicit precision control)
  • Optional logging of FITS standard violations when reading in (3rd party) FITS files (to help developers better understand any/all issues that these files may have).
  • More comprehensive unit tests, for more cases and scenarios.
  • Keep existing API intact and expand it to provide a more intuitive and consistent interface to users (also deprecating some methods for removal in a future major release).
  • Better Javadoc

@olebole, @mbtaylor, @Zlika, @vforchi, @pdowler, @mabula117, @wcleveland: Since you have been engaged with the repo of late, I would like to invite you to take a look at the PR before I merge it. Build the jar from it if you like, and try it with your application. If you have questions or comments, or you find new bugs in the PR itself, feel free to comment here, or in one of the linked issues -- whichever is more appropriate. If no showstoppers come up, I will plan to merge the PR at the start of November, and proceed to composing a pre-release of 1.16 soon after that, so we can do more testing and fix any remaining issues before the official release (maybe in December)... I also want to thank you for your contributions to the upcoming release, and for your efforts in general! :-)

@mbtaylor
Copy link
Contributor

I have not attempted a full review of all the changes, but I've built it and run some application code and unit tests against it. The construction-time exceptions for invalid headers required some changes in my code, since I'd previously been handling that by some patched code within HeaderCard that silently tidied up illegal characters; but I agree the way it's done now makes more sense.

Other than that, I haven't spotted any issues. The changes are quite extensive, so I can imagine there might be some nasty surprises relating to changed behaviour of my applications in the future... but we'll wait and see.

@attipaci
Copy link
Collaborator Author

Thanks Mark for taking a look. I agree that some nasty surprises might be lurking, but hopefully much fewer than before. (There have been a little too many nasty surprises for my taste earlier). I did try to add a lot more test scenarios also, to try pre-empt as many as I could think of -- but of course, for sure they don't cover all corner cases. When the bugs surface, we can address them with one or more maintenance release(s), which is how it's been done in the past too... I am also planning on a freeze period (~1 month) after composing a release candidate so whoever is willing can test it and we can fix only whatever new critical bugs we uncover in the process...

@attipaci attipaci merged commit 1f069d5 into nom-tam-fits:master Nov 1, 2021
@attipaci attipaci deleted the header-format branch November 1, 2021 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API change Changes existing API breaking-potential Alters functionality in ways that may break prior application. bug Erroneous behavior of the existing code deprecate Deprecates or removes API functionality enhancement A new feature and/or an improved capability standard Improved compliance to FITS standard style Better style and/or less bug-prone
Projects
None yet
3 participants