What are the semantics for supporting null objects in arrays in PDF? #157

Open
petervwyatt opened this issue Feb 18, 2022 · 8 comments
Labels
question Further information is requested

Comments

@petervwyatt
Member

Clause 7.3.9 Null Object:

"An indirect object reference (see 7.3.10, "Indirect objects") to a nonexistent object shall be treated the same as a null object."
"Specifying the null object as the value of a dictionary entry (7.3.7, "Dictionary objects") shall be equivalent to omitting the entry entirely."

Clause 7.3.7 Dictionary objects:

"A dictionary entry whose value is null (see 7.3.9, "Null object") shall be treated the same as if the entry does not exist."

and noting that no similar statement about null handling for arrays is made in clause 7.3.6.
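As a rough illustration of the two quoted rules (not any particular processor's implementation), the sketch below models an indirect reference with a hypothetical `Ref` class, the object table as a plain `dict`, and the PDF null object as Python's `None`. All names here are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference such as `12 0 R`."""
    num: int
    gen: int = 0


def resolve(value, objects):
    """Return the referenced object, or None (null) if it does not exist (7.3.9)."""
    if isinstance(value, Ref):
        return objects.get((value.num, value.gen))
    return value


def dict_get(d, key, objects, default=None):
    """Look up a dictionary entry; a null value counts as an absent entry (7.3.7)."""
    value = resolve(d.get(key), objects)
    return default if value is None else value


objects = {(1, 0): 42}
d = {"A": Ref(1), "B": Ref(99), "C": None}
a = dict_get(d, "A", objects)             # existing object
b = dict_get(d, "B", objects, "absent")   # dangling reference behaves like a missing key
c = dict_get(d, "C", objects, "absent")   # explicit null behaves like a missing key
```

For dictionaries, the explicit null and the dangling reference are indistinguishable from an omitted key. The open question in this issue is what the analogous rule is for arrays.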

If a Kids array (for example, in a name or number tree, in a page tree node, or in the StructTree) contains an indirect reference to a "nonexistent object" and there are no other errors, is it correct to interpret this as NOT an error because:

  1. the non-existent object is equivalent to the null object
  2. the Kids array (in the cases listed) is not specified to have a fixed length, nor does it need to be explicitly indexed numerically, so the null object can be skipped over when traversing the array?

However, if the array in question needs to have an explicit number of entries, such as an RGB color triplet, would it then be an error? Or does null not count as an indexable array element, so that the array would be valid if there were a 4th element?

In other words: what are the semantics for supporting null objects in arrays in PDF?
Are there any differences between an explicit null (keyword) versus an implicit null (such as the "nonexistent object" case)?

@petervwyatt petervwyatt added the question Further information is requested label Feb 18, 2022
@datalogics-pgallot

I don't think nulls in color value arrays make sense, though maybe in a colorspace array...

Nulls are used in Destination arrays (see Table 149, section 12.3.2.2), but those arrays have implicit sizes.

@MatthiasValvekens
Member

There are cases where nulls in arrays are valid and even expected. I forget the exact rationale, but when deleting objects from the structure tree, you'll typically see nulls being put in the ParentTree to avoid having to renumber everything.

For a completely different example where the spec has a "you shall use null" statement: see the description of DecodeParms in Table 5.

"DecodeParms shall be an array with one entry for each filter in the same order as the Filter array: either the parameter dictionary for that filter, or the null object if that filter has no parameters (or if all of its parameters have their default values). If none of the filters have parameters, or if all their parameters have default values, the DecodeParms entry may be omitted."

So: nulls in arrays should definitely count as indexable elements AFAICT, and I would guess that implicit nulls through nonexistent object references are no different in that regard.
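The DecodeParms case shows why position matters. A minimal sketch, assuming the array form of Filter, already-resolved values, and `None` standing in for the null object (the function name `pair_filters` is made up for this example):

```python
def pair_filters(filters, decode_parms=None):
    """Pair each filter name with its parameter dict ({} when null or omitted)."""
    if decode_parms is None:
        # DecodeParms entry omitted entirely: every filter uses its defaults.
        decode_parms = [None] * len(filters)
    if len(decode_parms) != len(filters):
        raise ValueError("DecodeParms must have one entry per filter")
    # A null entry keeps its slot, so later parameter dicts stay aligned.
    return [(f, {} if p is None else p) for f, p in zip(filters, decode_parms)]


pairs = pair_filters(
    ["ASCII85Decode", "FlateDecode"],
    [None, {"Predictor": 12}],
)
```

If the null were dropped instead of counted, the array would have one entry and the `Predictor` dictionary would line up with `ASCII85Decode`, the wrong filter.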

@petervwyatt
Member Author

So I agree that for certain specific array objects in PDF there is existing wording specific to handling of null array entries at specific array indices. My question really pertains to all the other PDF arrays that are silent on the matter of null as implementation behaviours seem to differ...

I also very much agree with @MatthiasValvekens (I was trying to not bias the discussion with my own opinions):

"nulls in arrays should definitely count as indexable elements AFAICT, and I would guess that implicit nulls through nonexistent object references are no different in that regard."

So can we make a factual statement along the lines of the following (needs wordsmithing):

"Whenever an implicit or explicit null object occurs as the value of an array entry (7.3.6, "Array objects"), the null object remains as an indexable element and does not alter the position of other array elements or the length of the array. Some PDF array objects also define specific requirements for null array elements."
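One way to read the proposed wording, sketched under the assumption that a hypothetical `Ref` class models an indirect reference, a `dict` models the object table, and `None` models null:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference such as `5 0 R`."""
    num: int
    gen: int = 0


def resolve_array(items, objects):
    """Resolve each element in place; dangling references become None."""
    return [
        objects.get((x.num, x.gen)) if isinstance(x, Ref) else x
        for x in items
    ]


objects = {(4, 0): "kid-a", (6, 0): "kid-c"}   # object 5 does not exist
kids = [Ref(4), Ref(5), Ref(6)]

resolved = resolve_array(kids, objects)         # length and positions preserved
compacted = [x for x in resolved if x is not None]  # what the wording rules out
```

Here `resolved` still has three elements and `"kid-c"` stays at index 2, whereas `compacted` shifts it to index 1. Under the proposed wording only the first behaviour is correct. Whether a given consumer may then skip the null while traversing is a separate, per-array question.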

@datalogics-pgallot

I think "implicit or explicit" wording contradicts the null object description in 7.3.9: "There shall be only one object of type null, denoted by the keyword null.", unless you want to change the 7.3.9 sentence that follows to "An indirect object reference (see 7.3.10, "Indirect objects") to a nonexistent object shall be treated as an implicit null object."

@petervwyatt
Member Author

Sorry - I wasn't clear. Not in 7.3.9 as that is only for dictionary objects, but added to 7.3.6 Array objects.

@datalogics-pgallot

The thing is, the "implicit or explicit null object" is obviously correct phrasing, because you actually can have implicit null objects (see below), but the explicit null object is defined in section 7.3.9 as a singleton ("There shall be only one object of type null,...").

Which makes it weird that the next sentence implicitly (not explicitly) defines one type of implicit null: a reference to a non-existent indirect object.

But that's not the only type of implicit null. There are also:

  • In some cases, a value of "0" which is not otherwise meaningful (e.g. as a zoom factor, see the /XYZ entry in Table 149) is to be treated as a null value.
  • An implicit null could also be specified in a ViewDestination dictionary by omitting the trailing array entries for a /FitV or /FitH ViewDestination.

And I may be missing other cases.

@petervwyatt
Member Author

I would disagree:

  • optional array elements are not implicit null objects. Where is that stated? They simply don't exist (don't confuse "file format" with how you might code something in certain programming languages). And I'm sure there are cases in PDF where we specify a default value if array elements at the end of an array don't exist.
  • Table 149 /XYZ is a good example where PDF defines a situation where an explicit null at certain defined indices in the array is handled in very specific ways. This is entirely covered by my proposed wording.

An implicit null arises from clause 7.3.9: "An indirect object reference (see 7.3.10, "Indirect objects") to a nonexistent object shall be treated the same as a null object" - i.e. where a lexically valid construct (an indirect reference) gets treated as a null.

So in Table 149 /XYZ, if one of the left, top, or zoom array elements were an indirect reference to a non-existent object, then the spec is very clear about what a processor should do: it wouldn't be classed as an error. However, if page (index 0) were an indirect reference to a non-existent object, then that would be an error, as Table 149 doesn't describe what to do.
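The /XYZ reading above can be sketched as follows. This assumes the destination array has already been resolved (so a dangling reference has become `None`), that all five elements are present, and that a null left, top, or zoom means "keep the current value", with a zoom of 0 treated like null as mentioned earlier in the thread. The function name and the `current` view-state dictionary are made up for the example.

```python
def apply_xyz(dest, current):
    """Apply a resolved [page, "XYZ", left, top, zoom] array to the view state."""
    page, kind, left, top, zoom = dest  # assumes all five elements are present
    if kind != "XYZ":
        raise ValueError("not an /XYZ destination")
    if page is None:
        # No handling is described for a null page element, so treat it as an error.
        raise ValueError("page element resolved to null")
    if zoom == 0:
        zoom = None  # a zoom of 0 is treated the same as null

    def keep(new, key):
        # Null means: retain the current value of that parameter.
        return current[key] if new is None else new

    return {
        "page": page,
        "left": keep(left, "left"),
        "top": keep(top, "top"),
        "zoom": keep(zoom, "zoom"),
    }


view = apply_xyz([3, "XYZ", None, 700, 0], {"left": 10, "top": 20, "zoom": 1.5})
```

The same null is harmless at indices 2 to 4 and an error at index 0, which is the point: the meaning comes from the array's own definition, not from a general rule about nulls.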

@MPBailey

I would argue that a reference from a Pages Kids object to a non-existing Page(s) object is an error. As discussed above the reference collapses to an implicit null. So now one of the items in the array is a null, and Table 30 requires all items to be indirect references to child objects. But it’s not, it’s a null.

Implicit or explicit nulls in arrays are not, in themselves, errors; they become errors if the specification of that array does not allow null entries.

In purely practical terms this is the best general rule anyway. Otherwise we'd have to go through and invent a lot of specific rules for what a null (implicit or explicit) means in such cases. What does a null in a Kids array in a Pages tree mean? It's not as simple as "treat it as a blank page" because you don't even know if it was expected to be a Pages or Page object.
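A minimal sketch of that general rule, assuming a hypothetical per-array policy table. The keys and flags below are illustrative only and are not taken from the specification; arrays that say nothing about null default to "not allowed", following the reading in this comment.

```python
# Hypothetical policy table: does this kind of array define a meaning for null?
NULL_ALLOWED = {
    "DecodeParms": True,   # null means "no parameters for this filter"
    "PagesKids": False,    # every element must refer to a child object
}


def null_violations(kind, items):
    """Return the indices of null elements that the array's definition does not allow."""
    if NULL_ALLOWED.get(kind, False):
        return []
    return [i for i, x in enumerate(items) if x is None]


ok = null_violations("DecodeParms", [None, {"Predictor": 12}])
bad = null_violations("PagesKids", ["page-1", None, "page-3"])
```

The check reports positions rather than dropping elements, so the array's length and indexing are left untouched either way.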

4 participants