Request for documentation clarification #4000

trbabb · 2023-01-02T21:16:47Z

I have a few questions about the use of the API which weren't clear from the documentation.

Specifically, hb_buffers are stateful, but it's somewhat unclear how and when the state transitions, and what invariants are maintained.

From the documentation:

> After adding text, the buffer should be set to HB_BUFFER_CONTENT_TYPE_UNICODE instead, which indicates that it contains un-shaped input characters

Does "should" mean "the user should expect this to have been done automatically by the library", or "the user should do this if they want it to work"?
- If the former, why does it not say "will be set" instead of "should be set"? Are there some conditions where it will not be set after adding text? What are those conditions?
- If the latter, why does the "simple example" not show this being done? Under what conditions is it acceptable not to perform the set operation?

> If necessary you can set the content type with hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);

What effect does this have on the state? What are the contents of the buffer after this operation? Should I expect the buffer to contain nonsense, will it contain the original unicode string, or will the buffer essentially be reset()?
In general, what should the user expect to happen if they use unicode accessing / updating methods when the state is TYPE_GLYPHS, or glyph data when the state is TYPE_UNICODE?

I'd love to know the answers; even better if the docs are updated so future devs can benefit! Apologies if I missed anything that's already there. :)


matthiasclasen · 2023-01-02T21:40:53Z

The wording is a bit unfortunate.

The automatic state transitions work like this:

HB_BUFFER_CONTENT_TYPE_INVALID --- hb_buffer_add_utf8/16/32 ---> HB_BUFFER_CONTENT_TYPE_UNICODE --- hb_shape ---> HB_BUFFER_CONTENT_TYPE_GLYPHS

The only case where you want to manually call hb_buffer_set_content_type (buffer, HB_BUFFER_CONTENT_TYPE_UNICODE) is when you want to reuse a buffer for new text after hb_shape has been called.

khaledhosny · 2023-01-02T22:22:19Z

Also hb_buffer_add() does not set the buffer type and it is on the user to set it.

behdad · 2023-01-03T00:12:02Z

> The only case where you want to manually call hb_buffer_set_content_type (buffer, HB_BUFFER_CONTENT_TYPE_UNICODE) is when you want to reuse a buffer for new text after hb_shape has been called.

Even then setting the buffer length to zero resets the content type to INVALID, so it works.

behdad · 2023-01-03T00:12:20Z

> Also hb_buffer_add() does not set the buffer type and it is on the user to set it.

Right. So that needs to be documented. And what Matthias wrote should be documented.

trbabb · 2023-01-03T02:19:04Z

Great, this is helpful.

Do I understand correctly that hb_buffer_add_*() (a) does change the state to UNICODE if the state is currently INVALID, but (b) does not change the state if the state is currently GLYPHS?
How about the content of the buffer? Is it reset to empty after changing the content type? (Or some other behavior?)
- If it is reset, is there any reason to change the content type rather than calling reset() or clear_contents()? What is the difference?
What happens if unicode is added without first resetting or changing the buffer type? How about for the other state access/modifiers if the current content type "disagrees"?

matthiasclasen · 2023-01-03T12:35:46Z

It would be excellent to write some tests for these state changes too

behdad · 2023-01-03T19:26:29Z

> Do I understand correctly that hb_buffer_add_*() (a) does change the state to UNICODE if the state is currently INVALID,

Correct.

> but (b) does not change the state if the state is currently GLYPHS?

It asserts that the state is not GLYPHS. Regardless, it then sets the state to UNICODE.

> How about the content of the buffer? Is it reset to empty after changing the content type? (Or some other behavior?)

No. The contents are not modified.

> What happens if unicode is added without first resetting or changing the buffer type? How about for the other state access/modifiers if the current content type "disagrees"?

We assert, so you get a crash if debugging is enabled. Otherwise, the functions set the type to what they expect it to be.

behdad · 2023-01-03T19:36:28Z

I wrote up the above discussion in:
e6bbf11

behdad closed this as completed in e6bbf11 Jan 3, 2023
