Request for documentation clarification #4000

Closed
trbabb opened this issue Jan 2, 2023 · 8 comments

@trbabb

trbabb commented Jan 2, 2023

I have a few questions about using the API that the documentation doesn't answer clearly.

Specifically, hb_buffers are stateful, but it's unclear how and when the state transitions, and which invariants are maintained.

From the documentation:

After adding text, the buffer should be set to HB_BUFFER_CONTENT_TYPE_UNICODE instead, which indicates that it contains un-shaped input characters

  • Does "should" mean "the user should expect this to have been done automatically by the library", or "the user should do this if they want it to work"?
    • If the former, why does it not say "will be set" instead of "should be set"? Are there some conditions where it will not be set after adding text? What are those conditions?
    • If the latter, why does the "simple example" not show this being done? Under what conditions is it acceptable not to perform the set operation?

If necessary you can set the content type with hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);

  • What effect does this have on the state? What are the contents of the buffer after this operation? Should I expect the buffer to contain nonsense, will it contain the original unicode string, or will the buffer essentially be reset()?
  • In general, what should the user expect to happen if they use unicode accessing / updating methods when the state is TYPE_GLYPHS, or glyph data when the state is TYPE_UNICODE?

I'd love to know the answers; even better if the docs are updated so future devs can benefit! Apologies if I missed anything that's already there. :)

@matthiasclasen
Collaborator

The wording is a bit unfortunate.

The automatic state transitions work like this:

HB_BUFFER_CONTENT_TYPE_INVALID --- hb_buffer_add_utf8/16/32 ---> HB_BUFFER_CONTENT_TYPE_UNICODE --- hb_shape ---> HB_BUFFER_CONTENT_TYPE_GLYPHS

The only case where you want to manually call hb_buffer_set_content_type (buffer, HB_BUFFER_CONTENT_TYPE_UNICODE) is when you want to reuse a buffer for new text after hb_shape has been called.
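The chain above can be sketched as a small state machine. This is a toy model in C, not HarfBuzz code: `toy_buffer`, `toy_add_text`, and `toy_shape` are hypothetical names standing in for an `hb_buffer_t`, the `hb_buffer_add_utf8/16/32` family, and `hb_shape`. It tracks only the content-type tag and a length, so it says nothing about what real shaping does to the buffer contents.

```c
#include <stddef.h>

/* Toy stand-ins for HB_BUFFER_CONTENT_TYPE_INVALID / _UNICODE / _GLYPHS.
   All names here are hypothetical and exist only for this sketch. */
typedef enum {
    TOY_CONTENT_INVALID,
    TOY_CONTENT_UNICODE,
    TOY_CONTENT_GLYPHS
} toy_content_type;

typedef struct {
    toy_content_type content_type;
    size_t length; /* number of items currently in the buffer */
} toy_buffer;

/* A fresh buffer is empty and has no content type yet. */
toy_buffer toy_buffer_create(void) {
    toy_buffer b = { TOY_CONTENT_INVALID, 0 };
    return b;
}

/* Models hb_buffer_add_utf8/16/32: adding text tags the buffer as UNICODE. */
void toy_add_text(toy_buffer *b, size_t n_chars) {
    b->length += n_chars;
    b->content_type = TOY_CONTENT_UNICODE;
}

/* Models hb_shape: the buffer is tagged as GLYPHS afterwards. */
void toy_shape(toy_buffer *b) {
    b->content_type = TOY_CONTENT_GLYPHS;
}
```

Starting from `toy_buffer_create()`, calling `toy_add_text` and then `toy_shape` walks the tag through INVALID, UNICODE, and GLYPHS without any manual content-type call, which is the automatic path described above.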

@khaledhosny
Collaborator

Also hb_buffer_add() does not set the buffer type and it is on the user to set it.

@behdad
Member

behdad commented Jan 3, 2023

The only case where you want to manually call hb_buffer_set_content_type (buffer, HB_BUFFER_CONTENT_TYPE_UNICODE) is when you want to reuse a buffer for new text after hb_shape has been called.

Even then, setting the buffer length to zero resets the content type to INVALID, so it works.
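A rough sketch of that reuse path, again as a toy model rather than HarfBuzz code. `reuse_buffer`, `reuse_truncate_to_zero`, and `reuse_add_text` are hypothetical stand-ins for an `hb_buffer_t`, a call that sets the buffer length to zero, and the `hb_buffer_add_utf8/16/32` functions. Only the behaviour stated in this thread is modelled.

```c
#include <stddef.h>

/* Hypothetical names for this sketch only; not part of any library. */
typedef enum {
    REUSE_INVALID,
    REUSE_UNICODE,
    REUSE_GLYPHS
} reuse_content_type;

typedef struct {
    reuse_content_type content_type;
    size_t length;
} reuse_buffer;

/* Models setting the buffer length to zero: the buffer becomes empty
   and, as stated above, its content type falls back to INVALID. */
void reuse_truncate_to_zero(reuse_buffer *b) {
    b->length = 0;
    b->content_type = REUSE_INVALID;
}

/* Models hb_buffer_add_utf8/16/32: adding text tags the buffer as UNICODE. */
void reuse_add_text(reuse_buffer *b, size_t n_chars) {
    b->length += n_chars;
    b->content_type = REUSE_UNICODE;
}
```

In this model, a buffer that is already tagged GLYPHS can be emptied with `reuse_truncate_to_zero` and refilled with `reuse_add_text`, ending up as UNICODE without the content type ever being set by hand.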

@behdad
Member

behdad commented Jan 3, 2023

Also hb_buffer_add() does not set the buffer type and it is on the user to set it.

Right. So that needs to be documented. And what Matthias wrote should be documented.

@trbabb
Author

trbabb commented Jan 3, 2023

Great, this is helpful.

  • Do I understand correctly that hb_buffer_add_*() (a) does change the state to UNICODE if the state is currently INVALID, but (b) does not change the state if the state is currently GLYPHS?
  • How about the content of the buffer? Is it reset to empty after changing the content type? (Or some other behavior?)
    • If it is reset, is there any reason to change the content type rather than calling reset() or clear_contents()? What is the difference?
  • What happens if unicode is added without first resetting or changing the buffer type? How about for the other state access/modifiers if the current content type "disagrees"?

@matthiasclasen
Collaborator

It would be excellent to write some tests for these state changes too.

@behdad
Member

behdad commented Jan 3, 2023

  • Do I understand correctly that hb_buffer_add_*() (a) does change the state to UNICODE if the state is currently INVALID,

Correct.

but (b) does not change the state if the state is currently GLYPHS?

It asserts that the state is not GLYPHS. Regardless, it then sets the state to UNICODE.

  • How about the content of the buffer? Is it reset to empty after changing the content type? (Or some other behavior?)

No. The contents are not modified.

  • What happens if unicode is added without first resetting or changing the buffer type? How about for the other state access/modifiers if the current content type "disagrees"?

We assert, so you get a crash if debugging is enabled. Otherwise, the functions set the type to what they expect it to be.
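The two behaviours described in these answers can be sketched with a toy model. This is not HarfBuzz code. `strict_buffer`, `strict_set_content_type`, and `strict_add_text` are hypothetical names. The `checks_enabled` flag is also an assumption of the sketch: a real assertion aborts the process, whereas the model returns -1 so that both paths can be observed from one program.

```c
#include <stddef.h>

/* Hypothetical names for this sketch only; not part of any library. */
typedef enum {
    STRICT_INVALID,
    STRICT_UNICODE,
    STRICT_GLYPHS
} strict_content_type;

typedef struct {
    strict_content_type content_type;
    size_t length;
} strict_buffer;

/* Models changing the content type by hand: only the tag changes,
   the contents (here just the length) are left untouched. */
void strict_set_content_type(strict_buffer *b, strict_content_type type) {
    b->content_type = type;
}

/* Models adding text to a buffer whose type may "disagree".
   checks_enabled != 0 plays the role of a build with assertions on:
   the call is rejected (-1) where the real assertion would crash.
   checks_enabled == 0 plays the role of a build without assertions:
   the type is forced to UNICODE and the text is appended (returns 0). */
int strict_add_text(strict_buffer *b, size_t n_chars, int checks_enabled) {
    if (checks_enabled && b->content_type == STRICT_GLYPHS)
        return -1;
    b->content_type = STRICT_UNICODE;
    b->length += n_chars;
    return 0;
}
```

With a buffer tagged GLYPHS, `strict_add_text(&b, 2, 1)` is rejected and leaves the buffer as it was, while `strict_add_text(&b, 2, 0)` retags it as UNICODE and appends to the existing contents. In the model that append mixes new characters with leftover shaped items, so emptying the buffer before reuse is the safer choice.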

@behdad behdad closed this as completed in e6bbf11 Jan 3, 2023
@behdad
Member

behdad commented Jan 3, 2023

I wrote up the above discussion in e6bbf11.
