Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement 'rand' #803

Closed
wants to merge 4 commits into from
Closed

Implement 'rand' #803

wants to merge 4 commits into from

Conversation

dscorbett
Copy link
Collaborator

This adds support for 'rand'. If the 'rand' feature appears in a font but is not explicitly set to a certain value, the feature value is randomized for each glyph. The algorithm is deterministic, seeded from a hash of the input glyphs’ indices. The entire buffer is unsafe to break. The reason to seed based on the whole buffer instead of a constant or user-specified seed is so that strings with the same initial characters do not have the same initial glyphs, which wouldn’t look very random.

This implementation is unoptimized and minimally viable. It probably shouldn’t be merged as is. The point of the pull request is to get some feedback so I don’t spend time optimizing something that doesn’t matter.

@behdad
Copy link
Member

behdad commented Feb 18, 2018

Interesting approach! Thanks. Will take a look.

@behdad
Copy link
Member

behdad commented Feb 19, 2018

The reason to seed based on the whole buffer instead of a constant or user-specified seed is so that strings with the same initial characters do not have the same initial glyphs, which wouldn’t look very random.

But this also means that when typing in a box, everything will change shape as one types...

Same starting words resulting in same shape is consistent with some definition of random, just not other definitions.

@dscorbett
Copy link
Collaborator Author

I removed the hashing of the buffer, so previous glyphs won’t change when typing in a box. This also removes the one inefficiency I was aware of, that it hashed the buffer for each 'rand' lookup instead of just once.

Now that randomization does not depend on all the glyphs, it is not necessarily unsafe to break the whole buffer. It is only unsafe to break from the first randomized glyph to the last randomized glyph. I figured a random font would probably randomize most glyphs, so I kept the call to unsafe_to_break_all, as greater precision wasn’t worth the bookkeeping complexity.

@behdad
Copy link
Member

behdad commented Feb 23, 2018

Thanks. I'm busy reviewing the hb-subset changes. Please remind me to review this in a few weeks. Cheers

@khaledhosny
Copy link
Collaborator

I can now use https://github.com/khaledhosny/punk-otf 🎉

punknova-regular

@behdad
Copy link
Member

behdad commented Feb 23, 2018

Woohoo!

@dscorbett
Copy link
Collaborator Author

Please remind me to review this in a few weeks.

@behdad

@behdad
Copy link
Member

behdad commented Apr 13, 2018

I reviewed the code. Looks great!

Let's add some API on the buffer so user can configure if entire buffer should set the state or not, also a way to set and get the random state such that higher level can do smart things across lines, etc. Maybe also way to set the rng function.

I'll finish and merge in a few days.

@ebraminio
Copy link
Collaborator

I'll finish and merge in a few days.

So, a friendly ping :)

@behdad
Copy link
Member

behdad commented Jul 23, 2018

I like to get this in. Any chance you can rebase and resolve the conflict? Thanks.

@dscorbett
Copy link
Collaborator Author

Done.

Randomization only happens by default. If the user specifies a value for
'rand', that value is respected.
@behdad
Copy link
Member

behdad commented Sep 10, 2018

I'm trying to merge this. I don't understand the default_rand thing in hb-ot-shape.cc

@behdad
Copy link
Member

behdad commented Sep 10, 2018

Actually, I think we need a better way to specify how random interacts with user-specified values.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

I'm trying to merge this. I don't understand the default_rand thing in hb-ot-shape.cc

You want that to allow user-specified value to override random behavior. It doesn't work though, if user specified rand feature value for part of the input only. We need to figure out how to handle that. For now, I'm making it ignore user-value, only allow disabling random, not setting specific value.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

Also missing is API to set buffer random state, and query it, such that client can chain it.

Humm. Right now, using the same buffer, calling shape continuously produces different results, right?

@behdad
Copy link
Member

behdad commented Sep 10, 2018

Humm. Right now, using the same buffer, calling shape continuously produces different results, right?

Or guess not.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

We should add random-seed setter/getter to hb-buffer and use it to initialize the random_state in gsubgpos.hh. I'm not sure how to query the buffer state after shaping.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

I pushed my modified branch into https://github.com/harfbuzz/harfbuzz/tree/rand

@behdad
Copy link
Member

behdad commented Sep 10, 2018

Okay, I made it respect user value if set to > 1. Setting to 1 means "randomize". I don't like the exception, but that's better than before.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

Also, maybe we should add a setting to buffer to update the random seed from random state after hb_shape call. That way, one will get different results from subsequent calls to shape.

@dscorbett
Copy link
Collaborator Author

How about making UINT_MAX be the special value to enable randomness? That way explicitly setting 'rand' to 1 would work as expected.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

How about making UINT_MAX be the special value to enable randomness? That way explicitly setting 'rand' to 1 would work as expected.

But then user would need to enter, on the command line: --features=rand=2147483647. The reason 1 is special, is that that's what gets passed down when user does --features=rand.

Umm. Ok, that's a crappy argument, since random is default-on. Problem with UINT_MAX is that it would consume all our bits. Wouldn't even fit. Can use 255 for that. Would consume 8 bits. But yeah, I like using a large number. Let me do.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

Another thing that would be nice / cool is, when user specifies a value, feed that value back into random state. Such that if user changes glyph directly, the glyphs after that also change. Or is that a bad idea? Guess one can say it's a bad idea.

behdad added a commit that referenced this pull request Sep 10, 2018
Use rand=255 to mean "randomize".

Part of #803
@behdad
Copy link
Member

behdad commented Sep 10, 2018

Ok, implemented rand=255. Like that. Thanks for the idea!

@behdad
Copy link
Member

behdad commented Sep 10, 2018

Ok so let's talk about buffer random seed api:

Are we ok with unsigned int? We don't quite have uint64_t in the headers.

Also do we need a 64-bit random algorithm? Or can we downgrade to 32 bits?

void hb_buffer_set_random_seed (buffer, unsigned int);
unsigned int hb_buffer_get_random_seed (buffer);

Then, how to get random state?

unsigned int hb_buffer_get_random_state (buffer);

?

How to make it loop back the state into seed?

hb_buffer_set_randomness (...);

?
takes an enum?

Not sure. Need ideas.

@dscorbett
Copy link
Collaborator Author

A 32-bit state would be sufficient, but I don’t recommend using unsigned int: it has a variable size, so the algorithm would produce different results on different platforms. It would be nice to keep it consistent for debuggability. HarfBuzz currently statically asserts that an unsigned int is 32 bits but the API would be clearer if it didn’t rely on that detail.

A really ambitious idea would be to leave it to the caller to set the randomization function as a callback and to determine the type of the random state, which would have to be passed around as an opaque hb_random_state_t.

Another idea is to provide an option to randomly seed based on the entire buffer, as I originally had it, which would, I think, give better results for non-interactive text buffers.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

A 32-bit state would be sufficient, but I don’t recommend using unsigned int: it has a variable size, so the algorithm would produce different results on different platforms. It would be nice to keep it consistent for debuggability. HarfBuzz currently statically asserts that an unsigned int is 32 bits but the API would be clearer if it didn’t rely on that detail.

Right. Internally we would keep it uint32_t. In the API we assume unsigned int is uint32_t anyway.

A really ambitious idea would be to leave it to the caller to set the randomization function as a callback and to determine the type of the random state, which would have to be passed around as an opaque hb_random_state_t.

Right. Might be over-engineering.

Another idea is to provide an option to randomly seed based on the entire buffer, as I originally had it, which would, I think, give better results for non-interactive text buffers.

Yes I think I like that as an option as well.

I was thinking today about my future task to add lookup-direction flags. With those, one can make the random lookup work backwards, which would make it process back to front, so strings ending the same will get same ending glyphs... But whole-buffer is also useful, I agree.

@dscorbett
Copy link
Collaborator Author

Might be over-engineering.

It probably is. I guess it depends on the use case for the API to get and set the random state, which I’m not clear on.

@behdad
Copy link
Member

behdad commented Sep 10, 2018

Might be over-engineering.

It probably is.

I think that's overengineering because that forces every client to have to set it up. Or if there would be a default, we don't have a compelling reason why a client might want to change the default RNG algorithm.

I guess it depends on the use case for the API to get and set the random state, which I’m not clear on.

That's different. That one is such that a client can cascade the random state from one text run to the next, such that a full document gets randomless flowing. Ie. if you have five lines, each having the same text, each of them look different.

behdad added a commit that referenced this pull request Sep 11, 2018
Use rand=255 to mean "randomize".

Part of #803
@behdad
Copy link
Member

behdad commented Sep 11, 2018

Merged!

Let's switch to 32bit RNG internally.

@behdad
Copy link
Member

behdad commented Sep 11, 2018

Let's switch to 32bit RNG internally.

Done.

@khaledhosny
Copy link
Collaborator

I’m adding a rand feature to an Arabic font, where many glyphs have alternate forms, but I noticed that the rand lookup is executed first regardless of its order (which I believe is a result of 71c9f84, where it would have been executed last before this).

Now this breaks my font because if an isolated glyph has an alternate, init/medi etc. features will not affect the alternate glyph and shaping will break. I can work around this in some laborious way, but I’m wondering if we should move the rand lookups after those of other default features, e.g. with:

diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index 00ecdfab..a7cf6433 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -332,9 +332,6 @@ hb_ot_shape_collect_features (hb_ot_shape_planner_t          *planner,
   map->add_feature (HB_TAG ('d','n','o','m'));
 #endif
 
-  /* Random! */
-  map->enable_feature (HB_TAG ('r','a','n','d'), F_RANDOM, HB_OT_MAP_MAX_VALUE);
-
 #ifndef HB_NO_AAT_SHAPE
   /* Tracking.  We enable dummy feature here just to allow disabling
    * AAT 'trak' table using features.
@@ -364,6 +361,9 @@ hb_ot_shape_collect_features (hb_ot_shape_planner_t          *planner,
     map->enable_feature (HB_TAG ('v','e','r','t'), F_GLOBAL_SEARCH);
   }
 
+  /* Random! */
+  map->enable_feature (HB_TAG ('r','a','n','d'), F_RANDOM, HB_OT_MAP_MAX_VALUE);
+
   for (unsigned int i = 0; i < num_user_features; i++)
   {
     const hb_feature_t *feature = &user_features[i];

or even better, respect the lookup order (not sure how to do that, though).

@behdad
Copy link
Member

behdad commented Mar 11, 2020

Humm. I don't think that change has anything to do with it. It's simply a side-effect of we features in stages. In particular, to match Uniscribe, we pause before each of init, medi, isol, fina... So rand can be applied either before, or after, unless we come up with a completely different scheme.

This is a common problem; same applies to rvrn etc.

@khaledhosny
Copy link
Collaborator

I see. So no way to make it respect lookup order i.e. execute rand lookups based on their order in the lookup list?

@behdad
Copy link
Member

behdad commented Apr 22, 2020

I see. So no way to make it respect lookup order i.e. execute rand lookups based on their order in the lookup list?

We do things in stages and within each stage by lookup-order. So the final order of lookups is not sorted. How would you fit rand in there? You end up putting in either the first or last stage and sort within that...

Same applies to features like rvrn. Some designers like to do their mapping before everything else. Some after. We can't have both, so spec put it first.

@khaledhosny
Copy link
Collaborator

I overlooked the stages bit. All fine then. I ended up not adding rand to that font after all, but if I got to do it for another font I think I’ll randomize the isolated glyphs (even if some look identical) and map these to different positional glyphs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants