Combine emphasis opcodes #99
Sounds like a brilliant idea, at least from the perspective of table and documentation maintenance. How would it be code wise? How do you define different behaviour for |
Cool. Well yeah, there should be different behavior of course otherwise it's not very useful. I haven't looked into the code yet so I don't know how easy this change would be. Maybe @MikeGray-APH can give us a clue? |
Yes OK, different behaviour. But then how do you define the behaviour of the following:
|
Oh I see what you mean, I think. The behavior would still be defined with the opcodes. In that sense nothing changes with the way things are now. The only difference is the way we define it in the tables. Does that answer your question? |
In my code, capitals are treated the same as the other emphases, except that it will process word resets. |
@MikeGray-APH Yes I know. That's why it's probably best to have a predefined class "caps" with slightly different behavior. |
@bertfrees wrote:
I think this should be enclosed in the class definitions themselves, e.g. @MikeGray-APH wrote:
Ideally there should be an opcode for this, maybe |
OK, I see where you're coming from, but the two cases are fundamentally different. The problem with using the "class" opcode and $w, $x, etc. in multipass rules is that such tables can't just be included in any table, because the number of "class" rules that are defined before the include must be known. Tables with "emphclass" definitions, the way I propose it, can be included in other tables without a problem. A possible issue with my proposal, you might say, is that you cannot guarantee that the included tables, and therefore the behavior of your table, will not change. But you could make the same argument for any rule, including your version of the "emphclass" rule. The only possible solution to this problem is to say: the behavior of a table is 100% the responsibility of the table author, and this includes any tables that he wishes to include. Carefully testing your table is what you need. One thing that your proposal has that mine doesn't is that you could override the order of classes, but I don't immediately see any need for that. |
OK, but either solution will completely change typeform handling for external applications. These applications will at least want to know which classes a table has defined and how to use them. Using them can either be done numerically, as with the current typeform implementation, or using a class name. The latter causes a lot of overhead for longer strings. And using numbers requires there to be a lookup function. Whichever approach you choose, this is going to break every application using this feature of liblouis. Therefore I'm thinking it might be useful to predefine {italic, bold, under} so they are guaranteed to keep their current bits. Another question: how does the |
The usage will stay the same, i.e. numeric. The difference is that now the emphasis classes are defined and documented per table. A lookup function is not strictly necessary but could be useful, yes.

You have a good point regarding the possibility of breaking how applications currently use liblouis. Yes, I do want typeform handling by external applications to change in the long run. And even though things won't break immediately (because we'll make sure the behavior doesn't change initially by running the existing tables through our own conversion tool, and we'll give applications some time to adapt to the new approach), there is of course still the risk that applications are lazy and will break eventually. We could anticipate that by reserving some bits for bold, italic and underlined (or by having a lookup function) and by requiring tables to support at least those 3 classes. I have an idea for handling this in a way that doesn't force table authors to think in the "old" pattern. But first let me explain the new approach and why we need it (in case not everybody is convinced yet).

Let's start by saying that "italic", "bold", "underline" etc. are print artifacts, i.e. properties of a font. During transcription these are mapped to braille artifacts (indicators). How that mapping is done depends on language and possibly context (e.g. depending on what types of emphasis appear in a text). Sometimes the braille artifacts have the same name as a print artifact, sometimes not. Up till now liblouis has handled the problem by providing, through a liblouis table, a mapping between the 3 most common emphasis types and a set of indicators. This simple model is limiting in several ways:
For applications that don't do any special handling per language ("braille code agnostic"), this is an acceptable generic solution, provided that the liblouis tables implement the mapping as well as possible. For emphasis beyond bold, italic and underlined, the best an application can do is to either map it to the type it is most similar to, or ignore it. It is clear that this is not an optimal solution for all braille codes and all input. But trying to handle everything is not in the scope of liblouis either:
This means that applications that use liblouis have the responsibility of doing language specific handling anyway, and therefore it's acceptable that the liblouis interface differs between tables. What I like so much about this idea is that it doesn't force the table author into a certain pattern. He can freely choose the interface and how much of the mapping he implements in the table. The interface can be a list of distinct indicators (i.e. braille artifacts, e.g. "ind1", "ind2", etc.), or it can be a list of print artifacts, some of which may map to the same indicators. Or it can be a mixture. To better support multiple emphasis types mapping to the same indicators, without having to duplicate a lot, I had this idea of emphasis "aliases". It could look something like this:
The exact syntax is not so important. What matters is that tables can easily provide a mapping for bold, italic and underlined, ensuring backwards compatibility, while not being stuck with the old approach. |
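To make the alias idea concrete, here is a minimal Python sketch of how aliases might be resolved to the classes that actually carry the indicators. The names `ind1`, `ind2`, the alias mapping and the function `resolve_aliases` are all hypothetical; the thread does not fix a syntax, so this only models the behavior, not any liblouis table format.

```python
def resolve_aliases(classes, aliases):
    """Map every name (class or alias) to the class that carries the indicators.

    classes: names defined as real emphasis classes (hypothetical, e.g. "ind1")
    aliases: dict of alias name -> class name or another alias
    """
    resolved = {name: name for name in classes}
    for alias in aliases:
        target = alias
        seen = set()
        # Follow alias chains until a real class is reached.
        while target in aliases:
            if target in seen:
                raise ValueError("alias cycle involving " + alias)
            seen.add(target)
            target = aliases[target]
        if target not in classes:
            raise ValueError(alias + " points at undefined class " + target)
        resolved[alias] = target
    return resolved


# A table that exposes two distinct indicators and maps the three
# traditional print artifacts onto them (assumed example data).
classes = ["ind1", "ind2"]
aliases = {"bold": "ind1", "ital": "ind1", "under": "ind2", "strong": "bold"}
resolved = resolve_aliases(classes, aliases)
```

With such a mapping a table could keep answering requests for bold, ital and under while defining its indicators only once.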
The alternative, unless I misunderstand the concept, is that application developers look at the classes a table defines and then hard-code them. This will break if a table is later updated with different numbers assigned to these classes. While table authors should avoid such backwards-incompatible changes, I think we should anticipate this by providing a lookup function. This is also required for applications that allow the user to load arbitrary "custom tables". Even if the table behavior isn't changed there's already one incompatibility: the change from Question: what are Another idea I had for preventing duplication was something like this: I.e. the ability to define virtual dot patterns. The same could probably be achieved by assigning a virtual dot, say
|
Yes, that was the idea. Applications would look at the "table API". For every change to a table a note is made in the changelog, so applications that do language specific handling (i.e. use more than ital, bold, under) can adapt themselves with each update. Of course table authors should try to make as few backward-incompatible changes as possible, just like with any other software component. A lookup function can indeed make this more robust, although it's only really helpful when the order of classes changes (and why would you need to do that?). For applications that allow the user to load custom tables it may be best to rely on ital, bold and under only. If they want more, the custom tables should probably follow a well-defined standard anyway, which could possibly include a fixed order of classes. But again, a lookup function could be convenient here. So it's an idea worth considering. Because of the change from char to unsigned short we'll change the version to 3.0.
|
Your idea about "dot pattern aliases" could remove some extra duplication, yes. I need to think about it. I guess I would make it something more general than There are 6 virtual dots by the way (9, a, b, c, d and e), so the number of virtual dot patterns is (2^6 - 1) * 2^8 = 16128. If you want to work out this idea some more, please make a new issue for it (as it's not directly related to the opcode unification). |
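The count of virtual dot patterns quoted above can be checked by brute force. The sketch below assumes that a "virtual dot pattern" means any combination of the 8 real dots and the 6 virtual dots that contains at least one virtual dot; that reading is an interpretation of the formula, not something the thread spells out.

```python
from itertools import combinations

REAL_DOTS = "12345678"    # the eight ordinary braille dots
VIRTUAL_DOTS = "9abcde"   # the six virtual dots named in the comment


def count_virtual_patterns():
    """Count dot combinations that include at least one virtual dot."""
    all_dots = REAL_DOTS + VIRTUAL_DOTS
    count = 0
    for size in range(len(all_dots) + 1):
        for pattern in combinations(all_dots, size):
            if any(dot in VIRTUAL_DOTS for dot in pattern):
                count += 1
    return count


total = count_virtual_patterns()
```

The enumeration agrees with the closed form: each of the 2^6 - 1 non-empty virtual dot subsets can be paired with any of the 2^8 subsets of real dots.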
I agree, but how are applications supposed to know which bit corresponds with which typeform? A custom table could define ital=1 and bold=2, but it could just as easily define ital=32 and bold=1. The only way around this is to hard-code these three classes with their current values. But this kind of voids the problem dynamic classes are trying to fix. |
Yep, either reserve some bits, or have a lookup function. Or what I said earlier about custom tables following a well-defined standard. The contract could simply say that it is illegal to define ital as 32 and bold as 1. The first option, reserving some bits, doesn't necessarily conflict with dynamic classes IMO. We need dynamic classes, not dynamic bits. Besides, we'll need to reserve the bit for computer_braille anyway. What matters is that there is a whole range of bits available (maybe starting at bit 5) that can be filled in freely. |
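The combination of reserved bits, freely assignable bits and a lookup function could be modeled roughly as follows. This is only a sketch: the bit values in `RESERVED`, the 16-bit width (taken from the "unsigned short" mentioned earlier in the thread) and the function names `build_class_map` and `lookup` are assumptions for illustration, not values or APIs taken from liblouis.

```python
# Illustrative values only; not taken from the liblouis headers.
RESERVED = {
    "ital": 1 << 0,
    "bold": 1 << 1,
    "under": 1 << 2,
    "computer_braille": 1 << 3,
}
FIRST_FREE_BIT = 4  # zero-based, i.e. "bit 5" when counting from 1


def build_class_map(defined_classes, width=16):
    """Give reserved classes their fixed bit and hand out free bits to the rest."""
    mapping = {}
    next_bit = FIRST_FREE_BIT
    for name in defined_classes:
        if name in mapping:
            continue  # a repeated definition keeps its first bit
        if name in RESERVED:
            mapping[name] = RESERVED[name]
        else:
            if next_bit >= width:
                raise ValueError("no free typeform bits left for " + name)
            mapping[name] = 1 << next_bit
            next_bit += 1
    return mapping


def lookup(mapping, name):
    """Return the typeform bit for a class name, or 0 if the table lacks it."""
    return mapping.get(name, 0)


# A custom table may define its classes in any order; the reserved ones
# still end up on their fixed bits.
class_map = build_class_map(["trans_note", "bold", "ital"])
```

An application that only knows ital, bold and under can rely on the fixed bits, while one that wants more can ask by name instead of hard-coding numbers.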
as explained in #99

Note that the unification has been done completely on the level of compilation. The result of the compilation step is exactly the same as before and nothing has changed in the steps following compilation.

To do:
- Rename "ital", "bold", etc. to the generic "emph_1", "emph_2", etc. everywhere in the code and API. Emphasis classes can be named freely in the translation tables and the code and API should reflect that.
- Remove an indirection (see comment in compileTranslationTable.c#L4144)
I think we can safely close this issue as this has been implemented |
The lookup function has been added in 511d91e. |
A possible enhancement to the new opcodes introduced in issue #50 could be to combine certain groups of opcodes into a single opcode. E.g. {firstwordital, firstwordbold, firstwordunder, ...} are combined into a single opcode named firstwordemph, which is followed by a class argument {ital, bold, under, ...}. This could improve the readability and maintainability of both tables and code. Also it could make it less UEB specific.

So we're talking about these opcodes:
There are 9 × 10 = 90 possible combinations and each of them has its own opcode. The idea is to replace this with:
The class argument must be an "emphasis class" previously defined with a rule of the form emphclass <name>. Possibly ital, bold and under could be predefined emphasis classes, but the possible classes are not limited to those values, nor are they limited to the list of 10 values that we have now. caps and script may have to be treated separately.

The difference with the current situation is that there are only 9 opcodes, that the names can be freely chosen, and that the possible classes are not limited.
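As a rough illustration of the rewrite, the Python sketch below converts per-class opcodes into the combined form. Only the firstword prefix is spelled out in this issue, so `PREFIXES` contains just that one entry; the dot patterns in the example rules are placeholders, and `unify_rule` and `unify_table` are hypothetical helper names rather than part of any existing tool.

```python
CLASSES = ("ital", "bold", "under")  # the three classes named in the proposal
PREFIXES = ("firstword",)            # only prefix spelled out here; extend as needed


def unify_rule(line):
    """Rewrite e.g. 'firstwordital 46-3' to 'firstwordemph ital 46-3'."""
    fields = line.split()
    if not fields:
        return line
    opcode = fields[0]
    for prefix in PREFIXES:
        for cls in CLASSES:
            if opcode == prefix + cls:
                return " ".join([prefix + "emph", cls] + fields[1:])
    return line  # not a per-class emphasis opcode: leave untouched


def unify_table(lines):
    """Emit 'emphclass' definitions first, then the rewritten rules."""
    header = ["emphclass " + cls for cls in CLASSES]
    return header + [unify_rule(line) for line in lines]


# Placeholder rules; the dot patterns are made up for the example.
old_rules = ["firstwordital 46-3", "firstwordbold 456-3", "include other.ctb"]
new_rules = unify_table(old_rules)
```

Because the class becomes an argument, supporting a new class means adding one emphclass line instead of a new opcode for every position.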
The order of class definitions determines how typeform bits are mapped to the classes. The class order could also be documented in the table header.
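A minimal sketch of how definition order could translate into typeform bits is shown below. It assumes that the first defined class gets the lowest bit and each later class the next one; the issue only says that order determines the mapping, so the exact assignment and the helper name `assign_typeform_bits` are assumptions.

```python
def assign_typeform_bits(table_lines):
    """Map each emphasis class name to a typeform bit, in definition order."""
    bits = {}
    for line in table_lines:
        fields = line.split()
        # Only rules of the form "emphclass <name>" define a class.
        if len(fields) == 2 and fields[0] == "emphclass":
            name = fields[1]
            if name not in bits:
                bits[name] = 1 << len(bits)
    return bits


table = [
    "# hypothetical table header: classes are ital, under, bold (in that order)",
    "emphclass ital",
    "emphclass under",
    "emphclass bold",
]
class_bits = assign_typeform_bits(table)
```

Under this assumption, reordering the emphclass lines changes the numbers an application has to pass, which is why documenting the order in the table header matters.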