Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IE-0019: Unicode command parser support #19

Merged
merged 1 commit into from
May 16, 2023

Conversation

zedlopez
Copy link
Collaborator

@zedlopez zedlopez commented May 16, 2023

  • Proposal: IE-0019
  • Discussion PR link: #19
  • Authors: Andrew Plotkin
  • Language feature name: --
  • Status: Accepted in principle
  • Related proposals: IE-0005
  • Implementation: In progress

Summary

Update the internal processing of the command parser to use word arrays (when compiling to Glulx) instead of byte arrays.

@zedlopez zedlopez merged commit e40df19 into ganelson:main May 16, 2023
@erkyrath
Copy link
Collaborator

Of course, as soon as that went in, I discovered more work to do!

Okay, let me back up to the current status of my branch. This much works:

  • The code Parser.i6t functions have been updated, including OOPS, UNDO, and snippets.
  • All the kit functions listed in the doc have been updated, including parsing of number, real, and time tokens.

Problems still untackled:

  • Parsing of units. This parsing code is generated by I7 (see Literal Patterns.w). Looks like a headache.
  • CPrintOrRun() (in Printing.i6t) should call glk_buffer_to_title_case_uni() rather than VM_LowerToUpperCase(). Last time I tried this (a decade ago), the Mac IDE interpreter was missing glk_buffer_to_title_case_uni(). I haven't checked back to see if that's fixed.
  • If you try to compile an example to C, the compiled game fails with "Unimplemented Glk selector: 321". This is glk_request_line_event_uni(). I imagine other glk_xxx_uni() functions are also missing.

Also, the DICT_WORD_SIZE constant is currently hardwired to 9 (the default) in Definitions.i6t. This ought to be user-adjustable, but it's not. If you try the shiny new syntax:

Use Inform 6 compiler option "$DICT_WORD_SIZE=10";

...then Inter complains at you:

Problem: found a second definition of the name 'DICT_WORD_SIZE' when loading '/main/BasicInformKit'

I'm not sure how to deal with that. The problem is that some other constants (DICT_ENTRY_BYTES, #dict_par1, #dict_par2) are computed from DICT_WORD_SIZE, so Inter needs to understand the value.

(Or does it? #dict_par1, #dict_par2 are weird cases -- they're really only defined here to placate Inter. The values defined in Definitions.i6t never make it through to auto.inf. I suppose they get used in the C compilation path, but as noted above, that doesn't work yet.)

@zedlopez
Copy link
Collaborator Author

Given that it's what the Glk spec recommends for unicode normalization and what the Unicode Parser extension currently does, I assume your plan is to apply glk_buffer_to_lower_case_uni, then glk_buffer_canon_normalize_uni, Andrew?

In Unicode Parser, that happens after tokenization. Should it maybe happen before it?

WI 5.10 currently says:

As it reads in the text, Inform silently converts all kinds of dash (en-rules, em-rules, etc.) to simple hyphens; converts the multiplication symbol to a lower case "x"; converts all kinds of space other than tabs (em-spaces, non-breaking spaces, etc.) to simple spaces, and all kinds of quotation marks to "straight" (non-smart) marks.

...so I'll just note that it'll need revision as part of this.

@erkyrath
Copy link
Collaborator

That recommendation is for player input. My comment about CPrintOrRun() is about case-titling a string for printing; it's not the same situation.

In Unicode Parser, that [lowercase/normalize] happens after tokenization. Should it maybe happen before it?

I hadn't really thought about it. I just dropped it in the same place the old parser did lower-casing.

I don't think it makes a difference. The lowercase/normalize operation shouldn't change whitespace to non-whitespace or vice versa.

WI 5.10 currently says:

Is that paragraph changing? I didn't expect to change anything about how I7 parses code.

@ganelson
Copy link
Owner

The text in WI 5.10 basically documents how the Inform compiler reads source text, which is a different thing. See the last couple of paras of:

https://ganelson.github.io/inweb/foundation-module/4-tf.html

@ganelson
Copy link
Owner

Also, the DICT_WORD_SIZE constant is currently hardwired to 9 (the default) in Definitions.i6t. This ought to be user-adjustable, but it's not. If you try the shiny new syntax:

Use Inform 6 compiler option "$DICT_WORD_SIZE=10";

...then Inter complains at you:

Problem: found a second definition of the name 'DICT_WORD_SIZE' when loading '/main/BasicInformKit'

I can't replicate that. Is this because you've defined DICT_WORD_SIZE in your copy of Definitions.i6t? On the main branch it's not defined there. Is the deal that you need to be able to use this value in kit code?

@erkyrath
Copy link
Collaborator

Is this because you've defined DICT_WORD_SIZE in your copy of Definitions.i6t?

Yes.

I haven't merged your changes from today into my branch yet. I'll try that tonight.

@zedlopez
Copy link
Collaborator Author

That recommendation is for player input. My comment about CPrintOrRun() is about case-titling a string for printing; it's not the same situation.

Sorry, I should have made clearer that my comment was a tangent regarding player input in the context of IE-0019 and not a direct response to your comment.

The text in WI 5.10 basically documents how the Inform compiler reads source text

ah, okay.

@erkyrath
Copy link
Collaborator

Progress: I've fixed unit parsing, as well as a handful of errors that were turned up by the test suite.

All G/Z test cases now pass in my branch.

@erkyrath
Copy link
Collaborator

Progress: All the C test cases now pass too.

... the Mac IDE interpreter was missing glk_buffer_to_title_case_uni()

I think I remembered this wrong. The Mac IDE interpreter does have this function, but it crashes if the last argument (lowerrest) is zero. I'll file that separately.

@curiousdannii curiousdannii added the formal-proposal A formal proposal that has been accepted for consideration by the core Inform team label Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formal-proposal A formal proposal that has been accepted for consideration by the core Inform team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants