newcharmap documentation #558

Closed
aaaaaa123456789 opened this issue Aug 26, 2020 · 7 comments · Fixed by #767
Labels
bug Unexpected behavior / crashes; to be fixed ASAP! docs This affects the documentation (web-specific issues go to rgbds-www) good first issue New to the codebase? You can help no problem!

Comments

@aaaaaa123456789
Member

The documentation for newcharmap states that it creates a new, empty charmap. However, this charmap is most certainly not empty:

	newcharmap test
	charmap "a", 1

SECTION "Test", ROM0[0]
Test:
	db "b"

Ideally, there should be a way to create an actually empty charmap, so that the above gives an error. But if this is not possible, at least the documentation should be updated.

@ISSOtm ISSOtm added bug Unexpected behavior / crashes; to be fixed ASAP! docs This affects the documentation (web-specific issues go to rgbds-www) labels Sep 7, 2020
@ISSOtm
Member

ISSOtm commented Sep 7, 2020

The behavior is that all characters must translate to something. When would you need another behavior?

@aaaaaa123456789
Member Author

I'd expect (and until I tested this, I thought this is how it worked) undefined characters to give an error, as a way of preventing non-existent characters (such as random Unicode lookalikes) from making it into games.
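
The kind of check described here can be sketched outside the assembler. This is only an illustration in Python, not rgbasm behavior: `unmapped_characters` is a hypothetical helper, and the charmap is modeled as a plain dictionary with single-character keys.

```python
def unmapped_characters(text, charmap):
    """Return the characters of `text` that have no charmap entry,
    in order of first appearance (hypothetical lint helper)."""
    missing = []
    for ch in text:
        if ch not in charmap and ch not in missing:
            missing.append(ch)
    return missing


# Assumed example charmap; the values are arbitrary byte codes.
example_charmap = {"a": 1, "b": 2}

# "\u0430" is CYRILLIC SMALL LETTER A, which looks like Latin "a"
# but is a different character, so it is reported as unmapped.
lookalike_report = unmapped_characters("ab\u0430", example_charmap)
print(lookalike_report)
```

A strict charmap mode would turn a non-empty report like this into an assembly error instead of letting the bytes through.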

@ISSOtm
Member

ISSOtm commented Sep 20, 2020

Well then, I think documentation should state that whenever a character is not found in a charmap, it's simply copied as-is in the ROM. That way, the behavior of an empty charmap would make sense.
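
A rough model of that pass-through rule, written in Python for illustration. It is a simplified sketch and not how rgbasm implements charmaps: `encode_with_charmap` is a hypothetical name, the charmap is a dictionary from non-empty strings to single byte values, and the longest matching key wins.

```python
def encode_with_charmap(text, charmap):
    """Translate `text` to bytes: mapped sequences become their charmap
    value, anything else is copied as its UTF-8 bytes (simplified model)."""
    out = bytearray()
    # Try longer keys first so multi-character mappings take priority.
    keys = sorted((k for k in charmap if k), key=len, reverse=True)
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(charmap[key])  # value must fit in one byte
                i += len(key)
                break
        else:
            # No mapping found: copy the character's encoded bytes as-is.
            out.extend(text[i].encode("utf-8"))
            i += 1
    return bytes(out)


# Mirrors the issue's example: only "a" is mapped, yet "b" still assembles.
test_charmap = {"a": 1}
print(encode_with_charmap("b", test_charmap))   # the unmapped "b" passes through
print(encode_with_charmap("ab", test_charmap))  # "a" is translated, "b" is copied
```

Under this model an empty charmap is simply one where every character takes the fallback branch, which is why `db "b"` in the original report does not fail.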

@ISSOtm ISSOtm added good first issue New to the codebase? You can help no problem! hacktoberfest labels Sep 22, 2020
@Rangi42
Contributor

Rangi42 commented Dec 31, 2020

Looking at lexer.c, it appears that:

  • Source code must be encoded in UTF-8, or in ASCII (a compatible 7-bit subset of UTF-8)
  • Unless the current character map says otherwise, bytes are copied into the ROM as-is (as they're encoded in the source code file). So db "ña" means db $c3, $b1, $61, since ñ is U+00F1 which is UTF-8 encoded c3 b1 (and a is U+0061, 61 in UTF-8 and ASCII).
  • So if the source code has a different encoding (one that's still ASCII-compatible enough for the keywords and opcodes to work), non-ASCII characters might get inserted differently.
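
The byte values quoted above can be checked independently of the assembler. A small Python sketch (the variable names are just for illustration) that encodes the same two characters under UTF-8 and, for comparison, ISO-8859-1:

```python
text = "\u00f1a"  # "ña", written with an escape so this file's own encoding does not matter

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# UTF-8 needs two bytes for U+00F1 and one for U+0061.
print(utf8_bytes.hex(" "))    # c3 b1 61
# ISO-8859-1 stores U+00F1 as the single byte f1.
print(latin1_bytes.hex(" "))  # f1 61
```

The same string therefore produces different bytes depending on how the source file is saved, which is the concern raised in the last bullet.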

@ISSOtm
Member

ISSOtm commented Jan 1, 2021

Frankly, strings can use a different encoding than the rest of the document; I don't think we should try to handle those documents, as it's out of spec anyway.

@aaaaaa123456789
Member Author

Is there anything that cares about string encoding? Should there be?
As long as the encoding is an ASCII superset (so the lexer can find important characters like quotes and backslashes), is there any harm to treating strings like simple binary blobs? Charmaps are binary substitutions, after all.

@Rangi42
Contributor

Rangi42 commented Jan 2, 2021

Whatever it is, the documentation needs to be accurate. Right now it says "By default, a character map contains ASCII encoding", which is incorrect. A default character map is empty, and string characters not in the charmap are inserted directly as their encoded bytes. That encoding is expected/allowed to be UTF-8, not ASCII (although of course ASCII is a subset of it).

> As long as the encoding is an ASCII superset (so the lexer can find important characters like quotes and backslashes), is there any harm to treating strings like simple binary blobs?

That's what I was trying to hedge against in the wording, but the current behavior actually doesn't allow that. db "ña" in an ISO-8859-1 file gives an "Input string is not valid UTF-8!" error.
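
Why the ISO-8859-1 bytes get rejected can be shown with any UTF-8 validator. The sketch below uses Python's built-in decoder as a stand-in, so it illustrates the encoding rule rather than rgbasm's own check; `is_valid_utf8` is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "ña" as it would be stored in an ISO-8859-1 file: the bytes f1 61.
latin1_source = "\u00f1a".encode("iso-8859-1")
# The same text stored as UTF-8: the bytes c3 b1 61.
utf8_source = "\u00f1a".encode("utf-8")

# In UTF-8, f1 starts a four-byte sequence, and 61 is not a continuation byte.
print(is_valid_utf8(latin1_source))  # False
print(is_valid_utf8(utf8_source))    # True
```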

dannye added a commit to dannye/rgbds that referenced this issue Feb 27, 2021
dannye added a commit to dannye/rgbds that referenced this issue Feb 28, 2021
@Rangi42 Rangi42 added bug Unexpected behavior / crashes; to be fixed ASAP! and removed bug Unexpected behavior / crashes; to be fixed ASAP! labels Feb 28, 2021
dannye added a commit to dannye/rgbds that referenced this issue Mar 3, 2021
dannye added a commit to dannye/rgbds that referenced this issue Mar 3, 2021
dannye added a commit to dannye/rgbds that referenced this issue Mar 4, 2021