Check file encoding and language standard #2587
Non-ASCII characters in current master (09def33): ModelicaStandardLibrary/Modelica/ComplexBlocks.mo, line 2001 in 09def33
You are right, this was not the case for MSL v3.2.2.
Travis said:
So there are also UTF8 byte-order marks in there.
Can confirm Modelica/ComplexBlocks.mo and Modelica/Magnetic/QuasiStatic/FluxTubes.mo
Position 0 means a byte-order mark, so that's to be expected. The question is how to resolve this: Should I change the files to ASCII using html-tags? Does this even work for
In the MLS version 3.2, revision 2 (July 30, 2013) it is stated on page 161: "Each Modelica file in the file-system is stored in UTF-8 format (defined by The Unicode Consortium; If a UTF-8 encoded Modelica file does not contain a specific UTF-8 character, then it may appear as an ASCII file, as the UTF-8 encoded byte order mark is optional. However, I suppose we should release all Modelica files in UTF-8 format.
@christiankral Yes, UTF-8 is allowed in MLS. But at least in the past MSL has decided not to use it (all HTML documentation was changed to use HTML entities instead of Unicode). I believe part of the reason is that tool support was lacking at the time (with some tools using latin1), so if you had a UTF8 version of MSL it got messed up in those tools. I would be fine with either keeping it as ASCII or changing it to UTF8. But we should check that the files conform to this regardless (and if UTF8 is used, I believe we should at least strip the BOM from all files).
Looks all good in Dymola, MWorks.Sysplorer and SimulationX. Not sure if MapleSim (@svorkoetter), OMEdit or Wolfram SystemModeler (@otronarp) support UTF-8 strings in their GUI.
OpenModelica has supported UTF-8 since before we had OMEdit, so I don't think that's an issue anymore. And indeed many third-party libraries use UTF-8 strings in the GUI.
This makes Travis verify that the Modelica files conform to the 3.2 version of the language standard, and that files are encoded in 7-bit ASCII format (no unicode) to make the library consistent.
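For readers who want to reproduce the ASCII part of this check locally, here is a minimal sketch. It is not the script Travis runs in this PR; the helper names `find_non_ascii` and `check_tree` and the `*.mo` glob are assumptions made for illustration.

```python
from pathlib import Path


def find_non_ascii(data: bytes):
    """Return (line, column, byte value) for every byte above 0x7F."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:  # anything outside 7-bit ASCII
                hits.append((line_no, col, byte))
    return hits


def check_tree(root: str, pattern: str = "*.mo"):
    """Map each offending file under root to its non-ASCII hits (hypothetical helper)."""
    report = {}
    for path in sorted(Path(root).rglob(pattern)):
        hits = find_non_ascii(path.read_bytes())
        if hits:
            report[str(path)] = hits
    return report
```

A UTF-8 byte-order mark shows up as three hits at columns 1 to 3 of line 1, which matches the "position 0" reports earlier in this thread.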
If we move to UTF-8 the check should pass at least before merge.
I would prefer to keep the BOM if the files contain any other non-ASCII character.
That's easy: change
If we decide on that, we should add a check for it (I don't think it's all too complicated).
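A possible shape for such a check, as a sketch only. It assumes one reading of the proposal: a file that contains non-ASCII bytes must start with a UTF-8 BOM, while a BOM on a pure ASCII file is tolerated. The function name `bom_policy_ok` is hypothetical.

```python
import codecs


def bom_policy_ok(data: bytes) -> bool:
    """True if the file follows the assumed rule: non-ASCII content requires a BOM."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    # Look at the content after the BOM so the BOM itself is not counted.
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    has_non_ascii = any(byte > 0x7F for byte in body)
    return has_bom or not has_non_ascii
```

If the decision goes the other way (no BOM anywhere), the same `startswith` test can simply be inverted.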
I checked the no-BOM files only.
This is a quick poll for
Pressed return too early: The reason to keep the BOM is that it makes it clear that the file isn't iso-latin-1. There are still people using iso-latin-1 for some Modelica files (outside of MSL); we can almost always detect that, but it still feels iffy, whereas a BOM is an unambiguous statement that the file isn't iso-latin-1.
Leaving BOM aside, to fix the current UTF-8 occurrences in the HTML, the MSL should always use HTML entities. I can push a fix for that. As for text strings, the two occurrences can be replaced by images, just like we do for math in some places.
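The entity replacement could be scripted roughly like this. This is a sketch, not the fix that was pushed. It assumes the input is documentation HTML text that has already been decoded, and `to_html_entities` is a hypothetical helper name. Named entities come from Python's `html.entities.codepoint2name`; everything else falls back to a numeric reference.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # plain ASCII stays as it is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{cp};")  # decimal numeric character reference
    return "".join(out)
```

This should only be applied to HTML documentation strings; running it over ordinary Modelica string literals would change what the user sees.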
For the poll I would also like UTF-8 with BOM but I couldn't figure out which emoji it was 😖
@HansOlsson The code for that is
@christiankral In order to extend the CI checks we need to decide on an encoding. For this minor release I propose to keep MSL as pure ASCII like all previous ones.
MSL always used to be ASCII, and although the spec allows UTF8 it seems strange to change the encoding now with the next minor release. I will prepare a PR to change those characters back to UTF8 for the next MAJOR release.
@dietmarw I'm OK with keeping this MSL release as a pure ASCII version.
We should add a PR targeting the next major milestone adding the UTF-8 characters again.
At least MSL was released as ASCII in the past; has this changed? (There are some unicode files checked in right now; the CI job should reflect this) If it has changed, I would change this to verify that all files are UTF-8 instead.
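If the check is changed to UTF-8, a strict decode is probably enough. The sketch below assumes that approach and uses a hypothetical helper name, `is_valid_utf8`. It also catches most iso-latin-1 files, since a lone latin-1 byte such as 0xE9 is rarely a valid UTF-8 sequence.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as strict UTF-8 (pure ASCII passes as well)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```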