Check file encoding and language standard #2587

sjoelund · 2018-06-05T05:47:17Z

This makes Travis verify that the Modelica files conform to the 3.2
version of the language standard, and that files are encoded in 7-bit
ASCII format (no unicode) to make the library consistent.

At least MSL was released as ASCII in the past; has this changed? (There are some unicode files checked in right now; the CI job should reflect this) If it has changed, I would change this to verify that all files are UTF-8 instead.
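For illustration, the 7-bit ASCII check described above could be sketched in Python as follows. This is not the script used by the Travis job (which relies on iconv); the function names `first_non_ascii_offset` and `check_tree` and the `*.mo` glob are assumptions made for this example.

```python
from pathlib import Path


def first_non_ascii_offset(data: bytes):
    """Return the offset of the first byte above 0x7F, or None if all ASCII."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset
    return None


def check_tree(root: str, pattern: str = "*.mo"):
    """Yield (path, offset) for each file under root that is not pure ASCII."""
    for path in sorted(Path(root).rglob(pattern)):
        offset = first_non_ascii_offset(path.read_bytes())
        if offset is not None:
            yield path, offset
```

A CI step could iterate over `check_tree(".")` and fail the build if it yields anything. An offset of 0 would typically point at a UTF-8 byte-order mark.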

beutlich · 2018-06-05T06:07:03Z

Non-ASCII characters in current master (09def33):

ModelicaStandardLibrary/Modelica/ComplexBlocks.mo

Line 2001 in 09def33

textString="∠",

ModelicaStandardLibrary/Modelica/Magnetic/QuasiStatic/FluxTubes.mo

Line 2583 in 09def33

textString="μ")}), Diagram(

ModelicaStandardLibrary/Modelica/Fluid/Examples/DrumBoiler.mo

Line 3 in 09def33

    
"Drum boiler example, see Franke, Rode, Krüger: On-line Optimization of Drum Boiler Startup, 3rd International Modelica Conference, Linköping, 2003"

You are right, this was not the case for MSL v3.2.2.

sjoelund · 2018-06-05T06:08:13Z

Travis said:

./Modelica/ComplexBlocks.mo: iconv: illegal input sequence at position 0
./Modelica/Magnetic/QuasiStatic/FluxTubes.mo: iconv: illegal input sequence at position 0
./Modelica/Fluid/Examples/DrumBoiler.mo: iconv: illegal input sequence at position 95

So there are also UTF8 byte-order marks in there.

beutlich · 2018-06-05T06:10:15Z

So there are also UTF8 byte-order marks in there.

Can confirm Modelica/ComplexBlocks.mo and Modelica/Magnetic/QuasiStatic/FluxTubes.mo

sjoelund · 2018-06-05T06:11:56Z

Can only confirm Modelica/ComplexBlocks.mo and Modelica/Magnetic/QuasiStatic/FluxTubes.mo

Position 0 means a byte-order mark, so that's to be expected.

The question is how to resolve this: Should I change the files to ASCII using html-tags? Does this even work for textString="μ"?

christiankral · 2018-06-05T06:13:55Z

In the MLS version 3.2, revision 2 (July 30, 2013) it is stated on page 161:

"Each Modelica file in the file-system is stored in UTF-8 format (defined by The Unicode Consortium;
http://www.unicode.org ) and may start with the UTF-8 encoded byte order mark ( 0xef 0xbb 0xbf ); this is
treated as white-space in the grammar."

If a UTF-8 encoded Modelica file contains no non-ASCII characters, it is indistinguishable from an ASCII file, since the UTF-8 encoded byte order mark is optional. However, I suppose we shall release all Modelica files in UTF-8 format.
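A minimal Python sketch of the point made above: a reader that treats the BOM as optional gets the same text either way, and a file without non-ASCII characters is also valid ASCII. The helper name `decode_modelica_source` is hypothetical; `utf-8-sig` is Python's standard codec that drops a leading BOM if one is present.

```python
import codecs


def decode_modelica_source(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading byte-order mark if present."""
    # "utf-8-sig" accepts input both with and without the BOM.
    return data.decode("utf-8-sig")


# The BOM quoted from the specification: 0xef 0xbb 0xbf.
with_bom = codecs.BOM_UTF8 + b"within Modelica;"
without_bom = b"within Modelica;"
```

Both byte strings decode to the same text, which is why a BOM-less file without special characters cannot be told apart from ASCII.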

sjoelund · 2018-06-05T06:17:41Z

@christiankral Yes, UTF-8 is allowed in MLS. But at least in the past MSL has decided to not use it (all HTML-documentation was changed to use HTML entities instead of unicode). I believe part of the reason is that tool support was lacking at the time (with some using latin1), so if you had a UTF8 version of MSL it got messed up in those tools.

I would be fine with either keeping it as ASCII or changing it to UTF8. But we should check that the files conform to this regardless (and if UTF8 is used, I believe we should at least strip the BOM from all files).

beutlich · 2018-06-05T06:30:39Z

Looks all good in Dymola, MWorks.Sysplorer and SimulationX. Not sure if MapleSim (@svorkoetter), OMEdit or Wolfram SystemModeler (@otronarp) support UTF-8 strings in their GUI.

sjoelund · 2018-06-05T06:33:48Z

OpenModelica has supported UTF-8 since before we had OMEdit, so I don't think that's an issue anymore. And indeed many third-party libraries use UTF-8 strings in the GUI.


beutlich · 2018-06-05T07:46:56Z

If we move to UTF-8 the check should pass at least before merge.

HansOlsson · 2018-06-05T07:48:18Z

I would prefer to keep BOM - if the files contain any other non-ASCII character.

sjoelund · 2018-06-05T07:51:08Z

If we move to UTF-8 the check should pass at least before merge.

That's easy: change iconv -f ascii -t ascii to iconv -f utf8 -t utf8.
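As a rough Python analogue of switching the check from ASCII to UTF-8, the sketch below reports where decoding fails. The function name `utf8_error_offset` is an assumption for this example, and it is not meant to reproduce iconv's exact diagnostics.

```python
def utf8_error_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first undecodable byte.
        return err.start
    return None
```

Note that a leading BOM is itself valid UTF-8, so a check like this accepts files with and without it.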

I would prefer to keep BOM - if the files contain any other non-ASCII character.

If we decide on that, we should add a check for it (I don't think it's all too complicated).
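One way such a check might look, as a sketch only. It assumes the policy is read as "a file carries a BOM exactly when its remaining content has non-ASCII bytes"; that reading, and the name `bom_policy_ok`, are assumptions and not something the thread settles.

```python
import codecs


def bom_policy_ok(data: bytes) -> bool:
    """True if the file has a BOM exactly when it has other non-ASCII bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    # Look only at the content after the BOM (if any).
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    has_non_ascii = any(byte > 0x7F for byte in body)
    return has_bom == has_non_ascii
```

Under this reading, a pure-ASCII file with a BOM would be rejected as well; dropping that half of the rule only requires changing the final comparison.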

beutlich · 2018-06-05T07:53:20Z

Looks all good in Dymola, MWorks.Sysplorer and SimulationX.

I checked the no-BOM files only.

beutlich · 2018-06-05T07:58:08Z

This is a quick poll for

UTF-8 w/o BOM 👍
UTF-8 with BOM 😆
ASCII ❤️

HansOlsson · 2018-06-05T07:58:55Z

Pressed return too early:

The reason to keep BOM is that it makes it clear that the file isn't iso-latin-1 (there are still people using iso-latin-1 for some Modelica files (outside of MSL); we can almost always detect that - but it still feels iffy whereas a BOM is an unambiguous statement that the file isn't iso-latin-1).
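To illustrate why detection without a BOM is only "almost always" reliable, here is a small Python sketch. The function `guess_encoding` and its return strings are hypothetical, not taken from any tool mentioned in this thread.

```python
def guess_encoding(data: bytes) -> str:
    """Heuristically classify a file's encoding from its raw bytes."""
    if data.startswith(b"\xef\xbb\xbf"):
        # A BOM is an explicit statement that the file is UTF-8.
        return "utf-8 (BOM)"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Typical iso-latin-1 text such as b"Kr\xfcger" is invalid UTF-8.
        return "not utf-8, possibly iso-latin-1"
    # Valid UTF-8, but the same bytes also decode as iso-latin-1.
    return "utf-8 or ascii (guessed)"
```

The last branch is the iffy case: the bytes 0xC3 0xBC are "ü" in UTF-8 but also a legal two-character sequence in iso-latin-1, so without a BOM the result remains a guess.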

dietmarw · 2018-06-05T08:15:49Z

Leaving BOM aside, to fix the current UTF-8 occurrences in the HTML the MSL should always use HTML entities. I can push a fix for that. As for the textString values, the two occurrences can be replaced by images just like we do for math in some places.
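A sketch of the entity replacement for HTML documentation strings, using Python's standard `html.entities` table. The function name `to_html_entities` is an assumption; whether tools render entities inside textString annotations is an open question in this thread, so this is only meant for HTML documentation.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named entity
        else:
            out.append(f"&#{code};")  # numeric fallback
    return "".join(out)
```

For example, the author names in the DrumBoiler description would become `Kr&uuml;ger` and `Link&ouml;ping`, leaving the file pure ASCII.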

HansOlsson · 2018-06-05T08:26:13Z

For the poll I would also like UTF-8 with BOM but I couldn't figure out which emoji it was 😖

sjoelund · 2018-06-05T08:28:00Z

@HansOlsson The code for that is :laughing:, so use the laugh one I guess even though it looks different.

dietmarw · 2018-06-05T16:16:46Z

@christiankral In order to extend the CI checks we need to decide on an encoding. For this minor release I propose to keep MSL as pure ASCII like all previous ones.
This means I need to replace μ by mu and &ang; by angle. I will create a PR that adds them back again for the next major release and hope that is fine by you.

MSL always used to be ASCII and although the spec allows UTF8 it seems strange to change the encoding now with the next minor. I will prepare a PR to change those characters back to UTF8 for the next MAJOR release.

christiankral · 2018-06-05T17:29:41Z

@dietmarw I'm OK with keeping this MSL release as pure ASCII version

beutlich · 2018-06-05T19:27:57Z

I'm OK with keeping this MSL release as pure ASCII version

We should add a PR targeting the next major milestone adding the UTF-8 characters again.

sjoelund requested review from dietmarw and beutlich June 5, 2018 05:47

sjoelund force-pushed the travis-ascii branch 2 times, most recently from 5272355 to 7da10e8 Compare June 5, 2018 06:04

sjoelund added this to the MSL3.2.3 milestone Jun 5, 2018

sjoelund and others added 2 commits June 5, 2018 08:43

Check file encoding and language standard

35129a4

This makes Travis verify that the Modelica files conform to the 3.2 version of the language standard, and that files are encoded in 7-bit ASCII format (no unicode) to make the library consistent.

Remove UTF-8 BOM

1cc41ce

beutlich force-pushed the travis-ascii branch from 4370cc5 to 1cc41ce Compare June 5, 2018 06:43

dietmarw previously approved these changes Jun 5, 2018

View reviewed changes

dietmarw dismissed their stale review via 73998c6 June 5, 2018 15:56

dietmarw force-pushed the travis-ascii branch 2 times, most recently from d2c43c8 to 86a736b Compare June 5, 2018 16:13

Convert non-ASCII chars to allow check of MSL

0746e65

MSL always used to be ASCII and although the spec allows UTF8 it seems strange to change the encoding now with the next minor. I will prepare a PR to change those characters back to UTF8 for the next MAJOR release.

dietmarw force-pushed the travis-ascii branch from 86a736b to 0746e65 Compare June 5, 2018 16:24

Back to umlauts as of v3.2.2

d84f67e

beutlich approved these changes Jun 5, 2018

View reviewed changes

dietmarw approved these changes Jun 6, 2018

View reviewed changes

dietmarw merged commit 7ee87a9 into modelica:master Jun 6, 2018

beutlich assigned dietmarw Jun 6, 2018

beutlich added the CI Issue that addresses continuous integration label Jun 6, 2018

dietmarw mentioned this pull request Jun 6, 2018

Reintroduce UTF-8 characters that were removed for v3.2.3 release #2593

Merged

sjoelund deleted the travis-ascii branch June 7, 2018 06:18

beutlich modified the milestone: MSL3.2.3 Jun 7, 2018
