Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to split MSL into single files #2975

Closed
christiankral opened this issue Jun 15, 2019 · 45 comments
Closed

Proposal to split MSL into single files #2975

christiankral opened this issue Jun 15, 2019 · 45 comments
Assignees
Labels
discussion Discussion issue that it not necessarily related to a concrete bug or feature task General work that is not related to a bug or feature
Milestone

Comments

@christiankral
Copy link
Contributor

christiankral commented Jun 15, 2019

Whenever I visually inspect a pull requests (PR) on GitHub I very much dislike that I have no orientation on what Modelica class the actual PR refers to. For example PR #2956 shows that lines 4058 to 4061 are affect. What models was the change made at? I have no idea at all. OK, I could extend the code region around the change,

image

but this is actually not what I am interested in. Instead I would only like to know what Modelica class the change refers to.

I were a lot easier if each Modelica class were a separate file. Possibly merge conflicts could be reduced that way as well -- someone else may know this.

I understand that there have been objections previously, that too many files increase the time to load the Modelica Standard Library. If that were still the case on a modern SSD disk, tool vendors could re-organize the MSL such that every main domain package is organized as one file, or similar. I guess this is manageable, if required.

Currently the file structure of the MSL is very heteregenous. Examples:

  • Modelica/package.mo contains the UsersGuide -- why isn't there a separate directory used instead?
  • Modelica/Electrical/Analog/organizes every sub-package as a file except
  • Modelica/Electrical/Analog/Examples which is organized in single files
  • Modelica/Mechanics/Rotational.mo is one package in one file

I propose to organize each Modelica class in one file.

@christiankral christiankral added the discussion Discussion issue that it not necessarily related to a concrete bug or feature label Jun 15, 2019
@christiankral christiankral added this to the MSL4.0.0 milestone Jun 15, 2019
@dietmarw
Copy link
Member

dietmarw commented Jun 15, 2019

I'm totally support the one file per model structure. I do this in all my other projects but would like to know if there are arguments to NOT do it this way. The released version could after all be shipped as a simple .mo file (since we still don't have a proper standardised archive format like for example OpenModelica offers).

@sjoelund
Copy link
Member

There are some reasons against it, but they are mainly Windows maximum path limitations (when using the standard C interface) and I think you might only hit these limitations if you also kept resources at the same location as the files (but we have them in Resources now, which should be fine).
There are also limitations on maximum number of open files in Linux/Windows if you try to parse all of them in parallel, but even if your CPU had that many cores I don't think you want to open 1024/2048 files at the same time. In any case, having the model split up in multiple files generally speeds up loading.
I also mostly prefer one model per file as it's much easier to find the correct code when not using a Modelica development environment (or github diffs, etc).

@beutlich
Copy link
Member

beutlich commented Jun 16, 2019

I remember a SimulationX benchmark, where loading the MSL with all records/functions/models as single files took twice as long as the current structure. In the end, it is about finding a good compromise on loading performance and PR convenience.

@sjoelund
Copy link
Member

sjoelund commented Jun 16, 2019

That's interesting. In OpenModelica, it is always faster to load multiple files in parallel than a single file for the whole of MSL (using 2 threads is slower than using 1, but even using a single thread to load the full MSL in a single file is slower than loading multiple small files in series)...

I compare the current structure to everything in a single file now though.

@bilderbuchi
Copy link

To approach the root request here:

Instead I would only like to now what Modelica class the change refers to.

The bit of text you see at the start of a diff section is called a "hunk header". There are a couple of predefinied ones, but Modelica is not part of those. See here for some more info--it is possible to add a custom detection regex to obtain a more meaningful description (e.g. the class/model/function name)

@bilderbuchi
Copy link

bilderbuchi commented Jun 17, 2019

Note that by using a global attributes file in connection with the above, you should be able to implement this for all your modelica repos without any changes in the repos themselves, which would be convenient. I haven't tested this, though.

@HansOlsson
Copy link
Contributor

Some minor comments:

  • We cannot assume that everyone uses SSD for MSL. I don't think my work computer (from last year?) is unique in having both an SSD and normal 7200rpm drive.
  • Depending on file size and file system there is also be an extra overhead for having a large number of files; I don't know how significant that is - but I have seen that it sometimes lead to strange results (almost doubling the disk-space needed)
  • The discussion about time to load and reading of files in parallel is important, but misses one aspect: as e.g. Dymola loads parts of MSL as needed. A more fine-grained storage should make that faster.

@henrikt-ma
Copy link
Contributor

henrikt-ma commented Jun 17, 2019 via email

@christiankral
Copy link
Contributor Author

My summary from this discussion:

  • From the tool vendors point of view there seems to be no significant objection against splitting up everything one file per Modelica class. Once done, the tool vendors still have the option to re-structure the Modelica files and directories of the MSL to increase their tool performance, if required.
  • Windows maximum path limitations seem not be relevant due to the structure of the MSL
  • Exceeding the maximum allowed number of open files might not be a limiting issue
  • For the disk space occupied by the MSL code, increasing the file size by splitting up everything into single files not a "my disk is running out if space" issue.
  • There are some options on defining Modelica specific "hunk header", in case we decide not to switch to individual files.

Some more thoughts:

  • Individual files also do have the advantage that it is easier to track changes of one Modelica class.
  • If two or more library offices work at the same time on the same library, individual files help to avoid merging and writing conflicts, once it is decided who works on which files.

So my personal conclusion is that splitting up the MSL into single files does make sense, as the benefits predominate. So far, are there any significant or even no-go objections?

However, I think we can still leave this ticket open for discussion so that others who were not able to respond so far, have the opportunity to make comments as well.

@sjoelund
Copy link
Member

If two or more library offices work at the same time on the same library, individual files help to avoid merging and writing conflicts, once it is decided who works on which files.

That's not really going to help. You can easily merge files if you both work on different parts of the file; usually this is even done automatically.

@beutlich
Copy link
Member

My idea would be to find a good compromise, e.g., one file per package, instead of choosing one or another extreme, i.e., one class per file or one file for all classes, respectively. I also prefer if tool vendors distribute their MSL in the same file structure as the officially released MSL for the sake of comparison, though they are free to restructure for sure.

Can one of the grammar experts (@HansOlsson @sjoelund @henrikt-ma) check how to add and apply the "hunk header" for easier PR overview? Thanks.

@sjoelund
Copy link
Member

You have to add that hunk header to your git config. As for as I could tell, GitHub will not use it which will not make it great for PR preview.

@christiankral
Copy link
Contributor Author

I think the file structure of the MSL hosted on GitHub shall support the library officers and Modelica code developers and maintainers as tight as possible. To me this ticket is mainly about improving the workflow when developing and maintaining the MSL.

@beutlich Why do you think that a compromise with one file per package is "better" than one file per class. I host all my other projects with one file per class. Another compromise could be that the tool vendors agree on re-structuring the MSL in a certain way when shipping it with their tool. I guess in the 21st century this can be performed straight forward without compromising the MSL.

@HansOlsson
Copy link
Contributor

I tried to see if it were possible to construct "hunk header" and I believe the following handles most cases when using git diff -W .
in .gitattributes: *.mo diff=mo
in .git/config

[diff "mo"]
    xfuncname = "^[ ]*(((encapsulated[ ]+|)(partial[ ]+|)(function|model|block|package|connector)[ ])|((replaceable|redeclare)[ ]+(partial[ ]+|)(function|model|block|package|connector)[ ][^=]+$))"

But I haven't investigated how to turn it into a PR, or how it would work on GitHub.

The second part is designed to handle Modelica.Media without tripping on local classes.

@beutlich
Copy link
Member

But I haven't investigated how to turn it into a PR, or how it would work on GitHub.

I asked the GitHub support team for their support. Let's see.

@beutlich
Copy link
Member

Here's their reply:

Hi Thomas,

Thanks for reaching out!

I believe we mainly use gits own definitions to generate diff hunks. As git is an open source project, I think your best bet would be to contribute the change directly to git itself. We periodically upgrade the version of git we use to keep up to date with changes like these.

The following url under the "Contributing to Git" has some more information on how to contribute:

https://git-scm.com/community

Does someone feel comfortable to contribute the Modelica hunk header definition to git? Thanks.

@HansOlsson
Copy link
Contributor

Here's their reply:

Hi Thomas,
Thanks for reaching out!
I believe we mainly use gits own definitions to generate diff hunks. As git is an open source project, I think your best bet would be to contribute the change directly to git itself. We periodically upgrade the version of git we use to keep up to date with changes like these.
The following url under the "Contributing to Git" has some more information on how to contribute:
https://git-scm.com/community

Does someone feel comfortable to contribute the Modelica hunk header definition to git? Thanks.

I can take a look next week, if no-one else volunteers.
BTW: I realized that the pattern above missed "redeclare replaceable" for Modelica.Media so it needs a slight update.

@HansOlsson
Copy link
Contributor

Seems I was more busy than I thought, so it will be next week.

@christiankral
Copy link
Contributor Author

One more argument in favor of the a one file per Modelica class approach: In each Modelica class it is possible to a revision history. For example, Modelica.Electrical.Analog.Examples.Rectifier shows only the creation date and that's the entire history of the model:

image

So the feature of using the revision history is basically not used, as it is way too much redundant work to log all changes performed through GitHub as HTML code. Instead it would make a lot more sense to use the history feature of GitHub itself to show all changes of a Modelica class. So a Modelica tool could easily provide a link to the revision history of each Modelica class once it knows the GitHub repository location. For example, https://github.com/modelica/ModelicaStandardLibrary/commits/master/Modelica/Electrical/Analog/Examples/Rectifier.mo shows the entire revision history of the simulation model Rectifier.mo. I think this is really very beneficial to the Modelica developers and the users as well as each single change can be followed and is fully documented.

@HansOlsson
Copy link
Contributor

HansOlsson commented Jul 4, 2019

I have now investigated three parts:

  • What type of change is needed. It turns out that built-in patterns are written in C, so it seems best to copy an existing case: git/git@d74e786#diff-a05a7b02fcc0bb763698006f4b43d758
  • The desired change. The above indicates that we need xfuncname, wordregex, and test-cases - and on the other hand they can be multi-line with comments. In addition, following the usual pattern it should be for "modelica" not "mo".
  • How to submit: it seems "just making a patch" and sending it in.

xfuncname:

"^[ ]*(redeclare[ ]+|)(final[ ]+|)(inner[ ]+|outer[ ]+|)(replaceable[ ]+|)(encapsulated[ ]+|)(partial[ ]+|)(pure[ ]+|impure[ ]+|)"      
        /* Optional prefixes for classes (in order): encapsulated, final, inner/outer, replaceable,        
            encapsulated, partial, pure/impure 
            Many of the combination of prefixes don't make sense
         */
     "(function|model|block|package|connector|record|(operator(|[ ]+(record|function))))[ ]+"
     /* The different kinds of restricted classes (excluding type as they are one-liners). 
         For operator records there are three variants:
         'operator', 'operator record' and 'operator function' */
           "(extends[ ]+"
         /*
             This is a special case, and it primarily matters for 'replaceable function ...',
              that exists in three variants and we don't want to include the short form
            'replaceable function foo=...;' (as it is basically a one-liner and not a good indicator)
            but only include 'replaceable function foo "...."'   (handled in the next case)
            and 'replaceable function extends foo(bar=...)'.
         */
           "|([a-zA-Z0-9_]+|'[^']+')[ ]*($|\"))"
       /* First the name of a class, It is either a normal C-like word, or a quoted identifier like '+' */
      /* The class-name and the rest is important to avoid spurious matches in the documentation,
          such as 'model machine losses such as ...' */
     /* A normal line will instead match either 'model A' or 'model A "Comment about A" */

Word-regex:

"[a-zA-Z_][a-zA-Z0-9_]*"
/* Normal identifiers */
"|'[^']'"
/* Quoted identifiers */
"|[-+0-9.e]+"
/* Numbers */
"|<=|>=|==|<>"
/* Advanced relational operators */
"|[.][+-*/^]"
/ Element-wise operations, .+, .-, .*, ./, .^ */

However, I haven't compiled this into git and I haven't made any test-cases yet.

Is anyone familiar with whether such changes are likely to be accepted? The small number of language seem to indicate some exclusivity - but it could also be that it such a pain to make the proposal.
(On the other hand "fountain" is supported, which seems obscure - but possibly Oscars-worthy :-)

@AHaumer
Copy link
Contributor

AHaumer commented Jul 4, 2019

To be honest, I think to search for a workaround to see to which file the comparsion (of PRs) belongs to is not an optimal solution.
It would be much much better if the libraries would be split up into single files. It is much easier to track in which file which changes have been applied.
I strongly vote for splitting up the libraries into files - if no tool vendor has onjections. e.g. from dramatically slower loading the MSL.

@bilderbuchi
Copy link

To be honest, I think to search for a workaround to see to which file the comparsion (of PRs) belongs to is not an optimal solution.

This is not intended to identify the file a certain diff hunk belongs to (this already exists now in the Github interface), but to identify the class it belongs to, which directly addresses the issue you reported, i.e.

Whenever I visually inspect a pull requests (PR) on GitHub I very much dislike that I have no orientation on what Modelica class the actual PR refers to. For example PR #2956 shows that lines 4058 to 4061 are affect. What models was the change made at? I have no idea at all.

@bilderbuchi
Copy link

@HansOlsson I think the first step would be to write to the mailing list:

Questions or comments for the Git community can be sent to the mailing list by using the email address git@vger.kernel.org. Bug reports for git should be sent to this mailing list.

There is a document about submitting patches.

It seems there is an easy way to get automated tests/compilation going via Travis if you clone the git/git repo on Github - this probably would save you some hassle with setting up local compilation etc, when working on your changes.
(Note: I have never contributed code to Git, all of this is just documentation I found)

@christiankral
Copy link
Contributor Author

To be honest, I think to search for a workaround to see to which file the comparsion (of PRs) belongs to is not an optimal solution.

This is not intended to identify the file a certain diff hunk belongs to (this already exists now in the Github interface), but to identify the class it belongs to, which directly addresses the issue you reported, i.e.

During the discussion it turned out more and more that this is not only an issue of pull requests (PRs). Getting information on what Modelica class the change of a PR refers to is finally only one aspect in this discussion.

It is also a lot easier to track and to reproduce changes of any Modelica class of the MSL when splitting up the MSL into single files per class: I think this is a very important aspect when it comes to the quality check of the MSL.

@christiankral christiankral changed the title Pull requests and single files Proposal to split MSL into single files Jul 5, 2019
@beutlich
Copy link
Member

beutlich commented Jul 5, 2019

I am not in favour of splitting Modelica.Media.IdealGases.Common.SingleGasesData in 1241 files. This is an overkill. My proposal would be to first try one file per package and maybe one file per example model.

@AHaumer
Copy link
Contributor

AHaumer commented Jul 18, 2019

I would go for a pragmatic solution: Split into single files, except when this is overkill or unfeasible which is probably the case with the 1241 gases.
I started to split the sublibraries (where I'm Library officer) that won't cause problems with open PRs:

  • Electrical.PolyPhase
  • Electrical.PowerConverters
  • Electrical.QuasiStatic
  • Thermal.FluidHeatFlow
  • Thermal.HeatTransfer

I plan to split up the following as soon as open PRs are merged:

  • Electrical.Machines
  • Magnetic.FluxTubes (@ThomasBoedrich agrees?)
  • Magnetic.FundamentalWave
  • Magnetic.QuasiStatic.FluxTubes
  • Magnetic.QuasiStatic.FundamentalWave
  • Mechanics.Rotational
  • Mechanics.Translational

I think that will make life easier when merging maintenance PRs to those Libs.

@AHaumer
Copy link
Contributor

AHaumer commented Jul 23, 2019

done:

  • Electrical.{PolyPhase, PowerConverters, QuasiStatic}
  • Magnetic.*
  • Thermal.*

Planned to do when open PRs are merged:

  • Electrical.Machines
  • Mechanics.{Rotational, Translational}

@christiankral
Copy link
Contributor Author

christiankral commented Jul 30, 2019

I also agreed with @christophclauss to switch

  • Electrical.Analog

to single files.

@tobolar
Copy link
Contributor

tobolar commented Aug 30, 2019

@AHaumer

Planned to do when open PRs are merged:
- Mechanics.{Rotational, Translational}

Maybe it would be less effort to first rename the library Mechanics (see #291) and then to split it up.

@beutlich
Copy link
Member

Maybe it would be less effort to first rename the library Mechanics (see #291) and then to split it up.

#291 is not going to happen after the recent discussions.

@AHaumer
Copy link
Contributor

AHaumer commented Jan 8, 2020

Alreay done (splitted to single files):

  • Electrical.Analog
  • Electrical.Batteries
  • Electrical.Machines
  • Electrical.PolyPhase
  • Electrical.PowerConverters
  • Electrical.QuasiStatic
  • Magnetic.FluxTubes
  • Magnetic.FundamentalWave
  • Magnetic.QuasiStatic.FluxTubes
  • Magnetic.QuasiStatic.FundamentalWave
  • Thermal.FluidHeatFlow
  • Thermal.HeatTransfer

Still left to do:

  • Mechanics.Rotational
  • Mechanics.Translational

@tobolar
Copy link
Contributor

tobolar commented Jan 15, 2020

@AHaumer Are you going to also proceed with Mechanics.MultiBody? Then I would wait with fixing of some issues concerning this package.

@AHaumer
Copy link
Contributor

AHaumer commented Jan 15, 2020

@tobolar Since I'm not library officer for MultiBody, and I'm not a power user of MultiBody, I'd prefer to leave it up to you ...

@christiankral
Copy link
Contributor Author

I will prepare PRs to split the following libraries:

@tobolar If you're still ready, I can prepare a PR so that @tobolar and @MartinOtter (and others) can review the changes

@dietmarw
Copy link
Member

Translational and Rotational have be done again since there was a conflict with a different PR (which is now merged.

@tobolar
Copy link
Contributor

tobolar commented Jan 16, 2020

@christiankral For MultiBody, there are still two PR open: #3113 and #2501.

@AHaumer
Copy link
Contributor

AHaumer commented Jan 16, 2020

@tobolar we won't touch MultiBody, only Rotational and Translational

@beutlich
Copy link
Member

beutlich commented Mar 2, 2020

I believe most of the sub-libraries are done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion issue that it not necessarily related to a concrete bug or feature task General work that is not related to a bug or feature
Projects
None yet
Development

No branches or pull requests