Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8257733: Move module-specific data from make to respective module #1611

Open
wants to merge 9 commits into
base: master
from

Conversation

@magicus
Copy link
Member

@magicus magicus commented Dec 3, 2020

A lot (but not all) of the data in make/data is tied to a specific module. For instance, the publicsuffixlist is used by java.base, and fontconfig by java.desktop. (A few directories, like mainmanifest, is actually used by make for the whole build.)

These data files should move to the module they belong to. The are, after all, "source code" for that module that is "compiler" into resulting deliverables, for that module. (But the "source code" language is not Java or C, but typically a highly domain specific language or data format, and the "compilation" is, often, a specialized transformation.)

This misplacement of the data directory is most visible at code review time. When such data is changed, most of the time build-dev (or the new build label) is involved, even though this has nothing to do with the build. While this is annoying, a worse problem is if the actual team that needs to review the patch (i.e., the team owning the module) is missed in the review.

Modules reviewed

  • java.base
  • java.desktop
  • jdk.compiler
  • java.se

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8257733: Move module-specific data from make to respective module

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1611/head:pull/1611
$ git checkout pull/1611

magicus added 2 commits Dec 3, 2020
@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Dec 3, 2020

👋 Welcome back ihse! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Dec 3, 2020

@magicus The following labels will be automatically applied to this pull request:

  • 2d
  • build
  • compiler
  • core-libs
  • i18n
  • security
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@magicus magicus marked this pull request as ready for review Dec 4, 2020
@openjdk openjdk bot added the rfr label Dec 4, 2020
@mlbridge
Copy link

@mlbridge mlbridge bot commented Dec 4, 2020

@magicus
Copy link
Member Author

@magicus magicus commented Dec 4, 2020

To facilitate review, here is a list on how the different directories under make/data has moved.

To java.base:

  • blacklistedcertsconverter
  • cacerts
  • characterdata
  • charsetmapping
  • cldr
  • currency
  • lsrdata
  • publicsuffixlist
  • tzdata
  • unicodedata

To java.desktop:

  • dtdbuilder
  • fontconfig
  • macosxicons
  • x11wrappergen

To jdk.compiler:

  • symbols

To jdk.jdi:

  • jdwp

Remaining in make:

  • bundle
  • docs-resources
  • macosxsigning
  • mainmanifest
@AlanBateman
Copy link
Contributor

@AlanBateman AlanBateman commented Dec 4, 2020

Are you proposing any text or guidelines to be added to JEP 201 as part of this?

I think the location of jdwp.spec and its location in the build tree may need to be looked at again. It was convenient to have it in the make tree, from which the protocol spec, and source code for the front end (module jdk.jdi) and a header file for the back end (module jdk.jdwp.agent) are created. Given that the JDWP protocol is standard (not JDK-specific) then there may be an argument to put it in src/java.se instead of a JDK-specific module.

@magicus
Copy link
Member Author

@magicus magicus commented Dec 4, 2020

@AlanBateman Well, I don't know about updating JEP 201. Do you mean that data should be added to the list classes, native, conf, legal as presented under the heading "New scheme"? That list does not seem to have been kept up to date anyway. A quick glance also shows that we now have at least man and lib as well in this place. So either we say there's precedence for not updating the list, in which case I will do nothing. Or we should bring JEP 201 up-to-date with current practices, which then of course should include data but also all other new directories that has been added since JEP 201 was written.

I really don't care either way, but my personal opinion is that JEP 201 presented a view on how the plan was to re-organize things, given the knowledge and state of affairs at that time, but how we keep the source code organized and structured from there on, is a living, day-to-day engineering effort that is just hampered by having to update a formal document, that serves little purpose now that it has been implemented.

@magicus
Copy link
Member Author

@magicus magicus commented Dec 4, 2020

And I can certainly move jdwp.spec to java.base instead. That's the reason I need input on this: All I know is that is definitely not the responsibility of the Build Group to maintain that document, and I made my best guess at where to place it.

@AlanBateman
Copy link
Contributor

@AlanBateman AlanBateman commented Dec 4, 2020

And I can certainly move jdwp.spec to java.base instead.

If jdwp.spec has to move to the src tree then src/java.se is probably the most suitable home because Java SE specifies JDWP as an optional interface. So nothing to do with java.base and the build will need to continue to generate the sources for the front-end (jdk.jdi) and back-end (jdk.jdwp.agent) as they implement the protocol.

@erikj79
Copy link
Member

@erikj79 erikj79 commented Dec 4, 2020

My understanding of JEPs is that they should be viewed as living documents. In this case, I think it's perfectly valid to update JEP 201 with additional source code layout. Both for this and for the other missing dirs.

@erikj79
Copy link
Member

@erikj79 erikj79 commented Dec 4, 2020

Regarding the chosen layout. Did you consider following the existing pattern of src//{share,}/data?

@magicus
Copy link
Member Author

@magicus magicus commented Dec 4, 2020

@erikj79 Good point, that makes much more sense.

@openjdk openjdk bot removed the serviceability label Dec 7, 2020
@magicus
Copy link
Member Author

@magicus magicus commented Dec 7, 2020

I'm not sure about the formal process for suggesting changes to a delivered JEP, but this is the text I suggest should replace the current definition of the new scheme:

    src/$MODULE/{share,$OS}/classes/$PACKAGE/*.java
                            native/include/*.{h,hpp}
                                   $LIBRARY/*.{c,cpp}
                            conf/*
                            legal/*
                            data/*
                            man/*
                            lib/*

where:

  - $MODULE is a module name (_e.g._, `java.base`);

  - The `share` directory contains shared, cross-platform code, as
    before;

  - The `$OS` directory contains operating-system-specific code, as
    before, where `$OS` is one of `unix`, `windows`, _etc._;

  - The `classes` directory contains Java source files and resource files
    organized into a directory tree reflecting their API `$PACKAGE`
    hierarchy, as before;

  - The `native` directory contains C or C++ source files, as before but
    organized differently:

    - The `include` directory contains C or C++ header files intended to
      be exported for external use (_e.g._, `jni.h`);

    - C or C++ source files are placed in a `$LIBRARY` directory, whose
      name is that of the shared library or DLL into which the compiled
      code will be linked (_e.g._, `libjava` or `libawt`); and, finally,

  - The `conf` directory contains configuration files meant to be edited
    by end users (_e.g._, `net.properties`).

  - The `legal` directory contains legal notices.

  - The `data` directory contains data files needed for building the module.

  - The `man` directory contains man pages in nroff or markdown format.

  - The `lib` directory contains configuration files not meant to be edited
    by end users.

Rendered as markdown, it would look somewhat like this:

src/$MODULE/{share,$OS}/classes/$PACKAGE/*.java
                        native/include/*.{h,hpp}
                               $LIBRARY/*.{c,cpp}
                        conf/*
                        legal/*
                        data/*
                        man/*
                        lib/*

where:

  • $MODULE is a module name (e.g., java.base);

  • The share directory contains shared, cross-platform code, as
    before;

  • The $OS directory contains operating-system-specific code, as
    before, where $OS is one of unix, windows, etc.;

  • The classes directory contains Java source files and resource files
    organized into a directory tree reflecting their API $PACKAGE
    hierarchy, as before;

  • The native directory contains C or C++ source files, as before but
    organized differently:

    • The include directory contains C or C++ header files intended to
      be exported for external use (e.g., jni.h);

    • C or C++ source files are placed in a $LIBRARY directory, whose
      name is that of the shared library or DLL into which the compiled
      code will be linked (e.g., libjava or libawt); and, finally,

  • The conf directory contains configuration files meant to be edited
    by end users (e.g., net.properties).

  • The legal directory contains legal notices.

  • The data directory contains data files needed for building the module.

  • The man directory contains man pages in nroff or markdown format.

  • The lib directory contains configuration files not meant to be edited
    by end users.

@magicus
Copy link
Member Author

@magicus magicus commented Dec 7, 2020

I hope I understood the purpose of the lib directory correctly. Our only instance of this is for java.base/*/lib/security/default.policy.

I also noted that jdk.hotspot.agent violates this scheme by having a top-level test and doc directories, apart from share and the OS directories. The purposes of these two directories are not clear to me. The tests in test are definitely never executed. I don't know if this is an omission, or if they should be removed. The documentation in doc is not exported to the end product, nor is it referenced in any developer documentation. That too should either be removed, or moved to a more suitable home.

Copy link
Contributor

@jonathan-gibbons jonathan-gibbons left a comment

I have reviewed all lines in the patch file with or near instances of jdk.compiler

@openjdk
Copy link

@openjdk openjdk bot commented Dec 7, 2020

@magicus This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot added the ready label Dec 7, 2020
@AlanBateman
Copy link
Contributor

@AlanBateman AlanBateman commented Dec 8, 2020

@AlanBateman The process of modularization was not fully completed with Project Jigsaw, and a few ugly warts remained. I was under the impression that these should be addressed in follow-up fixes, but this was unfortunately never done. Charsets and cldrconverter were both split between a core portion in java.base and the rest in jdk.charsets and jdk.localedata, respectively, but the split was never handled properly, but just "duct taped" in place.

This is a complicated area of the build, not really a Project Jigsaw issue. It's complicated because the source code for the charsets is generated at build time and the set of non-standard charsets included in java.base varies by platform, e.g. there's are several IBMxxx charsets in java.base when building on AIX that are not interesting to include in java.base on other platforms. This means we can't split up the mapping tables in make/data/charsetmapping and put them in different directories. If you are moving them into the src tree then src/java.base (as you have it) is best but will still have the ugly wart that some of these mapping tables will be used to generate code for the jdk.charsets module.

@mlchung
Copy link
Member

@mlchung mlchung commented Dec 8, 2020

@magicus @erikj79 it's now clearly stated that you want everything under make owned by the build team, which is fair. You are right that make has been easily considered as a dumping ground and it's time to define a clean structure.

I reviewed this patch and this looks okay to me. Some follow-up questions and follow-on cleanup for example "should jdwp.spec belong to specs directory vs data? There are platform-specific data (such as charsets) which has been special-case in the makefile and they need follow-on clean up. I agree that this should be cleaned up by the component teams and not part of this PR.

@mlchung
Copy link
Member

@mlchung mlchung commented Dec 8, 2020

With these new files showing up in under src directory, should bin/idea.sh be changed to exclude data to avoid incurring costs in loading JDK project from IDE that many of us use for development?

@naotoj
Copy link
Member

@naotoj naotoj commented Dec 8, 2020

@AlanBateman The process of modularization was not fully completed with Project Jigsaw, and a few ugly warts remained. I was under the impression that these should be addressed in follow-up fixes, but this was unfortunately never done. Charsets and cldrconverter were both split between a core portion in java.base and the rest in jdk.charsets and jdk.localedata, respectively, but the split was never handled properly, but just "duct taped" in place.

I chose to put the data files used for both java.base and the "additional" modules in java.base, based on the comment that Naoto made in https://mail.openjdk.java.net/pipermail/build-dev/2020-March/027044.html:

As to charsetmapping and cldrconverter, I believe they can reside in
java.base, as jdk.charsets and jdk.localedata modules depend on it.

Of course it would be preferable to make a proper split, but that requires work done by the component teams to break the modules apart.

Specifically for make/modules/jdk.charsets/Gensrc.gmk; the code in that file is more or less duplicated in make/modules/java.base/gensrc/GensrcCharsetMapping.gmk, since the same data set is processed twice, once for java.base and once for jdk.charsets. I don't think that means that make/modules/jdk.charsets/Gensrc.gmk should move to any other place.

I still stand by what I wrote above. It's best to put data in java.base for charsets/localedata. Otherwise we would have to duplicate some in each modules source directory.

@magicus
Copy link
Member Author

@magicus magicus commented Dec 8, 2020

@AlanBateman @mlchung @naotoj I can certainly anticipate follow-up cleanups on this patch. In fact, I think this patch has the potential to be a catalyst for such change, since the data that has been "hidden away" in make now becomes more visible to the teams that are capable of doing something about it. (With that said, of course the build team will assist in helping to clear up messy code structure, as we always do.) But I think we should be restrictive in trying too hard to make everything right in this single patch; it's better to get things approximately right and then go through the "warts" one by one and solve them properly.

@AlanBateman What I meant by Jigsaw was that when the monolithic source code were modularized, the separation of concern between java.base and jdk.charsets/jdk.localedata was not complete, compared to (more or less) all other modules. It might very well be the case that we will never be able to make such a separation; but, prior to Jigsaw, this was not a "wart". It only became a code hygiene issue when some of the charsets and localedata was extraced from java.base.

@mlchung My gut reaction is that we should not change idea.sh. First of all, kind of the purpose of this move is to make it clear to module developers the full set of materials their module consists of. That purpose would be sort of defeated, if we were to hide this in a popular IDE configuration. Secondly, I doubt this will add any performance difference. Listing additional files in the workspace is unlikely to do much, unless you actively open and/or interact with these files. But if you are worried, please fetch the PR (Skara adds instructions in the body) and try it out yourself!

Copy link
Contributor

@wangweij wangweij left a comment

The security-related part (cacerts, blacklisted.certs, publicsuffxlist) looks fine to me.

@naotoj
naotoj approved these changes Dec 10, 2020
Copy link
Member

@naotoj naotoj left a comment

Reviewed changes to characterdata, charsetmapping, cldr, currency, lsrdata, tzdata, and unicodedata with minor comment. Looks good overall.

test/jdk/java/util/Locale/LSRDataTest.java Outdated Show resolved Hide resolved
@magicus
Copy link
Member Author

@magicus magicus commented Dec 15, 2020

I think this is almost ready to be pushed, but I'd like to have someone from the client team review the changes for java.desktop as well. @prrace Any change you can have a look at this?

@AlanBateman
Copy link
Contributor

@AlanBateman AlanBateman commented Dec 16, 2020

I think it will be annoying to have the charset mapping tables and other data in the src tree, have you looked at other alternatives?

@magicus
Copy link
Member Author

@magicus magicus commented Dec 16, 2020

@AlanBateman Let me re-iterate: the data files are not "make" files. It is just as annoying to have team-specific data files in the make tree! So to keep things as they are is not an option. The fact that they currently reside there is just a continuation of the old view of make as a general dumping ground. I've requested this change since before Project Jigsaw. In fact, I opposed Erik's original Jigsaw patch on this ground, among other things. As a compromise, we agreed that it was to be fixed after Jigsaw, since that project had already dragged out in time for so long and was blocking the release. (See https://bugs.openjdk.java.net/browse/JDK-8055193 for details.)

So what do you propose for alternative? A new top-level data directory? I mean, sure, we could have like data/java.base/share/charsetmapping instead. I personally think that makes less sense. I also think that the person most qualified to judge about charsetmapping is @naotoj, which has already accepted this review as it is.

@naotoj
Copy link
Member

@naotoj naotoj commented Dec 16, 2020

I also think that the person most qualified to judge about charsetmapping is @naotoj, which has already accepted this review as it is.

Although I am the current RE for the charsets component, I succeeded it from Alan/Sherman, so I would like to hear Alan's opinion on this.

@erikj79
erikj79 approved these changes Jan 4, 2021
@prrace
prrace approved these changes Jan 4, 2021
@magicus
Copy link
Member Author

@magicus magicus commented Jan 11, 2021

This PR is not stale; it's just still waiting for input from @AlanBateman.

@AlanBateman
Copy link
Contributor

@AlanBateman AlanBateman commented Jan 15, 2021

@magicus Can the CharacterDataXXX.template files move to src/java.base/share/classes/java/lang?

@magicus
Copy link
Member Author

@magicus magicus commented Jan 15, 2021

@AlanBateman That sounds like an excellent idea. I'll update the PR first thing next week. :)

@mlbridge
Copy link

@mlbridge mlbridge bot commented Jan 15, 2021

Mailing list message from mark.reinhold at oracle.com on i18n-dev:

2020/12/4 6:08:13 -0800, erikj at openjdk.java.net:

On Fri, 4 Dec 2020 12:30:02 GMT, Alan Bateman <alanb at openjdk.org> wrote:

And I can certainly move jdwp.spec to java.base instead. That's the
reason I need input on this: All I know is that is definitely not
the responsibility of the Build Group to maintain that document, and
I made my best guess at where to place it.

And I can certainly move jdwp.spec to java.base instead.

If jdwp.spec has to move to the src tree then src/java.se is probably
the most suitable home because Java SE specifies JDWP as an optional
interface. So nothing to do with java.base and the build will need to
continue to generate the sources for the front-end (jdk.jdi) and
back-end (jdk.jdwp.agent) as they implement the protocol.

My understanding of JEPs is that they should be viewed as living
documents. In this case, I think it's perfectly valid to update JEP
201 with additional source code layout. Both for this and for the
other missing dirs.

Feature JEPs are living documents until such time as they are delivered.
In this case it would not be appropriate to update JEP 201, which is as
much about the transition from the old source-code layout as it is about
the new layout as of 2014.

At this point, and given that we?d already gone beyond JEP 201 prior to
this change (with `man` and `lib` subdirectories), what?d make the most
sense is a new informational JEP that documents the source-code layout.
Informational JEPs can, within reason, be updated over time.

- Mark

@mlbridge
Copy link

@mlbridge mlbridge bot commented Jan 18, 2021

Mailing list message from Magnus Ihse Bursie on i18n-dev:

On 2021-01-15 19:27, mark.reinhold at oracle.com wrote:

Feature JEPs are living documents until such time as they are delivered.
In this case it would not be appropriate to update JEP 201, which is as
much about the transition from the old source-code layout as it is about
the new layout as of 2014.

At this point, and given that we?d already gone beyond JEP 201 prior to
this change (with `man` and `lib` subdirectories), what?d make the most
sense is a new informational JEP that documents the source-code layout.
Informational JEPs can, within reason, be updated over time.

So I take it I need to create a new informational JEP. :-) Fine, I can
do that. I assume I mostly need to extract the parts of JEP 201 that
describes the (then "new") layout, update wording to show that this is a
description of the current layout, and add the new additions.

It'll be a very short JEP, but in this case, that's probably a virtue.

/Magnus

@openjdk
Copy link

@openjdk openjdk bot commented Jan 18, 2021

@magicus this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout shuffle-data
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
@openjdk openjdk bot added merge-conflict and removed ready labels Jan 18, 2021
@magicus
Copy link
Member Author

@magicus magicus commented Jan 18, 2021

@AlanBateman When I moved the charset templates, I found this gold nugget:

# Copy two Java files that need no preprocessing.
$(SUPPORT_OUTPUTDIR)/gensrc/java.base/java/lang/%.java: $(CHARACTERDATA_TEMPLATES)/%.java.template
	$(call LogInfo, Generating $(@F))
	$(call install-file)

GENSRC_CHARACTERDATA += $(SUPPORT_OUTPUTDIR)/gensrc/java.base/java/lang/CharacterDataUndefined.java \
    $(SUPPORT_OUTPUTDIR)/gensrc/java.base/java/lang/CharacterDataPrivateUse.java

What this means is that we have two "template" files that are just compiled as java files, but only after being copied to gensrc. :-) That definitely needs cleaning up, but this PR is large enough as it is, so I will not do it now.

@AlanBateman
Copy link
Contributor

@AlanBateman AlanBateman commented Jan 23, 2021

@AlanBateman When I moved the charset templates, I found this gold nugget:

# Copy two Java files that need no preprocessing.
$(SUPPORT_OUTPUTDIR)/gensrc/java.base/java/lang/%.java: $(CHARACTERDATA_TEMPLATES)/%.java.template
	$(call LogInfo, Generating $(@F))
	$(call install-file)

GENSRC_CHARACTERDATA += $(SUPPORT_OUTPUTDIR)/gensrc/java.base/java/lang/CharacterDataUndefined.java \
    $(SUPPORT_OUTPUTDIR)/gensrc/java.base/java/lang/CharacterDataPrivateUse.java

What this means is that we have two "template" files that are just compiled as java files, but only after being copied to gensrc. :-) That definitely needs cleaning up, but this PR is large enough as it is, so I will not do it now.

Good find, surprised this wasn't spotted before now. We should create a separate issue to rename them and get rid of the copying in the make file.

@magicus
Copy link
Member Author

@magicus magicus commented Jan 26, 2021

We should create a separate issue to rename them and get rid of the copying in the make file.

I opened https://bugs.openjdk.java.net/browse/JDK-8260406.

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Feb 23, 2021

@magicus This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment