The alternative, flat representation of classpath elements #4176
This pull request provides an alternative, more efficient implementation of classpath handling in the compiler. This feature will be delivered in 2.11.5 but it's off by default. To try it out, you must enable it with
Below is the original description of this PR with all remaining details.
This is another PR created for the flat classpath representation. This time the target is changed to 2.11.x branch.
It finishes the work started and later abandoned by @gkossakowski.
Grzegorz's main goal was to improve the efficiency of the classpath. In the end it turned out that the efficiency improvement was not significant, so Grzegorz stopped working on it.
On the other hand, it was worth continuing this work for several reasons:
The new (flat) classpath has dedicated classes for the various file types characteristic of Scala with the JVM backend, which allows us to create more efficient, lower-level operations.
One of the most important considerations while creating this implementation was to add it as an optional feature that can be marked as experimental and turned on using a flag. At the beginning users will use the old classpath implementation by default and they will be able to use the new implementation when:
I also took into account the planned better support for JSR-223, so it supports, for instance, ManifestResources introduced here: #2238
I tested flat classpath in several ways:
The flat classpath with caching turned on allowed me to reduce the memory allocated in old gen from about 6GB (for the current classpath) to about 750MB. So in the case of the IDE and presentation compilers the gain is really significant.
Tests using a computer with SSD:
Tests using another computer with the scala project directory moved to an external HDD connected via a USB cable:
Note: I changed the classPath method in Global to use dynamic dispatch. It now returns a base class of the old and the new classpath, with a common interface based on a part of the API of the old classpath. I tested it with sbt and there's no problem with compiling projects etc. Previously Grzegorz created a hack for sbt's compiler interface where the current classpath called the flat one when it was needed. I replaced this ugly solution with the dynamic dispatch mentioned above and removed the hack. We can return to something like this if it turns out to be necessary.
Thanks to Grzegorz for hints and his initial implementation.
Lastly, the below disclaimer is required by the lawyers:
THIS PROGRAM IS SUBJECT TO THE TERMS OF THE BSD 3-CLAUSE LICENSE.
THE FOLLOWING DISCLAIMER APPLIES TO ALL SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE:
This commit is intended to make it possible to plug into the compiler an alternative classpath representation which could be more efficient, use less memory etc. Such an implementation - at least at the beginning - should exist next to the current one and be possible to turn on using a flag. Several places in the compiler have a direct dependency on the classpath implementation. Examples include the backend's icode generator and reader, SymbolLoaders and ClassfileParser. After closer inspection, one realizes that all those places depend only on a very small subset of the classpath logic: they need to look up classes on the classpath. Hence this commit introduces the ClassFileLookup trait that encapsulates that functionality. ClassPath extends that trait and an alternative implementation must do so too. There's also a new ClassRepresentation - the base trait for ClassRep (the inner class of ClassPath). Thanks to that, the compiler uses a type which is not directly tied to a particular classpath representation, as it was until now.
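The idea of hiding both representations behind one small lookup interface can be sketched as follows. This is a minimal illustration, not the compiler's code: the trait names match the commit message, but the single `findClass` signature, `SimpleClassRep`, `NestedLookup`, `SingleLevelLookup` and `LookupDemo` are hypothetical simplifications.

```scala
// Hypothetical, heavily simplified stand-ins for the traits named in the commit message.
trait ClassRepresentation {
  def name: String
}

trait ClassFileLookup {
  // Look up a class by its fully qualified name, e.g. "scala.Option".
  def findClass(className: String): Option[ClassRepresentation]
}

final case class SimpleClassRep(name: String) extends ClassRepresentation

// Recursive flavour: each node knows its own classes and its nested package nodes.
final class NestedLookup(
    classes: Map[String, ClassRepresentation],
    packages: Map[String, NestedLookup]) extends ClassFileLookup {
  def findClass(className: String): Option[ClassRepresentation] =
    className.indexOf('.') match {
      case -1 => classes.get(className)
      case i =>
        packages.get(className.substring(0, i))
          .flatMap(_.findClass(className.substring(i + 1)))
    }
}

// Flat flavour: a single instance answers queries for any package.
final class SingleLevelLookup(classes: Map[String, ClassRepresentation]) extends ClassFileLookup {
  def findClass(className: String): Option[ClassRepresentation] = classes.get(className)
}

object LookupDemo {
  // A consumer (think of a symbol loader) depends only on the common trait.
  def isOnClassPath(cp: ClassFileLookup, className: String): Boolean =
    cp.findClass(className).isDefined
}
```

A consumer written against `ClassFileLookup` does not need to change when the representation behind it is swapped.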
The method asClasspathString is now deprecated. It is also moved to ClassFileLookup in case someone was using it in some project (an alternative classpath will support it as well, just in case). All its usages in the Scala sources are changed to the asClassPathString method. The only difference is the name. Some operations on files or their names are moved from ClassPath to the newly created FileUtils dedicated to the classpath. It will be possible to reuse them when implementing an alternative classpath representation. Moreover, allocation-free extension methods like the one added in this commit will improve readability.
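For readers unfamiliar with the pattern, an allocation-free extension method in Scala is usually an implicit value class. The sketch below only illustrates that pattern on plain file names; `FileNameOpsSketch` and its methods are hypothetical and are not the contents of the FileUtils object added by this commit.

```scala
object FileNameOpsSketch {
  // Extending AnyVal makes this a value class, so calls such as
  // "Foo.class".isClassFile normally compile to a static method call
  // without allocating a wrapper object.
  implicit class FileNameOps(val name: String) extends AnyVal {
    def isClassFile: Boolean = name.endsWith(".class")
    def isJarOrZip: Boolean = name.endsWith(".jar") || name.endsWith(".zip")
    def stripClassExtension: String = name.stripSuffix(".class")
  }
}
```

With `import FileNameOpsSketch._` in scope, call sites read as `fileName.isClassFile` instead of repeating `endsWith` checks.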
This commit introduces the base trait for the flat classpath - an alternative classpath representation. In accordance with the idea and the experimental implementation of @gkossakowski, this representation will try to make the best use of the specifics of a given file type instead of using AbstractFile everywhere. This is possible because the .NET backend is no longer supported and we can focus on Java-specific types of files. FlatClassPath extends ClassFileLookup, which provides the common interface also used by the existing ClassPath. The new implementation is called flat because it's possible to query the whole classpath using just a single instance. In the old (recursive) representation there is a structure of nested classpath objects, where each object can return only entries from one level of the hierarchy, but it also returns further classpath objects for the nested levels included in it. That's why a dedicated PackageLoaderUsingFlatClassPath is added to SymbolLoaders - the approaches are different, so the way of loading packages has to be different too. The new package loader is currently unused. There's also a new PackageNameUtils which will provide common methods used by the classpath implementations for various file types.
There's a new AggregateFlatClassPath - an equivalent of MergedClassPath from the old implementation. It is supposed to group classpath instances handling different files: directories, zips or jars. Unlike in the old (recursive) implementation, there won't be a deep, nested hierarchy of classpath instances - just one root (aggregate) and a flat structure of its children. AggregateFlatClassPath ensures that classpath entries are distinct and merges corresponding entries for class and source files into one entry. This is required because the SymbolLoaders class makes use of this kind of ClassRepresentation. There are also new unit tests which check whether AggregateFlatClassPath obtains the correct entries from the classpath instances specified in its constructor and whether it preserves the ordering in the case of repeated entries. There's also a test-only flat classpath type using VirtualFiles, so it's easy to check the real behaviour.
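The aggregation described above (one root, flat children, distinct entries, class and source entries merged, ordering preserved) can be sketched roughly like this. All names (`Entry`, `FlatCp`, `FixedCp`, `AggregateCp`) are hypothetical, and the rule that the earliest occurrence wins for repeated entries is an assumption made for the illustration.

```scala
import scala.collection.mutable

// Hypothetical model: an entry may have a class file, a source file, or both.
final case class Entry(name: String, binary: Option[String], source: Option[String])

trait FlatCp {
  def entries(pkg: String): Seq[Entry]
}

// A child classpath with fixed content, standing in for a directory, zip or jar.
final class FixedCp(byPackage: Map[String, Seq[Entry]]) extends FlatCp {
  def entries(pkg: String): Seq[Entry] = byPackage.getOrElse(pkg, Nil)
}

// One root with a flat list of children; no nesting of aggregates is needed.
final class AggregateCp(children: Seq[FlatCp]) extends FlatCp {
  def entries(pkg: String): Seq[Entry] = {
    // LinkedHashMap keeps first-insertion order, so repeated entries
    // stay at the position of their first occurrence.
    val merged = mutable.LinkedHashMap.empty[String, Entry]
    for (child <- children; e <- child.entries(pkg)) {
      merged.get(e.name) match {
        case None => merged(e.name) = e
        case Some(prev) =>
          // Assumption: the earlier child wins; later ones only fill in missing parts.
          merged(e.name) = prev.copy(
            binary = prev.binary.orElse(e.binary),
            source = prev.source.orElse(e.source))
      }
    }
    merged.values.toList
  }
}
```

Querying the aggregate for a package then returns one entry per name, in the order the names were first seen across the children.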
This commit adds the flat classpath implementation for directories, using java.io.File directly. Since we work with a real directory - not an AbstractFile - we don't need to iterate over all entries of a file to get the inner entries of some package. We can just find the directory that corresponds to the package. There are implementations for both a classpath and a sourcepath. Both extend DirectoryFileLookup, which provides the common logic.
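A rough sketch of the "find the directory for a package directly" idea, using only the JDK file API. `DirectoryLookupSketch`, `packageDir` and `classNames` are hypothetical names chosen for this illustration and do not mirror DirectoryFileLookup.

```scala
import java.io.File

object DirectoryLookupSketch {
  // Map a package name such as "a.b" straight to the sub-directory root/a/b,
  // without walking the whole directory tree.
  def packageDir(root: File, pkg: String): Option[File] = {
    val dir =
      if (pkg.isEmpty) root
      else new File(root, pkg.replace('.', File.separatorChar))
    if (dir.isDirectory) Some(dir) else None
  }

  // Simple names of the class files placed directly in the given package.
  def classNames(root: File, pkg: String): List[String] =
    packageDir(root, pkg) match {
      case None => Nil
      case Some(dir) =>
        // listFiles may return null on I/O errors, hence the Option wrapper.
        Option(dir.listFiles()).map(_.toList).getOrElse(Nil)
          .filter(f => f.isFile && f.getName.endsWith(".class"))
          .map(_.getName.stripSuffix(".class"))
          .sorted
    }
}
```

Only the one directory belonging to the requested package is listed, which is the point of working with real directories here.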
This commit adds an implementation of the flat classpath which can handle both jar and vanilla zip files. In fact there are two versions - for a classpath and for a sourcepath. Both extend ZipArchiveFileLookup, which provides the common logic. They use FileZipArchive. @gkossakowski compared different ways of handling zips and jars (e.g. using javac's ZipFileIndex). He found that the overall efficiency of FileZipArchive, taking various parameters into account, is the best. FileZipArchive is slightly changed. It now allows finding the entry for a directory among all directory entries without iterating over all entries regardless of their type. Thanks to that we can simply find the directory for a package - as in the case of DirectoryFileLookup. It is also now possible to cache the classpath representation of classpath elements from jar and zip files across compiler instances. The cache is just a map AbstractFile -> FlatClassPath. It should reduce the number of created classpath and file instances, e.g. in the case of many ScalaPresentationCompilers in Scala IDE. To make it impossible to bypass the cache, the caches are created as part of the factories responsible for creating these types of the flat classpath.
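The "cache lives inside the factory" design can be sketched as below. This is only an illustration under assumptions: it keys the cache on `java.io.File` rather than AbstractFile, does not handle archives that change on disk, and `ZipCpFactory` / `ZipCp` are hypothetical names.

```scala
import java.io.File
import scala.collection.mutable

object ZipCpFactory {
  // The constructor is private to the factory, so callers cannot bypass the cache.
  final class ZipCp private[ZipCpFactory] (val file: File)

  // Shared by all callers, e.g. several compiler instances in one JVM.
  private val cache = mutable.Map.empty[File, ZipCp]

  def create(file: File): ZipCp = cache.synchronized {
    val key = file.getAbsoluteFile
    cache.getOrElseUpdate(key, new ZipCp(key))
  }
}
```

Two calls with the same archive path return the very same instance, which is what reduces the number of classpath objects when many compilers share a classpath.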
This commit adds the flat classpath type using ManifestResources, closely related to the support for JSR-223 (Scripting for the Java Platform). It uses the classes listed in the manifest file placed in the JAR. It's related to jar files, so it's created using ZipAndJarFlatClassPathFactory and is cached. In general it's currently not possible to use it in Scala out of the box (without additional tools such as jarlister), as this support is postponed. The old classpath was properly prepared in the PR created by @rjolly, #2238, so the new one also got this feature. ManifestResources is a ZipArchive without a real underlying file on disk; in addition, it implements some methods declared in AbstractFile as unsupported operations. Therefore the implementation has to use the iterator. I wanted to have behaviour similar to that of directories and zip/jar files - to be able to get a directory entry for a package without iterating over all entries. This is achieved by iterating over all entries only once and caching the packages. This flat classpath type was the last one needed.
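The "iterate once and cache packages" approach might look roughly like this when only an iterator over entry paths is available. `PackageIndex` is a hypothetical name, and entries are modelled as plain path strings rather than ManifestResources entries.

```scala
// Builds a package -> class names index from an iterator that can be consumed only once.
final class PackageIndex(entryPaths: Iterator[String]) {
  // Evaluated on first access; later lookups reuse the cached map and
  // never touch the (by then exhausted) iterator again.
  private lazy val byPackage: Map[String, List[String]] =
    entryPaths
      .filter(_.endsWith(".class"))
      .toList
      .groupBy { path =>
        val i = path.lastIndexOf('/')
        if (i < 0) "" else path.substring(0, i).replace('/', '.')
      }
      .map { case (pkg, paths) =>
        pkg -> paths.map(p => p.substring(p.lastIndexOf('/') + 1).stripSuffix(".class"))
      }

  def classesIn(pkg: String): List[String] = byPackage.getOrElse(pkg, Nil)
}
```

After the first query, fetching the classes of any package is a map lookup, which gives behaviour similar to the directory and zip/jar cases.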
Part of the functionality of ClassPathContext has been moved to the base trait ClassPathFactory so it can be reused by the newly created FlatClassPathFactory. This new implementation works in a similar manner to ClassPathContext, with the difference that it creates instances of the flat classpath. This change doesn't modify the behaviour of the compiler, as the interface and the way ClassPathContext works didn't change. Moreover, FlatClassPathFactory is currently unused.
This commit adds a dedicated FlatClassPathResolver which loads classpath entries as FlatClassPath. Most of the common logic from PathResolver for the old classpath has been moved to a separate base class which doesn't depend on a particular classpath representation. Thanks to that it was possible to reuse it when creating a path resolver for the flat classpath representation. This change doesn't modify the way the compiler works. It also doesn't change anything from the perspective of someone who already uses PathResolver in some project or even extends it - at least as long as they don't need to use the flat classpath. There are also new JUnit tests which, among other things, compare the entries created using the old and the new classpath representations (whether the flat one created using the new path resolver returns the same entries as the recursive one).
The structure of scalap's Main has been refactored. EmptyClasspath is deleted. It looks like it has been unused since this commit: e594fe5 Also, classpath logging is changed and now uses the asClassPathString method. One test had to be modified because of that, but it no longer depends on a particular representation. There are no changes in the way scalap works.
I just finished reviewing all commits. I'm really impressed with the quality of this PR. Really good work on both the documentation (in commit messages and in the code) and the code itself! Also the structure of the commits is really good.
Here are the things that hold me back from giving an LGTM and merging:
I just kicked off a community build that will use the Scala compiler version submitted by this PR: https://jenkins-dbuild.typesafe.com:8499/job/Community-2.11.x-manual/102/console
However, even if we find some issues with it, we can fix them in subsequent PRs. For now, only the two tasks I outlined above need to be done before merging this PR.
I'll read your questions/remarks and answer them soon.
This commit integrates the whole flat classpath representation, built next to the recursive one as an alternative, with the compiler. From now on the flat classpath really works and can be turned on. There's a new flag -YclasspathImpl with two options: recursive (the default one) and flat. It was necessary to add dynamic dispatch to the particular classpath representation according to the chosen type. There's a new PathResolverFactory which is used instead of a concrete implementation of a path resolver. It turned out that only a small subset of the path resolver's methods is used outside this class in the Scala sources. Therefore, PathResolverFactory returns an instance of a base interface, PathResolverResult, providing only these used methods. PathResolverFactory, in combination with matches in some other places, ensures that in all places using the classpath we create/get the proper representation. Also the classPath method in Global is modified to use dynamic dispatch. This is a very important change, as the return type changed to the base ClassFileLookup, which provides a subset of the old ClassPath public methods. It can be problematic if someone was using the explicit ClassPath type in a project, or public methods which are not provided via ClassFileLookup. I tested the flat classpath with sbt and Scala IDE and there were no problems. I also looked at the sources of some other projects, e.g. the Scala plugin for IntelliJ, and I think there shouldn't be problems, but it would be better to check these changes using the community build. Scalap's Main.scala is changed to be able to use both implementations and also to use the flags related to the classpath implementation. The classpath invalidation is modified to work properly with the old (recursive) classpath representation after the changes made in Global. An attempt to use the invalidation with the flat classpath just throws an exception with a message that the flat one currently doesn't support invalidation.
That's also why the partest test for the invalidation has been changed to always use the old implementation. There's an adequate comment with a TODO added to this file. There's also a new partest test which generates various dependencies (directories, zips and jars with sources and class files) and tests whether compiling and then running an application works correctly when these various types of entries are specified as -classpath and -sourcepath. It should be a good approximation of real use cases.
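The flag-driven dispatch described in this commit message can be illustrated with a small standalone sketch. It is not the compiler's PathResolverFactory: the argument parsing, `ClassPathImplSketch`, `Impl`, `RecursiveImpl` and `FlatImpl` are all hypothetical, and only the flag name and its two values come from the text above.

```scala
object ClassPathImplSketch {
  sealed trait Impl {
    def name: String
    def invalidate(): Unit
  }

  case object RecursiveImpl extends Impl {
    val name = "recursive"
    def invalidate(): Unit = () // placeholder: the real invalidation logic is not modelled
  }

  case object FlatImpl extends Impl {
    val name = "flat"
    // Mirrors the described behaviour: invalidation is not supported for the flat one yet.
    def invalidate(): Unit =
      throw new UnsupportedOperationException(
        "Flat classpath does not support invalidation")
  }

  private val Prefix = "-YclasspathImpl:"

  // Pick the representation once, based on the flag; callers only see Impl.
  def select(args: Seq[String]): Impl =
    args.collectFirst { case a if a.startsWith(Prefix) => a.stripPrefix(Prefix) } match {
      case None | Some("recursive") => RecursiveImpl // recursive is the default
      case Some("flat")             => FlatImpl
      case Some(other) =>
        throw new IllegalArgumentException(s"Unknown classpath implementation: $other")
    }
}
```

Keeping the match in one factory-like place means the rest of the code never has to ask which representation is active.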
This commit contains some minor changes made along the way while implementing the flat classpath. The sample JUnit test that shows that all pieces of the JUnit infrastructure work correctly now uses the assert method from JUnit, as it should have done from the beginning. I removed commented-out lines which were obvious to me. In the case of less obvious commented-out lines I added TODOs, as someone should look at such places some day and clean them up. I also removed some unnecessary semicolons and unused imports. Many string concatenations using + have been changed to string interpolation. The unused, private walkIterator method is removed from ZipArchive. It seems that it has been unused since this commit: 9d4994b However, I had to add an exception for the compatibility checker because it was complaining about this change. I made some trivial corrections/optimisations, like using the 'findClassFile' method instead of 'findClass' in combination with 'binary' to find the class file.
The goal of these changes is to add the possibility to:
- compare the efficiency and the content of both classpath implementations (ClassPathImplComparator)
- examine the memory consumption by creating a lot of globals using a specified classpath (ClassPathMemoryConsumptionTester) - it can be considered as e.g. some approximation of ScalaPresentationCompilers in Scala IDE when working with many projects
ClassPathMemoryConsumptionTester is placed in the main (i.e. not test) sources, so it has a properly configured boot classpath etc. out of the box and it's easy to use, e.g.:
scala scala.tools.nsc.ClassPathMemoryConsumptionTester -YclasspathImpl:<implementation_to_test> -cp <some_cp> -sourcepath <some_sp> -requiredInstances 50 SomeFileToCompile.scala
At the end it waits for the "exit" command, so a profiler such as JProfiler can be used to see how the given implementation behaves.
Also, the flat classpath implementation is set as the default one to test it on Jenkins. This particular change must be reverted once all tests pass, because for now it's not desirable to make it permanently the default representation.
This commit addresses code review comments. The flat classpath is no longer the default classpath representation. It was the default one just for the test purposes. For now it's not desirable to make it permanently the default representation. ZipAndJarFileLookupFactory is marked as sealed - it should help to limit the ways of creating flat classpath instances for zips and jars.
It wasn't until now. I created an sbt project with the following config:
resolvers in Global += Resolver.sonatypeRepo("snapshots")
scalaVersion in Global := "2.11.5-SNAPSHOT"
//scalacOptions in Global += "-YclasspathImpl:flat"
// allow 20 different compiler instances to run in parallel
concurrentRestrictions in Global := Tags.limitAll(20) :: Nil
lazy val project1 = project
lazy val project2 = project
...
lazy val project19 = project
lazy val project20 = project
Each project has one source file and no extra dependencies apart from standard Java and Scala dependencies.
I attached Yourkit profiler, ran
If you measure memory consumption caused just by compilation (by subtracting the memory consumed by sbt itself) you get:
The difference in memory consumption is not as dramatic as originally reported in the PR, but it is still very noticeable. The gain depends on the size of the classpath, and the standard classpath is small.
For embedded newlines, it's preferable to use
This means if I use the method, my code won't run on earlier point releases, which is Mima's whole point. Kind of surprised this got by the censors, since it's trivial to implement as an extension method. It might have been preferable to add it, if necessary, on the nsc side.