New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define steps to reach a Go / No-Go decision on adopting this strawman #21

Closed
Ichoran opened this Issue Jan 19, 2017 · 31 comments

Comments

Projects
None yet
8 participants
@Ichoran
Contributor

Ichoran commented Jan 19, 2017

This proposal began as a strawman, a possible new design for collections. But we're starting to work on it pretty seriously. To avoid wasting our effort, we should define those things we need to check before we can decide whether the strategy is a success and we should attempt to replace the original collections library with this one. (We then have another go/no-go decision about whether we did well enough for us to be able to do that.)

I think we've already demonstrated these things adequately:

  • Superficially identical syntax for common operations
  • A reasonable hierarchy exists for doing things this way (may need tweaking, but it won't just fail)
  • Specialization can be added in where we want it
  • Both mutable and immutable base collections behave sensibly
  • Wrapping arrays as a strawman-collection works fine

Maybe I've just missed it but if not we should come up with a complete list of the things we want to check before we dive in with full effort. These occur to me right now (assuming I haven't missed any of the work which has already been done, which is possible):

  • Can we support String operations?
  • Can we implement Map elegantly? Are maps regular collections, or their own thing which gets boxed as a collection?
  • How do we handle things like BitSet#map and TreeMap#map?
  • Are there any hitches for weird collection types like PriorityQueue?

In particular, we should try to find the places where CBF is really pulling its weight and see what we can do in this system to copy the behavior, or at least do something sound.

@odersky

This comment has been minimized.

Show comment
Hide comment
@odersky

odersky Jan 20, 2017

Contributor

Can we support String operations?

String is already supported (see javaSupport.scala).

Can we implement Map elegantly? Are maps regular collections, or their own thing which gets boxed as a collection?

I think Maps need to be Iterables of pairs, as before. But we should indeed do the exercise to add them.

How do we handle things like BitSet#map and TreeMap#map?

I would expect the same way as String/Array. That is, overload the map operation to have a fully generic one which returns a Set or a Map and a constrained one which returns a BitSet or a TreeMap. But yes, we should try that out.

Are there any hitches for weird collection types like PriorityQueue?

Yes, good question.

I think next step should be to add sets and maps with sepcial focus on irrgular cases such as TreeMap. Then we will see clearer. Who wants to get started on this?

Contributor

odersky commented Jan 20, 2017

Can we support String operations?

String is already supported (see javaSupport.scala).

Can we implement Map elegantly? Are maps regular collections, or their own thing which gets boxed as a collection?

I think Maps need to be Iterables of pairs, as before. But we should indeed do the exercise to add them.

How do we handle things like BitSet#map and TreeMap#map?

I would expect the same way as String/Array. That is, overload the map operation to have a fully generic one which returns a Set or a Map and a constrained one which returns a BitSet or a TreeMap. But yes, we should try that out.

Are there any hitches for weird collection types like PriorityQueue?

Yes, good question.

I think next step should be to add sets and maps with sepcial focus on irrgular cases such as TreeMap. Then we will see clearer. Who wants to get started on this?

@odersky

This comment has been minimized.

Show comment
Hide comment
@odersky

odersky Jan 20, 2017

Contributor

I think another important element for a decision is to do some preliminary benchmarking. If we are in the ballpark of current collections that part would be a go, otherwise we'd have to revise.

We don't need to micro-optimize everything yet, just do a decent job on the critical parts that are exercised by the benchmarks.

Contributor

odersky commented Jan 20, 2017

I think another important element for a decision is to do some preliminary benchmarking. If we are in the ballpark of current collections that part would be a go, otherwise we'd have to revise.

We don't need to micro-optimize everything yet, just do a decent job on the critical parts that are exercised by the benchmarks.

@lihaoyi

This comment has been minimized.

Show comment
Hide comment
@lihaoyi

lihaoyi Jan 21, 2017

Superficially identical syntax for common operations

I think most people would agree on this, but is it worth defining exactly what common means? Either with a hardcoded list of Type#method pairs, or a numeric threshold based on # of usages (whether static or dynamic) in the community build? Or something else?

lihaoyi commented Jan 21, 2017

Superficially identical syntax for common operations

I think most people would agree on this, but is it worth defining exactly what common means? Either with a hardcoded list of Type#method pairs, or a numeric threshold based on # of usages (whether static or dynamic) in the community build? Or something else?

@Ichoran

This comment has been minimized.

Show comment
Hide comment
@Ichoran

Ichoran Jan 21, 2017

Contributor

That's a good point, @lihaoyi. I was working on an ASM-based bytecode scanning tool (so we could scan jars without caring whether we could compile them, whether classes were compatible, whether they were Scala or Java, etc.) to gather statistics like these. I should resurrect it, or we should find something like it, and use it to assist defining and detecting what is common.

Contributor

Ichoran commented Jan 21, 2017

That's a good point, @lihaoyi. I was working on an ASM-based bytecode scanning tool (so we could scan jars without caring whether we could compile them, whether classes were compatible, whether they were Scala or Java, etc.) to gather statistics like these. I should resurrect it, or we should find something like it, and use it to assist defining and detecting what is common.

@odersky

This comment has been minimized.

Show comment
Hide comment
@odersky

odersky Jan 21, 2017

Contributor

@olafurpg has a comprehensive set of projects which he can scan with scala meta. Right now the supported queries are mostly syntactic, however.

Contributor

odersky commented Jan 21, 2017

@olafurpg has a comprehensive set of projects which he can scan with scala meta. Right now the supported queries are mostly syntactic, however.

@odersky odersky closed this Jan 21, 2017

@odersky odersky reopened this Jan 21, 2017

@odersky

This comment has been minimized.

Show comment
Hide comment
@odersky

odersky Jan 21, 2017

Contributor

It's an interesting question how to measure commonality of operations. One obvious (and good!) measure is community build and other common codebases. But that's clearly more oriented towards library providers than library users. And in a sense the most important thing is not to upset the end-users - providers can and will adapt but in particular casual end users will not have the same amount of patience.

I thought one other measure could be usages in tutorials on Scala and related technologies. If we break those, we will confuse past and future users. The problem is that not all tutorials are freely accessible. But quite a few of them are: for instance our Coursera courses (with 600K+ registered users), our "Programming in Scala" book and quite a few other books, or Twitter's Scala school. It would be really cool if we could compile an index of these materials as well.

Contributor

odersky commented Jan 21, 2017

It's an interesting question how to measure commonality of operations. One obvious (and good!) measure is community build and other common codebases. But that's clearly more oriented towards library providers than library users. And in a sense the most important thing is not to upset the end-users - providers can and will adapt but in particular casual end users will not have the same amount of patience.

I thought one other measure could be usages in tutorials on Scala and related technologies. If we break those, we will confuse past and future users. The problem is that not all tutorials are freely accessible. But quite a few of them are: for instance our Coursera courses (with 600K+ registered users), our "Programming in Scala" book and quite a few other books, or Twitter's Scala school. It would be really cool if we could compile an index of these materials as well.

@lihaoyi

This comment has been minimized.

Show comment
Hide comment
@lihaoyi

lihaoyi Jan 21, 2017

And in a sense the most important thing is not to upset the end-users - providers can and will adapt but in particular casual end users will not have the same amount of patience.

Should we focus on code already-written by end users, instead of tutorials targeting new users? Breaking books and tutorials means they will be out of date, but as you said tutorial-writers are "providers" rather than end-users and can release new versions of their books or tutorials, just as library-providers can release new versions of their libraries.

Almost by definition, the new users don't know anything yet, and if we change the language and tutorials the endless stream of new users will learn the new thing.

On the other hand, if someone's maintaining a million-LOC Scala codebase, that would actually be a pain to migrate depending on how breakage occurs. We might not be able to persuade a large company to give us a dump of their massive codebase, but if someone wrote a simple code-analysis tool that plugged into the compiler or ran over bytecodes to dump a txt file containing aggregate statistics of calls to scala.collection.* methods I'd bet we could persuade some real companies to run it on their codebase.

Such aggregate statistics shouldn't contain any sensitive information or IP (who cares if your company uses List#scanRight a lot???) and a company may be willing to make them available purely out of self interest: nudging the 2.13 decision-making process to reduce the amount of pain they, the end users, will see upgrading when 2.13 lands. Which seems to be what we want as well.

lihaoyi commented Jan 21, 2017

And in a sense the most important thing is not to upset the end-users - providers can and will adapt but in particular casual end users will not have the same amount of patience.

Should we focus on code already-written by end users, instead of tutorials targeting new users? Breaking books and tutorials means they will be out of date, but as you said tutorial-writers are "providers" rather than end-users and can release new versions of their books or tutorials, just as library-providers can release new versions of their libraries.

Almost by definition, the new users don't know anything yet, and if we change the language and tutorials the endless stream of new users will learn the new thing.

On the other hand, if someone's maintaining a million-LOC Scala codebase, that would actually be a pain to migrate depending on how breakage occurs. We might not be able to persuade a large company to give us a dump of their massive codebase, but if someone wrote a simple code-analysis tool that plugged into the compiler or ran over bytecodes to dump a txt file containing aggregate statistics of calls to scala.collection.* methods I'd bet we could persuade some real companies to run it on their codebase.

Such aggregate statistics shouldn't contain any sensitive information or IP (who cares if your company uses List#scanRight a lot???) and a company may be willing to make them available purely out of self interest: nudging the 2.13 decision-making process to reduce the amount of pain they, the end users, will see upgrading when 2.13 lands. Which seems to be what we want as well.

@odersky

This comment has been minimized.

Show comment
Hide comment
@odersky

odersky Jan 22, 2017

Contributor

@lihaoyi I was focusing on tutorial writers for two reasons:

  • tutorials tend to be longer-lived than libraries, with a slower rate of innovation.
  • tutorials influence the coding patterns of people learning with them. That way, they are a good proxy for the large number of programmers who use Scala but are not contributors to a highly visible software package.

That said, I like the statistics tool idea.

Contributor

odersky commented Jan 22, 2017

@lihaoyi I was focusing on tutorial writers for two reasons:

  • tutorials tend to be longer-lived than libraries, with a slower rate of innovation.
  • tutorials influence the coding patterns of people learning with them. That way, they are a good proxy for the large number of programmers who use Scala but are not contributors to a highly visible software package.

That said, I like the statistics tool idea.

@olafurpg

This comment has been minimized.

Show comment
Hide comment
@olafurpg

olafurpg Jan 28, 2017

Contributor

I made a small experiment to find out what parts of the collections library is used in open-sources Scala libraries.My methodology:

  • Pick top 1.000 Scala projects according to scaladex.
    • Here are the top 10 libraries, use pagination to see more results.
    • The ranking takes several factors into account, including a simplified "page rank" (how many artifacts depend on this library, even for test dependencies) and Github stars/contributors (if available).
  • Download the latest version of all published artifacts for those projects.
    • 4.298 jar files, 3.8 GB
    • 819.035 classfiles
  • For each classfile, use javap -c and grep for "scala/collection"

Early results:

  • 1.046.449 total results, this includes new, invoke{virtual,special,interface}, checkcast
  • 692.361 invocations
  • 75.709 invocations to scala/collection/mutable/StringBuilder.append
  • 1.720 invocations to methods named view, of which ~900 are to scala/collection/SeqLike$class.view:(Lscala/collection/SeqLike;)Lscala/collection/SeqView; and scala/collection/IterableLike$class.view:(Lscala/collection/IterableLike;II)Lscala/collection/IterableView;
  • 25.093 calls to .map
  • 6.240 calls to .flatMap
  • 4.242 calls to .filter
  • 3.734 calls to .withFilter
  • 2.578 calls to .headOption
  • 31.262 calls to .head 😭

The dataset is available here, feel free to grep/sed/awk as you please: https://drive.google.com/file/d/0B5BBsTPBxGcMdl9zZG1sSnZ6Qlk/view?usp=sharing

Contributor

olafurpg commented Jan 28, 2017

I made a small experiment to find out what parts of the collections library is used in open-sources Scala libraries.My methodology:

  • Pick top 1.000 Scala projects according to scaladex.
    • Here are the top 10 libraries, use pagination to see more results.
    • The ranking takes several factors into account, including a simplified "page rank" (how many artifacts depend on this library, even for test dependencies) and Github stars/contributors (if available).
  • Download the latest version of all published artifacts for those projects.
    • 4.298 jar files, 3.8 GB
    • 819.035 classfiles
  • For each classfile, use javap -c and grep for "scala/collection"

Early results:

  • 1.046.449 total results, this includes new, invoke{virtual,special,interface}, checkcast
  • 692.361 invocations
  • 75.709 invocations to scala/collection/mutable/StringBuilder.append
  • 1.720 invocations to methods named view, of which ~900 are to scala/collection/SeqLike$class.view:(Lscala/collection/SeqLike;)Lscala/collection/SeqView; and scala/collection/IterableLike$class.view:(Lscala/collection/IterableLike;II)Lscala/collection/IterableView;
  • 25.093 calls to .map
  • 6.240 calls to .flatMap
  • 4.242 calls to .filter
  • 3.734 calls to .withFilter
  • 2.578 calls to .headOption
  • 31.262 calls to .head 😭

The dataset is available here, feel free to grep/sed/awk as you please: https://drive.google.com/file/d/0B5BBsTPBxGcMdl9zZG1sSnZ6Qlk/view?usp=sharing

@olafurpg

This comment has been minimized.

Show comment
Hide comment
@olafurpg

olafurpg Jan 28, 2017

Contributor

The maven coordinates for the analyzed artifacts are here: artifacts.zip
Some of them are invalid and don't resolve to actual published artifacts. The file is autogenerated from json responses from scaladex.

Contributor

olafurpg commented Jan 28, 2017

The maven coordinates for the analyzed artifacts are here: artifacts.zip
Some of them are invalid and don't resolve to actual published artifacts. The file is autogenerated from json responses from scaladex.

@odersky

This comment has been minimized.

Show comment
Hide comment
@odersky

odersky Jan 28, 2017

Contributor

Interesting. I was surprised that view is that common. Definitely good that we cater for it instead of removing it.

What about breakOut? That's one of the things we cannot support anymore.

Contributor

odersky commented Jan 28, 2017

Interesting. I was surprised that view is that common. Definitely good that we cater for it instead of removing it.

What about breakOut? That's one of the things we cannot support anymore.

@Ichoran

This comment has been minimized.

Show comment
Hide comment
@Ichoran

Ichoran Jan 29, 2017

Contributor

Thanks for gathering this, @olafurpg -- very instructive!

@odersky - 270 uses of breakOut according to a quick grep. This is in the general range of mapResult (on builders; 396 usages) and enqueue (from Queue, 244 usages).

Contributor

Ichoran commented Jan 29, 2017

Thanks for gathering this, @olafurpg -- very instructive!

@odersky - 270 uses of breakOut according to a quick grep. This is in the general range of mapResult (on builders; 396 usages) and enqueue (from Queue, 244 usages).

@fommil

This comment has been minimized.

Show comment
Hide comment
@fommil

fommil Jan 30, 2017

I've found breakout to be incredibly useful for fixing performance issues related to object churn.

Almost every call of mapValues (returns a view, or at least a lazy thing) has been an accident in my experience with huge performance penalty.

fommil commented Jan 30, 2017

I've found breakout to be incredibly useful for fixing performance issues related to object churn.

Almost every call of mapValues (returns a view, or at least a lazy thing) has been an accident in my experience with huge performance penalty.

@fommil

This comment has been minimized.

Show comment
Hide comment
@fommil

fommil Jan 30, 2017

btw, please add ensime to your test list. It's a big codebase by github standards and won't show up in scaladex because it's an app (not a library) and hence not externally referenced.

fommil commented Jan 30, 2017

btw, please add ensime to your test list. It's a big codebase by github standards and won't show up in scaladex because it's an app (not a library) and hence not externally referenced.

@olafurpg

This comment has been minimized.

Show comment
Hide comment
@olafurpg

olafurpg Jan 30, 2017

Contributor

I extended the dataset to include info about the origin of each call-site. This makes it possible to see where these points of interest appear. Here's the data in a tab-separated format with columns 1. project (first 2 package names) 2. classfile name 3. scala.collection callsite:
result5.zip

Here's a visualization for the usage breakOut grouped by project: https://docs.google.com/spreadsheets/d/14jkp9wpVJXD2kPKZWM1vWbc1gt8HRE5NCBJbREbrfwQ/edit?usp=sharing

The spreadsheet is open for anyone to edit, feel free to add your own analysis in a separate sheet.

@fommil Ensime is in the list and appears to be the fourth biggest user of breakOut

Contributor

olafurpg commented Jan 30, 2017

I extended the dataset to include info about the origin of each call-site. This makes it possible to see where these points of interest appear. Here's the data in a tab-separated format with columns 1. project (first 2 package names) 2. classfile name 3. scala.collection callsite:
result5.zip

Here's a visualization for the usage breakOut grouped by project: https://docs.google.com/spreadsheets/d/14jkp9wpVJXD2kPKZWM1vWbc1gt8HRE5NCBJbREbrfwQ/edit?usp=sharing

The spreadsheet is open for anyone to edit, feel free to add your own analysis in a separate sheet.

@fommil Ensime is in the list and appears to be the fourth biggest user of breakOut

@odersky

This comment has been minimized.

Show comment
Hide comment
@odersky

odersky Jan 30, 2017

Contributor

@fommil The strawman proposal can easily obtain the performance gains obtained from breakout by explicit view/fromIterable calls. But it will require a rewrite of every occurrence of breakOut. So this is not so much a matter of principle but of degree. How widespread will be the burden of rewrites?

Contributor

odersky commented Jan 30, 2017

@fommil The strawman proposal can easily obtain the performance gains obtained from breakout by explicit view/fromIterable calls. But it will require a rewrite of every occurrence of breakOut. So this is not so much a matter of principle but of degree. How widespread will be the burden of rewrites?

@fommil

This comment has been minimized.

Show comment
Hide comment
@fommil

fommil Jan 30, 2017

We probably overuse breakOut to be honest (it's come up so much in performance profiling that I now add it by instinct when I'm coding something that will be called a lot). I know one or two of our files that are heavy churn creators (anything around classfile indexing) so I can just profile them again under the new collections lib and do whatever is necessary.

In terms of proprietary usage of breakOut, I have tended to be a bit more surgical about it, so hopefully a grep and a cookbook is all that is needed.

ENSIME might struggle to create a <=2.12 and 2.13+ version of the code so maybe when we support scala 2.13 we will have to drop support for older scala. That's ok, we want to freeze the API in 2.0 anyway in the hopes of dropping 2.10.

fommil commented Jan 30, 2017

We probably overuse breakOut to be honest (it's come up so much in performance profiling that I now add it by instinct when I'm coding something that will be called a lot). I know one or two of our files that are heavy churn creators (anything around classfile indexing) so I can just profile them again under the new collections lib and do whatever is necessary.

In terms of proprietary usage of breakOut, I have tended to be a bit more surgical about it, so hopefully a grep and a cookbook is all that is needed.

ENSIME might struggle to create a <=2.12 and 2.13+ version of the code so maybe when we support scala 2.13 we will have to drop support for older scala. That's ok, we want to freeze the API in 2.0 anyway in the hopes of dropping 2.10.

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Feb 7, 2017

Collaborator

At the Scala team meeting last Friday we discussed the option of having a compatibility library for cross-compilation between 2.12 and 2.13 collections. This could not possibly be a one-step process (making 2.12 collections looks like 2.13 or vice versa) because of shared names (like view) with different semantics but possibly get close to 2.13 semantics with different names on top of 2.12 and an aliasing shim for 2.13. This is something we could investigate in Q2 when we should also work on automatic rewrites with Scalafix and testing of some community libraries (e.g. ScalaTest) with the new library types before porting all implementations.

Collaborator

szeiger commented Feb 7, 2017

At the Scala team meeting last Friday we discussed the option of having a compatibility library for cross-compilation between 2.12 and 2.13 collections. This could not possibly be a one-step process (making 2.12 collections looks like 2.13 or vice versa) because of shared names (like view) with different semantics but possibly get close to 2.13 semantics with different names on top of 2.12 and an aliasing shim for 2.13. This is something we could investigate in Q2 when we should also work on automatic rewrites with Scalafix and testing of some community libraries (e.g. ScalaTest) with the new library types before porting all implementations.

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Feb 7, 2017

Collaborator

@odersky I actually find 1720 view calls reassuring. That's less than 2 calls per project on average.

Collaborator

szeiger commented Feb 7, 2017

@odersky I actually find 1720 view calls reassuring. That's less than 2 calls per project on average.

@olafurpg

This comment has been minimized.

Show comment
Hide comment
@olafurpg

olafurpg Feb 7, 2017

Contributor

Here is the the usage of view grouped by project https://docs.google.com/spreadsheets/d/14jkp9wpVJXD2kPKZWM1vWbc1gt8HRE5NCBJbREbrfwQ/edit#gid=1958287660

image 1

It seems over 50% of usages come from only 8 projects. Over 11% comes from scala/collection itself.

Contributor

olafurpg commented Feb 7, 2017

Here is the the usage of view grouped by project https://docs.google.com/spreadsheets/d/14jkp9wpVJXD2kPKZWM1vWbc1gt8HRE5NCBJbREbrfwQ/edit#gid=1958287660

image 1

It seems over 50% of usages come from only 8 projects. Over 11% comes from scala/collection itself.

@smarter

This comment has been minimized.

Show comment
Hide comment
@smarter

smarter Feb 7, 2017

Contributor

I see a count of 9 for dotty/tools but only one occurence of .view in the sources of dotty in compiler/src/dotty/tools, any idea why?

Contributor

smarter commented Feb 7, 2017

I see a count of 9 for dotty/tools but only one occurence of .view in the sources of dotty in compiler/src/dotty/tools, any idea why?

@olafurpg

This comment has been minimized.

Show comment
Hide comment
@olafurpg

olafurpg Feb 7, 2017

Contributor

@smarter Here are the classfiles associated with of those view calls in dotty/tools: https://gist.github.com/olafurpg/fbc42ad1a139c386b9f28104b0c53a68

Here's an extension of result5.zip with two columns: 1. original classfile (omitted in result5.zip) and 2. scala/collection call-site
collection6.zip

Contributor

olafurpg commented Feb 7, 2017

@smarter Here are the classfiles associated with of those view calls in dotty/tools: https://gist.github.com/olafurpg/fbc42ad1a139c386b9f28104b0c53a68

Here's an extension of result5.zip with two columns: 1. original classfile (omitted in result5.zip) and 2. scala/collection call-site
collection6.zip

@fommil

This comment has been minimized.

Show comment
Hide comment
@fommil

fommil Feb 7, 2017

almost all uses of .mapValues I've ever seen has been an accidental use of views, does it show up at the binary level?

fommil commented Feb 7, 2017

almost all uses of .mapValues I've ever seen has been an accidental use of views, does it show up at the binary level?

@smarter

This comment has been minimized.

Show comment
Hide comment
@smarter

smarter Feb 7, 2017

Contributor

@smarter Here are the classfiles associated with of those view calls in dotty/tools: https://gist.github.com/olafurpg/fbc42ad1a139c386b9f28104b0c53a68

I looked into it and all but one of these occurences of view are not present in sources but occur because we extend some trait from the standard library, so I suspect the real usage of view is much smaller than what you would expect based on the stats you collected.

Contributor

smarter commented Feb 7, 2017

@smarter Here are the classfiles associated with of those view calls in dotty/tools: https://gist.github.com/olafurpg/fbc42ad1a139c386b9f28104b0c53a68

I looked into it and all but one of these occurences of view are not present in sources but occur because we extend some trait from the standard library, so I suspect the real usage of view is much smaller than what you would expect based on the stats you collected.

@smarter

This comment has been minimized.

Show comment
Hide comment
@smarter

smarter Feb 7, 2017

Contributor

Maybe you could try filtering out all usages of view from *Like classes, since they're unlikely to be called by users?

Contributor

smarter commented Feb 7, 2017

Maybe you could try filtering out all usages of view from *Like classes, since they're unlikely to be called by users?

@DarkDimius

This comment has been minimized.

Show comment
Hide comment
@DarkDimius

DarkDimius Feb 7, 2017

Member

It might be good idea to filter out synthetic bridge methods

Member

DarkDimius commented Feb 7, 2017

It might be good idea to filter out synthetic bridge methods

@smarter

This comment has been minimized.

Show comment
Hide comment
@smarter

smarter Feb 7, 2017

Contributor

It might be good idea to filter out synthetic bridge methods

This seems to be already done, e.g. in Jar.class there's really 4 calls to view, the two that are in the gist provided by @olafurpg are in non-bridge methods:

 public java.lang.Object view();
    descriptor: ()Lscala/collection/IterableView;
    flags: ACC_PUBLIC
--
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokestatic  #243                // Method scala/collection/IterableLike$class.view:(Lscala/collection/IterableLike;)Lscala/collection/IterableView;
         4: areturn
      LocalVariableTable:
--
    Signature: #150                         // ()Ljava/lang/Object;

  public scala.collection.IterableView<java.util.jar.JarEntry, scala.collection.Iterable<java.util.jar.JarEntry>> view(int, int);
    descriptor: (II)Lscala/collection/IterableView;
    flags: ACC_PUBLIC
--
         1: iload_1
         2: iload_2
         3: invokestatic  #247                // Method scala/collection/IterableLike$class.view:(Lscala/collection/IterableLike;II)Lscala/collection/IterableView;
         6: areturn
      LocalVariableTable:
--
        line 36: 0

  public scala.collection.TraversableView view(int, int);
    descriptor: (II)Lscala/collection/TraversableView;
    flags: ACC_PUBLIC, ACC_BRIDGE, ACC_SYNTHETIC
--
         1: iload_1
         2: iload_2
         3: invokevirtual #804                // Method view:(II)Lscala/collection/IterableView;
         6: areturn
      LocalVariableTable:
--
        line 36: 0

  public scala.collection.TraversableView view();
    descriptor: ()Lscala/collection/TraversableView;
    flags: ACC_PUBLIC, ACC_BRIDGE, ACC_SYNTHETIC
--
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokevirtual #807                // Method view:()Lscala/collection/IterableView;
         4: areturn
      LocalVariableTable:
Contributor

smarter commented Feb 7, 2017

It might be good idea to filter out synthetic bridge methods

This seems to be already done, e.g. in Jar.class there's really 4 calls to view, the two that are in the gist provided by @olafurpg are in non-bridge methods:

 public java.lang.Object view();
    descriptor: ()Lscala/collection/IterableView;
    flags: ACC_PUBLIC
--
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokestatic  #243                // Method scala/collection/IterableLike$class.view:(Lscala/collection/IterableLike;)Lscala/collection/IterableView;
         4: areturn
      LocalVariableTable:
--
    Signature: #150                         // ()Ljava/lang/Object;

  public scala.collection.IterableView<java.util.jar.JarEntry, scala.collection.Iterable<java.util.jar.JarEntry>> view(int, int);
    descriptor: (II)Lscala/collection/IterableView;
    flags: ACC_PUBLIC
--
         1: iload_1
         2: iload_2
         3: invokestatic  #247                // Method scala/collection/IterableLike$class.view:(Lscala/collection/IterableLike;II)Lscala/collection/IterableView;
         6: areturn
      LocalVariableTable:
--
        line 36: 0

  public scala.collection.TraversableView view(int, int);
    descriptor: (II)Lscala/collection/TraversableView;
    flags: ACC_PUBLIC, ACC_BRIDGE, ACC_SYNTHETIC
--
         1: iload_1
         2: iload_2
         3: invokevirtual #804                // Method view:(II)Lscala/collection/IterableView;
         6: areturn
      LocalVariableTable:
--
        line 36: 0

  public scala.collection.TraversableView view();
    descriptor: ()Lscala/collection/TraversableView;
    flags: ACC_PUBLIC, ACC_BRIDGE, ACC_SYNTHETIC
--
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokevirtual #807                // Method view:()Lscala/collection/IterableView;
         4: areturn
      LocalVariableTable:
@DarkDimius

This comment has been minimized.

Show comment
Hide comment
@DarkDimius

DarkDimius Feb 7, 2017

Member

Those are trait forwarders and apparently they don't get ACC_SYNTHETIC ACC_Bridge
I'd propose to filter them by discarding calls using invokestatic that have aload_0 in previous 5 instructions. Additionally, those 5 previous instructions shound't contain any invoke-s.

Member

DarkDimius commented Feb 7, 2017

Those are trait forwarders and apparently they don't get ACC_SYNTHETIC ACC_Bridge
I'd propose to filter them by discarding calls using invokestatic that have aload_0 in previous 5 instructions. Additionally, those 5 previous instructions shound't contain any invoke-s.

@olafurpg

This comment has been minimized.

Show comment
Hide comment
@olafurpg

olafurpg Feb 7, 2017

Contributor

@fommil I suspect not. At least, mapValues doesn't appear next to view calls in the corpus. Here's the breakdown for mapValues https://docs.google.com/spreadsheets/d/14jkp9wpVJXD2kPKZWM1vWbc1gt8HRE5NCBJbREbrfwQ/edit#gid=26040485

@smarter That is a good observation. There are 26 instances of *Like.class with .view calls, of which 24 are from scala/collection.

@DarkDimius the corpus doesn't contain the leading instructions. I guess asm is better suited than bash+javap for more advanced queries. Let me know if you want the corpus of jars, it's ~3.5gb.

Contributor

olafurpg commented Feb 7, 2017

@fommil I suspect not. At least, mapValues doesn't appear next to view calls in the corpus. Here's the breakdown for mapValues https://docs.google.com/spreadsheets/d/14jkp9wpVJXD2kPKZWM1vWbc1gt8HRE5NCBJbREbrfwQ/edit#gid=26040485

@smarter That is a good observation. There are 26 instances of *Like.class with .view calls, of which 24 are from scala/collection.

@DarkDimius the corpus doesn't contain the leading instructions. I guess asm is better suited than bash+javap for more advanced queries. Let me know if you want the corpus of jars, it's ~3.5gb.

@DarkDimius

This comment has been minimized.

Show comment
Hide comment
@DarkDimius

DarkDimius Feb 7, 2017

Member

@olafurpg, if you'll be filtering out Like-s I'd only filter out "IterableLike$class", i.e. with "$class", other calls inside non-synthetic methods should be valild calls.

Member

DarkDimius commented Feb 7, 2017

@olafurpg, if you'll be filtering out Like-s I'd only filter out "IterableLike$class", i.e. with "$class", other calls inside non-synthetic methods should be valild calls.

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Apr 24, 2018

Collaborator

Closing since we've merged the strawman into 2.13.x.

Collaborator

szeiger commented Apr 24, 2018

Closing since we've merged the strawman into 2.13.x.

@szeiger szeiger closed this Apr 24, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment