closeAndCollect in scio-repl hangs on scala 2.12.3 #867

Closed
ravwojdyla opened this issue Oct 4, 2017 · 8 comments

@ravwojdyla
Contributor

ravwojdyla commented Oct 4, 2017

See title, to reproduce:

sc.parallelize(1 to 10).map(_ => "foo").closeAndCollect

The same thing works just fine in the REPL on Scala 2.11.11.

@ravwojdyla added the bug and help wanted labels Oct 4, 2017
@ravwojdyla
Contributor Author

More info: it seems to be related to serialization/deserialization of functions in the REPL.
Stack trace of one of the threads:

"direct-runner-worker" #64 prio=5 os_prio=31 tid=0x00007f809cbb4800 nid=0x6213 in Object.wait() [0x0000700001907000]
   java.lang.Thread.State: RUNNABLE
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1148)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2036)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
        at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:72)
        at org.apache.beam.runners.direct.DoFnLifecycleManager$DeserializingCacheLoader.load(DoFnLifecycleManager.java:100)
        at org.apache.beam.runners.direct.DoFnLifecycleManager$DeserializingCacheLoader.load(DoFnLifecycleManager.java:91)
        at org.apache.beam.runners.direct.repackaged.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3628)
        at org.apache.beam.runners.direct.repackaged.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2336)
        at org.apache.beam.runners.direct.repackaged.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2295)
        - locked <0x00000007a1440f70> (a org.apache.beam.runners.direct.repackaged.com.google.common.cache.LocalCache$StrongEntry)
        at org.apache.beam.runners.direct.repackaged.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2208)
        at org.apache.beam.runners.direct.repackaged.com.google.common.cache.LocalCache.get(LocalCache.java:4053)
        at org.apache.beam.runners.direct.repackaged.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4057)
        at org.apache.beam.runners.direct.repackaged.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4986)
        at org.apache.beam.runners.direct.DoFnLifecycleManager.get(DoFnLifecycleManager.java:61)
        at org.apache.beam.runners.direct.ParDoEvaluatorFactory.createEvaluator(ParDoEvaluatorFactory.java:127)
        at org.apache.beam.runners.direct.ParDoEvaluatorFactory.forApplication(ParDoEvaluatorFactory.java:80)
        at org.apache.beam.runners.direct.TransformEvaluatorRegistry.forApplication(TransformEvaluatorRegistry.java:98)
        at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:103)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Also, if we split the line and invoke closeAndCollect separately, it's all good, so this works:

scio> val s = sc.parallelize(1 to 10).map(_ => "foo")
scio> s.closeAndCollect

while (again), this does not:

scio> val s = sc.parallelize(1 to 10).map(_ => "foo").closeAndCollect

might be related to scala/bug#10064
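
The trace above ends in plain Java deserialization (`ObjectInputStream.readObject` reaching `SerializedLambda.readResolve`). As a rough way to exercise that same path outside the REPL, here is a minimal sketch that round-trips a lambda through Java serialization. It assumes compiled Scala 2.12 code; `LambdaRoundTrip` and its methods are hypothetical names for illustration, not Scio or Beam code, and it does not reproduce the hang by itself.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper, not part of Scio or Beam.
object LambdaRoundTrip {
  // Write an object to a byte array with plain Java serialization.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  // Read the object back; for a lambda this goes through SerializedLambda.readResolve.
  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def roundTrip[T <: AnyRef](obj: T): T = deserialize[T](serialize(obj))

  def main(args: Array[String]): Unit = {
    val f: Int => String = _ => "foo"
    val g = roundTrip(f)
    println((1 to 3).map(g))
  }
}
```

If the same round-trip is pasted into the REPL, comparing the one-line and split-line variants might help narrow down where deserialization stalls.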

@ravwojdyla
Contributor Author

Adding the class-based REPL via #868 causes:

scio> @BigQueryType.fromQuery("""SELECT 5""") class TTTT
scio> sc.parallelize(1 to 10).map(e => TTTT(Option(e)))
java.io.IOException: Class not found
  at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
  at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
  at com.spotify.scio.util.TransitiveClosureCleaner.com$spotify$scio$util$TransitiveClosureCleaner$$classReader(ClosureCleaner.scala:144)
  at com.spotify.scio.util.TransitiveClosureCleaner.innerClasses(ClosureCleaner.scala:151)
  at com.spotify.scio.util.TransitiveClosureCleaner.getAccessedFields(ClosureCleaner.scala:167)
  at com.spotify.scio.util.TransitiveClosureCleaner.storeAccessedFields(ClosureCleaner.scala:174)
  at com.spotify.scio.util.TransitiveClosureCleaner.cleanOuter(ClosureCleaner.scala:110)
  at com.spotify.scio.util.ClosureCleaner.clean(ClosureCleaner.scala:59)
  at com.spotify.scio.util.ClosureCleaner.clean$(ClosureCleaner.scala:58)
  at com.spotify.scio.util.TransitiveClosureCleaner.clean(ClosureCleaner.scala:104)
  at com.spotify.scio.util.ClosureCleaner$.clean(ClosureCleaner.scala:47)
  at com.spotify.scio.util.ClosureCleaner$.apply(ClosureCleaner.scala:40)
  at com.spotify.scio.util.Functions$$anon$7.<init>(Functions.scala:143)
  at com.spotify.scio.util.Functions$.mapFn(Functions.scala:142)
  at com.spotify.scio.values.SCollection.map(SCollection.scala:366)
  at com.spotify.scio.values.SCollection.map$(SCollection.scala:366)
  at com.spotify.scio.values.SCollectionImpl.map(SCollection.scala:1190)
  ... 57 elided

@andrewsmartin
Contributor

andrewsmartin commented Oct 5, 2017

The ClosureCleaner is completely broken on Scala 2.12 and always fails this way: lambdas are no longer compiled to statically generated classes, so the class cannot be read like this. On Scala 2.12 we've been relying on the compiler to handle serializability, and the ClosureCleaner is only invoked if the fn is not already serializable (we still need the cleaning on Scala 2.11). Anyway, the issue here is that something about the REPL makes the fn non-serializable; this needs more investigation. Outside the REPL, no closure cleaning is needed.
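
The "only clean if the fn is not already serializable" behaviour described above could be sketched roughly as follows. This is an illustration under assumptions, not Scio's actual implementation: `SerializableCheck`, `isSerializable`, `ensureSerializable`, and the `clean` parameter are hypothetical names, with `clean` standing in for whatever closure cleaner is in use.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper, not Scio's code.
object SerializableCheck {
  // True if Java serialization accepts the object (and everything it captures).
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Return fn untouched when it already serializes; otherwise fall back to `clean`.
  def ensureSerializable[F <: AnyRef](fn: F)(clean: F => F): F =
    if (isSerializable(fn)) fn else clean(fn)
}
```

With this shape, a function compiled outside the REPL on Scala 2.12 would normally pass the check and skip the cleaner, which matches the observation that the failure only shows up when the REPL makes the function non-serializable.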

@andrewsmartin changed the title from "closeAndCollect in scio-repl hands on scala 2.12.3" to "closeAndCollect in scio-repl hangs on scala 2.12.3" Oct 10, 2017
@nevillelyh
Contributor

Is there a short term fix? Or should we revert to 2.11.11 for REPL?

@ravwojdyla
Contributor Author

@nevillelyh Not that I'm aware of. I think we're going to need to revert.

ravwojdyla pushed a commit that referenced this issue Oct 10, 2017
nevillelyh pushed a commit that referenced this issue Oct 10, 2017
@ravwojdyla ravwojdyla removed their assignment Oct 12, 2017
@nevillelyh
Contributor

Is this still relevant? Can we close?

@jbx
Contributor

jbx commented Jul 17, 2018

👉 @ravwojdyla 👈
(poke)

@jbx jbx added the P2 label Jul 17, 2018
@ravwojdyla
Contributor Author

I'm not working on this right now, and I don't know whether it's still a problem on 2.12.x.
