outer and right joins failing #4055

Closed
maccum opened this issue Jul 31, 2018 · 3 comments

@maccum
Contributor

maccum commented Jul 31, 2018

Hail version:

eb5d13f

What you did:

import hail as hl

t1 = hl.Table.parallelize([
    {'a': 'foo', 'b': 1},
    {'a': 'bar', 'b': 2},
    {'a': 'bar', 'b': 2}],
    hl.tstruct(a=hl.tstr, b=hl.tint32),
    key='a')
t2 = hl.Table.parallelize([
    {'t': 'foo', 'x': 3.14},
    {'t': 'bar', 'x': 2.78},
    {'t': 'bar', 'x': -1},
    {'t': 'quam', 'x': 0}],
    hl.tstruct(t=hl.tstr, x=hl.tfloat64),
    key='t')

t1.join(t2, how='outer').show()

# or

t1.join(t2, how='right').show()
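
For reference, here is a minimal plain-Python sketch of what a SQL-style outer or right join would be expected to return for these rows. It does not use Hail; `reference_join` is a hypothetical helper written only for illustration, and it assumes that duplicate keys produce a Cartesian product and that unmatched sides are filled with a missing value. Hail's actual row order and output schema may differ.

```python
def reference_join(left, right, how):
    """Join lists of (key, value) pairs; returns (key, b, x) tuples sorted by key.

    Illustration only: this is not Hail's implementation.
    """
    left_keys = {k for k, _ in left}
    right_keys = {k for k, _ in right}
    if how == "outer":
        keys = left_keys | right_keys
    elif how == "right":
        keys = right_keys
    else:
        raise ValueError(f"unsupported join type: {how!r}")

    out = []
    for key in sorted(keys):
        # A side with no rows for this key contributes one missing value.
        bs = [b for k, b in left if k == key] or [None]
        xs = [x for k, x in right if k == key] or [None]
        # Duplicate keys yield every pairing of left and right rows.
        out.extend((key, b, x) for b in bs for x in xs)
    return out


# Same rows as t1 and t2 in the report, written as (key, value) pairs.
t1_rows = [("foo", 1), ("bar", 2), ("bar", 2)]
t2_rows = [("foo", 3.14), ("bar", 2.78), ("bar", -1.0), ("quam", 0.0)]

outer = reference_join(t1_rows, t2_rows, "outer")
right = reference_join(t1_rows, t2_rows, "right")

for row in outer:
    print(row)
```

Because every key in `t1` also appears in `t2`, the outer and right joins coincide in this sketch, and the only unmatched key is `quam`.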

What went wrong (all error messages here, including the full java stack trace):

FatalError: HailException: OrderedRVD error! Unexpected PK in partition 1
Range bounds for partition 1: ([bar]-[foo]]
Invalid PK: [quam]
Full key: [quam]

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 19.0 failed 1 times, most recent failure: Lost task 0.0 in stage 19.0 (TID 24, localhost, executor driver): is.hail.utils.HailException: OrderedRVD error! Unexpected PK in partition 1
Range bounds for partition 1: ([bar]-[foo]]
Invalid PK: [quam]
Full key: [quam]
at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
at is.hail.utils.package$.fatal(package.scala:26)
at is.hail.rvd.OrderedRVD$$anonfun$apply$21$$anon$3.next(OrderedRVD.scala:1031)
at is.hail.rvd.OrderedRVD$$anonfun$apply$21$$anon$3.next(OrderedRVD.scala:1012)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at is.hail.rvd.RVD$$anonfun$4$$anon$1.hasNext(RVD.scala:226)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at is.hail.rvd.OrderedRVD$$anonfun$apply$21$$anon$3.hasNext(OrderedRVD.scala:1015)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:1004)
at is.hail.utils.richUtils.RichIterator$$anon$5.isValid(RichIterator.scala:21)
at is.hail.utils.StagingIterator.isValid(FlipbookIterator.scala:46)
at is.hail.utils.FlipbookIterator$$anon$1.calculateValidity(FlipbookIterator.scala:178)
at is.hail.utils.FlipbookIterator$ValidityCachingStateMachine$class.refreshValidity(FlipbookIterator.scala:167)
at is.hail.utils.FlipbookIterator$$anon$1.refreshValidity(FlipbookIterator.scala:176)
at is.hail.utils.FlipbookIterator$$anon$1.advance(FlipbookIterator.scala:181)
at is.hail.utils.StagingIterator.stage(FlipbookIterator.scala:59)
at is.hail.utils.StagingIterator.hasNext(FlipbookIterator.scala:69)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at is.hail.utils.FlipbookIterator.foreach(FlipbookIterator.scala:102)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at is.hail.annotations.RegionValueArrayBuffer.$plus$plus$eq(WritableRegionValue.scala:71)
at is.hail.utils.FlipbookIterator.cartesianProduct(FlipbookIterator.scala:356)
at is.hail.utils.FlipbookIterator$$anonfun$outerJoin$1.apply(FlipbookIterator.scala:346)
at is.hail.utils.FlipbookIterator$$anonfun$outerJoin$1.apply(FlipbookIterator.scala:343)
at is.hail.utils.FlipbookIterator$$anon$5.<init>(FlipbookIterator.scala:146)
at is.hail.utils.FlipbookIterator.flatMap(FlipbookIterator.scala:144)
at is.hail.utils.FlipbookIterator.outerJoin(FlipbookIterator.scala:343)
at is.hail.annotations.OrderedRVIterator.outerJoin(OrderedRVIterator.scala:117)
at is.hail.rvd.KeyedOrderedRVD$$anonfun$4.apply(KeyedOrderedRVD.scala:48)
at is.hail.rvd.KeyedOrderedRVD$$anonfun$4.apply(KeyedOrderedRVD.scala:48)
at is.hail.rvd.KeyedOrderedRVD$$anonfun$orderedJoin$1.apply(KeyedOrderedRVD.scala:60)
at is.hail.rvd.KeyedOrderedRVD$$anonfun$orderedJoin$1.apply(KeyedOrderedRVD.scala:56)
at is.hail.sparkextras.ContextRDD$$anonfun$czipPartitions$1$$anonfun$apply$26.apply(ContextRDD.scala:357)
at is.hail.sparkextras.ContextRDD$$anonfun$czipPartitions$1$$anonfun$apply$26.apply(ContextRDD.scala:357)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$22$$anonfun$apply$23.apply(ContextRDD.scala:310)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$22$$anonfun$apply$23.apply(ContextRDD.scala:310)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at is.hail.rvd.OrderedRVD$$anonfun$apply$21$$anon$3.hasNext(OrderedRVD.scala:1015)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at is.hail.utils.package$.getIteratorSizeWithMaxN(package.scala:357)
at is.hail.sparkextras.ContextRDD$$anonfun$12.apply(ContextRDD.scala:444)
at is.hail.sparkextras.ContextRDD$$anonfun$12.apply(ContextRDD.scala:444)
at is.hail.sparkextras.ContextRDD$$anonfun$runJob$1.apply(ContextRDD.scala:471)
at is.hail.sparkextras.ContextRDD$$anonfun$runJob$1.apply(ContextRDD.scala:469)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at is.hail.sparkextras.ContextRDD.runJob(ContextRDD.scala:467)
at is.hail.sparkextras.ContextRDD.head(ContextRDD.scala:444)
at is.hail.rvd.OrderedRVD.head(OrderedRVD.scala:346)
at is.hail.rvd.OrderedRVD.head(OrderedRVD.scala:32)
at is.hail.rvd.RVD$class.takeAsBytes(RVD.scala:247)
at is.hail.rvd.OrderedRVD.takeAsBytes(OrderedRVD.scala:32)
at is.hail.rvd.RVD$class.take(RVD.scala:251)
at is.hail.rvd.OrderedRVD.take(OrderedRVD.scala:32)
at is.hail.table.Table.take(Table.scala:637)
at is.hail.table.Table.showString(Table.scala:673)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

is.hail.utils.HailException: OrderedRVD error! Unexpected PK in partition 1
Range bounds for partition 1: ([bar]-[foo]]
Invalid PK: [quam]
Full key: [quam]
at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
at is.hail.utils.package$.fatal(package.scala:26)
at is.hail.rvd.OrderedRVD$$anonfun$apply$21$$anon$3.next(OrderedRVD.scala:1031)
at is.hail.rvd.OrderedRVD$$anonfun$apply$21$$anon$3.next(OrderedRVD.scala:1012)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at is.hail.rvd.RVD$$anonfun$4$$anon$1.hasNext(RVD.scala:226)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at is.hail.rvd.OrderedRVD$$anonfun$apply$21$$anon$3.hasNext(OrderedRVD.scala:1015)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:1004)
at is.hail.utils.richUtils.RichIterator$$anon$5.isValid(RichIterator.scala:21)
at is.hail.utils.StagingIterator.isValid(FlipbookIterator.scala:46)
at is.hail.utils.FlipbookIterator$$anon$1.calculateValidity(FlipbookIterator.scala:178)
at is.hail.utils.FlipbookIterator$ValidityCachingStateMachine$class.refreshValidity(FlipbookIterator.scala:167)
at is.hail.utils.FlipbookIterator$$anon$1.refreshValidity(FlipbookIterator.scala:176)
at is.hail.utils.FlipbookIterator$$anon$1.advance(FlipbookIterator.scala:181)
at is.hail.utils.StagingIterator.stage(FlipbookIterator.scala:59)
at is.hail.utils.StagingIterator.hasNext(FlipbookIterator.scala:69)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at is.hail.utils.FlipbookIterator.foreach(FlipbookIterator.scala:102)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at is.hail.annotations.RegionValueArrayBuffer.$plus$plus$eq(WritableRegionValue.scala:71)
at is.hail.utils.FlipbookIterator.cartesianProduct(FlipbookIterator.scala:356)
at is.hail.utils.FlipbookIterator$$anonfun$outerJoin$1.apply(FlipbookIterator.scala:346)
at is.hail.utils.FlipbookIterator$$anonfun$outerJoin$1.apply(FlipbookIterator.scala:343)
at is.hail.utils.FlipbookIterator$$anon$5.<init>(FlipbookIterator.scala:146)
at is.hail.utils.FlipbookIterator.flatMap(FlipbookIterator.scala:144)
at is.hail.utils.FlipbookIterator.outerJoin(FlipbookIterator.scala:343)
at is.hail.annotations.OrderedRVIterator.outerJoin(OrderedRVIterator.scala:117)
at is.hail.rvd.KeyedOrderedRVD$$anonfun$4.apply(KeyedOrderedRVD.scala:48)
at is.hail.rvd.KeyedOrderedRVD$$anonfun$4.apply(KeyedOrderedRVD.scala:48)
at is.hail.rvd.KeyedOrderedRVD$$anonfun$orderedJoin$1.apply(KeyedOrderedRVD.scala:60)
at is.hail.rvd.KeyedOrderedRVD$$anonfun$orderedJoin$1.apply(KeyedOrderedRVD.scala:56)
at is.hail.sparkextras.ContextRDD$$anonfun$czipPartitions$1$$anonfun$apply$26.apply(ContextRDD.scala:357)
at is.hail.sparkextras.ContextRDD$$anonfun$czipPartitions$1$$anonfun$apply$26.apply(ContextRDD.scala:357)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$22$$anonfun$apply$23.apply(ContextRDD.scala:310)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$22$$anonfun$apply$23.apply(ContextRDD.scala:310)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at is.hail.rvd.OrderedRVD$$anonfun$apply$21$$anon$3.hasNext(OrderedRVD.scala:1015)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at is.hail.utils.package$.getIteratorSizeWithMaxN(package.scala:357)
at is.hail.sparkextras.ContextRDD$$anonfun$12.apply(ContextRDD.scala:444)
at is.hail.sparkextras.ContextRDD$$anonfun$12.apply(ContextRDD.scala:444)
at is.hail.sparkextras.ContextRDD$$anonfun$runJob$1.apply(ContextRDD.scala:471)
at is.hail.sparkextras.ContextRDD$$anonfun$runJob$1.apply(ContextRDD.scala:469)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Hail version: devel-eb5d13fe97fc
Error summary: HailException: OrderedRVD error! Unexpected PK in partition 1
Range bounds for partition 1: ([bar]-[foo]]
Invalid PK: [quam]
Full key: [quam]
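
To read the error summary: the key `quam` exists only in `t2` and sorts after `foo`, so it lies outside the range bounds reported for partition 1. A minimal sketch of that bounds check follows, assuming the `(`...`]` notation in the message means an exclusive start and an inclusive end, and that string keys compare lexicographically. `in_range_bounds` is a hypothetical helper, not a Hail function.

```python
def in_range_bounds(key, start, end):
    """Return True if key lies in the interval (start, end].

    Assumes an exclusive start, an inclusive end, and lexicographic
    string ordering. Illustration only: this is not Hail's check.
    """
    return start < key <= end


# Bounds and keys taken from the error message and the repro tables.
for key in ["bar", "foo", "quam"]:
    print(key, in_range_bounds(key, "bar", "foo"))
```

Under these assumptions, `foo` is the only one of the three keys that falls inside `(bar, foo]`.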

@tpoterba
Contributor

tpoterba commented Aug 2, 2018

probably fixed as part of Patrick's join work

@tpoterba
Contributor

tpoterba commented Aug 2, 2018

updated error:

is.hail.utils.HailException: OrderedRVD error! Unexpected key in partition 1
  Range bounds for partition 1: ([bar]-[foo]]
  Key should be in partition 1: ([bar]-[foo]]
  Invalid key: [quam]

🤔

@patrick-schultz
Collaborator

fixed by #4094
