MatrixKeyRowsBy with isSorted=true does not change RVD type after casting to a Table #5396

Closed
chrisvittal opened this issue Feb 20, 2019 · 2 comments

@chrisvittal
Collaborator

commented Feb 20, 2019

Code:

import hail as hl
from hail import ir

hl.init()
mt = hl.import_vcf('path/to/vcf')
mt = hl.MatrixTable(ir.MatrixKeyRowsBy(mt._mir, ['locus'], is_sorted=True))
ht = mt._localize_entries('_e', '_c')
j = hl.Table._multi_way_zip_join([ht, ht], 'd', 'g')
j.write('tst.ht')

Java Stack Trace:

py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.expr.ir.Interpret.interpretJSON.
: java.util.NoSuchElementException: key not found: alleles
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.scala:59)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:59)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at is.hail.rvd.RVDType.<init>(RVDType.scala:24)
	at is.hail.expr.ir.TableMultiWayZipJoin.execute(TableIR.scala:669)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:775)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:93)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:63)
	at is.hail.expr.ir.Interpret$.interpretJSON(Interpret.scala:22)
	at is.hail.expr.ir.Interpret.interpretJSON(Interpret.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

Some debugging showed that the RVD key type for the children of TableMultiWayZipJoin was still [locus, alleles]. I would expect the RVD key to be [locus].

@tpoterba

Collaborator

commented Feb 22, 2019

The RVD key type is the physical key. This is totally allowed. I believe the bug is in multi way zip join, which needs to truncate the key to the join key first.
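
To make the failure mode concrete, here is a toy model in plain Python. It is not Hail code: `key_field_indices` and `truncate_key` are hypothetical helpers, and the shape of the joined row (`locus` plus the data field `d`) is an assumption based on the call in the report. It shows why looking up every physical key field in the joined row fails on `alleles`, and how truncating the key to the join key first avoids that.

```python
# Toy model only: these helpers are hypothetical and are not Hail's API.

def key_field_indices(row_fields, key):
    """Map each key field name to its position in the row."""
    index_of = {name: i for i, name in enumerate(row_fields)}
    # Raises KeyError if a key field is not present in the row.
    return [index_of[name] for name in key]


def truncate_key(physical_key, join_key):
    """Keep only the leading part of the physical key that the join uses."""
    n = len(join_key)
    if physical_key[:n] != join_key:
        raise ValueError("join key must be a prefix of the physical key")
    return physical_key[:n]


physical_key = ["locus", "alleles"]  # RVD key reported in the issue
join_key = ["locus"]                 # table key after keying rows by locus
joined_row = ["locus", "d"]          # assumed shape of the zip-join output row

# Using the full physical key fails, because 'alleles' is not in the joined row.
try:
    key_field_indices(joined_row, physical_key)
    failed, missing = False, None
except KeyError as err:
    failed, missing = True, err.args[0]

# Truncating to the join key first makes the lookup succeed.
fixed = key_field_indices(joined_row, truncate_key(physical_key, join_key))
```

In this sketch the untruncated lookup fails on `alleles`, mirroring the `key not found: alleles` error in the stack trace, while the truncated key resolves cleanly.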

@tpoterba

Collaborator

commented Feb 22, 2019

I also think this would replicate with is_sorted=False

tpoterba added a commit to tpoterba/hail that referenced this issue Mar 26, 2019

@danking danking closed this in #5693 Apr 1, 2019

danking added a commit that referenced this issue Apr 1, 2019

Fix TableMultiWayZipJoin key behavior (#5693)
* Fix TableMultiWayZipJoin key behavior

fixes #5396

* fix