[query] pipeline triggers type inference failure #13699

Closed
danking opened this issue Sep 23, 2023 · 4 comments · Fixed by #13702

@danking
Contributor

danking commented Sep 23, 2023

What happened?

This also fails if the show is replaced with a _force_count.

reproduction.tar.gz

Here's a simple reproducer:

import hail as hl
b = hl.utils.range_table(1)
b = b.key_by(interval=hl.interval(b.idx, b.idx))
b = b.annotate(target='foo')
a = hl.utils.range_table(1)
a = a.annotate(x=b.index(a.idx, all_matches=True).target)
a = a.annotate(y=a.x.map(lambda _: hl.rand_cat([0.3, 0.2, 0.5], seed=0)))
a._force_count()

Version

0.2.124

Relevant log output

---------------------------------------------------------------------------
FatalError                                Traceback (most recent call last)
Cell In[1], line 10
      8 a = a.annotate(x=b.index(a.idx, all_matches=True).target)
      9 a = a.annotate(y=a.x.map(lambda _: hl.rand_cat([0.3, 0.2, 0.5], seed=0)))
---> 10 a._force_count()

File /private/tmp/hail/hail/python/hail/table.py:441, in Table._force_count(self)
    440 def _force_count(self):
--> 441     return Env.backend().execute(ir.TableToValueApply(self._tir, {'name': 'ForceCountTable'}))

File /private/tmp/hail/hail/python/hail/backend/py4j_backend.py:72, in Py4JBackend.execute(self, ir, timed)
     71 def execute(self, ir, timed=False):
---> 72     jir = self._to_java_value_ir(ir)
     73     stream_codec = '{"name":"StreamBufferSpec"}'
     74     # print(self._hail_package.expr.ir.Pretty.apply(jir, True, -1))

File /private/tmp/hail/hail/python/hail/backend/py4j_backend.py:164, in Py4JBackend._to_java_value_ir(self, ir)
    163 def _to_java_value_ir(self, ir):
--> 164     return self._to_java_ir(ir, self._parse_value_ir)

File /private/tmp/hail/hail/python/hail/backend/py4j_backend.py:145, in Py4JBackend._to_java_ir(self, ir, parse)
    143     r = CSERenderer(stop_at_jir=True)
    144     # FIXME parse should be static
--> 145     ir._jir = parse(r(finalize_randomness(ir)), ir_map=r.jirs)
    146 return ir._jir

File /private/tmp/hail/hail/python/hail/backend/py4j_backend.py:149, in Py4JBackend._parse_value_ir(self, code, ref_map, ir_map)
    148 def _parse_value_ir(self, code, ref_map={}, ir_map={}):
--> 149     return self._jbackend.parse_value_ir(
    150         code,
    151         {k: t._parsable_string() for k, t in ref_map.items()},
    152         ir_map)

File ~/miniconda3/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File /private/tmp/hail/hail/python/hail/backend/py4j_backend.py:35, in handle_java_exception.<locals>.deco(*args, **kwargs)
     33     tpl = Env.jutils().handleForPython(e.java_exception)
     34     deepest, full, error_id = tpl._1(), tpl._2(), tpl._3()
---> 35     raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
     36 except pyspark.sql.utils.CapturedException as e:
     37     raise FatalError('%s\n\nJava stack trace:\n%s\n'
     38                      'Hail version: %s\n'
     39                      'Error summary: %s' % (e.desc, e.stackTrace, hail.__version__, e.desc)) from None

FatalError: ClassCastException: class is.hail.types.virtual.TStruct cannot be cast to class is.hail.types.virtual.TIterable (is.hail.types.virtual.TStruct and is.hail.types.virtual.TIterable are in unnamed module of loader 'app')

Java stack trace:
java.lang.RuntimeException: typ: inference failure:
	at is.hail.expr.ir.IR.typ(IR.scala:38)
	at is.hail.expr.ir.IR.typ$(IR.scala:33)
	at is.hail.expr.ir.ToStream.typ(IR.scala:300)
	at is.hail.expr.ir.IRParser$.$anonfun$ir_value_expr_1$81(Parser.scala:1111)
	at is.hail.utils.StackSafe$More.advance(StackSafe.scala:60)
	at is.hail.utils.StackSafe$.run(StackSafe.scala:16)
	at is.hail.utils.StackSafe$StackFrame.run(StackSafe.scala:32)
	at is.hail.expr.ir.IRParser$.$anonfun$parse_value_ir$1(Parser.scala:2157)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:2153)
	at is.hail.expr.ir.IRParser$.parse_value_ir(Parser.scala:2157)
	at is.hail.backend.spark.SparkBackend.$anonfun$parse_value_ir$2(SparkBackend.scala:691)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:76)
	at is.hail.utils.package$.using(package.scala:637)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:76)
	at is.hail.utils.package$.using(package.scala:637)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
	at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:62)
	at is.hail.backend.spark.SparkBackend.$anonfun$withExecuteContext$1(SparkBackend.scala:345)
	at is.hail.backend.spark.SparkBackend.$anonfun$parse_value_ir$1(SparkBackend.scala:690)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.utils.ExecutionTimer$.logTime(ExecutionTimer.scala:59)
	at is.hail.backend.spark.SparkBackend.parse_value_ir(SparkBackend.scala:689)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)

java.lang.ClassCastException: class is.hail.types.virtual.TStruct cannot be cast to class is.hail.types.virtual.TIterable (is.hail.types.virtual.TStruct and is.hail.types.virtual.TIterable are in unnamed module of loader 'app')
	at is.hail.expr.ir.InferType$.apply(InferType.scala:115)
	at is.hail.expr.ir.IR.typ(IR.scala:36)
	at is.hail.expr.ir.IR.typ$(IR.scala:33)
	at is.hail.expr.ir.ToStream.typ(IR.scala:300)
	at is.hail.expr.ir.IRParser$.$anonfun$ir_value_expr_1$81(Parser.scala:1111)
	at is.hail.utils.StackSafe$More.advance(StackSafe.scala:60)
	at is.hail.utils.StackSafe$.run(StackSafe.scala:16)
	at is.hail.utils.StackSafe$StackFrame.run(StackSafe.scala:32)
	at is.hail.expr.ir.IRParser$.$anonfun$parse_value_ir$1(Parser.scala:2157)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:2153)
	at is.hail.expr.ir.IRParser$.parse_value_ir(Parser.scala:2157)
	at is.hail.backend.spark.SparkBackend.$anonfun$parse_value_ir$2(SparkBackend.scala:691)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:76)
	at is.hail.utils.package$.using(package.scala:637)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:76)
	at is.hail.utils.package$.using(package.scala:637)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
	at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:62)
	at is.hail.backend.spark.SparkBackend.$anonfun$withExecuteContext$1(SparkBackend.scala:345)
	at is.hail.backend.spark.SparkBackend.$anonfun$parse_value_ir$1(SparkBackend.scala:690)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.utils.ExecutionTimer$.logTime(ExecutionTimer.scala:59)
	at is.hail.backend.spark.SparkBackend.parse_value_ir(SparkBackend.scala:689)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)




Hail version: 0.2.124-b115f6a6ec23
Error summary: ClassCastException: class is.hail.types.virtual.TStruct cannot be cast to class is.hail.types.virtual.TIterable (is.hail.types.virtual.TStruct and is.hail.types.virtual.TIterable are in unnamed module of loader 'app')
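
For readers skimming the trace above: the innermost frames show InferType being asked for the type of a ToStream node whose child has type TStruct rather than a TIterable. The following is only a toy Python model of that kind of check, not Hail's actual code. Every class and function name in it is hypothetical and merely echoes names that appear in the trace.

```python
from dataclasses import dataclass


class Type:
    """Base class for the toy types below (hypothetical, not Hail's API)."""


@dataclass(frozen=True)
class TStr(Type):
    pass


@dataclass(frozen=True)
class TStruct(Type):
    fields: tuple  # e.g. (("target", TStr()),)


@dataclass(frozen=True)
class TArray(Type):
    element: Type


@dataclass(frozen=True)
class TStream(Type):
    element: Type


def infer_to_stream(child_type: Type) -> TStream:
    """Infer the type of a ToStream-like node from its child's type.

    The rule assumes the child is iterable. If the child is a struct,
    there is no element type to read, so inference fails.
    """
    if not isinstance(child_type, TArray):
        raise TypeError(
            f"{type(child_type).__name__} cannot be treated as an iterable"
        )
    return TStream(child_type.element)


# An array child infers cleanly.
ok = infer_to_stream(TArray(TStr()))

# A struct child fails, analogous to the ClassCastException above.
try:
    infer_to_stream(TStruct((("target", TStr()),)))
except TypeError as err:
    failure = str(err)
```

The sketch says nothing about why the IR ends up with a struct in that position; it only illustrates what the error message reports.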
danking added the bug label Sep 23, 2023
@danking
Contributor Author

danking commented Sep 23, 2023

This also fails:

import hail as hl

mt = hl.read_matrix_table('nfam_100_nindep_0_step1_includeMoreRareVariants_poly.mt')
gene_intervals = hl.import_locus_intervals('genes.interval_list')
rows = mt.rows()
rows = rows.annotate(gene=gene_intervals.index(rows.locus, all_matches=True).target)
rows = rows.select('gene')
rows = rows.explode('gene')
rows = rows.filter(hl.is_defined(rows.gene))
rows = rows.group_by('gene').aggregate(start=hl.agg.min(rows.locus.position),
                                       end=hl.agg.max(rows.locus.position),
                                       n_variants=hl.agg.count())
rows = mt.rows()
rows = rows.annotate(gene=gene_intervals.index(rows.locus, all_matches=True).target)
foo = rows
foo = foo.annotate(csq=foo.gene.map(lambda _: hl.rand_cat([0.3, 0.2, 0.5], seed=0)))
foo._force_count()

@danking
Contributor Author

danking commented Sep 23, 2023

A shorter version of the above also fails:

import hail as hl

mt = hl.read_matrix_table('nfam_100_nindep_0_step1_includeMoreRareVariants_poly.mt')
gene_intervals = hl.import_locus_intervals('genes.interval_list')
rows = mt.rows()
rows = rows.annotate(gene=gene_intervals.index(rows.locus, all_matches=True).target)
foo = rows
foo = foo.annotate(csq=foo.gene.map(lambda _: hl.rand_cat([0.3, 0.2, 0.5], seed=0)))
foo._force_count()

@danking
Contributor Author

danking commented Sep 25, 2023

This fails:

import hail as hl
gene_intervals = hl.import_locus_intervals('genes.interval_list')
rows = hl.balding_nichols_model(1,1,1).rows()
rows = rows.annotate(gene=gene_intervals.index(rows.locus, all_matches=True).target)
foo = rows
foo = foo.annotate(csq=foo.gene.map(lambda _: hl.rand_cat([0.3, 0.2, 0.5], seed=0)))
foo._force_count()

@danking
Contributor Author

danking commented Sep 25, 2023

Finally, here's one that needs no source files:

import hail as hl
rows = hl.balding_nichols_model(1,1,1).rows()
gene_intervals = rows.key_by(interval=hl.interval(rows.locus, hl.locus(rows.locus.contig, rows.locus.position + 1)))
gene_intervals = gene_intervals.annotate(target='foo')
rows = rows.annotate(gene=gene_intervals.index(rows.locus, all_matches=True).target)
foo = rows
foo = foo.annotate(csq=foo.gene.map(lambda _: hl.rand_cat([0.3, 0.2, 0.5], seed=0)))
foo._force_count()
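
As background on why gene is an array in these reproducers: with all_matches=True, an interval lookup returns every matching row rather than a single one, so the .target field becomes an array that .map can be applied to. Below is a toy, pure-Python sketch of that lookup. It does not use Hail. The Interval class and index_all_matches function are hypothetical names, and the start-inclusive, end-exclusive convention is an assumption of this sketch.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: int
    end: int
    target: str

    def contains(self, point: int) -> bool:
        # Assumed convention: start-inclusive, end-exclusive.
        return self.start <= point < self.end


def index_all_matches(intervals: List[Interval], point: int) -> List[str]:
    """Return the targets of every interval containing `point`."""
    return [iv.target for iv in intervals if iv.contains(point)]


intervals = [Interval(5, 6, "foo"), Interval(0, 10, "bar")]
both = index_all_matches(intervals, 5)   # ["foo", "bar"]
one = index_all_matches(intervals, 6)    # ["bar"]
none = index_all_matches(intervals, 42)  # []
```

Each lookup yields a list, possibly empty, which is the shape the subsequent map over gene operates on.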
