[query] Add query_matrix_table an analogue to query_table#14806
Conversation
| private val zippedReader = PartitionZippedNativeReader(rowsReader, entriesReader) | ||
|
|
||
| def contextType = rowsReader.contextType | ||
| def fullRowType = zippedReader.fullRowType |
There was a problem hiding this comment.
@patrick-schultz I'm getting an NPE in initialization, I think it's because this assertion in PartitionReader is running before zippedReader is initialized. Thoughts on resolving that?
hail/hail/hail/src/is/hail/expr/ir/IR.scala
Lines 1300 to 1301 in ecaa96b
There was a problem hiding this comment.
That's annoying. My first thought is to make the sub-readers lazy vals. Maybe try that for now, and we can think more if there's any more satisfying fix.
There was a problem hiding this comment.
Thanks for the suggestion. It worked. I guess lazy val can force initialization, but val is only truly valid after full initialization of the class.
There was a problem hiding this comment.
Right, lazy val foo = init is basically private var _foo = null and def foo = { if (_foo == null) _foo = init; _foo }. So it will be initialized the first time it's accessed, rather than when the class initializer is run. Usually that means it's initialized later than the class, but in this case it happens earlier.
e065c6d to
4e0dbe5
Compare
|
I think this is ready for review. |
b04b3f5 to
cd1d3a0
Compare
| private[this] class PartitionEntriesNativeIntervalReader( | ||
| sm: HailStateManager, | ||
| entriesPath: String, | ||
| entriesSpec: AbstractTableSpec, | ||
| uidFieldName: String, | ||
| rowsTableSpec: AbstractTableSpec, | ||
| ) extends PartitionNativeIntervalReader(sm, entriesPath, entriesSpec, uidFieldName) { | ||
| override lazy val partitioner = rowsTableSpec.rowsSpec.partitioner(sm) | ||
| } | ||
|
|
||
| case class PartitionZippedNativeIntervalReader( | ||
| sm: HailStateManager, | ||
| mtPath: String, | ||
| mtSpec: AbstractMatrixTableSpec, | ||
| uidFieldName: String, | ||
| ) extends PartitionReader { | ||
| require(mtSpec.indexed) | ||
|
|
||
| // XXX: rows and entries paths are hardcoded, see MatrixTableSpec | ||
| private lazy val rowsReader = | ||
| PartitionNativeIntervalReader(sm, mtPath + "/rows", mtSpec.rowsSpec, "__dummy") | ||
|
|
||
| private lazy val entriesReader = | ||
| new PartitionEntriesNativeIntervalReader( | ||
| sm, | ||
| mtPath + "/entries", | ||
| mtSpec.entriesSpec, | ||
| uidFieldName, | ||
| rowsReader.tableSpec, | ||
| ) | ||
|
|
||
| private lazy val zippedReader = PartitionZippedNativeReader(rowsReader, entriesReader) | ||
|
|
||
| def contextType = rowsReader.contextType | ||
| def fullRowType = zippedReader.fullRowType |
There was a problem hiding this comment.
Forgive me Chris, I'm not really following what's mandating all these private lazy vals.
Since you're not using PartitionZippedNativeIntervalReader for pattern matching, can you make this a smart constructor that returns an anonymous instance of PartitionReader? Perhaps that might solve the initialisation order issue?
There was a problem hiding this comment.
Ok. So I'm using these values as components for the code generation. It's much easier and more understandable to reuse what we already have. They're private because they're pure implementation detail and they're lazy because without it, I run this assertion failing due to an NPE.
hail/hail/hail/src/is/hail/expr/ir/IR.scala
Line 1291 in 84262db
I don't fully understand what we mean by a smart constructor.
There was a problem hiding this comment.
Thanks for explaining. A smart constructor is one that builds and validates the state and returns a plain old data object rather than the pattern of subclassing and doing a whole bunch in the constructor. In this case, it might look something like:
def PartitionZippedNativeIntervalReader(
sm: HailStateManager,
mtPath: String,
mtSpec: AbstractMatrixTableSpec,
uidFieldname: String,
): PartitionReader = {
// build and validate your state here
val rowsReader =
new PartitionNativeIntervalReader(sm, mtPath + "/rows", mtSpec.rowsSpec, "__dummy")
object entriesReader
extends PartitionNativeIntervalReader(
sm,
mtPath + "/entries",
mySpec.entriesSpec,
uidFieldname
) {
override lazy val partitioner =
rowsTableSpec.rowsSpec.partitioner(sm)
}
val zippedReader =
PartitionZippedNativeReader(rowsReader, entriesReader)
...
// return a dumb data object
new ParitionReader {
override def contextType = rowsReader.contextType
override def fullRowType = zippedReader.fullRowType
override val uidFieldName = uidFieldname
override ..
}
}
There was a problem hiding this comment.
That's a useful pattern. I'll keep it in mind for the future.
ehigham
left a comment
There was a problem hiding this comment.
Great work. I have a few comments on readability + test parameterisation. Thanks!
08bfa1a to
4999e0f
Compare
CHANGELOG: Add query_matrix_table an analogue to query_table
6bc01af to
c1840f8
Compare
patrick-schultz
left a comment
There was a problem hiding this comment.
Thanks Chris! A little hacky, but I'm fine with it. Just needs a test for the randomness uids.
| @pytest.mark.parametrize("query,expected", query_matrix_table_rows_test_parameters()) | ||
| def test_query_matrix_table_rows(query_mt_mt, query, expected): | ||
| assert hl.eval(hl.query_matrix_table_rows(query_mt_mt, query, 'e')) == expected |
There was a problem hiding this comment.
It doesn't look like there's any test coverage for the randomness machinary. Following the strategy used by the test_table_randomness_* tests, I think something like this would do it:
def test_query_matrix_table_rows(query_mt_mt):
i1 = hl.interval(27, 45)
i2 = hl.interval(45, 80, includes_end=True)
rows = hl.query_matrix_table_rows(query_mt_mt, i1, 'e').extend(hl.query_matrix_table_rows(query_mt_mt, i2, 'e'))
x = hl.eval(rows.aggregate(hl.struct(r=hl.agg.collect_as_set(ht.r), n=hl.agg.count())))
assert len(x.r) == x.nI don't see any randomness tests for query_table either. Would you mind adding one? (Should be identical to the matrix one.) If they uncover any issues, I'm happy to help debug.
There was a problem hiding this comment.
Let's see if these new tests pass.
219ee7f to
f2a497a
Compare
CHANGELOG: Add query_matrix_table an analogue to query_table
Part of #14499.
Security Assessment
Delete all except the correct answer:
Impact Description
Increases the query python API surface, but is only a query change.