Skip to content

[query] Stop silently exporting null bgens if no GP field exists#12650

Merged
danking merged 4 commits intohail-is:mainfrom
tpoterba:bgen-wtf
Feb 7, 2023
Merged

[query] Stop silently exporting null bgens if no GP field exists#12650
danking merged 4 commits intohail-is:mainfrom
tpoterba:bgen-wtf

Conversation

@tpoterba
Copy link
Contributor

@tpoterba tpoterba commented Feb 3, 2023

could use a second look here -- am I crazy?

If GP is missing from the MT and no expression is passed, there's no GP in the MT passed into the writer.

And if there's no GP in the writer, we export a file worth of nulls.

def emitNullGP(cb: EmitCodeBuilder): Unit = cb.forLoop(cb.assign(i, 0), i < nGenotypes - 1, cb.assign(i, i + 1), add(cb, uncompBuf, 0))
val entries = elt.loadField(cb, entriesFieldName).get(cb).asIndexable
entries.forEachDefinedOrMissing(cb)({ (cb, j) =>
emitNullGP(cb)
}, { case (cb, j, entry: SBaseStructValue) =>
if (entry.st.virtualType.fieldIdx.get("GP").isDefined) {
entry.loadField(cb, "GP").consume(cb, emitNullGP(cb), { gp =>
val gpSum = cb.newLocal[Double]("gpSum", 0d)
gp.asIndexable.forEachDefined(cb) { (cb, idx, x) =>
val gpv = x.asDouble.value
cb.ifx(gpv < 0d,
cb._fatal("found GP value less than 0: ", gpv.toS, ", at sample ", j.toS, " of variant", chr, ":", pos.toS))
cb.assign(gpSum, gpSum + gpv)
cb += (gpResized(idx) = gpv * BgenWriter.totalProb.toDouble)
}
cb.ifx(gpSum >= 0.999 && gpSum <= 1.001, {
cb += uncompBuf.invoke[Int, Byte, Unit]("update", samplePloidyStart + j, BgenWriter.ploidy)
cb += Code.invokeScalaObject6[Array[Double], Array[Double], Array[Int], Array[Int], ByteArrayBuilder, Long, Unit](BgenWriter.getClass, "roundWithConstantSum",
gpResized, fractional, index, indexInverse, uncompBuf, BgenWriter.totalProb.toLong)
}, {
cb.assign(dropped, dropped + 1l)
emitNullGP(cb)
})
})
} else {
emitNullGP(cb)

danking
danking previously requested changes Feb 3, 2023
Copy link
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed that seems not great. I think we need to keep the if. I also noticed on line 243/245 that we have export_gen where there should be an export/bgen.

entries.forEachDefinedOrMissing(cb)({ (cb, j) =>
emitNullGP(cb)
}, { case (cb, j, entry: SBaseStructValue) =>
if (entry.st.virtualType.fieldIdx.get("GP").isDefined) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the GP field still be missing even though the entries aren't? (Meaning we need this if)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is asking whether the GP field is not in the schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing GP values are handled by the consume

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do the same for GEN though!

@danking danking merged commit 7705a26 into hail-is:main Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants