Skip to content

[query] Fix accidental deletion of tabix files#12056

Merged
danking merged 2 commits intohail-is:mainfrom
chrisvittal:lowering/fix-tabix-file-deletion
Jul 27, 2022
Merged

[query] Fix accidental deletion of tabix files#12056
danking merged 2 commits intohail-is:mainfrom
chrisvittal:lowering/fix-tabix-file-deletion

Conversation

@chrisvittal
Copy link
Copy Markdown
Collaborator

When cleaning up a text export, we delete all files that our successful
tasks didn't produce. We weren't adding tabix files to this set, and so
were deleting tabix files that were created during vcf export. Now, we
explicitly add tabix files to our 'to keep' list.

When cleaning up a text export, we delete all files that our successful
tasks didn't produce. We weren't adding tabix files to this set, and so
were deleting tabix files that were created during vcf export. Now, we
explicitly add tabix files to our 'to keep' list.
val files = cb.memoize(Code.newArray[String](partPaths.loadLength() * 2))
partPaths.forEachDefined(cb) { case (cb, i, file: SStringValue) =>
val path = file.loadString(cb)
cb += (files(i * 2) = path)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be creating invalid code. I'm getting lir blocks that are not well formed.

This is the stack trace of the block formation that is later failing the 'block well formed' assertion here

E           java.lang.Thread.getStackTrace(Thread.java:1564)
E           is.hail.lir.Block.<init>(X.scala:343)
E           is.hail.asm4s.Code$.apply(Code.scala:54)
E           is.hail.asm4s.Value$$anon$3.get(Code.scala:1282)
E           is.hail.asm4s.package$.valueToCode(package.scala:287)
E           is.hail.asm4s.package$.intToCode(package.scala:358)
E           is.hail.expr.ir.VCFExportFinalizer.$anonfun$writeMetadata$3(MatrixWriter.scala:768)
E           is.hail.expr.ir.VCFExportFinalizer.$anonfun$writeMetadata$3$adapted(MatrixWriter.scala:766)
E           is.hail.types.physical.stypes.concrete.SIndexablePointerValue.$anonfun$forEachDefined$4(SIndexablePointer.scala:101)
E           is.hail.asm4s.CodeBuilderLike.ifx(CodeBuilder.scala:97)
E           is.hail.asm4s.CodeBuilderLike.ifx$(CodeBuilder.scala:88)
E           is.hail.expr.ir.EmitCodeBuilder.ifx(EmitCodeBuilder.scala:38)
E           is.hail.types.physical.stypes.concrete.SIndexablePointerValue.$anonfun$forEachDefined$2(SIndexablePointer.scala:99)
E           is.hail.asm4s.CodeBuilderLike.$anonfun$whileLoop$1(CodeBuilder.scala:119)
E           is.hail.asm4s.CodeBuilderLike.$anonfun$whileLoop$1$adapted(CodeBuilder.scala:119)
E           is.hail.asm4s.CodeBuilderLike.whileLoop(CodeBuilder.scala:114)
E           is.hail.asm4s.CodeBuilderLike.whileLoop$(CodeBuilder.scala:107)
E           is.hail.expr.ir.EmitCodeBuilder.whileLoop(EmitCodeBuilder.scala:38)
E           is.hail.asm4s.CodeBuilderLike.whileLoop(CodeBuilder.scala:119)
E           is.hail.asm4s.CodeBuilderLike.whileLoop$(CodeBuilder.scala:119)
E           is.hail.expr.ir.EmitCodeBuilder.whileLoop(EmitCodeBuilder.scala:38)
E           is.hail.types.physical.stypes.concrete.SIndexablePointerValue.forEachDefined(SIndexablePointer.scala:96)
E           is.hail.expr.ir.VCFExportFinalizer.writeMetadata(MatrixWriter.scala:766)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so with this change, I can get the tests to pass. @tpoterba @patrick-schultz why?

-      val files = cb.memoize(Code.newArray[String](partPaths.loadLength() * 2))
+      val len = partPaths.loadLength()
+      val files = cb.memoize(Code.newArray[String](len * 2))
       partPaths.forEachDefined(cb) { case (cb, i, file: SStringValue) =>
         val path = file.loadString(cb)
-        cb += (files(i * 2) = path)
+        cb += files.update(i, path)
         // FIXME(chrisvittal): this will put the string ".tbi" in generated code, we should just access the htsjdk value
-        cb += (files(i * 2 + 1) = Code.invokeStatic2[htsjdk.tribble.util.ParsingUtils, String, String, String]("appendToPath", path, htsjdk.samtools.util.FileExtensions.TABIX_INDEX))
+        cb += files.update(cb.memoize(i + len), Code.invokeStatic2[htsjdk.tribble.util.ParsingUtils, String, String, String]("appendToPath", path, htsjdk.samtools.util.FileExtensions.TABIX_INDEX))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about this one.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some logic changes, like i*2 -> i in the middle replacement. Are you saying that before this change, there was a malformed lir, and not after?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I see you just changed the layout of files. I also can't see why this should affect the lir generated.

Copy link
Copy Markdown
Member

@patrick-schultz patrick-schultz Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If IntelliJ's "desugar scala code" is to be believed, the old version wasn't actually desugaring to an update. That might explain it because the path wasn't getting used, so the enclosed block stayed empty. I'd expect explicitly calling update is the important change here. The other changes look fine.

@danking danking merged commit 75e852f into hail-is:main Jul 27, 2022
daniel-goldstein pushed a commit to daniel-goldstein/hail that referenced this pull request Aug 8, 2022
* [query] Fix accidental deletion of tabix files

When cleaning up a text export, we delete all files that our successful
tasks didn't produce. We weren't adding tabix files to this set, and so
were deleting tabix files that were created during vcf export. Now, we
explicitly add tabix files to our 'to keep' list.

* fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants