Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hail optimizer changes the type but does not show a difference in printable type #4527

Closed
danking opened this issue Oct 11, 2018 · 14 comments

Comments

Projects
None yet
5 participants
@danking
Copy link
Collaborator

commented Oct 11, 2018

Hail version:

784ab2796878

What you did:

In [1]: import hail as hl
   ...: 
   ...: t1kg = hl.balding_nichols_model(3, 100, 100)
   ...: print(t1kg.describe())
   ...: t1kg = t1kg_sm.repartition(500) 
   ...: t1kg = t1kg._filter_partitions([1])
   ...: t1kg = hl.split_multi(t1kg)
   ...: t1kg._force_count_rows()

What went wrong (all error messages here, including the full java stack trace):

FatalError: HailException: optimization changed type!
  before: Matrix{global:Struct{bn:Struct{n_populations:Int32,n_samples:Int32,n_variants:Int32,n_partitions:Int32,pop_dist:Array[Int32],fst:Array[Float64],mixture:Boolean}},col_key:[sample_idx],col:Struct{sample_idx:Int32,pop:Int32},row_key:[[locus,alleles]],row:Struct{locus:Locus(GRCh37),alleles:Array[String],ancestral_af:Float64,af:Array[Float64],a_index:Int32,was_split:Boolean,old_locus:Locus(GRCh37),old_alleles:Array[String]},entry:Struct{GT:Call}}
  after:  Matrix{global:Struct{bn:Struct{n_populations:Int32,n_samples:Int32,n_variants:Int32,n_partitions:Int32,pop_dist:Array[Int32],fst:Array[Float64],mixture:Boolean}},col_key:[sample_idx],col:Struct{sample_idx:Int32,pop:Int32},row_key:[[locus,alleles]],row:Struct{locus:Locus(GRCh37),alleles:Array[String],ancestral_af:Float64,af:Array[Float64],a_index:Int32,was_split:Boolean,old_locus:Locus(GRCh37),old_alleles:Array[String]},entry:Struct{GT:Call}}

@danking danking self-assigned this Oct 11, 2018

@danking

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 11, 2018

Reported by Andrea Ganna.

@danking

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 11, 2018

smaller:

In [5]: import hail as hl
   ...: 
   ...: t1kg = hl.balding_nichols_model(3, 100, 100)
   ...: t1kg = hl.split_multi(t1kg)
   ...: t1kg._force_count_rows()
@danking

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 11, 2018

simpler:

In [12]: import hail as hl
    ...: 
    ...: t1kg = hl.utils.range_matrix_table(1,1)
    ...: t1kg = t1kg.key_rows_by(locus=hl.locus(hl.str(t1kg.row_idx), t1kg.row_idx), alleles=['A','T'])
    ...: t1kg = hl.split_multi(t1kg)
    ...: t1kg._force_count_rows()
@danking

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 11, 2018

simpler:

In [16]: import hail as hl
    ...: 
    ...: t1kg = hl.utils.range_matrix_table(1,1)
    ...: t1kg = t1kg.key_rows_by(locus=hl.locus(hl.str(t1kg.row_idx+1), t1kg.row_idx+1), alleles=['A','T'])
    ...: t1kg.write('/tmp/foo.mt', overwrite=True)
2018-10-11 13:42:55 Hail: INFO: Coerced sorted dataset
2018-10-11 13:42:55 Hail: INFO: wrote 1 items in 1 partitions to /tmp/foo.mt
^[[A
In [17]: import hail as hl
    ...: 
    ...: t1kg = hl.read_matrix_table('/tmp/foo.mt')
    ...: t1kg = hl.split_multi(t1kg)
    ...: t1kg._force_count_rows()

error:

FatalError: HailException: optimization changed type!
  before: Matrix{global:Struct{},col_key:[col_idx],col:Struct{col_idx:Int32},row_key:[[locus,alleles]],row:Struct{row_idx:Int32,locus:Locus(GRCh37),alleles:Array[String],a_index:Int32,was_split:Boolean,old_locus:Locus(GRCh37),old_alleles:Array[String]},entry:Struct{}}
  after:  Matrix{global:Struct{},col_key:[col_idx],col:Struct{col_idx:Int32},row_key:[[locus,alleles]],row:Struct{row_idx:Int32,locus:Locus(GRCh37),alleles:Array[String],a_index:Int32,was_split:Boolean,old_locus:Locus(GRCh37),old_alleles:Array[String]},entry:Struct{}}

describe:

In [18]: import hail as hl
    ...: 
    ...: t1kg = hl.read_matrix_table('/tmp/foo.mt').describe()
    ...: 
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    'col_idx': int32 
----------------------------------------
Row fields:
    'row_idx': int32 
    'locus': locus<GRCh37> 
    'alleles': array<str> 
----------------------------------------
Entry fields:
    None
----------------------------------------
Column key: ['col_idx']
Row key: ['locus', 'alleles']
----------------------------------------
@danking

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 11, 2018

with a code change:

FatalError: HailException: optimization changed type!
  before: Matrix{global:Struct{},col_key:[col_idx],col:Struct{col_idx:Int32},row_key:[[locus,alleles]],row:Struct{row_idx:Int32,locus:Locus(GRCh37),alleles:Array[String],a_index:Int32,was_split:Boolean,old_locus:Locus(GRCh37),old_alleles:Array[String]},entry:Struct{}entriesIndex:3}
  after:  Matrix{global:Struct{},col_key:[col_idx],col:Struct{col_idx:Int32},row_key:[[locus,alleles]],row:Struct{row_idx:Int32,locus:Locus(GRCh37),alleles:Array[String],a_index:Int32,was_split:Boolean,old_locus:Locus(GRCh37),old_alleles:Array[String]},entry:Struct{}entriesIndex:7}

@danking

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 11, 2018

@cseed Can we reassign to someone else to determine:

  1. why the entries index is changing in the optimizer
  2. if this is actually a bug or not
  3. make a code change that either fixes the bug or fixes the equality check.

@danking danking assigned cseed and unassigned danking Oct 11, 2018

@cseed cseed assigned chrisvittal and unassigned cseed Oct 12, 2018

@cseed

This comment has been minimized.

Copy link
Collaborator

commented Oct 12, 2018

  1. Because of a bug.

  2. Yes.

  3. The message in the exception should use BaseType.parseableType instead of pretty/toString, since that prints the missingness details.

@tpoterba

This comment has been minimized.

Copy link
Collaborator

commented Oct 12, 2018

this is the parsable type. It's not requiredness, it's entry array index.

@tpoterba

This comment has been minimized.

Copy link
Collaborator

commented Oct 12, 2018

if I have to guess it's related to union rows in PruneDeadFields:

      case MatrixUnionRows(children) =>
        val requestedType = memo.lookup(mir).asInstanceOf[MatrixType]
        MatrixUnionRows(children.map { child =>
          upcast(rebuild(child, memo), requestedType,
            upcastCols = false,
            upcastGlobals = false)
        } )

if upcast can change the entry array pos, this is it

@cseed

This comment has been minimized.

Copy link
Collaborator

commented Oct 12, 2018

this is the parsable type. It's not requiredness, it's entry array index.

Oops, you're totally right, I scanned too fast.

Right, it's that the rvRowType changed. So the rvRowType being part of the MatrixType is really a workaround until we have physical types. It shouldn't be part of the "visible" state of the MatrixType.

Yeah, upcast via MatrixMapRows can do it:

  val newRVRow = newRow.typ.asInstanceOf[TStruct].fieldOption(MatrixType.entriesIdentifier) match {
    case Some(f) =>
      assert(f.typ == child.typ.entryArrayType)
      newRow
    case None =>
      InsertFields(newRow, Seq(
        MatrixType.entriesIdentifier -> GetField(Ref("va", child.typ.rvRowType), MatrixType.entriesIdentifier)))
  }
@danking

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 17, 2018

Andrea Ganna asked about the status of this bug again on Zulip today, FYI.

@tpoterba

This comment has been minimized.

Copy link
Collaborator

commented Oct 17, 2018

it's not hard to fix, I think. Do we have a small replicable example?

@tpoterba

This comment has been minimized.

Copy link
Collaborator

commented Oct 17, 2018

needs changes to MatrixUnionRows.typ, to account for different entries locations across children, probably, as well. That's probably a distinct existing bug.

@tpoterba

This comment has been minimized.

Copy link
Collaborator

commented Dec 3, 2018

bump

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.