Cannot convert output from Scirpy to dandelion #343

sbenjamaporn · 2022-06-07T10:49:47Z

Description of the bug

I try to convert the AnnData after clonal assignment by Scirpy into dandelion format. The result showed that "field productive has invalid bool T + T". I would like to convert it for updating germline sequence of each BCR sequences using dandelion because I did not find this function in Scirpy. However, if you could suggest the other ways. Feel free to let me know.

Minimal reproducible example

import scirpy as ir

ABC_irdata_exclude_orphan_dandelion = ir.io.to_dandelion(ABC_irdata_exclude_orphan)
ABC_irdata_exclude_orphan_dandelion

The error message produced by the code above

~/.conda/envs/dandelion/lib/python3.8/site-packages/airr/schema.py in validate_row(self, row)
    276                 if spec == 'number':  self.to_float(row[f], validate=True)
    277             except ValidationError as e:
--> 278                 raise ValidationError('field %s has %s' %(f, e))
    279 
    280         return True

ValidationError: field productive has invalid bool T + T

Version information

versions

The text was updated successfully, but these errors were encountered:

grst · 2022-06-07T14:58:31Z

Hi @sbenjamaporn,

thanks for reporting this issue.
I'm not yet sure what could be the problem. Could you please report the following:

the entire stack trace of the error message (not just the last part as above)
the result of ABC_irdata_exclude_orphan.obs.columns

Thanks,
Gregor

sbenjamaporn · 2022-06-07T17:05:46Z

Thanks for your prompt response!

This is my results attached as a pdf file.

Scirpy_to_Dandelion_error.pdf

grst · 2022-06-08T14:36:20Z

@sbenjamaporn, thanks for the stacktrace! Regarding my second request, you checked adata.columns instead of adata.obs.columns. Could you please send me the result of the latter?

@zktuong, according to the stacktrace, the error occurs within Dandelion. It could theoretically be that there's a problem with scirpy's output, but I have absolutely no idea where the T + T should come from. Do you have an idea what could be the problem?

sbenjamaporn · 2022-06-08T14:42:59Z

Sorry for missunderstanding.
This is the result of adata.obs.columns

zktuong · 2022-06-08T15:52:48Z

hi @grst @sbenjamaporn

I think it's got to do with a malformed entries in productive columns in the adata.obs that airrCell is getting.

So @sbenjamaporn, can you check what's the unique values for:
scirpy's columns:
['IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_productive', 'IR_VJ_2_productive']
There shouldn't be any T + T in these

I think the following are dandelion's columns:
['productive_VDJ', 'productive_VJ', 'productive', 'productive_summary']
There would be T + T in these. But these columns would be ignored by scirpy when converting to an airr table?

I don't think dandelion use any of these columns when converting (it just refreshes based on the airr table that scirpy produces)

I wonder if the round trip of scirpy <-> dandelion somehow got the columns confused?

sbenjamaporn · 2022-06-08T16:22:55Z

Dear @zktuong @grst

Yes, ['IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_productive', 'IR_VJ_2_productive'] show only "TRUE or None" output. I agree with you about the convertion between them make it confuse. So, we could not convert scirpy result to dandelion, right?

grst · 2022-06-08T16:31:23Z

What's also weird is that in addition to the IR_V(D)J_1/2_productive you also have a productive column. What does this one contain?

"productive" should be a chain-level attribute (i.e only available as IR_V(D)J_1/2_productive).

Could you please describe

how you initially load the data into scirpy (ir.io.read_10x_vdj, ir.io.read_airr, etc)?
if you do any additional conversions between dandelion and scirpy before this error occurs?

zktuong · 2022-06-08T16:59:24Z

Hmm it looks like those additional productive columns are from dandelion.

Can you try and remove every column from “clone_id_by_size” onwards and see if there’s still the issue of conversion?

sbenjamaporn · 2022-06-08T17:00:29Z

@grst,
The productive column shows "T + T" . I preprocessed my scBCR sequences from Dandelion and then use "ddl.to_scirpy". After that, I define clone by Scirpy. Now, I would like to convert it back in order to update germline to study mutational analysis in Dandelion.

sbenjamaporn · 2022-06-08T18:34:46Z

Dear @grst,

Is there any ways that Scirpy could give full information as an AIRR standard ? ( I try ir.io.write_airr, but it did not create full information)

sbenjamaporn · 2022-06-08T18:35:45Z

@zktuong Thanks for suggestion, I will try!

grst · 2022-06-08T18:40:22Z

Hmm it looks like those additional productive columns are from dandelion.

It seems the actual problem is in the conversion from dandelion to scirpy. @zktuong ddl.to_scirpy is just calling scirpy code?

Is there any ways that Scirpy could give full information as an AIRR standard ? ( I try ir.io.write_airr, but it did not create full information)

Happy to discuss this. Could you please open a separate issue and describe what's missing?

zktuong · 2022-06-09T08:25:54Z

It seems the actual problem is in the conversion from dandelion to scirpy. @zktuong ddl.to_scirpy is just calling scirpy code?

That's right. it's just a wrapper to call ir.io.from_dandelion.

A small update on this - the issue seems to lie in:

# works ok
irdata = ddl.to_scirpy(vdj) # or ir.io.from_dandelion(vdj)
vdj2 = ir.io.to_dandelion(irdata)

# same issue with ValidationError: field productive has invalid bool T + T appears
irdata = ddl.to_scirpy(vdj, transfer = True)  # or ir.io.from_dandelion(vdj, transfer = True)
vdj2 = ir.io.to_dandelion(irdata)

zktuong · 2022-06-09T08:50:14Z

ok. the 'issue' is with line 273:

scirpy/scirpy/io/_datastructures.py

Lines 260 to 273 in 2c5b99e

    
               def to_airr_records(self) -> Iterable[dict]: 
        
                   """Iterate over chains as AIRR-Rearrangent compliant dictonaries. 
        
                   Each dictionary will also include the cell-level information. 
        
                   Yields 
        
                   ------ 
        
                   Dictionary representing one row of a AIRR rearrangement table 
        
                   """ 
        
                   for tmp_chain in self.chains: 
        
                       chain = AirrCell.empty_chain_dict() 
        
                       # add the actual data 
        
                       chain.update(tmp_chain) 
        
                       # add cell-level attributes 
        
                       chain.update(self)

because dandelion's productive column in the metadata will update the productive key that scirpy was making from to_airr_cells because it appears later

scirpy/scirpy/io/_io.py

Lines 726 to 739 in 2c5b99e

    
           airr_cells = to_airr_cells(adata) 
        
           contig_dicts = {} 
        
           for tmp_cell in airr_cells: 
        
               for i, chain in enumerate(tmp_cell.to_airr_records(), start=1): 
        
                   # dandelion-specific modifications 
        
                   chain.update( 
        
                       { 
        
                           "sequence_id": f"{tmp_cell.cell_id}_contig_{i}", 
        
                       } 
        
                   ) 
        
                   contig_dicts[chain["sequence_id"]] = chain 
        
           data = pd.DataFrame.from_dict(contig_dicts, orient="index")

Can confirm that if just change the name away from productive on dandelion's side, it resolves this.

@sbenjamaporn if you just rename the current productive column to productive_status:

adata.obs.rename(columns={'productive':'productive_status'}, inplace=True)

you should be able to do the transfer.

I will action this on dandelion's side to rename productive to productive_status.

sbenjamaporn · 2022-06-09T16:22:56Z

@zktuong Thanks, It works now!.

I have a more question during update germline sequence by update_germline. I have many samples to update. Should the fasta file be "tigger_heavy_igblast_db-pass_genotype.fasta" ? ( I also got the error in this case) or manually specify in each sample ?

OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.

sbenjamaporn · 2022-06-09T16:25:37Z

@grst
Of course!

grst · 2022-06-09T17:53:56Z

Thanks a lot @zktuong! LMK once you have a release including the fix, then I'll pin the latest version of dandelion.

zktuong · 2022-06-10T12:55:46Z

@zktuong Thanks, It works now!.

I have a more question during update germline sequence by update_germline. I have many samples to update. Should the fasta file be "tigger_heavy_igblast_db-pass_genotype.fasta" ? ( I also got the error in this case) or manually specify in each sample ?

OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.

Let's follow up over on dandelion's side:
zktuong/dandelion#153

zktuong · 2022-06-27T14:11:07Z

Hi @grst,

i've just release sc-dandelion==0.2.3 (it's actually 0.2.2 but i thought my upload went wrong)
Need to wait for pypi/warehouse#11696 to be fixed before making any changes here though... =(

sbenjamaporn added the bug Something isn't working label Jun 7, 2022

zktuong mentioned this issue Jun 9, 2022

[BUG] scirpy conversion - rename productive zktuong/dandelion#153

Closed

grst mentioned this issue Jun 15, 2022

Speed upgrade - Refactor generate network zktuong/dandelion#152

Merged

grst mentioned this issue Jun 28, 2022

Update minimal dandelion version #347

Merged

grst added this to In progress in scirpy-dev Jun 28, 2022

grst closed this as completed in #347 Jun 28, 2022

scirpy-dev automation moved this from In progress to Done Jun 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cannot convert output from Scirpy to dandelion #343

Cannot convert output from Scirpy to dandelion #343

sbenjamaporn commented Jun 7, 2022

grst commented Jun 7, 2022

sbenjamaporn commented Jun 7, 2022 •

edited

grst commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

zktuong commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

grst commented Jun 8, 2022

zktuong commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

grst commented Jun 8, 2022

zktuong commented Jun 9, 2022 •

edited

zktuong commented Jun 9, 2022

sbenjamaporn commented Jun 9, 2022

sbenjamaporn commented Jun 9, 2022

grst commented Jun 9, 2022

zktuong commented Jun 10, 2022

zktuong commented Jun 27, 2022

Cannot convert output from Scirpy to dandelion #343

Cannot convert output from Scirpy to dandelion #343

Comments

sbenjamaporn commented Jun 7, 2022

Description of the bug

Minimal reproducible example

The error message produced by the code above

Version information

grst commented Jun 7, 2022

sbenjamaporn commented Jun 7, 2022 • edited

grst commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

zktuong commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

grst commented Jun 8, 2022

zktuong commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

sbenjamaporn commented Jun 8, 2022

grst commented Jun 8, 2022

zktuong commented Jun 9, 2022 • edited

zktuong commented Jun 9, 2022

sbenjamaporn commented Jun 9, 2022

sbenjamaporn commented Jun 9, 2022

grst commented Jun 9, 2022

zktuong commented Jun 10, 2022

zktuong commented Jun 27, 2022

sbenjamaporn commented Jun 7, 2022 •

edited

zktuong commented Jun 9, 2022 •

edited