# Write MT Contigs<a class="tocSkip">

**In this notebook, we will parse the PAF files from mapping the MT contigs to the ref version of chrM. We will first select the "best" MT contig based on minimap2 alignments. After selecting the best MT contig for each sample, we will write that MT contig to a file (including reverse complementing the contig, if neccesary).**

 
**The steps that we will take are:**
1. Import Statements & Global Variable Definitions
2. Load Data Table
3. Select Best MT Hits
4. Write MT Contigs

# Preparation

**Install Tools**

In [None]:
## Neccesary because GATK container comes with super old samtools (see note below)
%%capture
%pip install pyfaidx

In [None]:
%%capture
%pip install gcsfs

In [None]:
%%capture
%pip install --upgrade --no-cache-dir terra-pandas
%pip install --upgrade --no-cache-dir terra-notebook-utils

In [None]:
%%capture
%pip install --no-cache-dir -U crcmod

**Import Libraries**

In [28]:
%%capture 
import os
import io
import pandas as pd
import numpy as np
import gcsfs
import gzip

import terra_notebook_utils as tnu
import terra_pandas as tp
from pyfaidx import Fasta
from Bio import SeqIO
from Bio.Seq import Seq, Alphabet 
from Bio.SeqRecord import SeqRecord

**Find Workspace Info**

In [2]:
# Get the Google billing project name and workspace name
PROJECT   = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE = os.path.basename(os.path.dirname(os.getcwd()))
bucket    = os.environ['WORKSPACE_BUCKET'] + "/"

# Verify that we've captured the environment variables
print("Billing project: " + PROJECT)
print("Workspace: " + WORKSPACE)
print("Workspace storage bucket: " + bucket)

Billing project: human-pangenome-ucsc
Workspace: HPRC_Reassembly
Workspace storage bucket: gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/


## Function Definitions

In [3]:
def gz_size(fname):
    with gzip.open(fname, 'rb') as f:
        return f.seek(0, whence=2)

# Load Data Table

In [4]:
clean_sample_df = tp.table_to_dataframe("clean_sample")

clean_sample_df.head()

Unnamed: 0_level_0,mat_masked_fa,mito_against_ref_paf,pat_masked_cleaned_fa,mat_masked_cleaned_fa,pat_mito_contig_ls,pat_contam_results,mat_mito_contig_ls,hifiasm_mat_fa,all_mito_contigs,hifiasm_pat_fa,sample_name,pat_drop_contigs,mat_drop_contigs,pat_masked_fa,mat_contam_results
clean_sample_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
HG002_downsampled,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/3...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/1...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/2...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/8...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/2...,HG002,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...
HG002_full_v0.14,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/3...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/1...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/f...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/8...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/f...,HG002,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...
HG00438,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/3...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/1...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/8...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,HG00438,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...
HG005,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/3...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/1...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/8...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,HG005,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...
HG00621,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/3...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/1...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/m...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/8...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,HG00621,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7...,gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k...


# Select Best MT Hits

In [5]:
! mkdir raw_contigs
%cd raw_contigs

mkdir: cannot create directory ‘raw_contigs’: File exists
/home/jupyter-user/notebooks/HPRC_Reassembly/edit/raw_contigs


In [6]:
## Set paf column names (incomplete)
paf_col_names = ["Query_sequence_name", "Query_sequence_length", "Query_start", "Query_end",
                 "Relative_strand", "Target_sequence_name","Target_sequence_length", "Target_start",
                 "Target_end", "Number_of_residue_matches", "Alignment_block_length", "Mapping_quality", 
                 "NM", "ms", "AS", "nn", "tp", "cm", "s1"]

In [7]:
## Create dataframe to hold best hits for each sample
all_results_df = pd.DataFrame()
no_good_hit_ls = []

for index, row in clean_sample_df.iterrows():
    
    sample_name        = row.name
    print(f"Pulling: {sample_name}")
    
    ## Pull PAF file with hits of mito contigs against chrM reference
    paf_fp = row['mito_against_ref_paf']
    paf_fn     = os.path.basename(paf_fp)
    
    ! gsutil cp {paf_fp} ./
    
    
    ## If the file is empty, then we don't have any hits
    if gz_size(paf_fn) == 0:
        no_good_hit_ls.append(sample_name)
    
    else:
        ## Pull only first 19 columns of paf (because later columns are inconsistent)
        cut_fn = f"{paf_fn}.cut"
        
        ! zcat {paf_fn} | cut -f 1-19 > {cut_fn}
        paf_df = pd.read_csv(cut_fn, sep='\t', names=paf_col_names)
        
        ## Add sample name
        paf_df['sample_name'] = sample_name
        
        ## Make sure to only pull from alignments which have long enough block lengths
        is_align_over_cutoff = paf_df['Alignment_block_length'] > 16560
        
        paf_df = paf_df[is_align_over_cutoff]
        paf_df = paf_df.reset_index()
            
        if paf_df.empty:
            no_good_hit_ls.append(sample_name)
        
        else:
            ## Extract Ints for NM (number of mismatches) & AS (DP alignment score)
            paf_df['NM_int'] = (paf_df['NM'].str.split(":", expand=True)[2])
            paf_df['NM_int'] = paf_df['NM_int'].astype('int32')

            paf_df['AS_int'] = (paf_df['AS'].str.split(":", expand=True)[2])
            paf_df['AS_int'] = paf_df['AS_int'].astype('int32')

            
            ## Find row(s) with highest alignment score and create new data frame
            AS_max           = paf_df['AS_int'].max()      
            is_AS_max_row    = paf_df['AS_int'] == AS_max
            
            max_df = paf_df.loc[(is_AS_max_row)]
            max_df = max_df.reset_index()

            
            ## Keep rows with lowest number of mismatches (and highest alignment score)
            all_results_df = all_results_df.append(max_df.iloc[max_df['NM_int'].idxmin()])

Pulling: HG002_downsampled
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d19b0864-80fb-4b27-b69a-e9b1bd66afd9/minimap2/b507a11a-14da-493c-905f-c4aa50ff96a8/call-alignAndGzip/HG002_downsampled.paf.gz...
/ [1 files][  3.5 KiB/  3.5 KiB]                                                
Operation completed over 1 objects/3.5 KiB.                                      
Pulling: HG002_full_v0.14
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d19b0864-80fb-4b27-b69a-e9b1bd66afd9/minimap2/607dc26b-98b6-4dca-b02f-cd83fbd3d1a5/call-alignAndGzip/HG002_full_v0.14.paf.gz...
/ [1 files][ 20.2 KiB/ 20.2 KiB]                                                
Operation completed over 1 objects/20.2 KiB.                                     
Pulling: HG00438
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d19b0864-80fb-4b27-b69a-e9b1bd66afd9/minimap2/8f5fbd0c-7779-46c2-92ba-24723937a07d/call-alignAndGzip/HG00438.paf.gz...
/ [1 files][  437.0 B/  437.0 B]                                     

/ [1 files][  3.8 KiB/  3.8 KiB]                                                
Operation completed over 1 objects/3.8 KiB.                                      
Pulling: HG02109
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d19b0864-80fb-4b27-b69a-e9b1bd66afd9/minimap2/648fb3f1-91e7-4571-9044-633394caaee0/call-alignAndGzip/HG02109.paf.gz...
/ [1 files][  961.0 B/  961.0 B]                                                
Operation completed over 1 objects/961.0 B.                                      
Pulling: HG02145
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d19b0864-80fb-4b27-b69a-e9b1bd66afd9/minimap2/4817b297-11a3-445a-9531-ddcd0de9a1bb/call-alignAndGzip/HG02145.paf.gz...
/ [1 files][  2.2 KiB/  2.2 KiB]                                                
Operation completed over 1 objects/2.2 KiB.                                      
Pulling: HG02148
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d19b0864-80fb-4b27-b69a-e9b1bd66afd9/minimap2/953625f8-211f-49d

Pulling: NA21309
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d19b0864-80fb-4b27-b69a-e9b1bd66afd9/minimap2/5ad2fc24-b542-48fa-8eff-0f8e23b82944/call-alignAndGzip/NA21309.paf.gz...
/ [1 files][  6.0 KiB/  6.0 KiB]                                                
Operation completed over 1 objects/6.0 KiB.                                      


In [8]:
no_good_hit_ls

['HG01071']

In [9]:
all_results_df

Unnamed: 0,AS,AS_int,Alignment_block_length,Mapping_quality,NM,NM_int,Number_of_residue_matches,Query_end,Query_sequence_length,Query_sequence_name,...,Target_sequence_name,Target_start,cm,index,level_0,ms,nn,s1,sample_name,tp
0,AS:i:16221,16221.0,16570.0,60.0,NM:i:16,16.0,16555.0,17055.0,31847.0,h2tg000464l,...,NC_012920.1,0.0,cm:i:1613,48.0,13.0,ms:i:16221,nn:i:1,s1:i:16327,HG002_downsampled,tp:A:P
0,AS:i:16201,16201.0,16570.0,60.0,NM:i:17,17.0,16554.0,46291.0,49390.0,h2tg000190l,...,NC_012920.1,0.0,cm:i:1610,40.0,12.0,ms:i:16201,nn:i:1,s1:i:16327,HG002_full_v0.14,tp:A:P
0,AS:i:15686,15686.0,16572.0,60.0,NM:i:41,41.0,16532.0,28076.0,32898.0,h2tg000171c,...,NC_012920.1,0.0,cm:i:1571,0.0,0.0,ms:i:15686,nn:i:1,s1:i:16042,HG00438,tp:A:P
0,AS:i:15721,15721.0,16569.0,60.0,NM:i:39,39.0,16531.0,31164.0,31458.0,h2tg000500l,...,NC_012920.1,0.0,cm:i:1559,440.0,118.0,ms:i:15721,nn:i:1,s1:i:15933,HG005,tp:A:P
0,AS:i:15902,15902.0,16570.0,60.0,NM:i:30,30.0,16541.0,30797.0,47480.0,h2tg000178l,...,NC_012920.1,0.0,cm:i:1583,25.0,6.0,ms:i:15902,nn:i:1,s1:i:16105,HG00621,tp:A:P
0,AS:i:15774,15774.0,16569.0,60.0,NM:i:36,36.0,16534.0,21397.0,31188.0,h2tg000302l,...,NC_012920.1,0.0,cm:i:1572,6.0,2.0,ms:i:15774,nn:i:1,s1:i:16056,HG00673,tp:A:P
1,AS:i:15814,15814.0,16569.0,60.0,NM:i:34,34.0,16536.0,24716.0,37545.0,h2tg000627l,...,NC_012920.1,0.0,cm:i:1568,14.0,4.0,ms:i:15814,nn:i:1,s1:i:16037,HG00733,tp:A:P
0,AS:i:15484,15484.0,16569.0,60.0,NM:i:49,49.0,16521.0,26290.0,27817.0,h2tg000218l,...,NC_012920.1,0.0,cm:i:1540,0.0,0.0,ms:i:15484,nn:i:1,s1:i:15847,HG00735,tp:A:P
0,AS:i:14837,14837.0,16569.0,60.0,NM:i:84,84.0,16486.0,39427.0,46439.0,h2tg000185c,...,NC_012920.1,0.0,cm:i:1476,0.0,0.0,ms:i:14837,nn:i:1,s1:i:15385,HG00741,tp:A:P
0,AS:i:15711,15711.0,16571.0,60.0,NM:i:42,42.0,16530.0,28374.0,32901.0,h2tg000202l,...,NC_012920.1,0.0,cm:i:1558,0.0,0.0,ms:i:15711,nn:i:1,s1:i:15995,HG01106,tp:A:P


# Write MT Contigs

In [39]:
for index, row in clean_sample_df.iterrows():
    
    sample_name = row.name
    id_for_header = row['sample_name']
    
    ## Skip HG01701 because it doesn't have any MT contigs (known problem)
    if(sample_name == "HG01071"): 
        continue
        
    print(f"Pulling: {sample_name}")
    
    ## Download fasta with mito contigs
    mito_contig_fp = row['all_mito_contigs']
    
    ## Be sure to pull the manually rotated contigs when neccesary
    if sample_name == "HG02559":
        mito_contig_fp = "gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/mito_work/manually_rotated/HG02559.all_mito_contigs.fa"
        
    if sample_name == "HG03098":
        mito_contig_fp = "gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/mito_work/manually_rotated/HG03098.all_mito_contigs.fa"
    
    
    mito_contig_fn = os.path.basename(mito_contig_fp)

    ! gsutil cp {mito_contig_fp} ./
    
    
    ## Pull info about hit from dataframe with paf info for all samples
    is_sample = all_results_df['sample_name'] == sample_name
    
    seq_name        = all_results_df[is_sample]['Query_sequence_name'].values[0]
    seq_start       = all_results_df[is_sample]['Query_start'].values[0].astype('int32')
    seq_end         = all_results_df[is_sample]['Query_end'].values[0].astype('int32')
    is_minus_strand = all_results_df[is_sample]['Relative_strand'].values[0] == '-'
    

    ## Use pyfaidx to pull/write sequence 
    final_mito_contig_fn = f"{sample_name}.mt.fa"  
    mito_contgs = Fasta(mito_contig_fn)

    with open(final_mito_contig_fn, 'w') as final_mito_f:
        ## If hit (from minimap2) is on minus strand, we need to rev. compl. the sequence
        if(is_minus_strand):
            mito_entry = mito_contgs[seq_name][seq_start:seq_end].reverse.complement
        else:
            mito_entry = mito_contgs[seq_name][seq_start:seq_end]
        
        ## Write with expected contig naming structure
        # final_mito_f.write(f">{id_for_header}#2#MT\n")
        # final_mito_f.write(mito_entry.seq)
        
        record = SeqRecord(
            Seq(mito_entry.seq),
            id=f"{id_for_header}#2#MT",
            name=f"{id_for_header}#2#MT",
            description="")
        
        SeqIO.write(record, final_mito_f, "fasta")

    ## Copy up to bucket
    ! gsutil cp {final_mito_contig_fn} {bucket}corrected_mt_contig/{final_mito_contig_fn}

Pulling: HG002_downsampled
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/85e90c15-bb68-4ea2-bfe2-56fd21fbb795/extractMitoContigs/2ff987db-5231-49f3-9e89-a06c9cbd7df3/call-extractContigs/cacheCopy/HG002_downsampled.all_mito_contigs.fa...
/ [1 files][611.8 KiB/611.8 KiB]                                                
Operation completed over 1 objects/611.8 KiB.                                    
Copying file://HG002_downsampled.mt.fa [Content-Type=application/octet-stream]...
/ [1 files][ 16.5 KiB/ 16.5 KiB]                                                
Operation completed over 1 objects/16.5 KiB.                                     
Pulling: HG002_full_v0.14
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/85e90c15-bb68-4ea2-bfe2-56fd21fbb795/extractMitoContigs/26d0bbe2-4384-4e37-93ea-270f074dbbbe/call-extractContigs/cacheCopy/HG002_full_v0.14.all_mito_contigs.fa...
/ [1 files][  4.7 MiB/  4.7 MiB]                                                
Operation completed over 

/ [1 files][489.6 KiB/489.6 KiB]                                                
Operation completed over 1 objects/489.6 KiB.                                    
Copying file://HG01243.mt.fa [Content-Type=application/octet-stream]...
/ [1 files][ 16.5 KiB/ 16.5 KiB]                                                
Operation completed over 1 objects/16.5 KiB.                                     
Pulling: HG01258
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/85e90c15-bb68-4ea2-bfe2-56fd21fbb795/extractMitoContigs/302512a7-6a8c-4222-83df-275f4c1a2424/call-extractContigs/cacheCopy/HG01258.all_mito_contigs.fa...
/ [1 files][828.4 KiB/828.4 KiB]                                                
Operation completed over 1 objects/828.4 KiB.                                    
Copying file://HG01258.mt.fa [Content-Type=application/octet-stream]...
/ [1 files][ 16.5 KiB/ 16.5 KiB]                                                
Operation completed over 1 objects/16.5 KiB.                  

Copying file://HG02257.mt.fa [Content-Type=application/octet-stream]...
/ [1 files][ 16.5 KiB/ 16.5 KiB]                                                
Operation completed over 1 objects/16.5 KiB.                                     
Pulling: HG02486
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/85e90c15-bb68-4ea2-bfe2-56fd21fbb795/extractMitoContigs/073b3bcb-879c-46cb-9620-b40e1c787ecb/call-extractContigs/cacheCopy/HG02486.all_mito_contigs.fa...
/ [1 files][738.2 KiB/738.2 KiB]                                                
Operation completed over 1 objects/738.2 KiB.                                    
Copying file://HG02486.mt.fa [Content-Type=application/octet-stream]...
/ [1 files][ 16.5 KiB/ 16.5 KiB]                                                
Operation completed over 1 objects/16.5 KiB.                                     
Pulling: HG02559
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/mito_work/manually_rotated/HG02559.all_mito_contigs.fa...
/ [1 files][ 49

/ [1 files][419.4 KiB/419.4 KiB]                                                
Operation completed over 1 objects/419.4 KiB.                                    
Copying file://HG03516.mt.fa [Content-Type=application/octet-stream]...
/ [1 files][ 16.5 KiB/ 16.5 KiB]                                                
Operation completed over 1 objects/16.5 KiB.                                     
Pulling: HG03540
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/85e90c15-bb68-4ea2-bfe2-56fd21fbb795/extractMitoContigs/423c079c-25d3-4242-a8d2-67b64b1bd791/call-extractContigs/cacheCopy/HG03540.all_mito_contigs.fa...
/ [1 files][472.9 KiB/472.9 KiB]                                                
Operation completed over 1 objects/472.9 KiB.                                    
Copying file://HG03540.mt.fa [Content-Type=application/octet-stream]...
/ [1 files][ 16.5 KiB/ 16.5 KiB]                                                
Operation completed over 1 objects/16.5 KiB.                  

In [40]:
clean_sample_df['corrected_MT_contig'] = f"{bucket}corrected_mt_contig/" + clean_sample_df.index + ".mt.fa"

In [41]:
is_HG01071 = clean_sample_df['sample_name'] == "HG01071"

clean_sample_df.loc[is_HG01071, 'corrected_MT_contig'] = ""

In [42]:
upload_df = clean_sample_df[["sample_name", "mat_masked_cleaned_fa", "pat_masked_cleaned_fa", "corrected_MT_contig"]].copy()
upload_df = upload_df.rename(index={'1': 'corrected_sample_id'})

In [43]:
tp.dataframe_to_table("corrected_sample", upload_df)