# Notebook 2: Find Contigs To Drop From Assemblies<a class="tocSkip">

**In this notebook we drop contaminated contigs and mito contigs from our assemblies. Corrected mito contigs will be added back to the assemblies in a later step.**

**We also pull the regions that have (non-mito) contamination and write them to BED files. This lets us confirm that we aren't removing entire contigs when only a small portion is contamination. This check is done locally, since the screening results are inconsistent and large regions that are clearly contamination often aren't flagged.**


**The steps that we will take are:**
1. Import Statements & Global Variable Definitions
2. Load Data Table
3. Identify Mito & Contaminated Contigs Then Write To Files
4. Write Regions With Contamination
5. Create Clean_Sample Data Table

# Import Statements & Global Variable Definitions

## Load Python packages
----

In [1]:
%%capture 
import terra_notebook_utils as tnu
import terra_pandas as tp
import os
import io
import gzip
import pandas as pd
import numpy as np
from Bio import SeqIO
# Bio.Alphabet was removed in Biopython 1.78, so only Seq is imported here
from Bio.Seq import Seq

## Set Environment Variables

In [2]:
# Get the Google billing project name and workspace name
PROJECT = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE = os.path.basename(os.path.dirname(os.getcwd()))
bucket = os.environ['WORKSPACE_BUCKET'] + "/"

# Verify that we've captured the environment variables
print("Billing project: " + PROJECT)
print("Workspace: " + WORKSPACE)
print("Workspace storage bucket: " + bucket)

Billing project: human-pangenome-ucsc
Workspace: HPRC_Reassembly
Workspace storage bucket: gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/
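
As a quick illustration of how `WORKSPACE` is derived above: the expression takes the name of the directory one level above the current working directory. The sketch below uses the path printed by `%cd` later in this notebook; that the notebook was started from the parent `edit` directory is an assumption, and `workspace_from_cwd` is a hypothetical helper, not part of the notebook.

```python
import posixpath


def workspace_from_cwd(cwd: str) -> str:
    """Return the name of the directory one level above `cwd`."""
    return posixpath.basename(posixpath.dirname(cwd))


# Assumed working directory when the environment cell was run
before_cd = "/home/jupyter-user/notebooks/HPRC_Reassembly/edit"
print(workspace_from_cwd(before_cd))

# After `%cd drop_contigs` the same expression resolves to a different directory
after_cd = "/home/jupyter-user/notebooks/HPRC_Reassembly/edit/drop_contigs"
print(workspace_from_cwd(after_cd))
```

Because the result depends on the working directory, re-running the environment cell after changing into `drop_contigs` would likely report `edit` rather than the workspace name.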


## Function Definitions

In [3]:
def write_contam_contigs(contam_fp: str, contam_contigs_fp: str) -> bool:
    
    header_strings = ["========== COMMON CONTAMINANTS IN EUKARYOTES ==========",
                      "========== MITOCHONDRIAL SEQUENCE ==========",
                      "========== REFSEQ: bacteria ==========",
                      "========== REFSEQ: other_eukaryota ==========",
                      "========== PLASTID SEQUENCE ==========",
                      "========== REFSEQ: viruses_and_viroids ==========",
                      "========== REFSEQ: viridiplantae ==========",
                      "========== REFSEQ: chordata ==========",
                      "========== REFSEQ: other_metazoa ==========",
                      "========== REFSEQ: fungi =========="]
    end_string    = ""

    ## Loop through the file and pull contamination screen entries. (These are
    ## written between a header string and the end_string, if there are any.)
    contam_contigs_ls = []
    
    with open(contam_fp) as infile, open(contam_contigs_fp, 'w') as outfile:
        copy = False
        found_hits = False
        
        for line in infile:
            if line.strip() in header_strings:
                copy = True
                continue
            elif line.strip().startswith('#'):
                continue
            elif line.strip() == end_string:
                copy = False
                continue
            elif copy:
                ## Pull the contig name
                contig_name = line.strip().split()[0] + "\n"
                
                ## Only add if we haven't already
                if contig_name not in contam_contigs_ls: 
                    contam_contigs_ls.append(contig_name)
                    outfile.write(contig_name)
                    found_hits = True
    
    return found_hits
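
The functions in this section all share the same scanning pattern: a section header switches copying on, `#` lines are skipped, and a blank line switches copying off. A minimal sketch of that pattern as one reusable generator is shown below. The name `iter_hit_rows` and the sample report are hypothetical; in particular, the 12-column row layout is an assumption made for illustration, not something this notebook documents.

```python
from typing import Iterable, Iterator, List


def iter_hit_rows(lines: Iterable[str], headers: Iterable[str]) -> Iterator[List[str]]:
    """Yield whitespace-split hit rows found under any of the given headers."""
    wanted = set(headers)
    in_section = False
    for raw in lines:
        line = raw.strip()
        if line in wanted:
            in_section = True        # a wanted section starts here
        elif line.startswith("#"):
            continue                 # comment / column-description line
        elif line == "":
            in_section = False       # a blank line ends the current section
        elif in_section:
            yield line.split()


# Hypothetical report text, for illustration only
report = """\
========== MITOCHONDRIAL SEQUENCE ==========
# column descriptions would go here
ctg_000012 ref_a 99.8 500 1 0 101 600 1 500 0.0 900

========== REFSEQ: bacteria ==========
ctg_000045 ref_b 98.1 300 5 0 1 300 20 319 1e-150 520
ctg_000045 ref_c 97.5 200 5 0 801 1000 1 200 1e-90 340
"""

bacteria_rows = iter_hit_rows(report.splitlines(), ["========== REFSEQ: bacteria =========="])
# dict.fromkeys keeps first-seen order while removing duplicate contig names
contigs = list(dict.fromkeys(row[0] for row in bacteria_rows))
print(contigs)
```

With a helper like this, the contig-list and BED writers would differ only in which headers they pass and which fields of each row they keep.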

In [4]:
def write_mito_contigs(contam_fp: str, mito_contigs_fp: str) -> bool:
    
    header_string = "========== MITOCHONDRIAL SEQUENCE =========="
    end_string    = ""

    ## Loop through the file and pull mito entries. (These are written
    ## between the header_string and the end_string, if there are any.)
    
    with open(contam_fp) as infile, open(mito_contigs_fp, 'w') as outfile:
        copy = False
        found_hits = False
        
        for line in infile:
            if line.strip() == header_string:
                copy = True
                continue
            elif line.strip().startswith('#'):
                continue
            elif "MT" in line.strip():
                continue
            elif line.strip() == end_string:
                copy = False
                continue
            elif copy:
                ## Pull the contig name
                contig_name = line.strip().split()[0] + "\n"
                
                outfile.write(contig_name)
                found_hits = True
    
    return found_hits

In [5]:
def found_contam_contigs(contam_fp: str) -> bool:
    
    header_strings = ["========== COMMON CONTAMINANTS IN EUKARYOTES ==========",
                      "========== REFSEQ: bacteria ==========",
                      "========== REFSEQ: other_eukaryota ==========",
                      "========== PLASTID SEQUENCE ==========",
                      "========== REFSEQ: viruses_and_viroids ==========",
                      "========== REFSEQ: viridiplantae ==========",
                      "========== REFSEQ: chordata ==========",
                      "========== REFSEQ: other_metazoa ==========",
                      "========== REFSEQ: fungi =========="]
    
    with open(contam_fp) as infile:
        
        found_hits = False
        
        for line in infile:
            if line.strip() in header_strings:
                found_hits = True
    
    return found_hits

In [6]:
def write_contam_bed(contam_fp: str, output_bed_fn: str) -> bool:
    
    header_strings = ["========== COMMON CONTAMINANTS IN EUKARYOTES ==========",
                      "========== REFSEQ: bacteria ==========",
                      "========== REFSEQ: other_eukaryota ==========",
                      "========== PLASTID SEQUENCE ==========",
                      "========== REFSEQ: viruses_and_viroids ==========",
                      "========== REFSEQ: viridiplantae ==========",
                      "========== REFSEQ: chordata ==========",
                      "========== REFSEQ: other_metazoa ==========",
                      "========== REFSEQ: fungi =========="]
    end_string    = ""

    ## Loop through the file and pull contamination screen entries. (These are
    ## written between a header string and the end_string, if there are any.)
    
    with open(contam_fp) as infile, open(output_bed_fn, 'w') as outfile:
        copy = False
        found_hits = False
        
        for line in infile:
            if line.strip() in header_strings:
                copy = True
                continue
            elif line.strip().startswith('#'):
                continue
            elif line.strip() == end_string:
                copy = False
                continue
            elif copy:
                ## Split the hit row into whitespace-delimited fields
                split_line = line.strip().split()

                ## BED starts are 0-based: shift the start, keep the end as-is
                contig = split_line[0]
                start  = str(int(split_line[6]) - 1)
                stop   = str(split_line[7])
                bed_line = contig + "\t" + start + "\t" + stop + "\n"
            
                outfile.write(bed_line)
                found_hits = True
    
    return found_hits
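
`write_contam_bed` subtracts 1 from the start column and leaves the end column unchanged. That is the usual conversion from 1-based, end-inclusive coordinates to BED's 0-based, half-open convention. The sketch below isolates that step; it assumes the report's start and end columns are 1-based and inclusive, which this notebook does not state explicitly, and `to_bed_interval` is a hypothetical helper.

```python
from typing import Tuple


def to_bed_interval(start_1based: int, end_1based: int) -> Tuple[int, int]:
    """Convert a 1-based, end-inclusive interval to 0-based, half-open (BED)."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError(f"invalid interval: {start_1based}-{end_1based}")
    return start_1based - 1, end_1based


# Bases 101..200 (100 bases) become the BED interval [100, 200)
start, stop = to_bed_interval(101, 200)
print(f"contig_1\t{start}\t{stop}")
print("interval length:", stop - start)
```

The half-open form keeps the interval length equal to `stop - start`, which makes later coverage arithmetic simpler.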

# Load Data Table

In [7]:
decont_results_df = tp.table_to_dataframe("mask_adapter_sample")

decont_results_df.head()

| mask_adapter_sample_id | mat_adapter_bed | mat_masked_fa | pat_adapter_bed | pat_contam_results | pat_adapter_paf | hifiasm_mat_fa | hifiasm_pat_fa | mat_adapter_paf | sample_name | pat_masked_fa | mat_contam_results |
|---|---|---|---|---|---|---|---|---|---|---|---|
| HG002_downsampled | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/9... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/2... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/2... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d... | HG002 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... |
| HG002_full_v0.14 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/9... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/f... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/f... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d... | HG002 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... |
| HG00438 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/9... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d... | HG00438 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... |
| HG005 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/9... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d... | HG005 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... |
| HG00621 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/a... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/9... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/c... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/d... | HG00621 | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/7... | gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/k... |


# Identify Mito & Contaminated Contigs Then Write To Files

**We will write a list of mito + contaminated contigs for each haplotype; these are the contigs that will be dropped from our assemblies.**

**We also write a list of just the mito contigs, which will be used for semi-manual correction.**
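
To make the purpose of the drop lists concrete, here is a minimal sketch of how a list of contig names could be used to filter a FASTA file. This is not the `dropFastaContigs` workflow itself, whose implementation is not shown in this notebook; `drop_fasta_records` and the toy records are hypothetical, and the sketch assumes the contig name is the header text up to the first whitespace.

```python
from typing import Iterable, Iterator, Set


def drop_fasta_records(fasta_lines: Iterable[str], drop_names: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose name is in drop_names."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # Record name = header text up to the first whitespace
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name not in drop_names
        if keep:
            yield line


# Toy assembly with three records; ctg2 plays the contaminated contig
fasta = [">ctg1 len=8", "ACGTACGT", ">ctg2", "GGGG", "CCCC", ">ctg3", "TTTT"]
drop_list = {"ctg2"}

cleaned = list(drop_fasta_records(fasta, drop_list))
print("\n".join(cleaned))
```

Working line by line keeps memory use flat, which matters for assembly-sized FASTA files.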

In [8]:
! mkdir drop_contigs
%cd drop_contigs

mkdir: cannot create directory ‘drop_contigs’: File exists
/home/jupyter-user/notebooks/HPRC_Reassembly/edit/drop_contigs


In [9]:
contam_haplotypes = []

for index, row in decont_results_df.iterrows():
    
    sample_id = row.name
    
    ## Get path to decontamination results
    mat_decont_results_fp = row['mat_contam_results']
    pat_decont_results_fp = row['pat_contam_results']
    
    ## Extract file names
    mat_decont_results_fn = os.path.basename(mat_decont_results_fp)
    pat_decont_results_fn = os.path.basename(pat_decont_results_fp)
    
    ## Copy files to VM
    ! gsutil cp {mat_decont_results_fp} .
    ! gsutil cp {pat_decont_results_fp} .
    
    
    ## WRITE CONTAMINATED CONTIGS (AS LIST) TO FILE
    
    ## output file names
    mat_drop_list_fn = sample_id + ".mat_drop_contigs.txt"
    pat_drop_list_fn = sample_id + ".pat_drop_contigs.txt"    
    
    ## Extract results and write files
    write_contam_contigs(mat_decont_results_fn, mat_drop_list_fn)
    write_contam_contigs(pat_decont_results_fn, pat_drop_list_fn)
    
    ## upload to bucket
    ! gsutil cp {mat_drop_list_fn} {bucket}contigs_to_drop/{mat_drop_list_fn}
    ! gsutil cp {pat_drop_list_fn} {bucket}contigs_to_drop/{pat_drop_list_fn}  
    
    
    ## WRITE MITO CONTIGS (AS LIST) TO FILE
    
    ## output file names
    mat_mito_list_fn = sample_id + ".mat_mito_contigs.txt"
    pat_mito_list_fn = sample_id + ".pat_mito_contigs.txt"    
    
    ## Extract results and write files
    write_mito_contigs(mat_decont_results_fn, mat_mito_list_fn)
    write_mito_contigs(pat_decont_results_fn, pat_mito_list_fn)
    
    ## upload to bucket
    ! gsutil cp {mat_mito_list_fn} {bucket}mito_work/contig_list/{mat_mito_list_fn}
    ! gsutil cp {pat_mito_list_fn} {bucket}mito_work/contig_list/{pat_mito_list_fn}
    
    
    ## ADD TO LIST OF CONTAMINATED CONTIGS (TO CHECK BY EYE)
    if (found_contam_contigs(mat_decont_results_fn)):
        hap_name = sample_id + "-mat"
        contam_haplotypes.append(hap_name)
        
    if (found_contam_contigs(pat_decont_results_fn)):
        hap_name = sample_id + "-pat"
        contam_haplotypes.append(hap_name)

Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/HG002_downsampled.mat.contamination.short...
/ [1 files][  864.0 B/  864.0 B]                                                
Operation completed over 1 objects/864.0 B.                                      
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/HG002_downsampled.pat.contamination.short...
/ [1 files][499.0 KiB/499.0 KiB]                                                
Operation completed over 1 objects/499.0 KiB.                                    
Copying file://HG002_downsampled.mat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][  240.0 B/  240.0 B]                                                
Operation completed over 1 objects/240.0 B.                                      
Copying file://HG002_downsampled.pat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][   60.0 B/   60.0 B]                                                
Operation complete

/ [1 files][  144.0 B/  144.0 B]                                                
Operation completed over 1 objects/144.0 B.                                      
Copying file://HG00673.pat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][   12.0 B/   12.0 B]                                                
Operation completed over 1 objects/12.0 B.                                       
Copying file://HG00673.mat_mito_contigs.txt [Content-Type=text/plain]...
/ [1 files][  144.0 B/  144.0 B]                                                
Operation completed over 1 objects/144.0 B.                                      
Copying file://HG00673.pat_mito_contigs.txt [Content-Type=text/plain]...
/ [1 files][   12.0 B/   12.0 B]                                                
Operation completed over 1 objects/12.0 B.                                       
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/HG00733.mat.contamination.short...
/ [1 files][111.

Copying file://HG01358.mat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][  408.0 B/  408.0 B]                                                
Operation completed over 1 objects/408.0 B.                                      
Copying file://HG01358.pat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              
Copying file://HG01358.mat_mito_contigs.txt [Content-Type=text/plain]...
/ [1 files][  408.0 B/  408.0 B]                                                
Operation completed over 1 objects/408.0 B.                                      
Copying file://HG01358.pat_mito_contigs.txt [Content-Type=text/plain]...
/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/ker

/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/HG02055.mat.contamination.short...
/ [1 files][  999.0 B/  999.0 B]                                                
Operation completed over 1 objects/999.0 B.                                      
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/HG02055.pat.contamination.short...
/ [1 files][  127.0 B/  127.0 B]                                                
Operation completed over 1 objects/127.0 B.                                      
Copying file://HG02055.mat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][  288.0 B/  288.0 B]                                                
Operation completed over 1 objects/288.0 B.                                      
Copying file://HG02055.pat_drop_contigs.txt [Cont

Copying file://HG02257.mat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][  156.0 B/  156.0 B]                                                
Operation completed over 1 objects/156.0 B.                                      
Copying file://HG02257.pat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              
Copying file://HG02257.mat_mito_contigs.txt [Content-Type=text/plain]...
/ [1 files][  156.0 B/  156.0 B]                                                
Operation completed over 1 objects/156.0 B.                                      
Copying file://HG02257.pat_mito_contigs.txt [Content-Type=text/plain]...
/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/ker

/ [1 files][   12.0 B/   12.0 B]                                                
Operation completed over 1 objects/12.0 B.                                       
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/HG02717.mat.contamination.short...
/ [1 files][  1.5 KiB/  1.5 KiB]                                                
Operation completed over 1 objects/1.5 KiB.                                      
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/HG02717.pat.contamination.short...
/ [1 files][   94.0 B/   94.0 B]                                                
Operation completed over 1 objects/94.0 B.                                       
Copying file://HG02717.mat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][  468.0 B/  468.0 B]                                                
Operation completed over 1 objects/468.0 B.                                      
Copying file://HG02717.pat_drop_contigs.txt [Cont

Copying file://HG03453.mat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][  204.0 B/  204.0 B]                                                
Operation completed over 1 objects/204.0 B.                                      
Copying file://HG03453.pat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              
Copying file://HG03453.mat_mito_contigs.txt [Content-Type=text/plain]...
/ [1 files][  204.0 B/  204.0 B]                                                
Operation completed over 1 objects/204.0 B.                                      
Copying file://HG03453.pat_mito_contigs.txt [Content-Type=text/plain]...
/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/ker

/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/NA18906.mat.contamination.short...
/ [1 files][  127.0 B/  127.0 B]                                                
Operation completed over 1 objects/127.0 B.                                      
Copying gs://fc-0c2122a8-6725-4199-b90e-828ab006078f/kerstin_decontam_results/NA18906.pat.contamination.short...
/ [1 files][  128.0 B/  128.0 B]                                                
Operation completed over 1 objects/128.0 B.                                      
Copying file://NA18906.mat_drop_contigs.txt [Content-Type=text/plain]...
/ [1 files][   12.0 B/   12.0 B]                                                
Operation completed over 1 objects/12.0 B.                                       
Copying file://NA18906.pat_drop_contigs.txt [Cont

In [10]:
## Add links to contigs that we will drop (used by dropFastaContigs workflow)
decont_results_df['mat_drop_contigs'] = f"{bucket}contigs_to_drop/" + decont_results_df.index + ".mat_drop_contigs.txt"
decont_results_df['pat_drop_contigs'] = f"{bucket}contigs_to_drop/" + decont_results_df.index + ".pat_drop_contigs.txt"

## Add links to mito contigs that we will use for MT assembly (will map against reference with minimap2 next)
decont_results_df['mat_mito_contig_ls'] = f"{bucket}mito_work/contig_list/" + decont_results_df.index + ".mat_mito_contigs.txt"
decont_results_df['pat_mito_contig_ls'] = f"{bucket}mito_work/contig_list/" + decont_results_df.index + ".pat_mito_contigs.txt"

# Write Regions With Contamination

**Write BED files of the contaminated regions for every sample that has contamination. We will use these files to check whether the contigs we are dropping consist entirely of contamination. If they do not, we might be throwing away good sequence.**

*The checking will be done on a local machine.*
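
One way such a check could be done is to compute, for each contig, the fraction of its length covered by the union of its flagged intervals. The sketch below is only an illustration of that idea, not the procedure actually used locally; `contam_fraction`, the contig names, and the contig lengths are all hypothetical, and the lengths would have to come from elsewhere (for example a FASTA index).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def contam_fraction(
    bed_rows: Iterable[Tuple[str, int, int]],
    contig_lengths: Dict[str, int],
) -> Dict[str, float]:
    """Fraction of each contig covered by the union of its BED intervals."""
    by_contig: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for contig, start, stop in bed_rows:
        by_contig[contig].append((start, stop))

    fractions = {}
    for contig, intervals in by_contig.items():
        covered = 0
        cur_start, cur_stop = None, None
        for start, stop in sorted(intervals):
            if cur_stop is None or start > cur_stop:
                # Gap before this interval: close out the previous merged run
                if cur_stop is not None:
                    covered += cur_stop - cur_start
                cur_start, cur_stop = start, stop
            else:
                # Overlapping or adjacent interval: extend the current run
                cur_stop = max(cur_stop, stop)
        covered += cur_stop - cur_start
        fractions[contig] = covered / contig_lengths[contig]
    return fractions


# Hypothetical intervals and lengths
bed_rows = [("ctgA", 0, 500), ("ctgA", 400, 1000), ("ctgB", 100, 200)]
lengths = {"ctgA": 1000, "ctgB": 10000}

for contig, frac in contam_fraction(bed_rows, lengths).items():
    print(f"{contig}\t{frac:.2%}")
```

A contig such as the hypothetical `ctgB`, where only a small fraction is flagged, would be a candidate for closer inspection before it is dropped whole.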

## Show Which Assemblies We Need Beds For

In [11]:
print(contam_haplotypes)

['HG002_downsampled-pat', 'HG002_full_v0.14-mat', 'HG002_full_v0.14-pat', 'HG00733-mat', 'HG00733-pat']


## HG002 Downsampled

In [12]:
sample_id = 'HG002_downsampled'

is_sample_row = decont_results_df.index == sample_id
row           = decont_results_df.loc[is_sample_row]

pat_decont_results_fp = row['pat_contam_results'].iloc[0]
    
pat_decont_results_fn = os.path.basename(pat_decont_results_fp)

pat_bed_fn = f"{sample_id}.pat.contam.bed"

write_contam_bed(pat_decont_results_fn, pat_bed_fn)

! gsutil cp {pat_bed_fn} {bucket}kerstin_decontam_results/bed_files/{pat_bed_fn}

Copying file://HG002_downsampled.pat.contam.bed [Content-Type=application/octet-stream]...
/ [1 files][105.3 KiB/105.3 KiB]                                                
Operation completed over 1 objects/105.3 KiB.                                    


## HG002 Full

In [13]:
sample_id = 'HG002_full_v0.14'

is_sample_row = decont_results_df.index == sample_id
row           = decont_results_df.loc[is_sample_row]

mat_decont_results_fp = row['mat_contam_results'].iloc[0]
pat_decont_results_fp = row['pat_contam_results'].iloc[0]
    
mat_decont_results_fn = os.path.basename(mat_decont_results_fp)
pat_decont_results_fn = os.path.basename(pat_decont_results_fp)

mat_bed_fn = f"{sample_id}.mat.contam.bed"
pat_bed_fn = f"{sample_id}.pat.contam.bed"

write_contam_bed(mat_decont_results_fn, mat_bed_fn)
write_contam_bed(pat_decont_results_fn, pat_bed_fn)

! gsutil cp {mat_bed_fn} {bucket}kerstin_decontam_results/bed_files/{mat_bed_fn}
! gsutil cp {pat_bed_fn} {bucket}kerstin_decontam_results/bed_files/{pat_bed_fn}

Copying file://HG002_full_v0.14.mat.contam.bed [Content-Type=application/octet-stream]...
/ [1 files][ 83.6 KiB/ 83.6 KiB]                                                
Operation completed over 1 objects/83.6 KiB.                                     
Copying file://HG002_full_v0.14.pat.contam.bed [Content-Type=application/octet-stream]...
/ [1 files][895.2 KiB/895.2 KiB]                                                
Operation completed over 1 objects/895.2 KiB.                                    


## HG00733

In [14]:
sample_id = 'HG00733'

is_sample_row = decont_results_df.index == sample_id
row           = decont_results_df.loc[is_sample_row]

mat_decont_results_fp = row['mat_contam_results'].iloc[0]
pat_decont_results_fp = row['pat_contam_results'].iloc[0]
    
mat_decont_results_fn = os.path.basename(mat_decont_results_fp)
pat_decont_results_fn = os.path.basename(pat_decont_results_fp)

mat_bed_fn = f"{sample_id}.mat.contam.bed"
pat_bed_fn = f"{sample_id}.pat.contam.bed"

write_contam_bed(mat_decont_results_fn, mat_bed_fn)
write_contam_bed(pat_decont_results_fn, pat_bed_fn)

! gsutil cp {mat_bed_fn} {bucket}kerstin_decontam_results/bed_files/{mat_bed_fn}
! gsutil cp {pat_bed_fn} {bucket}kerstin_decontam_results/bed_files/{pat_bed_fn}

Copying file://HG00733.mat.contam.bed [Content-Type=application/octet-stream]...
/ [1 files][ 24.0 KiB/ 24.0 KiB]                                                
Operation completed over 1 objects/24.0 KiB.                                     
Copying file://HG00733.pat.contam.bed [Content-Type=application/octet-stream]...
/ [1 files][ 24.0 KiB/ 24.0 KiB]                                                
Operation completed over 1 objects/24.0 KiB.                                     


# Create Clean_Sample Data Table

In [21]:
upload_df = decont_results_df.copy()
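## Name the index after the new table's ID column. (rename(index={'1': ...}) only
## relabels a row called '1', which this table does not have, so it was a no-op.)
upload_df = upload_df.rename_axis('clean_sample_id')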

## Don't need adapter info going forward
upload_df.drop(columns = ["mat_adapter_bed", "mat_adapter_paf", "pat_adapter_bed", "pat_adapter_paf"],
               inplace = True)

In [22]:
tp.dataframe_to_table("clean_sample", upload_df)