# Filtering variants

This is a [Jupyter notebook](https://jupyter.org/) containing a guide for filtering variant call (vcf) files that have been annotated using [SNPEff](https://pcingola.github.io/SnpEff/).

This workflow was developed on output from the nfcore/sarek pipeline (https://nf-co.re/sarek) which is a variant calling and annotation pipeline built in the [Nextflow](https://www.nextflow.io/) workflow manager system. Sarek includes SNPEff annotation as part of the workflow. However, any SNPEff annotated vcf file can be used as input for this workflow, not just those generated by nfcore/sarek.

This workflow was prepared by the [eResearch Office, QUT.](https://qutvirtual4.qut.edu.au/group/staff/governance/organisational-structure/academic-division/research-portfolio/research-infrastructure/eresearch)

**********************************

# Contents
[1. How to use this notebook](#overview)

[2. Requirements](#require)

[3. Installing SNPEff](#install)

[4. Filtration principles](#filt)

[5. Filter by impact](#impact)

[6. Filter with custom parameters](#custom)

[7. Look for overlaps](#overlap)

[8. Merging vcfs](#merge)

[9. Output as table](#table)

[10. Removing common SNPS](#common)

[11. Comparing groups](#groups)

[12. Download your results](#download)


***************************

## How to use this notebook <a class="anchor" id="overview"></a>

Jupyter Notebooks run a 'kernel' that allows code to be run in code 'cells' in the Notebook. This Notebook is running the BASH kernel, which allows commands to be run on QUT's high performance compute cluster (HPC).

You can run a code cell by clicking on the cell itself and clicking the run button (at the top of this Notebook), or by pressing shift+enter.

<div class="alert alert-block alert-warning">
As an example, run the following code cell to list the contents of your HPC home directory.
</div>

In [None]:
ls $HOME

Before every code cell is a colour-coded text box that tells you what the code cell will do. 

<div class="alert alert-block alert-success">
A green text box indicates a code cell that must be run, without alteration, to complete the workflow.
</div>

<div class="alert alert-block alert-warning">
A yellow text box indicates an optional code cell that doesn't have to be run to complete the workflow, but can be run to complete optional tasks.
</div>

<div class="alert alert-block alert-info">
A blue text box indicates a code cell that requires user input - it must be run to complete the workflow and the user needs to modify the command in the cell. This is required for user-specific or project-specific operations, such as moving to a directory that contains the project data files.
</div>

In this guide you will find instructions and code cells that filter vcf files annotated by SNPEff.

*******************************

## Requirements <a class="anchor" id="require"></a>

You will need a QUT HPC account. If you are seeing this Notebook, you most likely already have a HPC account. Regardless, you can request an account be created, or request any other HPC or bioinformatics support, via the portal here: https://eresearchqut.atlassian.net/servicedesk/customer/portals

You will also need to have all the annotated vcf files in a directory on the HPC. Sarek outputs these files as '..samplename..'snpEff.ann.vcf.gz. Depending on how you ran sarek or how your data is organised, these may be in separate directories based on treatment group or individual samples. 

It is recommended to copy all these files to a single directory. As directory and file structure varies per project, it's not possible to include the code to achieve this in this Notebook and thus the user must do this manually prior to running this workflow.

A guide to copying and moving data via the Linux command line is [here](https://www.thegeekdiary.com/linux-command-line-basics-for-beginners-copy-and-move-files-and-directories-cp-and-mv-commands/), or you can contact the HPC staff to locate/move your data files by submitting a request ticket [here](https://eresearchqut.atlassian.net/servicedesk/customer/portals).

<div class="alert alert-block alert-info">
Enter the directory that contains your SNPEff annotated vcf files. You will need to change the directory path to where you moved your '..samplename..'snpEff.ann.vcf.gz files to. You can find this path by typing 'pwd' on the command-line when you are in that directory, or by contacting the HPC staff via the portal (above link). The structure of the below command should be `root_path=/directory/containing/my/vcf/files`.
</div>

In [None]:
root_path=/work/liver/nextflow/sarek/individual/sarek_VCFs_annotation/All_samples

<div class="alert alert-block alert-success">
Now move to the above directory (cd = change directory): 
</div>

In [None]:
cd $root_path

**NOTE: the above two code cells must be run every time you use this Notebook.**

<div class="alert alert-block alert-warning">
To see if you are in the correct directory, run the 'ls' code cell below. You should see a list of all your SNPEff annotated vcf files. If you don't see the files, you've entered the above location incorrectly and need to correct and re-run the above code cell.
</div>

In [None]:
ls
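
If the listing is long, counting the annotated files can be easier than reading it. The following is a minimal sketch, not part of the original workflow: `demo_dir` and the two empty placeholder files are hypothetical stand-ins so the cell runs anywhere. In practice you would set `check_dir` to `$root_path`.

```shell
# Hypothetical demo directory with two empty placeholder files (not real vcf files).
demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA_snpEff.ann.vcf.gz" "$demo_dir/sampleB_snpEff.ann.vcf.gz"

# In real use, set check_dir="$root_path" instead.
check_dir="$demo_dir"

# Count files whose names end in snpEff.ann.vcf.gz (the sarek naming pattern).
vcf_count=$(find "$check_dir" -maxdepth 1 -type f -name '*snpEff.ann.vcf.gz' | wc -l)
vcf_count=$((vcf_count))   # strip any padding that wc adds on some systems

if [ "$vcf_count" -eq 0 ]; then
    echo "No annotated vcf files found in $check_dir - check root_path"
else
    echo "Found $vcf_count annotated vcf files in $check_dir"
fi
```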

****************************

## Installing SNPEff <a class="anchor" id="install"></a>

SNPEff needs to be installed first to run the various SNPEff command-line filtration options.

**This section only needs to be run once per project. If you've already run the code in this section, there is no need to run it again.**

<div class="alert alert-block alert-success">
Create a new directory called 'snpeff' and download the latest version of SNPEff to that directory, then unzip the file.
</div>

In [None]:
mkdir snpeff
cd snpeff
wget https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
unzip -q snpEff_latest_core.zip
cd $root_path

<div class="alert alert-block alert-success">
SNPEff is based on Java, so the Java module will need to be loaded before running SNPEff.
</div>

In [None]:
module load java

<div class="alert alert-block alert-success">
Check the version of SNPEff, to make sure it has been installed correctly
</div>

In [None]:
java -jar ./snpeff/snpEff/snpEff.jar -version

********************

## Filtration principles <a class="anchor" id="filt"></a>

<div class="alert alert-block alert-warning">
If you look in the installed SNPEff directory (you can run the code cell below to do this), you'll see the two main tools: snpEff.jar and SnpSift.jar.
    </div>

In [None]:
ls ./snpeff/snpEff/

snpEff.jar
> is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid change)

SnpSift.jar
>is a toolbox that allows you to filter and manipulate annotated files.

snpEff has already been run on your samples, as part of the nfcore/sarek workflow, creating the '..samplename..'snpEff.ann.vcf.gz files we're working with here. 

In this Notebook we will be using SnpSift.jar to filter your samples by certain parameters, such as high impact variants.

SnpSift.jar is highly flexible and can be used to filter by quality, read depth, genotype and any of the annotation fields generated by snpEff.jar (such as frameshift mutations, synonymous, missense, splice regions, intron, exon, etc, etc).

A list and explanation of the filtration options can be seen here:

https://pcingola.github.io/SnpEff/ss_filter/

And a list of the possible annotation fields can be seen here:

https://pcingola.github.io/SnpEff/adds/VCFannotationformat_v1.0.pdf

<div class="alert alert-block alert-warning">
By following this guide, you have installed SnpSift.jar in ./snpeff/snpEff/SnpSift.jar and can run any of the examples in the link above by adding './snpeff/snpEff/' before 'SnpSift.jar' in any command.

For example, if you wanted to run the first example in https://pcingola.github.io/SnpEff/ss_filter/ (keep only variants with a quality score of 30 or higher), you could run the following (change 'variants.vcf' to one of your variant files and 'filtered.vcf' to a suitable name for the output file):
</div>

In [None]:
cat variants.vcf | java -jar ./snpeff/snpEff/SnpSift.jar filter " ( QUAL >= 30 )" > filtered.vcf
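
To see what this kind of filter does without running SnpSift, the sketch below applies the same `QUAL >= 30` rule to a tiny made-up vcf using `awk`. It is an illustration only: the three records are invented, and `awk` merely stands in for SnpSift, which understands the full vcf format and much richer expressions.

```shell
demo_vcf=$(mktemp)
# Minimal, invented vcf: header lines start with '#', QUAL is column 6.
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo_vcf"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n' >> "$demo_vcf"
printf 'chr1\t200\t.\tC\tT\t12\tPASS\t.\n' >> "$demo_vcf"
printf 'chr2\t300\t.\tG\tA\t30\tPASS\t.\n' >> "$demo_vcf"

# Keep every header line, plus records whose QUAL is at least 30.
filtered=$(awk -F'\t' '/^#/ || $6 >= 30' "$demo_vcf")
echo "$filtered"

# Count the records (non-header lines) that survived the filter.
kept=$(echo "$filtered" | grep -vc '^#')
echo "Records kept: $kept"
```

The record with a quality of 12 is dropped, while the header lines and the two records at or above 30 are kept.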

SnpSift.jar can do more than filter variants: it can extract specific fields from your annotated vcf file, join by genomic regions, split by chromosome, etc. See all the SnpSift.jar commands here: https://pcingola.github.io/SnpEff/ss_introduction/

***************************

## Filter by impact <a class="anchor" id="impact"></a>

SNPEff annotates variants by HIGH, MODERATE, LOW or MODIFIER impact.

https://pcingola.github.io/SnpEff/se_inputoutput/#impact-prediction

>HIGH	The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay.

>MODERATE	A non-disruptive variant that might change protein effectiveness.

>LOW	Assumed to be mostly harmless or unlikely to change protein behavior.

>MODIFIER	Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact.

A list of all the effects (e.g. deletions, duplications, frame shift, etc) can be seen here, including their impact category: https://pcingola.github.io/SnpEff/se_inputoutput/#effect-prediction-details
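
The impact category is stored in the `ANN` entry of the vcf INFO column, as the third `|`-separated subfield of each annotation. As a rough sketch of how that looks in practice, the cell below tallies impact categories in a small invented vcf. The records and gene names are made up, and the annotations are truncated to four subfields for readability (real SNPEff annotations carry more).

```shell
demo_vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo_vcf"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tANN=G|stop_gained|HIGH|GENE1\n' >> "$demo_vcf"
printf 'chr1\t200\t.\tC\tT\t40\tPASS\tANN=T|missense_variant|MODERATE|GENE2,T|intron_variant|MODIFIER|GENE3\n' >> "$demo_vcf"
printf 'chr2\t300\t.\tG\tA\t60\tPASS\tANN=A|synonymous_variant|LOW|GENE4\n' >> "$demo_vcf"

# For each record: find the ANN entry in INFO (column 8), split it into
# annotations on ',', then take subfield 3 (the impact) of each annotation.
impact_counts=$(awk -F'\t' '
    /^#/ { next }
    {
        n = split($8, info, ";")
        for (i = 1; i <= n; i++) {
            if (info[i] ~ /^ANN=/) {
                sub(/^ANN=/, "", info[i])
                m = split(info[i], anns, ",")
                for (j = 1; j <= m; j++) {
                    split(anns[j], parts, "|")
                    count[parts[3]]++
                }
            }
        }
    }
    END { for (k in count) print k, count[k] }
' "$demo_vcf" | sort)
echo "$impact_counts"
```

Note that this counts annotations, not variants: the second record carries two annotations and so contributes to both MODERATE and MODIFIER.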

<div class="alert alert-block alert-success">
Create a main directory to store filtered results from each sample group, which will be in their own sample group subdirectories:
</div>

In [None]:
cd $root_path
mkdir Filtered

You can filter out variants by impact (e.g. only keep high impact variants).

<div class="alert alert-block alert-success">
Run one or more of the three following code cells. The first cell keeps only HIGH impact variants, the second only MODERATE and the third only LOW. You may only be interested in HIGH impact variants, in which case just run the first cell. Note: this will not alter the original vcf files. New vcf files will be created in the 'filtered_high', 'filtered_moderate' or 'filtered_low' subdirectories of 'Filtered'. These commands may take several minutes to run.
</div>

In [None]:
mkdir -p Filtered/filtered_high
for f1 in *.vcf*
do
echo filtering $f1
# SnpSift writes uncompressed vcf, so drop any '.gz' suffix from the output name
java -jar ./snpeff/snpEff/SnpSift.jar filter "ANN[*].IMPACT has 'HIGH'" "$f1" > "Filtered/filtered_high/high_${f1%.gz}"
done

In [None]:
mkdir -p Filtered/filtered_moderate
for f1 in *.vcf*
do
echo filtering $f1
# SnpSift writes uncompressed vcf, so drop any '.gz' suffix from the output name
java -jar ./snpeff/snpEff/SnpSift.jar filter "ANN[*].IMPACT has 'MODERATE'" "$f1" > "Filtered/filtered_moderate/moderate_${f1%.gz}"
done

In [None]:
mkdir -p Filtered/filtered_low
for f1 in *.vcf*
do
echo filtering $f1
# SnpSift writes uncompressed vcf, so drop any '.gz' suffix from the output name
java -jar ./snpeff/snpEff/SnpSift.jar filter "ANN[*].IMPACT has 'LOW'" "$f1" > "Filtered/filtered_low/low_${f1%.gz}"
done

<div class="alert alert-block alert-warning">
Check that the filtered vcf files were created by listing the files in the 'filtered_high' directory. You can also check the MODERATE or LOW results by changing 'filtered_high' to 'filtered_moderate' or 'filtered_low' in the code cell below.
</div>

In [None]:
ls Filtered/filtered_high
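
Beyond checking that the files exist, it can help to see how many variants each one contains. The following is a sketch under stated assumptions: `demo_dir` and its two tiny files are invented stand-ins for 'Filtered/filtered_high', and the files are assumed to be uncompressed vcf (header lines start with '#').

```shell
# Hypothetical stand-in for Filtered/filtered_high, with two tiny invented files.
demo_dir=$(mktemp -d)
printf '#CHROM\tPOS\nchr1\t100\nchr1\t200\n' > "$demo_dir/high_sampleA.vcf"
printf '#CHROM\tPOS\nchr2\t300\n' > "$demo_dir/high_sampleB.vcf"

# In real use, loop over Filtered/filtered_high/* instead of "$demo_dir"/*.vcf.
summary=$(
    for f1 in "$demo_dir"/*.vcf
    do
        # grep -vc '^#' counts the lines that are not header lines.
        n=$(grep -vc '^#' "$f1")
        echo "$(basename "$f1"): $n variants"
    done
)
echo "$summary"
```

A file that reports 0 variants usually means nothing passed the filter for that sample, which is worth knowing before looking for overlaps.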

Now you can look for filtered variants that are shared between sample files, by running the output files from this section through the 'Look for overlaps' section below.

***************************

## Filter with custom parameters <a class="anchor" id="custom"></a>

In the previous section we filtered by HIGH, MODERATE and LOW impact. SnpSift can filter by a multitude of other parameters or fields, and filters can be combined in a variety of ways. Because this is highly configurable, the code below is only an example. It's up to you to replace it with a filtration structure of your choice. See here for filtration options:

https://pcingola.github.io/SnpEff/ss_filter/

<div class="alert alert-block alert-warning">
Enter your own custom filtration below. By default it keeps indels with a quality score of at least 20, plus any variant with a quality score of at least 30. Substitute this with your own parameters.
    
Change the first line (`filtdir=...`) to a directory name of your choosing.
Change the filtration parameters (`"(( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )"`) to parameters of your choice.
</div>

In [None]:
filtdir=Indel_20
cd $root_path
mkdir Filtered/$filtdir
for f1 in *.vcf*
do
echo filtering $f1
java -jar ./snpeff/snpEff/SnpSift.jar filter "(( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )" "$f1" > "Filtered/$filtdir/${filtdir}_${f1%.gz}"
done
cd $root_path

This will run your chosen filtration parameters on all your samples (the `for .. do .. done` loop) and output them to a subdirectory of the main 'Filtered' directory, named by the `filtdir=...` variable.
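
One bash detail matters when building output names from a variable and a prefix: a variable name followed directly by an underscore must be wrapped in braces, otherwise bash reads the underscore as part of the name. The short demonstration below uses invented values (`Indel_20`, `sampleA.vcf`) purely for illustration.

```shell
filtdir=Indel_20
f1=sampleA.vcf

# Without braces, bash looks for a variable called 'filtdir_' (which is unset),
# so the prefix silently disappears.
without_braces="$filtdir_$f1"

# With braces, the variable name ends where intended.
with_braces="${filtdir}_$f1"

echo "without braces: $without_braces"
echo "with braces:    $with_braces"
```

If you adapt the custom filter cell, writing `${filtdir}_` avoids output files that are missing their prefix.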

Now you can look for filtered variants that are shared between sample files, by running the output files from this section through the 'Look for overlaps' section below.

*******************

## Look for overlaps <a class="anchor" id="overlap"></a>

Usually you will have multiple samples per treatment group, which means you'll want to find common variants across multiple samples. This is done by creating intersections between two or more vcf files.

<div class="alert alert-block alert-success">
We'll be using bcftools to look for intersects. bcftools is already installed on the HPC as a module. Load the bcftools module:
</div>

In [None]:
module load bcftools

<div class="alert alert-block alert-success">
Create a main directory to store each sample group intersect results, which will be in their own sample group subdirectories:
</div>

In [None]:
cd $root_path
mkdir Intersect

<div class="alert alert-block alert-info">
Select a target treatment group.
In the code cell below, enter a character string that is unique to the target treatment group, i.e. one that appears in the name of every sample in that group and in no other sample names (change `group=...` to the target group name from your sample names).
</div>

In [None]:
group=ALD_EtOH

<div class="alert alert-block alert-warning">
If you can't remember your sample names, run 'ls' to see them:
</div>

In [None]:
ls

<div class="alert alert-block alert-success">
We're looking for intersects on the filtered results, so you'll need to look for intersects on each of the 'Filtered' subdirectories.
Check the names of your 'Filtered' subdirectories.
</div>

In [None]:
ls Filtered

<div class="alert alert-block alert-info">
Enter one of these subdirectory names below (`filtdir=...`).
</div>

In [None]:
filtdir=filtered_highHOMCOV

<div class="alert alert-block alert-warning">
You can also check that you are only pulling out the correct samples. The samples you will be intersecting are: 
</div>

In [None]:
ls Filtered/$filtdir/*$group*

<div class="alert alert-block alert-success">
Bcftools requires that the vcf file is compressed and indexed. First we'll compress the vcf files:
</div>

In [None]:
mkdir Intersect/$group"_"$filtdir"_intersect"
cd Filtered/$filtdir
for f1 in *$group*
do
echo compressing $f1
bgzip -c $f1 > $root_path/Intersect/$group"_"$filtdir"_intersect"/$f1".gz"
done
cd $root_path

<div class="alert alert-block alert-success">
Then we'll index the vcf files:
</div>

In [None]:
cd Intersect/$group"_"$filtdir"_intersect"
for f1 in *$group*".gz"
do
echo indexing $f1
bcftools index $f1
done
cd $root_path

<div class="alert alert-block alert-warning">
Now we can check for intersections between these samples.
First, find the number of samples in your treatment group. We'll use this number to calculate the number of overlaps we want to find: 
</div>

In [None]:
ls Intersect/$group"_"$filtdir"_intersect"/*$group*".gz" | wc -l

<div class="alert alert-block alert-info">
Then run `bcftools isec` to find intersects.
    
The '-n+7' option says to output variants that intersect in 7 or more of your samples. You should adjust this to a number suitable to the total number of samples in the treatment group. For example, if you have n=20, setting -n+10 would output variants that are in 50% of your samples. -n+20 would only output variants that are in all samples, -n+2 in 2 or more samples, etc.
</div>

In [None]:
cd Intersect/$group"_"$filtdir"_intersect"
bcftools isec -n+7 -c all *$group*".gz" > $group"_sharedPositions.txt"
cd $root_path
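
If you would rather think in percentages than in sample counts, the value for `-n+` can be computed from the group size. This is a small sketch with hypothetical numbers (`n_samples=20`, `min_percent=50`). In practice `n_samples` would be the count reported by the `ls ... | wc -l` cell above.

```shell
# Hypothetical values: 20 samples in the group, variants wanted in at least 50% of them.
n_samples=20
min_percent=50

# Integer ceiling of n_samples * min_percent / 100,
# so that e.g. 7 samples at 50% gives 4 rather than 3.
min_shared=$(( (n_samples * min_percent + 99) / 100 ))

echo "Use: bcftools isec -n+${min_shared}"
```

Rounding up is a design choice here: it guarantees the variant is present in at least the requested fraction of samples, never slightly fewer.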

Now that you have a table of intersecting variants, you need to pull out just these intersects from your vcf files. For this we use bcftools view:

In [None]:
cd $root_path
cd Intersect/$group"_"$filtdir"_intersect"
for f1 in *$group*".gz"
do
echo filtering $f1
bcftools view -T $group"_sharedPositions.txt" $f1 -Oz > "sharedPositions_"$f1
done
cd $root_path

These filtered vcfs need to be re-indexed:

In [None]:
cd Intersect/$group"_"$filtdir"_intersect"
for f1 in sharedPositions*.vcf.gz
do
echo indexing $f1
bcftools index $f1
done
cd $root_path

You can find intersects in other groups by going to the treatment group selection code ('group=..'), changing the group name and then running through the process again. You can do this for each treatment group. 

*******************

## Merging vcfs <a class="anchor" id="merge"></a>

Once you have filtered and/or found intersecting variants between samples, you can merge all treatment group samples into a single vcf file. We'll be using bcftools to accomplish this. If you've started a new session, make sure you run the 'module load bcftools' code cell in the previous section.

<div class="alert alert-block alert-success">
Create a main directory to store merged vcf files for each sample group, which will be in their own sample group subdirectories:
</div>

In [None]:
cd $root_path
mkdir Merged

<div class="alert alert-block alert-warning">
We need to merge the filtered and intersected results, that you generated in the previous sections. These are in subdirectories in the main 'Intersect' directory.
See here for a list of the intersected samples group directories:
</div>

In [None]:
cd $root_path
ls -d Intersect/*/ | cut -f2 -d'/'

<div class="alert alert-block alert-info">
Enter one of these subdirectory names in `intdir=..` below:
</div>

In [None]:
intdir=ALD_EtOH_filtered_high_intersect

<div class="alert alert-block alert-success">
Use bcftools merge to merge the vcf files:
</div>

In [None]:
bcftools merge Intersect/$intdir/sharedPositions*.vcf.gz > Merged/$intdir"_merged.vcf"

You can cycle through each 'Intersect' subdirectory by repeating the above two code cells with each subdirectory name.
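
A quick sanity check after merging is to confirm that the merged file contains the samples you expect. In a vcf, sample names are listed on the `#CHROM` header line from column 10 onward. The sketch below reads them with `awk` from a small invented vcf (the sample names are hypothetical). On a real file, `bcftools query -l` reports the same list.

```shell
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\tsampleC\n' >> "$demo_vcf"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/1\n' >> "$demo_vcf"

# Print every column from the 10th onward on the '#CHROM' line, one name per line.
sample_names=$(awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i }' "$demo_vcf")

# Count the names; the arithmetic expansion strips any padding from wc.
n_merged=$(echo "$sample_names" | wc -l)
n_merged=$((n_merged))

echo "Samples in merged file: $n_merged"
echo "$sample_names"
```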

*************

## Output as table <a class="anchor" id="table"></a>

SnpSift.jar can extract any fields in a vcf, including any annotation fields, outputting this as a table. This is useful for importing your filtered results into another program like R or Excel.

https://pcingola.github.io/SnpEff/ss_extractfields/

For example, if you wanted to output chromosome, position, ID and allele frequency from a vcf file, you could run the following (change 'variants.vcf' to one of your vcf files, and 'table.txt' to a more informative name):

`java -jar ./snpeff/snpEff/SnpSift.jar extractFields variants.vcf CHROM POS ID AF > table.txt`

<div class="alert alert-block alert-success">
We'll be creating a table on the final results from all the previous analysis steps: filtering -> finding overlaps -> merging treatment groups. Thus the tables will be generated from the vcf files in the 'Merged' directory.
First, see what vcf files are in that directory:
</div>

In [None]:
cd $root_path/Merged
ls *.vcf

<div class="alert alert-block alert-success">
Now you can enter the name of one of these vcf files in `tabfile=..`
</div>

In [None]:
tabfile=Healthy_filtered_high_intersect_merged.vcf

<div class="alert alert-block alert-success">
Then run SnpSift to pull out the required fields. **NOTE:** By default these fields are chromosome (CHROM), position (POS), reference allele (REF), alternate allele (ALT), variant ID, allele frequency (AF), gene name ("LOF[*].GENE"), Ensembl gene ID ("LOF[*].GENEID") and functional effect "ANN[*].EFFECT" (there may be multiple effects, where each will be in a separate column). 
You can add or remove fields as you like. A list of fields is here: https://pcingola.github.io/SnpEff/ss_extractfields/
</div>

In [None]:
cd $root_path
java -jar ./snpeff/snpEff/SnpSift.jar extractFields $root_path/Merged/$tabfile CHROM POS REF ALT ID AF "LOF[*].GENE" "LOF[*].GENEID" "ANN[*].EFFECT" > $root_path/Merged/$tabfile"_table.txt"

<div class="alert alert-block alert-warning">
You can view the top 10 rows of your table like so (useful for checking you have the information you need):
</div>

In [None]:
head $root_path/Merged/$tabfile"_table.txt"

<div class="alert alert-block alert-warning">
As an alternative to processing one vcf file at a time, you can generate tables for ALL vcf files in the 'Merged' directory by running the below code cell. As before, you can add or remove fields as you prefer.
</div>

In [None]:
cd $root_path/Merged/
for tabfile in *.vcf
do
echo Creating table for $tabfile
java -jar $root_path/snpeff/snpEff/SnpSift.jar extractFields $root_path/Merged/$tabfile CHROM POS REF ALT ID AF "LOF[*].GENE" "LOF[*].GENEID" "ANN[*].EFFECT" > $root_path/Merged/$tabfile"_table.txt"
done
cd $root_path
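
Once a table exists, standard command-line tools can summarise it before you move to R or Excel. As an illustrative sketch, the cell below counts variants per chromosome in a small invented table. The table contents are made up, and it is assumed that the first row is a header and the first column is CHROM, as in the tables generated above.

```shell
demo_table=$(mktemp)
printf 'CHROM\tPOS\tREF\tALT\n' > "$demo_table"
printf 'chr1\t100\tA\tG\n' >> "$demo_table"
printf 'chr1\t200\tC\tT\n' >> "$demo_table"
printf 'chr2\t300\tG\tA\n' >> "$demo_table"

# Skip the header row (NR > 1), then tally the values in column 1.
per_chrom=$(awk -F'\t' 'NR > 1 { count[$1]++ } END { for (c in count) print c, count[c] }' "$demo_table" | sort)
echo "$per_chrom"
```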

*************

## Removing common SNPS <a class="anchor" id="common"></a>

Many of the SNPs detected may be commonly found variants, which are unlikely to be associated with a trait or disease. These should be removed from your final table.

[UCSC](https://genome.ucsc.edu/) maintains a database of common SNPs, based on the [NCBI dbSNP database](https://www.ncbi.nlm.nih.gov/snp/).

http://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=varRep&hgta_track=snp146&hgta_table=snp146&hgta_doSchema=describe+table+schema

Common SNPs are defined as:

>SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. (i.e. uniquely mapped variants that appear in at least 1% of the population)

**NOTE: this section will only remove identified (common) dbSNPs. Any variants in your dataset that don't have a dbSNP ID (i.e. any variants without an 'rs.......' identifier) will be retained. These may also be common variants, but this is unlikely to be the case.**

<div class="alert alert-block alert-success">
First, download the table of common dbSNPs from UCSC.
**NOTE: The below code cell does several tasks and may take several minutes to complete. It only needs to be run once though. If you're re-running this section for your other variant tables, skip the below cell.**
</div>

In [None]:
cd $root_path
mkdir dbases
cd dbases
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp146Common.txt.gz
gzip -df snp146Common.txt.gz
awk '{print $5}' snp146Common.txt > snpIDs.txt

<div class="alert alert-block alert-success">
View a list of your tables:
</div>

In [None]:
cd $root_path/Merged
ls *table.txt

<div class="alert alert-block alert-info">
Choose a table from the above list. Enter the table name below (`snptab=<filename>`):
</div>

In [None]:
snptab=ALD_EtOH_filtered_high_intersect_merged.vcf_table.txt

<div class="alert alert-block alert-success">
Find any of the common dbSNP IDs in your table.
NOTE: this may take several minutes, as the common dbSNP database contains 15 million SNPs to match.
</div>

In [None]:
cd $root_path/Merged
grep -Fvwf $root_path/dbases/snpIDs.txt $snptab > ${snptab%%.vcf_table.txt}".table.nocommon.txt"
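
To make the `grep` options concrete, here is a small self-contained demonstration on invented data: `-f` reads the patterns from a file, `-F` treats them as fixed strings, `-w` matches whole words only, and `-v` inverts the match so that non-matching lines are kept. The IDs and table rows below are made up for illustration.

```shell
demo_dir=$(mktemp -d)
# Invented list of 'common' IDs, one per line, in the same layout as snpIDs.txt.
printf 'rs12\nrs999\n' > "$demo_dir/snpIDs.txt"
# Invented variant table: a header plus three rows.
printf 'CHROM\tPOS\tID\n' > "$demo_dir/table.txt"
printf 'chr1\t100\trs12\n' >> "$demo_dir/table.txt"
printf 'chr1\t200\trs123\n' >> "$demo_dir/table.txt"
printf 'chr2\t300\t.\n' >> "$demo_dir/table.txt"

# Keep only the lines that contain none of the listed IDs as a whole word.
nocommon=$(grep -Fvwf "$demo_dir/snpIDs.txt" "$demo_dir/table.txt")
echo "$nocommon"
```

The row with `rs12` is removed, while `rs123` is kept because `-w` stops `rs12` from matching inside a longer ID. The header and the row without a dbSNP ID are also kept.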



*******************

## Comparing groups <a class="anchor" id="groups"></a>

Now that you have a table of variants that have been filtered by function and merged as a group, you can compare the groupwise results to see if there are common or differing variants between groups.

This is particularly useful if you want to compare control vs treatment groups, e.g. healthy vs disease.

<div class="alert alert-block alert-success">
Create a main directory to store groupwise comparison results:
</div>

In [None]:
cd $root_path/Merged
mkdir Groupwise

<div class="alert alert-block alert-success">
We'll be comparing the variant tables we created in the previous 'Output as table' section, so first see the list of available tables for comparison:
</div>

In [None]:
cd $root_path/Merged
ls *.txt

<div class="alert alert-block alert-info">
Now you can select two of these files to look for overlaps. **NOTE: it's important to select meaningful comparisons, i.e. based on the same filtration parameters, e.g. HIGH impact healthy vs HIGH impact disease.**

In the two code cells below, enter first the filename of the control or baseline group (`baseline=<filename>`) and the filename of the group you want to compare to that (`comparison=<filename>`).
    
</div>

In [None]:
baseline=Healthy_filtered_high_intersect_merged.vcf_table.txt

In [None]:
comparison=ALD_EtOH_filtered_high_intersect_merged.vcf_table.txt

<div class="alert alert-block alert-success">
Before we look for overlaps, the files need to be sorted:
</div>

In [None]:
sort $baseline -o $baseline
sort $comparison -o $comparison

<div class="alert alert-block alert-info">
In the below cell, create a meaningful name for the output file (`outfile=<filename>`). We could automatically do this by combining control and comparison filenames, but the filenames are getting a bit long to do that. E.g. name the file based on the filtration method and the comparison groups - if the control group is 'healthy' and the treatment group is 'diseased' and the filtration was based on HIGH impact variants, you might call the file 'HIGH_impact_healthy_vs_diseased.txt'
</div>

In [None]:
outfile=HIGH_impact_healthy_vs_ALD_EtOH.txt

<div class="alert alert-block alert-success">
Now use the `comm` command (https://man7.org/linux/man-pages/man1/comm.1.html) to find overlaps:
</div>

In [None]:
comm -13 $baseline $comparison > ./Groupwise/$outfile

Note: by default `comm` outputs three columns: lines unique to FILE1, lines unique to FILE2 and lines common to both files. For our purposes we'd typically want to output variants that are unique to the treatment group, thus we use the `-13` option (which suppresses column 1, lines unique to FILE1, and column 3, lines that appear in both files). If you wanted to instead find variants that are common to both groups, you would change this to `-12`, or if you wanted to find variants unique to the control group, you'd use `-23`.
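
The three option combinations can be tried on a toy example. The two files below are invented and already sorted, with one made-up variant label per line standing in for a full table row.

```shell
demo_dir=$(mktemp -d)
# Invented, already-sorted variant lists (one variant per line).
printf 'chr1:100\nchr1:200\nchr2:300\n' > "$demo_dir/baseline.txt"
printf 'chr1:200\nchr2:300\nchr3:400\n' > "$demo_dir/comparison.txt"

# -13: suppress columns 1 and 3, leaving lines unique to the second file.
only_comparison=$(comm -13 "$demo_dir/baseline.txt" "$demo_dir/comparison.txt")
# -12: suppress columns 1 and 2, leaving lines common to both files.
in_both=$(comm -12 "$demo_dir/baseline.txt" "$demo_dir/comparison.txt")
# -23: suppress columns 2 and 3, leaving lines unique to the first file.
only_baseline=$(comm -23 "$demo_dir/baseline.txt" "$demo_dir/comparison.txt")

echo "Unique to comparison: $only_comparison"
echo "In both groups:"
echo "$in_both"
echo "Unique to baseline: $only_baseline"
```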

You can repeat this for all groupwise comparisons you'd like to make by entering different groups from the `baseline=..` line onward. Remember to always create a new output filename in `outfile=..` for each comparison, or you will overwrite the previous file you created.

*************

## Download your results <a class="anchor" id="download"></a>

As a final step, you can copy your tables to a directory in your HPC home account, so you can then easily access and download them.

<div class="alert alert-block alert-info">
Enter a directory name below. Make the directory name informative for your project.
</div>

In [None]:
outdir=liver_project_final_results

<div class="alert alert-block alert-success">
Run the below code cell to create a subdirectory with the name you provided above in your home directory.
</div>

In [None]:
mkdir $HOME/$outdir

<div class="alert alert-block alert-success">
Now run the below code cell to copy all of your tables to the directory you just created. This also copies the Groupwise folder, containing all the groupwise comparison results from the previous section.
</div>

In [None]:
cp $root_path/Merged/*.txt $HOME/$outdir
cp -r $root_path/Merged/Groupwise $HOME/$outdir

You can now download these files from the File Browser panel to the left (press 'ctrl + shift + f' if it's not visible). Click on the small folder icon to go to your home directory, then click on the directory name you just created. This will contain all your tables, which you can right-click on to download to your local computer. 
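
If there are many files, bundling the results directory into a single compressed archive makes for one download instead of several. This is an optional sketch, not part of the original workflow: `demo_home`, `demo_results` and the example files are hypothetical stand-ins for `$HOME`, `$outdir` and your real tables.

```shell
# Hypothetical stand-ins for $HOME and $outdir, filled with two invented files.
demo_home=$(mktemp -d)
outdir=demo_results
mkdir -p "$demo_home/$outdir/Groupwise"
printf 'CHROM\tPOS\n' > "$demo_home/$outdir/example_table.txt"
printf 'chr3:400\n' > "$demo_home/$outdir/Groupwise/example_comparison.txt"

# -C changes into the parent directory first, so the archive holds relative paths.
tar -czf "$demo_home/$outdir.tar.gz" -C "$demo_home" "$outdir"

# List the archive contents to confirm what was packed.
archive_listing=$(tar -tzf "$demo_home/$outdir.tar.gz")
echo "$archive_listing"
```

In real use, replacing `"$demo_home"` with `"$HOME"` would create the archive next to your results directory.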