# Nanopore Data Folders

Nanopore sequencing runs have a very specific folder structure. This workbook takes you through the folder structure we usually expect to see and explains the various file types within it.

During the workshop we will demonstrate how to set up a sequencing run in MinKNOW.

When sequencing, the user assigns an Experiment ID, which becomes the name of the top-level folder for the data set. In the example below, this is "ic_131".

In [None]:
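from IPython.display import display, HTML
# Widen the notebook so that long paths and file listings are easier to read
display(HTML("<style>.container { width:100% !important; }</style>"))
# Initialise igv_notebook so the IGV genome browser can be embedded in notebook cells
import igv_notebook
igv_notebook.init()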

In [None]:
!ls ../student_projects_2022/data/ic_131


Listing the contents of this folder shows the sample name. Again, this was entered during the sequencing setup; here it is "Haloferax_clean_RBK004ori".

In [None]:
!ls ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/


The next folder is generated automatically by MinKNOW, and its name records the date and configuration of the run.

In this case it provides the following information:

`DATE_TIME_POSITION_FLOWCELLID_RANDOMID`

When were these data generated? 
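One way to check your answer is to split the folder name in Python. The sketch below assumes the `DATE_TIME_POSITION_FLOWCELLID_RANDOMID` pattern shown above, with `DATE` written as `YYYYMMDD` and `TIME` as a 24-hour `HHMM`. The function name `parse_run_folder` is purely illustrative.

```python
from datetime import datetime


def parse_run_folder(name: str) -> dict:
    """Split a run folder name of the form DATE_TIME_POSITION_FLOWCELLID_RANDOMID."""
    parts = name.split("_")
    if len(parts) < 5:
        raise ValueError(f"unexpected run folder name: {name!r}")
    date, time = parts[0], parts[1]
    # Take the flow cell ID and random ID from the end, so a position name that
    # contained an underscore (an assumption; not the case here) would still parse.
    flowcell_id, random_id = parts[-2], parts[-1]
    position = "_".join(parts[2:-2])
    # Assumes DATE is YYYYMMDD and TIME is a 24-hour HHMM
    started = datetime.strptime(date + time, "%Y%m%d%H%M")
    return {
        "started": started,
        "position": position,
        "flowcell_id": flowcell_id,
        "random_id": random_id,
    }


info = parse_run_folder("20221104_1529_X2_FAV40358_9bd50a0f")
print(info["started"], info["position"], info["flowcell_id"], info["random_id"])
```

Splitting from both ends, rather than assuming exactly five fields, keeps the parser working even if a position name is longer than the `X2` seen here.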

Within this folder you will find the actual sequencing data.

In [None]:
!ls ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/

Here we find a number of directories and files. We will look at each one in turn.

## REPORT FILES

There are one or more report files, available in HTML, JSON and Markdown (md) formats.

The report files provide summary information about the sequencing run. 

To view the html report, click on the link below.

In [None]:
from IPython.display import FileLink
filename="../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/report_FAV40358_20221104_1534_9bd50a0f.html"
FileLink(filename)

In [None]:
!cat ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/report_FAV40358_20221104_1534_9bd50a0f.md
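The JSON version of the report is the easiest to explore programmatically. Its structure is not described here, so this minimal sketch only lists the top-level fields and their types rather than assuming any particular field names. It also assumes the JSON report sits in the run folder alongside the HTML and Markdown versions, with a name starting `report_`; the helper name `top_level_fields` is hypothetical.

```python
import json
from pathlib import Path


def top_level_fields(path) -> dict:
    """Return {key: type name} for the top-level object of a JSON file.

    No particular report schema is assumed, since the fields may differ
    between MinKNOW versions."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in data.items()}


run_dir = Path("../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/"
               "20221104_1529_X2_FAV40358_9bd50a0f")
if run_dir.is_dir():
    # The JSON report's exact file name is not shown above, so match it by pattern
    for report in sorted(run_dir.glob("report_*.json")):
        print(report.name)
        for key, kind in top_level_fields(report).items():
            print(f"  {key}: {kind}")
```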

## Sequencing Summary Files

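These tab-separated files contain one row for every read generated by the sequencer, recording details such as the read ID, channel, start time, read length and mean quality score. They are very useful!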

In [None]:
!head ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/sequencing_summary_FAV40358_9bd50a0f_619283ee.txt
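Because the summary file has one row per read, simple run statistics can be computed from it directly. This sketch uses only the Python standard library and assumes the file contains the columns `sequence_length_template` (read length in bases) and `passes_filtering` (`TRUE`/`FALSE`); check these names against the header printed by `head` above, as they may vary between software versions. The helper names are hypothetical. N50 is computed by the usual definition: the read length L such that reads of length L or longer contain at least half of all bases.

```python
import csv
from pathlib import Path


def n50(lengths):
    """Length L such that reads of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no reads


def summarise_reads(path):
    """Count reads, passing reads and bases, and compute N50 from a summary file."""
    lengths = []
    passed = 0
    with open(path, newline="", encoding="utf-8") as handle:
        # Assumed column names; adjust if your file's header differs
        for row in csv.DictReader(handle, delimiter="\t"):
            lengths.append(int(row["sequence_length_template"]))
            if row["passes_filtering"].strip().upper() == "TRUE":
                passed += 1
    return {"reads": len(lengths), "passed": passed,
            "bases": sum(lengths), "n50": n50(lengths)}


summary_file = Path("../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/"
                    "20221104_1529_X2_FAV40358_9bd50a0f/"
                    "sequencing_summary_FAV40358_9bd50a0f_619283ee.txt")
if summary_file.exists():
    print(summarise_reads(summary_file))
```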

## Other file types:

There are several other file types, including the sample sheet, throughput, final summary, pore activity and barcode alignment files. We will look at them on the day.

In [None]:
!cat ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/sample_sheet_FAV40358_20221104_1534_9bd50a0f.csv

In [None]:
!head ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/throughput_FAV40358_9bd50a0f_619283ee.csv


In [None]:
!cat ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/final_summary_FAV40358_9bd50a0f_619283ee.txt
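If the final summary turns out to be a plain list of `key=value` lines (check this against the output of the cell above; the format is an assumption here), it can be loaded into a Python dictionary for further use. The helper `read_key_value_file` is hypothetical, and it skips any line without an `=` rather than failing.

```python
from pathlib import Path


def read_key_value_file(path) -> dict:
    """Parse a text file of key=value lines into a dict; lines without '=' are skipped."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if "=" not in line:
                continue
            # Split on the first '=' only, so values may themselves contain '='
            key, value = line.split("=", 1)
            values[key] = value
    return values


final_summary = Path("../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/"
                     "20221104_1529_X2_FAV40358_9bd50a0f/"
                     "final_summary_FAV40358_9bd50a0f_619283ee.txt")
if final_summary.exists():
    for key, value in read_key_value_file(final_summary).items():
        print(f"{key:30} {value}")
```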


In [None]:
!cat ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/pore_activity_FAV40358_9bd50a0f_619283ee.csv

In [None]:
!cat ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/barcode_alignment_FAV40358_9bd50a0f_619283ee.tsv

## DATA Types - FAST5/POD5

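Reads are divided into "PASS" and "FAIL" folders according to whether their mean quality score (Q-score) reaches a minimum threshold. This threshold is somewhat arbitrary: it can be changed in MinKNOW, and its default depends on the basecalling model used.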
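For reference, a quality score Q relates to an error probability p by Q = -10 log10(p), so Q10 corresponds to a 1 in 10 chance of an error and Q20 to a 1 in 100 chance. The sketch below shows that conversion and a hypothetical pass/fail split by mean Q-score. It is an illustration, not MinKNOW's implementation: the threshold of 9 and the read IDs are made up, and treating a read exactly at the threshold as passing is an assumption.

```python
def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def split_pass_fail(mean_qscores: dict, threshold: float):
    """Hypothetical helper: split read IDs by mean Q-score.

    Reads at or above `threshold` go to "pass", the rest to "fail". Whether
    MinKNOW uses >= or > at the boundary is an assumption here."""
    passed, failed = [], []
    for read_id, q in mean_qscores.items():
        (passed if q >= threshold else failed).append(read_id)
    return passed, failed


print(phred_to_error(10), phred_to_error(20))
# Made-up reads and threshold, for illustration only
print(split_pass_fail({"read_a": 12.3, "read_b": 6.8}, threshold=9))
# → (['read_a'], ['read_b'])
```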

In [None]:
!ls ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/fast5_pass/

Within each barcode folder, the files look like this:

In [None]:
!ls ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/fast5_pass/barcode01/

FAST5 files contain the raw signal data recorded by the sequencer. They have now been superseded by a newer file format called "pod5", which also stores the raw signal data.
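To see how many raw data files were written for each barcode, the sub-folders can be tallied with `pathlib` instead of listing them one by one. This is a minimal sketch: the helper name is hypothetical, and it assumes one sub-folder per barcode (such as `barcode01`) inside `fast5_pass` and `fast5_fail`, as in the listings above. For a POD5 run, the pattern would become `*.pod5` and the folder names would change accordingly.

```python
from pathlib import Path


def count_files_per_barcode(data_dir, pattern="*"):
    """Return {sub-folder name: number of files matching `pattern`} for a data folder."""
    data_dir = Path(data_dir)
    counts = {}
    if not data_dir.is_dir():
        return counts
    for sub in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        counts[sub.name] = sum(1 for f in sub.glob(pattern) if f.is_file())
    return counts


run_dir = Path("../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/"
               "20221104_1529_X2_FAV40358_9bd50a0f")
for folder in ["fast5_pass", "fast5_fail"]:
    print(folder, count_files_per_barcode(run_dir / folder, "*.fast5"))
```

Files sitting directly in the data folder, rather than in a barcode sub-folder, are deliberately ignored.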

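## DATA Types - FASTQ

FASTQ files contain the basecalled reads: each record gives a read ID, the DNA sequence and a quality score for every base. Like the raw data, they are split into pass and fail folders and, for barcoded runs, into one sub-folder per barcode.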

In [None]:
!ls ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/fastq_pass/

In [None]:
!ls ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/fastq_pass/barcode01/

In [None]:
!zcat  ../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/20221104_1529_X2_FAV40358_9bd50a0f/fastq_pass/barcode01/FAV40358_pass_barcode01_9bd50a0f_619283ee_0.fastq.gz | head -4
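The `head -4` above prints one FASTQ record: a header line starting with `@`, the sequence, a `+` separator and a quality string. The same records can be read in Python with the standard `gzip` module. This sketch assumes unwrapped four-line records and Phred+33 quality encoding; the helper names are hypothetical. Note that `mean_phred` takes a simple arithmetic mean of the per-base scores, which may not match the mean Q-score reported in the sequencing summary, since that value may be calculated differently.

```python
import gzip
from pathlib import Path


def read_fastq(path):
    """Yield (read_id, sequence, quality) from a plain or gzipped FASTQ file.

    Assumes the usual four-line records with no line wrapping."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip blank lines
            sequence = handle.readline().rstrip()
            handle.readline()  # the "+" separator line
            quality = handle.readline().rstrip()
            read_id = header[1:].split()[0]  # drop "@" and any trailing metadata
            yield read_id, sequence, quality


def mean_phred(quality, offset=33):
    """Arithmetic mean of the per-base Phred scores (Phred+33 encoding)."""
    scores = [ord(char) - offset for char in quality]
    return sum(scores) / len(scores) if scores else 0.0


fastq_file = Path("../student_projects_2022/data/ic_131/Haloferax_clean_RBK004ori/"
                  "20221104_1529_X2_FAV40358_9bd50a0f/fastq_pass/barcode01/"
                  "FAV40358_pass_barcode01_9bd50a0f_619283ee_0.fastq.gz")
if fastq_file.exists():
    for read_id, seq, qual in read_fastq(fastq_file):
        print(read_id, len(seq), round(mean_phred(qual), 1))
        break  # only the first read, like `head -4`
```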