<a href="https://colab.research.google.com/github/pjd-code/hudson-valley-tick/blob/main/hudson_valley_tick_analysis_qiime%2Bgreengenes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook briefly walks through the steps I used to process the raw reads with QIIME 2 and a Greengenes reference, from download through diversity analysis.

In [1]:
# Clean up the Colab environment by removing the default sample files
%cd /content/sample_data
!rm -f *.csv *.md *.json

/content/sample_data


In [None]:
from google.colab import drive
drive.mount('/content/sample_data/googledrive')

In [2]:
!git clone https://github.com/pjd-code/hudson-valley-tick.git

Cloning into 'hudson-valley-tick'...
remote: Enumerating objects: 38, done.
remote: Counting objects: 100% (38/38), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 38 (delta 15), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (38/38), done.


In [3]:
%run /content/sample_data/hudson-valley-tick/setup_qiime2.py

In [5]:
%cd /bin
!wget "https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs" -O /bin/bs
!chmod u+x /bin/bs

/bin
--2022-04-20 19:40:00--  https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs
Resolving launch.basespace.illumina.com (launch.basespace.illumina.com)... 18.65.229.81, 18.65.229.12, 18.65.229.70, ...
Connecting to launch.basespace.illumina.com (launch.basespace.illumina.com)|18.65.229.81|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12218368 (12M) [binary/octet-stream]
Saving to: ‘/bin/bs’


2022-04-20 19:40:00 (117 MB/s) - ‘/bin/bs’ saved [12218368/12218368]



In [6]:
!bs auth
!bs whoami

Please go to this URL to authenticate:  https://basespace.illumina.com/oauth/device?code=JRwKS
Welcome, Preston Dihle
+----------------+----------------------------------------------------+
| Name           | Preston Dihle                                      |
| Id             | 23726704                                           |
| Email          | preston.dihle@gmail.com                            |
| DateCreated    | 2021-02-05 02:03:22 +0000 UTC                      |
| DateLastActive | 2022-04-20 19:40:24 +0000 UTC                      |
| Host           | https://api.basespace.illumina.com                 |
| Scopes         | READ GLOBAL, CREATE GLOBAL, BROWSE GLOBAL,         |
|                | CREATE PROJECTS, CREATE RUNS, START APPLICATIONS,  |
|                | MOVETOTRASH GLOBAL, WRITE GLOBAL                   |
+----------------+----------------------------------------------------+


In [None]:
!bs project download --id 346661718 --extension=fastq.gz -o /content/sample_data/sequence

In [8]:
%cd /content/sample_data/sequence

/content/sample_data/sequence


In [None]:
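# Consolidate all FASTQ files in one folder
!mkdir -p samples
# Prune ./samples so files already there are not moved onto themselves
!find . -path ./samples -prune -o -type f -name "*.fastq.gz" -exec mv {} samples/ \;
# Delete the per-sample download folders, which are now empty
!find . -mindepth 1 -type d -empty -delete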
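
The same consolidation step can be written in Python with only the standard library. This is a minimal sketch. It assumes the BaseSpace download left each sample's `*.fastq.gz` files in per-sample subfolders under the working directory. `consolidate_fastqs` is a hypothetical helper and is not part of any tool used in this notebook.

```python
import shutil
from pathlib import Path


def consolidate_fastqs(root: str, dest_name: str = "samples") -> list[Path]:
    """Move every *.fastq.gz under `root` into root/dest_name, then delete
    subfolders left empty. Returns the new file paths.

    Hypothetical helper: assumes one download subfolder per sample."""
    root_path = Path(root)
    dest = root_path / dest_name
    dest.mkdir(exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved
    for fq in sorted(root_path.rglob("*.fastq.gz")):
        if fq.parent == dest:
            continue  # already consolidated
        target = dest / fq.name
        if target.exists():
            # refuse to silently overwrite a same-named file from another folder
            raise FileExistsError(f"duplicate file name: {fq.name}")
        shutil.move(str(fq), str(target))
        moved.append(target)
    # Remove empty folders deepest-first so parents can empty out in turn
    dirs = [p for p in root_path.rglob("*") if p.is_dir()]
    for d in sorted(dirs, key=lambda p: len(p.parts), reverse=True):
        if d != dest and not any(d.iterdir()):
            d.rmdir()
    return moved
```

For example, `consolidate_fastqs("/content/sample_data/sequence")` would leave every FASTQ file in `sequence/samples`, which is the folder later passed to `qiime tools import`.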

In [None]:
!mv \
/content/sample_data/hudson-valley-tick/v4-150bp-se-ref-seqs-gg.qza \
/content/sample_data/hudson-valley-tick/v4-150bp-classifier-gg.qza \
/content/sample_data/sequence

In [None]:
!qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /content/sample_data/sequence/samples \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza
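
`CasavaOneEightSingleLanePerSampleDirFmt` expects one gzipped FASTQ file per sample and read direction, named in Illumina's Casava 1.8 style. An example is `SampleID_S1_L001_R1_001.fastq.gz`: the sample ID, a barcode or sample number, the lane, `R1`/`R2` for the read direction, and a set number.

A quick check before import can catch misnamed or unpaired files. The sketch below is one way to do that. Its regular expression is an approximation of the naming convention, not the validator QIIME 2 runs, and `pair_casava_files` is a hypothetical helper.

```python
import re
from collections import defaultdict

# sample ID (may contain underscores), barcode, lane, read direction, set number
CASAVA_RE = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<set>\d{3})\.fastq\.gz$"
)


def pair_casava_files(filenames):
    """Group Casava 1.8 style FASTQ names by sample ID.

    Returns ({sample_id: {"R1": name, "R2": name}}, [names that did not match])."""
    pairs = defaultdict(dict)
    unmatched = []
    for name in filenames:
        m = CASAVA_RE.match(name)
        if m is None:
            unmatched.append(name)
            continue
        pairs[m.group("sample")]["R" + m.group("read")] = name
    return dict(pairs), unmatched
```

Two kinds of result point to a file that would likely break a paired-end import: a sample whose entry lacks `R2`, and any name in the unmatched list.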

In [None]:
!qiime tools peek demux-paired-end.qza

In [None]:
!qiime demux summarize \
  --i-data demux-paired-end.qza \
  --o-visualization demux-paired-end.qzv

In [None]:
# Denoise as single-end: only the forward reads in the paired-end artifact are used.
# DADA2 also dereplicates the reads and filters chimeras.

!qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left 0 \
  --p-trunc-len 150 \
  --o-representative-sequences se-rep-seqs-dada2.qza \
  --o-table se-table-dada2.qza \
  --o-denoising-stats se-stats-dada2.qza
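
With `--p-trim-left 0` and `--p-trunc-len 150`, DADA2 handles read length as follows:

- Reads shorter than 150 bp are discarded.
- The remaining reads are cut to their first 150 bases before denoising.

The sketch below mimics only that fixed-position rule on plain strings. It does not model quality-based truncation (`--p-trunc-q`), expected-error filtering or the denoising itself. `trim_and_truncate` is a hypothetical helper.

```python
def trim_and_truncate(reads, trim_left=0, trunc_len=150):
    """Apply DADA2-style fixed-position trimming to read strings.

    Reads shorter than trunc_len are dropped; survivors keep bases
    [trim_left, trunc_len), so each has length trunc_len - trim_left.
    trunc_len=0 disables both truncation and length filtering."""
    kept = []
    for read in reads:
        if trunc_len:
            if len(read) < trunc_len:
                continue  # too short: discarded, as DADA2 does
            read = read[:trunc_len]  # cut the 3' end
        kept.append(read[trim_left:])  # trim the 5' end
    return kept
```

Because every surviving read has the same length, all representative sequences from this step are 150 bp. This is why they can be compared position by position against the 150 bp V4 reference sequences used later.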

In [None]:
!qiime feature-table summarize \
  --i-table se-table-dada2.qza \
  --o-visualization se-table-dada2.qzv \
  --m-sample-metadata-file /content/sample_data/hudson-valley-tick/plate1-tick-metadata.txt

!qiime feature-table tabulate-seqs \
  --i-data se-rep-seqs-dada2.qza \
  --o-visualization se-rep-seqs-dada2.qzv
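
Among other things, `feature-table summarize` reports how many reads each sample and each feature has. The per-sample totals are the numbers that inform a rarefaction depth such as the 3000 used in `core-metrics-phylogenetic`.

The sketch below computes the same totals for a toy table stored as nested dicts. That layout is a stand-in assumed here for illustration; it is not the BIOM format inside the `.qza`. Both helpers are hypothetical.

```python
def summarize_table(table):
    """table: {sample_id: {feature_id: count}}.
    Returns (per-sample totals, per-feature totals)."""
    sample_totals = {s: sum(c.values()) for s, c in table.items()}
    feature_totals = {}
    for counts in table.values():
        for feat, n in counts.items():
            feature_totals[feat] = feature_totals.get(feat, 0) + n
    return sample_totals, feature_totals


def samples_retained(sample_totals, depth):
    """Samples with at least `depth` reads survive rarefaction at that depth."""
    return sorted(s for s, total in sample_totals.items() if total >= depth)
```

Raising the depth keeps more reads per sample but drops every sample below it, so the retained list is worth checking before choosing a value.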

In [None]:
# Open-reference clustering at 99% identity, using Greengenes as the reference
!qiime vsearch cluster-features-open-reference \
  --i-sequences se-rep-seqs-dada2.qza \
  --i-table se-table-dada2.qza \
  --i-reference-sequences v4-150bp-se-ref-seqs-gg.qza \
  --p-perc-identity .99 \
  --p-threads 0 \
  --o-clustered-table se-vs_clst99-table.qza \
  --o-clustered-sequences se-vs_clst99-seq.qza \
  --o-new-reference-sequences tick_refseq99_gg.qza
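
Open-reference clustering works in two stages:

1. Each feature is first matched against the reference sequences at the chosen identity.
2. Features that match nothing are then clustered de novo at the same threshold.

Within each sample, the counts of features that land in one cluster are summed. The sketch below illustrates that logic under simplifying assumptions:

- Sequences are equal length.
- Identity is the fraction of matching positions.
- De novo clustering is greedy, most abundant first.

vsearch performs real alignments and uses its own identity definition and search heuristics. `open_reference_cluster` and `collapse_table` are hypothetical helpers.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences
    (a simplified stand-in for vsearch's alignment-based identity)."""
    if len(a) != len(b):
        raise ValueError("sketch assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference_cluster(features, reference, perc_identity=0.99):
    """features: {feature_id: (sequence, total_count)}; reference: {ref_id: sequence}.
    Returns {feature_id: cluster_id}."""
    assignment = {}
    unmatched = []
    # Stage 1: closed-reference step, best reference hit at or above the threshold
    for fid, (seq, count) in features.items():
        best = max(reference.items(), key=lambda kv: identity(seq, kv[1]), default=None)
        if best is not None and identity(seq, best[1]) >= perc_identity:
            assignment[fid] = best[0]
        else:
            unmatched.append((fid, seq, count))
    # Stage 2: greedy de novo clustering of the leftovers, most abundant first
    centroids = []  # (centroid_id, sequence)
    for fid, seq, _count in sorted(unmatched, key=lambda t: t[2], reverse=True):
        for cid, cseq in centroids:
            if identity(seq, cseq) >= perc_identity:
                assignment[fid] = cid
                break
        else:
            centroids.append((fid, seq))  # founds a new de novo cluster
            assignment[fid] = fid
    return assignment


def collapse_table(table, assignment):
    """Sum per-sample counts of features that fell into the same cluster."""
    out = {}
    for sample, counts in table.items():
        merged = {}
        for fid, n in counts.items():
            cid = assignment[fid]
            merged[cid] = merged.get(cid, 0) + n
        out[sample] = merged
    return out
```

Under this simplified identity, a 150 bp sequence passes the 0.99 threshold with one mismatch (149/150 ≈ 0.993) but not with two (148/150 ≈ 0.987).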

In [None]:
!qiime feature-table summarize \
  --i-table se-vs_clst99-table.qza \
  --o-visualization se-vs_clst99-table.qzv \
  --m-sample-metadata-file /content/sample_data/hudson-valley-tick/plate1-tick-metadata.txt

!qiime feature-table tabulate-seqs \
  --i-data se-vs_clst99-seq.qza \
  --o-visualization se-vs_clst99-seq.qzv

!qiime metadata tabulate \
  --m-input-file se-stats-dada2.qza \
  --o-visualization se-stats-dada2.qzv
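
The DADA2 stats table lists, for each sample, the reads that entered DADA2 and the reads that survived filtering, denoising and chimera removal. Expressing those counts as a share of the input makes samples that lost unusually many reads easy to spot.

The sketch below is a minimal version of that arithmetic. `read_retention` is a hypothetical helper, and the numbers in the usage note are made up.

```python
def read_retention(input_reads, filtered, denoised, non_chimeric):
    """Percent of input reads surviving each DADA2 stage, rounded to 2 decimals."""
    stages = {"filtered": filtered, "denoised": denoised, "non-chimeric": non_chimeric}
    if input_reads == 0:
        return {stage: 0.0 for stage in stages}
    return {stage: round(100 * n / input_reads, 2) for stage, n in stages.items()}
```

For a hypothetical sample with 10000 input reads and 8500 non-chimeric reads, `read_retention(10000, 9000, 8800, 8500)["non-chimeric"]` is 85.0.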

In [None]:
!qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences se-vs_clst99-seq.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza
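
`align-to-tree-mafft-fasttree` runs four steps in order:

1. Align the sequences with MAFFT.
2. Mask alignment columns that are too gappy or too variable to carry phylogenetic signal.
3. Build a tree with FastTree.
4. Midpoint-root the tree.

The sketch below illustrates only the masking idea. A column is kept when its gap fraction is at most `max_gap_frequency` and its most common non-gap character reaches `min_conservation`. The defaults of 1.0 and 0.4 are, as far as I know, QIIME 2's mask defaults. Here both frequencies are computed over all sequences in the column, and the plugin's exact handling of gaps and degenerate bases may differ. `mask_alignment` is a hypothetical helper.

```python
from collections import Counter

GAP_CHARS = {"-", "."}


def mask_alignment(seqs, max_gap_frequency=1.0, min_conservation=0.4):
    """Drop alignment columns that are too gappy or too variable.

    seqs: list of equal-length aligned strings. Fractions are relative
    to the total number of sequences (a simplifying assumption)."""
    n = len(seqs)
    keep = []
    for col in zip(*seqs):
        gap_frac = sum(c in GAP_CHARS for c in col) / n
        non_gap = Counter(c for c in col if c not in GAP_CHARS)
        top_frac = non_gap.most_common(1)[0][1] / n if non_gap else 0.0
        keep.append(gap_frac <= max_gap_frequency and top_frac >= min_conservation)
    return ["".join(c for c, k in zip(s, keep) if k) for s in seqs]
```

In the tests' example, the all-gap column and the column where each sequence has a different base are both dropped, while well-conserved columns survive.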

In [None]:
!qiime feature-classifier classify-sklearn \
  --i-classifier v4-150bp-classifier-gg.qza \
  --i-reads se-vs_clst99-seq.qza \
  --o-classification taxonomy.qza
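
With a Greengenes classifier, each feature's taxonomy is a semicolon-delimited string. Each rank carries a prefix from `k__` to `s__`, and a rank is left empty (for example `g__`) when it could not be assigned. The small parser below is a sketch for inspecting such strings outside QIIME 2, for instance after exporting `taxonomy.qza` to a TSV file.

`parse_gg_taxonomy` and `deepest_assigned` are hypothetical helpers. The example string in the usage note is illustrative only and is not a result from this dataset.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_gg_taxonomy(taxon: str) -> dict:
    """Split a Greengenes-style string ('k__Bacteria; p__...') into named ranks.
    Empty or missing ranks are returned as None."""
    levels = [part.strip() for part in taxon.split(";")]
    parsed = {}
    for rank, level in zip(RANKS, levels):
        name = level.split("__", 1)[1] if "__" in level else level
        parsed[rank] = name or None
    for rank in RANKS[len(levels):]:
        parsed[rank] = None  # string stopped before this rank
    return parsed


def deepest_assigned(taxon: str):
    """Return (rank, name) for the most specific rank that has a name, else None."""
    parsed = parse_gg_taxonomy(taxon)
    for rank in reversed(RANKS):
        if parsed[rank]:
            return rank, parsed[rank]
    return None
```

For an illustrative string such as `"k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__Rickettsiaceae; g__Rickettsia; s__"`, `deepest_assigned` reports the genus because the species rank is empty.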

In [None]:
!qiime taxa barplot \
  --i-table se-vs_clst99-table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file /content/sample_data/hudson-valley-tick/plate1-tick-metadata.txt \
  --o-visualization taxa-bar-plots.qzv
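
At each taxonomic level, the bar plots collapse features that share the same taxonomy up to that level. They then show each taxon's share of the sample's reads.

The sketch below reproduces that calculation for a toy nested-dict table. The table layout and `relative_abundance_by_taxon` are assumptions made for illustration; this is not how `qiime taxa barplot` is implemented.

```python
def relative_abundance_by_taxon(table, taxonomy, level):
    """Collapse a feature table to one taxonomic level and convert each
    sample to relative abundances that sum to 1.

    table:    {sample_id: {feature_id: count}}
    taxonomy: {feature_id: 'k__...; p__...; ...'}
    level:    number of ranks to keep (1 = kingdom, 2 = phylum, ...)"""
    out = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        collapsed = {}
        for fid, n in counts.items():
            ranks = [r.strip() for r in taxonomy[fid].split(";")][:level]
            key = ";".join(ranks)
            collapsed[key] = collapsed.get(key, 0) + n
        out[sample] = {k: v / total for k, v in collapsed.items()} if total else {}
    return out
```

At level 2, for example, two features assigned to different Proteobacteria classes are pooled into a single phylum-level bar.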

In [None]:
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table se-vs_clst99-table.qza \
  --p-sampling-depth 3000 \
  --m-metadata-file /content/sample_data/hudson-valley-tick/plate1-tick-metadata.txt \
  --output-dir core-metrics-results 
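
`--p-sampling-depth 3000` rarefies every sample to 3000 reads before the diversity metrics are computed. Samples with fewer than 3000 reads are left out of the results.

The sketch below shows that subsampling idea with the standard library. It is not QIIME 2's implementation, and its random draws will not reproduce the plugin's output. It uses the nested-dict table layout assumed in the earlier sketches, and `rarefy` is a hypothetical helper.

```python
import random


def rarefy(table, depth, seed=None):
    """Subsample each sample to exactly `depth` reads without replacement.

    table: {sample_id: {feature_id: count}}. Samples with fewer than
    `depth` reads are dropped."""
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in table.items():
        if sum(counts.values()) < depth:
            continue  # below the sampling depth: excluded
        features = list(counts)
        # counts= treats each feature as repeated once per read (Python 3.9+)
        picked = rng.sample(features, k=depth, counts=[counts[f] for f in features])
        sub = {}
        for fid in picked:
            sub[fid] = sub.get(fid, 0) + 1
        rarefied[sample] = sub
    return rarefied
```

Fixing `seed` makes a single run repeatable. Different seeds give slightly different subsamples, which is the usual caveat with rarefaction.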

In [None]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file /content/sample_data/hudson-valley-tick/plate1-tick-metadata.txt \
  --o-visualization core-metrics-results/faith-pd-group-significance.qzv

!qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/evenness_vector.qza \
  --m-metadata-file /content/sample_data/hudson-valley-tick/plate1-tick-metadata.txt \
  --o-visualization core-metrics-results/evenness-group-significance.qzv
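
`evenness_vector.qza` holds Pielou's evenness for each sample, which is J = H / ln(S). Here H is the Shannon entropy (natural log) and S is the number of observed features. `alpha-group-significance` then compares these values across metadata groups with Kruskal-Wallis tests.

The minimal sketch below shows how the metric itself is computed. `pielou_evenness` is a hypothetical helper.

```python
import math


def pielou_evenness(counts):
    """Pielou's J = H / ln(S) for one sample's feature counts.

    H is the Shannon entropy (natural log) of the nonzero counts and S is
    the number of observed features. Returns nan when S < 2 (J is undefined)."""
    nonzero = [c for c in counts if c > 0]
    s = len(nonzero)
    if s < 2:
        return float("nan")
    total = sum(nonzero)
    h = -sum((c / total) * math.log(c / total) for c in nonzero)
    return h / math.log(s)
```

J is 1 when all observed features are equally abundant and approaches 0 when one feature dominates. A tick microbiome dominated by a single endosymbiont would therefore score low.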

In [None]:
!qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file /content/sample_data/hudson-valley-tick/plate1-tick-metadata.txt \
  --m-metadata-column collection-source \
  --o-visualization core-metrics-results/unweighted-unifrac-group-significance.qzv \
  --p-pairwise

!qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file /content/sample_data/hudson-valley-tick/plate1-tick-metadata.txt \
  --m-metadata-column sex \
  --o-visualization core-metrics-results/unweighted-unifrac-sex-significance.qzv \
  --p-pairwise
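
`beta-group-significance` tests whether the unweighted UniFrac distances differ between the groups in a metadata column. By default it uses PERMANOVA with 999 permutations, and `--p-pairwise` repeats the test for every pair of groups.

The sketch below computes the PERMANOVA pseudo-F statistic and a permutation p-value from a plain distance matrix, following Anderson's sum-of-squares formulation. It is an illustration, not the scikit-bio code that QIIME 2 calls, and the helper names are hypothetical.

```python
import random


def _scaled_ss(dm, members):
    """Sum of squared distances over unordered pairs in `members`,
    divided by the number of members (Anderson's SS terms)."""
    total = 0.0
    for i, a in enumerate(members):
        for b in members[i + 1:]:
            total += dm[a][b] ** 2
    return total / len(members)


def pseudo_f(dm, groups):
    """PERMANOVA pseudo-F for a square distance matrix and one label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    k = len(labels)
    if not 1 < k < n:
        raise ValueError("need at least two groups and more samples than groups")
    ss_total = _scaled_ss(dm, list(range(n)))
    ss_within = sum(
        _scaled_ss(dm, [i for i, g in enumerate(groups) if g == lab]) for lab in labels
    )
    ss_among = ss_total - ss_within
    return (ss_among / (k - 1)) / (ss_within / (n - k))


def permanova(dm, groups, permutations=999, seed=None):
    """Return (pseudo-F, p) with p = (hits + 1) / (permutations + 1), where hits
    counts label shuffles whose pseudo-F is at least the observed one."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

With few samples per group, the permutation test has little resolution. Six samples split three and three allow only 20 labelings. The observed split and its mirror image give the same pseudo-F, so p cannot fall much below 0.1.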