This repository is for maintaining the codes to analyze HCA dataset.

jjahanip/RNA_seq

RNA_seq Pipeline

This repository maintains the code for analyzing the RNA_seq dataset.

Pipeline:

1. Download the Dataset:

Download the Census of Immune Cells dataset from https://preview.data.humancellatlas.org.

This will create 4 folders of fastq.gz files. The files are gzip-compressed (not tar archives), so decompress them in each folder with gunzip:

gunzip *.fastq.gz
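If a scripted alternative is more convenient, the decompression step can be sketched in Python with only the standard library. The helper name `gunzip_folder` is hypothetical and not part of this repository, and the sketch assumes every `*.fastq.gz` file is a plain gzip stream.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_folder(folder):
    """Decompress every *.fastq.gz in `folder`, writing *.fastq next to it."""
    written = []
    for src in sorted(Path(folder).glob("*.fastq.gz")):
        dst = src.with_suffix("")  # drops only the trailing ".gz"
        # Stream the data so large FASTQ files are never held in memory.
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written
```

Unlike gunzip, this sketch leaves the compressed originals in place.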

2. Prepare the Dataset:

  1. Download the cellranger software from here.

  2. Download the GRCh38 reference.

  3. Run cellranger count following the instructions from here. For example, for MantonBM1:

cellranger count --id=MantonBM1 \
--fastqs=2a87dc5c-0c3c-4d91-a348-5d784ab48b92 \
--transcriptome=<path_to_reference_file> \
--sample=MantonBM1_HiSeq_1,MantonBM1_HiSeq_2,MantonBM1_HiSeq_3,MantonBM1_HiSeq_4,MantonBM1_HiSeq_5,MantonBM1_HiSeq_6,MantonBM1_HiSeq_7,MantonBM1_HiSeq_8

This will create a folder MantonBM1.
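When the same command has to be repeated for several donors, it may help to generate it rather than type it. The following is a minimal sketch, not part of this repository: `build_count_command` and its `n_lanes` parameter are hypothetical, and it assumes other donors follow the same `<donor>_HiSeq_<n>` sample naming as MantonBM1. Only the flags shown in the example above are used.

```python
def build_count_command(donor, fastq_dir, transcriptome, n_lanes=8):
    """Build the argument list for `cellranger count` for one donor."""
    # Assumed naming pattern: MantonBM1_HiSeq_1 ... MantonBM1_HiSeq_8
    samples = ",".join(f"{donor}_HiSeq_{i}" for i in range(1, n_lanes + 1))
    return [
        "cellranger", "count",
        f"--id={donor}",
        f"--fastqs={fastq_dir}",
        f"--transcriptome={transcriptome}",
        f"--sample={samples}",
    ]


cmd = build_count_command(
    "MantonBM1",
    "2a87dc5c-0c3c-4d91-a348-5d784ab48b92",
    "<path_to_reference_file>",  # placeholder, replace with the real path
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the placeholder path is replaced.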

Locate the filtered_gene_bc_matrices_h5.h5 file.

3. Run Non-Parametric Clustering:

You can tune the following parameters for different settings:

  • cutoff_thresh : Remove genes whose total count is below this threshold
  • dim_red_method : Dimensionality reduction method, one of {'PCA', 'SNE'}
  • red_dim : Number of dimensions after reduction
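To make the roles of these parameters concrete, here is a minimal sketch of gene filtering followed by PCA, written with NumPy. It is an illustration under stated assumptions, not the repository's implementation: the function name `filter_and_reduce` is hypothetical, the count matrix is assumed to be cells × genes, genes are kept when their total count is at least `cutoff_thresh`, and only the 'PCA' option is covered.

```python
import numpy as np


def filter_and_reduce(counts, cutoff_thresh, red_dim):
    """Filter low-count genes, then project cells onto `red_dim` PCs.

    counts: cells x genes array of non-negative counts.
    Returns (reduced, kept): `reduced` is cells x red_dim and `kept`
    is a boolean mask over the original genes.
    """
    counts = np.asarray(counts, dtype=float)
    # cutoff_thresh: drop genes whose count summed over all cells is below it
    kept = counts.sum(axis=0) >= cutoff_thresh
    filtered = counts[:, kept]
    # PCA via SVD of the mean-centered matrix; scores are U * S
    centered = filtered - filtered.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    reduced = u[:, :red_dim] * s[:red_dim]
    return reduced, kept


counts = [[5, 0, 1], [3, 0, 2], [0, 1, 4], [1, 0, 6]]
reduced, kept = filter_and_reduce(counts, cutoff_thresh=2, red_dim=1)
print(kept, reduced.shape)
```

In this toy input the second gene has a total count of 1, so it is removed before the reduction.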

The clustering step visualizes the samples in a color-coded plot.

You can provide gene names (separated by spaces) to see a heatmap of the summed expression of those genes in each cluster.
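The values behind such a heatmap can be sketched as a cluster × gene table of summed counts. This is one plausible reading of the description above, not the repository's code: the function name `cluster_gene_sums`, the input layout, and the choice to silently skip unknown gene names are all assumptions, and the gene names and counts in the example are made up for illustration.

```python
def cluster_gene_sums(counts, gene_names, labels, query):
    """Sum the counts of each requested gene within each cluster.

    counts: list of per-cell rows, one count per gene
    gene_names: gene name for each column of `counts`
    labels: cluster label for each cell
    query: space-separated gene names, e.g. "CD3E MS4A1"
    Returns (clusters, genes, matrix), where matrix[i][j] is the summed
    count of genes[j] over the cells assigned to clusters[i].
    """
    column = {name: j for j, name in enumerate(gene_names)}
    genes = [g for g in query.split() if g in column]  # skip unknown names
    clusters = sorted(set(labels))
    row = {c: i for i, c in enumerate(clusters)}
    matrix = [[0] * len(genes) for _ in clusters]
    for cell, label in zip(counts, labels):
        for j, g in enumerate(genes):
            matrix[row[label]][j] += cell[column[g]]
    return clusters, genes, matrix


clusters, genes, matrix = cluster_gene_sums(
    counts=[[5, 0, 1], [3, 0, 2], [0, 1, 4], [1, 0, 6]],
    gene_names=["CD3E", "CD19", "MS4A1"],
    labels=[0, 0, 1, 1],
    query="CD3E MS4A1",
)
print(matrix)  # [[8, 3], [1, 10]]
```

The resulting matrix has one row per cluster and one column per requested gene, so it can be handed directly to a plotting call such as matplotlib's `imshow`.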
