# RNA-Seq Data Analysis
In this Jupyter notebook we will walk through an RNA-seq data analysis. We will go over the general steps covered in lecture:

1) FastQ quality control
2) Read mapping
3) Feature counts
4) Simple image generation

We will be using the data from Fu et al. (2015), 'EGF-mediated induction of Mcl-1 at the switch to lactation is essential for alveolar cell survival', Nat Cell Biol.

This study examined the expression profiles of basal and luminal cells in the mammary gland of virgin, pregnant and lactating mice. Six groups are present, one for each combination of cell type and mouse status. Each group has two biological replicates (two independent sorts of cells from the mammary glands of virgin, pregnant or lactating mice). Note, however, that three replicates are usually recommended as a minimum for RNA-seq.


As a first step we will prepare the notebook to work with the software that we need. We will:
- Install Java
- Download FastQC and make it executable

In [1]:
!sudo apt-get install -y default-jre
!java -version

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  default-jre-headless fonts-dejavu-core fonts-dejavu-extra
  libatk-wrapper-java libatk-wrapper-java-jni libfontenc1 libxkbfile1 libxtst6
  libxxf86dga1 openjdk-11-jre x11-utils
Suggested packages:
  mesa-utils
The following NEW packages will be installed:
  default-jre default-jre-headless fonts-dejavu-core fonts-dejavu-extra
  libatk-wrapper-java libatk-wrapper-java-jni libfontenc1 libxkbfile1 libxtst6
  libxxf86dga1 openjdk-11-jre x11-utils
0 upgraded, 12 newly installed, 0 to remove and 29 not upgraded.
Need to get 3,720 kB of archives.
After this operation, 12.7 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 default-jre-headless amd64 2:1.11-72build2 [3,042 B]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 libxtst6 amd64 2:1.2.3-1build4 [13.4 kB]
Get:3 http://archive.ubuntu

In [2]:
!wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip

--2025-03-31 19:51:06--  https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip
Resolving www.bioinformatics.babraham.ac.uk (www.bioinformatics.babraham.ac.uk)... 149.155.133.4
Connecting to www.bioinformatics.babraham.ac.uk (www.bioinformatics.babraham.ac.uk)|149.155.133.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10249221 (9.8M) [application/zip]
Saving to: ‘fastqc_v0.11.9.zip’


2025-03-31 19:51:07 (9.38 MB/s) - ‘fastqc_v0.11.9.zip’ saved [10249221/10249221]



In [3]:
!unzip fastqc_v0.11.9.zip

Archive:  fastqc_v0.11.9.zip
  inflating: FastQC/cisd-jhdf5.jar   
   creating: FastQC/Configuration/
  inflating: FastQC/Configuration/adapter_list.txt  
  inflating: FastQC/Configuration/contaminant_list.txt  
  inflating: FastQC/Configuration/limits.txt  
  inflating: FastQC/fastqc           
  inflating: FastQC/fastqc_icon.ico  
   creating: FastQC/Help/
   creating: FastQC/Help/1 Introduction/
   creating: FastQC/Help/1 Introduction/.svn/
  inflating: FastQC/Help/1 Introduction/.svn/entries  
   creating: FastQC/Help/1 Introduction/.svn/props/
   creating: FastQC/Help/1 Introduction/.svn/text-base/
  inflating: FastQC/Help/1 Introduction/.svn/text-base/1.1 What is FastQC.html.svn-base  
   creating: FastQC/Help/1 Introduction/.svn/tmp/
   creating: FastQC/Help/1 Introduction/.svn/tmp/props/
  inflating: FastQC/Help/1 Introduction/1.1 What is FastQC.html  
   creating: FastQC/Help/2 Basic Operations/
   creating: FastQC/Help/2 Basic Operations/.svn/
  inflating: FastQC/Help/2 Basic

In [4]:
!ls FastQC/

cisd-jhdf5.jar	 Help		 LICENSE_JHDF5.txt  README.md	       sam-1.103.jar
Configuration	 INSTALL.txt	 LICENSE.txt	    README.txt	       Templates
fastqc		 jbzip2-0.9.jar  net		    RELEASE_NOTES.txt  uk
fastqc_icon.ico  LICENSE	 org		    run_fastqc.bat


In [5]:
!chmod +x FastQC/fastqc

In [6]:
!./FastQC/fastqc --version

FastQC v0.11.9
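
Before moving on, it can help to confirm from the Python side that the tools are reachable. The following is a minimal sketch, assuming FastQC was unzipped into `FastQC/` in the working directory as above; the helper name `find_tool` is hypothetical and not part of any of the tools used here.

```python
import os
import shutil

def find_tool(name, extra_paths=()):
    """Return the path to an executable, or None if it cannot be found.

    The extra paths are checked first (for example ./FastQC/fastqc),
    then the regular PATH is searched.
    """
    for candidate in extra_paths:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which(name)

# Report where each tool was found (paths are assumptions based on the cells above).
for tool, extra in [("java", ()), ("fastqc", ("FastQC/fastqc",))]:
    location = find_tool(tool, extra)
    print(f"{tool}: {location if location else 'NOT FOUND'}")
```

If a tool is reported as not found, re-running the corresponding install cell is the first thing to try.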


Now we will download the data from Fu et al. and save it as example.zip (that's the -O below). Then we will unzip it.

In [7]:
!wget 'https://figshare.com/ndownloader/articles/3219673?private_link=f5d63d8c265a05618137' -O example.zip

--2025-03-31 19:53:15--  https://figshare.com/ndownloader/articles/3219673?private_link=f5d63d8c265a05618137
Resolving figshare.com (figshare.com)... 52.30.109.106, 52.17.159.36, 52.49.122.173, ...
Connecting to figshare.com (figshare.com)|52.30.109.106|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 412386843 (393M) [application/zip]
Saving to: ‘example.zip’


2025-03-31 19:53:32 (24.3 MB/s) - ‘example.zip’ saved [412386843/412386843]



In [8]:
!unzip example.zip

Archive:  example.zip
 extracting: SRR1552444.fastq.gz     
 extracting: SRR1552445.fastq.gz     
 extracting: SRR1552446.fastq.gz     
 extracting: SRR1552447.fastq.gz     
 extracting: SRR1552448.fastq.gz     
 extracting: SRR1552449.fastq.gz     
 extracting: SRR1552450.fastq.gz     
 extracting: SRR1552451.fastq.gz     
 extracting: SRR1552452.fastq.gz     
 extracting: SRR1552453.fastq.gz     
 extracting: SRR1552454.fastq.gz     
 extracting: SRR1552455.fastq.gz     
 extracting: targets2.txt            
 extracting: chr1_mm10.00.b.array    
 extracting: chr1_mm10.00.b.tab      
 extracting: chr1_mm10.files         
 extracting: chr1_mm10.reads         


Now that it is unzipped, let's take a look at one file by running the code below. What do you observe? Please take a moment to identify the four lines that make up a single sequencing read.

In [9]:
!zcat SRR1552446.fastq.gz

@SRR1552446.1 DCV4KXP1:223:C2CTUACXX:1:1101:1628:2240 length=100
TGGTGATTGCTTCTCTGGAAAACTATGCTCCTCTGTCAAGAATCGGCTTTTTCGTTAAAGCAGGCAGTAGATNTGANGACNCCCACAACTTTGGAACCCC
+SRR1552446.1 DCV4KXP1:223:C2CTUACXX:1:1101:1628:2240 length=100
@@@BDFFFHGHHGIGIJJIJGIIJJGIJIIGIJJIIJJIIJJJJJJJJJ3?<*7=FBBF48@3C;;@=).?E############################
@SRR1552446.2 DCV4KXP1:223:C2CTUACXX:1:1101:1875:2224 length=100
CTCATCCTTCTCCACACACAGACACCAAGTCTGGGAAAGGCAGCCACCTGCTCTGACCCAATGGCCAGCACNNGCNNNTTNAAGCACCGAGGGCCCAGTG
+SRR1552446.2 DCV4KXP1:223:C2CTUACXX:1:1101:1875:2224 length=100
CCCFFFFFHHHHHJJIJJJJJJJJJJJJJIIJJJIIJJJJJJHIJJJJJJJIJJJIJJJIIJJHHHHFFFF##,5###,,#,,<BDDDBDDBDDDDDDDD
@SRR1552446.3 DCV4KXP1:223:C2CTUACXX:1:1101:1785:2248 length=100
GTTAAAGTCTGCACTGACATTTCCTTGTCTGCCGTTGCATGCCGTTGGCATGCAAGGTGTTAATGACCTGCAACATGGTGGAGTGCCCTGAACCCTAACT
+SRR1552446.3 DCV4KXP1:223:C2CTUACXX:1:1101:1785:2248 length=100
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJFFHIIJJJJJJIHHHHFFCFFEEDDDCBDDDDDDDDDDDDDDD
@SRR
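
The four-line structure can also be read programmatically. The sketch below groups lines in fours rather than searching for `@`, because a quality line may itself start with `@` (as in the first read above). It uses a small made-up FASTQ written to a temporary gzip file, and it assumes Phred+33 quality encoding; the function names are illustrative only.

```python
import gzip
import os
import tempfile
from itertools import islice

def read_fastq(handle):
    """Yield (header, sequence, separator, quality) for each 4-line record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)

def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]

# Toy data, made up for illustration, written to a temporary gzip file.
toy = "@read1\nACGTN\n+\nIIII#\n@read2\nGGCC\n+\n@@FF\n"
path = os.path.join(tempfile.mkdtemp(), "toy.fastq.gz")
with gzip.open(path, "wt") as out:
    out.write(toy)

# Open in text mode ("rt") so lines come back as str, not bytes.
with gzip.open(path, "rt") as handle:
    records = list(read_fastq(handle))

for header, sequence, _separator, quality in records:
    print(header, sequence, phred_scores(quality))
```

To try it on the real data, replace `path` with one of the downloaded files, for example `SRR1552446.fastq.gz`.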

Now, let's run FastQC on a single file!

In [16]:
!./FastQC/fastqc SRR1552444.fastq.gz

Started analysis of SRR1552444.fastq.gz
Approx 100% complete for SRR1552444.fastq.gz
Analysis complete for SRR1552444.fastq.gz
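
Besides the HTML report, FastQC writes a zip archive that contains a tab-separated `summary.txt` with one PASS/WARN/FAIL line per module. The sketch below reads that file without unpacking the archive. The toy archive and its statuses are made up purely to exercise the function; they are not results from this dataset.

```python
import os
import tempfile
import zipfile

def read_fastqc_summary(zip_path):
    """Return {module name: status} from a FastQC result archive."""
    statuses = {}
    with zipfile.ZipFile(zip_path) as archive:
        # summary.txt lives in a folder named after the input file.
        name = next(n for n in archive.namelist() if n.endswith("summary.txt"))
        text = archive.read(name).decode("utf-8")
    for line in text.splitlines():
        if line.strip():
            # Columns: status, module name, input file name.
            status, module, _source = line.split("\t", 2)
            statuses[module] = status
    return statuses

# Toy archive with made-up statuses, only to show the file layout.
toy_zip = os.path.join(tempfile.mkdtemp(), "toy_fastqc.zip")
with zipfile.ZipFile(toy_zip, "w") as archive:
    archive.writestr(
        "toy_fastqc/summary.txt",
        "PASS\tBasic Statistics\ttoy.fastq.gz\n"
        "FAIL\tPer base sequence content\ttoy.fastq.gz\n",
    )

for module, status in read_fastqc_summary(toy_zip).items():
    print(f"{status:5s} {module}")
```

On the real output, the call would be `read_fastqc_summary("SRR1552444_fastqc.zip")`.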


Based on what we covered in the lecture portion, what do you think of the data?
#### Is this overall good or bad quality?

In [None]:
#Great quality score
#Something is wrong with the beginning of each read?

Before we continue with the analysis, let's rename the files.
SRRXXXXX is the accession number in a public repository; this is how 'omic data are shared once generated. Let's rename the files based on the experimental design presented above.

|ACCESSION ID | Sample Name |
|---|---|
|SRR1552444   |  Sample_luminalvirgin_01 |
|SRR1552445   |  Sample_luminalvirgin_02 |
|SRR1552446   |  Sample_luminalpregnant_01 |
|SRR1552447   |  Sample_luminalpregnant_02 |
|SRR1552448   |  Sample_luminallactate_01 |
|SRR1552449   |  Sample_luminallactate_02 |
|SRR1552450   |  Sample_basalvirgin_01 |
|SRR1552451   |  Sample_basalvirgin_02 |
|SRR1552452   |  Sample_basalpregnant_01 |
|SRR1552453   |  Sample_basalpregnant_02 |
|SRR1552454   |  Sample_basallactate_01 |
|SRR1552455   |  Sample_basallactate_02 |
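
The table can be turned into a lookup so that all files are renamed consistently. This is a minimal sketch that follows the table above; the function name `rename_fastqs` and the `dry_run` switch are illustrative, and the demo runs on empty placeholder files in a temporary directory rather than on the real data.

```python
import tempfile
from pathlib import Path

# Accession -> sample name, copied from the table above.
SAMPLE_NAMES = {
    "SRR1552444": "Sample_luminalvirgin_01",
    "SRR1552445": "Sample_luminalvirgin_02",
    "SRR1552446": "Sample_luminalpregnant_01",
    "SRR1552447": "Sample_luminalpregnant_02",
    "SRR1552448": "Sample_luminallactate_01",
    "SRR1552449": "Sample_luminallactate_02",
    "SRR1552450": "Sample_basalvirgin_01",
    "SRR1552451": "Sample_basalvirgin_02",
    "SRR1552452": "Sample_basalpregnant_01",
    "SRR1552453": "Sample_basalpregnant_02",
    "SRR1552454": "Sample_basallactate_01",
    "SRR1552455": "Sample_basallactate_02",
}

def rename_fastqs(directory, mapping, dry_run=True):
    """Rename <accession>.fastq.gz to <sample>.fastq.gz and return the moves.

    With dry_run=True nothing is changed; the planned moves are only listed.
    Existing target files are never overwritten.
    """
    moves = []
    for accession, sample in mapping.items():
        source = Path(directory) / f"{accession}.fastq.gz"
        target = Path(directory) / f"{sample}.fastq.gz"
        if source.exists() and not target.exists():
            moves.append((source.name, target.name))
            if not dry_run:
                source.rename(target)
    return moves

# Demo on empty placeholder files in a temporary directory.
workdir = Path(tempfile.mkdtemp())
for accession in ("SRR1552444", "SRR1552450"):
    (workdir / f"{accession}.fastq.gz").touch()

for old, new in rename_fastqs(workdir, SAMPLE_NAMES, dry_run=True):
    print(f"{old} -> {new}")
```

Running with `dry_run=True` first makes it easy to check the planned names before any file is moved.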



In [18]:
!mv SRR1552446.fastq.gz Sample_luminalvirgin_02.fastq.gz

Now, based on what we observed in our FastQC outputs, let's start doing some QC on our actual data. We will be using cutadapt, which is a popular tool but by no means the only one.

In [19]:
!pip install cutadapt




In [20]:
!cutadapt --version

5.0


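Let's run cutadapt on a sample, asking it to remove the first 15 bases of each read (--cut 15) and to trim bases with quality scores below 28 from the 3' end (-q 28). Note that -q trims low-quality ends; it does not discard whole reads.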

In [21]:
!cutadapt --cut 15 -q 28  -o Trimmed_luminalvirgin_02.fastq Sample_luminalvirgin_02.fastq.gz

This is cutadapt 5.0 with Python 3.11.11
Command line parameters: --cut 15 -q 28 -o Trimmed_luminalvirgin_02.fastq Sample_luminalvirgin_02.fastq.gz
Processing single-end reads on 1 core ...
Done           00:00:00         1,000 reads @   8.6 µs/read;   7.01 M reads/minute

=== Summary ===

Total reads processed:                   1,000
Reads written (passing filters):         1,000 (100.0%)

Total basepairs processed:       100,000 bp
Quality-trimmed:                   3,555 bp (3.6%)
Total written (filtered):         81,445 bp (81.4%)
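
To make the two options concrete, here is a small sketch of what they do to a single read. It is my reading of cutadapt's documented behaviour (fixed-length removal first, then a BWA-style running-sum trim of the 3' end), not cutadapt's own code, and the toy read is made up. The last lines check the bookkeeping in the summary above.

```python
def quality_trim_index(qualities, cutoff):
    """Return the position at which to cut the 3' end (running-sum method).

    Walking from the 3' end, accumulate (cutoff - quality) and cut where
    that sum is largest; stop as soon as the sum becomes negative.
    """
    running = 0
    best = 0
    stop = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best = running
            stop = i
    return stop

def trim_read(sequence, quality, cut=15, cutoff=28, offset=33):
    """Remove the first `cut` bases, then quality-trim the 3' end."""
    sequence, quality = sequence[cut:], quality[cut:]
    scores = [ord(char) - offset for char in quality]
    stop = quality_trim_index(scores, cutoff)
    return sequence[:stop], quality[:stop]

# Toy read (made up): 15 leading bases, then 7 bases with falling quality.
toy_sequence = "N" * 15 + "ACGTACG"
toy_quality = "I" * 15 + "III?5##"
print(trim_read(toy_sequence, toy_quality))  # ('ACGT', 'III?')

# Bookkeeping for the summary above: 1,000 reads of 100 bp each.
total_bp = 100_000
cut_bp = 15 * 1_000
quality_trimmed_bp = 3_555
print(total_bp - cut_bp - quality_trimmed_bp)  # 81445
```

The final number matches the "Total written (filtered)" line: every read lost 15 bases from its start, and a further 3,555 bases were removed by quality trimming.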


Let's run FastQC on the trimmed file and download the report.
What do you observe?
Remember to check the actual file name to download!

In [22]:
!./FastQC/fastqc Trimmed_luminalvirgin_02.fastq

Started analysis of Trimmed_luminalvirgin_02.fastq
Approx 100% complete for Trimmed_luminalvirgin_02.fastq
Analysis complete for Trimmed_luminalvirgin_02.fastq


In [None]:
#Great per-base quality once again
#Error is seen in seq length distribution

### 2. Read Mapping
We will be using hisat2. Remember, there are many options! We use it here as a case example; for now I just want you to be aware that it is not the only aligner available.

First, we will install hisat2

In [23]:
!apt-get install hisat2

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  bcftools libhts3 libhtscodecs2 python3-hisat2 samtools
Suggested packages:
  python3-numpy python3-matplotlib texlive-latex-recommended cwltool
The following NEW packages will be installed:
  bcftools hisat2 libhts3 libhtscodecs2 python3-hisat2 samtools
0 upgraded, 6 newly installed, 0 to remove and 29 not upgraded.
Need to get 5,505 kB of archives.
After this operation, 17.1 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libhtscodecs2 amd64 1.1.1-3 [53.2 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libhts3 amd64 1.13+ds-2build1 [390 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 bcftools amd64 1.13-1 [697 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy/universe amd64 hisat2 amd64 2.2.1-3 [3,832 kB]
Get:5 http://archive.ubuntu.com/ubuntu jammy/un

We are going to use the "align against a reference genome" approach; from last week's lecture you should remember that there were multiple possibilities. To do that we have to:
1) Download the genome (here, a prebuilt hisat2 index for mm10), 2) run hisat2 with our data

In [24]:
!wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/mm10.tar.gz

--2025-03-31 20:13:09--  ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/mm10.tar.gz
           => ‘mm10.tar.gz’
Resolving ftp.ccb.jhu.edu (ftp.ccb.jhu.edu)... 128.220.174.63
Connecting to ftp.ccb.jhu.edu (ftp.ccb.jhu.edu)|128.220.174.63|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/infphilo/hisat2/data ... done.
==> SIZE mm10.tar.gz ... 3804366597
==> PASV ... done.    ==> RETR mm10.tar.gz ... done.
Length: 3804366597 (3.5G) (unauthoritative)


2025-03-31 20:13:56 (77.8 MB/s) - ‘mm10.tar.gz’ saved [3804366597]



In [25]:
!tar -xzf mm10.tar.gz



In [26]:
!ls mm10

genome.1.ht2  genome.3.ht2  genome.5.ht2  genome.7.ht2	make_mm10.sh
genome.2.ht2  genome.4.ht2  genome.6.ht2  genome.8.ht2


In [27]:
!hisat2 -x mm10/genome -U SRR1552444.fastq.gz -S Aln_luminalvirgin_02.sam

1000 reads; of these:
  1000 (100.00%) were unpaired; of these:
    47 (4.70%) aligned 0 times
    851 (85.10%) aligned exactly 1 time
    102 (10.20%) aligned >1 times
95.30% overall alignment rate


In [28]:
!hisat2 -x mm10/genome -U SRR1552453.fastq.gz -S Aln_luminalvirgin_03.sam


1000 reads; of these:
  1000 (100.00%) were unpaired; of these:
    35 (3.50%) aligned 0 times
    886 (88.60%) aligned exactly 1 time
    79 (7.90%) aligned >1 times
96.50% overall alignment rate
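
The overall alignment rate is simply the share of reads that aligned at least once. The sketch below recomputes it from the first summary above; the parsing assumes the single-end (unpaired) summary layout shown here and would need changes for paired-end runs.

```python
import re

# First hisat2 summary from above, copied verbatim.
SUMMARY = """1000 reads; of these:
  1000 (100.00%) were unpaired; of these:
    47 (4.70%) aligned 0 times
    851 (85.10%) aligned exactly 1 time
    102 (10.20%) aligned >1 times
95.30% overall alignment rate
"""

def alignment_counts(summary):
    """Extract read counts from a single-end hisat2 summary."""
    def grab(pattern):
        return int(re.search(pattern, summary, re.MULTILINE).group(1))
    return {
        "total": grab(r"^(\d+) reads; of these:"),
        "unaligned": grab(r"(\d+) \([\d.]+%\) aligned 0 times"),
        "once": grab(r"(\d+) \([\d.]+%\) aligned exactly 1 time"),
        "multi": grab(r"(\d+) \([\d.]+%\) aligned >1 times"),
    }

counts = alignment_counts(SUMMARY)
rate = 100.0 * (counts["once"] + counts["multi"]) / counts["total"]
print(f"{rate:.2f}% overall alignment rate")  # 95.30% overall alignment rate
```

The same arithmetic applied to the second summary gives (886 + 79) / 1000 = 96.50%.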


Let's take a look at the alignment files! Take a moment to identify the read name, the sequence (chromosome) it aligns to, and the alignment shorthand string (the CIGAR string).

In [29]:
!samtools view Aln_luminalvirgin_02.sam


SRR1552444.1	0	chr13	17895420	60	100M	*	0	0	CTGCCCTCAGCTATCTTCTCATGCTGCAAGTCTGACTCCACCGTCCTAGGTGTAGGAGCTGTCTCCATGGANNGGTNACANGTACATACAGTCTACAGCC	CCCFFFFFHHHHHJJJJJJJJJJJJIJJJJJJIJIJJJJJJJJIJJJJJJHHHIIJJIJJJJJJJJJJJJG##-5;#,5=#,;?BDEEDDDEDDCDDDDD	AS:i:-4	XN:i:0	XM:i:4	XO:i:0	XG:i:0	NM:i:4	MD:Z:71T0G3C3T19	YT:Z:UU	NH:i:1
SRR1552444.2	0	chr7	116509593	60	33M12211N67M	*	0	0	AATAAAAAAGATAAAACCTTGGCCTGTCTGAAGATGAGGTGGAGGATCATCCAAGTACAGTACTGTTTTCTCTTGGTTCCGTGCATGCTGACCGCTCTGG	@@<DDD?DH?DHFIG<EEGHGHE@FHDFHIDCDDHGIII3?B?FGGICCBHCDFI=FGGHGHE@EHIDHEH7??@C;B;;A@BBBEC>3:>?A:=<BBA:	AS:i:-1	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:100	YT:Z:UU	XS:A:+	NH:i:1
SRR1552444.3	0	chr6	125114298	60	100M	*	0	0	CAACAAGGAGGGAGAAGACAGCAGTGTTATCCACTATGACGATAAGGCCATTGAACGACTGCTGGATCGAANNCANNNTGNGACTGAAGACACAGAATTG	CCCFFFFFGHHHGJJJJJJJJJJJIIIJIJJJJJJJJIJJJIJJJJJIJIIJIJJJIJJHHHHFFFFFDDE##,,###,,#,5?BDDDDCDDDDDDDDDC	AS:i:-6	XN:i:0	XM:i:6	XO:i:0	XG:i:0	NM:i:6	MD:Z:71A0C2G0G0A2A19	YT:Z:UU	NH:i:1
SRR1552444.4	4	*	0	0	*	*	0
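
The fields can be picked apart with a few lines of Python. In this sketch the column names follow the SAM specification, and the example line reuses the first nine fields of read `SRR1552444.2` from above with `*` placeholders instead of the sequence and quality strings. In the CIGAR string `33M12211N67M`, `M` is an aligned block and `N` a skipped region on the reference, which for RNA-seq usually means an intron.

```python
import re

SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

def parse_sam_line(line):
    """Split one SAM alignment line into named mandatory fields plus tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = fields[11:]
    return record

def parse_cigar(cigar):
    """Turn '33M12211N67M' into [(33, 'M'), (12211, 'N'), (67, 'M')]."""
    if cigar == "*":
        return []
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def reference_span(cigar):
    """Number of reference bases covered (operations M, D, N, = and X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")

def is_unmapped(flag):
    """Bit 0x4 of the FLAG field marks an unmapped read."""
    return bool(flag & 4)

# Fields 1-9 from read SRR1552444.2 above; SEQ and QUAL replaced by '*'.
line = "SRR1552444.2\t0\tchr7\t116509593\t60\t33M12211N67M\t*\t0\t0\t*\t*\tNH:i:1"
record = parse_sam_line(line)
print(record["QNAME"], record["RNAME"], record["POS"])
print(parse_cigar(record["CIGAR"]))
print(reference_span(record["CIGAR"]))  # 12311
print(is_unmapped(record["FLAG"]), is_unmapped(4))  # False True
```

The last read shown above has FLAG 4, so `is_unmapped` returns `True` for it, consistent with its `*` reference name.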

Interestingly, we can also run FastQC on a SAM/BAM file. Take a moment to run it and download the report as we have done before! Hint: here is the first line of code. What do you observe?

In [30]:
!./FastQC/fastqc  Aln_luminalvirgin_02.sam

Started analysis of Aln_luminalvirgin_02.sam
Approx 100% complete for Aln_luminalvirgin_02.sam
Analysis complete for Aln_luminalvirgin_02.sam


In [31]:
!ls

Aln_luminalvirgin_02_fastqc.html  SRR1552444_fastqc.zip
Aln_luminalvirgin_02_fastqc.zip   SRR1552444.fastq.gz
Aln_luminalvirgin_02.sam	  SRR1552447.fastq.gz
Aln_luminalvirgin_03.sam	  SRR1552448.fastq.gz
chr1_mm10.00.b.array		  SRR1552449.fastq.gz
chr1_mm10.00.b.tab		  SRR1552450.fastq.gz
chr1_mm10.files			  SRR1552451.fastq.gz
chr1_mm10.reads			  SRR1552452.fastq.gz
example.zip			  SRR1552453.fastq.gz
FastQC				  SRR1552454.fastq.gz
fastqc_v0.11.9.zip		  SRR1552455.fastq.gz
mm10				  targets2.txt
mm10.tar.gz			  Trimmed_luminalvirgin_02.fastq
sample_data			  Trimmed_luminalvirgin_02_fastqc.html
Sample_luminalvirgin_02.fastq.gz  Trimmed_luminalvirgin_02_fastqc.zip
SRR1552444_fastqc.html


In [None]:
#Per base seq is off at the beginning position of each read

### 3. Feature Counts
We will now be using subread (https://subread.sourceforge.net/), a suite of software programs for processing next-gen sequencing read data. Importantly for our case, its featureCounts program can count and summarize reads per feature really fast.

In [32]:
!apt-get install subread

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  subread
0 upgraded, 1 newly installed, 0 to remove and 29 not upgraded.
Need to get 565 kB of archives.
After this operation, 1,928 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 subread amd64 2.0.3+dfsg-1 [565 kB]
Fetched 565 kB in 1s (574 kB/s)
Selecting previously unselected package subread.
(Reading database ... 126572 files and directories currently installed.)
Preparing to unpack .../subread_2.0.3+dfsg-1_amd64.deb ...
Unpacking subread (2.0.3+dfsg-1) ...
Setting up subread (2.0.3+dfsg-1) ...
Processing triggers for man-db (2.10.2-1) ...


In [33]:
!wget https://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/Mus_musculus.GRCm38.102.gtf.gz

--2025-03-31 20:18:30--  https://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/Mus_musculus.GRCm38.102.gtf.gz
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.169
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.169|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33443321 (32M) [application/x-gzip]
Saving to: ‘Mus_musculus.GRCm38.102.gtf.gz’


2025-03-31 20:18:33 (13.6 MB/s) - ‘Mus_musculus.GRCm38.102.gtf.gz’ saved [33443321/33443321]



The reference genome index does not include coordinates for each gene, so we need an annotation in GTF (Gene Transfer Format, a close relative of the General Feature Format, GFF). You can learn more about GTF files from your favorite website (https://useast.ensembl.org/info/website/upload/gff.html)

In [34]:
!gunzip Mus_musculus.GRCm38.102.gtf.gz

In [35]:
!head -n 100 Mus_musculus.GRCm38.102.gtf

#!genome-build GRCm38.p6
#!genome-version GRCm38
#!genome-date 2012-01
#!genome-build-accession NCBI:GCA_000001635.8
#!genebuild-last-updated 2020-02
1	havana	gene	3073253	3074322	.	+	.	gene_id "ENSMUSG00000102693"; gene_version "1"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"; havana_gene "OTTMUSG00000049935"; havana_gene_version "1";
1	havana	transcript	3073253	3074322	.	+	.	gene_id "ENSMUSG00000102693"; gene_version "1"; transcript_id "ENSMUST00000193812"; transcript_version "1"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"; havana_gene "OTTMUSG00000049935"; havana_gene_version "1"; transcript_name "4933401J01Rik-201"; transcript_source "havana"; transcript_biotype "TEC"; havana_transcript "OTTMUST00000127109"; havana_transcript_version "1"; tag "basic"; transcript_support_level "NA";
1	havana	exon	3073253	3074322	.	+	.	gene_id "ENSMUSG00000102693"; gene_version "1"; transcript_id "ENSMUST00000193812"; transcript_version "1"; exon_numbe
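
Each GTF line has nine tab-separated columns, with the last one holding `key "value";` pairs. The sketch below parses the first `gene` line shown above. Collapsing the attributes into a plain dictionary is a simplification: keys that can repeat on transcript lines (such as `tag`) would keep only their last value.

```python
import re

GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]

def parse_gtf_line(line):
    """Parse one GTF feature line into a dict with typed coordinates."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Attributes look like: gene_id "ENSMUSG..."; gene_name "..."; ...
    record["attributes"] = dict(re.findall(r'(\S+) "([^"]*)";', fields[8]))
    return record

# The first gene line from the output above.
line = (
    '1\thavana\tgene\t3073253\t3074322\t.\t+\t.\t'
    'gene_id "ENSMUSG00000102693"; gene_version "1"; '
    'gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"; '
    'havana_gene "OTTMUSG00000049935"; havana_gene_version "1";'
)
gene = parse_gtf_line(line)
print(gene["seqname"], gene["feature"], gene["attributes"]["gene_id"])
# GTF coordinates are 1-based and inclusive, hence the +1.
print(gene["end"] - gene["start"] + 1)  # 1070
```

Note that the chromosome here is written as `1`, whereas the alignments above use names such as `chr13`.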

We can finally count features! But before going there, take a moment to run cutadapt and hisat2 on two more samples.
HINT: We are repeating the code boxes that start with "!cutadapt --cut" and "!hisat2 -x ". Do it for at least two more samples; remember to change the names and track them as trimmed_XXXXX and aln_XXXXX
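
One way to keep the names straight is to generate the commands from the sample name. The sketch below only builds and prints the commands, using the same options as the cells above. It assumes the input files have already been renamed to `Sample_<name>.fastq.gz` and that the trimmed file (rather than the raw one) is passed to hisat2; both are assumptions, and the function name is illustrative.

```python
import shlex

def build_commands(sample, index="mm10/genome"):
    """Return the cutadapt and hisat2 argument lists for one sample."""
    raw = f"Sample_{sample}.fastq.gz"
    trimmed = f"Trimmed_{sample}.fastq"
    aligned = f"Aln_{sample}.sam"
    cutadapt = ["cutadapt", "--cut", "15", "-q", "28", "-o", trimmed, raw]
    hisat2 = ["hisat2", "-x", index, "-U", trimmed, "-S", aligned]
    return cutadapt, hisat2

# Sample names taken from the renaming table; pick whichever you like.
for sample in ["luminalpregnant_01", "luminallactate_01"]:
    for command in build_commands(sample):
        print(shlex.join(command))
```

The printed lines can be pasted into a cell after a `!`, or each argument list can be passed to `subprocess.run(command, check=True)`.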

Hopefully after this you should have three more files. Run featureCounts on them as follows (NOTE: the code block below might not work unless

In [36]:
!featureCounts -a Mus_musculus.GRCm38.102.gtf -o counts.txt  *.sam


        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
	  v2.0.3

||                                                                            ||
||             Input files : 2 SAM files                                      ||
||                                                                            ||
||                           Aln_luminalvirgin_02.sam                         ||
||                           Aln_luminalvirgin_03.sam                         ||
||                                                                            ||
||             Output file : counts.txt                                       
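
One thing worth checking when counting is whether the alignments and the annotation name chromosomes the same way: the SAM records above use names like `chr13`, while the Ensembl GTF uses `1`. If the names do not match, reads may not be assigned to any gene. The sketch below compares the two naming schemes on toy lines (only partly taken from the outputs above) and builds comma-separated alias lines. As far as I know, featureCounts accepts such a file through its `-A` option (annotation name first, read name second); please confirm with `featureCounts` help for your version.

```python
def gtf_chromosomes(gtf_lines):
    """Collect sequence names from column 1 of GTF lines (comments skipped)."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }

def sam_chromosomes(sam_lines):
    """Collect reference names from column 3 of SAM alignment lines."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] != "*":
            names.add(fields[2])
    return names

def alias_lines(gtf_names, sam_names, prefix="chr"):
    """Pair GTF names with SAM names that differ only by a prefix."""
    return [
        f"{name},{prefix}{name}"
        for name in sorted(gtf_names)
        if prefix + name in sam_names
    ]

# Toy lines for illustration; the second GTF line is made up.
toy_gtf = [
    "#!genome-build GRCm38.p6\n",
    '1\thavana\tgene\t3073253\t3074322\t.\t+\t.\tgene_id "ENSMUSG00000102693";\n',
    '13\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id "TOY_GENE";\n',
]
toy_sam = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "SRR1552444.1\t0\tchr13\t17895420\t60\t100M\t*\t0\t0\t*\t*\n",
    "SRR1552444.4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n",
]

gtf_names = gtf_chromosomes(toy_gtf)
sam_names = sam_chromosomes(toy_sam)
print("shared names:", sorted(gtf_names & sam_names))
print("alias lines:", alias_lines(gtf_names, sam_names))
```

A simple prefix rule does not cover every case; the mitochondrial chromosome, for instance, is typically `MT` in Ensembl files and `chrM` in UCSC-style indexes, so it would need its own alias line.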

In [37]:
!head -n 50 counts.txt

# Program:featureCounts v2.0.3; Command:"featureCounts" "-a" "Mus_musculus.GRCm38.102.gtf" "-o" "counts.txt" "Aln_luminalvirgin_02.sam" "Aln_luminalvirgin_03.sam" 
Geneid	Chr	Start	End	Strand	Length	Aln_luminalvirgin_02.sam	Aln_luminalvirgin_03.sam
ENSMUSG00000102693	1	3073253	3074322	+	1070	0	0
ENSMUSG00000064842	1	3102016	3102125	+	110	0	0
ENSMUSG00000051951	1;1;1;1;1;1;1	3205901;3206523;3213439;3213609;3214482;3421702;3670552	3207317;3207317;3215632;3216344;3216968;3421901;3671498	-;-;-;-;-;-;-	6094	0	0
ENSMUSG00000102851	1	3252757	3253236	+	480	0	0
ENSMUSG00000103377	1	3365731	3368549	-	2819	0	0
ENSMUSG00000104017	1	3375556	3377788	-	2233	0	0
ENSMUSG00000103025	1	3464977	3467285	-	2309	0	0
ENSMUSG00000089699	1;1	3466587;3513405	3466687;3513553	+;+	250	0	0
ENSMUSG00000103201	1	3512451	3514507	-	2057	0	0
ENSMUSG00000103147	1	3531795	3532720	+	926	0	0
ENSMUSG00000103161	1	3592892	3595903	-	3012	0	0
ENSMUSG00000102331	1;1	3647309;3658847	3650509;3658904	-;-	3259	0	0
ENSMUSG00000102348	
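
The counts table starts with a comment line, followed by a header and one row per gene: six annotation columns and then one count column per SAM file. The sketch below reads that layout with the standard library only. The toy table reuses the header and first gene row from above and adds one made-up row (`TOY_GENE`) so that the totals are not all zero.

```python
import csv
import io

ANNOTATION_COLUMNS = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]

def read_counts(handle):
    """Read a featureCounts table; return (sample names, list of gene rows)."""
    # Skip the leading '# Program:...' comment line.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    samples = [name for name in reader.fieldnames if name not in ANNOTATION_COLUMNS]
    table = []
    for row in reader:
        counts = {sample: int(row[sample]) for sample in samples}
        table.append({
            "Geneid": row["Geneid"],
            "Length": int(row["Length"]),
            "counts": counts,
            "total": sum(counts.values()),
        })
    return samples, table

# Toy table: real header and first row from above, plus one made-up gene.
TOY_COUNTS = (
    "# Program:featureCounts (comment line, skipped)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\t"
    "Aln_luminalvirgin_02.sam\tAln_luminalvirgin_03.sam\n"
    "ENSMUSG00000102693\t1\t3073253\t3074322\t+\t1070\t0\t0\n"
    "TOY_GENE\t1\t100\t599\t+\t500\t3\t5\n"
)

samples, table = read_counts(io.StringIO(TOY_COUNTS))
print(samples)
for gene in table:
    print(gene["Geneid"], gene["Length"], gene["total"])
```

For the real file, pass `open("counts.txt")` instead of the `io.StringIO` object.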

### 4. Now let's visualize our data!

Download your counts table and load it into R. Answer the following questions in an R Markdown document. I want to see your code!

1.- Use an online resource to look up the function mutate and use it to sum all the counts per row. Which gene has the most counts across all samples?

2.- What is the length of the longest gene and of the shortest gene?

3.- Use ggplot() to plot a histogram of counts across all samples (your new column with the sum).

4.- Use heatmap to visualize gene expression (each row represents a gene, each column represents a sample, and the color represents the effect size).
