# RStudio 

The final steps of our analysis will be performed on [RStudio Cloud](https://rstudio.cloud).  
RStudio is an integrated development environment (IDE) for R, a programming language widely used for statistics and bioinformatics. This cloud instance lets you use R and RStudio without installing either program on your own machine. 

You should already be logged in with your credentials, so let's start. 

Go to our [NGS_Data_Practice](https://rstudio.cloud/spaces/8310/projects) workspace on RStudio Cloud. You should see a project called **VCF_Annotation**; click on it to open RStudio.  

When the project is loaded, you will see your RStudio interface. This is where you will work with R commands. 

___ 

## 1. Functional annotation

The functional annotation of the mitochondrial variants we found will be performed using data hosted on [HmtVar](https://www.hmtvar.uniba.it), a database of human mitochondrial variations.  
For each variant we found, we will launch a query on HmtVar, retrieve all the relevant information and update our VCF file with these new data. 

First of all, we need to upload to RStudio the VCF file we created with our Galaxy workflow.  
In the **Files** pane, click on **Upload**, then select the downloaded VCF file and click on **OK**. 

![](data/imgs/rstudio_2.jpg)

You should now see your VCF file listed in the **Files** pane.  
There's another file listed here, named **annotate_vcf.R**. This is an R script file, which contains functions and operations that will retrieve data from HmtVar and parse them to annotate our mitochondrial variants. 

![](data/imgs/rstudio_3.jpg)

If you want, you can open this script by clicking on it; a new pane will open (the **Source** pane), showing the content of the file.  
You do not need to understand what is written in this script; you only need to know that it gathers the variants from an input VCF file, looks up their entries in HmtVar and retrieves some specific information, which is then added to the data from the original VCF file. 
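To give an idea of what "gathering variants from a VCF file" can look like, here is a minimal base R sketch. It is not the code in **annotate_vcf.R**: the helper name `read_vcf_variants` is hypothetical, and the VCF lines are toy values invented for illustration.

```r
# Toy VCF content, invented purely for illustration (not real results).
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chrM\t100\t.\tA\tG\t.\tPASS\t.",
  "chrM\t200\t.\tT\tC\t.\tPASS\t."
)

# Hypothetical helper: turn the lines of a VCF file into a data frame of variants.
read_vcf_variants <- function(lines) {
  # Drop the "##" meta-information lines; keep the "#CHROM" header and the records.
  body <- lines[!startsWith(lines, "##")]
  # Remove the leading "#" so the header line becomes regular column names.
  body[1] <- sub("^#", "", body[1])
  # Read everything as text, so that an allele such as "T" is not parsed as TRUE.
  variants <- read.delim(text = body, colClasses = "character")
  variants$POS <- as.integer(variants$POS)
  variants
}

variants <- read_vcf_variants(vcf_lines)
print(variants[, c("CHROM", "POS", "REF", "ALT")])
```

With a real file, the lines could be obtained with `readLines()` on the name of the VCF file you uploaded.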

To run the **annotate_vcf.R** script and perform the functional annotation on our VCF file, in the **Console** pane click on the **Terminal** tab. Here you can type commands (not R-specific) and get responses from the machine. 
Type `Rscript annotate_vcf.R <VCF_file_name>`, where `<VCF_file_name>` is the name of the VCF file you downloaded from Galaxy and uploaded to RStudio. In this example, the command would be as follows: 

![](data/imgs/rstudio_4.jpg)

When you hit Return, the script will be launched and the functional annotation will be performed.  
After a while, you should see a new file called **annotated_sample.csv** in your **Files** pane. To open it, click on the file and then on **Import Dataset...**; in the window that opens, click on **Update** in the upper right corner. Finally, click on **Import** to load this dataset into R.  
The file will also be visible in the **Source** pane.  

![](data/imgs/rstudio_5.jpg)

This file contains a few more columns than the original VCF:  

- `pathogenicity`, which contains variant pathogenicity information offered by HmtVar 
- `locus`, which reports the mitochondrial locus in which the variant is located 
- `aa_change`, with the amino acid change caused by the variant (where applicable) 
- `dbSNP`, which reports the [dbSNP](https://www.ncbi.nlm.nih.gov/projects/SNP/) ID related to the variant 
- `clinvar`, with the disease associated with the variant 
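
Once the dataset is loaded, these columns can be explored from the **Console**. The sketch below uses a small toy data frame in place of the real file, and every value in it is an invented placeholder. It also assumes that missing values appear as `NA`; in the real file they may be encoded differently (for example as empty strings).

```r
# Toy stand-in for annotated_sample.csv; every value is an invented placeholder.
# With the real file, one could try: annotated <- read.csv("annotated_sample.csv")
annotated <- data.frame(
  POS = c(100L, 200L, 300L),
  locus = c("locus_A", "locus_B", "locus_B"),
  aa_change = c(NA, "placeholder_change", NA),
  stringsAsFactors = FALSE
)

# Count how many variants fall in each locus.
per_locus <- table(annotated$locus)
print(per_locus)

# Keep only the variants for which an amino acid change is reported.
with_aa_change <- annotated[!is.na(annotated$aa_change), ]
print(with_aa_change)
```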

___ 

## 2. Variant visualization

To complete our analysis, we can visualize our variants on a representation of the human mitochondrial genome. We will use the [`mitovizR`](https://github.com/robertopreste/mitovizR) package, which needs to be downloaded and installed first. 

R packages can usually be installed from CRAN with the `install.packages()` function; however, packages that are still in development are often not available there and, when they are hosted on GitHub, can be installed using the `devtools::install_github()` function instead.  

Go back to the **Console** tab in the **Console** pane, and load the `devtools` package:  
```r 
library(devtools)
```

Now install `mitovizR`: 
```r 
install_github("robertopreste/mitovizR") 
```

You should see messages appearing in your console: they tell you that the `mitovizR` package is being downloaded and installed. When the orange messages have stopped appearing and the cursor is blinking again, you are ready to go. 

![](data/imgs/rstudio_6.jpg)
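
If you want to double-check that the installation succeeded, one option is to ask R whether the package can be found. This is a minimal sketch: `is_installed` is a hypothetical helper name, while `requireNamespace()` is a standard base R function.

```r
# Hypothetical helper: TRUE if the package is installed and its namespace can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("mitovizR")) {
  message("mitovizR is installed and ready to be loaded.")
} else {
  message("mitovizR was not found; check the console messages and try installing it again.")
}
```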