<a href="https://colab.research.google.com/github/hanielcedraz/endoGenes/blob/master/run_endoGenes_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# endoGenes - Reference Genes Analysis
This pipeline automates the analysis of reference gene stability using the three most widely used algorithms: geNorm, NormFinder, and BestKeeper.

# STEP.1 - Mounting your Google Drive:
Run the next cell. It generates a link: click it, choose your Google account, and click Allow. Copy the authorization code that appears, paste it where you see "Enter your authorization code:", and press Enter.


In [0]:
from google.colab import drive
drive.mount('/content/drive/')

# STEP.2 - Cloning the git repository:
Clone the whole repository into your Google Drive.


In [0]:
%%bash
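# Create a space-free shortcut to Google Drive (-sfn lets the cell be re-run without nesting a second link)
ln -sfn "/content/drive/My Drive" /content/MyDrive
cd /content/MyDrive

# Clone only on the first run; later runs reuse the existing copy
if ! [ -d endoGenes ]; then
  git clone https://github.com/hanielcedraz/endoGenes.git
fi
cd endoGenes
ls -hl
chmod +x endoGenes.R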





## First things first
Now that you have downloaded the repository into your Google Drive, upload your files to the "endoGenes" folder there.
<br>
<br>

Now that your own data is in place, let's take a look at your endogenous and efficiency files to check that everything is all right.

### Endogenous file


In [0]:
import pandas as pd
endogenous = "/content/MyDrive/endoGenes/endogenous_ct.txt" # default file; change the path if your dataset has a different name

data_set = pd.read_csv(endogenous, sep="\t")
data_set

### Efficiency list

In [0]:
efficiency = "/content/MyDrive/endoGenes/efficiencies_list.txt" # default file; change the path if your dataset has a different name
eff = pd.read_csv(efficiency, sep="\t")
eff
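
Looking at the tables by eye works for small files. For larger ones, a quick programmatic check may help. The sketch below assumes only that the files are tab-separated, as in the cells above. `summarize_tsv` is a hypothetical helper written for this notebook and not part of endoGenes, and the demo table is made up.

```python
import csv
import io

def summarize_tsv(handle):
    """Count data rows, empty cells, and rows whose width differs from the header."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # ignore completely blank lines
    empty_cells = sum(1 for row in rows for cell in row if cell.strip() == "")
    ragged_rows = sum(1 for row in rows if len(row) != len(header))
    return {
        "columns": header,
        "rows": len(rows),
        "empty_cells": empty_cells,
        "ragged_rows": ragged_rows,
    }

# Made-up table standing in for a real input file (one value is missing)
demo = io.StringIO("sample\tgeneA\tgeneB\ns1\t20.1\t18.4\ns2\t\t18.9\n")
summary = summarize_tsv(demo)
print(summary)
```

To run it on your own data, open the file first, for example `with open(endogenous, newline="") as fh: print(summarize_tsv(fh))`. Empty cells or ragged rows usually mean the file is worth a second look before running the pipeline.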

## Is everything all right?
If yes, you are ready to run the analysis.



# Analysing your dataset
Your dataset is now uploaded to /content/MyDrive/endoGenes. Run the endoGenes pipeline with the options that match your dataset, i.e. the names of your endogenous and efficiency files.

If your endogenous file is named exp_endogenous.txt and your efficiency file is named exp_efficiency.txt, then use:

    $ ./endoGenes.R -f exp_endogenous.txt -e exp_efficiency.txt

This generates a folder named 01-results. <br>
If you want a different name, use the option -o:

    $ ./endoGenes.R -f exp_endogenous.txt -e exp_efficiency.txt -o name_of_folder

Change the code in the next block to match your dataset, then run it.
 
If you chose a different name for your results folder, change it in the next block as well.

In [0]:
%%bash
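folder=01-results # change this if you want a different name for your results folder
cd /content/MyDrive/endoGenes

# No -f/-e given here, so the default input files are used; add them if your files have other names
./endoGenes.R -o "$folder"

# Move into the results folder and list what was generated
cd "$folder"
ls -hl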
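
If you prefer to launch the pipeline from a Python cell instead of `%%bash`, the command can be assembled as a list of arguments. This is only a sketch: `build_endogenes_command` is a hypothetical helper, and it relies solely on the `-f`, `-e`, and `-o` options described above. Options left out are simply not passed, so the script falls back on its defaults.

```python
import shlex

def build_endogenes_command(endogenous=None, efficiency=None, out_folder=None):
    """Assemble the endoGenes.R call; any option left as None is not passed."""
    cmd = ["./endoGenes.R"]
    if endogenous:
        cmd += ["-f", endogenous]   # endogenous (Ct) file
    if efficiency:
        cmd += ["-e", efficiency]   # efficiency list
    if out_folder:
        cmd += ["-o", out_folder]   # results folder name
    return cmd

cmd = build_endogenes_command("exp_endogenous.txt", "exp_efficiency.txt", "my results")
# shlex.join shows the command as it would be typed in a shell, quoting where needed
print(shlex.join(cmd))
```

The list can then be handed to `subprocess.run(cmd, cwd="/content/MyDrive/endoGenes", check=True)`. Passing a list rather than a single string avoids quoting problems when a file or folder name contains spaces.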


## Boxplot by gene

In [0]:
folder = "01-results" # change this if you chose a different name for your results folder

from IPython.display import Image
Image("/content/MyDrive/endoGenes/"+folder+"/Rplot_boxplot_genes.png")

## Boxplot by groups

In [0]:
Image("/content/MyDrive/endoGenes/"+folder+"/Rplot_boxplot_groups.png")

## Gene stability by geNorm

### Stability Ranking

In [0]:
genorm = "/content/MyDrive/endoGenes/"+folder+"/general_ranking_genorm.csv"

genorm_rank = pd.read_csv(genorm)
genorm_rank

### Stability plot

In [0]:
Image("/content/MyDrive/endoGenes/"+folder+"/Rplot_gene_stability_by_genorm.png")

### Gene variation plot

In [0]:
Image("/content/MyDrive/endoGenes/"+folder+"/Rplot_gene_variation_by_genorm.png")

## Gene stability by NormFinder

### Stability Ranking

In [0]:
normfinder = "/content/MyDrive/endoGenes/"+folder+"/ranking_Ordered_normfinder.csv"
norm_rank = pd.read_csv(normfinder)
norm_rank

### Stability plot

In [0]:
Image("/content/MyDrive/endoGenes/"+folder+"/Rplot_gene_stability_by_NormFinder.png")

## Gene stability by BestKeeper

### Descriptive statistics

In [0]:
descrip = "/content/MyDrive/endoGenes/"+folder+"/Bestkeeper_results_CP.statistics.csv"
summary_stat = pd.read_csv(descrip)
summary_stat

### Stability Ranking

In [0]:
bestkeeper = "/content/MyDrive/endoGenes/"+folder+"/Bestkeeper_best_genes_ordered.csv"
best_genes = pd.read_csv(bestkeeper)
best_genes

### Stability plot

In [0]:
Image("/content/MyDrive/endoGenes/"+folder+"/Rplot_gene_stability_by_BestKeeper.png")

## Final Ranking

In [0]:
final_ranking = pd.read_csv("/content/MyDrive/endoGenes/"+folder+"/Final_ranking.csv")
final_ranking
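
If any of the cells above fails with a "file not found" error, it can help to check which result files are actually present. The sketch below lists the file names used in this notebook and reports the ones missing from a folder. `missing_outputs` is a hypothetical helper, the list reflects only the files read in this notebook (the pipeline may write others), and the demo uses a temporary folder rather than your Drive.

```python
from pathlib import Path
import tempfile

# Result files read by the cells in this notebook
EXPECTED_OUTPUTS = [
    "Rplot_boxplot_genes.png",
    "Rplot_boxplot_groups.png",
    "general_ranking_genorm.csv",
    "Rplot_gene_stability_by_genorm.png",
    "Rplot_gene_variation_by_genorm.png",
    "ranking_Ordered_normfinder.csv",
    "Rplot_gene_stability_by_NormFinder.png",
    "Bestkeeper_results_CP.statistics.csv",
    "Bestkeeper_best_genes_ordered.csv",
    "Rplot_gene_stability_by_BestKeeper.png",
    "Final_ranking.csv",
]

def missing_outputs(results_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are not present in results_dir."""
    results_dir = Path(results_dir)
    return [name for name in expected if not (results_dir / name).is_file()]

# Demo: a temporary folder that contains only one of the expected files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Final_ranking.csv").write_text("demo\n")
    missing = missing_outputs(tmp)

print(missing)
```

For your own run, call `missing_outputs("/content/MyDrive/endoGenes/" + folder)`. An empty list means every file this notebook displays was found.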