## Load the Required Packages

In [None]:
suppressWarnings(suppressMessages(library(fishHook)))
suppressWarnings(suppressMessages(library(skitools)))

## Now we will need some data
**fishHook uses gamma-Poisson (negative binomial) regression to identify frequently mutated, amplified, or deleted regions of the genome from sequencing and microarray data. To do this, we take a set of genomic targets and test each one against the null hypothesis that it is altered no more often than expected given the other targets. In this first example we use genes as our targets and exome mutations as the events. Because exome sequencing tends to exhibit strong coverage bias, we want to include this information in our analysis, so we constructed a GRanges object called eligible that marks the regions with sufficient coverage.**
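To see why a gamma-Poisson model suits mutation counts, here is a small simulation in base R. It is an illustration of the distribution only, not fishHook code; the mean and shape values are hypothetical. Each target's rate is drawn from a gamma distribution and its count from a Poisson with that rate, which produces a negative binomial with more variance than a plain Poisson would allow.

```r
# Gamma-Poisson mixture = negative binomial. Hypothetical parameters.
set.seed(1)
n_targets <- 10000
mu   <- 10   # hypothetical average expected count per target
size <- 2    # hypothetical gamma shape; smaller means more overdispersion

# Per-target rate from a gamma with mean mu, then a Poisson count per target.
lambda <- rgamma(n_targets, shape = size, rate = size / mu)
counts <- rpois(n_targets, lambda)

# A pure Poisson has variance close to its mean; this mixture has
# variance close to mu + mu^2 / size = 10 + 50 = 60.
c(mean = mean(counts), var = var(counts))

# The same distribution drawn directly as a negative binomial.
nb_counts <- rnbinom(n_targets, size = size, mu = mu)
c(mean = mean(nb_counts), var = var(nb_counts))
```

Overdispersion of this kind is why a Poisson test alone would tend to flag too many targets as significant.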

In [None]:
setwd("~/git/fishHook/data")

## Mutational Events

In [None]:
mutational_events = readRDS("events.rds")
mutational_events

## Gene Targets

In [None]:
gene_targets = readRDS("targets.rds")
gene_targets

## Eligible

In [None]:
eligible = readRDS("eligible.rds")
eligible

## The FishHook Object
All of the data manipulation is handled by the FishHook object, which is built from the data loaded above. Initialize it as follows:

In [None]:
fish = FishHook$new(targets = gene_targets, events = mutational_events, eligible = eligible)
fish

## Points about FishHook Object
The FishHook object moves through several states during the analysis. You can check the current state through the state field or by printing the fish object. Each of the inputs you provided is also available as a field.

In [None]:
fish$state
fish$targets
fish$events
fish$eligible

## Annotating The FishHook Object
To test each target against the null hypothesis, which represents the typical mutational load of the genes, we first need to count how many events fall into each target gene. We call this process annotation, and it is run as follows. We use verbose = F to limit console output. This step should take from a few seconds up to a minute.

In [None]:
fish$annotate(verbose = F)
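The core of annotation is an overlap count. The sketch below shows the same idea directly with GenomicRanges on made-up coordinates: events per target, plus the number of eligible bases inside each target. It is an illustration only and is not fishHook's internal code; the object names are hypothetical.

```r
suppressMessages(library(GenomicRanges))

# Made-up coordinates on a single chromosome.
targets <- GRanges("chr1",
                   IRanges(start = c(100, 1000, 5000), end = c(300, 1400, 5200)),
                   gene_name = c("geneA", "geneB", "geneC"))
events <- GRanges("chr1", IRanges(start = c(150, 160, 290, 1200, 9000), width = 1))
# reduce() merges overlapping eligible regions so no base is counted twice.
eligible_toy <- reduce(GRanges("chr1", IRanges(start = c(100, 1100), end = c(200, 1300))))

# Number of events overlapping each target.
targets$count <- countOverlaps(targets, events)

# Eligible bases per target: width of the target/eligible intersection,
# summed over all eligible regions hitting that target.
hits <- findOverlaps(targets, eligible_toy)
ov_bp <- width(pintersect(targets[queryHits(hits)], eligible_toy[subjectHits(hits)]))
elig_bp <- tapply(ov_bp, factor(queryHits(hits), levels = seq_along(targets)), sum)
elig_bp[is.na(elig_bp)] <- 0   # targets with no eligible overlap
targets$eligible_bp <- as.integer(elig_bp)
targets
```

Here geneA gets 3 events and 101 eligible bases, geneB 1 event and 201 eligible bases, and geneC none of either.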

## Note that the State of our FishHook Object is now "Annotated"
You can access the annotation results through the anno field.

In [None]:
fish
fish$anno

## Scoring the Targets
Now that we have the mutational burden (count) at each target, we need to build a null model and test each target against it. Because the targets serve as their own controls, this assumes that most targets follow the null hypothesis.

In [None]:
fish$score()
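As a rough picture of what scoring involves, the sketch below fits an intercept-only negative binomial regression with MASS::glm.nb to simulated counts and computes a one-sided p value per target. This is a swapped-in illustration of the general technique, not fishHook's exact model: the eligible-territory offset, the simulated data and the planted hotspots are all assumptions. Because every target contributes to the fit, the hotspots must be a small minority, which mirrors the assumption stated above.

```r
library(MASS)   # glm.nb(); MASS ships with R as a recommended package
set.seed(42)

n <- 10000
eligible_bp <- round(runif(n, 5e3, 2e4))   # hypothetical eligible bases per target
rate <- 5e-4                                # hypothetical background events per base
counts <- rnbinom(n, size = 10, mu = rate * eligible_bp)
counts[1:5] <- counts[1:5] + 50             # plant 5 hypothetical hotspots
dat <- data.frame(count = counts, eligible_bp = eligible_bp)

# Intercept-only NB regression with log(eligible_bp) as an offset: the model
# learns one background rate per eligible base plus a dispersion (theta).
fit <- glm.nb(count ~ 1 + offset(log(eligible_bp)), data = dat)

# One-sided p value per target: P(X >= observed) under its fitted NB.
mu_hat <- fitted(fit)
p <- pnbinom(dat$count - 1, size = fit$theta, mu = mu_hat, lower.tail = FALSE)

head(order(p), 5)   # indices of the most significant targets
```

With covariates, the formula would gain predictor terms, for example count ~ rept + offset(log(eligible_bp)); how fishHook actually builds its model is not shown here.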

## Note that the State of our FishHook Object is now "Scored"
You can access the scoring results through the scores field. To merge them with the original targets data, use 'all'. This output includes the p and q values assigned to each target.

In [None]:
fish
fish$scores[1:10]
fish$all[1:10]
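The q values are multiple-testing-adjusted p values. Assuming they follow the standard Benjamini-Hochberg procedure (check the fishHook documentation to confirm), they can be reproduced in base R with p.adjust. The hypothetical helper bh() below spells out the same calculation step by step.

```r
# Hypothetical p values for four targets.
p <- c(0.01, 0.04, 0.03, 0.20)

# Built-in Benjamini-Hochberg adjustment.
q <- p.adjust(p, method = "BH")
q   # 0.04000000 0.05333333 0.05333333 0.20000000

# The same procedure by hand: scale each p by n / rank, then enforce
# monotonicity with a running minimum from the largest p downward.
bh <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)
  q <- pmin(1, cummin(n / (n:1) * p[o]))
  q[order(o)]   # restore the original order
}
bh(p)
```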

## Visualizing The Data
Pulling raw values from the scores field works for manual inspection, but it makes it hard to see at a glance which targets are significant and which are not. A QQ plot solves this: it plots the observed distribution of p values against the expected (uniform) distribution of p values. Significant hits are the points that deviate strongly from the expected distribution.

In [None]:
suppressWarnings(plot <- fish$qq_plot(plotly = F))
plot
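To make the QQ plot concrete, here is a hand-rolled version in base R graphics on simulated p values. It is not qq_plot() itself: the helper qq_coords() is hypothetical, and the expected quantiles use ppoints(n), one common convention that fishHook may not share exactly.

```r
# Hypothetical helper: -log10 expected vs observed p values, both sorted.
qq_coords <- function(p) {
  p <- sort(p[!is.na(p)])
  data.frame(expected = -log10(ppoints(length(p))),
             observed = -log10(p))
}

set.seed(7)
p_null <- runif(1000)             # p values under the null are uniform
p_hits <- c(1e-8, 1e-6, 1e-5)     # a few planted "significant" targets
xy <- qq_coords(c(p_null, p_hits))

plot(xy$expected, xy$observed, pch = 20,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1, col = "red")         # points near this line behave like the null
```

The planted hits sit far above the diagonal at the right-hand end, which is the pattern to look for in the fishHook plot.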


## Visualizing the Data cont.
The plot above is useful, but we probably also want the hover text of each point to show target metadata. To do that, use the columns parameter of qq_plot(). You can specify any column that is present in the 'all' output, and you can also supply your own vectors through annotations. The p value is included in every plot, while Count, Effectsize, HypothesisID and q are added by default only if the user specifies no annotations.

In [None]:
fish$all[1:10]

# Column annotations
suppressWarnings(plot1 <- fish$qq_plot(columns = c("gene_name")))
plot1

# Novel annotations
suppressWarnings(plot2 <- fish$qq_plot(columns = c("gene_name"), annotations = list(test = c("testing", "123"))))
plot2

## Covariates
Now we know how to test which targets are mutational hotspots. However, hotspots can also be caused by biological phenomena that are unrelated to cancer. For example, replication timing, transcription status, chromatin state and sequence context can all influence where mutations form. We refer to these biological factors that influence mutation rates as covariates. FishHook has its own object for instantiating covariates, but first let's load the replication timing covariate as a GRanges object. It contains a 'score' for each region of the genome.

In [None]:
replication_timing = readRDS("covariate.rds")
replication_timing
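A numeric covariate such as replication timing has to be summarized into one value per target. A natural choice, shown here as an assumption rather than a description of fishHook's internals, is the mean score weighted by how many bases of each covariate region fall inside the target. The sketch uses GenomicRanges on made-up coordinates.

```r
suppressMessages(library(GenomicRanges))

# Made-up targets and a made-up replication timing track with a 'score' column.
targets <- GRanges("chr1", IRanges(start = c(1, 151), end = c(100, 300)))
rt <- GRanges("chr1", IRanges(start = c(1, 51, 201), end = c(50, 200, 400)),
              score = c(0.2, 0.8, 0.5))

# Overlap width between each target and each covariate region it touches.
hits <- findOverlaps(targets, rt)
ov_bp <- width(pintersect(targets[queryHits(hits)], rt[subjectHits(hits)]))
weighted <- ov_bp * rt$score[subjectHits(hits)]

# Width-weighted mean score per target.
# Assumes every target overlaps at least one covariate region.
targets$rept <- as.numeric(tapply(weighted, queryHits(hits), sum) /
                           tapply(ov_bp, queryHits(hits), sum))
targets$rept   # 0.5 0.6
```

The first target is half in a 0.2 region and half in a 0.8 region, giving 0.5; the second covers 50 bases at 0.8 and 100 bases at 0.5, giving 0.6.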

## Creating Covariates
**The following information is required when creating covariates:**

Covariate (passed as cvs): This is the core of the object; in this case it is our replication timing data. It can be a GRanges, a character (file path), an RleList or an ffTrack object. Here, replication timing is a GRanges object.

Type: There are three covariate types. Numeric: each region is assigned a numeric value, as with replication timing. Interval: regions are marked as carrying the covariate or not, for example H3K9me3. Sequence: a sequence-derived property such as GC content.

Name: The name you give to this covariate

**Other Parameters that are not always required:**

Field: This is for numeric covariates and is the column name where the 'score' is held. Note that it is set to 'score' by default.

Signature: This is only required if the covariate is an ffTrack object; it plays a role similar to field.

Pad: How far to the left and right of each covariate region its influence should extend. For example, if a covariate spans positions 100-150 and pad = 5, we consider it for positions 95-155.
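The padding arithmetic from the pad description can be checked with GenomicRanges, where adding a number to a GRanges widens each range by that many bases on both sides. This only illustrates the arithmetic; it is not how Cov_Arr applies pad internally.

```r
suppressMessages(library(GenomicRanges))

cov_region <- GRanges("chr1", IRanges(start = 100, end = 150))
padded <- cov_region + 5   # widen by 5 bp on each side
padded                     # chr1:95-155
```

Near the start of a chromosome, padding can push a start below 1; trim() can clip such ranges when seqlengths are known.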


In [None]:
rept = Cov_Arr$new(cvs = replication_timing, type = 'numeric', name = 'rept')
rept

## Covariate Manipulations:
Covariate objects can be combined with c() and subset with [ ] as if they were atomic vectors:

In [None]:
rep1 = c(rept,rept,rept)
rep1

In [None]:
rep2 = c(rep1,rept)
rep2

In [None]:
rep3 = rep2[c(1,3)]
rep3

In [None]:
rept = rep3[1]
rept

## Accessing Covariate Fields
Covariate fields such as type are stored as vectors. Accessing a field returns a vector the same length as your covariates object, or a list in the case of the covariate tracks themselves.

In [None]:
rep3$cvs

In [None]:
rep3$type
rep3$signature

## Multiple Covariates
To create multiple covariates at once, pass a list of covariate tracks to the cvs argument and a vector of matching length to each of the other arguments.

In [None]:
multi_cov = Cov_Arr$new(cvs = list(replication_timing, replication_timing), name = c('replication1', 'replication2'),
                       type = c('numeric','numeric'), pad = c(0,20))
multi_cov

## fishHook Analysis using Covariates
The only difference is that we pass in the covariates when we initialize the object. Annotating covariates takes extra time; you can speed this step up by setting the number of cores with mc.cores, or with parameters covered in a later section.

In [None]:
fish = FishHook$new(targets = gene_targets, events = mutational_events, eligible = eligible, covariates = rept)
fish
fish$annotate(mc.cores = 3,verbose = F)
fish$score()
suppressWarnings(plot <- fish$qq_plot(columns = c('gene_name','count','q')))
plot

## fishHook Analysis using Covariates cont.
Covariates rely on our prior knowledge about mutational processes. However, there are likely factors that influence mutation rates that are not yet known, so we cannot define covariates for them. Their combined effect is still reflected in the mutational landscape (the events), so we can derive a covariate from the events themselves, which we call local mutational density, to model the mutational landscape surrounding each target. Enable it with the use_local_mut_density flag. The bin size for this covariate is set with local_mut_density_bin, which defaults to 1e6; here we use 1e5.

In [None]:
fish = FishHook$new(targets = gene_targets, events = mutational_events, eligible = eligible, covariates = rept,
                   use_local_mut_density = T, local_mut_density_bin = 1e5, verbose = F)

fish

fish$annotate(mc.cores = 3,verbose = F)
fish$score()
suppressWarnings(plot <- fish$qq_plot(columns = c('gene_name','count','q')))
plot
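The idea behind local mutational density can be sketched in base R on one toy chromosome: count events in fixed-width bins, convert to events per base, and give each target the density of the bin containing its midpoint. This is a simplified assumption, not fishHook's implementation; in particular it does not exclude a target's own events, and all positions are hypothetical.

```r
bin_size <- 1e5
event_pos    <- c(5e3, 2e4, 3e4, 1.5e5, 2.5e5, 2.6e5, 2.7e5, 2.8e5)  # hypothetical events
target_start <- c(1e4, 1.2e5, 2.55e5)                                 # hypothetical targets
target_end   <- c(1.1e4, 1.3e5, 2.65e5)

# Events per bin (1-based positions; bin k covers (k-1)*bin_size+1 .. k*bin_size).
n_bins <- ceiling(max(event_pos, target_end) / bin_size)
bin_counts  <- tabulate(ceiling(event_pos / bin_size), nbins = n_bins)
bin_density <- bin_counts / bin_size   # events per base

# Each target takes the density of the bin holding its midpoint.
target_mid <- (target_start + target_end) / 2
local_density <- bin_density[ceiling(target_mid / bin_size)]
local_density   # 3e-05 1e-05 4e-05
```

Used as an extra numeric covariate, a value like this lets the model absorb regional variation in mutation rate that no known covariate explains.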


## FishHook Extras: Subsetting
The fishHook object can be subset as fish[i,j,k,l], where:

- i is a vector indicating which targets to keep,
- j is a vector indicating which events to keep,
- k is a vector indicating which covariates to keep, and
- l is a vector indicating which eligible regions to keep.

Leaving a position empty keeps everything in that slot. Here are some examples to try with the previous fish object:

In [None]:
fish
test1 = fish[1:10000,1:100000,c(1,5),1:30] 
test1

In [None]:
fish
test2 = fish[1:10000,1:100000,c(1)]
test2

In [None]:
fish
test3 = fish[,1:100000,,1:30]
test3

In [None]:
fish
test4 = fish[1:10000]
test4

In [None]:
fish
test5 = fish[,1:100000]
test5

In [None]:
fish
test6 = fish[,,1]
test6
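The empty-slot behaviour in these examples follows the usual R convention for [ methods: a slot left blank arrives as a missing argument and is left untouched. The toy S3 class below is hypothetical and is not fishHook's implementation; it only shows how a four-slot [ method can handle that.

```r
# Hypothetical toy class holding four independently subsettable parts.
toy_fish <- function(targets, events, covariates, eligible) {
  structure(list(targets = targets, events = events,
                 covariates = covariates, eligible = eligible),
            class = "toy_fish")
}

# Subset each part only when its slot was supplied.
`[.toy_fish` <- function(x, i, j, k, l) {
  if (!missing(i)) x$targets    <- x$targets[i]
  if (!missing(j)) x$events     <- x$events[j]
  if (!missing(k)) x$covariates <- x$covariates[k]
  if (!missing(l)) x$eligible   <- x$eligible[l]
  x
}

f <- toy_fish(targets = 1:20, events = 1:100,
              covariates = c("rept", "density"), eligible = 1:30)
sizes <- function(x) sapply(unclass(x), length)

sizes(f[1:10])            # only targets shrink, to 10
sizes(f[, , 1])           # only covariates shrink, to 1
sizes(f[, 1:50, , 1:5])   # events shrink to 50, eligible to 5
```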