Currently a proof-of-concept tool. With the introduction of the signalSet class and related methods, it allows for the generation and manipulation of continuous, per-basepair, signal-like representations from -seq related bed and bigwig files.
Current utilities include signal filtering, basic geometric feature extraction and an easy to use implementation for comparing between regions of generated signals from different -seq experiments, against a reference genome or both.
Features calculated from signals can then be integrated alongside readily-provided -seq average values. Emphasizing the use of diverse sources of information, our package interfaces with BSgenome for the integration of sequence information within regions deemed significant by signal-obtained metrics.
While still in a prototype stage, histoneSig will ideally allow for downstream analyses augmented by features obtained directly from our proposed signal representation, wrapped in efficient data objects that will trivialize interoperation with related analysis tools.
Get R 3.5.1.
Consult the sessionInfo.txt file within this repository to see what's being used in a development environment.
devtools::install_github('cmvcordova/histoneSig')
Starting from a narrow peak (or any other bedfile) and its corresponding bigwig file, we're able to:
## Load our peak or preferred bedfile
np_file <- import.np('path/to/npfile.bed')
## Set the ranges we just got obtained to parse relevant bigWig fragments
parsing_bw_ranges <- granges_chr_filter(np_file)
## Parse bigWig
bw_file <- import.bw(con = BigWigFile("path/to/bwfile.bigWig"),
selection = BigWigSelection(parsing_bw_ranges))
## Obtain signals from both of our files
your_first_signalset <- np_signals_from_bigwig(np_file, bw_file)
Behold, a signalSet
observation in all its splendor
The default method is a lowpass filter. Said
filter can take a fixed window_size
or an equal fraction of each signal in the
set as a fractional
window. You can also pass your own filter functions to
filter_signalSet()
(results may vary).
filtered_signalset <- filter_signalSet(your_first_signalset, fractional = 25)
Now, let's use plotSignal()
to compare the first signal of the signalSet
we've
obtained.
rawsignalplot <- plotSignal(your_first_signalset[1])
filteredsignalplot <- plotSignal(filtered_signalset[1])
gridExtra::grid.arrange(rawsignalplot, filteredsignalplot, ncol=2)
We may also illustrate detected peaks (blue) and valleys (red). These will then be used as references to calculate geometric features.
plotSignal(filtered_signalset[1], highlight="both")
Calculating base features from a given signalSet
is now possible; if posterior
interaction with GenomicRanges
objects is desired, we can set our wraptoGRanges
argument as TRUE
; else, we'll obtain a data.frame
. Here, we'll specify notable
valleys found in our signal and their associated geometric features: valley
width ("extension"), height, area and distances to next and previous peaks in
the provided bedfile.
base_features_from_signalsetlist(filtered_signalset,
section="valley", returns="positions", wraptoGRanges=TRUE)
Finally, for comparative analyses, we may create a feature table from a
GRanges
or signalSet
. Sequence information may be integrated, as a one-hot matrix, setting
the include_sequence
parameter to TRUE
. We'll obtain a neat representation
which may or may not include neat signal and geometric features in its
metadata, as a data.table
. This can then be easily interfaced with other
R libraries/models/packages.
So, from a given GRanges
(or vanilla signalSet
) of the following kind:
After running the following command,
build_feature_table(generic_GRanges, metadata_as_features = TRUE, include_sequence =
TRUE, refgenome = "BSgenome.Hsapiens.UCSC.hg38")
We'd obtain a data.table
like this
Nothing formal here just yet, just drop me a line
- César Miguel Valdez Córdova - Initial work - cmvcordova
See also the list of contributors who participated in this project. Currently empty; you could be the first one!
Pending - probably an MIT one in time.
- Ensenada Center for Scientific Research and Higher Education (CICESE) - Dr. Carlos Brizuela & Dr. Ivetth Corona, for their attentive guidance and valuable suggestions.
- Mexican National Council on Science and Technology (CONACyT) - Generously provided a scholarship for the development of my master's thesis, which gave way to histoneSig.
- Mexican Community of Bioinformatic Software Developers (CDSB) - Dr. Leonardo Collado-Torres & Dr. Alejandro Reyes, for welcoming me to the CDSB, inspiring me to create an R package alongside my thesis and general guidance.
- The internet - it can be a pretty nice place, after all.