Facilitates loading and manipulating genomic data for use with PGV and Case Report.
- To install directly from GitHub:
devtools::install_github("mskilab-org/skilift")
- To install from source:
git clone https://github.com/mskilab-org/skilift.git
devtools::load_all("path/to/clone")
The Skilift class manages a database of patient metadata and genomic plot data. It provides an interface to load, update, query, and manipulate the datafiles.json of PGV programmatically. This object is only necessary if you want to push data to PGV.
The key methods allow converting to/from JSON for data storage, validating the data, adding/removing plots, and generating plot JSON for visualization. The metadata and plots are stored as data tables for easy manipulation.
The conversion methods can also be used in isolation without having to instantiate a Skilift object. See tests (plot generation methods section) for runnable code. This is particularly useful for generating JSON files for use with Case Reports.
The following methods are for the Skilift class:
- initialize(datafiles_json_path, datadir, settings): Initialize a new Skilift object
- update_datafiles_json(): Update JSON data files on disk
- to_datatable(filter): Convert to a single data table, applying an optional filter
- add_plots(new_plots_dt): Add new plots to PGV
- remove_plots(plots_to_remove_dt): Remove plots
- list_higlass_tilesets(endpoint, username, password): List bigwig tilesets on higlass
- validate(): Validate metadata and plots
Methods for converting to JSON:
plot_metadata is a data table containing plot metadata and should have the following columns:
- patient.id: Patient identifier (e.g., '0124')
- source: Output datafile JSON filename (e.g., complex.json)
- x: Filepath to the RDS object with the raw data (e.g., ~/gg.rds)
- overwrite: Whether to overwrite existing plot files
- ref: Reference genome (e.g., 'hg38')
datadir is the parent directory of the sample directory where the raw data file is stored. For example, if the data file is stored in data/0124/gg.rds, then the datadir is data/.
settings is the path to the settings file (by default, the one included with the package is used). It is required for parsing seqlengths.
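As a minimal sketch of the inputs described above, the table could be assembled with data.table as follows. The patient ID, file path, and source filename are placeholder values borrowed from the examples in this section, and deriving datadir with dirname() is only an illustration of the expected directory layout, not something the package requires.

```r
library(data.table)

# Placeholder raw data file, laid out as <datadir>/<patient.id>/<file>
raw_file <- "data/0124/gg.rds"

# datadir is the parent of the sample directory, i.e. two levels above the file
datadir <- dirname(dirname(raw_file))

plot_metadata <- data.table(
  patient.id = "0124",         # patient identifier
  source     = "complex.json", # output JSON filename
  x          = raw_file,       # path to the RDS object with the raw data
  overwrite  = FALSE,          # keep existing plot files
  ref        = "hg38"          # reference genome
)
```

Each row describes one plot, so additional plots for the same or other patients would be added as further rows.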
- metadata: Data table containing patient metadata
- plots: Data table containing plot metadata
- datadir: Path to data directory
- settings: Path to settings file
- higlass_metadata: List containing endpoint, username, and password to Higlass server
pgvdb <- Skilift$new(datafiles_json_path, datadir, settings)
Create a new Skilift object by passing the path to the datafiles.json, data directory, and settings file.
pgvdb$update_datafiles_json()
Update the datafiles JSON on disk with the current metadata and plots.
dt <- pgvdb$to_datatable(filter = c("patient.id", "HCC"))
Convert metadata and plots to a single data table, applying an optional filter.
pgvdb$add_plots(plots_to_add, cores=2)
Add new plots by passing a data.table containing the required columns:
- patient.id: Patient identifier
- x: can be
  - list(list(server="", uuid="")): a list of the [server, uuid]
  - list(GRanges), list(gWalk), list(gGraph): a list containing an object
  - Filepath to an RDS object
- visible: Whether the plot is visible
If required columns are missing, an empty table with the required column names is returned. The type column (indicating the type of plot: scatterplot, genome, walk, bigwig, etc.) is derived from x if not supplied by the user. For GRanges, the type (either bigwig or scatterplot) and field (the name of the score column) must be specified by the user. The source column (indicating the name of the plot file inside the pgv data directory) is derived from the type if not supplied by the user.
Unless an overwrite column is set, existing plot files are not overwritten; instead, the filename of the new plot file is incremented (e.g., coverage.json -> coverage2.json). For GRanges containing bigwig data, the GRanges will be converted into a bigwig file (with a unique name) and automatically uploaded to the default mskilab higlass server (see above for changing to a different server). If overwrite is set to TRUE, the converted bigwig file will be deleted after uploading.
The cores parameter determines how many cores to use for parallel execution. By default, it does not run in parallel (i.e., it uses 1 core).
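The input tables for add_plots might look like the sketch below. The file paths are placeholders, "foreground" is a hypothetical score column name, and passing x as a plain filepath column is an assumption based on the "Filepath to an RDS object" option listed above. The call itself is left commented out because it needs an existing Skilift object.

```r
library(data.table)

# Simplest case: x is a path to an RDS object; type and source are omitted
# so that they can be derived as described above.
plots_to_add <- data.table(
  patient.id = "0124",
  x          = "data/0124/gg.rds",   # placeholder path
  visible    = TRUE
)

# GRanges input: type and field have to be given explicitly.
coverage_to_add <- data.table(
  patient.id = "0124",
  x          = "data/0124/coverage.rds",  # placeholder path to a GRanges RDS
  type       = "scatterplot",             # or "bigwig"
  field      = "foreground",              # hypothetical score column name
  visible    = TRUE
)

# With an existing Skilift object named pgvdb, the tables would be passed as:
# pgvdb$add_plots(plots_to_add, cores = 2)
```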
pgvdb$remove_plots(plots_to_remove, delete = FALSE)
Remove plots by passing a data.table with either:
- patient.id: Remove all plots for a patient
- patient.id and source: Remove specific plots

Or alternatively with patient.id, server, and uuid.
The delete flag determines whether to also delete the source data files. Passing just patient.id will remove all plots for that patient.
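The three accepted table shapes could be built as in the sketch below. All values are placeholders, including the server URL and uuid, and the final call is commented out because it requires an existing Skilift object.

```r
library(data.table)

# Remove every plot belonging to one patient
remove_patient <- data.table(patient.id = "0124")

# Remove one specific plot file for that patient
remove_one <- data.table(patient.id = "0124", source = "coverage.json")

# Remove a higlass-backed plot; server and uuid are hypothetical values
remove_tileset <- data.table(
  patient.id = "0124",
  server     = "http://higlass.example.org",
  uuid       = "hypothetical-uuid"
)

# With an existing Skilift object named pgvdb:
# pgvdb$remove_plots(remove_one, delete = FALSE)
```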
pgvdb$validate()
Validate metadata and plot data, removing invalid entries. It is called automatically when adding or removing plots, and is useful if you make manual changes to the pgvdb plots or metadata.
pgvdb$init_pgv(pgv_dir, build=FALSE)
Initialize a pgv instance at pgv_dir, loaded with the data in pgvdb. This method will clone/pull the pgv repository and create symlinks in the pgv data directory that point to your pgvdb data. If the build flag is set to TRUE, it will build pgv instead of launching a local instance (useful when running on a remote server or HPC). In that case, you should set pgv_dir to be a directory inside whichever directory is served by your remote server (e.g., public_html).
pgvdb$list_higlass_tilesets(endpoint, username, password)
Return all bigwig tileset info on the higlass server as a data.table.
endpoint <- "http://10.1.29.225:8000/api/v1/tilesets/" # dev endpoint
tilesets <- pgvdb$list_higlass_tilesets(
endpoint,
username = "username_here",
password = "password_here"
)
pgvdb$upload_to_higlass(endpoint, datafile, filetype, datatype, coordSystem, name, username, password)
Upload a file to the higlass server. This will also add the file to the current pgvdb instance. Note that you will need to upload a chromSizes.tsv file first, before uploading other files.
endpoint <- "http://10.1.29.225:8000/api/v1/tilesets/" # dev endpoint
pgvdb$upload_to_higlass(
endpoint,
datafile = system.file("extdata", "test_data", "chromSizes.tsv", package = "Skilift"),
filetype = "chromsizes-tsv",
datatype = "chromsizes",
coordSystem = "hg38",
name = "hg38",
username = "username_here",
password = "password_here"
)
pgvdb$upload_to_higlass(
endpoint,
datafile = system.file("extdata", "test_data", "higlass_test_bigwig.bw", package = "Skilift"),
name = "test_bigwig",
filetype = "bigwig",
datatype = "vector",
coordSystem = "hg38",
username = "username_here",
password = "password_here"
)
pgvdb$delete_from_higlass(endpoint, uuid, username, password)
Delete a file from the higlass server. Will also remove the plot from the pgvdb instance. Tilesets can only be deleted by their uuid.