Fast Higashi Usage
ruochiz edited this page Apr 22, 2022
Fast-Higashi takes the same set of input files as Higashi, and therefore uses the same code to parse them.
Run the following commands to process the input data (only needs to be run once).
from higashi.Higashi_wrapper import Higashi
# config_path is the path to the configuration JSON file that you created
higashi_model = Higashi(config_path)
higashi_model.process_data()
This function performs the following tasks:
- generates a dictionary that maps genomic bin loci to node ids.
- extracts data from data.txt and converts it into hyperedges (triplets).
- creates contact maps from the sparse scHi-C data for visualization and the baseline model, and generates node attributes.
The call above is equivalent to running the three steps individually:
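The first two tasks can be pictured with a small sketch. This is not Higashi's actual implementation: the function names, the `(chromosome, bin start)` key format, the zero-based numbering, and the toy chromosome sizes are all assumptions made for illustration, and the real node ids produced by `process_data()` may be laid out differently.

```python
def build_bin_to_node_id(chrom_sizes, resolution):
    """Assign consecutive integer ids to fixed-size genomic bins (hypothetical helper)."""
    bin_to_id = {}
    next_id = 0
    for chrom, size in chrom_sizes.items():
        n_bins = -(-size // resolution)  # ceiling division: partial last bin counts
        for b in range(n_bins):
            bin_to_id[(chrom, b * resolution)] = next_id
            next_id += 1
    return bin_to_id


def contact_to_triplet(cell_id, chrom, pos1, pos2, bin_to_id, resolution):
    """Turn one intra-chromosomal contact into a (cell, bin, bin) triplet (hypothetical helper)."""
    start1 = (pos1 // resolution) * resolution  # snap each position to its bin start
    start2 = (pos2 // resolution) * resolution
    return (cell_id, bin_to_id[(chrom, start1)], bin_to_id[(chrom, start2)])


# Toy genome with made-up sizes, binned at 1 Mb
chrom_sizes = {"chr1": 2_500_000, "chr2": 1_000_000}
bin_to_id = build_bin_to_node_id(chrom_sizes, resolution=1_000_000)
triplet = contact_to_triplet(7, "chr1", 150_000, 2_300_000, bin_to_id, 1_000_000)
```

Here each contact becomes a hyperedge connecting one cell and two genomic bins, which is the triplet format referred to above.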
higashi_model.generate_chrom_start_end()
higashi_model.extract_table()
higashi_model.create_matrix()
Before each step is executed, a message is printed to indicate progress, which helps with debugging.
from fasthigashi.FastHigashi_Wrapper import *
# Initialize the model
model = FastHigashi(config_path=config_path,
                    path2input_cache="/work/magroup/ruochiz/fast_higashi_git/pfc_500k",
                    path2result_dir="/work/magroup/ruochiz/fast_higashi_git/pfc_500k",
                    off_diag=100,
                    filter=False,
                    do_conv=False,
                    do_rwr=False,
                    do_col=False,
                    no_col=False)
# Pack from sparse mtx to tensors
model.prep_dataset()
**required arguments:**
- config_path: The path to the configuration JSON file that you created.
- path2input_cache: The path to the directory where the cached tensor file will be stored.
- path2result_dir: The path to the directory where the results will be stored.
- off_diag: Maximum number of diagonals to consider. When set to 100, diagonals 0 through 100 are considered.
- filter: Whether to use only the cells that pass quality control to learn the meta-interactions, and then infer the embeddings for the rest of the cells. (Works better for datasets where the number of contacts per cell varies drastically.)
- do_conv: Whether to use linear convolution.
- do_rwr: Whether to use partial random walk with restart.
- do_col: Whether to use sqrt_vc normalization (recommended to be False; the program automatically uses it when needed).
- no_col: Whether to force the program not to use sqrt_vc normalization (recommended to be False; the program automatically uses it when needed).
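To make two of these options more concrete, here is a minimal NumPy sketch of what restricting a contact map to its first `off_diag` diagonals and applying square-root vanilla-coverage (sqrt_vc) normalization look like on a dense toy matrix. The function names and the toy matrix are assumptions for illustration only. Fast-Higashi works on sparse matrices and packed tensors, and its internal implementation is not shown here.

```python
import numpy as np


def keep_diagonals(contact_map, off_diag):
    """Zero out entries more than `off_diag` bins away from the main diagonal."""
    idx = np.arange(contact_map.shape[0])
    band = np.abs(idx[:, None] - idx[None, :]) <= off_diag
    return np.where(band, contact_map, 0)


def sqrt_vc_normalize(contact_map):
    """Divide entry (i, j) by sqrt(coverage_i * coverage_j)."""
    coverage = contact_map.sum(axis=1).astype(float)
    coverage[coverage == 0] = 1.0  # avoid division by zero for empty bins
    scale = np.sqrt(coverage)
    return contact_map / scale[:, None] / scale[None, :]


# Toy 5 x 5 symmetric contact map filled with ones
toy = np.ones((5, 5))
banded = keep_diagonals(toy, off_diag=1)  # keeps diagonals 0 and 1 only
normalized = sqrt_vc_normalize(banded)
```

With `off_diag=1`, only the main diagonal and its immediate neighbours survive, mirroring how `off_diag=100` keeps diagonals 0 through 100.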
model.run_model(dim1=.6,
                rank=256,
                n_iter_parafac=1,
                extra="")
**required arguments:**
- dim1: The scaling factor used to calculate the size of the chromosome-specific embeddings. For a chromosome with n bins, studied at a resolution of x bp, the chromosome-specific embedding size is int(dim1 * n * x / 1000000).
- rank: Size of the meta-embedding (the embedding shared across all chromosomes).
- n_iter_parafac: Number of parafac iterations within each Fast-Higashi iteration.
- extra: Extra annotation string appended when saving the results.
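As a worked example of the `dim1` formula, the sketch below evaluates int(dim1 * n * x / 1000000) for one chromosome. The helper name `chrom_embedding_size` and the example chromosome are made up for illustration and are not part of the Fast-Higashi API.

```python
def chrom_embedding_size(dim1, n_bins, resolution_bp):
    """Evaluate int(dim1 * n * x / 1000000) for one chromosome (hypothetical helper)."""
    return int(dim1 * n_bins * resolution_bp / 1000000)


# A hypothetical chromosome with 500 bins at 500 kb resolution (250 Mb in total)
size = chrom_embedding_size(dim1=0.6, n_bins=500, resolution_bp=500_000)
```

Since n * x / 1000000 is the chromosome length in Mb, `dim1` can be read as the number of embedding dimensions per Mb: the 250 Mb example with `dim1=0.6` works out to 150 dimensions.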
embedding = model.fetch_cell_embedding(final_dim=256)