De novo motif discovery
The command chrombpnet modisco_motifs will soon be deprecated. Please use the command directly from the tfmodisco-lite repo.
In ChromBPNet releases <= v0.1.3, the number of seqlets (-n) was hardcoded to 50000, so the argument did not work as expected: a run requested with 1000000 seqlets, for example, did not run properly.
To run motif discovery with tfmodisco-lite and generate reports with TOMTOM motif annotations, use the following command:
modisco motifs [-h] -i H5PY -n MAX_SEQLETS -op OUTPUT_PREFIX [-l N_LEIDEN] [-w WINDOW] [-v]
To get a representative summary of all the TF motifs learnt by the model, we recommend running tfmodisco-lite on contribution scores from all the peaks, with the -n argument set to 1000000 (1 million). Setting -n to 1 million increases the representative power of tfmodisco-lite, but note that it also increases the computation time.
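The recommended setting can be scripted. The sketch below only assembles an argument list from the flags in the usage line above and hands it to subprocess; the helper names are hypothetical and are not part of ChromBPNet or tfmodisco-lite.

```python
import shlex
import subprocess
from typing import List, Optional


def build_modisco_command(
    h5py_path: str,
    output_prefix: str,
    max_seqlets: int = 1_000_000,  # the recommended 1 million seqlets
    n_leiden: Optional[int] = None,
    window: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Hypothetical helper: build an argv list using only the documented flags."""
    cmd = [
        "modisco", "motifs",
        "-i", h5py_path,
        "-n", str(max_seqlets),
        "-op", output_prefix,
    ]
    # Optional arguments are added only when explicitly set.
    if n_leiden is not None:
        cmd += ["-l", str(n_leiden)]
    if window is not None:
        cmd += ["-w", str(window)]
    if verbose:
        cmd.append("-v")
    return cmd


def run_modisco(cmd: List[str]) -> None:
    """Hypothetical helper: echo the command, then run it (raises on non-zero exit)."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, run_modisco(build_modisco_command("contribs.h5", "results/sample")) would run discovery with 1 million seqlets on a hypothetical contribs.h5 file. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.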
required arguments:
-i H5PY, --h5py H5PY A legacy h5py file containing the one-hot encoded sequences and shap scores.
-n MAX_SEQLETS, --max-seqlets MAX_SEQLETS
The maximum number of seqlets per metacluster.
-op OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
The output prefix (directory path plus filename prefix) for the output files.
optional arguments:
-l N_LEIDEN, --n-leiden N_LEIDEN
The number of Leiden clusterings to perform with different random seeds.
-w WINDOW, --window WINDOW
The window surrounding the peak center that will be considered for motif discovery.
-v, --verbose Controls the amount of output from the code.
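The -w argument restricts discovery to a window around the peak center. As a rough illustration of what that means for the data, the sketch below crops an array to its central window positions. It assumes sequences and scores are stored as NumPy arrays shaped (n_peaks, 4, seq_len); that layout and the helper name are assumptions, and this is not tfmodisco-lite's own code.

```python
import numpy as np


def crop_to_center(arr: np.ndarray, window: int) -> np.ndarray:
    """Hypothetical helper: keep the `window` positions centred on the last axis.

    Assumes a (n_peaks, 4, seq_len) layout, e.g. one-hot sequences or
    contribution scores; the layout is an assumption, not a documented format.
    """
    seq_len = arr.shape[-1]
    if window > seq_len:
        raise ValueError(f"window {window} exceeds sequence length {seq_len}")
    center = seq_len // 2
    start = center - window // 2
    # Slicing the last axis keeps the peak and channel axes untouched.
    return arr[..., start:start + window]
```

Cropping both the one-hot sequences and the scores with the same function keeps them aligned position by position.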
Note: the prefix can include both a directory path and a filename prefix for the output files. Make sure that the directory in output_prefix exists.
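One way to meet this requirement from Python is to create the parent directory of the prefix before launching the command. This is a minimal sketch; the helper name is hypothetical.

```python
from pathlib import Path


def prepare_output_prefix(output_prefix: str) -> Path:
    """Hypothetical helper: create the directory part of a prefix like 'results/run1/sample'.

    Only the parent directory is created; the last component ('sample') is a
    filename prefix, not a directory, so it is left alone.
    """
    parent = Path(output_prefix).parent
    parent.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return parent
```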
The output from the command includes the following:
- output_prefix_modisco.h5: A tfmodisco-lite results object (for more information, see tfmodisco-lite).
- output_prefix_reports: A directory containing the following:
  - motifs.pdf: A PDF version of the modisco report. The report contains the CWM representations of the motifs consolidated from the contribution scores, along with the three closest matching hits from TOMTOM.
  - motifs.html: An HTML version of the same modisco report, with the same CWMs and TOMTOM matches.
  - .png files: Images of the CWMs summarized from the scores and of the closest matching TOMTOM PWMs.
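After a run finishes, a quick check that the files listed above were written can catch failures early. The sketch below assumes the outputs are named exactly as described here (output_prefix_modisco.h5, plus output_prefix_reports/ with motifs.pdf, motifs.html and PNG images); the helper is hypothetical, and because the exact location of the images inside the reports directory is not specified, it also searches subdirectories.

```python
from pathlib import Path
from typing import List


def missing_outputs(output_prefix: str) -> List[str]:
    """Hypothetical helper: return the expected output paths that do not exist."""
    reports = Path(f"{output_prefix}_reports")
    expected = [
        Path(f"{output_prefix}_modisco.h5"),
        reports,
        reports / "motifs.pdf",
        reports / "motifs.html",
    ]
    missing = [str(p) for p in expected if not p.exists()]
    # Look for at least one PNG anywhere under the reports directory.
    if reports.is_dir() and not any(reports.rglob("*.png")):
        missing.append(str(reports / "*.png"))
    return missing
```

An empty list means every expected output is present; otherwise the returned paths point at what is missing.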