De novo motif discovery

Anusri Pampari edited this page Oct 28, 2023 · 11 revisions

The command chrombpnet modisco_motifs will soon be deprecated. Please use the command directly from the tfmodisco-lite repo instead. For ChromBPNet releases <=v0.1.3, the -n (maximum seqlets) value was hardcoded to 50000, so the argument did not function as expected: a run requested with 1000000 seqlets, for example, did not run properly.

To run motif discovery with tfmodisco-lite and generate reports with TOMTOM motif annotations, we provide the following command.

Usage

modisco motifs [-h] -i H5PY -n MAX_SEQLETS -op OUTPUT_PREFIX [-l N_LEIDEN] [-w WINDOW] [-v]

To get a representative summary of all the TF motifs learned by the model, we recommend running tfmodisco-lite on contribution scores from all the peaks, with the -n argument set to 1000000 (1 million). Setting -n to 1 million increases the representative power of modisco-lite, but note that it also increases the computation time.
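The recommendation above can be sketched from Python as follows. This is only an illustration of how the arguments in the usage line fit together, not part of ChromBPNet or tfmodisco-lite: the helper name build_modisco_command and the paths contribs.h5 and results/modisco_run are hypothetical, and the flags are copied from the usage line on this page, so check them against the help output of the version you have installed.

```python
import shlex


def build_modisco_command(h5py_path, output_prefix, max_seqlets=1_000_000,
                          n_leiden=None, window=None, verbose=False):
    """Assemble the argument list for the usage line shown on this page.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "modisco", "motifs",
        "-i", str(h5py_path),       # h5py file with one-hot sequences and shap scores
        "-n", str(max_seqlets),     # maximum number of seqlets per metacluster
        "-op", str(output_prefix),  # output prefix (its directory must already exist)
    ]
    # Optional arguments are added only when the caller sets them.
    if n_leiden is not None:
        cmd += ["-l", str(n_leiden)]
    if window is not None:
        cmd += ["-w", str(window)]
    if verbose:
        cmd.append("-v")
    return cmd


# Hypothetical input file and output prefix, with -n at the recommended 1 million.
cmd = build_modisco_command("contribs.h5", "results/modisco_run")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the input file and the output directory exist.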

Input Format

required arguments:
  -i H5PY, --h5py H5PY  A legacy h5py file containing the one-hot encoded sequences and shap scores.
  -n MAX_SEQLETS, --max-seqlets MAX_SEQLETS
                        The maximum number of seqlets per metacluster.
  -op OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
                        The path to the output file.

optional arguments:
  -l N_LEIDEN, --n-leiden N_LEIDEN
                        The number of Leiden clusterings to perform with different random seeds.
  -w WINDOW, --window WINDOW
                        The window surrounding the peak center that will be considered for motif discovery.
  -v, --verbose         Controls the amount of output from the code.

Output Format

Note: The output prefix can include a directory path as well as a file-name prefix for the output files. Make sure that the directory in output_prefix exists.

The output from the command will include the following:

  • output_prefix_modisco.h5: A TF-Modisco lite object (for more information, see tfmodisco-lite)
  • output_prefix_reports: A directory containing the following objects
    • motifs.pdf: A PDF file for viewing the modisco report. This report contains the CWM representations of the motifs consolidated from the contribution scores, along with the three closest matching hits from TOMTOM.
    • motifs.html: An HTML file for viewing the modisco report. This report contains the CWM representations of the motifs consolidated from the contribution scores, along with the three closest matching hits from TOMTOM.
    • .png: Images corresponding to the CWMs summarized from the scores and the closest-matching TOMTOM PWMs.
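Because the directory in output_prefix has to exist before the command runs, a small helper can create it and report where the outputs listed above should appear. This is a minimal sketch: the function name prepare_output_prefix is hypothetical, and it assumes that the names output_prefix_modisco.h5 and output_prefix_reports mean the prefix followed directly by _modisco.h5 and _reports.

```python
from pathlib import Path


def prepare_output_prefix(output_prefix):
    """Create the parent directory of the prefix and return the expected outputs.

    Assumption: outputs are named by appending "_modisco.h5" and "_reports"
    directly to the prefix, as the listing on this page suggests.
    """
    prefix = Path(output_prefix)
    # The command expects this directory to exist already, so create it up front.
    prefix.parent.mkdir(parents=True, exist_ok=True)
    return {
        "modisco_h5": Path(f"{prefix}_modisco.h5"),
        "reports_dir": Path(f"{prefix}_reports"),
    }
```

For example, prepare_output_prefix("results/sample1") would create the results directory if needed and point to results/sample1_modisco.h5 and results/sample1_reports.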