# Makefile preparation
This notebook prepares the makefiles needed for the context-specific and dynamic gene regulatory network (GRN) inference pipeline.

1. Generate makefiles from the template using the helper script `makefile_template.sh`

Usage:

In [1]:
!dictys_helper makefile_template.sh -h

Usage: makefile_template.sh [-h] [makefile1.mk ...]
Generate network inference pipeline makefiles in current working folder from template
  makefile1.mk ...    Name of each makefile to generate from template.
                      If omitted, all available makefiles will be generated.
  -h                  Display this help


In [2]:
%%bash
rm -Rf ../makefiles
mkdir ../makefiles
cd ../makefiles
dictys_helper makefile_template.sh common.mk config.mk env_none.mk static.mk


2. Update `config.mk` based on the dataset and your computing platform using the helper script `makefile_update.py`

Usage:

In [3]:
!dictys_helper makefile_update.py -h

usage: makefile_update.py [-h] makefile_path json_string

Updates makefile variable assignments with values provided in json string

positional arguments:
  makefile_path  Path of makefile to update and rewritten.
  json_string    Update to be made in json format:
                 {"variable_name":"new_value"}. Variable names can have "+"
                 suffix to indicate appending to current value.

optional arguments:
  -h, --help     show this help message and exit
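
To make the `json_string` argument concrete, here is a minimal Python sketch of how such an update could be applied to makefile text. It is not the Dictys implementation: the function `update_makefile_text` and the variable `EXTRA_ARGS` are hypothetical, only plain `NAME=value` assignments are handled, and the `+` suffix is assumed to concatenate the new value directly onto the current one.

```python
import json
import re


def update_makefile_text(text, json_string):
    """Return makefile text with variable assignments updated from a JSON object.

    Hypothetical helper for illustration only; not the Dictys implementation.
    Handles plain `NAME=value` lines and treats a trailing "+" on a key as
    "append to the current value" (assumed to be direct concatenation).
    """
    updates = json.loads(json_string)
    lines = text.split("\n")
    for key, value in updates.items():
        append = key.endswith("+")
        name = key[:-1] if append else key
        pattern = re.compile(r"^" + re.escape(name) + r"\s*=(.*)$")
        for i, line in enumerate(lines):
            match = pattern.match(line)
            if match is None:
                continue
            new_value = match.group(1) + value if append else value
            lines[i] = name + "=" + new_value
            break
        else:
            # No assignment line was found for this variable
            raise KeyError("variable " + name + " not found in makefile")
    return "\n".join(lines)


# EXTRA_ARGS is a made-up variable used only to demonstrate the "+" suffix
example = "NTH=4\nDEVICE=cpu\nEXTRA_ARGS=--a 1"
updated = update_makefile_text(example, '{"DEVICE": "cuda:0", "EXTRA_ARGS+": " --b 2"}')
print(updated)
# NTH=4
# DEVICE=cuda:0
# EXTRA_ARGS=--a 1 --b 2
```

The sketch raises an error for unknown variable names so that a typo in the JSON keys does not pass silently; whether the real helper behaves this way is not stated in the help text above.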


You should edit the following variable values based on your own computing platform:
* `NTH`: The number of CPU cores to use for each job. Note that the total number of cores used is this value multiplied by the number of parallel jobs (see the network inference notebooks).
* `DEVICE`: The device to use for pytorch. To use `cuda:0` or another CUDA device, you need a compatible GPU and must specify a suitable CUDA version during Dictys installation. **Note: using CPU may take days or over a week when you run [3-static-inference.ipynb](3-static-inference.ipynb) for this example.**
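
Before choosing these two values, it can help to check what the machine actually offers. The following is a rough sketch rather than part of Dictys: `total_threads` and `pick_device` are hypothetical helpers, the number of parallel jobs is an assumed placeholder, and the device check simply falls back to `cpu` when pytorch is not importable or reports no CUDA device.

```python
import os


def total_threads(nth, parallel_jobs):
    """Total CPU threads requested: NTH per job times the number of parallel jobs."""
    return nth * parallel_jobs


def pick_device(preferred="cuda:0"):
    """Return `preferred` if pytorch reports a usable CUDA device, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


nth = 4            # NTH value used in this notebook
parallel_jobs = 8  # assumed placeholder; set to the number of jobs you plan to run in parallel
needed = total_threads(nth, parallel_jobs)
available = os.cpu_count() or 1

print("CPU threads requested:", needed, "available:", available)
if needed > available:
    print("Warning: requested threads exceed available cores; consider lowering NTH or the job count")
print("Suggested DEVICE:", pick_device())
```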

In [4]:
!dictys_helper makefile_update.py ../makefiles/config.mk '{"ENVMODE": "none", "NTH": "4", "DEVICE": "cuda:0", "GENOME_MACS2": "hs", "JOINT": "1"}'


Other variables:
* `ENVMODE`: The environment mode in which to run Dictys. `none` means Dictys can be run directly, without any additional steps to enter an environment.
* `GENOME_MACS2`: Genome size passed to MACS2. Can be a number or a shortcut such as `hs`.
* `JOINT`: Whether the dataset is a joint quantification of transcriptome and chromatin accessibility. Affects multiple preprocessing steps like cell subsetting, cell removal, and quality checks.
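
Hand-writing the JSON string inside shell quotes is easy to get wrong. As one possible alternative, the command line can be assembled in Python with the standard `json` and `shlex` modules; this sketch only builds and prints the command, using the same values as the cell above, and does not run it.

```python
import json
import shlex

# Same variable values as passed to makefile_update.py in this notebook
settings = {
    "ENVMODE": "none",
    "NTH": "4",
    "DEVICE": "cuda:0",
    "GENOME_MACS2": "hs",
    "JOINT": "1",
}

# json.dumps guarantees valid JSON; shlex.quote makes it safe as a single shell argument
json_string = json.dumps(settings)
command = "dictys_helper makefile_update.py ../makefiles/config.mk " + shlex.quote(json_string)
print(command)
```

The printed command can be pasted into a notebook cell after `!`, or passed to `subprocess.run` with `shell=True` if you prefer to stay in Python.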

Inspect the updated configuration makefile:

In [5]:
!cat ../makefiles/config.mk

# Lingfei Wang, 2022. All rights reserved.
#This file contains parameters for whole run and individual steps to be edited for your dataset
#This file should be edited to configure the run
#This file should NOT be directly used for any run with `makefile -f` 

############################################################
# Run environment settings
############################################################
#Which environment to use, corresponding to env_$(ENVMODE).mk file
ENVMODE=none
#Maximum number of CPU threads for each job
#This is only nominated and passed through to other softwares without any guarantee.
NTH=4
#Device name for pyro/pytorch
#Note: cuda devices other than cuda:0 could be incompatible with singularity environment
DEVICE=cuda:0

############################################################
# Dataset settings
############################################################

#Genome size for Macs2, accept shortcuts like mm & hs
GENOME_MACS2=hs
#Whethe

3. Check the input data before the context-specific GRN inference run

These checks are strongly recommended: they catch errors early, which reduces reruns of the inference pipeline and saves running time. If you find any error or unexpected output here, fix the input files in the `data` or `makefiles` folder, either by regenerating the relevant files in earlier steps or by editing them by hand. Then rerun the checks to make sure everything is as expected.

In [6]:
!cd .. && dictys_helper makefile_check.py

Joint profile: True
Found 11898 cells with RNA profile
Found 24036 genes with RNA profile
Found 11898 cells with ATAC profile
Found 769 motifs
Found 678 TFs
Found 461 TFs in current dataset
Missing 217 TFs in current dataset: ANDR,AP2A,AP2B,AP2C,AP2D,ARI3A,ARI5B,ATF6A,BARH1,BARH2,BC11A,BHA15,BHE22,BHE23,BHE40,BHE41,BMAL1,BRAC,BSH,COE1,COT1,COT2,CR3L1,CR3L2,ERR1,ERR2,ERR3,EVI1,GCR,HEN1,HMBX1,HME1,HME2,HNF6,HTF4,HXA1,HXA10,HXA11,HXA13,HXA2,HXA5,HXA7,HXA9,HXB1,HXB13,HXB2,HXB3,HXB4,HXB6,HXB7,HXB8,HXC10,HXC11,HXC12,HXC13,HXC6,HXC8,HXC9,HXD10,HXD11,HXD12,HXD13,HXD3,HXD4,HXD8,HXD9,ITF2,KAISO,MCR,MGAP,MLXPL,MYBA,MYBB,NDF1,NDF2,NF2L1,NF2L2,NFAC1,NFAC2,NFAC3,NFAC4,NGN2,NKX21,NKX22,NKX23,NKX25,NKX28,NKX31,NKX32,NKX61,NKX62,ONEC2,ONEC3,OZF,P53,P5F1B,P63,P73,PEBB,PHX2A,PHX2B,PIT1,PKNX1,PLAL1,PO2F1,PO2F2,PO2F3,PO3F1,PO3F2,PO3F3,PO3F4,PO4F1,PO4F2,PO4F3,PO5F1,PO6F1,PO6F2,PRD14,PRGR,RHXF1,RORG,RX,SMCA1,SMCA5,SRBP1,SRBP2,STA5A,STA5B,STF1,SUH,TF2LX,TF65,TF7L1,TF7L2,TFE2,THA,THA11,THB,TWST1,TYY1,TYY2,UBIP