DeepSEQreen is a PyTorch-based CPI prediction toolkit with a unified data pipeline and a modular framework. CDPN-VS is a barebone version of DeepSEQreen for demonstrative purposes.
Install PyTorch according to instructions: https://pytorch.org/get-started/locally/.
Install other requirements:
pip install -r requirements.txtTrain model with a preset model configuration from configs/preset/ and a sweep (experiment) configuration from configs/sweep/:
# Using preset model `graph_dta_gin` for GraphDTA-GIN and experiment `demo` for pre-split demo dataset.
# Train on GPU
python -m deepseqreen.train preset=graph_dta_gin sweep=demo trainer=gpu tags=[demo,graph_dta_gin]python -m deepseqreen.train preset=graph_dta_gin sweep=demo ckpt_path=/path/to/ckpt/name.ckpt tags=[demo,graph_dta_gin]Note: Checkpoint can be either path or URL.
python -m deepseqreen.test preset=graph_dta_gin data.data_file=/path/to/test.csv ckpt_path=/path/to/ckpt_name.ckpt tags=[demo,graph_dta_gin]DeepSEQreen requires input data files in CSV format similar to those from TDC. The CSV file should contain the following columns:
ID1: The identifier for the compounds (optional).X1: The SMILES for the compounds.ID2: The identifier for the targets (optional).X2: The sequence for the targets.Y: The label for the interaction (e.g., binary interaction, binding affinity, etc.).
ID1,X1,ID2,X2,Y
DB06075,C[C@]1(O)C[C@@H](c2nc(-c3ccc4ccc(-c5ccccc5)nc4c3)c3c(N)nccn32)C1,Q9UPZ9,MNRYTTIRQLGDGTYGSVLLGRSIESGELIAIKKMKRKFYSWEECMNLREVKSLKKLNHANVVKLKEVIRENDHLYFIFEYMKENLYQLIKERNKLFPESAIRNIMYQILQGLAFIHKHGFFHRDLKPENLLCMGPELVKIADFGLAREIRSKPPYTDYVSTRWYRAPEVLLRSTNYSSPIDVWAVGCIMAEVYTLRPLFPGASEIDTIFKICQVLGTPKKTDWPEGYQLSSAMNFRWPQCVPNNLKTLIPNASSEAVQLLRDMLQWDPKKRPTASQALRYPYFQVGHPLGSTTQNLQDSEKPQKGILEKAGPPPYIKPVPPAQPPAKPHTRISSRQHQASQPPLHLTYPYKAEVSRTDHPSHLQEDKPSPLLFPSLHNKHPQSKITAGLEHKNGEIKPKSRRRWGLISRSTKDSDDWADLDDLDFSPSLSRIDLKNKKRQSDDTLCRFESVLDLKPSEPVGTGNSAPTQTSYQRRDTPTLRSAAKQHYLKHSRYLPGISIRNGILSNPGKEFIPPNPWSSSGLSGKSSGTMSVISKVNSVGSSSTSSSGLTGNYVPSFLKKEIGSAMQRVHLAPIPDPSPGYSSLKAMRPHPGRPFFHTQPRSTPGLIPRPPAAQPVHGRTDWASKYASRR,0
DB06075,C[C@]1(O)C[C@@H](c2nc(-c3ccc4ccc(-c5ccccc5)nc4c3)c3c(N)nccn32)C1,Q9Y2K2,MAAAAASGAGGAAGAGTGGAGPAGRLLPPPAPGSPAAPAAVSPAAGQPRPPAPASRGPMPARIGYYEIDRTIGKGNFAVVKRATHLVTKAKVAIKIIDKTQLDEENLKKIFREVQIMKMLCHPHIIRLYQVMETERMIYLVTEYASGGEIFDHLVAHGRMAEKEARRKFKQIVTAVYFCHCRNIVHRDLKAENLLLDANLNIKIADFGFSNLFTPGQLLKTWCGSPPYAAPELFEGKEYDGPKVDIWSLGVVLYVLVCGALPFDGSTLQNLRARVLSGKFRIPFFMSTECEHLIRHMLVLDPNKRLSMEQICKHKWMKLGDADPNFDRLIAECQQLKEERQVDPLNEDVLLAMEDMGLDKEQTLQSLRSDAYDHYSAIYSLLCDRHKRHKTLRLGALPSMPRALAFQAPVNIQAEQAGTAMNISVPQVQLINPENQIVEPDGTLNLDSDEGEEPSPEALVRYLSMRRHTVGVADPRTEVMEDLQKLLPGFPGVNPQAPFLQVAPNVNFMHNLLPMQNLQPTGQLEYKEQSLLQPPTLQLLNGMGPLGRRASDGGANIQLHAQQLLKRPRGPSPLVTMTPAVPAVTPVDEESSDGEPDQEAVQSSTYKDSNTLHLPTERFSPVRRFSDGAASIQAFKAHLEKMGNNSSIKQLQQECEQLQKMYGGQIDERTLEKTQQQHMLYQQEQHHQILQQQIQDSICPPQPSPPLQAACENQPALLTHQLQRLRIQPSSPPPNHPNNHLFRQPSNSPPPMSSAMIQPHGAASSSQFQGLPSRSAIFQQQPENCSSPPNVALTCLGMQQPAQSQQVTIQVQEPVDMLSNMPGTAAGSSGRGISISPSAGQMQMQHRTNLMATLSYGHRPLSKQLSADSAEAHSLNVNRFSPANYDQAHLHPHLFSDQSRGSPSSYSPSTGVGFSPTQALKVPPLDQFPTFPPSAHQQPPHYTTSALQQALLSPTPPDYTRHQQVPHILQGLLSPRHSLTGHSDIRLPPTEFAQLIKRQQQQRQQQQQQQQQQEYQELFRHMNQGDAGSLAPSLGGQSMTERQALSYQNADSYHHHTSPQHLLQIRAQECVSQASSPTPPHGYAHQPALMHSESMEEDCSCEGAKDGFQDSKSSSTLTKGCHDSPLLLSTGGPGDPESLLGTVSHAQELGIHPYGHQPTAAFSKNKVPSREPVIGNCMDRSSPGQAVELPDHNGLGYPARPSVHEHHRPRALQRHHTIQNSDDAYVQLDNLPGMSLVAGKALSSARMSDAVLSQSSLMGSQQFQDGENEECGASLGGHEHPDLSDGSQHLNSSCYPSTCITDILLSYKHPEVSFSMEQAGV,0
DB06075,C[C@]1(O)C[C@@H](c2nc(-c3ccc4ccc(-c5ccccc5)nc4c3)c3c(N)nccn32)C1,Q76MJ5,MASAVRGSRPWPRLGLQLQFAALLLGTLSPQVHTLRPENLLLVSTLDGSLHALSKQTGDLKWTLRDDPVIEGPMYVTEMAFLSDPADGSLYILGTQKQQGLMKLPFTIPELVHASPCRSSDGVFYTGRKQDAWFVVDPESGETQMTLTTEGPSTPRLYIGRTQYTVTMHDPRAPALRWNTTYRRYSAPPMDGSPGKYMSHLASCGMGLLLTVDPGSGTVLWTQDLGVPVMGVYTWHQDGLRQLPHLTLARDTLHFLALRWGHIRLPASGPRDTATLFSTLDTQLLMTLYVGKDETGFYVSKALVHTGVALVPRGLTLAPADGPTTDEVTLQVSGEREGSPSTAVRYPSGSVALPSQWLLIGHHELPPVLHTTMLRVHPTLGSGTAETRPPENTQAPAFFLELLSLSREKLWDSELHPEEKTPDSYLGLGPQDLLAASLTAVLLGGWILFVMRQQQPQVVEKQQETPLAPADFAHISQDAQSLHSGASRRSQKRLQSPSKQAQPLDDPEAEQLTVVGKISFNPKDVLGRGAGGTFVFRGQFEGRAVAVKRLLRECFGLVRREVQLLQESDRHPNVLRYFCTERGPQFHYIALELCRASLQEYVENPDLDRGGLEPEVVLQQLMSGLAHLHSLHIVHRDLKPGNILITGPDSQGLGRVVLSDFGLCKKLPAGRCSFSLHSGIPGTEGWMAPELLQLLPPDSPTSAVDIFSAGCVFYYVLSGGSHPFGDSLYRQANILTGAPCLAHLEEEVHDKVVARDLVGAMLSPLPQPRPSAPQVLAHPFFWSRAKQLQFFQDVSDWLEKESEQEPLVRALEAGGCAVVRDNWHEHISMPLQTDLRKFRSYKGTSVRDLLRAVRNKKHHYRELPVEVRQALGQVPDGFVQYFTNRFPRLLLHTHRAMRSCASESLFLPYYPPDSEARRPCPGATGR,0