🚀 K-talysticFlow (KAST): Deep Learning Molecular Screening Pipeline


  __  __    _     ____  _____ 
  | |/ /   / \   / ___||_   _|
  | ' /   / _ \  \___ \  | |  
  | . \  / ___ \  ___) | | |  
  |_|\_\/_/   \_\|____/  |_|  K-talystic Automated Screening Taskflow

👨‍🔬 What is K-talysticFlow?

K-talysticFlow (KAST) is an open-source virtual screening pipeline and computational chemistry tool designed to accelerate drug discovery. Built with Python, DeepChem, and TensorFlow, it automates the evaluation of antituberculosis agents and other molecular compounds using advanced Deep Learning models.

Version: 1.0.0 (Stable)
Developed at: Laboratory of Molecular Modeling (LMM-UEFS), funded by CNPq


✨ Features

  • ⚡ Interactive CLI Menu: Easy step-by-step workflow
  • 🚀 Parallel Processing: 5-10x faster on large datasets with multi-core support
  • 🧠 Deep Learning: DeepChem/TensorFlow neural networks
  • 🎯 K-Prediction Score: Proprietary ranking for predictions
  • 🧪 Validation Suite: ROC/AUC, Cross-Validation, Enrichment Factor, Similarity
  • 🖥️ One-Click Setup: Automated environment creation for Windows and Linux
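This README does not say how KAST computes the Enrichment Factor. For orientation, here is a minimal sketch of the common definition: the hit rate among the top-ranked fraction of molecules divided by the hit rate in the whole set. The function name, arguments, and toy data are illustrative and not part of KAST.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF at `fraction`: hit rate among top-scored molecules / overall hit rate."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(labels)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")
    # Rank molecules from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(n_total * fraction)))
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (n_actives / n_total)


# Toy data: 10 molecules, 2 actives, both ranked inside the top 20%.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(scores, labels, fraction=0.2)
```

An EF of 1 means the ranking is no better than random selection; larger values mean actives are concentrated near the top of the list.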

📁 Folder Structure

KAST/
├── bin/                    # Pipeline scripts (1-5)
├── data/                   # Input SMILES files (.smi)
├── results/                # Outputs (logs, models, reports)
├── settings.py             # Configuration & parallel processing
├── main.py                 # Interactive menu
├── setup.exe               # Windows automated setup
├── setup.sh                # Linux setup script
└── README.md               # This file

⚙️ Requirements

  • Python: 3.9+
  • Conda: Required for environment setup
  • Main packages: RDKit, DeepChem, TensorFlow, scikit-learn, pandas, numpy, joblib
  • RAM: 4GB+ (8GB+ recommended for parallel processing)

🚀 Installation

📦 Windows Users (Easiest Option!)

Using setup.exe (Fully Automated)

1. Download setup.exe from releases
2. Double-click setup.exe in the KAST folder
3. The installer will:
   ✅ Find your Conda installation automatically
   ✅ Create environment 'ktalysticflow' (or update if exists)
   ✅ Install all dependencies
   ✅ Create desktop + Start Menu shortcuts
   ✅ Generate run_kast.bat launcher
4. Click the desktop shortcut to launch KAST!

What does setup.exe do?

  • Locates Conda: Searches standard installation paths (Anaconda3, miniconda3, mambaforge, Program Files, registry)
  • Environment Setup: Creates or updates the ktalysticflow conda environment from environment.yml
  • Creates Launcher: Generates run_kast.bat that automatically activates conda and launches KAST
  • Creates Shortcuts: Desktop and Start Menu shortcuts that run KAST with one click
  • No Terminal Needed: Runs KAST directly without opening Anaconda Prompt or PowerShell
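The internals of setup.exe are not shown in this README. As a rough illustration of the "Locates Conda" step, the Python sketch below checks PATH first and then probes the folder names listed above. The function name, the `Scripts/conda.exe` and `condabin/conda.bat` sub-paths, and the choice of search roots are assumptions, and the registry lookup is left out.

```python
import shutil
from pathlib import Path


def find_conda(search_roots, which=shutil.which):
    """Return the path to a conda executable, or None if none is found."""
    # 1. Prefer whatever is already on PATH.
    on_path = which("conda")
    if on_path:
        return Path(on_path)
    # 2. Fall back to well-known install folder names under each root.
    folder_names = ("Anaconda3", "miniconda3", "mambaforge")
    relative_exes = (Path("Scripts") / "conda.exe", Path("condabin") / "conda.bat")
    for root in search_roots:
        for folder in folder_names:
            for exe in relative_exes:
                candidate = Path(root) / folder / exe
                if candidate.is_file():
                    return candidate
    return None
```

A call such as `find_conda([Path.home(), Path("C:/Program Files")])` would cover a per-user install and one system-wide location.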

Quick Launch After Setup:

  • Click desktop shortcut "K-talysticFlow 1.0.0"
  • Or: Double-click run_kast.bat in the folder
  • Or: Start Menu → K-talysticFlow 1.0.0

🐧 Linux Users

Using setup.sh (Fully Automated)

# Make script executable
chmod +x setup.sh

# Run setup
./setup.sh

What does setup.sh do?

  • Checks Conda: Verifies Conda is installed
  • Creates Environment: Builds ktalysticflow environment from environment.yml
  • Installs Dependencies: All required packages automatically

After Setup:

# Activate environment
conda activate ktalysticflow

# Launch KAST
python main.py

🛠️ Manual Setup (All Platforms)

# Create environment from file
conda env create -f environment.yml -y

# Activate
conda activate ktalysticflow

# Verify installation (optional)
python bin/check_env.py

# Launch KAST
python main.py
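The contents of bin/check_env.py are not reproduced in this README. If you want a quick standalone check, a minimal sketch could look like the following. The import names (for example `sklearn` for scikit-learn) and the function name are assumptions, and the check only confirms that each package can be found, not that it works.

```python
import importlib.util
import sys

# Assumed import names for the packages listed under Requirements.
REQUIRED_MODULES = ("rdkit", "deepchem", "tensorflow", "sklearn", "pandas", "numpy", "joblib")


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 9)):
    """Return (python_ok, missing_modules) without importing the packages."""
    python_ok = sys.version_info >= min_python
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python 3.9+:", "OK" if python_ok else "too old")
    print("Missing packages:", ", ".join(missing) or "none")
```

Using `find_spec` avoids the slow import of TensorFlow just to confirm that it is installed.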

⚡ Parallel Processing

K-talysticFlow supports multi-core parallel processing, reported as 5-10x faster on large datasets (100K+ molecules).

Quick Configuration

Edit settings.py (Section 12):

ENABLE_PARALLEL_PROCESSING = True    # On/Off
N_WORKERS = None                     # None = auto-detect (RECOMMENDED)
PARALLEL_BATCH_SIZE = 100000         # Molecules per batch
PARALLEL_MIN_THRESHOLD = 10000       # Min dataset size for parallel

Worker Options:

  • N_WORKERS = None → Auto-detect optimal cores ✅ RECOMMENDED
  • N_WORKERS = -1 → Use all cores
  • N_WORKERS = 4 → Use exactly 4 cores
  • N_WORKERS = 1 → Disable parallel (sequential only)
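The worker options above can be read as a small mapping from the setting to a concrete core count. The sketch below is one possible interpretation, not KAST's actual code. In particular, the README does not define "auto-detect", so leaving one core free is an assumption, as is capping an explicit request at the number of available cores.

```python
import os


def resolve_n_workers(n_workers, cpu_count=None):
    """Translate an N_WORKERS setting into a concrete worker count."""
    total = cpu_count or os.cpu_count() or 1
    if n_workers is None:
        # Assumed heuristic: leave one core free for the OS and the menu.
        return max(1, total - 1)
    if n_workers == -1:
        return total
    if n_workers < 1:
        raise ValueError("N_WORKERS must be None, -1, or a positive integer")
    # Assumption: never ask for more workers than there are cores.
    return min(n_workers, total)
```

On an 8-core machine this gives 7 workers for `None`, 8 for `-1`, 4 for `4`, and 1 (sequential) for `1`.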

Or Configure Interactively

python main.py
→ [8] Advanced Options
→ [3] Configure CPU Cores
→ Choose auto or specific number

Scripts with Parallel Support

Script                            Speedup
2_featurization.py                5-10x
4_3_tanimoto_similarity.py        3-5x
4_4_learning_curve.py             4-8x
5_0_featurize_for_prediction.py   5-10x

Automatic activation: Parallel mode engages only when the dataset exceeds PARALLEL_MIN_THRESHOLD (10,000 molecules by default).
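To make the threshold and batch-size settings concrete, here is a hedged sketch of how such a gate and batching step could work. The function names are hypothetical and the defaults mirror the values shown in settings.py; the real scripts may decide differently.

```python
def should_run_parallel(n_molecules, enabled=True, min_threshold=10_000, n_workers=4):
    """Parallel mode needs the switch on, more than one worker, and a large dataset."""
    return enabled and n_workers > 1 and n_molecules > min_threshold


def iter_batches(items, batch_size=100_000):
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With these defaults, a dataset of exactly 10,000 molecules runs sequentially, and a dataset of 250,000 molecules is split into three batches (100,000, 100,000, and 50,000).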


🎯 Quick Start

# Launch interactive menu
python main.py

Menu Options:

  1. Prepare & Split Data
  2. Generate Fingerprints
  3. Train Model
  4. Evaluate (ROC/AUC, Cross-Val, etc.)
  5. Predict New Molecules
  6. View Results
  7. Check Data Status
  8. Advanced Tools (env check, parallel test, config)

Or run individual scripts:

python bin/1_preparation.py          # Prepare data
python bin/2_featurization.py        # Featurize molecules
python bin/3_create_training.py      # Train model
python bin/4_0_evaluation_main.py    # Evaluate
python bin/5_1_run_prediction.py     # Predict
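If you prefer to chain the individual scripts instead of using the menu, a small driver along these lines is one option. It is a sketch only: the README does not say whether the scripts run unattended or need extra steps in between (for example 5_0_featurize_for_prediction.py before prediction), and `build_commands` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Script order taken from the list above.
PIPELINE = (
    "1_preparation.py",
    "2_featurization.py",
    "3_create_training.py",
    "4_0_evaluation_main.py",
    "5_1_run_prediction.py",
)


def build_commands(bin_dir="bin", python=sys.executable):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, str(Path(bin_dir) / script)] for script in PIPELINE]


def run_pipeline(bin_dir="bin"):
    """Run each step in turn and stop at the first failure."""
    for command in build_commands(bin_dir):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the KAST folder, with the ktalysticflow environment active, runs the steps with the same interpreter that launched the driver.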

📊 Outputs

All results are saved to the results/ folder:

results/
├── 01_train_set.csv                 # Training data
├── 01_test_set.csv                  # Test data
├── 4_0_evaluation_report.txt        # Main metrics (AUC, accuracy)
├── 4_1_cross_validation_results.txt # Cross-validation scores
├── 4_2_enrichment_factor.txt        # Enrichment analysis
├── 4_3_tanimoto_similarity.txt      # Similarity analysis
├── 4_4_learning_curve.txt           # Model learning progression
├── 05_new_molecule_predictions.csv  # Predicted molecules (K-Score ranked)
├── plots/                           # ROC, Learning Curves (PNG/PDF)
└── logs/                            # kast_YYYYMMDD.log
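For readers unfamiliar with the similarity report, the Tanimoto coefficient between two fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below works on plain Python sets of bit indices. It is not KAST's implementation, which presumably relies on RDKit fingerprints, and the helper names are illustrative.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def max_similarity_to_training(test_fp, training_fps):
    """Highest Tanimoto similarity between one test molecule and the training set."""
    return max((tanimoto(test_fp, fp) for fp in training_fps), default=0.0)
```

A test molecule whose maximum similarity to the training set is low lies outside the chemistry the model has seen, so its prediction deserves more caution.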

❓ FAQ

Q: Which setup should I use?
A: Windows → setup.exe (one-click). Linux → ./setup.sh. Both handle everything automatically.

Q: Do I need to type conda activate every time?
A: Not on Windows: the shortcuts and run_kast.bat activate the environment automatically. On Linux, run conda activate ktalysticflow before python main.py.

Q: How much faster is parallel processing?
A: 5-10x faster for large datasets (100K+ molecules). Automatic on/off based on dataset size.

Q: Where are my results?
A: All outputs are in the results/ folder (logs, models, plots, CSVs).

Q: My setup failed - what do I do?
A: Run python bin/check_env.py to diagnose. Check the logs in results/logs/.

Q: Can I change settings after setup?
A: Yes! Edit settings.py anytime or use menu option [8] → [3] for interactive config.



👥 Authors

  • Késsia Souza Santos - GitHub | LinkedIn
  • Advisor: Prof. Dr. Manoelito C. Santos Junior
  • Lab: Laboratory of Molecular Modeling (LMM-UEFS)
  • Funding: CNPq (Brazilian National Research Council)

📜 License

MIT License: see the LICENSE file


Questions or bugs? Open an issue or email kelsouzs.uefs@gmail.com
