KAST: K-talystic Automated Screening Taskflow
K-talysticFlow (KAST) is an open-source virtual screening pipeline and computational chemistry tool designed to accelerate drug discovery. Built with Python, DeepChem, and TensorFlow, it automates the evaluation of antituberculosis agents and other molecular compounds using advanced Deep Learning models.
Version: 1.0.0 (Stable)
Developed at: Laboratory of Molecular Modeling (LMM-UEFS). Funded by CNPq.
- Interactive CLI Menu: Easy step-by-step workflow
- Parallel Processing: Up to 5-10x faster on large datasets with multi-core support
- Deep Learning: DeepChem/TensorFlow neural networks
- K-Prediction Score: Proprietary ranking for predictions
- Validation Suite: ROC/AUC, Cross-Validation, Enrichment Factor, Similarity
- One-Click Setup: Automated environment creation for Windows and Linux
KAST/
├── bin/          # Pipeline scripts (1-5)
├── data/         # Input SMILES files (.smi)
├── results/      # Outputs (logs, models, reports)
├── settings.py   # Configuration & parallel processing
├── main.py       # Interactive menu
├── setup.exe     # Windows automated setup
├── setup.sh      # Linux setup script
└── README.md     # This file
- Python: 3.9+
- Conda: Required for environment setup
- Main packages: RDKit, DeepChem, TensorFlow, scikit-learn, pandas, numpy, joblib
- RAM: 4GB+ (8GB+ recommended for parallel processing)
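As a quick sanity check of the requirements above, the following minimal sketch reports whether the interpreter meets the 3.9+ requirement and which of the listed packages cannot be found. It is not the project's own `bin/check_env.py`; the helper names `python_is_supported` and `find_missing_modules` are hypothetical, and the list uses import names, which can differ from package names (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names for the main packages listed above (assumed mapping:
# RDKit -> rdkit, DeepChem -> deepchem, scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "rdkit", "deepchem", "tensorflow", "sklearn", "pandas", "numpy", "joblib",
]
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=None):
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located (nothing is imported)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    missing = find_missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Running it inside the activated `ktalysticflow` environment should list no missing modules if setup completed.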
Using setup.exe (Fully Automated)
1. Download setup.exe from releases
2. Double-click setup.exe in the KAST folder
3. The installer will:
   - Find your Conda installation automatically
   - Create environment 'ktalysticflow' (or update it if it exists)
   - Install all dependencies
   - Create desktop + Start Menu shortcuts
   - Generate run_kast.bat launcher
4. Click the desktop shortcut to launch KAST!
What does setup.exe do?
- Locates Conda: Searches standard installation paths (Anaconda3, miniconda3, mambaforge, Program Files, registry)
- Environment Setup: Creates or updates the ktalysticflow conda environment from environment.yml
- Creates Launcher: Generates run_kast.bat that automatically activates conda and launches KAST
- Creates Shortcuts: Desktop and Start Menu shortcuts that run KAST with one click
- No Terminal Needed: Runs KAST directly without opening Anaconda Prompt or PowerShell
Quick Launch After Setup:
- Click desktop shortcut "K-talysticFlow 1.0.0"
- Or: Double-click run_kast.bat in the folder
- Or: Start Menu → K-talysticFlow 1.0.0
Using setup.sh (Fully Automated)
# Make script executable
chmod +x setup.sh
# Run setup
./setup.sh
What does setup.sh do?
- Checks Conda: Verifies Conda is installed
- Creates Environment: Builds the ktalysticflow environment from environment.yml
- Installs Dependencies: All required packages automatically
After Setup:
# Activate environment
conda activate ktalysticflow
# Launch KAST
python main.py
Manual setup (alternative to the setup scripts):
# Create environment from file
conda env create -f environment.yml -y
# Activate
conda activate ktalysticflow
# Verify installation (optional)
python bin/check_env.py
# Launch KAST
python main.py
K-talysticFlow supports multi-core parallel processing, which can run 5-10x faster on large datasets.
Edit settings.py (Section 12):
ENABLE_PARALLEL_PROCESSING = True # On/Off
N_WORKERS = None # None = auto-detect (RECOMMENDED)
PARALLEL_BATCH_SIZE = 100000 # Molecules per batch
PARALLEL_MIN_THRESHOLD = 10000 # Min dataset size for parallel
Worker Options:
- N_WORKERS = None: Auto-detect optimal cores (RECOMMENDED)
- N_WORKERS = -1: Use all cores
- N_WORKERS = 4: Use exactly 4 cores
- N_WORKERS = 1: Disable parallel (sequential only)
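The worker options can be read as a small mapping from the `N_WORKERS` setting to a concrete core count. The sketch below illustrates one way such a mapping could work; it is not KAST's actual logic. The function name `resolve_n_workers` is hypothetical, and two behaviors are assumptions: auto-detect leaves one core free, and an explicit count is capped at the number of available cores.

```python
import os


def resolve_n_workers(n_workers, cpu_count=None):
    """Translate an N_WORKERS setting into a concrete number of workers."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    if n_workers is None:
        # ASSUMPTION: "auto-detect" keeps one core free for the rest of the system.
        return max(1, cpu_count - 1)
    if n_workers == -1:
        return cpu_count  # use all cores
    if n_workers >= 1:
        # ASSUMPTION: never request more workers than there are cores.
        return min(n_workers, cpu_count)
    raise ValueError(f"Unsupported N_WORKERS value: {n_workers!r}")


if __name__ == "__main__":
    for setting in (None, -1, 4, 1):
        print(f"N_WORKERS = {setting!r} -> {resolve_n_workers(setting)} worker(s)")
```

A result of 1 corresponds to sequential execution, matching the `N_WORKERS = 1` option.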
python main.py
→ [8] Advanced Options
→ [3] Configure CPU Cores
→ Choose auto or specific number
| Script | Speedup |
|---|---|
| 2_featurization.py | 5-10x |
| 4_3_tanimoto_similarity.py | 3-5x |
| 4_4_learning_curve.py | 4-8x |
| 5_0_featurize_for_prediction.py | 5-10x |
Automatic activation: Parallel mode only engages when dataset > 10,000 molecules.
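To make the activation rule concrete, here is a minimal sketch of how the three settings could interact: parallel mode only when the switch is on, more than one worker is available, and the dataset exceeds the threshold, with the data then split into batches. The constant names and values come from settings.py as shown earlier; the helpers `should_run_parallel` and `iter_batches` are hypothetical and not taken from the KAST code base.

```python
# Values as shown in settings.py (Section 12).
ENABLE_PARALLEL_PROCESSING = True
PARALLEL_BATCH_SIZE = 100000
PARALLEL_MIN_THRESHOLD = 10000


def should_run_parallel(n_molecules, n_workers):
    """Parallel mode needs the switch on, >1 worker, and a dataset above the threshold."""
    return (
        ENABLE_PARALLEL_PROCESSING
        and n_workers > 1
        and n_molecules > PARALLEL_MIN_THRESHOLD
    )


def iter_batches(items, batch_size=PARALLEL_BATCH_SIZE):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    smiles = ["CCO"] * 250000  # placeholder molecules for illustration only
    if should_run_parallel(len(smiles), n_workers=4):
        sizes = [len(batch) for batch in iter_batches(smiles)]
        print("Parallel mode, batch sizes:", sizes)
    else:
        print("Sequential mode")
```

Under these assumptions, a dataset of exactly 10,000 molecules stays sequential because the comparison is strict.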
# Launch interactive menu
python main.py
Menu Options:
- Prepare & Split Data
- Generate Fingerprints
- Train Model
- Evaluate (ROC/AUC, Cross-Val, etc.)
- Predict New Molecules
- View Results
- Check Data Status
- Advanced Tools (env check, parallel test, config)
Or run individual scripts:
python bin/1_preparation.py # Prepare data
python bin/2_featurization.py # Featurize molecules
python bin/3_create_training.py # Train model
python bin/4_0_evaluation_main.py # Evaluate
python bin/5_1_run_prediction.py # Predict
All results are saved to the results/ folder:
results/
├── 01_train_set.csv                    # Training data
├── 01_test_set.csv                     # Test data
├── 4_0_evaluation_report.txt           # Main metrics (AUC, accuracy)
├── 4_1_cross_validation_results.txt    # Cross-validation scores
├── 4_2_enrichment_factor.txt           # Enrichment analysis
├── 4_3_tanimoto_similarity.txt         # Similarity analysis
├── 4_4_learning_curve.txt              # Model learning progression
├── 05_new_molecule_predictions.csv     # Predicted molecules (K-Score ranked)
├── plots/                              # ROC, Learning Curves (PNG/PDF)
└── logs/                               # kast_YYYYMMDD.log
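For readers unfamiliar with the enrichment analysis in the report files, the sketch below computes the standard enrichment factor: the hit rate among the top-scored fraction of molecules divided by the hit rate in the whole set. It is a generic illustration, not the code behind 4_2_enrichment_factor.txt; the function name, the `top_fraction` parameter, and the example scores and labels are all hypothetical.

```python
def enrichment_factor(scores, labels, top_fraction=0.01):
    """EF = (hit rate in the top-scored fraction) / (hit rate in the whole set).

    scores: predicted scores, higher means more likely active.
    labels: 1 for known actives, 0 for inactives.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active (label 1) is required")
    # Rank molecules from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (total_actives / len(ranked))


if __name__ == "__main__":
    # Made-up example: 10 molecules, the 2 actives are ranked first.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(scores, labels, top_fraction=0.2))
```

An EF of 1 means the ranking is no better than random selection; larger values mean actives are concentrated at the top of the list.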
Q: Which setup should I use?
A: On Windows, use setup.exe (one-click). On Linux, use ./setup.sh. Both handle everything automatically.
Q: Do I need to type conda activate every time?
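A: Not on Windows: the shortcuts and run_kast.bat activate the environment automatically. On Linux, run conda activate ktalysticflow in each new terminal before launching KAST.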
Q: How much faster is parallel processing?
A: Roughly 5-10x faster for large datasets (100K+ molecules), depending on the script. Parallel mode turns on automatically for datasets above 10,000 molecules.
Q: Where are my results?
A: All outputs are in the results/ folder (logs, models, plots, CSVs).
Q: My setup failed - what do I do?
A: Run python bin/check_env.py to diagnose. Check the logs in results/logs/.
Q: Can I change settings after setup?
A: Yes! Edit settings.py anytime, or use menu option [8] → [3] for interactive configuration.
- Full Documentation
- LinkedIn
- GitHub Profile
- LMM Laboratory
- Késsia Souza Santos: GitHub | LinkedIn
- Advisor: Prof. Dr. Manoelito C. Santos Junior
- Lab: Laboratory of Molecular Modeling (LMM-UEFS)
- Funding: CNPq (Brazilian National Research Council)
MIT License. See the LICENSE file.
Questions or bugs? Open an issue or email kelsouzs.uefs@gmail.com