# Calculating default variables

In this notebook, we'll calculate descriptor (variable) sets 1-4:
1. Canonical (UniProt) sequence variables.
2. Structure (PDB) sequence variables.
3. Structure variables (angles, distances, etc.).
4. Ligand variables.

In [2]:
import logging
import warnings
from random import sample
from pathlib import Path

# Suppress import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    from kinactive import DefaultFeatures, DB, DBConfig

Configure logging, the number of processes, and the input and output paths.

In [3]:
logging.basicConfig(level=logging.INFO)

In [4]:
N_PROC = 20
N_CHAINS = 20  # Restrict the number of chains for demonstration

BASE = Path('../data/variable_sets')
BASE.mkdir(parents=True, exist_ok=True)  # Also create missing parent directories

DB_PATH = Path('../data/db_v3')

In [5]:
paths = list(DB_PATH.glob('*'))
if N_CHAINS is not None:
    # Sample random chains to calculate the variables on
    # (capped at the number of available paths, so `sample` cannot raise).
    paths = sample(paths, min(N_CHAINS, len(paths)))
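
The sampling step above draws a different subset on every run. If a repeatable subset is preferable for a demonstration, one option is a seeded sampler like the sketch below. The helper `sample_paths` and its `seed` parameter are hypothetical names introduced here for illustration; they are not part of `kinactive`.

```python
from pathlib import Path
from random import Random


def sample_paths(paths, n, seed=0):
    """Return up to `n` paths, chosen reproducibly for a given seed.

    Hypothetical helper for illustration; not part of kinactive.
    """
    # Sort first so the result does not depend on filesystem listing order.
    ordered = sorted(paths)
    if n is None or n >= len(ordered):
        return ordered
    # A dedicated Random instance leaves the global random state untouched.
    return Random(seed).sample(ordered, n)


# Usage with made-up path names: the same seed gives the same subset.
demo_paths = [Path(f"chain_{i}") for i in range(100)]
picked = sample_paths(demo_paths, 20, seed=42)
print(len(picked))
```

In the notebook, this would amount to replacing the `sample(paths, N_CHAINS)` call with `sample_paths(paths, N_CHAINS)`.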

In [6]:
db = DB(DBConfig(io_cpus=N_PROC))
chains = db.load(paths)

INFO:kinactive.db:Got 20 initial paths to read


Reading ChainSequence: 0it [00:00, ?it/s]

Reading ChainStructure: 0it [00:00, ?it/s]

Recovering ancestry for sequences and structures:   0%|          | 0/20 [00:00<?, ?it/s]

INFO:kinactive.db:Parsed 20 `Chain`s


In [7]:
vs = DefaultFeatures()
?vs.calculate_all_vs

Signature:
vs.calculate_all_vs(
    chains: collections.abc.Sequence[lXtractor.core.chain.chain.Chain],
    map_name: str = 'PK',
    num_proc: int | None = None,
    verbose: bool = True,
    base: pathlib.Path | None = None,
    overwrite: bool = False,

In [8]:
vs_res = vs.calculate_all_vs(
    chains.collapse_children(), num_proc=N_PROC, base=BASE, overwrite=True
)

INFO:kinactive.features:Calculating sequence variables on canonical seqs


Aggregating variables: 0it [00:00, ?it/s]

Staging calculations:   0%|          | 0/20 [00:00<?, ?it/s]

Calculating variables: 0it [00:00, ?it/s]

INFO:kinactive.features:Resulting shape: (20, 799)
INFO:kinactive.features:Saved defaults_can_seq_vs.csv to ../data/variable_sets
INFO:kinactive.features:Calculating sequence variables on structure seqs


Aggregating variables: 0it [00:00, ?it/s]

Staging calculations:   0%|          | 0/186 [00:00<?, ?it/s]

Calculating variables: 0it [00:00, ?it/s]

INFO:kinactive.features:Resulting shape: (186, 799)
INFO:kinactive.features:Saved defaults_str_seq_vs.csv to ../data/variable_sets
INFO:kinactive.features:Calculating ligand variables


Aggregating variables: 0it [00:00, ?it/s]

Staging calculations:   0%|          | 0/186 [00:00<?, ?it/s]

Calculating variables: 0it [00:00, ?it/s]

INFO:kinactive.features:Resulting shape: (186, 793)
INFO:kinactive.features:Saved default_lig_vs.csv to ../data/variable_sets
INFO:kinactive.features:Calculating structure variables


Aggregating variables: 0it [00:00, ?it/s]

Staging calculations:   0%|          | 0/186 [00:00<?, ?it/s]

Calculating variables: 0it [00:00, ?it/s]

INFO:kinactive.features:Resulting shape: (186, 1693)
INFO:kinactive.features:Saved default_str_vs.csv to ../data/variable_sets
INFO:kinactive.features:Finished calculations


Calculating all four sets on all domains takes ~1h on 20 cores.
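
According to the log above, each variable set is saved as a CSV file under `BASE`. As a quick sanity check, the following sketch reports the size of every CSV file in a directory using only the standard library. The helpers `csv_shape` and `summarize` are hypothetical names, and the sketch assumes each file starts with a single header row. The column count may differ from the logged shape if identifier columns are written alongside the variables.

```python
import csv
from pathlib import Path


def csv_shape(path):
    """Return (n_data_rows, n_columns), assuming one header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # Empty file -> zero columns
        n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


def summarize(base):
    """Map each CSV file name in `base` to its (rows, columns) shape."""
    return {p.name: csv_shape(p) for p in sorted(Path(base).glob("*.csv"))}
```

Calling `summarize(BASE)` after the calculation finishes should list the four files named in the log, and the row counts can then be compared with the logged shapes.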