# TODOS:
- Mechanism for selecting apo/pred; right now we are selecting at most one apo/pred by `sort_score`
- Relax the requirement that both apo and holo structures be present
- Agree on what our feature input and output should look like

# Dataset and Loader Tutorial

## Setup

### Installation

`plinder` is available on *PyPI*.

```
pip install plinder
```

### Environment variable configuration
:::{note}
We need to set environment variables to point to the release and iteration of choice.
For the sake of demonstration, we point to a smaller tutorial dataset by setting
`PLINDER_RELEASE=2024-06` and `PLINDER_ITERATION=tutorial`.
:::
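
One way to set these variables from within Python is sketched below. It assumes that the variables are read when `plinder` first loads its configuration, so they are set before any `plinder` import. Exporting them in the shell before starting Python is an equivalent alternative.

```python
import os

# Point plinder at the small tutorial dataset used in this section.
# Assumption: these must be set before plinder reads its configuration.
os.environ["PLINDER_RELEASE"] = "2024-06"
os.environ["PLINDER_ITERATION"] = "tutorial"

print(os.environ["PLINDER_RELEASE"], os.environ["PLINDER_ITERATION"])
```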

## Getting the configuration

First, we get the configuration to check that all parameters are set correctly.
In the snippet below, we check whether the local and remote *PLINDER* paths point to
the expected locations.

In [164]:
import plinder.core.utils.config

cfg = plinder.core.get_config()
print(f"local cache directory: {cfg.data.plinder_dir}")
print(f"remote data directory: {cfg.data.plinder_remote}")

local cache directory: /Users/yusuf/.local/share/plinder/2024-06/v2
remote data directory: gs://plinder/2024-06/v2


## Data ecosystem overview
This tutorial assumes that you have already downloaded the _PLINDER_ dataset. The examples will run without any action on your part, but we encourage downloading the data first for the sake of performance. The _PLINDER_ data hierarchy is shown below. This tutorial follows the same hierarchy from the ground up.
![image](../static/asset/data/plinder_data_hierarchy.png)

## 0. Structure files

After download, all files are stored locally at `~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/systems`. The current defaults are `PLINDER_RELEASE=2024-06` and `PLINDER_ITERATION=v2`.

There, we have one sub-folder per system. Each sub-folder contains:
- Receptor structure: `receptor.cif` and `receptor.pdb`
- Ligand SDFs: `ligand_files/<biounit_instance_id>.<chain_id>.sdf`. For complexes with more than one ligand, all SDFs are saved
- Sequence FASTA: `sequences.fasta`

For more information on the file organization, see "<link-to-dataset-tutorial>"
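
As a rough illustration of this layout, the sketch below assembles the expected file locations for one system with `pathlib`. The helper `expected_system_files` is hypothetical (it is not part of `plinder`), and the `ligand_files/` sub-folder and the `sequences.fasta` file name are taken from the loading example later in this tutorial.

```python
from pathlib import Path


def expected_system_files(
    system_id: str,
    ligand_chain: str,
    release: str = "2024-06",
    iteration: str = "v2",
) -> dict[str, Path]:
    """Hypothetical helper: build the expected local paths for one system."""
    system_dir = (
        Path.home() / ".local" / "share" / "plinder"
        / release / iteration / "systems" / system_id
    )
    return {
        "receptor_cif": system_dir / "receptor.cif",
        "receptor_pdb": system_dir / "receptor.pdb",
        "ligand_sdf": system_dir / "ligand_files" / f"{ligand_chain}.sdf",
        "sequences": system_dir / "sequences.fasta",
    }


files = expected_system_files("1avd__1__1.A__1.C", ligand_chain="1.C")
for name, path in files.items():
    # The files only exist once the dataset has been downloaded.
    print(f"{name}: {path} (exists: {path.exists()})")
```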

## 1. Structure Python Abstraction
To make interacting with our data seamless, we provide the `Structure` class, a pydantic data class that:
- Loads all the structure files and SMILES
- Gets coordinates
- Featurizes residues and atoms of the associated protein and ligand molecules
- Masks molecules to account for resolved vs. unresolved parts

To interact with the example, do the following:

### Load the structure for a given system_id
For this purpose we will use `"1avd__1__1.A__1.C"` as our example system id.
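
For orientation, the identifier can be split into its parts. The sketch below assumes the convention `<PDB ID>__<biounit>__<receptor chains>__<ligand chains>`, with multiple chains in one field joined by a single underscore. `parse_system_id` is a hypothetical helper, not a `plinder` function.

```python
from typing import NamedTuple


class SystemIdParts(NamedTuple):
    pdb_id: str
    biounit: str
    receptor_chains: list[str]
    ligand_chains: list[str]


def parse_system_id(system_id: str) -> SystemIdParts:
    """Hypothetical helper: split a system id on the assumed '__' separator."""
    pdb_id, biounit, receptor, ligand = system_id.split("__")
    return SystemIdParts(pdb_id, biounit, receptor.split("_"), ligand.split("_"))


parts = parse_system_id("1avd__1__1.A__1.C")
print(parts)
```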

In [173]:
from plinder.core.structure.structure import Structure
from plinder.core import PlinderSystem
from pathlib import Path

input_smiles = "CC(=O)N[C@@H]1[C@H]([C@@H]([C@H](O[C@H]1O)CO)O)O" # Need to account for unresolved part of the ligand
input_sdf = Path(cfg.data.plinder_dir)/"systems/1avd__1__1.A__1.C/ligand_files/1.C.sdf"
system_id = "1avd__1__1.A__1.C"
protein_structure_path = Path(cfg.data.plinder_dir)/"systems/1avd__1__1.A__1.C/receptor.cif"
input_sequence_path = Path(cfg.data.plinder_dir)/"systems/1avd__1__1.A__1.C/sequences.fasta"
list_ligand_sdf_and_input_smiles = [(input_sdf, input_smiles)]

holo_struc = Structure.load_structure(
    id=system_id,
    protein_path=protein_structure_path,
    protein_sequence=input_sequence_path,
    list_ligand_sdf_and_input_smiles=list_ligand_sdf_and_input_smiles,
)

### List fields
We list all fields and their `FieldInfo` to show which ones are required. `id`, `protein_path` and `protein_sequence` are required. Everything else is optional. Particularly worth mentioning is the decision to make `list_ligand_sdf_and_input_smiles` optional; this is because a ligand will not be available in apo and predicted structures.

In [182]:
holo_struc.model_fields

{'id': FieldInfo(annotation=str, required=True),
 'protein_path': FieldInfo(annotation=Path, required=True),
 'protein_sequence': FieldInfo(annotation=Path, required=True),
 'list_ligand_sdf_and_input_smiles': FieldInfo(annotation=Union[list[tuple[Path, str]], NoneType], required=False, default=None),
 'protein_atom_array': FieldInfo(annotation=Union[AtomArray, NoneType], required=False, default=None),
 'ligand_mols': FieldInfo(annotation=Union[dict[str, tuple[Mol, Mol, tuple[ndarray[Any, dtype[+_ScalarType_co]], ndarray[Any, dtype[+_ScalarType_co]]], Mol, tuple[ndarray[Any, dtype[+_ScalarType_co]], ndarray[Any, dtype[+_ScalarType_co]]], tuple[ndarray[Any, dtype[+_ScalarType_co]], ndarray[Any, dtype[+_ScalarType_co]]]]], NoneType], required=False, default=None),
 'add_ligand_hydrogens': FieldInfo(annotation=bool, required=False, default=False),
 'structure_type': FieldInfo(annotation=str, required=False, default='holo')}

### List structure protein properties
Show protein related properties

In [175]:
for property in holo_struc.get_properties():
    if "protein" in property:
        print(property)

protein_backbone_mask
protein_calpha_coords
protein_calpha_mask
protein_chain_ordered
protein_chains
protein_coords
protein_n_atoms
protein_sequence_from_structure
protein_structure_atom_names
protein_structure_b_factor
protein_structure_residue_names
protein_structure_residues
protein_structure_sequence_fasta
protein_structure_tokenized_sequence


#### Protein backbone mask
This is a boolean mask that can be used to select backbone atoms from the biotite `AtomArray`. Positions set to `True` correspond to backbone atoms.

In [190]:
holo_struc.protein_backbone_mask

array([ True,  True,  True, False, False, False, False, False, False,
        True,  True,  True, False, False, False,  True,  True,  True,
       False, False, False,  True,  True,  True, False, False, False,
       False, False,  True,  True,  True, False, False, False, False,
        True,  True,  True, False,  True,  True,  True, False, False,
       False, False, False, False,  True,  True,  True, False, False,
       False, False, False, False, False, False, False, False, False,
        True,  True,  True, False, False, False, False,  True,  True,
        True, False, False, False, False, False,  True,  True,  True,
       False, False, False, False, False,  True,  True,  True, False,
       False, False, False, False,  True,  True,  True, False,  True,
        True,  True, False, False, False,  True,  True,  True, False,
       False, False, False, False,  True,  True,  True, False, False,
       False, False, False,  True,  True,  True, False, False, False,
       False,  True,

#### Protein Calpha mask
This is the boolean mask of the Calpha atoms.

In [191]:
holo_struc.protein_calpha_mask

array([False,  True, False, False, False, False, False, False, False,
       False,  True, False, False, False, False, False,  True, False,
       False, False, False, False,  True, False, False, False, False,
       False, False, False,  True, False, False, False, False, False,
       False,  True, False, False, False,  True, False, False, False,
       False, False, False, False, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True, False, False, False, False, False, False,  True,
       False, False, False, False, False, False, False,  True, False,
       False, False, False, False, False, False,  True, False, False,
       False, False, False, False, False,  True, False, False, False,
        True, False, False, False, False, False,  True, False, False,
       False, False, False, False, False,  True, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       False, False,
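
Both masks can be used for boolean indexing into per-atom arrays. The sketch below uses the atoms of a single lysine residue with dummy coordinates and assumes that "backbone" means the `N`, `CA` and `C` atoms, which is consistent with the printed mask (three `True` values followed by six `False` values for the leading lysine). It illustrates the masking idea and is not the library's implementation.

```python
import numpy as np

# Atom names of one lysine residue in the usual PDB order.
atom_names = np.array(["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"])
# Dummy coordinates, one (x, y, z) row per atom.
coords = np.arange(atom_names.size * 3, dtype=np.float32).reshape(-1, 3)

# Assumption: backbone atoms are N, CA and C.
backbone_mask = np.isin(atom_names, ["N", "CA", "C"])
calpha_mask = atom_names == "CA"

# Boolean indexing keeps only the rows where the mask is True.
backbone_coords = coords[backbone_mask]
calpha_coords = coords[calpha_mask]

print(backbone_mask)
print(backbone_coords.shape, calpha_coords.shape)
```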

### Get protein chain ordered
This gives the list of protein chains in the order in which they appear in the structure.

In [192]:
holo_struc.protein_chain_ordered

['1.A']

### Get protein chains for all atoms
The list of chain IDs in the structure. The order in which they appear is not preserved.

In [195]:
holo_struc.protein_chains

['1.A']
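
The difference between the two chain properties can be illustrated with a per-atom chain annotation. This is a sketch on made-up chain IDs, not the library's implementation: `dict.fromkeys` keeps the order of first appearance, whereas going through a `set` does not.

```python
# Made-up per-atom chain annotation: chain 1.B appears before chain 1.A.
atom_chain_ids = ["1.B", "1.B", "1.B", "1.A", "1.A"]

# Unique chains in order of first appearance (dicts preserve insertion order).
chains_ordered = list(dict.fromkeys(atom_chain_ids))

# Unique chains without any ordering guarantee; sorted here for a stable display.
chains_unordered = sorted(set(atom_chain_ids))

print(chains_ordered)
print(chains_unordered)
```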

### Get protein coordinates
This property gets the 3D positions of each of the atoms in protein molecules

In [197]:
holo_struc.protein_coords

array([[31.221, 22.957, 43.101],
       [31.828, 24.118, 42.476],
       [31.979, 23.854, 41.021],
       ...,
       [34.341, 35.018, 24.674],
       [35.484, 35.831, 24.497],
       [33.105, 35.742, 24.15 ]], dtype=float32)

### Get number of atoms of protein molecule

In [198]:
holo_struc.protein_n_atoms

964

### Get protein structure atom names
Returns all atom names as they are named in the structure.

In [200]:
holo_struc.protein_structure_atom_names

['C',
 'CA',
 'CB',
 'CD',
 'CD1',
 'CD2',
 'CE',
 'CE1',
 'CE2',
 'CE3',
 'CG',
 'CG1',
 'CG2',
 'CH2',
 'CZ',
 'CZ2',
 'CZ3',
 'N',
 'ND1',
 'ND2',
 'NE',
 'NE1',
 'NE2',
 'NH1',
 'NH2',
 'NZ',
 'O',
 'OD1',
 'OD2',
 'OE1',
 'OE2',
 'OG',
 'OG1',
 'OH',
 'SD',
 'SG']

### Get protein b-factors
Get protein atom B-factors. If not available in a structure, they are set to zero.

In [201]:
holo_struc.protein_structure_b_factor

[0.0,
 0.0,
 0.0,
 ...]

### Get protein residue names

In [202]:
holo_struc.protein_structure_residue_names

['ALA',
 'ARG',
 'ASN',
 'ASP',
 'CYS',
 'GLN',
 'GLU',
 'GLY',
 'HIS',
 'ILE',
 'LEU',
 'LYS',
 'MET',
 'PHE',
 'PRO',
 'SER',
 'THR',
 'TRP',
 'TYR',
 'VAL']

### Get protein residue numbers
Residue numbers as they appear in the structure

In [203]:
holo_struc.protein_structure_residues

[3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 63,
 64,
 65,
 66,
 67,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 84,
 85,
 86,
 87,
 88,
 89,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 97,
 98,
 99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125]

### Get FASTA from protein structure

In [204]:
holo_struc.protein_structure_sequence_fasta

'>receptor\nKCSLTGKWTNDLGSNMTIGAVNSRGEFTGTYTTAVTATSNEIKESPLHGTENTINKRTQPTFGFTVNWKFSESTTVFTGQCFIDRNGKEVLKTMWLLRSSVNDIGDDWKATRVGINIFTRLRT'
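
The returned string is in FASTA format: a header line starting with `>` followed by the sequence. A minimal sketch of splitting it into header and sequence is given below. `parse_single_fasta` is a hypothetical helper that handles only a single record, and the sequence is shortened for brevity.

```python
def parse_single_fasta(fasta: str) -> tuple[str, str]:
    """Hypothetical helper: split a single-record FASTA string into (header, sequence)."""
    lines = fasta.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]
    # Sequence lines may be wrapped, so join them back together.
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


# First residues of the receptor sequence shown above.
header, sequence = parse_single_fasta(">receptor\nKCSLTGKWTNDLGSNMTIGAV")
print(header, len(sequence))
```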

### Get tokenized sequence
Get a tensor of the sequence converted to integer-based amino acid tokens

In [205]:
holo_struc.protein_structure_tokenized_sequence

tensor([11,  4, 15, 10, 16,  7, 11, 17, 16,  2,  3, 10,  7, 15,  2, 12, 16,  9,
         7,  0, 19,  2, 15,  1,  7,  6, 13, 16,  7, 16, 18, 16, 16,  0, 19, 16,
         0, 16, 15,  2,  6,  9, 11,  6, 15, 14, 10,  8,  7, 16,  6,  2, 16,  9,
         2, 11,  1, 16,  5, 14, 16, 13,  7, 13, 16, 19,  2, 17, 11, 13, 15,  6,
        15, 16, 16, 19, 13, 16,  7,  5,  4, 13,  9,  3,  1,  2,  7, 11,  6, 19,
        10, 11, 16, 12, 17, 10, 10,  1, 15, 15, 19,  2,  3,  9,  7,  3,  3, 17,
        11,  0, 16,  1, 19,  7,  9,  2,  9, 13, 16,  1, 10,  1, 16])
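
The integer tokens printed above are consistent with indexing each residue into the 20 standard amino acids sorted by three-letter code (the same order as the residue names listed earlier). The sketch below reproduces the first tokens under that assumption. The vocabulary is inferred from the output, not taken from the `plinder` source, and the function returns a plain list rather than a tensor.

```python
# One-letter codes ordered by three-letter name: ALA, ARG, ASN, ASP, CYS, ...
# Assumption: vocabulary inferred from the printed tokens, not from plinder itself.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
TOKEN_OF = {aa: index for index, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Map each one-letter residue code to its integer token."""
    return [TOKEN_OF[aa] for aa in sequence]


tokens = tokenize("KCSLTGKWTND")
print(tokens)  # [11, 4, 15, 10, 16, 7, 11, 17, 16, 2, 3]
```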

### List ligand properties
Show ligand related properties

In [176]:
for property in holo_struc.get_properties():
    if "ligand" in property:
        print(property)

input_ligand_conformer2resolved_stacks
input_ligand_conformer2smiles_stacks
input_ligand_conformer_coords
input_ligand_conformers
input_ligand_templates
ligand_chain_ordered
ligand_conformer2resolved_mask
resolved_ligand_mols
resolved_ligand_mols_coords
resolved_ligand_structure2smiles_stacks
resolved_ligand_structure_coords
resolved_smiles_ligand_mask


:::{todo}
- Vladas to write the description for the ligand properties
:::

### Ligand atom id mapping
TODO: Vladas

Conformer to solved structure mappings

In [None]:
holo_struc.input_ligand_conformer2resolved_stacks

### Ligand conformer to input smiles mapping
TODO: Vladas

In [None]:
holo_struc.input_ligand_conformer2smiles_stacks

### Ligand conformer coordinates
TODO: Vladas


In [None]:
holo_struc.input_ligand_conformer_coords

{'1.C': (array([[ 9,  4,  5,  6,  7, 11,  1,  0,  3, 14, 13,  8, 12,  2]]),
  array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13]]))}

### Inspect structure

In [183]:
holo_struc

Structure(
    (
        'id',
        '1avd__1__1.A__1.C',
    ),
    (
        'protein_path',
        /Users/yusuf/.local/share/plinder/2024-06/v2/systems/1avd__1__1.A__1.C/receptor.cif,
    ),
    (
        'protein_sequence',
        /Users/yusuf/.local/share/plinder/2024-06/v2/systems/1avd__1__1.A__1.C/sequences.fasta,
    ),
    (
        'list_ligand_sdf_and_input_smiles',
        [
            (
                /Users/yusuf/.local/share/plinder/2024-06/v2/systems/1avd__1__1.A__1.C/ligand_files/1.C.sdf,
                'CC(=O)N[C@@H]1[C@H]([C@@H]([C@H](O[C@H]1O)CO)O)O',
            ),
        ],
    ),
    (
        'protein_atom_array',
        <class 'biotite.structure.AtomArray'> with shape (964,),
    ),
    (
        'ligand_mols',
        {
            '1.C': (
                <rdkit.Chem.rdchem.Mol object at 0x1cd5d07b0>,
                <rdkit.Chem.rdchem.Mol object at 0x1cd5d0580>,
                (
                    <class 'numpy.ndarray'> with shape (1, 15),
      

### Inspect holo ligand
Returns a chain-mapped dictionary of the form:
```python
{
    "<instance_id>.<chain_id>": (
        rdkit mol of template smiles of type `Chem.Mol`,
        random conformer of rdkit mol of template smiles of type `Chem.Mol`,
        conformer atom to template smiles atom map of type `tuple[NDArray.int_, NDArray.int_]`,
        rdkit mol of solved ligand structure of type `Chem.Mol`,
        solved ligand atom to template smiles atom map of type `tuple[NDArray.int_, NDArray.int_]`,
        conformer atom to solved ligand atom map of type `tuple[NDArray.int_, NDArray.int_]`
    )
}
```

In [184]:
holo_struc.ligand_mols

{'1.C': (<rdkit.Chem.rdchem.Mol at 0x1cd5d07b0>,
  <rdkit.Chem.rdchem.Mol at 0x1cd5d0580>,
  (array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14]]),
   array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14]])),
  <rdkit.Chem.rdchem.Mol at 0x1cd5d05f0>,
  (array([[ 9,  4,  5,  6,  7, 11,  1,  0,  3, 14, 13,  8, 12,  2]]),
   array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13]])),
  (array([[ 9,  4,  5,  6,  7, 11,  1,  0,  3, 14, 13,  8, 12,  2]]),
   array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13]])))}
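
Each atom map above is a pair of index arrays. In this example the first array of the last two maps contains indices up to 14 and omits index 10, while the second runs from 0 to 13. This suggests that the first array indexes the 15 atoms of the template or conformer and the second indexes the 14 resolved atoms. Under that assumption (an interpretation of the output, not documented behaviour), the sketch below derives a resolved-atom mask and places resolved coordinates in conformer atom order. The coordinates are dummy values.

```python
import numpy as np

# Index pair copied from the output above; one row per atom match.
conformer_idx = np.array([[9, 4, 5, 6, 7, 11, 1, 0, 3, 14, 13, 8, 12, 2]])
resolved_idx = np.arange(14)[np.newaxis, :]

n_conformer_atoms = 15  # heavy atoms in the input SMILES
resolved_coords = np.arange(14 * 3, dtype=float).reshape(14, 3)  # dummy values

match = 0  # use the first (here: only) atom match

# True for conformer atoms that have a resolved counterpart.
resolved_mask = np.zeros(n_conformer_atoms, dtype=bool)
resolved_mask[conformer_idx[match]] = True

# Resolved coordinates in conformer atom order; unresolved atoms stay NaN.
coords_in_conformer_order = np.full((n_conformer_atoms, 3), np.nan)
coords_in_conformer_order[conformer_idx[match]] = resolved_coords[resolved_idx[match]]

print(resolved_mask)
print(np.flatnonzero(~resolved_mask))  # conformer atoms without resolved coordinates
```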

### Protein properties

#### Inspect holo sequences
Returns a chain-mapped dictionary of sequences
```python
{
    "<instance_id>.<chain_id>": sequence of type `str`

}
```

In [185]:
holo_struc.input_sequences

{'1.A': 'ARKCSLTGKWTNDLGSNMTIGAVNSRGEFTGTYTTAVTATSNEIKESPLHGTENTINKRTQPTFGFTVNWKFSESTTVFTGQCFIDRNGKEVLKTMWLLRSSVNDIGDDWKATRVGINIFTRLRTQKE'}

#### Inspect atom array
This is a [biotite AtomArray](https://www.biotite-python.org/latest/apidoc/biotite.structure.AtomArray.html) of the receptor protein structure.

In [188]:
holo_struc.protein_atom_array[0]

Atom(np.array([31.221, 22.957, 43.101], dtype=float32), chain_id="1.A", res_id=3, ins_code="", res_name="LYS", hetero=False, atom_name="N", element="N")

#### Inspect protein sequence from input structure
This sequence is derived directly from the structure. When aligned with the input sequence, it can reveal which residues are missing from the structure.

In [189]:
holo_struc.protein_sequence_from_structure

'KCSLTGKWTNDLGSNMTIGAVNSRGEFTGTYTTAVTATSNEIKESPLHGTENTINKRTQPTFGFTVNWKFSESTTVFTGQCFIDRNGKEVLKTMWLLRSSVNDIGDDWKATRVGINIFTRLRT'
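
For this system the structure-derived sequence is a contiguous stretch of the input sequence, so the unresolved residues can be located with a plain substring search. This is a simplified sketch; structures with internal gaps need a proper sequence alignment instead.

```python
input_sequence = (
    "ARKCSLTGKWTNDLGSNMTIGAVNSRGEFTGTYTTAVTATSNEIKESPLHGTENTINKRTQPTFGFTVNWKF"
    "SESTTVFTGQCFIDRNGKEVLKTMWLLRSSVNDIGDDWKATRVGINIFTRLRTQKE"
)
structure_sequence = (
    "KCSLTGKWTNDLGSNMTIGAVNSRGEFTGTYTTAVTATSNEIKESPLHGTENTINKRTQPTFGFTVNWKF"
    "SESTTVFTGQCFIDRNGKEVLKTMWLLRSSVNDIGDDWKATRVGINIFTRLRT"
)

# Locate the resolved stretch inside the full input sequence.
offset = input_sequence.find(structure_sequence)
if offset < 0:
    raise ValueError("structure sequence is not a contiguous part of the input sequence")

missing_n_term = input_sequence[:offset]
missing_c_term = input_sequence[offset + len(structure_sequence):]
first_resolved_residue = offset + 1  # 1-based position in the input sequence

print(missing_n_term, missing_c_term, first_resolved_residue)
```

The first resolved position found this way agrees with the residue numbering of the structure, which starts at 3.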

#### Inspect unresolved input structure indices
Original indices of the unresolved structure, with indices matching the residue numbers of the resolved sequence

In [None]:
# holo_struc.unresolved_aligned_indices

#### Inspect unresolved input structure sequence
Original indices of the unresolved structure, with indices matching the residue numbers of the resolved sequence

#### Inspect original holo PDB ligand loaded from SDF

In [None]:
holo_struc.resolved_ligand_mols

#### Inspect input ligand loaded from SMILES

In [None]:
holo_struc.input_ligand_templates
holo_struc.input_ligand_conformers

#### Inspect random conformer of resolved ligand loaded from smiles

In [None]:
holo_struc.input_ligand_conformers  #resolved_ligand_conformers

In [None]:
holo_struc.resolved_ligand_mols['1.C']

In [None]:
holo_struc.resolved_ligand_mols['1.C'].GetConformer().GetPositions()

In [None]:
holo_struc.input_ligand_templates["1.C"]

In [None]:
holo_struc.input_ligand_conformers["1.C"]

In [None]:
holo_struc.input_ligand_conformer_coords

In [None]:
holo_struc.input_ligand_conformers["1.C"].GetConformer().GetPositions()

#### Inspect coordinates of random conformer of resolved ligand loaded from smiles

In [None]:
holo_struc.input_ligand_conformer_coords #resolved_ligand_conformers_coords

#### Inspect coordinates of resolved ligand loaded from smiles and aligned with original ligand

#### Inspect coordinates of original resolved ligand

In [None]:
holo_struc.resolved_ligand_mols_coords

#### Inspect protein structure dataframe with indices renumbered to match sequence

In [None]:
#holo_struc.protein_dataframe

#### Inspect protein backbone mask

In [None]:
holo_struc.protein_backbone_mask

#### Inspect protein calpha mask

In [None]:
holo_struc.protein_calpha_mask

#### Inspect number of protein atoms

In [None]:
holo_struc.protein_n_atoms

#### Inspect protein chain ids

In [None]:
holo_struc.protein_chains

#### Inspect unresolved structure fasta

In [None]:
holo_struc.protein_structure_sequence_fasta

### Test sequence alignment
The cells below assume that `apo_struc`, a `Structure` for an apo form of the same receptor, has already been loaded.

In [None]:
holo_struc

In [None]:
apo_struc

In [None]:
# Note: for structure alignment to work, apo and holo need to have the same chain id
apo_struc.set_chain("1.A")

In [None]:
apo_struc.protein_atom_array

In [None]:
seq_align = holo_struc.get_per_chain_seq_alignments(apo_struc)

In [None]:
seq_align

In [None]:
holo_struc.protein_atom_array[0]

In [None]:
apo_struc.protein_atom_array[0]

### Alignment and Cropping

In [None]:
align_common_seq = holo_struc.align_common_sequence(apo_struc)

In [None]:
holo_struc

In [None]:
apo_struc

In [None]:
align_common_seq[0]

In [None]:
align_common_seq[1]

In [None]:
holo_struc.ligand_conformer2resolved_mask

In [None]:
holo_struc.protein_coords

In [None]:
#apo_struc.protein_dataframe

In [None]:
holo_struc.protein_structure_b_factor

In [None]:
test_sys = PlinderSystem(system_id="1avd__1__1.A__1.C", input_smiles_dict={"1.C": "CC(=O)N[C@@H]1[C@H]([C@@H]([C@H](O[C@H]1O)CO)O)O"})

In [None]:
test_sys.holo_structure

In [None]:
test_sys.alt_structures

In [None]:
test_sys.best_linked_structures_paths

In [None]:
#cropped = test_sys.create_masked_bound_unbound_complexes()

In [None]:
mask = holo_struc.protein_atom_array.atom_name == "CA"
holo_struc.filter(
    property="atom_name",
    mask="CA",
)

In [None]:
holo_struc

In [None]:
holo_struc + apo_struc

In [None]:
holo_struc.protein_atom_array[holo_struc.protein_atom_array.chain_id == "1.A"]

In [None]:
holo_struc

## Loader

In [None]:
from plinder.core.loader import PlinderDataset
from plinder.core.loader.dataset import get_torch_loader
from plinder.core import get_split
from plinder.core.scores import query_links

#### Make plinder dataset

In [None]:
train_dataset = PlinderDataset()
#train_dataset = PlinderDataset(df=splits_df[splits_df.system_id =="6pl9__1__1.A__1.C"])

In [None]:
test_data = train_dataset[1]

In [None]:
test_data

test_data[110]

#### Make torch loader

In [None]:
train_loader = get_torch_loader(
    train_dataset
)

In [None]:
for data in train_loader:

    test_torch = data
    break
    #for k, v in test_torch['input_features'].items():
    #    if v.shape[1] > 1:
    #        break

In [None]:
test_torch.keys()

In [None]:
test_torch['system_ids']

In [None]:
for k, v in test_torch['features_and_coords'].items():
    print(k, v.shape)

In [None]:
holo_struc.ligand_mols

In [None]:
holo_struc.input_ligand_conformer2resolved_stacks