In [1]:
import FetchAlphaFoldPDBs as FETCH

See walkthrough for function details

Three execution options:

- *Uniprot_Full*: Checks the accession ID database for an exhaustive search of AlphaFold entries matching the UniProt IDs, then retrieves the AlphaFold PDBs for the matched IDs where possible
  - Hence requires a local database of AlphaFold accession IDs
  - Also handles UniProt IDs containing a '-' by searching for a 'reduced' ID without the hyphenated suffix, e.g. PXXXXX-Y will attempt to match PXXXXX. This is recorded in the output (and sequences are ultimately checked anyway).
- *Uniprot_Guess*: Takes UniProt IDs, *guesses* the name of the AlphaFold PDB and attempts to retrieve it
  - Convenient if the database check is not necessary
  - Guess is of the form 'AF-{uniprotID}-F1-model_v{latestVersion}.pdb'

- *AlphaFoldPDBs*: Takes a list of AlphaFold PDB names and attempts to retrieve them
  - Useful when AlphaFold PDB name is known with confidence


#### Config

- **RUN_NAME**: *Name used to create output folders/collected pdb directory*
- **OUTPUT_DIR**: *Directory to store outputs*
- **PATH_TO_UNIPROTID_CSV**: *csv containing the list of UniProt IDs used to query the AlphaFold PDB database*
- **ID_FEATURE_NAME**: _Name of the column in the above csv that holds the UniProt ID (default: 'Uniprot_ID' creatively...)_
- **LOCAL_ALPHAFOLD_PDB_DIRECTORIES**: *Directories to check if pdb already present locally (do not include 'collected' pdb directory), default: None*
- Database info:
  - ACCESSION_DB_PATH: *Path to the local database of accession IDs (csv available from [AlphaFold's FTP server](http://ftp.ebi.ac.uk/pub/databases/alphafold/))*
  - ACCESSION_ID_TABLE_NAME: *Table name for accession_ids in the database*
  - TABLE_UNIPROT_ID_FEATURE_NAME: *Table feature name for UniProt IDs, shouldn't change*
- **ALPHAFOLD_ENDPOINT**: *HTTPS endpoint from which AlphaFold PDBs are pulled, shouldn't change from https://alphafold.ebi.ac.uk/files/*
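
To illustrate how the database settings fit together, here is a minimal sketch of an accession ID lookup. It assumes the accession database is a SQLite file, which this notebook does not confirm, and it uses an in-memory database with placeholder IDs so that it runs on its own. The function `find_matches` is hypothetical and is not part of `FetchAlphaFoldPDBs`.

```python
import sqlite3


def find_matches(conn, table, column, uniprot_ids):
    """Return {queried_id: matched_id or None}.

    Tries the ID as given, then the 'reduced' ID without its hyphenated suffix.
    Assumption: the accession database is SQLite (hypothetical helper).
    """
    results = {}
    for uid in uniprot_ids:
        matched = None
        # dict.fromkeys removes the duplicate when the ID has no hyphen
        for candidate in dict.fromkeys([uid, uid.split("-", 1)[0]]):
            row = conn.execute(
                f'SELECT 1 FROM "{table}" WHERE "{column}" = ? LIMIT 1',
                (candidate,),
            ).fetchone()
            if row is not None:
                matched = candidate
                break
        results[uid] = matched
    return results


# In-memory stand-in for the local accession database (placeholder IDs)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accession_ids (UniProtAccessionID TEXT)")
conn.executemany(
    "INSERT INTO accession_ids VALUES (?)", [("P12345",), ("Q67890",)]
)

matches = find_matches(
    conn, "accession_ids", "UniProtAccessionID", ["P12345", "Q67890-2", "A00000"]
)
print(matches)
```

The table and column names passed in correspond to ACCESSION_ID_TABLE_NAME and TABLE_UNIPROT_ID_FEATURE_NAME in the config.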
  
#### Outputs
- Creates a file '[RUN_NAME]\_AF\_info.csv' which indicates whether an AlphaFold match was found for each ID
- Creates a directory '[RUN_NAME]\_AF\_PDBs' containing all AlphaFold PDBs that could be found/pulled for the list of IDs
- Creates a file '[RUN_NAME]\_AF\_info_with_PDB_paths.csv' which holds the info from '[RUN_NAME]\_AF\_info.csv' plus the local path to each PDB that could be reached
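
As a quick way to locate these outputs after a run, the names can be derived from the config values. The sketch below assumes the outputs are written directly under OUTPUT_DIR, which is not confirmed here; `expected_outputs` is a hypothetical helper, not part of `FetchAlphaFoldPDBs`.

```python
from pathlib import Path


def expected_outputs(run_name: str, output_dir: str) -> dict:
    """Build the output names listed above.

    Assumption: everything is written directly under output_dir.
    """
    base = Path(output_dir)
    return {
        "info_csv": base / f"{run_name}_AF_info.csv",
        "pdb_dir": base / f"{run_name}_AF_PDBs",
        "info_with_paths_csv": base / f"{run_name}_AF_info_with_PDB_paths.csv",
    }


for label, path in expected_outputs("testFull", "./testFull/").items():
    print(label, path)
```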


In [4]:
CONFIG = {
    'RUN_NAME': 'testFull',
    'OUTPUT_DIR': './testFull/',
    'PATH_TO_UNIPROTID_CSV': './demo_datasets/demo_llps_minus.csv',
    'ID_FEATURE_NAME': "Uniprot_ID",
    'LOCAL_ALPHAFOLD_PDB_DIRECTORIES': None,

    # STATIC
    'ACCESSION_DB_PATH': ".\\..\\AlphaFold\\accession_id_db.db",
    'ACCESSION_ID_TABLE_NAME': "accession_ids",
    'TABLE_UNIPROT_ID_FEATURE_NAME': "UniProtAccessionID",
    'ALPHAFOLD_ENDPOINT': 'https://alphafold.ebi.ac.uk/files/'
}

### Uniprot_Full

In [5]:
# Full Function
FETCH.Uniprot_Full(CONFIG)

Found 84 IDs, 52 unique IDs
Checking which have AlphaFold entries
52 IDs ran, 50 matches, 2 no match
Fetching PDBs for IDs with entries
- - - FINISHED - - - 
52 IDs ran, 50 PDBs found, 2 no PDB found


### Uniprot_Guess

In [3]:
# Guess Function (guesses alphafold pdb names from ID)
CONFIG['RUN_NAME'] = 'testQuick'
CONFIG['OUTPUT_DIR'] = 'testQuick'
FETCH.Uniprot_Guess(CONFIG, assumedVersion=4)

Found 84 IDs, 52 unique IDs
- - - FINISHED - - - 
52 IDs ran, 46 PDBs found, 6 no PDB found


### AlphaFoldPDBs

In [3]:
# PDB pull-down: requires a list of PDB file names
# Note: when using this function,
#    PATH_TO_UNIPROTID_CSV should point to a csv of AlphaFold PDB file names
#    ID_FEATURE_NAME should be the name of the column holding those file names
CONFIG['RUN_NAME'] = 'testPDBPullDown'
CONFIG['OUTPUT_DIR'] = 'testPDBPullDown'
CONFIG['PATH_TO_UNIPROTID_CSV'] = './demo_datasets/demo_llps_minus_pdbnames.csv'
CONFIG['ID_FEATURE_NAME'] = "PDB_FileName"
FETCH.AlphaFoldPDBs(CONFIG)

52 IDs ran, 46 PDBs found, 6 no PDB found
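
If a csv of PDB file names is not already available, one could be generated from a list of UniProt IDs along these lines. This is a sketch under assumptions: the column name 'PDB_FileName' follows the config above, the names follow the 'AF-{uniprotID}-F1-model_v{version}.pdb' pattern with an assumed version of 4, and `write_pdb_name_csv` is a hypothetical helper that is not part of `FetchAlphaFoldPDBs`.

```python
import csv
import io


def write_pdb_name_csv(uniprot_ids, out_file, version=4, column="PDB_FileName"):
    """Write a one-column csv of guessed AlphaFold PDB file names.

    Assumptions: the version number and the '.pdb' suffix match what
    AlphaFoldPDBs expects (hypothetical helper).
    """
    writer = csv.writer(out_file)
    writer.writerow([column])
    # dict.fromkeys de-duplicates while keeping the original order
    for uid in dict.fromkeys(uniprot_ids):
        writer.writerow([f"AF-{uid}-F1-model_v{version}.pdb"])


# Placeholder IDs, written to an in-memory buffer for illustration
buffer = io.StringIO()
write_pdb_name_csv(["P12345", "Q67890", "P12345"], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.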
