mnanao/alphafuser

Alphafuser: Assemble Protein Complexes from Binary Interaction Data

This pipeline expands a binary list of protein-protein interactions into larger complexes and prepares AlphaFold input structures, leveraging local ColabFold and structural biology toolkits.

Requirements

Install or load the following dependencies:

  • Phenix
  • CCP4
  • DrSASA
  • localcolabfold with access to alphafold2_multimer_v3
  • PyMOL and ImageMagick for thumbnail figure generation (optional)

Input Format

Prepare a CSV file containing binary protein interactions. Example:

Q9FM80,AT2G38750
Q9FM80,AT3G26060
...
  • Use UniProt IDs wherever possible.
  • Ensure naming consistency: avoid mixing synonyms like Q9FM80 and AT5G55580.
  • The common bait protein must be in the first column.
  • Do not use the Excel/Google Sheets "pull down" autofill feature: it may corrupt UniProt IDs.
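
The rules above can be checked before a run. The following is a minimal sketch, not part of Alphafuser: the function name check_interactions is hypothetical, the regular expression is the accession pattern published in the UniProt help pages, and it assumes a single shared bait in the first column. Identifiers that are not UniProt accessions (such as AT2G38750) only produce warnings, since UniProt IDs are recommended rather than required.

```python
import csv
import io
import re

# Accession pattern as published in the UniProt help pages.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def check_interactions(handle):
    """Return (bait, warnings) for a two-column interaction CSV.

    Hypothetical helper: it only inspects the file and never edits it.
    """
    warnings = []
    baits = []
    for lineno, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        row = [cell.strip() for cell in row]
        if len(row) != 2:
            warnings.append(f"line {lineno}: expected 2 columns, got {len(row)}")
            continue
        baits.append(row[0])
        for ident in row:
            if not UNIPROT_RE.match(ident):
                warnings.append(f"line {lineno}: {ident} is not a UniProt accession")
    distinct = sorted(set(baits))
    if len(distinct) > 1:
        # Assumption: one common bait is expected in the first column.
        warnings.append(f"multiple baits in first column: {distinct}")
    return (distinct[0] if distinct else None), warnings
```

Calling check_interactions(open("interactions.txt", newline="")) on the example above would report Q9FM80 as the bait and flag the two AT-prefixed locus identifiers.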

Input Format, Option 2: BioGRID

A script is provided to produce a CSV file of interactors downloaded from BioGRID. Example usage:


get_biogrid_interactions.py -g Q92542 -s -f accession -o 'Homo sapiens' -e "AFFINITY CAPTURE-WESTERN:2" --accesskey abcdefghij

This retrieves the interactors of UniProt ID Q92542 in Homo sapiens that are supported by at least 2 AFFINITY CAPTURE-WESTERN interactions. Strict checking is enforced for database ID mapping, and the "accession" field is the one checked during mapping. Other field options are listed at https://www.uniprot.org/help/query-field. Access keys can be obtained at https://webservice.thebiogrid.org. Note that this script calls uniprot_id_mapping.py, which must be in the PATH.
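
To illustrate what an evidence threshold such as "AFFINITY CAPTURE-WESTERN:2" means, here is a minimal sketch of the filtering idea. It is not the code in get_biogrid_interactions.py: the function names, the (interactor, experimental system) record layout, and the placeholder interactor names are all assumptions made for illustration.

```python
from collections import Counter


def parse_evidence_spec(spec):
    """Split 'AFFINITY CAPTURE-WESTERN:2' into (system, minimum count)."""
    system, _, count = spec.rpartition(":")
    return system.strip().upper(), int(count)


def filter_interactors(records, spec):
    """Keep interactors seen at least `minimum` times with the given system.

    `records` is an iterable of (interactor, experimental_system) pairs;
    this layout is hypothetical, not the BioGRID response format.
    """
    system, minimum = parse_evidence_spec(spec)
    counts = Counter(
        interactor
        for interactor, evidence in records
        if evidence.upper() == system
    )
    return sorted(name for name, n in counts.items() if n >= minimum)


# Placeholder records, for illustration only.
records = [
    ("PREY_A", "Affinity Capture-Western"),
    ("PREY_A", "Affinity Capture-Western"),
    ("PREY_B", "Affinity Capture-Western"),
    ("PREY_B", "Two-hybrid"),
]
kept = filter_interactors(records, "AFFINITY CAPTURE-WESTERN:2")
```

Here only PREY_A is kept, because PREY_B has a single interaction of the requested type.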

Building Complexes

python3 assemble_complexes_from_binary_interactions.py \
  -c interactions.txt \
  -o thaliana \
  -b3 \
  -f \
  --strict \
  -a xref_araport,xref_gramene \
  -g

Sequence Trimming

Run the following in the same directory as your AF_complex_queue.db SQLite database:

alphafold_trim_sequences.pl

Note that a PDB file for every sequence in your database will be downloaded to the current directory, and the sequences in your AF_complex_queue.db will be edited (underscores are inserted in the very common case of discontinuous high-confidence regions). This script uses the Phenix programs phenix.process_predicted_model and phenix.print_sequence and the CCP4 program PDBSET.
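
The underscore convention can be pictured with a small sketch. This is not how alphafold_trim_sequences.pl works internally (it delegates to Phenix and CCP4); the function name, the per-residue pLDDT input, and the threshold of 70 are assumptions chosen only to show how discontinuous high-confidence regions end up joined by underscores.

```python
def trim_sequence(sequence, plddt, threshold=70.0):
    """Join high-confidence stretches of `sequence` with underscores.

    Illustrative only: `threshold` is an assumed cutoff, not a value
    taken from the Alphafuser scripts.
    """
    if len(sequence) != len(plddt):
        raise ValueError("sequence and pLDDT lengths differ")
    segments = []
    current = []
    for residue, score in zip(sequence, plddt):
        if score >= threshold:
            current.append(residue)
        elif current:
            # A low-confidence residue closes the current segment.
            segments.append("".join(current))
            current = []
    if current:
        segments.append("".join(current))
    return "_".join(segments)


trimmed = trim_sequence("MKTAYIAK", [90, 85, 40, 30, 88, 92, 91, 20])
```

With these made-up scores, residues 3, 4 and 8 fall below the cutoff, so the two retained stretches are joined by a single underscore.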

Running the Structure Prediction

python3 AF_complexes_release/alphafuser_worker.py

Or on SLURM, to e.g. request a GPU:

srun -pgpu --gres=gpu:1 --mem-per-gpu 48G -Ca40 \
  python3 AF_complexes_release/alphafuser_worker.py

This script calls bsa.pl, which must be in the PATH. bsa.pl is a Perl wrapper for DrSASA.

Multiple instances of alphafuser_worker.py can be run in parallel, but large numbers of workers can cause locking/concurrency issues on the SQLite database.
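
For readers curious about how several workers can share one SQLite queue, the sketch below shows one common pattern: a busy timeout plus BEGIN IMMEDIATE, so that selecting and marking a job happen under a single write lock. It is not taken from alphafuser_worker.py. The function name, the status values 'queued' and 'running', and the assumption that a larger priority number runs first are all hypothetical; only the table and column names come from the schema in the Output section.

```python
import sqlite3


def claim_next_complex(db_path, timeout_s=30.0):
    """Atomically claim one queued complex, or return None if none is left."""
    # isolation_level=None gives autocommit mode, so BEGIN/COMMIT are explicit.
    # timeout makes a worker wait for a lock instead of failing immediately.
    conn = sqlite3.connect(db_path, timeout=timeout_s, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
        row = conn.execute(
            "SELECT rowid, complex_name FROM queue "
            "WHERE status = 'queued' "      # assumed status value
            "ORDER BY priority DESC LIMIT 1"  # assumed: higher runs first
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE queue SET status = 'running', "
            "start_time = CURRENT_TIMESTAMP WHERE rowid = ?",
            (row[0],),
        )
        conn.execute("COMMIT")
        return row[1]
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
```

Even with this pattern, SQLite allows only one writer at a time, which is consistent with the warning above about running very large numbers of workers.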

Monitoring Progress

how_many_complexes.sh

This script reports statistics on the progress of the computation.
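
The same kind of summary can be obtained directly from the database. The following is a minimal sketch and not a reimplementation of how_many_complexes.sh, whose exact output is not described here; the function name status_counts is hypothetical, and only the queue table and its status column are taken from the schema in the Output section.

```python
import sqlite3


def status_counts(db_path):
    """Return a {status: number of complexes} mapping from the queue table."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT status, COUNT(*) FROM queue GROUP BY status"
        )
        return dict(cursor)  # each row is a (status, count) pair
    finally:
        conn.close()
```

For example, status_counts("AF_complex_queue.db") returns one entry per distinct value found in the status column.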

Output

A SQLite database (AF_complex_queue.db) will store:

  • Trimmed sequences
  • Complex assembly information, most importantly ipTM and buried surface area
  • Folders containing AlphaFold multimer models

The database schema is as follows:

CREATE TABLE proteins(Uniprot_ID,Protein_name,PDB_path,Sequence,UNIQUE(Protein_name,Sequence));
CREATE TABLE complex_components(complex_name,protein_name);
CREATE TABLE queue (
  complex_name TEXT,
  percent_connections REAL,
  priority INTEGER,
  status TEXT,
  GPU_type TEXT,
  dockq_score REAL,
  pLDDT REAL,
  ptmscore REAL,
  thread INTEGER,
  iptm REAL,
  average_buried_area REAL,
  average_delta_G REAL,
  best_buried_area REAL,
  best_delta_G REAL,
  start_time TIMESTAMP,
  end_time TIMESTAMP,
  failures INTEGER,
  poisoned_based_on INTEGER
);
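
As an example of using this schema, the sketch below ranks scored complexes by ipTM and lists their member proteins. The table and column names are taken from the schema above; the function name top_complexes and the choice to rank by iptm alone are assumptions for illustration, not part of Alphafuser.

```python
import sqlite3


def top_complexes(db_path, limit=10):
    """Return (complex_name, iptm, best_buried_area, members) rows, best first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            """
            SELECT q.complex_name,
                   q.iptm,
                   q.best_buried_area,
                   GROUP_CONCAT(c.protein_name, ',') AS members
            FROM queue AS q
            LEFT JOIN complex_components AS c
                   ON c.complex_name = q.complex_name
            WHERE q.iptm IS NOT NULL      -- skip complexes not yet scored
            GROUP BY q.complex_name
            ORDER BY q.iptm DESC
            LIMIT ?
            """,
            (limit,),
        ).fetchall()
    finally:
        conn.close()
```

The members column is a comma-separated string whose order is not guaranteed, so sort it after splitting if a stable order matters.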

alphafuser_generate_thumbnails.pl can be used to generate PyMOL figures colored by protein and by pLDDT. The simplest way to run it is to place the "newcsv.csv" file from the complex assembly step above in the same directory as the directory containing the local-colabfold subdirectories, and then run the script with no arguments. Some options are provided for use with AWS, but they are largely untested.

alphafuser_network_diagram.py generates CSV files that the included R script R_network_diagram.R uses to produce a network diagram of the Alphafuser run.

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
