This pipeline expands a binary list of protein-protein interactions into larger complexes and prepares AlphaFold input structures, leveraging local ColabFold and structural biology toolkits.
Install or load the following dependencies:
- Phenix
- CCP4
- DrSASA
- localcolabfold with access to alphafold2_multimer_v3
- PyMOL and ImageMagick for thumbnail figure generation (optional)
Prepare a CSV file containing binary protein interactions. Example:
Q9FM80,AT2G38750
Q9FM80,AT3G26060
...
- Use UniProt IDs wherever possible.
- Ensure naming consistency: avoid mixing synonyms such as Q9FM80 and AT5G55580.
- The common bait protein must be in the first column.
- Do not use Excel/Google Sheets "pull down" autofill—this may corrupt UniProt IDs.
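The rules above are easy to break by hand-editing in a spreadsheet, so it can be worth sanity-checking the file before launching the pipeline. Below is a minimal, illustrative sketch (the function name `check_interactions` is not part of the pipeline):

```python
import csv

def check_interactions(path):
    """Sanity-check a binary-interactions CSV before running the pipeline:
    two columns per row, a single common bait in column 1, no duplicate pairs."""
    baits, pairs, problems = set(), set(), []
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if len(row) != 2:
                problems.append(f"expected 2 columns: {row}")
                continue
            bait, prey = (field.strip() for field in row)
            baits.add(bait)
            if (bait, prey) in pairs:
                problems.append(f"duplicate pair: {bait},{prey}")
            pairs.add((bait, prey))
    if len(baits) > 1:
        problems.append(f"multiple baits in column 1: {sorted(baits)}")
    return pairs, problems
```

Running this on your interactions file and reviewing the returned problem list catches mixed-synonym baits and autofill-corrupted rows before any GPU time is spent.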
A script is provided to produce a CSV file of interactors downloaded from BioGRID. Example usage:
get_biogrid_interactions.py -g Q92542 -s -f accession -o 'Homo sapiens' -e "AFFINITY CAPTURE-WESTERN:2" --accesskey abcdefghij
Gets interactors of UniProt ID Q92542 in Homo sapiens, requiring at least 2 AFFINITY CAPTURE-WESTERN interactions, with strict checking enforced for database ID mapping and with the "accession" fields specified for matching. Other query fields are listed at https://www.uniprot.org/help/query-field. Access keys can be obtained at https://webservice.thebiogrid.org. Note that this uses the script uniprot_id_mapping.py, which must be in the PATH.
python3 assemble_complexes_from_binary_interactions.py \
-c interactions.txt \
-o thaliana \
-b3 \
-f \
--strict \
-a xref_araport,xref_gramene \
-g

In the same directory as your AF_complex_queue.db SQLite database, run:

alphafold_trim_sequences.pl

Note that PDBs for every sequence in your database will be downloaded to the current directory, and the sequences in your AF_complex_queue.db will be edited (to include underscores in the very common case of discontinuous high-confidence regions). This script makes use of the Phenix programs phenix.process_predicted_model and phenix.print_sequence and the CCP4 program PDBSET.
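Since trimming rewrites sequences in place, it can be useful to see which entries were split into discontinuous segments. A minimal sketch, assuming the proteins table schema shown later in this README (the function name is illustrative):

```python
import sqlite3

def trimmed_sequences(db="AF_complex_queue.db"):
    """List proteins whose sequence contains '_' (the marker that
    alphafold_trim_sequences.pl inserts between discontinuous
    high-confidence segments), with the number of segments each."""
    con = sqlite3.connect(db)
    rows = con.execute(
        "SELECT Uniprot_ID, Sequence FROM proteins WHERE instr(Sequence, '_') > 0"
    ).fetchall()
    con.close()
    return [(uid, seq.count("_") + 1) for uid, seq in rows]
```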
python3 AF_complexes_release/alphafuser_worker.py

Or on SLURM, to e.g. request a GPU:
srun -pgpu --gres=gpu:1 --mem-per-gpu 48G -Ca40 \
python3 AF_complexes_release/alphafuser_worker.py

This script calls bsa.pl, which must be in the PATH; bsa.pl is a Perl wrapper for DrSASA.
Multiple alphafuser_worker.py instances can be run in parallel, but large numbers can cause locking/concurrency issues on the SQLite database.
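One generic SQLite mitigation (not a feature of the pipeline itself) is to open the database in WAL mode with a busy timeout, so readers do not block the writer and workers wait rather than fail on a momentary lock. A sketch:

```python
import sqlite3

def open_queue(path="AF_complex_queue.db", timeout_s=60):
    """Open the queue database with settings that reduce 'database is locked'
    errors when several worker processes poll it at once."""
    con = sqlite3.connect(path, timeout=timeout_s)
    # WAL lets readers proceed while one writer holds the lock
    con.execute("PRAGMA journal_mode=WAL;")
    # wait up to timeout_s before giving up on a locked database
    con.execute(f"PRAGMA busy_timeout={timeout_s * 1000};")
    return con
```

Even with WAL, SQLite allows only one writer at a time, so keeping the number of concurrent workers modest remains the safest approach.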
how_many_complexes.sh

Provides statistics on the progress of the computation.
A SQLite database (AF_complex_queue.db) will store:
- Trimmed sequences
- Complex assembly information, most importantly IPTM and buried surface area
- Folders containing AlphaFold multimer models
The database schema is as follows:
CREATE TABLE proteins(Uniprot_ID,Protein_name,PDB_path,Sequence,UNIQUE(Protein_name,Sequence));
CREATE TABLE complex_components(complex_name,protein_name);
CREATE TABLE queue (
complex_name TEXT,
percent_connections REAL,
priority INTEGER,
status TEXT,
GPU_type TEXT,
dockq_score REAL,
pLDDT REAL,
ptmscore REAL,
thread INTEGER,
iptm REAL,
average_buried_area REAL,
average_delta_G REAL,
best_buried_area REAL,
best_delta_G REAL,
start_time TIMESTAMP,
end_time TIMESTAMP,
failures INTEGER,
poisoned_based_on INTEGER
);
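With this schema, ranking the results is a single query. A minimal sketch that orders complexes by iptm and breaks ties on best buried surface area (the function name and the cutoff behavior are illustrative, not part of the pipeline):

```python
import sqlite3

def top_complexes(db="AF_complex_queue.db", n=10):
    """Return the n highest-confidence complexes from the queue table,
    ranked by iptm, then by best buried surface area."""
    con = sqlite3.connect(db)
    rows = con.execute(
        """SELECT complex_name, iptm, best_buried_area
           FROM queue
           WHERE iptm IS NOT NULL
           ORDER BY iptm DESC, best_buried_area DESC
           LIMIT ?""",
        (n,),
    ).fetchall()
    con.close()
    return rows
```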
alphafuser_generate_thumbnails.pl can be used to generate PyMOL figures colored by protein and by pLDDT. The simplest way to run it is to make sure the "newcsv.csv" file from complex assembly above is in the same directory as the directory containing the local-colabfold subdirectories, and run it with no arguments. Some options are provided for use with AWS, but they are largely untested.
alphafuser_network_diagram.py can be used to produce CSV files that the included R_network_diagram.R script turns into a network diagram of the Alphafuser run.
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.