# Quickstart: building and using protein structures as graphs with Propythia

The integrated "Structure" modules address protein classification problems: they aim to predict different biological activities from the structural characteristics of a protein.

This notebook intends to go over the building of graphs to describe protein structures.

To use this module, create a class object and call the desired methods. Most parameters can be set
by the user, although default values are provided.
The work is organized into separate submodules to keep it maintainable and flexible, so that it can
be adapted to each specific problem. The following submodules were developed as part of this work:

1) Clustering: Aggregates similar sequences to identify relationships within the data.

2) Graphs: Downloads PDB or AlphaFold structure files and generates a graph for each protein, using Graphein for graph construction.

3) ProteinGraphDataset: Provides a wrapper around the graphs so that they can be used in a PyTorch DataLoader, enabling efficient handling and loading of protein data for further analysis.

4) Dataloader preparation: Provides efficient loading and batching of data from the ProteinGraphDataset.

5) GNNModel: Provides essential methods for deep learning, including training and prediction functions, as well as model analysis for a few models.

# 1. Getting Data

In [None]:
import sys
import os
print(os.getcwd())
# In a notebook __file__ is not defined, so use the notebook's working directory
current_script_directory = os.getcwd()
# The repository root is the parent of the notebook directory (works on any operating system)
main_directory = os.path.dirname(current_script_directory)
# Construct the paths to the src and src/propythia directories
src_directory = os.path.join(main_directory, "src")
srcpro_directory = os.path.join(src_directory, "propythia")
print(src_directory)
print(srcpro_directory)

# Add the src directories to sys.path
sys.path.append(src_directory)
sys.path.append(srcpro_directory)

c:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\example
c:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\src
c:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\src\propythia


First, cluster the sequences by similarity with CD-HIT. Use the following command, adjusting the identity threshold to your needs:

cd-hit -i input.fasta -o output_clustered.fasta -c 0.90

where:

1. -i input.fasta: Path to your input file containing sequences in FASTA format.
2. -o output_clustered.fasta: Output file where clustered sequences will be saved.
3. -c 0.90: Sequence identity threshold for clustering (0.90 means 90% identity).
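
The same command can also be launched from Python. The following is a minimal sketch, assuming the `cd-hit` executable is installed and available on PATH. The helper names `build_cd_hit_command` and `run_cd_hit` are hypothetical and not part of Propythia; only the `-i`, `-o` and `-c` flags described above are used.

```python
import shutil
import subprocess


def build_cd_hit_command(input_fasta, output_fasta, identity=0.90):
    """Return the CD-HIT call as an argument list (hypothetical helper)."""
    # Loose sanity check only; CD-HIT itself enforces its own valid range.
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be in the interval (0, 1]")
    return ["cd-hit", "-i", input_fasta, "-o", output_fasta, "-c", f"{identity:.2f}"]


def run_cd_hit(input_fasta, output_fasta, identity=0.90):
    """Run CD-HIT, failing with a clear message if it is not installed."""
    if shutil.which("cd-hit") is None:
        raise FileNotFoundError("cd-hit executable not found on PATH")
    command = build_cd_hit_command(input_fasta, output_fasta, identity)
    subprocess.run(command, check=True)


# Only build and display the command here; call run_cd_hit(...) to execute it.
print(build_cd_hit_command("input.fasta", "output_clustered.fasta", 0.90))
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.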



Then use the "read_clstr" function in the clustering.py file to create a dataframe that maps every sequence to its cluster. This is important for the following steps.
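
To make the idea concrete, here is a minimal sketch of how a CD-HIT cluster file (written next to the output FASTA with a `.clstr` extension) can be turned into sequence-to-cluster rows. It is not the Propythia "read_clstr" implementation; the function name `parse_clstr` and the sequence names `seqA`, `seqB` and `seqC` are hypothetical, and the sketch assumes the usual `.clstr` layout in which each member line contains `>name...` and the representative sequence is marked with `*`.

```python
import re


def parse_clstr(lines):
    """Parse CD-HIT .clstr lines into a list of row dictionaries."""
    rows = []
    cluster = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            # Header line such as ">Cluster 12"
            cluster = int(line.split()[1])
        elif line and cluster is not None:
            # Member line such as "0\t330aa, >seqA... *"
            match = re.search(r">(.+?)\.\.\.", line)
            if match:
                rows.append({
                    "id": match.group(1),
                    "Cluster": cluster,
                    "representative": line.endswith("*"),
                })
    return rows


example = [
    ">Cluster 0",
    "0\t330aa, >seqA... *",
    ">Cluster 1",
    "0\t109aa, >seqB... *",
    "1\t109aa, >seqC... at 95.41%",
]
for row in parse_clstr(example):
    print(row)
```

The resulting rows can be passed to `pd.DataFrame(rows)` and merged with the sequence table on the "id" column.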

To use this module, it is essential to have either PDB or UniProt identifiers.
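
When a dataset mixes both kinds of identifiers, a quick pattern check helps decide which one is which. The sketch below is an illustration only: `guess_id_type` is a hypothetical helper, the UniProt pattern follows the published accession number format, and the PDB pattern assumes classic four-character codes. The return value "uniprot" matches the `id_type` used later in this notebook, while "pdb" is an assumed label.

```python
import re

# UniProt accession format (6 or 10 characters)
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Classic PDB code: one digit followed by three alphanumeric characters
PDB_RE = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def guess_id_type(identifier):
    """Return "uniprot", "pdb" or None for an unrecognised identifier."""
    identifier = identifier.strip()
    if UNIPROT_RE.match(identifier.upper()):
        return "uniprot"
    if PDB_RE.match(identifier):
        return "pdb"
    return None


for example_id in ["A1KXI0", "P02754", "1CRN", "not-an-id"]:
    print(example_id, guess_id_type(example_id))
```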

In [2]:
import pandas as pd

data = [
    {"UniProtID": "A1KXI0", "sequence": "MKFLLVAALCALVAIGSCKPTREEIKTFEQFKKVFGKVYRNAEEEARREHHFKEQLKWVEEHNGIDGVEYAINEYSDMSEQEFSFHLSGGGLNFTYMKMEAAKEPLINTYGSLPQNFDWRQKARLTRIRQQGACGSCWAFAAAGVAESLYSIQKQQSIGLSEQELVDCTYNRYDPSYQCNGCGSGYSTEAFKYMIRTGLVEERNYPYNMRTQWCDPDVEGQRYHVSGYQQLRYHSSDEDVMYTIQQHGPVVIYMHGSNNYFRNLGNGVLRGVAYNDAYTDHAVILVGWGTVQGVDYWIIRNSWGTGWGNGGYGYVERGHNSLGINNYVTYATL", "bio_category": "oribatida", "label": 1, "Cluster": 8462},
    {"UniProtID": "C0HL99", "sequence": "MSWQTYVDDHLMCEIEGNYLTSAAIIGQDGSIWAQSASFPQFKPEEITAIMNDFSEPGTLAPTGLYLGGTKYMVIQGEAGAVIRGKKGPGGVTVKKTNQALIIGIYDEP",  "bio_category": "actinidia deliciosa",  "label": 1,"Cluster": 14030},
    {"UniProtID": "P02754", "sequence": "MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWENDECAQKKIIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLVCMENSAEPEQSLVCQCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI", "bio_category": "bos taurus", "label": 1, "Cluster": 12743},
    {"UniProtID": "P09582", "sequence": "MWFLALCLAMSLGWTGAEPHFQPRIIGGRECLKNSQPWQVAVYHNGEFACGGVLVNPEWVLTAAHCANSNCEVWLGRHNLSESEDEGQLVQVRKSFIHPLYKTKVPRAVIRPGEDRSHDLMLLHLEEPAKITKAVRVMDLPKKEPPLGSTCYVSGWGSTDPETIFHPGSLQCVDLKLLSNNQCAKVYTQKVTKFMLCAGVLEGKKDTCKGDSGGPLICDGELVGITSWGATPCGKPQMPSLYTRVMPHLMWIKDTMKANT",  "bio_category": "carnivora", "label": 1, "Cluster": 10391},
    {"UniProtID": "P59262", "sequence": "MKFLVNVALVFMVVYISYIYAAPEPEPAPEPEAEADAEADPEAGIGAVLKVLTTGLPALISWIKRKRQQG","bio_category": "hymenoptera", "label": 1,  "Cluster": 15103},
    {"UniProtID": "A5HII1", "sequence": "MGLPKSFVSMSLLFFSTLLILSLAFNAKNLTQRTNDEVKAMYESWLIKYGKSYNSLGEWERRFEIFKETLRFIDEHNADTNRSYKVGLNQFADLTDEEFRSTYLGFTSGSNKTKVSNRYEPRFGQVLPSYVDWRSAGAVVDIKSQGECGGCWAFSAIATVEGINKIVTGVLISLSEQELIDCGRTQNTRGCNGGYITDGFQFIINNGGINTEENYPYTAQDGECNLDLQNEKYVTIDTYENVPYNNEWALQTAVTYQPVSVALDAAGDAFKHYSSGIFTGPCGTAIDHAVTIVGYGTEGGIDYWIVKNSWDTTWGEEGYMRILRNVGGAGTCGIATMPSYPVKYNNQNHPKPYSSLINPPAFSMSKDGPVGVDDGQRYSA",  "bio_category": "actinidia deliciosa", "label": 1, "Cluster": 7144},
    {"UniProtID": "P27740", "sequence": "MANKLFLVSATLALFFLLTNASVYRTGSEFDEHDATNPAGPFRIPKCRKEFQQAQHLKACQQWLHKQAMQSGSGPSWTLDGEFDFEEDMENTQGPQQEPPLLQQCCNELHQEEPLCVCPTLKGASKAVKQQVRQQGQQQQMQQVISRIYQTSTHLPRVCNIRQVSICPFQKTMPGPSY", "bio_category": "brassica rapa",  "label": 1, "Cluster": 12744},
    {"UniProtID": "P15494", "sequence": "MGVFNYETETTSVIPAARLFKAFILDGDNLFPKVAPQAISSVENIEGNGGPGTIKKISFPEGFPFKYVKDRVDEVDHTNFKYNYSVIEGGPIGDTLEKISNEIKIVATPDGGSILKISNKYHTKGDHEVKAEQVKASKEMGETLLRAVESYLLAHSDAYN", "bio_category": "fagales", "label": 1, "Cluster": 13199},
    {"UniProtID": "P0DO15", "sequence": "MMRARFPLLLLGLVFLASVSVSFGIAYWEKENPKHNKCLQSCNSERDSYRNQACHARCNLLKVEKEECEEGEIPRPRPRPQHPEREPQQPGEKEEDEDEQPRPIPFPRPQPRQEEEHEQREEQEWPRKEEKRGEKGSEEEDEDEDEEQDERQFPFPRPPHQKEERNEEEDEDEEQQRESEESEDSELRRHKNKNPFLFGSNRFETLFKNQYGRIRVLQRFNQRSPQLQNLRDYRILEFNSKPNTLLLPNHADADYLIVILNGTAILSLVNNDDRDSYRLQSGDALRVPSGTTYYVVNPDNNENLRLITLAIPVNKPGRFESFFLSSTEAQQSYLQGFSRNILEASYDTKFEEINKVLFSREEGQQQGEQRLQESVIVEISKEQIRALSKRAKSSSRKTISSEDKPFNLRSRDPIYSNKLGKFFEITPEKNPQLRDLDIFLSIVDMNEGALLLPHFNSKAIVILVINEGDANIELVGLKEQQQEQQQEEQPLEVRKYRAELSEQDIFVIPAGYPVVVNATSNLNFFAIGINAENNQRNFLAGSQDNVISQIPSQVQELAFPGSAQAVEKLLKNQRESYFVDAQPKKKEEGNKGRKGPLSSILRAFY", "bio_category": "glycine max",  "label": 1,"Cluster": 3089},
    {"UniProtID": "P31541", "sequence": "MMARALVQSTNILPSVAGERAGQFNGSRKDQRTVRMLCNVKCCSSRLNNFAGLRGCNALDTLLVKSGETLHSKVAAATFVRRPRGCRFVPKAMFERFTEKAIKVIMLAQEEARRLGHNFVGTEQILLGLIGEGTGIAAKVLKSMGINLKDARVEVEKIIGRGSGFIAVEIPFTPRAKRVLELSLEEARQLGHNYIGSEHLLLGLLREGEGVAARVLENLGADPTNIRTQVIRMVGESSEAVGASVGGGTSGLKMPTLEEYGTNLTKLAEEGKLDPVVGRQAQIERVTQILGRRTKNNPCLIGEPGVGKTAIAEGLAQRIANGDVPETIEGKKVITLDMGLLVAGTKYRGEFEERLKKLMEEIKQSDEIILFIDEVHTLIGAGAAEGAIDAANILKPALARGELQCIGATTLDEYRKHIEKDPALERRFQPVKVPEPSVDETIQILKGLRERYEIHHKLHYTDEAIEAAAKLSHQYISDRFLPDKAIDLIDEAGSRVRLRHAQLPEEARELEKELRQITKEKNEAVRGQDFEKAGELRDREMDLKAQISALIDKNKEKSKAESEAGDAAGPIVTEADIQHIVSSWTGIPVEKVSTDESDRLLKMEETLHTRVIGQDEAVKAISRAIRRARVGLKNPNRPIASFIFSGPTGVGKSELAKSLATYYFGSEEAMIRLDMSEFMERHTVSKLIGSPPGYVGYTEGGQLTEAVRRRPYTVVLFDEIEKAHPDVFNMMLQILEDGRLTDSKGRTVDFKNTLLIMTSNVGSSVIEKGGRRIGFDLDFDEKDSSYNRIKSLVTEELKQYFRPEFLNRLSEMIVFRQLTKLEVKEIADIMLKEVFVRLKNKEIELQVTERFRDRVVDEGYNPSYGARPLRRAIMRLLEDSMAEKMLAGEIKEGDSVIVDVDSDGNVTVLNGTSGAPSDSAPEPILV", "bio_category": "solanum lycopersicum", "label": 0, "Cluster": 1180},
    {"UniProtID": "P17786", "sequence": "MGKEKIHISIVVIGHVDSGKSTTTGHLIYKLGGIDKRVIERFEKEAAEMNKRSFKYAWVLDKLKAERERGITIDIALWKFETTKYYCTVIDAPGHRDFIKNMITGTSQADCAVLIIDSTTGGFEAGISKDGQTREHALLAFTLGVKQMICCCNKMDATTPKYSKARYDEIVKEVSSYLKKVGYNPDKIPFVPISGFEGDNMIERSTNLDWYKGPTLLEALDQINEPKRPSDKPLRLPLQDVYKIGGIGTVPVGRVETGVIKPGMVVTFGPTGLTTEVKSVEMHHEALQEALPGDNVGFNVKNVAVKDLKRGYVASNSKDDPAKGAASFTAQVIIMNHPGQIGNGYAPVLDCHTSHIAVKFAEILTKIDRRSGKELEKEPKFLKNGDAGMVKMIPTKPMVVETFAEYPPLGRFAVRDMRQTVAVGVVKNVDKKDPTGAKVTKAAQKKGK", "bio_category": "solanum lycopersicum", "label": 0, "Cluster": 15113},
    {"UniProtID": "P48980", "sequence": "MENFPIINLEKLNGDERANTMEMIKDACENWGFFELVNHGIPHEVMDTVEKMTKGHYKKCMEQRFKELVASKGLEAVQAEVTDLDWESTFFLRHLPTSNISQVPDLDEEYREVMRDFAKRLEKLAEELLDLLCENLGLEKGYLKNAFYGSKGPNFGTKVSNYPPCPKPDLIKGLRAHTDAGGIILLFQDDKVSGLQLLKDEQWIDVPPMRHSIVVNLGDQLEVITNGKYKSVLHRVIAQTDGTRMSLASFYNPGSDAVIYPAKTLVEKEAEESTQVYPKFVFDDYMKLYAGLKFQAKEPRFEAMKAMESDPIASA", "bio_category": "solanum lycopersicum", "label": 0, "Cluster": 7134},
    # {"UniProtID": "P89677", "sequence": "MGFWMAMLLMLLLCLWVSCGIASVSYDHKAIIVNGQRKILISGSIHYPRSTPEMWPDLIQKAKEGGVDVIQTYVFWNGHEPEEGKYYFEERYDLVKFIKVVQEAGLYVHLRIGPYACAEWNFGGFPVWLKYVPGISFRTNNEPFKAAMQKFTTKIVDMMKAEKLYETQGGPIILSQIENEYGPMEWELGEPGKVYSEWAAKMAVDLGTGVPWIMCKQDDVPDPIINTCNGFYCDYFTPNKANKPKMWTEAWTAWFTEFGGPVPYRPAEDMAFAVARFIQTGGSFINYYMYHGGTNFGRTSGGPFIATSYDYDAPLDEFGSLRQPKWGHLKDLHRAIKLCEPALVSVDPTVTSLGNYQEARVFKSESGACAAFLANYNQHSFAKVAFGNMHYNLPPWSISILPDCKNTVYNTARVGAQSAQMKMTPVSRGFSWESFNEDAASHEDDTFTVVGLLEQINITRDVSDYLWYMTDIEIDPTEGFLNSGNWPWLTVFSAGHALHVFVNGQLAGTVYGSLENPKLTFSNGINLRAGVNKISLLSIAVGLPNVGPHFETWNAGVLGPVSLNGLNEGTRDLTWQKWFYKVGLKGEALSLHSLSGSPSVEWVEGSLVAQKQPLSWYKTTFNAPDGNEPLALDMNTMGKGQVWINGQSLGRHWPAYKSSGSCSVCNYTGWFDEKKCLTNCGEGSQRWYHVPRSWLYPTGNLLVVFEEWGGDPYGITLVKREIGSVCADIYEWQPQLLNWQRLVSGKFDRPLRPKAHLKCAPGQKISSIKFASFGTPEGVCGNFQQGSCHAPRSYDAFKKNCVGKESCSVQVTPENFGGDPCRNVLKKLSVEAICS", "bio_category":"solanum lycopersicum", "label":0, "Cluster":3284},
    {"UniProtID": "Q8LI30", "sequence": "MATLSLPLPHLTQAIPARARPRPRPLRGIPARLLSCRAAMAVAPDKEEAAAVALDKAVKVAVAAPDRAAVAAVGVGEELPEGYDQMMPAVEEARRRRAGVLLHPTSLRGPHGIGDLGDEAVAFLAWLRDAGCTLWQVLPLVPPGRKSGEDGSPYSGQDANCGNTLLISLEELVKDGLLMENELPDPLDMEYVEFDTVANLKEPLIAKAAERLLLSRGELRTQYDCFKKNPNISGWLEDAALFAAIDRSIDALSWYEWPEPLKNRHLRALEDIYQKQKDFIEIFMAQQFLFQRQWQRIRKYAKKLGISIMGDMPIYVGYHSADVWANRKSFLLDKNGFPTFVSGVPPDAFSETGQLWNSPLYDWKAMEAGGFEWWIKRINRALDLYDEFRIDHFRGLAGFWAVPSESKVALVGSWRAGPRNAFFDALFKAVGRINIIAEDLGVITEDVVDLRKSIEAPGMAVLQFAFGGGSDNPHLPHNHEFDQVVYTGTHDNDTVIGWWQTLPEEEKQTVFKYLPEANRTEISWALITAALSSVARTSMVTMQDILGLDSSARMNTPATQKGNWRWRMPSSVSFDSLSPEAAKLKELLGLYNRL", "bio_category": "oryza sativa", "label":0, "Cluster":3206}
    ]
# Convert to DataFrame
df = pd.DataFrame(data)
df.rename(columns={"UniProtID": "id"}, inplace=True)

print(df)


        id                                           sequence  \
0   A1KXI0  MKFLLVAALCALVAIGSCKPTREEIKTFEQFKKVFGKVYRNAEEEA...   
1   C0HL99  MSWQTYVDDHLMCEIEGNYLTSAAIIGQDGSIWAQSASFPQFKPEE...   
2   P02754  MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDIS...   
3   P09582  MWFLALCLAMSLGWTGAEPHFQPRIIGGRECLKNSQPWQVAVYHNG...   
4   P59262  MKFLVNVALVFMVVYISYIYAAPEPEPAPEPEAEADAEADPEAGIG...   
5   A5HII1  MGLPKSFVSMSLLFFSTLLILSLAFNAKNLTQRTNDEVKAMYESWL...   
6   P27740  MANKLFLVSATLALFFLLTNASVYRTGSEFDEHDATNPAGPFRIPK...   
7   P15494  MGVFNYETETTSVIPAARLFKAFILDGDNLFPKVAPQAISSVENIE...   
8   P0DO15  MMRARFPLLLLGLVFLASVSVSFGIAYWEKENPKHNKCLQSCNSER...   
9   P31541  MMARALVQSTNILPSVAGERAGQFNGSRKDQRTVRMLCNVKCCSSR...   
10  P17786  MGKEKIHISIVVIGHVDSGKSTTTGHLIYKLGGIDKRVIERFEKEA...   
11  P48980  MENFPIINLEKLNGDERANTMEMIKDACENWGFFELVNHGIPHEVM...   
12  Q8LI30  MATLSLPLPHLTQAIPARARPRPRPLRGIPARLLSCRAAMAVAPDK...   

            bio_category  label  Cluster  
0              oribatida      1     8462  
1  

You can observe that our dataset has "id" (renamed from "UniProtID"), "sequence", "bio_category", "label" and "Cluster" columns.

You should use Propythia's preprocessing capabilities to clean your sequences.
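
As an illustration of the kind of check that preprocessing performs, the following sketch flags residues outside the 20 standard amino acids. It is a generic example and does not use the Propythia API; `clean_sequence` and `nonstandard_residues` are hypothetical helper names.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(sequence):
    """Remove all whitespace and convert the sequence to upper case."""
    return "".join(sequence.split()).upper()


def nonstandard_residues(sequence):
    """Return the sorted residues that are not standard amino acids."""
    return sorted(set(clean_sequence(sequence)) - STANDARD_AA)


print(clean_sequence("mkf ll\nvaa"))
print(nonstandard_residues("MKFLLXVAAB"))
```

Sequences for which `nonstandard_residues` returns a non-empty list can then be corrected or dropped before graph building.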

# 2. Graph building

In this phase, you use Graphein's capabilities to build the graphs. For that, you should install Graphein; I recommend using version 1.7.6.

In [None]:
!pip install graphein==1.7.6
#!pip install graphein[extras]

In [5]:
from protein.structure import Graph, ProteinGraphDataset

Next, build a graph for each UniProt identifier. The structure files are downloaded and saved to disk, so first create a directory called "pdb_files" to hold them (for example, by running mkdir pdb_files).
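
The directory can also be created from Python, which works the same way on every operating system. This is a minimal sketch; `ensure_pdb_dir` is a hypothetical helper, and the base directory is an assumption. In the logged run of this notebook the files ended up in a "pdb_files" folder at the repository root, so `ensure_pdb_dir("..")` may be the appropriate call when the notebook runs from the example folder.

```python
from pathlib import Path


def ensure_pdb_dir(base_dir="."):
    """Create <base_dir>/pdb_files if it does not exist and return its path."""
    pdb_dir = Path(base_dir) / "pdb_files"
    # exist_ok=True makes the call safe to repeat when re-running the notebook
    pdb_dir.mkdir(parents=True, exist_ok=True)
    return pdb_dir


print(ensure_pdb_dir().resolve())
```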

In [65]:
for uniprot_id in df["id"]:  # the "UniProtID" column was renamed to "id" above
    print(uniprot_id)
    graph = Graph(identifier = uniprot_id, id_type = "uniprot")
    print(graph)
    

A1KXI0
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\A1KXI0.pdb
<src.propythia.protein.structure.Graph object at 0x00000207F20CF0D0>
C0HL99
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\C0HL99.pdb
<src.propythia.protein.structure.Graph object at 0x00000207EF7B4090>
P02754
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P02754.pdb
<src.propythia.protein.structure.Graph object at 0x0000020782C37890>
P09582
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P09582.pdb
<src.propythia.protein.structure.Graph object at 0x00000207EF7B4090>
P59262
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P59262.pdb
<src.propythia.protein.structure.Graph object at 0x0000020782CF6290>
A5HII1
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\A5HII1.pdb
<src.propythia.protein.structure.Graph object at 0x00000207EF7B4090>
P27740
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P27740.pdb
<src.propythia.protein.structure.Graph object at 0x0000020782CF6290>
P15494
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P15494.pdb
<src.propythia.protein.structure.Graph object at 0x00000207EF7B4090>
P0DO15
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P0DO15.pdb
<src.propythia.protein.structure.Graph object at 0x0000020782CF6290>
P31541
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P31541.pdb
<src.propythia.protein.structure.Graph object at 0x00000207EF7B4090>
P17786
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P17786.pdb
<src.propythia.protein.structure.Graph object at 0x000002078472A1D0>
P48980
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\P48980.pdb
<src.propythia.protein.structure.Graph object at 0x00000207EF7B4090>
Q8LI30
AlphaFold file downloaded: C:\Users\Fofinha\Desktop\UNI\MESTRADO\propythia\pdb_files\Q8LI30.pdb
<src.propythia.protein.structure.Graph object at 0x0000020784909F50>
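
The loop above prints each graph but does not keep it, and a single failed download would stop the whole run. A possible variation, sketched here under assumptions, stores the graphs in a dictionary and records failures separately. `build_graphs` is a hypothetical helper that takes any callable as the builder; in this notebook the builder would wrap the `Graph(identifier=..., id_type="uniprot")` call shown above. A stand-in builder is used below so the example runs without network access.

```python
def build_graphs(identifiers, builder):
    """Apply builder to each identifier, collecting results and failures."""
    graphs, failures = {}, {}
    for identifier in identifiers:
        try:
            graphs[identifier] = builder(identifier)
        except Exception as exc:  # keep going if one structure cannot be built
            failures[identifier] = repr(exc)
    return graphs, failures


def fake_builder(identifier):
    """Stand-in for the real graph constructor, for demonstration only."""
    if identifier == "MISSING":
        raise ValueError("no structure available")
    return f"graph for {identifier}"


graphs, failures = build_graphs(["A1KXI0", "MISSING", "C0HL99"], fake_builder)
print(sorted(graphs))
print(failures)

# In the notebook, the call would look like this (not executed here):
# graphs, failures = build_graphs(
#     df["id"], lambda uid: Graph(identifier=uid, id_type="uniprot")
# )
```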


In [2]:
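# ProteinGraphDataset must be imported in the current kernel session
# (rerun the import cell above after a restart); otherwise this line raises
# "NameError: name 'ProteinGraphDataset' is not defined".
dataset = ProteinGraphDataset(df)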

After 