# Setup

In [None]:
# installation of required libraries
!pip install biopandas

In [None]:
# clear the Colab working directory (this deletes everything under /content)
!rm -rf /content/*

In [None]:
from biopandas.pdb import PandasPdb
import pandas as pd

Note: Why is `biopandas` useful for protein analysis?

If you work with PDB files and want to analyze protein structures in Python, biopandas helps by loading the structural records into pandas DataFrames.

Key benefits for a computational protein workshop:

- **Easy Data Handling**: Instead of parsing a PDB file line by line, you can use biopandas to load atomic coordinates, residue information, and chain details into a structured DataFrame.
- **Fast Filtering**: Need only backbone atoms, a specific chain, or only hydrophobic residues? Use simple pandas filtering instead of complex parsing.
- **Compatible with ML & Visualization**: Since biopandas works with DataFrames, it’s easier to integrate protein structural data into machine learning pipelines or use matplotlib for visualization.
- **No Complex Parsing**: `Bio.PDB` exposes a structure as a hierarchy of objects (structure, model, chain, residue, atom) that you traverse in code; biopandas lets you query the same data like a spreadsheet, which is often more approachable for beginners.
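
The filtering described in the list can be sketched with plain pandas boolean masks. The snippet below uses a small hand-made table in place of `pdb_file.df["ATOM"]`: the column names (`atom_name`, `residue_name`, `chain_id`, `residue_number`) follow biopandas, but the rows are invented for illustration, and the `HYDROPHOBIC` set is one common choice rather than a fixed standard.

```python
import pandas as pd

# Toy stand-in for pdb_file.df["ATOM"]; column names follow biopandas,
# the values are made up for illustration.
atoms = pd.DataFrame({
    "atom_name": ["N", "CA", "C", "O", "CB", "N", "CA", "C", "O", "N", "CA"],
    "residue_name": ["LEU"] * 5 + ["SER"] * 4 + ["VAL"] * 2,
    "chain_id": ["A"] * 9 + ["B"] * 2,
    "residue_number": [1] * 5 + [2] * 4 + [1] * 2,
})

# Assumption: backbone means the four main-chain heavy atoms.
BACKBONE = {"N", "CA", "C", "O"}
# Assumption: one possible hydrophobic residue set; definitions vary.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}

# Each filter is a boolean mask over the rows of the table.
backbone = atoms[atoms["atom_name"].isin(BACKBONE)]
chain_a = atoms[atoms["chain_id"] == "A"]
hydrophobic = atoms[atoms["residue_name"].isin(HYDROPHOBIC)]

print(len(backbone), len(chain_a), len(hydrophobic))
```

The same masks should work unchanged on the real `ATOM` table once the PDB file has been loaded, and they can be combined with `&` to select, for example, backbone atoms of a single chain.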


In [None]:
PDB_FILE_LOCATION = 'https://github.com/enveda/enzyme-ml/raw/refs/heads/main/workshop_data/cotb2_ml_data/cotb2_pp_mg.pdb'
!wget $PDB_FILE_LOCATION -O /content/cotb2_pp_mg.pdb

In [None]:
!ls /content

In [None]:
pdb_file = PandasPdb().read_pdb('/content/cotb2_pp_mg.pdb')

In [None]:
# Preview the protein atoms (ATOM) and the hetero atoms such as ligands, ions, and waters (HETATM)
display(
    pdb_file.df["ATOM"].head(3),
    pdb_file.df["HETATM"].head(3)
)
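
Beyond looking at the first rows, a quick summary of the `HETATM` table shows which non-protein groups a structure contains. The sketch below uses an invented table in place of `pdb_file.df["HETATM"]`; the residue names (`MG`, `LIG`, `HOH`) and numbers are placeholders, not the contents of the workshop file.

```python
import pandas as pd

# Toy stand-in for pdb_file.df["HETATM"]; values are invented for illustration.
hetatm = pd.DataFrame({
    "residue_name": ["MG", "MG", "LIG", "LIG", "LIG", "HOH"],
    "chain_id": ["A"] * 6,
    "residue_number": [401, 402, 403, 403, 403, 501],
})

# Number of HETATM records (atoms) per residue name.
atoms_per_type = hetatm["residue_name"].value_counts()

# Number of distinct hetero groups per residue name:
# keep one row per (chain, residue number, residue name) before counting.
groups = hetatm.drop_duplicates(["chain_id", "residue_number", "residue_name"])
groups_per_type = groups["residue_name"].value_counts()

print(atoms_per_type.to_dict())
print(groups_per_type.to_dict())
```

Counting atoms and counting groups give different answers for multi-atom ligands (three `LIG` atoms here form a single group), so it helps to decide which of the two a given analysis needs.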

## Extract the sequence from the protein file and write it to a FASTA file