# 📊 Bioinformatics Data Visualization

In this notebook, we'll demonstrate how to visualize different types of bioinformatics data:
- Sequence-level properties like length and GC content
- Simulated gene expression with a volcano plot
- 3D molecular structures using `py3Dmol`

We'll use real E. coli coding sequences from NCBI, simulated expression values, and the hemoglobin structure 1A3N from the Protein Data Bank (PDB).

## 1️⃣ Sequence Feature Visualization

### 📥 Download E. coli CDS Sequences
We’ll fetch coding sequences from NCBI for visualization.

In [None]:
import urllib.request
url = "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_cds_from_genomic.fna.gz"
urllib.request.urlretrieve(url, "ecoli_cds.fna.gz")
print("✅ Downloaded E. coli CDS FASTA")
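
Re-running the cell above downloads the file again every time. A minimal sketch of a guard that skips the download when a non-empty file is already on disk; `fetch_if_missing` is a hypothetical helper name introduced here for illustration, not part of `urllib`.

```python
import os
import urllib.request

def fetch_if_missing(url, dest):
    """Download url to dest unless a non-empty dest already exists.

    Returns True if a download happened, False if the cached file was reused.
    """
    if os.path.exists(dest) and os.path.getsize(dest) > 0:
        return False
    urllib.request.urlretrieve(url, dest)
    return True
```

In this notebook it could be called as `fetch_if_missing(url, "ecoli_cds.fna.gz")`. The size check is a cheap way to avoid reusing an empty file left behind by an interrupted download; it does not verify that the content is complete.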

### 📊 Calculate Sequence Lengths and GC Content
Let’s parse the FASTA file and compute basic stats.

In [None]:
from Bio import SeqIO
from Bio.SeqUtils import gc_fraction
import gzip

seq_lengths = []
gc_contents = []

with gzip.open("ecoli_cds.fna.gz", "rt") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        seq_lengths.append(len(record.seq))
        gc_contents.append(gc_fraction(record.seq) * 100)

print(f"Parsed {len(seq_lengths)} sequences.")
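
For intuition, GC content is the share of G and C bases in a sequence. The sketch below computes it by hand on a plain string; `gc_percent` is a hypothetical name, and unlike Biopython's `gc_fraction` it makes no attempt to handle ambiguous nucleotide codes.

```python
def gc_percent(seq):
    """Percentage of G and C characters in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)

# Toy example: 2 of the 4 bases are G or C
print(gc_percent("ATGC"))
```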

### 📈 Plot: Length Distribution
A histogram of CDS lengths in base pairs, using 50 bins.

In [None]:
import matplotlib.pyplot as plt
plt.hist(seq_lengths, bins=50, color='lightblue')
plt.title("CDS Length Distribution")
plt.xlabel("Length (bp)")
plt.ylabel("Frequency")
plt.show()
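
A histogram is easier to read next to a few summary numbers. A minimal sketch using only the standard library; `summarize_lengths` is a hypothetical helper and the toy lengths are made up for illustration, not taken from the E. coli file.

```python
import statistics

def summarize_lengths(lengths):
    """Return min, quartiles, and max for a list of sequence lengths."""
    q1, median, q3 = statistics.quantiles(lengths, n=4)
    return {
        "min": min(lengths),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(lengths),
    }

# Made-up lengths for illustration only
toy_lengths = [300, 450, 600, 900, 1200, 1500, 3000]
summary = summarize_lengths(toy_lengths)
print(summary)
```

In the notebook, `summarize_lengths(seq_lengths)` would give the same summary for the parsed CDS records.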

### 🌐 Plot: GC Content vs Length
Explore how GC% varies with sequence length.

In [None]:
plt.scatter(seq_lengths, gc_contents, alpha=0.5)
plt.title("GC Content vs CDS Length")
plt.xlabel("Length (bp)")
plt.ylabel("GC Content (%)")
plt.show()
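
To put a number on what the scatter plot shows, one option is the Pearson correlation coefficient between length and GC%. This is a sketch under the assumption that a linear measure is good enough for a first look; `pearson_r` is a hypothetical helper written with the standard library only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sd_x * sd_y)
```

Calling `pearson_r(seq_lengths, gc_contents)` returns a value between -1 and 1; values near 0 suggest little linear relationship between CDS length and GC content.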

## 2️⃣ Gene Expression Volcano Plot (Simulated)

### 🧪 Simulate Expression Data for Volcano Plot
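Create a mock dataset of 500 genes with normally distributed log2 fold changes and uniformly distributed p-values, then flag genes with |log2FC| > 1 and p < 0.05. Because the p-values are uniform, about 5% of them fall below 0.05 by chance alone.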

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px

np.random.seed(42)
df = pd.DataFrame({
    'log2FC': np.random.normal(0, 2, 500),
    'pval': np.random.uniform(0, 1, 500)
})
df['-log10(pval)'] = -np.log10(df['pval'])
df['Significant'] = (abs(df['log2FC']) > 1) & (df['pval'] < 0.05)
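
The `Significant` flag does not distinguish up-regulated from down-regulated genes. A minimal sketch of a three-way classification using the same cutoffs as the cell above; `classify_gene` and `neg_log10` are hypothetical helper names.

```python
import math

def classify_gene(log2fc, pval, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    if pval < p_cutoff and log2fc > fc_cutoff:
        return "up"
    if pval < p_cutoff and log2fc < -fc_cutoff:
        return "down"
    return "ns"

def neg_log10(pval):
    """Transform a p-value for the volcano plot's y-axis."""
    return -math.log10(pval)

print(classify_gene(2.3, 0.001), classify_gene(-1.8, 0.01), classify_gene(2.3, 0.4))
```

Applied row by row to `df`, these labels could replace `Significant` as the color column so the two arms of the volcano are colored separately.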

### 🌋 Plot: Simulated Volcano Plot
A scatter plot of log2 fold change (x-axis) against -log10 p-value (y-axis), colored by the `Significant` flag.

In [None]:
fig = px.scatter(df, x='log2FC', y='-log10(pval)', color='Significant', title="Simulated Volcano Plot")
fig.show()

## 3️⃣ Protein Structure: 3D Visualization with `py3Dmol`

### 📦 Download Hemoglobin Protein Structure (1A3N)
We’ll use this for 3D molecular visualization.

In [None]:
import urllib.request
url = "https://files.rcsb.org/download/1A3N.pdb"
urllib.request.urlretrieve(url, "1A3N.pdb")
print("✅ Downloaded 1A3N.pdb")

In [None]:
import warnings
from Bio import BiopythonWarning
from Bio.PDB import PDBParser

# Suppress Biopython warnings
warnings.simplefilter('ignore', BiopythonWarning)

# Load and parse the structure
parser = PDBParser()
structure = parser.get_structure("Hemoglobin", "1A3N.pdb")

# Print chain IDs
print("Chains in structure:")
for model in structure:
    for chain in model:
        print(" - Chain ID:", chain.id)
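
For a sense of where the chain IDs come from: in the fixed-column PDB format, the chain identifier sits in column 22 of each `ATOM` and `HETATM` record. The sketch below reads that column directly from toy records; `make_atom_line` and `chain_ids` are hypothetical helpers, the records are made up and truncated (no coordinates), and `Bio.PDB` does far more validation than this.

```python
def make_atom_line(serial, atom_name, res_name, chain_id, res_seq):
    """Build the first 26 columns of a toy ATOM record (coordinates omitted)."""
    return f"ATOM  {serial:>5} {atom_name:<4} {res_name:>3} {chain_id}{res_seq:>4}"

def chain_ids(pdb_lines):
    """Return chain IDs in order of first appearance in ATOM/HETATM records."""
    seen = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            cid = line[21]  # column 22 in 1-based PDB numbering
            if cid not in seen:
                seen.append(cid)
    return seen

# Made-up records for illustration only
toy_records = [
    make_atom_line(1, " N", "VAL", "A", 1),
    make_atom_line(2, " CA", "VAL", "A", 1),
    make_atom_line(3, " N", "VAL", "B", 1),
]
print(chain_ids(toy_records))
```

With the downloaded file, `chain_ids(open("1A3N.pdb"))` should list the same chains that the Biopython loop prints.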

### 🧬 Interactive 3D Viewer Setup with py3Dmol
Let’s visualize the protein in 3D using a cartoon model.

In [None]:
import py3Dmol

view = py3Dmol.view(query='pdb:1A3N')
view.setStyle({'cartoon': {'color': 'spectrum'}})
view.zoomTo()
view.show()

### 🔬 Tip: Annotating Protein Regions (Optional)
To highlight active sites, ligands, or domains, you can add selections like:
```python
view.addStyle({'chain': 'A', 'resn': 'HEM'}, {'stick': {}})
```
You can also add labels and spheres to residues for educational demos or reports.

In [None]:
import py3Dmol

view = py3Dmol.view(query='pdb:1A3N')
view.addStyle({'chain': 'A', 'resn': 'HEM'}, {'stick': {}})
view.zoomTo()
view.show()

### 🧬 Interactive Protein Viewer with Chain and Style Selection
This viewer lets you:
- Select a chain (A, B, C, or D)
- Choose a visual style (cartoon, stick, or surface)

In [None]:
import py3Dmol
import ipywidgets as widgets
from IPython.display import display

# Define available options
chains = ['A', 'B', 'C', 'D']
styles = ['cartoon', 'stick', 'surface']

# Create widgets
chain_selector = widgets.Dropdown(
    options=chains,
    value='A',
    description='Chain:',
    style={'description_width': 'initial'}
)

style_selector = widgets.Dropdown(
    options=styles,
    value='cartoon',
    description='Style:',
    style={'description_width': 'initial'}
)

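# Function to update viewer
def update_viewer(chain_id, style):
    view = py3Dmol.view(query='pdb:1A3N')
    view.setStyle({'cartoon': {'color': 'lightgrey'}})
    if style == 'surface':
        # 'surface' is not an atom style in 3Dmol.js, so add a surface for the chain instead
        view.addSurface(py3Dmol.VDW, {'color': 'red', 'opacity': 0.8}, {'chain': chain_id})
    else:
        view.addStyle({'chain': chain_id}, {style: {'color': 'red'}})
    view.zoomTo()
    view.show()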

# Display widgets together
ui = widgets.HBox([chain_selector, style_selector])
out = widgets.interactive_output(update_viewer, {'chain_id': chain_selector, 'style': style_selector})

display(ui, out)
