# Basic tRNA Visualization with tRNAs in space

This notebook demonstrates how to load and visualize pre-computed tRNA global coordinates.

## Overview

The `trnas_in_space` tool converts tRNA Sprinzl coordinates to a standardized global coordinate system. This enables:
- Cross-isodecoder comparisons
- Aligned heatmaps
- Positional analysis across different tRNAs
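
As a minimal sketch of what a shared coordinate buys you, the toy example below places two tRNAs of different lengths on the same axis. The tRNA names and residues are invented for illustration; only the column names (`trna_id`, `seq_index`, `global_index`, `residue`) follow the ones used later in this notebook.

```python
import pandas as pd

# Toy data (invented for illustration): two tRNAs of different lengths.
# tRNA-B lacks the residue that tRNA-A carries at global_index 3.
toy = pd.DataFrame({
    "trna_id":      ["tRNA-A"] * 4 + ["tRNA-B"] * 3,
    "seq_index":    [1, 2, 3, 4, 1, 2, 3],
    "global_index": [1, 2, 3, 4, 1, 2, 4],
    "residue":      list("GCAU") + list("GCU"),
})

# One row per tRNA, one column per shared global position.
aligned = toy.pivot(index="trna_id", columns="global_index", values="residue")
print(aligned.fillna("-"))
```

The third residue of tRNA-B lands in column 4, next to the equivalent residue of tRNA-A, and the missing position shows up as a gap rather than shifting everything downstream.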

Let's explore the E. coli K12 dataset as an example.

## Setup

First, import required libraries:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.dpi'] = 100

## Load Pre-computed Coordinates

Load the E. coli K12 global coordinates file:

In [None]:
# Load data
df = pd.read_csv('../outputs/ecoliK12_global_coords.tsv', sep='\t')

# Display basic information
print("Dataset: E. coli K12")
print(f"Total rows: {len(df):,}")
print(f"Unique tRNAs: {df['trna_id'].nunique()}")
# Report the observed minimum rather than assuming the range starts at 1
print(f"Global coordinate range: {df['global_index'].min()} to {df['global_index'].max()}")
print(f"\nColumn names: {', '.join(df.columns)}")
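
Before going further, it can help to confirm that the file has the columns the later cells rely on. The sketch below is one way to do that; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names introduced here, the column list is inferred from the cells in this notebook rather than from the tool's specification, and the inline TSV is an invented stand-in for the real file.

```python
import io
import pandas as pd

# Columns used somewhere in this notebook; the real file may contain more.
REQUIRED_COLUMNS = {
    "trna_id", "seq_index", "sprinzl_index", "global_index", "region", "residue",
}

def missing_columns(frame):
    """Return the required columns that are absent from `frame`, sorted by name."""
    return sorted(REQUIRED_COLUMNS - set(frame.columns))

# Stand-in for the real TSV (values are invented for illustration).
sample_tsv = "trna_id\tseq_index\tglobal_index\tresidue\ntRNA-A\t1\t1\tG\n"
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(missing_columns(sample))
```

With the real data, `missing_columns(df)` returning an empty list means every cell below has the columns it needs.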

## Inspect the Data

Let's look at the first few rows:

In [None]:
df.head(10)

## Summary Statistics

In [None]:
# tRNA length distribution
trna_lengths = df.groupby('trna_id')['seq_index'].max()
print(f"tRNA length statistics:")
print(trna_lengths.describe())

# Region distribution
print(f"\nRegion distribution:")
print(df['region'].value_counts())

## Visualization 1: tRNA Length Distribution

In [None]:
plt.figure(figsize=(10, 5))
plt.hist(trna_lengths, bins=20, edgecolor='black', alpha=0.7)
plt.xlabel('tRNA Length (nucleotides)')
plt.ylabel('Count')
plt.title('E. coli K12 tRNA Length Distribution')
plt.axvline(trna_lengths.median(), color='red', linestyle='--', 
            label=f'Median: {trna_lengths.median():.0f} nt')
plt.legend()
plt.tight_layout()
plt.show()

## Visualization 2: Aligned tRNA Heatmap

Create an alignment matrix using `global_index`:

In [None]:
# Create pivot table for alignment
alignment = df.pivot_table(
    index='trna_id',
    columns='global_index',
    values='residue',
    aggfunc='first'
)

print(f"Alignment matrix shape: {alignment.shape}")
print(f"Coverage: {(alignment.notna().sum().sum() / alignment.size * 100):.1f}%")
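
Because the pivot uses `aggfunc='first'`, any tRNA with two rows at the same `global_index` would silently lose one of them. A quick check along the following lines can show whether that ever happens. The toy frame is invented; on the real data the same `duplicated` call could be applied to `df`.

```python
import pandas as pd

# Toy data (invented): tRNA-A has two rows mapped to global_index 2.
toy = pd.DataFrame({
    "trna_id":      ["tRNA-A", "tRNA-A", "tRNA-A", "tRNA-B"],
    "global_index": [1, 2, 2, 1],
    "residue":      ["G", "C", "A", "G"],
})

# keep=False flags every member of a colliding group, not just the later ones.
dup_mask = toy.duplicated(subset=["trna_id", "global_index"], keep=False)
collisions = toy[dup_mask]

print(f"Rows sharing a (trna_id, global_index) cell: {len(collisions)}")
print(collisions)
```

If the count is zero, `aggfunc='first'` is only a formality and the matrix reflects every row.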

In [None]:
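# Encode nucleotides as numbers for heatmap (0 = gap or unrecognized residue).
# Column-wise Series.map is used because DataFrame.applymap is deprecated since pandas 2.1.
nucleotide_map = {'A': 1, 'C': 2, 'G': 3, 'T': 4, 'U': 4}
alignment_numeric = alignment.apply(lambda col: col.map(nucleotide_map)).fillna(0).astype(int)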

# Plot heatmap (first 30 tRNAs for clarity)
plt.figure(figsize=(20, 8))
sns.heatmap(
    alignment_numeric.iloc[:30],
    cmap='viridis',
    cbar_kws={'label': 'Nucleotide'},
    yticklabels=True,
    xticklabels=False
)
plt.xlabel('Global Index')
plt.ylabel('tRNA ID')
plt.title('E. coli tRNA Alignment (First 30 tRNAs)')
plt.tight_layout()
plt.show()

## Visualization 3: Coverage by Position

Count how many tRNAs have a residue at each global position:

In [None]:
coverage = df.groupby('global_index')['trna_id'].nunique()

plt.figure(figsize=(15, 5))
plt.bar(coverage.index, coverage.values, width=1.0, alpha=0.7)
plt.xlabel('Global Index')
plt.ylabel('Number of tRNAs')
plt.title('Position Coverage Across All E. coli tRNAs')
plt.axhline(df['trna_id'].nunique(), color='red', linestyle='--', 
            label=f'Total tRNAs: {df["trna_id"].nunique()}')
plt.legend()
plt.tight_layout()
plt.show()

# Identify variable regions (low coverage)
low_coverage = coverage[coverage < df['trna_id'].nunique() * 0.5]
print(f"\nPositions with <50% coverage: {len(low_coverage)}")
print("In tRNAs these typically fall in the variable loop; check the 'region' column to confirm")
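
A count of low-coverage positions does not say where they sit. One way to summarize them is to collapse consecutive positions into runs, as sketched below. The coverage values are invented, and `toy_coverage`, `toy_total`, and `low_runs` are names introduced only for this example.

```python
import pandas as pd

# Toy coverage (invented): number of tRNAs present at global positions 1..7.
toy_coverage = pd.Series([10, 10, 3, 2, 10, 4, 10], index=range(1, 8))
toy_total = 10

low = toy_coverage[toy_coverage < toy_total * 0.5]
positions = low.index.to_series()

# A new run starts whenever the step from the previous low-coverage position is not 1.
run_id = (positions.diff() != 1).cumsum()
runs = positions.groupby(run_id).agg(["min", "max"])

# Each tuple is (first position, last position) of one contiguous run.
low_runs = list(runs.itertuples(index=False, name=None))
print(low_runs)
```

On the real data, replacing `toy_coverage` and `toy_total` with `coverage` and `df['trna_id'].nunique()` would give the start and end of each low-coverage stretch, which can then be compared against the `region` column.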

## Visualization 4: Region Distribution

Show the distribution of structural regions:

In [None]:
# Count positions by region
region_counts = df.groupby('region')['seq_index'].count().sort_values(ascending=False)

plt.figure(figsize=(12, 6))
region_counts.plot(kind='barh', color='steelblue')
plt.xlabel('Number of Positions')
plt.ylabel('Structural Region')
plt.title('Distribution of Positions by Structural Region')
plt.tight_layout()
plt.show()

## Analysis Example: Anticodon Loop Conservation

Extract and analyze the anticodon loop (Sprinzl positions 32-38):

In [None]:
# Filter anticodon loop
anticodon_loop = df[df['region'] == 'anticodon-loop']

# Create sequence matrix
anticodon_seqs = anticodon_loop.pivot_table(
    index='trna_id',
    columns='sprinzl_index',
    values='residue',
    aggfunc='first'
)

print("Anticodon loop sequences (positions 32-38):")
print(anticodon_seqs.head(10))

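# Extract anticodon (positions 34-36); this assumes sprinzl_index is stored as integers.
# Sort 5'->3' within each tRNA so the joined triplet does not depend on row order in the file.
anticodons = df[df['sprinzl_index'].isin([34, 35, 36])].sort_values(['trna_id', 'seq_index'])
anticodon_triplets = anticodons.groupby('trna_id')['residue'].apply(''.join)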

print(f"\nAnticodon distribution:")
print(anticodon_triplets.value_counts().head(10))
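
The cell above lists sequences but does not put a number on conservation. One common choice is per-position Shannon entropy, where 0 bits means every tRNA has the same residue and 2 bits means the four nucleotides are equally frequent. The sketch below applies it to an invented matrix shaped like `anticodon_seqs`; `shannon_entropy` is a helper defined here, not part of the tool.

```python
import math
import pandas as pd

# Toy loop matrix (invented): rows are tRNAs, columns are Sprinzl positions.
toy_loop = pd.DataFrame(
    {32: list("CCCC"), 33: list("UUUU"), 34: list("GCAU"), 35: list("GGCC")},
    index=["tRNA-1", "tRNA-2", "tRNA-3", "tRNA-4"],
)

def shannon_entropy(column):
    """Entropy in bits of the residue distribution at one position (gaps ignored)."""
    freqs = column.dropna().value_counts(normalize=True)
    return 0.0 - sum(p * math.log2(p) for p in freqs)

# DataFrame.apply runs the function once per column, i.e. once per position.
entropy = toy_loop.apply(shannon_entropy)
print(entropy.round(2))
```

Running `anticodon_seqs.apply(shannon_entropy)` on the real matrix would give the same kind of profile for positions 32-38.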

## Analysis Example: Comparing Isodecoders

Compare isodecoders, i.e. tRNAs that share an anticodon but differ elsewhere in sequence:

In [None]:
# Focus on Ala-GGC isodecoders
ala_ggc = df[df['trna_id'].str.contains('Ala-GGC')]

if len(ala_ggc) > 0:
    # Create alignment for this isodecoder family
    ala_alignment = ala_ggc.pivot_table(
        index='trna_id',
        columns='global_index',
        values='residue',
        aggfunc='first'
    )
    
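    # Encode and plot (0 = gap or unrecognized residue); column-wise Series.map
    # replaces DataFrame.applymap, which is deprecated since pandas 2.1
    ala_numeric = ala_alignment.apply(lambda col: col.map(nucleotide_map)).fillna(0).astype(int)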
    
    plt.figure(figsize=(20, 3))
    sns.heatmap(
        ala_numeric,
        cmap='Set3',
        cbar=False,
        yticklabels=True
    )
    plt.xlabel('Global Index')
    plt.ylabel('tRNA ID')
    plt.title('tRNA-Ala-GGC Isodecoders Alignment')
    plt.tight_layout()
    plt.show()
    
    print(f"Found {len(ala_alignment)} Ala-GGC isodecoders")
else:
    print("No Ala-GGC tRNAs found in this dataset")
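
A heatmap of near-identical sequences can make the few differing positions hard to spot. As a rough sketch, the positions where isodecoders disagree can be listed directly by counting distinct residues per column. The matrix below is invented and shaped like `ala_alignment`; note that this treats a gap as "no information" rather than as a difference, which is an assumption you may want to change.

```python
import pandas as pd

# Toy isodecoder alignment (invented): rows are tRNAs, columns are global positions.
toy_alignment = pd.DataFrame(
    {1: ["G", "G", "G"], 2: ["C", "C", "U"], 3: ["A", None, "A"], 4: ["U", "U", "U"]},
    index=["iso-1", "iso-2", "iso-3"],
)

# nunique ignores missing values by default, so a gap alone is not counted as a difference.
distinct = toy_alignment.nunique()
variable_positions = distinct[distinct > 1].index.tolist()

print(variable_positions)
```

Applied to the real data, `ala_alignment.nunique()` would give the same per-position count for the Ala-GGC family.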

## Export Filtered Data

Example: Export only anticodon loop positions:

In [None]:
# Export anticodon loop data
anticodon_loop.to_csv('ecoli_anticodon_loops.tsv', sep='\t', index=False)
print(f"Exported {len(anticodon_loop)} anticodon loop positions to ecoli_anticodon_loops.tsv")

## Summary

This notebook demonstrated:
1. Loading pre-computed tRNA global coordinates
2. Creating aligned heatmaps using `global_index`
3. Analyzing coverage and structural regions
4. Extracting specific positions (e.g., anticodon)
5. Comparing isodecoders

## Next Steps

- Try loading other organisms (yeast, human)
- Compare conservation across species
- Integrate with modification data
- Generate publication-quality figures

## References

- [tRNAs in space GitHub](https://github.com/lkwhite/tRNAs-in-space)
- [R2DT Documentation](https://docs.r2dt.bio/)
- [OUTPUT_FORMAT.md](../docs/OUTPUT_FORMAT.md) - Detailed column descriptions