# 🧬 Omics AI Explorer Python Library - Quick Start

This notebook demonstrates how to use the Omics AI Explorer Python library to access genomics data across multiple Explorer networks including HiFi Solves, Neuroscience AI, and more.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mfiume/omics-ai-python-library/blob/main/Omics_AI_Explorer_Quick_Start.ipynb)

## 🌟 What You'll Learn

- Install and import the Omics AI Explorer library
- Connect to different Explorer networks
- List available collections and tables
- Inspect table schemas
- Perform queries on genomics data
- Work with real genomics datasets

## 📦 Installation

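First, let's install the Omics AI Explorer library. The cell below clones the repository and installs it in editable mode. If that fails, it falls back to downloading the core module files directly, which is useful in Google Colab when the package build fails: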

In [ ]:
# Install the Omics AI Explorer library
# We'll download the source and install it directly to avoid build issues

import subprocess
import sys
import os

print("🧬 Installing Omics AI Explorer Library")
print("=" * 50)

try:
    # Step 1: Clone the repository (removing any stale copy first)
    print("📥 Downloading source code...")
    if os.path.exists("omics-ai-python-library"):
        subprocess.run(["rm", "-rf", "omics-ai-python-library"], check=True)
    
    subprocess.run(["git", "clone", "https://github.com/mfiume/omics-ai-python-library.git"], check=True)
    
    # Step 2: Install dependencies first
    print("\n📦 Installing dependencies...")
    subprocess.run([sys.executable, "-m", "pip", "install", "requests>=2.25.0"], check=True)
    
    # Step 3: Install the package in development (editable) mode.
    # Passing cwd= avoids os.chdir, so a failed install cannot leave the
    # notebook stranded inside the repository directory.
    print("\n🔧 Installing omics-ai-explorer...")
    subprocess.run([sys.executable, "-m", "pip", "install", "-e", "."],
                   check=True, cwd="omics-ai-python-library")
    
    print("\n✅ Omics AI Explorer library installed successfully!")
    
except Exception as e:
    print(f"❌ Installation method 1 failed: {e}")
    print("\n🔄 Trying alternative installation...")
    
    try:
        # Fallback: Install just the core files manually
        print("📥 Downloading core files...")
        import urllib.request
        
        base_url = "https://raw.githubusercontent.com/mfiume/omics-ai-python-library/main/omics_ai"
        files = ["__init__.py", "client.py", "exceptions.py"]
        
        os.makedirs("omics_ai_local", exist_ok=True)
        
        for file in files:
            url = f"{base_url}/{file}"
            urllib.request.urlretrieve(url, f"omics_ai_local/{file}")
        
        # Add the directory that contains omics_ai_local to the import path
        # (this is /content in Colab, but not necessarily elsewhere)
        sys.path.insert(0, os.getcwd())
        
        print("✅ Core files downloaded and configured!")
        
    except Exception as e2:
        print(f"❌ Alternative installation also failed: {e2}")
        print("\n💡 Manual solution: Will use inline implementation in next cell")

# Install visualization libraries
print("\n📊 Installing visualization libraries...")
subprocess.run([sys.executable, "-m", "pip", "install", "pandas", "matplotlib", "seaborn"], check=True)

print("\n🎉 Setup complete! Ready to explore genomics data!")
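Before moving on, it can help to confirm which packages are actually importable, since the installation cell has several fallback paths. The following is a minimal sketch using only the standard library. The helper name `is_importable` is illustrative and not part of the Omics AI Explorer library.

```python
import importlib
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    # Refresh the finder caches so packages installed earlier in this
    # session (for example by pip) are visible.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# Report the status of the packages this notebook relies on.
for name in ("omics_ai", "requests", "pandas", "matplotlib", "seaborn"):
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If `omics_ai` is reported as missing, the import cell that follows will fall back to the cloned repository, the downloaded files, or the inline client.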

In [ ]:
# Import the Omics AI Explorer library with fallback options

try:
    # Try normal import first
    from omics_ai import OmicsAIClient
    print("✅ Successfully imported OmicsAIClient from installed package!")
    
except ImportError:
    print("⚠️ Package import failed, trying fallback method...")
    
    try:
        # Fallback: Add the cloned repository to path
        import sys
        sys.path.insert(0, '/content/omics-ai-python-library')
        from omics_ai import OmicsAIClient
        print("✅ Successfully imported from cloned repository!")
        
    except ImportError:
        try:
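            # Try the manually downloaded files. They form a package
            # (omics_ai_local/__init__.py), so import the package itself;
            # importing client.py as a top-level module would break any
            # relative imports it contains.
            import os
            sys.path.insert(0, os.getcwd())
            from omics_ai_local import OmicsAIClient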
            print("✅ Successfully imported from downloaded files!")
        except ImportError:
            print("⚠️ All import methods failed, using inline implementation...")
            
            # Last resort: Define the client class inline
            import requests
            import json
            import re
            from typing import Dict, List, Optional, Any
            from urllib.parse import quote
            
            class OmicsAIClient:
                """Simplified Omics AI Explorer client for Colab."""
                
                KNOWN_NETWORKS = {
                    "hifisolves": "hifisolves.org",
                    "neuroscience": "neuroscience.ai", 
                    "parkinsons": "cloud.parkinsonsroadmap.org",
                    "biomedical": "biomedical.ai"
                }
                
                def __init__(self, network: str = "hifisolves.org", access_token: Optional[str] = None):
                    if network in self.KNOWN_NETWORKS:
                        network = self.KNOWN_NETWORKS[network]
                    if not network.startswith(('http://', 'https://')):
                        network = f"https://{network}"
                        
                    self.network = network.rstrip('/')
                    self.access_token = access_token
                    self.session = requests.Session()
                    
                    headers = {'User-Agent': 'omics-ai-colab-client', 'Accept': 'application/json'}
                    if self.access_token:
                        headers['Authorization'] = f'Bearer {self.access_token}'
                    self.session.headers.update(headers)

                def set_access_token(self, access_token: str) -> None:
                    """Store the token and send it as a Bearer header on later requests."""
                    self.access_token = access_token
                    self.session.headers['Authorization'] = f'Bearer {access_token}'

                def clear_access_token(self) -> None:
                    """Forget the token and stop sending the Authorization header."""
                    self.access_token = None
                    self.session.headers.pop('Authorization', None)
                
                def _make_request(self, method: str, endpoint: str, **kwargs):
                    url = f"{self.network}{endpoint}"
                    response = self.session.request(method, url, **kwargs)
                    response.raise_for_status()
                    return response
                
                def list_collections(self) -> List[Dict[str, Any]]:
                    response = self._make_request('GET', '/api/collections')
                    return response.json()
                
                def list_tables(self, collection_slug: str) -> List[Dict[str, Any]]:
                    endpoint = f"/api/collections/{quote(collection_slug)}/tables"
                    response = self._make_request('GET', endpoint)
                    return response.json()
                
                def get_schema_fields(self, collection_slug: str, table_name: str) -> List[Dict[str, str]]:
                    endpoint = f"/api/collection/{quote(collection_slug)}/data-connect/table/{quote(table_name)}/info"
                    response = self._make_request('GET', endpoint)
                    schema = response.json()
                    
                    data_model = schema.get('data_model', {}).get('properties', {})
                    fields = []
                    for field_name, field_spec in data_model.items():
                        field_type = field_spec.get('type', '')
                        if isinstance(field_type, list):
                            field_type = ', '.join(field_type)
                        if field_type == 'array' and 'items' in field_spec:
                            item_type = field_spec['items'].get('type', '')
                            if isinstance(item_type, list):
                                item_type = ', '.join(item_type)
                            field_type = f"array<{item_type}>"
                        
                        fields.append({
                            'field': field_name,
                            'type': field_type,
                            'sql_type': field_spec.get('sqlType', '')
                        })
                    return fields
                
                def query(self, collection_slug: str, table_name: str, limit: int = 100, **kwargs) -> Dict[str, Any]:
                    payload = {
                        "tableName": table_name,
                        "filters": {},
                        "pagination": {"limit": limit, "offset": 0}
                    }
                    
                    endpoint = f"/api/collections/{quote(collection_slug)}/tables/{quote(table_name)}/filter"
                    response = self._make_request('POST', endpoint, json=payload)
                    
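                    # The response may hold several JSON documents back to back.
                    # Decode them one at a time (this also handles nested
                    # objects) and keep the last one.
                    raw_text = response.text.strip()
                    decoder = json.JSONDecoder()
                    result, pos = {'data': []}, 0
                    while pos < len(raw_text):
                        try:
                            result, pos = decoder.raw_decode(raw_text, pos)
                        except json.JSONDecodeError:
                            break
                        while pos < len(raw_text) and raw_text[pos].isspace():
                            pos += 1
                    return result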
                
                def count(self, collection_slug: str, table_name: str) -> int:
                    endpoint = f"/api/collections/{quote(collection_slug)}/tables/{quote(table_name)}/filter/count"
                    response = self._make_request('POST', endpoint, json={"filters": {}})
                    
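                    # The response may hold several JSON documents back to back.
                    # Decode them one at a time and read the count from the last.
                    raw_text = response.text.strip()
                    decoder = json.JSONDecoder()
                    result, pos = {}, 0
                    while pos < len(raw_text):
                        try:
                            result, pos = decoder.raw_decode(raw_text, pos)
                        except json.JSONDecodeError:
                            break
                        while pos < len(raw_text) and raw_text[pos].isspace():
                            pos += 1
                    return result.get('count', 0) if isinstance(result, dict) else 0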
            
            print("✅ Using inline client implementation!")

# Import additional libraries for data analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import json

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("✅ All imports successful!")
print(f"📅 Notebook run at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

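## 🔧 Import and Setup

The previous cell already imported `OmicsAIClient` (or defined the inline fallback) together with the analysis libraries. The cell below re-imports the client only if it is missing, so it will not break when the fallback is in use:

In [None]:
# Reuse the client class from the previous cell when it exists.
# An unconditional `from omics_ai import OmicsAIClient` would raise ImportError
# if the package could not be installed and the inline fallback is in use.
try:
    OmicsAIClient
except NameError:
    from omics_ai import OmicsAIClient

# Import additional libraries for data analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import json

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("✅ All imports successful!")
print(f"📅 Notebook run at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")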

## 🌐 Explore Multiple Networks

The Omics AI Explorer library supports multiple networks. Let's explore what's available:

In [None]:
# Available Explorer networks
networks = {
    "HiFi Solves": "hifisolves",
    "Biomedical AI": "biomedical", 
    "Neuroscience AI": "neuroscience"
}

print("🌐 Exploring Omics AI Explorer Networks")
print("=" * 50)

network_stats = {}

for name, short_name in networks.items():
    try:
        print(f"\n🔗 Connecting to {name}...")
        client = OmicsAIClient(short_name)
        collections = client.list_collections()
        
        network_stats[name] = {
            'collections': len(collections),
            'client': client,
            'sample_collections': collections[:3]  # Store first 3 for display
        }
        
        print(f"✅ {name}: {len(collections)} collections found")
        
        # Show sample collections
        for i, collection in enumerate(collections[:3], 1):
            print(f"   {i}. {collection['name']} ({collection['slugName']})")
        
        if len(collections) > 3:
            print(f"   ... and {len(collections) - 3} more collections")
            
    except Exception as e:
        print(f"❌ {name}: {e}")
        network_stats[name] = {'collections': 0, 'error': str(e)}

# Summary
print(f"\n📊 Network Summary:")
total_collections = sum(stats.get('collections', 0) for stats in network_stats.values())
print(f"   Total collections across all networks: {total_collections}")

for name, stats in network_stats.items():
    if 'error' not in stats:
        print(f"   {name}: {stats['collections']} collections")
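The short names used above (`hifisolves`, `biomedical`, `neuroscience`) are resolved to base URLs by the client. The sketch below mirrors the resolution logic of the inline fallback client defined earlier in this notebook. The installed package may resolve names differently, so treat `resolve_network` as an illustrative stand-alone helper rather than the library's API.

```python
# Short-name table copied from the inline fallback client in this notebook.
KNOWN_NETWORKS = {
    "hifisolves": "hifisolves.org",
    "neuroscience": "neuroscience.ai",
    "parkinsons": "cloud.parkinsonsroadmap.org",
    "biomedical": "biomedical.ai",
}


def resolve_network(network: str) -> str:
    """Turn a short name, bare host, or full URL into a base URL."""
    host = KNOWN_NETWORKS.get(network, network)
    # Default to HTTPS when no scheme was given.
    if not host.startswith(("http://", "https://")):
        host = f"https://{host}"
    # Drop trailing slashes so endpoints can be appended directly.
    return host.rstrip("/")


for name in ("hifisolves", "neuroscience.ai", "https://biomedical.ai/"):
    print(f"{name!r} -> {resolve_network(name)}")
```

Under this logic a short name, a bare host name, and a full URL all end up in the same normalized form, which is why the loop above can pass short names straight to `OmicsAIClient`.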

## 📊 Visualize Network Statistics

Let's create a visualization of the collections across networks:

In [None]:
# Create a bar plot of collections per network
network_names = []
collection_counts = []

for name, stats in network_stats.items():
    if 'error' not in stats:
        network_names.append(name)
        collection_counts.append(stats['collections'])

if network_names:  # Only plot if we have data
    plt.figure(figsize=(10, 6))
    bars = plt.bar(network_names, collection_counts, 
                   color=['#FF6B6B', '#4ECDC4', '#45B7D1'], alpha=0.8)
    
    # Add value labels on bars
    for bar, count in zip(bars, collection_counts):
        plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, 
                str(count), ha='center', va='bottom', fontweight='bold')
    
    # Plain-text title: Matplotlib's default font has no emoji glyphs
    plt.title('Collections Available Across Omics AI Explorer Networks', 
              fontsize=16, fontweight='bold', pad=20)
    plt.ylabel('Number of Collections', fontsize=12)
    plt.xlabel('Explorer Network', fontsize=12)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    print(f"📈 Total: {sum(collection_counts)} collections across {len(network_names)} networks")
else:
    print("⚠️ No network data available for visualization")

## 🚀 Quick Start Demo

Now let's replicate the Quick Start example from the README, using the network that returned the most collections:

In [None]:
# Choose the network with the most collections for our demo
best_network = max(
    [(name, stats) for name, stats in network_stats.items() if 'error' not in stats],
    key=lambda x: x[1]['collections'],
    default=(None, None)
)

if best_network[0]:
    network_name, network_info = best_network
    client = network_info['client']
    
    print(f"🎯 Using {network_name} for Quick Start demo")
    print(f"📂 {network_info['collections']} collections available")
    
    # Get all collections
    collections = client.list_collections()
    
    # Used below to strip HTML tags from descriptions
    import re
    
    print(f"\n📋 First 5 collections:")
    for i, collection in enumerate(collections[:5], 1):
        name = collection['name']
        slug = collection['slugName']
        # `or` also covers a description that is present but null
        description = collection.get('description') or 'No description'
        
        # Clean up HTML and truncate description
        clean_desc = re.sub('<[^<]+?>', '', description)
        clean_desc = clean_desc.replace('&nbsp;', ' ').strip()
        if len(clean_desc) > 80:
            clean_desc = clean_desc[:80] + "..."
        
        print(f"   {i}. {name} ({slug})")
        print(f"      {clean_desc}")
        
else:
    print("❌ No networks available for demo")
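The description clean-up in the cell above can be factored into a small reusable helper. This is a sketch, not part of the library: `clean_description` is a hypothetical name, and it uses `html.unescape` from the standard library so that entities other than `&nbsp;` are also decoded.

```python
import html
import re


def clean_description(raw, max_len: int = 80) -> str:
    """Strip HTML tags, decode entities, and truncate for one-line display."""
    # Remove anything that looks like an HTML tag.
    text = re.sub(r"<[^<]+?>", "", raw or "")
    # Decode entities such as &amp; and &nbsp;, then normalize
    # non-breaking spaces to plain spaces.
    text = html.unescape(text).replace("\xa0", " ").strip()
    if len(text) > max_len:
        text = text[:max_len] + "..."
    return text


print(clean_description("<p>Long-read&nbsp;genomes &amp; variants</p>"))
```

The regular expression is a display convenience only. It is not a full HTML parser and should not be relied on for sanitizing untrusted markup.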

## 📊 Table Exploration

Let's find a collection with accessible tables and explore its structure:

In [None]:
# Try to find collections with accessible tables
accessible_collection = None
accessible_tables = []

if best_network[0]:
    print("🔍 Searching for accessible collections with tables...")
    
    for collection in collections[:10]:  # Try first 10 collections
        try:
            collection_slug = collection['slugName']
            print(f"   Trying: {collection_slug}...")
            
            tables = client.list_tables(collection_slug)
            
            if tables:  # Found tables!
                accessible_collection = collection_slug
                accessible_tables = tables
                print(f"   ✅ Found {len(tables)} tables in '{collection_slug}'!")
                break
            else:
                print(f"   ⚪ No tables in '{collection_slug}'")
                
        except Exception as e:
            print(f"   ❌ {collection_slug}: {str(e)[:50]}...")
            continue
    
    if accessible_collection:
        print(f"\n🎉 Success! Using collection: {accessible_collection}")
        print(f"📊 Found {len(accessible_tables)} tables:")
        
        for i, table in enumerate(accessible_tables[:5], 1):
            name = table.get('display_name', table.get('name', 'Unnamed'))
            size = table.get('size', 'Unknown')
            if isinstance(size, int):
                size = f"{size:,}"
            
            print(f"   {i}. {name}")
            print(f"      ID: {table.get('qualified_table_name', 'N/A')}")
            print(f"      Size: {size} rows")
            
        if len(accessible_tables) > 5:
            print(f"   ... and {len(accessible_tables) - 5} more tables")
    else:
        print("\n⚠️ No accessible tables found. This may be due to authentication requirements.")
        print("   Many collections require access tokens for full functionality.")

## 🔬 Schema Inspection

If the previous cell found accessible tables, let's examine the schema of the first one:

In [None]:
if accessible_collection and accessible_tables:
    # Get schema for the first table
    table = accessible_tables[0]
    table_name = table['qualified_table_name']
    
    print(f"🔬 Inspecting schema for: {table.get('display_name', 'Table')}")
    print(f"📋 Table ID: {table_name}")
    
    try:
        schema_fields = client.get_schema_fields(accessible_collection, table_name)
        
        print(f"\n📊 Found {len(schema_fields)} fields:")
        
        # Create a DataFrame for better display
        schema_df = pd.DataFrame(schema_fields)
        
        # Display first 10 fields
        display_fields = schema_df.head(10)
        
        print("\n📋 Field Details:")
        for i, (_, field) in enumerate(display_fields.iterrows(), 1):
            print(f"   {i:2d}. {field['field']:25} | {field['type']:15} | {field.get('sql_type', 'N/A')}")
        
        if len(schema_fields) > 10:
            print(f"   ... and {len(schema_fields) - 10} more fields")
        
        # Analyze field types
        type_counts = schema_df['type'].value_counts()
        print(f"\n📈 Field Type Distribution:")
        for field_type, count in type_counts.head().items():
            print(f"   {field_type}: {count} fields")
            
    except Exception as e:
        print(f"❌ Error getting schema: {e}")
        schema_fields = []
else:
    print("⚠️ No accessible tables available for schema inspection")
    schema_fields = []
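To make the shape of `schema_fields` concrete, the sketch below flattens a JSON-Schema-style table description into the same `field` / `type` / `sql_type` records, following the logic of the inline fallback client's `get_schema_fields`. The `example_schema` payload is invented for illustration. Real responses may contain additional keys.

```python
from typing import Any, Dict, List


def flatten_schema(schema: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten data_model.properties into one record per field."""
    properties = schema.get("data_model", {}).get("properties", {})
    fields = []
    for name, spec in properties.items():
        field_type = spec.get("type", "")
        # A list of types (e.g. nullable fields) is joined into one string.
        if isinstance(field_type, list):
            field_type = ", ".join(field_type)
        # Arrays are shown together with their item type.
        if field_type == "array" and "items" in spec:
            item_type = spec["items"].get("type", "")
            if isinstance(item_type, list):
                item_type = ", ".join(item_type)
            field_type = f"array<{item_type}>"
        fields.append(
            {"field": name, "type": field_type, "sql_type": spec.get("sqlType", "")}
        )
    return fields


# Invented example payload, for illustration only.
example_schema = {
    "data_model": {
        "properties": {
            "chrom": {"type": "string", "sqlType": "VARCHAR"},
            "pos": {"type": ["integer", "null"]},
            "alts": {"type": "array", "items": {"type": "string"}},
        }
    }
}

for record in flatten_schema(example_schema):
    print(record)
```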

## 🔎 Data Querying

Let's try to query some data and count rows:

In [None]:
if accessible_collection and accessible_tables:
    table_name = accessible_tables[0]['qualified_table_name']
    
    print(f"🔎 Querying data from: {table_name}")
    
    try:
        # Try to count total rows
        print("\n🔢 Counting total rows...")
        total_count = client.count(accessible_collection, table_name)
        print(f"   Total rows: {total_count:,}")
        
        # Try to get a small sample of data
        print("\n📋 Fetching sample data (limit 3)...")
        results = client.query(
            accessible_collection, 
            table_name, 
            limit=3
        )
        
        if results.get('data'):
            sample_data = results['data']
            print(f"   ✅ Retrieved {len(sample_data)} sample rows")
            
            # Display sample data structure
            if sample_data:
                first_row = sample_data[0]
                print(f"   📊 Sample row has {len(first_row)} fields")
                print(f"   🏷️  Field names: {list(first_row.keys())[:5]}{'...' if len(first_row) > 5 else ''}")
                
                # Show first few field values from first row
                print(f"\n   📝 Sample values from first row:")
                for key, value in list(first_row.items())[:5]:
                    # Truncate long values of any type for display
                    text = str(value)
                    if len(text) > 50:
                        text = text[:50] + "..."
                    print(f"      {key}: {text}")
                    
        else:
            print("   ⚪ Query successful but no data returned")
            
    except Exception as e:
        print(f"   ❌ Query error: {e}")
        print("   💡 This might be due to access restrictions or authentication requirements")
else:
    print("⚠️ No accessible tables available for querying")
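The inline fallback client treats a query response as possibly containing more than one JSON document back to back and keeps the last one. A minimal sketch of that parsing step is shown below, using `json.JSONDecoder.raw_decode` from the standard library. The function name and the sample payload are illustrative assumptions, not the library's actual wire format.

```python
import json
from typing import Any, List


def parse_json_documents(text: str) -> List[Any]:
    """Decode every JSON document found in a concatenated string."""
    decoder = json.JSONDecoder()
    documents = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        document, pos = decoder.raw_decode(text, pos)
        documents.append(document)
    return documents


# Invented sample: a status message followed by the actual result.
sample = '{"status": "running"}\n{"data": [{"id": 1}, {"id": 2}]}'
documents = parse_json_documents(sample)
print(documents[-1]["data"])
```

Unlike a brace-matching regular expression, this approach handles nested objects and arrays, because each document is decoded by the JSON parser itself.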

## 🔐 Authentication Demo

Many collections require authentication. Here's how to use access tokens:

In [None]:
print("🔐 Authentication Features Demo")
print("=" * 40)

# Demo authentication methods (don't use real tokens in notebooks!)
if best_network[0]:
    client = network_info['client']
    
    print("\n🔧 Authentication Methods:")
    print("\n1. Setting an access token:")
    print("   client.set_access_token('your-token-here')")
    
    print("\n2. Creating client with token:")
    print("   client = OmicsAIClient('hifisolves', access_token='your-token')")
    
    print("\n3. Clearing authentication:")
    print("   client.clear_access_token()")
    
    # Demonstrate token management (with fake token)
    print("\n🧪 Demo with placeholder token:")
    client.set_access_token("demo-token-placeholder")
    print("   ✅ Token set successfully")
    
    client.clear_access_token() 
    print("   ✅ Token cleared successfully")
    
    print("\n💡 To get real access tokens:")
    print("   • Visit the Explorer network's authentication page")
    print("   • Generate an API token or use OAuth flow")
    print("   • Keep tokens secure and never commit them to code!")
else:
    print("⚠️ No network available for authentication demo")
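One way to keep real tokens out of a notebook is to read them from an environment variable at run time. The sketch below builds the same `Authorization: Bearer ...` header that the inline fallback client sends. The variable name `OMICS_AI_TOKEN` and the helper `auth_headers` are assumptions chosen for this example and are not defined by the library.

```python
import os
from typing import Dict


def auth_headers(env_var: str = "OMICS_AI_TOKEN") -> Dict[str, str]:
    """Build request headers, adding a Bearer token if one is configured."""
    headers = {"Accept": "application/json"}
    token = os.environ.get(env_var, "").strip()
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Without the variable set, only the Accept header is produced.
print(sorted(auth_headers("OMICS_AI_TOKEN_NOT_SET_EXAMPLE")))
```

With the token in the environment, a client could then be created as `OmicsAIClient('hifisolves', access_token=os.environ['OMICS_AI_TOKEN'])`, matching the constructor form shown in the demo above.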

## 🚀 Advanced Example: Network Comparison

Let's compare the networks we can access. To keep the run short, table counts are sampled from the first five collections of each network:

In [None]:
print("📊 Comprehensive Network Analysis")
print("=" * 50)

# Collect detailed statistics
detailed_stats = []

for name, stats in network_stats.items():
    if 'error' not in stats:
        try:
            client = stats['client']
            collections = client.list_collections()
            
            # Try to count accessible tables
            accessible_collections = 0
            total_tables = 0
            
            for collection in collections[:5]:  # Sample first 5
                try:
                    tables = client.list_tables(collection['slugName'])
                    if tables:
                        accessible_collections += 1
                        total_tables += len(tables)
                except Exception:
                    # Skip collections that are restricted or fail to load
                    pass
            
            detailed_stats.append({
                'Network': name,
                'Collections': len(collections),
                'Accessible Collections (sample)': accessible_collections,
                'Total Tables (sample)': total_tables,
                'Avg Tables per Collection': round(total_tables / max(accessible_collections, 1), 1)
            })
            
        except Exception as e:
            detailed_stats.append({
                'Network': name,
                'Collections': stats.get('collections', 0),
                'Error': str(e)[:30] + "..."
            })

if detailed_stats:
    # Create and display comparison table
    comparison_df = pd.DataFrame(detailed_stats)
    
    print("\n📋 Network Comparison Table:")
    print(comparison_df.to_string(index=False))
    
    # Create visualization if we have numeric data
    numeric_df = comparison_df.select_dtypes(include='number')
    if not numeric_df.empty and len(numeric_df) > 1:
        
        # Create a 2x2 grid of subplots, one per metric
        # (plt.subplots creates the figure, so no separate plt.figure call)
        fig, axes = plt.subplots(2, 2, figsize=(15, 10))
        fig.suptitle('Omics AI Explorer Networks - Detailed Analysis', fontsize=16, fontweight='bold')
        
        # Plot 1: Collections per network
        if 'Collections' in comparison_df.columns:
            axes[0, 0].bar(comparison_df['Network'], comparison_df['Collections'], 
                          color=['#FF6B6B', '#4ECDC4', '#45B7D1'])
            axes[0, 0].set_title('Total Collections')
            axes[0, 0].set_ylabel('Count')
            
        # Plot 2: Accessible collections
        if 'Accessible Collections (sample)' in comparison_df.columns:
            axes[0, 1].bar(comparison_df['Network'], comparison_df['Accessible Collections (sample)'], 
                          color=['#96CEB4', '#FFEAA7', '#DDA0DD'])
            axes[0, 1].set_title('Accessible Collections (Sample)')
            axes[0, 1].set_ylabel('Count')
            
        # Plot 3: Total tables
        if 'Total Tables (sample)' in comparison_df.columns:
            axes[1, 0].bar(comparison_df['Network'], comparison_df['Total Tables (sample)'], 
                          color=['#A8E6CF', '#88D8C0', '#78C7C2'])
            axes[1, 0].set_title('Total Tables (Sample)')
            axes[1, 0].set_ylabel('Count')
            
        # Plot 4: Average tables per collection
        if 'Avg Tables per Collection' in comparison_df.columns:
            axes[1, 1].bar(comparison_df['Network'], comparison_df['Avg Tables per Collection'], 
                          color=['#FFD93D', '#6BCF7F', '#4D96FF'])
            axes[1, 1].set_title('Avg Tables per Collection')
            axes[1, 1].set_ylabel('Average')
            
        plt.tight_layout()
        plt.show()
        
else:
    print("⚠️ No network data available for detailed analysis")

## 🎉 Summary and Next Steps

Congratulations! You've successfully explored the Omics AI Explorer Python library. Here's what you've accomplished:

In [None]:
print("🎉 OMICS AI EXPLORER QUICK START SUMMARY")
print("=" * 60)

successes = []
if network_stats:
    total_networks = len([name for name, stats in network_stats.items() if 'error' not in stats])
    total_collections = sum(stats.get('collections', 0) for stats in network_stats.values() if 'error' not in stats)
    
    successes.extend([
        f"✅ Connected to {total_networks} Explorer networks",
        f"✅ Discovered {total_collections} total collections",
        "✅ Successfully imported and used the library"
    ])

if accessible_collection:
    successes.extend([
        f"✅ Found accessible collection: {accessible_collection}",
        f"✅ Listed {len(accessible_tables)} tables"
    ])

if schema_fields:
    successes.extend([
        f"✅ Inspected table schema ({len(schema_fields)} fields)",
        "✅ Analyzed field types and structure"
    ])

successes.extend([
    "✅ Demonstrated authentication features",
    "✅ Created data visualizations",
    "✅ Performed network comparison analysis"
])

for success in successes:
    print(success)

print("\n🚀 WHAT'S NEXT?")
print("-" * 30)
print("📚 Learn More:")
print("   • Explore the full documentation on GitHub")
print("   • Try the advanced examples in the repository")
print("   • Check out HiFi Solves specific features")

print("\n🔐 Get Access:")
print("   • Request access tokens for protected collections")
print("   • Explore authentication workflows")
print("   • Join the genomics data community")

print("\n🛠️ Build Something:")
print("   • Create your own genomics analysis workflows")
print("   • Integrate with other bioinformatics tools")
print("   • Contribute to the open source project")

print("\n📞 Get Help:")
print("   • GitHub Issues: https://github.com/mfiume/omics-ai-python-library/issues")
print("   • Documentation: https://github.com/mfiume/omics-ai-python-library#readme")
print("   • Community: Join the discussions on GitHub")

print(f"\n🎯 Total Success Rate: {len(successes)}/10 features demonstrated")
print("\n🌟 Happy exploring with Omics AI! 🧬")

## 📚 Additional Resources

### 🔗 Links
- **GitHub Repository**: [mfiume/omics-ai-python-library](https://github.com/mfiume/omics-ai-python-library)
- **HiFi Solves**: [hifisolves.org](https://hifisolves.org)
- **Neuroscience AI**: [neuroscience.ai](https://neuroscience.ai)
- **Biomedical AI**: [biomedical.ai](https://biomedical.ai)

### 📖 Documentation Sections
- **Basic Usage**: Simple queries and data access
- **Advanced Queries**: Complex filtering and pagination
- **Authentication**: Working with protected collections
- **API Reference**: Complete method documentation

### 💡 Tips for Success
1. **Start Simple**: Begin with public collections before moving to authenticated ones
2. **Explore Schemas**: Always check table schemas before querying
3. **Handle Errors**: Wrap API calls in `try`/`except`, since many collections require specific permissions
4. **Use Pagination**: For large datasets, use limit and offset parameters
5. **Stay Updated**: Check the GitHub repository for new features and updates

---

**Ready to dive deeper into genomics data analysis? Clone the repository and explore more examples!** 🚀