# 🚀 Getting Started with Cloud MLOps using Vertex AI

Welcome to the Cloud MLOps project! This notebook will guide you through the complete setup process for developing an end-to-end machine learning pipeline using Google Cloud Platform and Vertex AI.

## 📋 What You'll Accomplish

By the end of this notebook, you'll have:

- ✅ A properly configured Google Cloud project with Vertex AI enabled
- ✅ Authentication set up for all GCP services
- ✅ A Cloud Storage bucket ready for ML artifacts
- ✅ A sample dataset prepared and uploaded to GCS
- ✅ All dependencies installed for MLOps development
- ✅ Verified connectivity to all required services

## 🎓 Prerequisites

Before starting, make sure you have:

- A Google Cloud account with billing enabled (several APIs used below, such as Cloud Build, refuse to activate without it)
- Student credits applied for, if you are eligible (GCP Free Tier, GitHub Student Developer Pack)
- Python 3.8+ installed
- Basic familiarity with Python and Jupyter notebooks

Let's get started! 🎯

## 1. Environment Setup and Prerequisites

Let's start by checking your Python environment and system requirements.

In [2]:
import sys
import platform
import subprocess
from pathlib import Path

def check_python_version(minimum=(3, 8)):
    """Check if the Python version meets the minimum requirement (3.8+)."""
    version = sys.version_info
    print(f"🐍 Python Version: {version.major}.{version.minor}.{version.micro}")

    # Compare (major, minor) as a tuple so that e.g. 4.0 is not rejected for minor < 8
    if version[:2] >= minimum:
        print("✅ Python version requirement met!")
        return True
    else:
        print("❌ Python 3.8+ is required!")
        return False

def check_system_info():
    """Display system information."""
    print(f"💻 Operating System: {platform.system()} {platform.release()}")
    print(f"🏗️  Architecture: {platform.machine()}")

def check_virtual_environment():
    """Check if running in a virtual environment."""
    # venv: sys.prefix differs from sys.base_prefix; legacy virtualenv: sys.real_prefix exists
    if hasattr(sys, 'real_prefix') or (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix):
        print("✅ Running in virtual environment")
        print(f"📁 Environment path: {sys.prefix}")
        return True
    else:
        print("⚠️  Not running in virtual environment")
        print("💡 Consider creating one: python -m venv venv && source venv/bin/activate")
        return False

# Run checks
print("🔍 Environment Check")
print("=" * 50)
check_python_version()
check_system_info()
check_virtual_environment()

# Check if we're in the correct project directory
current_dir = Path.cwd()
print(f"\n📂 Current Directory: {current_dir}")

# Look for key project files (in this directory or its parent)
expected_files = ["PLANNING.md", "TASKS.md", "requirements.txt"]
for file in expected_files:
    if (current_dir / file).exists() or (current_dir.parent / file).exists():
        print(f"✅ Found {file}")
    else:
        print(f"❌ Missing {file}")

print("\n" + "=" * 50)

🔍 Environment Check
🐍 Python Version: 3.13.7
✅ Python version requirement met!
💻 Operating System: Darwin 25.0.0
🏗️  Architecture: arm64
✅ Running in virtual environment
📁 Environment path: /Users/farishussain/MLOps/.venv

📂 Current Directory: /Users/farishussain/MLOps/notebooks
✅ Found PLANNING.md
✅ Found TASKS.md
✅ Found requirements.txt
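The two checks above can be written as small pure functions, which makes their logic testable without depending on the interpreter that happens to run the notebook. This is a minimal sketch: `meets_min_version` and `is_virtualenv` are hypothetical helper names, and the prefix comparison mirrors the `sys.prefix != sys.base_prefix` rule used in the cell (the `real_prefix` attribute was only set by the legacy `virtualenv` tool).

```python
import sys


def meets_min_version(version_info, minimum=(3, 8)):
    """Return True if version_info (e.g. sys.version_info) is at least `minimum`.

    Tuple comparison is lexicographic, so (4, 0) >= (3, 8) is True, whereas a
    per-component check such as `minor >= 8` would wrongly reject Python 4.0.
    """
    return tuple(version_info[:len(minimum)]) >= tuple(minimum)


def is_virtualenv(prefix, base_prefix, has_real_prefix=False):
    """Return True when the interpreter runs inside a virtual environment.

    venv points sys.prefix at the environment directory while sys.base_prefix
    keeps pointing at the base installation; legacy virtualenv set sys.real_prefix.
    """
    return has_real_prefix or prefix != base_prefix


# Usage with the running interpreter
print(meets_min_version(sys.version_info))
print(is_virtualenv(sys.prefix, sys.base_prefix, hasattr(sys, "real_prefix")))
```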



## 2. Google Cloud Project Configuration

Now let's set up your Google Cloud project variables. **Important:** The values below are the author's example project; replace them with your own project details before running the cell.

In [8]:
# 🚨 IMPORTANT: Replace these values with your actual project details!
import subprocess

# Your Google Cloud Project Configuration
PROJECT_ID = "mlops-295610"           # Replace with your project ID
PROJECT_NUMBER = "293997883832"       # Replace with your project number
REGION = "us-central1"                # Default region for Vertex AI
ZONE = "us-central1-a"                # Default zone

# Storage Configuration
BUCKET_NAME = f"{PROJECT_ID}-mlops-bucket"  # Must be globally unique
DATA_ROOT = "data"
MODELS_ROOT = "models"
PIPELINES_ROOT = "pipelines"
OUTPUTS_ROOT = "outputs"

# Display configuration
print("🔧 Project Configuration")
print("=" * 50)
print(f"📝 Project ID: {PROJECT_ID}")
print(f"🌍 Region: {REGION}")
print(f"📍 Zone: {ZONE}")
print(f"🪣 Bucket Name: {BUCKET_NAME}")
print("=" * 50)

# Validation
if PROJECT_ID == "your-mlops-project-id":
    print("⚠️  WARNING: Please update PROJECT_ID with your actual project ID!")
    print("💡 Get your project ID from: https://console.cloud.google.com/")
else:
    print("✅ Project configuration looks good!")
    # Query the active gcloud account in a separate step: backslashes inside an
    # f-string expression are a SyntaxError before Python 3.12.
    try:
        active_account = subprocess.run(
            ["gcloud", "auth", "list", "--filter=status:ACTIVE", "--format=value(account)"],
            capture_output=True, text=True,
        ).stdout.strip()
    except FileNotFoundError:
        active_account = "unknown (gcloud CLI not found)"
    print(f"🎯 Using account: {active_account}")

🔧 Project Configuration
📝 Project ID: mlops-295610
🌍 Region: us-central1
📍 Zone: us-central1-a
🪣 Bucket Name: mlops-295610-mlops-bucket
✅ Project configuration looks good!
🎯 Using account: farishussain049@gmail.com
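Because `BUCKET_NAME` is derived from `PROJECT_ID`, it is worth checking it against Cloud Storage's naming rules before trying to create the bucket. The sketch below covers only a simplified subset of those rules, for names without dots: 3 to 63 characters; lowercase letters, digits, hyphens and underscores; starting and ending with a letter or digit; no `goog` prefix and no `google` substring. `is_valid_bucket_name` is a hypothetical helper, and the official bucket naming guidelines remain the authority, including the extra rules for dotted names.

```python
import re

# Simplified rule set for bucket names WITHOUT dots (an assumption of this sketch)
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if `name` passes a simplified Cloud Storage naming check."""
    if "." in name:
        # Dotted names follow extra rules (domain verification, longer limits)
        # that this sketch does not model, so reject them conservatively.
        return False
    if not _BUCKET_RE.fullmatch(name):
        return False
    if name.startswith("goog") or "google" in name:
        return False
    return True


print(is_valid_bucket_name("mlops-295610-mlops-bucket"))  # True
```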


## 3. Install Required Python Libraries

Let's install all the essential packages needed for our MLOps project. This may take a few minutes.

In [4]:
# Install core packages for MLOps with Vertex AI
import subprocess
import sys

def install_package(package):
    """Install (or upgrade) a package using the current interpreter's pip."""
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", package])
        print(f"✅ Successfully installed: {package}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"❌ Failed to install: {package}")
        print(f"Error: {e}")
        return False

# Essential packages for MLOps
packages = [
    "google-cloud-aiplatform>=1.38.0",
    "google-cloud-storage>=2.10.0",
    "kfp>=2.4.0",
    "pandas>=2.0.0",
    "numpy>=1.24.0",
    "scikit-learn>=1.3.0",
    "tensorflow>=2.13.0",
    "matplotlib>=3.7.0",
    "seaborn>=0.12.0",
    "tqdm>=4.65.0"
]

print("📦 Installing MLOps packages...")
print("=" * 50)

failed_packages = []
for package in packages:
    if not install_package(package):
        failed_packages.append(package)

print("\n" + "=" * 50)
if failed_packages:
    print(f"❌ Failed to install: {failed_packages}")
    print("💡 Try installing manually: pip install <package-name>")
else:
    print("🎉 All packages installed successfully!")

# If any of these packages were already imported in this kernel, restart the
# kernel so that the upgraded versions are actually loaded.
print("✨ Installation complete!")

📦 Installing MLOps packages...
✅ Successfully installed: google-cloud-aiplatform>=1.38.0
✅ Successfully installed: google-cloud-storage>=2.10.0
✅ Successfully installed: kfp>=2.4.0
✅ Successfully installed: pandas>=2.0.0
✅ Successfully installed: numpy>=1.24.0
✅ Successfully installed: scikit-learn>=1.3.0
✅ Successfully installed: tensorflow>=2.13.0
✅ Successfully installed: matplotlib>=3.7.0
✅ Successfully installed: seaborn>=0.12.0
✅ Successfully installed: tqdm>=4.65.0

🎉 All packages installed successfully!
✨ Installation complete!


## 4. Configure GCP Authentication

Now let's set up authentication with Google Cloud Platform. This is essential for accessing Vertex AI and other GCP services.

In [9]:
import subprocess
from google.cloud import aiplatform
import os

def check_gcloud_auth():
    """Check if gcloud is authenticated."""
    try:
        result = subprocess.run(['gcloud', 'auth', 'list', '--format=value(account)'],
                                capture_output=True, text=True, check=True)
        accounts = result.stdout.strip().split('\n')
        active_accounts = [acc for acc in accounts if acc.strip()]

        if active_accounts:
            print("✅ Google Cloud authentication detected!")
            for account in active_accounts:
                print(f"   📧 Account: {account}")
            return True
        else:
            print("❌ No authenticated Google Cloud accounts found")
            return False
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("❌ Google Cloud CLI not found or authentication failed")
        return False

def setup_vertex_ai():
    """Initialize Vertex AI with project configuration."""
    try:
        # Initialize Vertex AI
        aiplatform.init(
            project=PROJECT_ID,
            location=REGION,
            staging_bucket=f"gs://{BUCKET_NAME}"
        )
        print("✅ Vertex AI initialized successfully!")
        print(f"   📝 Project: {PROJECT_ID}")
        print(f"   🌍 Location: {REGION}")
        return True
    except Exception as e:
        print(f"❌ Failed to initialize Vertex AI: {e}")
        return False

print("🔐 Authentication Setup")
print("=" * 50)

# Check authentication
auth_ok = check_gcloud_auth()

if not auth_ok:
    print("\n💡 To authenticate, run these commands in your terminal:")
    print("   1. gcloud auth login")
    print("   2. gcloud config set project", PROJECT_ID)
    print("   3. gcloud auth application-default login")
    print("\n⚠️  Please authenticate and then re-run this cell.")
else:
    print("\n🚀 Setting up Vertex AI...")
    vertex_ok = setup_vertex_ai()

    if vertex_ok:
        print("\n🎉 Authentication and Vertex AI setup complete!")
    else:
        print("\n⚠️  Vertex AI setup failed. Check your project configuration.")

print("=" * 50)

🔐 Authentication Setup
✅ Google Cloud authentication detected!
   📧 Account: faris.hussain@enmacc.com
   📧 Account: farishussain049@gmail.com

🚀 Setting up Vertex AI...
✅ Vertex AI initialized successfully!
   📝 Project: mlops-295610
   🌍 Location: us-central1

🎉 Authentication and Vertex AI setup complete!
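Note that `gcloud auth login` and the Python client libraries use different credentials: libraries such as `google-cloud-aiplatform` look for Application Default Credentials (ADC), which is why step 3 of the hint runs `gcloud auth application-default login`. The sketch below mirrors where ADC looks for a credentials file first: the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, then the well-known file written by that command (under `CLOUDSDK_CONFIG` if set, otherwise `%APPDATA%\gcloud` on Windows or `~/.config/gcloud` elsewhere). `adc_file_candidate` is a hypothetical helper; it does not model the final fallback to the metadata server on Google Cloud compute, and `google.auth.default()` remains the real entry point.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

ADC_FILENAME = "application_default_credentials.json"


def adc_file_candidate(env, home, is_windows=False):
    """Return the credentials file path ADC would try first for a given environment.

    `env` is a mapping like os.environ and `home` the user's home directory;
    both are parameters so the lookup can be tested without touching the machine.
    """
    path_cls = PureWindowsPath if is_windows else PurePosixPath
    # 1. An explicitly configured key file always wins
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return str(path_cls(explicit))
    # 2. The gcloud config directory, which CLOUDSDK_CONFIG can override
    config_dir = env.get("CLOUDSDK_CONFIG")
    if not config_dir:
        if is_windows:
            config_dir = str(path_cls(env.get("APPDATA", ""), "gcloud"))
        else:
            config_dir = str(path_cls(home, ".config", "gcloud"))
    return str(path_cls(config_dir, ADC_FILENAME))


# Usage on the current machine: does the expected ADC file exist?
candidate = adc_file_candidate(os.environ, os.path.expanduser("~"), os.name == "nt")
print(candidate, os.path.exists(candidate))
```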


## 5. Enable Required APIs

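Let's enable the Google Cloud APIs that our MLOps pipeline will need. This step requires an authenticated gcloud CLI, and some of these APIs (Cloud Build, for example) can only be activated on a project that has billing enabled.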

In [10]:
def enable_api(api_name):
    """Enable a Google Cloud API."""
    try:
        cmd = ['gcloud', 'services', 'enable', api_name, '--project', PROJECT_ID]
        subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"✅ Enabled: {api_name}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"❌ Failed to enable {api_name}: {e.stderr}")
        return False
    except FileNotFoundError:
        print("❌ gcloud CLI not found. Please install it first.")
        return False

# Required APIs for MLOps pipeline
required_apis = [
    "aiplatform.googleapis.com",        # Vertex AI
    "storage.googleapis.com",           # Cloud Storage
    "cloudbuild.googleapis.com",        # Cloud Build
    "artifactregistry.googleapis.com",  # Artifact Registry (successor to the deprecated Container Registry)
    "compute.googleapis.com",           # Compute Engine (for training)
    "iam.googleapis.com",               # Identity and Access Management
    "logging.googleapis.com",           # Cloud Logging
    "monitoring.googleapis.com"         # Cloud Monitoring
]

print("🔧 Enabling Required APIs")
print("=" * 50)
print("⏳ This may take a few minutes...")

failed_apis = []
for api in required_apis:
    print(f"🔄 Enabling {api}...")
    if not enable_api(api):
        failed_apis.append(api)

print("\n" + "=" * 50)
if failed_apis:
    print(f"❌ Failed to enable: {failed_apis}")
    print("💡 Check that billing is enabled, or enable the APIs manually in the GCP Console:")
    print("   https://console.cloud.google.com/apis/library")
else:
    print("🎉 All required APIs enabled successfully!")

print("\n✅ API setup step finished.")

🔧 Enabling Required APIs
⏳ This may take a few minutes...
🔄 Enabling aiplatform.googleapis.com...
✅ Enabled: aiplatform.googleapis.com
🔄 Enabling storage.googleapis.com...
✅ Enabled: storage.googleapis.com
🔄 Enabling cloudbuild.googleapis.com...
❌ Failed to enable cloudbuild.googleapis.com: ERROR: (gcloud.services.enable) FAILED_PRECONDITION: Billing account for project '293997883832' is not found. Billing must be enabled for activation of service(s) 'cloudbuild.googleapis.com,artifactregistry.googleapis.com,containerregistry.googleapis.com' to proceed.
Help Token: AXcLsyA7D_fKlN4Ts-k0_I-AoTJ8rCWmcw5B4wEEyNiNJCrqBSdvnue1FPt1WBuiPqgWDg3lN8lAItU_R9yl-evB56YlYsQjUVubq3Um_dgC-T6k
- '@type': type.googleapis.com/google.rpc.PreconditionFailure
  violations:
  - subject: ?error_code=390001&project=293997883832&services=cloudbuild.googleapis.com&services=artifactregistry.googleapis.com&services=containerregistry.googleapis.com
    type: googleapis.com/billing-enabled
- '@typ
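`gcloud services enable` accepts several service names in one invocation, so the per-API loop can be collapsed into a single call, and the failure recorded in the output (a `FAILED_PRECONDITION` caused by missing billing) can be recognized and turned into an actionable hint. A minimal sketch: `build_enable_command` and `diagnose_enable_error` are hypothetical helpers, and matching on the error text is a heuristic based on the message shown above, not a stable gcloud contract.

```python
def build_enable_command(apis, project_id):
    """Build one `gcloud services enable` invocation covering all APIs at once."""
    return ["gcloud", "services", "enable", *apis, "--project", project_id]


def diagnose_enable_error(stderr):
    """Map a gcloud error message to a human-readable hint (heuristic)."""
    text = stderr.lower()
    if "failed_precondition" in text and "billing" in text:
        return ("Billing is not enabled for this project. Link a billing account "
                "in the Cloud Console and re-run the cell.")
    if "permission_denied" in text:
        return "Your account lacks permission to enable services on this project."
    return "Unrecognized error; read the full gcloud message."


# Usage with the message recorded in the output above
recorded = ("ERROR: (gcloud.services.enable) FAILED_PRECONDITION: Billing account for "
            "project '293997883832' is not found. Billing must be enabled for activation "
            "of service(s) 'cloudbuild.googleapis.com,artifactregistry.googleapis.com,"
            "containerregistry.googleapis.com' to proceed.")
print(diagnose_enable_error(recorded))
```

The command list would then be passed to `subprocess.run(..., capture_output=True, text=True)` once, with `diagnose_enable_error` applied to `stderr` if the return code is non-zero.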

## 6. Create Cloud Storage Bucket

Now let's create a Cloud Storage bucket to store our ML artifacts, data, and pipeline outputs.

In [12]:
from google.cloud import storage
from google.api_core.exceptions import NotFound

def create_bucket_if_not_exists(bucket_name, location="US"):
    """Create a Cloud Storage bucket if it doesn't exist."""
    try:
        client = storage.Client(project=PROJECT_ID)

        # Reuse the bucket if it already exists; only a 404 means "create it".
        # Other errors (e.g. 403 because another project owns the name) fall
        # through to the outer handler.
        try:
            bucket = client.get_bucket(bucket_name)
            print(f"✅ Bucket '{bucket_name}' already exists")
            return bucket
        except NotFound:
            pass

        # Create new bucket
        bucket = client.create_bucket(bucket_name, location=location)
        print(f"✅ Created bucket '{bucket_name}' in {location}")
        return bucket

    except Exception as e:
        print(f"❌ Failed to create bucket '{bucket_name}': {e}")
        return None

def create_folder_structure(bucket, folders):
    """Create folder structure in the bucket."""
    try:
        for folder in folders:
            # GCS has a flat namespace: an empty placeholder object makes the
            # "folder/" prefix visible in the console and in listings
            blob_name = f"{folder}/.gitkeep"
            blob = bucket.blob(blob_name)
            blob.upload_from_string("")
            print(f"📁 Created folder: {folder}/")
        return True
    except Exception as e:
        print(f"❌ Failed to create folder structure: {e}")
        return False

print("🪣 Cloud Storage Setup")
print("=" * 50)
print(f"📝 Bucket name: {BUCKET_NAME}")

# Create the bucket in the same region as Vertex AI
bucket = create_bucket_if_not_exists(BUCKET_NAME, location=REGION.upper())

if bucket:
    # Create folder structure
    folders = [DATA_ROOT, MODELS_ROOT, PIPELINES_ROOT, OUTPUTS_ROOT]
    print("\n📁 Creating folder structure...")

    if create_folder_structure(bucket, folders):
        print("✅ Folder structure created successfully!")

        # List the created structure
        print("\n📂 Bucket structure:")
        for folder in folders:
            print(f"   📁 gs://{BUCKET_NAME}/{folder}/")
    else:
        print("❌ Failed to create folder structure")

    # Vertex AI was already pointed at this bucket by aiplatform.init()
    print(f"\n🎯 Vertex AI staging bucket: gs://{BUCKET_NAME}")

else:
    print("❌ Could not create or access bucket")
    print("💡 Make sure:")
    print("   1. Your project ID is correct")
    print("   2. You have Storage Admin permissions")
    print("   3. The bucket name is globally unique")

print("=" * 50)

🪣 Cloud Storage Setup
📝 Bucket name: mlops-295610-mlops-bucket
✅ Created bucket 'mlops-295610-mlops-bucket' in US-CENTRAL1

📁 Creating folder structure...
📁 Created folder: data/
📁 Created folder: models/
📁 Created folder: pipelines/
📁 Created folder: outputs/
✅ Folder structure created successfully!

📂 Bucket structure:
   📁 gs://mlops-295610-mlops-bucket/data/
   📁 gs://mlops-295610-mlops-bucket/models/
   📁 gs://mlops-295610-mlops-bucket/pipelines/
   📁 gs://mlops-295610-mlops-bucket/outputs/

🎯 Vertex AI staging bucket: gs://mlops-295610-mlops-bucket
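Cloud Storage has a flat namespace: `data/` is not a directory but a prefix shared by object names such as `data/.gitkeep`, which is why writing one placeholder object is enough to make a "folder" appear. When a bucket is listed with a delimiter (for example `client.list_blobs(BUCKET_NAME, delimiter="/")`, whose iterator exposes the collected `prefixes` once it has been consumed), the service groups names in this way. The pure-Python sketch below reproduces that grouping locally; `split_listing` is a hypothetical helper for illustration, not part of the client library.

```python
def split_listing(object_names, prefix="", delimiter="/"):
    """Mimic a delimited bucket listing.

    Returns (objects, prefixes): objects directly under `prefix`, and the
    "sub-folder" prefixes that a delimited listing would report instead.
    """
    objects, prefixes = [], set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Everything up to and including the first delimiter is a "folder"
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(name)
    return objects, sorted(prefixes)


names = ["data/.gitkeep", "data/cifar10_subset.npz", "models/.gitkeep", "README.txt"]
print(split_listing(names))
# (['README.txt'], ['data/', 'models/'])
print(split_listing(names, prefix="data/"))
# (['data/.gitkeep', 'data/cifar10_subset.npz'], [])
```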


## 7. Test GCP Connectivity

Let's verify that all our GCP services are properly configured and accessible.

In [13]:
def test_vertex_ai_connection():
    """Test Vertex AI connectivity."""
    try:
        # Listing models exercises the Vertex AI API for PROJECT_ID/REGION
        models = aiplatform.Model.list(order_by="create_time desc")
        print("✅ Vertex AI connection successful")
        print(f"   📊 Found {len(models)} existing models")
        return True
    except Exception as e:
        print(f"❌ Vertex AI connection failed: {e}")
        return False

def test_storage_connection():
    """Test Cloud Storage connectivity with a write/read/delete round trip."""
    try:
        client = storage.Client(project=PROJECT_ID)
        bucket = client.get_bucket(BUCKET_NAME)

        # Test write access
        expected = "Connection test successful!"
        test_blob = bucket.blob("test/connectivity_test.txt")
        test_blob.upload_from_string(expected)

        # Test read access
        content = test_blob.download_as_text()

        # Clean up test file
        test_blob.delete()

        if content != expected:
            raise ValueError("downloaded content does not match the uploaded text")

        print("✅ Cloud Storage connection successful")
        print("   📁 Read/write access confirmed")
        return True
    except Exception as e:
        print(f"❌ Cloud Storage connection failed: {e}")
        return False

def test_project_permissions():
    """Test basic project permissions by listing enabled services."""
    try:
        # value(config.name) prints one service per line with no header row
        result = subprocess.run(['gcloud', 'services', 'list', '--enabled',
                                 '--project', PROJECT_ID, '--format=value(config.name)'],
                                capture_output=True, text=True, check=True)
        enabled_services = len([line for line in result.stdout.splitlines() if line.strip()])
        print("✅ Project permissions verified")
        print(f"   🔧 {enabled_services} services enabled")
        return True
    except Exception as e:
        print(f"❌ Project permission test failed: {e}")
        return False

print("🔍 Connectivity Tests")
print("=" * 50)

tests = [
    ("Project Permissions", test_project_permissions),
    ("Cloud Storage", test_storage_connection),
    ("Vertex AI", test_vertex_ai_connection)
]

results = {}
for test_name, test_func in tests:
    print(f"\n🧪 Testing {test_name}...")
    results[test_name] = test_func()

print("\n" + "=" * 50)
print("📋 Test Summary:")
for test_name, passed in results.items():
    status = "✅ PASS" if passed else "❌ FAIL"
    print(f"   {test_name}: {status}")

all_passed = all(results.values())
if all_passed:
    print("\n🎉 All connectivity tests passed! Your environment is ready!")
else:
    print("\n⚠️  Some tests failed. Please check your configuration.")

print("=" * 50)

🔍 Connectivity Tests

🧪 Testing Project Permissions...
✅ Project permissions verified
   🔧 25 services enabled

🧪 Testing Cloud Storage...
✅ Cloud Storage connection successful
   📁 Read/write access confirmed

🧪 Testing Vertex AI...
✅ Vertex AI connection successful
   📊 Found 0 existing models

📋 Test Summary:
   Project Permissions: ✅ PASS
   Cloud Storage: ✅ PASS
   Vertex AI: ✅ PASS

🎉 All connectivity tests passed! Your environment is ready!
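Newly enabled APIs can take a few minutes to propagate, so a connectivity check that fails right after step 5 may pass on a later attempt. A minimal sketch of retrying a check with exponential backoff: `retry_check` is a hypothetical helper, the delays are arbitrary defaults, and `sleep` is injectable so the logic can be tested without actually waiting. For production code, `google.api_core.retry.Retry` provides a maintained implementation of the same idea.

```python
import time


def retry_check(check, attempts=4, base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Call `check()` until it returns True, multiplying the delay after each failure.

    Returns (passed, attempts_used). Exceptions count as failures, matching how
    the connectivity tests above report problems instead of raising.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True, attempt
        except Exception as exc:  # broad on purpose: any failure means "try again"
            print(f"attempt {attempt} raised {exc!r}")
        if attempt < attempts:
            sleep(delay)
            delay *= factor
    return False, attempts


# Usage: wrap one of the test functions defined above, e.g.
# passed, used = retry_check(test_vertex_ai_connection)
```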


## 8. Download and Prepare Sample Dataset

Let's download a small sample dataset for our MLOps pipeline demonstration. We'll use a subset of CIFAR-10 to keep it lightweight and cost-effective.

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from pathlib import Path

RANDOM_SEED = 42  # fixed seed so the same subset is drawn on every run

def download_cifar10_subset(num_samples_per_class=100, seed=RANDOM_SEED):
    """Download CIFAR-10 and return a class-balanced subset of the training split."""
    print("📥 Downloading CIFAR-10 dataset...")

    # Load CIFAR-10 dataset (labels come back with shape (N, 1))
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

    # Class names
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

    print("✅ Downloaded CIFAR-10 dataset")
    print(f"   📊 Original training samples: {len(x_train)}")
    print(f"   📊 Original test samples: {len(x_test)}")

    # Create subset for demonstration
    print(f"\n🔄 Creating subset ({num_samples_per_class} samples per class)...")

    rng = np.random.default_rng(seed)
    subset_indices = []
    for class_id in range(len(class_names)):
        class_indices = np.where(y_train.ravel() == class_id)[0]
        selected = rng.choice(class_indices, num_samples_per_class, replace=False)
        subset_indices.extend(selected)

    # Shuffle so the classes are interleaved instead of grouped in blocks
    subset_indices = rng.permutation(np.array(subset_indices))
    x_subset = x_train[subset_indices]
    y_subset = y_train[subset_indices]

    print(f"✅ Created subset with {len(x_subset)} total samples")

    return x_subset, y_subset, class_names

def explore_dataset(x_data, y_data, class_names):
    """Basic dataset exploration."""
    print("📊 Dataset Exploration")
    print("=" * 30)
    print(f"📈 Shape: {x_data.shape}")
    print(f"🎯 Labels shape: {y_data.shape}")
    print(f"📏 Image dimensions: {x_data.shape[1:]}")
    print(f"🔢 Data type: {x_data.dtype}")
    print(f"📊 Value range: {x_data.min()} - {x_data.max()}")

    # Class distribution
    unique, counts = np.unique(y_data, return_counts=True)
    print("\n🏷️  Class Distribution:")
    for class_id, count in zip(unique, counts):
        print(f"   {class_names[int(class_id)]}: {count} samples")

    return True

def visualize_samples(x_data, y_data, class_names, num_samples=8):
    """Visualize up to 8 sample images in a 2x4 grid."""
    num_samples = min(num_samples, 8, len(x_data))
    plt.figure(figsize=(12, 6))

    for i in range(num_samples):
        plt.subplot(2, 4, i + 1)
        plt.imshow(x_data[i])
        plt.title(f"Class: {class_names[int(y_data[i][0])]}")
        plt.axis('off')

    plt.suptitle("Sample Images from Dataset", fontsize=16)
    plt.tight_layout()
    plt.show()

    return True

# Create local data directory
local_data_dir = Path("../data")
local_data_dir.mkdir(exist_ok=True)

print("Setting up sample dataset for MLOps learning...")
print("=" * 60)

# Download and prepare dataset
x_data, y_data, class_names = download_cifar10_subset(num_samples_per_class=50)

# Explore the dataset
explore_dataset(x_data, y_data, class_names)

# Visualize some samples
print("\nVisualizing sample images...")
visualize_samples(x_data, y_data, class_names)

# Save dataset locally
print("\n💾 Saving dataset locally...")
local_file_path = local_data_dir / "cifar10_subset.npz"
np.savez_compressed(
    local_file_path,
    x_data=x_data,
    y_data=y_data,
    class_names=class_names
)
print(f"✅ Dataset saved to: {local_file_path}")

# Upload to Google Cloud Storage
print("\n☁️  Uploading dataset to GCS...")
try:
    from google.cloud import storage

    client = storage.Client(project=PROJECT_ID)
    bucket = client.get_bucket(BUCKET_NAME)

    # Upload the dataset
    gcs_path = f"{DATA_ROOT}/cifar10_subset.npz"
    blob = bucket.blob(gcs_path)
    blob.upload_from_filename(str(local_file_path))

    print(f"✅ Dataset uploaded to: gs://{BUCKET_NAME}/{gcs_path}")
    print(f"📁 File size: {local_file_path.stat().st_size / (1024 * 1024):.2f} MB")

except Exception as e:
    print(f"❌ Failed to upload dataset: {e}")

print("=" * 60)
print("🎉 Sample dataset setup complete!")
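The subset logic in `download_cifar10_subset` is stratified sampling: draw the same number of indices from each class, without replacement. Stripped of NumPy and TensorFlow, the idea fits in a few lines of standard-library Python. `per_class_sample` is a hypothetical helper, and seeding a dedicated `random.Random` instance plays the same role as seeding NumPy's generator: the subset becomes reproducible without touching global random state.

```python
import random
from collections import defaultdict


def per_class_sample(labels, n_per_class, seed=42):
    """Return indices with exactly n_per_class examples of every label.

    Sampling is without replacement and reproducible for a fixed seed.
    Raises ValueError if some class has fewer than n_per_class examples.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # isolated generator; global random state untouched
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], n_per_class))
    rng.shuffle(chosen)  # interleave classes so the first samples are mixed
    return chosen


labels = [i % 10 for i in range(1000)]  # 100 examples of each of 10 classes
subset = per_class_sample(labels, n_per_class=5)
print(len(subset))  # 50
```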

## 9. Upload Data to Cloud Storage

Now let's save our prepared dataset and upload it to Google Cloud Storage for use in our MLOps pipeline.

In [None]:
import pickle
from pathlib import Path

import numpy as np
from google.cloud import storage
from sklearn.model_selection import train_test_split
from tqdm import tqdm

def save_dataset_locally(x_train, y_train, x_test, y_test, class_names, save_dir):
    """Save dataset splits, class names, and metadata to local files."""
    save_path = Path(save_dir)
    save_path.mkdir(parents=True, exist_ok=True)

    # Save training data
    np.save(save_path / "x_train.npy", x_train)
    np.save(save_path / "y_train.npy", y_train)

    # Save test data
    np.save(save_path / "x_test.npy", x_test)
    np.save(save_path / "y_test.npy", y_test)

    # Save class names
    with open(save_path / "class_names.pkl", "wb") as f:
        pickle.dump(class_names, f)

    # Create metadata file
    metadata = {
        "dataset_name": "cifar10_subset",
        "num_classes": len(class_names),
        "train_samples": len(x_train),
        "test_samples": len(x_test),
        "image_shape": x_train.shape[1:],
        "class_names": class_names,
    }

    with open(save_path / "metadata.pkl", "wb") as f:
        pickle.dump(metadata, f)

    print(f"✅ Dataset saved to {save_path}")
    return save_path

def upload_to_gcs(local_path, bucket_name, gcs_prefix):
    """Upload all files in a local directory to Google Cloud Storage."""
    try:
        client = storage.Client(project=PROJECT_ID)
        bucket = client.get_bucket(bucket_name)

        local_files = [p for p in Path(local_path).glob("*") if p.is_file()]
        print(f"📤 Uploading {len(local_files)} files to gs://{bucket_name}/{gcs_prefix}/")

        uploaded_files = []
        for file_path in tqdm(local_files, desc="Uploading"):
            blob_name = f"{gcs_prefix}/{file_path.name}"
            blob = bucket.blob(blob_name)
            blob.upload_from_filename(str(file_path))
            uploaded_files.append(f"gs://{bucket_name}/{blob_name}")

        print(f"✅ Successfully uploaded {len(uploaded_files)} files")
        return uploaded_files

    except Exception as e:
        print(f"❌ Upload failed: {e}")
        return []

def verify_gcs_upload(bucket_name, gcs_prefix):
    """Verify files were uploaded correctly to GCS."""
    try:
        client = storage.Client(project=PROJECT_ID)
        bucket = client.get_bucket(bucket_name)

        # Trailing slash so only objects inside the "folder" are listed
        # (without it, data/cifar10_subset.npz from the previous cell would match too)
        blobs = list(bucket.list_blobs(prefix=f"{gcs_prefix}/"))

        print(f"\n📁 Files in gs://{bucket_name}/{gcs_prefix}/:")
        total_size = 0
        for blob in blobs:
            size_mb = blob.size / (1024 * 1024)
            total_size += size_mb
            print(f"   📄 {blob.name} ({size_mb:.2f} MB)")

        print(f"\n📊 Total size: {total_size:.2f} MB")
        print(f"📈 Total files: {len(blobs)}")

        return len(blobs) > 0

    except Exception as e:
        print(f"❌ Verification failed: {e}")
        return False

print("💾 Data Upload to Cloud Storage")
print("=" * 50)

# Split the subset from the previous cell into train/test sets. The 80/20 ratio
# is a choice made here; stratify keeps the class balance and random_state makes
# the split reproducible.
x_train_subset, x_test_subset, y_train_subset, y_test_subset = train_test_split(
    x_data, y_data, test_size=0.2, stratify=y_data.ravel(), random_state=42
)

# Save dataset locally first
local_save_path = save_dataset_locally(
    x_train_subset, y_train_subset,
    x_test_subset, y_test_subset,
    class_names,
    local_data_dir / "cifar10_subset"
)

# Upload to GCS
gcs_data_prefix = f"{DATA_ROOT}/cifar10_subset"
uploaded_files = upload_to_gcs(local_save_path, BUCKET_NAME, gcs_data_prefix)

if uploaded_files:
    # Verify upload
    print("\n🔍 Verifying upload...")
    verification_ok = verify_gcs_upload(BUCKET_NAME, gcs_data_prefix)

    if verification_ok:
        print("\n🎉 Data successfully uploaded to Cloud Storage!")
        print(f"📍 Data location: gs://{BUCKET_NAME}/{gcs_data_prefix}/")
    else:
        print("\n⚠️  Upload verification failed")
else:
    print("\n❌ Upload failed")

print("=" * 50)
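Checking that `len(blobs) > 0` proves the objects exist, not that their contents are intact. Cloud Storage records an MD5 digest for non-composite objects, and the Python client exposes it as the base64-encoded string `blob.md5_hash`, so a stronger check is to compute the same encoding locally and compare. A minimal sketch: `local_md5_base64` is a hypothetical helper, and the comparison in the usage comment assumes a `blob` fetched with `bucket.get_blob(...)` so that its metadata is populated.

```python
import base64
import hashlib


def local_md5_base64(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 of a file, the format used by blob.md5_hash.

    The file is read in chunks so large datasets need not fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


# Usage sketch (requires the GCS objects uploaded in the cell above):
# blob = bucket.get_blob(f"{gcs_data_prefix}/x_train.npy")
# assert blob.md5_hash == local_md5_base64(local_save_path / "x_train.npy")
```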

## 10. Verify Development Environment

Let's run a comprehensive verification to ensure everything is set up correctly for MLOps development.

In [None]:
import sys
import importlib
import subprocess
import pickle
from pathlib import Path

def comprehensive_verification():
    """Run comprehensive verification of the MLOps environment setup."""

    verification_results = {}

    print("🔍 Comprehensive Environment Verification")
    print("=" * 60)

    # 1. Python Environment (tuple comparison also accepts future major versions)
    print("\n1️⃣ Python Environment")
    if sys.version_info[:2] >= (3, 8):
        print("   ✅ Python version: OK")
        verification_results['python'] = True
    else:
        print("   ❌ Python version: FAIL")
        verification_results['python'] = False

    # 2. Required Libraries
    print("\n2️⃣ Required Libraries")
    required_libs = ['google.cloud.aiplatform', 'google.cloud.storage',
                     'tensorflow', 'sklearn', 'pandas', 'numpy']
    lib_status = []

    for lib in required_libs:
        try:
            importlib.import_module(lib)
            print(f"   ✅ {lib}: OK")
            lib_status.append(True)
        except ImportError:
            print(f"   ❌ {lib}: MISSING")
            lib_status.append(False)

    verification_results['libraries'] = all(lib_status)

    # 3. GCP Authentication
    print("\n3️⃣ GCP Authentication")
    try:
        result = subprocess.run(['gcloud', 'auth', 'list', '--format=value(account)'],
                                capture_output=True, text=True, check=True)
        if result.stdout.strip():
            print("   ✅ GCP Authentication: OK")
            verification_results['auth'] = True
        else:
            print("   ❌ GCP Authentication: FAIL")
            verification_results['auth'] = False
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("   ❌ GCP Authentication: FAIL")
        verification_results['auth'] = False

    # 4. Project Configuration
    print("\n4️⃣ Project Configuration")
    if PROJECT_ID != "your-mlops-project-id":
        print(f"   ✅ Project ID configured: {PROJECT_ID}")
        verification_results['project'] = True
    else:
        print("   ❌ Project ID: NOT CONFIGURED")
        verification_results['project'] = False

    # 5. Vertex AI Access
    print("\n5️⃣ Vertex AI Access")
    try:
        aiplatform.Model.list()
        print("   ✅ Vertex AI: OK")
        verification_results['vertex_ai'] = True
    except Exception as e:
        print(f"   ❌ Vertex AI: FAIL - {str(e)[:50]}...")
        verification_results['vertex_ai'] = False

    # 6. Cloud Storage Access
    print("\n6️⃣ Cloud Storage Access")
    bucket = None  # stays None if the bucket cannot be reached
    try:
        from google.cloud import storage
        client = storage.Client(project=PROJECT_ID)
        bucket = client.get_bucket(BUCKET_NAME)
        print(f"   ✅ Storage bucket: {BUCKET_NAME}")
        verification_results['storage'] = True
    except Exception as e:
        print(f"   ❌ Storage: FAIL - {str(e)[:50]}...")
        verification_results['storage'] = False

    # 7. Dataset Availability (only checkable if step 6 found the bucket)
    print("\n7️⃣ Dataset Availability")
    if bucket is None:
        print("   ❌ Dataset: CANNOT CHECK (bucket unavailable)")
        verification_results['dataset'] = False
    else:
        try:
            blobs = list(bucket.list_blobs(prefix=f"{DATA_ROOT}/cifar10_subset"))
            if blobs:
                print(f"   ✅ Dataset uploaded: {len(blobs)} files")
                verification_results['dataset'] = True
            else:
                print("   ❌ Dataset: NOT FOUND")
                verification_results['dataset'] = False
        except Exception as e:
            print(f"   ❌ Dataset: CANNOT CHECK - {str(e)[:50]}...")
            verification_results['dataset'] = False

    # Summary
    print("\n" + "=" * 60)
    print("📋 VERIFICATION SUMMARY")
    print("=" * 60)

    passed = sum(verification_results.values())
    total = len(verification_results)

    for check, status in verification_results.items():
        status_icon = "✅" if status else "❌"
        print(f"   {status_icon} {check.replace('_', ' ').title()}")

    print(f"\n🎯 Overall Status: {passed}/{total} checks passed")

    if passed == total:
        print("\n🎉 CONGRATULATIONS! Your MLOps environment is fully configured!")
        print("\n🚀 You're ready to start building your MLOps pipeline!")
        print("\n📋 Next steps:")
        print("   1. Check off completed tasks in TASKS.md")
        print("   2. Move to Phase 2: Data Pipeline Implementation")
        print("   3. Start with notebook 02_data_pipeline.ipynb")
    else:
        print("\n⚠️  Some components need attention before proceeding.")
        print("\n🔧 Please fix the failed checks above.")

    print("=" * 60)

    return verification_results

# Run comprehensive verification
results = comprehensive_verification()

# Save configuration for future notebooks
config_summary = {
    'project_id': PROJECT_ID,
    'region': REGION,
    'bucket_name': BUCKET_NAME,
    'verification_results': results,
    'setup_complete': all(results.values())
}

# Save to local config file for other notebooks to use
config_path = Path('../configs/setup_config.pkl')
config_path.parent.mkdir(parents=True, exist_ok=True)

with open(config_path, 'wb') as f:
    pickle.dump(config_summary, f)

print(f"\n💾 Configuration saved to: {config_path}")
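The configuration above is pickled, which ties it to Python and means that loading a tampered file can execute arbitrary code. Because every value in `config_summary` is a string, a boolean, or a dict of booleans, JSON is a readable, language-neutral alternative. A minimal sketch, assuming a hypothetical `setup_config.json` stored next to the pickle; `save_config` and `load_config` are illustrative helpers that later notebooks could copy.

```python
import json
from pathlib import Path


def save_config(config, path):
    """Write `config` as pretty-printed JSON, creating parent folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_config(path, required=("project_id", "region", "bucket_name")):
    """Read a JSON config and fail fast if an expected key is missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    return config


# Usage with the values from this notebook, e.g.
# save_config(config_summary, "../configs/setup_config.json")
# cfg = load_config("../configs/setup_config.json"); PROJECT_ID = cfg["project_id"]
```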
