# Getting Started with PyGeomodeling

This tutorial introduces the basics of PyGeomodeling for reservoir modeling and geostatistics.

## What You'll Learn

1. Loading GRDECL files
2. Exploring reservoir properties
3. Preparing features for modeling
4. Training a simple Gaussian Process model
5. Making predictions and visualizing results

In [None]:
# Install if needed
# !pip install pygeomodeling

import numpy as np
import matplotlib.pyplot as plt
from pygeomodeling import GRDECLParser, UnifiedSPE9Toolkit, SPE9Plotter

print("✓ Imports successful!")

## 1. Loading Data

Let's start by loading a small sample GRDECL file.

In [None]:
# Load sample data
parser = GRDECLParser('../../data/sample_small.grdecl')
data = parser.load_data()

print(f"Grid dimensions: {data['dimensions']}")
print(f"Available properties: {list(data['properties'].keys())}")
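
For orientation, GRDECL keyword arrays are written as flat lists in natural order, with the I (x) index cycling fastest, then J, then K. The sketch below shows how such a flat array maps onto a 3D grid with NumPy. The dimensions and values are made up, and whether `GRDECLParser` reshapes exactly this way internally is an assumption.

```python
import numpy as np

# Hypothetical grid dimensions (not taken from sample_small.grdecl)
nx, ny, nz = 4, 3, 2

# Stand-in for a flat keyword array such as PERMX: one value per cell
flat_values = np.arange(nx * ny * nz, dtype=float)

# Fortran order makes the first axis (I) vary fastest, matching natural ordering
grid = flat_values.reshape((nx, ny, nz), order="F")

print(grid.shape)     # (4, 3, 2)
print(grid[1, 0, 0])  # 1.0  -> one step in I moves 1 position in the flat array
print(grid[0, 1, 0])  # 4.0  -> one step in J moves nx positions
print(grid[0, 0, 1])  # 12.0 -> one step in K moves nx * ny positions
```

With this layout, `grid[:, :, k]` is a single horizontal layer, which is how `permx` is sliced for plotting later in this notebook.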

## 2. Exploring Properties

Let's examine the permeability and porosity distributions.

In [None]:
# Get permeability and porosity arrays
permx = data['properties']['PERMX']
poro = data['properties']['PORO']

print(f"PERMX shape: {permx.shape}")
print(f"PERMX range: [{permx.min():.2f}, {permx.max():.2f}] mD")
print(f"PERMX mean: {permx.mean():.2f} mD")
print(f"\nPORO range: [{poro.min():.3f}, {poro.max():.3f}]")
print(f"PORO mean: {poro.mean():.3f}")
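
Permeability often spans several orders of magnitude, so a summary in log space can be more informative than the arithmetic mean alone. Here is a minimal sketch, using synthetic log-normal values as a stand-in for `permx`. The distribution parameters are arbitrary assumptions, not properties of the sample file.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for permx in mD; the parameters are arbitrary
perm_demo = rng.lognormal(mean=3.0, sigma=1.0, size=1000)

# Summaries in linear and log10 space
arithmetic_mean = perm_demo.mean()
geometric_mean = 10 ** np.log10(perm_demo).mean()
p10, p50, p90 = np.percentile(perm_demo, [10, 50, 90])

print(f"Arithmetic mean: {arithmetic_mean:.2f} mD")
print(f"Geometric mean:  {geometric_mean:.2f} mD")
print(f"P10 / P50 / P90: {p10:.2f} / {p50:.2f} / {p90:.2f} mD")
```

For skewed data like this, the geometric mean sits below the arithmetic mean and is usually closer to the median.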

In [None]:
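# Visualize the first few permeability layers (up to 3, sliced along z)
n_layers = min(3, permx.shape[2])
fig, axes = plt.subplots(1, n_layers, figsize=(5 * n_layers, 4), squeeze=False)

for i, ax in enumerate(axes[0]):
    slice_data = permx[:, :, i]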
    im = ax.imshow(slice_data.T, cmap='viridis', origin='lower')
    ax.set_title(f'Layer {i+1} (z={i})')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    plt.colorbar(im, ax=ax, label='Permeability (mD)')

plt.tight_layout()
plt.show()

## 3. Preparing Features

Now let's turn the grid data into a feature matrix `X` and a target vector `y` for machine learning.

In [None]:
# Create toolkit
toolkit = UnifiedSPE9Toolkit()
toolkit.load_spe9_data(data)

# Prepare features
X, y = toolkit.prepare_features(add_geological_features=False)

print(f"Feature matrix shape: {X.shape}")
print(f"Target vector shape: {y.shape}")
print(f"\nFeatures: {toolkit.feature_names}")
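
To make the shapes above concrete, here is one way a feature matrix could be built from a gridded property: one row per cell, with the cell's (x, y, z) indices as features and the property value as the target. This is a sketch under that assumption. The features that `prepare_features` actually produces are the ones listed in `toolkit.feature_names`, and they may differ.

```python
import numpy as np

# Hypothetical grid and property values (stand-in for PERMX)
nx, ny, nz = 4, 3, 2
rng = np.random.default_rng(1)
prop = rng.uniform(10.0, 500.0, size=(nx, ny, nz))

# Index grids with the same shape as the property array
ii, jj, kk = np.meshgrid(
    np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij"
)

# One row per cell: columns are the x, y, z indices of that cell
X_demo = np.column_stack([ii.ravel(), jj.ravel(), kk.ravel()])
y_demo = prop.ravel()

print(X_demo.shape)  # (24, 3)
print(y_demo.shape)  # (24,)
```

Both `ravel()` calls use the same (C) ordering, so row `n` of `X_demo` and entry `n` of `y_demo` refer to the same cell.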

In [None]:
# Create train/test split
X_train, X_test, y_train, y_test = toolkit.create_train_test_split(
    test_size=0.2,
    random_state=42
)

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
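
A random split like this is commonly done with scikit-learn's `train_test_split`. The sketch below shows the equivalent call on synthetic arrays. That `create_train_test_split` wraps this function is an assumption, suggested only by the matching `test_size` and `random_state` parameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the feature matrix and target vector
rng = np.random.default_rng(42)
X_demo = rng.uniform(size=(100, 3))
y_demo = rng.uniform(size=100)

# Hold out 20% of the samples; random_state makes the split reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(len(X_tr), len(X_te))  # 80 20
```

Because neighboring cells tend to have similar values, a purely random split can give optimistic test scores on spatial data. The spatial cross-validation notebook listed under Next Steps addresses this.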

## 4. Training a Model

Let's train a Gaussian Process Regressor (GPR) with an RBF kernel.

In [None]:
# Create and train model
model = toolkit.create_sklearn_model('gpr', kernel_type='rbf')
toolkit.train_sklearn_model(model, 'gpr_rbf')

print("✓ Model trained successfully!")
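
For readers who want to see what a Gaussian Process with an RBF kernel looks like in plain scikit-learn, here is a minimal sketch on synthetic data. It assumes that `create_sklearn_model('gpr', kernel_type='rbf')` builds something along these lines. The exact kernel composition and hyperparameters used by the toolkit are not shown in this notebook, so the ones below are illustrative choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Synthetic 2D regression problem (not reservoir data)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0.0, 10.0, size=(80, 2))
y_demo = (
    np.sin(X_demo[:, 0])
    + 0.5 * np.cos(X_demo[:, 1])
    + rng.normal(0.0, 0.05, size=80)
)

# Signal variance * RBF for smooth spatial trends, plus a white-noise term
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)

# Fit on the first 60 samples, predict the remaining 20 with uncertainty
gpr.fit(X_demo[:60], y_demo[:60])
mean, std = gpr.predict(X_demo[60:], return_std=True)

print(f"Fitted kernel: {gpr.kernel_}")
print(f"Mean predictive std: {std.mean():.3f}")
```

Exact GP fitting scales roughly cubically with the number of training samples, so large grids usually need subsampling or an approximate method.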

## 5. Evaluating and Visualizing

Let's evaluate the model on the held-out test set and compare its predictions with the actual values.

In [None]:
# Evaluate model
results = toolkit.evaluate_model('gpr_rbf', X_test, y_test)

print("Model Performance:")
print(f"  R² Score: {results['r2']:.4f}")
print(f"  MSE: {results['mse']:.4f}")
print(f"  MAE: {results['mae']:.4f}")
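
The three metrics reported above have standard definitions, sketched here with NumPy on a tiny hand-made example. The key names `r2`, `mse`, and `mae` come from this notebook. That `evaluate_model` computes them exactly this way is an assumption.

```python
import numpy as np

# Tiny illustrative example (not model output)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

residuals = y_true - y_pred

# Mean squared error and mean absolute error
mse = np.mean(residuals ** 2)
mae = np.mean(np.abs(residuals))

# R² = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"R²: {r2:.4f}, MSE: {mse:.4f}, MAE: {mae:.4f}")
# R²: 0.9800, MSE: 0.0250, MAE: 0.1500
```

MSE and MAE are in the units of the target (squared for MSE), while R² is unitless, with 1.0 meaning perfect agreement.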

In [None]:
# Make predictions
predictions = toolkit.models['gpr_rbf'].predict(X_test)

# Plot predictions vs actual; the dashed red line marks perfect agreement (y = x)
plt.figure(figsize=(8, 6))
plt.scatter(y_test, predictions, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('Actual Permeability (mD)')
plt.ylabel('Predicted Permeability (mD)')
plt.title(f'Predictions vs Actual (R² = {results["r2"]:.4f})')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Summary

In this tutorial, you learned:

✓ How to load GRDECL files  
✓ How to explore reservoir properties  
✓ How to prepare features for modeling  
✓ How to train a Gaussian Process model  
✓ How to evaluate and visualize results  

## Next Steps

- Try `02_advanced_modeling.ipynb` for more sophisticated models
- Explore `03_spatial_cross_validation.ipynb` for proper validation
- Check `04_parallel_processing.ipynb` for performance optimization