# MKYZ Quick Start Guide

Get up and running with MKYZ in 5 minutes.

This notebook covers the essential workflow of MKYZ: loading data, training models (manually and automatically), evaluating results, and exporting reports.

## 1. Import MKYZ

In [None]:
import mkyz

## 2. Load Your Data

MKYZ provides two ways to load and prepare your data for machine learning.

### Option A: Using `prepare_data` (Original API)
Loads, splits, scales, and encodes the data in a single call.

In [None]:
# Load, split, scale, and encode in one call
data = mkyz.prepare_data(
    'data/titanic.csv',
    target_column='Survived',
    test_size=0.2,
    random_state=42
)

# prepare_data returns an 8-tuple; unpack it to access the splits directly:
# X_train, X_test, y_train, y_test, df, target, num_cols, cat_cols
X_train, X_test, y_train, y_test, df, target, num_cols, cat_cols = data

print(f"X_train shape: {X_train.shape}")
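
To make "splitting" and "scaling" concrete, here is a minimal standard-library sketch of a seeded hold-out split followed by standardization. It is not MKYZ's implementation; `holdout_split`, `fit_scaler`, `scale`, and the toy `ages` column are hypothetical names used only for illustration. The point it shows is that the scaling statistics are learned from the training rows and then reused on the test rows.

```python
import random
import statistics


def holdout_split(rows, test_size=0.2, random_state=42):
    """Shuffle row indices with a fixed seed and cut off a test fraction."""
    indices = list(range(len(rows)))
    random.Random(random_state).shuffle(indices)
    n_test = max(1, round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


def fit_scaler(values):
    """Learn the mean and standard deviation from training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for constant columns
    return mean, std


def scale(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Toy numeric column (hypothetical data, not read from titanic.csv).
ages = [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 14.0, 4.0, 58.0]
train_ages, test_ages = holdout_split(ages, test_size=0.2, random_state=42)

mean, std = fit_scaler(train_ages)
train_scaled = scale(train_ages, mean, std)
test_scaled = scale(test_ages, mean, std)  # reuse the training statistics

print(len(train_ages), len(test_ages))
```

Fixing the seed makes the split reproducible, which is presumably the role of `random_state=42` in the call above.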

### Option B: Using `load_data` (New API)
Loads and validates the data as separate steps, which gives you more control over each.

In [None]:
# Load the dataset from disk
df = mkyz.load_data('data/titanic.csv')  # Also supports .xlsx, .json, .parquet

# Validate the dataset
validation = mkyz.validate_dataset(df, target_column='Survived')
print(validation)

## 3. Train a Model

### Single Model Training
Train a specific model on the `data` tuple returned by `prepare_data`.

In [None]:
# Train a Random Forest classifier
model = mkyz.train(
    data,
    task='classification',
    model='rf',
    n_estimators=100
)

### AutoML - Find the Best Model
Automatically train all supported models and find the best one.

In [None]:
# Train every supported model and keep the best performer
best_model = mkyz.auto_train(
    data,
    task='classification',
    n_threads=4,               # Parallel training
    optimize_models=True,      # Hyperparameter tuning
    optimization_method='grid_search'  # or 'bayesian'
)
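
For intuition about what a `'grid_search'` optimization method generally does, the sketch below scores every combination in a small parameter grid and keeps the best one. It is a generic illustration, not MKYZ's tuner: `grid_search`, `toy_score`, and the `max_depth` parameter are hypothetical, and `toy_score` stands in for "train a model and return its validation score".

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Score every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(n_estimators, max_depth):
    """Hypothetical stand-in for a validation score; peaks at (100, 5)."""
    return -((n_estimators - 100) ** 2) / 1e4 - (max_depth - 5) ** 2


param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, 8]}
best_params, best_score = grid_search(toy_score, param_grid)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes (9 evaluations here), which is why exhaustive search is usually paired with small grids or parallel training.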

## 4. Make Predictions

In [None]:
predictions = mkyz.predict(data, model, task='classification')

## 5. Evaluate Performance

### Quick Evaluation

In [None]:
scores = mkyz.evaluate(data, predictions, task='classification')
print(scores)

### Detailed Metrics

In [None]:
from mkyz import classification_metrics

metrics = classification_metrics(y_test, predictions)
for metric, value in metrics.items():
    print(f"{metric}: {value:.4f}")
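
As a reference for reading such metrics, here is a small standard-library sketch that computes accuracy, precision, recall, and F1 for a binary problem from their textbook definitions. The keys returned by MKYZ's `classification_metrics` may differ; `binary_classification_metrics` and the toy labels are hypothetical.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (hypothetical, not Titanic predictions).
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

demo_metrics = binary_classification_metrics(y_true_demo, y_pred_demo)
for name, value in demo_metrics.items():
    print(f"{name}: {value:.4f}")
```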

### Cross-Validation

In [None]:
from mkyz import cross_validate, CVStrategy

results = cross_validate(
    model, X_train, y_train,
    cv=CVStrategy.STRATIFIED,
    n_splits=5
)

print(f"Mean accuracy: {results['mean_test_score']:.4f}")
print(f"Std: {results['std_test_score']:.4f}")
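
The idea behind a stratified strategy is that every fold keeps roughly the same class proportions as the full dataset. The sketch below shows one simple way to build such folds with the standard library; it illustrates the concept only and is not how MKYZ or scikit-learn implement it. `stratified_folds` and `train_test_indices` are hypothetical helpers, and no shuffling is done.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=5):
    """Assign sample indices to folds, round-robin within each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def train_test_indices(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


# Imbalanced toy labels: 10 negatives and 5 positives.
labels = [0] * 10 + [1] * 5
folds = stratified_folds(labels, n_splits=5)

for train_idx, test_idx in train_test_indices(folds):
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

With plain (non-stratified) splitting on sorted or imbalanced labels, a fold could end up with no positives at all, which makes its score unreliable.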

## 6. Save Your Model

In [None]:
import os

# Create the output directory if it does not exist yet
os.makedirs('models', exist_ok=True)

# Save with metadata

mkyz.save_model(
    model,
    'models/my_model',
    format='joblib',
    metadata={'accuracy': 0.95, 'version': '1.0'}  # example values; record your own results
)

## 7. Load and Use Later

In [None]:
# Load the model
loaded_model, metadata = mkyz.load_model(
    'models/my_model.joblib',
    return_metadata=True
)

print(f"Model metadata: {metadata}")

# Make predictions with the loaded model through its scikit-learn-style predict method
new_predictions = loaded_model.predict(X_test)
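
To illustrate the general pattern of storing metadata next to a serialized model, here is a standard-library round trip that uses `pickle` for the object and a JSON sidecar file for the metadata. MKYZ's actual on-disk layout is not described here and may differ; `save_with_metadata`, `load_with_metadata`, and the toy dictionary standing in for a model are hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_with_metadata(obj, path, metadata):
    """Pickle obj to path and write metadata to a .json file beside it."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(obj))
    path.with_suffix(".json").write_text(json.dumps(metadata, indent=2))


def load_with_metadata(path):
    """Load the pickled object and its JSON sidecar metadata."""
    path = Path(path)
    obj = pickle.loads(path.read_bytes())
    metadata = json.loads(path.with_suffix(".json").read_text())
    return obj, metadata


# Round trip in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "models" / "toy_model.pkl"
    save_with_metadata({"weights": [0.1, 0.2]}, target, {"version": "1.0"})
    restored, meta = load_with_metadata(target)

print(restored, meta)
```

Pickle-based formats (including joblib) can execute arbitrary code when loaded, so only load model files from sources you trust.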

## 8. Generate a Report

In [None]:
import os

from mkyz import ModelReport

# Create comprehensive report
report = ModelReport(
    model=model,
    X_test=X_test,
    y_test=y_test,
    task='classification',
    model_name='Random Forest Classifier'
)

# Generate report
report.generate()

# Print summary
print(report.summary())

# Export to HTML
os.makedirs('reports', exist_ok=True)
report.export_html('reports/model_report.html')

## 9. Visualize Results

In [None]:
# Plot feature distributions, correlations, and box plots
mkyz.visualize(data, plot_type='histogram')
mkyz.visualize(data, plot_type='correlation')
mkyz.visualize(data, plot_type='boxplot')

---

## Complete Example

Here is a complete end-to-end example using the Titanic dataset.

In [None]:
import mkyz

# 1. Prepare data
data = mkyz.prepare_data('data/titanic.csv', target_column='Survived')
X_train, X_test, y_train, y_test, df, target, num_cols, cat_cols = data

# 2. Auto-train and find best model
best_model = mkyz.auto_train(data, task='classification')

# 3. Evaluate with cross-validation
results = mkyz.cross_validate(best_model, X_train, y_train, cv=mkyz.CVStrategy.STRATIFIED)
print(f"CV Accuracy: {results['mean_test_score']:.4f}")

# 4. Generate report
report = mkyz.ModelReport(best_model, X_test, y_test, task='classification')
report.generate()
report.export_html('titanic_report.html')

# 5. Save model
mkyz.save_model(best_model, 'titanic_model')

print("Done! Check titanic_report.html for results.")