# py-iku: Basic Usage Guide

This notebook demonstrates the core functionality of py-iku: converting Python data-processing code to Dataiku DSS flows.

## What is py-iku?

py-iku is a Python library that:
- Converts pandas, NumPy, and scikit-learn code to Dataiku DSS recipes
- Generates visual flow diagrams
- Exports directly to Dataiku DSS projects

## Installation

```bash
pip install py2dataiku
```

In [None]:
# Import the library (the package is installed and imported as py2dataiku)
from py2dataiku import convert, Py2Dataiku

## 1. Simple Conversion

Let's start with a simple pandas pipeline that reads data, cleans it, and saves the result.

In [None]:
# Define some pandas code
pandas_code = '''
import pandas as pd

# Load raw data
df = pd.read_csv('customers.csv')

# Clean the data
df['name'] = df['name'].str.strip().str.title()
df['email'] = df['email'].str.lower()
df = df.dropna(subset=['customer_id'])

# Save cleaned data
df.to_csv('customers_cleaned.csv', index=False)
'''

# Convert to Dataiku flow
flow = convert(pandas_code)

print(f"Flow: {flow.name}")
print(f"Datasets: {len(flow.datasets)}")
print(f"Recipes: {len(flow.recipes)}")

In [None]:
# Get a text summary of the flow
print(flow.get_summary())
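
As a rough illustration of the kind of static analysis a converter like this depends on, the standard-library `ast` module can list the CSV files a snippet reads and writes without running it. This is a minimal sketch, not py-iku's implementation: `find_csv_io` is a hypothetical helper, and `sample_code` is a shortened copy of `pandas_code`.

```python
import ast

# Shortened copy of the pipeline above, kept as a string so it is never executed.
sample_code = '''
import pandas as pd
df = pd.read_csv('customers.csv')
df = df.dropna(subset=['customer_id'])
df.to_csv('customers_cleaned.csv', index=False)
'''


def find_csv_io(source):
    """Return (inputs, outputs): literal paths passed to read_csv / to_csv calls."""
    inputs, outputs = [], []
    for node in ast.walk(ast.parse(source)):
        # Only method-style calls such as pd.read_csv(...) or df.to_csv(...).
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        # Only calls whose first positional argument is a string literal.
        if not node.args or not isinstance(node.args[0], ast.Constant):
            continue
        path = node.args[0].value
        if not isinstance(path, str):
            continue
        if node.func.attr == "read_csv":
            inputs.append(path)
        elif node.func.attr == "to_csv":
            outputs.append(path)
    return inputs, outputs


inputs, outputs = find_csv_io(sample_code)
print("Reads:", inputs)
print("Writes:", outputs)
```

Comparing these lists with the dataset count printed above is a quick sanity check that the conversion picked up every file the code touches.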

## 2. Visualizing Flows

py-iku can render a flow in several formats. The cells below show ASCII, Mermaid, and HTML output.

In [None]:
# ASCII visualization (great for terminals)
print(flow.visualize(format='ascii'))

In [None]:
# Mermaid diagram (for documentation)
print(flow.visualize(format='mermaid'))
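
For readers unfamiliar with Mermaid, the sketch below builds a flowchart definition by hand from a list of edges. It only illustrates the syntax: the helper `edges_to_mermaid` and the node names are hypothetical, and the text py-iku actually emits may be structured differently.

```python
def edges_to_mermaid(edges, direction="LR"):
    """Render (source, target) pairs as a Mermaid flowchart definition."""
    lines = [f"flowchart {direction}"]
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical node names for the cleaning pipeline from section 1.
edges = [
    ("customers", "prepare_customers"),
    ("prepare_customers", "customers_cleaned"),
]

diagram = edges_to_mermaid(edges)
print(diagram)
```

Pasting text of this shape into a fenced `mermaid` block renders it as a diagram in Markdown viewers that support Mermaid.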

In [None]:
# HTML visualization (interactive)
from IPython.display import HTML
HTML(flow.visualize(format='html'))

## 3. Complex Pipeline Example

Let's convert a more complex pipeline with joins, grouping, and filtering.

In [None]:
complex_code = '''
import pandas as pd

# Load multiple data sources
orders = pd.read_csv('orders.csv')
customers = pd.read_csv('customers.csv')
products = pd.read_csv('products.csv')

# Join orders with customers
orders_enriched = pd.merge(orders, customers, on='customer_id', how='left')

# Join with products
orders_full = pd.merge(orders_enriched, products, on='product_id', how='left')

# Filter to high-value orders
high_value = orders_full[orders_full['amount'] >= 100]

# Aggregate by customer
customer_summary = high_value.groupby('customer_id').agg({
    'amount': ['sum', 'mean', 'count'],
    'order_date': 'max'
}).reset_index()

# Save results
customer_summary.to_csv('customer_summary.csv', index=False)
'''

flow = convert(complex_code)
print(flow.get_summary())
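
One detail of the aggregation step is easier to see on actual data: passing a list of functions to `agg` produces two-level column labels, and `reset_index()` does not flatten them. The sketch below, which assumes pandas is installed, runs the same filter and aggregation on a small made-up DataFrame and flattens the labels in plain pandas. The values are invented for illustration, and how py-iku names the resulting columns is not covered here.

```python
import pandas as pd

# Made-up orders, just enough rows to exercise the filter and the group-by.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [120.0, 80.0, 300.0],
    "order_date": ["2024-01-05", "2024-02-01", "2024-01-20"],
})

# Same steps as in complex_code: keep orders of at least 100, then aggregate.
high_value = orders[orders["amount"] >= 100]
summary = high_value.groupby("customer_id").agg({
    "amount": ["sum", "mean", "count"],
    "order_date": "max",
}).reset_index()

# Columns are now tuples such as ("amount", "sum"); join the non-empty parts.
summary.columns = ["_".join(part for part in col if part) for col in summary.columns]

print(summary)
```

Flattening before `to_csv` keeps the output file to a single header row, which is usually easier to map onto a dataset schema.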

In [None]:
# View the flow diagram
print(flow.visualize(format='ascii'))

## 4. Examining Recipes

Each recipe in the flow exposes its name, type, inputs, and outputs. Let's list them.

In [None]:
# List all recipes
for recipe in flow.recipes:
    print(f"\nRecipe: {recipe.name}")
    print(f"  Type: {recipe.recipe_type.value}")
    print(f"  Inputs: {[d.name for d in recipe.inputs]}")
    print(f"  Outputs: {[d.name for d in recipe.outputs]}")
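
For larger flows, a count per recipe type can be easier to scan than the full listing. The sketch below relies only on the `recipe_type.value` attribute used in the loop above. The helper `count_recipe_types`, the `FakeRecipeType` enum, and the stand-in recipes are hypothetical and exist only so the example runs without py-iku; with a real flow you would pass `flow.recipes` instead.

```python
from collections import Counter
from enum import Enum
from types import SimpleNamespace


def count_recipe_types(recipes):
    """Count recipes by the string value of their recipe_type."""
    return Counter(recipe.recipe_type.value for recipe in recipes)


# Stand-ins for illustration only; these are not py-iku classes or values.
class FakeRecipeType(Enum):
    JOIN = "join"
    GROUPING = "grouping"


fake_recipes = [
    SimpleNamespace(name="join_1", recipe_type=FakeRecipeType.JOIN),
    SimpleNamespace(name="join_2", recipe_type=FakeRecipeType.JOIN),
    SimpleNamespace(name="group_1", recipe_type=FakeRecipeType.GROUPING),
]

type_counts = count_recipe_types(fake_recipes)
print(dict(type_counts))
```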

## 5. Export Formats

Export the flow as JSON or YAML.

In [None]:
# Export as JSON via the flow's dictionary form (first 1,000 characters only)
import json
print(json.dumps(flow.to_dict(), indent=2)[:1000] + '...')

In [None]:
# Export as YAML (first 1,000 characters only)
print(flow.to_yaml()[:1000] + '...')

## 6. Using the Py2Dataiku Class

For more control, use the `Py2Dataiku` class directly.

In [None]:
# Create converter instance
converter = Py2Dataiku()

# Convert the simple pipeline defined in section 1
flow = converter.convert(pandas_code)

# Validate the flow
is_valid, errors = flow.validate()
print(f"Flow valid: {is_valid}")
if errors:
    print(f"Errors: {errors}")

## Next Steps

- See `02_numpy_operations.ipynb` for NumPy support
- See `03_sklearn_pipelines.ipynb` for scikit-learn support
- See `04_visualizations.ipynb` for visualization options
- See `05_advanced_features.ipynb` for plugins and DSS export