## PART 1: The Problem - Hardcoded Nightmare

```python
# my_training_script.py - How most people start

def train_model():
    """Train a machine learning model"""
    
    # Configuration
    learning_rate = 0.01
    epochs = 100
    batch_size = 32
    model_type = "RandomForest"
    data_path = "data/train.csv"
    output_path = "models/model_v1.pkl"
    
    print(f"Training {model_type}...")
    print(f"  Learning rate: {learning_rate}")
    print(f"  Epochs: {epochs}")
    print(f"  Batch size: {batch_size}")
    print(f"  Data: {data_path}")
    print(f"  Output: {output_path}")
    
    # Simulate training
    import time
    time.sleep(2)
    
    print(f"✅ Model saved to {output_path}")
    return f"{model_type} trained successfully"

# Run it
result = train_model()
print(result)
```

**OUTPUT:**

```
Training RandomForest...
  Learning rate: 0.01
  Epochs: 100
  Batch size: 32
  Data: data/train.csv
  Output: models/model_v1.pkl
✅ Model saved to models/model_v1.pkl
RandomForest trained successfully
```


**This works fine. But can we run it with `learning_rate=0.001` and `epochs=200`? Not without editing the source code. And if we want to try 10 different learning rates, we have to edit the source code 10 times.**

Hardcoded settings cause bigger problems too:

1. ❌ **Version control nightmare** - Git shows changes every time you edit config
2. ❌ **Can't reproduce** - What were the exact settings for experiment_v3?
3. ❌ **File explosion** - 50 experiments = 50 files?
4. ❌ **Team chaos** - Everyone edits the same file differently
5. ❌ **No automation** - Can't run 100 experiments overnight
6. ❌ **Production deployment** - How do staging and prod use different configs?


```python
# Your laptop
data_path = "data/train.csv"  # Works!

# Your teammate's laptop
data_path = "C:/Users/John/data/train.csv"  # Different path!

# Production server
data_path = "/mnt/s3-bucket/production/train.csv"  # Completely different!

# Testing environment
data_path = "/tmp/test_data.csv"  # Another path!
```

Every environment needs different values. Are we going to maintain 4 copies of the same script? And what happens when we have 20 scripts? That's 80 files!

This is why professional software engineers use **Command-Line Interfaces (CLI)**.

## PART 2: The Concept - What is a CLI?


```bash
# You've seen these before:
git commit -m "Added new feature"
python script.py
pip install pandas
docker run -p 8080:80 nginx
ls -la
cd /home/user
```

What do these commands have in common?

They all follow the same pattern:
```
command [options] [arguments]
```

For example:
```bash
git commit -m "Added new feature"
│   │      │  └─ Argument (the message)
│   │      └─ Option flag (-m means 'message')
│   └─ Subcommand (commit)
└─ Main command (git)
```

This is a **Command-Line Interface** - a way to control programs through text commands with parameters.

### Python Scripts Can Be CLIs Too

Our Python scripts can work the same way. Instead of editing code, you pass parameters...


```bash
# Instead of editing the file:
# learning_rate = 0.01  # Edit this line

# You run:
python train.py --learning-rate 0.01

# Different experiment? Just change the command:
python train.py --learning-rate 0.001
python train.py --learning-rate 0.05 --epochs 200
python train.py --model-type LogisticRegression --batch-size 64
```

The difference:
- ✅ No file editing
- ✅ Clear what's being changed
- ✅ Easy to reproduce (copy the command)
- ✅ Can be scripted (loop through 100 settings)
- ✅ Different configs for different environments
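
To make the "can be scripted" point concrete, here is a minimal sketch of a parameter sweep. The sweep values are made up for illustration, and `train.py` is the hypothetical script from the commands above. The loop only builds and prints the commands; the commented `subprocess.run` line shows where you would actually launch them.

```python
import itertools
import shlex

# Hypothetical sweep values, chosen only for illustration
learning_rates = [0.1, 0.01, 0.001]
epoch_counts = [100, 200]

commands = []
for lr, epochs in itertools.product(learning_rates, epoch_counts):
    # Each command is a list of strings, the form subprocess.run() expects
    cmd = ["python", "train.py", "--learning-rate", str(lr), "--epochs", str(epochs)]
    commands.append(cmd)

for cmd in commands:
    # shlex.join (Python 3.8+) renders the list as a copy-pasteable shell command
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Three learning rates times two epoch counts gives six runs, and none of them requires touching the training script.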

## PART 3: `sys.argv`

### The Simplest CLI

Let's build this incrementally. Python has a built-in way to access command-line arguments...

**CREATE A NEW FILE - save as `simple_cli.py`:**

```python
import sys

print("Script name:", sys.argv[0])
print("All arguments:", sys.argv)
print("Number of arguments:", len(sys.argv))

if len(sys.argv) > 1:
    print("First argument:", sys.argv[1])
if len(sys.argv) > 2:
    print("Second argument:", sys.argv[2])
```

**RUN IN TERMINAL:**

```bash
python simple_cli.py
# Output:
# Script name: simple_cli.py
# All arguments: ['simple_cli.py']
# Number of arguments: 1

python simple_cli.py hello world
# Output:
# Script name: simple_cli.py
# All arguments: ['simple_cli.py', 'hello', 'world']
# Number of arguments: 3
# First argument: hello
# Second argument: world
```

`sys.argv` is a list of strings:
- `sys.argv[0]` - The script name, exactly as you typed it on the command line
- `sys.argv[1]` - First argument you pass
- `sys.argv[2]` - Second argument you pass
- And so on...
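
Because every element is a string, numbers have to be converted before you can do math with them. The sketch below uses `shlex.split`, which approximates how a POSIX shell splits a command line into words, to build an `argv`-like list without leaving Python. The command line itself is just an example.

```python
import shlex

# shlex.split approximates how a POSIX shell splits a command line into words
command_line = 'simple_cli.py hello "two words" 42'
argv = shlex.split(command_line)

print(argv)
# Every element is a str, including the 42
print([type(arg).__name__ for arg in argv])

# Numeric values must be converted explicitly before use
number = int(argv[3])
print(number + 1)
```

Note that the quoted `"two words"` arrives as a single argument, which is how you pass values containing spaces.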

### Building a Training Script with sys.argv

Let's use this to make our training script configurable...

**CREATE `train_simple.py`:**

```python
import sys

# Get arguments
learning_rate = float(sys.argv[1])  # First argument
epochs = int(sys.argv[2])           # Second argument

print(f"Training with:")
print(f"  Learning rate: {learning_rate}")
print(f"  Epochs: {epochs}")

# Simulate training
import time
time.sleep(1)
print("✅ Training complete!")
```

**RUN IT:**

```bash
python train_simple.py 0.01 100
# Training with:
#   Learning rate: 0.01
#   Epochs: 100
# ✅ Training complete!

python train_simple.py 0.001 200
# Training with:
#   Learning rate: 0.001
#   Epochs: 200
# ✅ Training complete!
```

**SAY:**  
"Great! Now we can change settings without editing files!"



### Problem with this approach


```bash
# What if we forget an argument?
python train_simple.py 0.01
# IndexError: list index out of range

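# What if we pass them in the wrong order?
python train_simple.py 100 0.01
# ValueError: invalid literal for int() with base 10: '0.01'
# (and a swap where both values still parse, e.g. "5 100", runs silently with the wrong settings)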

# What if we pass wrong type?
python train_simple.py hello world
# ValueError: could not convert string to float: 'hello'

# How do we know what arguments it needs?
python train_simple.py --help
# Still just crashes!
# ValueError: could not convert string to float: '--help'
```

This approach is:
- ❌ **Fragile** - Crashes on any mistake
- ❌ **Confusing** - What's the order? What's required?
- ❌ **No help** - No way to see usage
- ❌ **No defaults** - Must always provide every parameter
- ❌ **Hard to extend** - Adding parameters breaks existing usage
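
To see how much work it takes to fix even the first two problems by hand, here is a rough sketch of manual validation. The function name `parse_training_args` and the error wording are invented for illustration. It takes the argument list as a parameter so it can be tried without a terminal.

```python
def parse_training_args(argv):
    """Validate [script, learning_rate, epochs] by hand and return (lr, epochs).

    Raises ValueError with a readable message instead of a bare IndexError
    or an unhelpful conversion error.
    """
    usage = f"usage: {argv[0]} LEARNING_RATE EPOCHS"
    if len(argv) != 3:
        raise ValueError(f"expected 2 arguments, got {len(argv) - 1}\n{usage}")
    try:
        learning_rate = float(argv[1])
    except ValueError:
        raise ValueError(f"LEARNING_RATE must be a number, got {argv[1]!r}\n{usage}") from None
    try:
        epochs = int(argv[2])
    except ValueError:
        raise ValueError(f"EPOCHS must be an integer, got {argv[2]!r}\n{usage}") from None
    return learning_rate, epochs


print(parse_training_args(["train_simple.py", "0.01", "100"]))
```

That is about twenty lines for two positional arguments, and it still has no defaults, no `--help`, and no named options.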

This is where `argparse` saves the day!


## PART 4: `argparse` - The Professional Way

### The Basic Pattern

Python's `argparse` module solves all these problems. Let's rebuild our script properly...

```python
import argparse

# Step 1: Create a parser
parser = argparse.ArgumentParser(description='Train a machine learning model')

# Step 2: Add arguments
parser.add_argument('--learning-rate', type=float, required=True)
parser.add_argument('--epochs', type=int, required=True)

# Step 3: Parse the arguments
args = parser.parse_args()

# Step 4: Use them
print(f"Training with:")
print(f"  Learning rate: {args.learning_rate}")
print(f"  Epochs: {args.epochs}")
```

**SAVE AS `train_argparse_v1.py` AND RUN:**

```bash
python train_argparse_v1.py --learning-rate 0.01 --epochs 100
# Training with:
#   Learning rate: 0.01
#   Epochs: 100
```


Four simple steps:
1. **Create parser** - Set up the CLI framework
2. **Add arguments** - Define what parameters you accept
3. **Parse** - Let argparse read sys.argv and validate
4. **Use** - Access values through the args object

Notice: We use `--learning-rate` (with dashes) but access it as `args.learning_rate` (with underscores). Argparse converts automatically!
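
A handy trick when experimenting, especially inside a notebook where `sys.argv` belongs to the kernel: `parse_args()` also accepts an explicit list of strings. The sketch below reuses the same two arguments and shows the dash-to-underscore conversion.

```python
import argparse

parser = argparse.ArgumentParser(description="Train a machine learning model")
parser.add_argument("--learning-rate", type=float, required=True)
parser.add_argument("--epochs", type=int, required=True)

# Passing a list makes argparse read from it instead of sys.argv
args = parser.parse_args(["--learning-rate", "0.01", "--epochs", "100"])

# --learning-rate on the command line becomes args.learning_rate in code
print(args.learning_rate)

# vars() turns the namespace into a plain dict, useful for logging a run's settings
print(vars(args))
```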

### Free Features



```bash
# 1. Automatic help message!
python train_argparse_v1.py --help

# Output:
# usage: train_argparse_v1.py [-h] --learning-rate LEARNING_RATE --epochs EPOCHS
#
# Train a machine learning model
#
# options:   (shown as "optional arguments:" before Python 3.10)
#   -h, --help            show this help message and exit
#   --learning-rate LEARNING_RATE
#   --epochs EPOCHS
```

**Free documentation! Argparse generates this from our code!**

```bash
# 2. Type validation!
python train_argparse_v1.py --learning-rate hello --epochs 100
# error: argument --learning-rate: invalid float value: 'hello'
```

**Automatic type checking! You specified `type=float`, argparse enforces it!**


```bash
# 3. Required argument checking!
python train_argparse_v1.py --learning-rate 0.01
# error: the following arguments are required: --epochs
```

**Validates required arguments automatically!**


```bash
# 4. Unknown argument detection!
python train_argparse_v1.py --learning-rate 0.01 --epochs 100 --typo 123
# error: unrecognized arguments: --typo 123
```

**Catches typos and mistakes!**
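
When argparse rejects the input, it prints the usage message to stderr and exits the program with status code 2. The sketch below catches that exit so the behavior can be observed (or unit-tested) without killing the interpreter. The helper name `try_parse` is made up for this example.

```python
import argparse

parser = argparse.ArgumentParser(prog="train_argparse_v1.py")
parser.add_argument("--learning-rate", type=float, required=True)
parser.add_argument("--epochs", type=int, required=True)


def try_parse(argv):
    """Return the parsed namespace, or the exit code if argparse rejects argv."""
    try:
        return parser.parse_args(argv)
    except SystemExit as exc:
        return exc.code


ok = try_parse(["--learning-rate", "0.01", "--epochs", "100"])
bad_type = try_parse(["--learning-rate", "hello", "--epochs", "100"])
missing = try_parse(["--learning-rate", "0.01"])

print(ok)
print(bad_type, missing)
```

A non-zero exit code matters in automation: a shell script or CI job that launches the training run can detect the failure and stop.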


### Making It Production-Grade

Let's make it production-grade with defaults, help messages, and optional parameters...


**CREATE `train_argparse_v2.py`:**

```python
import argparse

parser = argparse.ArgumentParser(
    description='Train a machine learning model',
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog='''
Examples:
  python train_argparse_v2.py --learning-rate 0.01 --epochs 100
  python train_argparse_v2.py --lr 0.001 -e 200 --model-type NeuralNetwork
  python train_argparse_v2.py --help
    '''
)

# Required arguments
parser.add_argument(
    '--learning-rate', '--lr',  # Two names for same argument!
    type=float,
    required=True,
    help='Learning rate for optimization (e.g., 0.01)'
)

parser.add_argument(
    '--epochs', '-e',  # Short version with single dash
    type=int,
    required=True,
    help='Number of training epochs (e.g., 100)'
)

# Optional arguments with defaults
parser.add_argument(
    '--batch-size', '-b',
    type=int,
    default=32,  # Default value!
    help='Batch size for training (default: 32)'
)

parser.add_argument(
    '--model-type',
    type=str,
    choices=['RandomForest', 'LogisticRegression', 'NeuralNetwork'],  # Only these allowed!
    default='RandomForest',
    help='Type of model to train (default: RandomForest)'
)

parser.add_argument(
    '--output',
    type=str,
    default='models/model.pkl',
    help='Path to save the trained model (default: models/model.pkl)'
)

# Boolean flag (on/off)
parser.add_argument(
    '--verbose', '-v',
    action='store_true',  # Flag: present = True, absent = False
    help='Enable verbose output'
)

# Parse arguments
args = parser.parse_args()

# Use them
print(f"Training configuration:")
print(f"  Learning rate: {args.learning_rate}")
print(f"  Epochs: {args.epochs}")
print(f"  Batch size: {args.batch_size}")
print(f"  Model type: {args.model_type}")
print(f"  Output path: {args.output}")
print(f"  Verbose: {args.verbose}")

if args.verbose:
    print("\n🔍 Verbose mode enabled - showing detailed logs...")

print("\n🚀 Starting training...")
import time
time.sleep(1)
print(f"✅ Model saved to {args.output}")
```


**TEST IT:**

```bash
# Minimal required arguments
python train_argparse_v2.py --lr 0.01 --epochs 100
# Uses defaults for batch-size, model-type, output

# Full control
python train_argparse_v2.py --lr 0.001 -e 200 -b 64 --model-type NeuralNetwork --output models/nn_v1.pkl --verbose

# See the beautiful help
python train_argparse_v2.py --help

# Try invalid model type
python train_argparse_v2.py --lr 0.01 --epochs 100 --model-type InvalidModel
# error: argument --model-type: invalid choice: 'InvalidModel' 
#        (choose from 'RandomForest', 'LogisticRegression', 'NeuralNetwork')
```


**1. Multiple Names:**
```python
'--learning-rate', '--lr'  # Long and short versions
```
Users can use either `--learning-rate` or `--lr` - both work!

**2. Default Values:**
```python
default=32
```
If not specified, uses 32. Optional arguments should have defaults!

**3. Choices:**
```python
choices=['RandomForest', 'LogisticRegression', 'NeuralNetwork']
```
Only these values are allowed - automatic validation!

**4. Boolean Flags:**
```python
action='store_true'
```
Present = True, absent = False. No value needed: `--verbose` not `--verbose True`

**5. Help Messages:**
```python
help='Learning rate for optimization (e.g., 0.01)'
```
Shows in --help. Be descriptive and include examples!
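
Writing `(default: 32)` by hand in every help string works, but it can drift out of sync when a default changes. One alternative is argparse's built-in `ArgumentDefaultsHelpFormatter`, which appends the default to each help string automatically. This is a sketch of that option, not how the script above is written.

```python
import argparse

parser = argparse.ArgumentParser(
    prog="train_argparse_v2.py",
    # Appends "(default: ...)" to every argument that has a help string
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument("--batch-size", "-b", type=int, default=32,
                    help="Batch size for training")
parser.add_argument("--output", default="models/model.pkl",
                    help="Model output path")

# format_help() returns the same text that --help would print
help_text = parser.format_help()
print(help_text)
```

The trade-off: `formatter_class` takes a single class, so this replaces `RawDescriptionHelpFormatter` unless you combine the two yourself.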


## PART 5: Real MLOps Patterns

### Pattern 1: Subcommands (Like Git)

Professional tools use subcommands. Git has `git commit`, `git push`, `git pull`. Let's build that...

```python
import argparse

# Main parser
parser = argparse.ArgumentParser(description='MLOps CLI Tool')
subparsers = parser.add_subparsers(dest='command', help='Available commands')

# Subcommand: train
train_parser = subparsers.add_parser('train', help='Train a model')
train_parser.add_argument('--lr', type=float, required=True)
train_parser.add_argument('--epochs', type=int, required=True)
train_parser.add_argument('--model', choices=['rf', 'lr', 'nn'], default='rf')

# Subcommand: predict
predict_parser = subparsers.add_parser('predict', help='Make predictions')
predict_parser.add_argument('--model-path', required=True)
predict_parser.add_argument('--input', required=True)
predict_parser.add_argument('--output', default='predictions.csv')

# Subcommand: evaluate
eval_parser = subparsers.add_parser('evaluate', help='Evaluate model')
eval_parser.add_argument('--model-path', required=True)
eval_parser.add_argument('--test-data', required=True)
eval_parser.add_argument('--metrics', nargs='+', default=['accuracy'])

# Parse
args = parser.parse_args()

# Route to appropriate handler
if args.command == 'train':
    print(f"🎯 Training {args.model} model with lr={args.lr}, epochs={args.epochs}")
    
elif args.command == 'predict':
    print(f"🔮 Making predictions: {args.model_path} -> {args.output}")
    
elif args.command == 'evaluate':
    print(f"📊 Evaluating with metrics: {args.metrics}")
    
else:
    parser.print_help()
```

**SAVE AS `mlops_cli.py` AND TEST:**

```bash
# See available commands
python mlops_cli.py --help

# Train command
python mlops_cli.py train --lr 0.01 --epochs 100 --model rf

# Predict command
python mlops_cli.py predict --model-path models/model.pkl --input data.csv

# Evaluate command
python mlops_cli.py evaluate --model-path models/model.pkl --test-data test.csv --metrics accuracy f1 precision

# Each subcommand has its own help
python mlops_cli.py train --help
python mlops_cli.py predict --help
```

This is how professional tools are structured! Same script, multiple commands, each with its own options.
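
As the number of subcommands grows, the `if/elif` chain gets long. A common alternative, described in the argparse documentation, is to attach a handler function to each subparser with `set_defaults(func=...)`. The handler names `cmd_train` and `cmd_predict` below are invented for this sketch, and the handlers only return strings instead of doing real work.

```python
import argparse


def cmd_train(args):
    return f"train: model={args.model} lr={args.lr} epochs={args.epochs}"


def cmd_predict(args):
    return f"predict: {args.model_path} -> {args.output}"


def build_parser():
    parser = argparse.ArgumentParser(prog="mlops_cli.py", description="MLOps CLI Tool")
    # required=True (Python 3.7+) makes argparse report an error if no subcommand is given
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="Train a model")
    train.add_argument("--lr", type=float, required=True)
    train.add_argument("--epochs", type=int, required=True)
    train.add_argument("--model", choices=["rf", "lr", "nn"], default="rf")
    train.set_defaults(func=cmd_train)  # handler travels with the parsed args

    predict = subparsers.add_parser("predict", help="Make predictions")
    predict.add_argument("--model-path", required=True)
    predict.add_argument("--output", default="predictions.csv")
    predict.set_defaults(func=cmd_predict)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    return args.func(args)  # dispatch without an if/elif chain


print(main(["train", "--lr", "0.01", "--epochs", "100"]))
```

Adding a new subcommand then means adding one function and one subparser, with no changes to the dispatch code.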


### Pattern 2: Configuration Files + CLI Override

In production, we often want defaults stored in a config file plus the ability to override them from the command line. Here's the pattern...

```python
import argparse
import json
from pathlib import Path

# Load default config from file
def load_config(config_path):
    if Path(config_path).exists():
        with open(config_path) as f:
            return json.load(f)
    return {}

parser = argparse.ArgumentParser()

# Config file argument
parser.add_argument(
    '--config',
    default='config.json',
    help='Path to configuration file'
)

# Individual overrides
parser.add_argument('--lr', type=float, help='Override learning rate')
parser.add_argument('--epochs', type=int, help='Override epochs')
parser.add_argument('--model', type=str, help='Override model type')

args = parser.parse_args()

# Load defaults from config file
config = load_config(args.config)

# Override with CLI arguments (if provided)
if args.lr is not None:
    config['learning_rate'] = args.lr
if args.epochs is not None:
    config['epochs'] = args.epochs
if args.model is not None:
    config['model_type'] = args.model

print("Final configuration:")
print(json.dumps(config, indent=2))
```

**SAVE THE SCRIPT AS `train_config.py` AND CREATE `config.json`:**
```json
{
  "learning_rate": 0.01,
  "epochs": 100,
  "model_type": "RandomForest",
  "batch_size": 32
}
```

**TEST:**
```bash
# Use config file defaults
python train_config.py

# Override specific values
python train_config.py --lr 0.001 --epochs 200

# Different config file
python train_config.py --config prod_config.json --lr 0.005
```

This is the production pattern:
1. **Config file** - Default settings for each environment
2. **CLI overrides** - Quick changes without editing files
3. **Priority**: CLI > Config File > Hardcoded Defaults
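
The script above only implements the first two layers. If the config file is missing, `config` starts out empty. Here is a minimal sketch of the full three-layer priority. The `HARDCODED_DEFAULTS` values and the `resolve_config` helper are assumptions made for illustration, and a plain dict stands in for the loaded JSON file.

```python
import argparse

# Hypothetical built-in defaults, used when neither the file nor the CLI sets a value
HARDCODED_DEFAULTS = {"learning_rate": 0.01, "epochs": 100, "batch_size": 32}


def resolve_config(file_config, cli_args):
    """Merge settings so that CLI > config file > hardcoded defaults."""
    config = dict(HARDCODED_DEFAULTS)   # lowest priority
    config.update(file_config)          # file values replace built-in ones
    # Only CLI values the user actually supplied (not None) may override
    config.update({key: value for key, value in cli_args.items() if value is not None})
    return config


parser = argparse.ArgumentParser()
# No default= here, so an argument that was not passed shows up as None
parser.add_argument("--lr", dest="learning_rate", type=float)
parser.add_argument("--epochs", type=int)

cli = vars(parser.parse_args(["--epochs", "200"]))
file_config = {"learning_rate": 0.005}  # stands in for json.load(...)

final = resolve_config(file_config, cli)
print(final)
```

Leaving `default=` off the override arguments is what makes this work: `None` means "the user did not pass this", so the lower layers stay in effect.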

### Pattern 3: Environment-Specific Defaults

Different environments need different defaults. Here's how professionals handle it...


```python
import argparse
import os

# Detect environment
ENV = os.getenv('ENVIRONMENT', 'development')

# Environment-specific defaults
DEFAULTS = {
    'development': {
        'data_path': 'data/dev/train.csv',
        'output_path': 'models/dev/',
        'batch_size': 16,
        'epochs': 10,
        'log_level': 'DEBUG'
    },
    'staging': {
        'data_path': 's3://staging-bucket/train.csv',
        'output_path': 's3://staging-bucket/models/',
        'batch_size': 32,
        'epochs': 50,
        'log_level': 'INFO'
    },
    'production': {
        'data_path': 's3://prod-bucket/train.csv',
        'output_path': 's3://prod-bucket/models/',
        'batch_size': 64,
        'epochs': 100,
        'log_level': 'WARNING'
    }
}

# Get defaults for current environment
defaults = DEFAULTS.get(ENV, DEFAULTS['development'])

parser = argparse.ArgumentParser(
    description=f'Training script (Environment: {ENV})'
)

parser.add_argument(
    '--data-path',
    default=defaults['data_path'],
    help=f"Data path (default for {ENV}: {defaults['data_path']})"
)

parser.add_argument(
    '--output',
    default=defaults['output_path'],
    help=f"Output path (default for {ENV}: {defaults['output_path']})"
)

parser.add_argument(
    '--batch-size',
    type=int,
    default=defaults['batch_size'],
    help=f"Batch size (default for {ENV}: {defaults['batch_size']})"
)

parser.add_argument(
    '--epochs',
    type=int,
    default=defaults['epochs'],
    help=f"Epochs (default for {ENV}: {defaults['epochs']})"
)

args = parser.parse_args()

print(f"🌍 Environment: {ENV}")
print(f"📊 Configuration:")
print(f"  Data: {args.data_path}")
print(f"  Output: {args.output}")
print(f"  Batch size: {args.batch_size}")
print(f"  Epochs: {args.epochs}")
```

**SAVE AS `train_env.py` AND TEST:**
```bash
# Development (default)
python train_env.py

# Staging
ENVIRONMENT=staging python train_env.py

# Production
ENVIRONMENT=production python train_env.py

# Override production defaults
ENVIRONMENT=production python train_env.py --epochs 150 --batch-size 128
```

**SAY:**  
"This is a common pattern for running the same code across environments:
- ✅ Same code, different environments
- ✅ Smart defaults per environment
- ✅ Can still override anything
- ✅ Environment variable controls behavior"
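
One thing to watch in the script above: `DEFAULTS.get(ENV, DEFAULTS['development'])` silently falls back to development settings when the name is not recognized, so a typo like `ENVIRONMENT=prod` would train with dev data. A stricter lookup is one possible safeguard. The sketch below uses a trimmed-down `DEFAULTS` table and a hypothetical helper named `defaults_for`.

```python
# Trimmed-down stand-in for the DEFAULTS table above
DEFAULTS = {
    "development": {"batch_size": 16, "epochs": 10},
    "staging": {"batch_size": 32, "epochs": 50},
    "production": {"batch_size": 64, "epochs": 100},
}


def defaults_for(env_name):
    """Return the defaults for env_name, rejecting unknown names loudly."""
    try:
        return DEFAULTS[env_name]
    except KeyError:
        known = ", ".join(sorted(DEFAULTS))
        raise ValueError(
            f"unknown ENVIRONMENT {env_name!r}; expected one of: {known}"
        ) from None


# In the real script the name would come from os.getenv('ENVIRONMENT', 'development')
print(defaults_for("staging"))
```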


## PART 6: Best Practices & Common Patterns

### Essential Tips

"Let me share some best practices learned the hard way..."

**1. Always Provide Help Messages**
```python
# ❌ Bad
parser.add_argument('--lr', type=float)

# ✅ Good
parser.add_argument(
    '--learning-rate', '--lr',
    type=float,
    required=True,
    help='Learning rate (e.g., 0.01, 0.001). Controls optimization step size.'
)
```

**2. Use Descriptive Long Names**
```python
# ❌ Bad
parser.add_argument('--n', type=int)  # What's n?

# ✅ Good
parser.add_argument(
    '--num-epochs', '--epochs', '-e',  # Long, medium, short
    type=int,
    help='Number of training epochs'
)
```

**3. Set Sensible Defaults**
```python
# ✅ Good
parser.add_argument('--batch-size', type=int, default=32)
parser.add_argument('--log-level', default='INFO', choices=['DEBUG', 'INFO', 'WARNING', 'ERROR'])
```

**4. Use Choices for Enums**
```python
# ✅ Good
parser.add_argument(
    '--model',
    choices=['rf', 'lr', 'svm', 'nn'],
    default='rf',
    help='Model type'
)
```

**5. Validate Argument Relationships**
```python
# Assumes the parser defines --model, --epochs, --batch-size, and --dataset-size
args = parser.parse_args()

# Validate relationships
if args.model == 'nn' and args.epochs < 10:
    parser.error("Neural networks need at least 10 epochs")

if args.batch_size > args.dataset_size:
    parser.error(f"Batch size ({args.batch_size}) can't exceed dataset size ({args.dataset_size})")
```
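
The snippet above is a fragment, so here is a self-contained version you can run. The argument set is an assumption pieced together from the earlier examples, and `--dataset-size` in particular is a hypothetical argument added only so the second check has something to compare against.

```python
import argparse


def parse_and_validate(argv):
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--model", choices=["rf", "lr", "nn"], default="rf")
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--batch-size", type=int, default=32)
    # Hypothetical argument, included only to illustrate a cross-argument check
    parser.add_argument("--dataset-size", type=int, default=1000)
    args = parser.parse_args(argv)

    # parser.error() prints usage plus the message, then exits with status 2
    if args.model == "nn" and args.epochs < 10:
        parser.error("Neural networks need at least 10 epochs")
    if args.batch_size > args.dataset_size:
        parser.error(
            f"Batch size ({args.batch_size}) can't exceed dataset size ({args.dataset_size})"
        )
    return args


args = parse_and_validate(["--model", "nn", "--epochs", "20"])
print(args)
```

Using `parser.error()` instead of raising your own exception keeps the failure output consistent with argparse's built-in checks.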


## PART 7: Bridge to Advanced Content

OPEN THE NOTEBOOK ARGPARSE SECTION:

**Topic 7: Professional CLI Interfaces for MLOps**

### Understanding the Production Code

Now let's look at the CLI interface in our notebook `Session-1_Python_Refresher_for_MLOps` and understand every piece...

```python
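import argparse

# get_mlops_logger and MLOpsFileHandler are assumed to be defined earlier in the notebook
class MLOpsCLI: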
    def __init__(self):
        self.logger = get_mlops_logger("cli")
        self.file_handler = MLOpsFileHandler()
    
    def create_parser(self):
        parser = argparse.ArgumentParser(
            description="MLOps Starter Kit - Professional ML Operations CLI",
            formatter_class=argparse.RawDescriptionHelpFormatter,
            epilog="""
Examples:
  python mlops_cli.py train --lr 0.01 --epochs 50
  python mlops_cli.py predict --model models/best.pkl --input data.csv
            """
        )
        
        # Global arguments
        parser.add_argument('--verbose', '-v', action='store_true')
        parser.add_argument('--quiet', '-q', action='store_true')
        
        # Subcommands
        subparsers = parser.add_subparsers(dest='command')
        
        # Train command
        train_parser = subparsers.add_parser('train', help='Train ML models')
        train_parser.add_argument('--model-type', choices=['rf', 'lr', 'both'])
        train_parser.add_argument('--lr', type=float, default=0.01)
        # ... more arguments
        
        return parser
```
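
One detail worth noticing about the global arguments: as written, nothing stops a user from passing both `--verbose` and `--quiet`. If that combination should be rejected, argparse offers `add_mutually_exclusive_group()`. The sketch below shows that option in isolation. It is not taken from the notebook.

```python
import argparse

parser = argparse.ArgumentParser(prog="mlops_cli.py")

# At most one argument from this group may appear on the command line
verbosity = parser.add_mutually_exclusive_group()
verbosity.add_argument("--verbose", "-v", action="store_true")
verbosity.add_argument("--quiet", "-q", action="store_true")

args = parser.parse_args(["--verbose"])
print(args.verbose, args.quiet)

# parser.parse_args(["-v", "-q"]) would print a usage error and exit with status 2
```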

**ArgumentParser:**
"The foundation - sets up the CLI with description and help formatting"

**formatter_class:**
"Controls how help is displayed - `RawDescriptionHelpFormatter` preserves your formatting in epilog"

**epilog:**
"Shows examples at the bottom of --help. Always include usage examples!"

**subparsers:**
"Enables git-style subcommands. Each subcommand can have its own arguments"

**action='store_true':**
"Boolean flags. Present = True, absent = False"

**choices:**
"Restricts values. Automatic validation and shows options in help"
