
# 📊 Simulation Input Preparation Summary (JaamSim)

## ✅ Objective
To analyse real-world process duration data from a manufacturing system and prepare **statistically valid, distribution-based parameters** for a **Discrete Event Simulation** using **JaamSim**.

---

## 📁 Initial Data Preparation

### 1. Filtered Relevant Columns
Kept only the columns needed for the analysis:
```python
analysis_data = raw_data[
    ['Key', 'Batch No.', 'Product Code', 'Description', 'Process/Step Name', 
     'Machine', 'Start Datetime', 'End Datetime', 'Process Duration (min)']
]
```

### 2. Renamed Columns for Simplicity
```python
filtered_data = analysis_data.rename(columns={
    'Process/Step Name': 'step_name',
    'Process Duration (min)': 'duration_min',
    'Product Code': 'product_code',
    'Machine': 'machine',
    'Batch No.': 'batch_no'
})
```

### 3. Derived Machine Group Prefixes
```python
filtered_data['machine_group'] = filtered_data['machine'].str.split('-').str[0]
```
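
To illustrate what the prefix extraction yields, here is a small sketch on made-up machine names (`CNC-01`, `PRESS-2`, and `OVEN` are hypothetical, not taken from the dataset). Names without a hyphen are kept whole.

```python
import pandas as pd

# Hypothetical machine names, for illustration only
demo = pd.DataFrame({'machine': ['CNC-01', 'CNC-02', 'PRESS-2', 'OVEN']})

# Same expression as above: keep the text before the first hyphen
demo['machine_group'] = demo['machine'].str.split('-').str[0]

groups = demo['machine_group'].tolist()
print(groups)  # ['CNC', 'CNC', 'PRESS', 'OVEN']
```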

---

## 📊 Exploratory Data Analysis

- Frequency counts via `value_counts()` to understand machine usage
- Histograms & KDE plots by machine group
- Applied **log-transforms** for skew reduction
- Normality tests (e.g., `normaltest`) on log-durations; most duration distributions were right-skewed
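
The log-transform and normality check could be sketched roughly as follows. The data here are synthetic (drawn from a lognormal with arbitrary parameters), and `scipy.stats.normaltest` (the D'Agostino-Pearson test) is assumed to be the `normaltest` referred to above.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed durations (arbitrary parameters, not real data)
rng = np.random.default_rng(0)
durations = rng.lognormal(mean=3.0, sigma=0.8, size=500)

log_durations = np.log(durations)  # requires strictly positive durations

# Skewness before and after the transform
skew_raw = stats.skew(durations)
skew_log = stats.skew(log_durations)

# D'Agostino-Pearson test; a small p-value rejects normality
_, p_raw = stats.normaltest(durations)
_, p_log = stats.normaltest(log_durations)

print(f"skew raw={skew_raw:.2f}, log={skew_log:.2f}")
print(f"normaltest p raw={p_raw:.3g}, log={p_log:.3g}")
```

On real data the same comparison would be run per machine group, for example inside a `groupby('machine_group')` loop.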

---

## 🔧 Simulation-Oriented Data Aggregation

### 4. Batch-Machine Aggregation
```python
summary_by_machine = (
    filtered_data
    .groupby(['batch_no', 'machine'])['duration_min']
    .sum()
    .reset_index()
)
```
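
As a toy illustration of this aggregation (the batch IDs, machine names, and durations below are invented), multiple rows for the same batch and machine collapse into a single total:

```python
import pandas as pd

# Invented rows: batch B1 visits machine M-1 twice
demo = pd.DataFrame({
    'batch_no': ['B1', 'B1', 'B1', 'B2'],
    'machine': ['M-1', 'M-1', 'M-2', 'M-1'],
    'duration_min': [10.0, 5.0, 7.5, 12.0],
})

summary = (
    demo
    .groupby(['batch_no', 'machine'])['duration_min']
    .sum()
    .reset_index()
)

# Lookup of total minutes keyed by (batch, machine)
totals = dict(zip(zip(summary['batch_no'], summary['machine']),
                  summary['duration_min']))
print(summary)
```

Because the durations are summed, the distributions fitted later describe the total time a batch spends on a machine, not the duration of an individual step.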

---

## 📐 Distribution Fitting Per Machine

### 5. Fit Candidate Distributions
Candidate distributions (from `scipy.stats`): `norm`, `lognorm`, `gamma`, `triang`, `expon`.

For each machine, the candidate with the highest **KS test** p-value was selected as the best fit.

```python
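from scipy import stats
dist = getattr(stats, dist_name)  # dist_name: one of the candidates above, e.g. 'lognorm'
params = dist.fit(data)  # data: durations for one machine; returns (shape(s), loc, scale)
stat, p = stats.kstest(data, dist_name, args=params)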
```
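
A minimal sketch of how the per-machine selection loop might look, assuming the "highest p-value wins" rule described above. The helper name `fit_best` and the synthetic sample are illustrative, not part of the original workflow. Note that KS p-values tend to be optimistic when the parameters are estimated from the same data, so they are better suited to ranking candidates than to formal acceptance.

```python
import numpy as np
from scipy import stats

CANDIDATES = ['norm', 'lognorm', 'gamma', 'triang', 'expon']

def fit_best(data, candidates=CANDIDATES):
    """Fit each candidate; return (name, params, p) with the highest KS p-value."""
    results = []
    for name in candidates:
        dist = getattr(stats, name)
        try:
            params = dist.fit(data)
            _, p = stats.kstest(data, name, args=params)
        except Exception:
            continue  # skip candidates whose fit fails
        if np.isfinite(p):
            results.append((name, params, p))
    if not results:
        raise ValueError("no candidate distribution could be fitted")
    return max(results, key=lambda r: r[2])

# Synthetic data for demonstration only
rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=15.0, size=300)
best_name, best_params, best_p = fit_best(sample)
print(best_name, best_p)
```

In practice this would be called once per machine, for example on each group of `summary_by_machine.groupby('machine')['duration_min']`.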

---

## 📄 JaamSim-Ready Parameter Extraction

### 6. Mapping Fitted Results to JaamSim Format

| Distribution | JaamSim Fields |
|--------------|----------------|
| **lognorm**  | Location, Scale, NormalMean = ln(scale), NormalSD = shape (SciPy's `s`) |
| **norm**     | Mean, StdDev   |
| **expon**    | Location, Scale |
| **gamma**    | Shape, Location, Scale |

✅ Final table included: machine name, distribution type, and JaamSim-compatible parameters.
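
The mapping in the table could be expressed as a small helper like the one below. This is a sketch: the function name `to_jaamsim` is hypothetical, and the field names are copied from the table rather than checked against JaamSim's actual keyword names. It relies on SciPy's parameter order, `(shape, loc, scale)` for `lognorm` and `gamma`, and `(loc, scale)` for `norm` and `expon`.

```python
import math

def to_jaamsim(dist_name, params):
    """Map SciPy fit output to the field names used in the table above."""
    if dist_name == 'lognorm':
        shape, loc, scale = params
        return {'Location': loc, 'Scale': scale,
                'NormalMean': math.log(scale), 'NormalSD': shape}
    if dist_name == 'norm':
        loc, scale = params  # SciPy: loc is the mean, scale the std. deviation
        return {'Mean': loc, 'StdDev': scale}
    if dist_name == 'expon':
        loc, scale = params
        return {'Location': loc, 'Scale': scale}
    if dist_name == 'gamma':
        shape, loc, scale = params
        return {'Shape': shape, 'Location': loc, 'Scale': scale}
    raise ValueError(f"no mapping defined for {dist_name!r}")

fields = to_jaamsim('lognorm', (0.5, 0.0, 20.0))
print(fields)
```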

---

## ⚠️ Validation

- Flagged issues:
  - Negative `loc`
  - `scale = 0` (→ log undefined)
  - Insufficient data
- Re-fitting or fallbacks suggested
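
The checks listed above could be automated along these lines. This is a sketch under assumptions: `validate_fit` is a hypothetical helper, and the minimum sample size of 30 is an arbitrary placeholder because the source does not state a threshold.

```python
import math

MIN_SAMPLES = 30  # hypothetical threshold; the source does not state one

def validate_fit(params, n_samples, min_samples=MIN_SAMPLES):
    """Return issue labels for one fit; params must end with (loc, scale)."""
    issues = []
    loc, scale = params[-2], params[-1]
    if n_samples < min_samples:
        issues.append('insufficient data')
    if loc < 0:
        issues.append('negative loc')
    if scale <= 0 or not math.isfinite(scale):
        issues.append('non-positive scale')  # log(scale) would be undefined
    return issues

flagged = validate_fit((0.5, -2.0, 0.0), n_samples=12)
print(flagged)  # ['insufficient data', 'negative loc', 'non-positive scale']
```

An empty list means none of the three checks fired; a non-empty list marks the machine for re-fitting or a fallback distribution.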

---

## 🧾 Final Output
A table ready for JaamSim input:
- Realistic, data-driven variation in process times
- Supports LogNormal, Exponential, Gamma, Normal
- Mapped directly to simulation blocks

---
