In [None]:
!pip install numpy pandas

# Synthetic Dataset for the XOR Truth Table



In [None]:
import numpy as np
import pandas as pd

# Set seed for reproducibility
np.random.seed(42)

# Generate 100 samples per input combination
data = []
for _ in range(100):
    # Input (0, 0) -> XOR output 0
    x1_00 = np.clip(np.random.normal(0, 0.1), -0.3, 0.3)
    x2_00 = np.clip(np.random.normal(0, 0.1), -0.3, 0.3)
    data.append([x1_00, x2_00, 0])

    # Input (0, 1) -> XOR output 1
    x1_01 = np.clip(np.random.normal(0, 0.1), -0.3, 0.3)
    x2_01 = np.clip(np.random.normal(1, 0.1), 0.7, 1.3)
    data.append([x1_01, x2_01, 1])

    # Input (1, 0) -> XOR output 1
    x1_10 = np.clip(np.random.normal(1, 0.1), 0.7, 1.3)
    x2_10 = np.clip(np.random.normal(0, 0.1), -0.3, 0.3)
    data.append([x1_10, x2_10, 1])

    # Input (1, 1) -> XOR output 0
    x1_11 = np.clip(np.random.normal(1, 0.1), 0.7, 1.3)
    x2_11 = np.clip(np.random.normal(1, 0.1), 0.7, 1.3)
    data.append([x1_11, x2_11, 0])

# Create DataFrame
df = pd.DataFrame(data, columns=['x1', 'x2', 'y'])

# Save to CSV
df.to_csv('xor_synthetic_dataset.csv', index=False)

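The loop above calls the legacy `np.random` functions once per value. A minimal vectorized sketch of the same construction is shown here, assuming NumPy's `Generator` API (`np.random.default_rng`). The helper name `make_xor_dataset` and its parameters are hypothetical, and because the random stream differs, it produces different numbers than the seeded loop even with the same seed. Clipping the offset to ±0.3 around each corner gives the same ranges as the loop's bounds.

```python
import numpy as np
import pandas as pd

def make_xor_dataset(n_per_corner=100, noise=0.1, clip=0.3, seed=42):
    """Hypothetical helper: noisy points around the four corners of the unit square.

    Uses NumPy's Generator API, so the values differ from the np.random loop.
    """
    rng = np.random.default_rng(seed)
    corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    labels = np.array([0, 1, 1, 0])  # XOR of each corner

    # Repeat each corner n times -> shape (4 * n_per_corner, 2)
    centers = np.repeat(corners, n_per_corner, axis=0)
    y = np.repeat(labels, n_per_corner)

    # Independent Gaussian noise per feature, clipped to +/- clip around the corner
    offsets = np.clip(rng.normal(0.0, noise, size=centers.shape), -clip, clip)
    X = centers + offsets
    return pd.DataFrame({'x1': X[:, 0], 'x2': X[:, 1], 'y': y})

df_vec = make_xor_dataset()
print(df_vec.shape)  # (400, 3)
print(df_vec.head())
```

The samples come out grouped by corner rather than interleaved as in the loop; shuffle them (for example with `df_vec.sample(frac=1, random_state=0)`) if the order matters for training.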
### Code snippet explanation

```python
x1_00 = np.clip(np.random.normal(0, 0.1), -0.3, 0.3)
x2_00 = np.clip(np.random.normal(0, 0.1), -0.3, 0.3)
data.append([x1_00, x2_00, 0])
```


#### 1. `np.random.normal(0, 0.1)`
- **Purpose**: Generates random values from a normal (Gaussian) distribution
- **Parameters**:
  - `0`: Mean (center) of the distribution
  - `0.1`: Standard deviation (spread) of the distribution
- **Why?**
  - We're simulating the (0,0) input case for XOR
  - The mean of 0 centers values around the ideal input (0)
  - A standard deviation of 0.1 adds small noise, so the model sees a spread of values around each input instead of memorizing four exact points
  - This mimics real-world data where inputs are rarely perfect 0s or 1s
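To see what the mean and standard deviation mean in practice, the sketch below draws many samples and checks their statistics. It assumes NumPy's `Generator` API (`np.random.default_rng`); the notebook uses the legacy `np.random.normal`, which takes the same `loc`/`scale` arguments. The last line estimates how often a draw lands outside ±0.3, which is how often the clipping step actually changes a value.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=0.1, size=100_000)

print(round(samples.mean(), 3))  # close to 0.0 (the mean)
print(round(samples.std(), 3))   # close to 0.1 (the standard deviation)

# Fraction of draws beyond +/-0.3 (three standard deviations);
# for a Gaussian this is roughly 0.27%
outside = (np.abs(samples) > 0.3).mean()
print(outside)
```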

#### 2. `np.clip(..., -0.3, 0.3)`
- **Purpose**: Constrains values within a specified range
- **Parameters**:
  - Lower bound: `-0.3`
  - Upper bound: `0.3`
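- **Why these bounds?**
  - With a standard deviation of 0.1, ±0.3 is three standard deviations, so clipping only trims rare outliers (about 0.3% of draws)
  - It keeps the two nominal values apart: a noisy "0" stays in [-0.3, 0.3] and a noisy "1" stays in [0.7, 1.3], so the four clusters never overlap
  - Small, bounded inputs keep the initial weighted sums small, so sigmoid units start in their high-gradient region; sigmoid saturates once its pre-activation passes roughly ±3 (output near 0 or 1, gradient below 0.05)
  - Clipping does not protect against ReLU zeroing negative values: inputs in [-0.3, 0) are still negative, and ReLU acts on the weighted sum rather than on the raw input
  - Values are close to 0 but with meaningful variation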
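A small sketch of the two effects discussed above: how `np.clip` replaces out-of-range values with the nearest bound, and how the sigmoid gradient shrinks as its input grows. The `sigmoid` and `sigmoid_grad` helpers are written here for illustration; in a network they would be applied to the weighted sum of the inputs, not to the raw inputs themselves.

```python
import numpy as np

# np.clip replaces values outside [-0.3, 0.3] with the nearest bound
values = np.array([-0.5, -0.1, 0.2, 0.9])
clipped = np.clip(values, -0.3, 0.3)
print(clipped.tolist())  # [-0.3, -0.1, 0.2, 0.3]

def sigmoid(z):
    """Illustrative logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The gradient peaks at 0 (0.25) and is much smaller by z = 3
for z in [0.0, 0.3, 3.0]:
    print(z, round(float(sigmoid_grad(z)), 4))
```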

#### 3. `x1_00` and `x2_00`
- **Purpose**: Represents the two input features for one sample
- **Naming convention**:
  - `x1_00`: First input feature for (0,0) case
  - `x2_00`: Second input feature for (0,0) case
- **Why two separate variables?**
  - XOR takes two binary inputs
  - Each input gets its own independent noise draw
  - This creates diversity while keeping each sample near the (0,0) corner
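The benefit of drawing noise separately for each feature can be checked numerically. In this sketch (again assuming `np.random.default_rng`), two independent draws give features with near-zero correlation, while reusing a single draw for both features would place every point on the diagonal `x1 == x2`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Two separate draws, as with x1_00 and x2_00
x1 = np.clip(rng.normal(0, 0.1, n), -0.3, 0.3)
x2 = np.clip(rng.normal(0, 0.1, n), -0.3, 0.3)
independent_corr = np.corrcoef(x1, x2)[0, 1]
print(independent_corr)  # near 0: the features vary independently

# One shared draw for both features: perfectly correlated
shared = np.clip(rng.normal(0, 0.1, n), -0.3, 0.3)
shared_corr = np.corrcoef(shared, shared)[0, 1]
print(shared_corr)  # 1.0 up to floating-point rounding
```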

#### 4. `data.append([x1_00, x2_00, 0])`
- **Purpose**: Adds a complete training sample to our dataset
- **Structure**: `[input1, input2, target_output]`
- **Target value `0`**:
  - Represents the XOR output for (0,0) inputs
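Because the noise is clipped to ±0.3, rounding each input recovers its nominal bit, so the stored targets can be checked against XOR directly. The sketch below uses a few hand-written illustrative rows in the same `[input1, input2, target_output]` layout (not values from the generated dataset); it uses the name `sample_df` to avoid overwriting the notebook's `df`. The same check can be run on `df` after it is built.

```python
import pandas as pd

# Illustrative rows in the [input1, input2, target_output] layout
rows = [
    [0.05, -0.12, 0],
    [-0.02, 1.08, 1],
    [0.94, 0.11, 1],
    [1.21, 0.87, 0],
]
sample_df = pd.DataFrame(rows, columns=['x1', 'x2', 'y'])

# Snap each noisy input back to its nominal bit (values in [-0.3, 0.3] -> 0,
# values in [0.7, 1.3] -> 1), then compare the bitwise XOR with the target
bits1 = sample_df['x1'].round().astype(int)
bits2 = sample_df['x2'].round().astype(int)
labels_match = ((bits1 ^ bits2) == sample_df['y']).all()
print(labels_match)  # True
```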
