# SENSOR SELECTION CASE STUDY

### IoT-Based Predictive Maintenance for Manufacturing Equipment

**BUSINESS SCENARIO**

**Company:** TechManufacture Inc.
**Problem:** Predicting equipment failures in a production line

**CURRENT SITUATION:**
- 12 IoT sensors installed on critical manufacturing equipment
- Sensors monitor temperature, vibration, pressure, and operational metrics
- Each sensor costs $500/month for maintenance and data transmission
- Total cost: $6,000/month or $72,000/year
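
The cost figures above are simple arithmetic, and the same arithmetic gives the saving for any number of removed sensors. A minimal sketch follows; the helper name `annual_savings` is hypothetical and not part of the original notebook.

```python
N_SENSORS = 12
COST_PER_SENSOR_MONTHLY = 500  # USD per sensor, from the scenario above

monthly_cost = N_SENSORS * COST_PER_SENSOR_MONTHLY
annual_cost = monthly_cost * 12


def annual_savings(n_removed: int) -> int:
    """Annual saving in USD from decommissioning n_removed sensors (hypothetical helper)."""
    if not 0 <= n_removed <= N_SENSORS:
        raise ValueError("n_removed must be between 0 and N_SENSORS")
    return n_removed * COST_PER_SENSOR_MONTHLY * 12


print(f"Monthly cost: ${monthly_cost:,}")
print(f"Annual cost:  ${annual_cost:,}")
print(f"Saving per removed sensor per year: ${annual_savings(1):,}")
```

Each sensor removed is worth $6,000 per year, so the analysis only needs to justify removals one sensor at a time.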

**CHALLENGE:**
- Management wants to reduce operational costs
- Need to identify which sensors are truly necessary
- Cannot compromise on prediction accuracy
- This is a PRE-MODELING task: the decision must be made BEFORE building any ML model

**OBJECTIVE:**
Use EDA techniques to systematically identify which sensors can be removed while maintaining the ability to predict equipment failures.

**DATASET DETAILS:**
- 2,000 observations collected over 6 months
- 12 sensor measurements per observation
- Binary target: equipment_failure (0 = normal, 1 = failure)
- Sensor types: temperature, vibration, pressure, speed, current, voltage, power factor


### IMPORTS AND CONFIG

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from statsmodels.stats.outliers_influence import variance_inflation_factor
import warnings
warnings.filterwarnings('ignore')

In [10]:
np.random.seed(42)
n_samples = 2000

### STEP 1: DATA GENERATION - Creating Realistic Sensor Data

In [11]:
# Generating sensor data with specific patterns...

# TEMPERATURE SENSORS (3 sensors)
# temp_core: Core temperature - highly predictive of failures
temp_core = np.random.normal(75, 8, n_samples)  # Normal operation around 75°C
temp_core[1800:] += np.random.uniform(15, 25, 200)  # Last 200 rows simulate degradation: core runs 15-25°C hotter

# temp_ambient: Ambient temperature - less variable, less predictive
temp_ambient = np.random.normal(22, 2, n_samples)  # Room temperature

# temp_exhaust: Exhaust temperature - HIGHLY CORRELATED with core temp
temp_exhaust = temp_core * 0.85 + np.random.normal(5, 2, n_samples)
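
Whether `temp_exhaust` is redundant with `temp_core` can be checked with a Pearson correlation. The sketch below regenerates the two series with its own random generator (an assumption made to keep it self-contained), so its numbers will not match the notebook's exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # independent of the notebook's np.random.seed(42)
n = 2000

# Same construction as the cell above: exhaust is a scaled copy of core plus noise
temp_core = rng.normal(75, 8, n)
temp_core[1800:] += rng.uniform(15, 25, 200)
temp_exhaust = temp_core * 0.85 + rng.normal(5, 2, n)

# Pearson correlation between the two temperature sensors
r = np.corrcoef(temp_core, temp_exhaust)[0, 1]
print(f"Pearson r(temp_core, temp_exhaust) = {r:.3f}")
```

A coefficient close to 1 suggests the exhaust sensor adds little information beyond the core sensor, which makes it a candidate for removal.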


In [12]:
# VIBRATION SENSORS (3 sensors)
# vibration_x: X-axis vibration - predictive of mechanical failures
vibration_x = np.random.gamma(2, 2, n_samples)
vibration_x[1800:] += np.random.uniform(8, 15, 200)  # High vibration before failure

# vibration_y: Y-axis vibration - somewhat correlated with X
vibration_y = vibration_x * 0.6 + np.random.gamma(1.5, 1.5, n_samples)

# vibration_z: Z-axis - NEAR CONSTANT (sensor malfunction/not useful)
vibration_z = np.random.normal(0.5, 0.001, n_samples)  # Almost no variance
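
A near-constant sensor can be flagged by comparing its spread to its mean. The sketch below uses the coefficient of variation rather than raw variance, because raw variance depends on the sensor's units. The helper `coeff_of_variation` and the separate random generator are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # independent of the notebook's seed
n = 2000

vibration_x = rng.gamma(2, 2, n)         # informative axis
vibration_z = rng.normal(0.5, 0.001, n)  # near-constant axis


def coeff_of_variation(x: np.ndarray) -> float:
    """Standard deviation relative to the absolute mean (hypothetical helper)."""
    return float(np.std(x) / abs(np.mean(x)))


cv_x = coeff_of_variation(vibration_x)
cv_z = coeff_of_variation(vibration_z)
print(f"CV vibration_x = {cv_x:.3f}")
print(f"CV vibration_z = {cv_z:.4f}")
```

A sensor whose readings vary by a fraction of a percent of their mean cannot separate normal operation from failure, whatever model is built later.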


In [13]:
# PRESSURE SENSORS (2 sensors)
# pressure_inlet: Inlet pressure - normal operation
pressure_inlet = np.random.normal(100, 5, n_samples)

# pressure_outlet: Outlet pressure - HIGHLY CORRELATED with inlet
pressure_outlet = pressure_inlet * 0.95 + np.random.normal(2, 1, n_samples)
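
For a pair of predictors, the variance inflation factor reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below applies that formula to regenerated pressure data; it uses its own random generator and does not call `variance_inflation_factor` from statsmodels, so treat it as an illustration rather than the notebook's method.

```python
import numpy as np

rng = np.random.default_rng(0)  # independent of the notebook's seed
n = 2000

pressure_inlet = rng.normal(100, 5, n)
pressure_outlet = pressure_inlet * 0.95 + rng.normal(2, 1, n)

# With only two predictors, R^2 of one regressed on the other equals r^2
r = np.corrcoef(pressure_inlet, pressure_outlet)[0, 1]
vif = 1.0 / (1.0 - r**2)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A common rule of thumb treats a VIF above about 10 as a sign of strong collinearity. That cutoff is a convention, not a hard limit.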


In [14]:
# OPERATIONAL SENSORS (4 sensors)
# motor_speed: Motor RPM - predictive of failures
motor_speed = np.random.normal(1800, 50, n_samples)
motor_speed[1800:] += np.random.uniform(-200, -100, 200)  # Speed drops before failure

# motor_current: Electrical current - predictive
motor_current = np.random.normal(15, 2, n_samples)
motor_current[1800:] += np.random.uniform(5, 10, 200)  # Current spikes before failure

# voltage_supply: Supply voltage - VERY STABLE (grid supply)
voltage_supply = np.random.normal(220, 0.5, n_samples)  # Almost constant

# power_factor: Power efficiency - less predictive
power_factor = np.random.uniform(0.85, 0.95, n_samples)
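
The degradation injected into the last 200 rows of `motor_speed` and `motor_current` should appear as a clear difference in group means. A minimal sketch follows, regenerating both series with a separate random generator; the helper `mean_shift` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # independent of the notebook's seed
n, n_degraded = 2000, 200

motor_speed = rng.normal(1800, 50, n)
motor_speed[-n_degraded:] += rng.uniform(-200, -100, n_degraded)

motor_current = rng.normal(15, 2, n)
motor_current[-n_degraded:] += rng.uniform(5, 10, n_degraded)


def mean_shift(x: np.ndarray) -> float:
    """Mean of the degraded rows minus mean of the normal rows (hypothetical helper)."""
    return float(x[-n_degraded:].mean() - x[:-n_degraded].mean())


speed_shift = mean_shift(motor_speed)
current_shift = mean_shift(motor_current)
print(f"motor_speed shift:   {speed_shift:.1f} RPM")
print(f"motor_current shift: {current_shift:.2f} A")
```

The shifts should land near the midpoints of the injected ranges, about -150 RPM and +7.5 A. `voltage_supply` and `power_factor` receive no such shift in the cell above.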

In [15]:
# Create DataFrame
sensor_data = pd.DataFrame({
    'temp_core': temp_core,
    'temp_ambient': temp_ambient,
    'temp_exhaust': temp_exhaust,
    'vibration_x': vibration_x,
    'vibration_y': vibration_y,
    'vibration_z': vibration_z,
    'pressure_inlet': pressure_inlet,
    'pressure_outlet': pressure_outlet,
    'motor_speed': motor_speed,
    'motor_current': motor_current,
    'voltage_supply': voltage_supply,
    'power_factor': power_factor
})

In [16]:
sensor_data.head()

| | temp_core | temp_ambient | temp_exhaust | vibration_x | vibration_y | vibration_z | pressure_inlet | pressure_outlet | motor_speed | motor_current | voltage_supply | power_factor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 78.973713 | 22.996443 | 71.590365 | 1.034029 | 0.773871 | 0.499038 | 105.869505 | 101.911063 | 1783.1925 | 14.832178 | 219.832933 | 0.9105 |
| 1 | 73.893886 | 24.280298 | 64.137083 | 0.735611 | 1.636718 | 0.500435 | 94.909812 | 90.621499 | 1794.028842 | 14.939575 | 220.076285 | 0.910947 |
| 2 | 80.181508 | 25.161081 | 72.288734 | 7.210694 | 6.105532 | 0.499314 | 101.590797 | 98.328942 | 1801.761265 | 14.932829 | 220.103354 | 0.928143 |
| 3 | 87.184239 | 19.969812 | 80.69424 | 2.951314 | 3.182083 | 0.498511 | 102.068239 | 99.071263 | 1845.818941 | 17.595548 | 219.857668 | 0.932297 |
| 4 | 73.126773 | 20.378285 | 66.563005 | 1.532191 | 6.906448 | 0.499071 | 95.25689 | 93.762405 | 1775.076796 | 12.783893 | 220.338451 | 0.852723 |


In [17]:
# Create target variable (equipment failure)
# Failures are influenced by specific sensors
failure_score = (
    (temp_core - 75) * 0.4 +           # High temperature
    vibration_x * 2 +                   # High vibration
    (1800 - motor_speed) * 0.1 +       # Low speed
    (motor_current - 15) * 3            # High current
)


In [18]:
failure_score

array([ 4.83482776,  1.44461613, 16.11635197, ..., 88.67513998,
       55.23883322, 64.11161593], shape=(2000,))
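
As a sanity check, the first three scores in the array above can be reproduced by hand from the first three rows of the `sensor_data.head()` output. The sketch below copies those displayed readings, so agreement is limited by their six-decimal rounding; the function name `failure_score_row` is hypothetical.

```python
# Readings for rows 0-2, copied from the sensor_data.head() output
rows = [
    {"temp_core": 78.973713, "vibration_x": 1.034029,
     "motor_speed": 1783.1925, "motor_current": 14.832178},
    {"temp_core": 73.893886, "vibration_x": 0.735611,
     "motor_speed": 1794.028842, "motor_current": 14.939575},
    {"temp_core": 80.181508, "vibration_x": 7.210694,
     "motor_speed": 1801.761265, "motor_current": 14.932829},
]


def failure_score_row(row: dict) -> float:
    """Same weighted sum as the failure_score cell, applied to one row."""
    return (
        (row["temp_core"] - 75) * 0.4        # high temperature
        + row["vibration_x"] * 2             # high vibration
        + (1800 - row["motor_speed"]) * 0.1  # low speed
        + (row["motor_current"] - 15) * 3    # high current
    )


scores = [failure_score_row(r) for r in rows]
for i, s in enumerate(scores):
    print(f"row {i}: {s:.4f}")
```

Row 2 scores highest of the three, driven mostly by its `vibration_x` reading of about 7.2.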

In [19]:
# Add noise and create binary outcome
failure_score += np.random.normal(0, 10, n_samples)
equipment_failure = (failure_score > np.percentile(failure_score, 85)).astype(int)

sensor_data['equipment_failure'] = equipment_failure

In [23]:
# Dataset Preview
sensor_data.head()

| | temp_core | temp_ambient | temp_exhaust | vibration_x | vibration_y | vibration_z | pressure_inlet | pressure_outlet | motor_speed | motor_current | voltage_supply | power_factor | equipment_failure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 78.973713 | 22.996443 | 71.590365 | 1.034029 | 0.773871 | 0.499038 | 105.869505 | 101.911063 | 1783.1925 | 14.832178 | 219.832933 | 0.9105 | 0 |
| 1 | 73.893886 | 24.280298 | 64.137083 | 0.735611 | 1.636718 | 0.500435 | 94.909812 | 90.621499 | 1794.028842 | 14.939575 | 220.076285 | 0.910947 | 0 |
| 2 | 80.181508 | 25.161081 | 72.288734 | 7.210694 | 6.105532 | 0.499314 | 101.590797 | 98.328942 | 1801.761265 | 14.932829 | 220.103354 | 0.928143 | 0 |
| 3 | 87.184239 | 19.969812 | 80.69424 | 2.951314 | 3.182083 | 0.498511 | 102.068239 | 99.071263 | 1845.818941 | 17.595548 | 219.857668 | 0.932297 | 0 |
| 4 | 73.126773 | 20.378285 | 66.563005 | 1.532191 | 6.906448 | 0.499071 | 95.25689 | 93.762405 | 1775.076796 | 12.783893 | 220.338451 | 0.852723 | 0 |


In [21]:
# Total Samples
n_samples

2000

In [22]:
# Failure rate %
equipment_failure.mean() * 100

np.float64(15.0)
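
The 15% failure rate is a consequence of thresholding at the 85th percentile, not a property of the simulated sensors. The sketch below shows this with stand-in scores drawn from a separate random generator; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)     # independent of the notebook's seed
scores = rng.normal(0, 10, 2000)   # stand-in for failure_score

# Label everything strictly above the 85th percentile as a failure
threshold = np.percentile(scores, 85)
labels = (scores > threshold).astype(int)

failure_rate = labels.mean() * 100
print(f"Failures: {labels.sum()} of {labels.size} ({failure_rate:.1f}%)")
```

Changing the percentile changes the class balance directly, so the 85/15 split is a design choice of the simulation.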

In [24]:
# Statistical Summary
sensor_data.describe().round(2)

| | temp_core | temp_ambient | temp_exhaust | vibration_x | vibration_y | vibration_z | pressure_inlet | pressure_outlet | motor_speed | motor_current | voltage_supply | power_factor | equipment_failure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 |
| mean | 77.34 | 22.0 | 70.65 | 5.19 | 5.37 | 0.5 | 100.21 | 97.21 | 1784.34 | 15.71 | 220.0 | 0.9 | 0.15 |
| std | 9.86 | 2.01 | 8.61 | 4.5 | 3.28 | 0.0 | 5.01 | 4.87 | 68.86 | 3.02 | 0.48 | 0.03 | 0.36 |
| min | 49.07 | 16.02 | 44.91 | 0.04 | 0.16 | 0.5 | 85.33 | 80.54 | 1510.37 | 7.69 | 218.25 | 0.85 | 0.0 |
| 25% | 70.58 | 20.6 | 64.99 | 2.15 | 2.97 | 0.5 | 96.76 | 93.85 | 1751.1 | 13.77 | 219.68 | 0.88 | 0.0 |
| 50% | 76.47 | 22.0 | 69.89 | 3.71 | 4.55 | 0.5 | 100.2 | 97.21 | 1792.08 | 15.22 | 220.0 | 0.9 | 0.0 |
| 75% | 82.77 | 23.33 | 75.45 | 6.63 | 6.84 | 0.5 | 103.65 | 100.44 | 1829.9 | 16.93 | 220.31 | 0.92 | 0.0 |
| max | 123.17 | 29.85 | 109.01 | 25.26 | 25.22 | 0.5 | 122.4 | 119.61 | 1980.14 | 29.38 | 221.61 | 0.95 | 1.0 |


In [25]:
# Shuffle the rows
sensor_data = sensor_data.sample(frac=1, random_state=42).reset_index(drop=True)
# Save to CSV for reference
sensor_data.to_csv('sensor_data.csv', index=False)
print("\n✓ Data saved to 'sensor_data.csv'")


✓ Data saved to 'sensor_data.csv'
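
Shuffling matters because the simulated degradation occupies the last 200 rows, so an ordered split could place every degraded row in the same partition. The notebook uses `DataFrame.sample(frac=1)`. The sketch below shows the same idea in NumPy alone, with illustrative array names and a simplified label that marks only the degraded rows.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_degraded = 2000, 200

# Simplified stand-in label: 1 for the degraded block at the end, 0 elsewhere
degraded = np.zeros(n, dtype=int)
degraded[-n_degraded:] = 1

# Before shuffling, the first half contains no degraded rows at all
before_first_half = int(degraded[: n // 2].sum())

# One permutation applied to the row index keeps features and labels aligned
order = rng.permutation(n)
shuffled = degraded[order]
after_first_half = int(shuffled[: n // 2].sum())

print(f"Degraded rows in first half before shuffle: {before_first_half}")
print(f"Degraded rows in first half after shuffle:  {after_first_half}")
```

Applying the same `order` to every column is what keeps each row's sensor readings paired with its label.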


### STEP 2: EXPLORATORY DATA ANALYSIS - Understanding the Data