The dataset consists of time-series sensor data collected over time, where each row represents a sensor reading at a given timestamp. The goal is to classify whether the signal "Passes" or "Fails" based on sensor values.
- Time: Timestamp of the recorded signal.
- Feature Columns (0-589): Sensor readings capturing various measurements.
- Pass/Fail: Target variable indicating whether the signal was classified as a pass (
1) or fail (-1).
To prepare the dataset for analysis and model training, the following steps are taken:
- Check for missing or
NaNvalues in sensor readings. - Strategy: Replace missing values with mean/median or remove rows if they are too sparse.
- Sensor readings have different ranges, so Standardization (
StandardScaler) or MinMax Scaling is applied.
- The dataset is split into training (80%) and testing (20%) subsets using
train_test_split().
- Histograms: Show how individual sensor readings are distributed.
- Boxplots: Identify outliers in the sensor data.
- Heatmap (Seaborn): Identify strongly correlated sensor features.
- Feature Reduction: Remove redundant sensors that provide duplicate information.
- Select the most important sensor readings that impact the Pass/Fail classification.
- Feature engineering techniques like Principal Component Analysis (PCA) or Feature Importance (Random Forest/Gradient Boosting) may be used.
Multiple classification models are trained and evaluated:
- Logistic Regression (Baseline)
- Decision Tree Classifier
- Random Forest Classifier
- Gradient Boosting Classifier
- Naïve Bayes Classifier
- Support Vector Machine (SVM)
- Stacking Classifier (Combining multiple models)
- Accuracy Score
- Precision, Recall, F1-score
- Confusion Matrix for classification performance.
- What sensor readings indicate a failure?
- Certain sensor values may be critical in identifying defective signals.
- How accurate is the failure prediction?
- If models achieve high accuracy (>85%), they can be deployed in real-world signal monitoring.
- How can this help businesses?
- Automated detection of failing signals reduces manual inspections and improves production efficiency.