Skip to content

ghost2k0407/Sensor-Data-Classification

Repository files navigation

Sensor Data Classification

1. Understanding the Dataset

The dataset consists of time-series sensor data collected over time, where each row represents a sensor reading at a given timestamp. The goal is to classify whether the signal "Passes" or "Fails" based on sensor values.

Dataset Structure

  • Time: Timestamp of the recorded signal.
  • Feature Columns (0-589): Sensor readings capturing various measurements.
  • Pass/Fail: Target variable indicating whether the signal was classified as a pass (1) or fail (-1).

2. Data Preprocessing

To prepare the dataset for analysis and model training, the following steps are taken:

A. Handling Missing Values

  • Check for missing or NaN values in sensor readings.
  • Strategy: Replace missing values with mean/median or remove rows if they are too sparse.

B. Feature Scaling

  • Sensor readings have different ranges, so Standardization (StandardScaler) or MinMax Scaling is applied.

C. Train-Test Split

  • The dataset is split into training (80%) and testing (20%) subsets using train_test_split().

3. Exploratory Data Analysis (EDA)

A. Visualizing Sensor Distributions

  • Histograms: Show how individual sensor readings are distributed.
  • Boxplots: Identify outliers in the sensor data.

B. Correlation Analysis

  • Heatmap (Seaborn): Identify strongly correlated sensor features.
  • Feature Reduction: Remove redundant sensors that provide duplicate information.

4. Feature Selection & Engineering

  • Select the most important sensor readings that impact the Pass/Fail classification.
  • Feature engineering techniques like Principal Component Analysis (PCA) or Feature Importance (Random Forest/Gradient Boosting) may be used.

5. Model Training & Evaluation

Multiple classification models are trained and evaluated:

A. Models Used

  • Logistic Regression (Baseline)
  • Decision Tree Classifier
  • Random Forest Classifier
  • Gradient Boosting Classifier
  • Naïve Bayes Classifier
  • Support Vector Machine (SVM)
  • Stacking Classifier (Combining multiple models)

B. Performance Metrics

  • Accuracy Score
  • Precision, Recall, F1-score
  • Confusion Matrix for classification performance.

6. Insights & Business Impact

  • What sensor readings indicate a failure?
    • Certain sensor values may be critical in identifying defective signals.
  • How accurate is the failure prediction?
    • If models achieve high accuracy (>85%), they can be deployed in real-world signal monitoring.
  • How can this help businesses?
    • Automated detection of failing signals reduces manual inspections and improves production efficiency.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published