# Data

You are going to work with the following wafer manufacturing dataset:

**Overview**

Detecting anomalies can be a difficult task, especially with labeled datasets, because some human bias is introduced when the final product is labeled as anomalous or good. These giant manufacturing systems need to be monitored every 10 milliseconds to capture their behavior, which produces a large volume of data and is part of what we call the Industrial IoT (IIoT). Also, hardly any manufacturer wants to create an anomalous product. Hence, anomalies are like a needle in a haystack, which makes the dataset significantly imbalanced.

Fitting a machine learning model to such a dataset and making the model generalize can be fun. In this competition, we bring such a use case from one of India's leading manufacturers of wafers (semiconductors). The collected dataset was anonymized to hide the feature names. There are also 1558 features, which would require serious domain knowledge to understand.


**Attribute Description:**

* Feature_1 - Feature_1558 - Represents the various attributes that were collected from the manufacturing machine
* Class - (0 or 1) - Represents Good/Anomalous class labels for the products

In [1]:
data_path = "https://storage.googleapis.com/edulabs-public-datasets/wafer.zip"

# Instructions

Your task is to develop a model for anomaly detection.

* Note that there are 1558 features - you might want to reduce the number of features, either by feature selection or by feature extraction

* The dataset is unbalanced (there are many more normal samples than anomalies)

* Try unsupervised anomaly detection techniques

* Try supervised learning techniques - you will probably need strategies for unbalanced datasets, such as undersampling or oversampling
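
The feature-reduction and unsupervised-detection hints above can be sketched roughly as follows. This is a minimal sketch on synthetic data, not the wafer dataset: the array shapes, the shift used to generate the anomalies, `n_components=10`, and the choice of `VarianceThreshold`, PCA, and `IsolationForest` are all assumptions made for illustration. The notebook does not prescribe any of them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import average_precision_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wafer data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
n_normal, n_anomalous, n_features = 920, 80, 50
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_normal, n_features)),
    rng.normal(3.0, 1.0, size=(n_anomalous, n_features)),  # shifted rows play the anomalies
])
X[:, :5] = 0.0  # a few constant columns so VarianceThreshold has something to drop
y = np.r_[np.zeros(n_normal, dtype=int), np.ones(n_anomalous, dtype=int)]

# Feature reduction: drop constant columns, standardize, then project with PCA.
reducer = make_pipeline(
    VarianceThreshold(),  # default threshold=0.0 removes zero-variance features
    StandardScaler(),
    PCA(n_components=10, random_state=0),
)
X_reduced = reducer.fit_transform(X)

# Unsupervised detector: the labels are not used for fitting.
detector = IsolationForest(random_state=0).fit(X_reduced)
anomaly_score = -detector.score_samples(X_reduced)  # higher = more anomalous

# The labels are used only to evaluate how well the scores rank the anomalies.
ap = average_precision_score(y, anomaly_score)
print(X_reduced.shape, round(ap, 3))
```

On the real data, `X` would be the feature columns of `df` and `y` the `Class` column. Average precision is used here rather than accuracy because accuracy says little when one class dominates.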

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv(data_path)

In [4]:
df['Class'].value_counts()

Class,count
0,1620
1,143
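
To make the imbalance concrete, the counts above can be turned into an anomaly rate and a pair of class weights. The sketch below is plain arithmetic on those two numbers. The weighting formula is the common "balanced" heuristic, `n_samples / (n_classes * count)`, which is also what scikit-learn's `class_weight="balanced"` option computes. Using it here is an assumption for illustration, not something the notebook requires.

```python
# Class counts copied from the value_counts output above.
counts = {0: 1620, 1: 143}

n_samples = sum(counts.values())
n_classes = len(counts)

# Share of anomalous samples in the dataset.
anomaly_rate = counts[1] / n_samples

# "Balanced" heuristic: weight_c = n_samples / (n_classes * count_c).
class_weight = {c: n_samples / (n_classes * n) for c, n in counts.items()}

print(f"samples: {n_samples}, anomaly rate: {anomaly_rate:.2%}")
print({c: round(w, 3) for c, w in class_weight.items()})
```

A dictionary like `class_weight` can be passed to many supervised estimators as an alternative to undersampling or oversampling.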


In [5]:
df

Unnamed: 0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,feature_7,feature_8,feature_9,feature_10,...,feature_1550,feature_1551,feature_1552,feature_1553,feature_1554,feature_1555,feature_1556,feature_1557,feature_1558,Class
0,100,160,1.6000,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,20,83,4.1500,1,0,0,0,0,0,1,...,0,0,0,0,0,1,0,0,0,0
2,99,150,1.5151,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,40,40,1.0000,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,12,234,19.5000,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1758,1,1,2.0000,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1759,40,200,5.0000,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1760,96,218,2.2708,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1761,16,81,5.0625,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
