# Silicon Defect Detection using DBSCAN and Gradio

---
<p align="justify">This analysis uses machine learning to detect anomalies in silicon wafers. It applies the DBSCAN algorithm to cluster wafers by thickness, resistivity, and impurities. Anomalies are the data points that DBSCAN assigns to no cluster (label <code>-1</code>). A Gradio interface lets users enter wafer properties and receive an anomaly prediction. The approach aims to automate defect detection and improve quality control in semiconductor manufacturing.

---


## Import Libraries

In [1]:
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

## Load Dataset

In [2]:
df = pd.read_csv('/content/drive/MyDrive/datasets/silicon_defect_data.csv')
df.head()

| | thickness | resistivity | impurities |
|---|---|---|---|
| 0 | 527.727057 | 13.013539 | 4.587610 |
| 1 | 511.526015 | 10.736529 | 0.758916 |
| 2 | 504.129974 | 13.926211 | 0.161577 |
| 3 | 497.653702 | 10.110784 | 3.543600 |
| 4 | 506.137004 | 14.080651 | 1.197867 |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   thickness    1000 non-null   float64
 1   resistivity  1000 non-null   float64
 2   impurities   1000 non-null   float64
dtypes: float64(3)
memory usage: 23.6 KB


## Select Features for Clustering

In [4]:
X = df[['thickness','resistivity','impurities']]
X

| | thickness | resistivity | impurities |
|---|---|---|---|
| 0 | 527.727057 | 13.013539 | 4.587610 |
| 1 | 511.526015 | 10.736529 | 0.758916 |
| 2 | 504.129974 | 13.926211 | 0.161577 |
| 3 | 497.653702 | 10.110784 | 3.543600 |
| 4 | 506.137004 | 14.080651 | 1.197867 |
| ... | ... | ... | ... |
| 995 | 482.982818 | 11.558763 | 0.131284 |
| 996 | 509.353438 | 17.002765 | 0.126738 |
| 997 | 500.007322 | 11.110607 | 1.651354 |
| 998 | 502.930482 | 18.171066 | 7.396351 |


## Standardise Features

In [5]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
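
As a rough illustration of what the scaling step does, the sketch below standardises the five rows shown by `df.head()` above and checks that each column ends up with mean 0 and unit standard deviation. The array `sample` is typed in by hand from that output and is only a stand-in for the full dataset; `sample_scaler` is a separate, illustrative scaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First five rows from df.head(): thickness, resistivity, impurities
sample = np.array([
    [527.727057, 13.013539, 4.587610],
    [511.526015, 10.736529, 0.758916],
    [504.129974, 13.926211, 0.161577],
    [497.653702, 10.110784, 3.543600],
    [506.137004, 14.080651, 1.197867],
])

sample_scaler = StandardScaler()
sample_scaled = sample_scaler.fit_transform(sample)

# Each column is shifted by its mean and divided by its population standard deviation
col_means = sample_scaled.mean(axis=0)
col_stds = sample_scaled.std(axis=0)
print(col_means.round(6), col_stds.round(6))
```

Without this step, thickness (values around 500) would dominate the Euclidean distances that DBSCAN relies on, and the other two features would barely influence the clustering.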

## Apply DBSCAN

In [6]:
dbscan = DBSCAN(eps=2, min_samples=5)
clusters = dbscan.fit_predict(X_scaled)
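
The value `eps=2` is used here without a stated justification. One common heuristic for choosing `eps` is to look at the distance from each point to its k-th nearest neighbour, with k tied to `min_samples`, and pick a value near the "knee" of the sorted curve. The sketch below computes those distances on synthetic data; `demo_points` is a made-up stand-in for `X_scaled`, not the wafer data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
demo_points = rng.normal(size=(200, 3))  # hypothetical stand-in for X_scaled

min_samples = 5

# scikit-learn's min_samples counts the point itself, and kneighbors() on the
# training data returns each point as its own first neighbour (distance 0).
# The last column is therefore the smallest eps at which the point is a core point.
nn = NearestNeighbors(n_neighbors=min_samples).fit(demo_points)
distances, _ = nn.kneighbors(demo_points)
k_distances = np.sort(distances[:, -1])

# Percentiles of the sorted curve give a first feel for a sensible eps range
print(np.percentile(k_distances, [50, 90, 99]).round(3))
```

Plotting `k_distances` and reading off the point where the curve bends sharply upward is the usual next step; the percentiles printed here are only a coarse substitute for that plot.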

## Silhouette Score

In [7]:
silhouette = silhouette_score(X_scaled,clusters)
print(f"Silhouette Score : {silhouette:.2f}")

Silhouette Score : 0.61
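
Note that `silhouette_score` treats the noise label `-1` as if it were a cluster of its own, and it raises an error when fewer than two distinct labels are present. The sketch below shows one possible way to report the score on clustered points only. The data and the parameter values are synthetic assumptions, chosen so that two clusters and some noise appear; they are not the wafer data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(100, 3))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(100, 3))
outliers = np.array([[20.0, 20.0, 20.0], [-20.0, 20.0, -20.0]])
demo_points = np.vstack([blob_a, blob_b, outliers])  # hypothetical data

demo_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(demo_points)

# Separate clustered points from noise (label -1)
clustered = demo_labels != -1
n_clusters = len(set(demo_labels[clustered]))
n_noise = int((~clustered).sum())
print(f"clusters: {n_clusters}, noise points: {n_noise}")

# The silhouette score is only defined for at least two distinct labels
if n_clusters >= 2:
    score_clustered = silhouette_score(demo_points[clustered], demo_labels[clustered])
    print(f"Silhouette (noise excluded): {score_clustered:.2f}")
```

Reporting the number of clusters and noise points alongside the score makes it easier to judge whether a given value reflects genuine cluster structure or just a split between one cluster and its noise.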


In [9]:
df['Cluster'] = clusters
df.head()

| | thickness | resistivity | impurities | Cluster |
|---|---|---|---|---|
| 0 | 527.727057 | 13.013539 | 4.587610 | 0 |
| 1 | 511.526015 | 10.736529 | 0.758916 | 0 |
| 2 | 504.129974 | 13.926211 | 0.161577 | 0 |
| 3 | 497.653702 | 10.110784 | 3.543600 | 0 |
| 4 | 506.137004 | 14.080651 | 1.197867 | 0 |


In [10]:
clusters

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0
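
A full printout of the label array is hard to read. A compact summary is available through `np.unique`; the sketch below uses a small hypothetical label array (`example_labels`) rather than the real `clusters` output.

```python
import numpy as np

# Hypothetical DBSCAN labels: -1 marks noise, 0 and 1 are cluster ids
example_labels = np.array([0, 0, 0, -1, 0, 1, 1, -1, 0, 1])

labels, counts = np.unique(example_labels, return_counts=True)
label_counts = {int(label): int(count) for label, count in zip(labels, counts)}
print(label_counts)  # {-1: 2, 0: 5, 1: 3}

# Number of points flagged as noise (0 if the label -1 never occurs)
n_noise = label_counts.get(-1, 0)
```

Applied to `clusters`, the same two lines would show at a glance how many wafers were flagged as noise and how many clusters DBSCAN found.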

## Implementing UI with Gradio

In [12]:
!pip install gradio


In [11]:
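# Create a function to be used for Gradio

def predict_anomaly(thickness, resistivity, impurities):
  # DBSCAN has no predict(); re-fitting on a single point would always label it noise (-1).
  # Instead, flag the point as an anomaly unless it lies within eps of a fitted core sample.
  new_data_point = pd.DataFrame([[thickness, resistivity, impurities]], columns=X.columns)
  new_data_point_scaled = scaler.transform(new_data_point)
  distances = np.linalg.norm(dbscan.components_ - new_data_point_scaled, axis=1)
  is_anomaly = distances.min() > dbscan.eps
  if is_anomaly:
    return "Anomaly detected!"
  else:
    return "No anomaly detected."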
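
scikit-learn's `DBSCAN` has no `predict` method, and calling `fit_predict` on a single sample re-fits the model on that sample alone. With `min_samples=5` a lone point can never become a core sample, so it is always labelled `-1`. The sketch below illustrates this on synthetic data and shows a distance-to-core-sample check as one possible alternative. The names `train`, `model`, and `is_noise` are hypothetical, and the parameter values are assumptions chosen for the demonstration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
train = rng.normal(scale=0.5, size=(300, 3))  # hypothetical stand-in for X_scaled
model = DBSCAN(eps=1.0, min_samples=5).fit(train)

centre = np.zeros((1, 3))       # a point in the densest part of the data
far_away = np.full((1, 3), 10.0)  # a point far from every training sample

# Re-fitting on a single sample: the lone point cannot reach min_samples
refit_label = DBSCAN(eps=1.0, min_samples=5).fit_predict(centre)[0]
print(refit_label)  # -1, even though the point sits in the middle of the data


def is_noise(model, point):
    """Hypothetical helper: True if `point` is farther than eps from every core sample."""
    distances = np.linalg.norm(model.components_ - point, axis=1)
    return bool(distances.min() > model.eps)


print(is_noise(model, centre), is_noise(model, far_away))
```

The helper mirrors DBSCAN's own rule for border points: a point within `eps` of a core sample would join that sample's cluster, and any other point would be noise. It does not update the fitted model.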

In [13]:
import gradio as gr

iface = gr.Interface(
    fn=predict_anomaly,
    inputs = [
        gr.Number(label = "Thickness"),
        gr.Number(label = "Resistivity"),
        gr.Number(label = "Impurities")
    ],
    outputs = "text",
    title = "Silicon Defect Anomaly Detection",
    description="Enter the silicon properties to check for anomalies."
    )

iface.launch()

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://b5d1279369b824662f.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## Conclusion

<p align="justify">This analysis applies DBSCAN clustering to identify anomalies in silicon wafers from three physical properties: thickness, resistivity, and impurities. With <code>eps=2</code> and <code>min_samples=5</code> on standardised features, the clustering reaches a silhouette score of 0.61. Wafers that fall outside every cluster are treated as outliers and may represent defective wafers.

<p align="justify">The Gradio interface makes the approach practical to use: a user enters the three properties of a wafer and immediately receives an anomaly verdict. Automated in this way, the check could reduce manual inspection effort and support quality control in semiconductor manufacturing.

<p align="justify">While this study focused on specific features and a particular dataset, the methodology can be adapted to incorporate additional wafer properties and applied to diverse datasets in the semiconductor industry. Future research could explore refining the clustering parameters, evaluating the system's performance on larger datasets, and integrating it with real-time monitoring systems for proactive defect detection.