# Assignment no 84 Anomaly Detection II (3.5.23)

### Q1. What is the role of feature selection in anomaly detection?

Ans - 

**Feature selection** plays a crucial role in anomaly detection by:
- **Improving Model Accuracy**: By selecting the most relevant features, the model can more accurately distinguish between normal and anomalous data points.
- **Reducing Dimensionality**: Lowers computational cost and model complexity. This matters most for high-dimensional datasets, where distances between points become less informative and distance- or density-based detectors degrade.
- **Enhancing Interpretability**: Simplifies the model, making it easier to interpret the results and understand the factors contributing to anomalies.
- **Preventing Overfitting**: Reduces the risk of overfitting by eliminating irrelevant or redundant features.
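
As a rough illustration of the idea, the sketch below drops a near-constant feature before any detector is fitted. It uses scikit-learn's `VarianceThreshold` as one simple unsupervised filter; the synthetic data and the threshold value are assumptions chosen for the example, not a recommended setting.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Synthetic data (assumed for illustration): two varying features
# plus one constant feature that carries no information.
rng = np.random.default_rng(0)
informative = rng.normal(size=(200, 2))
constant = np.full((200, 1), 3.0)
X = np.hstack([informative, constant])

# Keep only features whose variance exceeds the (assumed) threshold.
selector = VarianceThreshold(threshold=1e-3)
X_reduced = selector.fit_transform(X)

kept = selector.get_support()  # boolean mask over the original columns
print("kept columns:", kept)
print("shape before:", X.shape, "shape after:", X_reduced.shape)
```

A variance filter only removes uninformative columns; it does not judge relevance to anomalies, so other selection methods may be needed in practice.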

### Q2. What are some common evaluation metrics for anomaly detection algorithms and how are they computed?

Ans - 

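**Common Evaluation Metrics** (anomalies are the positive class; TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives):
- **Accuracy**: Measures the proportion of all instances, anomalous and normal, that are classified correctly. Because anomalies are usually rare, accuracy can look high even when most anomalies are missed, so it is best read alongside precision and recall.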
  \[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
- **Precision**: Measures the proportion of true anomalies among the identified anomalies.
  \[ \text{Precision} = \frac{TP}{TP + FP} \]
- **Recall**: Measures the proportion of actual anomalies that were correctly identified.
  \[ \text{Recall} = \frac{TP}{TP + FN} \]
- **F1 Score**: The harmonic mean of precision and recall, providing a balance between the two.
  \[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
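- **ROC-AUC**: The area under the receiver operating characteristic curve, which plots the true positive rate against the false positive rate as the threshold on the anomaly score varies. It equals the probability that a randomly chosen anomaly receives a higher anomaly score than a randomly chosen normal point.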
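
The formulas above can be checked with a small pure-Python sketch. The labels and scores are made up for illustration, and the helper names `confusion_counts`, `anomaly_metrics`, and `roc_auc` are hypothetical. The AUC is computed by pairwise comparison of anomaly and normal scores, which is fine for small examples but slow for large datasets.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN, where 1 marks an anomaly and 0 a normal point."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def anomaly_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Fraction of (anomaly, normal) pairs ranked correctly; ties count half.

    Assumes both classes are present in y_true.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 10 points, the last two are true anomalies.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.8, 0.9, 0.4]

metrics = anomaly_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Here accuracy is 0.8 even though only one of the two anomalies is caught, which shows why precision and recall (both 0.5) are the more telling numbers.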


### Q3. What is DBSCAN and how does it work for clustering?

Ans - 

**DBSCAN (Density-Based Spatial Clustering of Applications with Noise)** is a clustering algorithm that identifies clusters as regions of high point density separated by regions of low density. It classifies every point into one of three types:
- **Core Points**: Points with at least `min_samples` points within `epsilon` distance (scikit-learn counts the point itself toward `min_samples`).
- **Border Points**: Points that are within `epsilon` distance of a core point but do not have enough neighbors to be core points themselves.
- **Noise Points**: Points that are neither core points nor border points.

**Steps**:
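1. Select an arbitrary unvisited point and retrieve its `epsilon`-neighborhood.
2. If the point is a core point, start a new cluster and add every point that is density-reachable from it, expanding through any further core points found along the way.
3. If the point is not a core point, label it as noise for now; it is relabeled as a border point if a later cluster reaches it.
4. Repeat the process until all points are processed.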
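
To make the procedure concrete, here is a minimal from-scratch sketch in NumPy. It is not scikit-learn's implementation: the function names `region_query` and `dbscan_sketch` are hypothetical, neighborhoods are found by brute force, and the point itself is counted toward `min_samples` (an assumption that matches scikit-learn's convention).

```python
import numpy as np

NOISE = -1       # final label for points that belong to no cluster
UNVISITED = -2   # temporary marker used only while the algorithm runs


def region_query(X, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)


def dbscan_sketch(X, eps, min_samples):
    labels = np.full(len(X), UNVISITED)
    cluster_id = 0
    for i in range(len(X)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(X, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE          # may become a border point later
            continue
        labels[i] = cluster_id         # i is a core point: start a cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # previously noise, now a border point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(X, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is core: keep expanding
        cluster_id += 1
    return labels


# Two tight groups and one isolated point (made-up data).
X = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1],
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0], [5.1, 5.1],
    [10.0, 0.0],
])
labels = dbscan_sketch(X, eps=0.5, min_samples=3)
print(labels)
```

The brute-force neighborhood search costs O(n²) distance computations overall; library implementations typically use spatial indexes to speed this up.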

### Q4. How does the epsilon parameter affect the performance of DBSCAN in detecting anomalies?

Ans - 

The **epsilon (ε)** parameter defines the radius of the neighborhood around a data point. Its effect on performance includes:
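- **Smaller ε**: Fewer points qualify as core points, so clusters fragment into many small ones and more points are labeled as noise. This increases sensitivity to local anomalies but risks flagging normal points in sparser regions as anomalies (false positives) and missing global structure.
- **Larger ε**: Neighborhoods grow, so clusters merge into fewer, larger ones and fewer points are labeled as noise. This captures global structure but risks absorbing true anomalies into clusters (false negatives).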
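
The trade-off can be observed by sweeping ε on a fixed dataset and counting noise points. The sketch below uses scikit-learn's `DBSCAN`; the synthetic data, the ε values, and `min_samples=5` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two Gaussian blobs plus two isolated points (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[4, 4], scale=0.3, size=(100, 2)),
    [[2.0, 2.0], [-3.0, 3.0]],
])

noise_counts = {}
for eps in (0.05, 0.3, 1.0, 10.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    noise_counts[eps] = int(np.sum(labels == -1))
    print(f"eps={eps}: clusters={n_clusters}, noise points={noise_counts[eps]}")
```

For a fixed `min_samples`, the number of noise points can only stay the same or shrink as ε grows, and a very large ε leaves no noise points at all.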

### Q5. What are the differences between the core, border, and noise points in DBSCAN, and how do they relate to anomaly detection?

Ans - 

- **Core Points**: Points with at least `min_samples` neighbors within `ε`. They form the internal part of a cluster and are not considered anomalies.
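- **Border Points**: Points within `ε` distance of a core point but with fewer than `min_samples` neighbors. They lie on the edge of a cluster and are assigned to it, so DBSCAN itself does not flag them as anomalies, although they are the least typical members of the cluster.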
- **Noise Points**: Points that are neither core points nor border points. These are considered anomalies.
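
The three point types can be recovered from a fitted scikit-learn `DBSCAN` model, as in the sketch below. The tiny dataset and the parameter values are assumptions picked so that each type appears once; `core_sample_indices_` and `labels_` are attributes of the fitted estimator.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1],  # dense group
    [0.45, 0.05],                                     # edge of the group
    [5.0, 5.0],                                       # isolated point
])

db = DBSCAN(eps=0.4, min_samples=4).fit(X)

# Core points are listed explicitly by the fitted model.
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True

# Noise points carry the label -1; border points are everything else
# that belongs to a cluster without being a core point.
noise_mask = db.labels_ == -1
border_mask = ~core_mask & ~noise_mask

print("core:  ", np.flatnonzero(core_mask))
print("border:", np.flatnonzero(border_mask))
print("noise: ", np.flatnonzero(noise_mask))
```

Only the noise mask is normally treated as the set of detected anomalies.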

### Q6. How does DBSCAN detect anomalies and what are the key parameters involved in the process?

Ans - 

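**DBSCAN** detects anomalies by identifying noise points:
- **Key Parameters**:
  - **epsilon (ε)**: The radius of the neighborhood around each point; two points are neighbors if the distance between them is at most ε.
  - **min_samples**: The minimum number of points within ε required for a point to count as a core point, that is, to lie in a dense region.

**Detection Process**:
- Points that do not belong to any cluster (noise points) are detected as anomalies. In scikit-learn's `DBSCAN`, these points receive the cluster label `-1`.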

### Q7. What is the make_circles function in scikit-learn used for?

Ans - 

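The `make_circles` function in scikit-learn generates a synthetic 2-dimensional dataset consisting of a large circle containing a smaller circle, together with a binary label indicating which circle each point belongs to. Because the two groups cannot be separated by a straight line, it is often used for testing and visualizing clustering and classification algorithms.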
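
A minimal usage sketch follows; the argument values are assumptions chosen for illustration. `factor` sets the ratio of the inner radius to the outer radius, and `noise` adds Gaussian jitter to the points.

```python
import numpy as np
from sklearn.datasets import make_circles

# X holds the 2-D coordinates; y is 0 for the outer circle, 1 for the inner.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=42)

radii = np.linalg.norm(X, axis=1)   # distance of each point from the origin
inner_mean = radii[y == 1].mean()
outer_mean = radii[y == 0].mean()
print("X shape:", X.shape)
print("mean radius, inner circle:", round(inner_mean, 2))
print("mean radius, outer circle:", round(outer_mean, 2))
```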


### Q8. What are local outliers and global outliers, and how do they differ from each other?

Ans - 

- **Local Outliers**: Data points that deviate significantly from their local neighborhood.
- **Global Outliers**: Data points that deviate significantly from the entire dataset.

**Difference**:
- Local outliers are detected by comparing a point to its nearest neighbors, while global outliers are detected by comparing a point to the overall data distribution.

### Q9. How can local outliers be detected using the Local Outlier Factor (LOF) algorithm?

Ans - 

The **Local Outlier Factor (LOF)** algorithm detects local outliers by:
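1. Estimating the local density of each data point from the distances to its k nearest neighbors.
2. Comparing that density with the local densities of those neighbors; the LOF score is the ratio of the neighbors' average density to the point's own density.
3. Treating scores close to 1 as normal (the point is about as dense as its neighbors) and scores substantially greater than 1 as local outliers.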
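
The sketch below applies scikit-learn's `LocalOutlierFactor` to made-up data with one dense cluster, one sparse cluster, and a single point placed just outside the dense cluster. The data and `n_neighbors=20` are assumptions for illustration. Note that scikit-learn stores the negated score in `negative_outlier_factor_`, so the sign is flipped to get the usual LOF value.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = rng.normal(loc=[0, 0], scale=0.1, size=(100, 2))
sparse = rng.normal(loc=[5, 5], scale=1.0, size=(100, 2))
# Near the dense cluster in global terms, but far relative to its spread.
local_outlier = np.array([[0.8, 0.8]])
X = np.vstack([dense, sparse, local_outlier])

lof = LocalOutlierFactor(n_neighbors=20)
pred = lof.fit_predict(X)                 # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_    # LOF score; about 1 for inliers

print("LOF of planted point:", round(scores[-1], 2))
print("prediction for planted point:", pred[-1])
print("median LOF of all points:", round(float(np.median(scores)), 2))
```

The planted point sits closer to the origin than most of the sparse cluster's points sit to their own center, yet it scores high because its neighbors are far denser than it is.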

### Q10. How can global outliers be detected using the Isolation Forest algorithm?

Ans - 

The **Isolation Forest** algorithm detects global outliers by:
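1. Constructing an ensemble of random binary trees, where each node splits the data on a randomly chosen feature at a randomly chosen value within that feature's range.
2. Recording, for each point, the number of splits (the path length) needed to isolate it, averaged over all trees.
3. Treating points that are isolated with fewer splits (shorter average path lengths) as global outliers.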
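
A short sketch with scikit-learn's `IsolationForest` is shown below. The synthetic data, the number of trees, and the `contamination` value are assumptions for illustration; in practice the contamination rate is rarely known in advance.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# 300 points from a standard normal plus two far-away points (made-up data).
rng = np.random.default_rng(0)
normal_points = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
planted = np.array([[8.0, 8.0], [-9.0, 7.0]])
X = np.vstack([normal_points, planted])

forest = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
forest.fit(X)

pred = forest.predict(X)           # -1 = outlier, 1 = inlier
scores = forest.score_samples(X)   # lower score = easier to isolate

print("predictions for planted points:", pred[300:])
print("mean score, planted points:", round(float(scores[300:].mean()), 3))
print("mean score, normal points: ", round(float(scores[:300].mean()), 3))
```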


### Q11. What are some real-world applications where local outlier detection is more appropriate than global outlier detection, and vice versa?

Ans - 

**Local Outlier Detection**:
- **Network Security**: Detecting unusual activities in specific network segments.
- **Sensor Networks**: Identifying faulty sensors in localized regions.
- **Healthcare**: Detecting abnormal patient readings in specific departments.

**Global Outlier Detection**:
- **Fraud Detection**: Identifying fraudulent transactions across an entire dataset.
- **Manufacturing**: Detecting defective products in a production line.
- **Finance**: Identifying outlier financial transactions or market anomalies.