

**Q1. What is anomaly detection and what is its purpose?**

### Answer 1

Anomaly detection is the identification of data points, events, or patterns that differ significantly from the majority of the data. Its purpose is to flag outliers, errors, or unusual events so they can be investigated or handled. Typical applications include fraud detection, network intrusion detection, and quality control.

----------

**Q2. What are the key challenges in anomaly detection?**

### Answer 2
The key challenges in anomaly detection include:

* **Data imbalance:** Anomalous data points are rare, and labeled examples of them are rarer still, which makes it difficult to train and evaluate an accurate anomaly detection model.
* **Outlier definition:** What counts as an anomaly depends on the application, so it should be defined before building an anomaly detection model.
* **Model selection:** There are a variety of anomaly detection algorithms available. It is important to select the right algorithm for the specific application.


----------------------------------------------------------------

**Q3. How does unsupervised anomaly detection differ from supervised anomaly detection?**

### Answer 3 

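Unsupervised anomaly detection uses no labeled data. It assumes anomalies are rare and flags data points that differ significantly from the majority of the data. Supervised anomaly detection trains a model on data labeled as normal or anomalous. It can be more accurate for known anomaly types, but it needs enough labeled anomalies, which are often scarce, and it may miss anomaly types that are absent from the training set.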

--------

**Q4. What are the main categories of anomaly detection algorithms?**

### Answer 4


The main categories of anomaly detection algorithms include:

* **Distance-based anomaly detection:** These algorithms identify outliers by looking at the distance between a data point and its neighbors.
* **Density-based anomaly detection:** These algorithms identify outliers by looking at the density of data points in a region.
* **Statistical anomaly detection:** These algorithms identify outliers by looking at statistical properties of the data, such as the mean, standard deviation, and kurtosis.
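* **Machine learning anomaly detection:** These algorithms learn a model of normal behavior and flag points that the model isolates easily or fits poorly. Examples include Isolation Forest, one-class SVM, and autoencoders.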
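As an illustration of the statistical category, here is a minimal z-score sketch. The function name `zscore_outliers`, the threshold of 3 standard deviations, and the synthetic data are assumptions chosen for the example, not part of the original answer.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask marking values more than `threshold` standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold


# Synthetic data (assumed): 200 points around 10, plus one injected outlier at 25
rng = np.random.default_rng(0)
data = np.append(rng.normal(loc=10.0, scale=1.0, size=200), 25.0)

mask = zscore_outliers(data)
print("Flagged values:", data[mask])
```

Because the mean and standard deviation are themselves pulled by extreme values, this simple rule works best when anomalies are few.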

--------

**Q5. What are the main assumptions made by distance-based anomaly detection methods?**

### Answer 5


The main assumptions made by distance-based anomaly detection methods include:

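* Normal data points lie close to their neighbors, while anomalies lie far from theirs.
* Outliers are rare, so they do not form dense neighborhoods of their own.
* The distance metric is meaningful for the data, which usually requires comparably scaled features and a moderate number of dimensions.

Distance-based methods do not require the data to be normally distributed.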
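A minimal sketch of a distance-based detector under these assumptions: each point is scored by the distance to its k-th nearest neighbor. The helper name `knn_distance_scores`, the choice of k = 5, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def knn_distance_scores(X, k=5):
    """Score each row of X by the Euclidean distance to its k-th nearest neighbor."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest other point.
    return np.sort(dists, axis=1)[:, k]


# Synthetic data (assumed): one Gaussian cluster plus a distant point
rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
X = np.vstack([cluster, [[8.0, 8.0]]])

scores = knn_distance_scores(X, k=5)
most_anomalous = int(np.argmax(scores))
print("Index of highest score:", most_anomalous)
```

The full pairwise matrix costs O(n²) memory, so this form only suits small datasets.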

----------------------------------------------------------------

**Q6. How does the LOF algorithm compute anomaly scores?**

### Answer 6

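The LOF (Local Outlier Factor) algorithm computes an anomaly score by comparing the local density of a data point with the local densities of its k nearest neighbors. The local density is estimated from the reachability distances to those neighbors, and the score is the average ratio of the neighbors' densities to the point's own density. A score close to 1 means the point is about as dense as its neighbors. A score well above 1 means the point lies in a sparser region than its neighbors and is more likely to be an outlier.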
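The scoring can be tried with scikit-learn's `LocalOutlierFactor`, as in the sketch below. The synthetic data and `n_neighbors=20` are assumptions for the example. Note that scikit-learn stores the negated score in `negative_outlier_factor_`, so the sign is flipped here to recover the usual "higher is more anomalous" convention.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (assumed): a tight cluster plus one isolated point
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(100, 2)), [[4.0, 4.0]]])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                 # -1 for outliers, 1 for inliers
lof_scores = -lof.negative_outlier_factor_  # about 1 for inliers, larger for outliers

print("Label of the isolated point:", labels[-1])
print("LOF of the isolated point:", round(float(lof_scores[-1]), 2))
```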

----------

**Q7. What are the key parameters of the Isolation Forest algorithm?**

### Answer 7

The key parameters of the Isolation Forest algorithm include:

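* The number of trees (`n_estimators` in scikit-learn).
* The subsample size used to build each tree (`max_samples`). This also bounds the depth of each tree at roughly log2 of the subsample size.
* The number of features drawn to train each tree (`max_features`). Each node then splits on a single randomly chosen feature at a random threshold.
* The expected proportion of anomalies (`contamination`), which sets the decision threshold.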
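A minimal sketch of setting these parameters with scikit-learn's `IsolationForest`. The parameter values and the synthetic data are assumptions chosen for illustration. `score_samples` returns the negated anomaly score, so it is flipped here so that higher values mean more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumed): one Gaussian cluster plus a distant point
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(500, 2)), [[6.0, 6.0]]])

forest = IsolationForest(
    n_estimators=100,      # number of trees
    max_samples=256,       # subsample size per tree
    max_features=1.0,      # fraction of features drawn for each tree
    contamination="auto",  # use the default score threshold
    random_state=0,
)
forest.fit(X)

scores = -forest.score_samples(X)       # higher means more anomalous
prediction = forest.predict(X[-1:])[0]  # -1 for outlier, 1 for inlier
print("Prediction for the distant point:", prediction)
```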

--------

**Q8. If a data point has only 2 neighbours of the same class within a radius of 0.5, what is its anomaly score using KNN with K=10?**

### Answer 8

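The question does not name a scoring rule. The most common KNN anomaly score is the distance to the K-th nearest neighbor (or the mean distance to the K nearest neighbors), which cannot be computed from a neighbor count alone. One simple heuristic that works with the information given is the fraction of the K expected neighbors that are missing within the radius:

```
anomaly_score = (k - number_of_neighbors) / k
```

With K = 10 and 2 neighbors within the radius, the score under this heuristic is (10 - 2) / 10 = 0.8. A value close to 1 indicates a sparsely populated neighborhood, so the point is likely to be an outlier.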
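The arithmetic can be checked with a short sketch. The function name `neighbor_deficit_score` is hypothetical, and the rule it implements is the heuristic used in this answer rather than a standard library scoring function.

```python
def neighbor_deficit_score(neighbors_within_radius, k):
    """Fraction of the k expected neighbors that are missing within the radius."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Cap the count at k so the score stays in [0, 1]
    found = min(neighbors_within_radius, k)
    return (k - found) / k


score = neighbor_deficit_score(neighbors_within_radius=2, k=10)
print(score)  # 0.8
```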

---------

**Q9. Using the Isolation Forest algorithm with 100 trees and a dataset of 3000 data points, what is the anomaly score for a data point that has an average path length of 5.0 compared to the average path length of the trees?**


### Answer 9

The Isolation Forest anomaly score is defined from the point's path length averaged over all trees, E[h(x)], and the average path length c(n) of an unsuccessful binary search tree lookup over n points:

```
s(x, n) = 2^(-E[h(x)] / c(n))
c(n)    = 2 * H(n - 1) - 2 * (n - 1) / n,   with H(i) ≈ ln(i) + 0.5772
```

The number of trees (100) only determines how many path lengths are averaged into E[h(x)] = 5.0. The calculation below assumes n = 3000, as stated in the question. scikit-learn's `IsolationForest` normalizes by the subsample size (`max_samples`, 256 by default) instead, and its `decision_function` returns a shifted score rather than a path length, so a fitted model cannot be used to answer this question directly.

```python
import numpy as np


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    harmonic = np.log(n - 1) + np.euler_gamma  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


n_samples = 3000        # dataset size from the question
mean_path_length = 5.0  # E[h(x)], averaged over the 100 trees

c_n = average_path_length(n_samples)
anomaly_score = 2.0 ** (-mean_path_length / c_n)

print(f"c(n) = {c_n:.3f}")
print(f"Anomaly score: {anomaly_score:.3f}")
```

```
c(n) = 15.167
Anomaly score: 0.796
```

A score of about 0.80 is well above 0.5, so the point is isolated much faster than average and is likely to be an anomaly.
