# Task 4: Network Anomaly Detection using a Deep Autoencoder

## Project Overview

**Objective:**
The primary goal of this project is to develop and evaluate a deep autoencoder model for detecting anomalies in network traffic. The model will be trained to distinguish between normal network connections and various types of malicious attacks, such as Denial-of-Service (DoS), port scanning, and unauthorized access attempts.

**Dataset:**
The project uses the **KDD Cup 1999 dataset**, which is derived from the 1998 DARPA intrusion detection evaluation data prepared by MIT Lincoln Labs. We will be working with the `kddcup.data_10_percent.gz` subset, which contains 494,021 network connection records. Each record is described by 41 features and is labeled as either `normal.` or a specific type of attack.
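To make the record layout concrete, the following is a minimal sketch of how labeled records could be turned into model inputs. The three rows are made up for illustration and use only six of the 41 feature columns; the `is_anomaly` column name and the hand-rolled min-max scaling are assumptions of this sketch, not necessarily what the rest of the notebook does.

```python
import io
import pandas as pd

# Three made-up rows (illustrative only) using a subset of the KDD feature names.
toy_csv = io.StringIO(
    "0,tcp,http,SF,181,5450,normal.\n"
    "0,icmp,ecr_i,SF,1032,0,smurf.\n"
    "0,tcp,private,S0,0,0,neptune.\n"
)
columns = ["duration", "protocol_type", "service", "flag",
           "src_bytes", "dst_bytes", "label"]
df = pd.read_csv(toy_csv, header=None, names=columns)

# Binary target: 0 for normal traffic, 1 for any attack type.
# Note the trailing period in the raw label "normal.".
df["is_anomaly"] = (df["label"] != "normal.").astype(int)

categorical = ["protocol_type", "service", "flag"]
numeric = ["duration", "src_bytes", "dst_bytes"]

# One-hot encode the categorical columns; numeric columns pass through.
features = pd.get_dummies(df[numeric + categorical], columns=categorical, dtype=float)

# Min-max scale the numeric columns to [0, 1]; constant columns keep a
# divisor of 1 so they map to 0 instead of producing NaN.
col_min = features[numeric].min()
span = (features[numeric].max() - col_min).replace(0, 1)
features[numeric] = (features[numeric] - col_min) / span

print(features.shape, df["is_anomaly"].tolist())
```

On the full dataset, the same idea applies to all 41 columns, with the scaler fitted on the training portion only.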

**Methodology:**
The core approach is to build an autoencoder, a type of neural network trained to reconstruct its input data. The key steps of the methodology are:
1.  **Data Loading and Preprocessing:** Load the dataset, assign correct column names, and perform necessary preprocessing, including scaling numerical features and encoding categorical ones.
2.  **Model Architecture:** Design a deep autoencoder with multiple dense layers for both the encoder and the decoder.
3.  **Training Strategy:** Crucially, the autoencoder will be trained **exclusively on data corresponding to 'normal' network traffic**. The underlying hypothesis is that the model will learn to reconstruct normal data with low error but will struggle to reconstruct anomalous data (attacks), which therefore produces a high reconstruction error.
4.  **Evaluation:** The reconstruction error will serve as an anomaly score. By setting an appropriate threshold on this error, we can classify connections as either normal or anomalous. The model's performance will be evaluated on a test set containing both normal and anomalous data using metrics such as the confusion matrix, Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC).
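The training and evaluation logic in steps 3 and 4 can be sketched without a neural network. The example below is a hedged illustration only: it swaps the deep autoencoder for a linear stand-in (PCA computed with NumPy), uses synthetic data rather than KDD records, and picks the threshold as the 99th percentile of training errors, which is an assumption of this sketch rather than a choice stated in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: "normal" points lie close to a 2-D subspace of an
# 8-D feature space, while "attack" points are spread over all 8 dimensions.
basis = rng.normal(size=(2, 8))
normal_train = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 8))
normal_test = rng.normal(size=(100, 2)) @ basis + 0.05 * rng.normal(size=(100, 8))
attack_test = 2.0 * rng.normal(size=(100, 8))

# "Train" on normal data only: a 2-component PCA acts as a linear autoencoder.
mean = normal_train.mean(axis=0)
_, _, vt = np.linalg.svd(normal_train - mean, full_matrices=False)
components = vt[:2]

def reconstruct(x):
    """Encode to 2 dimensions, then decode back to the input space."""
    code = (x - mean) @ components.T
    return code @ components + mean

def reconstruction_error(x):
    """Per-sample mean squared error, used as the anomaly score."""
    return np.mean((x - reconstruct(x)) ** 2, axis=1)

# Assumed rule: flag anything above the 99th percentile of training error.
threshold = np.percentile(reconstruction_error(normal_train), 99)

x_test = np.vstack([normal_test, attack_test])
y_true = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])
scores = reconstruction_error(x_test)
y_pred = (scores > threshold).astype(int)

# Confusion-matrix entries computed by hand.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))
recall = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

Moving the threshold trades false positives against missed attacks, which is why the ROC curve and AUC, computed from the raw scores rather than from a single cut-off, are useful complements to the confusion matrix.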

**Tools and Libraries:**
*   **Python 3.x**
*   **Pandas & NumPy** for data manipulation.
*   **Scikit-learn** for data preprocessing (scaling, splitting).
*   **TensorFlow/Keras** for building and training the deep autoencoder model.
*   **Matplotlib & Seaborn** for data visualization.

### Import required libraries

In [None]:
import pandas as pd
import numpy as np