# <div align="center">Autoencoder</div>

- Anomaly detection is the machine learning task of finding data points or patterns that deviate from normal behavior. These anomalies, or outliers, may signal errors, fraud, or unusual events that require further analysis. It is widely applied in finance, cybersecurity, healthcare, and predictive maintenance, using techniques such as statistical models, clustering, and deep learning.
- Popular anomaly detection methods include PCA, K-Nearest Neighbors, Isolation Forest, and ensemble techniques. In deep learning, autoencoders are widely used: they learn the patterns of normal data and flag deviations from those patterns as anomalies. Evaluating such models is difficult because anomalies are rare. Plain accuracy is misleading (a model that labels everything as normal still scores highly), so metrics like precision, recall, and F1-score, along with cross-validation, are used to assess performance.
- `Why not PCA`: PCA is a dimensionality reduction technique that identifies the directions of maximum variance. It can support anomaly detection by projecting data onto the top principal components and treating a large reconstruction error (distance from that linear subspace) as a sign of an anomaly. However, it assumes linear relationships and may fail on nonlinear or complex datasets. It can also be less effective at detecting rare anomalies when normal patterns dominate the variance of the data.
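A minimal NumPy sketch of PCA-based anomaly scoring, assuming the score is the squared reconstruction error after projecting onto the top-k principal components. The synthetic data and the helper names `fit_pca` and `pca_anomaly_score` are hypothetical illustrations. The normal data lies close to a plane, so a 2-component linear subspace fits it well and a point off the plane receives a much larger score:

```python
import numpy as np


def fit_pca(X, k):
    """Return the mean and the top-k principal directions of X (rows = samples)."""
    mu = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]


def pca_anomaly_score(X, mu, components):
    """Squared reconstruction error after projecting onto the principal subspace."""
    Z = (X - mu) @ components.T      # project to k dimensions
    X_hat = Z @ components + mu      # map back to the original space
    return np.sum((X - X_hat) ** 2, axis=1)


rng = np.random.default_rng(42)
# Hypothetical "normal" data: 3-D points lying close to the plane z = x + y
xy = rng.normal(size=(200, 2))
normal = np.column_stack([xy, xy.sum(axis=1) + 0.05 * rng.normal(size=200)])
mu, comps = fit_pca(normal, k=2)

off_plane = np.array([[0.0, 0.0, 3.0]])  # far from the plane z = x + y
scores_normal = pca_anomaly_score(normal, mu, comps)
score_off = pca_anomaly_score(off_plane, mu, comps)
print(scores_normal.max(), score_off[0])
```

If the normal data instead lay on a curved surface, no linear subspace would fit it closely, so normal points would also leave large residuals; that is the linearity limitation described above.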

- `Autoencoders` are a type of neural network architecture that can be used for unsupervised learning, dimensionality reduction, and data compression. The goal of an autoencoder is to learn a compressed representation of the input data: it encodes the input into a lower-dimensional representation and then decodes that representation back into an approximation of the original input.
- Autoencoders have three parts: an encoder that compresses the input into a lower-dimensional representation, a bottleneck layer (the code) that holds this representation, and a decoder that reconstructs the original data from it.
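- Autoencoders are trained to minimize the reconstruction error (for example, the mean squared error) between the input and its reconstruction, which forces the code to become a compressed representation of the data. With non-linear activation functions they can capture complex, non-linear patterns, and they are useful for tasks like image compression, anomaly detection, and (through variants such as variational autoencoders) data generation.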
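- In anomaly detection, autoencoders are trained on normal data only, so they learn to reconstruct its typical patterns well. New data is encoded and reconstructed; inputs unlike the training data tend to be reconstructed poorly, so if the reconstruction error exceeds a chosen threshold (for example, a high percentile of the errors on normal data), the input is flagged as an anomaly. This approach does not need labeled anomalies in the training set.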
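The training-and-thresholding procedure above can be sketched in plain NumPy. This is a minimal illustration, not the model used in this project: the `TinyAutoencoder` class, the synthetic curve-shaped "genuine" data, the 2-unit tanh bottleneck, full-batch gradient descent, and the 99th-percentile threshold are all assumptions. A real implementation would typically use a deep learning framework. The sketch trains on normal data only, flags test points whose reconstruction error exceeds the threshold, and computes precision, recall, and F1:

```python
import numpy as np

rng = np.random.default_rng(0)


def make_normal(n):
    """Hypothetical 'genuine' data: noisy points on a non-linear 1-D curve in 3-D."""
    t = rng.uniform(-1.0, 1.0, size=(n, 1))
    x = np.hstack([t, t ** 2, np.sin(3 * t)])
    return x + 0.01 * rng.normal(size=x.shape)


class TinyAutoencoder:
    """Encoder: n_in -> n_code (tanh). Decoder: n_code -> n_in (linear)."""

    def __init__(self, n_in, n_code, seed=0):
        r = np.random.default_rng(seed)
        self.W1 = r.normal(scale=0.5, size=(n_in, n_code))
        self.b1 = np.zeros(n_code)
        self.W2 = r.normal(scale=0.5, size=(n_code, n_in))
        self.b2 = np.zeros(n_in)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)  # encoder output = code
        return H, H @ self.W2 + self.b2     # decoder output = reconstruction

    def errors(self, X):
        """Per-sample reconstruction error (mean squared error over features)."""
        _, X_hat = self.forward(X)
        return np.mean((X - X_hat) ** 2, axis=1)

    def fit(self, X, epochs=4000, lr=0.05):
        """Full-batch gradient descent on the mean squared reconstruction error."""
        n, d = X.shape
        losses = []
        for _ in range(epochs):
            H, X_hat = self.forward(X)
            diff = X_hat - X
            losses.append(float(np.mean(diff ** 2)))
            g_out = 2.0 * diff / (n * d)   # dLoss/dX_hat
            g_H = g_out @ self.W2.T        # backpropagate into the code
            g_Z = g_H * (1.0 - H ** 2)     # through the tanh derivative
            self.W2 -= lr * (H.T @ g_out)
            self.b2 -= lr * g_out.sum(axis=0)
            self.W1 -= lr * (X.T @ g_Z)
            self.b1 -= lr * g_Z.sum(axis=0)
        return losses


X_train = make_normal(500)                 # genuine samples only
ae = TinyAutoencoder(n_in=3, n_code=2)
losses = ae.fit(X_train)

# Threshold (assumption): 99th percentile of the training reconstruction errors
threshold = np.percentile(ae.errors(X_train), 99)

# Hypothetical test set: 100 new normal points + 10 points far from the curve
X_test = np.vstack([make_normal(100), rng.uniform(5.0, 6.0, size=(10, 3))])
y_test = np.array([0] * 100 + [1] * 10)    # 1 = anomaly
y_pred = (ae.errors(X_test) > threshold).astype(int)

tp = int(np.sum((y_pred == 1) & (y_test == 1)))
fp = int(np.sum((y_pred == 1) & (y_test == 0)))
fn = int(np.sum((y_pred == 0) & (y_test == 1)))
precision = tp / max(tp + fp, 1)
recall = tp / max(tp + fn, 1)
f1 = 2 * precision * recall / max(precision + recall, 1e-12)
print(f"threshold={threshold:.4f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The threshold trades precision against recall: a lower percentile flags more anomalies but also more genuine samples, which is why precision, recall, and F1 are reported rather than accuracy.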

#### Step 1: Load the Data
#### Step 2: Data Analysis and Cleaning
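- `df.info()`: column dtypes and non-null counts
- `df.shape`: number of rows and columns
- `df.isna().sum()`: number of missing (null) values per column
- `df.describe()`: summary statistics (count, mean, std, min, quartiles, max) of the numerical columns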
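Steps 1 and 2 can be sketched with pandas as follows. The dataset's path is not given here, so a small in-memory CSV with hypothetical columns (`amount`, `distance`, `online_order`) stands in for `pd.read_csv(...)` on the real file; only the `fraud` column comes from this document, and the `dropna()` cleaning step is an assumption:

```python
import io

import pandas as pd

# Stand-in for pd.read_csv on the real dataset; these columns are hypothetical
csv_text = """amount,distance,online_order,fraud
12.5,3.1,1,0
250.0,80.4,1,1
33.2,,0,0
8.9,1.2,0,0
"""
df = pd.read_csv(io.StringIO(csv_text))

df.info()               # column dtypes, non-null counts, memory usage
print(df.shape)         # (rows, columns)
print(df.isna().sum())  # missing values per column
print(df.describe())    # summary statistics of the numerical columns

# One possible cleaning step (assumption): drop rows that contain missing values
df = df.dropna()
print(df.shape)
```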
#### Step 3: Exploratory Data Analysis (EDA)
- `df['fraud'].value_counts()`: count the samples in each class, then plot the class distribution (e.g. with `df['fraud'].value_counts().plot(kind='bar')`) and interpret it.
  - Inference: The dataset is heavily imbalanced. The autoencoder only needs genuine samples for training; the fraud samples are held out for testing.
- `sns.heatmap(df.corr(), annot=True, fmt='.4f', cmap='Blues')` followed by `plt.title('Correlation Matrix')`: plot the correlation matrix of the features.
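  - Inferences: Most correlation values are very close to 0, which indicates that the features are only weakly linearly correlated. PCA relies on linear correlations between features to find a compact representation, so it has little to exploit here, and it cannot model non-linear relationships (which a near-zero Pearson correlation does not rule out). This is where autoencoders prove useful: with non-linear activations, they can capture complex relationships in the data.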
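The EDA checks and the train/test split implied by the inferences above can be sketched with pandas. The data below is synthetic and hypothetical (990 genuine rows, 10 fraud rows, two independent feature columns), and the 80/20 split of genuine rows is an assumption. Plotting is left out; the numbers are what the bar chart and heatmap would show:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical, heavily imbalanced dataset: 990 genuine (0) and 10 fraud (1) rows
n = 1000
df = pd.DataFrame({
    "amount": rng.exponential(50.0, size=n),
    "distance": rng.exponential(10.0, size=n),
    "fraud": np.r_[np.zeros(990, dtype=int), np.ones(10, dtype=int)],
})

counts = df["fraud"].value_counts()
print(counts)                                     # 0 -> 990, 1 -> 10
print(df["fraud"].value_counts(normalize=True))   # class proportions

# Largest absolute off-diagonal correlation between the feature columns
corr = df.drop(columns="fraud").corr().to_numpy(copy=True)
np.fill_diagonal(corr, 0.0)
max_abs_corr = np.abs(corr).max()
print(f"max |corr| between features: {max_abs_corr:.4f}")

# Autoencoder split: train on genuine rows only; test on held-out genuine rows plus all fraud rows
genuine = df[df["fraud"] == 0].sample(frac=1.0, random_state=0)  # shuffle before splitting
fraud = df[df["fraud"] == 1]
n_train = int(0.8 * len(genuine))
X_train = genuine.iloc[:n_train].drop(columns="fraud")
test = pd.concat([genuine.iloc[n_train:], fraud])
X_test, y_test = test.drop(columns="fraud"), test["fraud"]
print(len(X_train), len(X_test), int(y_test.sum()))
```

Keeping some genuine rows in the test set matters: without them, precision could not be measured, because every test row would be fraud.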
#### Step 4: Feature Engineering
- Purpose (`make_column_transformer`): It applies a transformer only to the selected columns (typically those containing numerical features). By default (`remainder='drop'`) all other columns are dropped; set `remainder='passthrough'` to keep them unchanged, or pass additional `(transformer, columns)` tuples to transform them differently.
- Functionality: `StandardScaler()` standardizes each feature by removing the mean and scaling to unit variance, i.e. `z = (x - mean) / std`. `num_feats` should be a list or array of column names or indices containing numerical features.
- Result: When the transformer's `.fit_transform()` method is called on a DataFrame, the numerical columns in `num_feats` are standardized, and the other columns are dropped unless `remainder` or other transformer tuples specify otherwise.
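A small scikit-learn sketch of this behavior, assuming a hypothetical DataFrame with two numerical columns and one non-numerical column (`merchant`). It shows that only `num_feats` are standardized and that the other column is dropped by default or kept with `remainder="passthrough"`:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two numerical features plus a non-numerical column
df = pd.DataFrame({
    "amount": [10.0, 20.0, 30.0, 40.0],
    "distance": [1.0, 1.0, 3.0, 3.0],
    "merchant": ["a", "b", "a", "c"],
})
num_feats = ["amount", "distance"]

# Scale only num_feats; remainder="drop" (the default) discards every other column
ct = make_column_transformer((StandardScaler(), num_feats), remainder="drop")
X = ct.fit_transform(df)
print(X.shape)                        # (4, 2): "merchant" was dropped
print(X.mean(axis=0), X.std(axis=0))  # approximately [0, 0] and [1, 1]

# Alternative: keep the other columns unchanged instead of dropping them
ct_keep = make_column_transformer((StandardScaler(), num_feats), remainder="passthrough")
X_keep = ct_keep.fit_transform(df)
print(X_keep.shape)                   # (4, 3)
```

In an anomaly detection pipeline, the transformer would typically be fit on the genuine training rows only and then applied to the test rows with `.transform()`, so that test-set statistics do not leak into the scaling.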