----
### <b>IBM - What is an Autoencoder?</b>

https://www.ibm.com/topics/autoencoder

##### <u>Overview</u>

- a type of neural network within the encoder-decoder family; variants include:
    - variational autoencoders (VAEs)
    - adversarial autoencoders (AAEs)
- useful for 
    - feature extraction (facial recognition, data compression, etc.)
    - generative tasks (image generation or generating time series data)
- <mark>designed to identify <i><b>latent variables</b></i>, or inferred pieces of information based on common factors between data</mark>
    - not directly observable
    - fundamentally inform how the data is distributed
    - exist in a <i>latent space</i> (i.e. the collection of these variables)
    - represent the most important pieces of information from the data
- <mark>uses <b>unsupervised</b> machine learning</mark>
    - the model learns which latent variables are the most important
    - then uses that information to try and <b>reconstruct original data</b>
    - does not rely on labeled training data or comparison with a ground truth, as supervised learning does
    - the model compares its outputs against the original input data


##### <u>How it Works</u>

Encoder $\rightarrow$ Bottleneck $\rightarrow$ Decoder

**Autoencoder Structure**
- Encoder compresses input data through dimensionality reduction.
- Bottleneck contains the most compressed representation of input.
- Decoder reconstructs data back to its original form.

**Purpose of Autoencoders**
- Discover the minimal set of important features needed for effective reconstruction.
- Reconstruction error measures the efficacy of the autoencoder.
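
The encoder, bottleneck, and decoder roles above can be sketched with a very small NumPy model. This is only an illustration under stated assumptions: the toy data, the layer sizes, the learning rate, and the names `encode`, `decode`, and `reconstruction_error` are all made up for this note, and the model is purely linear for brevity (a real autoencoder would add non-linear activations and usually more layers).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 200 samples with 8 features driven by 2 hidden factors.
factors = rng.normal(size=(200, 2))
X = factors @ rng.normal(size=(2, 8))

code_size = 2  # bottleneck width
W_enc = rng.normal(scale=0.1, size=(8, code_size))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(code_size, 8))  # decoder weights

def encode(x):
    """Compress 8 features down to the bottleneck code."""
    return x @ W_enc

def decode(z):
    """Reconstruct 8 features from the bottleneck code."""
    return z @ W_dec

def reconstruction_error(x):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((decode(encode(x)) - x) ** 2))

initial_error = reconstruction_error(X)

learning_rate = 0.05
for _ in range(2000):
    Z = encode(X)
    X_hat = decode(Z)
    grad_out = 2.0 * (X_hat - X) / X.size      # gradient of the MSE w.r.t. X_hat
    grad_dec = Z.T @ grad_out                  # backpropagate into the decoder
    grad_enc = X.T @ (grad_out @ W_dec.T)      # backpropagate into the encoder
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

final_error = reconstruction_error(X)
print(f"reconstruction error: {initial_error:.4f} -> {final_error:.4f}")
```

No labels are involved: the training target is the input itself, and the reconstruction error is the only signal used to judge the model.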

**Role of Decoder**
- In some cases, the decoder is discarded after training; its only purpose was to train the encoder.
- In VAEs, the decoder generates new data samples after training.

**Advantages of Autoencoders**
- Capture complex non-linear correlations.
- Use non-linear activation functions like sigmoid.

**Adaptations and Hyperparameters**
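- Code size (the size of the bottleneck) determines how much the data is compressed.
- Number of layers trades model complexity against processing speed.
- Number of nodes per layer varies with the nature of the data.
- Loss function measures reconstruction error; minimizing it optimizes the model weights during training.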
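
As a rough illustration of how these hyperparameters interact, the sketch below computes the compression ratio implied by a code size and the number of trainable parameters implied by a choice of layers and nodes. The layer sizes (784 inputs, one hidden layer of 128, code size 32) and both helper names are hypothetical, and fully connected layers are assumed.

```python
def compression_ratio(input_dim, code_size):
    """How many input values are squeezed into each bottleneck value."""
    return input_dim / code_size

def dense_parameter_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical layout: 784 inputs (e.g. a 28x28 image), one hidden layer, code size 32.
encoder_sizes = [784, 128, 32]
decoder_sizes = encoder_sizes[::-1]  # mirror image of the encoder

ratio = compression_ratio(encoder_sizes[0], encoder_sizes[-1])
n_params = dense_parameter_count(encoder_sizes) + dense_parameter_count(decoder_sizes)
print(ratio, n_params)
```

A smaller code size raises the compression ratio, while adding layers or nodes raises the parameter count and therefore the cost of training and inference.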


##### <u>Use Cases</u>

**Data Compression:** Autoencoders naturally learn a compressed representation of the input data.

**Dimensionality Reduction:**
  - Encodings learned by autoencoders can be used in larger neural networks.
  - Reducing complexity can extract relevant features for other tasks and increase efficiency.

**Anomaly Detection and Facial Recognition:** Autoencoders detect anomalies by comparing an input's reconstruction loss against the loss seen on normal examples; an unusually high loss flags an anomaly.
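
A minimal sketch of that comparison, assuming the reconstruction losses have already been computed by some trained autoencoder. The loss values, the mean-plus-three-standard-deviations rule, and the helper names are illustrative assumptions, not something the article prescribes.

```python
from statistics import mean, stdev

def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k standard deviations of the losses on normal data."""
    return mean(normal_errors) + k * stdev(normal_errors)

def is_anomaly(error, threshold):
    """Flag an input whose reconstruction loss exceeds the threshold."""
    return error > threshold

# Hypothetical reconstruction losses measured on known-normal examples.
normal_errors = [0.020, 0.025, 0.018, 0.022, 0.027, 0.021]
threshold = fit_threshold(normal_errors)

print(is_anomaly(0.023, threshold))  # loss in the normal range
print(is_anomaly(0.150, threshold))  # loss far above the normal range
```

The idea is that a model trained only on normal data reconstructs unfamiliar inputs poorly, so their loss stands out.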

**Image and Audio Denoising:** Denoising autoencoders remove extraneous artifacts or corruption in data.
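
What distinguishes denoising training is the pairing of a corrupted input with a clean target. The sketch below shows only that pairing and the loss, with no model; the signal values, the Gaussian noise level, and the function names are assumptions for illustration.

```python
import random

def corrupt(signal, noise_std, rng):
    """Return a noisy copy of the signal (additive Gaussian noise)."""
    return [value + rng.gauss(0.0, noise_std) for value in signal]

def mse(prediction, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)

rng = random.Random(0)
clean = [0.0, 0.5, 1.0, 0.5, 0.0]
noisy = corrupt(clean, noise_std=0.1, rng=rng)

# The model receives `noisy` as input, but its output is scored against `clean`.
# A "model" that merely copies its input is therefore penalised:
loss_of_identity = mse(noisy, clean)
# while one that recovers the clean signal exactly has zero loss:
loss_of_perfect_denoiser = mse(clean, clean)
print(loss_of_identity, loss_of_perfect_denoiser)
```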

**Image Reconstruction:** Autoencoders can fill in missing elements or colorize images.

**Generative Tasks:** VAEs and AAEs have been successful in generative tasks, including image and molecular structure generation.


----
### <b>Datacamp: Introduction to Autoencoders</b>

https://www.datacamp.com/tutorial/introduction-to-autoencoders

![Autoencoder Architecture](../images/autoencoder_architecture.png)

----
### <b>Nima - Machines Learn to Infer Stellar Parameters</b>

https://arxiv.org/abs/2009.12872

- the goal was to compress the data into a low-dimensional representation and reconstruct it; it turned out the model learned meaningful structure directly from the raw numbers
- note that spectra (signals) are 1-dimensional, whereas images are 2-dimensional

##### <u>Deterministic Convolutional Autoencoder</u>

**Architecture**
  - Combination of convolutional, up-convolutional, and fully connected layers.
  - 15 convolutional layers in the encoder part.
  - Bottleneck transforms vectors down to 512 vectors of length 20.
  - Code size chosen based on desired compression rate.

**Reconstruction Loss**
  - A per-pixel L1 loss is minimized for pixel-level accuracy in the reconstructed spectrum.
  - The reconstruction error $L_{AE}$ is computed empirically.
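
For reference, a per-pixel L1 loss is commonly written as below. The averaging over $N$ pixels and the symbols $x_i$ (input flux at pixel $i$) and $\hat{x}_i$ (reconstructed flux) are assumptions made here; these notes do not record the paper's exact normalization.

$$
L_{AE} = \frac{1}{N} \sum_{i=1}^{N} \left| x_i - \hat{x}_i \right|
$$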

**Median Normalization**
  - Normalization of input spectra for stability of training process.
  - Original input spectra normalized to ensure consistent value ranges.
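
One plausible reading of median normalization is to divide each spectrum by its own median flux, so that every spectrum is centered around 1. The sketch below assumes that reading; the function name and the sample values are hypothetical, and the paper's exact procedure may differ.

```python
from statistics import median

def median_normalize(flux):
    """Scale a spectrum so that its median flux becomes 1."""
    m = median(flux)
    if m == 0:
        raise ValueError("cannot normalize a spectrum whose median flux is zero")
    return [value / m for value in flux]

# Hypothetical flux values for one spectrum.
spectrum = [2.0, 4.0, 6.0, 8.0, 10.0]
normalized = median_normalize(spectrum)
print(normalized)
```

After this step, spectra of bright and faint objects share a comparable value range, which is the consistency the notes refer to.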

**Learning Disentangled Representations**
  - Key to interpretability.
  - Traditional methods of enforcing disentanglement are built on variational autoencoders (VAEs).

##### <u>Dataset</u>

**HARPS Instrument**
  - Dataset built from observations using HARPS instrument.
  - Resolving power of 115,000, covering the spectral range 378–691 nm.
  - ∼270,000 HARPS fully reduced spectra used in investigations.

**Imbalanced Observations**
  - Visibility balancing technique incorporated during training to handle dataset imbalances.
  - Parallel experiments conducted to compare results with and without considering dataset imbalances.
  - Unique list of objects extracted to ensure each object is looked at only once.
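
These notes do not record how the paper's balancing technique works, so the following is not that method. It is a generic sketch of two related ideas: extracting a unique object list, and weighting each observation inversely to how often its object was observed so that every object contributes equally. The object IDs and the helper name are hypothetical.

```python
from collections import Counter

def balanced_weights(object_ids):
    """Per-observation weights that give every object the same total weight."""
    counts = Counter(object_ids)
    n_objects = len(counts)
    return [1.0 / (n_objects * counts[obj]) for obj in object_ids]

# Hypothetical observation log: one object observed three times, another once.
observations = ["star_a", "star_a", "star_a", "star_b"]

unique_objects = sorted(set(observations))  # each object listed only once
weights = balanced_weights(observations)
print(unique_objects, weights)
```

With these weights, the three observations of `star_a` together count as much as the single observation of `star_b`.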

##### <u>Reconstruction Results</u>

**Deterministic AutoEncoder**
  - Quality of reconstructed spectra dependent on bottleneck size.
  - Reconstructions displayed with various bottleneck sizes.
  - Higher bottleneck dimensions result in more accurate reconstruction of fine features.
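
The link between bottleneck size and reconstruction accuracy can be illustrated with a truncated SVD, which acts as the best possible linear autoencoder for a given code size. This is a stand-in chosen for brevity, not the convolutional model from the paper, and the random data and bottleneck sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples with 16 features, centered per feature.
X = rng.normal(size=(100, 16))
X -= X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)

def reconstruction_error(k):
    """MSE after keeping only the k strongest components (code size k)."""
    X_hat = (U[:, :k] * S[:k]) @ Vt[:k]
    return float(np.mean((X - X_hat) ** 2))

errors = {k: reconstruction_error(k) for k in (2, 4, 8, 16)}
for k, err in errors.items():
    print(f"bottleneck size {k:2d}: MSE = {err:.6f}")
```

The error shrinks as the bottleneck widens and vanishes (up to rounding) once the code size matches the input dimension, at which point nothing is being compressed.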

**With Disentangled Features**
  - Disentanglement affects reconstruction quality.
  - Achieving high-quality reconstruction and disentanglement simultaneously requires more bottleneck dimensions.

**Training Set vs. Validation Set**
  - HARPS dataset split into training and validation subsets for monitoring training process and avoiding overfitting.
  - No meaningful difference in reconstruction quality observed across subsets.