# Learning Rate (LR) Scheduling Techniques:

## 1. Predetermined Piecewise Scheduling

#### In predetermined piecewise scheduling, the learning rate is set manually for fixed intervals of training. It stays constant for a defined number of steps or epochs and then drops to the next preset value.

##### Advantages:
###### - Simple and easy to implement.
###### - Useful when you have domain knowledge or experience with similar tasks.

##### Disadvantages:
###### - Requires manual tuning and experimentation.
###### - Not adaptive to the model's performance.

In [None]:
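import tensorflow as tf

# A schedule passed to the optimizer is evaluated per optimizer step (batch),
# not per epoch, so the epoch milestones are converted to step counts here.
steps_per_epoch = 100  # assumption: replace with your number of batches per epoch
boundaries = [10 * steps_per_epoch, 20 * steps_per_epoch, 30 * steps_per_epoch]
values = [0.1, 0.01, 0.001, 0.0001]  # one more value than boundaries

learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)

# `model` is assumed to be a tf.keras model defined earlier in the notebook.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate_fn),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])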
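
The lookup that a piecewise constant schedule performs can be sketched in plain Python. The helper name `piecewise_constant` is hypothetical and is not part of TensorFlow. It only mirrors the documented behaviour: the first value applies while the step is at or below the first boundary, and the last value applies after the final boundary.

```python
def piecewise_constant(step, boundaries, values):
    """Return the learning rate for `step` (hypothetical helper, plain Python)."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have one more entry than boundaries")
    # Walk the boundaries in order and return the value of the first
    # interval whose upper boundary has not been passed yet.
    for boundary, value in zip(boundaries, values):
        if step <= boundary:
            return value
    # Past the last boundary: the final value applies for the rest of training.
    return values[-1]


boundaries = [10, 20, 30]
values = [0.1, 0.01, 0.001, 0.0001]

for step in (0, 10, 11, 25, 31):
    print(step, piecewise_constant(step, boundaries, values))
```

The units of `step` are whatever the caller counts. When such a schedule is handed to a Keras optimizer, the count is in optimizer steps rather than epochs.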


## 2. Performance-Based Scheduling

#### In performance-based scheduling (implemented in Keras as the `ReduceLROnPlateau` callback), the learning rate is reduced when a monitored metric (e.g., validation loss) has stopped improving for a set number of epochs.

##### Advantages:
###### - Adaptive and reduces the learning rate when progress slows.
###### - Helps in fine-tuning the model in later stages of training.

##### Disadvantages:
###### - May stall training at a suboptimal solution if the learning rate is reduced too early or too sharply.

In [None]:
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=5, min_lr=0.00001)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(train_data, train_labels, validation_data=(val_data, val_labels),
          epochs=50, callbacks=[reduce_lr])
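
The plateau rule can be sketched in plain Python to show when a reduction fires. This is a simplified model and not the Keras implementation: the function name `simulate_reduce_on_plateau` is hypothetical, and options such as `min_delta` and `cooldown` are deliberately left out.

```python
def simulate_reduce_on_plateau(val_losses, lr, factor=0.1, patience=5, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (simplified sketch)."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Shrink the rate, but never go below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Three improving epochs, five epochs without improvement, then one improvement.
losses = [1.0, 0.8, 0.7] + [0.7] * 5 + [0.65]
history = simulate_reduce_on_plateau(losses, lr=0.1)
print(history)
```

In this run the rate stays at 0.1 until the fifth epoch without improvement, then drops by the factor of 0.1.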


## 3. Exponential Scheduling

#### Exponential decay multiplies the learning rate by a constant factor (less than 1) every fixed number of steps or epochs, so the rate shrinks geometrically over training.

##### Advantages:
###### - Smooth decay in learning rate helps fine-tuning.
###### - Can accelerate convergence initially.

##### Disadvantages:
###### - May reduce learning rate too quickly if not configured properly.

In [None]:
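initial_learning_rate = 0.1
# With staircase=True the rate is multiplied by 0.96 once every 10,000
# optimizer steps; with staircase=False it decays smoothly at every step.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate, decay_steps=10000, decay_rate=0.96, staircase=True)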

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
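
The decay rule itself is short enough to write out by hand. The sketch below is plain Python with a hypothetical helper name, `exponential_decay`. It follows the usual formula `initial_lr * decay_rate ** (step / decay_steps)`, with the exponent floored when `staircase` is enabled.

```python
import math


def exponential_decay(step, initial_lr=0.1, decay_steps=10000,
                      decay_rate=0.96, staircase=True):
    """Learning rate at `step` under exponential decay (hypothetical helper)."""
    exponent = step / decay_steps
    if staircase:
        # Hold the rate constant within each block of decay_steps steps.
        exponent = math.floor(exponent)
    return initial_lr * decay_rate ** exponent


for step in (0, 9999, 10000, 25000):
    print(step, exponential_decay(step), exponential_decay(step, staircase=False))
```

With the staircase variant the rate is unchanged for the first 10,000 steps, while the smooth variant begins shrinking immediately.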


## 4. Power Scheduling

#### Power scheduling reduces the learning rate at a polynomial rate with respect to time. The learning rate is typically proportional to $t^{-\text{power}}$, where $t$ is the step or epoch number.

##### Advantages:
###### - Decays more gradually than exponential scheduling, which gives finer control over the later stages of training.
###### - Useful in large-scale problems where fine-tuning is crucial.

##### Disadvantages:
###### - Needs careful tuning of the power parameter.
###### - Reduces the learning rate slowly.

In [None]:
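initial_learning_rate = 0.1
# PolynomialDecay moves from initial_learning_rate to end_learning_rate over
# decay_steps optimizer steps, following (1 - step / decay_steps) ** power.
# This is a polynomial schedule, but not a strict t ** -power decay.
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate, decay_steps=10000, end_learning_rate=0.0001, power=0.5)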

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
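
The two shapes can be compared in plain Python. Both helper names are hypothetical. `power_schedule` assumes one common formulation of power scheduling, `initial_lr / (1 + step / decay_steps) ** power`, where the `1 +` shift is an assumption that avoids dividing by zero at step 0. `polynomial_decay` follows the closed form documented for `PolynomialDecay` without cycling.

```python
def power_schedule(step, initial_lr=0.1, decay_steps=10000, power=0.5):
    """Power scheduling: the rate falls roughly like step ** -power."""
    return initial_lr / (1 + step / decay_steps) ** power


def polynomial_decay(step, initial_lr=0.1, decay_steps=10000,
                     end_lr=0.0001, power=0.5):
    """Polynomial decay from initial_lr to end_lr over decay_steps steps."""
    step = min(step, decay_steps)  # the rate stays at end_lr afterwards
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


for step in (0, 5000, 10000, 30000):
    print(step, power_schedule(step), polynomial_decay(step))
```

The power schedule keeps shrinking indefinitely and never reaches zero, whereas polynomial decay reaches `end_lr` at `decay_steps` and stays there.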
