# Deep Learning Overview

## Neural Network Components

- Input layer: takes raw features (e.g., image pixels as 2D/3D arrays).
- Hidden layers: weighted sums + biases pass through activations to learn representations.
- Activations: add nonlinearity (ReLU, sigmoid, tanh) so the model can fit complex patterns.
- Output layer: produces task-specific predictions (e.g., class scores for cat/dog/bird).
- Training: adjust weights/biases via backpropagation and gradient descent to reduce loss.
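
The components above can be sketched as a single forward pass. This is a minimal illustration in plain Python, not a trainable network: the layer sizes, the weight values, and the helper names `dense`, `relu`, `softmax`, and `forward` are all assumptions chosen for readability.

```python
import math

def dense(x, weights, biases):
    """Weighted sum plus bias for each unit in a layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """ReLU activation: negative pre-activations become zero."""
    return [max(0.0, v) for v in values]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: 3 input features -> 2 hidden units -> 3 classes (cat/dog/bird)
W_hidden = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b_hidden = [0.0, 0.1]
W_out = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
b_out = [0.0, 0.0, 0.0]

def forward(x):
    hidden = relu(dense(x, W_hidden, b_hidden))   # hidden layer
    return softmax(dense(hidden, W_out, b_out))   # output layer

probs = forward([1.0, 2.0, 3.0])
```

Training would repeatedly adjust `W_hidden`, `b_hidden`, `W_out`, and `b_out` to lower a loss computed from `probs`; that step is left out of this sketch.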

![Alt text](images/nn.png)

## Disadvantages of Feedforward Neural Networks

- **No memory** – cannot remember past inputs
- **No temporal dependency modeling** – treats each input independently
- **Fixed-size input only** – all features must be given at once
- **No sense of order or sequence**
- **Poor performance on sequential data** (text, speech, time series)

## Why Do We Need Recurrent Neural Networks (RNNs)?

- **Have memory** – store information from previous time steps
- **Model temporal dependencies** in sequential data
- **Process variable-length sequences**
- **Order-aware** – take the sequence order and preceding context into account
- **Better suited to time-based data** than feedforward networks

## Recurrent Neural Networks (RNNs)

- Process sequences step by step, carrying a hidden state that summarizes prior tokens.
- Parameter sharing across time makes them data-efficient for sequential patterns.
- Limitations for text generation:
  - Vanishing/exploding gradients hinder learning long-range dependencies.
  - Hidden state is a bottleneck; context can fade over long spans.
  - Strictly sequential computation limits parallelism, slowing training/inference.
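
The recurrence described above can be sketched with a single-unit RNN. This is a simplified illustration: the scalar weights `w_x` and `w_h` and the function names `rnn_step` and `run_rnn` are assumptions, whereas a real RNN uses weight matrices and vector-valued hidden states.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.9, b=0.0):
    """New hidden state from the current input and the previous hidden state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, h0=0.0):
    """Apply the same weights at every time step (parameter sharing)."""
    h = h0
    states = []
    for x_t in sequence:  # strictly sequential: step t needs h from step t-1
        h = rnn_step(x_t, h)
        states.append(h)
    return states

short = run_rnn([1.0, 0.0, 0.0])        # sequences of any length are accepted
longer = run_rnn([1.0] + [0.0] * 20)    # the first input's influence fades over time
```

In `longer`, the only non-zero input arrives at the first step, and the hidden state shrinks toward zero at every later step. That is the fading-context bottleneck in miniature.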

![Alt text](images/rnn.png)

## RNN - Text Generation

- Processes input **sequentially, one word at a time**
- Each word is converted into a **word embedding** \(c_t\)
- **Hidden state** \(h^{(t)}\) stores information from previous words
- Hidden state is updated using current input and past memory
- **Initial hidden state** \(h^{(0)}\) starts the sequence
- **Softmax output layer** produces probability distribution over vocabulary
- Predicts the **next word** based on previous context
- Captures **temporal dependencies** in language
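
The steps above can be sketched end to end with an untrained toy model. Everything here is an assumption made for illustration: the five-word vocabulary, the layer sizes, the random weights, and the helper names. Because the weights are random, the predicted word is meaningless; only the data flow matters.

```python
import math
import random

random.seed(0)
vocab = ["<pad>", "it", "can", "process", "text"]  # toy vocabulary (assumption)
E, H, V = 3, 4, len(vocab)                         # embedding, hidden, vocabulary sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

embeddings = rand_matrix(V, E)  # one embedding vector per word
W_xh = rand_matrix(H, E)        # input -> hidden
W_hh = rand_matrix(H, H)        # hidden -> hidden, shared across time steps
W_hy = rand_matrix(V, H)        # hidden -> vocabulary scores

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_distribution(words):
    h = [0.0] * H                                # initial hidden state h^(0)
    for word in words:                           # one word at a time
        e_t = embeddings[vocab.index(word)]      # word embedding lookup
        pre = [a + b for a, b in zip(matvec(W_xh, e_t), matvec(W_hh, h))]
        h = [math.tanh(p) for p in pre]          # updated hidden state h^(t)
    return softmax(matvec(W_hy, h))              # distribution over the vocabulary

probs = next_word_distribution(["it", "can", "process"])
predicted = vocab[probs.index(max(probs))]       # most probable next word
```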


<p align="center">
  <img src="images/rnntextgeneration.png" alt="Alt text">
</p>

## RNN Disadvantages

- **Recurrent computation is slow** due to sequential processing
- **Difficult to learn long-term dependencies**
- **Vanishing and exploding gradient problems**
- **Hard to access information from many time steps back**
- **Training is computationally expensive**
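
The gradient problem can be illustrated with a deliberately simplified model: assume that backpropagating through each time step multiplies the gradient by one constant factor. Real networks multiply by Jacobian matrices that change at every step, so treat the function `gradient_after` and its constant `factor` as assumptions for illustration only.

```python
def gradient_after(steps, factor):
    """Gradient magnitude after flowing back through `steps` time steps."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor  # one multiplication per time step
    return grad

vanishing = gradient_after(50, 0.9)  # about 0.005: early steps barely learn
exploding = gradient_after(50, 1.1)  # about 117: updates become unstable
```

Even factors close to 1 compound quickly, which is why information from many time steps back is hard to learn from.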

## Need for LSTM (Long Short-Term Memory)

- Mitigates the **vanishing gradient problem** of standard RNNs
- Effectively learns **long-term dependencies**
- Retains important information and **forgets irrelevant data**
- Performs better on **long sequences**
- More stable and reliable training than standard RNNs
- Widely used in **language modeling, speech recognition, and time-series tasks**

## LSTM Architecture – Brief Overview

- LSTM is a special type of RNN designed to model **long-term dependencies**
- Consists of a **cell state (Cₜ)** that acts as long-term memory
- Uses **three gates** to control information flow:
  - **Forget Gate** – decides what information to remove from the cell state
  - **Input Gate** – decides what new information to store
  - **Output Gate** – decides what information to output
- Gates use **sigmoid activation**, producing values between 0 and 1 that scale how much information passes
- **Tanh activation** scales candidate values to the range -1 to 1
- Cell state is changed only by elementwise gating and addition, which reduces information loss across time steps
- Produces a **hidden state (hₜ)** at each time step for output or next step processing
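
The gate mechanics above can be sketched with a single-unit LSTM, where every gate is a scalar. The parameter values in `params` and the names `lstm_step` and `sigmoid` are assumptions chosen to make the behaviour visible. A real LSTM learns weight matrices for each gate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a single-unit LSTM; p maps each gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre("forget"))         # how much of the old cell state to keep
    i_t = sigmoid(pre("input"))          # how much new information to store
    o_t = sigmoid(pre("output"))         # how much of the cell state to expose
    c_hat = math.tanh(pre("candidate"))  # candidate values scaled to (-1, 1)

    c_t = f_t * c_prev + i_t * c_hat     # cell state: long-term memory
    h_t = o_t * math.tanh(c_t)           # hidden state: output of this step
    return h_t, c_t

# Hypothetical parameters, set by hand rather than learned
params = {
    "forget": (0.0, 0.0, 5.0),     # large bias keeps the forget gate near 1
    "input": (4.0, 0.0, -2.0),     # opens mainly when x_t is large
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
cell_states = []
for x_t in [1.0] + [0.0] * 20:     # one informative input, then 20 empty steps
    h, c = lstm_step(x_t, h, c, params)
    cell_states.append(c)
```

With the forget gate close to 1, the cell state written at the first step is still largely intact after 20 further steps, which is the long-term memory behaviour the gates are meant to provide.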

<p align="center">
  <img src="images/lstm.png" alt="Alt text">
</p>

## LSTM - Text Generation

<p align="center">
  <img src="images/lstmtextgen.png" alt="Alt text">
</p>

```python
import numpy as np
import keras
from keras.callbacks import EarlyStopping
from keras.layers import Embedding, LSTM, Dense, Dropout
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
```

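```python
# Mount Google Drive so the training text can be read (Colab only)
from google.colab import drive
drive.mount('/content/drive')
```

```python
# Shared tokenizer: fitted in dataset_preparation, reused in generate_text
tokenizer = Tokenizer()
```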

```python
def dataset_preparation(data):
    """Turn raw text into padded n-gram predictors and one-hot next-word labels."""
    # Basic cleanup: lowercase and treat each line as one training sentence
    corpus = data.lower().split("\n")

    # Tokenization: build the word -> index vocabulary (index 0 is reserved for padding)
    tokenizer.fit_on_texts(corpus)
    total_words = len(tokenizer.word_index) + 1

    # Create input sequences: every prefix of length >= 2 of each tokenized line
    input_sequences = []
    for line in corpus:
        token_list = tokenizer.texts_to_sequences([line])[0]
        for i in range(1, len(token_list)):
            n_gram_sequence = token_list[:i + 1]
            input_sequences.append(n_gram_sequence)

    # Left-pad every sequence to the length of the longest one
    max_sequence_len = max(len(x) for x in input_sequences)
    input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

    # The last token of each sequence is the label; the rest are the predictors
    predictors, label = input_sequences[:, :-1], input_sequences[:, -1]
    label = keras.utils.to_categorical(label, num_classes=total_words)

    return predictors, label, max_sequence_len, total_words
```
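
The prefix-and-pad step in `dataset_preparation` can be illustrated without Keras. The helpers `build_vocab`, `prefix_sequences`, and `pad_pre` below are hypothetical stand-ins, not Keras functions. One difference to keep in mind: the Keras `Tokenizer` assigns indices by word frequency, while this sketch assigns them in order of first appearance.

```python
def build_vocab(lines):
    """Give each distinct word an index starting at 1 (0 is reserved for padding)."""
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def prefix_sequences(lines, vocab):
    """Every prefix of length >= 2 of each tokenized line."""
    sequences = []
    for line in lines:
        tokens = [vocab[w] for w in line.lower().split()]
        for i in range(1, len(tokens)):
            sequences.append(tokens[:i + 1])
    return sequences

def pad_pre(sequence, length):
    """Left-pad with zeros, like padding='pre'."""
    return [0] * (length - len(sequence)) + sequence

lines = ["it can process text"]
vocab = build_vocab(lines)
sequences = prefix_sequences(lines, vocab)
max_len = max(len(s) for s in sequences)
padded = [pad_pre(s, max_len) for s in sequences]

predictors = [row[:-1] for row in padded]  # [[0, 0, 1], [0, 1, 2], [1, 2, 3]]
labels = [row[-1] for row in padded]       # [2, 3, 4]
```

Each row pairs a left-padded context with the word that follows it, so one four-word sentence yields three training examples.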

```python
def create_model(predictors, label, max_sequence_len, total_words):
    """Build and train a stacked-LSTM next-word classifier."""
    model = Sequential()
    model.add(Embedding(total_words, 100, input_length=max_sequence_len - 1))
    model.add(LSTM(150, return_sequences=True))
    # model.add(Dropout(0.2))
    model.add(LSTM(100, return_sequences=True))
    model.add(LSTM(100))  # last LSTM returns only its final hidden state
    model.add(Dense(total_words, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # No validation data is passed to fit(), so monitor the training loss;
    # 'val_loss' would never be available to the callback.
    earlystop = EarlyStopping(monitor='loss', min_delta=0, patience=5, verbose=0, mode='auto')
    model.fit(predictors, label, epochs=100, verbose=1, callbacks=[earlystop])
    model.summary()  # summary() prints the table itself and returns None
    return model
```

```python
def generate_text(seed_text, next_words, max_sequence_len, model):
    """Greedily append `next_words` predicted words to `seed_text`."""
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
        probabilities = model.predict(token_list, verbose=0)
        predicted = int(np.argmax(probabilities, axis=-1)[0])  # most likely word index

        # Map the index back to its word; index 0 (padding) has no word
        output_word = tokenizer.index_word.get(predicted, "")
        seed_text += " " + output_word
    return seed_text
```



```python
# data = open('data.txt').read()
with open('/content/drive/MyDrive/sample1.txt') as f:
    data = f.read()
predictors, label, max_sequence_len, total_words = dataset_preparation(data)
model = create_model(predictors, label, max_sequence_len, total_words)

print(generate_text("It can process", 5, max_sequence_len, model))

print("--Over--")
```