<a href="https://colab.research.google.com/github/rohitblpprajapat/100-days-of-code/blob/master/pytorch_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import torch


In [None]:
help(torch)


Help on package torch:

NAME
    torch

DESCRIPTION
    The torch package contains data structures for multi-dimensional
    tensors and defines mathematical operations over these tensors.
    Additionally, it provides many utilities for efficient serialization of
    Tensors and arbitrary types, and other useful utilities.

    It has a CUDA counterpart, that enables you to run your tensor computations
    on an NVIDIA GPU with compute capability >= 3.0.

PACKAGE CONTENTS
    _C
    _VF
    __config__
    __future__
    _appdirs
    _awaits (package)
    _classes
    _compile
    _custom_op (package)
    _custom_ops
    _decomp (package)
    _dispatch (package)
    _dynamo (package)
    _environment
    _export (package)
    _functorch (package)
    _guards
    _higher_order_ops (package)
    _inductor (package)
    _jit_internal
    _lazy (package)
    _library (package)
    _linalg_utils
    _lobpcg
    _logging (package)
    _lowrank
    _meta_registrations
    _namedtensor_interna

In [None]:
help(torch.tensor)

Help on built-in function tensor in module torch:

tensor(...)
    tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False) -> Tensor

    Constructs a tensor with no autograd history (also known as a "leaf tensor", see :doc:`/notes/autograd`) by copying :attr:`data`.


        When working with tensors prefer using :func:`torch.Tensor.clone`,
        :func:`torch.Tensor.detach`, and :func:`torch.Tensor.requires_grad_` for
        readability. Letting `t` be a tensor, ``torch.tensor(t)`` is equivalent to
        ``t.detach().clone()``, and ``torch.tensor(t, requires_grad=True)``
        is equivalent to ``t.detach().clone().requires_grad_(True)``.

    .. seealso::

        :func:`torch.as_tensor` preserves autograd history and avoids copies where possible.
        :func:`torch.from_numpy` creates a tensor that shares storage with a NumPy array.

    Args:
        data (array_like): Initial data for the tensor. Can be a list, tuple,
            NumPy ``ndarray``,
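
The copy semantics described in the help text above can be checked directly. This is a minimal sketch, assuming PyTorch and NumPy are installed; the variable names (`arr`, `copied`, `shared`, `t`, `u`, `v`) are illustrative and do not come from the notebook.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

copied = torch.tensor(arr)      # copies the data
shared = torch.from_numpy(arr)  # shares storage with the NumPy array

arr[0] = 100.0                  # mutate the original array in place
print(copied[0].item())         # the copy still holds the old value
print(shared[0].item())         # the shared tensor sees the change

# For an existing tensor, detach().clone() is the readable spelling of torch.tensor(t)
t = torch.ones(3, requires_grad=True)
u = t.detach().clone()
v = t.detach().clone().requires_grad_(True)
print(u.requires_grad)             # u carries no autograd history
print(v.requires_grad, v.is_leaf)  # v is a fresh leaf that tracks gradients
```

Choosing between the two constructors is mostly a question of whether later in-place edits to the array should be visible through the tensor.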

In [None]:
import numpy as np


In [None]:
params = np.load('/content/drive/MyDrive/Copy of parameters_w11.npz')
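
Before indexing into an `.npz` archive it can help to list what it contains. The sketch below does not touch the real file: it writes a hypothetical stand-in (`demo_parameters.npz`, with random 5x5 matrices under the same five keys, both assumptions) to a temporary directory and then inspects it the same way `params` could be inspected.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the real file: five 5x5 matrices under the same keys
rng = np.random.default_rng(0)
keys = ["U_e", "W_e", "W_d", "U_d", "V_d"]
path = os.path.join(tempfile.mkdtemp(), "demo_parameters.npz")
np.savez(path, **{k: rng.standard_normal((5, 5)) for k in keys})

# np.load on an .npz file returns a lazy, dict-like NpzFile
with np.load(path) as demo:
    for name in demo.files:  # names of the stored arrays
        print(name, demo[name].shape, demo[name].dtype)
    shapes = {name: demo[name].shape for name in demo.files}
```

Running the same loop over `params.files` would show the actual shapes, which is a quick way to confirm that the matrix products in the forward pass line up.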

In [None]:
import numpy as np

# 1. Unpack the Parameters
# `params` is the NpzFile loaded from Google Drive in the previous cell

U_e = params['U_e']
W_e = params['W_e']
W_d = params['W_d']
U_d = params['U_d']
V_d = params['V_d']  # The diagram labels this 'V', usually the output matrix

# 2. Define Helper Functions
def tanh(x):
    return np.tanh(x)

def softmax(x):
    # Subtract the column-wise max for numerical stability (the result is unchanged)
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / np.sum(e_x, axis=0, keepdims=True)

def cross_entropy_loss(y_true, y_pred):
    # y_true is 5x1 one-hot, y_pred is 5x1 probability distribution
    # We add a small epsilon to avoid log(0)
    epsilon = 1e-15
    # Extract the probability corresponding to the true class (where y_true is 1)
    true_class_prob = np.sum(y_true * y_pred)
    loss = -np.log(true_class_prob + epsilon)
    return loss

# 3. Define the Data
# Based on the diagram logic:
# Vocabulary size is 5.
# Let's map the vectors shown in the diagram to variables.

# Encoder Inputs (x_source = "ariya")
# Looking at the diagram's one-hot vectors:
a = np.array([[1], [0], [0], [0], [0]])
r = np.array([[0], [1], [0], [0], [0]])
i = np.array([[0], [0], [1], [0], [0]])
y_char = np.array([[0], [0], [0], [1], [0]]) # named y_char so it is not confused with the decoder outputs y1..y5

encoder_inputs = [a, r, i, y_char, a]

# Decoder Inputs (for "learn")
# The diagram shows the decoder inputs start with <go>
go = np.array([[0], [0], [0], [0], [1]])
l  = np.array([[1], [0], [0], [0], [0]])
e  = np.array([[0], [1], [0], [0], [0]])
a_dec = np.array([[0], [0], [1], [0], [0]]) # Decoder 'a' is index 2, unlike encoder 'a' (index 0); this follows the diagram's vectors
r_dec = np.array([[0], [0], [0], [1], [0]])
n  = np.array([[0], [0], [0], [0], [1]]) # NOTE: identical to the <go> vector above; check this against the diagram

# The input sequence to the decoder (s1 to s5 inputs)
decoder_inputs = [go, l, e, a_dec, r_dec]

# The TARGET sequence (what we want to predict at y1 to y5)
# If input is <go>, target is 'l'. If input is 'l', target is 'e', etc.
targets = [l, e, a_dec, r_dec, n]

# 4. Forward Propagation

# --- Encoder ---
h_t = np.zeros((W_e.shape[0], 1)) # Initial hidden state h0, sized from W_e (5x1 here)

for x_t in encoder_inputs:
    # h_t = tanh(W_e * h_{t-1} + U_e * x_t)
    # Note: Use np.dot or @ for matrix multiplication
    h_t = tanh(np.dot(W_e, h_t) + np.dot(U_e, x_t))

# The final encoder state becomes the initial decoder state
s_t = h_t

# --- Decoder ---
total_loss = 0

print("Starting Decoder...")

for t, (dec_input, target) in enumerate(zip(decoder_inputs, targets)):

    # 1. Update State: s_t = tanh(W_d * s_{t-1} + U_d * dec_input)
    s_t = tanh(np.dot(W_d, s_t) + np.dot(U_d, dec_input))

    # 2. Calculate Logits: z = V * s_t
    z = np.dot(V_d, s_t)

    # 3. Calculate Probabilities: y_hat = softmax(z)
    y_hat = softmax(z)

    # 4. Calculate Loss for this step
    loss_t = cross_entropy_loss(target, y_hat)
    total_loss += loss_t

    print(f"Time step {t+1}: Loss = {loss_t:.4f}")

print("-" * 20)
print(f"Total Loss L(theta): {total_loss:.4f}")

Starting Decoder...
Time step 1: Loss = 2.8587
Time step 2: Loss = 1.9138
Time step 3: Loss = 4.9559
Time step 4: Loss = 4.8696
Time step 5: Loss = 3.1378
--------------------
Total Loss L(theta): 17.7357
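
One way to read these numbers is to compare them with a model that predicts a uniform distribution over the 5-symbol vocabulary. Such a model incurs -log(1/5) ≈ 1.6094 per step, or about 8.0472 over five steps. The sketch below reproduces that arithmetic; the per-step losses are copied from the output above, and the names `vocab_size`, `num_steps`, and `step_losses` are introduced here for illustration.

```python
import numpy as np

vocab_size = 5
num_steps = 5

# Cross-entropy of a uniform prediction: -log(1 / vocab_size)
uniform_step_loss = -np.log(1.0 / vocab_size)
uniform_total = num_steps * uniform_step_loss

# Per-step losses copied from the output above
step_losses = [2.8587, 1.9138, 4.9559, 4.8696, 3.1378]
observed_total = sum(step_losses)

print(f"Uniform baseline per step: {uniform_step_loss:.4f}")
print(f"Uniform baseline total:    {uniform_total:.4f}")
print(f"Observed total:            {observed_total:.4f}")

# Probability the model assigned to each correct character: exp(-loss)
print(np.round(np.exp(-np.array(step_losses)), 4))
```

All five observed step losses sit above the uniform baseline, so with these parameters the correct character receives less than 1/5 probability at every step. That would be consistent with parameters that have not been fitted to this input/target pair, though the notebook does not say how they were produced.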

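The `softmax` helper subtracts the maximum before exponentiating. A small sketch of why that matters, using deliberately large logits that are not taken from the notebook: the naive form overflows to `inf` and returns `nan`, while the shifted form stays finite and gives the same distribution as the equivalent small logits.

```python
import numpy as np

def softmax_naive(x):
    e_x = np.exp(x)  # overflows to inf for large entries
    return e_x / np.sum(e_x, axis=0)

def softmax_stable(x):
    # Shifting by the max leaves the result unchanged but keeps exp() bounded
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / np.sum(e_x, axis=0, keepdims=True)

z = np.array([[1000.0], [1001.0], [1002.0]])  # illustrative logits

with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(z)  # inf / inf evaluates to nan

stable = softmax_stable(z)
print(np.isnan(naive).all())
print(stable.ravel(), stable.sum())

# Softmax is shift-invariant, so subtracting a constant changes nothing
print(np.allclose(stable, softmax_stable(z - 1000.0)))
```

With the 5x1 logits in this notebook the values are small enough that either form would work, so the shift is a safeguard rather than a requirement here.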