# init

> The `init` module in the `minima` (mi) library provides functions that create and initialize tensors. Each function implements a different strategy for filling a tensor: uniform or normal random values, constant values, random boolean masks, or one-hot encodings. These are also the building blocks for specialized schemes such as Xavier or Kaiming initialization.

> These functions supply the starting values that gradient descent and other optimization methods then refine during training. The choice of initial values can strongly affect both the training dynamics and the final performance of a model, which makes initialization the first step between defining a network and training it on data.

In [None]:
#| default_exp init

In [None]:
#| export
import math
import minima as mi

1. **`rand`**: This function generates a tensor of random numbers drawn from a uniform distribution between `low` and `high` (default 0 and 1). It creates an array of uniform random values on the specified device (CPU if none is given), then scales and shifts them into the requested range via `u * (high - low) + low`. The result is wrapped in a `mi.Tensor`, which tracks gradients for automatic differentiation when `requires_grad` is True.

In [None]:
#| export
def rand(
    *shape, # The shape of the output tensor. Variable length argument list. 
    low=0.0, # Lower bound of the uniform distribution. Default is 0.0.
    high=1.0, # Upper bound of the uniform distribution. Default is 1.0.
    device=None, # The device where the tensor will be allocated. Default is CPU.
    dtype='float32', # The data type of the tensor. Default is 'float32'.
    requires_grad=False # If True, the tensor is created with gradient tracking. Default is False.
):
    """
    Generates a tensor with random numbers uniformly distributed between `low` and `high`.

    Parameters
    ----------
    *shape : int
    low : float, optional
    high : float, optional
    device : Device, optional
    dtype : str, optional
    requires_grad : bool, optional
    
    Returns
    -------
    mi.Tensor
        A tensor of shape `shape`, filled with random numbers from the uniform distribution between `low` and `high`.

    """
    device = mi.cpu() if device is None else device
    array = device.rand(*shape) * (high - low) + low
    return mi.Tensor(array, device=device, dtype=dtype, requires_grad=requires_grad)


In [None]:
rand(10,5)

In [None]:
t = rand(10,5)

In [None]:
t.dtype, t.device, t.requires_grad
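
The scale-and-shift step in `rand` can be checked in isolation. The sketch below is a plain-Python stand-in, not `minima` code: `uniform_samples` is a hypothetical helper, and the standard library's `random.Random` plays the role that `device.rand` plays above.

```python
import random

def uniform_samples(n, low=0.0, high=1.0, seed=0):
    """Hypothetical helper: n uniform draws from [0, 1), rescaled to lie between low and high."""
    rng = random.Random(seed)  # stands in for device.rand (an assumption of this sketch)
    # Same arithmetic as in `rand`: u * (high - low) + low
    return [rng.random() * (high - low) + low for _ in range(n)]

samples = uniform_samples(1000, low=-2.0, high=3.0)
print(min(samples), max(samples))  # both values lie between -2.0 and 3.0
```

Multiplying by `high - low` stretches the unit interval to the right width, and adding `low` moves its left edge into place.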

2. **`randn`**: Similar to `rand`, but generates numbers from a normal distribution with the specified mean and standard deviation (defaulting to 0 and 1). This is done by creating an array of normally-distributed random values, then scaling and shifting them to match the requested parameters.

In [None]:
#| export
def randn(
    *shape, # The shape of the output tensor. Variable length argument list.
    mean=0.0, # Mean of the normal distribution. Default is 0.0.
    std=1.0, # Standard deviation of the normal distribution. Default is 1.0.
    device=None, # The device where the tensor will be allocated. Default is CPU.
    dtype="float32", # The data type of the tensor. Default is 'float32'.
    requires_grad=False # If True, the tensor is created with gradient tracking. Default is False.
):
    """
    Generates a tensor with random numbers normally distributed with specified mean and standard deviation.

    Parameters
    ----------
    *shape : int
    mean : float, optional
    std : float, optional
    device : Device, optional
    dtype : str, optional
    requires_grad : bool, optional
    
    Returns
    -------
    mi.Tensor
        A tensor of shape `shape`, filled with random numbers from the normal distribution with the specified mean and standard deviation.
    """
    device = mi.cpu() if device is None else device
    array = device.randn(*shape) * std + mean
    return mi.Tensor(array, device=device, dtype=dtype, requires_grad=requires_grad)

In [None]:
t = randn(5,5, requires_grad=True)

In [None]:
t

In [None]:
t.shape, t.dtype, t.device, t.requires_grad
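
As a rough check of the `z * std + mean` transformation used in `randn`, the following sketch rescales standard-normal draws and compares the sample statistics with the requested parameters. It uses only the Python standard library; `normal_samples` is a hypothetical name, and `random.Random.gauss` stands in for `device.randn`.

```python
import random
import statistics

def normal_samples(n, mean=0.0, std=1.0, seed=0):
    """Hypothetical helper: standard-normal draws rescaled to the given mean and std."""
    rng = random.Random(seed)  # stands in for device.randn (an assumption of this sketch)
    # Same arithmetic as in `randn`: z * std + mean
    return [rng.gauss(0.0, 1.0) * std + mean for _ in range(n)]

samples = normal_samples(10_000, mean=3.0, std=2.0)
# The sample mean and sample standard deviation should be close to 3.0 and 2.0
print(statistics.fmean(samples), statistics.stdev(samples))
```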

3. **`constant`**: This function creates a tensor filled with a constant value `c` (defaulting to 1). It does this by creating an array of ones on the specified device and then scaling these ones by the constant value.

In [None]:
#| export
def constant(
    *shape, # The shape of the output tensor. Variable length argument list.
    c=1.0, # The constant value to fill the tensor with. Default is 1.0.
    device=None, # The device where the tensor will be allocated. Default is CPU.
    dtype="float32", # The data type of the tensor. Default is 'float32'.
    requires_grad=False # If True, the tensor is created with gradient tracking. Default is False.
):
    """
    Generates a tensor filled with a constant value.

    Parameters
    ----------
    *shape : int
    c : float, optional
    device : Device, optional
    dtype : str, optional
    requires_grad : bool, optional
    
    Returns
    -------
    mi.Tensor
        A tensor of shape `shape`, filled with the constant value `c`.
    """
    device = mi.cpu() if device is None else device
    array = device.ones(*shape, dtype=dtype) * c # note: can change dtype
    return mi.Tensor(array, device=device, dtype=dtype, requires_grad=requires_grad)

4. **`ones` and `zeros`**: These functions are simply shortcuts for creating tensors filled with ones or zeros, respectively. They're implemented by calling the `constant` function with `c` set to 1 or 0.

In [None]:
#| export
def ones(
    *shape, # The shape of the output tensor. Variable length argument list.
    device=None, # The device where the tensor will be allocated. Default is CPU.
    dtype="float32", # The data type of the tensor. Default is 'float32'.
    requires_grad=False # If True, the tensor is created with gradient tracking. Default is False.
):
    """
    Generates a tensor filled with ones.

    Parameters
    ----------
    *shape : int
    device : Device, optional
    dtype : str, optional
    requires_grad : bool, optional
    
    Returns
    -------
    mi.Tensor
        A tensor of shape `shape`, filled with ones.
    """
    return constant(*shape, c=1.0, device=device, dtype=dtype, requires_grad=requires_grad)

In [None]:
#| export
def zeros(
    *shape, # The shape of the output tensor. Variable length argument list.
    device=None, # The device where the tensor will be allocated. Default is CPU.
    dtype="float32", # The data type of the tensor. Default is 'float32'.
    requires_grad=False # If True, the tensor is created with gradient tracking. Default is False.
):
    """
    Generates a tensor filled with zeros.

    Parameters
    ----------
    *shape : int
    device : Device, optional
    dtype : str, optional
    requires_grad : bool, optional
    
    Returns
    -------
    mi.Tensor
        A tensor of shape `shape`, filled with zeros.
    """
    return constant(*shape, c=0.0, device=device, dtype=dtype, requires_grad=requires_grad)
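
The delegation pattern behind `constant`, `ones`, and `zeros` can be sketched without `minima`. The version below works on flat Python lists instead of tensors, and all three function names are hypothetical stand-ins chosen for illustration.

```python
from math import prod

def constant_flat(*shape, c=1.0):
    """Hypothetical stand-in for `constant`: a flat list with prod(shape) copies of c."""
    ones = [1.0] * prod(shape)          # start from ones, as `constant` does
    return [one * c for one in ones]    # then scale every entry by c

def ones_flat(*shape):
    """Shortcut that delegates with c=1.0, mirroring `ones`."""
    return constant_flat(*shape, c=1.0)

def zeros_flat(*shape):
    """Shortcut that delegates with c=0.0, mirroring `zeros`."""
    return constant_flat(*shape, c=0.0)

print(constant_flat(2, 3, c=2.5))  # six entries, all 2.5
print(zeros_flat(2, 2))            # four entries, all 0.0
```

Keeping `ones` and `zeros` as thin wrappers means device handling and dtype logic live in one place.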

5. **`randb`**: This function creates a binary tensor, with each element independently being True with probability `p` (defaulting to 0.5). This is done by generating uniformly-distributed random numbers and checking whether they're less than or equal to `p`.

In [None]:
#| export
def randb(
    *shape, # The shape of the output tensor. Variable length argument list.
    p=0.5, # The probability of generating a `True` (1) in the binary tensor. Default is 0.5.
    device=None, # The device where the tensor will be allocated. Default is CPU.
    dtype="bool", # The data type of the tensor. Default is 'bool'.
    requires_grad=False # If True, the tensor is created with gradient tracking. Default is False.
):
    """
    Generates a binary tensor with random values of `True` or `False`.

    Parameters
    ----------
    *shape : int
    p : float, optional
    device : Device, optional
    dtype : str, optional
    requires_grad : bool, optional
    
    Returns
    -------
    mi.Tensor
        A binary tensor of shape `shape`, filled with random boolean values, where the probability of `True` is `p`.
    """
    device = mi.cpu() if device is None else device
    array = device.rand(*shape) <= p
    return mi.Tensor(array, device=device, dtype=dtype, requires_grad=requires_grad)
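
The thresholding step in `randb` can be illustrated with the standard library alone. In this sketch, `bernoulli_mask` is a hypothetical helper and `random.Random` stands in for `device.rand`. Because a uniform draw `u` from [0, 1) satisfies `u <= p` with probability `p`, the fraction of `True` entries should approach `p` as the sample grows.

```python
import random

def bernoulli_mask(n, p=0.5, seed=0):
    """Hypothetical helper: n booleans, each True with probability p (for 0 <= p <= 1)."""
    rng = random.Random(seed)  # stands in for device.rand (an assumption of this sketch)
    # Same comparison as in `randb`: threshold a uniform draw at p
    return [rng.random() <= p for _ in range(n)]

mask = bernoulli_mask(10_000, p=0.3)
print(sum(mask) / len(mask))  # fraction of True values, close to 0.3
```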

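6. **`one_hot`**: This function creates a one-hot encoding tensor. Given a size `n` and an index `i`, it produces a vector of length `n` with a 1 at position `i` and 0s elsewhere. Note that the implementation calls `i.numpy()`, so `i` must be passed as a tensor (or another object with a `.numpy()` method) rather than a plain Python int.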

In [None]:
#| export
def one_hot(
    n, # The size of the one-hot vector.
    i, # The index to be set to `1` in the one-hot vector.
    device=None, # The device where the tensor will be allocated. Default is CPU.
    dtype="float32", # The data type of the tensor. Default is 'float32'.
    requires_grad=False # If True, the tensor is created with gradient tracking. Default is False.
):
    """
    Generates a one-hot encoding tensor.

    Parameters
    ----------
    n : int
    i : mi.Tensor
    device : Device, optional
    dtype : str, optional
    requires_grad : bool, optional
    
    Returns
    -------
    mi.Tensor
        A one-hot tensor of size `n`, with the `i`th element set to `1` and all others set to `0`.
    """
    device = mi.cpu() if device is None else device
    return mi.Tensor(device.one_hot(n, i.numpy(), dtype=dtype), device=device, requires_grad=requires_grad)
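
For intuition, here is a minimal pure-Python sketch of one-hot encoding. `one_hot_rows` is a hypothetical helper, not part of `minima`. How `device.one_hot` handles several indices at once is not shown in this notebook, so returning one row per index is an assumption of this sketch.

```python
def one_hot_rows(n, indices):
    """Hypothetical helper: one length-n row per index, 1.0 at that index and 0.0 elsewhere."""
    rows = []
    for idx in indices:
        if not 0 <= idx < n:
            raise IndexError(f"index {idx} out of range for size {n}")
        rows.append([1.0 if j == idx else 0.0 for j in range(n)])
    return rows

print(one_hot_rows(4, [0, 2]))
# → [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
```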

## Export

In [None]:
import nbdev; nbdev.nbdev_export()