Fix shape mismatch error in loss calculation #51

Open
wants to merge 1 commit into
base: master

dewijones92

The loss calculation raised a shape mismatch error because the entire `Y` tensor was used to index `prob`, which only has one row per example in the current minibatch.

The original line of code:

`loss = -prob[torch.arange(32), Y].log().mean()`

caused the issue because:

  1. `torch.arange(32)` creates a tensor of indices from 0 to 31, hard-coding a batch size of 32. The actual batch size might differ.

  2. `Y` refers to the entire label tensor, which has a shape of (num_samples,), where num_samples is the total number of samples in the dataset.

`prob` has a shape of (batch_size, num_classes), where batch_size is the number of samples in the current minibatch and num_classes is the number of possible output classes. Indexing it with a row index of shape (32,) and a column index of shape (num_samples,) fails because the two index tensors cannot be broadcast to a common shape.
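The failure described above can be reproduced in isolation. The following is a minimal sketch, not the notebook's code: the sizes (100 samples, 27 classes, batch of 32) and the random `prob` are assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
num_samples, num_classes, batch_size = 100, 27, 32

Y = torch.randint(0, num_classes, (num_samples,))   # all labels, shape (100,)
ix = torch.randint(0, num_samples, (batch_size,))   # minibatch row indices, shape (32,)

# Stand-in for the model output on the minibatch, shape (32, 27).
prob = torch.softmax(torch.randn(batch_size, num_classes), dim=1)

# The index tensors have shapes (32,) and (100,). They cannot be broadcast
# together, so the original line raises instead of returning a loss.
try:
    loss = -prob[torch.arange(32), Y].log().mean()
    failed = False
except (IndexError, RuntimeError) as err:
    failed = True
    print("indexing failed:", err)
```

Note that the error comes from the two index tensors disagreeing with each other, not from any label being out of range.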

To fix this issue, the line was modified to:

`loss = -prob[torch.arange(prob.shape[0]), Y[ix]].log().mean()`

The changes made:

  1. `torch.arange(prob.shape[0])` creates a tensor of indices from 0 to batch_size-1, so the row index always matches the actual batch size of `prob`.

  2. `Y[ix]` retrieves the labels for the current minibatch indices `ix`, so each label lines up with the row of predicted probabilities in `prob` for the same example.
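As a rough check of the corrected line, the sketch below wraps it in a helper and runs it on two batch sizes. The helper name `minibatch_loss`, the sizes, and the random logits are assumptions for illustration. The comparison against `torch.nn.functional.cross_entropy` is not part of this change; it is included only as a sanity check that the indexing picks the intended entries.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
num_samples, num_classes = 100, 27
Y = torch.randint(0, num_classes, (num_samples,))   # all labels, shape (100,)


def minibatch_loss(prob, Y, ix):
    """Mean negative log-likelihood of the minibatch labels (hypothetical helper)."""
    rows = torch.arange(prob.shape[0])   # 0 .. batch_size-1, whatever the batch size is
    return -prob[rows, Y[ix]].log().mean()


losses = []
for batch_size in (32, 7):               # the second size shows nothing is hard-coded
    ix = torch.randint(0, num_samples, (batch_size,))
    logits = torch.randn(batch_size, num_classes)
    prob = torch.softmax(logits, dim=1)

    loss = minibatch_loss(prob, Y, ix)
    reference = F.cross_entropy(logits, Y[ix])   # sanity check only
    assert torch.allclose(loss, reference, atol=1e-4)
    losses.append(loss)
```

Both index tensors now have shape (batch_size,), so the expression selects one probability per row and the result is a scalar.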

By using `Y[ix]` instead of `Y`, both index tensors have shape (batch_size,), which resolves the shape mismatch error. The loss is now computed for each minibatch against the labels of that minibatch, so the model can be trained and evaluated on the given dataset.

Fixes #50


Successfully merging this pull request may close these issues.

Problem with makemore_part2_mlp.ipynb