@ssam18 ssam18 commented Nov 17, 2025

I was working with the CIFAR-10 dataset and noticed that the documentation doesn't mention the known issue of label noise. I kept running into images that seemed mislabeled - airplanes labeled as frogs, horses as cats, etc. At first I thought something was wrong with my code, but after researching, I learned this is a well-known characteristic of the original CIFAR-10 dataset.
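One rough way to surface candidates like these for manual review is to look for samples where a trained classifier confidently disagrees with the stored label. The sketch below is only an illustration of that heuristic, not something this PR adds: `flag_suspect_labels`, the `threshold` value, and the three-class toy data are all hypothetical, and a flagged sample may simply be a model error rather than a labeling error.

```python
def flag_suspect_labels(labels, predicted_probs, threshold=0.9):
    """Return indices where the model confidently predicts a class
    other than the stored label (hypothetical helper, heuristic only)."""
    suspects = []
    for index, (label, probs) in enumerate(zip(labels, predicted_probs)):
        # Class with the highest predicted probability for this sample.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label and probs[predicted] >= threshold:
            suspects.append(index)
    return suspects


# Toy data with three classes (assumed values, not real CIFAR-10 output).
labels = [0, 2, 1]
predicted_probs = [
    [0.97, 0.02, 0.01],  # agrees with the label
    [0.95, 0.03, 0.02],  # confidently disagrees with the label
    [0.40, 0.35, 0.25],  # disagrees, but with low confidence
]

print(flag_suspect_labels(labels, predicted_probs))  # [1]
```

Anything the function returns still needs a human look at the image itself before it is treated as mislabeled.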

I've added a brief note to the load_data() docstring to inform users about this label noise issue. This should help new users understand that some label inconsistencies are expected and not due to bugs in their code or the Keras implementation.

The note is concise and placed right after the dataset description, so users will see it before they start working with the data. It mentions that:

  • The dataset has a small percentage of mislabeled samples
  • This is inherent to the original dataset
  • It may impact training and evaluation
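To make the evaluation point concrete, here is a minimal simulation, assuming symmetric noise where a mislabeled sample gets a uniformly random wrong class. The function name and the noise rates are illustrative assumptions, not measured properties of CIFAR-10. It shows that even a model that always predicts the true class scores below 100% when it is graded against noisy labels.

```python
import random


def accuracy_against_noisy_labels(
    num_samples=10_000, num_classes=10, noise_rate=0.01, seed=0
):
    """Accuracy of a perfect classifier measured against labels where a
    fraction `noise_rate` has been flipped to a wrong class (simulation)."""
    rng = random.Random(seed)
    true_labels = [rng.randrange(num_classes) for _ in range(num_samples)]

    noisy_labels = []
    for label in true_labels:
        if rng.random() < noise_rate:
            # Pick one of the other classes uniformly at random.
            wrong = rng.randrange(num_classes - 1)
            if wrong >= label:
                wrong += 1
            noisy_labels.append(wrong)
        else:
            noisy_labels.append(label)

    # The "oracle" model predicts the true class for every sample.
    predictions = true_labels
    correct = sum(p == y for p, y in zip(predictions, noisy_labels))
    return correct / num_samples


for rate in (0.0, 0.01, 0.05):
    print(rate, accuracy_against_noisy_labels(noise_rate=rate))
```

Under this assumption the measured accuracy of the oracle is roughly `1 - noise_rate`, so small gaps between strong models on a noisy test set should be read with that ceiling in mind.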

Fixes #21631

ssam18 commented Nov 17, 2025

This is a simple documentation improvement that should help avoid confusion for users who encounter mislabeled images in CIFAR-10. The note is factual, concise, and doesn't require any code changes - just making the existing dataset behavior more transparent in the docs.

@gemini-code-assist

Summary of Changes

Hello @ssam18, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the CIFAR-10 dataset documentation by incorporating a crucial note regarding label noise. This addition aims to proactively inform users about the inherent presence of mislabeled samples within the dataset, thereby setting appropriate expectations and preventing potential confusion or misdiagnosis of issues during model development and evaluation.

Highlights

  • Documentation Update: Added a note to the load_data() docstring in keras/src/datasets/cifar10.py to inform users about the known issue of label noise in the CIFAR-10 dataset.
  • User Awareness: The new note clarifies that a small percentage of mislabeled samples is inherent to the original CIFAR-10 dataset and may affect model training and evaluation, preventing user confusion.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a helpful note to the cifar10.load_data() docstring, warning users about the known label noise in the CIFAR-10 dataset. This is a valuable addition that can save users from confusion and debugging time. My review includes one suggestion to adjust the placement of this note within the docstring to align with the Keras API style guide, which will improve documentation consistency.

codecov-commenter commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.08%. Comparing base (edbf8f5) to head (461a33b).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21855      +/-   ##
==========================================
- Coverage   82.47%   78.08%   -4.40%     
==========================================
  Files         577      577              
  Lines       59508    59508              
  Branches     9332     9332              
==========================================
- Hits        49080    46464    -2616     
- Misses       8015    10747    +2732     
+ Partials     2413     2297     -116     
Flag               Coverage Δ
keras              77.93% <ø> (-4.37%) ⬇️
keras-jax          ?
keras-numpy        57.55% <ø> (ø)
keras-openvino     34.34% <ø> (ø)
keras-tensorflow   64.12% <ø> (+<0.01%) ⬆️
keras-torch        63.60% <ø> (-0.01%) ⬇️

The CIFAR-10 dataset is known to contain a small percentage of
mislabeled samples, which can affect model training and evaluation.
This note helps users understand that some label inconsistencies
are expected and inherent to the original dataset.

Fixes keras-team#21631

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
@ssam18 ssam18 force-pushed the add-cifar10-label-noise-note branch from b17cabc to 461a33b Compare November 17, 2025 23:51

@fchollet fchollet left a comment

LGTM, sure seems worth including, thanks for the PR

@google-ml-butler google-ml-butler bot added the kokoro:force-run and ready to pull (Ready to be merged into the codebase) labels Nov 24, 2025
@fchollet fchollet merged commit 9bcdbc7 into keras-team:master Nov 24, 2025
9 checks passed

Labels

kokoro:force-run, ready to pull (Ready to be merged into the codebase), size:XS

Development

Successfully merging this pull request may close these issues.

Possible label inconsistencies in CIFAR-10 dataset

4 participants