Front-end statistical data quantity deviation #7507

### Describe the bug

While browsing the dataset at https://huggingface.co/datasets/NeuML/wikipedia-20250123, I noticed that although it has nearly 7M rows, the front end estimates it at only about 4M, barely more than half the actual count. Both loading the dataset after downloading it and its dataset_info (https://huggingface.co/datasets/NeuML/wikipedia-20250123/blob/main/train/dataset_info.json) confirm that the true row count is close to 7M. A discrepancy this large can mislead users who sort datasets by row count. Why not read this number directly from dataset_info?
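A minimal sketch of the cross-check described above, assuming the file follows the `datasets` `DatasetInfo` JSON layout with per-split counts under `splits.<split>.num_examples` (this layout is an assumption and has not been verified for this repository). The helper names are hypothetical. `fetch_dataset_info` uses `huggingface_hub.hf_hub_download` and needs network access, so the demo instead parses an inline placeholder whose numbers only approximate the figures in this report.

```python
import json
from typing import Any, Dict


def split_row_counts(info: Dict[str, Any]) -> Dict[str, int]:
    """Return {split_name: num_examples} from a parsed dataset_info.json.

    Assumes per-split counts live under info["splits"][<split>]["num_examples"];
    splits without a count are skipped.
    """
    splits = info.get("splits") or {}
    counts: Dict[str, int] = {}
    for name, split in splits.items():
        n = split.get("num_examples") if isinstance(split, dict) else None
        if n is not None:
            counts[name] = int(n)
    return counts


def total_rows(info: Dict[str, Any]) -> int:
    """Sum the row counts of every split listed in dataset_info.json."""
    return sum(split_row_counts(info).values())


def relative_shortfall(reported: int, actual: int) -> float:
    """Fraction of rows missing from the reported count: (actual - reported) / actual."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return (actual - reported) / actual


def fetch_dataset_info(repo_id: str, filename: str = "train/dataset_info.json") -> Dict[str, Any]:
    """Download and parse dataset_info.json from a Hub dataset repo (requires network)."""
    from huggingface_hub import hf_hub_download  # optional dependency, imported lazily

    path = hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical, abbreviated placeholder standing in for the real file;
    # 7_000_000 and 4_000_000 approximate the "nearly 7M" and "4M" figures above.
    sample = json.loads('{"splits": {"train": {"name": "train", "num_examples": 7000000}}}')
    actual = total_rows(sample)
    print(actual)                                            # 7000000
    print(round(relative_shortfall(4_000_000, actual), 3))   # 0.429
```

With real data, `total_rows(fetch_dataset_info("NeuML/wikipedia-20250123"))` would be compared against the count shown on the dataset page.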

Not sure if this is the right place to report this bug, but leaving it here for the team's awareness.
