Sync resource.stats with the standards #868

Closed
krassowski opened this issue Jun 17, 2021 · 2 comments

krassowski commented Jun 17, 2021

Overview

Consider introducing the Stats class (resource.stats)


from frictionless import Resource

Resource(stats={'hash': '2a53375ff139d9837e93a38a279d63e5', 'bytes': 12345}).to_json('test.json')
$ cat test.json 
{
  "stats": {
    "hash": "2a53375ff139d9837e93a38a279d63e5",
    "bytes": 12345
  }
}

Yet the Data Resource JSON schema and the Tabular Data Resource JSON schema (and the corresponding specifications) recommend that the hash and bytes properties sit at the root level of the resource descriptor, not inside a nested stats object.

Question: is this apparent discrepancy intended, or is it something that should be improved?
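
To make the mismatch concrete, here is a minimal sketch in plain Python (no frictionless calls) that reports which of the two properties are found under stats but missing at the root. The helper name `find_nested_stats` and the constant `ROOT_LEVEL_STATS` are hypothetical and only serve the illustration.

```python
import json

# Properties the Data Resource spec expects at the descriptor root
# (only the two discussed in this issue).
ROOT_LEVEL_STATS = ("hash", "bytes")


def find_nested_stats(descriptor):
    """Return the keys found under "stats" that are missing at the root level."""
    stats = descriptor.get("stats") or {}
    return [key for key in ROOT_LEVEL_STATS if key in stats and key not in descriptor]


# The descriptor from test.json above.
exported = json.loads(
    '{"stats": {"hash": "2a53375ff139d9837e93a38a279d63e5", "bytes": 12345}}'
)
print(find_nested_stats(exported))  # ['hash', 'bytes']
```

A descriptor that already has hash and bytes at the root yields an empty list.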


Please preserve this line to notify @roll (lead of this repository)

@roll roll added the general General improvements label Jun 19, 2021
Member

roll commented Jun 19, 2021

@krassowski
It's a good question.

It's true that the Framework puts all the stats in a nested object. There are a few reasons for this, including internal performance. Data Package Pipelines used the same approach.

Actually, I think that having stats as an object is better and more scalable (extensions can add more stats without cluttering the root properties), and the specs themselves might eventually adopt it.

But it currently contradicts the specs, so what do you think a solution might be? A flag on export?
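
As a rough sketch of what such a flag could look like, the plain-Python helper below works on an already exported descriptor dict. The name `export_descriptor`, the `root_level_stats` parameter, and the extra `rows` stat are assumptions made for illustration; none of this is the Framework's actual API.

```python
import copy

# Stats that the specs place at the descriptor root.
SPEC_ROOT_KEYS = ("hash", "bytes")


def export_descriptor(descriptor, root_level_stats=False):
    """Return a copy of the descriptor, optionally hoisting hash/bytes to the root."""
    result = copy.deepcopy(descriptor)
    if not root_level_stats:
        return result  # default: keep the nested layout untouched
    stats = result.pop("stats", {})
    for key in SPEC_ROOT_KEYS:
        if key in stats:
            result[key] = stats.pop(key)
    if stats:
        # Extension stats that the specs do not define stay nested.
        result["stats"] = stats
    return result


nested = {
    "path": "table.csv",
    "stats": {"hash": "2a53375ff139d9837e93a38a279d63e5", "bytes": 12345, "rows": 3},
}
flat = export_descriptor(nested, root_level_stats=True)
print(flat["hash"], flat["bytes"], flat["stats"])
```

Working on a deep copy leaves the nested layout intact internally, so only the exported form changes.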

@roll roll added this to the v5 milestone Apr 19, 2022
@roll roll self-assigned this Apr 19, 2022
@roll roll changed the title Exporting Resource keeps hash & bytes stats nested (not at the root level) Sync resource.stats with the standards May 8, 2022
@roll roll modified the milestones: v5, v6 Jun 11, 2022
@roll roll modified the milestones: v6, v5 Jul 6, 2022
Member

roll commented Jul 9, 2022

FIXED in v5 (#1119) (will be released this month)

frictionless describe data/table.csv --stats --standards v1

or

from frictionless import Resource, system

system.standards_version = 'v1' # or with system.use_standards_version('v1')
resource = Resource('table.csv')
resource.infer(stats=True)
resource.to_descriptor()

@roll roll closed this as completed Jul 9, 2022