Closed
Labels
bug, enhancement, question
Description
When a Dataset contains pointer columns, we compute the checksum from the file path rather than the file content. Since we currently use the checksum to determine equality between Dataset objects, this can lead to a surprising situation like the following:
src_dataset.to_json("/home/cas/Desktop")
dst_dataset = Dataset.from_json("/home/cas/Desktop/dataset.yaml")
src_dataset == dst_dataset  # This is False

How could we improve this system? @hadim mentioned using the number of bytes in a file as a proxy. That would work a lot better, but I don't think it will be performant enough as datasets get large and are stored remotely.
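To make the trade-off concrete, here is a minimal sketch of the three checksum strategies discussed in this issue: hashing the path, hashing the content, and hashing the byte count. The function names (`checksum_from_path`, `checksum_from_content`, `checksum_from_size`, `compare_after_copy`) are hypothetical and are not part of the library's API; the choice of SHA-256 is also an assumption made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum_from_path(path: Path) -> str:
    """Hash the path string itself (the behaviour described in this issue)."""
    return hashlib.sha256(str(path).encode("utf-8")).hexdigest()


def checksum_from_content(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file bytes in chunks. Exact, but reads the whole file."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_from_size(path: Path) -> str:
    """Hash only the byte count. Cheap, but same-size files collide."""
    return hashlib.sha256(str(path.stat().st_size).encode("utf-8")).hexdigest()


def compare_after_copy() -> dict:
    """Copy a file to a new location and compare each checksum strategy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src" / "blob.bin"
        dst = Path(tmp) / "dst" / "blob.bin"
        src.parent.mkdir()
        dst.parent.mkdir()
        src.write_bytes(b"pointer column payload")
        shutil.copy(src, dst)
        return {
            "path": checksum_from_path(src) == checksum_from_path(dst),
            "content": checksum_from_content(src) == checksum_from_content(dst),
            "size": checksum_from_size(src) == checksum_from_size(dst),
        }


if __name__ == "__main__":
    print(compare_after_copy())
```

In this sketch only the path-based checksum changes when the file is moved, which is the same kind of mismatch as in the round trip above. The size-based variant survives the move, but it cannot distinguish two different files that happen to have the same length.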