Design dataset hashing #1

Closed
sergiimk opened this issue Aug 3, 2020 · 0 comments · Fixed by #14 or #16

sergiimk commented Aug 3, 2020

The dataset hashing method is currently a stop-gap implementation.

The goal is to design a hashing method that is:

  • streaming (can consume rows incrementally)
  • tolerant of row reordering (many processing engines are concurrent and don't enforce ordering between the outputs of independent calculations)
  • fast
  • collision-resistant
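
To make the requirements above concrete, here is a minimal Python sketch of one possible direction. It is not the method this project adopted. Each row is hashed independently with SHA-256 and the digests are combined by addition modulo 2^256, so the result does not depend on row order and rows can be fed in one at a time. The names `MultisetHasher` and `hash_rows` are hypothetical, and the sketch assumes every row has already been serialized to a canonical byte string. Additive combining is known to be weaker than the underlying hash against deliberately crafted inputs, so treat this as an illustration of the streaming and reordering properties rather than a vetted design.

```python
import hashlib

# Digests are combined modulo 2**256 (the SHA-256 output size).
MODULUS = 1 << 256


class MultisetHasher:
    """Hypothetical order-independent, streaming row hasher (illustrative only)."""

    def __init__(self) -> None:
        self._acc = 0    # running sum of per-row digests
        self._count = 0  # number of rows seen so far

    def update(self, row: bytes) -> None:
        """Fold one canonically serialized row into the running state."""
        digest = hashlib.sha256(row).digest()
        # Addition is commutative, so the order of rows does not matter.
        # Unlike XOR, repeated rows do not simply cancel each other out.
        self._acc = (self._acc + int.from_bytes(digest, "big")) % MODULUS
        self._count += 1

    def hexdigest(self) -> str:
        """Finalize by hashing the accumulator together with the row count."""
        state = self._acc.to_bytes(32, "big") + self._count.to_bytes(8, "big")
        return hashlib.sha256(state).hexdigest()


def hash_rows(rows) -> str:
    """Convenience wrapper: hash any iterable of byte-string rows."""
    hasher = MultisetHasher()
    for row in rows:
        hasher.update(row)
    return hasher.hexdigest()
```

Because the per-row hashes are independent, concurrent workers could each keep their own accumulator and row count, and the partial results could be summed at the end. How rows are serialized to bytes is left open here and would have to be fixed as part of any real design.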

Ideas:
