New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an 'id' field per card #63

Closed
Sembiance opened this Issue Aug 3, 2015 · 6 comments

Comments

Projects
None yet
4 participants
@Sembiance
Collaborator

Sembiance commented Aug 3, 2015

A unique 'id' field per card should be added.

This has been requested a LOT by many people, as it would make importing into a database a lot easier. Also, this id field would be needed in order to implement #55 and #47

There are a number of different ways the id could be created. Some additional discussion and thought will be needed to pick the best way.

My initial hunch is a hash of (setCode + cardName + imageName) would work. Depending on the type of hash used though the id field might be kinda long, which I may want to avoid. This sort of 'predictable hash' would be needed so if anyone were to generate all the files from scratch the id's would end up being the same.

@Sembiance Sembiance referenced this issue Aug 3, 2015

Closed

Add tokens #47

@ZeldaZach

This comment has been minimized.

Show comment
Hide comment
@ZeldaZach

ZeldaZach Aug 3, 2015

Member

You could potentially md5 the desired combination (setCode + cardName + imageName) as that's always a predictable length, 32.

You could then truncate a portion of the result (maybe down to only 15 characters), albeit it does increase the risk for collisions. I've yet seen a cockatrice hash collide, so this doesn't seem like a terrible way to go.

Member

ZeldaZach commented Aug 3, 2015

You could potentially md5 the desired combination (setCode + cardName + imageName) as that's always a predictable length, 32.

You could then truncate a portion of the result (maybe down to only 15 characters), albeit it does increase the risk for collisions. I've yet seen a cockatrice hash collide, so this doesn't seem like a terrible way to go.

@RavenousWolf

This comment has been minimized.

Show comment
Hide comment
@RavenousWolf

RavenousWolf Aug 6, 2015

Hey, I'm a dev for www.ozguild.co and we have been using your database to create unique hashes already. I just sent you an email about some changes to Collectors edition basic lands which messed with our system (multiple cards have the exact same details). We would be happy to let you know how we have been doing it and working with you to get this implemented as it would be very useful to us

RavenousWolf commented Aug 6, 2015

Hey, I'm a dev for www.ozguild.co and we have been using your database to create unique hashes already. I just sent you an email about some changes to Collectors edition basic lands which messed with our system (multiple cards have the exact same details). We would be happy to let you know how we have been doing it and working with you to get this implemented as it would be very useful to us

@ZeldaZach

This comment has been minimized.

Show comment
Hide comment
@ZeldaZach

ZeldaZach Aug 6, 2015

Member

Sounds awesome @RavenousWolf :)

Member

ZeldaZach commented Aug 6, 2015

Sounds awesome @RavenousWolf :)

@Sembiance

This comment has been minimized.

Show comment
Hide comment
@Sembiance

Sembiance Aug 18, 2015

Collaborator

Ok, an 'id' field will be coming soon to mtgjson.

It is composed of sha1(setCode + cardName + imageName)

Collaborator

Sembiance commented Aug 18, 2015

Ok, an 'id' field will be coming soon to mtgjson.

It is composed of sha1(setCode + cardName + imageName)

@Sembiance Sembiance closed this in 560f550 Aug 18, 2015

@phrozen

This comment has been minimized.

Show comment
Hide comment
@phrozen

phrozen Sep 14, 2015

I'm doing a bit of work with the foreign names and I just found that this "pronto" solution to de UUID of any card might not be the best. First, it uses an arbitrary string field ('imageName') which is already deprecated (even if its still supported). Second it uses a 160bit cryptographic hash for something as a uuid in a set with (at most) 200k records (I'm exagerating and even counting like there is every translation for every print), this is not only not efficient, it is exagerated in most aspects and can't be reproduced without knowledge of this specific environment.

IMHO an ID should represent a card by itself in its printing form, though it must include the fields that better represent that card (physical or virtual), in the most straightforward and simple way, those fields are:

  • SET
  • NAME
  • LANGUAGE
  • NUMBER (this to avoid issue #70 only with basic lands) or a variation art identifier in case of Alternate Art cards

This 4 fields can represent ANY card (even if number is missing it should not collide in it's set ie. Promos) ever printed, and this 4 fields can be obtained without any external reference (like this project and the imageName field) so they are a more universal way of describing that specific card.

On the other point, even if SHA1 is a very well known Hashing function, this is definitely not it's use, first of all is a correct cryptographic hashing algorithm, which makes it pretty secure, but also slow, and it produces a 160bits Sum which is actually HUGE for a set of records less than 0.0001% the size. I'm already doing collision tests using 32 and 64bit NON-Cryptographic functions like Fast-Hash, SipHash, Murmur3 (I recommend this one) using the fields provided above, NAME+SET+LANG+NUMBER? and it's working perfectly with 32bit Sums (a single integer) although I have found a lot of collisions on alternate art cards. So I'm currently trying to identify those.

Thoughts?

phrozen commented Sep 14, 2015

I'm doing a bit of work with the foreign names and I just found that this "pronto" solution to de UUID of any card might not be the best. First, it uses an arbitrary string field ('imageName') which is already deprecated (even if its still supported). Second it uses a 160bit cryptographic hash for something as a uuid in a set with (at most) 200k records (I'm exagerating and even counting like there is every translation for every print), this is not only not efficient, it is exagerated in most aspects and can't be reproduced without knowledge of this specific environment.

IMHO an ID should represent a card by itself in its printing form, though it must include the fields that better represent that card (physical or virtual), in the most straightforward and simple way, those fields are:

  • SET
  • NAME
  • LANGUAGE
  • NUMBER (this to avoid issue #70 only with basic lands) or a variation art identifier in case of Alternate Art cards

This 4 fields can represent ANY card (even if number is missing it should not collide in it's set ie. Promos) ever printed, and this 4 fields can be obtained without any external reference (like this project and the imageName field) so they are a more universal way of describing that specific card.

On the other point, even if SHA1 is a very well known Hashing function, this is definitely not it's use, first of all is a correct cryptographic hashing algorithm, which makes it pretty secure, but also slow, and it produces a 160bits Sum which is actually HUGE for a set of records less than 0.0001% the size. I'm already doing collision tests using 32 and 64bit NON-Cryptographic functions like Fast-Hash, SipHash, Murmur3 (I recommend this one) using the fields provided above, NAME+SET+LANG+NUMBER? and it's working perfectly with 32bit Sums (a single integer) although I have found a lot of collisions on alternate art cards. So I'm currently trying to identify those.

Thoughts?

@Sembiance

This comment has been minimized.

Show comment
Hide comment
@Sembiance

Sembiance Sep 14, 2015

Collaborator

@phrozen

The four fields you mention are not unique enough to represent a card.

Early sets do not have card numbers, so LEA for example has two swamps that have identical data, even the same artist, but they have different art so they are different cards.

This is why the imageName field exists. Yes, I agree it is a mtgjson specific field, but it's used in the id, and other apps/websites use it too for uniqueness purposes, so therefore I'll never be dropping support for it.

As for language, gatherer lacks foreign language info on many sets and mtgjson does not yet support multiple languages anyways (Issue #40) Still, when the day comes to add full foreign language support, I've detailed how I'll handle it in Issue #40 in regards to the id field so existing id's will not change.

As far as the choice of hash function and the size of the id field, I don't necessarily disagree with you on that.
However, the 'id' field has already gone live, and tons of websites and apps utilize mtgjson. Undoubtedly several have already started relying on the 'id' field and changing it at this stage would make life very painful for those folks. So I won't be changing the id field at this point.

Collaborator

Sembiance commented Sep 14, 2015

@phrozen

The four fields you mention are not unique enough to represent a card.

Early sets do not have card numbers, so LEA for example has two swamps that have identical data, even the same artist, but they have different art so they are different cards.

This is why the imageName field exists. Yes, I agree it is a mtgjson specific field, but it's used in the id, and other apps/websites use it too for uniqueness purposes, so therefore I'll never be dropping support for it.

As for language, gatherer lacks foreign language info on many sets and mtgjson does not yet support multiple languages anyways (Issue #40) Still, when the day comes to add full foreign language support, I've detailed how I'll handle it in Issue #40 in regards to the id field so existing id's will not change.

As far as the choice of hash function and the size of the id field, I don't necessarily disagree with you on that.
However, the 'id' field has already gone live, and tons of websites and apps utilize mtgjson. Undoubtedly several have already started relying on the 'id' field and changing it at this stage would make life very painful for those folks. So I won't be changing the id field at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment