okhwaja
Contributor

@okhwaja okhwaja commented Jun 15, 2020

Hi, first time contributing so please bear with me if I missed something.

When testing out some code based on the contents of the internals section, I believe I ran into a small mistake. The header should contain the size in bytes of the content, but content.length in the example returns the number of characters. Using bytesize returns the right value: https://ruby-doc.org/core-2.4.0/String.html#method-i-bytesize

The example works with its own content (what is up, doc?) because every character there is a single byte, but the issue shows up with multibyte characters:

irb(main):002:0> content = "i have €5 in my pocket"
=> "i have €5 in my pocket"
irb(main):003:0> content.length
=> 22
irb(main):004:0> content.bytesize
=> 24
irb(main):005:0> sha_w_length = Digest::SHA1.hexdigest("blob #{content.length}\0" + content)
=> "9859c2c849cc5591aa2223a6fb697aeaa9a5f7fe"
irb(main):006:0> sha_w_bytesize = Digest::SHA1.hexdigest("blob #{content.bytesize}\0" + content)
=> "3d446f4f877e1bea82e603328a845ad7b036338e"

As expected, the two calls produce different SHA-1 values. The correct one is the one computed with bytesize, which matches what git itself reports:

$ echo -n 'i have €5 in my pocket' | git hash-object --stdin
3d446f4f877e1bea82e603328a845ad7b036338e

This PR just tweaks the example to use bytesize instead of length.
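
For reference, the corrected calculation can be sketched as a small standalone helper. The name `git_blob_sha` is hypothetical and only for illustration (it is not part of the book's example); the sketch assumes the content is an ordinary UTF-8 Ruby string.

```ruby
require 'digest/sha1'

# Hypothetical helper (name chosen for illustration, not from the book):
# computes the SHA-1 that git assigns to a blob holding `content`.
def git_blob_sha(content)
  # The header must record the size in bytes, so use bytesize.
  # String#length counts characters and undercounts multibyte text.
  header = "blob #{content.bytesize}\0"
  Digest::SHA1.hexdigest(header + content)
end

puts git_blob_sha("i have €5 in my pocket")
```

Run as a script, this should print the same value that `git hash-object --stdin` reported for that string above.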

@ben
Member

ben commented Jun 16, 2020

Brilliant! I'm not terribly surprised that a Unicode issue snuck in; that's a blind spot we English-speaking Americans tend to have. Thanks!

@ben ben merged commit 69addf3 into progit:master Jun 16, 2020
max123kl added a commit to max123kl/progit2-de_main that referenced this pull request Jul 3, 2020
Co-Authored-By: Osman Khwaja <osman.khwaja@gmail.com>