File.(basename|extname) are broken #2227

Closed
etehtsea opened this Issue Mar 22, 2013 · 2 comments


rubinius 2.0.0.rc1 (1.9.3 f915f7e4 yyyy-mm-dd JI) [x86_64-apple-darwin12.3.0]

>> File.extname('Имя.m4a')
=> ""
>> File.basename('/path/Имя.m4a')
=> "Имя.m"
>> File.basename('/path/Офис.m4a')
=> "Офис."

As far as I understand, this could be fixed by replacing String#byteslice with String#slice:
https://github.com/etehtsea/rubinius/compare/master...file-cyr-fix
That only partially helped:
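The snippet below is a small plain-Ruby illustration of why the two methods disagree on a Cyrillic file name. String#slice counts characters and String#byteslice counts bytes, and each Cyrillic letter takes two bytes in UTF-8. Passing a character index such as the one String#rindex returns to byteslice therefore lands in the middle of a letter. It only demonstrates the mismatch and is not the Rubinius implementation of File.extname.

```ruby
name = 'Имя.m4a'

# Each Cyrillic letter is 2 bytes in UTF-8, so the two counts differ.
puts name.size      # 7 characters
puts name.bytesize  # 10 bytes

# String#rindex reports a *character* index.
dot = name.rindex('.')              # 3

# slice treats 3 as a character index, which is correct here.
ext_chars = name.slice(dot..-1)     # ".m4a"

# byteslice treats 3 as a byte index: it starts inside "м" and
# returns a string that is not valid UTF-8.
ext_bytes = name.byteslice(dot..-1)

puts ext_chars.inspect
puts ext_bytes.valid_encoding?      # false
```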

>> File.extname('Офис.m4a')
=> ""
>> File.basename('/path/Офис.m4a')
=> "Офис.m4a"
>> File.basename('/путь/path/Офис.m4a')
=> ".m4a"

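The root cause is that the two methods disagree about offsets: find_string_reverse returns the byte offset of the dot (14), while rindex returns its character offset (10):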

>> str = '/path/Офис.m4a'
=> "/path/Офис.m4a"
>> str.find_string_reverse(".", str.size)
=> 14
>> str.rindex(".", str.size)
=> 10
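For reference, the two offsets can be converted into each other with standard String methods, as sketched below. find_string_reverse is a Rubinius internal and is deliberately not called here. The sketch also shows what happens when a byte offset is used as a character offset.

```ruby
str = '/path/Офис.m4a'

# Character index of the last dot, as String#rindex reports it.
char_idx = str.rindex('.')              # 10

# Byte index of the same dot: count the bytes of everything before it.
byte_idx = str[0, char_idx].bytesize    # 14 ("/path/" = 6, "Офис" = 8)

# And back again: count the characters in the first byte_idx bytes.
back = str.byteslice(0, byte_idx).size  # 10

# Treating the byte index as a character index overshoots: 14 is the
# full character length of this string, so the slice comes back empty.
puts str[byte_idx..-1].inspect
```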

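The current (ugly) workaround is to percent-escape the path so it is pure ASCII, where byte and character offsets coincide, and then unescape the result: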

>> URI.unescape(File.basename(URI.escape '/путь/path/Офис.m4a')).force_encoding('UTF-8')
=> "Офис.m4a"
>> URI.unescape(File.extname(URI.escape '/путь/path/Офис.m4a')).force_encoding('UTF-8')
=> ".m4a"

Hope it helps!

dbussink closed this in 7d8d6c8 Mar 25, 2013

dbussink (Owner) commented Mar 25, 2013

This bug was actually fixed the other way around: everything now operates on byte indexes instead of character indexes. That keeps the code simpler, because it no longer has to convert back and forth between byte and character offsets.
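A minimal sketch of the byte-index approach follows. It is not the code from commit 7d8d6c8. The helpers `byte_basename` and `byte_extname` are hypothetical and deliberately simplified: they ignore trailing slashes and names ending in ".". The approach is safe for UTF-8 because the bytes of a multibyte character are all 0x80 or above, so a byte equal to '/' or '.' is always that ASCII character. Every offset is a byte offset and is only ever passed to byteslice. The sketch assumes Ruby 2.0+, where String#bytes returns an Array.

```ruby
# Hypothetical helper: last path component, computed with byte offsets only.
def byte_basename(path)
  slash = path.bytes.rindex('/'.ord)
  return path if slash.nil?
  path.byteslice(slash + 1, path.bytesize - slash - 1)
end

# Hypothetical helper: a simplified File.extname using byte offsets only.
def byte_extname(path)
  bytes = path.bytes
  # Byte offset where the last path component starts.
  base_start = (bytes.rindex('/'.ord) || -1) + 1
  dot = bytes.rindex('.'.ord)
  # No dot, a dot inside a directory name, or a leading dot (".profile").
  return '' if dot.nil? || dot <= base_start
  path.byteslice(dot, path.bytesize - dot)
end

puts byte_basename('/путь/path/Офис.m4a')
puts byte_extname('/путь/path/Офис.m4a')
```

Because the offsets never leave "byte space", no conversion to character offsets is needed, which is the simplification described in the comment above.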

Thank you for the explanation!
