
String length operator now counts code points, not bytes (#145)

The length operator ${#s} now returns the number of code points instead
of the number of bytes. Also added a test for this. Addresses #142.

* Change hard tab to soft tab
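To make the behavior change concrete, here is a minimal Python 3 sketch of the old and new semantics of ${#s}. The helper names `length_in_bytes` and `length_in_code_points` are hypothetical, and the sketch assumes the variable's value is held as UTF-8 encoded bytes, as the `val.s.decode('utf-8')` call in the diff suggests (the original code is Python 2, where `val.s` is presumably a byte string).

```python
def length_in_bytes(s: bytes) -> int:
    """Old ${#s} semantics (hypothetical helper): count raw bytes."""
    return len(s)


def length_in_code_points(s: bytes) -> int:
    """New ${#s} semantics (hypothetical helper): decode UTF-8, count code points.

    Raises UnicodeDecodeError if s is not valid UTF-8.
    """
    return len(s.decode('utf-8'))


value = 'h\u00e9llo'.encode('utf-8')  # U+00E9 (e with acute) encodes to two bytes
print(length_in_bytes(value))        # 6
print(length_in_code_points(value))  # 5
```

ASCII-only values give the same answer either way; the two only diverge once a multi-byte character appears.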
caroline-lin authored and andychu committed Jun 8, 2018
1 parent f238a6d commit 5f4ef1879306fbef2e68def0583252a3046580e4
Showing with 10 additions and 2 deletions.
  1. +3 −1 core/word_eval.py
  2. +7 −1 spec/var-op-other.test.sh
core/word_eval.py
@@ -298,7 +298,9 @@ def _ApplyPrefixOp(self, val, op_id):
     if op_id == Id.VSub_Pound: # LENGTH
       if val.tag == value_e.Str:
-        length = len(val.s)
+        unicode_val = val.s.decode('utf-8')
+        length = len(unicode_val)
+        # length = len(val.s)
       elif val.tag == value_e.StrArray:
         # There can be empty placeholder values in the array.
         length = sum(1 for s in val.strs if s is not None)
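The change decodes the whole value just to count it, and with the default `errors='strict'` handler `decode('utf-8')` raises `UnicodeDecodeError` when the value is not valid UTF-8. A different technique, which is not what this commit does, counts code points directly: in valid UTF-8 every code point has exactly one byte that is not a continuation byte (`0b10xxxxxx`). The sketch below assumes Python 3 `bytes`, and `count_code_points` is a hypothetical name.

```python
def count_code_points(s: bytes) -> int:
    """Count UTF-8 code points without decoding (hypothetical helper).

    Each code point has exactly one lead byte. Continuation bytes match
    0b10xxxxxx, i.e. (b & 0xC0) == 0x80, and are skipped.
    """
    return sum(1 for b in s if (b & 0xC0) != 0x80)


# Agrees with len(s.decode('utf-8')) for valid UTF-8...
mu = '_\u03bc_'.encode('utf-8')   # b'_\xce\xbc_'
print(count_code_points(mu))       # 3
# ...but does not raise on invalid input.
print(count_code_points(b'\xff'))  # 1
```

For invalid input the result is defined but not necessarily meaningful (a lone continuation byte counts as zero), so which behavior a shell wants there remains a separate design choice.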
spec/var-op-other.test.sh
@@ -9,6 +9,13 @@ v=foo
 echo ${#v}
 # stdout: 3
 
+### Unicode string length (UTF-8)
+v=$'_\u03bc_'
+echo ${#v}
+## stdout: 3
+## BUG dash stdout: 9
+## BUG mksh stdout: 4
+
 ### Length of undefined variable
 echo ${#undef}
 # stdout: 0
@@ -158,4 +165,3 @@ echo ${foo:1:3}
 # BUG mksh stdout: -μ
 # N-I dash status: 2
 # N-I dash stdout-json: ""
-
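The three expected outputs in the new Unicode test case are consistent with three different ways of measuring the value. A quick Python 3 check, assuming that dash (which lacks `$'...'` quoting) stores the text `$_\u03bc_` literally, and that mksh counts UTF-8 bytes:

```python
# With $'...' support, v holds three characters: underscore, Greek mu, underscore.
expanded = '_\u03bc_'
# Assumed dash behavior: '$' stays literal and the single-quoted escape is not expanded.
literal = '$' + '_\\u03bc_'

print(len(expanded))                  # 3 code points (the expected stdout)
print(len(expanded.encode('utf-8')))  # 4 bytes (matches the mksh BUG line)
print(len(literal))                   # 9 characters (matches the dash BUG line)
```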
