POSIX-compliant shell functions emulating functionality of arrays. The implementation mostly follows Bash arrays behavior.
- posix-arrays.sh: full version, provides both associative and indexed arrays functionality
- posix-arrays-[associative|indexed].sh: full version, provides functionality for one type of arrays
- posix-arrays-[a|i]-mini.sh: trimmed down to absolute minimum and optimized for small arrays version, provides functionality for one type of arrays
The mini scripts include only these functions: set_[a|i]_arr_el
, get_[a|i]_arr_val
, get_[a|i]_arr_[keys|indices]
and do not support sorted output.
-
Source the relevant script in your script:
. [path]/posix-arrays.sh
Or
. [path]/posix-arrays-[associative|indexed].sh
Or
. [path]/posix-arrays-[a|i]-mini.sh
-
Call any function from the sourced script in your script.
Alternatively, copy functions which you need (and their dependencies) to your own scripts. Note that the last few lines in the scripts set some variables. These variables need to be set for the functions to work correctly.
To create an array, use any function that stores a value in the array: declare_[x]_arr
or set_[x]_arr_el
or add_i_arr_el
or read_i_arr
.
declare_i_arr <array_name> [value] [value] ... [value]
- Resets the array and assigns values to sequential elements, starting from index 0.
read_i_arr <array_name> [string]
- Resets the array, then reads lines from [string]
and assigns each line as a value to sequential elements, starting from index 0.
add_i_arr_el <array_name> <value>
- Adds a new element to the array and assigns a value to it. Index is set to previous highest index+1, or to 0 if no prior elements exist.
set_i_arr_el <array_name> <index> [value]
- Assigns [value]
to element with index <index>
. Indices should always be nonnegative integer numbers. Indices don't have to be sequential. If <value>
is an empty string, stores empty string as a value.
unset_i_arr_el <array_name> <index>
- Unsets the element with index <index>
.
get_i_arr_val <array_name> <index> <var>
- Assigns value for <index>
to variable <var>
.
get_i_arr_values [-s] <array_name> <var>
- Gets all values from an indexed array as a whitespace-separated list and assigns the result to variable <var>
. When called with the [-s]
option, sorts the indices and outputs a numerically sorted (by index) list of values.
get_i_arr_indices [-s] <array_name> <var>
- Gets all indices from an indexed array as a whitespace-separated list and assigns the result to variable <var>
. When called with the [-s]
option, sorts the indices and outputs a numerically sorted list.
get_i_arr_el_cnt <array_name> <var>
- Gets elements count of an indexed array and assigns the result to variable <var>
.
get_i_arr_max_index <array_name> <var>
- Gets the currently highest index in the array and assigns it to variable <var>
. Sorts the array if its current state is unsorted. Returns an error if the array is empty or doesn't exist.
get_i_arr_last_val <array_name> <var>
- Gets the value assigned to highest index in the array and assigns it to variable <var>
. Sorts the array if its current state is unsorted. Returns an error if the array is empty or doesn't exist.
sort_i_arr <array_name>
- Sorts indices stored in the array. Sorting is a relatively slow operation. Functions keep track of sorted/unsorted state of the array via a flag and will only perform sorting if current state is unsorted. For indexed arrays, optimizations are implemented which allow the array to keep the sorted state in most (but not all) cases when setting elements. Specifically, the 'sorted' flag will be removed if setting a previously unset element with a lower index than the current max index of the array. Unsetting elements doesn't affect the sorted/unsorted state of the array.
unset_i_arr <array_name>
- Unsets all variables used to store the array in memory.
Examples
set_i_arr_el test_arr 10 some_val
get_i_arr_val test_arr 10 test_var; echo "$test_var"
$ some_val
declare_i_arr test_arr val1 val2 "val3 123 etc"
get_i_arr_values test_arr test_var; echo "$test_var"
$ val1 val2 val3 123 etc
get_i_arr_val test_arr 2 test_var; echo "$test_var"
$ val3 123 etc
declare_a_arr <array_name> [key=value] [key=value] ... [key=value]
- Resets the array and assigns values to keys.
set_a_arr_el <array_name> <key>=[value]
- Assigns [value]
to element with key <key>
. If [value]
is an empty string, stores empty string as a value.
unset_a_arr_el <array_name> <key>
- Unsets the element with key <key>
.
get_a_arr_val <array_name> <key> <var>
- Assigns value for <key>
from the associative array to variable <var>
.
get_a_arr_values [-s] <array_name> <var>
- Gets all values as a whitespace-separated list and assigns the result to variable <var>
. When called with the [-s]
option, sorts the keys and outputs an alphabetically sorted (by key) list of values.
get_a_arr_keys [-s] <array_name> <var>
- Gets all keys as a whitespace-separated list and assigns the result to variable <var>
. When called with the [-s]
option, sorts the keys and outputs an alphabetically sorted list.
get_a_arr_el_cnt <array_name> <var>
- Gets elements count of an associative array and assigns the result to variable <var>
.
sort_a_arr <array_name>
- Sorts keys stored in the array. Sorting is a relatively slow operation. Functions keep track of sorted/unsorted state of the array via a flag and will only perform sorting if current state is unsorted. For associative arrays, an optimization is implemented which allows the array to keep the sorted state in some (but not all) cases when setting elements. Specifically, when changing a value of a previously set element, the sorted state of the array is kept if it was sorted before. Setting a new element removes the flag which marks the array as sorted. Unsetting elements doesn't affect the sorted/unsorted state of the array.
unset_a_arr <array_name>
- Unsets all variables used to store the array in memory.
Examples
set_a_arr_el test_arr some_key="this is a test"
get_a_arr_val test_arr some_key test_var; echo "$test_var"
$ this is a test
declare_a_arr test_arr "cars=Audi, Honda, Mercedes" "music_genres=jazz, classical, rock"
get_a_arr_keys test_arr test_keys; echo "$test_keys"
$ cars music_genres
get_a_arr_val test_arr music_genres genres; echo "$genres"
$ jazz, classical, rock
- The arrays can hold strings that have any characters in them, including whitespaces, single and double quotation marks and newlines.
- For indexed arrays, indices start at 0.
- As is default for shells, if you request a value corresponding to an index or to a key that has not been set previously, the functions output an empty string and do not return an error.
- Similarly, if you request a value from an array that has not been created, the functions output an empty string and do not return an error.
- The indexed array effectively works as a sparse array, so indices do not have to be sequential.
- The declare functions are not necessary to create an array. It's just a way to set N elements in one command. You can create an array by calling any function that stores a value in the array.
- Assigning an empty string as a value doesn't unset the array element. This mimics the behavior of Bash arrays. To unset an element, use the
unset_[x]_arr_el()
functions. - Functions provide output via a variable because this way the performance is better, in some cases much better.
- The code went through multiple rounds of optimization. Currently for most use cases, on x86 CPU the performance for small arrays (<= 200 elements) is comparable to Bash arrays. Functions perform reasonably well with arrays containing up to 2000 elements. Higher than that, the performance drops at an accelerating rate, mainly because the system takes longer to look up variables in memory.
- Performance is affected by the length of the strings stored in the array, and for associative arrays, by the length of the strings used as keys.
- Performance is also affected by the workload - more on that in the Limitations section.
- The
set_[x]_arr_el()
andget_[x]_arr_val()
functions perform significantly (20%-50%) faster if disabling checks for valid arguments. Other functions get a minor speedup as well. To benefit from this speedup, consider setting thenocheckvars
variable in production when your code is well tested. Specifically this disables checks for validity of following arguments: array name, output variable name, and (for associative arrays) key. NOTE that the values of these argumets are used internally insideeval
commands, so with thenocheckvars
variable set, you must verify that the values are properly sanitized and under your code's full control, otherwise you may be introducing security vulnerabilities into your code.
Benchmarks:
Measured on i7-4770 with 40-character strings in each element. For associative arrays, measured with 16-18 character keys. N is the number of elements. All tests ran with the nocheckvars
variable set.
N=10:
Array type | Test | Time |
---|---|---|
Indexed | set N elements one by one (cold*) | 1ms |
Indexed | set N elements one by one (hot**) | 1ms |
Indexed | add N elements one by one | 1ms |
Indexed | get N values one by one | 1ms |
Indexed | get all values | 2ms |
Indexed | get all indices | 1ms |
Indexed | unset N consecutive elements one by one | 1ms |
------------- | -------------------------------------------------- | ------- |
Associative | set N elements one by one (cold*) | 1ms |
Associative | set N elements one by one (hot**) | 1ms |
Associative | get N values one by one | 1ms |
Associative | get all values | 2ms |
Associative | get all keys | 1ms |
Associative | unset N consecutive elements one by one | 2ms |
N=100:
Array type | Test | Time |
---|---|---|
Indexed | set N elements one by one (cold*) | 3ms |
Indexed | set N elements one by one (hot**) | 2ms |
Indexed | add N elements one by one | 1ms |
Indexed | get N values one by one | 1ms |
Indexed | get all values | 2ms |
Indexed | get all indices | 1ms |
Indexed | unset N consecutive elements one by one | 3ms |
------------- | -------------------------------------------------- | ------- |
Associative | set N elements one by one (cold*) | 3ms |
Associative | set N elements one by one (hot**) | 2ms |
Associative | get N values one by one | 2ms |
Associative | get all values | 2ms |
Associative | get all keys | 1ms |
Associative | unset N consecutive elements one by one | 4ms |
N=1000:
Array type | Test | Time |
---|---|---|
Indexed | set N elements one by one (cold*) | 18ms |
Indexed | set N elements one by one (hot**) | 6ms |
Indexed | add N elements one by one | 12ms |
Indexed | get N values one by one | 5ms |
Indexed | get all values | 6ms |
Indexed | get all indices | 1ms |
Indexed | unset N consecutive elements one by one | 18ms |
------------- | -------------------------------------------------- | ------- |
Associative | set N elements one by one (cold*) | 22ms |
Associative | set N elements one by one (hot**) | 10ms |
Associative | get N values one by one | 7ms |
Associative | get all values | 5ms |
Associative | get all keys | 1ms |
Associative | unset N consecutive elements one by one | 34ms |
N=2000:
Array type | Test | Time |
---|---|---|
Indexed | set N elements one by one (cold*) | 34ms |
Indexed | set N elements one by one (hot**) | 15ms |
Indexed | add N elements one by one | 25ms |
Indexed | get N values one by one | 14ms |
Indexed | get all values | 15ms |
Indexed | get all indices | 1ms |
Indexed | unset N consecutive elements one by one | 40ms |
------------- | -------------------------------------------------- | ------- |
Associative | set N elements one by one (cold*) | 52ms |
Associative | set N elements one by one (hot**) | 21ms |
Associative | get N values one by one | 16ms |
Associative | get all values | 14ms |
Associative | get all keys | 2ms |
Associative | unset N consecutive elements one by one | 104ms |
* cold - values are set without prior elements initialization
** hot - values are set after prior elements initialization
- Unsetting individual elements is relatively slow because it requires to process all current indices/keys in the array as a string. To work around this, when possible, assign an empty string as a value to the element instead of unsetting the element. Alternatively, if you want to free up the memory used by the array, use the
unset_[x]_arr()
functions which work hundreds of times faster than unsetting individual elements. Optimizations have been implemented which cover unsetting elements sequentially from the 1st one upwards or from the last one downwards (in the order in which the elements are stored), so under these conditions unsetting elements does not incur a large performance hit. - By default, functions that output all indices/keys/values do not sort the output. This is different from Bash arrays behavior which sorts the output. The reason for this is that sorting is relatively slow and in most cases not required. When sorted output is required, use the functions with the
-s
option to get a sorted (by index/key) output. Once sorting occurs, the array will stay sorted until new elements are set. In some cases, the functions are able to maintain the sorted state when setting elements and in other cases not, as described in theSorting an array
sections above. Unsetting elements doesn't affect the sorted/unsorted state of arrays. - Array names and (for associative arrays) keys are limited to English alphanumeric characters and underlines -
_
. - Functions have been tested exclusively with the
POSIX
(orC
) locale and are likely to misbehave in some other locales. To avoid such issues, theposix-arrays.sh
andposix-arrays-[associative|indexed]
scripts export theLC_ALL
variable. If you want to use individual functions, make sure to set that variable in your script to 'C':LC_ALL=C
. Orexport LC_ALL=C
if you want to use the functions in daughter scripts called from parent script.
- The values are stored in dynamically created variables. The name of such variable is in the format
_[a|i]_[arr_name]_[key|index]
. - For example, if calling a function to set an element of an indexed array:
set_i_arr_el test_arr 5 "test_value"
, the function will create a variable called_i_test_arr_5
and store the value in it. - The raw values stored in the variables have an ASCII code
\35
prepended to the "logical" value. This serves as a flag to mark the element as set. This way, an empty string can be assigned as a value to an element. This is in line with the behavior of Bash arrays. Functions which retrieve values from the arrays remove the ASCII prefix before assigning the result to the output variable.
- The files inside the "tests" folder are used for testing the main script.
- posix-arrays-tests*.sh scripts implement functionality testing
- perf-tests*.sh implement performance testing
- The test units are not very systematic but I tried to cover all points where the functionality may be fragile.
- Test units check for correct return codes, including in cases where an error is expected. So a significant portion of the tests intentionally induce errors in the functions. In order to avoid errors spam, by default STDERR output from the functions under test is silenced. If you want to see the errors anyway, uncomment the line
#print_stderr=true
. - You can add your own test units following the same format. The test functions automatically parse the
.list
files and execute the tests inside, as long as they comply with the format.
If you like this project, please consider taking a second to give it a star. This will help other people to find it.