Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change `env_var_t` type to an actual array/vector of values (was "converting between a list and a flat string is very error prone") #4200

Closed
krader1961 opened this issue Jul 8, 2017 · 4 comments
Assignees
Labels
Milestone

Comments

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Jul 8, 2017

While working on implementing a fix for issue #4190 I decided it would be useful to define a couple of new helper functions for converting from a list of strings to a flat string that represents the internal representation of a fish script array var. As part of that work I converted several places that used the symbols ARRAY_SEP, ARRAY_SEP_STR, or ENV_NULL, to use the new helper functions. I was surprised when several unit tests failed. Mostly because the hand crafted code I replaced with the helper functions explicitly dealt with empty value lists.

That brought to my attention that this mechanism does not have an appropriate abstraction since concepts like the var being set but having zero elements or a single element that is an empty string is ill-defined in the C++ code. There shouldn't be any place in the code that calls tokenize_variable_array() other than the implementation of the env_var_t class. Similarly, there are too many places in the code that explicitly convert between the list and flat representations by referencing the aforementioned symbols.

The concept of a fish script variable in the C++ code needs an overhaul with a cleaner abstraction.

@krader1961 krader1961 added the cleanup label Jul 8, 2017
@krader1961 krader1961 added this to the fish-future milestone Jul 8, 2017
@krader1961 krader1961 self-assigned this Jul 8, 2017
@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Jul 8, 2017

I just merged commit a9aa234 that takes a baby step to improving the situation. The next step is to replace the flat string representation with a class based on an actual array (i.e., std::vector). We'll still need to use \x1D and \x1E when serializing the vars into the environment and uvar storage to maintain compatibility with current fish versions. But internally we'll always use an actual array representation. I'm betting that will make benchmarks like this one noticeably faster:

for i in (seq 10000)
    set PATH $PATH
    echo $i
end

We should consider using a different representation when serializing vars into the environment for fish 3.0. JSON, XML, and google protobufs are the obvious choices. I'll repurpose issue #3341 to discuss that.

krader1961 added a commit that referenced this issue Jul 8, 2017
This is the first step in implementing a better abstraction for handling
fish script vars in the C++ code. It implements a new function (with two
signatures) to provide a standard method for construct the flag string
representation of a fish script array.

Partial fix for #4200
PenegalECI pushed a commit to PenegalECI/fish-shell that referenced this issue Jul 11, 2017
This is the first step in implementing a better abstraction for handling
fish script vars in the C++ code. It implements a new function (with two
signatures) to provide a standard method for construct the flag string
representation of a fish script array.

Partial fix for fish-shell#4200
@krader1961 krader1961 added enhancement and removed cleanup labels Jul 19, 2017
@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Jul 19, 2017

Changing this from "cleanup" to "enhancement" because I'm going to implement proper arrays for fish script vars in the core C++ code. Both to simplify the code and improve performance. By "simplify" I mean remove the need for functions like tokenize_variable_array(). In some respects this will actually complicate the code because some users of the env_var_t type will have to be changed to handle a vector rather than a simple string.

This is also an important first step for implementing issue #3341.

@krader1961 krader1961 modified the milestones: fish-3.0, fish-future Jul 19, 2017
@krader1961 krader1961 changed the title converting between a list and a flat string is very error prone change `env_var_t` type to an actual array/vector of values (was "converting between a list and a flat string is very error prone") Jul 19, 2017
@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Jul 19, 2017

Also, it's pretty obvious why the current implementation uses a flat string internally to represent an array of strings now that I've thought about it. It is a consequence of the original fish implementation having been written in C which does not have the C++ vector class. It still strikes me as weird since even in C it would have been easy to use dynamically resized arrays managed via malloc/free given an appropriate interface for managing fish script vars.

krader1961 added a commit that referenced this issue Aug 6, 2017
This is the first step to implementing issue #4200 is to stop subclassing
env_var_t from wcstring. Not too surprisingly doing this identified
several places that were incorrectly treating env_var_t and wcstring as
interchangeable types. I'm not talking about those places that passed
an env_var_t instance to a function that takes a wcstring. I'm talking
about doing things like assigning the former to the latter type, relying
on the implicit conversion, and thus losing information.

We also rename `env_get_string()` to `env_get()` for symmetry with
`env_set()` and to make it clear the function does not return a string.
krader1961 added a commit that referenced this issue Aug 6, 2017
Kurtis Rader
Another step towards implementing issue #4200 is to make the
`tokenize_variable_array()` function private to the env.cpp module.
krader1961 added a commit that referenced this issue Aug 10, 2017
My previous change to eliminate `class var_entry_t` caused me to notice
that `env_get()` turned a set but empty var into a missing var. Which
is wrong. Fixing that brought to light several other pieces of code that
were wrong as a consequence of the aforementioned bug.

Another step to fixing issue #4200.
krader1961 added a commit that referenced this issue Aug 15, 2017
My previous change to eliminate `class var_entry_t` caused me to notice
that `env_get()` turned a set but empty var into a missing var. Which
is wrong. Fixing that brought to light several other pieces of code that
were wrong as a consequence of the aforementioned bug.

Another step to fixing issue #4200.
@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Aug 19, 2017

While I have resolved this issue I make no claims my change is optimal. Just changing from subclassing wcstring to using wcstring_list_t in the class was a huge undertaking. I have no doubt that someone like @ridiculousfish or @mqudsi can make additional performance improvements by avoiding copying values and the like.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant