Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAmend RFC 517: Add material on std::env #578
Conversation
aturon
referenced this pull request
Jan 13, 2015
Merged
RFC: io and os reform: initial skeleton #517
sfackler
reviewed
Jan 13, 2015
| **Environment variables**: | ||
|
|
||
| * `vars` (renamed from `env`): yields a vector of `(OsStrBuf, OsStrBuf)` pairs. | ||
| * `var` (renamed from `getenv`): take a value bounded by `IntoOsStrBuf`, |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
I believe that these APIs will need to go into an OsStrBuf regardless as they need to be passed to a system API which means they have to be nul-terminated (and a slice won't have the terminator).
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 28, 2015
Member
These actually all ended up being AsStrBuf anyway, so I think the RFC just needs to be updated.
This comment has been minimized.
This comment has been minimized.
wycats
Jan 30, 2015
Contributor
I think it would make more sense for these APIs to return a Result<String, EnvError> where EnvError is:
enum EnvError {
Missing,
NotUnicode(OsString)
}
That way you can quickly and ergonomically get at a String, which is usually what you want, but still get at the OsString if you really need it (usually to present a good error message).
This comment has been minimized.
This comment has been minimized.
wycats
Jan 30, 2015
Contributor
Sorry, to be more precise, I think it makes sense to have two variants of these APIs, one of which provides an Option<OsString> and one of which provides a Result.
I would propose var for the version that returns a Result<String, E> and raw_var for the one that returns an Option<OsString>
mahkoh
reviewed
Jan 13, 2015
|
|
||
| **Arguments**: | ||
|
|
||
| * `args`: change to yield an iterator rather than vector if possible; in any case, it should produce an `OsStrBuf`. |
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
change to yield an iterator rather than vector if possible
For what purpose?
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
Iterators are one of the idiomatic staples of the standard library, and providing them wherever possible tends to increase composability with other APIs. It also has the added bonus of avoiding a Vec where one may not be necessary.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
I assume you mean avoiding an allocation. That would make sense if the iterator didn't return allocated memory. How often does it happen that someone wants more than the first but not all arguments? Is it not easier to turn a vector into an iterator than an iterator into a vector?
This comment has been minimized.
This comment has been minimized.
XMPPwocky
Jan 13, 2015
Contributor
If we're going to do that, can we also make it an iterator over Cow<OsStr, OsStrBuf>, so it's not necessary to allocate & copy on most platforms?
This comment has been minimized.
This comment has been minimized.
aturon
Jan 13, 2015
Author
Member
I assume you mean avoiding an allocation. That would make sense if the iterator didn't return allocated memory. How often does it happen that someone wants more than the first but not all arguments? Is it not easier to turn a vector into an iterator than an iterator into a vector?
You save an additional allocation by not producing a vector. Yielding an iterator makes it easier to collect the data into an arbitrary container, rather than forcing an allocation of a vector which is then just thrown away to produce an iterator.
If we're going to do that, can we also make it an iterator over Cow<OsStr, OsStrBuf>, so it's not necessary to allocate & copy on most platforms?
Seems worth looking at, though I believe on Windows at least we must always copy (due to encoding differences).
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 30, 2015
Contributor
As I mentioned above the implementation does not actually have a static lifetime (to returned a borrowed Cow) on all platforms.
You keep bringing up the current implementation. This discussion is about avoiding allocations. The current implementation does allocate on all platforms. So it is clear that the point of the discussion is to improve the current implementation. Are you trying to say that the current implementation can't be improved this way?
(to returned a borrowed Cow)
If it that were the goal then there would be no need for Cow. @aturon suggested Cow so that it can be borrowed on the platforms that support it and owned on other platforms.
Additionally, on platforms like OSX there is no safeguard against other libraries in the process perhaps modifying the arguments.
Then maybe it has to be owned on OSX. The functions used to retrieve argv/c seem to be completely undocumented. Maybe modifying the memory behind it is even UB. If this is what you're worried about then the current implementation is already broken because another thread could modify argc/v while they are being copied.
but it's somewhat questionable whether the work will be worth it in the long run.
It seems to be trivial.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 30, 2015
Contributor
Additionally, on platforms like OSX there is no safeguard against other libraries in the process perhaps modifying the arguments.
In fact, the current implementation is most definitely broken. It retrieves the args anew every times args_as_bytes is called. The args_as_bytes docs say:
/// Returns the arguments which this program was started with (normally passed
/// via the command line) as byte vectors.
If another library modifies the arguments then this is no longer true and the arguments might even change between invocations.
This comment has been minimized.
This comment has been minimized.
aturon
Jan 30, 2015
Author
Member
What are the negative consequences of using Cow?
Hm, I'm glad you asked me this, because I think I've been avoiding it based on somewhat stale ergonomic considerations. In previous incarnations (the old MaybeOwned) it was not as pleasant to work with, but with my new deref-based design and deref coercions, I think in most cases you won't notice the difference between this and a slice.
The only other hesitation was the sense that Cow is a bit more complicated than returning an owned type directly, for what is here I think a marginal performance benefit. But I don't really have a strong opinion.
So I withdraw what I said above. Assuming that we can safely carry out an implementation (which I'm reasonably confident we can), I'm happy to use Cow here.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 30, 2015
Member
Are you trying to say that the current implementation can't be improved this way?
Ah yes, I'll try to be more specific. Today we do not store the value of argv passed to main, we make a local copy and stick it in a static. This is then deallocated when std::rt::cleanup is called. This means that the current implementation most definitely does not have a static lifetime on unix (minus ios/osx).
I believe the only lifetime which can be returned for a CowOsString is 'static as we otherwise do not have a lifetime to tie to. This means that the current implementation cannot ever yield a borrowed instance (none have the static lifetime). An alternative implementation would be to perhaps store the raw value of argv in a global location to be referenced later.
The primary reason I am worried about doing this is that I do not know the answer to: "Does this pointer actually have a static lifetime?" Arbitrary code can be running while main is running so if the arguments are stored anywhere locally and deallocated before exit this could lead to memory unsafety (via concurrent code in other threads).
Some other minor concerns (but not showstoppers) are:
- Using a raw
argvpointers means that it's basically impossible to support theargs::initAPI. This is not necessarily a large worry, however. - I have been worried about passing pointers into this memory as we don't necessarily have any guarantees that no one else is modifying it. You bring up a good point, however, that this also applicable to the implementation which copies out of the pointers. The upside of not passing the raw pointer out is that there is a deterministic scope in which you know that the arguments are accessed from the system, but I do not know if this would actually be relevant at all in practice. I doubt that anyone modifies arguments anyway.
Does that help clear up my concerns?
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 30, 2015
Contributor
Arbitrary code can be running while main is running so if the arguments are stored anywhere locally and deallocated before exit this could lead to memory unsafety (via concurrent code in other threads).
At least on linux, argv points to memory that has been allocated by the OS and not the libc. Therefore it won't be deallocated by the libc either and all threads exit before this happens. If you are worried that someone might call rt::init with not-static arguments then this is simply an issue of documenting that this causes UB. rt::init is already unsafe.
I doubt that anyone modifies arguments anyway.
In C at least it's not unlikely that this happens because allocating and deallocating memory is a PITA. It's unlikely that a library does that, though.
alexcrichton
reviewed
Jan 13, 2015
|
|
||
| **Environment variables**: | ||
|
|
||
| * `vars` (renamed from `env`): yields a vector of `(OsStrBuf, OsStrBuf)` pairs. |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
For symmetry with args, perhaps this should return an iterator? I think that we're required to always make a snapshot, but API-wise it may make more sense to keep in line.
This comment has been minimized.
This comment has been minimized.
huonw
Jan 28, 2015
Member
It seems a little unfortunate to convert from a Vec to an Iterator when we started with a Vec: if someone actually needs a Vec, there'll be two allocations, the original one and the collectd one, while it's easy to go from a Vec to an Iterator. (This applies to the args case somewhat too, since many platforms start with a Vec forcing two allocations there.)
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 28, 2015
Member
It does depend on what platform you're running on, however. On Windows there's no extra allocation necessary as the OS hands you an already-allocated block of strings. On Unix we're basically doing the same thing as Windows, except we're the one implementing it (copying the environ into a locally owned allocation). So for something like env::vars().collect::<Vec<_>>() you would save an allocation on Unix if you instead returned a Vec but you would force an unnecessary extra one on Windows if you didn't need to collect to a vector.
The analogy is pretty much the same for args as well I believe. Windows will hand us an already-allocated block that we locally own but we're forced to create one on Unix ourselves. All that being said though, I do also think that consistency and API design is probably more important here than runtime performance as these probably aren't the source of slowdown for too many programs :)
alexcrichton
reviewed
Jan 13, 2015
|
|
||
| **Architecture information**: | ||
|
|
||
| * `num_cpus`, `page_size`: stay as they are |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
In light of the discussions on the previous RFC, we may want to leave both of these unstable for now.
This comment has been minimized.
This comment has been minimized.
tshepang
Jan 13, 2015
Contributor
I, and a few others, would prefer if num_cpus was renamed to cpu_count.
This comment has been minimized.
This comment has been minimized.
nagisa
Jan 13, 2015
Contributor
This comment has been minimized.
This comment has been minimized.
nodakai
Jan 15, 2015
@tshepang Both of them sound alike to me... After all they fail to convey the important details: the API will return the number of logical CPU cores rather than that of CPUs (for example it will return 8 instead of 1 on Core i7 4790 system.) (I don't oppose to renaming)
This comment has been minimized.
This comment has been minimized.
tshepang
Jan 15, 2015
Contributor
@nodakai thought it would be 8 logical CPUs, but 4 physical CPUs. I thought the latter was the case even though all of this is in one chip.
This comment has been minimized.
This comment has been minimized.
retep998
Jan 15, 2015
Member
Keep in mind there is also a difference between the number of CPU sockets and the number of CPU cores.
This comment has been minimized.
This comment has been minimized.
nagisa
Jan 15, 2015
Contributor
@tshepang you cannot hot-plug cores, only chips. Otherwise the line between physical cores and chips is very thin. What’s the meaning of “CPU” is mostly irrelevant to the discussion, though, since we only have one function to rename and it returns the count of online logical cores.
This comment has been minimized.
This comment has been minimized.
tshepang
Jan 15, 2015
Contributor
@retep998 by CPU socket, do you mean chip? That is, the 4790 is:
- 1 chip
- 4 physical CPUs (aka cores)
- 8 logical CPUs
I just want to make sure I understand properly.
This comment has been minimized.
This comment has been minimized.
retep998
Jan 15, 2015
Member
@tshepang Something like that, yes. People often use the term CPU interchangeably for both a CPU chip/socket and a CPU core, so cpu_count is a confusing name.
This comment has been minimized.
This comment has been minimized.
tshepang
Jan 15, 2015
Contributor
@retep998 in software terms, CPUs normally refer to logical CPUs. I think that's the definition most software cares about... It's rare for software to care about actual cores. For those special cases, we'll have to add something like physical_cpu_count or core_count or...
alexcrichton
referenced this pull request
Jan 13, 2015
Merged
Amend to RFC 517: add subsection on string handling #575
nrc
assigned
aturon
Jan 15, 2015
This comment has been minimized.
This comment has been minimized.
arthurprs
commented
Jan 27, 2015
|
Please let's get the names consistent using Ya, there's a lot of discussion in the previous PR about thread/physical-core count and here. |
This comment has been minimized.
This comment has been minimized.
|
I've got a working implementation of There are still a few lingering questions on this RFC, but I'd like to propose marking those APIs as
If I missed something though please let me know! |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
@brson: |
This comment has been minimized.
This comment has been minimized.
|
re: |
This comment has been minimized.
This comment has been minimized.
|
An update on this RFC: Use of After discussion above and on IRC, it seems that everyone is basically happy to use Variants for returning `String The idea that @wycats mentioned about However, it focuses just on I suggest that instead we keep |
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Jan 30, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Jan 30, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Jan 30, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Jan 30, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Jan 30, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Jan 30, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Jan 30, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Jan 30, 2015
This comment has been minimized.
This comment has been minimized.
|
There seems to be broad enough support for this RFC that I am going to merge it. There are still some components which are going to be left Thanks again for all the feedback everyone, it has all been quite helpful! |
aturon commentedJan 13, 2015
The IO reform RFC is being split into several semi-independent pieces, posted as PRs like this one.
This RFC amendment adds material for
std::env.Rendered