-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
elfshaker clone
#75
elfshaker clone
#75
Conversation
This PR also includes an improved progress bar. |
Also added ureq and url crates as dependencies. Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>
Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>
Heads up, the test appears to be failing on github. |
I wanted to write self-contained functional tests like the ones we have for other commands, however, I have struggled to do so. The only functional test I have added is for |
Ah, I see it. The issue is that you have a race condition. elfshaker/test-scripts/check.sh Lines 385 to 388 in 99acc18
There is no 'happens before' constraint on the listen() inside the backgrounded nc relative to the connect inside elfshaker update. So the connect is happening before the listen. To do this properly you need to arrange that you listen before running elfshaker update. So far as I'm aware, there is no straightforwardly correct/fast way to make bash wait until nc is listening.
If I were trying to do something like this I might try and use a unix socket rather than a TCP socket (to avoid issues of port collisions during a parallel run), and open the socket 'in the same process' as the one which invokes elfshaker update. And then pass that open file descriptor through to the backgrounded shell. Or some variant on this, there are lots of possible solutions but they have to be along the lines of 'elfshaker doesn't start until the listen syscall has completed'. If you get lucky you can observe the problem with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phew, big pull request. Nice feature, I like the implementation and the way it works. It looks really well thought out.
I'm approving this as-is and think it should be merged, I've made mostly stylistic/feely comments, which you may ignore or address sooner or later as you see fit.
You've written pretty good documentation in the PR body, would be nice to see that land in the repo somewhere (docs/contributors? + some user docs?). At a minimum I think we should create an issue to follow up on that and have it block the 1.0 milestone.
I note that If I pipe elfshaker pack
into cat
, the final render is this, suggesting something is off with terminal detection, perhaps?
Compressing objects [3/3]=============> ] 66% [2/3]
If it's straightfoward, maybe split the progress bar into a separate review? (and don't address my below comment there!) If no, no problem, we can address possible bugs like the above as we go along.
It's also not totally clear to me what the '3' represents but I haven't spent too long trying to figure it out. I think it might be the number of shards?
At some point the progress reporting of packing needs to be fixed to instead report the proportion of uncompressed bytes fed to the compressor, I think this would lead to better behaviour, because at the moment progress is reported as a percentage of shards completed, where different shards can take quite different lengths of time to compress. By showing proportion of bytes fed to the compressor I think you wouldn't totally fix the unformity issue but you would at least have something to report progress at a finer granularity.
/// macro tasks (e.g. extract snapshot includes fetching the .esi, | ||
/// fetching individual pack, etc.), it is useful to use a "factory", | ||
/// instead of argument passing for the [`ProgressReporter`]. | ||
progress_reporter_factory: Box<dyn Fn(&str) -> ProgressReporter<'static> + Send + Sync>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel great about putting this state on the repository if it can be avoided, since it doesn't feel like repository state (application state, maybe). That said, I understand there are tradeoffs to make here so I'm not going to request you change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really state as much as it is configuration. It is a progress reporter that is local to the repository object and can be used internally to print progress.
Note that the progress information and the methods we print it from might change with time. So instead of passing in a ProgressReporter
to every public method, we set it for the whole Repository
. The second-best alternative would be to use a global.
Does this change how you see this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not proposing to change this now, though perhaps there are refinements we could do to progress reporter if it could be broken out into a separate PR.
Just to flesh out my undestanding: I can imagine a case where we might have more than one repository. Would this work as is today, or would it be broken? If it were a global, would it be broken?
When I say 'state', in my mind, I mean that the repository should only really know about things which relate to the repository. Putting something unrelated to it seems strange. The 'struct' which holds knowledge of things relating to application state might be better called App
in my mind. I think I would object less to the Repository
holding a reference to an App
so that it could print things.
I don't particularly like the idea of the reporter factory being a global state, either, from simply an aversion to having global state. Howeer, it's intersting to entertain the idea -- is it really so different from println!()
and friends, which effectively access a writer through magic? Possibly; yes it is, because println!()
presumably doesn't hold any state other than perhaps buffers, contrasting with the progress reporter, which is keeping track of how much work has been done.
Something else I would like to see tried is if the reporter factory were simply passed-by-argument instead. Do we have a sense of how bad that would be? How many function signatures would need to be modified? If it were, say, less than 10, I might even favour simply passing it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave you incorrect information in my comment above which confused the both of us -- sorry about that.
It is a progress reporter ++factory++ that is local to the repository object and can be used internally to ++create ProgressReporters which can be used to++ print progress.
This progress_reporter_factory
doesn't hold state. It holds a boxed callable which creates a ProgressReporter
-- it is a stateless factory. The progress reporter is what holds the state, but ProgressReporters are created locally, by calling (self.progress_reporter_factory)(...)
. The ProgressReporters are passed as arguments. The factory is configured from the top-level run
method:
elfshaker/src/bin/elfshaker/clone.rs
Line 47 in 627f704
repo.set_progress_reporter(|msg| create_percentage_print_reporter(msg, 5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Lots of small things I missed.
I believe I've responded to everything except the remark about the error type, I need to think a bit more about whether what you're suggesting is better.
src/repo/remote.rs
Outdated
timeout: Option<Duration>, | ||
if_modified_since: Option<SystemTime>, | ||
) -> Result<Option<(usize, impl Read)>, Error> { | ||
let mut agent_builder = AgentBuilder::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to refactor this so it can be passed from higher up the call stack.
Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>
Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>
Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>
2c0ed74
to
cf6efc0
Compare
Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>
`elfshaker clone` clones a remote repository into a new directory. It does that by fetching a remote `.esi` (over HTTP) and creating a remote called origin. After fetching the `.esi`, the command proceeds to fetch all of the available `.pack.idx` from the remote, so that `elfshaker update` does not have to be run manually. Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>
Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>
Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>
I think we're ready to merge. I've split off the progress bar code, but not the rest of the ProgressReporter-related changes because some are tightly-coupled with this PR. (ProgressReporter-aware wrappers like ProgressWriter and the argument passing that occurs everywhere.) |
Motivation
The motivation for adding this command is to enable automatic fetching of remote packs.
clone
Usage
elfshaker clone <url> <directory>
Example
Implementation
<directory>
<directory>/elfshaker_data/remotes/origin.esi
(creating missing directories).pack.idx
of all packs listed inpacks
and store inelfshaker_data/packs/main
In case any of the steps 1-3 fails,
<directory>
is removed before the process exits.update
Usage
elfshaker update
Implementation
elfshaker_data/remotes/*.esi
url
origin
via HTTP GET (Headers:If-Modified-Since: <now> GMT
).esi
file with the response ifStatus: OK
, exit ifStatus: Not modified
, error if other.pack.idx
listed inpacks
and overwrite the fileselfshaker_data/packs/origin
.pack.idx
which are not available locally.pack.idx
whose checksum on-disk does not match the checksum in the .esiThe above sequence of operations is carried out for all
.esi
files in the directory.Any error is reported on stderr and cancels the operation for the target
.esi
, but not for any other indexes. The new.esi
and.pack.idx
are kept, the old ones are lost.Changes to existing commands
The addition of
clone
changes the behaviour of existing commands.extract
elfshaker extract [<remote>/<pack>]:<snapshot>
extract
is extended to automatically fetch.pack
files when those are available from a remote.<remote>
and<pack>
below are resolved in the usual way (by reading available.pack.idx
). If a matching pack cannot be found, the process exists with an appropriate error message.If
elfshaker_data/<remote>/<pack>.pack
is not found<pack>.pack
in the list of packs inelfshaker_data/remotes/<remote>.esi
<pack>.pack
, verify its checksum, and store toelfshaker_data/packs/<remote>/<pack>.pack
<pack>:<snapshot>
with the usual semanticsOtherwise
Proceed with the usual semantics of
extract
(whether success or error).Incompatibilities
elfshaker_data/packs/<remote>
to store the packs, users should not create a directory with the same name to store packs..esi
file formatThe elfshaker index format is a plain text file. Values are tab-separated.
It starts with the line
meta v1
. The second line starts withurl
followed by the URL of the.esi
file on the hosting server, which is used to refresh the .esi duringelfshaker update
.The following lines are tab-separated pack checksum, pack index checksum and URL (relative to
url
or absolute) from which to fetch the pack file. Pack indexes must be obtainable by appending.idx
to the strings inpacks
.Future work
The design allows for multiple remotes to be supported in the future, by having multiple .esi files and corresponding sub-directories under
elfshaker_data/packs/
. This makes the likelihood of a name clash between the names of the remotes and user-created directories inelfshaker_data/packs/
greater, but since those are user-defined identifiers, the expectation is that users would be able to resolve these clashes manually, by naming remotes accordingly.The operations above are defined in terms of operation on files in
elfshaker_data/remotes
and should work the same regardless of the number of remotes added. (update updates all remotes, extract looks up all .pack.idx)