`elfshaker clone` #75

veselink1 · 2022-04-29T16:07:12Z

Motivation

The motivation for adding this command is to enable automatic fetching of remote packs.

`clone`

Usage

elfshaker clone <url> <directory>

Example

elfshaker clone https://github.com/elfshaker/manyclangs/releases/download/v0.9.0/aarch64-ubuntu2004-manyclangs.esi manyclangs

Implementation

1. Create a directory <directory>
2. Fetch the .esi (ElfShaker Index) file (via HTTP GET)
3. Store the file in <directory>/elfshaker_data/remotes/origin.esi (creating missing directories)
4. Fetch the .pack.idx of all packs listed in packs and store in elfshaker_data/packs/main

In case any of the steps 1-3 fails, <directory> is removed before the process exits.
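The clone steps and their cleanup rule can be sketched roughly as follows. This is a minimal illustration, not elfshaker's actual code: `clone_into`, `populate`, the `fetch` callback (standing in for an HTTP GET) and the `pack_index_urls` callback (standing in for the .esi parser) are all hypothetical names. The sketch stores indexes under `packs/origin`, although the description above names `packs/main`, and it removes the directory on any failure after creation, which is an assumption beyond "steps 1-3".

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch of `clone`: create the directory, populate it, and
/// remove it again if anything fails along the way.
pub fn clone_into(
    url: &str,
    directory: &Path,
    fetch: &dyn Fn(&str) -> io::Result<Vec<u8>>,
    pack_index_urls: &dyn Fn(&[u8]) -> Vec<String>,
) -> io::Result<()> {
    // Step 1: create <directory>. `create_dir` fails if it already exists,
    // so the cleanup below can never delete pre-existing user data.
    fs::create_dir(directory)?;
    let result = populate(url, directory, fetch, pack_index_urls);
    if result.is_err() {
        // Assumption: any failure after step 1 removes the new directory.
        let _ = fs::remove_dir_all(directory);
    }
    result
}

fn populate(
    url: &str,
    directory: &Path,
    fetch: &dyn Fn(&str) -> io::Result<Vec<u8>>,
    pack_index_urls: &dyn Fn(&[u8]) -> Vec<String>,
) -> io::Result<()> {
    // Step 2: fetch the .esi file.
    let esi = fetch(url)?;

    // Step 3: store it as the `origin` remote, creating missing directories.
    let remotes = directory.join("elfshaker_data").join("remotes");
    fs::create_dir_all(&remotes)?;
    fs::write(remotes.join("origin.esi"), &esi)?;

    // Step 4: fetch every .pack.idx listed in the index.
    // Assumption: indexes are stored under packs/origin.
    let packs = directory.join("elfshaker_data").join("packs").join("origin");
    fs::create_dir_all(&packs)?;
    for idx_url in pack_index_urls(&esi) {
        // Use the last URL path segment as the local file name.
        let name = idx_url.rsplit('/').next().unwrap_or(idx_url.as_str()).to_string();
        fs::write(packs.join(name), fetch(&idx_url)?)?;
    }
    Ok(())
}
```

Passing the network access in as a callback keeps the sketch testable without a server.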

`update`

Usage

elfshaker update

Implementation

1. Open elfshaker_data/remotes/*.esi
2. Read the property url
3. Fetch origin via HTTP GET (Headers: If-Modified-Since: <now> GMT)
4. Overwrite the .esi file with the response if Status: OK, exit if Status: Not modified, error if other
5. Fetch the .pack.idx listed in packs and overwrite the files elfshaker_data/packs/origin
   - For all .pack.idx which are not available locally
   - For all .pack.idx whose checksum on-disk does not match the checksum in the .esi

The above sequence of operations is carried out for all .esi files in the directory.

Any error is reported on stderr and cancels the operation for the target .esi, but not for any other indexes. The new .esi and .pack.idx are kept, the old ones are lost.
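The decision points in `update` can be illustrated with a small sketch. It assumes the usual HTTP status codes (200 for OK, 304 for Not Modified); `EsiOutcome`, `classify_status`, `needs_fetch` and `update_all` are hypothetical names introduced here, not elfshaker's API.

```rust
/// What to do with one remote's .esi after the conditional GET.
#[derive(Debug, PartialEq)]
pub enum EsiOutcome {
    /// 200 OK: overwrite the local .esi with the response body.
    Overwrite,
    /// 304 Not Modified: nothing to do for this remote.
    UpToDate,
    /// Any other status is reported as an error.
    Failed(u16),
}

pub fn classify_status(status: u16) -> EsiOutcome {
    match status {
        200 => EsiOutcome::Overwrite,
        304 => EsiOutcome::UpToDate,
        other => EsiOutcome::Failed(other),
    }
}

/// A .pack.idx is fetched when it is missing locally, or when its on-disk
/// checksum differs from the one recorded in the .esi.
pub fn needs_fetch(local_checksum: Option<&str>, expected: &str) -> bool {
    match local_checksum {
        None => true,
        Some(found) => !found.eq_ignore_ascii_case(expected),
    }
}

/// Runs `update_one` for every remote. A failure cancels only that remote;
/// the collected failures are what a caller would report on stderr.
pub fn update_all<E>(
    remotes: &[&str],
    update_one: impl Fn(&str) -> Result<(), E>,
) -> Vec<(String, E)> {
    let mut failures = Vec::new();
    for &remote in remotes {
        if let Err(error) = update_one(remote) {
            failures.push((remote.to_string(), error));
        }
    }
    failures
}
```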

Changes to existing commands

The addition of clone changes the behaviour of existing commands.

`extract`

elfshaker extract [<remote>/<pack>]:<snapshot>

extract is extended to automatically fetch .pack files when those are available from a remote.
<remote> and <pack> below are resolved in the usual way (by reading available .pack.idx). If a matching pack cannot be found, the process exits with an appropriate error message.

If `elfshaker_data/<remote>/<pack>.pack` is not found

1. Find <pack>.pack in the list of packs in elfshaker_data/remotes/<remote>.esi
2. Fetch <pack>.pack, verify its checksum, and store to elfshaker_data/packs/<remote>/<pack>.pack
3. Extract <pack>:<snapshot> with the usual semantics

Otherwise

Proceed with the usual semantics of extract (whether success or error).
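The fetch-on-demand branch of `extract` might look something like the sketch below. `ensure_pack` is a hypothetical helper; the `fetch` and `checksum` callbacks stand in for the HTTP download and for whatever hash the .esi records, since neither is specified in this description.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the local path of `<pack>.pack`, downloading it first if absent.
/// `data_dir` is the `elfshaker_data` directory.
pub fn ensure_pack(
    data_dir: &Path,
    remote: &str,
    pack: &str,
    expected_checksum: &str,
    fetch: &dyn Fn() -> io::Result<Vec<u8>>,
    checksum: &dyn Fn(&[u8]) -> String,
) -> io::Result<PathBuf> {
    let path = data_dir
        .join("packs")
        .join(remote)
        .join(format!("{}.pack", pack));

    // "Otherwise" branch: the pack is already on disk, nothing to fetch.
    if path.is_file() {
        return Ok(path);
    }

    // Fetch the pack and verify it before anything is written to disk.
    let bytes = fetch()?;
    if checksum(&bytes) != expected_checksum {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "pack checksum mismatch",
        ));
    }

    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(&path, &bytes)?;
    Ok(path)
}
```

Verifying before writing means a corrupted download never appears as a usable pack on disk.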

Incompatibilities

Since we are using elfshaker_data/packs/<remote> to store the packs, users should not create a directory with the same name to store packs.

`.esi` file format

The elfshaker index format is a plain text file. Values are tab-separated.

It starts with the line meta v1. The second line starts with url followed by the URL of the .esi file on the hosting server, which is used to refresh the .esi during elfshaker update.
The following lines are tab-separated pack checksum, pack index checksum and URL (relative to url or absolute) from which to fetch the pack file. Pack indexes must be obtainable by appending .idx to the strings in packs.

meta    v1
url    https://github.com/elfshaker/manyclangs/releases/download/v0.9.0/aarch64-ubuntu2004-manyclangs.esi
039c501ac8dfcac91c6f05601cee876e1cc07e17    91768d65e5095a85472378f6dece7c5fe2524e90    aarch64-ubuntu2004-manyclangs-202102.pack
cfd7585fe30db8a6690cb4425b94fbaeaeceb483    7871d5a9eb7d92cf5825dff75127b7d8ebf15dd7    aarch64-ubuntu2004-manyclangs-202103.pack
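A parser for the format described above could be sketched as follows. The type and function names (`PackEntry`, `RemoteIndexSketch`, `parse_esi`) are illustrative only and are not the `RemoteIndex` type added by this PR; the sketch also assumes that blank lines may be skipped.

```rust
/// One pack line: pack checksum, pack index checksum, pack URL.
#[derive(Debug, PartialEq)]
pub struct PackEntry {
    pub pack_checksum: String,
    pub index_checksum: String,
    pub url: String,
}

impl PackEntry {
    /// Pack indexes are obtained by appending `.idx` to the pack URL.
    pub fn index_url(&self) -> String {
        format!("{}.idx", self.url)
    }
}

#[derive(Debug, PartialEq)]
pub struct RemoteIndexSketch {
    pub url: String,
    pub packs: Vec<PackEntry>,
}

pub fn parse_esi(text: &str) -> Result<RemoteIndexSketch, String> {
    let mut lines = text.lines();

    // Line 1 must be exactly `meta<TAB>v1`.
    if lines.next() != Some("meta\tv1") {
        return Err("line 1: expected `meta<TAB>v1`".to_string());
    }

    // Line 2 is `url<TAB><location of this .esi>`.
    let url = lines
        .next()
        .and_then(|line| line.strip_prefix("url\t"))
        .ok_or_else(|| "line 2: expected `url<TAB><url>`".to_string())?
        .to_string();

    // Remaining lines: three tab-separated fields each.
    let mut packs = Vec::new();
    for (offset, line) in lines.enumerate() {
        if line.is_empty() {
            continue; // assumption: blank lines are tolerated
        }
        let fields: Vec<&str> = line.split('\t').collect();
        match fields.as_slice() {
            [pack_checksum, index_checksum, pack_url] => packs.push(PackEntry {
                pack_checksum: pack_checksum.to_string(),
                index_checksum: index_checksum.to_string(),
                url: pack_url.to_string(),
            }),
            _ => {
                return Err(format!(
                    "line {}: expected 3 tab-separated fields",
                    offset + 3
                ))
            }
        }
    }
    Ok(RemoteIndexSketch { url, packs })
}
```

Resolving relative pack URLs against `url` is left out here; it would need a URL library such as the `url` crate mentioned in this PR.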

Future work

The design allows for multiple remotes to be supported in the future, by having multiple .esi files and corresponding sub-directories under elfshaker_data/packs/. This makes the likelihood of a name clash between the names of the remotes and user-created directories in elfshaker_data/packs/ greater, but since those are user-defined identifiers, the expectation is that users would be able to resolve these clashes manually, by naming remotes accordingly.

The operations above are defined in terms of operation on files in elfshaker_data/remotes and should work the same regardless of the number of remotes added. (update updates all remotes, extract looks up all .pack.idx)

veselink1 · 2022-04-29T16:09:26Z

This PR also includes an improved progress bar.

Also added ureq and url crates as dependencies. Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>

Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>

peterwaller-arm · 2022-05-03T19:39:47Z

Heads up, the test appears to be failing on github.

veselink1 · 2022-05-03T20:02:35Z

> Heads up, the test appears to be failing on github.

I wanted to write self-contained functional tests like the ones we have for other commands, however, I have struggled to do so.

The only functional test I have added is for elfshaker update and it uses netcat to simulate an .esi and pack files available remotely (by binding to local ports and sending back an HTTP response). Unfortunately, it only succeeds in a fraction of cases, and I have no idea why. Do you think this test is necessary and can you think of a better way of doing what I'm describing?

peterwaller-arm · 2022-05-05T08:00:44Z

> Unfortunately, it only succeeds in a fraction of cases, and I have no idea why.

Ah, I see it. The issue is that you have a race condition.

elfshaker/test-scripts/check.sh, lines 385 to 388 in 99acc18:

    (printf "HTTP/1.1 200 OK\r\nContent-Length: $pack_idx_length\r\n\r\n"; cat "$input.idx") | nc -l -p 43103 &
    server2_pid=$!
    if ! "$elfshaker" update; then

There is no 'happens before' constraint on the listen() inside the backgrounded nc relative to the connect inside elfshaker update. So the connect is happening before the listen. To do this properly you need to arrange that you listen before running elfshaker update. So far as I'm aware, there is no straightforwardly correct/fast way to make bash wait until nc is listening.

If I were trying to do something like this I might try and use a unix socket rather than a TCP socket (to avoid issues of port collisions during a parallel run), and open the socket 'in the same process' as the one which invokes elfshaker update. And then pass that open file descriptor through to the backgrounded shell. Or some variant on this, there are lots of possible solutions but they have to be along the lines of 'elfshaker doesn't start until the listen syscall has completed'.

If you get lucky you can observe the problem with strace -f, note that when it fails you can see a 'connect' but no 'listen'.
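One variant of the "listen before the client starts" idea, sketched in Rust rather than bash: the test process itself binds the socket, so the listen has completed before any client can run. This uses a TCP socket on an OS-chosen port (port 0) instead of the Unix socket suggested above, which is a substitution made here for brevity; `serve_once` and `http_get` are hypothetical helpers, with `http_get` standing in for `elfshaker update`.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread::{self, JoinHandle};

/// Starts a one-shot HTTP server. The socket is bound and listening before
/// this function returns, so a client started afterwards cannot lose the race.
pub fn serve_once(body: Vec<u8>) -> io::Result<(SocketAddr, JoinHandle<io::Result<()>>)> {
    // `TcpListener::bind` both binds and listens; port 0 lets the OS pick a
    // free port, which avoids collisions between parallel test runs.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let handle = thread::spawn(move || -> io::Result<()> {
        let (mut stream, _) = listener.accept()?;

        // Read the request headers before replying.
        let mut request = Vec::new();
        let mut chunk = [0u8; 512];
        while !request.windows(4).any(|w| w == b"\r\n\r\n") {
            let n = stream.read(&mut chunk)?;
            if n == 0 {
                break;
            }
            request.extend_from_slice(&chunk[..n]);
        }

        let header = format!(
            "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
            body.len()
        );
        stream.write_all(header.as_bytes())?;
        stream.write_all(&body)?;
        Ok(())
    });
    Ok((addr, handle))
}

/// Minimal client standing in for the program under test.
pub fn http_get(addr: SocketAddr) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Because the connect can only happen after `serve_once` has returned, the ordering no longer depends on scheduling.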

pwaller

Phew, big pull request. Nice feature, I like the implementation and the way it works. It looks really well thought out.

I'm approving this as-is and think it should be merged, I've made mostly stylistic/feely comments, which you may ignore or address sooner or later as you see fit.

You've written pretty good documentation in the PR body, would be nice to see that land in the repo somewhere (docs/contributors? + some user docs?). At a minimum I think we should create an issue to follow up on that and have it block the 1.0 milestone.

I note that if I pipe elfshaker pack into cat, the final render is this, suggesting something is off with terminal detection, perhaps?

Compressing objects [3/3]=============>        ] 66% [2/3]

If it's straightforward, maybe split the progress bar into a separate review? (and don't address my below comment there!) If no, no problem, we can address possible bugs like the above as we go along.

It's also not totally clear to me what the '3' represents but I haven't spent too long trying to figure it out. I think it might be the number of shards?

At some point the progress reporting of packing needs to be fixed to instead report the proportion of uncompressed bytes fed to the compressor. I think this would lead to better behaviour, because at the moment progress is reported as a percentage of shards completed, where different shards can take quite different lengths of time to compress. By showing the proportion of bytes fed to the compressor I think you wouldn't totally fix the uniformity issue, but you would at least have something to report progress at a finer granularity.
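To make the difference concrete, here is a small sketch comparing the two ways of measuring progress. The function names and the shard sizes in the usage note are invented for illustration and do not come from elfshaker.

```rust
/// Progress as the fraction of shards completed (the behaviour described
/// as current in the comment above).
pub fn progress_by_shards(shard_sizes: &[u64], completed_shards: usize) -> f64 {
    if shard_sizes.is_empty() {
        return 1.0;
    }
    completed_shards as f64 / shard_sizes.len() as f64
}

/// Progress as the fraction of uncompressed bytes fed to the compressor
/// (the proposed behaviour).
pub fn progress_by_bytes(shard_sizes: &[u64], bytes_fed: u64) -> f64 {
    let total: u64 = shard_sizes.iter().sum();
    if total == 0 {
        return 1.0;
    }
    bytes_fed as f64 / total as f64
}
```

With hypothetical shards of 10, 10 and 980 bytes, finishing the first two shards reads as roughly 67% by shard count but only 2% by bytes, and the byte-based figure can keep moving while the large shard is being compressed.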

pwaller · 2022-06-06T21:38:13Z

src/repo/repository.rs

+    /// macro tasks (e.g. extract snapshot includes fetching the .esi,
+    /// fetching individual pack, etc.), it is useful to use a "factory",
+    /// instead of argument passing for the [`ProgressReporter`].
+    progress_reporter_factory: Box<dyn Fn(&str) -> ProgressReporter<'static> + Send + Sync>,


I don't feel great about putting this state on the repository if it can be avoided, since it doesn't feel like repository state (application state, maybe). That said, I understand there are tradeoffs to make here so I'm not going to request you change it.

veselink1 · 2022-06-06

This is not really state as much as it is configuration. It is a progress reporter that is local to the repository object and can be used internally to print progress.

Note that the progress information and the methods we print it from might change with time. So instead of passing in a ProgressReporter to every public method, we set it for the whole Repository. The second-best alternative would be to use a global.

Does this change how you see this?

pwaller · 2022-06-07

I'm not proposing to change this now, though perhaps there are refinements we could do to progress reporter if it could be broken out into a separate PR.

Just to flesh out my understanding: I can imagine a case where we might have more than one repository. Would this work as is today, or would it be broken? If it were a global, would it be broken?

When I say 'state', in my mind, I mean that the repository should only really know about things which relate to the repository. Putting something unrelated to it seems strange. The 'struct' which holds knowledge of things relating to application state might be better called App in my mind. I think I would object less to the Repository holding a reference to an App so that it could print things.

I don't particularly like the idea of the reporter factory being a global state, either, from simply an aversion to having global state. However, it's interesting to entertain the idea: is it really so different from println!() and friends, which effectively access a writer through magic? Possibly; yes it is, because println!() presumably doesn't hold any state other than perhaps buffers, contrasting with the progress reporter, which is keeping track of how much work has been done.

Something else I would like to see tried is if the reporter factory were simply passed-by-argument instead. Do we have a sense of how bad that would be? How many function signatures would need to be modified? If it were, say, less than 10, I might even favour simply passing it in.

veselink1 · 2022-06-08

I gave you incorrect information in my comment above which confused the both of us -- sorry about that.

It is a progress reporter ++factory++ that is local to the repository object and can be used internally to ++create ProgressReporters which can be used to++ print progress.

This progress_reporter_factory doesn't hold state. It holds a boxed callable which creates a ProgressReporter -- it is a stateless factory. The progress reporter is what holds the state, but ProgressReporters are created locally, by calling (self.progress_reporter_factory)(...). The ProgressReporters are passed as arguments. The factory is configured from the top-level run method:

elfshaker/src/bin/elfshaker/clone.rs

Line 47 in 627f704

repo.set_progress_reporter(|msg| create_percentage_print_reporter(msg, 5));
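The stateless-factory arrangement described in this thread can be sketched as below. The `ProgressReporter` here is a simplified stand-in written for this example (the real type has a lifetime parameter and a different API), and `fetch_all` is a hypothetical method; only the shape of the `progress_reporter_factory` field and the `set_progress_reporter` call follow the snippets quoted above.

```rust
/// Simplified stand-in for the real `ProgressReporter`: this is where the
/// state (how much work has been done) lives.
pub struct ProgressReporter {
    label: String,
    done: usize,
    total: usize,
}

impl ProgressReporter {
    pub fn new(label: &str) -> Self {
        ProgressReporter { label: label.to_string(), done: 0, total: 0 }
    }

    pub fn checkpoint(&mut self, done: usize, total: usize) {
        self.done = done;
        self.total = total;
    }

    pub fn summary(&self) -> String {
        format!("{}: {}/{}", self.label, self.done, self.total)
    }
}

pub struct Repository {
    /// Configuration, not state: a boxed callable that creates reporters.
    progress_reporter_factory: Box<dyn Fn(&str) -> ProgressReporter + Send + Sync>,
}

impl Repository {
    pub fn new() -> Self {
        Repository {
            progress_reporter_factory: Box::new(|label: &str| ProgressReporter::new(label)),
        }
    }

    /// Configured once from the top-level command.
    pub fn set_progress_reporter<F>(&mut self, factory: F)
    where
        F: Fn(&str) -> ProgressReporter + Send + Sync + 'static,
    {
        self.progress_reporter_factory = Box::new(factory);
    }

    /// Hypothetical operation: each call creates its own local reporter.
    pub fn fetch_all(&self, items: &[&str]) -> String {
        let mut reporter = (self.progress_reporter_factory)("Fetching packs");
        for (index, _item) in items.iter().enumerate() {
            reporter.checkpoint(index + 1, items.len());
        }
        reporter.summary()
    }
}
```

Since every operation creates a fresh reporter, two `Repository` values do not share any progress state through the factory.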

veselink1

Thanks for the review! Lots of small things I missed.

I believe I've responded to everything except the remark about the error type, I need to think a bit more about whether what you're suggesting is better.

src/repo/remote.rs

veselink1 · 2022-06-06T22:33:06Z

src/repo/remote.rs

+    timeout: Option<Duration>,
+    if_modified_since: Option<SystemTime>,
+) -> Result<Option<(usize, impl Read)>, Error> {
+    let mut agent_builder = AgentBuilder::new();


I will try to refactor this so it can be passed from higher up the call stack.

src/repo/remote.rs

Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>

Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>

`elfshaker clone` clones a remote repository into a new directory. It does that by fetching a remote `.esi` (over HTTP) and creating a remote called origin. After fetching the `.esi`, the command proceeds to fetch all of the available `.pack.idx` from the remote, so that `elfshaker update` does not have to be run manually. Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>

Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>

veselink1 · 2022-06-08T15:25:59Z

I think we're ready to merge. I've split off the progress bar code, but not the rest of the ProgressReporter-related changes because some are tightly-coupled with this PR. (ProgressReporter-aware wrappers like ProgressWriter and the argument passing that occurs everywhere.)

veselink1 requested a review from peterwaller-arm April 29, 2022 16:07

veselink1 added 2 commits April 30, 2022 08:09

Add RemoteIndex

a423a53

Also added ureq and url crates as dependencies. Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>

Add elfshaker update

015043b

Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>

veselink1 force-pushed the clone branch from 9d2cbfa to 99acc18 Compare April 30, 2022 07:10

veselink1 force-pushed the clone branch from 99acc18 to 0f0c3dd Compare May 5, 2022 12:17

veselink1 force-pushed the clone branch from 0f0c3dd to 2802589 Compare June 6, 2022 14:18

pwaller approved these changes Jun 6, 2022

veselink1 commented Jun 6, 2022

veselink1 and others added 3 commits June 8, 2022 15:34

Discover and fetch remote packs on-demand

fb61f44

Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>

Fix loose pack check

45df296

Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>

Fix handling of relative pack URLs and clippy

ccd5da0

Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>

veselink1 force-pushed the clone branch 2 times, most recently from 2c0ed74 to cf6efc0 Compare June 8, 2022 15:06

veselink1 and others added 4 commits June 8, 2022 16:14

Add source path to .esi parse errors

d03afcd

Signed-off-by: Veselin Karaganev <veselin.karaganev@arm.com>

Fix clippy warnings

b151ba9

Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>

Test elfshaker update and clone

6c7caf6

Signed-off-by: Veselin Karaganev <vesko.karaganev@gmail.com>

veselink1 force-pushed the clone branch from cf6efc0 to 6c7caf6 Compare June 8, 2022 15:16

veselink1 merged commit 748d88f into main Jun 8, 2022

veselink1 deleted the clone branch June 8, 2022 15:26

peterwaller-arm mentioned this pull request Aug 21, 2023

Issues encountered by newcomers #10

Closed

5 tasks
