New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add intrinsic to download a file #6660

Merged
merged 3 commits into from Nov 8, 2018

Conversation

3 participants
@illicitonion
Contributor

illicitonion commented Oct 19, 2018

No description provided.

@illicitonion illicitonion requested a review from stuhood Oct 19, 2018

@illicitonion illicitonion added this to To Do in Python Pipeline Porting via automation Oct 19, 2018

// TODO: Share a client which is pre-configured.
// TODO: Async
let mut response = try_future!(
reqwest::Client::new()

This comment has been minimized.

@illicitonion

illicitonion Oct 19, 2018

Contributor

Switching this to use a reqwest::async::Client would be great

This comment has been minimized.

@stuhood

stuhood Oct 22, 2018

Member

Probably fine for a followup ticket that moves the client.

.map_err(|err| throw(&format!("Error making tempdir to download file: {}", err)))
);
// TODO: Share a client which is pre-configured.

This comment has been minimized.

@illicitonion

illicitonion Oct 19, 2018

Contributor

A client should maybe be initialised and live on Core for sharing, as creating a client is non-trivial overhead

This comment has been minimized.

@stuhood

stuhood Oct 22, 2018

Member

+1

Although probably not a blocker for a first version.

@illicitonion

This comment has been minimized.

Contributor

illicitonion commented Oct 19, 2018

This is untested, but should be pretty easy to test :)

@illicitonion

This comment has been minimized.

Contributor

illicitonion commented Oct 19, 2018

Also, all of the .expect("TODO")s should become proper error handling

@stuhood

Looks good! I think I agree that almost everything except the tests can be deferred to followups.

@@ -125,6 +125,11 @@ class DirectoryToMaterialize(datatype([('path', text_type), ('directory_digest',
"""A request to materialize the contents of a directory digest at the provided path."""
pass
class UrlToFetch(datatype([('url', text_type)])):

This comment has been minimized.

@stuhood

stuhood Oct 22, 2018

Member

I would love to require a digest here (and to then expose it in https://github.com/pantsbuild/pants/pull/6660/files#diff-11db197014e1e80c9551e9b61aba6aa0R1018).

But since it requires a bit of design work, I've opened #6672, and we can leave a TODO here.

This comment has been minimized.

@stuhood

stuhood Oct 24, 2018

Member

@blorente : After reviewing #6661 I realized that it would be feasible to require the Digest now, because the initial usecase for file downloading is downloading a universal/cross-platform pex. That sidesteps the whole "needing to configure digests for a bunch of different binaries" issue.

It might be a good idea to have and use the Digest in this first PR, because I think that it would clean up where/how the file is stored and cached. If we just totally skip storing/caching in the first PR, then maybe it would be better to add the required digest as a small followup PR... smaller than #6672 even, because #6661 doesn't need to be cross-platform.

Having typed this out, tiny followup PR seems very reasonable, but interested in your thoughts.

This comment has been minimized.

@blorente

blorente Oct 24, 2018

Contributor

I have no strong feelings either way, but I feel like it would be better to have it in this PR, thinking about when people look back, I think it makes sense to put it in the same commit if we have decided that this is the behavior we want and the change is so tiny. If I were to look at the history, I think I would expect it to be in this PR.

);
let file_name = url
.path_segments()
.expect("TODO")

This comment has been minimized.

@stuhood

stuhood Oct 22, 2018

Member

Should fix before landing. Multiple expects like this seem like a good situation for combinators like map/and_then:

let file_name = url.path_segments().and_then(|ps| ps.last()).map(|f| f.to_owned()).ok_or_else(..);
.map_err(|err| throw(&format!("Error making tempdir to download file: {}", err)))
);
// TODO: Share a client which is pre-configured.

This comment has been minimized.

@stuhood

stuhood Oct 22, 2018

Member

+1

Although probably not a blocker for a first version.

// TODO: Share a client which is pre-configured.
// TODO: Async
let mut response = try_future!(
reqwest::Client::new()

This comment has been minimized.

@stuhood

stuhood Oct 22, 2018

Member

Probably fine for a followup ticket that moves the client.

@stuhood stuhood added the P3 - M3 label Oct 22, 2018

@stuhood stuhood moved this from To Do to In Progress in Python Pipeline Porting Oct 24, 2018

@stuhood stuhood force-pushed the twitter:dwagnerhall/v2/urlfetch branch 2 times, most recently from 3192686 to d005bc1 Oct 25, 2018

@stuhood

This comment has been minimized.

Member

stuhood commented Oct 26, 2018

@blorente : In local testing, I think I confirmed that this does not fail on 404, meaning that currently #6661 renders a 404 when run with -ldebug, but then proceeds to try and run the test anyway.

The reqwest docs show checking the status field: https://docs.rs/reqwest/0.9.3/reqwest/ ... I haven't looked more deeply.

@blorente blorente referenced this pull request Oct 26, 2018

Closed

Follow work of #6660 #6689

@stuhood stuhood assigned stuhood and unassigned blorente Nov 1, 2018

@blorente

This comment has been minimized.

Contributor

blorente commented Nov 1, 2018

I have been able to repro the travis failure in a linux machine running linux mint (Ubuntu). After some fiddling, it seems that the call to chttp is triggering a segfault. To illustrate, if I modify DownloadedFile::run to have this code:

    println!("BL: Calling chttp");

    let mut resp = chttp::get(&url).expect("Response didn't arrive correctly");

    println!("BL: Response arrived {:?}", resp);

The first println gets triggered, but not the second:

  FAILURE: Test was killed by signal 11.collected 33 items / 32 deselected
  tests/python/pants_test/engine/test_fs.py BL: Calling chttp
  tests/python/pants_test/engine:fs                                               .....   NOT RUN

This doesn't happen in a MacOS machine.

@stuhood

This comment has been minimized.

Member

stuhood commented Nov 1, 2018

I've opened sagebind/chttp#15 for the chttp failure. Will look at the linking issue a bit more today.

@stuhood

This comment has been minimized.

Member

stuhood commented Nov 2, 2018

With regard to what we've tried:

  • chttp (local success, linux segfault: sagebind/chttp#15; but works in a standalone crate)
    • ldd reports a dependency on openssl
    • !! have minimal repro indicating that this is related to interplay with grpcio
  • reqwests (ldd does not report a dependency on openssl, fails with missing symbol at python import time):
diff --git a/src/rust/engine/src/cffi_build.rs b/src/rust/engine/src/cffi_build.rs
index 2653b34..03250e4 100644
--- a/src/rust/engine/src/cffi_build.rs
+++ b/src/rust/engine/src/cffi_build.rs
@@ -53,4 +53,6 @@ fn main() {
   if cfg!(target_os = "linux") {
     println!("cargo:rustc-link-lib=static=stdc++");
+    println!("cargo:rustc-link-lib=crypto");
+    println!("cargo:rustc-link-lib=ssl");
   }

@stuhood stuhood assigned illicitonion and unassigned stuhood Nov 2, 2018

@illicitonion illicitonion force-pushed the twitter:dwagnerhall/v2/urlfetch branch 4 times, most recently from e5d0123 to 3b31b79 Nov 5, 2018

@illicitonion illicitonion changed the title from WIP: Add intrinsic to download a file to Add intrinsic to download a file Nov 7, 2018

@illicitonion

This comment has been minimized.

Contributor

illicitonion commented Nov 7, 2018

@stuhood This is now reviewable, separate commits are very separate and self-contained (probably to the extent that they should be multiple PRs - we should probably not squash this when merging)

@stuhood

stuhood approved these changes Nov 7, 2018

Thanks!

Show resolved Hide resolved src/python/pants/util/contextutil.py
Show resolved Hide resolved src/python/pants/util/contextutil.py
@stuhood

This comment has been minimized.

Member

stuhood commented Nov 8, 2018

Applied my own review feedback to try and save you some CI time. Let me know if you'd rather I didn't in the future.

stuhood added a commit to twitter/pants that referenced this pull request Nov 8, 2018

illicitonion added some commits Nov 7, 2018

Disable HTTPS support in grpc
We're finding it hard to link both grpcio (which statically links
openssl) and reqwest (which dynamically links openssl).

We plan to switch to Tower soon, so just disabling HTTPS support at the
grpcio layer for now, which I don't think anyone is relying on.

@illicitonion illicitonion force-pushed the twitter:dwagnerhall/v2/urlfetch branch from 6cbd073 to 51592eb Nov 8, 2018

@illicitonion illicitonion merged commit 31ccf73 into pantsbuild:master Nov 8, 2018

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

Python Pipeline Porting automation moved this from In Progress to Done Nov 8, 2018

@illicitonion illicitonion deleted the twitter:dwagnerhall/v2/urlfetch branch Nov 8, 2018

wisechengyi added a commit to wisechengyi/pants that referenced this pull request Nov 12, 2018

Add intrinsic to download a file (pantsbuild#6660)
* Allow temporary http server to be started

* Disable HTTPS support in grpc

We're finding it hard to link both grpcio (which statically links
openssl) and reqwest (which dynamically links openssl).

We plan to switch to Tower soon, so just disabling HTTPS support at the
grpcio layer for now, which I don't think anyone is relying on.

* Add intrinsic to download a file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment