Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] stderr is duplicated in mlflow::mlflow_run() in R #2673

Closed
lorenzwalthert opened this issue Apr 5, 2020 · 5 comments
Closed

[BUG] stderr is duplicated in mlflow::mlflow_run() in R #2673

lorenzwalthert opened this issue Apr 5, 2020 · 5 comments
Labels
area/projects MLproject format, project running backends bug Something isn't working help wanted We would like help from the community to add this support language/r R APIs and clients priority/important-longterm Important over the long term, but may not be staffed or may need multiple releases to complete.

Comments

@lorenzwalthert
Copy link
Contributor

lorenzwalthert commented Apr 5, 2020

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOs 10.15.3
  • MLflow installed from (source or binary): binary
  • MLflow version (run mlflow --version): 1.7.0
  • Python version: 3.6
  • npm version, if running the dev UI:
  • Exact command to reproduce:
mlflow::mlflow_run('.', entry_point = 'R/script.R', parameters = list(
  large_dbl = 2,
  other = 3
))

Describe the problem

Running the above code returns

2020/04/05 15:10:05 INFO mlflow.projects: === Created directory /var/folders/6g/bg5mcmsj7s51r33snbxhm1400000gn/T/tmpq2y52qgo for downloading remote URIs passed to arguments of type 'path' ===
2020/04/05 15:10:05 INFO mlflow.projects: === Running command 'source /Users/lorenz/opt/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-da39a3ee5e6b4b0d3255bfef95601890afd80709 1>&2 && Rscript -e "mlflow::mlflow_source('R/script.R')" --args --large_dbl 2 --other 3' in run with ID '6e9154ec335c42c2930e46f8233bd18a' === 
2020/04/05 15:10:05 INFO mlflow.projects: === Created directory /var/folders/6g/bg5mcmsj7s51r33snbxhm1400000gn/T/tmpq2y52qgo for downloading remote URIs passed to arguments of type 'path' ===
2020/04/05 15:10:05 INFO mlflow.projects: === Running command 'source /Users/lorenz/opt/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-da39a3ee5e6b4b0d3255bfef95601890afd80709 1>&2 && Rscript -e "mlflow::mlflow_source('R/script.R')" --args --large_dbl 2 --other 3' in run with ID '6e9154ec335c42c2930e46f8233bd18a' === 
2020/04/05 15:10:06 INFO mlflow.projects: === Run (ID '6e9154ec335c42c2930e46f8233bd18a') succeeded ===
2020/04/05 15:10:06 INFO mlflow.projects: === Run (ID '6e9154ec335c42c2930e46f8233bd18a') succeeded ===

The same problem can be observed in CI/CD output, e.g here. Investigating, we see that in mlflow:::mlflow_cli(), the CLI is invoked with the run command from {processx} (substituting relevant arguments from the debugger):

run(mlflow_bin, args = unlist(args), echo = TRUE, 
        echo_cmd = FALSE, stderr_callback = function(x, p) cat(x, file = stderr()))

Running this in debugging mode, we see that apparently, stdout and stderr are for some reason interchanged. this output comes from stderr. The return value of the above command (without echo):

$status
[1] 0

$stdout
[1] ""

$stderr
[1] "2020/04/05 15:23:02 INFO mlflow.projects: === Created directory /var/folders/6g/bg5mcmsj7s51r33snbxhm1400000gn/T/tmpjb_u9bvr for downloading remote URIs passed to arguments of type 'path' ===\n2020/04/05 15:23:02 INFO mlflow.projects: === Running command 'source /Users/lorenz/opt/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-da39a3ee5e6b4b0d3255bfef95601890afd80709 1>&2 && Rscript -e \"mlflow::mlflow_source('R/script.R')\" --args --large_dbl 2 --other 3' in run with ID '5a7e4782051c40ed85dd630709afa882' === \n2020/04/05 15:23:03 INFO mlflow.projects: === Run (ID '5a7e4782051c40ed85dd630709afa882') succeeded ===\n"

$timeout
[1] FALSE

Because echo = TRUE and the stderr_callback is printing to the console, we get all output twice. I believe this is a problem with the {processx} package in the first place, so I filed an issue there too (r-lib/processx#250). For the time being, we could set the stderr callback to function(x, p) NULL when calling mlflow:::mlflow_cli() from mlflow::mlflow_runs(). Maybe you can clarify why the output does not go to stdout in the first place because to me, it would make more sense to send the output to stdout.

Code to reproduce issue

Run the above command in this minimally reproducible git repo, file R/runner.R.

@lorenzwalthert
Copy link
Contributor Author

lorenzwalthert commented Apr 5, 2020

Reference: r-lib/processx#250

@lorenzwalthert lorenzwalthert changed the title [BUG] stdout is duplicated in mlflow::mlflow_run() in R [BUG] stderr is duplicated in mlflow::mlflow_run() in R Apr 5, 2020
@pogil pogil added help wanted We would like help from the community to add this support language/r R APIs and clients labels Apr 17, 2020
@pogil
Copy link
Contributor

pogil commented Apr 17, 2020

@javierluraschi @kevinykuo is this something you could help with?

@lorenzwalthert
Copy link
Contributor Author

lorenzwalthert commented Apr 18, 2020

Note that the mlflow_log_artifact() API is also affected. It prints the logging 3 times.
I updated the above example to include a call to mlflow_log_artifact() and that's how it looks in a typical 80 character with terminal.

2020/04/18 08:45:04 INFO mlflow.projects: === Created directory /var/folders/6g/bg5mcmsj7s51r33snbxhm1400000gn/T/tmpv_4jh3fs for 
downloading remote URIs passed to arguments of type 'path' ===
2020/04/18 08:45:04 INFO mlflow.projects: === Created directory /var/folders/6g/bg5mcmsj7s51r33snbxhm1400000gn/T/tmpv_4jh3fs for 
downloading remote URIs passed to arguments of type 'path' ===
2020/04/18 08:45:04 INFO mlflow.projects: === Running command 'source
/Users/lorenz/opt/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-da39a3ee5e6b4b0d3255bfef95601890afd80709 
1>&2 && Rscript -e "mlflow::mlflow_source('R/script.R')" --args --large_dbl 4
 --other 5 --lr 2' in run with ID '4fbdbce6dd8a4251a599465ce685df5b' === 
2020/04/18 08:45:04 INFO mlflow.projects: === Running command 'source
/Users/lorenz/opt/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow
da39a3ee5e6b4b0d3255bfef95601890afd80709 1>&2 && Rscript -e
"mlflow::mlflow_source('R/script.R')" --args --large_dbl 4 --other 5 --lr 2' in run with ID
'4fbdbce6dd8a4251a599465ce685df5b' === 
2020/04/18 08:45:11 INFO mlflow.store.artifact.cli: Logged artifact from local file
/var/folders/6g/bg5mcmsj7s51r33snbxhm1400000gn/T//RtmpKNTLVG/file362556bc031b to
artifact_path=random
2020/04/18 08:45:11 INFO mlflow.store.artifact.cli: Logged artifact from local file
/var/folders/6g/bg5mcmsj7s51r33snbxhm1400000gn/T//RtmpKNTLVG/file362556bc031b to
artifact_path=random
2020/04/18 08:45:11 INFO mlflow.store.artifact.cli: Logged artifact from local file
/var/folders/6g/bg5mcmsj7s51r33snbxhm1400000gn/T//RtmpKNTLVG/file362556bc031b to
artifact_path=random
Root URI: /Users/lorenz/git/mlflow-r-casting bug/mlruns/1/4fbdbce6dd8a4251a599465ce685df5b/artifacts
Root URI: /Users/lorenz/git/mlflow-r-casting bug/mlruns/1/4fbdbce6dd8a4251a599465ce685df5b/artifacts
2020/04/18 08:45:11 INFO mlflow.projects: === Run (ID '4fbdbce6dd8a4251a599465ce685df5b') succeeded ===
2020/04/18 08:45:11 INFO mlflow.projects: === Run (ID '4fbdbce6dd8a4251a599465ce685df5b') succeeded ===

I think this is an relevant user-facing problem because it convolutes the terminal substantially.

@dmatrix dmatrix added the area/projects MLproject format, project running backends label Jul 17, 2020
@juntai-zheng juntai-zheng added the priority/important-longterm Important over the long term, but may not be staffed or may need multiple releases to complete. label Jul 20, 2020
@yitao-li
Copy link
Contributor

yitao-li commented Jul 21, 2020

@lorenzwalthert @javierluraschi I think the fix for this might be as simple as #3145

Let me know what you think.

@yitao-li
Copy link
Contributor

yitao-li commented Oct 8, 2020

FYI the fix has been merged and this can be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/projects MLproject format, project running backends bug Something isn't working help wanted We would like help from the community to add this support language/r R APIs and clients priority/important-longterm Important over the long term, but may not be staffed or may need multiple releases to complete.
Projects
None yet
Development

No branches or pull requests

5 participants