Tensorflow crashes R session with fatal error #6049

Open
jeffkeller87 opened this issue Jan 17, 2020 · 7 comments

@jeffkeller87 jeffkeller87 commented Jan 17, 2020

System details

RStudio Edition : Server
RStudio Version : 1.2.5033
OS Version      : Ubuntu 18.04
R Version       : 3.6.2

Steps to reproduce the problem

Install tensorflow in a fresh environment

install.packages("tensorflow")
tensorflow::install_tensorflow(method = "virtualenv", version = "2.0.0")

Run tensorflow commands from RStudio

library(tensorflow)
x <- matrix(2, ncol = 1, nrow = 1)
m <- tf$matmul(x, x)
m

RStudio returns message:

R Session Error
The previous R session was abnormally terminated due to an unexpected crash.
You may have lost workspace data as a result of this crash.

Describe the problem in detail

The crash occurs whenever I attempt to use one of the tensorflow::tf$* functions.

While the error suggests that this is an R crash, I think it is an RStudio issue because I can put the same commands into a script tensorflow.R and run Rscript tensorflow.R from the terminal (even RStudio's built-in terminal) just fine:

ubuntu@run-5e21ef797043a50006f98198-sqlj7:/mnt$ Rscript tensorflow.R
2020-01-17 14:19:46.567377: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-17 14:19:46.571845: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2020-01-17 14:19:46.572210: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560c23c662a0 executing computations on platform Host. Devices:
2020-01-17 14:19:46.572240: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
tf.Tensor([[4.]], shape=(1, 1), dtype=float64)

I can even run the commands interactively in a plain R session (with RStudio's built-in terminal, even):

ubuntu@run-5e21ef797043a50006f98198-sqlj7:/mnt$ R

R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(tensorflow)
> x <- matrix(2, ncol = 1, nrow = 1)
> m <- tf$matmul(x, x)
2020-01-17 14:22:40.037945: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-17 14:22:40.042353: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2020-01-17 14:22:40.042770: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b39cc31050 executing computations on platform Host. Devices:
2020-01-17 14:22:40.042800: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
> m
tf.Tensor([[4.]], shape=(1, 1), dtype=float64)
>

Similar to #5406, except reinstalling tensorflow, R, or RStudio does not solve the issue.

I am running RStudio Server in Docker, so none of the instructions for generating a diagnostics report seemed applicable. Happy to do this if someone can tell me how.

Describe the behavior you expected

For the tensorflow code to run successfully.

@ronblum ronblum commented Jan 17, 2020

@jeffkeller87 Thank you for filing the issue! I'm unable to reproduce it using the same environment, though. I'll mark this for triage as we continue work on RStudio and for suggestions on getting diagnostics from an RStudio Server in Docker.

@jmcphers jmcphers (Member) commented Jan 20, 2020

This issue is very likely to be a crash in Tensorflow or a library that it uses, not RStudio, but we can't be sure without a call stack. Could you get one? Here's how:

  1. Download the latest daily build of RStudio: https://dailies.rstudio.com/
  2. Start RStudio
  3. Attach gdb to the rsession process
  4. Reproduce the crash
  5. Get a backtrace in gdb (bt) and post it here
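
Steps 3 to 5 can be sketched as a single non-interactive gdb invocation. This is a minimal sketch, not an official RStudio procedure: it assumes gdb is installed on the server, that the current user is allowed to ptrace the rsession process (root may be needed), and the function name `rsession_gdb_cmd` is made up for illustration. The function only prints the command so it can be inspected before it is run.

```shell
#!/bin/sh
# Dry-run sketch: print a gdb command that attaches to an rsession process,
# lets it run until it stops (for example on SIGABRT), then prints a backtrace.
# rsession_gdb_cmd is a hypothetical helper name, not part of RStudio or gdb.
rsession_gdb_cmd() {
    pid="$1"
    # -p attaches to the given process ID.
    # -batch exits after the listed commands and disables pagination,
    # so the backtrace is not interrupted by "Type <return>" prompts.
    printf 'gdb -p %s -batch -ex "continue" -ex "bt"\n' "$pid"
}
```

One possible use is `eval "$(rsession_gdb_cmd "$(pgrep -n rsession)")"`, which attaches to the most recently started rsession process and blocks until the session stops; the crash is then reproduced from the IDE and the backtrace appears in the terminal.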

@kevinykuo kevinykuo commented Jan 24, 2020

Reproduced on RStudio Server Pro (RSP) 1.2.5033-1, Ubuntu 18.04, R 3.6.2, TensorFlow 2.

Running any TensorFlow-related code, e.g. reticulate::import("tensorflow"), crashes the R session.

The same code runs fine in R from a terminal.

Stack trace below:

(gdb)
Thread 1 "rsession" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fcfc24c6801 in __GI_abort () at abort.c:79
#2  0x00007fcfc250f897 in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7fcfc263cb9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007fcfc251690a in malloc_printerr (
    str=str@entry=0x7fcfc263e7a8 "munmap_chunk(): invalid pointer") at malloc.c:5350
#4  0x00007fcfc251decc in munmap_chunk (p=0x7ffef94b9830) at malloc.c:2846
#5  __GI___libc_free (mem=0x7ffef94b9840) at malloc.c:3117
#6  0x00005580b970a089 in std::_Hashtable<std::string, std::string, std::allocator<std::string>, std::__detail::_Identity, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::string, true>*) ()
#7  0x00007fcf46177747 in std::_Function_handler<tensorflow::Status (tensorflow::Status const&, tensorflow::OpDef const&), tensorflow::LoadLibrary(char const*, void**, void const**, unsigned long*)::{lambda(tensorflow::Status const&, tensorflow::OpDef const&)#1}>::_M_invoke(std::_Any_data const&, tensorflow::Status const&, tensorflow::OpDef const&) ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#8  0x00007fcf461a1d10 in tensorflow::OpRegistry::RegisterAlreadyLocked(std::function<tensorflow::Status (tensorflow::OpRegistrationData*)> const&) const ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#9  0x00007fcf461a2144 in tensorflow::OpRegistry::CallDeferred() const ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#10 0x00007fcf461a226a in tensorflow::OpRegistry::ProcessRegistrations() const ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#11 0x00007fcf46177ffb in tensorflow::LoadLibrary(char const*, void**, void const**, unsigned long*) ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#12 0x00007fcf4a127ea7 in TF_LoadLibrary ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#13 0x00007fcf496e6428 in _wrap_TF_LoadLibrary ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#14 0x00007fcfa56bd807 in _PyCFunction_FastCallDict ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#15 0x00007fcfa5702789 in call_function.lto_priv ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#16 0x00007fcfa5658153 in _PyEval_EvalFrameDefault ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#17 0x00007fcfa5700ab0 in _PyFunction_FastCall ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#18 0x00007fcfa57028af in call_function.lto_priv ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#19 0x00007fcfa5658153 in _PyEval_EvalFrameDefault ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#20 0x00007fcfa5702063 in _PyEval_EvalCodeWithName ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#21 0x00007fcfa5702a6e in PyEval_EvalCodeEx ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#22 0x00007fcfa5654b2c in PyEval_EvalCode ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#23 0x00007fcfa565ffce in builtin_exec ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#24 0x00007fcfa56bda65 in PyCFunction_Call ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#25 0x00007fcfa565b2b1 in _PyEval_EvalFrameDefault ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#26 0x00007fcfa5702063 in _PyEval_EvalCodeWithName ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#27 0x00007fcfa5702616 in call_function.lto_priv ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#28 0x00007fcfa5658153 in _PyEval_EvalFrameDefault ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#29 0x00007fcfa5700ab0 in _PyFunction_FastCall ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#30 0x00007fcfa57028af in call_function.lto_priv ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#31 0x00007fcfa5658153 in _PyEval_EvalFrameDefault ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#32 0x00007fcfa5700ab0 in _PyFunction_FastCall ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#33 0x00007fcfa57028af in call_function.lto_priv ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#34 0x00007fcfa5658153 in _PyEval_EvalFrameDefault ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#35 0x00007fcfa5700ab0 in _PyFunction_FastCall ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#36 0x00007fcfa57028af in call_function.lto_priv ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#37 0x00007fcfa5658153 in _PyEval_EvalFrameDefault ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#38 0x00007fcfa5700ab0 in _PyFunction_FastCall ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#39 0x00007fcfa5702400 in _PyFunction_FastCallDict ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#40 0x00007fcfa568632f in _PyObject_FastCallDict ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#41 0x00007fcfa5686abc in _PyObject_CallMethodIdObjArgs ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#42 0x00007fcfa563c57d in PyImport_ImportModuleLevelObject ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
[[TRUNCATED]]
#518 0x00007fcfa563cd48 in PyImport_Import ()
   from target:/home/rstudio_user/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
#519 0x00007fcfa5a6cb8a in py_import (module=...) at python.cpp:286
#520 0x00007fcfa5a75aa7 in py_module_import (module=..., convert=<optimized out>) at python.cpp:2154
#521 0x00007fcfa5a60808 in _reticulate_py_module_import (moduleSEXP=<optimized out>,
    convertSEXP=0x5580bd3acce0) at RcppExports.cpp:438
#522 0x00007fcfc43e6474 in bcEval (body=body@entry=0x5580bf57e580, rho=rho@entry=0x5580bf96dcd0,
    useCache=useCache@entry=TRUE) at eval.c:7283
#523 0x00007fcfc43f01b0 in Rf_eval (e=0x5580bf57e580, rho=rho@entry=0x5580bf96dcd0) at eval.c:620
#524 0x00007fcfc43f200f in R_execClosure (call=call@entry=0x5580be265250,
    newrho=newrho@entry=0x5580bf96dcd0, sysparent=<optimized out>, rho=rho@entry=0x5580bf27dca0,
    arglist=arglist@entry=0x5580bf96dde8, op=op@entry=0x5580bf57e938) at eval.c:1780
#525 0x00007fcfc43f2d53 in Rf_applyClosure (call=call@entry=0x5580be265250, op=op@entry=0x5580bf57e938,
    arglist=<optimized out>, rho=rho@entry=0x5580bf27dca0, suppliedvars=<optimized out>) at eval.c:1706
#526 0x00007fcfc43e6d52 in bcEval (body=body@entry=0x5580be269ae8, rho=rho@entry=0x5580bf27dca0,
    useCache=useCache@entry=TRUE) at eval.c:6733
#527 0x00007fcfc43f01b0 in Rf_eval (e=0x5580be269ae8, rho=rho@entry=0x5580bf27dca0) at eval.c:620
#528 0x00007fcfc43f200f in R_execClosure (call=call@entry=0x5580bf27b488,
    newrho=newrho@entry=0x5580bf27dca0, sysparent=<optimized out>, rho=rho@entry=0x5580bb4f3ff0,
    arglist=arglist@entry=0x5580bf27ddb8, op=op@entry=0x5580be26cdf0) at eval.c:1780
#529 0x00007fcfc43f2d53 in Rf_applyClosure (call=call@entry=0x5580bf27b488, op=op@entry=0x5580be26cdf0,
    arglist=<optimized out>, rho=rho@entry=0x5580bb4f3ff0, suppliedvars=<optimized out>) at eval.c:1706
#530 0x00007fcfc43f037a in Rf_eval (e=e@entry=0x5580bf27b488, rho=rho@entry=0x5580bb4f3ff0) at eval.c:743
#531 0x00007fcfc4421132 in Rf_ReplIteration (rho=0x5580bb4f3ff0, savestack=<optimized out>,
    browselevel=0, state=0x7ffef94d0580) at main.c:260
#532 0x00007fcfc44214f1 in R_ReplConsole (rho=0x5580bb4f3ff0, savestack=0, browselevel=0) at main.c:310
#533 0x00007fcfc44215a8 in run_Rmainloop () at main.c:1086
#534 0x00005580b97290aa in rstudio::r::session::runEmbeddedR(rstudio::core::FilePath const&, rstudio::core::FilePath const&, bool, bool, SA_TYPE, rstudio::r::session::Callbacks const&, rstudio::r::session::InternalCallbacks*) ()
#535 0x00005580b970e34c in rstudio::r::session::run(rstudio::r::session::ROptions const&, rstudio::r::session::RCallbacks const&) ()
#536 0x00005580b9067e25 in main ()

@jeffkeller87 jeffkeller87 (Author) commented Jan 24, 2020

Thanks @kevinykuo for the stack trace. I'm running RStudio Server in Docker and was having a hard time getting a terminal attached so that I could run gdb.

@wlandau wlandau mentioned this issue Feb 12, 2020

@jeffkeller87 jeffkeller87 (Author) commented Feb 12, 2020

Apologies for the delay; it took me a while to figure out how to use gdb in Docker.
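
For readers in the same position: one common way to make gdb usable inside a container (not necessarily what was done here) is to start the container with ptrace permitted. The sketch below is a dry run that only prints the `docker run` command. The function name `debug_container_cmd` and any image or container names passed to it are placeholders, and port 8787 is assumed to be the RStudio Server port.

```shell
#!/bin/sh
# Dry-run sketch: print a `docker run` command that permits ptrace-based debugging.
# debug_container_cmd is a hypothetical helper; image and container names are
# placeholders supplied by the caller, not values from this thread.
debug_container_cmd() {
    image="$1"
    name="$2"
    # SYS_PTRACE allows gdb to attach to processes in the container; relaxing
    # seccomp is a common extra step when the default profile blocks ptrace.
    printf 'docker run -d --name %s --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -p 8787:8787 %s\n' "$name" "$image"
}
```

After starting the container with the printed command, `docker exec -it <name> bash` gives a shell in which gdb can be installed and attached to the rsession process.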

I tested this against today's daily build of RStudio Server (1.3.825) and the crash does not occur.

Unless @jmcphers would like more information, I think this issue can be closed.

@wlandau wlandau commented Feb 14, 2020

FYI: along the lines of what @r2evans suggested here, external processes launched via callr from the IDE's regular R console appear unaffected.

# https://rstudio.cloud/project/931550
callr::r(function() tensorflow::tf_config())
#> TensorFlow v2.0.0 ()
#> Python v3.6 (~/.local/share/r-miniconda/envs/r-reticulate/bin/python)
tensorflow::tf_config()
#> Crashes R session, sometimes with an explicit message:
#> *** Error in `/usr/lib/rstudio-server/bin/rsession': munmap_chunk(): invalid pointer: 0x00007ffd59ea4240 ***

Oddly enough, though, an equivalent external process with a multisession future hangs for a couple of minutes and then crashes RStudio:

library(future)
plan(multisession)
# Works fine:
f <- future("hello world")
value(f)
# Hangs for a couple minutes and then crashes RStudio:
f <- future(tensorflow::tf_config())
value(f)

@wlandau wlandau commented Feb 14, 2020

For onlookers, the solution that worked for me was to downgrade TensorFlow to version 1.13.1 (no need for v2 in my use case). I thought I had already tried that, but then I realized that keras::install_keras() automatically upgrades TensorFlow if I do not supply a version string to the tensorflow argument. Setup script for a local Python environment with the latest Keras and TensorFlow 1.13.1:

reticulate::install_miniconda("miniconda")
Sys.setenv(WORKON_HOME = "virtualenvs")
reticulate::virtualenv_create("r-reticulate", python = "miniconda/bin/python")
keras::install_keras(
  method = "virtualenv",
  conda = "miniconda/bin/conda",
  envname = "r-reticulate",
  tensorflow = "1.13.1",
  restart_session = FALSE
)
# Now set WORKON_HOME to the path to virtualenvs in .Renviron