Gen on master fails to precompile when inside Docker on a Mac #311

Open
bzinberg opened this issue Sep 18, 2020 · 11 comments

Comments

@bzinberg
Contributor

When trying to run GenSceneGraphs from Docker, @jcrosenb encountered the error

┌ Info: Precompiling GenSceneGraphs [dcc2c3cc-8ed1-11e9-1eb3-1dd41a11fee5]
└ @ Base loading.jl:1278
ERROR: LoadError: Failed to precompile Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a] to /root/.julia/compiled/v1.5/Gen/OEZG1_Y796z.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
 [3] _require(::Base.PkgId) at ./loading.jl:1030
 [4] require(::Base.PkgId) at ./loading.jl:928
 [5] require(::Module, ::Symbol) at ./loading.jl:923
 [6] include(::Function, ::Module, ::String) at ./Base.jl:380
 [7] include(::Module, ::String) at ./Base.jl:368
 [8] top-level scope at none:2
 [9] eval at ./boot.jl:331 [inlined]
 [10] eval(::Expr) at ./client.jl:467
 [11] top-level scope at ./none:3
in expression starting at /julia_projects/GenSceneGraphs/src/GenSceneGraphs.jl:12
Failed to precompile GenSceneGraphs [dcc2c3cc-8ed1-11e9-1eb3-1dd41a11fee5] to /root/.julia/compiled/v1.5/GenSceneGraphs/N6XWy_Y796z.ji.

This persisted even after deleting the directory ~/.julia/compiled.

Possible next step A:
Build a Docker image containing Gen at a specific recent commit (e.g. current head, 07a5427) and see whether the same precompile error occurs on @jcrosenb's machine. If it does, we should get some specs of her machine and then see if it reproduces on a different Mac.

Possible next step B:
Test out https://github.com/probcomp/GenSceneGraphs.jl/pull/205 on a different Mac. If we can find someone who has a Mac and is able to do this, that would be the easiest path to finding out whether this is really a Mac-wide issue or is specific to some aspect of @jcrosenb's setup. (For the time being, she's attempting to get the Docker image running from within a Linux VM).

@bzinberg
Contributor Author

@fplk, @agarret7, @nishadgothoskar, do any of you have access to a Mac, and would you be able to help out with step B here?

@fplk

fplk commented Sep 18, 2020

I think I can reproduce this with a minimal example:

FROM ubuntu:20.04
LABEL maintainer="MIT Probabilistic Computing Project"
# Find current Julia version on https://julialang.org/downloads/
ARG JULIA_VERSION_SHORT="1.5"
ARG JULIA_VERSION_FULL="${JULIA_VERSION_SHORT}.1"
ENV JULIA_INSTALLATION_PATH=/opt/julia
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -qq \
    && apt-get install -qq -y --no-install-recommends\
        build-essential \
        ca-certificates \
        curl \
        ffmpeg \
        git \
        graphviz \
        hdf5-tools \
        python3-dev \
        python3-pip \
        python3-tk \
        rsync \
        software-properties-common \
        wget \
        zlib1g-dev \
    && rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/bin/python3 /usr/bin/python
RUN wget https://julialang-s3.julialang.org/bin/linux/x64/${JULIA_VERSION_SHORT}/julia-${JULIA_VERSION_FULL}-linux-x86_64.tar.gz && \
    tar zxf julia-${JULIA_VERSION_FULL}-linux-x86_64.tar.gz && \
    mkdir -p "${JULIA_INSTALLATION_PATH}" && \
    mv julia-${JULIA_VERSION_FULL} "${JULIA_INSTALLATION_PATH}/" && \
    ln -fs "${JULIA_INSTALLATION_PATH}/julia-${JULIA_VERSION_FULL}/bin/julia" /usr/local/bin/ && \
    rm julia-${JULIA_VERSION_FULL}-linux-x86_64.tar.gz && \
    julia -e 'import Pkg; Pkg.add("IJulia")'
RUN julia -e 'import Pkg; Pkg.add(["Gen"])'

Building and running this (docker build -t gen . and docker run -it gen bash) and then adding and importing Gen works on Ubuntu, but under macOS I hit

(@v1.4) pkg> add Gen.jl
julia> using Gen
[ Info: Precompiling Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a]
ERROR: Failed to precompile Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a] to /root/.julia/compiled/v1.4/Gen/OEZG1_t5nDi.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922

What really disturbs me is that I was certain it would work on Ubuntu 20.04 inside a type-2 hypervisor, so I installed Ubuntu in a VirtualBox VM on a MacBook Pro, built the image there, and reran this. Incredibly, it fails with the same error. This is OS-level virtualization inside type-2 virtualization; it should be so encapsulated that I'm very surprised it can break. Is this kernel related?! Or did I overlook something? Other packages like LinearAlgebra or Plots seem to work, btw. [For the latter you first have to run python3 -m pip install matplotlib, of course.]

Image above working on Ubuntu 20.04 workstation:
[screenshot: GenUbuntu2004]

Image above breaking on Ubuntu 20.04 VM on macOS host:
[screenshot: GenMacOS]

PS:
I noticed the image in this repo is old and will not build anymore. I have opened a PR with updated base images (both CPU and GPU) that should also be suitable as a base for related projects' images (I still need to test this further); see PR 312. [But I fear you will have to investigate yourself why the error above occurs. I'm not deep enough in the Gen codebase to debug this, unfortunately.]

PPS:
I would have uploaded an rr trace, but /proc/sys/kernel/perf_event_paranoid is 3 and mounted as read-only inside the container. Furthermore, I'm not sure how to use the rr record -n alternative the command suggests - if I naively use it instead of rr I get ERROR: Unknown report type: rr record -n. So I fear I have gone as far as I can with tracking this bug as an outsider.
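
The perf_event_paranoid point above can be checked with a few lines of shell. This is a minimal sketch, assuming only what rr itself reports, namely that it wants /proc/sys/kernel/perf_event_paranoid to be 1 or lower; the helper name rr_perf_ok is hypothetical.

```shell
# rr_perf_ok [PARANOID_FILE]
# Hypothetical helper: succeeds (exit 0) if the perf_event_paranoid level
# is 1 or lower, which is the threshold rr asks for.
# Returns 2 if the file cannot be read.
rr_perf_ok() {
    f="${1:-/proc/sys/kernel/perf_event_paranoid}"
    level="$(cat "$f" 2>/dev/null)" || return 2
    [ "$level" -le 1 ]
}

# Example use:
#   if rr_perf_ok; then echo "rr should be able to record"; fi
```

In the container described above the check would fail, since the value is 3. The setting belongs to the kernel the container shares, so it would most likely have to be lowered on that machine (or VM) rather than from inside the container.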

PPPS:
@jcrosenb Gen works natively on macOS, though. While you hit an interesting bug, does that help you in the meantime?

julia> using Gen
[ Info: Precompiling Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a]
WARNING: Method definition special_reverse_exec!(ReverseDiff.SpecialInstruction{typeof(Base.fill), I, O, C} where C where O where I) in module ReverseDiff at /Users/fpk/.julia/packages/ReverseDiff/jFRo1/src/macros.jl:213 overwritten in module Gen at /Users/fpk/.julia/packages/Gen/5JiNL/src/backprop.jl:35.
  ** incremental compilation may be fatally broken for this module **
WARNING: Method definition special_forward_exec!(ReverseDiff.SpecialInstruction{typeof(Base.fill), I, O, C} where C where O where I) in module ReverseDiff at /Users/fpk/.julia/packages/ReverseDiff/jFRo1/src/macros.jl:229 overwritten in module Gen at /Users/fpk/.julia/packages/Gen/5JiNL/src/backprop.jl:44.
  ** incremental compilation may be fatally broken for this module **
julia>

This is via Julia 1.3.1 on macOS 10.15.6.

@jcrosenb

Given that this bug has replicated, does it still make sense for me to install VirtualBox and try the Linux Docker version? From Falk's comment, it looks like it still somehow fails that way. Happy to do it if we need a second attempt.

I could try to get an IBM VM and work from there. Or continue debugging my non-Docker install.

@bzinberg
Contributor Author

@fplk - Thanks so much for doing this work. If I were not on the Gen team I would have waited to file an upstream bug until I had a reproducible minimal example, and tracked that as an issue in my own repo. But I figured this way was ok especially because the Step B approach could have been doable with very little work by someone who had a Mac. But the degree of investigation you did here is much more extensive and super helpful - thanks!

@bzinberg
Contributor Author

It is really a bummer that we are seeing an error that appears to involve the virtualization software itself...

@bzinberg
Contributor Author

bzinberg commented Sep 19, 2020

@fplk, could this be anything other than a bug in VirtualBox? (I mean sure, it could be multiple bugs, but it seems like at least one of them has to be in VirtualBox.) Are you able to try the Docker image on a different hypervisor?

[Edit: Oh shoot, IIUC you're saying this may be a bug in some virtualization facility of the Darwin kernel that all reasonable hypervisors rely on. Well, seems still worth trying...]

@cameronfreer

cameronfreer commented Nov 24, 2020

A user of the docker image for the 6.885 psets has encountered the following error upon running this command on an Intel Mac Pro:

> docker run \
    -it \
    --name gen-pset1and2 \
    --publish 8080:8080/tcp \
    --publish 8090:8090/tcp \
    --publish 8091:8091/tcp \
    --publish 8092:8092/tcp \
    probcomp/mit-6.885-spring2020-gen-student-pset1and2
Invalid instruction at 0x7f6b0b1cdba8: 0x62, 0xf1, 0x7d, 0x48, 0xef, 0xc0, 0xc3, 0x90, 0x89, 0xc2, 0x83, 0xe2, 0xe0, 0x0f, 0x8e
signal (4): Illegal instruction
in expression starting at none:0
dot_compute at /julia-1.3.1/bin/../lib/julia/libopenblas64_.so (unknown line)
Allocations: 2452 (Pool: 2443; Big: 9); GC: 0
Illegal instruction

Perhaps this involves an interaction between OpenBLAS, Julia, and their actual CPU architecture (see, e.g., JuliaLang/julia#29652). It's not yet clear whether they are encountering the same problem as in this issue (#311). If they are, it probably isn't just a bug in Julia, but also in virtualization on the host machine, perhaps in the Darwin kernel or one of its modules (or analogous things).
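
One way to probe the CPU-architecture hypothesis above is to check, inside the container, which instruction-set extensions the virtual CPU advertises. The faulting bytes begin with 0x62, which on x86-64 is the prefix used by EVEX-encoded (AVX-512) instructions, so a mismatch between the kernel OpenBLAS selects and what the virtualized CPU supports is a plausible suspect; nothing in this thread confirms it. A minimal shell sketch, where the helper names has_cpu_flag and report_avx512 are hypothetical and the default input is /proc/cpuinfo:

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeeds (exit 0) if FLAG appears as a whole word
# in the first "flags" line of the cpuinfo file (default: /proc/cpuinfo).
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Take the first "flags" line, keep the part after the colon,
    # split it into one flag per line, and look for an exact match.
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null \
        | cut -d: -f2 \
        | tr ' ' '\n' \
        | grep -qx "$flag"
}

# report_avx512 [CPUINFO_FILE]
# Prints whether the AVX-512 foundation flag (avx512f) is advertised.
report_avx512() {
    if has_cpu_flag avx512f "$@"; then
        echo "avx512f: advertised"
    else
        echo "avx512f: not advertised"
    fi
}
```

Running report_avx512 inside the container on a failing host and on a working host would show whether the two environments differ. If they do, one thing that might be worth trying is starting the container with the OpenBLAS environment variable OPENBLAS_CORETYPE set to an older core such as Haswell (for example with docker run -e OPENBLAS_CORETYPE=Haswell ...); whether that helps with this particular failure is untested here.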

@ztangent
Member

ztangent commented Dec 1, 2020

Also getting this issue when running a Docker image of Ubuntu, this time on a Windows host, sigh:

julia> using Gen
[ Info: Precompiling Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a]
ERROR: Failed to precompile Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a] to /root/.julia/compiled/v1.5/Gen/OEZG1_Bbn6e.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
 [3] _require(::Base.PkgId) at ./loading.jl:1030
 [4] require(::Base.PkgId) at ./loading.jl:928
 [5] require(::Module, ::Symbol) at ./loading.jl:923

It's very strange since all the other packages seem to precompile fine??

@ztangent
Member

ztangent commented Dec 1, 2020

I was hoping to work around this by Pkg.dev-ing a local copy of Gen, then adding the __precompile__(false) flag, but at least on my Docker build (Ubuntu within a Windows host, as mentioned before), Julia just ends up crashing...?

julia> using Gen
[ Info: Precompiling Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a]
[ Info: Skipping precompilation since __precompile__(false). Importing Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a].
Killed
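
A bare Killed with no Julia stack trace means the process received SIGKILL, and in a container a common cause is the kernel's out-of-memory killer. That this is what happened here is an assumption, not something the thread establishes. A quick way to see how much memory the container has to work with is to read MemTotal from /proc/meminfo; the helper name mem_total_mib below is hypothetical.

```shell
# mem_total_mib [MEMINFO_FILE]
# Hypothetical helper: prints total memory in MiB as seen from inside
# the container (default input: /proc/meminfo).
mem_total_mib() {
    meminfo="${1:-/proc/meminfo}"
    # MemTotal is reported in kB; convert to MiB and truncate.
    awk '/^MemTotal:/ { print int($2 / 1024); exit }' "$meminfo"
}

# Example use:
#   echo "Container sees $(mem_total_mib) MiB of RAM"
```

Docker Desktop on macOS and Windows typically runs containers inside a VM with a configurable memory allocation, so a small value here would point toward raising that allocation before retrying.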

@postylem
Contributor

postylem commented Apr 9, 2021

Was this puzzle solved? I'm noticing the same kind of error precompiling Gen in an Ubuntu Docker image on macOS, on certain machines and not on others. Specifically, I'm using a slightly modified version of the 6.885 psets, and I've had two users report this error, both on macOS, but I haven't been able to reproduce it on the Mac I have, nor does it occur on the Windows machine I have access to. I'm puzzled where to look for clues.

However, I've found some mention of macOS 10.15.6 introducing some errors with virtualization. The users that have the issue are on 10.15, as is the above error reproduced by @fplk (and I do not have the issue with macOS 11.2.3). Could this be related? Seems very unlikely.

@postylem
Contributor

postylem commented Apr 10, 2021

So to follow up with a little more detail: now I’ve gotten reports from users getting this issue on macOS 10.15.4, 10.15.6, and 10.15.7, and reports of users not getting this issue on macOS 11 and Windows 10. I have tested it personally on macOS 10.15.7, and get the issue, but don’t get it on my machine running 11.2.3, nor on a Windows 10 machine I tried.

Here’s the minimal Dockerfile, more or less replicating @fplk's test above:

FROM ubuntu:20.04

ARG DEBIAN_FRONTEND=noninteractive
ARG JULIA_VERSION_SHORT="1.5"
ARG JULIA_VERSION_FULL="${JULIA_VERSION_SHORT}.3"

ENV JULIA_INSTALLATION_PATH=/opt/julia

RUN apt-get update -qq \
    && apt-get install -qq -y --no-install-recommends\
        wget \
        git \
        python3-dev \
        python3-pip \
        python3-tk \
        zlib1g-dev \
    && rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/bin/python3 /usr/bin/python

RUN wget https://julialang-s3.julialang.org/bin/linux/x64/${JULIA_VERSION_SHORT}/julia-${JULIA_VERSION_FULL}-linux-x86_64.tar.gz && \
    tar zxf julia-${JULIA_VERSION_FULL}-linux-x86_64.tar.gz && \
    mkdir -p "${JULIA_INSTALLATION_PATH}" && \
    mv julia-${JULIA_VERSION_FULL} "${JULIA_INSTALLATION_PATH}/" && \
    ln -fs "${JULIA_INSTALLATION_PATH}/julia-${JULIA_VERSION_FULL}/bin/julia" /usr/local/bin/ && \
    rm julia-${JULIA_VERSION_FULL}-linux-x86_64.tar.gz && \
    julia -e 'import Pkg; Pkg.add(["IJulia","Gen"])'

On macOS 10.15.7, running docker build -t gentest . with that Dockerfile, and then docker run -it gentest julia, Gen does not precompile.

[screenshot: Screen Shot 2021-04-10 at 2 59 24 PM]

Other packages I tried precompile fine; only Gen fails. I also got the same issue when using FROM ubuntu:16.04, and when using earlier versions of Gen (]rm Gen then ]add Gen@0.3.5, for instance). In all cases, it fails on the macOS 10.15 host but not on the others.
