Supporting ARM architectures #62

Closed
gedankenstuecke opened this issue Jul 19, 2018 · 30 comments · Fixed by #679
Labels: enhancement, help wanted

Comments

@gedankenstuecke
Contributor

It would be 💯 to get TLJH running on Raspberry Pis and other small boards. Unfortunately miniconda doesn't support their ARM processor architectures, so some changes to the installer are needed.

As discussed with @yuvipanda a fix for this would be using virtualenv instead of conda. The user environment should be configurable to use either conda or virtualenv. Furthermore nodesource also supports ARM architectures.

@yuvipanda
Collaborator

Thank you for testing and reporting this early, @gedankenstuecke!

Our plans now are:

  1. Require Ubuntu 18.04 as minimum, since system python here is 3.6
  2. Use nodesource + venv for the hub environment unconditionally.
  3. Use conda as default user environment in everything except ARM, where venv is user environment. This is not going to be user toggleable - we'll switch based on architecture.
  4. Integration test ARM with qemu. This is gonna be tricky but absolutely necessary. We probably won't run unit tests in it.
  5. Use only pip for installing packages into the user environment / hub environment in the default install. This reduces split between ARM & x86 environments.

I think this covers it.
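The architecture-based switch in point 3 could look roughly like the sketch below. The function and constant names are hypothetical and this is not TLJH's actual installer code; it only illustrates choosing the user environment type from the machine architecture without a user-facing toggle.

```python
import platform

# Hypothetical constant: architectures on which conda is the default user environment.
CONDA_ARCHES = {"x86_64", "amd64"}


def pick_user_env_kind(machine=None):
    """Return "conda" on x86-64 and "venv" on everything else (e.g. ARM)."""
    machine = (machine or platform.machine()).lower()
    return "conda" if machine in CONDA_ARCHES else "venv"


if __name__ == "__main__":
    # Prints the environment kind that would be chosen on the current host.
    print(pick_user_env_kind())
```

Passing `machine` explicitly keeps the decision testable on any host, which matters when the ARM case can only be exercised under emulation.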

@gedankenstuecke
Contributor Author

I looked into how to get docker to use the qemu integration and it seems like it's not too hard to pull off as there's already some images for that.

To get qemu set up you can run

docker run --rm --privileged multiarch/qemu-user-static:register --reset

and that's the only real trick to it. From there on you can use a base image for raspbian, e.g. resin/rpi-raspbian. For a test I ran a simple Dockerfile on my end:

# Pull base image
ARG distro=stretch
FROM resin/rpi-raspbian:$distro

RUN apt-get update && apt-get install -y python3 python3-pip

CMD ["python3", "--version"]

After

docker build -t test/armtest .
docker run test/armtest

this yields Python 3.5.3.
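As a starting point for scripting this, the three docker commands above could be driven from Python. This is a minimal sketch with hypothetical helper names; the image tag and commands are the ones quoted above, and the default is a dry run that only prints the commands.

```python
import shlex
import subprocess

# Registers qemu-user-static handlers so ARM binaries can run on the host.
QEMU_REGISTER = [
    "docker", "run", "--rm", "--privileged",
    "multiarch/qemu-user-static:register", "--reset",
]


def build_and_run_commands(image_tag="test/armtest", context_dir="."):
    """Return the argv lists for registering qemu, building, and running the image."""
    return [
        QEMU_REGISTER,
        ["docker", "build", "-t", image_tag, context_dir],
        ["docker", "run", image_tag],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_and_run_commands())
```

Calling `run_all(build_and_run_commands(), dry_run=False)` would actually invoke docker, which needs a privileged docker daemon for the qemu registration step.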

@yuvipanda
Collaborator

Awesome, @gedankenstuecke! Can you write up a small script similar to https://github.com/jupyterhub/the-littlest-jupyterhub/blob/master/.circleci/integration-test.py that helps run TLJH inside a qemu / arm container?

@gedankenstuecke
Contributor Author

Yeah, I tried adapting the whole thing here with a custom Dockerfile and integration-test.py: gedankenstuecke@de1c7ce

The building of the image seems to work out fine, but then when trying to start the container it dies right away and I couldn't yet figure out what's going wrong here.

@yuvipanda
Collaborator

Things that need to happen here:

  1. Allow switching between conda & venv for user environment at install time
  2. Add tests for both
  3. Add tests for the littlest jupyterhub in debian stretch images
  4. Add tests (with QEMU) on ARM architectures

@scparker

Sounds like a good JupyterCon paper! I'll check it out later today...

@yuvipanda yuvipanda added the enhancement New feature or request label May 20, 2019
@pisymbol

Is this dead?

@yuvipanda
Collaborator

@pisymbol nobody is currently working on it, unfortunately :( The core set of tasks needed hasn't changed, though.

@GeorgianaElena GeorgianaElena added the help wanted Extra attention is needed label Oct 20, 2020
@cdibble
Contributor

cdibble commented Mar 27, 2021

I want to throw in a +1 for this ticket. I'd love if there was ARM support for TLJH.

@yuvipanda
Collaborator

@cdibble may I ask what you are planning on running this on? Raspberry PI?

@cdibble
Contributor

cdibble commented Mar 27, 2021

> @cdibble may I ask what you are planning on running this on? Raspberry PI?

@yuvipanda - Actually, I am just interested in taking advantage of the price-to-performance ratio of the latest generation of AWS servers: the EC2 instances on ARM have pretty attractive specs compared with the previous generations on x64. So it's just about upgrading for me, not a use case with a mandatory ARM architecture. I understand this may not be the most motivating use case for dev work on this ticket.

TLJH has made the deployment and maintenance of a JupyterHub server a dream. Many thanks to you and the other contributors.

@yuvipanda
Collaborator

That's actually more motivating than Raspberry PIs - RPIs are not powerful enough for most hub use cases.

ARM migration should be easier now, since we use miniforge, which does have arm64 support.

Am very glad you found it useful, @cdibble!

@yuvipanda
Collaborator

At least with docker on mac, you can trivially run arm64 builds. This should make testing much easier!

@cdibble
Contributor

cdibble commented Mar 27, 2021

> ARM migration should be easier now, since we use miniforge, which does have arm64 support.

This is a good hint. Thank you. FWIW I do see a note in tljh/installer.py line 183 to add support for miniforge. Is there a branch where that's implemented? I've forked to see if I can get it to work. It seems like just a matter of modifying the tljh/conda.py functions related to installing and checking packages with conda. Any input welcome.

@yuvipanda
Collaborator

@GeorgianaElena did some work on it a few months ago, maybe we can split out the miniforge commits from there?

@cdibble I started a test run of TLJH setup with ARM, via this PR: #674. It's compiling so many things, and the emulation is so slow - I'm still at the point where setup.py dependencies are being installed. We somehow require grpc in our base install - not sure why?!

@yuvipanda
Collaborator

Can't install grpcio on arm :( I filed jupyterhub/traefik-proxy#125

@cdibble
Contributor

cdibble commented Mar 27, 2021

Seems we got to similar points. I can't start the service:

Mar 27 21:45:59 ip-10-13-7-223 sudo[14971]:   ubuntu : TTY=pts/0 ; PWD=/home/ubuntu/the-littlest-jupyterhub/bootstrap ; USER=root ; COMMAND=/bin/systemctl restart jupyterhub.se
Mar 27 21:45:59 ip-10-13-7-223 sudo[14971]: pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: traefik.service: Start request repeated too quickly.
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: traefik.service: Failed with result 'exit-code'.
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: Failed to start traefik.service.
-- Subject: Unit traefik.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit traefik.service has failed.
--
-- The result is RESULT.
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: Dependency failed for jupyterhub.service.
-- Subject: Unit jupyterhub.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit jupyterhub.service has failed.
--
-- The result is RESULT.
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: jupyterhub.service: Job jupyterhub.service/start failed with result 'dependency'.
Mar 27 21:45:59 ip-10-13-7-223 sudo[14971]: pam_unix(sudo:session): session closed for user root

Though I can't find the reference to the dependency that caused the failure on my end. I'm not seeing the install of grpcio or etcd3. Where is that happening?

I just changed a few things in conda.py to point to miniforge to get to this stage. Hopefully if we can make some progress on that new issue you filed, it will fall into place.

@cdibble
Contributor

cdibble commented Mar 27, 2021

BTW, I had an EC2 instance with ARM running, so I skipped the docker dev environment setup.

@yuvipanda
Collaborator

Hah, you have definitely gotten farther than me :D Unfortunately I don't have access to an AWS ARM instance :(

@yuvipanda
Collaborator

yuvipanda commented Mar 28, 2021

OK, I got it running locally!

Things I had to do:

  • Install jupyter-traefik-proxy with this PR applied
  • Use miniforge with ARM64 instead of miniconda
  • Use Ubuntu 20.04 as base - primarily to get python3.8 instead of 3.6

I think we can do all this independently and get us to aarch64 support

@cdibble
Contributor

cdibble commented Mar 28, 2021

Nice! Thank you for putting time into this :)

I'm not quite there. I've added your fork+branch of jupyterhub-traefik-proxy to the setup.py for tljh. So something like: install_requires=[..., 'jupyterhub-traefik-proxy@git+https://github.com/yuvipanda/traefik-proxy.git@optional-deps']. That is installing as expected. And I moved to Ubuntu 20.04/Python 3.8.

I can run the bootstrap.py script just fine, but the service fails to start again with the same message: traefik.service failed to start.

So I modified the traefik.py file to point to the traefik version for linux_arm64 like so:

plat = "linux_arm64"
traefik_version = "2.4.8"

But that isn't working: the published checksum doesn't match what I get with the download. I tried just using the checksum that results from the download, but that does not fix my error with traefik.service. Any ideas? Did you have to modify traefik.py?

UPDATE: The checksums are fine. I wasn't able to download from the URL configured in traefik.py. I changed that to traefik_url = ( f"https://github.com/traefik/traefik/releases/download/v{traefik_version}/traefik_v{traefik_version}_{plat}.tar.gz" ) to get the traefik installation routine working. Sadly, that still didn't fix the traefik.service start failure.

     Loaded: loaded (/etc/systemd/system/traefik.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2021-03-28 17:37:51 UTC; 7s ago
    Process: 13532 ExecStart=/opt/tljh/hub/bin/traefik -c /opt/tljh/state/traefik.toml (code=exited, status=203/EXEC)
   Main PID: 13532 (code=exited, status=203/EXEC)

Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: traefik.service: Scheduled restart job, restart counter is at 5.
Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: Stopped traefik.service.
Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: traefik.service: Start request repeated too quickly.
Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: traefik.service: Failed with result 'exit-code'.
Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: Failed to start traefik.service.
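
For reference, the URL change and the checksum comparison described in the update above can be sketched in Python. The function names are hypothetical and are not part of tljh/traefik.py; only the URL template is taken from the comment.

```python
import hashlib


def traefik_download_url(version, plat):
    """Build the release tarball URL using the template quoted in the comment."""
    return (
        "https://github.com/traefik/traefik/releases/download/"
        f"v{version}/traefik_v{version}_{plat}.tar.gz"
    )


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 digest of downloaded bytes with a published hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


if __name__ == "__main__":
    print(traefik_download_url("2.4.8", "linux_arm64"))
```

A mismatch from `sha256_matches` can mean either a wrong expected digest or, as turned out to be the case here, that the bytes fetched were not the intended release asset.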

@yuvipanda
Collaborator

@cdibble I filed jupyterhub/traefik-proxy#128 to work on the traefik proxy installer.

However, that wasn't a problem for me, and I've no idea how :| That it worked makes me suspect the arm-ness of my docker based setup...

@cdibble
Contributor

cdibble commented Mar 28, 2021

Yes I am surprised you were able to get it to work without an ARM build of traefik. But I haven't been able to get it to work even adding in the url and checksums for the ARM traefik versions in tljh/traefik.py. It does download the appropriate binary and complete the checksums, but then the service still doesn't work. So not sure what's going on.

@cdibble
Contributor

cdibble commented Mar 29, 2021

Looks like we need traefik v1.7.*.

plat = "linux-arm64"
traefik_version = "1.7.28"

Did the trick. Haven't tested much yet but I will this week.
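
Rather than hard-coding `plat`, the value could be derived from the host architecture. The sketch below is an assumption-laden illustration: the names are hypothetical, "linux-arm64" is the value from the comment above, and "linux-amd64" is assumed to be the x86-64 counterpart.

```python
import platform

# Assumed mapping from platform.machine() values to traefik platform strings.
# "linux-arm64" comes from the comment above; "linux-amd64" is an assumption.
TRAEFIK_PLATS = {
    "x86_64": "linux-amd64",
    "aarch64": "linux-arm64",
}


def traefik_plat(machine=None):
    """Return the traefik platform string for the given or current machine."""
    machine = machine or platform.machine()
    try:
        return TRAEFIK_PLATS[machine]
    except KeyError:
        raise RuntimeError(f"No traefik build configured for {machine!r}") from None
```

Failing loudly on an unknown architecture avoids installing a binary that systemd cannot execute.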

@yuvipanda
Collaborator

@cdibble awesome, yay! Please send PRs when you can.

@yuvipanda
Collaborator

Opened jupyterhub/traefik-proxy#129 to allow for ARM builds in the traefik_proxy installer. I opened #675 to switch TLJH to using the traefik installer by default so we don't have to repeat that here.

@cdibble
Contributor

cdibble commented Apr 4, 2021

OK, sorry for the delay. Busy week.

I've got my fork working properly now and I've tested it with Ubuntu 20.04, python 3.8 on both x86-64 and aarch64 (arm64) servers. Everything seems to work as expected.

I've opened PR #679 if you want to incorporate these changes. I'd be happy to help resolve any issues. There are also some opportunities for code cleanup (e.g., getting rid of old functions used in the miniconda installation), but I've left those pieces in place.

So, what is different:

  1. Now installing miniforge instead of miniconda. Automatically selects binary based on platform. Has hard-coded checksums for amd64 and arm64 miniforge binaries (for version 4.10.0-0).
  2. Relying on jupyterhub/traefik-proxy#129 for traefik-proxy support, but currently pointing to a dev fork/branch (see below).
  3. Installing traefik version based on platform architecture.

What needs to be updated:

  1. setup.py is pointing to 'jupyterhub-traefik-proxy@git+https://github.com/yuvipanda/traefik-proxy.git@optional-deps' pending the release of the changes in jupyterhub/traefik-proxy#129
  2. In tljh/installer.py the check_miniforge_version routine is not really checking anything meaningful at this point. I just mimicked the checks it was making with check_miniconda_version without having a good reason to check those particular version numbers.
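
The platform-based installer selection with hard-coded checksums (point 1 of the first list) might be structured like this. It is a sketch, not the code in PR #679: the names are hypothetical, the URL pattern is assumed to follow the conda-forge/miniforge release naming, and the digests are placeholders that must be replaced with the published values.

```python
import platform

MINIFORGE_VERSION = "4.10.0-0"  # version named in the comment above

# Placeholder digests: replace with the published SHA-256 values.
MINIFORGE_SHA256 = {
    "x86_64": "<sha256 of the x86_64 installer>",
    "aarch64": "<sha256 of the aarch64 installer>",
}


def miniforge_installer(machine=None, version=MINIFORGE_VERSION):
    """Return (download URL, expected SHA-256) for the given or current machine."""
    machine = machine or platform.machine()
    if machine not in MINIFORGE_SHA256:
        raise RuntimeError(f"No miniforge installer configured for {machine!r}")
    # Assumed URL pattern for miniforge release assets.
    url = (
        "https://github.com/conda-forge/miniforge/releases/download/"
        f"{version}/Miniforge3-{version}-Linux-{machine}.sh"
    )
    return url, MINIFORGE_SHA256[machine]
```

Keeping the digest table keyed by architecture means adding a new platform is a one-line change plus a verified checksum.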

@psychemedia

psychemedia commented Apr 12, 2021

FWIW, I started looking at some docker stacks for amd64/arm64/arm32 here crossbuilt in a really inefficient way using Github Actions.

To try to speed things up, I also started building the arm32/arm64 packages on RPis and adding them to my own wheelhouse (I'm not sure piwheels does 32 and 64 bit wheels?)

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/tljh-on-m1-mac-arm-docker-installer-is-x86-specific/10894/2

@consideRatio
Member

We've come a long way to support arm64 at this point!

I think #679 can be updated to do very little as a lot of changes are already merged in dedicated PRs.
