Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor bootstrap.py script for readability #715

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
232 changes: 136 additions & 96 deletions bootstrap/bootstrap.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,35 @@
curl <script-url> | sudo python3 -

Constraints:
- Entire script should be compatible with Python 3.6 (We run on Ubuntu 18.04+)
- Script should parse in Python 3.4 (since we exit with useful error message on Ubuntu 14.04+)
- Use stdlib modules only

- The entire script should be compatible with Python 3.6, which is the on
Ubuntu 18.04+.
- The script should parse in Python 3.5 as we print error messages for using
Ubuntu 16.04+ which comes with Python 3.5 by default. This means no
f-strings can be used.
- The script must depend only on stdlib modules, as no previous installation
of dependencies can be assumed.

Environment variables:

TLJH_INSTALL_PREFIX Defaults to "/opt/tljh", determines the location
of the tljh installations root folder.
TLJH_BOOTSTRAP_PIP_SPEC From this location, the bootstrap script will
pip install --upgrade the tljh installer.
TLJH_BOOTSTRAP_DEV Determines if --editable is passed when
installing the tljh installer. Pass the values
yes or no.

Command line flags:

The bootstrap.py script accept the following command line flags. All other
flags are passed through to the tljh installer without interception by this
script.

--show-progress-page Starts a local web server listening on port 80 where
logs can be accessed during installation. If this is
passed, it will pass --progress-page-server-pid=<pid>
to the tljh installer for later termination.
"""
import os
from http.server import SimpleHTTPRequestHandler, HTTPServer
Expand All @@ -21,18 +47,19 @@
import shutil
import urllib.request

html = """
progress_page_favicon_url = "https://raw.githubusercontent.com/jupyterhub/jupyterhub/master/share/jupyterhub/static/favicon.ico"
progress_page_html = """
<html>
<head>
<title>The Littlest Jupyterhub</title>
<title>The Littlest Jupyterhub</title>
</head>
<body>
<meta http-equiv="refresh" content="30" >
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width">
<img class="logo" src="https://raw.githubusercontent.com/jupyterhub/the-littlest-jupyterhub/master/docs/images/logo/logo.png">
<div class="loader center"></div>
<div class="center main-msg">Please wait while your TLJH is building...</div>
<div class="center main-msg">Please wait while your TLJH is setting up...</div>
<div class="center logs-msg">Click the button below to see the logs</div>
<div class="center tip" >Tip: to update the logs, refresh the page</div>
<button class="logs-button center" onclick="window.location.href='/logs'">View logs</button>
Expand Down Expand Up @@ -99,25 +126,14 @@
</style>
</head>
</html>

"""

logger = logging.getLogger(__name__)

def get_os_release_variable(key):
"""
Return value for key from /etc/os-release

/etc/os-release is a bash file, so should use bash to parse it.

Returns empty string if key is not found.
"""
return subprocess.check_output([
'/bin/bash', '-c',
"source /etc/os-release && echo ${{{key}}}".format(key=key)
]).decode().strip()

# Copied into tljh/utils.py. Make sure the copies are exactly the same!
# This function is needed both by the process starting this script, and by the
# TLJH installer that this script execs in the end. Make sure its replica at
# tljh/utils.py stays in sync with this version!
def run_subprocess(cmd, *args, **kwargs):
"""
Run given cmd with smart output behavior.
Expand Down Expand Up @@ -147,11 +163,26 @@ def run_subprocess(cmd, *args, **kwargs):
# For now, prioritizing human readability over machine readability.
logger.debug(proc.stdout.decode())

def validate_host():

def ensure_host_system_can_install_tljh():
"""
Make sure TLJH is installable in current host
Check if TLJH is installable in current host system and exit with a clear
error message otherwise.
"""
# Support only Ubuntu 18.04+
def get_os_release_variable(key):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer leaving these functions in the top level, as I think that's clearer and we can't accidentally re-use variables from the function inside which this function is defined. Do you prefer having them be nested instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this to depend a lot on situations, but what influences my perception that this became more readable and such were.

  1. Nested function makes it clear it's only used there
  2. All logic for the bootstrap CLI is in this single file, and that makes me a bit motivated to not have everything declared in the top level scope for some reason. Having this helper function nested under the only function helped me overview the large script files purpose as a whole better, which became quite relevant as the file quite a bit of logical pieces that isn't easy apparent how they relate to each other.

Would you like me to change it back? I have a preference towards having this function either nested or splitting this file into multiple files, but I don't think it's important and can change it back.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to change it back! I find extra levels of nesting harder to reason about... If you don't feel too strongly about it I'd appreciate it being changed back.

It's not ideal it's in one file only but it's necessary for our curl based installation...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find very nested functions that don't include any indentation are OK, but if there are nested blocks, long comments or long lines leading to indented continued lines, it gets very hard to read. Would prefixing a top-level internal function with _ help make it clearer?

Alternatively would creating a class with static methods work as a form of separation even if it's not strictly necessary?

"""
Return value for key from /etc/os-release

/etc/os-release is a bash file, so should use bash to parse it.

Returns empty string if key is not found.
"""
return subprocess.check_output([
'/bin/bash', '-c',
"source /etc/os-release && echo ${{{key}}}".format(key=key)
]).decode().strip()

# Require Ubuntu 18.04+
distro = get_os_release_variable('ID')
version = float(get_os_release_variable('VERSION_ID'))
if distro != 'ubuntu':
Expand All @@ -161,20 +192,23 @@ def validate_host():
print('The Littlest JupyterHub requires Ubuntu 18.04 or higher')
sys.exit(1)

if sys.version_info < (3, 5):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still print error messages if python is too old, right? Or is this handled somewhere else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've updated this correctly in d36c53d, given that we now require ubuntu 18.04+, please check the commit I pushed about this. I don't think we provide an error message elsewhere or similar regarding this.

I deleted this reasoning that you won't have ubuntu18.04+ and Python 3.5 or lower, but maybe and why not have this check still?

print("bootstrap.py must be run with at least Python 3.5")
# Require Python 3.6+
if sys.version_info < (3, 6):
print("bootstrap.py must be run with at least Python 3.6")
sys.exit(1)

if not (shutil.which('systemd') and shutil.which('systemctl')):
# Require systemd (systemctl is a part of systemd)
if not shutil.which('systemd') or not shutil.which('systemctl'):
print("Systemd is required to run TLJH")
# Only fail running inside docker if systemd isn't present
# Provide additional information about running in docker containers
if os.path.exists('/.dockerenv'):
print("Running inside a docker container without systemd isn't supported")
print("We recommend against running a production TLJH instance inside a docker container")
print("For local development, see http://tljh.jupyter.org/en/latest/contributing/dev-setup.html")
sys.exit(1)

class LoaderPageRequestHandler(SimpleHTTPRequestHandler):

class ProgressPageRequestHandler(SimpleHTTPRequestHandler):
def do_GET(self):
if self.path == "/logs":
with open("/opt/tljh/installer.log", "r") as log_file:
Expand All @@ -197,45 +231,60 @@ def do_GET(self):
else:
SimpleHTTPRequestHandler.send_error(self, code=403)

def serve_forever(server):
try:
server.serve_forever()
except KeyboardInterrupt:
pass

def main():
flags = sys.argv[1:]
temp_page_flag = "--show-progress-page"
"""
This script intercepts the --show-progress-page flag, but all other flags
are passed through to the TLJH installer script.

The --show-progress-page flag indicates that the bootstrap script should
start a local webserver temporarily and report its installation progress via
a web site served locally on port 80.
"""
ensure_host_system_can_install_tljh()


# Various related constants
install_prefix = os.environ.get('TLJH_INSTALL_PREFIX', '/opt/tljh')
hub_prefix = os.path.join(install_prefix, 'hub')
python_bin = os.path.join(hub_prefix, 'bin', 'python3')
pip_bin = os.path.join(hub_prefix, 'bin', 'pip')
initial_setup = not os.path.exists(python_bin)


# Check for flag in the argv list. This doesn't use argparse
# because it's the only argument that's meant for the boostrap script.
# All the other flags will be passed to and parsed by the installer.
if temp_page_flag in flags:
# Attempt to start a web server to serve a progress page reporting
# installation progress.
tljh_installer_flags = sys.argv[1:]
if "--show-progress-page" in tljh_installer_flags:
# Remove the bootstrap specific flag and let all other flags pass
# through to the installer.
tljh_installer_flags.remove("--show-progress-page")

# Write HTML and a favicon to be served by our webserver
with open("/var/run/index.html", "w+") as f:
f.write(html)
favicon_url="https://raw.githubusercontent.com/jupyterhub/jupyterhub/master/share/jupyterhub/static/favicon.ico"
urllib.request.urlretrieve(favicon_url, "/var/run/favicon.ico")
f.write(progress_page_html)
urllib.request.urlretrieve(progress_page_favicon_url, "/var/run/favicon.ico")

# If the bootstrap is run to upgrade TLJH, then this will raise an "Address already in use" error
# If TLJH is already installed and Traefik is already running, port 80
# will be busy and we will get an "Address already in use" error. This
# is acceptable and we can ignore the error.
try:
loading_page_server = HTTPServer(("", 80), LoaderPageRequestHandler)
p = multiprocessing.Process(target=serve_forever, args=(loading_page_server,))
# Serves the loading page until TLJH builds
# Serve the loading page until manually aborted or until the TLJH
# installer terminates the process
def serve_forever(server):
try:
server.serve_forever()
except KeyboardInterrupt:
pass
progress_page_server = HTTPServer(("", 80), ProgressPageRequestHandler)
p = multiprocessing.Process(target=serve_forever, args=(progress_page_server,))
p.start()

# Remove the flag from the args list, since it was only relevant to this script.
flags.remove("--show-progress-page")

# Pass the server's pid as a flag to the istaller
pid_flag = "--progress-page-server-pid"
flags.extend([pid_flag, str(p.pid)])
# Pass the server's pid to the installer for later termination
tljh_installer_flags.extend(["--progress-page-server-pid", str(p.pid)])
except OSError:
# Only serve the loading page when installing TLJH
pass

validate_host()
install_prefix = os.environ.get('TLJH_INSTALL_PREFIX', '/opt/tljh')
hub_prefix = os.path.join(install_prefix, 'hub')

# Set up logging to print to a file and to stderr
os.makedirs(install_prefix, exist_ok=True)
Expand All @@ -252,15 +301,16 @@ def main():
stderr_logger.setFormatter(logging.Formatter('%(message)s'))
stderr_logger.setLevel(logging.INFO)
logger.addHandler(stderr_logger)

logger.setLevel(logging.DEBUG)

logger.info('Checking if TLJH is already installed...')
if os.path.exists(os.path.join(hub_prefix, 'bin', 'python3')):
logger.info('TLJH already installed, upgrading...')
initial_setup = False

if not initial_setup:
logger.info('Existing TLJH installation detected, upgrading...')
else:
logger.info('Setting up hub environment')
initial_setup = True
logger.info('Existing TLJH installation not detected, installing...')
logger.info('Setting up hub environment...')
logger.info('Installing Python, venv, pip, and git via apt-get...')
# Install software-properties-common, so we can get add-apt-repository
# That helps us make sure the universe repository is enabled, since
# that's where the python3-pip package lives. In some very minimal base
Expand All @@ -277,48 +327,38 @@ def main():
'python3-pip',
'git'
])
logger.info('Installed python & virtual environment')

logger.info('Setting up virtual environment at {}'.format(hub_prefix))
os.makedirs(hub_prefix, exist_ok=True)
run_subprocess(['python3', '-m', 'venv', hub_prefix])
logger.info('Set up hub virtual environment')


# Upgrade pip
logger.info('Upgrading pip...')
run_subprocess([pip_bin, 'install', '--upgrade', 'pip==20.0.*'])


# Install/upgrade TLJH installer
tljh_install_cmd = [pip_bin, 'install', '--upgrade']
if os.environ.get('TLJH_BOOTSTRAP_DEV', 'no') == 'yes':
tljh_install_cmd.append('--editable')
tljh_install_cmd.append(
os.environ.get(
'TLJH_BOOTSTRAP_PIP_SPEC',
'git+https://github.com/jupyterhub/the-littlest-jupyterhub.git'
)
)
if initial_setup:
logger.info('Setting up TLJH installer...')
logger.info('Installing TLJH installer...')
else:
logger.info('Upgrading TLJH installer...')
run_subprocess(tljh_install_cmd)

pip_flags = ['--upgrade']
if os.environ.get('TLJH_BOOTSTRAP_DEV', 'no') == 'yes':
pip_flags.append('--editable')
tljh_repo_path = os.environ.get(
'TLJH_BOOTSTRAP_PIP_SPEC',
'git+https://github.com/jupyterhub/the-littlest-jupyterhub.git'
)

# Upgrade pip
run_subprocess([
os.path.join(hub_prefix, 'bin', 'pip'),
'install',
'--upgrade',
'pip==20.0.*'
])
logger.info('Upgraded pip')

run_subprocess([
os.path.join(hub_prefix, 'bin', 'pip'),
'install'
] + pip_flags + [tljh_repo_path])
logger.info('Setup tljh package')

logger.info('Starting TLJH installer...')
os.execv(
os.path.join(hub_prefix, 'bin', 'python3'),
[
os.path.join(hub_prefix, 'bin', 'python3'),
'-m',
'tljh.installer',
] + flags
)
# Run TLJH installer
logger.info('Running TLJH installer...')
os.execv(python_bin, [python_bin, '-m', 'tljh.installer'] + tljh_installer_flags)


if __name__ == '__main__':
main()
3 changes: 3 additions & 0 deletions tljh/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@
from tljh import hooks


# This function is needed also by the bootstrap script that starts this
# installer script. Make sure its replica at bootstrap/bootstrap.py stays in
# sync with this version!
def run_subprocess(cmd, *args, **kwargs):
"""
Run given cmd with smart output behavior.
Expand Down