kolibri/tasks/install.yml spring cleaning (tighten up code, flow, in-line docs) #3514

Merged: 4 commits, Mar 30, 2023

@holta holta commented Mar 27, 2023

@holta holta added this to the 8.1 milestone Mar 27, 2023
Member Author

holta commented Mar 27, 2023

Credit goes to @benjaoming and @arky who originally structured this code.

Thank you to both 💯

- name: Create Linux user {{ kolibri_user }} and add it to groups {{ apache_user }}, disk
  user:
    name: "{{ kolibri_user }}"
    groups:
      - "{{ apache_user }}"
      - disk
Contributor

@holta This is a security feature of Debian. Please refer to https://wiki.debian.org/SystemGroups

Member Author

@holta holta Mar 28, 2023

If group disk is needed for USB drive access, then yes, we should annotate that in the future.

This root-style access level might or might not be fully necessary (requirement seems to remain somewhat ambiguous?)

As noted @ 190ac34

Contributor

Perhaps 'plugdev' would be more suitable, given the issue is with mounting USB-based filesystems. Remember IIAB has its own USB mounting routine in usbmount.

Member Author

> Perhaps 'plugdev' would be more suitable

Might be promising, yes:

Group plugdev "Allows members to mount (only with the options nodev and nosuid, for security reasons) and umount removable devices through pmount." according to https://wiki.debian.org/SystemGroups

> Remember IIAB has its own USB mounting routine in usbmount.

That's https://github.com/iiab/iiab/tree/master/roles/usb_lib if anybody's interested.

Member Author

@holta holta Mar 28, 2023

Just FYI/FWIW: if one allows apt install kolibri to create and set up a new KOLIBRI_USER, it does not add the user to any Linux groups (like www-data or disk).

[*] i.e. if installing a Kolibri .deb package at the command-line (outside of Ansible) and without setting export DEBIAN_FRONTEND=noninteractive — so as to be presented with up-to-5 debconf interactive screens.

Member Author

@holta holta Mar 28, 2023

Another FYI/FWIW: Kolibri appears to work just fine (with IIAB) when KOLIBRI_USER is not added to any Linux groups (like www-data or disk).

However, I only tested on a VM (Ubuntu 22.04), so further testing would be required to see whether Kolibri's USB drive functionality works. That of course still requires:

# Set umask=0000 for VFAT, NTFS and exFAT in /etc/usbmount/usbmount.conf so
# Kolibri can export & import channels to USB sticks/drive:
usb_lib_umask0000_for_kolibri: True  

As explained in usb_lib/README.rst here:

IIAB will generally mount USB drives 'rw' allowing root to both read and write to them. In addition, in March 2021 (PR #2715 <https://github.com/iiab/iiab/issues/2715>) Kolibri exports were enabled by also giving non-root users read and write access to VFAT/FAT32, NTFS and exFAT USB drives, using umask=0000 (in /etc/usbmount/usbmount.conf) to override the umask=0022 default. If however you prefer to restore usbmount's default, set usb_lib_umask0000_for_kolibri: False in /etc/iiab/local_vars.yml <http://FAQ.IIAB.IO/#What_is_local_vars.yml_and_how_do_I_customize_it.3F> (preferably do this prior to installing IIAB).
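To make the umask difference concrete, here is a minimal Python sketch of how a mount-time umask maps to the permission bits seen on a VFAT, NTFS or exFAT drive. It assumes the usual rule for these filesystems (every file appears with mode 0777 masked by the mount's umask). The helper names are made up for illustration and are not part of IIAB or usbmount.

```python
import stat

def effective_mode(umask: int) -> int:
    """Permission bits shown for files on a mount using this umask.

    Assumption: the filesystem reports 0o777 with the umask bits cleared.
    """
    return 0o777 & ~umask

def others_can_write(umask: int) -> bool:
    """True if a user who is neither owner nor group member gets write access."""
    return bool(effective_mode(umask) & stat.S_IWOTH)

# Compare usbmount's default with the value IIAB sets for Kolibri exports.
for label, umask in (("usbmount default", 0o022), ("IIAB for Kolibri", 0o000)):
    mode = effective_mode(umask)
    print(f"{label}: umask={umask:04o} mode={mode:04o} "
          f"others can write: {others_can_write(umask)}")
```

With umask=0022 the "other" write bit is cleared, which is why a non-root user such as kolibri could not export to the stick before the umask=0000 override.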

Contributor

> Just FYI/FWIW: if one allows apt install kolibri to create and set up a new KOLIBRI_USER, it does not add the user to any Linux groups (like www-data or disk).

Might want to tinker with https://github.com/iiab/iiab/blob/master/roles/kolibri/templates/kolibri.service.j2#L12

I think the only thing that might need 'disk' would be something like dd, to access the raw device.

Member Author

Quick context for Group=www-data on Line 12 of /etc/systemd/system/kolibri.service:

meaning of systemd "Group" option
https://serverfault.com/questions/805879/meaning-of-systemd-group-option/805883#805883

Member Author

Linux group disk is proven to be unneeded with USB stick testing (using 64-bit Raspberry Pi OS, on a Raspberry Pi 4).

So I will remove it entirely.

(It would appear this root-level access was never needed from day one, and it's time to take security more seriously in 2023 now!)
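For anyone repeating this check on their own box, here is a rough Python sketch that lists which supplementary groups a user belongs to (roughly what `id -nG` reports, minus the primary group) and flags the `disk` group. The function names are hypothetical. The sketch only reads the group database through the standard `grp` module, and "kolibri" is assumed as the user name, following this thread.

```python
import grp

def supplementary_groups(user, group_db=None):
    """Names of groups that list `user` as a member (primary group not included)."""
    entries = grp.getgrall() if group_db is None else group_db
    return sorted(g.gr_name for g in entries if user in g.gr_mem)

def risky_groups(user, group_db=None, risky=("disk",)):
    """Subset of the user's groups that grant raw block-device access."""
    return [g for g in supplementary_groups(user, group_db) if g in risky]

if __name__ == "__main__":
    # "kolibri" is an assumption; substitute your own KOLIBRI_USER.
    print(supplementary_groups("kolibri"))
    print(risky_groups("kolibri"))
```

An empty list from `risky_groups` would indicate the user no longer has the root-style access that group disk implies.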

Contributor

arky commented Mar 28, 2023

@holta Someone needs to audit and update this code to ensure kolibri-server performance fixes are ported into IIAB.

This was created a long time ago, before Kolibri's official support for Debian landed.

Member Author

holta commented Mar 28, 2023

> kolibri-server performance fixes

In principle, yes, a good idea.

In practice, however:

  • https://github.com/learningequality/kolibri-server is not mainline Kolibri (it has not been updated much in recent years).

  • At the moment IIAB communities strongly prefer mainline Kolibri, as that is updated with very important fixes and school/student-oriented improvements almost every month.

  • kolibri-server hasn't proven to be necessary, as learning hotspots rarely have more than 32 simultaneous users.

Contributor

arky commented Mar 28, 2023

> @holta Someone needs to audit and update this code to ensure kolibri-server performance fixes are ported into IIAB.
>
> This was created long time ago before Kolibri official support for debian landed.
>
> kolibri-server performance fixes
>
> In principle, yes a good idea.
>
> In practice however:
>
> * https://github.com/learningequality/kolibri-server is not mainline Kolibri (it has not been updated much in recent years).

It has been many moons since I worked on this, so I'll /cc @jredrejo

Contributor

jvonau commented Mar 28, 2023

On a side note, the upstream documentation references KOLIBRI_HOME as containing a .kolibri path. Perhaps that path could be employed in IIAB as well, for maximum compatibility with the upstream docs? I.e. just change KOLIBRI_HOME to be '/library/kolibri/.kolibri'
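As a rough illustration of the KOLIBRI_HOME suggestion above, the following Python sketch shows one way such a path could be resolved: use the KOLIBRI_HOME environment variable if set, otherwise fall back to ~/.kolibri. This is an assumption modelled on the upstream docs' description, not Kolibri's actual code, and the function name is hypothetical.

```python
import os

def resolve_kolibri_home(environ=None):
    """Return the directory a KOLIBRI_HOME-style lookup would end up using.

    Assumption: the environment variable wins, and ~/.kolibri is the fallback.
    """
    env = os.environ if environ is None else environ
    raw = env.get("KOLIBRI_HOME", "~/.kolibri")
    return os.path.abspath(os.path.expanduser(raw))

# The path proposed in this comment, passed explicitly for illustration.
print(resolve_kolibri_home({"KOLIBRI_HOME": "/library/kolibri/.kolibri"}))
```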

Another thing to consider would be populating /etc/kolibri/daemon.conf with more of the environment variables from the custom systemd unit file template, and simply referencing the daemon.conf file with 'EnvironmentFile='

-AND/OR-

The stock options.ini as provided by Kolibri (below) might also need tinkering, as the daemon uses options that should match what the command line expects: CONTENT_DIR, HTTP_PORT, URL_PATH_PREFIX

jerryv@box:~ $ cat /library/kolibri/options.ini 
[Cache]
# Which backend to use for the main cache - if 'memory' is selected, then for most cache operations,
# an in-memory, process-local cache will be used, but a disk based cache will be used for some data
# that needs to be persistent across processes. If 'redis' is used, it is used for all caches.
# CACHE_BACKEND = memory

# Default timeout for entries put into the cache.
# CACHE_TIMEOUT = 300

# Maximum number of entries to maintain in the cache at once.
# CACHE_MAX_ENTRIES = 1000

# Password to authenticate to Redis, Redis only.
# CACHE_PASSWORD = 

# Host and port at which to connect to Redis, Redis only.
# CACHE_LOCATION = localhost:6379

# The database number for Redis.
# CACHE_REDIS_DB = 0

# Maximum number of simultaneous connections to allow to Redis, Redis only.
# CACHE_REDIS_MAX_POOL_SIZE = 50

# How long to wait when trying to connect to Redis before timing out, Redis only.
# CACHE_REDIS_POOL_TIMEOUT = 30

# Maximum memory that Redis should use, Redis only.
# CACHE_REDIS_MAXMEMORY = 0

# Eviction policy to use when using Redis for caching, Redis only.
# CACHE_REDIS_MAXMEMORY_POLICY = 

[Database]
# Which database backend to use, choices are 'sqlite' or 'postgres'
# DATABASE_ENGINE = sqlite

# For SQLite - the name of a database file to use for the main Kolibri database.
# For Postgresql, the name of the database to use for all Kolibri data.
# DATABASE_NAME = 

# The password to authenticate with when connecting to the database, Postgresql only.
# DATABASE_PASSWORD = 

# The user to authenticate with when connecting to the database, Postgresql only.
# DATABASE_USER = 

# The host on which to connect to the database, Postgresql only.
# DATABASE_HOST = 

# The port on which to connect to the database, Postgresql only.
# DATABASE_PORT = 

[Server]
DEBUG = False
DEBUG_LOG_DATABASE = False
# How many threads the Kolibri server should use to serve requests
# CHERRYPY_THREAD_POOL = 56

# How long a socket should wait for data flow to resume before
# it considers that the connection has been interrupted.
# Increasing this may help in situations where there is high
# latency on a network or the bandwidth is bursty, with some
# expected data flow interruptions which may not be indicative of the connection failing.
# CHERRYPY_SOCKET_TIMEOUT = 10

# How many requests to allow in the queue.
# Increasing this may help situations where requests are instantly refused by the server.
# CHERRYPY_QUEUE_SIZE = 30

# How many seconds to wait for a request to be put into the queue.
# Increasing this may help situations where requests are instantly refused by the server.
# CHERRYPY_QUEUE_TIMEOUT = 0.1

# Activate the server profiling middleware.
# PROFILE = False

# Run Kolibri with Django setting DEBUG = True
# DEBUG = False

# Activate debug logging for Django ORM operations.
# DEBUG_LOG_DATABASE = False

[Paths]
# The directory that will store content files and content database files.
# To change this in a currently active server it is recommended to use the
# 'content movedirectory' management command.
# CONTENT_DIR = content

# Additional directories in which Kolibri will look for content files and content database files.
# CONTENT_FALLBACK_DIRS = 

# The file that contains the automatic device provisioning data.
# AUTOMATIC_PROVISION_FILE = 

[Urls]
# URL to use as the default source for content import.
# Slightly counterintuitively this will still be displayed in the UI as 'import from Kolibri Studio'.
# CENTRAL_CONTENT_BASE_URL = https://studio.learningequality.org

# URL to use as the target for data portal syncing.
# DATA_PORTAL_SYNCING_BASE_URL = https://kolibridataportal.learningequality.org

[Deployment]
# Sets the port that Kolibri will serve on. This can be further overridden by command line arguments.
# HTTP_PORT = 8080

# Turn off the statistics pingback. This will also disable update notifications
# DISABLE_PING = False

# Serve Kolibri from a subpath under the main domain. Used when serving multiple applications from
# the same origin. This option is not heavily tested, but is provided for user convenience.
# URL_PATH_PREFIX = /

# The user interface languages to enable on this instance of Kolibri (has no effect on languages of imported content channels).
# The default will include all the languages Kolibri supports.
# LANGUAGES = kolibri-supported

# When running by default (value blank), Kolibri frontend looks for the zipcontent endpoints
# on the same domain as Kolibri proper, but uses ZIP_CONTENT_PORT instead of HTTP_PORT.
# When running behind a proxy, set the value to the port where zipcontent endpoint is served on,
# and it will be substituted for the port that Kolibri proper is being served on.
# When zipcontent is being served from a completely separate domain, you can set an
# absolute origin (full protocol plus domain, e.g. 'https://myzipcontent.com/')
# to be used for all zipcontent origin requests.
# ZIP_CONTENT_ORIGIN = 

# Sets the port that Kolibri will serve the alternate origin server on. This is the server that
# is used to serve all content for the zipcontent endpoint, so as to provide safe IFrame sandboxing
# but avoiding issues with null origins.
# This is the alternate origin server equivalent of HTTP_PORT.
# ZIP_CONTENT_PORT = 0

# The zip content equivalent of URL_PATH_PREFIX - allows all zip content URLs to be prefixed with
# a fixed path. This both changes the URL from which the endpoints are served by the alternate
# origin server, and the URL prefix where the Kolibri frontend looks for it.
# In the case that ZIP_CONTENT_ORIGIN is pointing to an entirely separate origin, this setting
# can still be used to set a URL prefix that the frontend of Kolibri will look to when
# retrieving alternate origin URLs.
# ZIP_CONTENT_URL_PATH_PREFIX = /

# Boolean flag that causes content import processes to skip trying to import any
# content, as it is assumed that the remote source has everything available.
# Server configuration should handle ensuring that the files are properly served.
# REMOTE_CONTENT = False

# In case a SoUD connects to this server, the SoUD should use this interval to resync every user.
# SYNC_INTERVAL = 60

# The minimum free disk space that Kolibri should try to maintain on the device. This will
# be used as the floor value to prevent Kolibri completely filling the disk during file import.
# Value can either be a number suffixed with a unit (e.g. MB, GB, TB) or an integer number of bytes.
# MINIMUM_DISK_SPACE = 250MB

[Python]
# Which Python pickle protocol to use. Pinned to 2 for now to provide maximal cross-Python version compatibility.
# Can safely be set to a higher value for deployments that will never change Python versions.
# PICKLE_PROTOCOL = 2

[Tasks]
# Whether to use Python multiprocessing for worker pools. If False, then it will use threading. This may be useful,
# if running on a dedicated device with multiple cores, and a lot of asynchronous tasks get run.
# USE_WORKER_MULTIPROCESSING = False

# The number of workers to spin up for regular priority asynchronous tasks.
# REGULAR_PRIORITY_WORKERS = 4

# The number of workers to spin up for high priority asynchronous tasks.
# HIGH_PRIORITY_WORKERS = 2

[Learn]
# Whether to enable custom channel navigation applications
# ENABLE_CUSTOM_CHANNEL_NAV = False

I think the common ground between the cmdline tool (/usr/bin/kolibri) and running a daemon might be the options.ini file.
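One way to see which values the daemon and the command line would both pick up is to read options.ini directly. Below is a minimal Python sketch, assuming that commented-out settings fall back to the defaults shown in the listing's comments. The function name, the DEFAULTS table and the sample values (including port 8009) are illustrative only and are not part of Kolibri or IIAB.

```python
import configparser

# Defaults copied from the commented-out lines in the options.ini listing.
DEFAULTS = {
    ("Paths", "CONTENT_DIR"): "content",
    ("Deployment", "HTTP_PORT"): "8080",
    ("Deployment", "URL_PATH_PREFIX"): "/",
}

def read_shared_options(ini_text):
    """Return the settings the daemon and the CLI both need to agree on."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        key: parser.get(section, key, fallback=default)
        for (section, key), default in DEFAULTS.items()
    }

# Hypothetical file contents: only HTTP_PORT is uncommented and overridden.
SAMPLE = """\
[Paths]
# CONTENT_DIR = content

[Deployment]
HTTP_PORT = 8009
# URL_PATH_PREFIX = /
"""

print(read_shared_options(SAMPLE))
```

On a real box, the text of /library/kolibri/options.ini could be passed in instead of SAMPLE, and the result compared against the variables set in the systemd unit.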

@tim-moody
Contributor

I think setting group is a good practice, but it has no practical implications, as world and group have the same permissions on the assets in /library/kolibri/content.

Member Author

holta commented Mar 29, 2023

> I think setting group is a good practice, but has no practical implications as world and group have the same permissions on the assets in /library/kolibri/content

I agree this is likely best.

Hopefully I &/or others will have more time to confirm today/soon.

Member Author

holta commented Mar 29, 2023

Recap, after careful testing confirmed that group disk is not necessary for Kolibri's access to USB drives inserted in a Raspberry Pi 4 (with the latest 64-bit RasPiOS):

  • Assigning user kolibri to secondary group www-data (in addition to the user's primary group kolibri) is also not needed...
  • But we can keep this secondary group (www-data) if others prefer that, and it's extremely low risk... as per this current PR.

Member Author

holta commented Mar 30, 2023

Thanks everyone for the comments + suggestions over the past 3 days. Let's go with this PR. And if necessary later, we can further simplify in a future PR (e.g. removing all secondary groups for KOLIBRI_USER kolibri, or any other changes that further simplify Kolibri's installation, hewing as closely as possible to Learning Equality's official/emerging recommendations!)

@radinamatic, @rtibbles and/or @jamalex might know when Kolibri docs are next scheduled to be updated (this year ideally, if possible!) Just FYI many of Kolibri's install docs and guidelines are currently stale by a few years, so let's give them time to catch up in 2023 hopefully. Just 3 examples here:

  1. @jredrejo fixed apt (PPA) configuration in 2022, as OSes moved many years ago towards killing off the "obsolete" apt-key command. But his recommendation is not quite yet documented: Test IIAB on Ubuntu 22.10 "Kinetic Kudu" [with ppa:learningequality/kolibri-proposed] [keyring /etc/apt/trusted.gpg DEPRECATED] #3343 (comment)
  2. Kolibri's main /etc/kolibri/README contains a few errors, awaiting an update since 2019: Honor (don't ignore) /etc/kolibri/username when installing Kolibri — putting an end to 2 contradictory KOLIBRI_USER settings learningequality/kolibri-installer-debian#117 (comment)
  3. Kolibri's 29 supported languages are now 32, after adding Greek, Ukrainian and Haitian: Haitian creole learningequality/kolibri#10150 (comment)

Member Author

holta commented Mar 30, 2023

Link fixed just above, to help with installing Kolibri via apt/PPA: keyring /etc/apt/trusted.gpg DEPRECATED (#3343)


4 participants