Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support maverick commands #202

Open
SamuelDudley opened this issue Jan 22, 2020 · 33 comments
Open

support maverick commands #202

SamuelDudley opened this issue Jan 22, 2020 · 33 comments
Assignees
Milestone

Comments

@SamuelDudley
Copy link
Member

control services

maverick start ..
maverick stop ..
maverick status ..

@fnoop
Copy link
Member

fnoop commented Jan 22, 2020

Note that maverick start .. etc are just wrappers for systemctl. It prefixes 'maverick-' to each systemd manifest name, so:
maverick status nginx
actually refers to the system service 'maverick-nginx'. the actual system command that is then run is:
sudo systemctl status maverick-nginx
All the maverick manifests are in /etc/systemd/system, and are all prefixed with 'maverick-'

The same kind of thing happens with maverick start .. and maverick stop ... The maverick script itself (~/software/maverick/bin/maverick) is the reference.

Given that, it might be better for python/-api to use systemd directly using pythonic modules:
https://github.com/mosquito/cysystemd
https://github.com/systemd/python-systemd

@fnoop
Copy link
Member

fnoop commented Jan 22, 2020

Also a lot of the maverick services are controlled through puppet, ie through localconf.json. So as well as starting/stopping services, they should also be set in localconf.json to persist the change. Or maybe that's not always wanted.
Also note there are two different levels of service activation:

  • start/stop - this refers to the current status of the service - whether it is running or not right now
  • enable/disable - this refers to the persistent state of the service - ie whether it should start or not at boot or at runlevel target change

@SamuelDudley
Copy link
Member Author

re interacting with systemd directly, that was my plan where possible. Last time I had a look at options this one was at the top of my list: https://github.com/facebookincubator/pystemd

@fnoop
Copy link
Member

fnoop commented Jan 22, 2020

Ooh interesting. Can we guarantee all the platforms will have dbus?

@SamuelDudley SamuelDudley pinned this issue Jan 23, 2020
@fnoop
Copy link
Member

fnoop commented Jan 23, 2020

Also note that maverick status will show all running status. It used to be that all the services were manually maintained in the bin/maverick script, but now there is a parsed file hierarchy in ~/software/maverick/bin/status.d/. Each directory in their represents a 'section', and each file within each directory represents a service within that section.

@SamuelDudley SamuelDudley self-assigned this Feb 2, 2020
@SamuelDudley SamuelDudley added this to the 1.2 milestone Feb 2, 2020
@SamuelDudley
Copy link
Member Author

SamuelDudley commented Feb 2, 2020

mav@dev:~/software/python/bin$ sudo python3.7 <-- must be sudo :(

from pystemd.systemd1 import Unit
import glob
glob.glob('/etc/systemd/system/maverick*')

unit = Unit(b'maverick-webdev.service')
unit.load()

unit.Service.GetProcesses()

unit.Unit.Names
[b'maverick-webdev.service']

unit.Unit.ActiveState
b'active'

unit.Unit.SubState
b'running'

unit.Service.GetProcesses()
[(b'/system.slice/maverick-webdev.service', 4917, b'node /usr/bin/yarn run serve'), (b'/system.slice/maverick-webdev.service', 4949, b'/bin/sh -c vue-cli-service serve'), (b'/system.slice/maverick-webdev.service', 4950, b'/usr/bin/node /srv/maverick/code/maverick-web/node_modules/.bin/vue-cli-service serve')]

unit.Unit.Stop('replace')

unit.Unit.ActiveState
b'failed'

unit.Unit.SubState
b'failed'

unit.Unit.Start('replace')

hmmmm

not sure if this is worth it... we have to run -api as sudo and we cant enable / disable services... just stick to subprocess...

@fnoop
Copy link
Member

fnoop commented Feb 3, 2020

Interesting OK, we'll hit this problem with whatever we use to control systemd. There are ways we can provide sufficient granular root access to a process.
Alternatively we can subprocess out to the maverick command, but it should be noted that the maverick command (well OK let's be honest the entire mav user) has unrestricted root access. Which really should be changed at some point. There is a parameter that controls this: base::users::mav_sudopass. And it's of course less efficient and pythonic.
Perhaps we could create a microservice that runs as root that just deals with systemd - that would make @cglusky incredibly happy.

@SamuelDudley
Copy link
Member Author

I like the idea of a microservice here. I'll put something together. Preferences on the communication? Rest, graphql, something else?
I'm leaning towards a simple rest interface that we can await on using async calls for -api.

@fnoop
Copy link
Member

fnoop commented Feb 3, 2020

If we used dbus, we could use the permissions manager in that instead, which is more flexible:
https://github.com/facebookincubator/pystemd

But then that means depending on dbus, which isn't always a given in non-desktop environments. Probably not a good idea.

@fnoop
Copy link
Member

fnoop commented Feb 3, 2020

Looks like systemd requires dbus..

One option is to add a PolicyKit pkla:

[dev] [root@maverick-raspberrylite ~]# cat /etc/polkit-1/localauthority/50-local.d/org.freedesktop.systemd1.pkla
[Allow user mav to run systemctl commands]
Identity=unix-user:mav
Action=org.freedesktop.systemd1.manage-units
ResultInactive=yes
ResultActive=yes
ResultAny=yes

This allows us to start/stop/restart (any) services as the mav user:
[dev] [mav@maverick-raspberrylite /usr/share/polkit-1/actions]$ systemctl restart maverick-visiond

Unfortunately it doesn't allow us to specify which services can be controlled (allows all system services), and also doesn't allow us to enable/disable services, which requires root filesystem access (to change the symlinks in /etc/systemd/system/multi-user.target.wants/):

[dev] [mav@maverick-raspberrylite /usr/share/polkit-1/actions]$ systemctl disable maverick-visiond
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-unit-files ===
Authentication is required to manage system service or unit files.
Authenticating as: root
Password:

But it's an option - the enable/disable could be done separately through maverick configure which always runs as root.

@fnoop
Copy link
Member

fnoop commented Feb 3, 2020

Another option is to create user systemd manifests, by putting them in ~/.config/systemd/user. This is possibly a cleaner way of doing it, although might be a bit more confusing for people used to looking for them in /etc/systemd/system.

This allows control through the mav user without any escalated privileges, however not sure if it allows processes to be run as root, which some of the mav processes do.

@fnoop
Copy link
Member

fnoop commented Feb 3, 2020

Using subprocess and just calling maverick [command] [service] or sudo systemctl [command] service] and creating specific sudo rules for maverick-* services may well just be the easier thing to do in the short term. It's a bit messier, but it's an awful lot simpler as it's already all working, doesn't require separate python modules/code, doesn't require a microservice, doesn't require redoing the privilege escalation.

@SamuelDudley
Copy link
Member Author

refactored the async process runner here: https://github.com/goodrobots/maverick-api/blob/7ef29f4283e618320710c825c3b21513ab62bb96/maverick_api/modules/base/util/process_runner.py

For services we only care about the return value which is returned by the complete_callback() lifecycle hook.

@SamuelDudley
Copy link
Member Author

Still need to add parsing of the service structure but queries, mutations and subscriptions in place.
https://github.com/goodrobots/maverick-api/blob/master/maverick_api/modules/api/maverick/maverick_service.py

@SamuelDudley
Copy link
Member Author

I have added a query that will get the current status of all services and return it in a big list
http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickServiceList%20%7B%0A%20%20services%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D%7D

the order of the list is as per the directory structure. You will see there are a few more fields populated with info that I lifted from the directory structure
e.g. category, displayCategory, displayName and updateTime
note that it is a very expensive query for the server as we are getting the status and if its enabled for all services

You can subscribe to updates using: http://dev.maverick.one:7000/graphiql?query=%0A%0Asubscription%7B%0A%20%20MaverickService%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D

You can query individual services, or service categories
individual service: http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickService(name%3A%22api%40dev%22)%20%7B%0A%20%20name%0A%20%20displayName%0A%20%20category%0A%20%20displayCategory%0A%20%20enabled%0A%20%20running%0A%20%20updateTime%0A%7D%7D%0A

whole category: http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickService(category%3A%22dev%22)%20%7B%0A%20%20name%0A%20%20displayName%0A%20%20category%0A%20%20displayCategory%0A%20%20enabled%0A%20%20running%0A%20%20updateTime%0A%7D%7D%0A
note that the queries don't return anything, but you catch the output in the subscription

modification of a service: http://dev.maverick.one:7000/graphiql?query=mutation%20%7B%0A%20%20MaverickService(name%3A%20%22api%40fc%22%2C%20enabled%3A%20false)%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D%0A

modification of a whole category: http://dev.maverick.one:7000/graphiql?query=mutation%20%7B%0A%20%20MaverickService(category%3A%20%22fc%22%2C%20enabled%3A%20false)%20%7B%0A%20%20%20%20name%0A%20%20%7D%0A%7D%0A

My recommendation is to do a one off MaverickServiceList query and store the results in vuex.
On page load, create a subscription to MaverickService and update the vuex state using the data that comes in from the subscription.
For control you can submit MaverickService mutations with the appropriate name or category argument populated, along with the desired modification.

@SamuelDudley SamuelDudley mentioned this issue Mar 10, 2020
@SamuelDudley
Copy link
Member Author

SamuelDudley commented Mar 11, 2020

@fnoop
re using pystemd

Setup:
sudo apt install libsystemd-dev
pip install pystemd

I have not been able to find a way to attach callbacks to systemd. The following finds and then polls each of the maverick-* services approx once every few seconds (polling faster causes systemd to chew CPU cycles). If the status is changed a callback is ran.
I can start this in a background process from -api and use it to internally mutate the service state.

import time
from pystemd.systemd1 import Manager


class ServiceStatus:
    __slots__ = ["_name", "_running", "_enabled", "update_time", "callback"]
    def __init__(self, name, callback = None):
        self._name = name
        self._running = b"unknown"
        self._enabled = b"unknown"
        self.callback = callback
        self.update_time = time.time()
    
    @property
    def name(self):
        return self._name[:self._name.find(b".service")].decode()
    
    @property
    def running(self):
        if self._running == b'running':
            return True
        elif self._running == b'unknown':
            return None
        else:
            return False

    @property
    def enabled(self):
        if self._enabled == b'enabled':
            return True
        elif self._enabled == b'unknown':
            return None
        else:
            return False

    
    def update_running(self, running = b'unknown'):
        if self._running == running:
            pass
        else:
            self._running = running
            self.update_time = time.time()
            if self.callback:
                self.callback(self)
    
    def update_enabled(self, enabled = b'unknown'):
        if self._enabled == enabled:
            pass
        else:
            self._enabled = enabled
            self.update_time = time.time()
            if self.callback:
                self.callback(self)

    def __eq__(self, other):
        if not isinstance(other, ServiceStatus):
            return False
        return self._name == other._name and self._running == other._running and self._enabled == other._enabled

    def __hash__(self):
        return hash((self._name, self._running, self._enabled))


def list_units():
    with Manager() as manager:
        print("Version", manager.Manager.Version)
        print("Architecture", manager.Manager.Architecture)

        services = {}
        while 1:
            time.sleep(1) # backoff the systemd calls to spare CPU cycles
            for unit_path, enabled_status in manager.Manager.ListUnitFilesByPatterns(["enabled", "disabled"], ["maverick-*"]):
                time.sleep(0.2)
                idx = unit_path.find(b"maverick-")
                if not idx:
                    continue
                
                sevice_name = unit_path[idx:]
                if sevice_name not in services:
                    ss = ServiceStatus(sevice_name, callback=some_callback)
                    ss.update_enabled(enabled=enabled_status)
                    services[sevice_name] = ss
                    continue

                services[sevice_name].update_enabled(enabled=enabled_status)
            
            for unit in manager.Manager.ListUnitsByNames(services.keys()):
                time.sleep(0.2)
                services[unit[0]].update_running(running=unit[4])
                
    

def some_callback(service):
    print(service.name)
    print(f"running:{service.running}")
    print(f"enabled:{service.enabled}")
    print()

list_units()

Any thoughts?

@SamuelDudley
Copy link
Member Author

What the above does not do is get the status of indirect services such as api@dev... Once again, not sure how much better / more robust this would be than having a bunch of process calls going in the background that update the service status periodically.

@fnoop
Copy link
Member

fnoop commented Mar 13, 2020

Do the callbacks in https://github.com/facebookincubator/pystemd/blob/master/examples/monitor_all_units_from_signal.py do us for start/stop service? Does it work for indirect services (apparently called 'template units')?

I would be less bothered about enabled/disabled, as we can set that in config and depend on a maverick run to commit. It would be great to bypass puppet if we can do it directly of course, but doesn't have to be a top priority.

@SamuelDudley
Copy link
Member Author

SamuelDudley commented Mar 14, 2020

Yes they do pick up the changes in start / stop of template units...

This is what I have at the moment:

import pystemd
import asyncio
import tornado.ioloop
import functools
from pprint import pprint

from pystemd.dbuslib import DBus
from pystemd.systemd1 import Unit

unit_data = [
    ("maverick-visiond.service", "visiond"),
    ("maverick-api@dev.service", "api@dev"),
]


def process(msg, error=None, userdata=None):
    msg.process_reply(True)
    print("*" * 80)
    # print(" * New message:\n")
    # pprint({k: v for k, v in msg.headers.items() if v is not None})
    member = msg.headers.get("Member", None)

    if member == b"PropertiesChanged":
        # pprint(msg.body[1])
        unit_path = msg.headers.get("Path", None)
        sub_state = msg.body[1].get(b"SubState", None)
        timestamp_monotonic = msg.body[1].get(b"StateChangeTimestampMonotonic", None)
        if unit_path and sub_state:
            if unit_path in userdata:
                print(userdata[unit_path]["name"])
                print(sub_state)
                print(timestamp_monotonic)
    elif member == b"UnitFilesChanged":
        print("An enable / disable event occured")
    else:
        pass
    print("#" * 80)
    print("\n")


def read_callback(fd, event, **kwargs):
    kwargs["bus"].process()


async def monitor(event, units):
    loop = tornado.ioloop.IOLoop.current()

    with DBus() as bus:
        for unit_data in units.values():
            bus.match_signal(
                unit_data["unit"].destination,
                unit_data["unit"].path,
                b"org.freedesktop.DBus.Properties",
                b"PropertiesChanged",
                process,  # callback for this message
                units,
            )
        bus.match_signal(
            b"org.freedesktop.systemd1",
            None,
            None,
            b"UnitFilesChanged",
            process,  # callback for this message
            units,
        )
        loop.add_handler(
            bus.get_fd(), functools.partial(read_callback, bus=bus), loop.READ
        )
        await event.wait()


loop = tornado.ioloop.IOLoop.current()

units = {}
for unit_service_name, name in unit_data:
    unit = Unit(unit_service_name)
    units[unit.path] = {"unit": unit, "name": name}

event = asyncio.Event()
loop.add_callback(functools.partial(monitor, event, units))
loop.start()

we can use the existing code to populate the following with all the services we care about

unit_data = [
    ("maverick-visiond.service", "visiond"),
    ("maverick-api@dev.service", "api@dev"),
]

This will allow us to update the running state of services without making subprocess calls.

The code above also detects that something has been enabled / disabled in the system, but as I don't have a nice way of knowing what service was changed, we need to check all of the services that we are interested in using the existing subprocess call.

At least then the subprocess calls are limited to a change event (rather than a dumb loop)

@fnoop
Copy link
Member

fnoop commented Mar 14, 2020

Great, that's really cool! So we just need to work out how to automatically populate unit_data somehow. systemd/psystemd must have a way to query a list of all services?
Otherwise we'll have to keep updating -api everytime we add/remove a service, and it won't know about any template unit instances that the user adds (for example the user can add any number of sitl/ros/mavros/-api instances)

@SamuelDudley
Copy link
Member Author

SamuelDudley commented Mar 14, 2020

Currently I'm parsing the directory structure that the maverick status command uses to determine what services to show ~/software/maverick/bin/status.d/ .

If we ignore that I can get all the maverick- services using the Manager interface, but it doesn't list instances of template units e.g. it gives maverick-api@.service but doesn't give us dev or fc.

So we could have maverick update the directory structure when a user adds a sitl/ros/mavros/-api instance... Or keep trying to get them via pystemd

@SamuelDudley
Copy link
Member Author

SamuelDudley commented Mar 15, 2020

okay, we can get all the maverick- services with the following code

    def read_services_dbus(self):
        services = []
        with Manager() as manager:
            for _unit in manager.Manager.ListUnits():
                idx = _unit[0].find(b"maverick-")
                if idx == -1: 
                    continue
                services.append({
                    "unit": Unit(_unit[0]), 
                    "category_name": None,
                    "category_display_name": None,
                    "command": _unit[0][0:-8].decode(),
                    "service_name": _unit[0][9:-8].decode(),
                    "service_display_name": _unit[1].decode(),
                    "category_index": None,
                    "service_index": None,
                    "last_update": None,
                    "enabled": None,
                    "running": True if _unit[4] == b'running' else False,
                })
        for service in services:
            application_log.debug(f"Adding service: {service}")

I'm re-writing the services module to use the dbus code. I wont have time to work on it much this week.

@SamuelDudley
Copy link
Member Author

dbus code is in... https://github.com/goodrobots/maverick-api/blob/master/maverick_api/modules/api/maverick/maverick_service.py

note that the hot reload of the api server causes the dbus code to stop watching for events... need to add custom logic to unload this module

@SamuelDudley
Copy link
Member Author

need to support systems without dbus

[I 2020-03-19,11:04:31 /home/nova/code/maverick-api/maverick_api/modules/__init__.py:138]
    Appending schema from maverick_api.modules.api.maverick.MaverickServiceSchema
Traceback (most recent call last):
  File "maverick-api.py", line 41, in <module>
    api.initialize()
  File "/home/nova/code/maverick-api/maverick_api/modules/base/apiserver/__init__.py", line 47, in initialize
    generate_schema()
  File "/home/nova/code/maverick-api/maverick_api/modules/__init__.py", line 139, in generate_schema
    _module_schema[ref_name] = attribute()
  File "/home/nova/code/maverick-api/maverick_api/modules/api/maverick/maverick_service.py", line 87, in __init__
    self.services = self.read_services_dbus()
  File "/home/nova/code/maverick-api/maverick_api/modules/api/maverick/maverick_service.py", line 170, in read_services_dbus
    with Manager() as manager:
  File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 31, in __enter__
    self.load()
  File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 89, in load
    unit_xml = self.get_introspect_xml()
  File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 73, in get_introspect_xml
    with self.bus_context() as bus:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 64, in bus_context
    bus.open()
  File "pystemd/dbuslib.pyx", line 303, in pystemd.dbuslib.DBus.open
pystemd.dbusexc.DBusFileNotFoundError: [err -2]: Could not open a bus to DBus

@fnoop
Copy link
Member

fnoop commented Mar 19, 2020

What system is that? When I did a bit of research it appeared that dbus is basically a dependency of systemd, except in unusual circumstances.

@SamuelDudley
Copy link
Member Author

SamuelDudley commented Mar 19, 2020

windows subsystem for linux (I use it for testing)

@fnoop
Copy link
Member

fnoop commented Mar 19, 2020

Ah never ever considered running it on windows :)
This indicates dbus is present, or at least possible:
https://www.novaspirit.com/2016/08/19/ubuntu-bash-windows-10-dbus-error-fix/

@SamuelDudley
Copy link
Member Author

SamuelDudley commented Mar 28, 2020

There is a development configuration option that needs to be set to True, otherwise graphiql returns a 403

https://github.com/goodrobots/maverick-api/blob/master/maverick_api/config/maverick-api.conf.sample#L26

@fnoop
Copy link
Member

fnoop commented Mar 28, 2020

Stopping (or starting) a single service a single time, we get 2 subscription updates:

Screenshot 2020-03-28 at 16 47 10

@SamuelDudley
Copy link
Member Author

That's because I'm treating all dbus statuses that are not 'running' as 'dead'. Transitional states between dead an running will cause a subscription update but the service will be reported as dead.
If the additional subscription updates are an issue we can:

  • 1 report the exact states in the GUI (and let the GUI 'parse' them)
  • 2 I can tighten up the matching logic to only report dead and running

@fnoop
Copy link
Member

fnoop commented Mar 28, 2020

What are the transitional states? If we can't see them in the message and they're not useful to the front end then ideally we shouldn't see them, just the final change to live, or a final change to dead.

@SamuelDudley
Copy link
Member Author

I can't recall the exact names but they are effectively reporting starting and stopping states. My concern was that unless the service is in that final state of running, for all purposes it's still dead and hence I was reporting it as such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants