Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow startup of new instances. #830

Closed
VGrol opened this issue Aug 27, 2018 · 11 comments
Closed

Slow startup of new instances. #830

VGrol opened this issue Aug 27, 2018 · 11 comments

Comments

@VGrol
Copy link

VGrol commented Aug 27, 2018

Kitty seems to have a really slow start up on some systems, where as it is unaffected by this on others,
does not appear to be related to slow hardware.

I have a salvaged two core Baytrail system. that pulls acceptable start up time.
The system that I experience this under however runs the following:

Z170-A
I7-6700K
GTX 1050

Arch Linux - 4.18.4
I3-gaps - 4.150
Closed source NVIDIA drivers. - 396.54-2
I figured I'd be notable that I'm also running the Intel firmware init.

Now the unaffected system mirrors this software configuration except for the fact that it runs an integrated CPU and thus utilizes the mesa drivers instead of the NVIDIA ones.
Utilizing the -1 parameter bypasses the slow start up, as it hooks into an existing process.

Testing was conducted via executing time kitty fish -c exit: with a potential -1
I default to the fish shell, but replacing it with sh or zsh has identical timing.

Unaffected Baytrail system timings.

Normal
0.38user 0.10system 0:00.53elapsed 93%CPU
0inputs+136outputs (0major+7867minor)pagefaults 0swaps

-1
0.22user 0.02system 0:00.25elapsed 98%CPU
0inputs+0outputs (0major+2244minor)pagefaults 0swaps

Now the affected system

Normal
0.15user 0.04system 0:01.63elapsed 12%CPU (0avgtext+0avgdata 51816maxresident)k
0inputs+8outputs (0major+9193minor)pagefaults 0swaps

-1
0.05user 0.00system 0:00.06elapsed 98%CPU (0avgtext+0avgdata 17768maxresident)k
0inputs+0outputs (0major+2253minor)pagefaults 0swaps

It seems that CPU utilization is throwing some kind of wrench in the loop, but I cannot say.

@kovidgoyal
Copy link
Owner

The most obvious thing to check first is the GPU drivers. Temporarily switch to using the open source noveau (i think its called) driver and see if it makes a difference.

The next step is to instrument kitty code to see where the slowdown is occuring. Notable startup code is in main.py (the _run_app function) and glfw.c (the create_os_window() function). Stick some timing prints into those and see where the slowdown is happening.

@VGrol
Copy link
Author

VGrol commented Aug 27, 2018

The most obvious thing to check first is the GPU drivers. Temporarily switch to using the open source noveau (i think its called) driver and see if it makes a difference.

Nouveau was tested and it resulted in no difference.
Since the CPU utilization seems to be the problem I'm going to guess it might have something to do with Intel-firmware and if that is the case, you can safely consider fixing this a lost cause.
So lets hope it isn't.

The next step is to instrument kitty code to see where the slowdown is occuring. Notable startup code is in main.py (the _run_app function) and glfw.c (the create_os_window() function). Stick some timing prints into those and see where the slowdown is happening.

It's been a while since I've dabbled in Python, would you care to show me what to insert?

@kovidgoyal
Copy link
Owner

from time import monotonic
print(monotonic(), 'xxx')

insert that in a few places in the _run_app function to see where the time is being spent. I'm going to guess it is in create_os_window() which you will find in glfw.c

@VGrol
Copy link
Author

VGrol commented Aug 27, 2018

Naming might be slightly off, but I've done my best to document them.

python 3 . -1

6353.968618329 post import
6353.96863208 post load_all_shaders
6353.968638318 post _run_app
6353.971400743 init talk_to_instance
6353.971448085 try single_instance.socket

python3 .

6670.208323149 post import
6670.208340215 post load_all_shaders
6670.208348032 post _run_app
6670.20874988 init init_glfw
6670.208759927 post glfw_module definition init_glfw
6671.609475208 pre return glfw_module init_glfw
6671.611280359 init setup_environment
6671.611455396 post setup_environment
6671.624444976 init _run_app
6671.624471288 init get_new_os_window_trigger
6671.624482386 post Ibeam, pre load_all_shaders.cursor_text_color _run_app
6671.884769013 init load_all_shaders
6671.886944326 mid load_all_shaders
6671.898920765 init _run_app>if-not is_wayland
6671.903956884 init try boss.child_monitor _run_app

The major hang up seems to be from the init_glfw, specifically before the module returns.
More specifically, the if not glfw_init(lfw_path(glfw_module), debug_keyboard) seems to be right in between the slowdown, as the post glfw_module definition init_glfw comes before it,
where as pre return glfw_module init_glfw is right after it.

Edit: I figured that it'd probably be clearer if I linked the modified code, so it becomes much clearer what executes where.

def init_glfw(debug_keyboard=False):
    print(monotonic(), 'init init_glfw')
    glfw_module = 'cocoa' if is_macos else ('wayland' if is_wayland else 'x11')
    print(monotonic(), 'post glfw_module definition init_glfw')
    if not glfw_init(glfw_path(glfw_module), debug_keyboard):
        raise SystemExit('GLFW initialization failed')
        print(monotonic(), 'post if-not glfw_init init_glfw')
    print(monotonic(), 'pre return glfw_module init_glfw')
    return glfw_module
    print(monotonic(), 'post init_glfw'))

@kovidgoyal
Copy link
Owner

glfw_init is in glfw.c line 588. There you will find the slow function is glfwInit() which initialises the glfw toolkit that kitty uses. You can trace that further in init.c where you will probably find the slowdown in _glfwPlatformInit() which will be in x11_init.c (assuming you are on an X11 system)

@kovidgoyal
Copy link
Owner

Also, I note that compiling the OpenGL shaders (load_all_shaders()) seems to be very slow on your system, it takes 0.26 s. Considering that it only has to compile ~ 300 lines of shader code, that is insane.
On my system it takes 0.005 seconds.

@kovidgoyal
Copy link
Owner

Or maybe I am misreading your trace, the lines of concern are:

6671.624482386 post Ibeam, pre load_all_shaders.cursor_text_color _run_app
6671.884769013 init load_all_shaders

Not sure exactly what those correspond to.

@VGrol
Copy link
Author

VGrol commented Aug 28, 2018

Or maybe I am misreading your trace

Let me just plop code associated with those parameters below.


def load_all_shaders(semi_transparent=0):
    print(monotonic(), 'init load_all_shaders')
    load_shader_programs(semi_transparent, load_all_shaders.cursor_text_color)
    load_borders_program()
    print(monotonic(), 'mid load_all_shaders')

and


def _run_app(opts, args):
    print(monotonic(), 'init _run_app')
    new_os_window_trigger = get_new_os_window_trigger(opts)
    if False and is_macos:
        # This is disabled because using custom cursors fails
        # on dual GPU machines: https://github.com/kovidgoyal/kitty/issues/794
        set_custom_ibeam_cursor()
    print(monotonic(), 'post Ibeam, pre load_all_shaders.cursor_text_color _run_app')
    load_all_shaders.cursor_text_color = opts.cursor_text_color
    with cached_values_for(run_app.cached_values_name) as cached_values:
        with startup_notification_handler(extra_callback=run_app.first_window_callback) as pre_show_callback:
            window_id = create_os_window(
                    run_app.initial_window_size_func(opts, cached_values),
                    pre_show_callback,
                    appname, args.name or args.cls or appname,
                    args.cls or appname, load_all_shaders)
        if not is_wayland and not is_macos:  # no window icons on wayland
            with open(logo_data_file, 'rb') as f:
                print(monotonic(), 'init _run_app>if-not is_wayland')
                set_default_window_icon(f.read(), 256, 256)
        boss = Boss(window_id, opts, args, cached_values, new_os_window_trigger)
        boss.start()
        try:
            print(monotonic(), 'init try boss.child_monitor _run_app')
            boss.child_monitor.main_loop()
            print(monotonic(), 'init try boss.child_monitor _run_app')
        finally:
            print(monotonic(), 'init finally boss.destroy _run_app')
            boss.destroy()
            print(monotonic(), 'post finally boss.destroy _run_app')
print(monotonic(), 'post _run_app')

glfw_init is in glfw.c line 588. There you will find the slow function is glfwInit() which initialises the glfw toolkit that kitty uses. You can trace that further in init.c where you will probably find the slowdown in _glfwPlatformInit() which will be in x11_init.c (assuming you are on an X11 system)

I'll investigate this somewhere in the next few days.
You would also be correct in assuming X11, considering Nvidia still doesn't support Wayland. (outside of gnomem)

@kovidgoyal
Copy link
Owner

OK so its not the shader compilation it is somethin in create_os_window as I initially suspected. You can find that in glfw.c

@VGrol
Copy link
Author

VGrol commented Aug 30, 2018

Line 588 in glfw.c

PyObject*
glfw_init(PyObject UNUSED *self, PyObject *args) {
    clock_t initglfwinit = clock();
    const char* path;
    int debug_keyboard = 0;
    if (!PyArg_ParseTuple(args, "s|p", &path, &debug_keyboard)) return NULL;
    const char* err = load_glfw(path);
    if (err) { PyErr_SetString(PyExc_RuntimeError, err); return NULL; }
    glfwSetErrorCallback(error_callback);
    glfwInitHint(GLFW_DEBUG_KEYBOARD, debug_keyboard);
    global_state.opts.debug_keyboard = debug_keyboard != 0;
#ifdef __APPLE__
    glfwInitHint(GLFW_COCOA_CHDIR_RESOURCES, 0);
    glfwInitHint(GLFW_COCOA_MENUBAR, 0);
#endif
    PyObject *ans = glfwInit() ? Py_True: Py_False;
    if (ans == Py_True) {
        OSWindow w = {0};
        set_os_window_dpi(&w);
        global_state.default_dpi.x = w.logical_dpi_x;
        global_state.default_dpi.y = w.logical_dpi_y;
    }
    Py_INCREF(ans);
    clock_t postglfwinit = clock();
    printf("GLFW.c l588 glfw_init, init through post: Elapsed: %f seconds\n", (double)(initglfwinit - postglfwinit) / CLOCKS_PER_SEC);
    return ans;
}       

17436.690079278 post import
17436.690093246 post load_all_shaders
17436.690098353 post _run_app
17436.690469583 init init_glfw
17436.690479242 post glfw_module definition init_glfw
GLFW.c l588 glfw_init, init through post: Elapsed: -0.011814 seconds
17437.870943571 pre return glfw_module init_glfw
17437.872857656 init setup_environment
17437.872995816 post setup_environment
17437.885006948 init _run_app

Returns seem normal, Will delve into the x11_init.c next.

@VGrol
Copy link
Author

VGrol commented Aug 30, 2018

21282.956007498 post import
21282.956020632 post load_all_shaders
21282.95602483 post _run_app
21282.956352217 init init_glfw
21282.956361505 post glfw_module definition init_glfw
x11_init.c l153 _glfwPlatformInit, init through post: Elapsed: -0.006364 seconds
x11_init.c l512 createHiddenCursor, init through post: Elapsed: -0.000001 seconds
x11_init.c l612 _glfwPlatformInit, init through post: Elapsed: -0.008986 seconds
GLFW.c l200 glfwInit, init through post: Elapsed: -0.011605 seconds
GLFW.c l588 glfw_init, init through post: Elapsed: -0.012374 seconds
21284.334419186 pre return glfw_module init_glfw
21284.336204432 init setup_environment
21284.33635971 post setup_environment

Code for the timed module highlighted

int _glfwPlatformInit(void)
{
    clock_t initglfwinit4 = clock();
    XInitThreads();
    XrmInitialize();

    if (pipe2(_glfw.x11.eventLoopData.wakeupFds, O_CLOEXEC | O_NONBLOCK) != 0)
    {   
        _glfwInputError(GLFW_PLATFORM_ERROR,
                "X11: failed to create self pipe");
        return GLFW_FALSE;
    }

    _glfw.x11.display = XOpenDisplay(NULL);
    if (!_glfw.x11.display)
    {
        const char* display = getenv("DISPLAY");
        if (display)
        {
            _glfwInputError(GLFW_PLATFORM_ERROR,
                            "X11: Failed to open display %s", display);
        }
        else
        {
            _glfwInputError(GLFW_PLATFORM_ERROR,
                            "X11: The DISPLAY environment variable is missing");
        }
        
        return GLFW_FALSE;
    }

    initPollData(&_glfw.x11.eventLoopData, _glfw.x11.eventLoopData.wakeupFds[0], ConnectionNumber(_glfw.x11.display));
    glfw_dbus_init(&_glfw.x11.dbus, &_glfw.x11.eventLoopData);

    _glfw.x11.screen = DefaultScreen(_glfw.x11.display);
    _glfw.x11.root = RootWindow(_glfw.x11.display, _glfw.x11.screen);
    _glfw.x11.context = XUniqueContext();
    _glfw.x11.RESOURCE_MANAGER = XInternAtom(_glfw.x11.display, "RESOURCE_MANAGER", True);
    XSelectInput(_glfw.x11.display, _glfw.x11.root, PropertyChangeMask);

    _glfwGetSystemContentScaleX11(&_glfw.x11.contentScaleX, &_glfw.x11.contentScaleY, GLFW_FALSE);

    if (!initExtensions())
        return GLFW_FALSE;

    _glfw.x11.helperWindowHandle = createHelperWindow();
    _glfw.x11.hiddenCursorHandle = createHiddenCursor();

#if defined(__linux__)
    if (!_glfwInitJoysticksLinux())
        return GLFW_FALSE;
    if (_glfw.linjs.inotify > 0)
        addWatch(&_glfw.x11.eventLoopData, "joystick", _glfw.linjs.inotify, POLLIN, 1, NULL, NULL);
#endif

    _glfwInitTimerPOSIX();

    _glfwPollMonitorsX11(); 
    clock_t postglfwinit4 = clock();    printf("x11_init.c l612 _glfwPlatformInit, init through post: Elapsed: %f seconds\n", (double)(initglfwinit4 - postglfwinit4) / CLOCKS_PER_SEC);
    return GLFW_TRUE;
}

The highlighted statement gets posted right after the slow down, but the time calculation seems acceptable, so I went ahead and moved the timer up, upto the point where it got posted before the slowdown.

#if defined(__linux__)
     if (!_glfwInitJoysticksLinux())
        return GLFW_FALSE;
     if (_glfw.linjs.inotify > 0)
        addWatch(&_glfw.x11.eventLoopData, "joystick", _glfw.linjs.inotify, POLLIN, 1, NULL, NULL);
#endif

Seems to be the problem.
It can be found under the x11_init.c file, at approximately line ~670.

I figured I'd try removing those lines all together and it seems to fix the slow startup.
While this entire section can be removed, this seems to be the offender.

    if (!_glfwInitJoysticksLinux())
         return GLFW_FALSE;

0.11user 0.04system 0:00.21elapsed 70%CPU (0avgtext+0avgdata 50900maxresident)k
0inputs+8outputs (0major+8975minor)pagefaults 0swaps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants