Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

configdata.yml performance issues / switch away from PyYAML #2777

Open
The-Compiler opened this issue Jul 4, 2017 · 19 comments
Open

configdata.yml performance issues / switch away from PyYAML #2777

The-Compiler opened this issue Jul 4, 2017 · 19 comments

Comments

@The-Compiler
Copy link
Member

@The-Compiler The-Compiler commented Jul 4, 2017

With the current new-config branch, test_init_benchmark in test_configdata.py takes (median):

  • ~20ms with the PyYAML C extension on my machine
  • ~200ms without the C extension on my machine
  • ~20s (?!) on Travis (without C extension, I think)

Even if this is a travis only thing, 200ms+ on each start is quite a hard hit to take. I can think of a couple of solutions:

  • Try if something improves with ruamel.yaml (#2745)
  • Log a warning if C extensions weren't found (where are they available? Where not? Can we install a wheel from PyPI with them?)
  • "Compile" the YAML to a Python file at development time. Since we already need to regenerate the docs anyways, probably not a too big issue, and easiest for users
  • "Compile" the YAML to a Python (or JSON, if loading that is fast?) file at the first start. That needs some code to detect when configdata.yml has changed and stuff though.
  • "Compile" the YAML to a Python file when packaging, similar to asciidoc2html.py. If it's unavailable, load the yml file and show warnings as per above. Kind of difficult to figure out whether data is outdated or not.
@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Oct 4, 2017

This is still quite an issue, and I don't know of a solution as long as we're keeping the YAML file.

However, I think .ini files could actually work quite nicely for this purpose!

[bindings.key_mappings]
default = {
    "<Ctrl-[>": "<Escape>",
    "<Ctrl-6>": "<Ctrl-^>",
    "<Ctrl-M>": "<Return>",
    "<Ctrl-J>": "<Return>",
    "<Shift-Return>": "<Return>",
    "<Enter>": "<Return>",
    "<Shift-Enter>": "<Return>",
    "<Ctrl-Enter>": "<Ctrl-Return>"
  }
type_name = Dict
type_keytype = Key
type_valtype = Key
desc = This setting can be used to map keys to other keys.

    When the key used as dictionary-key is pressed, the binding for the key used
    as dictionary-value is invoked instead.

    This is useful for global remappings of keys, for example to map Ctrl-[ to
    Escape.

    Note that when a key is bound (via `bindings.default` or
    `bindings.commands`), the mapping is ignored.
>>> import configparser
>>> cp = configparser.ConfigParser()
>>> cp.read('configdata.ini')
['configdata.ini']
>>> import json
>>> json.loads(cp['bindings.key_mappings']['default'])
{'<Ctrl-[>': '<Escape>', '<Ctrl-6>': '<Ctrl-^>', '<Ctrl-M>': '<Return>', '<Ctrl-J>': '<Return>', '<Shift-Return>': '<Return>', '<Enter>': '<Return>', '<Shift-Enter>': '<Return>', '<Ctrl-Enter>': '<Ctrl-Return>'}
>>> print(cp['bindings.key_mappings']['desc'])
This setting can be used to map keys to other keys.

When the key used as dictionary-key is pressed, the binding for the key used
as dictionary-value is invoked instead.

This is useful for global remappings of keys, for example to map Ctrl-[ to
Escape.

Note that when a key is bound (via `bindings.default` or
`bindings.commands`), the mapping is ignored.

It requires a bit more parsing on our side (decoding default as JSON, and handling type_* keys), but it might be the least worst solution and still is easy to write.

@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Oct 10, 2017

I still hate the idea of making this an .ini (or .json or .py), I want to dig a bit deeper first.

Looks like the following environments have C extensions:

  • PyPI-installs on Travis
  • Archlinux docker containers (fixed by installing gcc and libyaml)

And the following don't:

  • macOS on Travis
  • AppVeyor

However, they took <3s to load the configdata.yml. It's still a lot, but miles away from the 20s I saw earlier. It looks like they're loading configdata more often than needed, though.

The-Compiler added a commit to qutebrowser/docker-ci that referenced this issue Oct 11, 2017
The-Compiler added a commit that referenced this issue Oct 11, 2017
See #2777
The-Compiler added a commit that referenced this issue Oct 11, 2017
The-Compiler added a commit that referenced this issue Oct 11, 2017
The-Compiler added a commit that referenced this issue Oct 11, 2017
The-Compiler added a commit that referenced this issue Oct 11, 2017
The-Compiler added a commit that referenced this issue Oct 11, 2017
See #2777
@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Oct 11, 2017

The master branch now shows a warning when it take longer than 1s (or 3s on the CI), pointing to this issue. Let's leave it at that for v1.0.0 for now and see if it's a problem in practice.

Also, all environments on Travis now have C extensions for PyYAML available.

@The-Compiler The-Compiler removed this from the v1.0.0 milestone Oct 11, 2017
@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Oct 30, 2017

In #3177 (comment) apparently @lejar has seen this while running the testsuite.

@lejar Do you see this when starting qutebrowser as well, and does it actually take 14+ seconds? What if you start it via ./.tox/py36/bin/python -m qutebrowser --temp-basedir?

@lejar
Copy link
Contributor

@lejar lejar commented Oct 30, 2017

@The-Compiler Nope when I run it normally it is pretty snappy (start < 2 sec). Do note though that I've been doing all of this in a vmware image of ubuntu on a windows 10 host, with the vm being stored on a 7200 rpm hdd (though I'd hope that most of what the tests need is already in ram). It has 4 cores and 4gb ram.

I ran the tests again with htop open and by the time the tests get to the yaml errors, it's using about 13.5% ram and 100% of one core. On the windows (host) side I'm monitoring disk IO but it is really minimal.

One thing I did notice is that the reported failure duration increases with every failure:

Failed: Got logging message on logger misc with level WARNING: YAML load took unusually long, please report this at https://github.com/qutebrowser/qutebrowser/issues/2777 duration: 2.535158s
5.506342s
8.392097s
11.435129s
14.344682s
17.371652s
20.29147s
@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Oct 30, 2017

Hm, that is a very interesting observation! Can you please pastebin the full output you get (e.g. via tox -e py36 -- --qute-bdd-webengine 2>&1 | tee tox.log or so)?

@lejar
Copy link
Contributor

@lejar lejar commented Oct 31, 2017

I killed it after a couple hours. tox.log

@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Nov 3, 2017

@lejar I've taken a first look at the log but I haven't figured out what's going on yet. I plan to get back to it somewhen this weekend though.

@seelaman
Copy link
Contributor

@seelaman seelaman commented Dec 28, 2017

same issue

Ubuntu 17.10, fresh install with tox

$ git rev-parse HEAD
2ef6e74

tox.log

@mjauvin
Copy link
Contributor

@mjauvin mjauvin commented Jan 26, 2018

I got this warning:

E       Failed: Got logging message on logger misc with level WARNING: YAML load took unusually long, please report this at https://github.com/qutebrowser/qutebrowser/issues/2777
E       duration: 70.800697s
E       PyYAML version: 3.12
E       C extension: False
E       Stack:
E       
E         File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
E           "__main__", mod_spec)
E         File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
E           exec(code, run_globals)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pytest.py", line 73, in <module>
E           raise SystemExit(pytest.main())
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/config.py", line 59, in main
E           return config.hook.pytest_cmdline_main(config=config)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 617, in __call__
E           return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 222, in _hookexec
E           return self._inner_hookexec(hook, methods, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 216, in <lambda>
E           firstresult=hook.spec_opts.get('firstresult'),
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/callers.py", line 180, in _multicall
E           res = hook_impl.function(*args)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/main.py", line 134, in pytest_cmdline_main
E           return wrap_session(config, _main)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/main.py", line 103, in wrap_session
E           session.exitstatus = doit(config, session) or 0
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/main.py", line 141, in _main
E           config.hook.pytest_runtestloop(session=session)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 617, in __call__
E           return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 222, in _hookexec
E           return self._inner_hookexec(hook, methods, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 216, in <lambda>
E           firstresult=hook.spec_opts.get('firstresult'),
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/callers.py", line 180, in _multicall
E           res = hook_impl.function(*args)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/main.py", line 164, in pytest_runtestloop
E           item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 617, in __call__
E           return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 222, in _hookexec
E           return self._inner_hookexec(hook, methods, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 216, in <lambda>
E           firstresult=hook.spec_opts.get('firstresult'),
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/callers.py", line 180, in _multicall
E           res = hook_impl.function(*args)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/runner.py", line 62, in pytest_runtest_protocol
E           runtestprotocol(item, nextitem=nextitem)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/runner.py", line 70, in runtestprotocol
E           rep = call_and_report(item, "setup", log)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/runner.py", line 155, in call_and_report
E           call = call_runtest_hook(item, when, **kwds)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/runner.py", line 175, in call_runtest_hook
E           return CallInfo(lambda: ihook(item=item, **kwds), when=when)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/runner.py", line 189, in __init__
E           self.result = func()
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/runner.py", line 175, in <lambda>
E           return CallInfo(lambda: ihook(item=item, **kwds), when=when)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 617, in __call__
E           return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 222, in _hookexec
E           return self._inner_hookexec(hook, methods, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 216, in <lambda>
E           firstresult=hook.spec_opts.get('firstresult'),
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/callers.py", line 180, in _multicall
E           res = hook_impl.function(*args)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/runner.py", line 100, in pytest_runtest_setup
E           item.session._setupstate.prepare(item)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/runner.py", line 493, in prepare
E           col.setup()
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/python.py", line 1175, in setup
E           fixtures.fillfixtures(self)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/fixtures.py", line 242, in fillfixtures
E           request._fillfixtures()
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/fixtures.py", line 385, in _fillfixtures
E           item.funcargs[argname] = self.getfixturevalue(argname)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/fixtures.py", line 427, in getfixturevalue
E           return self._get_active_fixturedef(argname).cached_result[0]
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/fixtures.py", line 453, in _get_active_fixturedef
E           result = self._getfixturevalue(fixturedef)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/fixtures.py", line 518, in _getfixturevalue
E           val = fixturedef.execute(request=subrequest)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/fixtures.py", line 791, in execute
E           return hook.pytest_fixture_setup(fixturedef=self, request=request)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 617, in __call__
E           return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 222, in _hookexec
E           return self._inner_hookexec(hook, methods, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/__init__.py", line 216, in <lambda>
E           firstresult=hook.spec_opts.get('firstresult'),
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/pluggy/callers.py", line 180, in _multicall
E           res = hook_impl.function(*args)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/fixtures.py", line 822, in pytest_fixture_setup
E           result = call_fixture_func(fixturefunc, request, kwargs)
E         File "/home/marc/src/qutebrowser/.tox/py36-pyqt59/lib/python3.6/site-packages/_pytest/fixtures.py", line 714, in call_fixture_func
E           res = fixturefunc(**kwargs)
E         File "/home/marc/src/qutebrowser/tests/unit/config/test_config.py", line 38, in configdata_init
E           configdata.init()
E         File "/home/marc/src/qutebrowser/qutebrowser/config/configdata.py", line 246, in init
E           DATA, MIGRATIONS = _read_yaml(utils.read_file('config/configdata.yml'))
E         File "/home/marc/src/qutebrowser/qutebrowser/config/configdata.py", line 193, in _read_yaml
E           data = utils.yaml_load(yaml_data)
E         File "/home/marc/src/qutebrowser/qutebrowser/utils/utils.py", line 902, in yaml_load
E           ''.join(traceback.format_stack())))
E       !

tests/helpers/logfail.py:61: Failed

edited by @The-Compiler to add proper code tags

@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Feb 10, 2018

I think I finally tracked this down. It only affects the testsuite and shouldn't have affected real usage.

The-Compiler added a commit that referenced this issue Feb 10, 2018
See #2777
@hjek
Copy link

@hjek hjek commented Jul 20, 2018

Got a message telling me to report this issue here. Running Qutebrowser 1.3.3 on FreeBSD.

23:57:51 WARNING: YAML load took unusually long, please report this at https://github.com/qutebrowser/qutebrowser/issues/2777                                                                              
duration: 5.25404s                                                                                    PyYAML version: 3.12                                                                                 
C extension: False                                                                                    Stack:                                                                                               
                                                                                                        File "/usr/local/bin/qutebrowser", line 11, in <module>                                            
    load_entry_point('qutebrowser==1.3.3', 'gui_scripts', 'qutebrowser')()                              File "/usr/local/lib/python3.6/site-packages/qutebrowser/qutebrowser.py", line 188, in main        
    return app.run(args)                                                                                File "/usr/local/lib/python3.6/site-packages/qutebrowser/app.py", line 101, in run                 
    configinit.early_init(args)                                                                         File "/usr/local/lib/python3.6/site-packages/qutebrowser/config/configinit.py", line 40, in early_in
it                                                                                                        configdata.init()                                                                                
  File "/usr/local/lib/python3.6/site-packages/qutebrowser/config/configdata.py", line 257, in init       DATA, MIGRATIONS = _read_yaml(utils.read_file('config/configdata.yml'))                          
  File "/usr/local/lib/python3.6/site-packages/qutebrowser/config/configdata.py", line 200, in _read_yaml                                                                                                  
    data = utils.yaml_load(yaml_data)                                                                   File "/usr/local/lib/python3.6/site-packages/qutebrowser/utils/utils.py", line 657, in yaml_load   
    ''.join(traceback.format_stack()))) 
@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Aug 17, 2018

Reopening as this is still an issue for some people. Probably time to switch to TOML or JSON. Maybe we might want to get rid of PyYAML entirely seeing that it repeatedly had issues with its maintainance so far.

@The-Compiler The-Compiler reopened this Aug 17, 2018
@The-Compiler The-Compiler changed the title configdata.yml performance issues configdata.yml performance issues / switch away from PyYAML Aug 17, 2018
@jgkamat
Copy link
Member

@jgkamat jgkamat commented Jan 30, 2019

I would vote for json as well, if we pass some flags before dumping it to a file, it should be as easy to read in plaintext as well. With it being in core, we can count on it being fast everywhere (I hope...).

@hjek way too late for a response, but if you change build options to build with --with-libyaml, the problem should go away for you. I'm guessing that's PORT_OPTIONS:MLIBYAML.

@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Feb 17, 2019

Dumping some more thoughts here.

  • PyYAML maintenance still is quite inactive, see e.g. yaml/pyyaml#181. Last commit June 2018.
  • Even YAML as a format really sucks
  • Alternatives
    • For configdata: See above
      • crazy idea: Why not XML? 😆
    • For sessions/etc.: JSON is probably fine
    • There's commentjson for Python, and we could probably easily strip out comments with a regex anyways (especially when not allowing inline comments).
    • JSON5 would be nice, but doesn't seem to be commonly packaged yet.
    • TOML might be nice, not sure about popularity.
  • We'll still need a YAML parser for a while for migrations (e.g. of session files), but maybe that could be ruamel.yaml? We didn't switch to it in #2745 because of performance issues, but those are likely not a problem for anything but configdata.yml.
@jgkamat
Copy link
Member

@jgkamat jgkamat commented Feb 17, 2019

@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Feb 18, 2019

Maybe we should try profiling both of them against the session/configdata.yml
files and see which one performs better, since they are both noticeable in
overall qutebrowser profiles.

I quickly talked to @jgkamat in IRC about this - I agree for configdata.yml performance is quite important (as it's rather big), but for the sessions, that's just the rather broken autosave. With #4545/#4087 and #3918 that shouldn't really matter anymore.

@zabbal
Copy link

@zabbal zabbal commented Jul 10, 2020

TOML might be nice, not sure about popularity

It's used for systemd units so it's almost impossible to find GNU/Linux machine nowadays without some TOML on it. Given that python seems to be converging on pyproject.toml it'll become even more commonly used in the near future.

@The-Compiler
Copy link
Member Author

@The-Compiler The-Compiler commented Jul 11, 2020

@zabbal systemd uses something ini-like which is not TOML. Also, I was mainly talking about the toml Python library above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.