
Optimize getting relative page URLs #2272

Closed · wants to merge 5 commits

Conversation

@oprypin (Contributor) commented Dec 27, 2020

This is a custom implementation that's ~5 times faster. That matters because calls to `normalize_url` on a site with ~300 pages currently take up ~20% of the total run time, due to the sheer number of them: the number of calls is at least the number of pages squared.

The old implementation relied on `posixpath.relpath`, which (among other unnecessary work) first expands every path with `abspath` for whatever reason, so it actually becomes even slower if you happen to be building under a deeply nested path.
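As a quick illustration (not code from the patch) of why `posixpath.relpath` drags the filesystem into what should be purely virtual URL math: it resolves both arguments through `abspath`, i.e. against the current working directory.

```python
import os
import posixpath

# relpath() calls abspath() on both arguments, so even purely virtual
# site URLs are resolved against the current working directory.
print(posixpath.relpath('foo/bar.html', 'foo/'))  # bar.html

# The cwd dependency becomes visible as soon as the start path escapes it:
# every component of the real cwd is joined, split, and compared again,
# which is also why deep build directories make it slower.
print(posixpath.relpath('x', '/'))  # e.g. 'tmp/x' when running from /tmp
```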

Profile before/after: [profiler screenshot]

In context: [profiler screenshot]

@waylan (Member) commented Dec 27, 2020

So this change removes calls to the standard lib and replaces them with your own custom implementation. I realize that there may be some standard lib code that we don't need, but the assumption is that the standard lib code is well tested and accurate. Your custom code is an unknown. I suspect the existing tests don't cover nearly enough to be sure this doesn't break various edge cases (admittedly I didn't check). We will always go with accuracy over performance. If you want this accepted, then you will need to demonstrate that this is very well tested.

By the way, you have been providing a lot of performance patches lately. I don't object to that; in fact, I appreciate that someone is looking at performance, as I don't have any time to dedicate to it myself. But reviewing all these patches, which don't improve the project in any perceivable way (from my perspective), just adds more burden on me. Therefore, from my perspective you need to demonstrate that you aren't breaking anything. Your comments always mention how much of a performance improvement each change makes, but I don't care much about that. The largest site I have is less than 20 pages, and I have the perception that my sites build instantly on my very old machine. Personally I see no need for performance improvements. What I care about is ease of maintenance and accuracy. If you would like me to continue to spend time reviewing your performance improvements, please focus on those two factors first. Any performance improvements will always be second to those in my reviews.

As an example, if the proposed change in this PR were to result in a future bug report, my "fix" would simply be to revert to the previous implementation. I'm not interested in spending the time to work through the custom code to fix various edge cases. That is more of a maintenance burden than I want to take on. Therefore, before I am willing to accept it, I need to be sure that those bug reports aren't likely to happen in the future.

@oprypin (Contributor, Author) commented Dec 27, 2020

the assumption is that the standard lib code is well tested and accurate

Well, generally, it's tested for file system paths relative to the current directory, not something virtual...
I think this argument should not be used like that; stdlib or not makes no difference. But sure, you can say this particular usage of it is tested by time.
Anyway, that's just a pointless comment by me.


you will need to demonstrate that this is very well tested

So this function certainly supports less functionality than the old one, for example collapsing foo/.. into nothing (and, well, the quite funny functionality of resolving paths relative to the current directory...). But it's easy to see that that functionality is not needed.

It's slightly unclear how to declare exhaustiveness, but the main idea is proving that inputs to it can only be some particular subset of strings, and then perhaps fuzz it within those constraints.


If you would like me to continue to spend time reviewing your performance improvements

Well, don't worry much about that. I intended to have stopped already, but this one was just such a big realization (as was realizing that stdlib usage isn't exempt from scrutiny).

I'm not interested in spending the time to work through the custom code to fix various edge cases

I can promise a 24-hour response time with a fix :D

@waylan (Member) commented Dec 28, 2020

If you would like this merged, then you will need to either point to an existing set of tests (I haven't taken the time to check) or, if they don't exist, provide a comprehensive set. My guess is that you will only find limited coverage and will need to add more tests so that we have a comprehensive set.

@oprypin (Contributor, Author) commented Dec 28, 2020

I have done fuzzing as mentioned. It asserts that the output of the old function is the same as of the new function.

It found mostly differences that I already knew about (and I did ensure that it found them for real), so I explicitly ruled out those inputs (see `assume` below).

The fuzzer is told to generate a concatenation of any number of elements drawn from "any unicode string", "slash", "dot" (see `one_of` -> `lists` -> `join` below). I added slash and dot explicitly because without them the fuzzer is totally lost in the search space, although with infinite time it would make no difference.

# Requires: `pip install pytest hypothesis atheris`

import posixpath

def get_relative_url_1(url, other):
    if other != '.':
        # Remove filename from other url if it has one.
        parts = posixpath.split(other)
        other = parts[0] if '.' in parts[1] else other
    relurl = posixpath.relpath(url, other)
    return relurl + '/' if url.endswith('/') else relurl

def get_relative_url_2(url, other):
    other_parts = other.split('/')
    # Remove filename from other url if it has one.
    if other_parts and '.' in other_parts[-1]:
        other_parts.pop()

    other_parts = [p for p in other_parts if p not in ('.', '')]
    dest_parts = [p for p in url.split('/') if p not in ('.', '')]
    common = 0
    for a, b in zip(other_parts, dest_parts):
        if a != b:
            break
        common += 1

    rel_parts = ['..'] * (len(other_parts) - common) + dest_parts[common:]
    relurl = '/'.join(rel_parts) or '.'
    return relurl + '/' if url.endswith('/') else relurl


from hypothesis import strategies as st, given, assume, settings, example

@st.composite
def path(draw):
    elements = st.one_of(st.text(), st.just('/'), st.just('.'))
    path = ''.join(draw(st.lists(elements)))

    assume(path != '')  # The original function has `raise ValueError("no path specified")`
    assume('/../' not in f'/{path}/')  # The new function doesn't support normalizing paths

    # print(repr(path))  # Uncomment to see produced examples.
    return path

@settings(max_examples=100000)
@given(path(), path())
def test_equal(url, other):
    assume(url.startswith('/') == other.startswith('/'))  # The original function sees one path as relative to the current directory!

    assert get_relative_url_1(url, other) == get_relative_url_2(url, other)

# Run with 'hypothesis' fuzzer: `pytest fuzz_get_relative_url.py -s`

# Run with 'atheris' fuzzer: `python fuzz_get_relative_url.py`
if __name__ == "__main__":
    import sys, atheris
    atheris.Setup(sys.argv, test_equal.hypothesis.fuzz_one_input)
    atheris.Fuzz()

I found that the function produced differences only for these classes of inputs that I didn't already know about:

@example('foo', 'bar./')  # The trailing slash should disqualify it from being a file.
@example('foo', 'foo/bar./.')  # The trailing dot should disqualify it from being a file.

I fixed this and will also add unittests.

After that I continued running each of those 2 fuzzers for well over an hour and well over 100000 examples each, and they didn't find anything more.

Now what remains is to prove that resolving .. is indeed not relied on anywhere, and we might be good to go.

@oprypin (Contributor, Author) commented Dec 29, 2020

Here is my exhaustive analysis of usages. Sadly the results aren't so clear-cut.

Summary: the only things that could've regressed:

  • If someone passed a path like aaa/../aaa/x.html to the |url filter, previously it'd become aaa/x.html; now it will stay as is. And in a subdirectory, of course, it will become '../aaa/../aaa/x.html', which would probably still resolve correctly, but yeah...

  • If someone passed a path like aaa/../aaa/x.html to config['extra_templates'], previously it'd become aaa/x.html; now it will stay like that and affect the base_url in the template.

I really think nobody would've done such things, and also this old behavior was probably not even the preferred behavior among these two. It's also quite easy to look through a bunch of themes with git grep '| *url' and see that indeed there isn't anything that looks even remotely like that (actually this could even be the main way forward).

Otherwise, if we want to preserve this, in each of those cases, first wrapping those into a normpath would fix it.
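For example (a sketch, not code from the PR), a `posixpath.normpath` wrapper at those call sites would collapse the '..' segments before they ever reach a non-normalizing relative-URL implementation:

```python
import posixpath

# Hypothetical pre-normalization for the |url filter and extra_templates
# inputs discussed above; normpath collapses 'a/../a' segments purely
# textually, without touching the filesystem.
def pre_normalize(path):
    return posixpath.normpath(path)

print(pre_normalize('aaa/../aaa/x.html'))           # aaa/x.html
print(pre_normalize('sub/../sub/../sub/aaa.html'))  # sub/aaa.html
```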

I'm willing to go through the motions, whatever the choice. Just hope you're not too tired of this.

And I'll mention again that the old implementation has latent bugs, e.g. if leading slashes are involved (thankfully this very audit proves that such inputs can't happen) and, if `other` has too many `..` segments, real filesystem paths are revealed (also never happens currently). Maybe figuring out those edge cases is the greater mental overhead.
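The `..` leak mentioned above can be reproduced directly (a hypothetical repro, not an input MkDocs currently produces):

```python
import os
import posixpath

# If `other` has more '..' segments than the cwd is deep, abspath()
# bottoms out at '/' and the result leaks real cwd path components.
leaked = posixpath.relpath('index.html', '../' * 50)
print(leaked)  # e.g. 'home/user/site/index.html' -- real path components
```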

Full analysis (read only the parts in bold 😋)

Listing all origins of the inputs:

  1. base_url = utils.get_relative_url('.', page.url)
    • url: '.', ok
    • other: Page.url below
  2. base_url = utils.get_relative_url('.', name)
    • url: '.', ok
    • other: from _build_template(name, …)
      actually these are first guarded by
      1. try:
             template = env.get_template(template_name)
         except TemplateNotFound:
             log.warning("Template skipped: '{}' not found in theme directories.".format(template_name))
             return
         output = _build_template(template_name, template, files, config, nav)
        • first this is guarded by env.get_template so absolute or empty paths can't pass.
        • from _build_theme_template(template_name, …)
          1. for template in config['theme'].static_templates:
                 _build_theme_template(template, env, files, config, nav)
            • config['theme'].static_templates below
      2. file = files.get_file_from_path(template_name)
         if file is None:
             log.warning("Template skipped: '{}' not found in docs_dir.".format(template_name))
             return
         try:
             with open(file.abs_src_path, 'r', encoding='utf-8', errors='strict') as f:
                 template = jinja2.Template(f.read())
         except Exception as e:
             log.warning("Error reading template '{}': {}".format(template_name, e))
             return
         output = _build_template(template_name, template, files, config, nav)
        • first this is guarded by files.get_file_from_path so absolute or empty paths can't pass.
        • from _build_extra_template(template_name, …)
          1. for template in config['extra_templates']:
                 _build_extra_template(template, files, config, nav)
            • config['extra_templates'] below
  3. return utils.get_relative_url(self.url, other.url if isinstance(other, File) else other)
  4. def normalize_url(path, page=None, base=''):
         """ Return a URL relative to the given page or using the base. """
         path, is_abs = _get_norm_url(path)
         if is_abs:
             return path
         if page is not None:
             return get_relative_url(path, page.url)
    • other: Page.url, already mentioned
    • url: from normalize_url(path, …)
      1. def normalize_url(path, page=None, base=''):
             """ Return a URL relative to the given page or using the base. """
             path = path_to_url(path or '.')
        • Note: emptiness gets replaced with '.', no other change
      2. def url_filter(context, value):
             """ A Template filter to normalize URLs. """
             return normalize_url(value, page=context['page'], base=context['base_url'])
        • Would have been problematic to pass '' as an input to |url filter, if the top of normalize_url didn't eliminate it.
        • normalize_url also has a check against absolute paths, so leading slash isn't let through.
        • if someone was explicitly using '..' to pass to |url filter, that can be a problem.
          for example 'css/../css/base.css'|url stays as is, without being normalized. And in a subdirectory it becomes '../css/../css/base.css'.
          So this is the big one, and the only one, really. If there's any crazy template explicitly adding '..' into the value passed |url filter (because where else would it come from?) then there is a regression for it.
    • url: path = _get_norm_url(path)
      1. def _get_norm_url(path):
             path = path_to_url(path or '.')
             # Allow links to be fully qualified URL's
             parsed = urlparse(path)
             if parsed.scheme or parsed.netloc or path.startswith(('/', '#')):
                 return path, True
             return path, False
        • Can't introduce '..' or empty output or leading slash

Indirect origins mentioned above:

  • Page.url
    1. @property
       def url(self):
           return '' if self.file.url == '.' else self.file.url
      • can't introduce '..' or leading slash, can introduce empty output, as File.url can indeed be '.'. This is where I realize that my fuzzer should also allow '' as the 2nd (other) arg, although it's not allowed as 1st (url). So I revise it and run it a bunch more. Page.url is never used for url arg, only other, so emptiness is ok.
  • File.url
    1. self.url = self._get_url(use_directory_urls)
      • def _get_url(self, use_directory_urls):
            """ Return url based in destination path. """
            url = self.dest_path.replace(os.path.sep, '/')
            dirname, filename = os.path.split(url)
            if use_directory_urls and filename == 'index.html':
                if dirname == '':
                    url = '.'
                else:
                    url = dirname + '/'
            return urlquote(url)
        • replace, split, + '/', = '.', urlquote can't introduce '..' or empty output or leading slash
        • self.dest_path: below
  • File.dest_path:
    1. self.dest_path = self._get_dest_path(use_directory_urls)
      • def _get_dest_path(self, use_directory_urls):
            """ Return destination path based on source path. """
            if self.is_documentation_page():
                parent, filename = os.path.split(self.src_path)
                if not use_directory_urls or self.name == 'index':
                    # index.md or README.md => index.html
                    # foo.md => foo.html
                    return os.path.join(parent, self.name + '.html')
                else:
                    # foo.md => foo/index.html
                    return os.path.join(parent, self.name, 'index.html')
            return self.src_path
        • os.path.join can't introduce '..' or empty output or leading slash
        • parent (os.path.split) can't introduce '..' or leading slash other than through self.src_path;
          emptiness irrelevant: os.path.join('', 'a') == 'a'
        • self.src_path: below
  • File.src_path:
    1. self.src_path = os.path.normpath(path)
      • explicitly normalized, so all anomalies excluded
  • config['theme'].static_templates:
    1. self.static_templates = set(os.listdir(mkdocs_templates))
      • just a directory name, no anomalies
  • config['extra_templates']:
    1. ('extra_templates', config_options.Type(list, default=[])),
      • above we already excluded leading slash or empty. And I sure hope nobody was passing paths with '..' in their config here. But yes, this would be a regression. E.g. extra_templates: ['sub/../sub/../sub/aaa.html'] works but sets the wrong base_path. But we could just explicitly normpath here and that works fine to eliminate this issue.

@oprypin (Contributor, Author) commented Dec 29, 2020

And, of course, there's the option of just implementing normalization as well, with almost no performance penalty compared to the previous state. Pushed the commits here for posterity, though you may not like the code size.

@waylan (Member) commented Dec 29, 2020

I still want to see a comprehensive set of unittests included with this patch. And I don't care if some potential inputs will not currently be generated by MkDocs. We can't guarantee that will be the case in the future. If you want to do a custom function, then you need to provide a complete set of unittests which demonstrate that any potential (past, present or future) input will return the correct normalized result. Until I see that, I'm not spending any more time on this.

@oprypin (Contributor, Author) commented Dec 29, 2020

Please don't always respond with the immediate assumption that I will not do something. The first comment was just making sure that this wouldn't be discarded.
I even said

will also add unittests

So, is the code acceptable? If so, great, I will definitely add unittests.

@oprypin (Contributor, Author) commented Dec 29, 2020

Added unittest.

Btw, the existing tests are actually quite comprehensive. They just didn't try to find super abnormal cases, I suppose. But now that's covered.

def test_get_relative_url_use_directory_urls(self):
    to_files = [
        'index.md',
        'foo/index.md',
        'foo/bar/index.md',
        'foo/bar/baz/index.md',
        'foo.md',
        'foo/bar.md',
        'foo/bar/baz.md'
    ]
    to_file_urls = [
        '.',
        'foo/',
        'foo/bar/',
        'foo/bar/baz/',
        'foo/',
        'foo/bar/',
        'foo/bar/baz/'
    ]

    from_file = File('img.jpg', '/path/to/docs', '/path/to/site', use_directory_urls=True)
    expected = [
        'img.jpg',           # img.jpg relative to .
        '../img.jpg',        # img.jpg relative to foo/
        '../../img.jpg',     # img.jpg relative to foo/bar/
        '../../../img.jpg',  # img.jpg relative to foo/bar/baz/
        '../img.jpg',        # img.jpg relative to foo
        '../../img.jpg',     # img.jpg relative to foo/bar
        '../../../img.jpg'   # img.jpg relative to foo/bar/baz
    ]
    for i, filename in enumerate(to_files):
        file = File(filename, '/path/to/docs', '/path/to/site', use_directory_urls=True)
        self.assertEqual(from_file.url, 'img.jpg')
        self.assertEqual(file.url, to_file_urls[i])
        self.assertEqual(from_file.url_relative_to(file.url), expected[i])
        self.assertEqual(from_file.url_relative_to(file), expected[i])

    from_file = File('foo/img.jpg', '/path/to/docs', '/path/to/site', use_directory_urls=True)
    expected = [
        'foo/img.jpg',    # foo/img.jpg relative to .
        'img.jpg',        # foo/img.jpg relative to foo/
        '../img.jpg',     # foo/img.jpg relative to foo/bar/
        '../../img.jpg',  # foo/img.jpg relative to foo/bar/baz/
        'img.jpg',        # foo/img.jpg relative to foo
        '../img.jpg',     # foo/img.jpg relative to foo/bar
        '../../img.jpg'   # foo/img.jpg relative to foo/bar/baz
    ]
    for i, filename in enumerate(to_files):
        file = File(filename, '/path/to/docs', '/path/to/site', use_directory_urls=True)
        self.assertEqual(from_file.url, 'foo/img.jpg')
        self.assertEqual(file.url, to_file_urls[i])
        self.assertEqual(from_file.url_relative_to(file.url), expected[i])
        self.assertEqual(from_file.url_relative_to(file), expected[i])

    from_file = File('index.html', '/path/to/docs', '/path/to/site', use_directory_urls=True)
    expected = [
        '.',         # . relative to .
        '..',        # . relative to foo/
        '../..',     # . relative to foo/bar/
        '../../..',  # . relative to foo/bar/baz/
        '..',        # . relative to foo
        '../..',     # . relative to foo/bar
        '../../..'   # . relative to foo/bar/baz
    ]
    for i, filename in enumerate(to_files):
        file = File(filename, '/path/to/docs', '/path/to/site', use_directory_urls=True)
        self.assertEqual(from_file.url, '.')
        self.assertEqual(file.url, to_file_urls[i])
        self.assertEqual(from_file.url_relative_to(file.url), expected[i])
        self.assertEqual(from_file.url_relative_to(file), expected[i])

    from_file = File('file.md', '/path/to/docs', '/path/to/site', use_directory_urls=True)
    expected = [
        'file/',           # file relative to .
        '../file/',        # file relative to foo/
        '../../file/',     # file relative to foo/bar/
        '../../../file/',  # file relative to foo/bar/baz/
        '../file/',        # file relative to foo
        '../../file/',     # file relative to foo/bar
        '../../../file/'   # file relative to foo/bar/baz
    ]
    for i, filename in enumerate(to_files):
        file = File(filename, '/path/to/docs', '/path/to/site', use_directory_urls=True)
        self.assertEqual(from_file.url, 'file/')
        self.assertEqual(file.url, to_file_urls[i])
        self.assertEqual(from_file.url_relative_to(file.url), expected[i])
        self.assertEqual(from_file.url_relative_to(file), expected[i])
def test_get_relative_url(self):
    to_files = [
        'index.md',
        'foo/index.md',
        'foo/bar/index.md',
        'foo/bar/baz/index.md',
        'foo.md',
        'foo/bar.md',
        'foo/bar/baz.md'
    ]
    to_file_urls = [
        'index.html',
        'foo/index.html',
        'foo/bar/index.html',
        'foo/bar/baz/index.html',
        'foo.html',
        'foo/bar.html',
        'foo/bar/baz.html'
    ]

    from_file = File('img.jpg', '/path/to/docs', '/path/to/site', use_directory_urls=False)
    expected = [
        'img.jpg',           # img.jpg relative to .
        '../img.jpg',        # img.jpg relative to foo/
        '../../img.jpg',     # img.jpg relative to foo/bar/
        '../../../img.jpg',  # img.jpg relative to foo/bar/baz/
        'img.jpg',           # img.jpg relative to foo.html
        '../img.jpg',        # img.jpg relative to foo/bar.html
        '../../img.jpg'      # img.jpg relative to foo/bar/baz.html
    ]
    for i, filename in enumerate(to_files):
        file = File(filename, '/path/to/docs', '/path/to/site', use_directory_urls=False)
        self.assertEqual(from_file.url, 'img.jpg')
        self.assertEqual(file.url, to_file_urls[i])
        self.assertEqual(from_file.url_relative_to(file.url), expected[i])
        self.assertEqual(from_file.url_relative_to(file), expected[i])

    from_file = File('foo/img.jpg', '/path/to/docs', '/path/to/site', use_directory_urls=False)
    expected = [
        'foo/img.jpg',    # foo/img.jpg relative to .
        'img.jpg',        # foo/img.jpg relative to foo/
        '../img.jpg',     # foo/img.jpg relative to foo/bar/
        '../../img.jpg',  # foo/img.jpg relative to foo/bar/baz/
        'foo/img.jpg',    # foo/img.jpg relative to foo.html
        'img.jpg',        # foo/img.jpg relative to foo/bar.html
        '../img.jpg'      # foo/img.jpg relative to foo/bar/baz.html
    ]
    for i, filename in enumerate(to_files):
        file = File(filename, '/path/to/docs', '/path/to/site', use_directory_urls=False)
        self.assertEqual(from_file.url, 'foo/img.jpg')
        self.assertEqual(file.url, to_file_urls[i])
        self.assertEqual(from_file.url_relative_to(file.url), expected[i])
        self.assertEqual(from_file.url_relative_to(file), expected[i])

    from_file = File('index.html', '/path/to/docs', '/path/to/site', use_directory_urls=False)
    expected = [
        'index.html',           # index.html relative to .
        '../index.html',        # index.html relative to foo/
        '../../index.html',     # index.html relative to foo/bar/
        '../../../index.html',  # index.html relative to foo/bar/baz/
        'index.html',           # index.html relative to foo.html
        '../index.html',        # index.html relative to foo/bar.html
        '../../index.html'      # index.html relative to foo/bar/baz.html
    ]
    for i, filename in enumerate(to_files):
        file = File(filename, '/path/to/docs', '/path/to/site', use_directory_urls=False)
        self.assertEqual(from_file.url, 'index.html')
        self.assertEqual(file.url, to_file_urls[i])
        self.assertEqual(from_file.url_relative_to(file.url), expected[i])
        self.assertEqual(from_file.url_relative_to(file), expected[i])

    from_file = File('file.html', '/path/to/docs', '/path/to/site', use_directory_urls=False)
    expected = [
        'file.html',           # file.html relative to .
        '../file.html',        # file.html relative to foo/
        '../../file.html',     # file.html relative to foo/bar/
        '../../../file.html',  # file.html relative to foo/bar/baz/
        'file.html',           # file.html relative to foo.html
        '../file.html',        # file.html relative to foo/bar.html
        '../../file.html'      # file.html relative to foo/bar/baz.html
    ]
    for i, filename in enumerate(to_files):
        file = File(filename, '/path/to/docs', '/path/to/site', use_directory_urls=False)
        self.assertEqual(from_file.url, 'file.html')
        self.assertEqual(file.url, to_file_urls[i])
        self.assertEqual(from_file.url_relative_to(file.url), expected[i])
        self.assertEqual(from_file.url_relative_to(file), expected[i])

@oprypin (Contributor, Author) commented Dec 30, 2020

And the latest profile: [profiler screenshot]

@oprypin (Contributor, Author) commented Jan 12, 2021

BTW, in my view I have done what was requested so far. (so this is just a ping, I guess)

@oprypin (Contributor, Author) commented Jan 12, 2021

Please let me know your overall opinion on this direction. In case this just needs to wait before review, that's OK.

@waylan (Member) commented Jan 12, 2021

I have given it a cursory look and it seems okay in principle, but I want to be extra sure your custom replacement for the standard lib function is completely reliable, and that is going to take time. Time I haven't been motivated to spend on it.

@oprypin (Contributor, Author) commented Jan 13, 2021

I have done so much fuzzing, and the two implementations just give 100% the same result, other than the known differences that are being excluded. There's just no way to be more sure.

You can run it yourself and evaluate its thoroughness. Even intentionally introduce a bug and see how quickly it finds it.
https://gist.github.com/oprypin/e968c613bb9de87b2d2fbd4409bed3ff

The differences are just the fact that the old function can spew paths that depend on the current working directory (revealing parts of it). I presume that's not intended behavior. In any case, I have also proven manually that these cases are never hit, but we will be safer with the new function.

@waylan (Member) commented Apr 4, 2021

I'm closing this in favor of #2296. This PR adds our own implementation, which is more of a maintenance burden.

@waylan closed this Apr 4, 2021
@oprypin (Contributor, Author) commented Apr 4, 2021

What do you mean "in favor of"...

#2296 was a stepping stone to get all the breaking changes out of the way. This pull request now 100% exactly matches the implementation at master.

This change, now compared to master, still cuts 9% off build times for a site with 142 pages. And more as the page count grows.
https://github.com/crystal-lang/crystal-book/tree/1fc786788f25f95dde767aad1676eed7c322eda3/docs

$ for i in {1..3}; do mkdocs build 2>&1 | grep built; done
INFO    -  Documentation built in 3.13 seconds 
INFO    -  Documentation built in 3.09 seconds 
INFO    -  Documentation built in 3.05 seconds 
$ for i in {1..3}; do mkdocs build 2>&1 | grep built; done
INFO    -  Documentation built in 2.88 seconds 
INFO    -  Documentation built in 2.83 seconds 
INFO    -  Documentation built in 2.81 seconds 
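Averaging the three runs in each batch makes the arithmetic behind that figure explicit (roughly 8% by these exact numbers, close to the quoted 9%):

```python
# Build times from the transcript above, in seconds.
before = [3.13, 3.09, 3.05]
after = [2.88, 2.83, 2.81]

avg_before = sum(before) / len(before)  # 3.09
avg_after = sum(after) / len(after)     # 2.84
saving = 1 - avg_after / avg_before
print(f'{saving:.1%}')  # 8.1%
```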

You had mentioned before

The largest site I have is less than 20 pages

Well, there you go, the link above. That's a very real site with every page written by a human.


At the very least I would've wanted to merge master into this branch to preserve this PR in better shape. But I certainly didn't predict this would be closed instantly.

Now the best view of this diff we have is not this pull request but rather https://github.com/oprypin/mkdocs/compare/relpath#diff


more of a maintenance burden

  1. The implementation is well-specified; if the result is exactly the same as the current one and isn't subject to change, I don't see why you'd need to work on it more. That is, the amount of code doesn't matter that much. At the same time, it's not even a lot of added code.

  2. I would be very happy to step in for maintenance if you can't.

@waylan (Member) commented Apr 4, 2021

This pull request now 100% exactly matches the implementation at master.

Which means it adds more maintenance burden for no gain.

This change, now compared to master, still cuts off 9% of build times for a site with 142 pages.

Meh. Not a priority. The amount of gain is not worth the extra burden.

@oprypin (Contributor, Author) commented Apr 4, 2021

Can you hand off the burden to me?

How do you calculate whether it's worth saving thousands of people a second at a time?

@oprypin (Contributor, Author) commented Apr 24, 2021

This issue really bothers me.

I am developing a plugin that produces a lot of documentation pages based on source code. My users immediately see that the build is slow. I know exactly how to improve it and have done all the work (even way more work than should be necessary) but I am still powerless to fix it.

I can tell the users "well, you can use this fork of MkDocs to speed up the build by 10%, because the maintainer of MkDocs doesn't accept my improvement". Well, that is actually what I say, but they don't take me up on it.

But hey, maybe you don't care about that usecase -- maybe I'm holding it wrong. But I have also shown you a very real site that just has 142 pages, each of them hand-written, regardless of plugins.
https://github.com/crystal-lang/crystal-book/tree/1fc786788f25f95dde767aad1676eed7c322eda3/docs

I don't know where your impression of what performance gains are "worth it" is coming from, but I think a guaranteed 9% improvement would generally be considered really good, especially from a code delta of merely 30 lines added, 15 deleted.
Again, considering just how many people use this project, and that they wait for the reloads "in real time", I don't know how you so easily weigh a small convenience in your favor over all of their experience.

I have also provided all kinds of other arguments. E.g. that the code has no reason to change so it's not exactly a maintenance burden. Also that I would gladly respond to any issues with this code myself. Also you can easily revert it any time.

I don't know how I deserve this -- you really just shrug away everything I say. My words, even perfectly backed up by facts, are instantly overridden just by your opinion, feeling, even.

@oprypin (Contributor, Author) commented May 13, 2021

@waylan
Could you stop ignoring me whenever that is convenient?

This is still a very important improvement. I'm not going to just forget this. It comes up in my mind quite often actually.

You still haven't provided any solid arguments against this, and I'm appalled that you think it's OK to let your mere opinion stand above arguments.

And to reiterate, on the one claim that you think you have: any "maintenance burden" from this is instantly fixable by reverting it.

@mkdocs mkdocs locked as too heated and limited conversation to collaborators May 13, 2021