Optimize getting relative page URLs #2272
Conversation
This is a custom implementation that's ~5 times faster. That's important because calls to `normalize_url` on a site with ~300 pages currently take up ~20% of the total run time due to the sheer number of them. The number of calls is at least the number of pages squared. The old implementation relied on `posixpath.relpath`, which (among other unnecessary parts) first expands every path with `abspath` for whatever reason, so actually it becomes even slower if you happen to be building under a deeply nested path.
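To make those numbers concrete (the figures are just the ones quoted above, and this snippet is my illustration, not code from the PR):

```python
import posixpath

# With n pages, resolving every page's URL relative to every other page
# means at least n * n calls. For the ~300-page site mentioned above:
pages = 300
print(pages ** 2)  # 90000

# Each of those calls went through posixpath.relpath, which first expands
# both arguments with posixpath.abspath (consulting os.getcwd()), so the
# per-call cost also grows with the depth of the directory you build from,
# even though the inputs are site-relative URLs like these:
print(posixpath.relpath('img/logo.png', 'about'))  # ../img/logo.png
```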
So this change removes calls to the standard lib and replaces them with your own custom implementation. I realize that there may be some standard lib code that we don't need, but the assumption is that the standard lib code is well tested and accurate. Your custom code is an unknown. I suspect the existing tests don't cover nearly enough to be sure this doesn't break various edge cases (admittedly I didn't check). We will always go with accuracy over performance. If you want this accepted, then you will need to demonstrate that this is very well tested.

By the way, you have been providing a lot of performance patches lately. I don't object to that. In fact, I appreciate that someone is looking at the issue, as I don't have any time to dedicate to it. At the same time, reviewing all these patches which don't improve the project in any perceivable way (from my perspective) just adds more burden on me. Therefore, from my perspective you need to demonstrate that you aren't breaking anything.

Your comments always mention how much of a performance improvement this makes. But I don't care much about that. The largest site I have is less than 20 pages. I have the perception that my sites build instantly on my very old machine. Personally I see no need for performance improvements. What I care about is ease of maintenance and accuracy. If you would like me to continue to spend time reviewing your performance improvements, please focus on those two factors first. Any performance improvements will always be second to those in my reviews.

As an example, if the proposed change in this PR were to result in a future bug report, my "fix" would simply be to revert to the previous implementation. I'm not interested in spending the time to work through the custom code to fix various edge cases. That is more of a maintenance burden than I want to take on. Therefore, before I am willing to accept it, I need to be sure that those bug reports aren't likely to happen in the future.
Well, generally, it's tested for file system paths relative to the current directory, not something virtual...

So this function certainly supports less functionality than the old one, for example collapsing `..` segments. It's slightly unclear how to declare exhaustiveness, but the main idea is proving that inputs to it can only be some particular subset of strings, and then perhaps fuzzing it within those constraints.

Well, don't worry much about that. I intended to have stopped already, but this one is just such a big realization (also a realization that stdlib usage isn't exempt from scrutiny).

I can promise a 24-hour response time with a fix :D
If you would like this merged, then you will need to either point to an existing set of tests (I haven't taken the time to check), or, if they don't exist, provide a comprehensive set of tests. My guess is that you will only find a limited set of tests and need to provide more so that we have a comprehensive set.
I have done fuzzing as mentioned. It asserts that the output of the old function is the same as that of the new function. It found mostly differences that I already knew about (and I did ensure that it found them for real), so I explicitly ruled out those inputs (see the `assume` lines below). The fuzzer is told to generate a concatenation of any combination of any number of: "any unicode string", "slash", "dot".

```python
# Requires: `pip install pytest hypothesis atheris`
import posixpath

from hypothesis import strategies as st, given, assume, settings, example


def get_relative_url_1(url, other):
    if other != '.':
        # Remove filename from other url if it has one.
        parts = posixpath.split(other)
        other = parts[0] if '.' in parts[1] else other
    relurl = posixpath.relpath(url, other)
    return relurl + '/' if url.endswith('/') else relurl


def get_relative_url_2(url, other):
    other_parts = other.split('/')
    # Remove filename from other url if it has one.
    if other_parts and '.' in other_parts[-1]:
        other_parts.pop()
    other_parts = [p for p in other_parts if p not in ('.', '')]
    dest_parts = [p for p in url.split('/') if p not in ('.', '')]
    common = 0
    for a, b in zip(other_parts, dest_parts):
        if a != b:
            break
        common += 1
    rel_parts = ['..'] * (len(other_parts) - common) + dest_parts[common:]
    relurl = '/'.join(rel_parts) or '.'
    return relurl + '/' if url.endswith('/') else relurl


@st.composite
def path(draw):
    elements = st.one_of(st.text(), st.just('/'), st.just('.'))
    path = ''.join(draw(st.lists(elements)))
    assume(path != '')  # The original function has `raise ValueError("no path specified")`
    assume('/../' not in f'/{path}/')  # The new function doesn't support normalizing paths
    # print(repr(path))  # Uncomment to see produced examples.
    return path


@settings(max_examples=100000)
@given(path(), path())
def test_equal(url, other):
    # The original function sees one path as relative to the current directory!
    assume(url.startswith('/') == other.startswith('/'))
    assert get_relative_url_1(url, other) == get_relative_url_2(url, other)


# Run with the 'hypothesis' fuzzer: `pytest fuzz_get_relative_url.py -s`
# Run with the 'atheris' fuzzer:    `python fuzz_get_relative_url.py`
if __name__ == "__main__":
    import sys
    import atheris
    atheris.Setup(sys.argv, test_equal.hypothesis.fuzz_one_input)
    atheris.Fuzz()
```

I found that the function produced differences only for these classes of inputs that I didn't already know about:

```python
@example('foo', 'bar./')      # The trailing slash should disqualify it from being a file.
@example('foo', 'foo/bar./.') # The trailing dot should disqualify it from being a file.
```

I fixed this and will also add unittests. After that I continued to run each of those 2 fuzzers for well over an hour and for well over 100000 examples, and they didn't find anything more. Now what remains is to prove that resolving […]
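For illustration (my own sketch, not part of the PR): the divergence that the `assume(url.startswith('/') == other.startswith('/'))` line excludes is exactly the case where the old `posixpath.relpath`-based implementation leaks the current working directory into its output.

```python
import os
import posixpath


def old_impl(url, other):
    # Same as get_relative_url_1 above: delegates to posixpath.relpath.
    if other != '.':
        parts = posixpath.split(other)
        other = parts[0] if '.' in parts[1] else other
    relurl = posixpath.relpath(url, other)
    return relurl + '/' if url.endswith('/') else relurl


# posixpath.relpath resolves relative arguments against os.getcwd(), so
# mixing an absolute url with a relative `other` makes the answer depend
# on how deep the working directory happens to be.
os.chdir('/')
print(old_impl('/foo', 'bar'))  # ../foo

os.makedirs('/tmp/a/b', exist_ok=True)
os.chdir('/tmp/a/b')
print(old_impl('/foo', 'bar'))  # ../../../../foo
```

For inputs where both paths are relative (or both absolute), the two implementations agree, which is why the fuzzer constrains that property rather than forbidding slashes outright.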
And my exhaustive analysis of usages. Sadly the results aren't so clear-cut. Summary: the only things that could've regressed: […]

I really think nobody would've done such things, and also this old behavior was probably not even the preferred behavior among these two. It's also quite easy to look through a bunch of themes with […]. Otherwise, if we want to preserve this, in each of those cases, first wrapping those into a […].

I'm willing to go through the motions, whatever the choice. Just hope you're not too tired of this. And I'll mention again that the old implementation has latent bugs, e.g. if leading slashes are involved (thankfully this very audit proves that such inputs can't happen) and if […].

Full analysis (read only the parts in bold 😋): listing all origins of the inputs: […]

Indirect origins mentioned above: […]
And, of course, there's the option of just implementing normalization as well, with almost no performance penalty compared to the previous state. Pushed the commits here for posterity, though you may not like the code size.
I still want to see a comprehensive set of unittests included with this patch. And I don't care if some potential inputs will not currently be generated by MkDocs. We can't guarantee that will be the case in the future. If you want to do a custom function, then you need to provide a complete set of unittests which demonstrate that any potential (past, present or future) input will return the correct normalized result. Until I see that, I'm not spending any more time on this.
Please don't always respond with the immediate assumption that I will not do something. First, I was just making sure that this wouldn't be discarded.

So, is the code acceptable? If so, great, I will definitely add unittests.
Added unittests. Btw, the existing tests are actually quite comprehensive. They just didn't try to find super abnormal cases, I suppose. But now that's covered: mkdocs/tests/structure/file_tests.py, lines 407 to 591 in ff0b726.
BTW, in my view I have done what was requested so far. (So this is just a ping, I guess.)

Please let me know your overall opinion on this direction. In case this just needs to wait before review, that's OK.
I have given it a cursory look and it seems okay in principle, but I want to be extra sure your custom replacement for the standard lib function is completely reliable, and that is going to take time. Time I haven't been motivated to spend on it.
I have done soooo much fuzzing, and the two implementations just give 100% the same result, other than the known differences that are being excluded. There's just no way to be more sure. You can run it yourself and evaluate its thoroughness. Even intentionally introduce a bug and see how quickly it finds it.

The differences are just the fact that the old function can spew paths that depend on the current working directory (revealing parts of it). I presume that's not a wanted behavior. In any case, I have also proven manually that these cases are never hit, but we will be safer with the new function.
I'm closing this in favor of #2296. This adds our own implementation, which is more of a maintenance burden.
What do you mean "in favor of"... #2296 was a stepping stone to get all the breaking changes out of the way. This pull request now 100% exactly matches the implementation at master. This change, now compared to master, still cuts 9% off build times for a site with 142 pages. And more if there are more pages.

Before:

```shell
$ for i in {1..3}; do mkdocs build 2>&1 | grep built; done
INFO - Documentation built in 3.13 seconds
INFO - Documentation built in 3.09 seconds
INFO - Documentation built in 3.05 seconds
```

After:

```shell
$ for i in {1..3}; do mkdocs build 2>&1 | grep built; done
INFO - Documentation built in 2.88 seconds
INFO - Documentation built in 2.83 seconds
INFO - Documentation built in 2.81 seconds
```

You had mentioned before: […]. Well, there you go, the link above. That's a very real site with every page written by a human.

At the very least I would've wanted to merge master to preserve this PR in a better shape. But certainly I didn't predict this to be closed instantly. Now the best view of this diff we have is not this pull request but rather https://github.com/oprypin/mkdocs/compare/relpath#diff
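As a quick sanity check of the claimed percentage (my own arithmetic, assuming the first block of timings is the pre-change run and the second is the post-change run):

```python
# Mean build time before and after, from the timings quoted above.
before = [3.13, 3.09, 3.05]
after = [2.88, 2.83, 2.81]

mean_before = sum(before) / len(before)  # 3.09
mean_after = sum(after) / len(after)     # 2.84
speedup = 1 - mean_after / mean_before
print(f"{speedup:.1%}")  # roughly 8%, in line with the ~9% claimed
```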
Which means it adds more maintenance burden for no gain.

Meh. Not a priority. The amount of gain is not worth the extra burden.
Can you hand off the burden to me? How do you calculate whether it's worth saving thousands of people a second at a time?
This issue really bothers me. I am developing a plugin that produces a lot of documentation pages based on source code. My users immediately see that the build is slow. I know exactly how to improve it and have done all the work (even way more work than should be necessary), but I am still powerless to fix it. I can tell the users "well, you can use this fork of MkDocs to speed up the build by 10%, because the maintainer of MkDocs doesn't accept my improvement". Um, well, that is actually what I say, but they don't take me up on it.

But hey, maybe you don't care about that use case; maybe I'm holding it wrong. But I have also shown you a very real site that just has 142 pages, each of them hand-written, regardless of plugins. I don't know where your impression of what performance gains are "worth it" is coming from, but I think a guaranteed 9% improvement would generally be considered really good, especially from a code delta of merely 30 lines added, 15 deleted.

I have also provided all kinds of other arguments. E.g. that the code has no reason to change, so it's not exactly a maintenance burden. Also that I would gladly respond to any issues with this code myself. Also, you can easily revert it any time.

I don't know how I deserve this; you really just shrug away everything I say. My words, even perfectly backed up by facts, are instantly overridden just by your opinion, a feeling, even.
@waylan This is still a very important improvement. I'm not going to just forget this. It comes up in my mind quite often, actually. You still haven't provided any solid arguments against this, and I'm appalled that you think it's OK to let your mere opinion stand above arguments. And to reiterate, regarding the claim that you think you have: any "maintenance burden" from this is instantly fixable by reverting it.