Skip to content

Refactor tags #269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 29, 2017
Merged

Refactor tags #269

merged 12 commits into from
Apr 29, 2017

Conversation

dometto
Copy link
Member

@dometto dometto commented Apr 23, 2017

This refactor shaves about 5 seconds of load time for a page with a ridiculous amount (1000) of tags (~25 seconds vs. ~30), and (hopefully) improves readability and maintainability. To further improve performance I would like to eventually collapse the search for files and pages so we don't traverse the repository twice, and of course implement caching of the repo tree map. But that requires refactoring the Page and File classes, and implementing the idea for a Gollum::IO class...

Some breaking changes:

  • syntax for link attributes changed from image.jpg|frame|alt=Bla to image.jpg|frame, alt=Bla
  • image links and file links and are now also reversed if @markup.reverse_links? (previously only for page links and external links). This is because reversal is now handled at the start of the link processing rather than in each method separately.
  • a page link to waa.md would first be interpreted as a relative link, and if that didn't find a page, it would be interpreted as an absolute link (to /waa.md). So in cases of broken links the repository was traversed twice. I've disabled this, so an absolute link now requires an explicit /. However, this change doesn't break any tests.

@dometto
Copy link
Member Author

dometto commented Apr 23, 2017

Todo: update inline docs for tag filter.

@@ -4,7 +4,8 @@
class Gollum::Filter::Tags < Gollum::Filter
# Extract all tags into the tagmap and replace with placeholders.
def extract(data)
return data if @markup.format == :txt || @markup.format == :asciidoc
return data if @markup.skip_tags?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expresses the intent much clearer, nice. 👍

end

def process_link_tag(link_part, extra)
process_file_link_tag(link_part, extra) || process_page_link_tag(link_part, extra)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that all tags are processed as file links initially. Is there any way to avoid this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what bothered me originally. I would like to just walk through the repo searching for the requested paths, returning Page objects if the found object has a valid page extensions and File objects otherwise. Then process_link_tag should choose what to with each link based on what kind of object it gets back (if any).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think that may require a new approach to Page and File objects altogether. Perhaps have BlobEntry superclass of which Page and File are both subclasses, with a common find-and-construct-from-path method. Or just have the BlobEntry class with a #file? and #page? method?

@@ -98,10 +109,9 @@ def process_tag(tag)
# if it is not.
#
def process_include_tag(tag)
return unless /^include:/.match(tag)
return html_error('Cannot process include directive: no page name given') if tag.length <= 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this magic number 8?

Copy link
Member Author

@dometto dometto Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) Point taken 👍

page_name = tag[8..-1]
resolved_page_name = ::File.expand_path(page_name, "/"+@markup.dir)

resolved_page_name = ::File.expand_path(page_name, "#{::File::SEPARATOR}#{@markup.dir}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice (helpful going forward on Windows). I guess it doesn't make sense to work with Pathname objects here, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. I tried putting a ::File.join here but that broke some tests. The point here is to prevent people from including /etc/password. File.expand_path here never returns a path lower in the directory tree than @markup.dir. Does Pathname have a similar method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but we could just build a Pathname object from the string that expand_path comes up with. The bigger question is whether we need something more than a string here. I guess time will tell. ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. If we want to rebuild the API for searching/retrieving objects from the repo to use Pathname, we should also do it here.

path = name
elsif name =~ /.+(jpg|png|gif|svg|bmp)$/i
elsif (file = @markup.find_file(name))
path = ::File.join @markup.wiki.base_path, file.path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically I'd like to see the join call have parentheses.


name = parts[0].strip
path = parts[1] && parts[1].strip
def process_file_link_tag(name, path)
if path && file = @markup.find_file(path)
path = ::File.join @markup.wiki.base_path, file.path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parentheses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This methods returns HTML. Is there any way to refactor that so that gollum (and not gollum-lib) wraps path and name in the appropriate html tags?

@dometto
Copy link
Member Author

dometto commented Apr 27, 2017

@bartkamphorst: incorporated your stylistic suggestions, @bartkamphorst, and refactored further.

I don't see how we can easily get rid of returning HTML inside the tag filter, since the purpose of that filter (and other filters) is basically to generate HTML on top of what the markup parser generates. However, I've tried to at least cleanly separate the tag-parsing logic from the html-generation logic. Hardcoded HTML is now only present in #generate_link and #generate_image and one or two helper methods they call.

One way to get the hardcoded HTML out of gollum-lib would perhaps be to move #generate_link and #generate_image to gollum, and somehow pass them on to the tag filter as a Proc from there on? But that would mean that the filter by itself would not be fully functional 'out of the box'.

@dometto
Copy link
Member Author

dometto commented Apr 27, 2017

And as you suggested, process_file_link_tag and process_page_link_tag should be collapsed into a single method. But I would propose to merge this now, and to take that further step once we have refactored the Page and File classes.

@dometto
Copy link
Member Author

dometto commented Apr 27, 2017

Another change in behavior affected by this PR is that File links without a description now work, whereas in master they seem to result in a broken img for some reason. The latter clearly seems like a bug, so I'll add a regression test.

@dometto
Copy link
Member Author

dometto commented Apr 29, 2017

Re-implemented the tag skipping logic in the Markup class as you suggested, reverted the changes to the Filter class. I think this is nice and clean now, thanks for the review!

@@ -126,7 +126,7 @@ def render_default(data, format=:markdown, name='render_default.md')
@format = format
@name = name

chain = [:YAML, :PlainText, :Emoji, :TOC, :RemoteCode, :Code, :Sanitize, :PlantUML, :Tags, :Render]
chain = [:YAML, :PlainText, :Emoji, :TOC, :RemoteCode, :Code, :Sanitize, :PlantUML, :Tags, :Render].reject {|filter| skip_filter?(filter)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean why in the render_default method? It's used in the tests at least, so filter skipping should work there too, I thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you also call .reject on line 180.

Copy link
Member Author

@dometto dometto Apr 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the render and render_default methods are enitrely separate from each other, so the rejecting needs to happen twice. But I now notice that render_default is called just once in gollum-lib and gollum taken together, namely in a test. If I modify that test we can just remove this method alltogether. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

render_default was being used to allow use of the Markup class without a wiki. I can see some usecases for that, but I don't know we should consider them within the legitimate scope of the gollum project. Perhaps in the interest of general code sanity, we can remove it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for it. 😄 People can always come knocking if they miss this functionality in 5.x.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add to the 5.x changelog though!

@dometto dometto merged commit 745136c into gollum:gollum-lib-5.x Apr 29, 2017
@dometto dometto deleted the refactor_tags branch September 6, 2022 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants