Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add truncate template function #2882

Merged
merged 7 commits into from Jan 6, 2017
Merged

Add truncate template function #2882

merged 7 commits into from Jan 6, 2017

Conversation

biilmann
Copy link
Contributor

@biilmann biilmann commented Jan 3, 2017

This commit adds a truncate template function for safely truncating text without
breaking words. The truncate function is HTML aware, so if the input text is a
template.HTML it will be truncated without leaving broken or unclosed HTML tags.

{{ "this is a very long text" | truncate 10 " ..." }}
{{ "With [Markdown](/markdown) inside." | markdownify | truncate 10 }}

The HTML truncation is based on Django's truncatehtml template helper, so while I would normally be really cautious about doing anything regexp based with HTML, this is a very battle tested implementation. I've also tested it on a corpus of 2600+ blog posts from a large publication.

if err != nil {
return "", errors.New("text to truncate must be a string")
}
text, err = cast.ToStringE(textParam)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use := and remove var text string above to delay the allocation.

}

var count int
words := strings.Fields(text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd bet that something like this would be more efficient. Doesn't use strings.Fields that would generate more allocs.

This commit adds a truncate template function for safely truncating text without
breaking words. The truncate function is HTML aware, so if the input text is a
template.HTML it will be truncated without leaving broken or unclosed HTML tags.

    {{ "this is a very long text" | truncate 10 " ..." }}
    {{ "With [Markdown](/markdown) inside." | markdownify | truncate 10 }}
@biilmann
Copy link
Contributor Author

biilmann commented Jan 4, 2017

Thanks @moorereason I've added both performance optimizations.

Copy link
Member

@bep bep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@biilmann a couple of comments/requests:

  • Pull the implementation and test (not the "smoke test") out into its own files, name it template_func_truncate.go and template_func_truncate_test.go. The files they live in now have gotten a bit on the long side.
  • The text truncate variant (I have not checked the HTML code path) assumes that every character is 1 byte, which fail pretty fast.
  • Hugo is Big in Japan ... And Japanese and the other CJK languages are inherently space-less. Needs test cases to confirm that this works.
  • The test line coverage looks ... average. I don't care about obvious error paths, but the other conditionals should be covered (or removed if not relevant).

All in all, it looks fine. It doesn't look particularly fast nor memory effective, but I guess a faster version would be much more complex and hard to read.

@moorereason
Copy link
Contributor

@bep,
Can you point out where it assumes each character is a byte? Ranging on a string iterates the runes.

Can we stop creating tpl/template_something.go already? We're in the tpl directory. Feels like the file names are stuttering. I'd rather see tpl/func_truncate.go. I want to rename most of the files in the directory, but I'm afraid I'd break everyone's PRs. 😀

@bep
Copy link
Member

bep commented Jan 4, 2017

@moorereason it should be pretty obvious if you add those missing test cases. As to naming, I just follow a naming convention already there. Let us take that discussion somewhere else.

@biilmann
Copy link
Contributor Author

biilmann commented Jan 5, 2017

Spot on with the unicode comments. I've made some changes that fixes the issues with unicode slicing of the texts and that should handle languages with no spaces.

Take a look and let me know if you spot any other issues.

The two code paths between text and HTML truncation are much more similar now, and I'll want to refactor a bit to have just one path before this is ready to merge...

Add test cases for some edge cases and japanese characters
@bep
Copy link
Member

bep commented Jan 5, 2017

Yea, this looks more like a truncate func people would want to steal for their CMS project ... I will have a closer look later.

@bep
Copy link
Member

bep commented Jan 5, 2017

Here is a failing test case for you:

{2, template.HTML("<p>P1</p><p>P2</p>"), nil, template.HTML("<p>P1 …</p>"), false},

Avoid having two separate code branches for truncating text and HTML
@biilmann
Copy link
Contributor Author

biilmann commented Jan 5, 2017

Good catch - fixed that edge case and got rid of the duplication between the text and HTML truncation.


if isHTML {
// Make sure we keep tag of HTML tags
slice := string(text[i:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

text is a string now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@bep
Copy link
Member

bep commented Jan 5, 2017

There is a flaw in the tag-closing logic. This test case fails:

{3, template.HTML(strings.Repeat("<p>P</p>", 20)), nil, template.HTML("<p>P</p><p>P</p><p>P …</p>"), false},
	

@@ -662,6 +662,15 @@ e.g.
* `{{slicestr "BatMan" 3}}` → "Man"
* `{{slicestr "BatMan" 0 3}}` → "Bat"

### truncate

Truncate a text to a max length without cutting words or HTML tags in half. Since go templates are HTML aware, truncate will handle normal strings vs HTML strings intelligently.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say:

Truncate a text to a max length without cutting words or leaving unclosed HTML tags. Since Go templates are HTML-aware, truncate will handle normal strings vs HTML strings intelligently. It's important to note that if you have a raw string that contains HTML tags that you want treated as HTML, you will need to convert the string to HTML using the safeHTML template function before sending the value to truncate; otherwise, the HTML tags will be escaped by truncate.

`{{ "<em>Keep my HTML</em>" | safeHTML | truncate 10 }}` → `<em>Keep my …</em>`

@biilmann
Copy link
Contributor Author

biilmann commented Jan 6, 2017

Rewrote the tag closing logic and updated the docs. Added a few more edge case test cases as well around tag closing.

@bep bep merged commit 2989c38 into gohugoio:master Jan 6, 2017
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants