New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add truncate template function #2882

Merged
merged 7 commits into from Jan 6, 2017

Conversation

Projects
None yet
3 participants
@biilmann
Contributor

biilmann commented Jan 3, 2017

This commit adds a truncate template function for safely truncating text without
breaking words. The truncate function is HTML aware, so if the input text is a
template.HTML it will be truncated without leaving broken or unclosed HTML tags.

{{ "this is a very long text" | truncate 10 " ..." }}
{{ "With [Markdown](/markdown) inside." | markdownify | truncate 10 }}

The HTML truncation is based on Django's truncatehtml template helper, so while I would normally be really cautious about doing anything regexp based with HTML, this is a very battle tested implementation. I've also tested it on a corpus of 2600+ blog posts from a large publication.

Show outdated Hide outdated tpl/template_funcs.go
Show outdated Hide outdated tpl/template_funcs.go
Add truncate template function
This commit adds a truncate template function for safely truncating text without
breaking words. The truncate function is HTML aware, so if the input text is a
template.HTML it will be truncated without leaving broken or unclosed HTML tags.

    {{ "this is a very long text" | truncate 10 " ..." }}
    {{ "With [Markdown](/markdown) inside." | markdownify | truncate 10 }}
@biilmann

This comment has been minimized.

Show comment
Hide comment
@biilmann

biilmann Jan 4, 2017

Contributor

Thanks @moorereason I've added both performance optimizations.

Contributor

biilmann commented Jan 4, 2017

Thanks @moorereason I've added both performance optimizations.

@bep

bep reviewed Jan 4, 2017 edited

@biilmann a couple of comments/requests:

  • Pull the implementation and test (not the "smoke test") out into its own files, name it template_func_truncate.go and template_func_truncate_test.go. The files they live in now have gotten a bit on the long side.
  • The text truncate variant (I have not checked the HTML code path) assumes that every character is 1 byte, which fail pretty fast.
  • Hugo is Big in Japan ... And Japanese and the other CJK languages are inherently space-less. Needs test cases to confirm that this works.
  • The test line coverage looks ... average. I don't care about obvious error paths, but the other conditionals should be covered (or removed if not relevant).

All in all, it looks fine. It doesn't look particularly fast nor memory effective, but I guess a faster version would be much more complex and hard to read.

@moorereason

This comment has been minimized.

Show comment
Hide comment
@moorereason

moorereason Jan 4, 2017

Contributor

@bep,
Can you point out where it assumes each character is a byte? Ranging on a string iterates the runes.

Can we stop creating tpl/template_something.go already? We're in the tpl directory. Feels like the file names are stuttering. I'd rather see tpl/func_truncate.go. I want to rename most of the files in the directory, but I'm afraid I'd break everyone's PRs. 😀

Contributor

moorereason commented Jan 4, 2017

@bep,
Can you point out where it assumes each character is a byte? Ranging on a string iterates the runes.

Can we stop creating tpl/template_something.go already? We're in the tpl directory. Feels like the file names are stuttering. I'd rather see tpl/func_truncate.go. I want to rename most of the files in the directory, but I'm afraid I'd break everyone's PRs. 😀

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Jan 4, 2017

Member

@moorereason it should be pretty obvious if you add those missing test cases. As to naming, I just follow a naming convention already there. Let us take that discussion somewhere else.

Member

bep commented Jan 4, 2017

@moorereason it should be pretty obvious if you add those missing test cases. As to naming, I just follow a naming convention already there. Let us take that discussion somewhere else.

@biilmann

This comment has been minimized.

Show comment
Hide comment
@biilmann

biilmann Jan 5, 2017

Contributor

Spot on with the unicode comments. I've made some changes that fixes the issues with unicode slicing of the texts and that should handle languages with no spaces.

Take a look and let me know if you spot any other issues.

The two code paths between text and HTML truncation are much more similar now, and I'll want to refactor a bit to have just one path before this is ready to merge...

Contributor

biilmann commented Jan 5, 2017

Spot on with the unicode comments. I've made some changes that fixes the issues with unicode slicing of the texts and that should handle languages with no spaces.

Take a look and let me know if you spot any other issues.

The two code paths between text and HTML truncation are much more similar now, and I'll want to refactor a bit to have just one path before this is ready to merge...

biilmann added some commits Jan 5, 2017

Make truncate work with unicode
Add test cases for some edge cases and japanese characters
@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Jan 5, 2017

Member

Yea, this looks more like a truncate func people would want to steal for their CMS project ... I will have a closer look later.

Member

bep commented Jan 5, 2017

Yea, this looks more like a truncate func people would want to steal for their CMS project ... I will have a closer look later.

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Jan 5, 2017

Member

Here is a failing test case for you:

{2, template.HTML("<p>P1</p><p>P2</p>"), nil, template.HTML("<p>P1 …</p>"), false},
Member

bep commented Jan 5, 2017

Here is a failing test case for you:

{2, template.HTML("<p>P1</p><p>P2</p>"), nil, template.HTML("<p>P1 …</p>"), false},

biilmann added some commits Jan 5, 2017

Just 1 code branch for handling truncation
Avoid having two separate code branches for truncating text and HTML
@biilmann

This comment has been minimized.

Show comment
Hide comment
@biilmann

biilmann Jan 5, 2017

Contributor

Good catch - fixed that edge case and got rid of the duplication between the text and HTML truncation.

Contributor

biilmann commented Jan 5, 2017

Good catch - fixed that edge case and got rid of the duplication between the text and HTML truncation.

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Jan 5, 2017

Member

There is a flaw in the tag-closing logic. This test case fails:

{3, template.HTML(strings.Repeat("<p>P</p>", 20)), nil, template.HTML("<p>P</p><p>P</p><p>P …</p>"), false},
	
Member

bep commented Jan 5, 2017

There is a flaw in the tag-closing logic. This test case fails:

{3, template.HTML(strings.Repeat("<p>P</p>", 20)), nil, template.HTML("<p>P</p><p>P</p><p>P …</p>"), false},
	
@biilmann

This comment has been minimized.

Show comment
Hide comment
@biilmann

biilmann Jan 6, 2017

Contributor

Rewrote the tag closing logic and updated the docs. Added a few more edge case test cases as well around tag closing.

Contributor

biilmann commented Jan 6, 2017

Rewrote the tag closing logic and updated the docs. Added a few more edge case test cases as well around tag closing.

@bep bep merged commit 2989c38 into gohugoio:master Jan 6, 2017

4 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/wercker Wercker build passed
Details
licence/cla Contributor License Agreement is signed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment