RawContent isn't raw anymore #2601

Closed
bep opened this Issue Oct 18, 2016 · 11 comments

Comments

Projects
None yet
4 participants

@bep bep added the Bug label Oct 18, 2016

@bep bep added this to the v0.18 milestone Oct 18, 2016

@joonazan

This comment has been minimized.

Show comment
Hide comment
@joonazan

joonazan Oct 24, 2016

Contributor

I am currently working on this. Have already reproduced the bug.

Contributor

joonazan commented Oct 24, 2016

I am currently working on this. Have already reproduced the bug.

@joonazan

This comment has been minimized.

Show comment
Hide comment
@joonazan

joonazan Oct 24, 2016

Contributor

The problem is that Page.rawContent does not contain the raw content of the file. A comment states that this is to "save memory". This makes no sense, as clearly the raw content is needed.

rawContent and rawContentCopy should probably be named differently.

rawContentCopy should be eliminated altogether, because it serves no real purpose. Instead one could just not reassign rawContent. At least it should not be part of the struct, as it is only used in preparePagesForRender and the methods it calls.

I am going to do this refactoring of rawContentCopy, but what do you think I should do to rawContent?

Contributor

joonazan commented Oct 24, 2016

The problem is that Page.rawContent does not contain the raw content of the file. A comment states that this is to "save memory". This makes no sense, as clearly the raw content is needed.

rawContent and rawContentCopy should probably be named differently.

rawContentCopy should be eliminated altogether, because it serves no real purpose. Instead one could just not reassign rawContent. At least it should not be part of the struct, as it is only used in preparePagesForRender and the methods it calls.

I am going to do this refactoring of rawContentCopy, but what do you think I should do to rawContent?

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Oct 24, 2016

Member

I am going to do this refactoring of rawContentCopy, but what do you think I should do to rawContent?

I would be surprised if rawContentCopy isn't needed, as I was the one to put it there. The current modification of the content saves one copy of the entire page set, which makes a lot of sense if you have 10000 pages.

I think we should let this one sit for a little bit. One option is to deprecate the RawContent method. It was probably exported a little bit hasty, and now it is time to think about it before too many start to depend on it.

Member

bep commented Oct 24, 2016

I am going to do this refactoring of rawContentCopy, but what do you think I should do to rawContent?

I would be surprised if rawContentCopy isn't needed, as I was the one to put it there. The current modification of the content saves one copy of the entire page set, which makes a lot of sense if you have 10000 pages.

I think we should let this one sit for a little bit. One option is to deprecate the RawContent method. It was probably exported a little bit hasty, and now it is time to think about it before too many start to depend on it.

@joonazan

This comment has been minimized.

Show comment
Hide comment
@joonazan

joonazan Oct 25, 2016

Contributor

I have made a pull request that removes Page.rawContentCopy. #2634

It was only used in one method and functions that that method calls, that really should be more obviously part of the method.

Contributor

joonazan commented Oct 25, 2016

I have made a pull request that removes Page.rawContentCopy. #2634

It was only used in one method and functions that that method calls, that really should be more obviously part of the method.

@joonazan

This comment has been minimized.

Show comment
Hide comment
@joonazan

joonazan Oct 25, 2016

Contributor

I don't think the memory savings of not having the raw content are worth it. It is meant to be written by hand after all, which means you need pretty many writers to just keep up with RAM growth.

Contributor

joonazan commented Oct 25, 2016

I don't think the memory savings of not having the raw content are worth it. It is meant to be written by hand after all, which means you need pretty many writers to just keep up with RAM growth.

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Oct 25, 2016

Member

It is meant to be written by hand after all,

What do you mean?

As to being "worth it", do you have any data to back your claim?

Member

bep commented Oct 25, 2016

It is meant to be written by hand after all,

What do you mean?

As to being "worth it", do you have any data to back your claim?

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Oct 25, 2016

Member

As to numbers: https://github.com/bep/hugo-benchmark can be useful.

Member

bep commented Oct 25, 2016

As to numbers: https://github.com/bep/hugo-benchmark can be useful.

@joonazan

This comment has been minimized.

Show comment
Hide comment
@joonazan

joonazan Oct 25, 2016

Contributor

It is meant to be written by hand after all,

What do you mean?

I meant that the raw content is not automatically generated and is therefore rather limited in size.

I do not have any data on whether raw content is needed other than that there is a person who had this issue with it. Leaving out the feature can be a good idea IMO, but not because of memory optimization. Less features is a desirable goal.

Why not optimize memory? A book is about a megabyte long. For the raw content to take up a gigabyte, it'd have to be about 1k books. Sites with a lot of text are usually one book long. For example Fallen London has stated that it is 1.5M words long. Which is only a few megabytes.

The only kind of site I can think of where memory would be even significant, though not problematic is the site of a large news producer. And for that amount of content you'd want so much traffic that I don't think you'd care about wasting a gigabyte of memory.

Contributor

joonazan commented Oct 25, 2016

It is meant to be written by hand after all,

What do you mean?

I meant that the raw content is not automatically generated and is therefore rather limited in size.

I do not have any data on whether raw content is needed other than that there is a person who had this issue with it. Leaving out the feature can be a good idea IMO, but not because of memory optimization. Less features is a desirable goal.

Why not optimize memory? A book is about a megabyte long. For the raw content to take up a gigabyte, it'd have to be about 1k books. Sites with a lot of text are usually one book long. For example Fallen London has stated that it is 1.5M words long. Which is only a few megabytes.

The only kind of site I can think of where memory would be even significant, though not problematic is the site of a large news producer. And for that amount of content you'd want so much traffic that I don't think you'd care about wasting a gigabyte of memory.

@aphecetche

This comment has been minimized.

Show comment
Hide comment
@aphecetche

aphecetche Oct 31, 2016

If I may add my 2 cents, this "RawContent-not-being-raw-anymore" issue is breaking the ability to integrate with (at least) reveal.js for instance (see https://discuss.gohugo.io/t/integration-with-reveal-js-to-present-slides/3047), which I find highly annoying ;-)

If I may add my 2 cents, this "RawContent-not-being-raw-anymore" issue is breaking the ability to integrate with (at least) reveal.js for instance (see https://discuss.gohugo.io/t/integration-with-reveal-js-to-present-slides/3047), which I find highly annoying ;-)

@moorereason

This comment has been minimized.

Show comment
Hide comment
@moorereason

moorereason Dec 1, 2016

Member

An alternative approach: allow the use of markup = raw in the content front-matter which will bypass the markup renderers. Then just use .Content in the templates.

Member

moorereason commented Dec 1, 2016

An alternative approach: allow the use of markup = raw in the content front-matter which will bypass the markup renderers. Then just use .Content in the templates.

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Dec 1, 2016

Member

An alternative approach: allow the use of markup = raw

Yes, that I can like. I will first do a quick check to see what performance impact creating a copy of the content has.

Member

bep commented Dec 1, 2016

An alternative approach: allow the use of markup = raw

Yes, that I can like. I will first do a quick check to see what performance impact creating a copy of the content has.

@bep bep self-assigned this Dec 1, 2016

bep added a commit to bep/hugo that referenced this issue Dec 1, 2016

hugolib: Make RawContent raw again
This was a regression introduced in Hugo 0.17.

Fixes #2601

@bep bep closed this in #2754 Dec 1, 2016

bep added a commit that referenced this issue Dec 1, 2016

hugolib: Make RawContent raw again
This was a regression introduced in Hugo 0.17.

Fixes #2601

@bep bep referenced this issue Dec 1, 2016

Closed

Add markup = "raw" #2755

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment