-
Notifications
You must be signed in to change notification settings - Fork 10.3k
Fix/4459/excerpt formatting ignored #9716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/4459/excerpt formatting ignored #9716
Conversation
Did you check #5586 which tackles same problem?
There's no way around it I think. Unless excerpt separator is visible in markdown AST, then we could use same AST and break manually on separator (but I don't think it's there)
🙏 At some point I would like to have |
I can look to see if the separator is in the AST... One alternative is that we could just not run plugins on the extract. This is an unfortunate limitation, though it's definitely better than the current situation and might be better than potentially breaking all/half of the plugins that use the |
So the comment does make its way into the markdown AST. With that said, removing it (and everything that follows it) is not trivial. At first I thought it would be simple, but as I looked at the markdown AST data structure a bit closer, I realized that it's storing position relative to the original text. Here is an example: node { type: 'root',
children:
[ { type: 'paragraph', children: [Array], position: [Object] },
{ type: 'html', value: '<!-- end -->', position: [Object] },
{ type: 'paragraph', children: [Array], position: [Object] },
{ type: 'paragraph', children: [Array], position: [Object] } ],
position:
{ start: { line: 1, column: 1, offset: 0 },
end: { line: 6, column: 5, offset: 1013 } } } You can see that the AST stores the offsets for the beginning and ending of every node. If I were to remove a subset of nodes from the AST, I would need to update all these positions. Unfortunately, as far as I can tell, there aren't any helper utilities available online to make these AST updates for me. I think I'm going to continue with the current approach and rely on calling |
This PR is mostly ready aside from updating some snapshots. The code should be pretty much finalized. The tests need a little help. Let me give you an overview of what I've done
Looking forward to hearing your thoughts |
You should have access to it via
Yeah, this would be breaking change so would need major version bump (if merged as-is). Will think if we should do intermediate step of keeping old field (and deprecate it) and add new field (i.e |
I was able to access the
Let me know how you'd like to approach it. I'm fine creating a new GraphQL field in the interim. |
was it happening in jests tests or when you were running gatsby with your changes applied? |
Yeah it was happening through Jest tests. Sounds like they aren't set up quite correctly? |
Seems like it -
pluginOptions (which includes excerpt_separator )
I guess this wasn't needed before because |
Ah, that makes sense. Thank you. As far as a final decision on this PR goes, are you settled on creating a |
@pieh Hey, just checking in, what are you thinking for a final decision on this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few notes:
I think this is a great change. I've personally run into some weirdness with this, so I make an excerpt
field on my frontmatter that I pull from first and then fallback to excerpt made available automatically on the markdown node.
I think there are a few approaches here, all valid/useful in different ways. The overarching principle here should be that users could be relying on old behavior (e.g. using excerpt
for plain text, and then getting HTML) so we can't do this cleanly without a breaking change and a major version bump which is OK but not necessarily ideal.
To avoid that, I have a few ideas:
- Use something like remove-markdown to remove markdown characters
- This should (🤞) not really be a breaking change, and users should just see a benefit from it; unless they were relying on that field being markdown, which seems odd
- Expose a separate field (
excerptHtml
) and keepexcerpt
as is - Expose arguments, e.g.
format: 'plain'
(this should be the default) andformat: 'html'
I think I prefer the last approach. What are your thoughts @nadrane?
it sounds like you have a wayyy better understanding of the use cases of this field than I do! How are excerpts used for meta tags or SEO, just out of curiosity? So it sounds like we're in alignment to go with option 3, where we have an I'll modify the PR with these changes in mind! looking forward to getting this merged! |
@nadrane here's an example I've used in the past :) https://github.com/DSchau/blog/blob/master/src/templates/blog-post.js#L32-L34 |
@nadrane I'm looking forward to getting this merged too! Thanks for being patient and working with us on this, I think this is a great change 🎉 |
I'm just about done with this PR. I'm struggling to test code related to the I've tried a couple things
How the heck am I supposed to integrate this plugin with the greater system? Are there are docs around? |
@nadrane awesome! As far as testing the changes locally - you better believe it! Check out gatsby-dev-cli, specifically check out these instructions for getting up and running with it. Let us know if you need any help! |
@@ -373,8 +389,59 @@ module.exports = ( | |||
type: GraphQLBoolean, | |||
defaultValue: false, | |||
}, | |||
format: { | |||
type: GraphQLString, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to use an enum type here? It seems like only plain
and html
are options, and designating string could lead to subtle bugs (e.g. typos or something!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. Great catch.
alright, I think this is finally ready for review! Thanks for the gatsby-dev-cli tips. Worked like a charm |
@nadrane pulling this down to test right now. Awesome job thus far 👌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validated with this repo, particularly see this file which I used to validate the approach.
This looks great and I think is a nice, new improvement while also maintaining backwards compatibility.
One final thing I think we should do is actually document the feature. I'd recommend the README in gatsby-transformer-remark
. Sound like something we can do? Once that's done I'll go ahead and merge!
I think we're ready! |
@nadrane small little merge conflict! Want to take a look? Otherwise I can iron it out! |
@DSchau not sure how that happened. I fixed it :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks for all your work on this 💪
Holy buckets, @nadrane — we just merged your PR to Gatsby! 💪💜 Gatsby is built by awesome people like you. Let us say “thanks” in two ways:
If there’s anything we can do to help, please don’t hesitate to reach out to us: tweet at @gatsbyjs and we’ll come a-runnin’. Thanks again! |
|
This is the beginnings of a fix for gatsbyjs#4459 Right now, the fix works when excerpt_separator is provided. I have not yet implemented a fix for when the excerpt is generated using the `pruneLength` setting, though I have thoughts on how to solve it. My plan to is do a pre-order traversal over the html AST, accumulating nodes until the pruneLength is reached. I would love some preliminary feedback to get a sense of how the contributors feel about this PR so far. Here are some of my initial thoughts: 1. The work in the `excerpt` resolve function is a little hacky. We are directly invoking a variety of functions get our HTML. I'm curious if anyone has a more streamlined approach for this. 2. We are invoking `getMarkdownAST` (previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice. 3. I tore apart the `getAST` function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability. I'd love to hear people's thoughts.
This is the beginnings of a fix for #4459
Right now, the fix works when excerpt_separator is provided. I have not yet implemented a fix for when the excerpt is generated using the
pruneLength
setting, though I have thoughts on how to solve it. My plan to is do a pre-order traversal over the html AST, accumulating nodes until the pruneLength is reached.I would love some preliminary feedback to get a sense of how the contributors feel about this PR so far. Here are some of my initial thoughts:
The work in the
excerpt
resolve function is a little hacky. We are directly invoking a variety of functions get our HTML. I'm curious if anyone has a more streamlined approach for this.We are invoking
getMarkdownAST
(previously getAST) twice, once for the excerpt and once for the entire page. I'm worried that this is going to break non-idempotent existing plugins since they will now run twice.I tore apart the
getAST
function a bit and think I did a good job simplifying it. The big change is that we don't generate multiple promises anymore. i think this goes a long way towards improving readability.I'd love to hear people's thoughts.