Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MarkdownFixer] Do not escape username with underscores in code or codeblocks #5785

Conversation

JuanitoFatas
Copy link
Contributor

@JuanitoFatas JuanitoFatas commented Jan 28, 2020

What type of PR is this? (check all applicable)

  • Refactor

  • Feature

  • Bug Fix

  • Optimization

  • Documentation Update

  • Tests are passing

    bundle exec rspec spec/labor/markdown_fixer_spec.rb spec/requests/api/v0/articles_spec.rb spec/services/articles/creator_spec.rb spec/services/mentions/create_all_spec.rb
  • I‘ve verified it works on development
    starts the server at local, sign in, new post, copy the following to preview and see if things work as expected. You should not see \ in code and codeblock, and underscored username renders correctly.

    Code for testing at development
    @_dev_
    
    ```
    @_dev_
    ```
    
    `@_dev_`
    
    @_dev_
    

Description

The underscored username got escaped in code and code blocks. We probably do not want that. The fix is only to escape underscored username if they are not in code or code blocks.

Other possible approaches:

  • Write regexp that only pull out underscored usernames not in code or code blocks
  • Turn markdown into HTML and use Nokogiri to work on text of ancestors not pre or code

Related Tickets & Documents

Fixes #5739.

Screenshots

Current on dev (2019.01.28, 26e4412)

スクリーンショット 2020-01-28 17 41 26

After this Pull Request (tested against 432885f)

スクリーンショット 2020-02-09 22 49 42

スクリーンショット 2020-02-09 22 36 07

Added to documentation?

  • docs.dev.to
  • readme
  • no documentation needed
  • code comments added

[optional] What gif best describes this PR or how it makes you feel?

@pr-triage pr-triage bot added the PR: unreviewed bot applied label for PR's with no review label Jan 28, 2020
@claassistantio
Copy link

claassistantio commented Jan 28, 2020

CLA assistant check
All committers have signed the CLA.

@JuanitoFatas JuanitoFatas force-pushed the markdown_fixer/not-escape-for-code-or-codeblocks branch from d486f87 to 6591cb7 Compare January 28, 2020 08:37
@rhymes rhymes changed the title [MarkdownFixer] Do not escape underscores in code or codeblocks [WIP] [MarkdownFixer] Do not escape underscores in code or codeblocks Jan 28, 2020
@pr-triage pr-triage bot removed the PR: unreviewed bot applied label for PR's with no review label Jan 28, 2020
@JuanitoFatas JuanitoFatas force-pushed the markdown_fixer/not-escape-for-code-or-codeblocks branch from 6591cb7 to 5315367 Compare January 28, 2020 13:13
@JuanitoFatas JuanitoFatas changed the title [WIP] [MarkdownFixer] Do not escape underscores in code or codeblocks [MarkdownFixer] Do not escape underscores in code or codeblocks Jan 28, 2020
@pr-triage pr-triage bot added the PR: unreviewed bot applied label for PR's with no review label Jan 28, 2020
@JuanitoFatas JuanitoFatas changed the title [MarkdownFixer] Do not escape underscores in code or codeblocks [WIP] [MarkdownFixer] Do not escape username with underscores in code or codeblocks Jan 28, 2020
@pr-triage pr-triage bot removed the PR: unreviewed bot applied label for PR's with no review label Jan 28, 2020
@JuanitoFatas JuanitoFatas changed the title [WIP] [MarkdownFixer] Do not escape username with underscores in code or codeblocks [MarkdownFixer] Do not escape username with underscores in code or codeblocks Jan 29, 2020
@pr-triage pr-triage bot added the PR: unreviewed bot applied label for PR's with no review label Jan 29, 2020
@rhymes rhymes requested review from maestromac and Zhao-Andy and removed request for rhymes January 30, 2020 13:20
Zhao-Andy
Zhao-Andy previously approved these changes Feb 3, 2020
Copy link
Contributor

@Zhao-Andy Zhao-Andy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks for the fix!

@pr-triage pr-triage bot added PR: reviewed-approved bot applied label for PR's where reviewer approves changes and removed PR: unreviewed bot applied label for PR's with no review labels Feb 3, 2020
@Zhao-Andy
Copy link
Contributor

I assume this PR is good because all tests are passing now, but @JuanitoFatas let me know if I'm mistaken. :)

@JuanitoFatas JuanitoFatas force-pushed the markdown_fixer/not-escape-for-code-or-codeblocks branch from 5315367 to 477f152 Compare February 4, 2020 00:30
@pr-triage pr-triage bot added PR: unreviewed bot applied label for PR's with no review and removed PR: reviewed-approved bot applied label for PR's where reviewer approves changes labels Feb 4, 2020
@JuanitoFatas
Copy link
Contributor Author

@Zhao-Andy

I assume this PR is good because all tests are passing now, but @JuanitoFatas let me know if I'm mistaken. :)

Rebased with master and verified it works ✅

スクリーンショット 2020-02-04 09 27 52

and tests passed 🎉

スクリーンショット 2020-02-04 09 32 22

:shipit:

Zhao-Andy
Zhao-Andy previously approved these changes Feb 4, 2020
@pr-triage pr-triage bot added PR: reviewed-approved bot applied label for PR's where reviewer approves changes and removed PR: unreviewed bot applied label for PR's with no review labels Feb 4, 2020
@mstruve mstruve requested a review from rhymes February 6, 2020 18:50
Copy link
Contributor

@rhymes rhymes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @JuanitoFatas, markdown parsing bugs are always complicated to fix, your solution seems very clever!

I left notes about adding comments to the new code as node traversal and regexp aren't usually super intuitive for future new readers.

ps. don't forget to prefix your branches with your name in the future, you can checkout the contributing guide for details.

app/labor/markdown_fixer.rb Outdated Show resolved Hide resolved
app/labor/markdown_fixer.rb Show resolved Hide resolved
@JuanitoFatas JuanitoFatas force-pushed the markdown_fixer/not-escape-for-code-or-codeblocks branch from 477f152 to 432885f Compare February 9, 2020 13:35
@pr-triage pr-triage bot added PR: unreviewed bot applied label for PR's with no review and removed PR: reviewed-approved bot applied label for PR's where reviewer approves changes labels Feb 9, 2020
@JuanitoFatas
Copy link
Contributor Author

@rhymes

Thanks for the review. I‘ve added comments and changed the implementation to a less clever one 😉 . I also pulled the node traverser into a separate file, hope that make it easier to read. Verified this works and updated the screenshot in the Pull Request accordingly 🎉

@@ -47,11 +47,29 @@ def split_tags(markdown)
end

def underscores_in_usernames(markdown)
markdown.gsub(/(@)(_\w+)(_)/, '\1\\\\\2\\\\\3')
return markdown unless markdown.include?("@")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guard clause has to go through all of markdown to determine whether or not to return, so I don't think it saves us much work. If we skip this check we go through the document once, if we keep it we go through the document once + the percentage of text preceding the username.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are number of tests that pass "", front matters, or text that do not have username at all (without @) to here. These make sense in real world. Hence checking for @ here to avoid doing work for irrelevant cases and all tests are happy.

Copy link
Contributor

@citizen428 citizen428 Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reason for the check, my only slight concern was this:

Case 1: markdowndoesn't contain an @: we're still going to compare every character in the document to @ for the include? check. I'd guess a majority of articles won't have usernames in them.

Case 2. markdown contains an @: we read markdown once until we find the @, then traverse again.

Maybe the guard clause should actually make use of the regular expression you defined, something like

return markdown unless markdown =~ USERNAME_WITH_UNDERSCORE_REGEXP

as that's actually what we care about, since the @ sign does not necessarily indicate a username (in some languages it's used for array concatenation for example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey checking against the regexp makes sense. I‘ve updated the code.

app/labor/markdown_fixer.rb Outdated Show resolved Hide resolved
app/labor/markdown_fixer.rb Show resolved Hide resolved
app/labor/markdown_traverser.rb Show resolved Hide resolved
Copy link
Contributor Author

@JuanitoFatas JuanitoFatas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏓

@@ -47,11 +47,29 @@ def split_tags(markdown)
end

def underscores_in_usernames(markdown)
markdown.gsub(/(@)(_\w+)(_)/, '\1\\\\\2\\\\\3')
return markdown unless markdown.include?("@")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are number of tests that pass "", front matters, or text that do not have username at all (without @) to here. These make sense in real world. Hence checking for @ here to avoid doing work for irrelevant cases and all tests are happy.

app/labor/markdown_fixer.rb Outdated Show resolved Hide resolved
app/labor/markdown_fixer.rb Show resolved Hide resolved
app/labor/markdown_traverser.rb Show resolved Hide resolved
Copy link
Contributor

@citizen428 citizen428 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bunch more comments, but don't let any of them stop you.

It only runs when markdown has underscored usernames.

Skip escaping underscore when content is in code or codeblock. It works by
going through all lines of a markdown content. Find underscored username.
Escape if we are not in the codeblock.

The (?<!) is negative lookbehind which rules out the case
when a underscored username is present in code, e.g., `@_dev_` will not be escaped.
@JuanitoFatas JuanitoFatas force-pushed the markdown_fixer/not-escape-for-code-or-codeblocks branch from 0e92d06 to 2ac92e8 Compare February 13, 2020 14:32
Copy link
Contributor

@citizen428 citizen428 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now, thanks for updating the guard clause.

Copy link
Contributor

@maestromac maestromac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very awesome solution. LGTM!

@maestromac maestromac merged commit cc09ed7 into forem:master Feb 25, 2020
@pr-triage pr-triage bot added PR: merged bot applied label for PR's that are merged and removed PR: unreviewed bot applied label for PR's with no review labels Feb 25, 2020
@JuanitoFatas JuanitoFatas deleted the markdown_fixer/not-escape-for-code-or-codeblocks branch September 16, 2020 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: merged bot applied label for PR's that are merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Code snippet rendering issue related to underscore
6 participants