Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fix backtick code block module #5441

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

magnus-ISU
Copy link

fixes #5223

One of the greatest pains to using old reddit ever

@magnus-ISU
Copy link
Author

magnus-ISU commented Dec 21, 2022

Problems:

  • textContent doesn't seem to be accurate actually. See here
  • What is the correct PageType for this to work on? Just comments or commentsLinkList or commentsProfilePage too? I don't know. Should I just remove it?

@magnus-ISU
Copy link
Author

hmm. The regex wasn't the problem. It works on some things, but not always. textContent is very crappy, maybe parsing the json does make more sense.

@serg06
Copy link

serg06 commented Dec 22, 2022

Hello! I haven't worked with RES before but I'm ready to pitch in.

To clarify the textContent issue: You're trying to grab the raw post/comment text so that you can parse it with markdown?

If so: Why don't you grab the raw text from the textarea that follows the post?

image

@magnus-ISU
Copy link
Author

That appears only to exist if you wrote the comment I think? Or doesn't exist on a fresh browser in any case

@serg06
Copy link

serg06 commented Dec 22, 2022

Ah good call. It appears when you click source, which grabs it from the json.

Sounds like json could work. Do you have a way to grab it from your code?

@magnus-ISU
Copy link
Author

Just need to find the correct link, just adding .json to the permalink appears to return json for the whole comments page which I don't think is right. The sourceSnudown module does some random shit, I'm trying to understand it.

@magnus-ISU
Copy link
Author

Got it working. I don't know exactly what the code is doing still to be honest, but it works for both comments and posts.

It doesn't work for post previews in a normal subreddit. Unsure why exactly, it seems watchForThings doesn't call my function for those at all. But sourcesnudown does show source buttons for them.

@serg06
Copy link

serg06 commented Dec 22, 2022

the permalink appears to return json for the whole comments page

You can add limit: 1 to your query to limit your results.


Got it working

Nice! How do I test it out, do I need to do the setup from CONTRIBUTING.md?

@magnus-ISU
Copy link
Author

Go to my repo, clone it,

NODE_OPTIONS=--openssl-legacy-provider yarn start --env browsers=firefox

then install the folder from dist/ as a temporary addon.

Or using chrome, don't pass in --env browsers=firefox and figure out how to load the chrome extension

@magnus-ISU
Copy link
Author

Presently it only works on comment pages. I am investigating and it seems that on a subreddits main page (example: reddit.com/r/test) the entry will be visited. At this moment, the text of the post is not loaded yet. So it does not find any markdown container, and it is skipped.

Then when you expand it, in the div class="expando" the body of the post is loaded in there via network request. But it does not trigger an update from the extension. The extension should add some sort of listener to those expandos so it can reformat posts on subreddit pages too. But that can be an enhancement, at this moment I am happy for this to be merged.

@serg06
Copy link

serg06 commented Dec 22, 2022

It's working! Woo!

@serg06
Copy link

serg06 commented Dec 22, 2022

It doesn't work for styled code blocks, i.e.:

```js
console.log('test');
.``` (ignore the period)

Displays as

js
console.log('test');

When it should display as

console.log('test');

@magnus-ISU
Copy link
Author

Dang. I knew I shouldn't trust regexes from random people which I don't understand

@serg06
Copy link

serg06 commented Dec 22, 2022

Some ideas:

  • We could use a markdown syntax parsing library to parse the text, then converting that to snudown format.

  • We could use a markdown->html library and skip snudown altogether.

@magnus-ISU
Copy link
Author

  • We could fix the regex. I don't want to add dependencies

@serg06
Copy link

serg06 commented Dec 22, 2022

Any particular reason to avoid dependencies?

@magnus-ISU
Copy link
Author

Not my project, I'm just a random dude coming into it. If a dependency should be added that would probably have been brought up by more experienced maintainers discussing this.

Anyway, its not the regex's fault, it's that reddit changes

\`\`\` some stuff
and some more
stuff\`\`\`
```

Into `<code>some stuff and some more stuff</code>` an inline code block.

I guess we can make more network requests for any comment/post which includes a `<code>` block at all.

@serg06
Copy link

serg06 commented Dec 22, 2022

Aside: I don't see any reason why we'd need maintainer approval for that; they don't mention it in CONTRIBUTING.md either. That said, I'm happy to leave it alone since this is still a major improvement.

Also this regex should handle styled code blocks: /```(?:[a-zA-Z0-9]*\n)?([\s\S]+?)```/gm

@magnus-ISU
Copy link
Author

The current regex is fine - it accepts anything already. The trouble was as I described above. I pushed - you'll note though that multiple code blocks in a row look like trash. I will see what I can do about that.

@magnus-ISU
Copy link
Author

Hmm. backticks in four-space code lines are messed up by the regex. I am more and more tempted to just line-by-line parse it as in my first prototype.

@serg06
Copy link

serg06 commented Dec 22, 2022

Nice job on the styled MD fix. Seems to match Github's markdown implementation.

@serg06
Copy link

serg06 commented Dec 22, 2022

If I understand correctly, you'd like this:

    ```
    test
    ```

to display as

.```
test
.```

(minus periods) ?

@magnus-ISU
Copy link
Author

Yes. Look at how new reddit renders my comment

@serg06
Copy link

serg06 commented Dec 22, 2022

I see. How about enforcing that backticks must start at the beginning of a line?

/^```([\s\S]+?)^```/gm

@magnus-ISU
Copy link
Author

I do think that would be a good idea. I think I tried that regex in the past, but before anything worked, so it slipped my mind.

@magnus-ISU
Copy link
Author

Alright. It still does not render everything exactly the same as new reddit, but it seems to be related to behavior of triple backticks becoming single backticks by snudown, not anything I'm doing, which no one uses anyway, so should be fine. And/or is related to new reddit not requiring closing code fence which is a mistake on reddit's part, so I don't care.

@serg06
Copy link

serg06 commented Dec 22, 2022

Another option is to only block the case where backticks are preceded by 4 spaces.

/(?<! {4}|\t)```([\s\S]+?)```/gm

This one might be better since it doesn't break things like:

this right here ``` code ``` is a code block

@serg06
Copy link

serg06 commented Dec 22, 2022

Not sure - whatever's more consistent is better I guess.

@magnus-ISU
Copy link
Author

That shouldn't be a code block.

A code fence is a sequence of at least three consecutive backtick characters (`) or tildes (~). (Tildes and backticks cannot be mixed.) A fenced code block begins with a code fence, preceded by up to three spaces of indentation.

The line with the opening code fence may optionally contain some text following the code fence; this is trimmed of leading and trailing spaces or tabs and called the info string. If the info string comes after a backtick fence, it may not contain any backtick characters. (The reason for this restriction is that otherwise some inline code would be incorrectly interpreted as the beginning of a fenced code block.)

The content of the code block consists of all subsequent lines, until a closing code fence of the same type as the code block began with (backticks or tildes), and with at least as many backticks or tildes as the opening code fence. If the leading code fence is preceded by N spaces of indentation, then up to N spaces of indentation are removed from each line of the content (if present). (If a content line is not indented, it is preserved unchanged. If it is indented N spaces or less, all of the indentation is removed.)

The closing code fence may be preceded by up to three spaces of indentation, and may be followed only by spaces or tabs, which are ignored.

I mean commonmark has so many backwards compatibility rules and weird things anyway that what it says isn't exactly too important, but there is no way that should become a code block.

@magnus-ISU
Copy link
Author

For example, I do not care that a code fence may be preceded by up to 3 spaces of indentation.

@magnus-ISU
Copy link
Author

Okay I think I'm happy now, going to wait to see what someone who can merge this thinks. @honestbleeps ?

@serg06
Copy link

serg06 commented Dec 22, 2022

In terms of efficiency: If you visit a thread with 500 comments with code blocks, you'd make 500 ajax requests. That doesn't seem okay.

Step 1: Since the HTTP request returns data for a ton of comments, why don't we cache all that data and re-use it later?

Step 2: We could add an async mutex to prevent parallel requests.

@magnus-ISU
Copy link
Author

Good point. The http request only returns data for 1 comment though, it is limiting it to 1 somehow which is how it always gets the right one. I think, if that is wrong oops.

But a mutex sounds like a good idea then

@serg06
Copy link

serg06 commented Dec 23, 2022

Iirc the request returns data for a bunch of child comments too, the data is tree shaped.

@calops
Copy link

calops commented Apr 23, 2024

Is there any hope to see this merged one day? At this point I'd be OK with the performance hit if it's an opt-in feature. Threads with a huge amount of code blocks are rare anyway, and they won't all be loaded at the same time.

@JuhaJGamer
Copy link

Is there any hope to see this merged one day? At this point I'd be OK with the performance hit if it's an opt-in feature. Threads with a huge amount of code blocks are rare anyway, and they won't all be loaded at the same time.

Please. It's been two years. Add it as an opt-in, completely gut it and just transform any triple-backticks to code tags, I don't care, I have infinite data. The code is right there, it works, it's inefficient, but I want it on my computer without having to custom build RES.

@benmcgarry
Copy link
Collaborator

benmcgarry commented May 20, 2024

This PR is incomplete, if you want it merged feel free to raise a new one with the actions completed. We are not going to add a half-complete feature on a project thats KTLO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] support for markdown code blocks on old.reddit.com
5 participants