Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting issue for clojure code - hugely spaced out #821

Closed
rsslldnphy opened this issue Oct 1, 2013 · 22 comments
Closed

Formatting issue for clojure code - hugely spaced out #821

rsslldnphy opened this issue Oct 1, 2013 · 22 comments

Comments

@rsslldnphy
Copy link

Clojure code seems to be being split onto many many lines at the moment with whitespace in between, but no other languages seem to be affected. Screenshot attached.
screen shot 2013-10-01 at 21 32 18

@rsslldnphy
Copy link
Author

From what I can see from the HTML this isn't caused by some errant CSS; lots of line-breaks seem to have been introduced somehow into the submission code/markup itself, and aren't there in the code examples in other languages.

Not sure if this is happening at render time or submit time, but maybe someone with access to the DB could check? Either way, seems strange it's only happening for Clojure.

NB. This exercise was submitted with the gem if that makes any difference. I'll change to using the Go client and see if it works then.

@kytrinyx
Copy link
Member

kytrinyx commented Oct 1, 2013

Huh. Very, very odd. It might be the markup/cleanup/sanitization dance that is introducing newlines and such.

I doubt that the ruby gem vs go client will make a difference. Both of these simply read the code in from the file and submit it raw to the API.

@rsslldnphy
Copy link
Author

Sounds implausible it's the gem then. Was just looking at that sanitization commit, and seems to match up reasonably well time-wise with when I first noticed the problem. The only difference I can see is the introduction of Loofah.

@kytrinyx
Copy link
Member

kytrinyx commented Oct 1, 2013

Right now I'm struck by the "hard problems" list: Time Zones, Cache Invalidation, Input Sanitation, and Naming Things. I can't believe how much churn we've had in this part of the code.

This has got to be a solved problem, I just have to ask someone who is smarter than I am :)

@rsslldnphy
Copy link
Author

I thought there were only two hard problems left? Cache invalidation, naming, and off by one errors?

@kytrinyx
Copy link
Member

kytrinyx commented Oct 1, 2013

Clearly I can't count!

@DimaD
Copy link

DimaD commented Oct 5, 2013

Hi,

I investigated the cause a little bit and tracked it down to usage of loofah in Sinatra::MarkdownHelper#md.

First, let's look at how code rendering is organized in HTML. Markdown processor generates
pre which has table with 2 columns: for line numbers and for lines of code.

In the first column each line number is wrapped into div.lineno. These divs have display: block on them so they align into one column where each line with numbers have height of exactly one line of text in the column with code.

In the second column we have our code in which each lexical unit is wrapped into span tag with different classes for keywords/constants and what-not. For each line break presented in the original source code there is line break in the rendered representation as well. It is important to keep them because the whole table is wrapped into pre tag and these breaks actually cause text to be broken into strings and this causes code to align with line numbers.

Now let's look at the following code in clojure:

(defn to-rna [dna]
  (string/replace dna "T" "U"))

Markdown processor generates the following output:

<pre class="highlight clojure"><table><tbody><tr><td class="gutter gl"><div class="lineno">1</div><div class="lineno">2</div></td><td class="code"><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">to-rna</span><span class="w"> </span><span class="p">[</span><span class="n">dna</span><span class="p">]</span><span class="w">
  </span><span class="p">(</span><span class="nf">string/replace</span><span class="w"> </span><span class="n">dna</span><span class="w"> </span><span class="s">&quot;T&quot;</span><span class="w"> </span><span class="s">&quot;U&quot;</span><span class="p">))</span><span class="w">
</span></td></tr></tbody></table></pre>

This HTML renders 2 line numbers and 2 lines with code as we expect it. Now loofah gem kicks in to escape possibly malicious HTML code:

Loofah.xml_fragment(html).scrub!(:escape).to_s

The unexpected consequence of this step is that loofah removes all the original formatting of HTML which is crucial in our case. We get

<pre class="highlight clojure">
  <table>
    <tbody>
      <tr>
        <td class="gutter gl">
          <div class="lineno">1</div>
          <div class="lineno">2</div>
        </td>
        <td class="code">
          <span class="p">(</span>
          <span class="k">defn</span>
          <span class="w"> </span>
          <span class="n">to-rna</span>
          <span class="w"> </span>
          <span class="p">[</span>
          <span class="n">dna</span>
          <span class="p">]</span>
          <span class="w">
  </span>
          <span class="p">(</span>
          <span class="nf">string/replace</span>
          <span class="w"> </span>
          <span class="n">dna</span>
          <span class="w"> </span>
          <span class="s">"T"</span>
          <span class="w"> </span>
          <span class="s">"U"</span>
          <span class="p">))</span>
          <span class="w">
</span>
        </td>
      </tr>
    </tbody>
  </table>
</pre>

As you see each lexical token gets it's own line and it breaks the display completely

Now to make everything complicated let's look at some haskell code:

responseFor message | isSilence message  = "Fine. Be that way!"
                    | isShouting message = "Woah, chill out!"
                    | isQuestion message = "Sure."
                    | otherwise          = "Whatever."

isSilence  = all C.isSpace
isShouting = presentAnd (all C.isUpper) . filter C.isLetter
isQuestion = isSuffixOf "?"

Markdown processor generates the following:

<pre class="highlight haskell"><table><tbody><tr><td class="gutter gl"><div class="lineno">1</div><div class="lineno">2</div><div class="lineno">3</div><div class="lineno">4</div><div class="lineno">5</div><div class="lineno">6</div><div class="lineno">7</div></td><td class="code"><span class="n">responseFor</span> <span class="n">message</span> <span class="o">|</span> <span class="n">isSilence</span> <span class="n">message</span>  <span class="o">=</span> <span class="s">&quot;Fine. Be that way!&quot;</span>
                    <span class="o">|</span> <span class="n">isShouting</span> <span class="n">message</span> <span class="o">=</span> <span class="s">&quot;Woah, chill out!&quot;</span>
                    <span class="o">|</span> <span class="n">isQuestion</span> <span class="n">message</span> <span class="o">=</span> <span class="s">&quot;Sure.&quot;</span>
                    <span class="o">|</span> <span class="n">otherwise</span>          <span class="o">=</span> <span class="s">&quot;Whatever.&quot;</span>
<span class="n">isSilence</span>  <span class="o">=</span> <span class="n">all</span> <span class="kt">C</span><span class="o">.</span><span class="n">isSpace</span>
<span class="n">isShouting</span> <span class="o">=</span> <span class="n">presentAnd</span> <span class="p">(</span><span class="n">all</span> <span class="kt">C</span><span class="o">.</span><span class="n">isUpper</span><span class="p">)</span> <span class="o">.</span> <span class="n">filter</span> <span class="kt">C</span><span class="o">.</span><span class="n">isLetter</span>
<span class="n">isQuestion</span> <span class="o">=</span> <span class="n">isSuffixOf</span> <span class="s">&quot;?&quot;</span>
</td></tr></tbody></table></pre>

And when we pipe the html through loofah it changes formatting but it keeps all line breaks in their places in the second column:

<pre class="highlight haskell">
  <table>
    <tbody>
      <tr>
        <td class="gutter gl">
          <div class="lineno">1</div>
          <div class="lineno">2</div>
          <div class="lineno">3</div>
          <div class="lineno">4</div>
          <div class="lineno">5</div>
          <div class="lineno">6</div>
          <div class="lineno">7</div>
        </td>
        <td class="code"><span class="n">responseFor</span> <span class="n">message</span> <span class="o">|</span> <span class="n">isSilence</span> <span class="n">message</span>  <span class="o">=</span> <span class="s">"Fine. Be that way!"</span>
                    <span class="o">|</span> <span class="n">isShouting</span> <span class="n">message</span> <span class="o">=</span> <span class="s">"Woah, chill out!"</span>
                    <span class="o">|</span> <span class="n">isQuestion</span> <span class="n">message</span> <span class="o">=</span> <span class="s">"Sure."</span>
                    <span class="o">|</span> <span class="n">otherwise</span>          <span class="o">=</span> <span class="s">"Whatever."</span>
<span class="n">isSilence</span>  <span class="o">=</span> <span class="n">all</span> <span class="kt">C</span><span class="o">.</span><span class="n">isSpace</span>
<span class="n">isShouting</span> <span class="o">=</span> <span class="n">presentAnd</span> <span class="p">(</span><span class="n">all</span> <span class="kt">C</span><span class="o">.</span><span class="n">isUpper</span><span class="p">)</span> <span class="o">.</span> <span class="n">filter</span> <span class="kt">C</span><span class="o">.</span><span class="n">isLetter</span>
<span class="n">isQuestion</span> <span class="o">=</span> <span class="n">isSuffixOf</span> <span class="s">"?"</span>
</td>
      </tr>
    </tbody>
  </table>
</pre>

I have not dive deeper yet to figure out why it happens with clojure and does not happens with haskell but will do as soon as I have time.

@kytrinyx
Copy link
Member

kytrinyx commented Oct 5, 2013

Thank you for the in-depth research and write-up.

@rsslldnphy
Copy link
Author

Ok, have done some poking around and - and this is very strange, but I think I've narrowed down what's causing the problem.

The difference between the clojure code and the haskell code is that the clojure code represents whitespace as a <span class="w"> </span> and has no actual whitespace between any of the span elements. The Haskell code however just uses actual whitespace between span elements to represent whitespace in the code.

I wouldn't have thought this would affect anything, but apparently it does. Running the below two nearly identical fragments of code through Loofah in irb gives two very different results. The ONLY difference between them is that one of them has a single space between two of its spans and the other has no spaces. So a single missing character of whitespace is to blame!

# input with no spaces between spans
<pre class="highlight"><span class="p">(</span><span class="k">defn</span><span class="w"></span><span class="n">to-rna</span><span class="p">)</span></pre>

# output has loads of newlines added
<pre class=\"highlight\">\n  <span class=\"p\">(</span>\n  <span class=\"k\">defn</span>\n  <span class=\"w\"/>\n  <span class=\"n\">to-rna</span>\n  <span class=\"p\">)</span>\n</pre>\n

# input with only one char difference, a single space between two of spans
<pre class="highlight"><span class="p">(</span><span class="k">defn</span><span class="w"></span><span class="n">to-rna</span><span class="p">)</span></pre>

# output retains its newlines correctly
<pre class=\"highlight\"><span class=\"p\">(</span><span class=\"k\">defn</span> <span class=\"w\"> </span> <span class=\"n\">to-rna</span><span class=\"p\">)</span></pre>\n

So to fix I guess we need to work out a) why loofah is working in this seemingly bizarre way and/or b) why the clojure syntax highlighter seems to do whitespace differently to the other syntax highlighters. Then work out which is more appropriate to fix!

@kytrinyx
Copy link
Member

kytrinyx commented Oct 5, 2013

Wow, nice find!

@rsslldnphy
Copy link
Author

Ok, so on a hunch I tried in irb using Loofah.fragment instead of Loofah.xml_fragment and... it worked! No weird newlines. So, is there a reason that xml_fragment is being used instead of just fragment? If not I'll create a pull request to make that change.

@rsslldnphy
Copy link
Author

So I started on this but it turns out that if you use fragment instead of xml_fragment, the pre tag is closed by Loofah before the table tag is opened - presumably because it's technically not valid html to have a table inside a pre, or so the HTML5 validation thingy tells me. This then obviously messes up the markup.

The table inside pre thing is something that is added by Rouge, so I'll raise an issue there. From a google for Pygments I'm pretty sure this layout isn't required; the HTML formatter here for example puts the pre tags inside the tds which makes sense.

@etrepum
Copy link

etrepum commented Oct 7, 2013

I don't think this is fixed, I just resubmitted clojure code and it still doesn't look right. Does it need to get deployed or something? http://exercism.io/submissions/52524178140e1d8f61000024

@kytrinyx
Copy link
Member

kytrinyx commented Oct 7, 2013

I haven't deployed. Doing so now.

@kytrinyx
Copy link
Member

kytrinyx commented Oct 7, 2013

Eh. I have failing tests. Give me a few minutes. It should go up shortly, though.

@rsslldnphy
Copy link
Author

I think @jayferd has changed the markup slightly since merging in my pull request so there is a slight difference between my branch and rouge master - I think it's just that the linenos are no longer in divs if that helps.

@kytrinyx
Copy link
Member

kytrinyx commented Oct 7, 2013

Yeah, that's what I'm seeing.

@kytrinyx
Copy link
Member

kytrinyx commented Oct 7, 2013

... and deployed :) Thanks for all the help with this!

@rsslldnphy
Copy link
Author

🎉 Hooray! Thanks. Looking forward to getting back onto the clojure exercises :-) 🎉

@kytrinyx
Copy link
Member

kytrinyx commented Oct 7, 2013

:) So much excitement! 💖

@DimaD
Copy link

DimaD commented Oct 7, 2013

Thanks! 👍

@etrepum
Copy link

etrepum commented Oct 7, 2013

Woot!

DimaD pushed a commit to DimaD/exercism.io that referenced this issue Oct 9, 2013
…HTML

  * Use Loofah::fragment instad of Loofah::xml_fragment as covered in issue exercism#821
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants