Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't trim code selection before sending to kernel #1363

Merged
merged 1 commit into from
Jul 31, 2018

Conversation

kylebarron
Copy link
Contributor

Description of the Change

Don't remove whitespace from start and end of string before sending to kernel. Would fix #862.

I spent several hours trying to figure out enough JavaScript to remove common leading whitespace (that's on this branch, and I'd submit that as a PR if preferred) when I realized that the IPython kernel can deal with common whitespace just fine:

image

The issue is that Hydrogen trims strings by default before sending them to kernels.

export function normalizeString(code: ?string) {
if (code) {
return code.replace(/\r\n|\r/g, "\n").trim();
}
return null;
}

So what IPython actually receives is:
image

Alternate Designs

I wrote code that actually removes common leading whitespace. That's on this branch.

The hydrogen-python extension package tries to deal with this but has some limitations that prevent its use.

Benefits

Selection-based Python code would work regardless of the amount of common indentation.

Possible Drawbacks

Other kernels might be expecting strings without any leading or trailing whitespace. I would imagine that kernels would account for extra whitespace if necessary and that it wouldn't be an issue.

Otherwise, to preserve true backwards compatibility we could have only the Python kernel not trim whitespace. Or have a user-defined config setting. Maintainers probably have the best idea for what is best.

Applicable Issues

Would fix #862, half of #1341, #1285.

Examples

image

image

(This is the same behavior as the IPython kernel)
image

image

@kylebarron
Copy link
Contributor Author

@nikitakit

@kylebarron
Copy link
Contributor Author

Example to show concordance with original comment on 862:

Note that in my example if I were to start my selection at the I character in Imax then the behavior should be to still throw an error because the first line in my selection doesn't have leading whitespace in common with all other lines.

image

@nikitakit
Copy link
Contributor

@kylebarron It's great that you're taking a look at this!

Regarding the patch here:

  1. Is there any situation where it would still make sense to strip whitespace from the end of the string? I can't think of one off the top of my head.
  2. It might still be useful to strip whitespace when there is only one line of non-whitespace that's being run. I'm not sure how other kernels would react to single-line statements that are indented. That said, I may not have the best idea of what other kernels do because I mostly use Python.
  3. I'm OK with failing to run incomplete selections missing the indentation on the first line. I also wouldn't be opposed to adding a workaround that implicitly expands the selection to the start of the line, if that would add whitespace characters only.
  4. Have you tested the regular "run" command, without any selections active? Ideally if you place the cursor on an "if condition:" line and hit "run", it should run the entire if statement. (Issues regarding "else" clauses are TBD)

I'd be happy merging this, assuming there aren't known regressions for other languages.

I see this as a high-priority issue because it's quite annoying and affects a very popular language. If problems come up with other languages we can make this behavior grammar-dependent. My attempts to solve this exclusively in a plugin have all run into insurmountable difficulties, so I'm very much in favor of addressing this in hydrogen itself.

@kylebarron
Copy link
Contributor Author

  1. I don't know. There are no situations that I can think of, but it would be just as easy to change .trim() to .trimRight().

  2. I don't know how other kernels would react either. I did some very basic testing with the ijavascript, R, and Julia kernels. The results were the same no matter if the text had leading indents or not.

  3. I think it might be difficult to try to figure out how much indentation to add to the first line. And it might be confusing for users. I would hope that if they didn't include initial whitespace in their selection and received an IndentationError, that they would conclude that they need to include the original whitespace. As it stands, Hydrogen sends the kernel the same thing that would be sent with a copy-paste into IPython in the terminal, and I think that's a plus for consistency in what Hydrogen runs.

  4. Yes. It seems that the current behavior of Hydrogen is to include any code that is more indented on following lines, and that hasn't changed. Pressing cmd-enter:

image

@mbroedl
Copy link

mbroedl commented Jul 30, 2018

That's a good one @kylebarron!

Looking quite good! Only thing I could find right now is that a mix of hard tabs and spaces causes Indentation Errors, but that's not supposed to be working in python anyway:
image
These two lines should help in case you feel like implementing that:

const tabSub = ' '.repeat(e.getTabLength());
code = code.replace(/\t/g, tabSub);

Another point: I would suggest moving the implementation to normalizeString as every function seems to call that before returning. That'd avoid unforeseen consequences in case getTextInRange, getRow, and getRows are called. (I don't know what the circumstances for this would be.)

RE: 3. I'm in favour of expanding the selection to the front of the line, assuming that there is only whitespace at the beginning of the line.

Only problem is that from the code now passed to hydrogen-python is that you cannot anymore expand the selection because you don't know the indentation level. This would then indeed require another API function in hydrogen. Since the whitespace is now stripped even before the loop whether or not there is a selection, I'm not sure where to put the API anymore... unless you pass the indentation-level as well?

@kylebarron
Copy link
Contributor Author

@mbroedl

Only thing I could find right now is that a mix of hard tabs and spaces causes Indentation Errors, but that's not supposed to be working in python anyway

If the mix of hard tabs and spaces confuses Python anyways, I don't think we should fix that on Hydrogen's end. Otherwise the file will work with hydrogen but not with python file.py, right?

I would suggest moving the implementation to normalizeString

The implementation is in normalizeString.

Only problem is that from the code now passed to hydrogen-python is that you cannot anymore expand the selection because you don't know the indentation level.

Since the whitespace is now stripped even before the loop whether or not there is a selection

Maybe I'm confused... The point is that no whitespace is being stripped with this change. If you select both lines and run this:

    print('hello world')
    print('foobar')

the text that Hydrogen spits out is exactly:

    print('hello world')\n    print('foobar')\n

This should make it easier to find the indentation level, since you can simply get the indentation before the first non-whitespace character.

@mbroedl
Copy link

mbroedl commented Jul 30, 2018

You're absolutely right @kylebarron . Sorry, I looked at the wrong fork, as you pointed out in the other thread! Then my remarks are void.

@kylebarron
Copy link
Contributor Author

@mbroedl Sorry! I was just trying to juxtapose them!

@nikitakit
Copy link
Contributor

Re my point 3: Let's leave this for now to try to get this fix out faster. I want to clarify, however, that the original suggestion was not to infer the leading whitespace, but rather to read it directly from the text editor (and decide to include it or not based on whether there are any non-whitespace characters in same line as the start-of-selection)

Re mixing tabs and spaces: If the whitespace in the original file is a mess, I think the correct action is for the programmer to fix it (Atom has some whitespace normalization commands, I think). I don't see a reason to do this in Hydrogen.

Re my point 2: this is still the main one I'm worried about in terms of possible regressions. I know it won't be a problem for Python, but I have no idea how other kernels handle things. But that doesn't mean this PR has to be changed, unless an actual issue is discovered.

It looks like you already tested a few of the most popular kernels with this change, and I don't have any others in mind that might be problematic. Perhaps it makes sense to sanity check on multi-syntax documents though, since those use a different code path.

Does anyone want to see a particular kernel tested, or have other comments on this PR? Personally I'm in favor of merging now, letting it get tested on master for a little while, and then releasing.

@kylebarron
Copy link
Contributor Author

@nikitakit Thanks for the clarification. And agreed re mixing tabs and spaces. I checked a Python block inside a Markdown document and that worked for indented code.

Regarding possible issues with kernels:

  1. For languages where non-end of line whitespace characters have no semantic meaning, I don't see how added non-end of line whitespace characters could have an effect.
  2. For most languages, end of line characters do have a semantic meaning. So for example, this still fails in Python:
    image
    It may be ideal to allow this to work. I would support trimming any beginning or ending \n characters from the selection, such that the above code would still work. (I, and I assume many people, have settings that delete any trailing whitespace, so selecting a whitespace line with no characters followed by an indented block could be quite common.) Also note that IPython correctly deals with non-indented whitespace after the first line:
    image
  3. This leaves languages where non-end of line whitespace characters have a semantic meaning. I tried to search for some other language where this has an effect.
    • This appears to say that in Coffeescript whitespace can be semantically important. @n-riesco You wrote the Coffeescript Jupyter kernel, no? Is that true?

    • This says

      significant whitespace is also used in Haskell, Occam, YAML, and merd. And Cache COS

      Of those, I've only heard of Haskell and YAML, and you can't run YAML. This says

      Haskell relies on indentation to reduce the verbosity of your code.

      Code which is part of some expression should be indented further in than the beginning of that expression

      Is anyone familiar with Haskell and able to test out IHaskell?

I think those three cases should form the entire effect of the change. Thoughts?

@n-riesco
Copy link
Collaborator

@kylebarron @nikitakit The use of .trim() was an unintended change during Hydrogen's transition from Coffeescript to ES6 (see #503). It's not there to fix any issues with a kernel (this is unlike the transformation to use unix line-endings (see #451) that was introduced to make IRKernel happy).

My view is that, as white-space has semantic meaning for some kernels, white-space transformations should live in language-specific plugins (like hydrogen-python).

My 👍 to merging this PR as is.

Copy link
Member

@BenRussert BenRussert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent several hours trying to figure out enough JavaScript to remove common leading whitespace

It's really cool that you looked into this and found a fix. I remember doing the same for my first PR here

@BenRussert BenRussert merged commit b419749 into nteract:master Jul 31, 2018
@BenRussert
Copy link
Member

Thanks for the PR @kylebarron! Would you like to help us maintain hydrogen?

@kylebarron
Copy link
Contributor Author

I'm interested, though at this point my js knowledge is still very weak, and I know nothing about React.

In the next couple weeks I may try to create a PR for multiple cursors #1194, which is an enhancement I'd be happy to have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request: strip _common_ leading whitespace from input before passing to kernel
5 participants