Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

private/python_ipc_system: Convert to Cygwin path when needed. #1195

Merged
merged 1 commit into from
Aug 3, 2022

Conversation

alexvong243f
Copy link
Collaborator

@alexvong243f alexvong243f commented Jul 18, 2022

This should fix some errors when Python is running in Cygwin-like
environment. But there could still be errors in other places.

See #1182.

  • inst/private/cygpath.m: New function.
  • inst/private/python_env_is_cygwin_like.m: New function.
  • inst/private/python_ipc_system.m: Use them.

@alexvong243f
Copy link
Collaborator Author

It should work in theory. I did some tests using shim but not in a real cygwin-like
environment, as wine does not support cygwin / msys2.

@cbm755
Copy link
Collaborator

cbm755 commented Jul 18, 2022

Nice work! Maybe we can have sympref diagnose report this information too, see #1009.

@mmuetzel
Copy link
Member

mmuetzel commented Jul 19, 2022

Is there a reason why you prefer opening a pipe over using system (for this one off call)?
Does octsympy try to be Matlab compatible? Afaict, popen2 is an Octave-specific function.
On POSIX platforms, popen2 is just an Octave-wrapper for the native function of the same name. But that function doesn't exist on Windows afaict. Octave tries to emulate that by wrapping around some Windows API functions. While that should work for most cases, it might be "safer" to just use system instead. (Unless, there is a good reason for using popen2.)

Edit: Maybe also check if system ('where /Q cygpath') returns 0 (and keep a persistent result) in a first step to quickly rule out if this approach won't work anyway...

inst/private/cygpath.m Outdated Show resolved Hide resolved
@alexvong243f
Copy link
Collaborator Author

alexvong243f commented Jul 19, 2022 via email

@mmuetzel
Copy link
Member

IIUC, it is true that a malevolent user might be able to able to execute arbitrary programs by passing specifically grafted strings. But I'd say that this risk is pretty low if the string to be executed is essentially a literal. (Or the risk is basically the same with popen2, e.g., if binaries are replaced.)
On the other hand, if a user is malevolent, they might as well just use the system function directly in Octave. I don't currently see why they would try to route their commands through this package...

@alexvong243f
Copy link
Collaborator Author

alexvong243f commented Jul 19, 2022 via email

@mmuetzel
Copy link
Member

mmuetzel commented Jul 19, 2022

The input to cygpath () (at least for the changes you propose) is the output of tempname with a literal suffix. That should be pretty safe. Additionally, it is something for which fopen returned successfully. Maybe with some Unicode shenanigans it might be possible to construct a byte sequence that evaluates to something that fopen is happy with but that could still cause harm when passed as a argument to cygpath. But that is already highly theoretical.
That would still leave the "task" of getting tempname to return that byte sequence. Maybe possible???
But even if it would be possible to exploit that (and I don't know if it is), Octave in that scenario would hopefully run isolated on that web service. So, the amount of damage would hopefully be limited. (And who would use Windows for that web service? 😉)

But to get this PR back on track: Your changes look good. But at least for the sake of compatibility with Matlab, it might be better to use system imho. And maybe check pretty early on what system ('where /Q cygpath') returns (0 means it could find cygpath, anything else means it couldn't).

@cbm755
Copy link
Collaborator

cbm755 commented Jul 19, 2022

Does octsympy try to be Matlab compatible?

Yes, to some extent. But the popen2 IPC stuff is Octave-only, and is faster thansystem. Those are general answers. But here within the system-based IPC we should use system.

@cbm755 cbm755 added this to the 3.0.1 milestone Jul 20, 2022
@mmuetzel
Copy link
Member

I'd guess opening a pipe is only faster if there is repeated communication over the pipe. So, you don't need to spawn a new process each time.
For these "one time" invocations, there shouldn't be much of a difference with respect to performance. (I haven't done any measurements though.)


match = regexp(out, '.*', 'match', 'dotexceptnewline');
assert (length (match) == 1);
posix_path = match{1};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure what these lines are used for. But in case, you just want to strip a trailing newline, you could probably also use strtrim. Or am I missing something?

Copy link
Collaborator Author

@alexvong243f alexvong243f Jul 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of using strtrim / deblank but strtrim / deblank removes (horizontal) whitespaces as well from both ends. AFAIK whitespaces are pretty common in windows path. Do you know of any function which is similar to python's rstrip('\n')?

Copy link
Member

@mmuetzel mmuetzel Jul 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right: Spaces in filenames are (unfortunately) pretty common in Windows paths. But I tried to rename a file in Windows Explorer such that it ends with a space (" ") character. It looks like it is not possible to do that. That space is automatically stripped from the file name. So, using strtrim or deblank would most probably do the "correct thing". (Especially with the automatically generated file names this changeset is about.)
Using those functions might also make it a bit easier to understand what this part of the code is actually meant to be doing. (An additional comment might also help.)

Edit: Same for leading spaces.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, the current approach is too clumsy... Simplified and added comment as well.

[status, out] = system ([pyexec ' ' tmpfilename]);

if python_env_is_cygwin_like (pyexec)
converted_tmpfilename = cygpath (tmpfilename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need the original Windows path of the temporary file?
If not, you could also re-use the existing variable and simplify this portion of the code a bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly code in a functional style, avoiding unnecessary mutations, maintaining referential transparency, etc whenever possible. I would prefer using a new variable instead of overwriting old ones. (Basically, it's the same "use const by default" advice provided by many es6 coding styles.)

Sadly, we can't use conditional expression. Otherwise, it'd be a one-liner converted_tmpfilename = cygpath (tmpfilename) if python_env_is_cygwin_like (pyexec) else tmpfilename

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that is a question of style and preference.
My personal preference is to avoid else branches if they "do nothing".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong feelings here but in this case I'd probably do as @mmuetzel. A few years of Python coding as also made me adverse to else (unless needed of course!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find it more difficult to mentally keep track of all the places where a mutation (of binding or value) has occurred. I prefer to treat variable declarations as mathematical definitions, which cannot be changed in the middle of a proof (an example of programming in the functional style). Anyway, I've removed the else branch in this simple case as both of you find it less readable...


if ~isempty (python_env_is_cygwin_like_memo)
r = python_env_is_cygwin_like_memo;
elseif ispc ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I would definitely do an early return. Then if ispc (). For me, this makes the persistent semantics clearer.

BTW, I like _memo for such variables! I shall try to remember that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I would definitely do an early return. Then if ispc (). For me, this makes the persistent semantics clearer.

Make sense to me, it also makes the code look more symmetric.

BTW, I like _memo for such variables! I shall try to remember that.

The name is borrowed from memoization. I think I saw it somewhere as well. This is one of the reasons why pure function (and referential transparency) is good!

@cbm755
Copy link
Collaborator

cbm755 commented Jul 30, 2022

Thanks @alexvong1995 and @mmuetzel for detailed reviewing. Merge this whenever you are both happy.

end

%% validate output
assert (ischar (out) && logical (regexp (out, '^[^\r\n]+[\r]?[\n]$')));
Copy link
Member

@mmuetzel mmuetzel Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second part of this condition might be too specific. I'm not sure if the second output argument of system should really end with a new line character.
That might actually be a bug in Octave. (But I didn't come around to investigate.)

Maybe better check that the string after removal of the (potential) trailing newline is non-empty?

Edit: Thinking of it: Do you need the first part of that assertion? Isn't system guaranteed to return a char vector as the second output argument (potentially empty though)? Would assert (~ isempty (out)) be better here?

Copy link
Collaborator Author

@alexvong243f alexvong243f Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second part of this condition might be too specific. I'm not sure if system should really add a new line to the end of the out string.
That might actually be a bug in Octave. (But I didn't come around to investigate.)

I would expect a console program (cygpath in our case) to append exactly 1 newline to the output path. Otherwise, it would be unreadable to human. (0 or 1 is plausible, > 1 would be a bug)

Maybe better check that the string after removal of the (potential) trailing newline is non-empty?

Indeed, it must be chars according to documentation. We can validate the path at the end, which is actually safer.

error ('python_env_is_cygwin_like: %s exited with status %d', ...
pyexec, status);
end
assert (ischar (out));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to your other recent change, this assertion might not be needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now updated.

This should fix some errors when Python is running in Cygwin-like
environment. But there could still be errors in other places.

See gnu-octave#1182.

* inst/private/cygpath.m: New function.
* inst/private/python_env_is_cygwin_like.m: New function.
* inst/private/python_ipc_system.m: Use them.
@mmuetzel
Copy link
Member

mmuetzel commented Aug 2, 2022

LGTM.

@alexvong243f alexvong243f merged commit e598ebe into gnu-octave:main Aug 3, 2022
@alexvong243f alexvong243f deleted the cygwin-python branch August 15, 2022 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants