yaml parsing: End_of_file exception on Windows #2991
I added some logging to semgrep -e "eval(...)" --lang=py demo.py the full command (
I can put a
(This is in Git Bash so I can use
But I noticed that the rules file uses CRLF Windows newlines. When I convert to Unix newlines via
(Above, SO. I can readily change the Python code to replace the contents of But this new error message seems to say that semgrep-core is writing to /tmp which of course doesn't exist in Windows. I see that
Can you help me find out who is attempting to write to (I can also prepare a PR to semgrep to ensure that the files sent to semgrep-core have Unix line endings, or would you prefer to ensure that the OCaml file readers are cross-platform?)
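The first option, normalizing line endings before semgrep-core sees the files, could look roughly like this minimal Python sketch. The helper names `normalize_newlines` and `to_unix_line_endings` are hypothetical and not part of semgrep. The sketch works on raw bytes so that no platform-specific newline translation can reintroduce CRLF.

```python
import tempfile
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF (lone CRs are left alone)."""
    return data.replace(b"\r\n", b"\n")


def to_unix_line_endings(path: Path) -> bool:
    """Rewrite `path` in place with LF endings; return True if it changed.

    Reading and writing bytes (not text) avoids any newline translation
    by the platform, so the file ends up byte-for-byte LF.
    """
    data = path.read_bytes()
    converted = normalize_newlines(data)
    if converted == data:
        return False
    path.write_bytes(converted)
    return True


if __name__ == "__main__":
    # Hypothetical usage: convert a rules file before calling semgrep-core.
    with tempfile.TemporaryDirectory() as tmp:
        rules = Path(tmp) / "rules.yaml"
        rules.write_bytes(b"rules:\r\n- id: demo\r\n")
        print(to_unix_line_endings(rules))  # True
        print(rules.read_bytes())           # b'rules:\n- id: demo\n'
```

Calling it a second time on the same file returns `False`, so it is safe to run unconditionally on every file handed to semgrep-core.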
Indeed it looks like the bad temp folder To test this, you can run
^ if you're getting I'm also getting this because I'm on Linux:
You should get
Ahhh, thank you @mjambon, that makes total sense! I built this binary on Cygwin, and it would make sense that the resulting binary only contains the Cygwin module! Otherwise it would be silly to compile all modules in the resulting binary and checking From Semgrep's perspective, you're probably not interested in Cygwin and would rather have a fully-native Windows build (requiring a workaround to I'm sure I can hack the I'll also continue poking around, but thank you SO MUCH for your time and your hard work in getting to the bottom of this! 🙏🙇🙏!
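This sketch makes the "Cygwin module" point concrete. As far as I can tell from OCaml's stdlib `Filename` module, the default temp directory depends on `Sys.os_type`. The Unix and Cygwin implementations read `TMPDIR` and fall back to `/tmp`. The Win32 implementation reads `TEMP` and falls back to `.`. The Python below only mirrors that rule; it is my reading of the stdlib, not semgrep code. `ocaml_temp_dir_name` and `env_with_tmpdir` are hypothetical helpers.

```python
from typing import Dict, Mapping


def ocaml_temp_dir_name(os_type: str, env: Mapping[str, str]) -> str:
    """Mirror OCaml's default temp dir choice (as I read Filename in the stdlib)."""
    if os_type == "Win32":
        # Native Windows build: TEMP, otherwise the current directory.
        return env.get("TEMP", ".")
    # "Unix" and "Cygwin" builds share one rule: TMPDIR, otherwise /tmp.
    return env.get("TMPDIR", "/tmp")


def env_with_tmpdir(env: Mapping[str, str], tmpdir: str) -> Dict[str, str]:
    """Hypothetical workaround: a copy of `env` with TMPDIR pointing at `tmpdir`."""
    new_env = dict(env)
    new_env["TMPDIR"] = tmpdir
    return new_env


if __name__ == "__main__":
    windows_env = {"TEMP": r"C:\Users\me\AppData\Local\Temp"}  # illustrative
    print(ocaml_temp_dir_name("Cygwin", windows_env))  # /tmp
    print(ocaml_temp_dir_name("Win32", windows_env))   # C:\Users\me\AppData\Local\Temp
    fixed = env_with_tmpdir(windows_env, windows_env["TEMP"])
    print(ocaml_temp_dir_name("Cygwin", fixed))        # C:\Users\me\AppData\Local\Temp
```

In practice, a dict like `fixed` could be passed as `env=` to `subprocess.run` when launching semgrep-core. I have not tested whether the Cygwin runtime accepts a Windows-style path in `TMPDIR`.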
"Hmmm" is also my reaction 😅. I don't fully understand what's going on. I asked the question here: https://discuss.ocaml.org/t/executable-built-on-cygwin-runs-on-win32-tries-and-fail-to-use-tmp/7726 PS: I'm not clear on the different variants of Windows builds. Until today I thought there were just 2 kinds: MinGW32 and Cygwin, the former requiring Cygwin at compile time but otherwise building a native Windows executable independent from Cygwin. The latter would require the Cygwin DLL(s) but I thought it would also run in a Cygwin environment and have access to |
Ahh, you raise a great point: when Cygwin executables ask for /tmp, who translates that to somewhere on the Windows file system? Is it the shell? No, because I can open cmd.exe and run (And just to confirm: in OCaml for Windows (which, as you said, uses MinGW plus Cygwin to build fully native Windows executables):
Whereas in Cygwin:
Makes sense!)
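This is a rough model of where a Cygwin-built executable's `/tmp` ends up. It rests on an assumption drawn from our reading of the Cygwin mount-table documentation, not verified here: when no mount entry covers a path, Cygwin resolves it under a root directory equal to the parent of the directory holding `cygwin1.dll`. The DLL path below is hypothetical. The only point is that `/tmp` would land next to wherever the DLL was copied.

```python
from pathlib import PureWindowsPath


def cygwin_root_for_dll(dll_path: str) -> PureWindowsPath:
    # Assumption: the Cygwin root is the parent of the DLL's directory,
    # e.g. ...\semgrep\bin\cygwin1.dll -> ...\semgrep
    return PureWindowsPath(dll_path).parent.parent


def windows_path_for(posix_path: str, dll_path: str) -> PureWindowsPath:
    # Only meant for absolute POSIX paths with no mount entry (e.g. "/tmp");
    # /cygdrive, /usr/bin and fstab mounts are deliberately ignored here.
    parts = [part for part in posix_path.split("/") if part]
    return cygwin_root_for_dll(dll_path).joinpath(*parts)


if __name__ == "__main__":
    # Hypothetical install location, for illustration only.
    dll = r"C:\Python39\Lib\site-packages\semgrep\bin\cygwin1.dll"
    print(windows_path_for("/tmp", dll))
    # C:\Python39\Lib\site-packages\semgrep\tmp
```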
Copying This must be because I put cygwin1.dll in When I create this "tmp" directory site-packages\semgrep\tmp, and convert my demo.py script to have Unix line endings via
¡¡¡🙌👏👌💋🥳!!! (For completeness, here's the contents of
and here's the contents of
and foo.py contains a use of (If foo.py has Windows line endings, semgrep-core reports a FatalError: Thank you @mjambon for your comment about Cygwin exes understanding Unix paths; that was the crucial breakthrough! Soooo. Taking stock. For Cygwin to be a viable option, we have to, in addition to the steps in my comment to #1330,
(Might this perhaps also explain #1037 and #2237, if files had Windows line endings…?) I think this is what I need to make progress on my project! Hopefully some of this analysis will be useful in a non-Cygwin pure-OCaml for Windows approach! Super duper thank you to @mjambon and @mschwager and everyone for being so patient and generous with your time and effort! I am truly in your debt.
I posted a reply there that is hopefully both correct and useful to the OCaml community. In researching it, I realized that as an alternative to creating a
Excellent work, @fasiha!
Great work @fasiha! Thanks for looking so deeply into this, you're making Semgrep better for everyone 🎉
As you mentioned, it seems we can leverage
Supporting Windows line endings seems like a quick fix we could make. That would make downstream maintainers' lives much easier and limit the "one-off steps" necessary to get these binaries running on Windows environments. To clarify this task a bit, can we consider this issue fixed when we support Windows line endings?
If you'd like to support both line endings in the same binary, that would be very cool! But this behavior might be "correct": we might be seeing this behavior only because the binary was built in Cygwin, so it only supports Unix line endings, similar to how it would support only Windows line endings if it was built as a native-Windows executable? If this is the case, I think you can close this issue right now, since it's caused by an "incorrect" use of the binary (built on Cygwin, running on Windows). Side note: a bit frustratingly for me (and my stupid Cygwin use case 😅), the files are created by ruamel, which is doing the right thing by emitting Windows line endings: in
My patch for Cygwin support needs to be pretty ugly 😕. I hope we can get a Windows-native build soon!
Thanks for all the work you've already done on this! I'm looking into whether we can support the line endings. Would it be possible for you to attach a file that exhibits what you're describing (both line endings in the same binary)?
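On the ruamel point: in Python, a file opened in text mode with the default `newline=None` translates every `\n` written into `os.linesep`, which is `\r\n` on Windows. Any YAML emitter writing to such a stream therefore produces CRLF there. If LF output is wanted on every platform, one option is to open the stream with `newline="\n"`, which disables that translation. `write_text_lf` is a hypothetical helper. I haven't checked how semgrep actually opens this file.

```python
import tempfile
from pathlib import Path


def write_text_lf(path: Path, text: str) -> None:
    # newline="\n" turns off Python's translation of "\n" to os.linesep,
    # so the bytes on disk use LF even on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "rules.yaml"
        write_text_lf(out, "rules:\n- id: demo\n")
        print(out.read_bytes())  # b'rules:\n- id: demo\n'
```

A YAML library that dumps into a stream opened this way should inherit the LF endings, though that is worth confirming for the specific library.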
Here's the Python script I saved:

```python
rules = """rules:
- id: 0.-
  pattern: eval(...)
  severity: ERROR
  languages:
  - py
  message: <internalonly>
"""
target = """test.py"""


def save_windows_lineends(contents: str, filename: str):
    # Join lines with CRLF; note this drops the trailing newline.
    new_contents = '\r\n'.join(contents.splitlines())
    with open(filename, 'wb') as f:  # binary mode: write the bytes exactly
        f.write(new_contents.encode('ascii'))


save_windows_lineends(rules, 'rules.txt')
save_windows_lineends(target, 'target.txt')
```

```python
eval('1+2')
```

After I run the script, if I invoke semgrep-core like so on Windows:
If I convert from Win to Unix line endings, it works:
Attached is a ZIP file containing the Cygwin-built binaries and required Cygwin DLLs. If you have a Windows machine or download a Windows 10 evaluation image from Microsoft, you should be able to uncompress this and run Hmm, very curious. I just noticed that if I invoke semgrep-core on macOS on the rules file with Windows line endings, it runs fine!? I'm actually not surprised; I'd expect a nice language like OCaml to know how to deal with line endings, but it really confuses me that the Cygwin–Windows build chokes on Windows line endings but the macOS build doesn't. If it's difficult to reproduce this issue in macOS/Linux, please feel free to close it; it must be something really weird about Windows, either the build or the runtime.
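When comparing the macOS and Windows behavior, it can help to check which line endings each file actually contains, as `xxd` would show. This is a small, hypothetical Python helper, not part of semgrep:

```python
from pathlib import Path
from typing import Iterable


def line_ending_style(data: bytes) -> str:
    """Classify raw file bytes as "crlf", "lf", "mixed", or "none"."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # LFs not preceded by CR
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "crlf"
    if lf_only:
        return "lf"
    return "none"


def report(paths: Iterable[str]) -> None:
    # Print one classification per file, reading raw bytes (no translation).
    for p in paths:
        print(f"{p}: {line_ending_style(Path(p).read_bytes())}")


if __name__ == "__main__":
    report(["rules.txt", "target.txt"])
```

After running the repro script, `rules.txt` should classify as `crlf`. `target.txt` holds a single line with no newline, so it reports `none`.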
Please forgive me for posting this minor observation in this thread instead of #1330—I'm unsure how much of it is related to Windows vs the Cygwin build I'm putting together, so I decided to try here to keep the thread intact. Recall above our discovery that the Cygwin build of semgrep-core needs (1) a I run the following, and get a basic error:
Adding logging, I see that this is the command that Python is running (newlines added for clarity):
which returns 0 but which prints the following to stdout (whitespace added for clarity):
Notice this However, if I remove the
I see in I noticed that running the raw semgrep-core command above results in an empty file in the parsing cache directory called Just a minor issue I thought to document that might be handy in the eventual native-Windows build (or might be handy to someone like me creating a Cygwin version) 😄.
Thanks for the detailed report, @fasiha. I think I know what causes
Hi Semgrep friends, I'm running into another interesting error with my Cygwin-compiled Windows build of semgrep-core, specifically with multiline patterns. The following config:

```yaml
rules:
- id: testmodule.Test
  languages: [python]
  message: Oops...
  severity: WARNING
  patterns:
  - pattern-inside: |
      $OBJECT = testmodule.Test(...)
      ...
      $OBJECT.do_something(...)
  - pattern: $OBJECT.do_something(..., b=$ARG, ...)
```

runs fine on macOS (of course 😅), but on my Windows build I see the following error, with whitespace added for ease of reading:
Notice the "Pattern could not be parsed as a Python semgrep pattern" error. Semgrep converted the above into the following
And here are the rules:

```yaml
rules:
- id: 0..0
  pattern: |
    $OBJECT = testmodule.Test(...)
    ...
    $OBJECT.do_something(...)
  severity: WARNING
  languages:
  - python
  message: <internalonly>
- id: 0..1
  pattern: $OBJECT.do_something(..., b=$ARG, ...)
  severity: WARNING
  languages:
  - python
  message: <internalonly>
```

That
It looks like semgrep-core is parsing the Is semgrep-core saying the pattern is wrong when it says I was wondering if you had any idea what might be causing this
Hopefully the above made sense; forgive me for including so many details. I'm hoping someone can think of another tip. Thanks for your patience with me and my weird Cygwin-based Semgrep build! I'm happy to share a zip file of a conda tarball that you can install on Windows to experiment with this. Edit: Adding a
A quick update on the multiline pattern problem described above. This happens because pfff's Parse_python.ml opens a temp file and writes the pattern to it, but it uses …

```
$ ocaml
OCaml version 4.10.2
# let chan = open_out "foo.py" in (output_string chan "a\nb\nc\nd!"; close_out chan);;
- : unit = ()
#
$ xxd foo.py
00000000: 610d 0a62 0d0a 630d 0a64 21  a..b..c..d!
```

You see … Therefore, when …
A stupid solution appears to be, in A second solution appears to be, in the beginning of I think for my Windows version I'll go with the second solution, but do let me know if you think I ought to raise this question on OCaml Discourse. I can also try to use native Windows OCaml to see if it has the same issue.
I asked on OCaml Discourse. Someone else might chime in, but for now I'm afraid the idea seems to be that we'll have to convert all input channels in Semgrep to binary channels for an eventual Windows port 😕. It's not a Cygwin-only issue but rather a Windows issue with text mode: ocaml/ocaml#9868.
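The same translation can be reproduced on macOS or Linux by asking Python for text-mode output with `\r\n` as the line separator. This mirrors what the OCaml session above shows for a text-mode `open_out` on Windows. `text_mode_bytes` is a hypothetical helper. The last lines show the likely failure mode: a reader that splits only on `\n` keeps a stray `\r` at the end of each pattern line. I have not traced exactly where pfff's Python parser rejects it, so treat that as a plausible explanation rather than a confirmed one.

```python
import io


def text_mode_bytes(text: str, newline: str) -> bytes:
    """Return the bytes a text-mode write of `text` produces for a given newline."""
    buf = io.BytesIO()
    wrapper = io.TextIOWrapper(buf, encoding="ascii", newline=newline)
    wrapper.write(text)   # each "\n" is translated to `newline` on write
    wrapper.flush()
    data = buf.getvalue()
    wrapper.detach()      # release buf without closing it
    return data


windows = text_mode_bytes("a\nb\nc\nd!", "\r\n")
print(windows.hex(" ", -2))  # 610d 0a62 0d0a 630d 0a64 21
print(windows.split(b"\n"))  # [b'a\r', b'b\r', b'c\r', b'd!']
```

The hex output matches the `xxd` dump above byte for byte. `bytes.hex` with a separator requires Python 3.8 or later.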
@fasiha great investigation. I think you should try replacing all instances of
Ok! Will try this and let you know what happens 💪🤞!
I went ahead and changed the ocaml code in pfff and semgrep-core so that all files are opened in binary mode, both for writing and for reading (#3663). I hope this will just fix a bunch of problems that are only seen on Windows ( |
Very nice, I'm glad to see all checks passing in the PR! I should have mentioned this earlier: a few weeks ago, before making the change you suggested, I ran the semgrep tests on Windows and saw, as a baseline:
Then I changed all
I'll clone your PR and see if I can run the tests on it in Windows. But it's very gratifying that this change doesn't seem to break anything on Unix!
I'm also curious to see what we get now.
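For comparison, here is the binary-mode idea in Python terms. `"wb"`/`"rb"` play the role of OCaml's `open_out_bin`/`open_in_bin`, so bytes round-trip unchanged on every OS. The trade-off is that nothing normalizes line endings anymore, so whatever consumes the text must accept `\r\n` itself. That is the "support Windows line endings" task discussed earlier in the thread. The helper names are hypothetical; this is a sketch, not the code in #3663.

```python
import tempfile
from pathlib import Path
from typing import List


def save_bytes(path: Path, data: bytes) -> None:
    # Binary mode: no newline translation on any platform.
    with open(path, "wb") as f:
        f.write(data)


def load_bytes(path: Path) -> bytes:
    # Binary mode: the bytes come back exactly as stored.
    with open(path, "rb") as f:
        return f.read()


def split_lines_any(data: bytes) -> List[bytes]:
    # bytes.splitlines() treats "\r\n", "\r" and "\n" as line breaks,
    # so a consumer reading binary data copes with both conventions.
    return data.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pattern = Path(tmp) / "pattern.py"
        original = b"$X = f(...)\r\n...\r\ng($X)\r\n"
        save_bytes(pattern, original)
        assert load_bytes(pattern) == original  # identical on every OS
        print(split_lines_any(load_bytes(pattern)))
        # [b'$X = f(...)', b'...', b'g($X)']
```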