Regexp problem with unterminated strings #89

Closed
ghost opened this issue Oct 4, 2015 · 6 comments

ghost commented Oct 4, 2015

[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)

The regular expression provided in the guide is really impressive. However, when the input string is "123, the tokenizer silently drops the opening double quote and reads the rest as 123. So how can we detect the unmatched " then?

I found that several implementations that use regular expressions for lexical analysis have this problem, including C, Go and OCaml.

Mal [c]
user> "123
123
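
A minimal sketch in Python of the behaviour reported here, assuming a tokenizer that applies the pattern repeatedly with re.findall and discards empty tokens. The names ORIGINAL and tokenize are hypothetical and are not taken from any of the implementations.

```python
import re

# Pattern quoted at the top of this report: a string token must have a closing quote.
ORIGINAL = re.compile(
    r"""[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)"""
)

def tokenize(pattern, text):
    # findall returns capture group 1 of each match; the last alternative can
    # match the empty string, so empty tokens are filtered out here.
    return [tok for tok in pattern.findall(text) if tok]

print(tokenize(ORIGINAL, '"abc"'))  # ['"abc"']
print(tokenize(ORIGINAL, '"123'))   # ['123']
```

With no closing quote the string alternative fails, the final alternative matches the empty string in front of the quote, and the scan then resumes after the quote. The quote never appears in any token, so the reader has nothing left to detect.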

kanaka commented Oct 5, 2015

Okay, I think adding a question mark to the final double-quote should do the trick. Then the string check in read_atom needs to raise an exception if the first character is a double-quote but the last is not. I think that will work as long as the language's regex engine is properly greedy.

[\s,]*(~@|[\[\]{}()'`~^@]|"(?:[\\].|[^\\"])*"?|;.*|[^\s\[\]{}()'"`@,;]+)

Sound reasonable? I probably won't be able to get to this for a while. Feel free to send me a PR if you feel up to it :-)
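
A rough sketch in Python of how the revised pattern tokenizes an unterminated string, assuming the same findall-style loop as a typical regex-based reader. REVISED and tokenize are hypothetical names used only for illustration.

```python
import re

# Revised pattern from this comment: the closing quote is now optional ("?).
REVISED = re.compile(
    r"""[\s,]*(~@|[\[\]{}()'`~^@]|"(?:[\\].|[^\\"])*"?|;.*|[^\s\[\]{}()'"`@,;]+)"""
)

def tokenize(text):
    # Capture group 1 of every match is one token; empty tokens are dropped.
    return [tok for tok in REVISED.findall(text) if tok]

print(tokenize('"123'))             # ['"123']
print(tokenize('(prn "abc" "123'))  # ['(', 'prn', '"abc"', '"123']
```

The unterminated string now survives as a single token that still begins with a double quote, which is what gives read_atom something to check. This relies on the engine trying the string alternative greedily, as noted above.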


ghost commented Oct 5, 2015

Yeah, using that new one works, at least for Common Lisp. I believe any Perl-compatible regexp library should just work.


kanaka commented Jan 24, 2019

Now that #90 is implemented, I'm finally getting back around to this. I pushed a "test_unclosed_string" branch with step1 tests to catch this. Here is the list of broken implementations and whether a fix has been implemented yet:

  • awk
  • basic
  • c
  • clojure
  • crystal
  • cs
  • d
  • dart
  • factor
  • fantom
  • go
  • groovy
  • haxe
  • hy
  • io
  • java
  • julia
  • kotlin
  • livescript
  • logo
  • make
  • matlab
  • miniMAL
  • nim
  • objc
  • objpascal
  • ocaml
  • php
  • plpgsql
  • plsql
  • powershell
  • r
  • racket
  • rexx
  • rpython
  • ruby
  • scala
  • skew
  • swift
  • swift3
  • tcl
  • ts
  • vb
  • vhdl
  • wasm
  • yorick

wasamasa commented

Hm, I've implemented a check in my readers that tests the first char of the token and, if it's a double quote, tests the last char of the token as well. Depending on that check, either an error is thrown or a string object is returned.
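
A minimal sketch in Python of a first-and-last-character check of this kind. The function name read_string_token and the error text are assumptions for illustration, not code from any of the readers mentioned here.

```python
def read_string_token(token):
    """Return the body of a string token, None for a non-string token,
    or raise ValueError when the string is unterminated."""
    if not token.startswith('"'):
        return None  # not a string; a real reader would try other atom types
    # A lone '"' starts and ends with the same character, so also require
    # at least two characters before trusting the last one.
    if len(token) < 2 or not token.endswith('"'):
        raise ValueError("unbalanced string: expected closing double quote")
    return token[1:-1]  # escape sequences are left unprocessed in this sketch

print(read_string_token('"abc"'))  # abc
try:
    read_string_token('"123')
except ValueError as err:
    print("error:", err)
```

This sketch only compares the first and last characters and does not look at escapes, so a token whose final quote is itself escaped would still pass. A stricter reader would need to account for that.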


kanaka commented Jan 25, 2019

@wasamasa yeah, there are a number of implementations where the check just doesn't match the error text. The fixes are pretty simple. I was just planning to do them myself instead of creating a branch for fixes since the fixes tend to take less than a minute per implementation. FYI, the run with the fixed up test is here: https://travis-ci.org/kanaka/mal/builds/484027719


kanaka commented Jan 25, 2019

Okay, I pushed fixes for the 47 implementations that were complaining. We'll see if everything passes before closing: https://travis-ci.org/kanaka/mal/builds/484570489

kanaka closed this as completed Jan 28, 2019
kanaka changed the title from "Regexp problem" to "Regexp problem with unterminated strings" Jan 28, 2019