Error: "Unescaped symbol at position..." on non-latin charsets #1

Open
GoogleCodeExporter opened this issue Apr 1, 2015 · 1 comment

@GoogleCodeExporter

What steps will reproduce the problem?
1. Get JSON from http://maps.googleapis.com/maps/api/distancematrix/json?origins=Vancouver+BC|Seattle&destinations=San+Francisco|Victoria+BC&mode=bicycling&language=ru-RU&sensor=false
2. Parse it with JSON
3. Get error "Unescaped symbol at position..."

What is the expected output? What do you see instead?
Need support for other charsets in "UnescapeString"

What version of the product are you using? On what operating system?
json-1.4 on Windows XP SP3.

Please provide any additional information below.
Delphi unit code:
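procedure TForm1.Button1Click(Sender: TObject);
var
  sl: TStringList;
  ja: TJSONarray;
  s: String;
  pac: PAnsiChar;
begin
  ja := nil;
  sl := TStringList.Create;
  try
    // 'json_ru' holds the saved response from the URL above
    sl.LoadFromFile('json_ru');
    // Convert the UTF-8 text to the system ANSI code page
    s := Utf8ToAnsi(sl.Text);
    pac := PAnsiChar(s);
    // Parsing the converted text produces the error from step 3
    ja := ParseJSON(pac);
  finally
    if Assigned(ja) then
      ja.Free;
    sl.Free;
  end;
end;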


Original issue reported on code.google.com by ruslan.p...@gmail.com on 26 Oct 2012 at 10:27

@GoogleCodeExporter

Per RFC 4627, this could be solved by detecting the encoding of the JSON string 
before decoding and converting it to UnicodeString (or WideString if you are 
stuck in the bad old days), at which point the issues around encoding go away. 
That would need considerable modifications to the parser, though, as it 
currently assumes AnsiString for memory management.

The alternative is to convert to UTF-8, but as this is not a native Delphi 
string format, you take on the messy task of validating the UTF-8 as you go.

As it stands, both the parser and the generator are non-compliant.

The relevant encoding section of RFC4627:

3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8
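
To make the table above concrete, here is a rough, language-neutral sketch of the null-pattern check, written in Python rather than Delphi purely for brevity. The function names `detect_json_encoding` and `decode_json_text` are made up for illustration and are not part of the json-1.4 unit. It assumes there is no byte order mark and that the first two characters really are ASCII, as the RFC text states; anything that does not match a row of the table falls back to UTF-8.

```python
# Hypothetical helper names; this is a sketch of the RFC 4627 section 3
# heuristic, not code from the library discussed in this issue.

# Map "is this octet zero?" for the first four octets to a codec name.
_NULL_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
}


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON octet stream from its first four octets."""
    if len(data) < 4:
        # Too short to show a pattern; assume the default encoding.
        return "utf-8"
    pattern = tuple(octet == 0 for octet in data[:4])
    # Any pattern not listed in the table is treated as UTF-8.
    return _NULL_PATTERNS.get(pattern, "utf-8")


def decode_json_text(data: bytes) -> str:
    """Decode the octet stream to a Unicode string before parsing."""
    return data.decode(detect_json_encoding(data))
```

In Delphi terms, the result of the second step would be the UnicodeString (or WideString) handed to the parser, so the parser itself never has to deal with code pages.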

Original comment by tavultes...@gmail.com on 11 Aug 2013 at 1:57

