Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not all unicodes are escaped, even when using alwaysEscapeUnicode: Boolean = true #750

Closed
toshetah opened this issue Jan 14, 2021 · 7 comments · Fixed by #752
Closed

Not all unicodes are escaped, even when using alwaysEscapeUnicode: Boolean = true #750

toshetah opened this issue Jan 14, 2021 · 7 comments · Fixed by #752

Comments

@toshetah
Copy link

json4s version 3.7.0-M7

scala version 2.12.12

jdk version 1.8

This is how to reproduce:

implicit val customJsonFormats = new DefaultFormats {
  override def alwaysEscapeUnicode: Boolean = true
}

println(pretty(render(JString("€"))))
@magnolia-k
Copy link
Contributor

The native version and the jackson version behave differently.

scala> import org.json4s._
import org.json4s._

scala> import org.json4s.native.JsonMethods._
import org.json4s.native.JsonMethods._

scala> implicit val customJsonFormats = new DefaultFormats {
     |   override def alwaysEscapeUnicode: Boolean = true
     | }
     |
val customJsonFormats: org.json4s.DefaultFormats = $anon$1@4137e49

scala> println(pretty(render(JString(""))))
     |
"\u20ac"
scala> import org.json4s._
import org.json4s._

scala> import org.json4s.jackson.JsonMethods._
import org.json4s.jackson.JsonMethods._

scala> implicit val customJsonFormats = new DefaultFormats {
     |   override def alwaysEscapeUnicode: Boolean = true
     | }
val customJsonFormats: org.json4s.DefaultFormats = $anon$1@90e49f9

scala> println(pretty(render(JString(""))))
""

@magnolia-k
Copy link
Contributor

In the current Json4s, the setting to escape UNICODE is not enabled on the jackson module.

The following code will solve the problem.

scala> import org.json4s._
import org.json4s._

scala> import org.json4s.jackson.JsonMethods._
import org.json4s.jackson.JsonMethods._

scala> implicit val customJsonFormats = new DefaultFormats {
     |   override def alwaysEscapeUnicode: Boolean = true
     | }
val customJsonFormats: org.json4s.DefaultFormats = $anon$1@1a26022e

scala> import com.fasterxml.jackson.core.json._
import com.fasterxml.jackson.core.json._

scala> org.json4s.jackson.JsonMethods.mapper.getFactory().configure(JsonWriteFeature.ESCAPE_NON_ASCII.mappedFeature(), true);
val res1: com.fasterxml.jackson.core.JsonFactory = com.fasterxml.jackson.databind.MappingJsonFactory@58e3ef2d

scala> println(pretty(render(JString(""))))
"\u20AC"

In the render method of json4s-jackson, code should be added to determine whether or not to set JsonWriteFeature.ESCAPE_NON_ASCII based on the value of alwaysEscapeUnicode.

@magnolia-k
Copy link
Contributor

magnolia-k commented Jan 16, 2021

Note that the current € symbol is always escaped, regardless of the setting.

This feature is probably not appropriate.

scala> import org.json4s._
import org.json4s._

scala> import org.json4s.native.JsonMethods._
import org.json4s.native.JsonMethods._

scala> implicit val customJsonFormats = new DefaultFormats {
     | override def alwaysEscapeUnicode: Boolean = false
     | }
val customJsonFormats: org.json4s.DefaultFormats = $anon$1@67612481

scala> println(pretty(render(JString(""))))
"\u20ac"

The code to determine the target of escaping is as follows, but it is excessive.

(c >= '\u0000' && c <= '\u001f') || (c >= '\u0080' && c < '\u00a0') || (c >= '\u2000' && c < '\u2100')

@toshetah
Copy link
Author

In the current Json4s, the setting to escape UNICODE is not enabled on the jackson module.

The following code will solve the problem.

scala> import org.json4s._
import org.json4s._

scala> import org.json4s.jackson.JsonMethods._
import org.json4s.jackson.JsonMethods._

scala> implicit val customJsonFormats = new DefaultFormats {
     |   override def alwaysEscapeUnicode: Boolean = true
     | }
val customJsonFormats: org.json4s.DefaultFormats = $anon$1@1a26022e

scala> import com.fasterxml.jackson.core.json._
import com.fasterxml.jackson.core.json._

scala> org.json4s.jackson.JsonMethods.mapper.getFactory().configure(JsonWriteFeature.ESCAPE_NON_ASCII.mappedFeature(), true);
val res1: com.fasterxml.jackson.core.JsonFactory = com.fasterxml.jackson.databind.MappingJsonFactory@58e3ef2d

scala> println(pretty(render(JString(""))))
"\u20AC"

In the render method of json4s-jackson, code should be added to determine whether or not to set JsonWriteFeature.ESCAPE_NON_ASCII based on the value of alwaysEscapeUnicode.

You can see in your example that it doesn't work. Basically what I understand from your answers, is that it is basically impossible at the moment with json4s native. Only using jackson.

@toshetah
Copy link
Author

Is there any plan to fix this?

magnolia-k added a commit to magnolia-k/json4s that referenced this issue Jan 17, 2021
fix json4s#750

In Jackson, the render method did not reference the value of Formats.alwaysEscapeUnicode.
Therefore, it was not in line with the native behavior.

Fix this problem.

Also, Jackson always uses uppercase when escaping UNICODE.
To make the results consistent, the native module now also outputs in uppercase.
magnolia-k added a commit to magnolia-k/json4s that referenced this issue Jan 17, 2021
fix json4s#750

In Jackson, the render method did not reference the value of Formats.alwaysEscapeUnicode.
Therefore, it was not in line with the native behavior.

Fix this problem.

Also, Jackson always uses uppercase when escaping UNICODE.
To make the results consistent, the native module now also outputs in uppercase.
magnolia-k added a commit to magnolia-k/json4s that referenced this issue Jan 17, 2021
fix json4s#750

In Jackson, the render method did not reference the value of Formats.alwaysEscapeUnicode.
Therefore, it was not in line with the native behavior.

Fix this problem.

Also, Jackson always uses uppercase when escaping UNICODE.
To make the results consistent, the native module now also outputs in uppercase.
magnolia-k added a commit to magnolia-k/json4s that referenced this issue Jan 17, 2021
fix json4s#750

In Jackson, the render method did not reference the value of Formats.alwaysEscapeUnicode.
Therefore, it was not in line with the native behavior.

Fix this problem.

Also, Jackson always uses uppercase when escaping UNICODE.
To make the results consistent, the native module now also outputs in uppercase.
magnolia-k added a commit to magnolia-k/json4s that referenced this issue Jan 17, 2021
fix json4s#750

In Jackson, the render method did not reference the value of Formats.alwaysEscapeUnicode.
Therefore, it was not in line with the native behavior.

Fix this problem.

Also, Jackson always uses uppercase when escaping UNICODE.
To make the results consistent, the native module now also outputs in uppercase.
magnolia-k added a commit to magnolia-k/json4s that referenced this issue Jan 17, 2021
fix json4s#750

In Jackson, the render method did not reference the value of Formats.alwaysEscapeUnicode.
Therefore, it was not in line with the native behavior.

Fix this problem.

Also, Jackson always uses uppercase when escaping UNICODE.
To make the results consistent, the native module now also outputs in uppercase.
magnolia-k added a commit to magnolia-k/json4s that referenced this issue Jan 17, 2021
fix json4s#750

In Jackson, the render method did not reference the value of Formats.alwaysEscapeUnicode.
Therefore, it was not in line with the native behavior.

Fix this problem.

Also, Jackson always uses uppercase when escaping UNICODE.
To make the results consistent, the native module now also outputs in uppercase.
@magnolia-k
Copy link
Contributor

I made PR.

#752

this PR is merged, escaping in Jackson should work correctly.

mccartney added a commit that referenced this issue Jan 22, 2021
fix #750

In Jackson, the render method did not reference the value of Formats.alwaysEscapeUnicode.
Therefore, it was not in line with the native behavior.

Fix this problem.

Also, Jackson always uses uppercase when escaping UNICODE.
To make the results consistent, the native module now also outputs in uppercase.

Co-authored-by: Greg Oledzki <mccartney@users.noreply.github.com>
@toshetah
Copy link
Author

Thank you! When is that going to be released?

MaxGekk pushed a commit to apache/spark that referenced this issue May 26, 2021
### What changes were proposed in this pull request?
This PR aims to upgrade json4s from   3.7.0-M5  to 3.7.0-M11

Note: json4s version greater than 3.7.0-M11 is not binary compatible with Spark third party jars

### Why are the changes needed?
Multiple defect fixes and improvements  like

json4s/json4s#750
json4s/json4s#554
json4s/json4s#715

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran with the existing UTs

Closes #32636 from vinodkc/br_build_upgrade_json4s.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants