This repository has been archived by the owner. It is now read-only.

selenium-standalone-server.jar v2.44.0 used in grid hub/nodes encode certain HTML characters differently in AWS Linux #8487

Closed
lukeis opened this issue Mar 4, 2016 · 12 comments

Comments

@lukeis lukeis commented Mar 4, 2016

Originally reported on Google Code with ID 8487

What steps will reproduce the problem?

Environment:
Selenium Grid setup in AWS
Hub/Nodes are all running in AWS
OS Linux: 3.10.42-52.145.amzn1.x86_64
Nodes are configured to run FF v31
Selenium version:v.2.44.0

Everything worked without issues as long as I was using selenium-standalone-server.jar
v2.43.1.
I have around 20 nodes, each running 20 instances of FF v31.

I upgraded to v2.44.0 and a high proportion of test cases started failing.
While debugging, I found that in the source of the pages opened in FF, certain HTML
characters such as <, > and = had been replaced by odd characters like @, P or other
junk characters. This causes an invalid multibyte char (UTF-8) error, and finding
elements in the page fails as well.

I rolled back to v2.43.1 and everything started working again.

Initially I thought this was caused by the Selenium plugin
(https://wiki.jenkins-ci.org/display/JENKINS/Selenium+Plugin) that I was using to manage
the grid setup, but that's not the case. It has to do with v2.44.0 of
selenium-standalone-server.jar.

This causes UI automation to fail because:
a) finding elements fails, including some lookups that use regexps
b) it causes an Invalid Multibyte Char (UTF-8) error, which fails the automation.

I looked at the CHANGELOG between 2.43.1 and 2.44 and this caught my eye:
"Moving from org.json to gson because the license. Fixes issue 7956"

I believe the hub and nodes talk via JSON, so this change could have something to do
with the issue I am seeing.

Let me know if you need any other information. This is easy to reproduce.

Reported by eelam.ragavan on 2015-02-13 11:29:41

@lukeis lukeis commented Mar 4, 2016

What version of Maven are you working with? Can you please try using Maven 3 and see
if that helps?

Reported by krishnan.mahadevan1978 on 2015-02-13 13:42:58

@lukeis lukeis commented Mar 4, 2016

Run the hub and a node with the -debug option, save the detailed log to a file and
attach it here.

Reported by barancev on 2015-02-13 20:41:56

  • Status changed: NeedsClarification

@lukeis lukeis commented Mar 4, 2016

@Krishnan
I compiled the plugin using the following version of maven
$ mvn --version
Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 13:51:28+0000)
Maven home: /usr/local/apache-maven-3.0.5
Java version: 1.7.0_45, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.10.1", arch: "x86_64", family: "mac"
@Barancev
I'll get you the details as soon as I possibly can.

Reported by eelam.ragavan on 2015-02-14 22:56:07

@lukeis lukeis commented Mar 4, 2016

I was going through the gson code
http://google-gson.googlecode.com/svn/trunk/gson/src/main/java/com/google/gson/stream/JsonWriter.java

    HTML_SAFE_REPLACEMENT_CHARS['<'] = "\\u003c";
    HTML_SAFE_REPLACEMENT_CHARS['>'] = "\\u003e";
    HTML_SAFE_REPLACEMENT_CHARS['&'] = "\\u0026";
    HTML_SAFE_REPLACEMENT_CHARS['='] = "\\u003d";
    HTML_SAFE_REPLACEMENT_CHARS['\''] = "\\u0027";
....
....
...
  /**
   * Configure this writer to emit JSON that's safe for direct inclusion in HTML
   * and XML documents. This escapes the HTML characters {@code <}, {@code >},
   * {@code &} and {@code =} before writing them to the stream. Without this
   * setting, your XML/HTML encoder should replace these characters with the
   * corresponding escape sequences.
   */
  public final void setHtmlSafe(boolean htmlSafe) {
    this.htmlSafe = htmlSafe;
  }

https://google-gson.googlecode.com/svn/trunk/gson/src/main/java/com/google/gson/Gson.java
htmlSafe is true by default in the Gson constructor.
I'll stop here and get the logs...

Reported by eelam.ragavan on 2015-02-15 00:51:08
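
To illustrate the behaviour described in this comment, here is a minimal Java sketch that does not use Gson itself. The class `HtmlSafeSketch`, the helper `toJsonString` and the `HTML_SAFE` map are hypothetical names; only the five replacement strings are taken from the JsonWriter source quoted above. It assumes the input contains no control characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HtmlSafeSketch {
    // The five replacements from the quoted table; the map itself is illustrative.
    private static final Map<Character, String> HTML_SAFE = new LinkedHashMap<>();
    static {
        HTML_SAFE.put('<', "\\u003c");
        HTML_SAFE.put('>', "\\u003e");
        HTML_SAFE.put('&', "\\u0026");
        HTML_SAFE.put('=', "\\u003d");
        HTML_SAFE.put('\'', "\\u0027");
    }

    /**
     * Hypothetical helper: writes s as a JSON string literal. When htmlSafe is
     * true, the five characters above are emitted as unicode escapes instead of
     * literally. Control characters are not handled in this sketch.
     */
    public static String toJsonString(String s, boolean htmlSafe) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (htmlSafe && HTML_SAFE.containsKey(c)) {
                out.append(HTML_SAFE.get(c));
            } else if (c == '"' || c == '\\') {
                // Quotes and backslashes must always be escaped in JSON.
                out.append('\\').append(c);
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        String html = "<form method=\"POST\">";
        System.out.println(toJsonString(html, true));
        System.out.println(toJsonString(html, false));
    }
}
```

With `htmlSafe` set to true, the input `<a>` is written as `"\u003ca\u003e"`; with false it is written as `"<a>"`.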

@lukeis lukeis commented Mar 4, 2016

I have attached the Selenium grid node debug log.

Reported by eelam.ragavan on 2015-03-06 14:15:50


- _Attachment: [selenium_grid_node_logs.txt.gz](https://storage.googleapis.com/google-code-attachments/selenium/issue-8487/comment-6/selenium_grid_node_logs.txt.gz)_

@lukeis lukeis commented Mar 4, 2016

It would be great if this could be fixed as soon as possible. I see similar issues
reported by others as well:
http://stackoverflow.com/questions/4147012/can-you-avoid-gson-converting-and-into-unicode-escape-sequences

Reported by eelam.ragavan on 2015-03-06 14:24:17

@lukeis lukeis commented Mar 4, 2016

And what command parameters in this log file are not properly encoded?

Reported by barancev on 2015-03-08 20:33:06

@lukeis lukeis commented Mar 4, 2016

It appears correctly in the grid node debug log:

</head>\n<body id=\"flightStubPopulator\">\n\n<form method=\"POST\" action=\"/tools/stub/flightStubPopulator?langid=10\"
name=\"tripDetails\" id=\"tripDetails\"><div class=\"navbar navbar-fixed-top\">\n

But when I print the HTML source, the following characters are shown as junk characters:
< , > , =, &, \

The above piece of HTML looks like this when I print it:

@form method@"POST" action@"/tools/stub/flightStubPopulator?langid@10" name@"tripDetails"
id@"tripDetails"@@div class@"navbar navbar-fixed-top"@

It's very clear from analysing the HTML source that only the above 5 characters are
changed, and gson also changes these 5 characters for some reason!

Reported by eelam.ragavan on 2015-03-09 17:01:49
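
For context, the escaped and unescaped spellings are the same string to a JSON parser: `\u003c` and `<` decode to the same character. The thread does not establish which layer turned the escapes into `@`. As a rough sketch of the decoding step only (standard library only; `JsonUnicodeUnescapeSketch` and `decodeUnicodeEscapes` are hypothetical names, and well-formed four-hex-digit escapes are assumed):

```java
final class JsonUnicodeUnescapeSketch {
    /**
     * Hypothetical helper: decodes backslash-u escapes (four hex digits) in the
     * body of a JSON string. Other escapes, such as backslash-n or an escaped
     * backslash, are deliberately left untouched to keep the sketch short, so
     * this is not a full JSON string decoder.
     */
    public static String decodeUnicodeEscapes(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            if (body.startsWith("\\u", i) && i + 6 <= body.length()) {
                // Parse the four hex digits that follow the escape prefix.
                int codeUnit = Integer.parseInt(body.substring(i + 2, i + 6), 16);
                out.append((char) codeUnit);
                i += 6;
            } else {
                out.append(body.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String onTheWire = "\\u003cform method\\u003d\\u0027POST\\u0027\\u003e";
        System.out.println(decodeUnicodeEscapes(onTheWire));
    }
}
```

A receiver that decodes the escapes this way gets the original markup back, which suggests the junk characters come from a step that handles the escaped text differently.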

@lukeis lukeis commented Mar 4, 2016

I cloned v2.44.0 from https://github.com/SeleniumHQ/selenium and changed the following
file:
java/client/src/org/openqa/selenium/remote/BeanToJsonConverter.java 
Line No: 65
Changed
return new GsonBuilder().serializeNulls().create().toJson(json);
to 
return new GsonBuilder().disableHtmlEscaping().serializeNulls().create().toJson(json);

I compiled it and used it in my grid setup, and the issue no longer occurs!

Reported by eelam.ragavan on 2015-03-10 22:45:02

@lukeis lukeis commented Mar 4, 2016

Fixed by commit 70cc86240794b3ad9d041b4cdd4513196bafc58c. Thanks!

Reported by barancev on 2015-03-20 20:35:05

  • Status changed: Fixed

@lukeis lukeis commented Mar 4, 2016

thanks @barancev

Reported by eelam.ragavan on 2015-03-22 18:00:57

@lukeis lukeis commented Mar 4, 2016

Reported by luke.semerau on 2015-09-17 18:25:11

  • Labels added: Restrict-AddIssueComment-Commit
@lukeis lukeis closed this Mar 4, 2016
@SeleniumHQ SeleniumHQ locked and limited conversation to collaborators Mar 4, 2016