Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in Evaluator.AttributeWithValue #474

Closed
savatgithub opened this issue Oct 15, 2014 · 14 comments
Closed

Bug in Evaluator.AttributeWithValue #474

savatgithub opened this issue Oct 15, 2014 · 14 comments
Labels
needs-more-info More information is needed from the reporter to progress the issue no-repro

Comments

@savatgithub
Copy link

final String turkishCapital_I = "İ";
final String html = "<a title=" + turkishCapital_I + " />";
final String selector = "a[title=" + turkishCapital_I + "]";
final Elements elements = Jsoup.parse(html).select(selector);
System.out.println("elements=" + elements.size());

It should print "elements=1" but prints "elements=0".

Bug is in Evaluator.AttributeWithValue. https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/select/Evaluator.java. Constructor stores the "value" in lower case and AttributeWithValue.matches compares it using equalsIgnoreCase. This would fail for some characters unless proper locale is used in toLowerCase. For example:

final String i = "İ";
System.out.println(i.equalsIgnoreCase(i.toLowerCase())); // false
System.out.println(i.equalsIgnoreCase(i.toLowerCase(Locale.forLanguageTag("tr-TR")))); // true

Since the intent is to compare case-insensitive, a fix might be to not use equalsIgnoreCase in AttributeWithValue.matches but use equal and toLowerCase. That is:

value.equals(element.attr(key).toLowerCase()); // value is lower case already

AttributeWithValueNot may suffer from the same issue.

For some reason, it works on http://try.jsoup.org. That might be a separate issue as to why it works there.

@savatgithub
Copy link
Author

Any chance we may get it in 1.8.2 release?

@jasdeep
Copy link

jasdeep commented Oct 16, 2014

@savatgithub

I added following test to ElementsTest in the latest code:

@Test public void filterInternationalization() {
    final String turkishCapital_I = "İ";
    String h = "<p>Excl</p><div class=headline><a title="+turkishCapital_I+" >Hello</a></div>";
    final String selector = "a[title=" + turkishCapital_I + "]";
    Document doc = Jsoup.parse(h);
    Elements els = doc.select(selector);
    assertEquals(1, els.size());
    assertEquals("Hello", els.get(0).text());
}

It is working perfectly fine.

@savatgithub
Copy link
Author

@jasdeep

I ran your test code in 1.8.1 and master branches. assertEquals(1, els.size()) throws AssersionError.

java.lang.AssertionError: expected:[1] but was:[0]
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.jsoup.select.ElementsTest.filterInternationalization(ElementsTest.java:28)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

@jasdeep
Copy link

jasdeep commented Oct 16, 2014

It is working fine for me, even after I changed the system locale to Turky. Here is the output:

[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building jsoup 1.8.1
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.4:resources (default-resources) @ jsoup ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.0.2:compile (default-compile) @ jsoup ---
[INFO] Compiling 6 source files to D:\dumps\jsoup-jsoup-1.8.1.a\target\classes
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.9:check (animal-sniffer) @ jsoup ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java15:1.0
[INFO] 
[INFO] --- maven-bundle-plugin:2.1.0:manifest (bundle-manifest) @ jsoup ---
[WARNING] Warning in manifest for org.jsoup:jsoup:jar:1.8.1 : Superfluous export-package instructions: [org]
[INFO] 
[INFO] --- maven-resources-plugin:2.4:testResources (default-testResources) @ jsoup ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 14 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.0.2:testCompile (default-testCompile) @ jsoup ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.7.2:test (default-test) @ jsoup ---
[INFO] Surefire report directory: D:\dumps\jsoup-jsoup-1.8.1.a\target\surefire-reports

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.jsoup.select.ElementsTest
Tests run: 29, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.207 sec

Results :

Tests run: 29, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.106s
[INFO] Finished at: Thu Oct 16 14:32:50 IST 2014
[INFO] Final Memory: 6M/20M
[INFO] ------------------------------------------------------------------------

@savatgithub
Copy link
Author

With locale set to "tr-TR" the test will pass. The test should pass irrespective of locale.

@jhy
Copy link
Owner

jhy commented Jul 6, 2015

The test @jasdeep wrote works for me to. My locale is set to us_US.

@savatgithub can you give a junit test case, your java version, and your locale, so we can see why you're able to repro but others aren't (I.e me, jasdeep, and try.jsoup)

@jhy jhy added no-repro needs-more-info More information is needed from the reporter to progress the issue labels Jul 6, 2015
@savatgithub
Copy link
Author

@jhy the test code is same as in the original post which @jasdeep converted to a junit test code. My Java version is 1.7.0_01, system locale is us_US. Dev environment is Eclipse on Windows 8.1.

can you see if you're using Java 6 or 7? http://grepalex.com/2013/02/14/java-7-and-the-dotted--and-dotless-i/

@savatgithub
Copy link
Author

@jhy additionally can you run the following test code on your dev environment. These should assert.
@test public void trI1()
{
final String i = "İ";
assertTrue(i.equalsIgnoreCase(i.toLowerCase()));
// should be false
}
@test public void trI1()
{
final String i = "İ";
assertFalse(i.equalsIgnoreCase(i.toLowerCase(Locale.forLanguageTag("tr-TR"))));
// should be true
}

@jasdeep
Copy link

jasdeep commented Jul 7, 2015

@savatgithub
I could reproduce the issue on eclipse, looks like an eclipse configuration issue.
Since when I ran the test from maven, it worked fine.

mvn -Dtest=ElementsTest test

@savatgithub
Copy link
Author

@jasdeep i can reproduce from command line outside of eclipse. What is your OS?

@jasdeep
Copy link

jasdeep commented Jul 7, 2015

@savatgithub It is Windows 7.

@jasdeep
Copy link

jasdeep commented Jul 8, 2015

@savatgithub Got it reproduced on Java 8, worked fine with Java 6

@jhy
Copy link
Owner

jhy commented Jul 8, 2015 via email

@jhy
Copy link
Owner

jhy commented Jun 27, 2017

This is fixed with @cketti's (super robust) commit. Thanks!

@jhy jhy closed this as completed Jun 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-more-info More information is needed from the reporter to progress the issue no-repro
Projects
None yet
Development

No branches or pull requests

3 participants