Russian text support #29
Comments
This is because the default output formatter encodes any character with a Unicode value greater than 160.
[Test]
public void TestRussianText()
{
    // Arrange
    var s = new HtmlSanitizer();
    // Act
    var htmlFragment = "Тест";
    var outputFormatter = new CsQuery.Output.FormatDefault(DomRenderingOptions.RemoveComments | DomRenderingOptions.QuoteAllAttributes, HtmlEncoders.Minimum);
    var actual = s.Sanitize(htmlFragment, "", outputFormatter);
    // Assert
    var expected = htmlFragment;
    Assert.That(actual, Is.EqualTo(expected).IgnoreCase);
}
|
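For anyone wondering what that encoding looks like in practice, here is a minimal sketch of the effect described above. It is not CsQuery's actual code: `EntityEncodingSketch` and `EncodeAbove160` are hypothetical names made up for illustration, and the sketch assumes the encoder emits decimal numeric character references.

```csharp
using System.Text;

// Hypothetical illustration only; this is NOT CsQuery's implementation.
public static class EntityEncodingSketch
{
    // Replaces every UTF-16 code unit above 160 with a decimal numeric
    // character reference and leaves everything else untouched.
    // Assumption: only BMP characters are handled; surrogate pairs would
    // need to be combined into a single code point first.
    public static string EncodeAbove160(string input)
    {
        var sb = new StringBuilder();
        foreach (char c in input)
        {
            if (c > 160)
                sb.Append("&#").Append((int)c).Append(';');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Calling `EntityEncodingSketch.EncodeAbove160("Тест")` returns `&#1058;&#1077;&#1089;&#1090;`, which a browser renders as the same text but which fails an exact string comparison against the input.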
I can't repro. The CsQuery docs state:
Which version of CsQuery are you using (1.3.4 here)? |
Yep. I changed all
and
to
After that change, 14 tests in Tests.cs fail. Some of the failures look dangerous. For example:
|
Since the tests check for exact string equality, some tests will fail if the output formatting is changed, but that doesn't automatically mean the output isn't clean. I don't see an XSS problem with the output in the above test. Which other ones do you believe are dangerous? |
@mganss I'm using Sanitizer version 5.0.404 inside a .NET Core API. |
@RickBlacker This used to be an issue until we switched to AngleSharp years ago. There's no specific configuration necessary in HtmlSanitizer or AngleSharp. It's likely an encoding issue at an earlier stage in your processing pipeline. Can you post an example string and/or code? |
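To illustrate what an encoding issue at an earlier stage could look like, here is a small sketch. It is purely hypothetical and not taken from the reporter's code: it assumes UTF-8 bytes are decoded as ISO-8859-1 somewhere before the string reaches the sanitizer. `MojibakeSketch` and its methods are made-up names.

```csharp
using System.Text;

// Hypothetical illustration of a pipeline-stage encoding mix-up.
public static class MojibakeSketch
{
    // Simulates a stage that receives UTF-8 bytes but decodes them as
    // ISO-8859-1, so each Cyrillic letter turns into two unrelated characters.
    public static string DecodeUtf8AsLatin1(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8);
    }

    // Reverses the mistake: ISO-8859-1 maps every byte to one character,
    // so the original bytes can be recovered and decoded correctly.
    public static string Repair(string garbled)
    {
        byte[] raw = Encoding.GetEncoding("iso-8859-1").GetBytes(garbled);
        return Encoding.UTF8.GetString(raw);
    }
}
```

If the string is already garbled like this before `Sanitize` is called, the sanitizer will faithfully pass the garbled text through, so comparing the input string against the original source is a quick first check.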
I have a problem with Russian text:
Code:
Test result: