Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failure to detect closed tags in fixDesc #103

Merged
merged 1 commit into from
Mar 29, 2021

Conversation

salty-horse
Copy link
Collaborator

The function fixDesc is used when truncating reviews in order to embed them in other pages.
For example, the main page has review snippets in the "New on IFDB" section.

The function tries to detect opening and closing of HTML tags, and if any are open at the point of truncation, it adds closing tags.

Due to a bug, it fails to properly detect closing tags, and ends up with this behavior:

$ php -a
php > include_once "www/util.php";
php > echo fixDesc("hello <I>world</I> bla bla");
hello <I>world</I> bla bla</i>

The expected output is hello <I>world</I> bla bla.

This dangling </i> is currently causing a small markup error on the main page, with a review that uses italics.

The code tracks opened tags in the $tagStack array, and uses $tagSp to indicate the index of the head of the stack.
When the scanner encounters a closing tag, it tries to match it with an opening tag and remove it from the stack. But the condition in the loop while ($tagSp > 0) is wrong, and ignores cases when the stack has a single item.

@dfabulich
Copy link
Collaborator

Great catch! I'm pretty sure this issue has been causing a number of non-critical issues in the RSS feeds. (We're using fixDesc to generate non-valid HTML, then XML encoding it and sending it to users.)

What you're saying here makes sense to me, but this code is super complicated and has no tests, not even a list of manual test cases.

I wonder if we could develop a test plan for this function that covers all of its major branches?

Alternately, maybe we should just merge this as is? @lkcampbell do you have thoughts?

@salty-horse
Copy link
Collaborator Author

I can write tests, but it would require me to try to setup and run an IFDB instance. I only tested the code in the PHP interactive shell.

I'll get to it in a few days, if you don't merge it by then.

@dfabulich
Copy link
Collaborator

Before you "write tests," just coming up with a list of test cases would be great. e.g. a list of input values and expected output values.

(We don't even have any kind of unit test system in place, so we'd have to figure that out before you could write/add tests!)

@salty-horse
Copy link
Collaborator Author

Here are a few tests. Not really exhaustive and I didn't dig into the various edge cases, such as <br> and <spoiler> handling, and the special modes of operation (Spoiler, RSS, Ific).

include_once "www/util.php";

$tests = [
    # Do nothing
    ["hello world", "hello world"],
    ["hello <I>world</I>", "hello <I>world</I>"],
    ["hello <I>world</I> <I>bla</i> bla", "hello <I>world</I> <I>bla</i> bla"],
    ["hello <b><I>world</I></b> <I>bla</i> bla", "hello <b><I>world</I></b> <I>bla</i> bla"],
    ["hello <b><I><u>world</u></I></b> <I>bla</i> bla", "hello <b><I><u>world</u></I></b> <I>bla</i> bla"],

    # Add tags at the end
    ["hello <I>world bla bla", "hello <I>world bla bla</i>"],
    ["hello <I><b>world</b> bla bla", "hello <I><b>world</b> bla bla</i>"],
    ["hello <I><u><b>world</b> bla bla", "hello <I><u><b>world</b> bla bla</u></i>"],

    # Tags in the wrong order -- this is just documenting the current behavior. I don't know if it's desired.
    # It's probably not this function's job to fix bad HTML
    ["hello <I><B>world</I></B>", "hello <I><B>world</b></I></B>"],

    # Unknown tag
    ["hello <unknown><I>world</I>", "hello &lt;unknown&gt;<I>world</I>"],

    # Removed spoiler tag (in regular mode)
    ["hello <spoiler>secret</spoiler> <I>world</I>", "hello  <I>world</I>"],

    # Replaced spoiler tag (in RSS mode)
    ["hello <spoiler>secret</spoiler> <I>world</I>", "hello [spoilers] <I>world</I>", FixDescRSS],
];

foreach ($tests as &$test) {
    $output = fixDesc($test[0], count($test) == 3 ? $test[2] : 0);
    if ($output != $test[1]) {
        echo "Failed test.\nInput: {$test[0]}\nOutput: {$output}\nExpected: {$test[1]}";
    }
}

@lkcampbell
Copy link
Collaborator

@salty-horse agreed, great catch on this!

@dfabulich, tough call, but I suggest a conservative approach. Although it's a very small fix, it is in fixDesc(), which is a complex function that is used all over the website. I'm seeing a lot of testing paths through that one function. I'm not clear what even a small fix to fixDesc() might mean to the rest of the site.

If this problem is not causing any critical bugs, we would probably be better off hanging onto this pull request until we have some better testing in place. The test cases above that @salty-horse provided is certainly a great start.

Copy link
Collaborator

@lkcampbell lkcampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on the discussion thread.

@dfabulich
Copy link
Collaborator

My inclination is to merge this and check in this test as a manual test suite for fixDesc. If it introduces a regression, we can rollback easily, and we can also roll forward by adding new tests and fixing the ones that are there.

@dfabulich dfabulich mentioned this pull request Mar 29, 2021
@dfabulich dfabulich merged commit 4578a0f into iftechfoundation:main Mar 29, 2021
@salty-horse salty-horse deleted the fixdesc_extra_tag branch March 29, 2021 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants