DaisyDiff breaks with "There is no Atom with index" when strings have numbers #7

Closed
GoogleCodeExporter opened this issue Apr 23, 2016 · 4 comments

@GoogleCodeExporter

Running the code below fails with
"java.lang.IndexOutOfBoundsException: There is no Atom with index 3".
I am using DaisyDiff 1.0 to compile and run it.

<code>
        String contentOld = "festive 8";
        String contentNew = "What was your highlight of 8?";

        StringWriter writer = new StringWriter(contentNew.length() + contentOld.length());
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler result = tf.newTransformerHandler();
        result.setResult(new StreamResult(writer));
        ContentHandler postProcess = (ContentHandler) result;
        postProcess.startDocument();
        postProcess.startElement("", "diffreport", "diffreport", new AttributesImpl());
        postProcess.startElement("", "diff", "diff", new AttributesImpl());

        DaisyDiff.diffTag(contentOld, contentNew, postProcess);

        postProcess.endElement("", "diff", "diff");
        postProcess.endElement("", "diffreport", "diffreport");
        postProcess.endDocument();
        System.out.println(writer.getBuffer().toString());
</code>

Original issue reported on code.google.com by diptansh...@gmail.com on 11 Feb 2009 at 4:26

@GoogleCodeExporter

Thanks for reporting.

The code works for me when I change the first lines to
        String contentOld = "<html><body>festive 8</body></html>";
        String contentNew = "<html><body>What was your highlight of 8?</body></html>";

Keep in mind that DaisyDiff was created to compare XHTML and that the behaviour
for other input is undefined.
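
For plain-text fields, the wrapping shown above can be done by a small helper before the strings are handed to DaisyDiff. This is only a minimal sketch and not part of DaisyDiff: the class and method names (`PlainTextWrapper`, `wrap`, `escape`) are hypothetical, and escaping `&`, `<` and `>` is an assumption made here so that literal text is not mistaken for markup.

```java
/** Hypothetical helper (not part of DaisyDiff): turns plain text into a tiny XHTML document. */
final class PlainTextWrapper {
    private PlainTextWrapper() {}

    /** Escapes the characters that would otherwise be read as markup. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps the (escaped) text in a single <html><body> root; null is treated as empty. */
    static String wrap(String text) {
        String safe = text == null ? "" : escape(text);
        return "<html><body>" + safe + "</body></html>";
    }

    public static void main(String[] args) {
        // prints <html><body>festive 8</body></html>
        System.out.println(wrap("festive 8"));
        // prints <html><body>What was your highlight of 8?</body></html>
        System.out.println(wrap("What was your highlight of 8?"));
    }
}
```

The wrapped strings can then be passed to `DaisyDiff.diffTag(...)` exactly as in the original report.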

Does this help you or do you believe DaisyDiff should support plain text?

Original comment by guy...@gmail.com on 17 Feb 2009 at 4:43

  • Changed state: WontFix

@GoogleCodeExporter

This is not the ideal solution I was looking for. Wrapping text in HTML tags
should not be mandatory.

Although DaisyDiff was created to compare XHTML, I think it should be extended
to handle plain-text input too (just a suggestion). A classic case where this
comes in handy is when you have two different stories to compare and you call
the library from a custom tag to compare the author, created date, title,
summary, etc. of the two stories.

Original comment by diptansh...@gmail.com on 17 Feb 2009 at 5:09

@GoogleCodeExporter

BTW, I got it to work using the following piece of code.

<code>
    // Usage
    String contentOld = "festive 8";
    String contentNew = "What was your highlight of 8?";
    getDiffForHTMLInput(contentOld, contentNew);

    private String getDiffForHTMLInput ( String contentOld, String contentNew ) throws Exception
    {
        // Treat null input as empty text
        contentOld = contentOld == null ? "" : contentOld;
        contentNew = contentNew == null ? "" : contentNew;
        StringWriter writer = new StringWriter(contentNew.length() + contentOld.length());
        List<String> styleList = new ArrayList<String>();
        styleList.add("/static/js/difftag/css/difftag.css");

        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler result = tf.newTransformerHandler();
        result.setResult(new StreamResult(writer));

        ContentHandler postProcess = result;
        Locale locale = Locale.getDefault();
        String prefix = "diff";

        // Run both inputs through HtmlCleaner so that plain text ends up as a DOM tree
        HtmlCleaner cleaner = new HtmlCleaner();
        InputSource oldSource = new InputSource(new ByteArrayInputStream(contentOld.getBytes("UTF-8")));
        InputSource newSource = new InputSource(new ByteArrayInputStream(contentNew.getBytes("UTF-8")));
        DomTreeBuilder oldHandler = new DomTreeBuilder();
        cleaner.cleanAndParse(oldSource, oldHandler);
        TextNodeComparator leftComparator = new TextNodeComparator(oldHandler, locale);
        DomTreeBuilder newHandler = new DomTreeBuilder();
        cleaner.cleanAndParse(newSource, newHandler);
        TextNodeComparator rightComparator = new TextNodeComparator(newHandler, locale);

        // Emit the report: <diffreport><css>...</css><diff>...</diff></diffreport>
        postProcess.startDocument();
        postProcess.startElement("", "diffreport", "diffreport", new AttributesImpl());
        attachStyleSheets(styleList, postProcess);
        postProcess.startElement("", "diff", "diff", new AttributesImpl());
        HtmlSaxDiffOutput output = new HtmlSaxDiffOutput(postProcess, prefix);
        HTMLDiffer differ = new HTMLDiffer(output);
        differ.diff(leftComparator, rightComparator);
        postProcess.endElement("", "diff", "diff");
        postProcess.endElement("", "diffreport", "diffreport");
        postProcess.endDocument();
        return writer.getBuffer().toString();
    }


    private void attachStyleSheets ( List<String> styles, ContentHandler handler ) throws SAXException
    {
        handler.startElement("", "css", "css", new AttributesImpl());
        for (String cssLink : styles)
        {
            AttributesImpl attr = new AttributesImpl();
            attr.addAttribute("", "href", "href", "CDATA", cssLink);
            attr.addAttribute("", "type", "type", "CDATA", "text/css");
            attr.addAttribute("", "rel", "rel", "CDATA", "stylesheet");
            handler.startElement("", "link", "link", attr);
            handler.endElement("", "link", "link");
        }
        // Close the <css> wrapper so the elements in the report stay properly nested
        handler.endElement("", "css", "css");
    }

</code>

Original comment by diptansh...@gmail.com on 17 Feb 2009 at 5:12

@GoogleCodeExporter

Btw, if you want to diff plain text then I can recommend
http://code.google.com/p/google-diff-match-patch/. If you have the time, I'd be
very happy to accept patches that add support for plain text.
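
To give an idea of what a plain-text comparison involves, here is a rough word-level diff based on a longest-common-subsequence table. It is a stand-in sketch only: it does not use the google-diff-match-patch API and is not DaisyDiff code, and the class name `WordDiffSketch` and the `"  "`/`"- "`/`"+ "` output prefixes are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical word-level diff using a longest-common-subsequence (LCS) table. */
final class WordDiffSketch {
    private WordDiffSketch() {}

    /** Returns one entry per word: "  w" (unchanged), "- w" (removed) or "+ w" (added). */
    static List<String> diff(String oldText, String newText) {
        String[] a = tokens(oldText);
        String[] b = tokens(newText);

        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting matches, removals and additions
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                out.add("  " + a[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a[i++]);
            } else {
                out.add("+ " + b[j++]);
            }
        }
        while (i < a.length) out.add("- " + a[i++]);
        while (j < b.length) out.add("+ " + b[j++]);
        return out;
    }

    /** Splits on whitespace; null or blank input yields no tokens. */
    static String[] tokens(String text) {
        String trimmed = text == null ? "" : text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        for (String line : diff("festive 8", "What was your highlight of 8?")) {
            System.out.println(line);
        }
    }
}
```

Note that splitting on whitespace treats "8" and "8?" as different words; a real library handles such cases at a finer granularity.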

Well-formed XML must have a single root element (like <html>), so anything else
is not valid input and you should indeed wrap your snippets.
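
The single-root rule can be checked with the JDK's own XML parser before calling the diff. The following is a minimal sketch under stated assumptions: `WellFormedCheck` and `isWellFormed` are hypothetical names, and the check uses a strict XML parser, which may be stricter than whatever DaisyDiff itself does internally.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical pre-check: is the input a well-formed XML document with one root? */
final class WellFormedCheck {
    private WellFormedCheck() {}

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Parse failure: bare text, multiple roots, unclosed tags, ...
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("festive 8"));                           // prints false
        System.out.println(isWellFormed("<html><body>festive 8</body></html>")); // prints true
    }
}
```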

Original comment by guy...@gmail.com on 17 Feb 2009 at 5:31
