Setting text contents of a node? #8

CrimsonVex · 2015-05-07T12:36:53Z

I'm assuming this wasn't intended, but would it be possible to create a way to set the text contents of a CNode? I'm in a situation where I need to update parts of a DOM on the fly, and I need such a feature.

If it were to be implemented, I'd imagine overloading .text() for a CNode to accept a std::string would work well and be similar to the jQuery function .html().

TechnikEmpire · 2015-05-07T12:45:02Z

This isn't possible because of the nature of gumbo itself. All of the node data exposed to you is managed internally by gumbo. If you modify it at all, you risk bugs or even a crash, because you're tampering with memory owned by another object. The contract gumbo provides is very clear: if you want to own something, you need to copy it.

CrimsonVex · 2015-05-07T12:49:34Z

I'd assume then a suitable option is to perform replacements on the original string given to CDocument and re-parse it? (I suppose that's not so bad)

TechnikEmpire · 2015-05-07T13:04:25Z

I made heavy modifications to gumbo-query just to be able to perform the simplest modifications of nodes at a good speed. These modifications included providing a Get() method to expose the underlying gumbo_node of CDocument/CNode. I then wrote several helper functions, the most important of which generates a unique node ID string, like so:

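// Requires <string> and gumbo.h.
// Builds an ID from the node's index within its parent, followed by the
// index of each ancestor up to the root.
std::string SerializeUtil::getUniqueNodeId(GumboNode* node)
{
    std::string nodeId = std::to_string(node->index_within_parent);

    GumboNode* parent = node->parent;

    while (parent != nullptr)
    {
        // The '.' separator keeps multi-digit indices from running together
        // (without it, indices 1,12 and 11,2 would both yield "112").
        nodeId.append(".");
        nodeId.append(std::to_string(parent->index_within_parent));

        parent = parent->parent;
    }

    return nodeId;
}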

Using this unique node ID, I could then keep track of the nodes I wanted to manipulate by storing them in a simple std::unordered_map<std::string, int>. The int value represents the manipulation to perform on the node while it is being rendered, for example remove or modify. I then heavily modified https://github.com/google/gumbo-parser/blob/master/examples/serialize.cc to take an optional pointer to such a map, so that while it renders the GumboOutput back to an HTML string, it can apply the modifications by checking the unique ID of each node against the unordered_map as it begins to render that node.
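The approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual modified serialize.cc: ToyNode is a hypothetical stand-in for GumboNode, the Action codes and the single replacementText parameter are made up for the example, and the output is not HTML-escaped.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed node; NOT gumbo's GumboNode.
struct ToyNode {
    std::string tag;                                  // empty for a text node
    std::string text;                                 // used only by text nodes
    std::vector<std::unique_ptr<ToyNode>> children;
    ToyNode* parent = nullptr;
    std::size_t indexWithinParent = 0;
};

// Hypothetical action codes, stored as the int value in the map.
enum Action : int { kRemove = 1, kReplaceText = 2 };

// Attach a new child and record its parent and position.
ToyNode* addChild(ToyNode& parent, std::string tag, std::string text = "") {
    auto child = std::make_unique<ToyNode>();
    child->tag = std::move(tag);
    child->text = std::move(text);
    child->parent = &parent;
    child->indexWithinParent = parent.children.size();
    parent.children.push_back(std::move(child));
    return parent.children.back().get();
}

// Build an ID from the node's index and each ancestor's index, leaf first.
// The '.' separator keeps multi-digit indices unambiguous.
std::string uniqueNodeId(const ToyNode* node) {
    std::string id;
    for (const ToyNode* n = node; n != nullptr; n = n->parent) {
        if (!id.empty()) id += '.';
        id += std::to_string(n->indexWithinParent);
    }
    return id;
}

// Serialize the tree, consulting the action map before rendering each node.
std::string serialize(const ToyNode& node,
                      const std::unordered_map<std::string, int>& actions,
                      const std::string& replacementText) {
    const auto it = actions.find(uniqueNodeId(&node));
    const int action = (it == actions.end()) ? 0 : it->second;
    if (action == kRemove) return "";  // drop the node and its whole subtree
    if (node.tag.empty()) {            // text node
        return action == kReplaceText ? replacementText : node.text;
    }
    std::string out = "<" + node.tag + ">";
    for (const auto& child : node.children) {
        out += serialize(*child, actions, replacementText);
    }
    return out + "</" + node.tag + ">";
}
```

The parsed tree is never mutated here; the changes exist only in the rendered string, which is what keeps this compatible with gumbo owning the node data. A real version would presumably need a per-node replacement value rather than one shared string.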

So yeah, not too bad, but there is a lot involved in doing these modifications. For me, this approach was necessary because I'm modifying HTML in real time as users browse, so speed was of the utmost importance.

CrimsonVex · 2015-05-07T13:09:06Z

In my case speed isn't an issue. I'm making some POST requests, analysing the response and then making subsequent POST requests. I haven't tried it yet, but I'm assuming my simple idea of using the Replace function on my System::Strings should work (that particular replace function is quite fast), as I probably need to replace a couple of tags, each containing a few thousand or so characters, after each POST. It's not optimal but it might be okay. Thanks for clarifying that for me though.

TechnikEmpire · 2015-05-07T13:13:09Z

Look at the code behind the text() methods and such in gumbo-query. They are just convenience functions that copy data from the parsed HTML, which resides exclusively in, and is owned by, GumboOutput. So if you change the text that you get back from node.text(), this will have absolutely no effect on the actual document that you parsed. gumbo-parser and gumbo-query only give you read-only access to traverse parsed HTML. Maybe I'm not understanding your use case, and the only text you need is what gets copied to you when you call text() on your node. But I want to make it clear that if you're expecting to get an HTML response, replace the text() of one element, and end up with the whole response including your modifications, this simply isn't possible out of the box.

CrimsonVex · 2015-05-07T13:24:49Z

I'm thinking more along the lines of having a global string variable. Every time I make a new request that responds with pieces of HTML, I merge them into the global string by replacing the current CNode.text() with the HTML piece, and pass this global string to CDocument to be analysed again before making further requests.
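A minimal sketch of that merge step, assuming plain std::string rather than System::String. replaceFirst is a hypothetical helper name; the re-parse with CDocument would follow the call and is not shown.

```cpp
#include <cstddef>
#include <string>

// Replace the first occurrence of `oldText` in `document` with `newFragment`.
// Returns false and leaves `document` untouched if `oldText` is empty or absent.
bool replaceFirst(std::string& document,
                  const std::string& oldText,
                  const std::string& newFragment) {
    if (oldText.empty()) return false;
    const std::size_t pos = document.find(oldText);
    if (pos == std::string::npos) return false;
    document.replace(pos, oldText.size(), newFragment);
    return true;
}
```

Two caveats seem likely with a search-based replace: the same text may occur earlier in the document than the node you meant, and text returned by the parser may have character entities decoded, in which case it will not match the raw source verbatim.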

lazytiger · 2015-06-08T08:10:22Z

I think this feature can be implemented using CNode::startPos and CNode::endPos.
You can replace the data from startPos to endPos with whatever you want.
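A rough sketch of that idea, assuming startPos and endPos are byte offsets into the original HTML string with the end offset exclusive (worth verifying against the gumbo-query source before relying on it). spliceRange is a hypothetical helper, not part of gumbo-query.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Return a copy of `html` with the half-open byte range [start, end)
// replaced by `replacement`. Throws if the range does not fit in `html`.
std::string spliceRange(const std::string& html,
                        std::size_t start,
                        std::size_t end,
                        const std::string& replacement) {
    if (start > end || end > html.size()) {
        throw std::out_of_range("spliceRange: invalid range");
    }
    std::string out = html;
    out.replace(start, end - start, replacement);
    return out;
}
```

The offsets would come from the node, and the returned string would then be parsed again, since the existing parse tree still describes the old text. When editing several nodes in one pass, applying the edits from the highest offset to the lowest keeps the remaining offsets valid.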

lazytiger closed this as completed Jun 8, 2015

lazytiger reopened this Jun 8, 2015

lazytiger closed this as completed Oct 9, 2015
