Parse for CSS URLS in HTML style elements #143
Comments
I think there can be two approaches to achieve this:
by piping it to another process invoking
I am inclined towards way BTW this brings me to my next question "Is
To locate
Maybe @rockdaboot can shed a bit more light here but as I see it, Once you have that, as you already guessed, you'd need to feed it to the CSS parser. If something needs to be modified, that should happen in the CSS code. Regarding the docs, we keep them inline and then generate HTML output with Doxygen. But as you have probably already seen, not all functions are documented.
Regarding parseXML(): it is a stateless XML/HTML scanner that does not allocate memory. Instead, it reports what it finds through a callback function. (Sorry, the function is currently undocumented.) Look at libwget/html_url.c for an example; there the callback function is _html_get_url(). For a better understanding, uncomment L104 (info_print), rebuild everything and run examples/print_html_urls. To parse the style attributes, you can probably extend this function.
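To illustrate the idea of a callback-driven scanner that never allocates, here is a minimal sketch. It is not the parseXML() code and does not use any libwget API: `attr_callback`, `scan_attributes`, `style_hit` and `on_attr` are hypothetical names made up for this example, and the scanner only handles quoted attribute values inside a single tag.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type (not libwget's): name and value are passed as
 * (pointer, length) slices into the caller's buffer, so nothing is copied. */
typedef void attr_callback(void *ctx,
                           const char *name, size_t name_len,
                           const char *val, size_t val_len);

/* Scan the inside of one tag, e.g. `div id="a" style="..."`, and report
 * every name="value" or name='value' pair through `cb`. */
static void scan_attributes(const char *s, attr_callback *cb, void *ctx)
{
    while (*s) {
        /* skip everything that cannot start a name */
        while (*s && !isalpha((unsigned char)*s))
            s++;

        const char *name = s;
        while (isalnum((unsigned char)*s) || *s == '-')
            s++;
        size_t name_len = (size_t)(s - name);

        while (isspace((unsigned char)*s))
            s++;
        if (*s != '=')
            continue; /* tag name or attribute without a value */
        s++;
        while (isspace((unsigned char)*s))
            s++;
        if (*s != '"' && *s != '\'')
            continue; /* unquoted values are ignored in this sketch */

        char quote = *s++;
        const char *val = s;
        while (*s && *s != quote)
            s++;

        cb(ctx, name, name_len, val, (size_t)(s - val));

        if (*s)
            s++; /* step over the closing quote */
    }
}

/* Example callback: remember where the `style` attribute value lives.
 * The comparison is case-sensitive to keep the sketch short. */
struct style_hit {
    const char *val;
    size_t len;
    int count;
};

static void on_attr(void *ctx, const char *name, size_t name_len,
                    const char *val, size_t val_len)
{
    struct style_hit *hit = ctx;

    if (name_len == 5 && memcmp(name, "style", 5) == 0) {
        hit->val = val;
        hit->len = val_len;
        hit->count++;
    }
}
```

Calling `scan_attributes(tag_text, on_attr, &hit)` with a zero-initialized `struct style_hit hit` leaves `hit.val`/`hit.len` pointing at the style value inside the original buffer, which is the piece you would then hand to a CSS parser.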
As @juaristi suggests, once you have the value of the 'style' attribute, you could use wget_css_parse_buffer() to avoid a temp file. That would be similar to css_parse() in src/wget.c. Maybe first create a small .html file including a style (with a URL) and then run it through examples/print_html_urls. The URL inside your style attribute won't be printed... then slowly start adding your code or your debugging printouts, step by step.
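As a rough picture of what "parse the style value straight from memory" means, the sketch below scans a buffer for `url(...)` tokens and reports each one through a callback. It is only a stand-in for illustration, not libwget's CSS parser and not the wget_css_parse_buffer() signature: `url_callback`, `scan_css_urls`, `url_list` and `collect_url` are hypothetical names, and the scan ignores CSS comments, escapes, `@import "x"` without `url(`, and uppercase `URL(`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: the URL is a slice of the input buffer. */
typedef void url_callback(void *ctx, const char *url, size_t len);

/* Find url(...) tokens in buf[0..len) and pass each URL to `cb`.
 * The buffer does not need to be NUL-terminated. */
static void scan_css_urls(const char *buf, size_t len,
                          url_callback *cb, void *ctx)
{
    const char *p = buf, *end = buf + len;

    while (end - p >= 4) {
        if (memcmp(p, "url(", 4) != 0) {
            p++;
            continue;
        }
        p += 4;

        /* leading white space inside the parentheses is allowed */
        while (p < end && isspace((unsigned char)*p))
            p++;

        /* optional single or double quote around the URL */
        char quote = 0;
        if (p < end && (*p == '"' || *p == '\''))
            quote = *p++;

        const char *start = p;
        while (p < end &&
               (quote ? *p != quote
                      : (*p != ')' && !isspace((unsigned char)*p))))
            p++;

        if (p > start)
            cb(ctx, start, (size_t)(p - start));
    }
}

/* Example callback: collect up to four URL slices. */
struct url_list {
    const char *url[4];
    size_t len[4];
    int count;
};

static void collect_url(void *ctx, const char *url, size_t len)
{
    struct url_list *list = ctx;

    if (list->count < 4) {
        list->url[list->count] = url;
        list->len[list->count] = len;
        list->count++;
    }
}
```

With a style value such as ` background: url('img/bg.png')`, `scan_css_urls(style, strlen(style), collect_url, &list)` reports the slice `img/bg.png`; leading white space in the attribute value does not matter here because the scan simply skips ahead to the next `url(`.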
Not all yet (any additions to the docs are highly welcome).
@juaristi Oh! I completely missed that. Thanks.
@rockdaboot I did that. But then it says:
I can see that And also since we are passing it to a printing function; it seems that BTW I just declared it as a
Yes. That's how I'm proceeding.
And when I did that Just to make sure that
Sorry, that comment was a bit old... I just pushed a corrected version (it must be 'tag' instead of 'dir').
Yup. Because the app doesn't set a 'logger' for 'info' messages (or any other kind of messages). You see in comments (in print_html_urls.c):
Remove the comments to get full output. Just comment out the line with WGET_DEBUG_STREAM and you won't see any debug output (= wget_debug_printf()) any more. Instead of STREAM, you can also redirect the different kinds of output to a function (_FUNC) or to a file on disk (_FILE).
I am trying hard to follow More specifically:
The parser handles both HTML and XML, which differ somewhat. If hints is set to HTML, the parser parses HTML. But really... don't bother with that code (you need a decent understanding of the XML and HTML definitions). Just use the code as stated above. If there is something unexpected or a bug, report it to me and I'll fix it. Of course you can go through it, but I have the feeling that would waste your time.
BTW, what IDE are you using? You should not just use a plain editor but something more powerful like Eclipse or Netbeans. For example, in Netbeans you simply right-click on 'hints', select 'show usage' and immediately see that the only value that is ever checked for is XML_HINT_HTML (you get a list of all usages within the whole project).
@rockdaboot I'm using VIM with ctags. I'm not an advanced VIM user though. About Netbeans, that sounds awesome. Without going through the whole project and jumping between files. I should use it. BTW what IDE would you recommend? Is it Netbeans?
I use Netbeans. But I believe that Eclipse does the same or even more for C/C++. I have lots of Java projects, so I stick with Netbeans.
@rockdaboot I think I got it. I added
Or should I first just walk over to find the end of string and then use that
I am not sure, but I think the quote has already been parsed by then. Leading white space should not bother the CSS parser either (if it does, we have to fix it there). So, you are absolutely on the right track :-)
Ok. Now I have a CSS string in buffer and now I need Now I can copy those two functions from Since
On a side note, is there a css parsing function like
For now, forget callback_encoding, just use the encoding of the HTML document. Also, feel free to copy callback_uri into libwget/html_url.c... remember: first get it going, optimize later (unless the optimization is straightforward). I am just in a hurry right now... so I can't take a deeper look.
Also why are we using callback functions in
Maybe we can amend the CSS parsing so it works like the HTML parsing. Create an issue for discussion!
Shouldn't this be closed too? Though I'm yet to really look into the approach @MichaelHeerklotz took and compare it with what I was thinking, it works as expected. Or am I missing something?
Needs a unit test + work in html_url.c.
Example code: