This is an artifact of using GET (from HTTP::Request::Common, I guess?), which returns an HTTP::Request object; in that case we retrieve the base URL from the request object. As you already figured out, you can turn it into a string and it will work.
[Followup from email: the 'base URL' is ignored for URI type scrape arguments]
To start, I cloned the main Web::Scraper repository and attempted to run all tests, but the "live" tests are failing:
$ git diff
$ TEST_ALL=1 prove -l t
t/00_compile.t .......... ok
[...]
t/07-live.t ............. 1/1
Failed test at t/07-live.t line 21.
Structures begin differing at:
$got->{url} = 'http://d.hatena.ne.jp/keyword/%BA%B0%CC%EE%A4%A2%A4%B5%C8%FE'
$expected->{url} = 'http://d.hatena.ne.jp/keyword/%ba%b0%cc%ee%a4%a2%a4%b5%c8%fe'
[...]
t/18_http_response.t .... 2/2
Failed test 'Absolute URI'
at t/18_http_response.t line 27.
got: 'http://b.hatena.ne.jp/images/title_hotentry_curvebox-header.gif'
expected: 'http://b.hatena.ne.jp/images/logo1.gif'
Looks like you failed 1 test of 2.
[...]
t/19_decode_content.t ... 2/2
Failed test 'Absolute URI'
at t/19_decode_content.t line 28.
got: 'http://b.hatena.ne.jp/images/title_hotentry_curvebox-header.gif'
expected: 'http://b.hatena.ne.jp/images/logo1.gif'
Looks like you failed 1 test of 2.
I updated the expected values in the tests, and then made my change. After the change I reran the tests and they all continued to pass. However, there does not seem to be a pertinent test addressing my changes. Writing this test is a bit beyond my current skillset, so here's the situation from my email:
I've downloaded an HTML file from somewhere and cached it. The links are all relative within the file. When I ->scrape() the file, the links are converted into file:/// type URIs. But I am calling ->scrape( $file, 'http://example.org/source' );
The documentation says that the second argument is applied to relative links, but only if the first argument is text (as opposed to a URI). I could wrap every scraper call to 'fetch' the URL myself, but this seems like a reasonable thing for Web::Scraper to support natively.
For example, given
->scrape( GET('http://example.net/'), 'http://example.com:8888/test/' );
I think it is reasonable to expect this would convert links to '/foo' into 'http://example.com:8888/test/foo/'. The current implementation simply discards the second argument silently.
I'll attach the patch or paste it into a followup message.
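For reference, here is a minimal sketch of how relative links resolve against an explicit base URL, using the URI module's new_abs constructor. The helper name resolve_link is hypothetical and exists only for illustration; this is not taken from Web::Scraper's source, and whether the module resolves links exactly this way is an assumption. Note that under standard URI resolution a root-relative link such as '/foo' replaces the base path, so it lands at the host root rather than under /test/.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve one link against an explicit base URL.
# URI->new_abs applies the standard relative-reference resolution rules.
sub resolve_link {
    my ($href, $base) = @_;
    return URI->new_abs($href, $base)->as_string;
}

my $base = 'http://example.com:8888/test/';

# A root-relative link, a document-relative link, and a parent-relative link.
for my $href ('/foo', 'foo', '../bar') {
    printf "%-8s => %s\n", $href, resolve_link($href, $base);
}
```

With the base above, '/foo' resolves to http://example.com:8888/foo, 'foo' to http://example.com:8888/test/foo, and '../bar' to http://example.com:8888/bar.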