Spidering pages with no content-type header #32
We ran into a scenario where we tried to spider a customer's site for certain keywords -- keywords that were present when we viewed the site in a browser -- but could not locate any of them by using some flavor of
What are your thoughts on attempting to parse pages which have no
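To make the question concrete, here is a minimal sketch of one possible fallback, written in Python for illustration only and not taken from this project's code. It assumes the spider can see the response headers and the raw body; the names `should_parse_as_html`, `looks_like_html`, and `HTML_TYPES` are hypothetical. When a content type is declared, it is respected. When none is sent, the first bytes of the body are checked for common HTML markers.

```python
from typing import Mapping

# Media types the (hypothetical) spider would normally parse as HTML.
HTML_TYPES = ("text/html", "application/xhtml+xml")

# Byte markers that suggest an HTML document; a heuristic, not a full sniffer.
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body")


def looks_like_html(body: bytes, probe: int = 1024) -> bool:
    """Guess whether a body is HTML by scanning its first `probe` bytes."""
    head = body[:probe].lower()
    return any(marker in head for marker in HTML_MARKERS)


def should_parse_as_html(headers: Mapping[str, str], body: bytes) -> bool:
    """Decide whether to parse a response as HTML.

    A declared Content-Type wins. Only when the header is missing or
    empty do we fall back to sniffing the body.
    """
    content_type = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "content-type":
            content_type = value.strip()
            break

    if content_type:
        # Drop parameters such as "; charset=utf-8".
        media_type = content_type.split(";", 1)[0].strip().lower()
        return media_type in HTML_TYPES

    return looks_like_html(body)
```

With this approach a page served without the header would still be searched for keywords, while a response explicitly labelled as something else (an image, say) would still be skipped.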
This is interesting. Do you have a URI I can test, and did you check that
The URI that exposed the issue was http://offfurn.com.
I have no idea why they send
We do specify
% curl http://offfurn.com/ | wc
329 1866 31900
That should give a better idea of what we're seeing.