Prevent google crawling, or make it faster. #24

jasononeil · 2015-03-18T13:03:00Z

We just had Googlebot start crawling "preview.lib.haxe.org". (Still not sure where it scraped the URL from, but oh well.)

It hit the File Browser, which currently displays a source file by opening the haxelib zip, unpacking the file, rendering it, and sending it to the client. Needless to say, with the tens (hundreds?) of thousands of files, this was causing significant strain on the server.

I've turned the preview site off for now until I fix this, either by having a faster (cached?) implementation, or by using robots.txt to block google from the file browser section.
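For reference, a rough sketch of the robots.txt option, checked with Python's standard `urllib.robotparser`. The `/files/` prefix is an assumption made for illustration; the real file browser path is not stated in this thread and would need to be substituted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/files/" stands in for wherever the
# file browser actually lives on the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /files/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Well-behaved crawlers would then skip everything under that prefix while still indexing the rest of the site. Note that robots.txt is advisory, so it only helps with crawlers that honour it.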

markknol · 2015-03-18T13:10:45Z

Ah that's odd. Google reads our mail!
Since most content is static per lib version, you might just render out the stuff once (if it doesn't exist yet) to plain html, store that in cache or on disk and serve that?
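Something along these lines, as a minimal Python sketch rather than anything tied to the actual haxelib code. `cached_render`, the cache directory layout, and the `render` callback are all hypothetical names; `render` stands in for the expensive "open zip, unpack, render" step.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_render(cache_dir: Path, lib: str, version: str, file_path: str,
                  render: Callable[[], str]) -> str:
    """Return rendered HTML, calling render() only on a cache miss."""
    # Hash the (lib, version, path) key so any file path maps to a safe,
    # flat file name inside the cache directory.
    raw_key = f"{lib}\0{version}\0{file_path}".encode("utf-8")
    cache_file = cache_dir / f"{hashlib.sha256(raw_key).hexdigest()}.html"

    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")

    # Cache miss: do the expensive work once and keep the result on disk.
    html = render()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(html, encoding="utf-8")
    return html
```

Because a released version never changes, the sketch has no invalidation logic: an entry is written once and reused for every later request.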

jasononeil · 2015-03-18T23:38:20Z

Yes, I think that's a suitable solution for text files; we could cache them
in the DB. Images and binaries we can perhaps block from web crawlers, as
they won't be valuable to search results and might be too large to
suitably cache, especially some of the ndll files etc.
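
A rough sketch of what the DB caching could look like, using Python's built-in `sqlite3` purely for illustration. The table name, the `get_or_render` helper, and the size cap are all assumptions, not the site's actual schema or limits.

```python
import sqlite3
from typing import Callable

# Hypothetical cap: rendered output larger than this is served but not stored.
MAX_CACHED_BYTES = 512 * 1024


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the cache DB and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rendered_file ("
        " lib TEXT, version TEXT, path TEXT, html TEXT,"
        " PRIMARY KEY (lib, version, path))"
    )
    return conn


def get_or_render(conn: sqlite3.Connection, lib: str, version: str,
                  path: str, render: Callable[[], str]) -> str:
    """Return cached HTML for a file, rendering and storing it on a miss."""
    row = conn.execute(
        "SELECT html FROM rendered_file WHERE lib = ? AND version = ? AND path = ?",
        (lib, version, path),
    ).fetchone()
    if row is not None:
        return row[0]

    html = render()
    # Skip storing oversized output so the cache table stays small.
    if len(html.encode("utf-8")) <= MAX_CACHED_BYTES:
        with conn:  # commits on success
            conn.execute(
                "INSERT OR REPLACE INTO rendered_file VALUES (?, ?, ?, ?)",
                (lib, version, path, html),
            )
    return html
```

The size check is one way to handle the "too large to suitably cache" concern: big files are still rendered on demand, they just never land in the table.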

markknol · 2015-04-09T08:01:09Z

What is the state of this?

jasononeil added a commit that referenced this issue May 8, 2015

Don't let google follow links to binary files.

af74b21

See #24 This mostly solves it, though I should still do some DB caching.
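
The commit's diff is not shown in this thread, so the following is only a guess at the general idea: mark links to binary files with `rel="nofollow"` so crawlers are asked not to follow them. The extension list and the `file_link` helper are hypothetical, written in Python for illustration.

```python
from html import escape
from pathlib import PurePosixPath

# Hypothetical list of extensions treated as binary.
BINARY_EXTENSIONS = {".ndll", ".png", ".jpg", ".gif", ".zip", ".swf"}


def is_binary(path: str) -> bool:
    """Guess from the file extension whether a file is binary."""
    return PurePosixPath(path).suffix.lower() in BINARY_EXTENSIONS


def file_link(href: str, path: str) -> str:
    """Render a file browser link, adding rel="nofollow" for binary files."""
    rel = ' rel="nofollow"' if is_binary(path) else ""
    name = PurePosixPath(path).name
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(name)}</a>'
```

`rel="nofollow"` is a hint rather than a guarantee, so it works best alongside a robots.txt rule or caching rather than as the only safeguard.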

jasononeil closed this as completed in eb072a0 May 8, 2015
