allow_only_index configuration option, closes #5
simonw committed Aug 19, 2021
1 parent 103b245 commit f947bcc
Showing 3 changed files with 46 additions and 6 deletions.
31 changes: 26 additions & 5 deletions README.md
@@ -4,7 +4,7 @@
[![Changelog](https://img.shields.io/github/v/release/simonw/datasette-block-robots?label=changelog)](https://github.com/simonw/datasette-block-robots/releases)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/datasette-block-robots/blob/master/LICENSE)

- Datasette plugin that blocks all robots using robots.txt
+ Datasette plugin that blocks robots and crawlers using robots.txt

## Installation

@@ -19,29 +19,50 @@
Having installed the plugin, `/robots.txt` on your Datasette instance will return the following:
User-agent: *
Disallow: /

This will request all robots and crawlers not to visit any of the pages on your site.

Here's a demo of the plugin in action: https://sqlite-generate-demo.datasette.io/robots.txt
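As a quick local check, the effect of that policy can be verified with Python's standard-library robots.txt parser (a sketch for illustration only, not part of the plugin):

```python
from urllib.robotparser import RobotFileParser

# The default policy served by the plugin
robots = "User-agent: *\nDisallow: /"

parser = RobotFileParser()
parser.parse(robots.splitlines())

# Under this policy no path is fetchable for any crawler
print(parser.can_fetch("*", "/"))
print(parser.can_fetch("*", "/mydatabase"))
```

Both calls print `False`, since `Disallow: /` covers every path on the site.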

## Configuration

By default the plugin will block all access to the site, using `Disallow: /`.

- You can instead block access to specific areas of the site by adding the following to your `metadata.json` configuration file:
+ If you want the index page to be indexed by search engines without crawling the database, table or row pages themselves, you can use the following:

```json
{
"plugins": {
"datasette-block-robots": {
"allow_only_index": true
}
}
}
```
This will return a `/robots.txt` like so:

User-agent: *
Disallow: /db1
Disallow: /db2

With a `Disallow` line for every attached database.
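The generation logic is simple enough to sketch on its own. This mirrors the behaviour described above but is a simplification: the real plugin uses `datasette.urls.database()`, which also respects any `base_url` prefix, and the database names here are hypothetical.

```python
def build_robots_txt(database_names):
    # One Disallow line per attached database, skipping Datasette's
    # private "_internal" database; with no databases, block everything.
    disallow = ["/{}".format(name) for name in database_names if name != "_internal"]
    if not disallow:
        disallow = ["/"]
    return "\n".join(["User-agent: *"] + ["Disallow: {}".format(item) for item in disallow])

print(build_robots_txt(["db1", "db2", "_internal"]))
```

This prints the two `Disallow` lines shown above, while leaving the index page itself crawlable.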

To block access to specific areas of the site using custom paths, add this to your `metadata.json` configuration file:

```json
{
"plugins": {
"datasette-block-robots": {
"disallow": ["/mydatabase"]
"disallow": ["/mydatabase/mytable"]
}
}
}
```
This will result in a `/robots.txt` that looks like this:

User-agent: *
- Disallow: /mydatabase
+ Disallow: /mydatabase/mytable

- You can also set the full contents of the `robots.txt` file using the `literal` configuration option. Here's how to do that if you are using YAML rather than JSON and have a `metadata.yml` file:
+ Alternatively you can set the full contents of the `robots.txt` file using the `literal` configuration option. Here's how to do that if you are using YAML rather than JSON and have a `metadata.yml` file:

```yaml
plugins:
  # …
```
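Note that `literal` takes precedence over the other options: as the handler in the diff below shows, when `literal` is set it is returned verbatim and `disallow`/`allow_only_index` are never consulted. A minimal sketch of that precedence (a simplification of the handler, with hypothetical config values):

```python
def robots_txt_body(config):
    # literal, when present, is served verbatim; everything else is ignored
    literal = config.get("literal")
    if literal:
        return literal
    disallow = config.get("disallow") or []
    if isinstance(disallow, str):
        disallow = [disallow]
    if not disallow:
        disallow = ["/"]
    return "\n".join(["User-agent: *"] + ["Disallow: {}".format(p) for p in disallow])

print(robots_txt_body({"literal": "User-agent: *\nDisallow: /private"}))
print(robots_txt_body({"disallow": "/mydatabase"}))
```

The first call returns the literal text unchanged; the second falls through to building the `Disallow` lines.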
8 changes: 7 additions & 1 deletion datasette_block_robots/__init__.py
@@ -5,11 +5,17 @@
def robots_txt(datasette):
    config = datasette.plugin_config("datasette-block-robots") or {}
    literal = config.get("literal")
    disallow = []
    if literal:
        return Response.text(literal)
-   disallow = config.get("disallow")
+   disallow = config.get("disallow") or []
    if isinstance(disallow, str):
        disallow = [disallow]
    allow_only_index = config.get("allow_only_index")
    if allow_only_index:
        for database_name in datasette.databases:
            if database_name != "_internal":
                disallow.append(datasette.urls.database(database_name))
    if not disallow:
        disallow = ["/"]
    lines = ["User-agent: *"] + ["Disallow: {}".format(item) for item in disallow]
13 changes: 13 additions & 0 deletions tests/test_block_robots.py
@@ -67,3 +67,16 @@ async def test_literal_prevent_literal_and_disallow_at_same_time():
    )
    with pytest.raises(AssertionError):
        await ds.invoke_startup()


@pytest.mark.asyncio
async def test_allow_only_index():
    ds = Datasette(
        [],
        memory=True,
        metadata={"plugins": {"datasette-block-robots": {"allow_only_index": True}}},
    )
    response = await ds.client.get("/robots.txt")
    assert response.status_code == 200
    assert response.headers["content-type"] == "text/plain; charset=utf-8"
    assert response.text == "User-agent: *\nDisallow: /_memory"
