What is robots.json?

It's a specification and set of tools for communicating and inferring how programs should interact with web apps, sites, and APIs.

A few things it's got going for it:

  • Its hierarchy makes grouped data explicit and makes clear which directives are general: there is no confusion about which user agents are tied to which permission sets.

  • It's a spec that speaks for itself: clearer rules are more likely to be followed.

  • It's JSON, the lingua franca for data on the web, and it's ready for consumption as-is.

  • Building something? You can work with robots.json anywhere right now, even if only a robots.txt exists.

CLI

Install the command-line tool:

go install github.com/lukeheuer/robots

Convert a remote robots.txt file:

robotsjson -u https://www.robotsjson.org

Convert a local robots.txt file to robots.json:

cat robots.txt | robotsjson -t > robots.json

Parsing & Converting

docs

There are a few ways to parse robots.txt files and convert them to robots.json: the public robotsjson.org API, a self-hosted API, the command-line tool, and the robots.json/convert Go package.

Public API

Remote conversion

GET

https://api.robotsjson.org/convert?url=robotsjson.org

POST

curl -d url=https://www.robotsjson.org https://api.robotsjson.org/convert

Plain-text conversion

curl -d text="User-agent: *
Disallow:" https://api.robotsjson.org/convert

Self-hosted API

Run the server from this repository's api package and use the commands above with the hostname changed to your own.

robots.json/convert Go Package

See the godocs.
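
The package's exact API is documented in the godocs. As an illustration only, the sketch below assumes an import path of github.com/lukeheuer/robots/convert and a hypothetical FromTXT function; check the godocs for the real names and signatures:

package main

import (
	"fmt"
	"os"

	"github.com/lukeheuer/robots/convert" // assumed import path
)

func main() {
	txt, err := os.ReadFile("robots.txt")
	if err != nil {
		panic(err)
	}

	// Hypothetical call: convert robots.txt text into a robots.json
	// document. See the godocs for the package's actual API.
	doc, err := convert.FromTXT(txt)
	if err != nil {
		panic(err)
	}
	fmt.Println(doc)
}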

An example robots.json

This example blocks BadBot from everything, gives GoodBot full access except for one sensitive area, and links a sitemap index file.

{
	"$schema": "https://www.robotsjson.org/schemas/robots-v0.1.0-beta.0.schema.json",
	"rules": [
		{
			"user_agent": [
				"BadBot"
			],
			"permissions": [
				{
					"path": "/",
					"allow": false
				}
			]
		},
		{
			"user_agent": [
				"GoodBot"
			],
			"permissions": [
				{
					"path": "/",
					"allow": true
				},
				{
					"path": "/users/*/private",
					"allow": false
				}
			]
		}
	],
	"sitemaps": [
		{
			"url": "https://www.robotsjson.org/sitemaps.json",
			"index": true
		}
	]
}

examples/robots.json

Schema

docs

{
	"$schema": "https://www.robotsjson.org/schemas/robots-v0.1.0-beta.0.schema.json",
	"host": "string",
	"rules": [
		{
			"user_agent": [
				"string"
			],
			"permissions": [
				{
					"path": "string",
					"allow": true
				}
			],
			"crawl_delay": 1
		}
	],
	"sitemaps": [
		{
			"url": "string",
			"index": true
		}
	]
}

robots.schema.json
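
For Go consumers, the schema maps naturally onto a small set of struct types. The sketch below is illustrative rather than the package's own types: the JSON field names come from the schema above, while the Go type names and field types are assumptions:

package robots

// Robots mirrors the top-level robots.json document in the schema above.
type Robots struct {
	Schema   string    `json:"$schema"`
	Host     string    `json:"host"`
	Rules    []Rule    `json:"rules"`
	Sitemaps []Sitemap `json:"sitemaps"`
}

// Rule ties one or more user agents to a set of permissions.
type Rule struct {
	UserAgent   []string     `json:"user_agent"`
	Permissions []Permission `json:"permissions"`
	CrawlDelay  int          `json:"crawl_delay"`
}

// Permission allows or disallows a single path.
type Permission struct {
	Path  string `json:"path"`
	Allow bool   `json:"allow"`
}

// Sitemap links a sitemap file; Index marks a sitemap index file.
type Sitemap struct {
	URL   string `json:"url"`
	Index bool   `json:"index"`
}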