view.matrix.org needs a robots.txt that excludes room contents #910
Comments
The intent of view.matrix.org is search engine indexability. That's why it uses no JavaScript.
By specifying
They don't even need to join. You can try it yourself using curl and jq:
Replace the room ID (in single quotes) with that of any other room with History=Anyone. Stopping view.matrix.org from doing this just papers over what the protocol already allows, at the cost of real discoverability for the network.
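A minimal Python sketch of the kind of request described above, offered as an illustration rather than the exact command from the comment. It assumes the homeserver allows guest registration and accepts a `/messages` request without a `from` token. The helper names `messages_url`, `message_bodies`, and `fetch_recent_bodies` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def messages_url(homeserver: str, room_id: str, limit: int = 10) -> str:
    """Build the client-server API URL for the most recent events in a room."""
    quoted_room = urllib.parse.quote(room_id, safe="")  # '!' and ':' must be escaped
    query = urllib.parse.urlencode({"dir": "b", "limit": limit})
    return f"{homeserver}/_matrix/client/r0/rooms/{quoted_room}/messages?{query}"


def message_bodies(chunk: list) -> list:
    """Keep only the text bodies of m.room.message events."""
    return [
        event["content"]["body"]
        for event in chunk
        if event.get("type") == "m.room.message" and "body" in event.get("content", {})
    ]


def fetch_recent_bodies(homeserver: str, room_id: str, limit: int = 10) -> list:
    """Register a guest (assumption: the server permits it), then read history without joining."""
    register = urllib.request.Request(
        f"{homeserver}/_matrix/client/r0/register?kind=guest",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(register) as resp:
        token = json.load(resp)["access_token"]

    history = urllib.request.Request(
        messages_url(homeserver, room_id, limit),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(history) as resp:
        return message_bodies(json.load(resp)["chunk"])
```

Calling `fetch_recent_bodies("https://matrix.org", "!OGEhHVWSdvArJzumhm:matrix.org")` would only return anything if the room's history is readable by anyone and the server accepts guests.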
I don't believe there is a legal distinction between a public chatroom and a public forum; public forums are almost always indexed by Google, even in the EU.
How can any protocol protect against a real user joining and then handing the reins over to a script, or heck, even that user being paid to copy and paste history into an index elsewhere?
Sure, but wouldn't making it so that only nefarious users can access these indexes, rather than everyone for real discoverability, just give a false sense of security?
Right, but matrix-static is not a data controller; it doesn't even store any data whatsoever. So by translating data that is already available as JSON into HTML, it won't itself be breaking any such data protection laws.
Because there will be other indexes going around, just not on Google; maybe on the dark net or wherever. If you're simply objecting to matrix.org being both a Data Controller and the host of this helpful tool for discovering the network's rooms (by their content), then you should know that there are many other matrix-static instances that expose different subsets of public rooms to search engines like Google for indexing. If view.matrix.org's robots.txt changes, I bet someone else will host an instance (given how few resources it takes) without that robots.txt change and completely nullify this whole conversation.
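For reference, the robots.txt change under discussion can be sketched and checked with Python's standard `urllib.robotparser`. The rule below is hypothetical: it assumes room pages live under a `/room/` path prefix, as in the view.matrix.org URLs quoted in this thread, and it is not the actual file served by any instance.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block crawlers from room contents, leave the rest indexable.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /room/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(HYPOTHETICAL_ROBOTS_TXT)
room_page = "https://view.matrix.org/room/!OGEhHVWSdvArJzumhm:matrix.org/servers"
front_page = "https://view.matrix.org/"

room_allowed = parser.can_fetch("Googlebot", room_page)    # room contents excluded
front_allowed = parser.can_fetch("Googlebot", front_page)  # landing page still crawlable
```

A rule like this only binds well-behaved crawlers on the instance that serves it; any other matrix-static instance without it stays indexable, which is the point made above.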
Matrix.org is just one data provider in the Matrix federation. Given that rooms are not owned by any one server, one could ask any of the many servers in a room (1158 for Matrix HQ: https://view.matrix.org/room/!OGEhHVWSdvArJzumhm:matrix.org/servers) for that same data and host a matrix-static instance against it.
Sure, I don't see it as a necessity, but if view.matrix.org disables it then all that will happen is my personal test instance will rank higher in search results and fill the gap.
The issue is that, as a homeserver admin, it is not your say at all: you don't own the rooms. They are decentralized units shared between all the servers that have at least one user in the room. As a room admin it is your say, and it is even under your control: you can change the history visibility so that it excludes people who have not yet joined the room, which will also exclude things like matrix-static.
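A sketch of what that room-admin change looks like at the API level: history visibility is an `m.room.history_visibility` state event, and setting it to anything other than `world_readable` stops unjoined readers from seeing history. The helper name `history_visibility_request` is hypothetical, the access token is a placeholder, and the sender is assumed to have enough power level in the room to send state events.

```python
import json
import urllib.parse
import urllib.request

# Values defined for m.room.history_visibility in the Matrix specification.
VALID_VISIBILITIES = ("world_readable", "shared", "invited", "joined")


def history_visibility_request(
    homeserver: str, room_id: str, access_token: str, visibility: str = "shared"
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that sets a room's history visibility."""
    if visibility not in VALID_VISIBILITIES:
        raise ValueError(f"unknown history visibility: {visibility!r}")
    quoted_room = urllib.parse.quote(room_id, safe="")
    url = (
        f"{homeserver}/_matrix/client/r0/rooms/{quoted_room}"
        "/state/m.room.history_visibility"
    )
    body = json.dumps({"history_visibility": visibility}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would apply the change. The same setting is normally exposed in a client's room settings, so this is only meant to show what happens underneath.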
retracted