view.matrix.org needs a robots.txt that excludes room contents #910

Closed
ell1e opened this issue Dec 23, 2020 · 5 comments

Comments


ell1e commented Dec 23, 2020

retracted

Member

t3chguy commented Dec 23, 2020

The intent of view.matrix.org is search engine indexability. That's why it uses zero JavaScript.

Member

t3chguy commented Dec 23, 2020

By specifying History=Anyone you are making it so that any unauthenticated user can use the Matrix API to retrieve the data. view.matrix.org does nothing that an API user cannot do with just an automated guest access token, which can be obtained with no validation by registering with kind=guest.

> and just to clarify further, I'm aware just about anyone could write a robot to join all chatrooms and extract al

They don't even need to join.

You can try it yourself using curl and jq:

curl "https://matrix.org/_matrix/client/r0/rooms/"'!OGEhHVWSdvArJzumhm:matrix.org'"/initialSync?access_token=$(curl "https://matrix.org/_matrix/client/r0/register?kind=guest" -s -XPOST -d "{}" | jq -r .access_token)" -s

Replace the Room ID (in single quotes) with any other room with History=Anyone.
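
The one-liner above packs two requests into one command. As a rough sketch, it can be unrolled into the steps below. The function and variable names are illustrative and not part of any tool; only the two endpoints are taken from the original command, and `curl` and `jq` are assumed to be installed.

```shell
#!/usr/bin/env bash
# Unrolled version of the one-liner above. Names are hypothetical; only the
# two endpoints come from the original command.

HOMESERVER="https://matrix.org"

# URL of the guest registration endpoint.
guest_register_url() {
  printf '%s/_matrix/client/r0/register?kind=guest' "$HOMESERVER"
}

# URL that returns a room's state and recent history.
# $1 = room ID, $2 = access token.
initial_sync_url() {
  printf '%s/_matrix/client/r0/rooms/%s/initialSync?access_token=%s' \
    "$HOMESERVER" "$1" "$2"
}

# Step 1: register a throwaway guest and extract its access token.
# Step 2: read the room with that token, without ever joining it.
fetch_world_readable_room() {
  local token
  token="$(curl -s -XPOST -d '{}' "$(guest_register_url)" | jq -r .access_token)"
  curl -s "$(initial_sync_url "$1" "$token")"
}
```

Calling `fetch_world_readable_room '!OGEhHVWSdvArJzumhm:matrix.org'` should behave like the original command; nothing runs until the function is called.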

Stopping view.matrix.org from doing it just papers over the underlying possibility, at the cost of real discoverability for the network.

> I don't think at least in Europe you can legally just publicly index everything people say in a chatroom without informing them with a giant warning and possibly opt-in.

I don't believe there is a legal distinction between a public chatroom and a public forum; public forums are almost always indexed by Google, even in the EU.

> since there is no protocol protection against that

How can any protocol protect against a real user joining and then handing the reins over to a script, or heck, even that user themselves being paid to copy and paste history into an index elsewhere?

Member

t3chguy commented Dec 23, 2020

> Again, technical possibilities are not the same thing as legally and morally unproblematic.

Sure, but wouldn't making it so that only nefarious users can access these indexes, rather than everyone for real discoverability, just give a fake sense of confidence/security?

> but I think what the user expects to be done with the data is often important for privacy and data protection laws.

Right, but matrix-static is not a data controller; it doesn't store any data whatsoever. Merely translating data that is already available as JSON into HTML won't break any such data protection laws.

Member

t3chguy commented Dec 23, 2020

> Why is that sense fake? There is some true, actual, higher sense of confidentiality if it is not publicly indexed.

Because there will be other indexes going around, just not on Google; maybe on the dark net or wherever.

If you're simply objecting to matrix.org being both a Data Controller and the host of this helpful tool for discovering the network's rooms (by their content), then you should know that there are many other matrix-static instances that let engines like Google index different subsets of public rooms.

If view.matrix.org's robots.txt changes, I bet someone else will host one (given how few resources it takes) without that robots.txt change and completely nullify this whole conversation.
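
For context, the change requested in the issue title would amount to something like the following robots.txt. This is only a sketch: it assumes room pages are served under `/room/`, as in the view.matrix.org URLs quoted in this thread, and it only asks well-behaved crawlers to stay away. It does not restrict access through the API.

```text
# Hypothetical robots.txt for a matrix-static instance.
# Assumes room contents live under /room/ (an assumption based on URLs in this thread).
User-agent: *
Disallow: /room/
```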

Member

t3chguy commented Dec 23, 2020

> but also the technical data provider.

Matrix.org is just 1 data provider in the Matrix federation. Given that rooms are not owned by any one server, one could ask any of the many servers (1158 for Matrix HQ: https://view.matrix.org/room/!OGEhHVWSdvArJzumhm:matrix.org/servers ) for that same data and thus host a matrix-static against it.

> I don't see that as a necessity

Sure, I don't see it as a necessity, but if view.matrix.org disables it then all that will happen is my personal test instance will rank higher in search results and fill the gap.

> actively encourage or discourage it as a home server admin

The issue is that, as a homeserver admin, it is not your say at all, because you don't own the rooms: they are decentralized units shared between the many servers that have at least one user in the room. As a room admin it is your say, and it is even under your control: you can change the history visibility so that it excludes anyone who has not yet joined the room, which also shuts out tools like matrix-static.
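
A minimal sketch of that room-admin change, assuming the same `r0` API prefix as the curl command earlier in the thread. The function names are hypothetical. The token must belong to a user with enough power in the room to send the `m.room.history_visibility` state event. The setting this thread calls History=Anyone appears to correspond to the value `world_readable`; `shared`, `invited` and `joined` are the other values defined by the Matrix spec.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of any existing tool.

HOMESERVER="https://matrix.org"

# URL of the room's history-visibility state event. $1 = room ID.
history_visibility_url() {
  printf '%s/_matrix/client/r0/rooms/%s/state/m.room.history_visibility' \
    "$HOMESERVER" "$1"
}

# JSON body for the state event.
# $1 = one of: world_readable, shared, invited, joined.
history_visibility_body() {
  printf '{"history_visibility":"%s"}' "$1"
}

# Send the state event. $1 = room ID, $2 = visibility, $3 = access token.
set_history_visibility() {
  curl -s -XPUT \
    -H "Authorization: Bearer $3" \
    -H "Content-Type: application/json" \
    -d "$(history_visibility_body "$2")" \
    "$(history_visibility_url "$1")"
}
```

For example, `set_history_visibility '!yourroom:example.org' shared "$ACCESS_TOKEN"` would request members-only history, where the room ID and `ACCESS_TOKEN` are placeholders.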

@ell1e ell1e closed this as completed Sep 6, 2021