This is a database of Internet places: mostly domains, sometimes other things. Think of it as an Internet meta database. This repository contains link metadata: title, description, publish date, etc.
Stored entries:
- domains
- repository links, for example https://github.com/rumca-js/Internet-Places-Database
- user spaces, such as a YouTube channel link (for example the Linus Tech Tips YouTube channel) or an X/Twitter user account
Unwanted entries:
- content farms
- malware sites
- porn, casino, etc.
- IT infrastructure domains, CDN domains
- analytics domains that are used for user surveillance rather than to provide data
I do not always follow these rules strictly. There are exceptions for some domains: some sites are allowed to be in the database, but are down-voted.
The links are obtained by the Django-link-archive web crawler.
Sources:
- https://nownownow.com/
- https://searchmysite.net/
- https://downloads.marginalia.nu/
- https://aboutideasnow.com/
- Hacker News front page entries
- some Reddit channels, for example r/selfhosted
The database is distributed as a set of JSON files. We do not want to store binary data or binary files. SQL files would be fine, but I am going with JSON files for now.
Each link contains a set of attributes, such as:
- title
- description
- page rating
- date of creation
- date last seen
- etc.
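As a rough illustration of how the JSON files could be consumed, the sketch below loads every `.json` file from a directory and picks the best-rated entries. The assumption that each file holds a list of objects, and the key names `link` and `page_rating`, are hypothetical; check the actual files for the real structure.

```python
import json
from pathlib import Path


def load_entries(directory):
    """Load link entries from every JSON file under `directory`.

    Assumes (hypothetically) that each file contains a JSON list of objects.
    Files with any other top-level shape are skipped.
    """
    entries = []
    for path in sorted(Path(directory).rglob("*.json")):
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, list):
            entries.extend(item for item in data if isinstance(item, dict))
    return entries


def top_rated(entries, limit=10):
    """Return the `limit` entries with the highest rating.

    `page_rating` is an assumed key name; a missing rating counts as 0.
    """
    ranked = sorted(entries, key=lambda e: e.get("page_rating") or 0, reverse=True)
    return ranked[:limit]
```

For example, `top_rated(load_entries("path/to/json/files"), limit=5)` would return up to five entries with the highest assumed rating.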
Content ranking is established by the Django-link-archive project.
Each entry can be tagged. The most notable examples of tags:
- open source - if the entry is "open source" related
- personal - if it seems to be a personal website
- self-host - software that can be self-hosted
- company - if the entry exists just to provide information about a company
- university, museum, etc. - if the entry provides details about a university, museum, etc.
- disinformation / misinformation - self-explanatory
- news - if it is a "news" content farm. Might also be "game news", "tech news", etc.
- amiga - anything Amiga related
- wtf - for really interesting finds
- link service - bitly or other services that provide shortened versions of links
Notes:
- Not all domains have to be stored here. I think it would be best to keep only valuable domains. We certainly do not want content farms, and we do not need sites that contribute nothing useful to society or to the reader.
- The distinction is not that clear-cut, but more lenient rules apply to personal sites.
- I am not that interested in marking Substack or Medium pages as "personal" sites, as I do not feel that they should be tagged as such.
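To show how the tags listed earlier might be used, here is a minimal sketch that filters entries by tag. It assumes each entry is a dictionary with a `tags` key holding a list of strings; that key name and shape are assumptions, not the confirmed export format, and the sample data is made up.

```python
def has_tag(entry, tag):
    """Return True if `entry` carries `tag` (case-insensitive comparison)."""
    tags = entry.get("tags") or []  # "tags" is an assumed key name
    return tag.lower() in (t.lower() for t in tags)


def filter_by_tag(entries, wanted, unwanted=()):
    """Keep entries that have every tag in `wanted` and none in `unwanted`."""
    return [
        e for e in entries
        if all(has_tag(e, t) for t in wanted)
        and not any(has_tag(e, t) for t in unwanted)
    ]


# Hypothetical sample data, for illustration only.
sample = [
    {"link": "https://blog.example", "tags": ["personal", "open source"]},
    {"link": "https://corp.example", "tags": ["company"]},
    {"link": "https://short.example", "tags": ["link service"]},
]

personal_sites = filter_by_tag(sample, wanted=["personal"], unwanted=["company"])
print([e["link"] for e in personal_sites])
```

Matching is case-insensitive so that "Amiga" and "amiga" are treated as the same tag; whether the real data needs this is not known.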
Development instance (might not be working): https://renegat0x0.ddns.net/apps/places/