rumca-js/Internet-Places-Database

Overview

This is a database of Internet places: mostly domains, sometimes other things. Think of it as a meta database of the Internet. This repository contains link metadata: title, description, publish date, etc.

Acceptable link types

Not acceptable link types

  • content farms
  • malware sites
  • porn, casino, etc.
  • IT infrastructure domains, CDN domains
  • analytics domains that are used for user surveillance rather than to provide data

I do not always follow these rules strictly. There are exceptions for some domains: some sites are allowed in the database but are down-voted.

Sources of data

The data is obtained by the Django-link-archive web crawler.

Sources:

Files

The database is distributed as a set of JSON files. We do not want to store binary data or binary files in the repository. SQL files would be fine, but I am going with JSON files for now.

Each link contains a set of attributes, such as:

  • title
  • description
  • page rating
  • date of creation
  • date last seen
  • etc.

Content ranking is established by the Django-link-archive project.
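
The JSON layout described above can be read with a few lines of Python. This is only a minimal sketch: it assumes each file holds a JSON list of entry objects, and the key name `page_rating` is a hypothetical stand-in for the page rating attribute. Check the actual files for the real key names and structure before relying on it.

```python
import json
from pathlib import Path


def load_entries(directory):
    """Collect entries from every *.json file found under `directory`."""
    entries = []
    for path in sorted(Path(directory).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: each file holds a JSON list of entry objects.
        # Files with any other top-level shape are skipped.
        if isinstance(data, list):
            entries.extend(item for item in data if isinstance(item, dict))
    return entries


def top_rated(entries, min_rating=0):
    """Return entries rated at least `min_rating`, highest rating first.

    "page_rating" is a hypothetical key name; a missing or null rating
    is treated as 0.
    """
    def rating(entry):
        return entry.get("page_rating") or 0

    rated = [entry for entry in entries if rating(entry) >= min_rating]
    return sorted(rated, key=rating, reverse=True)
```

For example, `top_rated(load_entries("."), min_rating=50)` would list the higher-rated entries found under the current directory, assuming the files match the layout above.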

Tags

Each entry can be tagged. The most notable tags are:

  • open source - if the entry is "open source" related
  • personal - if it seems to be a personal website
  • self-host - software that can be self-hosted
  • company - if the entry exists just to provide information about a company
  • university, museum, etc. - if the entry provides details about a university, museum, etc.
  • disinformation / misinformation - self-explanatory
  • news - if it is a "news" content farm. Might also be "game news", "tech news", etc.
  • amiga - anything Amiga related
  • wtf - for really interesting finds
  • link service - Bitly or other services that provide shortened versions of links
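
Tags like the ones listed above make it easy to slice the data. The sketch below assumes each entry is a dictionary with a hypothetical `tags` key holding a list of strings. The sample entries are invented for illustration and are not taken from the database.

```python
from collections import Counter


def entries_with_tag(entries, tag):
    """Return entries carrying `tag`, compared case-insensitively."""
    wanted = tag.casefold()
    return [
        entry
        for entry in entries
        if wanted in (t.casefold() for t in entry.get("tags") or [])
    ]


def tag_counts(entries):
    """Count how often each tag occurs across all entries."""
    return Counter(t for entry in entries for t in entry.get("tags") or [])


# Invented sample data; the "tags" key name is an assumption.
sample = [
    {"title": "Example blog", "tags": ["personal", "open source"]},
    {"title": "Example vendor", "tags": ["company"]},
    {"title": "Example wiki", "tags": ["amiga", "personal"]},
    {"title": "Untagged page"},
]

personal = entries_with_tag(sample, "Personal")
counts = tag_counts(sample)
```

Here `personal` holds the two entries tagged "personal", and `counts.most_common()` gives the tags ordered by frequency.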

Notes

  • Not all domains have to be stored here. I think it would be best to keep only valuable domains. We certainly do not want content farms, and we do not need sites that contribute nothing useful to society or to the reader
  • The distinction is not clear-cut, but more lenient rules apply to personal sites
  • I am not that interested in marking Substack or Medium sites as "personal", as I do not feel they should be tagged as such

Demo database

The demo might not be working, as it is used for development: https://renegat0x0.ddns.net/apps/places/

Meme
