Overview

This is a database of Internet places: mostly domains, sometimes other things. Think of it as an Internet metadata database. This repository contains link metadata: title, description, publish date, etc.

Project Logo

Acceptable link types

Not acceptable link types

  • content farms
  • malware sites
  • porn, casino, etc.
  • IT infrastructure domains, CDN domains
  • analytics domains that are used for user surveillance rather than to provide data

I do not always follow these rules strictly. There are exceptions for some domains. Some sites are allowed in the database, but are down-voted.

Sources of data

Obtained by the Django-link-archive web crawler.

Sources:

Files

The database is distributed as a set of JSON files. We do not want to store binary data or binary files. SQL files would also be fine, but I am going with JSON files for now.

Each link contains a set of attributes, like:

  • title
  • description
  • page rating
  • date of creation
  • date of last seen
  • etc.
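
A minimal sketch of reading one of the JSON files in Python. The file layout (a top-level list of objects) and the key names `link`, `title`, `description`, `page_rating`, `date_created` and `date_last_seen` are assumptions made for illustration; check an actual file in the repository for the real names. The sample entry is made up so that the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_links(path):
    """Load link entries from a JSON file that holds a list of objects."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list of link entries")
    return data


# Hypothetical sample entry; the key names are assumed, not taken from the real files.
sample = [
    {
        "link": "https://example.com",
        "title": "Example Domain",
        "description": "An example page",
        "page_rating": 10,
        "date_created": "2024-01-01T00:00:00",
        "date_last_seen": "2024-06-01T00:00:00",
    }
]

# Write the sample to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    links = load_links(sample_path)

for entry in links:
    # .get() tolerates entries where an attribute is missing
    print(entry.get("title"), entry.get("page_rating"))
```

To use it on real data, pass the path of a file from this repository to `load_links` and adjust the key names to match.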

Page rating

The page rating is established by the Django-link-archive project.

To get a good page rating, it is desirable to follow good standards:

  • Schema Validator
  • W3C Validator
  • Provide HTML meta information. See the Open Graph Protocol for more information
  • Provide a valid title, which is concise, but not too short
  • Provide a valid description, which is concise, but not too short
  • Provide a valid publication date
  • Provide a valid thumbnail or media image
  • Provide a valid HTTP status code. No fancy redirects, no JavaScript redirects
  • Provide an RSS feed, and HTML meta information for it. See https://www.petefreitag.com/blog/rss-autodiscovery/
  • Provide search engine keywords meta tags
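
The checklist above can be approximated with a small self-check. This is only a sketch built on Python's standard `html.parser`; it is not how Django-link-archive computes the rating, and the choice of fields (including `article:published_time` as the publication date) is an assumption. It reports which metadata a page is missing.

```python
from html.parser import HTMLParser

# MIME types used for RSS/Atom feed autodiscovery links
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class MetaAudit(HTMLParser):
    """Collect the title, meta tags and feed link of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}         # lowercased name/property -> content
        self.has_feed = False  # True once a feed autodiscovery link is seen
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content"):
                self.meta[key.lower()] = attrs["content"]
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            kind = (attrs.get("type") or "").lower()
            if "alternate" in rel and kind in FEED_TYPES:
                self.has_feed = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html_text):
    """Return a dict mapping each checked field to True (present) or False."""
    parser = MetaAudit()
    parser.feed(html_text)
    parser.close()
    return {
        "title": bool(parser.title.strip()),
        "description": "description" in parser.meta,
        "keywords": "keywords" in parser.meta,
        "og:title": "og:title" in parser.meta,
        "og:image": "og:image" in parser.meta,
        "article:published_time": "article:published_time" in parser.meta,
        "feed": parser.has_feed,
    }


# Made-up page used only to exercise the check
page = """<html><head>
<title>My page</title>
<meta name="description" content="A short description">
<meta property="og:title" content="My page">
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
</head><body></body></html>"""

report = audit(page)
missing = [name for name, present in report.items() if not present]
print("missing:", missing)
```

The check only looks at presence. Whether a title or description is "concise, but not too short" is left to the reader, since the source does not define thresholds.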

Your page or domain exists alongside thousands of others. Keep in mind that your metadata has an impact on how it is recognized, and on its page rating.

Remember: a good page is always ranked higher.

You may wonder why I am writing about the search engine "keywords" meta field if Google does not need it. Well, I don't like Google. If we want alternative solutions to exist, it should be possible to easily find your page with simpler search engines. Provide the keywords field if you support the open web.

Tags

Each entry can be tagged. The most notable examples of tags:

  • open source - if entry is "open source" related
  • personal - if it seems to be a personal website
  • self-host - software that can be self-hosted
  • company - if entry exists just to provide information about a company
  • university, museum, etc - if entry provides details about a university, museum, etc.
  • disinformation / misinformation - self-explanatory
  • news - if it is a "news" content farm. Might also be "game news", "tech news", etc.
  • amiga - anything Amiga-related
  • wtf - for really interesting finds
  • link service - bitly or other services that provide shortened versions of links
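
A short sketch of selecting entries by tag. It assumes each entry is a dict with an optional `tags` list of strings; that shape and the `with_tag` helper are hypothetical and are not taken from the actual files.

```python
def with_tag(entries, tag):
    """Return entries whose (assumed) "tags" list contains the tag, ignoring case."""
    wanted = tag.casefold()
    return [
        entry
        for entry in entries
        if wanted in (t.casefold() for t in entry.get("tags", []))
    ]


# Made-up entries for illustration
entries = [
    {"link": "https://example.org", "tags": ["personal", "open source"]},
    {"link": "https://example.com", "tags": ["company"]},
    {"link": "https://example.net"},  # untagged entry
]

personal = with_tag(entries, "personal")
print([entry["link"] for entry in personal])
```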

Notes

  • Not all domains have to be stored here. I think it is best to keep valuable domains. We certainly do not want content farms, and we do not need sites that contribute nothing useful to society or to the reader
  • The distinction is not clear-cut, but more lenient rules apply to personal sites
  • I am not interested in marking Substack or Medium pages as "personal" sites, as I do not feel that they should be tagged as such

Demo database

Used for development, so it might not be working: https://renegat0x0.ddns.net/apps/places/.

Meme