Request For Comments (RFCs)
Proposals
001: Use Org Mode
- nexp
rfc:salotz.001_use-org.org
Proposal: ./rfcs/salotz.001_use-org.org
Abstract:
Use org-mode LOL.
002: Semantic Changelog
- nexp
rfc:salotz.002_semantic-changelog.org
Proposal: ./rfcs/salotz.002_semantic-changelog.org
Abstract:
A mini-format for git commit messages that supports machine readability (without sacrificing human readability) semantic expression of impacts of code change.
003: RFC Specifications
- nexp
rfc:salotz.003_rfc-specs
Proposal: ./rfcs/salotz.003_rfc-specs.org
Abstract:
Specifications for necessary information for an RFC and how to format it.
004: Name Expressions (nexps)
- nexp
rfc:salotz.004_nexps
Proposal: ./salotz.004_nexps.org
Abstract:
A proposal that defines how to name digital document entities: e.g.
format:namespace.field-1_field-2
005: Syxel Buffers Format
- nexp
rfc:salotz.005_syxel-buffers
Proposal: ./rfcs/salotz.005_syxel-buffers.org
Abstract:
Format that utilizes a granular unit of ‘syxel’ (symbolic pixel) in a rectangular plane similar to a pixel based image format. Syxels are basically character points.
006: Codetags
- nexp
rfc:salotz.006_codetags
Proposal: ./rfcs/salotz.006_codetags.org
Abstract:
Tags that are added in comments to code that add semantic meaning to otherwise freeform comments, making them searchable by machine and available to tooling.
007: Python Library Repo Layout
- nexp
rfc:salotz.007_py-lib-layout
Proposal: ./rfcs/salotz.007_py-lib-layout.org
Executive Summary:
A layout and content specification that follows best practices that strives for effectiveness rather than complete compatibility.
008: Linux Shell Startup Indirection Layer
- nexp
rfc:salotz.008_linux-shell-startup
Proposal: ./rfcs/salotz.008_linux-shell-startup.org
Executive Summary:
A collection of script files which add a layer of indirection to standard linux shells (i.e. POSIX
sh
andbash
) that allows for a more modular, composable, and semantic organization.
009: Repo Layout Templates
- nexp
rfc:salotz.009_repo-layout
Proposal: ./rfcs/salotz.009_repo-layout.org
Executive Summary:
A template format for generating and specifying repo layouts in directories, i.e. cookiecutter.
010: Growth Mindset Versioning
- nexp
rfc:010_growth-versioning
Proposal: ./rfcs/010_growth-versioning.org
Executive Summary:
Version formatting that doesn’t lie.
011: Writetags
- nexp
rfc:011_write-tags
Proposal: ./rfcs/011_write-tags.org
Executive Summary:
Tags (see rfc:salotz.006_codetags for a similar proposal) that are added in comments to prose (documentation, whitepapers, technical documents, etc.) that add semantic meaning to otherwise freeform comments, making them searchable by machine and available to tooling.
012: Errors as Information
- nexp
rfc:012_information-errors
Proposal: ./rfcs/012_information-errors.org
Executive Summary:
Errors in information systems, like event logs, should take on a role of providing information rather than being categorized based on their operational importance as is the case in most languages (e.g. checked vs. unchecked exceptions). Inspired by Stuart Holloway and specifically this talk: https://youtu.be/oOON–g1PyU?t=965
013: Code Targets
- nexp
rfc:013_code-targets
Proposal: ./rfcs/013_code-targets.org
Executive Summary:
Code targets are a specification for a syntax which should be recognized by metaprogramming tools like template engines, editors, etc.
014: Project & Maintainence Intentions for OSS
- nexp
rfc:salotz.014_project-intent
Proposal: ./rfcs/salotz.014_project-intent.org
Executive Summary:
A proposal for a best practice that includes a clear statement of intent on open source projects that would allow for greater understanding by consumers about the current state and intentions of developers in fairly unambiguous terms. Extensions could provide optional semantic vocabularies for very clear intent and to provide a set of choices for developers so they don’t have to think as much from scratch.
015: Collaborative Line-Oriented Plaintext Document Editing Protocol
- nexp
rfc:salotz.015_collab-line-editing
Proposal: ./rfcs/salotz.015_collab-line-editing.org
Executive Summary:
A protocol and format to facilitate low-friction editing of line-oriented document formats like LaTeX, Markdown, Org-mode, etc.
016: Nearly Trivial Plaintext Formats
- nexp
rfc:salotz.016_trivial-plaintext-formats
Proposal: ./rfcs/salotz.016_trivial-plaintext-formats.org
Executive Summary:
A small collection of nearly trivial plaintext formats along with file extensions. Includes things like a line based list format.
017: Bunker: User De-Militarized Zone
- nexp
rfc:salotz.017_bunker
Proposal: ./rfcs/salotz.017_bunker.org
Introduces the concept of a
bunker
directory for user only data in their respective $HOME directories which is simply.${USER}
or.${USER}.d
. This is to allow a safe-space for customization and configuration that will be unmolested by other programs since it is unlikely that they will have this hardcoded in their behavior.
018: Secrets Database
- nexp
rfc:salotz.018_secret-database
Proposal: ./rfcs/salotz.018_secret-database.org
A specification for a secrets database using GPG and mostly compatible with tools like
pass
.
019: Meta-Project Protocol (MPP)
- nexp
rfc:salotz.019_MPP
Proposal: ./rfcs/salotz.019_MPP.org
Protocol for live-updating of file-based development projects.
020: In-Repo Issue Tracking Schema
- nexp
rfc:salotz.020_repo-issue-tracker
Proposal: ./rfcs/salotz.020_repo-issue-tracker.org
Executive Summary:
Schema for including issue tracking sources with a project, without having to rely on outside forges.
021: Org-mode Next-Gen
- nexp
rfc:salotz.021-org-mode-ng
Proposal: ./rfcs/salotz.021_org-mode-ng.org
Executive Summary:
Proposals for the next generation of org-mode involving simplifying, modularizing, and making it more extensible and amenable to a wider non-emacs ecosystem.
022: template inheritance
- nexp
rfc:salotz.022_template-inheritance
Proposal: ./rfcs/salotz.022_template-inheritance.org
Executive Summary:
Taking inspiration from gitignore.io and their template structure. Specifies a way to layout a repository of templates that allows for: stacks, inheritance, and ordering.
023: Editor RFIs
- nexp
rfc:salotz.023_editor-rfis
Proposal: ./rfcs/salotz.023_editor-rfis.org
Executive Summary:
Publish a series of RFI standards for editors that describe the behavior of specific features.
024: Next-Gen Vector Graphics Formats
- nexp
rfc:nextgen-vector-graphics-formats
Proposal: ./rfcs/nextgen-vector-graphics-formats.org
Executive Summary:
Specifications for a suite of vector graphics formats, namely providing both plain-text and efficient binary representations suited for different kinds of applications.
Request For Implementations (RFIs)
In addition to merely specifying best practices, protocols, and standards via RFCs it has come to my attention that there is a need for a listing of concrete proposals for actual working software that perform critical functionality.
The purpose here is not to “shout at clouds” but to attempt to actually evaluate and validate the need for the introduction of new software.
In addition to describing the purpose and semantics of the propsed software in-depth analyses of prior art should be done as well with feature matrices showing gaps in the current functionality.
It is unclear about how exactly to incorporate the idea of the ease of integration (or perhaps “civility”) of a piece of software but attempts should be made to take this into account. E.g. is it open-source and licensed in such a way that it could be used in perpetuity.
Name is inspired by the Scheme Request For Implementations (SRFIs).
Version Control for mixed collections of digital assets
Description
Categories of reflection capable for data objects:
- “black” objects
- opaque binary blobs
- “grey” objects
- e.g. PostScript files, XML, image formats
- specialized and customizable diffing functionality
- “white” objects
- those with well-known (semi-universal) diffing strategies and merge techniques
Ideas:
Beyond decentralized vs centralized
There should be a middle-ground between Decentralized and Centralized VCS which is more similar to federated systems.
The idea is that authority over the “master copy” (centralized vs. decentralized) is controlled (by some means) and can be delegated to a collection of hub servers.
A purely decentralized system (like git) is fundamentally unable to handle black objects because the source of these can only be controlled either through shear replacement (called arbitrary replacement in which no merge strategy is possible and decisions are completely arbitrary and determined by authority) or through locking (change control again completely arbitrary if we are to avoid “wars” between users).
While both arbitrary replacement (arbitration) and change control (delegation) are viable solutions in general for dealing with black objects, they can’t be implemented with git without an extension to the protocol. These protocol extensions are typically performed by the hosting service which controls the “master copy” (like github or gitlab, called forges).
This leads to lock-in and/or a lack of ability to cooperate with others and essentially a centralized, non-distributed paradigm that allows for offline work encoded as a temporary “fork” of authority, despite the “decentralized” moniker.
While any new VCS system should surely support offline-work it should be reified in the protocol, rather than being implemented as a temporary fork.
Furthermore, the usage of the two strategies for blackish objects (arbitration and delegation) should also be reified and be able to be composed.
Here is an example:
Lets say you are an organization that is working on a large product and you have more than one team.
You don’t want the progress of one team to hold up the other one and you would like them to work completely in parallel.
However, you don’t want them to make mutually incompatible changes to objects for which they really should be cooperating with other teams on (particularly to blackish objects which can’t be intelligently merged).
So you split up the entire repo image (the “monorepo” perhaps) into several regions:
- team A’s region
- region A
- team B’s region
- region B
- shared resources region
- region S
- innaccessible region (for other teams or administrators only etc.)
- region X
This allows for work done region’s A and B to happen quickly and according to the self-organization of the respective teams.
This forbids access to region X which should either be irrelevant parts of the monorepo or sensitive reflective configuration data that should only be controlled by the reigning authority (such as the access control itself).
Region S however both teams can use and merges therein should be performed by an authority.
However, within this shared region we still want to implement locks for blackish objects or implement replacements.
These should then be determined by the overarching authority (for consensus) and as such must be centralized. Remember consensus means slow.
Because we are able to isolate only the few contentious resources both teams must access regions A and B need not gain consensus between them.
So in effect we have a hybrid decentralization and distribution strategy that is allowed by a hierarchy in which authority is delegated to allow for parallel work to be done, while still maintaining consensus.
Specifying Networks
The main idea behind my refugue project is that you can manage your own personal network (PN) while using something like the internet or sneakernet to actually perform the transfers etc. Kind of like urbit.
The point being that at a semantic level the personal network is what matters to an individual or enterprise and not really the internet or what ICANN says about domain names.
This is something I always struggled with because it felt that control of my own network was out of my hands. In reality ICANN and the internet simply enable the underlying transport to take place conveniently and to build my network on top of.
This network really isn’t that complicated and simply relies on some mapping of unique pet names to a collection of addresses which you may find that peer on. Which can be:
- IP addresses
- domain names
- zeronet addresses
- file paths to mounted volumes
etc.
The same is useful for a distributed/decentralized version control system. And is useful for allowing polymorphism in the data that resides on each peer.
For example I want to check out just a portion of the enterprise monorepo because I either don’t need all of it, can’t hold all of it, or am not allowed to hold all of it.
Specifying a network from a single holder of authority is a way to achieve this.
How consensus on this single point of authority is decided can be customized but can be something like:
- zookeeper or raft
- blockchain
- held in a person i.e. CEO
Prior Art
- boar
- git-lfs
- git-annex
- subversion
In-Repo Issue Tracking
Description
Prior Art
git-issue
https://github.com/dspinellis/git-issue
Pros:
- import/export to common forges like github and gitlab
- implemented in shell scripts
Cons:
- fixed schema, oriented around small files of undescribed schemas, non-extendable
- only works with git
- not editable directly via editor easily:
- metadata and description spread across many files for each issue
- file/dir names based on hashes rather than anything meaningful
deft
https://github.com/npryce/deft
Pros:
- VCS agnostic
- Text-editor friendly
- issues live in a renamable dir
- only 2 files: description + metadata
- files named after file
Cons:
- Python 2
- no forge integration
- must manually name issues
git-dit
https://github.com/neithernut/git-dit
Requirements
After reviewing the existing options I like the UX of deft
the best
but it suffers from some implementation issues mainly being
implemented in Python 2.
I want my issues to be able to be edited by a text editor as not onyl a secondary means, but as a first class thing.
This doesn’t preclude command line tools for this purpose but using only a text editor should be ergonomic.
Ideally anyhow it would get integrated to emacs or VSCode etc. which is how these kinds of tools work better anyhow.
Single file issues
Each issue should be only a single file.
It is too annoying to switch between different files to tweak and twiddle metadata.
In the limit individual issues could be of different actual formats that allow for varying degrees of semantic data. (similar to git commits).
Issues can be in SOML (my minor TOML variant for writing lots of text) where you can add tags, flags, and metadata as you please alongside long descriptions in different markup languages.
They can also be in other formats as you wish and should be a decision each project should make.
That is the schema/protocol for the directory layout etc. should be orthogonal to the content of the actual issues.
This can only really be achieved with single file issues.
However, there should be affordances for attaching resources (like images etc.) into sidecar directories. But these really should contain any issue writer writings.
Other options for issue formats could be:
- skribilio
- org-mode
- markdown
- plaintext
- XML, HTML
- JSON
Here is an example with SOML:
First line of the issue
[meta]
status = 'open'
assigned = '@salotz'
tags = [
'feature',
'bug',
'critical',
]
[description]
Here is the longer description of the issue describing what to do
about it.
It should allow introspection with links and such.
Here I am linking the path to a piece of code I want to reference:
[[proj:/src/package/__init__.py?line=30;col=12][code here]]
[discussion]
[[comments]]
contributor = '@salotz'
message = [
I think this is a good issue, I can even do this myself.
]
[[comments]]
contributor = '@wumpus'
message = [
I disagree, I don't think its worth our time.
]
Automatic issue naming
Its not always easy to manually name issues. That is why things like github and gitlab use auto-naming things where they use numbers to refer to them.
Tools should support this so that people don’t have to name their issues.
However, to keep things friendly the file names should support adding extra “fields” to add short descriptions.
Examples:
An uncommented issue should just be a number. Issues should increase monotonically and start at 0 and should include 0 padding for sorting of crummy legacy software. All tools should sort numerically.
The zeroth issue:
000000.issue.schema
The next issue:
000001.issue.schema
The issue after with a name issue-key
also:
000002_issue-key.issue.schema
Alternate 3rd Party Standard Library for Python
I was looking for at one point an alternative to the standard library.
Here is the example:
the shutil
module is only good in 3.8+ because of some basic options
in the copytree
function, and so it is pretty unreliable and
confusing when you are trying to build standard tooling that should
use that.
Typically in tooling I would write:
cx.run("cp -r tree ~/scratch/tree")
But this is POSIX only, and using the python standard library should be cross platform, right?
So I change to using shutil
:
import shutil as sh
sh.copytree('tree', "/home/salotz/scratch/tree")
This will fail if “/home/salotz/scratch/tree” already exists. And is
annoying, so you add the exist_ok
option but this only is in python
3.8….
A lot of my tooling uses 3.6 or 3.7 because dependencies in other environments. This is super annoying.
Also there is no features in py 3.8 that make shutil able to gain this super power. Its just an if statement.
For something like this I wish there was just another library I could import to get this behavior on all relevant python versions.
import altstd.shutil as sh
An Non-Standard Standard Library for Python
The problem is that there are loads and loads of utilities that are very useful for doing common patterns or used during development (i.e. during debugging & prototyping etc.), and these are scattered across dozens and dozens of little tiny packages across the internet.
This leads to a lot of issues that are well-known. For example you run into the famous “left pad” problem. Which is where one of these tiny projects becomes unavailable and you lose that dependency and break all your code.
Other problems are that when you get into real projects there is a
certain friction associated not only with making sure that these
little tools are specified in the env files (i.e. requirements.in
etc.) and that every variation of these files gets updated as well.
On top of that you simply have more dependencies that you have to manage and worry about breaking your code when they change something.
My idea is to go around the internet and collect all of these little utilities into a single repository that is tracked as a single dependency.
I would call this the “Non-Standard Standard Library” or the ‘NSSL’.
(Name liable to change)
Some candidates that motivated this:
Diff Collection
A collection of custom diffs that can be plugged into a variety of things like VCS.
Add support for things that aren’t amenable to standard GNU diff like things.
gitignore generator CLI w/ local repo support
The gitignore.io collection is a great resource with a terrible website web API, local server mess used to implement it.
Something much simpler should be devised that simply works with either local repos or from a git repo.
You should also be able to combine different repos together.
For instance I like to use community maintained gitignores for a lot of stuff because then I don’t have to worry about stuff. But I typically end up home-growing my own sets of ignores that I use across many projects and I want to be able to compose them all.
For instance you can make a single gitignore using the query system they have in some command line tools as well:
giig python,emacs
Will give you both python and emacs.
One: I want to be able to specify which repo to use for this.
giig --url git+https://github.com/salotz/gitignore-salotz.git python,emacs
For lets say my personal one. But I also want to be able to compose them. (Do we need this to support that or can I just use piping?)
Perhaps above this would implicit compose that and the default community one, to only get mine I might use:
giig --no-default \
--url git+https://github.com/salotz/gitignore-salotz.git python,emacs
This is necessary for if there is conflicting names. I.e. both have
Python.gitignore
in them.
To allow for composition you can use in the secondary repos the name
Python.patch
or specify some stacks Salotz-Python.Python.stack
which is a symlink to Python.stack
and infers that we also want to
use the Salotz-Python.gitignore
template file.
And for instance combining the default repo with a local one:
giig --url file://~/my-ignores python,emacs
Source Code Tree Database & Query Language
I have come across a number of tools which read your source code and then generate some set of quality metrics or other representations of your code that allows you to do high-level planning and understanding about your code base.
(I am primarily interested in Python although this niche is necessary for really any code base.)
All of these tools have to reinvent the same wheel which is generating an extended “AST” which is over the files in your project.
The touchstone for this is my pymatuning which is a good prototype of this, albeit surely incomplete in many ways.
So the proposals for software would then be:
- reliable general purpose tree data structure in python that allows for standard querying and outputs to various common formats like JSON etc.
- a standard Python module/package representation using this tree data format.
GNU Coreutils numfmt clone
Should implement this in pure-python for maximum portability.
This tool converts from bytes to higher units correctly and with optional formatting help.
Local Prometheus-style monitoring & logging
After having used Prometheus for some projects for monitoring programs, I find the model usable and a niche I haven’t found in any existing profiling software.
Often profilers (at least in the python world) inject code into your program and get metrics on everything that is going on.
This is useful, but in many scenarios the results are overwhelming and quite complex.
The Prometheus approach is much like a traditional logger in that you have to actually write code to make data samples. Prometheus differs in that it supports structured data (albeit a usefully small number of things like Gauges, Counters, etc.) while logging is just some strings usually and is then queried using a full-text search engine.
The full text search on logs seems, unsatisfying. Not only have you gone the extra mile to both add logging in and collect performance data (object sizes timings etc.) but you now have a separate problem of parsing them.
Having first class support for the most common monitoring use cases is something that should be supported.
Logging software like eliot
provides a vastly improved experience on
standard loggers by outputting a JSON stream of log messages. By
leveraging this you can report all kinds of data including numeric
data.
As an aside eliot
has a focus on root cause analysis and not
monitoring, and as such it has a much higher overhead in injecting the
code into your “business” code. I.e. it wouldn’t really be amenable in
all cases to “aspect oriented” approaches like prometheus gauges would
be. That said you don’t need to use these features in eliot
.
The extra things on top of eliot that a Prometheus like system would
have, is a special purpose time series database and query engine
(PromQL
).
The problems with Prometheus
are that it is meant to be “cloud
native” and as such is best deployed in a container cluster along with
a bunch of other helper services that communicate via HTTP.
While this is an approach suited to a networked environment it increases the overhead of getting up and started by a lot.
The goal of this project would be to significantly reduce this complexity and not need to have http servers running for all programs being scraped.
This could simply be done over a RESTful virtual file system like 9P on FUSE.
Other than that prometheus places a great importance on shipping as a single binary (as most golang programs do) which is useful for moving it around and easing deployment.
In our situation we could be more oriented towards plugins and shipping more like a traditional python framework.
Plugins then would essentially replace the numerous side-car services that prometheus talks to (like the alertmanager) which are just “dockered” away so that the prometheus binary remains easy to deploy.