docs: dataflows: Improve docs #1279
By convention, when an operation has a single output, we usually name that output `result`.
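For example, a minimal sketch of an operation with a single output, assuming DFFML's `op` decorator and `Definition` class (the operation name and definitions here are illustrative):

```python
from dffml import op, Definition

SHOUTED = Definition(name="shouted", primitive="string")

@op(
    inputs={"text": Definition(name="text", primitive="string")},
    # Single output, so by convention we name it "result".
    outputs={"result": SHOUTED},
)
async def shout(text: str) -> dict:
    # Operation implementations return a dict keyed by output name.
    return {"result": text.upper()}
```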
Generic flow (data, work, program) executor
Why: unikernels
Manifest Schema

Manifests allow us to focus less on code and more on data. Our manifests can be thought of as ways to provide a config class with its data.
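As a hypothetical sketch of that idea, using the `entity` and `greeting` fields from the ADR example below (the class name and file name are illustrative):

```python
import dataclasses
import pathlib

import yaml

@dataclasses.dataclass
class GreetingConfig:
    entity: str
    greeting: str

# The manifest supplies the config's data; the code stays generic.
manifest = yaml.safe_load(pathlib.Path("manifest.yaml").read_text())
config = GreetingConfig(**manifest)
print(f"{config.greeting}, {config.entity}!")
```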
Validating

Install the jsonschema and pyyaml Python modules:

```console
$ pip install pyyaml jsonschema
```

This is how you convert from YAML to JSON:

```console
$ python -c "import sys, pathlib, json, yaml; pathlib.Path(sys.argv[-1]).write_text(json.dumps(yaml.safe_load(pathlib.Path(sys.argv[-2]).read_text()), indent=4) + '\n')" manifest.yaml manifest.json
```

The example below validates; checking the status code, we see exit code 0, which means the manifest passed validation:

```console
$ jsonschema --instance manifest.json manifest-format-name.0.0.2.schema.json
$ echo $?
0
```
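The same validation can also be done from Python rather than the CLI; a minimal sketch using the jsonschema and pyyaml modules installed above:

```python
import json
import pathlib

import jsonschema
import yaml

manifest = yaml.safe_load(pathlib.Path("manifest.yaml").read_text())
schema = json.loads(
    pathlib.Path("manifest-format-name.0.0.2.schema.json").read_text()
)

# Raises jsonschema.exceptions.ValidationError if the manifest is invalid.
jsonschema.validate(instance=manifest, schema=schema)
print("valid")
```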
Writing

Suggested process (in flux):

ADR Template:

```rst
my-format-name
##############

Version: 0.0.1
Date: 2022-01-22

Status
******

Proposed|Evolving|Final

Description
***********

ADR for a declaration of assets (manifest) involved in the process
of greeting an entity.

Context
*******

- We need a way to describe the data involved in a greeting

Intent
******

- Ensure valid communication path to ``entity``
- Send ``entity`` message containing ``greeting``
```
State transition, issue filing, and estimating time to close an issue all depend on having a complete mapping of inputs to the problem (a dataflow). If we have an accurate mapping, then we have a valid flow, and we can create an estimate where we understand how we created the estimate, because we have a complete description of the problem. See also: estimating GSoC project time, estimating time to complete best practices badging program activities, and time to complete any issue. This helps with prioritizing who in an org should work on what, and when, to unblock others in the org. Related to the builtree discussion.
We use dataflows because they are a declarative approach which allows you to define different implementations based on different execution environments, or even swap out pieces of a flow or use overlays to add new pieces. They help solve the fork-and-pull-from-upstream problem. When you fork code and change it, you need to pull in changes from the upstream (the place you forked it from). This is difficult to manage alongside the changes you have already made; using a dataflow makes it easy, because we focus on how the pieces of data should connect rather than on the implementations of their connections. This declarative approach is important because the source of inputs changes depending on your environment. For example, in CI you might grab an input from an environment variable populated from secrets, while in your local setup you might grab it from somewhere else, such as a local config file.
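As a hypothetical illustration of why declaring inputs beats hardcoding them, consider resolving the same logical input from different sources per environment (the function name and file path here are made up):

```python
import os
import pathlib

def resolve_api_token() -> str:
    """Resolve the same logical input from an environment-appropriate source."""
    if os.environ.get("CI"):
        # In CI, grab from an environment variable populated from secrets.
        return os.environ["API_TOKEN"]
    # Locally, grab from a user config file instead (hypothetical path).
    return pathlib.Path("~/.config/example/token").expanduser().read_text().strip()
```

A dataflow lets this choice live in the flow's configuration or an overlay, rather than being re-implemented by every caller.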
Notes from work-in-progress tutorial: We need to come up with several metrics to track and plot throughout. We could also make this a choose-your-own-adventure style tutorial. Will need to add in the metrics API and use it in various places.
This could be done as an IPython notebook.
Classes become systems of events (dataflows) where the interface they fit into is defined by contracts (manifests). To implement an interface, one must satisfy system usage constraints, i.e. be ready to accept certain events (manifest) and fulfill the contract. One might also need to emit certain events (inputs as manifest).
Run whatever you want, wherever you want, however you want, with whatever you want, for whoever you want.

Hitting Critical Velocity. The fully connected dev model.
City planning as dataflows plus CI

Imagine you're playing a city simulator. Each building has an architecture and a purpose within the architecture of your overall city. Imagine that there are certain overall guiding strategies which the entities within the city understand must be taken into account when performing any actions they're directed to do. For example, one strategic goal, or piece of a strategic plan, might be that the city should always collect garbage, and there should never be a day where garbage is not collected from more than 75% of the residents. The garbage crews, as agents, need to know that their course of action, in terms of the actions they should take or the next steps sent by the city, has been vetted against the strategic plan which assures residents' garbage is picked up at the expected percentage.

Entities also make decisions based on data used to train their models in an active learning situation. Data used to train agent actions / strategic plans should come only from flows validated by a set of strategic plans, or strategic plans with certain credentials (verified to ensure "kill no humans" is applicable for this subset of data). This will allow us to add controls on training models, to ensure that their data does not come from sources which would cause malicious behavior, or behavior unaligned with any other active strategic plans. We must also be able to add new strategic plans on the fly and modify the top level strategic decision maker.

This example maps to the provenance information we will be collecting about the plans given to agents or the inputs given to opimps (operation implementations). This provenance information must include an attestation or valid claim that certain sets of strategic plans were taken into consideration by the top level strategic decision maker when the orders came down to that agent or opimp. Optimizing for efficiency in a post-capitalism society.
Alice's Adventures in Wonderland
Together we'll build Alice, an Artificial General Intelligence. We'll be successful when Alice successfully maintains a DFFML plugin as the only maintainer for a year: debugging issues, writing fixes, reviewing code, accepting pull requests, refactoring the code base post PR merge, dealing with vulnerabilities, cutting releases, maintaining release branches, and completing development work in alignment with the plugin's universal blueprint. She will modify, submit pull requests to, and track upstreaming of patches to her dependencies to achieve the cleanest architecture possible. We'll interact with her as we would any other remote developer. We'll need to build the foundations of Alice's thought processes. Throughout this series, we'll rely heavily on a mental model based on how humans think and problem solve. By the end of this series we'll have ensured Alice has all the primitive operations she requires to carry out the scientific process.

Terminology
Expectations

Alice is going to be held to very high standards. We should expect this list to grow for a long time (years). This list of expectations may at times contain fragments which need to be worked out more and are only fragments so the ideas don't get forgotten.
Alice's Understanding of Software Engineering

We'll teach Alice what she needs to know about software engineering through our InnerSource series. She'll follow the best practices outlined there. She'll understand a codebase's health in part using InnerSource metric collectors.
What we end up with is a general purpose reinforcement learning architecture. This architecture can be fed any data and make sense of how the data relates to its universal blueprint. The trained models and custom logic that form its understanding of how the data relates to its universal blueprint are its identity. As such, our entity named Alice will be trained on data making her an open source maintainer. We'll show in a later series of blog posts how to create custom entities with custom universal blueprints (strategic goals, assets at their disposal, etc.). Entities have jobs; Alice's first job is to be a maintainer. Her job is reflected in her universal blueprint, which contains all the dataflows, orchestration configs, and dataflows used to collect data for and train models used in her strategic plans, as well as any static input data or other static system context. We can save a "version" of Alice by leveraging caching.

We specify dataflows used to train models which are then used in strategic plans. Perhaps there is something here: on dataflow instantiation, query the inodes from shared config; sometimes a config will be defined by the running of a dataflow which will itself consume inputs or configs from other inodes within shared config. So, on dataflow instantiation, find leaf nodes, in terms of purely static plugins to instantiate, within the shared configs region. This shared config linker needs to have access to the system context. For example, if the flow is the top level flow triggered from the CLI, then the system context should contain all the command line arguments somewhere within its input network (after kickoff, or when looking at a cached copy from after kickoff).

Defining a plugin can be done by declaring it as an instance where the config provided is the output of a dataflow. That dataflow can be run as a subflow with a copy-on-write (CoW) version of the parent system context (for accessing things like the CLI flags given). There could be an operation which runs an output operation dataflow on the CoW parent system context. That operation's output can then be formed into its appropriate place in the config of the plugin it will be used to instantiate. We will of course need to create a dependency graph between inodes. We should support requesting re-instantiation of instances within shared configs via event based communication to the strategic decision maker.

Configuration and implementation of the strategic decision maker (SDM) determine which active strategic plans are taken into account. The SDM must provide attested claims for each decision it makes when any data is sent over potentially tamperable communication channels (this requires understanding the properties of all plugin instances in use; for example, everything in memory is different from an operation implementation network connected over the internet).
Serializable graph data structure with linkage; can be used for "shared config", just add another property, like an inode, to each node.
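A minimal sketch of such a structure, assuming nodes carry an inode for linkage and that instantiation order comes from the dependency graph between inodes (all names here are illustrative):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ConfigNode:
    inode: str                                      # unique id used for linkage
    data: dict = field(default_factory=dict)        # serializable payload
    links: List[str] = field(default_factory=list)  # inodes this node depends on

def instantiation_order(nodes: Dict[str, ConfigNode]) -> List[str]:
    """Depth-first walk of the dependency graph: leaf (purely static) nodes come first."""
    seen: set = set()
    order: List[str] = []

    def visit(inode: str) -> None:
        if inode in seen:
            return
        seen.add(inode)
        for dep in nodes[inode].links:
            visit(dep)
        order.append(inode)

    for inode in nodes:
        visit(inode)
    return order
```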
These are notes and scratch work around the purpose and future of the project.
Mission: Provide a clear, meticulously validated, ubiquitously adopted reference architecture for an egalitarian Artificial General Intelligence (AGI) which respects the first law of robotics.
To do so, we must enable the AGI with the ability to act in response to the current system context, where it understands how to predict possible future system contexts and which of the future system contexts it wishes to pursue are acceptable according to guiding strategic plans (such as do no harm). We must also ensure that human and machine can interact via a shared language: the universal blueprint.
AI has the potential to do many great things. However, it also has the potential to do terrible things. Recently there was an example of scientists who used a model that was good at generating life saving drugs, in reverse, to generate deadly poisons. GPU manufacturers recently implemented anti-crypto-mining features. Since the ubiquitous unit of parallel compute is a GPU, this stops people from buying up GPUs for what we as a community at large have deemed undesirable behavior (hogging all the GPUs). There is nothing stopping those people from building their own ASICs to mine crypto. However, the market for that is a subset of the larger GPU market: cost per unit goes up, multi-use capabilities go down. GPU manufacturers are effectively able to ensure that the greater good is looked after because GPUs are the ubiquitous facilitator of parallel compute. If we prove out an architecture for an AGI that is robust, easy to adopt, and integrates with the existing open source ecosystem, we can bake in this looking after the greater good.
As we democratize AI, we must be careful not to democratize AI that will do harm. We must think secure by default in terms of architecture which has facilities for guard rails, baking safety into AI.
Failure to achieve ubiquitous adoption of an open architecture with meticulously audited safety controls will result in further consolidation of wealth and widening inequality.