From 4de49ad3fd850e3eceb22f633c1e6400503f5aca Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Wed, 11 Aug 2021 15:21:18 +0300
Subject: [PATCH 01/54] initial doc structure change

---
 ...ng_the_federation.agg_based.baremetal.rst} | 0
 ...the_federation.agg_based.certificates.rst} | 0
 ...nning_the_federation.agg_based.docker.rst} | 0
 ...ing_the_federation.agg_based.notebook.rst} | 0
 docs/running_the_federation.agg_based.rst | 26 +++++++++++++++++++
 ..._the_federation.agg_based.singularity.rst} | 0
 ..._the_federation.agg_based.start_nodes.rst} | 0
 ...ration.director_based.interactive_api.rst} | 0
 .../running_the_federation.director_based.rst | 8 ++++++
 docs/running_the_federation.rst | 8 ++----
 10 files changed, 36 insertions(+), 6 deletions(-)
 rename docs/{running_the_federation.baremetal.rst => running_the_federation.agg_based.baremetal.rst} (100%)
 rename docs/{running_the_federation.certificates.rst => running_the_federation.agg_based.certificates.rst} (100%)
 rename docs/{running_the_federation.docker.rst => running_the_federation.agg_based.docker.rst} (100%)
 rename docs/{running_the_federation.notebook.rst => running_the_federation.agg_based.notebook.rst} (100%)
 create mode 100644 docs/running_the_federation.agg_based.rst
 rename docs/{running_the_federation.singularity.rst => running_the_federation.agg_based.singularity.rst} (100%)
 rename docs/{running_the_federation.start_nodes.rst => running_the_federation.agg_based.start_nodes.rst} (100%)
 rename docs/{running_the_federation.interactive_api.rst => running_the_federation.director_based.interactive_api.rst} (100%)
 create mode 100644 docs/running_the_federation.director_based.rst

diff --git a/docs/running_the_federation.baremetal.rst b/docs/running_the_federation.agg_based.baremetal.rst
similarity index 100%
rename from docs/running_the_federation.baremetal.rst
rename to docs/running_the_federation.agg_based.baremetal.rst
diff --git a/docs/running_the_federation.certificates.rst b/docs/running_the_federation.agg_based.certificates.rst
similarity index 100%
rename from docs/running_the_federation.certificates.rst
rename to docs/running_the_federation.agg_based.certificates.rst
diff --git a/docs/running_the_federation.docker.rst b/docs/running_the_federation.agg_based.docker.rst
similarity index 100%
rename from docs/running_the_federation.docker.rst
rename to docs/running_the_federation.agg_based.docker.rst
diff --git a/docs/running_the_federation.notebook.rst b/docs/running_the_federation.agg_based.notebook.rst
similarity index 100%
rename from docs/running_the_federation.notebook.rst
rename to docs/running_the_federation.agg_based.notebook.rst
diff --git a/docs/running_the_federation.agg_based.rst b/docs/running_the_federation.agg_based.rst
new file mode 100644
index 0000000000..06f2bf26b4
--- /dev/null
+++ b/docs/running_the_federation.agg_based.rst
@@ -0,0 +1,26 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+.. _running_the_federation_aggregato_based:
+
+**********************
+Running the Federation
+**********************
+
+First make sure you've installed the software :ref:`using these instructions `
+
+.. figure:: images/openfl_flow.png
+
+.. centered:: 100K foot view of OpenFL workflow
+
+The high-level workflow is shown in the figure above. Note that once OpenFL is installed on all nodes of the federation and every member of the federation has a valid PKI certificate, all that is needed to run an instance of a federated workload is to distribute the workspace to all federation members and then run the command to start the node (e.g. :code:`fx aggregator start`/:code:`fx collaborator start`). In other words, most of the work is setting up an initial environment on all of the federation nodes that can be used across new instantiations of federations.
+
+.. toctree::
+   :maxdepth: 4
+
+   running_the_federation.agg_based.notebook
+   running_the_federation.agg_based.baremetal
+   running_the_federation.agg_based.docker
+   running_the_federation.agg_based.certificates
+   running_the_federation.agg_based.start_nodes.rst
+   running_the_federation.director_based.interactive_api
diff --git a/docs/running_the_federation.singularity.rst b/docs/running_the_federation.agg_based.singularity.rst
similarity index 100%
rename from docs/running_the_federation.singularity.rst
rename to docs/running_the_federation.agg_based.singularity.rst
diff --git a/docs/running_the_federation.start_nodes.rst b/docs/running_the_federation.agg_based.start_nodes.rst
similarity index 100%
rename from docs/running_the_federation.start_nodes.rst
rename to docs/running_the_federation.agg_based.start_nodes.rst
diff --git a/docs/running_the_federation.interactive_api.rst b/docs/running_the_federation.director_based.interactive_api.rst
similarity index 100%
rename from docs/running_the_federation.interactive_api.rst
rename to docs/running_the_federation.director_based.interactive_api.rst
diff --git a/docs/running_the_federation.director_based.rst b/docs/running_the_federation.director_based.rst
new file mode 100644
index 0000000000..de16c4ddfd
--- /dev/null
+++ b/docs/running_the_federation.director_based.rst
@@ -0,0 +1,8 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+.. _running_the_federation_director_based:
+
+**********************
+Using Long-Living services
+**********************
diff --git a/docs/running_the_federation.rst b/docs/running_the_federation.rst
index 8af954978c..880795aabf 100644
--- a/docs/running_the_federation.rst
+++ b/docs/running_the_federation.rst
@@ -18,9 +18,5 @@ The high-level workflow is shown in the figure above. Note that once OpenFL is i
 .. toctree::
    :maxdepth: 4

-   running_the_federation.notebook
-   running_the_federation.baremetal
-   running_the_federation.docker
-   running_the_federation.certificates
-   running_the_federation.start_nodes.rst
-   running_the_federation.interactive_api
+   running_the_federation.agg_based
+   running_the_federation.director_based

From 015019b6a5ee48ea3bb74f3a30e6f13873e04246 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Wed, 11 Aug 2021 16:56:48 +0300
Subject: [PATCH 02/54] add utilities folder

---
 docs/manual.rst | 1 +
 docs/running_the_federation.agg_based.rst | 2 +-
 docs/source_utilities/pki.cert_request.rst | 0
 docs/source_utilities/pki.rst | 12 ++++++++++++
 docs/source_utilities/pki.step_ca.rst | 0
 docs/source_utilities/utilities.rst | 12 ++++++++++++
 6 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 docs/source_utilities/pki.cert_request.rst
 create mode 100644 docs/source_utilities/pki.rst
 create mode 100644 docs/source_utilities/pki.step_ca.rst
 create mode 100644 docs/source_utilities/utilities.rst

diff --git a/docs/manual.rst b/docs/manual.rst
index 50152d384c..6a94f5f6c2 100644
--- a/docs/manual.rst
+++ b/docs/manual.rst
@@ -12,4 +12,5 @@ Manual
    install
    running_the_federation
    plan_settings
+   source_utilities/utilities
    advanced_topics
diff --git a/docs/running_the_federation.agg_based.rst b/docs/running_the_federation.agg_based.rst
index 06f2bf26b4..cb309380a3 100644
--- a/docs/running_the_federation.agg_based.rst
+++ b/docs/running_the_federation.agg_based.rst
@@ -1,7 +1,7 @@
 .. # Copyright (C) 2020-2021 Intel Corporation
 .. # SPDX-License-Identifier: Apache-2.0

-.. _running_the_federation_aggregato_based:
+.. _running_the_federation_aggregator_based:

 **********************
 Running the Federation
diff --git a/docs/source_utilities/pki.cert_request.rst b/docs/source_utilities/pki.cert_request.rst
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/docs/source_utilities/pki.rst b/docs/source_utilities/pki.rst
new file mode 100644
index 0000000000..3742a5c5ad
--- /dev/null
+++ b/docs/source_utilities/pki.rst
@@ -0,0 +1,12 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+OpenFL PKI solutions
+******
+
+.. toctree::
+   :maxdepth: 4
+
+   pki.cert_request
+   pki.step_ca
\ No newline at end of file
diff --git a/docs/source_utilities/pki.step_ca.rst b/docs/source_utilities/pki.step_ca.rst
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/docs/source_utilities/utilities.rst b/docs/source_utilities/utilities.rst
new file mode 100644
index 0000000000..714078ddbb
--- /dev/null
+++ b/docs/source_utilities/utilities.rst
@@ -0,0 +1,12 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+OpenFL Utilities
+******
+
+.. toctree::
+   :maxdepth: 4
+
+   pki
+   data_splitters
\ No newline at end of file

From eff49946f1eb7a983856bc39ea5dafdac41cc24a Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Wed, 11 Aug 2021 17:08:03 +0300
Subject: [PATCH 03/54] test image import

---
 docs/source_utilities/pki.cert_request.rst | 8 ++++++++
 docs/source_utilities/splitters_data.rst | 6 ++++++
 2 files changed, 14 insertions(+)
 create mode 100644 docs/source_utilities/splitters_data.rst

diff --git a/docs/source_utilities/pki.cert_request.rst b/docs/source_utilities/pki.cert_request.rst
index e69de29bb2..7bb1e69070 100644
--- a/docs/source_utilities/pki.cert_request.rst
+++ b/docs/source_utilities/pki.cert_request.rst
@@ -0,0 +1,8 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+The old PKI
+******
+
+.. figure:: images/openfl_flow.png
\ No newline at end of file
diff --git a/docs/source_utilities/splitters_data.rst b/docs/source_utilities/splitters_data.rst
new file mode 100644
index 0000000000..1c0bed123b
--- /dev/null
+++ b/docs/source_utilities/splitters_data.rst
@@ -0,0 +1,6 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+***********
+Data Splitters
+***********

From 30daa63c9fad3e4dd4f2f7639bd79deea6865a35 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Wed, 11 Aug 2021 17:09:46 +0300
Subject: [PATCH 04/54] fixes

---
 docs/source_utilities/pki.cert_request.rst | 2 +-
 docs/source_utilities/utilities.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source_utilities/pki.cert_request.rst b/docs/source_utilities/pki.cert_request.rst
index 7bb1e69070..6682934d54 100644
--- a/docs/source_utilities/pki.cert_request.rst
+++ b/docs/source_utilities/pki.cert_request.rst
@@ -5,4 +5,4 @@
 The old PKI
 ******

-.. figure:: images/openfl_flow.png
\ No newline at end of file
+.. figure:: ../images/openfl_flow.png
\ No newline at end of file
diff --git a/docs/source_utilities/utilities.rst b/docs/source_utilities/utilities.rst
index 714078ddbb..b6732c74a9 100644
--- a/docs/source_utilities/utilities.rst
+++ b/docs/source_utilities/utilities.rst
@@ -9,4 +9,4 @@ OpenFL Utilities
    :maxdepth: 4

    pki
-   data_splitters
\ No newline at end of file
+   splitters_data
\ No newline at end of file

From aa148cb007ea081152c034ef601f6869259329a9 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Wed, 11 Aug 2021 17:28:34 +0300
Subject: [PATCH 05/54] added architecture section

---
 docs/manual.rst | 1 +
 .../source_architecture/architecture.long_living.rst | 6 ++++++
 docs/source_architecture/architecture.rst | 12 ++++++++++++
 docs/source_architecture/architecture.spawning.rst | 6 ++++++
 docs/source_utilities/pki.cert_request.rst | 2 --
 docs/source_utilities/pki.step_ca.rst | 6 ++++++
 6 files changed, 31 insertions(+), 2 deletions(-)
 create mode 100644 docs/source_architecture/architecture.long_living.rst
 create mode 100644 docs/source_architecture/architecture.rst
 create mode 100644 docs/source_architecture/architecture.spawning.rst

diff --git a/docs/manual.rst b/docs/manual.rst
index 6a94f5f6c2..3dbf9c50ea 100644
--- a/docs/manual.rst
+++ b/docs/manual.rst
@@ -12,5 +12,6 @@ Manual
    install
    running_the_federation
    plan_settings
+   source_architecture/architecture
    source_utilities/utilities
    advanced_topics
diff --git a/docs/source_architecture/architecture.long_living.rst b/docs/source_architecture/architecture.long_living.rst
new file mode 100644
index 0000000000..87cfcbd01c
--- /dev/null
+++ b/docs/source_architecture/architecture.long_living.rst
@@ -0,0 +1,6 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+OpenFL long-living actors
+******
diff --git a/docs/source_architecture/architecture.rst b/docs/source_architecture/architecture.rst
new file mode 100644
index 0000000000..9e1167ec07
--- /dev/null
+++ b/docs/source_architecture/architecture.rst
@@ -0,0 +1,12 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+OpenFL core components
+******
+
+.. toctree::
+   :maxdepth: 4
+
+   architecture.long_living
+   architecture.spawning
\ No newline at end of file
diff --git a/docs/source_architecture/architecture.spawning.rst b/docs/source_architecture/architecture.spawning.rst
new file mode 100644
index 0000000000..24cfb5c410
--- /dev/null
+++ b/docs/source_architecture/architecture.spawning.rst
@@ -0,0 +1,6 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+OpenFL short-living actors
+******
diff --git a/docs/source_utilities/pki.cert_request.rst b/docs/source_utilities/pki.cert_request.rst
index 6682934d54..29a39db221 100644
--- a/docs/source_utilities/pki.cert_request.rst
+++ b/docs/source_utilities/pki.cert_request.rst
@@ -4,5 +4,3 @@
 ******
 The old PKI
 ******
-
-.. figure:: ../images/openfl_flow.png
\ No newline at end of file
diff --git a/docs/source_utilities/pki.step_ca.rst b/docs/source_utilities/pki.step_ca.rst
index e69de29bb2..6168960f0c 100644
--- a/docs/source_utilities/pki.step_ca.rst
+++ b/docs/source_utilities/pki.step_ca.rst
@@ -0,0 +1,6 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+Semi-automatic PKI
+******
\ No newline at end of file

From a3eea9a16f9e907f2e6e0dde45b4f28a8d5a9656 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Thu, 12 Aug 2021 09:55:55 +0300
Subject: [PATCH 06/54] restructure docs

---
 docs/manual.rst | 6 +++---
 .../components/components.long_living.rst} | 0
 .../architecture.rst => source/components/components.rst} | 4 ++--
 .../components/components.spawning.rst} | 0
 .../utilities}/pki.cert_request.rst | 0
 docs/{source_utilities => source/utilities}/pki.rst | 0
 docs/{source_utilities => source/utilities}/pki.step_ca.rst | 0
 .../utilities}/splitters_data.rst | 0
 docs/{source_utilities => source/utilities}/utilities.rst | 0
 9 files changed, 5 insertions(+), 5 deletions(-)
 rename docs/{source_architecture/architecture.long_living.rst => source/components/components.long_living.rst} (100%)
 rename docs/{source_architecture/architecture.rst => source/components/components.rst} (75%)
 rename docs/{source_architecture/architecture.spawning.rst => source/components/components.spawning.rst} (100%)
 rename docs/{source_utilities => source/utilities}/pki.cert_request.rst (100%)
 rename docs/{source_utilities => source/utilities}/pki.rst (100%)
 rename docs/{source_utilities => source/utilities}/pki.step_ca.rst (100%)
 rename docs/{source_utilities => source/utilities}/splitters_data.rst (100%)
 rename docs/{source_utilities => source/utilities}/utilities.rst (100%)

diff --git a/docs/manual.rst b/docs/manual.rst
index 3dbf9c50ea..8ccf286f4e 100644
--- a/docs/manual.rst
+++ b/docs/manual.rst
@@ -6,12 +6,12 @@ Manual
 ******

 .. toctree::
-   :maxdepth: 4
+   :maxdepth: 2

    overview
    install
    running_the_federation
    plan_settings
-   source_architecture/architecture
-   source_utilities/utilities
+   source/components/components
+   source/utilities/utilities
    advanced_topics
diff --git a/docs/source_architecture/architecture.long_living.rst b/docs/source/components/components.long_living.rst
similarity index 100%
rename from docs/source_architecture/architecture.long_living.rst
rename to docs/source/components/components.long_living.rst
diff --git a/docs/source_architecture/architecture.rst b/docs/source/components/components.rst
similarity index 75%
rename from docs/source_architecture/architecture.rst
rename to docs/source/components/components.rst
index 9e1167ec07..4f48a352c9 100644
--- a/docs/source_architecture/architecture.rst
+++ b/docs/source/components/components.rst
@@ -8,5 +8,5 @@ OpenFL core components
 .. toctree::
    :maxdepth: 4

-   architecture.long_living
-   architecture.spawning
\ No newline at end of file
+   components.long_living
+   components.spawning
\ No newline at end of file
diff --git a/docs/source_architecture/architecture.spawning.rst b/docs/source/components/components.spawning.rst
similarity index 100%
rename from docs/source_architecture/architecture.spawning.rst
rename to docs/source/components/components.spawning.rst
diff --git a/docs/source_utilities/pki.cert_request.rst b/docs/source/utilities/pki.cert_request.rst
similarity index 100%
rename from docs/source_utilities/pki.cert_request.rst
rename to docs/source/utilities/pki.cert_request.rst
diff --git a/docs/source_utilities/pki.rst b/docs/source/utilities/pki.rst
similarity index 100%
rename from docs/source_utilities/pki.rst
rename to docs/source/utilities/pki.rst
diff --git a/docs/source_utilities/pki.step_ca.rst b/docs/source/utilities/pki.step_ca.rst
similarity index 100%
rename from docs/source_utilities/pki.step_ca.rst
rename to docs/source/utilities/pki.step_ca.rst
diff --git a/docs/source_utilities/splitters_data.rst b/docs/source/utilities/splitters_data.rst
similarity index 100%
rename from docs/source_utilities/splitters_data.rst
rename to docs/source/utilities/splitters_data.rst
diff --git a/docs/source_utilities/utilities.rst b/docs/source/utilities/utilities.rst
similarity index 100%
rename from docs/source_utilities/utilities.rst
rename to docs/source/utilities/utilities.rst

From 2e9ffa6796a2f87f5df31b7adb861dfa07771d5f Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Thu, 12 Aug 2021 09:58:00 +0300
Subject: [PATCH 07/54] moving package structure to index

---
 docs/index.rst | 2 +-
 docs/manual.rst | 1 -
 docs/source/components/components.long_living.rst | 6 ------
 docs/source/components/components.rst | 12 ------------
 docs/source/components/components.spawning.rst | 6 ------
 5 files changed, 1 insertion(+), 26 deletions(-)
 delete mode 100644 docs/source/components/components.long_living.rst
 delete mode 100644 docs/source/components/components.rst
 delete mode 100644 docs/source/components/components.spawning.rst

diff --git a/docs/index.rst b/docs/index.rst
index e6b86b1fa7..ca91f8ddb5 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -32,7 +32,7 @@ can use any deep learning frameworks, such as `Tensorflow
Date: Thu, 12 Aug 2021 10:49:48 +0300
Subject: [PATCH 08/54] ignore fix

---
 docs/index.rst | 2 +-
 docs/openfl.rst | 11 +++++++++++
 docs/source/openfl/components.long_living.rst | 6 ++++++
 docs/source/openfl/components.rst | 12 ++++++++++++
 docs/source/openfl/components.spawning.rst | 6 ++++++
 5 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 docs/openfl.rst
 create mode 100644 docs/source/openfl/components.long_living.rst
 create mode 100644 docs/source/openfl/components.rst
 create mode 100644 docs/source/openfl/components.spawning.rst

diff --git a/docs/index.rst b/docs/index.rst
index ca91f8ddb5..e6b86b1fa7 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -32,7 +32,7 @@ can use any deep learning frameworks, such as `Tensorflow
Date: Fri, 13 Aug 2021 10:54:53 +0300
Subject: [PATCH 09/54] Restructure openfl section

---
 docs/source/openfl/components.long_living.rst | 6 ------
 docs/source/openfl/components.rst | 11 +++++++++--
 docs/source/openfl/components.spawning.rst | 6 ------
 3 files changed, 9 insertions(+), 14 deletions(-)
 delete mode 100644 docs/source/openfl/components.long_living.rst
 delete mode 100644 docs/source/openfl/components.spawning.rst

diff --git a/docs/source/openfl/components.long_living.rst b/docs/source/openfl/components.long_living.rst
deleted file mode 100644
index 87cfcbd01c..0000000000
--- a/docs/source/openfl/components.long_living.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-.. # Copyright (C) 2020-2021 Intel Corporation
-.. # SPDX-License-Identifier: Apache-2.0
-
-******
-OpenFL long-living actors
-******
diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst
index 4f48a352c9..4ea7d1a240 100644
--- a/docs/source/openfl/components.rst
+++ b/docs/source/openfl/components.rst
@@ -8,5 +8,12 @@ OpenFL core components
 .. toctree::
    :maxdepth: 4

-   components.long_living
-   components.spawning
\ No newline at end of file
+   `Spawning`_
+   `Long-living`_
+
+Spawning
+##########
+
+Long-living
+#############
+
diff --git a/docs/source/openfl/components.spawning.rst b/docs/source/openfl/components.spawning.rst
deleted file mode 100644
index 24cfb5c410..0000000000
--- a/docs/source/openfl/components.spawning.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-.. # Copyright (C) 2020-2021 Intel Corporation
-.. # SPDX-License-Identifier: Apache-2.0
-
-******
-OpenFL short-living actors
-******

From fe1eb329d26c3a4ffc1c7c778ab1473ebb3463f4 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 11:17:44 +0300
Subject: [PATCH 10/54] openfl docs restructuring

---
 docs/source/openfl/communication.rst | 12 ++++++++++
 docs/source/openfl/components.rst | 35 +++++++++++++++++++++++++++-
 docs/source/openfl/plugins.rst | 12 ++++++++++
 3 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/openfl/communication.rst
 create mode 100644 docs/source/openfl/plugins.rst

diff --git a/docs/source/openfl/communication.rst b/docs/source/openfl/communication.rst
new file mode 100644
index 0000000000..72c6f24daf
--- /dev/null
+++ b/docs/source/openfl/communication.rst
@@ -0,0 +1,12 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+OpenFL inter-component communication
+******
+
+.. toctree::
+   :maxdepth: 2
+
+   `...`_
+   `...`_
\ No newline at end of file
diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst
index 4ea7d1a240..c0be8a9325 100644
--- a/docs/source/openfl/components.rst
+++ b/docs/source/openfl/components.rst
@@ -1,19 +1,52 @@
 .. # Copyright (C) 2020-2021 Intel Corporation
 .. # SPDX-License-Identifier: Apache-2.0

+.. _openfl_components:
+
 ******
 OpenFL core components
 ******

 .. toctree::
-   :maxdepth: 4
+   :maxdepth: 2

    `Spawning`_
    `Long-living`_

+
+.. _openfl_spawning_components:
+
 Spawning
 ##########

+Aggregator
+===========
+
+The aggregator is a short-living entity, which means that its lifespan is limited by experiment execution time. It orchestrates collaborators according to the FL plan and performs model updates aggregation.
+The aggregator is spawned by the Director (described below) when a new experiment is submitted.
+
+
+Collaborator
+=============
+
+The collaborator is also a short-living entity. It manages training the model on local data: it executes assigned tasks, converts DL framework-specific tensor objects to the OpenFL internal representation, and exchanges model parameters with the aggregator.
+Tensor conversion is done by framework adapter plugins. OpenFL ships with PyTorch and TensorFlow 2 framework adapters, and this list will be extended in the future. Users are free to implement their own adapter for the DL framework they need, which enables OpenFL support for experiments that use it. The adapter plugin interface is simple: there are two required methods, one to load tensors into a model and optimizer and one to extract tensors from them. The model is loaded with the relevant weights before every task, and at the end of the training task the weights are extracted to be sent to the central node and aggregated.
+A collaborator instance is created by an Envoy (described below) when a new experiment is submitted. Every collaborator is a unique service because it is loaded with a local Shard Descriptor to perform the tasks included in an FL experiment.
+
+.. _openfl_ll_components:
+
 Long-living
 #############

+Director
+==========
+
+The Director is a long-living entity; it is the central node of the federation and may accept several experiments (with the same data interface). When an experiment is reported, the Director starts an aggregator and sends the experiment data to the involved envoys; during the experiment, the Director oversees the aggregator and updates the user on the status of the experiment.
+The Director runs two services: one for frontend users and another for envoys. It can distribute an experiment reported through the frontend API across the federation and communicate back a trained model snapshot and metrics.
+The Director supports several concurrent frontend connections (yet experiments are run one by one).
+
+Envoy
+=========
+
+Some text
+
diff --git a/docs/source/openfl/plugins.rst b/docs/source/openfl/plugins.rst
new file mode 100644
index 0000000000..657ef0b66e
--- /dev/null
+++ b/docs/source/openfl/plugins.rst
@@ -0,0 +1,12 @@
+.. # Copyright (C) 2020-2021 Intel Corporation
+.. # SPDX-License-Identifier: Apache-2.0
+
+******
+OpenFL plugins
+******
+
+.. toctree::
+   :maxdepth: 2
+
+   `...`_
+   `...`_
\ No newline at end of file

From bfaf9793c373fc0325a51393652afd84b8bc4fb4 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:18:48 +0300
Subject: [PATCH 11/54] introduced director-based workflow

---
 docs/openfl.rst | 2 +-
 docs/running_the_federation.rst | 2 +-
 docs/source/workflow/director_based_workflow.rst | 9 +++++++++
 3 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 docs/source/workflow/director_based_workflow.rst

diff --git a/docs/openfl.rst b/docs/openfl.rst
index e4081ea8cb..390b9e4629 100644
--- a/docs/openfl.rst
+++ b/docs/openfl.rst
@@ -2,7 +2,7 @@
 .. # SPDX-License-Identifier: Apache-2.0

 ******
-OpenFL package structure
+OpenFL structure
 ******

 .. toctree::
diff --git a/docs/running_the_federation.rst b/docs/running_the_federation.rst
index 880795aabf..29506d1af4 100644
--- a/docs/running_the_federation.rst
+++ b/docs/running_the_federation.rst
@@ -19,4 +19,4 @@ The high-level workflow is shown in the figure above. Note that once OpenFL is i
    :maxdepth: 4

    running_the_federation.agg_based
-   running_the_federation.director_based
+   source/workflow/director_based_workflow
diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
new file mode 100644
index 0000000000..0185ef1c65
--- /dev/null
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -0,0 +1,9 @@
+.. # Copyright (C) 2020 Intel Corporation
+.. # Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.
+
+.. _director_workflow:
+
+************
+Establishing a long-living Federation with Director
+************
+

From 62c31aff1f7a467a61d497efabf84ea316409c16 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:20:44 +0300
Subject: [PATCH 12/54] fix

---
 docs/source/workflow/director_based_workflow.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
index 0185ef1c65..2eb4a25a84 100644
--- a/docs/source/workflow/director_based_workflow.rst
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -7,3 +7,17 @@
 Establishing a long-living Federation with Director
 ************

+Section 1
+#############
+
+some step
+==================
+
+Section 2
+#############
+
+************
+Describing an FL experiment using Interactive Python API
+************
+
+another story
\ No newline at end of file

From c320b53a9372ee5b4fba559a9e1ac70785bcf716 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:23:57 +0300
Subject: [PATCH 13/54] another fix

---
 docs/source/workflow/director_based_workflow.rst | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
index 2eb4a25a84..7dd2785aa0 100644
--- a/docs/source/workflow/director_based_workflow.rst
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -4,20 +4,18 @@
 .. _director_workflow:

 ************
-Establishing a long-living Federation with Director
+Director-based workflow
 ************

-Section 1
-#############
+Establishing a long-living Federation with Director
+#######################################

 some step
 ==================

-Section 2
-#############

-************
+
 Describing an FL experiment using Interactive Python API
-************
+#######################################

 another story
\ No newline at end of file

From 1ae7e1825c49a53545945c4f5268ce3e25c9477c Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:29:30 +0300
Subject: [PATCH 14/54] Federation setup flow

---
 docs/source/workflow/director_based_workflow.rst | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
index 7dd2785aa0..fd3558e75a 100644
--- a/docs/source/workflow/director_based_workflow.rst
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -10,10 +10,19 @@ Director-based workflow
 Establishing a long-living Federation with Director
 #######################################

-some step
+# Install OpenFL
 ==================

+# Implement Shard Descriptors
+==================
+
+Then the data owners need to implement `Shard Descriptors` Python classes.

+# Start Director
+==================
+
+# Start Envoys
+==================

 Describing an FL experiment using Interactive Python API
 #######################################

From d8452bcf282bc637509e9c7609e837ea2e3144ee Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:32:02 +0300
Subject: [PATCH 15/54] list fix

---
 docs/source/workflow/director_based_workflow.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
index fd3558e75a..6c03b6fc33 100644
--- a/docs/source/workflow/director_based_workflow.rst
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -10,18 +10,18 @@ Director-based workflow
 Establishing a long-living Federation with Director
 #######################################

-# Install OpenFL
+#. Install OpenFL
 ==================

-# Implement Shard Descriptors
+#. Implement Shard Descriptors
 ==================

 Then the data owners need to implement `Shard Descriptors` Python classes.

-# Start Director
+#. Start Director
 ==================

-# Start Envoys
+#. Start Envoys
 ==================

 Describing an FL experiment using Interactive Python API

From a4541fd769ad3f9cbd03afa9c5ab222030b7b399 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:35:25 +0300
Subject: [PATCH 16/54] added text

---
 docs/source/workflow/director_based_workflow.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
index 6c03b6fc33..7828491397 100644
--- a/docs/source/workflow/director_based_workflow.rst
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -13,9 +13,18 @@ Establishing a long-living Federation with Director
 #. Install OpenFL
 ==================

+Please, refer to :ref:`install_software_root`
+
 #. Implement Shard Descriptors
 ==================

+OpenFL framework provides a ‘Shard descriptor’ interface that should be described on every collaborator node
+to provide a unified data interface for FL experiments. Abstract “Shard descriptor” should be subclassed and
+all its methods should be implemented to describe the way data samples and labels will be loaded from disk
+during training. Shard descriptor is a subscriptable object that implements `__getitem__()` and `__len__()` methods
+as well as several additional methods to access ‘sample shape’, ‘target shape’, and ‘shard description’ text
+that may be used to identify participants during experiment definition and execution.
+
 Then the data owners need to implement `Shard Descriptors` Python classes.

 #. Start Director

From eb47b03b1c95d0df28eb6f8519948d726f47d18b Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:42:05 +0300
Subject: [PATCH 17/54] added director start commands

---
 .../workflow/director_based_workflow.rst | 21 +++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
index 7828491397..3fcde14777 100644
--- a/docs/source/workflow/director_based_workflow.rst
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -10,14 +10,16 @@ Director-based workflow
 Establishing a long-living Federation with Director
 #######################################

-#. Install OpenFL
+1. Install OpenFL
 ==================

 Please, refer to :ref:`install_software_root`

-#. Implement Shard Descriptors
+2. Implement Shard Descriptors
 ==================

+Then the data owners need to implement `Shard Descriptors` Python classes.
+
 OpenFL framework provides a ‘Shard descriptor’ interface that should be described on every collaborator node
 to provide a unified data interface for FL experiments. Abstract “Shard descriptor” should be subclassed and
 all its methods should be implemented to describe the way data samples and labels will be loaded from disk
@@ -25,12 +27,19 @@ during training. Shard descriptor is a subscriptable object that implements `__g
 as well as several additional methods to access ‘sample shape’, ‘target shape’, and ‘shard description’ text
 that may be used to identify participants during experiment definition and execution.

-Then the data owners need to implement `Shard Descriptors` Python classes.
-
-#. Start Director
+3. Start Director
 ==================

-#. Start Envoys
+    .. code-block:: console
+
+        $ fx director start --disable-tls -c director_config.yaml
+
+    .. code-block:: console
+
+        FQDN=$1
+        fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/"${FQDN}".key -oc cert/"${FQDN}".crt
+
+1. Start Envoys
 ==================

 Describing an FL experiment using Interactive Python API

From bcb34006fef6575c5b61682d8cf6f8a291014bc2 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:44:14 +0300
Subject: [PATCH 18/54] added envoy commands

---
 docs/source/workflow/director_based_workflow.rst | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
index 3fcde14777..cb25260250 100644
--- a/docs/source/workflow/director_based_workflow.rst
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -35,13 +35,25 @@ that may be used to identify participants during experiment definition and execu
         $ fx director start --disable-tls -c director_config.yaml

     .. code-block:: console
-
+
         FQDN=$1
         fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/"${FQDN}".key -oc cert/"${FQDN}".crt

 1. Start Envoys
 ==================

+    .. code-block:: console
+
+        $ fx envoy start -n env_one --disable-tls --shard-config-path shard_config.yaml -d director_fqdn:port
+
+    .. code-block:: console
+
+        ENVOY_NAME=$1
+        DIRECTOR_FQDN=$2
+
+        fx envoy start -n "$ENVOY_NAME" --shard-config-path shard_config.yaml -d "$DIRECTOR_FQDN":50051 -rc cert/root_ca.crt -pk cert/"$ENVOY_NAME".key -oc cert/"$ENVOY_NAME".crt
+
+
 Describing an FL experiment using Interactive Python API
 #######################################

From 091ac13b0c443725b3ee16b92bce01b2dca008a2 Mon Sep 17 00:00:00 2001
From: Igor Davidyuk
Date: Fri, 13 Aug 2021 15:46:43 +0300
Subject: [PATCH 19/54] fox commands

---
 docs/source/workflow/director_based_workflow.rst | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst
index cb25260250..5db2af4ad5 100644
--- a/docs/source/workflow/director_based_workflow.rst
+++ b/docs/source/workflow/director_based_workflow.rst
@@ -37,14 +37,18 @@ that may be used to identify participants during experiment definition and execu
     .. code-block:: console

         FQDN=$1
-        fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/"${FQDN}".key -oc cert/"${FQDN}".crt
+        fx director start -c director_config.yaml \
+            -rc cert/root_ca.crt \
+            -pk cert/"${FQDN}".key \
+            -oc cert/"${FQDN}".crt

-1. Start Envoys
+4. Start Envoys
 ==================

     .. code-block:: console

-        $ fx envoy start -n env_one --disable-tls --shard-config-path shard_config.yaml -d director_fqdn:port
+        $ fx envoy start -n env_one --disable-tls \
+            --shard-config-path shard_config.yaml -d director_fqdn:port

     ..
code-block:: console From 7ebd91e346b75ed8628a6ccaea38a50240e359f2 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Fri, 13 Aug 2021 17:53:45 +0300 Subject: [PATCH 20/54] Small typo fixes --- docs/running_the_federation.rst | 2 +- docs/source/openfl/communication.rst | 2 +- docs/source/openfl/components.rst | 6 +- docs/source/openfl/plugins.rst | 2 +- docs/source/utilities/pki.rst | 2 +- docs/source/utilities/utilities.rst | 2 +- .../workflow/director_based_workflow.rst | 17 +- .../workflow/interactive_python_api.rst | 228 ++++++++++++++++++ 8 files changed, 246 insertions(+), 15 deletions(-) create mode 100644 docs/source/workflow/interactive_python_api.rst diff --git a/docs/running_the_federation.rst b/docs/running_the_federation.rst index 29506d1af4..ecd361b846 100644 --- a/docs/running_the_federation.rst +++ b/docs/running_the_federation.rst @@ -16,7 +16,7 @@ First make sure you've installed the software :ref:`using these instructions `_ +as a server and `step `_ as a client utilities. They are downloaded from github during workspace setup. Regardless of the certification method, +paths to certificates on each node are provided at start of experiment. Pki workflow from OpenFL will be discussed below. + +OpenFL PKI workflow +=================== +Openfl PKI pipeline asumes creating local CA with https server which listen signing requests. +Certificates from each node can be signed by requesting to CA server with special token. +Token must be copied to each node by some secure way. Each step is considered in detail below. + +1. Create CA, i.e create root key pair, CA server config and other. + .. code-block:: console + + $ fx pki install -p --password <123> --ca-url + | :code:`-p` - path to folder, which will contain ca files. + | :code:`--password` - password that will encrypts some ca files. + | :code:`--ca-url` - host and port which ca server will listen + This command will also download `step-ca `_ and `step `_ binaries from github. + +2. 
Run the CA HTTPS server.
+   .. code-block:: console
+
+      $ fx pki run -p
+   | :code:`-p` - path to the folder that will contain the CA files.
+
+3. Get a token for a node.
+
+   .. code-block:: console
+
+      $ fx pki get-token -n
+   | :code:`-n` - subject name: FQDN for the director, collaborator name for an envoy, or API name for the API-layer node
+
+   Run this command on the CA side, from the CA folder. The output is a token that contains a JWT (JSON Web Token) from the CA server and the CA
+   root certificate concatenated together. This JWT has a time-to-live of twenty-four hours.
+
+4. Copy the token to the node side (director or envoy) over a secure channel and run the certify command.
+   .. code-block:: console
+
+      $ fx pki certify -n -t
+   | :code:`-n` - subject name: FQDN for the director, collaborator name for an envoy, or API name for the API-layer node
+   | :code:`-t` - the token output by the previous command
+   This command calls the step client to connect to the CA server over HTTPS.
+   HTTPS is secured by the root certificate that was copied along with the JWT.
+   The server authenticates the client by the JWT, and the client authenticates the server by the root certificate.
+
+The signed certificate and private key are now stored on the current node. The signed certificate has a time-to-live of one year. You should certify every node that will participate in the federation: the director, all envoys, and the API-layer node.
+
+******************************************
+Defining a Federated Learning Experiment
+******************************************
+The Interactive API allows setting up an experiment from a single entry point: a Jupyter notebook or a Python script.
+Defining an experiment includes setting up several interface entities and experiment parameters.
+
+Federation API
+===================
+The *Federation* entity is introduced to register and keep information about collaborators' settings and their local data, as well as network settings to enable communication inside the federation.
+Each federation is bound to some Machine Learning problem, in the sense that all collaborators' dataset shards should follow the same annotation format for all samples. Once you have created a federation, it may be used in several subsequent experiments.
+
+To set up a federation, use the Federation Interactive API.
+
+.. code-block:: python
+
+    from openfl.interface.interactive_api.federation import Federation
+
+The Federation API class should be initialized with the aggregator node FQDN and encryption settings. You may disable mTLS in trusted environments, or provide paths to the CA certificate chain, the aggregator certificate, and the private key to enable mTLS.
+
+.. code-block:: python
+
+    federation = Federation(central_node_fqdn: str, tls: bool, cert_chain: str, agg_certificate: str, agg_private_key: str)
+
+The Federation's :code:`register_collaborators` method should be used to provide information about the collaborators participating in a federation.
+It requires a dictionary object - :code:`{collaborator name : local data path}`.
+
+Experiment API
+===================
+
+The *Experiment* entity allows registering training-related objects, FL tasks, and settings.
+To set up an FL experiment, use the Experiment Interactive API.
+
+.. code-block:: python
+
+    from openfl.interface.interactive_api.experiment import FLExperiment
+
+An *Experiment* is initialized by taking a federation as a parameter.
+
+.. code-block:: python
+
+    fl_experiment = FLExperiment(federation=federation)
+
+To start an experiment, the user must register a *DataLoader*, *Federated Learning tasks*, and a *Model* with an *Optimizer*. There are several supplementary interface classes for these purposes.
+
+.. code-block:: python
+
+    from openfl.interface.interactive_api.experiment import TaskInterface, DataInterface, ModelInterface
+
+Registering model and optimizer
+--------------------------------
+
+First, the user instantiates and initializes a model and an optimizer in their favorite Deep Learning framework.
Please note that, for now, the Interactive API supports only *Keras* and *PyTorch* out of the box.
+The initialized model and optimizer objects should then be passed to the :code:`ModelInterface` along with the path to the correct Framework Adapter plugin inside the OpenFL package. If the desired DL framework is not covered by the existing plugins, you can implement the plugin's interface and point :code:`framework_plugin` to the implementation inside the workspace.
+
+.. code-block:: python
+
+    from openfl.interface.interactive_api.experiment import ModelInterface
+    MI = ModelInterface(model=model_unet, optimizer=optimizer_adam, framework_plugin=framework_adapter)
+
+Registering FL tasks
+---------------------
+
+We have an agreement on what we consider to be an FL task.
+The Interactive API currently allows registering only standalone functions defined in the main module or imported from other modules inside the workspace.
+We also have requirements on the task signature. A task should accept the following objects:
+
+1. model - will be rebuilt with relevant weights for every task by `TaskRunner`
+2. :code:`data_loader` - data loader that will provide local data
+3. device - a device to be used for execution on collaborator machines
+4. optimizer (optional) - model optimizer, only for training tasks
+
+Moreover, FL tasks should return a dictionary object with metrics: :code:`{metric name: metric value for this task}`.
+
+The :code:`TaskInterface` class is designed to register tasks and accompanying information.
+This class must be instantiated; then its special methods may be used to register tasks.
+
+.. code-block:: python
+
+    TI = TaskInterface()
+
+    task_settings = {
+        'batch_size': 32,
+        'some_arg': 228,
+    }
+    @TI.add_kwargs(**task_settings)
+    @TI.register_fl_task(model='my_model', data_loader='train_loader',
+                         device='device', optimizer='my_Adam_opt')
+    def foo(my_model, train_loader, my_Adam_opt, device, batch_size, some_arg=356):
+        ...
+
+
+:code:`@TI.register_fl_task()` needs the task's argument names for (model, data_loader, device, optimizer (optional)), which constitute the task's 'contract'.
+It adds the callable and the task contract to the task registry.
+
+The :code:`@TI.add_kwargs()` method should be used to set up those arguments that are not included in the contract.
+
+Registering Federated DataLoader
+---------------------------------
+
+:code:`DataInterface` is provided to support remote DataLoader initialization.
+
+It is initialized with the user's Dataset class object, and all the keyword arguments can be used by dataloaders during training or validation.
+The user must subclass :code:`DataInterface` and implement several methods.
+
+* :code:`_delayed_init(self, data_path)` is the most important method. It will be called during the collaborator initialization procedure with the relevant :code:`data_path` (the one that corresponds to the collaborator name that the user registered in the federation). The user's Dataset class should be instantiated with the local :code:`data_path` here. If the dataset initialization procedure differs for some of the collaborators, the initialization logic must be described here. The dataset sharding procedure for test runs should also be described in this method. The user is free to save objects in class fields for later use.
+* :code:`get_train_loader(self, **kwargs)` will be called before training task execution. This method must return whatever the user expects to receive in the training task as the :code:`data_loader` contract argument. The :code:`kwargs` dict holds the same information that was provided during :code:`DataInterface` initialization.
+* :code:`get_valid_loader(self, **kwargs)` - same as the point above, but for validation data.
+* :code:`get_train_data_size(self)` - returns the number of samples in the local train dataset.
+* :code:`get_valid_data_size(self)` - returns the number of samples in the local validation dataset.
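The five methods listed above can be sketched as follows. This is a minimal, hedged illustration and not OpenFL code: ``ListDataInterface`` and the helper ``_batches`` are hypothetical names, the class is kept standalone so that it runs without OpenFL installed (in a real workspace it would subclass :code:`DataInterface`), and the in-memory samples stand in for data that would normally be read from :code:`data_path`.

```python
def _batches(items, batch_size):
    """Hypothetical helper: split a list into consecutive batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


class ListDataInterface:
    """Stand-in that mirrors the method names described above.

    Assumption: in a real workspace this would subclass OpenFL's
    DataInterface; it is standalone here so the sketch is runnable.
    """

    def __init__(self, **kwargs):
        # Keyword arguments are stored and later passed to the loader getters.
        self.kwargs = kwargs

    def _delayed_init(self, data_path):
        # A real implementation would load files found under data_path.
        # Here ten (sample, label) pairs are generated instead.
        self.data_path = data_path
        samples = [(i, i % 2) for i in range(10)]
        self.train_set = samples[:8]
        self.valid_set = samples[8:]

    def get_train_loader(self, **kwargs):
        # Returns whatever the training task expects as its data_loader.
        return _batches(self.train_set, kwargs.get('batch_size', 1))

    def get_valid_loader(self, **kwargs):
        return _batches(self.valid_set, kwargs.get('batch_size', 1))

    def get_train_data_size(self):
        return len(self.train_set)

    def get_valid_data_size(self):
        return len(self.valid_set)


data = ListDataInterface(batch_size=4)
data._delayed_init('data/one')  # hypothetical local data path
train_loader = data.get_train_loader(**data.kwargs)
```

Deferring all loading to ``_delayed_init`` keeps the object cheap to construct and serialize on the machine that defines the experiment, which is the reason the text singles this method out.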
+
+Preparing workspace distribution
+---------------------------------
+Now we may use the :code:`Experiment` API to prepare a workspace archive for transfer to the collaborators' nodes. In order to run a collaborator, we want to replicate the workspace and the Python environment.
+
+Instances of the interface classes :code:`(TaskInterface, DataInterface, ModelInterface)` must be passed to the :code:`FLExperiment.prepare_workspace_distribution()` method along with other parameters.
+
+This method:
+
+* Compiles all provided settings into a Plan object. This is the central place where all actors in the federation look up their parameters.
+* Saves plan.yaml to the :code:`plan/` folder inside the workspace.
+* Serializes interface objects to disk.
+* Prepares :code:`requirements.txt` for remote Python environment setup.
+* Compresses the workspace into an archive so it can be copied to collaborator nodes.
+
+Starting the aggregator
+---------------------------
+
+Once all the previous steps are done, the experiment is ready to start.
+The :code:`FLExperiment.start_experiment()` method requires a :code:`model_interface` object with initialized weights.
+
+It starts a local aggregator that will wait for collaborators to connect.
+
+Starting collaborators
+=======================
+
+The process of starting collaborators has not changed.
+The user must transfer the workspace archive to a remote node and type in the console:
+
+.. code-block:: console
+
+    $ fx workspace import --archive ws.zip
+
+Please note that the aggregator and all the collaborator nodes should have the same Python interpreter version as the machine used for defining the experiment.
+
+Then cd to the workspace and run:
+
+.. code-block:: console
+
+    $ fx collaborator start -d data.yaml -n one
+
+For more details, please refer to the TaskRunner API section.
\ No newline at end of file From 02ac0f6c569da0e4e142efe3cde20e6a81225e7d Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Fri, 13 Aug 2021 18:01:43 +0300 Subject: [PATCH 21/54] Changed titels --- docs/running_the_federation.agg_based.rst | 2 +- docs/source/workflow/director_based_workflow.rst | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/running_the_federation.agg_based.rst b/docs/running_the_federation.agg_based.rst index cb309380a3..b0d976daab 100644 --- a/docs/running_the_federation.agg_based.rst +++ b/docs/running_the_federation.agg_based.rst @@ -4,7 +4,7 @@ .. _running_the_federation_aggregator_based: ********************** -Running the Federation +Aggregator-based workflow. ********************** First make sure you've installed the software :ref:`using these instructions ` diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 8ae329e21d..1b4612e0c0 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -13,9 +13,10 @@ Establishing a long-living Federation with Director 1. Install |productName| ================== -Please, refer to :ref:`_install_software_root` +Make sure that you installed |productName| in your virtual environment. +If not, use the instruction :ref:`install_initial_steps`. -2. Implement Shard Descriptors +1. Implement Shard Descriptors ================== Then the data owners need to implement `Shard Descriptors` Python classes. 
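The Shard Descriptor contract described in this workflow (a subscriptable object with a length, plus accessors for the sample shape, target shape, and a description text) could look roughly like the sketch below. It is a hedged illustration only: the class name ``InMemoryShardDescriptor`` and the attribute names ``sample_shape``, ``target_shape``, and ``dataset_description`` are assumptions made for this example, and the class does not inherit from any OpenFL base class.

```python
class InMemoryShardDescriptor:
    """Hypothetical stand-in for a data owner's Shard Descriptor."""

    def __init__(self, samples, targets, description):
        if len(samples) != len(targets):
            raise ValueError('samples and targets must have the same length')
        self._samples = samples
        self._targets = targets
        self._description = description

    def __getitem__(self, index):
        # Return one (sample, target) pair, as a training loop would expect.
        return self._samples[index], self._targets[index]

    def __len__(self):
        return len(self._samples)

    @property
    def sample_shape(self):
        # Assumed accessor name: shape of a single sample.
        return (len(self._samples[0]),)

    @property
    def target_shape(self):
        # Assumed accessor name: one scalar label per sample.
        return (1,)

    @property
    def dataset_description(self):
        # Assumed accessor name: free text that identifies this participant.
        return self._description


shard = InMemoryShardDescriptor(
    samples=[[1, 2, 3], [4, 5, 6]],
    targets=[0, 1],
    description='toy shard held by envoy env_one',
)
```

A real descriptor would read samples from disk lazily inside ``__getitem__`` rather than holding them in memory.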
From 81c0e6cbeca88cb3ccd584ae1c334d6888c65c39 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Fri, 13 Aug 2021 18:24:24 +0300 Subject: [PATCH 22/54] Added interactive API sections --- docs/running_the_federation.agg_based.rst | 1 - docs/source/workflow/director_based_workflow.rst | 6 ++++-- docs/source/workflow/interactive_python_api.rst | 2 +- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/running_the_federation.agg_based.rst b/docs/running_the_federation.agg_based.rst index b0d976daab..4326c1e7bc 100644 --- a/docs/running_the_federation.agg_based.rst +++ b/docs/running_the_federation.agg_based.rst @@ -23,4 +23,3 @@ The high-level workflow is shown in the figure above. Note that once OpenFL is i running_the_federation.agg_based.docker running_the_federation.agg_based.certificates running_the_federation.agg_based.start_nodes.rst - running_the_federation.director_based.interactive_api diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 1b4612e0c0..67bcb00416 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -62,7 +62,9 @@ that may be used to identify participants during experiment definition and execu -pk cert/"$ENVOY_NAME".key -oc cert/"$ENVOY_NAME".crt -Describing an FL experimnet using Interactive Python API +5. Describing an FL experimnet using Interactive Python API ####################################### -another story \ No newline at end of file +At this point, data scientists may register their experiments to be executed in the federation. +OpenFL provides a separate frontend Director’s client and :ref:`Interactive Python API ` +to register experiments. 
\ No newline at end of file diff --git a/docs/source/workflow/interactive_python_api.rst b/docs/source/workflow/interactive_python_api.rst index 6a99169146..f9ca963738 100644 --- a/docs/source/workflow/interactive_python_api.rst +++ b/docs/source/workflow/interactive_python_api.rst @@ -4,7 +4,7 @@ .. _interactive_api: ######################################################### -Experimental: |productName| Interactive Python API +Beta: |productName| Interactive Python API ######################################################### ********************************* From df9288eefa6c8295877d646eea0ca55f6b999070 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 16 Aug 2021 12:14:26 +0300 Subject: [PATCH 23/54] attempt to structure the new workflow --- docs/openfl.rst | 4 +++- docs/source/workflow/director_based_workflow.rst | 9 +++++---- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/docs/openfl.rst b/docs/openfl.rst index 390b9e4629..0270fc3ef2 100644 --- a/docs/openfl.rst +++ b/docs/openfl.rst @@ -8,4 +8,6 @@ OpenFL structure .. toctree:: :maxdepth: 4 - source/openfl/components \ No newline at end of file + source/openfl/components + source/openfl/communication + source/openfl/plugins \ No newline at end of file diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 67bcb00416..eb012a20ef 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -7,16 +7,17 @@ Director-based workflow ************ +************ Establishing a long-living Federation with Director -####################################### +************ 1. Install |productName| ================== -Make sure that you installed |productName| in your virtual environment. +Make sure that you installed |productName| in your virtual Python environment. If not, use the instruction :ref:`install_initial_steps`. -1. Implement Shard Descriptors +2. 
Implement Shard Descriptors ================== Then the data owners need to implement `Shard Descriptors` Python classes. @@ -63,7 +64,7 @@ that may be used to identify participants during experiment definition and execu 5. Describing an FL experimnet using Interactive Python API -####################################### +==================================== At this point, data scientists may register their experiments to be executed in the federation. OpenFL provides a separate frontend Director’s client and :ref:`Interactive Python API ` From 6e6799ae5d6dc234f767c47ed8adba8c50ac4404 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 16 Aug 2021 12:20:36 +0300 Subject: [PATCH 24/54] attempts #2 --- docs/running_the_federation.director_based.rst | 8 -------- docs/source/workflow/director_based_workflow.rst | 11 +++++++++-- 2 files changed, 9 insertions(+), 10 deletions(-) delete mode 100644 docs/running_the_federation.director_based.rst diff --git a/docs/running_the_federation.director_based.rst b/docs/running_the_federation.director_based.rst deleted file mode 100644 index de16c4ddfd..0000000000 --- a/docs/running_the_federation.director_based.rst +++ /dev/null @@ -1,8 +0,0 @@ -.. # Copyright (C) 2020-2021 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -.. _running_the_federation_director_based: - -********************** -Using Long-Living services -********************** diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index eb012a20ef..769c6a3f8b 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -7,9 +7,16 @@ Director-based workflow ************ -************ +.. toctree:: + :maxdepth: 1 + + establishing_federation_director_ + interactive_python_api + +.. _establishing_federation_director: + Establishing a long-living Federation with Director -************ +####################################### 1. 
Install |productName| ================== From 165a9021e699b86a2379ca6e4444b73de96cd59f Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 16 Aug 2021 12:54:46 +0300 Subject: [PATCH 25/54] restructuring --- docs/source/workflow/director_based_workflow.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 769c6a3f8b..ce0dd8fc4c 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -8,9 +8,9 @@ Director-based workflow ************ .. toctree:: - :maxdepth: 1 + :maxdepth: 2 - establishing_federation_director_ + `Establishing a long-living Federation with Director`_ interactive_python_api .. _establishing_federation_director: From 10d583440d612c4bd0e9d9b0df8a3d551873a045 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 16 Aug 2021 12:58:35 +0300 Subject: [PATCH 26/54] attempt #4 --- docs/source/workflow/director_based_workflow.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index ce0dd8fc4c..f0156e5030 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -7,12 +7,6 @@ Director-based workflow ************ -.. toctree:: - :maxdepth: 2 - - `Establishing a long-living Federation with Director`_ - interactive_python_api - .. _establishing_federation_director: Establishing a long-living Federation with Director @@ -75,4 +69,10 @@ that may be used to identify participants during experiment definition and execu At this point, data scientists may register their experiments to be executed in the federation. OpenFL provides a separate frontend Director’s client and :ref:`Interactive Python API ` -to register experiments. \ No newline at end of file +to register experiments. + +.. 
toctree:: + :maxdepth: 2 + + `Establishing a long-living Federation with Director`_ + interactive_python_api \ No newline at end of file From 04c98f24911684cdc2f106a7ea14f0ca74843763 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 16 Aug 2021 15:17:10 +0300 Subject: [PATCH 27/54] director based workflow merged --- docs/source/utilities/pki.step_ca.rst | 55 +++++- .../workflow/director_based_workflow.rst | 179 +++++++++++++++++- 2 files changed, 226 insertions(+), 8 deletions(-) diff --git a/docs/source/utilities/pki.step_ca.rst b/docs/source/utilities/pki.step_ca.rst index 6168960f0c..87ed027d2c 100644 --- a/docs/source/utilities/pki.step_ca.rst +++ b/docs/source/utilities/pki.step_ca.rst @@ -1,6 +1,55 @@ .. # Copyright (C) 2020-2021 Intel Corporation .. # SPDX-License-Identifier: Apache-2.0 -****** -Semi-automatic PKI -****** \ No newline at end of file +****************************************** +Federation actors certification with Semi-automatic PKI +****************************************** + +If you have trusted workspace and connection should not be encrypted you can use :code:`disable_tls` option while starting experiment. +Otherwise it is necessary to certify each node participating in the federation. Certificates allow to use mutual tls connection between nodes. +You can certify nodes by your own pki system or use pki provided by OpenFL. It is based on `step-ca `_ +as a server and `step `_ as a client utilities. They are downloaded from github during workspace setup. Regardless of the certification method, +paths to certificates on each node are provided at start of experiment. Pki workflow from OpenFL will be discussed below. + +OpenFL PKI workflow +=================== +Openfl PKI pipeline asumes creating local CA with https server which listen signing requests. +Certificates from each node can be signed by requesting to CA server with special token. +Token must be copied to each node by some secure way. Each step is considered in detail below. 
+
+1. Create the CA, i.e. create the root key pair, the CA server config, and other files.
+   .. code-block:: console
+
+      $ fx pki install -p --password <123> --ca-url
+   | :code:`-p` - path to the folder that will contain the CA files.
+   | :code:`--password` - password that will encrypt some of the CA files.
+   | :code:`--ca-url` - host and port on which the CA server will listen.
+   This command will also download the `step-ca `_ and `step `_ binaries from GitHub.
+
+2. Run the CA HTTPS server.
+   .. code-block:: console
+
+      $ fx pki run -p
+   | :code:`-p` - path to the folder that will contain the CA files.
+
+3. Get a token for a node.
+
+   .. code-block:: console
+
+      $ fx pki get-token -n
+   | :code:`-n` - subject name: FQDN for the director, collaborator name for an envoy, or API name for the API-layer node
+
+   Run this command on the CA side, from the CA folder. The output is a token that contains a JWT (JSON Web Token) from the CA server and the CA
+   root certificate concatenated together. This JWT has a time-to-live of twenty-four hours.
+
+4. Copy the token to the node side (director or envoy) over a secure channel and run the certify command.
+   .. code-block:: console
+
+      $ fx pki certify -n -t
+   | :code:`-n` - subject name: FQDN for the director, collaborator name for an envoy, or API name for the API-layer node
+   | :code:`-t` - the token output by the previous command
+   This command calls the step client to connect to the CA server over HTTPS.
+   HTTPS is secured by the root certificate that was copied along with the JWT.
+   The server authenticates the client by the JWT, and the client authenticates the server by the root certificate.
+
+The signed certificate and private key are now stored on the current node. The signed certificate has a time-to-live of one year. You should certify every node that will participate in the federation: the director, all envoys, and the API-layer node.
diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index f0156e5030..9171e55c1e 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -71,8 +71,177 @@ At this point, data scientists may register their experiments to be executed in OpenFL provides a separate frontend Director’s client and :ref:`Interactive Python API ` to register experiments. -.. toctree:: - :maxdepth: 2 - - `Establishing a long-living Federation with Director`_ - interactive_python_api \ No newline at end of file +.. _interactive_api: + +######################################################### +Beta: |productName| Interactive Python API +######################################################### + +********************************* +Python Interactive API Concepts +********************************* + +Workspace +========== +To initialize the workspace, create an empty folder and a Jupyter notebook (or a Python script) inside it. Root folder of the notebook will be considered as the workspace. +If some objects are imported in the notebook from local modules, source code should be kept inside the workspace. +If one decides to keep local test data inside the workspace, :code:`data` folder should be used as it will not be exported. +If one decides to keep certificates inside the workspace, :code:`cert` folder should be used as it will not be exported. +Only relevant source code or resources should be kept inside the workspace, since it will be zipped and transferred to collaborator machines. + +Python Environment +=================== +Create a virtual Python environment. Please, install only packages that are required for conducting the experiment, since Python environment will be replicated on collaborator nodes. 
+
+
+******************************************
+Defining a Federated Learning Experiment
+******************************************
+The Interactive API allows setting up an experiment from a single entry point: a Jupyter notebook or a Python script.
+Defining an experiment includes setting up several interface entities and experiment parameters.
+
+Federation API
+===================
+The *Federation* entity is introduced to register and keep information about collaborators' settings and their local data, as well as network settings to enable communication inside the federation.
+Each federation is bound to some Machine Learning problem, in the sense that all collaborators' dataset shards should follow the same annotation format for all samples. Once you have created a federation, it may be used in several subsequent experiments.
+
+To set up a federation, use the Federation Interactive API.
+
+.. code-block:: python
+
+    from openfl.interface.interactive_api.federation import Federation
+
+The Federation API class should be initialized with the aggregator node FQDN and encryption settings. You may disable mTLS in trusted environments, or provide paths to the CA certificate chain, the aggregator certificate, and the private key to enable mTLS.
+
+.. code-block:: python
+
+    federation = Federation(central_node_fqdn: str, tls: bool, cert_chain: str, agg_certificate: str, agg_private_key: str)
+
+The Federation's :code:`register_collaborators` method should be used to provide information about the collaborators participating in a federation.
+It requires a dictionary object - :code:`{collaborator name : local data path}`.
+
+Experiment API
+===================
+
+The *Experiment* entity allows registering training-related objects, FL tasks, and settings.
+To set up an FL experiment, use the Experiment Interactive API.
+
+.. code-block:: python
+
+    from openfl.interface.interactive_api.experiment import FLExperiment
+
+An *Experiment* is initialized by taking a federation as a parameter.
+
+.. code-block:: python
+
+    fl_experiment = FLExperiment(federation=federation)
+
+To start an experiment, the user must register a *DataLoader*, *Federated Learning tasks*, and a *Model* with an *Optimizer*. There are several supplementary interface classes for these purposes.
+
+.. code-block:: python
+
+    from openfl.interface.interactive_api.experiment import TaskInterface, DataInterface, ModelInterface
+
+Registering model and optimizer
+--------------------------------
+
+First, the user instantiates and initializes a model and an optimizer in their favorite Deep Learning framework. Please note that, for now, the Interactive API supports only *Keras* and *PyTorch* out of the box.
+The initialized model and optimizer objects should then be passed to the :code:`ModelInterface` along with the path to the correct Framework Adapter plugin inside the OpenFL package. If the desired DL framework is not covered by the existing plugins, you can implement the plugin's interface and point :code:`framework_plugin` to the implementation inside the workspace.
+
+.. code-block:: python
+
+    from openfl.interface.interactive_api.experiment import ModelInterface
+    MI = ModelInterface(model=model_unet, optimizer=optimizer_adam, framework_plugin=framework_adapter)
+
+Registering FL tasks
+---------------------
+
+We have an agreement on what we consider to be an FL task.
+The Interactive API currently allows registering only standalone functions defined in the main module or imported from other modules inside the workspace.
+We also have requirements on the task signature. A task should accept the following objects:
+
+1. model - will be rebuilt with relevant weights for every task by `TaskRunner`
+2. :code:`data_loader` - data loader that will provide local data
+3. device - a device to be used for execution on collaborator machines
+4. optimizer (optional) - model optimizer, only for training tasks
+
+Moreover, FL tasks should return a dictionary object with metrics: :code:`{metric name: metric value for this task}`.
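As a hedged illustration of the contract above, the sketch below shows a validation-style task that takes a model, a data loader, and a device, and returns a metrics dictionary. It uses plain Python stand-ins rather than a Keras or PyTorch model; ``validate``, ``threshold_model``, and ``toy_loader`` are hypothetical names chosen for this example, and no OpenFL registration is performed.

```python
def validate(model, data_loader, device):
    """Toy validation task: returns {metric name: metric value}."""
    # `device` is accepted to satisfy the task contract; this toy model
    # runs on plain Python numbers and does not use it.
    correct = 0
    total = 0
    for features, target in data_loader:
        correct += int(model(features) == target)
        total += 1
    return {'accuracy': correct / total}


def threshold_model(features):
    """Stand-in 'model': predicts 1 when the feature sum exceeds 1.0."""
    return int(sum(features) > 1.0)


# Each element is a (features, target) pair, as a data loader might yield.
toy_loader = [
    ([0.2, 0.3], 0),
    ([0.9, 0.8], 1),
    ([0.6, 0.6], 0),
    ([0.1, 0.1], 0),
]

metrics = validate(threshold_model, toy_loader, device='cpu')
```

A training task would additionally accept the optimizer, and would typically report a loss value in the returned dictionary.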
+ +:code:`TaskInterface` class is designed to register tasks and accompanying information. +This class must be instantiated, then its special methods may be used to register tasks. + +.. code-block:: python + + TI = TaskInterface() + + task_settings = { + 'batch_size': 32, + 'some_arg': 228, + } + @TI.add_kwargs(**task_settings) + @TI.register_fl_task(model='my_model', data_loader='train_loader', + device='device', optimizer='my_Adam_opt') + def foo(my_model, train_loader, my_Adam_opt, device, batch_size, some_arg=356): + ... + + +:code:`@TI.register_fl_task()` needs the task's argument names for (model, data_loader, device, optimizer (optional)) that constitute the task's 'contract'. +It adds the callable and the task contract to the task registry. + +:code:`@TI.add_kwargs()` method should be used to set up those arguments that are not included in the contract. + +Registering Federated DataLoader +--------------------------------- + +:code:`DataInterface` is provided to support a remote DataLoader initialization. + +It is initialized with the User Dataset class object, and all the keyword arguments can be used by dataloaders during training or validation. +User must subclass :code:`DataInterface` and implement several methods. + +* :code:`_delayed_init(self, data_path)` is the most important method. It will be called during the collaborator initialization procedure with the relevant :code:`data_path` (one that corresponds to the collaborator name that the user registered in the federation). User Dataset class should be instantiated with local :code:`data_path` here. If the dataset initialization procedure differs for some of the collaborators, the initialization logic must be described here. Dataset sharding procedure for test runs should also be described in this method. User is free to save objects in class fields for later use. +* :code:`get_train_loader(self, **kwargs)` will be called before training tasks execution. 
This method must return whatever the user expects to receive in the training task as the :code:`data_loader` contract argument. :code:`kwargs` dict holds the same information that was provided during :code:`DataInterface` initialization. +* :code:`get_valid_loader(self, **kwargs)` - same as the point above, but for validation data. +* :code:`get_train_data_size(self)` - return number of samples in local train dataset. +* :code:`get_valid_data_size(self)` - return number of samples in local validation dataset. + +Preparing workspace distribution +--------------------------------- +Now we may use :code:`Experiment` API to prepare a workspace archive for transferring to a collaborator's node. In order to run a collaborator, we want to replicate the workspace and the Python environment. + +Instances of interface classes :code:`(TaskInterface, DataInterface, ModelInterface)` must be passed to :code:`FLExperiment.prepare_workspace_distribution()` method along with other parameters. + +This method: + +* Compiles all provided settings to a Plan object. This is the central place where all actors in the federation look up their parameters. +* Saves plan.yaml to the :code:`plan/` folder inside the workspace. +* Serializes interface objects to disk. +* Prepares :code:`requirements.txt` for remote Python environment setup. +* Compresses the workspace to an archive so it can be copied to collaborator nodes. + +Starting the aggregator +--------------------------- + +Once all previous steps are done, the experiment is ready to start. +:code:`FLExperiment.start_experiment()` method requires :code:`model_interface` object with initialized weights. + +It starts a local aggregator that will wait for collaborators to connect. + +Starting collaborators +======================= + +The process of starting collaborators has not changed. +User must transfer the workspace archive to a remote node and type in console: + +.. 
code-block:: python + + fx workspace import --archive ws.zip + +Please, note that aggregator and all the collaborator nodes should have the same Python interpreter version as the machine used for defining the experiment. + +then cd to the workspace and run + +.. code-block:: python + + fx collaborator start -d data.yaml -n one + +For more details, please refer to the TaskRunner API section. \ No newline at end of file From 3062683da3b0504a6abec4d2f2b3bee914cb5e93 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 16 Aug 2021 15:24:53 +0300 Subject: [PATCH 28/54] merging #2 --- .../workflow/director_based_workflow.rst | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 9171e55c1e..d5acf0ebe8 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -3,9 +3,9 @@ .. _director_workflow: -************ +************************ Director-based workflow -************ +************************ .. _establishing_federation_director: @@ -71,18 +71,17 @@ At this point, data scientists may register their experiments to be executed in OpenFL provides a separate frontend Director’s client and :ref:`Interactive Python API ` to register experiments. + .. _interactive_api: -######################################################### Beta: |productName| Interactive Python API -######################################################### +####################################### -********************************* Python Interactive API Concepts -********************************* +=============================== Workspace -========== +---------- To initialize the workspace, create an empty folder and a Jupyter notebook (or a Python script) inside it. Root folder of the notebook will be considered as the workspace. 
If some objects are imported in the notebook from local modules, source code should be kept inside the workspace. If one decides to keep local test data inside the workspace, :code:`data` folder should be used as it will not be exported. @@ -90,18 +89,19 @@ If one decides to keep certificates inside the workspace, :code:`cert` folder sh Only relevant source code or resources should be kept inside the workspace, since it will be zipped and transferred to collaborator machines. Python Environment -=================== +--------------------- Create a virtual Python environment. Please, install only packages that are required for conducting the experiment, since Python environment will be replicated on collaborator nodes. -****************************************** + Defining a Federated Learning Experiment -****************************************** +======================================== + Interactive API allows setting up an experiment from a single entrypoint - a Jupyter notebook or a Python script. Defining an experiment includes setting up several interface entities and experiment parameters. Federation API -=================== +---------------- *Federation* entity is introduced to register and keep information about collaborators settings and their local data, as well as network settings to enable communication inside the federation. Each federation is bound to some Machine Learning problem in a sense that all collaborators dataset shards should follow the same annotation format for all samples. Once you created a federation, it may be used in several subsequent experiments. @@ -121,7 +121,7 @@ Federation's :code:`register_collaborators` method should be used to provide an It requires a dictionary object - :code:`{collaborator name : local data path}`. Experiment API -=================== +---------------- *Experiment* entity allows registering training related objects, FL tasks and settings. 
To set up an FL experiment someone should use the Experiment interactive API. From f2b38df8f87d0e6a1d206e634dedd6deb1b9ae99 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 16 Aug 2021 16:12:25 +0300 Subject: [PATCH 29/54] fill director workflow --- docs/source/utilities/pki.step_ca.rst | 2 + .../workflow/director_based_workflow.rst | 30 ++- .../workflow/interactive_python_api.rst | 228 ------------------ 3 files changed, 28 insertions(+), 232 deletions(-) delete mode 100644 docs/source/workflow/interactive_python_api.rst diff --git a/docs/source/utilities/pki.step_ca.rst b/docs/source/utilities/pki.step_ca.rst index 87ed027d2c..621df07cc6 100644 --- a/docs/source/utilities/pki.step_ca.rst +++ b/docs/source/utilities/pki.step_ca.rst @@ -1,6 +1,8 @@ .. # Copyright (C) 2020-2021 Intel Corporation .. # SPDX-License-Identifier: Apache-2.0 +.. _semi_automatic_certification: + ****************************************** Federation actors certification with Semi-automatic PKI ****************************************** diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index d5acf0ebe8..e06978e1b5 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -7,6 +7,13 @@ Director-based workflow ************************ +.. toctree:: + :maxdepth: 2 + + establishing_federation_director_ + interactive_api_ + + .. _establishing_federation_director: Establishing a long-living Federation with Director @@ -26,13 +33,28 @@ Then the data owners need to implement `Shard Descriptors` Python classes. |productName| framework provides a ‘Shard descriptor’ interface that should be described on every collaborator node to provide a unified data interface for FL experiments. Abstract “Shard descriptor” should be subclassed and all its methods should be implemented to describe the way data samples and labels will be loaded from disk -during training. 
Shard descriptor is a subscriptable object that implements `__getitem__()` and `len()` methods +during training. Shard descriptor is a subscriptable object that implements :code:`__getitem__()` and :code:`len()` methods as well as several additional methods to access ‘sample shape’, ‘target shape’, and ‘shard description’ text that may be used to identify participants during experiment definition and execution. -3. Start Director +3. (Optional) Obtain certificates using Step-CA +================== + +All communications inside a Federation may be protected with mTLS. User may use certificates provided by their organization +or utilize :ref:`PKI ` provided by |productName|. + +4. Start Director ================== +Create Director workspace +------------------- + +Tune Director config +------------------- + +Use CLI to start Director +------------------- + .. code-block:: console $ fx director start --disable-tls -c director_config.yaml @@ -45,7 +67,7 @@ that may be used to identify participants during experiment definition and execu -pk cert/"${FQDN}".key \ -oc cert/"${FQDN}".crt -4. Start Envoys +5. Start Envoys ================== .. code-block:: console @@ -64,7 +86,7 @@ that may be used to identify participants during experiment definition and execu -pk cert/"$ENVOY_NAME".key -oc cert/"$ENVOY_NAME".crt -5. Describing an FL experimnet using Interactive Python API +6. Describing an FL experimnet using Interactive Python API ==================================== At this point, data scientists may register their experiments to be executed in the federation. diff --git a/docs/source/workflow/interactive_python_api.rst b/docs/source/workflow/interactive_python_api.rst deleted file mode 100644 index f9ca963738..0000000000 --- a/docs/source/workflow/interactive_python_api.rst +++ /dev/null @@ -1,228 +0,0 @@ -.. # Copyright (C) 2020-2021 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -.. 
_interactive_api: - -######################################################### -Beta: |productName| Interactive Python API -######################################################### - -********************************* -Python Interactive API Concepts -********************************* - -Workspace -========== -To initialize the workspace, create an empty folder and a Jupyter notebook (or a Python script) inside it. Root folder of the notebook will be considered as the workspace. -If some objects are imported in the notebook from local modules, source code should be kept inside the workspace. -If one decides to keep local test data inside the workspace, :code:`data` folder should be used as it will not be exported. -If one decides to keep certificates inside the workspace, :code:`cert` folder should be used as it will not be exported. -Only relevant source code or resources should be kept inside the workspace, since it will be zipped and transferred to collaborator machines. - -Python Environment -=================== -Create a virtual Python environment. Please, install only packages that are required for conducting the experiment, since Python environment will be replicated on collaborator nodes. - -****************************************** -Certification -****************************************** -If you have trusted workspace and connection should not be encrypted you can use :code:`disable_tls` option while starting experiment. -Otherwise it is necessary to certify each node participating in the federation. Certificates allow to use mutual tls connection between nodes. -You can certify nodes by your own pki system or use pki provided by OpenFL. It is based on `step-ca `_ -as a server and `step `_ as a client utilities. They are downloaded from github during workspace setup. Regardless of the certification method, -paths to certificates on each node are provided at start of experiment. Pki workflow from OpenFL will be discussed below. 
- -OpenFL PKI workflow -=================== -Openfl PKI pipeline asumes creating local CA with https server which listen signing requests. -Certificates from each node can be signed by requesting to CA server with special token. -Token must be copied to each node by some secure way. Each step is considered in detail below. - -1. Create CA, i.e create root key pair, CA server config and other. - .. code-block:: console - - $ fx pki install -p --password <123> --ca-url - | :code:`-p` - path to folder, which will contain ca files. - | :code:`--password` - password that will encrypts some ca files. - | :code:`--ca-url` - host and port which ca server will listen - This command will also download `step-ca `_ and `step `_ binaries from github. - -2. Run CA https server. - .. code-block:: console - - $ fx pki run -p - | :code:`-p` - path to folder, which will contain ca files. - -3. Get token for some node. - - .. code-block:: console - - $ fx pki get-token -n - | :code:`-n` - subject name, fqdn for director, collaborator name for envoy or api name for api-layer node - - Run this command on ca side, from ca folder. Output is a token which contains JWT (json web token) from CA server and CA - root certificate concatenated together. This JWT have twenty-four hours time-to-live. - -4. Copy token to node side (director or envoy) by some secure channel and run certify command. - .. code-block:: console - - $ fx pki certify -n -t - | :code:`-n` - subject name, fqdn for director, collaborator name for envoy or api name for api-layer node - | :code:`-t` - output token from previous command - This command call step client, to connect to CA server over https. - Https is provided by root certificate which was copy with JWT. - Server authenticates client by JWT and client authenticates server by root certificate. - -Now signed certificate and private key are stored on current node. Signed certificate has one year time-to-live. 
You should certify all node that will participate in federation: director, all envoys and api-layer node. - -****************************************** -Defining a Federated Learning Experiment -****************************************** -Interactive API allows setting up an experiment from a single entrypoint - a Jupyter notebook or a Python script. -Defining an experiment includes setting up several interface entities and experiment parameters. - -Federation API -=================== -*Federation* entity is introduced to register and keep information about collaborators settings and their local data, as well as network settings to enable communication inside the federation. -Each federation is bound to some Machine Learning problem in a sense that all collaborators dataset shards should follow the same annotation format for all samples. Once you created a federation, it may be used in several subsequent experiments. - -To set up a federation, use Federation Interactive API. - -.. code-block:: python - - from openfl.interface.interactive_api.federation import Federation - -Federation API class should be initialized with the aggregator node FQDN and encryption settings. Someone may disable mTLS in trusted environments or provide paths to the certificate chain of CA, aggregator certificate and private key to enable mTLS. - -.. code-block:: python - - federation = Federation(central_node_fqdn: str, tls: bool, cert_chain: str, agg_certificate: str, agg_private_key: str) - -Federation's :code:`register_collaborators` method should be used to provide an information about collaborators participating in a federation. -It requires a dictionary object - :code:`{collaborator name : local data path}`. - -Experiment API -=================== - -*Experiment* entity allows registering training related objects, FL tasks and settings. -To set up an FL experiment someone should use the Experiment interactive API. - -.. 
code-block:: python - - from openfl.interface.interactive_api.experiment import FLExperiment - -*Experiment* is being initialized by taking federation as a parameter. - -.. code-block:: python - - fl_experiment = FLExperiment(federation=federation) - -To start an experiment user must register *DataLoader*, *Federated Learning tasks* and *Model* with *Optimizer*. There are several supplementary interface classes for these purposes. - -.. code-block:: python - - from openfl.interface.interactive_api.experiment import TaskInterface, DataInterface, ModelInterface - -Registering model and optimizer --------------------------------- - -First, user instantiate and initilize a model and optimizer in their favorite Deep Learning framework. Please, note that for now interactive API supports only *Keras* and *PyTorch* off-the-shelf. -Initialized model and optimizer objects then should be passed to the :code:`ModelInterface` along with the path to correct Framework Adapter plugin inside OpenFL package. If desired DL framework is not covered by existing plugins, someone can implement the plugin's interface and point :code:`framework_plugin` to the implementation inside the workspace. - -.. code-block:: python - - from openfl.interface.interactive_api.experiment import ModelInterface - MI = ModelInterface(model=model_unet, optimizer=optimizer_adam, framework_plugin=framework_adapter) - -Registering FL tasks ---------------------- - -We have an agreement on what we consider to be a FL task. -Interactive API currently allows registering only standalone functions defined in the main module or imported from other modules inside the workspace. -We also have requirements on task signature. Task should accept the following objects: - -1. model - will be rebuilt with relevant weights for every task by `TaskRunner` -2. :code:`data_loader` - data loader that will provide local data -3. device - a device to be used for execution on collaborator machines -4. 
optimizer (optional) - model optimizer, only for training tasks - -Moreover FL tasks should return a dictionary object with metrics :code:`{metric name: metric value for this task}`. - -:code:`Task Interface` class is designed to register task and accompanying information. -This class must be instantiated, then it's special methods may be used to register tasks. - -.. code-block:: python - - TI = TaskInterface() - - task_settings = { - 'batch_size': 32, - 'some_arg': 228, - } - @TI.add_kwargs(**task_settings) - @TI.register_fl_task(model='my_model', data_loader='train_loader', - device='device', optimizer='my_Adam_opt') - def foo(my_model, train_loader, my_Adam_opt, device, batch_size, some_arg=356) - ... - - -:code:`@TI.register_fl_task()` needs tasks argument names for (model, data_loader, device, optimizer (optional)) that constitute tasks 'contract'. -It adds the callable and the task contract to the task registry. - -:code:`@TI.add_kwargs()` method should be used to set up those arguments that are not included in the contract. - -Registering Federated DataLoader ---------------------------------- - -:code:`DataInterface` is provided to support a remote DataLoader initialization. - -It is initialized with User Dataset class object and all the keyword arguments can be used by dataloaders during training or validation. -User must subclass :code:`DataInterface` and implements several methods. - -* :code:`_delayed_init(self, data_path)` is the most important method. It will be called during collaborator initialization procedure with relevant :code:`data_path` (one that corresponds to the collaborator name that user registered in federation). User Dataset class should be instantiated with local :code:`data_path` here. If dataset initalization procedure differs for some of the collaborators, the initialization logic must be described here. Dataset sharding procedure for test runs should also be described in this method. 
User is free to save objects in class fields for later use. -* :code:`get_train_loader(self, **kwargs)` will be called before training tasks execution. This method must return anything user expects to recieve in the training task with :code:`data_loader` contract argument. :code:`kwargs` dict holds the same information that was provided during :code:`DataInterface` initialization. -* :code:`get_valid_loader(self, **kwargs)` - see the point above only with validation data -* :code:`get_train_data_size(self)` - return number of samples in local train dataset. -* :code:`get_valid_data_size(self)` - return number of samples in local validation dataset. - -Preparing workspace distribution ---------------------------------- -Now we may use :code:`Experiment` API to prepare a workspace archive for transferring to collaborator's node. In order to run a collaborator, we want to replicate the workspace and the Python environment. - -Instances of interface classes :code:`(TaskInterface, DataInterface, ModelInterface)` must be passed to :code:`FLExperiment.prepare_workspace_distribution()` method along with other parameters. - -This method: - -* Compiles all provided setings to a Plan object. This is the central place where all actors in federation look up their parameters. -* Saves plan.yaml to the :code:`plan/` folder inside the workspace. -* Serializes interface objects on the disk. -* Prepares :code:`requirements.txt` for remote Python environment setup. -* Compressess the workspace to an archive so it can be coppied to collaborator nodes. - -Starting the aggregator ---------------------------- - -As all previous steps done, the experiment is ready to start -:code:`FLExperiment.start_experiment()` method requires :code:`model_interface` object with initialized weights. - -It starts a local aggregator that will wait for collaborators to connect. - -Starting collaborators -======================= - -The process of starting collaborators has not changed. 
-User must transfer the workspace archive to a remote node and type in console: - -.. code-block:: python - - fx workspace import --archive ws.zip - -Please, note that aggregator and all the collaborator nodes should have the same Python interpreter version as the machine used for defining the experiment. - -then cd to the workspace and run - -.. code-block:: python - - fx collaborator start -d data.yaml -n one - -For more details, please refer to the TaskRunner API section. \ No newline at end of file From 0aad5caac5b370bc469584c498f8244d010391e0 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 17 Aug 2021 13:14:20 +0300 Subject: [PATCH 30/54] director starting procedure --- .../workflow/director_based_workflow.rst | 30 +++++++++++++++---- 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index e06978e1b5..adcf8ea310 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -37,37 +37,55 @@ during training. Shard descriptor is a subscriptable object that implements :cod as well as several additional methods to access ‘sample shape’, ‘target shape’, and ‘shard description’ text that may be used to identify participants during experiment definition and execution. -3. (Optional) Obtain certificates using Step-CA +3. (Optional) Create certificates using Step-CA ================== -All communications inside a Federation may be protected with mTLS. User may use certificates provided by their organization +All communications inside a Federation may be encrypted with mTLS. User may adapt certificates provided by their organization or utilize :ref:`PKI ` provided by |productName|. 4. Start Director ================== +Director is a central component in the Federation. It should be started on a node with at least one open port. 
+Learn more about the Director component here: :ref:`openfl_ll_components` + Create Director workspace ------------------- +Director requires a folder to operate in. Received experiments will be deployed in this folder. +Moreover, supplementary files like Director's config files and certificates may be stored in this folder. +One may use a CLI command to create a structured workspace for Director with a default config file. + + .. code-block:: console + + $ fx director create-workspace -p director_ws + Tune Director config ------------------- +Director should be started from a config file. A basic config file should contain the Director's node FQDN, an open port, +and :code:`sample_shape` and :code:`target_shape` fields with a string representation of the unified data interface in the Federation. +It may also contain paths to certificates. + Use CLI to start Director ------------------- +When the Director's config has been set up, one may use CLI to start the Director. Without mTLS protection: + .. code-block:: console $ fx director start --disable-tls -c director_config.yaml +In the case of a certified Federation: + .. code-block:: console - $ FQDN=$1 $ fx director start -c director_config.yaml \ -rc cert/root_ca.crt \ - -pk cert/"${FQDN}".key \ - -oc cert/"${FQDN}".crt + -pk cert/priv.key \ + -oc cert/open.crt -5. Start Envoys +1. Start Envoys ================== .. code-block:: console From edb36a01bc8f7911ecdd8ee78724b66ea6cc2a79 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 17 Aug 2021 13:55:50 +0300 Subject: [PATCH 31/54] envoy starting procedure described --- docs/source/openfl/components.rst | 4 +-- .../workflow/director_based_workflow.rst | 35 ++++++++++++++++--- 2 files changed, 33 insertions(+), 6 deletions(-) diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index c78ea9a858..abadea0b59 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -16,7 +16,7 @@ .. 
_openfl_spawning_components: -Spawning +Spawning components ########## Aggregator @@ -35,7 +35,7 @@ Collaborator instance is created by Envoy (described below) when a new experimen .. _openfl_ll_components: -Long-living +Long-living components ############# Director diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index adcf8ea310..d44d8d65d4 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -85,22 +85,49 @@ In the case of a certified Federation: -pk cert/priv.key \ -oc cert/open.crt -1. Start Envoys +5. Start Envoys ================== +Envoys are |productName|'s 'agents' on collaborator nodes that may receive an experiment archive and provide +access to local data. +When started, Envoy will try to connect to the Director. + +Create Envoy workspace +------------------- + +The Envoy component also requires a folder to operate in. Use the following CLI command to create a workspace +with a convenient folder structure, a default Envoy config and a Shard Descriptor Python script: + + .. code-block:: console + + $ fx envoy create-workspace -p envoy_ws + +Set up Envoy's config +------------------- + +Unlike Director’s config, the one for Envoy should contain settings for the local Shard Descriptor. +The template field must be filled with the address of the local Shard Descriptor class, and the settings field +should list arbitrary settings required to initialize the Shard Descriptor. + +Use CLI to start Envoy +------------------- + +To start the Envoy without mTLS use the following CLI command: + .. code-block:: console $ fx envoy start -n env_one --disable-tls \ --shard-config-path shard_config.yaml -d director_fqdn:port +Alternatively, use the following command to establish a secured connection: + .. 
code-block:: console - $ ENVOY_NAME=$1 - $ DIRECTOR_FQDN=$2 + $ ENVOY_NAME=envoy_example_name $ fx envoy start -n "$ENVOY_NAME" \ --shard-config-path shard_config.yaml \ - -d "$DIRECTOR_FQDN":50051 -rc cert/root_ca.crt \ + -d director_fqdn:port -rc cert/root_ca.crt \ -pk cert/"$ENVOY_NAME".key -oc cert/"$ENVOY_NAME".crt From c6377fe08f21bc30497eb3285602dbc469f3657a Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 17 Aug 2021 14:01:02 +0300 Subject: [PATCH 32/54] docs gitignore change --- docs/.gitignore | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/.gitignore b/docs/.gitignore index 05b595d74c..23592ba08e 100644 --- a/docs/.gitignore +++ b/docs/.gitignore @@ -1,4 +1,5 @@ -openfl* -models* +# openfl* +# models* +# data* /_build **/.ipynb_checkpoints \ No newline at end of file From 6e219aaf4bbce382ddc0b24ba94f86f7078d5f7b Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 17 Aug 2021 14:09:07 +0300 Subject: [PATCH 33/54] Fixing the python API section --- docs/source/workflow/director_based_workflow.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index d44d8d65d4..ce83cb1d09 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -144,6 +144,10 @@ to register experiments. Beta: |productName| Interactive Python API ####################################### +The |productName| Python Interactive API should help data scientists to adapt single node training code for +running in the FL manner. The process of defining an FL experimnent is fully decopupled from establishing +a Federation. Everything that a data scientist needs to prepare an experiment is a Python interpreter and access to the Director. 
+ Python Interactive API Concepts =============================== From 71c7f184364f7ce4ebbf6f185be6f2a59dfd4e8c Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 17 Aug 2021 14:12:59 +0300 Subject: [PATCH 34/54] a few typo fixes --- docs/source/workflow/director_based_workflow.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index ce83cb1d09..374221f5ea 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -145,8 +145,8 @@ Beta: |productName| Interactive Python API ####################################### The |productName| Python Interactive API should help data scientists to adapt single node training code for -running in the FL manner. The process of defining an FL experimnent is fully decopupled from establishing -a Federation. Everything that a data scientist needs to prepare an experiment is a Python interpreter and access to the Director. +running in the FL manner. The process of defining an FL experiment is fully decoupled from the routine of +establishing a Federation. Everything that a data scientist needs to prepare an experiment is a Python interpreter and access to the Director. 
Python Interactive API Concepts =============================== From 1fb71e28ff515da291a7275bc27837c16918b991 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 17 Aug 2021 21:50:58 +0300 Subject: [PATCH 35/54] fixing python api docs --- .../workflow/director_based_workflow.rst | 41 ++++++++++++------- 1 file changed, 27 insertions(+), 14 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 374221f5ea..b33c865750 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -173,8 +173,11 @@ Defining an experiment includes setting up several interface entities and experi Federation API ---------------- -*Federation* entity is introduced to register and keep information about collaborators settings and their local data, as well as network settings to enable communication inside the federation. -Each federation is bound to some Machine Learning problem in a sense that all collaborators dataset shards should follow the same annotation format for all samples. Once you created a federation, it may be used in several subsequent experiments. +*Federation* entity is introduced to register and keep information about collaborators settings and their local data, +as well as network settings to enable communication inside the federation. +Each federation is bound to some Machine Learning problem in a sense that all collaborators dataset shards should +follow the same annotation format for all samples. Once you created a federation, it may be used in several +subsequent experiments. To set up a federation, use Federation Interactive API. @@ -182,14 +185,18 @@ To set up a federation, use Federation Interactive API. from openfl.interface.interactive_api.federation import Federation -Federation API class should be initialized with the aggregator node FQDN and encryption settings. 
Someone may disable mTLS in trusted environments or provide paths to the certificate chain of CA, aggregator certificate and private key to enable mTLS. +Federation API class should be initialized with the aggregator node FQDN and encryption settings. User may disable mTLS in trusted environments or provide paths to the certificate chain of CA, aggregator certificate and private key to enable mTLS. .. code-block:: python - federation = Federation(central_node_fqdn: str, tls: bool, cert_chain: str, agg_certificate: str, agg_private_key: str) + federation = Federation( + client_id: str, director_node_fqdn: str, director_port: str + tls: bool, ca_cert_chain: str, cert: str, private_key: str) -Federation's :code:`register_collaborators` method should be used to provide an information about collaborators participating in a federation. -It requires a dictionary object - :code:`{collaborator name : local data path}`. +* Federation's :code:`get_dummy_shard_descriptor` method should be used to create a fummy Shard Descriptor +that fakes access to real data. It may be used for debugging the user's experiment pipeline. +* Federation's :code:`get_shard_registry` method returns information about the envoys connected to the Director +and their Shard Descriptors. Experiment API ---------------- @@ -201,13 +208,14 @@ To set up an FL experiment someone should use the Experiment interactive API. from openfl.interface.interactive_api.experiment import FLExperiment -*Experiment* is being initialized by taking federation as a parameter. +*Experiment* is being initialized by taking a Federation object and the experiment name as parameters. .. code-block:: python - fl_experiment = FLExperiment(federation=federation) + fl_experiment = FLExperiment(federation: Federation, experiment_name: str) -To start an experiment user must register *DataLoader*, *Federated Learning tasks* and *Model* with *Optimizer*. There are several supplementary interface classes for these purposes. 
+To start an experiment user must register *DataLoader*, *Federated Learning tasks* and *Model* with *Optimizer*. +There are several supplementary interface classes for these entities. .. code-block:: python @@ -216,19 +224,24 @@ To start an experiment user must register *DataLoader*, *Federated Learning task Registering model and optimizer -------------------------------- -First, user instantiate and initilize a model and optimizer in their favorite Deep Learning framework. Please, note that for now interactive API supports only *Keras* and *PyTorch* off-the-shelf. -Initialized model and optimizer objects then should be passed to the :code:`ModelInterface` along with the path to correct Framework Adapter plugin inside OpenFL package. If desired DL framework is not covered by existing plugins, someone can implement the plugin's interface and point :code:`framework_plugin` to the implementation inside the workspace. +First, user instantiate and initilize a model and optimizer in their favorite Deep Learning framework. +Please, note that for now interactive API supports only *Keras* and *PyTorch* off-the-shelf. +Initialized model and optimizer objects then should be passed to the :code:`ModelInterface` along with the +path to correct Framework Adapter plugin inside OpenFL package. If desired DL framework is not covered by +existing plugins, user can implement the plugin's interface and point :code:`framework_plugin` to the implementation +inside the workspace. .. code-block:: python from openfl.interface.interactive_api.experiment import ModelInterface - MI = ModelInterface(model=model_unet, optimizer=optimizer_adam, framework_plugin=framework_adapter) + MI = ModelInterface(model, optimizer, framework_plugin: str) Registering FL tasks --------------------- -We have an agreement on what we consider to be a FL task. -Interactive API currently allows registering only standalone functions defined in the main module or imported from other modules inside the workspace. 
+|productName| has a specific concept of an FL task. +Interactive API currently allows registering only standalone functions defined in the main module or +imported from other modules inside the workspace. We also have requirements on task signature. Task should accept the following objects: 1. model - will be rebuilt with relevant weights for every task by `TaskRunner` From 1b44a3f0140139c91cca14b12b5b6115eacfc684 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Wed, 18 Aug 2021 13:08:32 +0300 Subject: [PATCH 36/54] interactive api section changes --- .../workflow/director_based_workflow.rst | 26 +++++++++++++------ 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index b33c865750..f27df622ce 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -193,10 +193,10 @@ Federation API class should be initialized with the aggregator node FQDN and enc client_id: str, director_node_fqdn: str, director_port: str tls: bool, ca_cert_chain: str, cert: str, private_key: str) -* Federation's :code:`get_dummy_shard_descriptor` method should be used to create a fummy Shard Descriptor -that fakes access to real data. It may be used for debugging the user's experiment pipeline. +* Federation's :code:`get_dummy_shard_descriptor` method should be used to create a fummy Shard Descriptor that + fakes access to real data. It may be used for debugging the user's experiment pipeline. * Federation's :code:`get_shard_registry` method returns information about the envoys connected to the Director -and their Shard Descriptors. + and their Shard Descriptors. Experiment API ---------------- @@ -277,17 +277,27 @@ It adds the callable and the task contract to the task registry. 
Registering Federated DataLoader --------------------------------- -:code:`DataInterface` is provided to support a remote DataLoader initialization. +:code:`DataInterface` is provided to support seamless remote data adaption. -It is initialized with User Dataset class object and all the keyword arguments can be used by dataloaders during training or validation. -User must subclass :code:`DataInterface` and implements several methods. +As *Shard Descriptor's* responsibilities are reading and formating the local data, *DataLoader* is expected to +contain batching and augmenting data logic, common for all collaborators. + +User must subclass :code:`DataInterface` and implement the following methods: -* :code:`_delayed_init(self, data_path)` is the most important method. It will be called during collaborator initialization procedure with relevant :code:`data_path` (one that corresponds to the collaborator name that user registered in federation). User Dataset class should be instantiated with local :code:`data_path` here. If dataset initalization procedure differs for some of the collaborators, the initialization logic must be described here. Dataset sharding procedure for test runs should also be described in this method. User is free to save objects in class fields for later use. +* :code:`_delayed_init(self, data_path)` is the most important method. It will be called during collaborator + initialization procedure with relevant :code:`data_path` (one that corresponds to the collaborator name that + user registered in federation). User Dataset class should be instantiated with local :code:`data_path` here. + If dataset initalization procedure differs for some of the collaborators, the initialization logic must be + described here. Dataset sharding procedure for test runs should also be described in this method. User is free + to save objects in class fields for later use. * :code:`get_train_loader(self, **kwargs)` will be called before training tasks execution. 
This method must return anything user expects to recieve in the training task with :code:`data_loader` contract argument. :code:`kwargs` dict holds the same information that was provided during :code:`DataInterface` initialization. -* :code:`get_valid_loader(self, **kwargs)` - see the point above only with validation data +* :code:`get_valid_loader(self, **kwargs)` - see the point above (just replace training with validation) * :code:`get_train_data_size(self)` - return number of samples in local train dataset. * :code:`get_valid_data_size(self)` - return number of samples in local validation dataset. +It is initialized with User Dataset class object and all the keyword arguments can be used by dataloaders during training or validation. + + Preparing workspace distribution --------------------------------- Now we may use :code:`Experiment` API to prepare a workspace archive for transferring to collaborator's node. In order to run a collaborator, we want to replicate the workspace and the Python environment. From 343b9fedb5213827663242c71341b146ffeea832 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Wed, 18 Aug 2021 20:07:58 +0300 Subject: [PATCH 37/54] finished editing director workflow --- .../workflow/director_based_workflow.rst | 95 +++++++++++-------- 1 file changed, 58 insertions(+), 37 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index f27df622ce..d557139af7 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -279,63 +279,84 @@ Registering Federated DataLoader :code:`DataInterface` is provided to support seamless remote data adaption. 
-As *Shard Descriptor's* responsibilities are reading and formating the local data, *DataLoader* is expected to +As the *Shard Descriptor's* responsibilities are reading and formating the local data, the *DataLoader* is expected to contain batching and augmenting data logic, common for all collaborators. User must subclass :code:`DataInterface` and implement the following methods: -* :code:`_delayed_init(self, data_path)` is the most important method. It will be called during collaborator - initialization procedure with relevant :code:`data_path` (one that corresponds to the collaborator name that - user registered in federation). User Dataset class should be instantiated with local :code:`data_path` here. - If dataset initalization procedure differs for some of the collaborators, the initialization logic must be - described here. Dataset sharding procedure for test runs should also be described in this method. User is free - to save objects in class fields for later use. +.. code-block:: python + + class CustomDataLoader(DataInterface): + def __init__(self, **kwargs): + # Initialize superclass with kwargs: this array will be passed + # to get_data_loader methods + super().__init__(**kwargs) + # Set up augmentation, save required parameters, + # use it as you regular dataset class + validation_fraction = kwargs.get('validation_fraction', 0.5) + ... 
+ + @property + def shard_descriptor(self): + return self._shard_descriptor + + @shard_descriptor.setter + def shard_descriptor(self, shard_descriptor): + self._shard_descriptor = shard_descriptor + # You can implement data splitting logic here + # Or update your data set according to local Shard Descriptor atributes if required + + def get_train_loader(self, **kwargs): + # these are the same kwargs you provided to __init__, + # But passed on a collaborator machine + bs = kwargs.get('train_batch_size', 32) + return foo_loader() + + # so on, see the full list of methods below + +* Shard Descriptor setter and getter methods: + :code:`shard_descriptor(self, shard_descriptor)` setter is the most important method. It will be called during collaborator + initialization procedure with the local Shard Descriptor. Any logic that is triggered with the Shard Descriptor replacement + must be also put here. * :code:`get_train_loader(self, **kwargs)` will be called before training tasks execution. This method must return anything user expects to recieve in the training task with :code:`data_loader` contract argument. :code:`kwargs` dict holds the same information that was provided during :code:`DataInterface` initialization. * :code:`get_valid_loader(self, **kwargs)` - see the point above (just replace training with validation) -* :code:`get_train_data_size(self)` - return number of samples in local train dataset. +* :code:`get_train_data_size(self)` - return number of samples in local train dataset. Use the information provided by Shard Descriptor, take into account you train / validation split. * :code:`get_valid_data_size(self)` - return number of samples in local validation dataset. -It is initialized with User Dataset class object and all the keyword arguments can be used by dataloaders during training or validation. +User Dataset class should be instantiated to pass futher to the *Experiment* object. 
Dummy *Shard Descriptor* +(or custom local one) may be set up to test the augmentation or batching pipeline. +Keyword arguments used during initialization on the frontend node may be used during dataloaders construction on collaborator machines. -Preparing workspace distribution ---------------------------------- -Now we may use :code:`Experiment` API to prepare a workspace archive for transferring to collaborator's node. In order to run a collaborator, we want to replicate the workspace and the Python environment. +Starting an FL experiment +======================================== +Now we may use :code:`Experiment` API to prepare a workspace archive for transferring to the *Director*. In order to run *Collaborators*, we want to replicate the workspace and the Python environment +on remote machines. -Instances of interface classes :code:`(TaskInterface, DataInterface, ModelInterface)` must be passed to :code:`FLExperiment.prepare_workspace_distribution()` method along with other parameters. +Instances of interface classes :code:`(TaskInterface, DataInterface, ModelInterface)` must be passed to :code:`FLExperiment.start()` method along with other parameters. This method: -* Compiles all provided setings to a Plan object. This is the central place where all actors in federation look up their parameters. +* Compiles all provided setings to a Plan object. Plan is the central place where all actors in federation look up their parameters. * Saves plan.yaml to the :code:`plan/` folder inside the workspace. * Serializes interface objects on the disk. * Prepares :code:`requirements.txt` for remote Python environment setup. -* Compressess the workspace to an archive so it can be coppied to collaborator nodes. +* Compressess the whole workspace to an archive. +* Sends the experiment archive to the Director so it may distribute the archive across the Federation and start the *Aggregator*. 
-Starting the aggregator ---------------------------- - -As all previous steps done, the experiment is ready to start -:code:`FLExperiment.start_experiment()` method requires :code:`model_interface` object with initialized weights. - -It starts a local aggregator that will wait for collaborators to connect. +Observing the Experiment execution +---------------------------------- -Starting collaborators -======================= +If the Experiment was accepted by the *Director* user can oversee its execution with +:code:`Flexperiment.stream_metrics()` method that will is able to print metrics from the FL tasks (and save tensorboard logs). -The process of starting collaborators has not changed. -User must transfer the workspace archive to a remote node and type in console: +When the Experiment is finished, user may retrieve trained models in the native format using :code:`Flexperiment.get_best_model()` +and :code:`Flexperiment.get_last_model()` metods. -.. code-block:: python - - fx workspace import --archive ws.zip - -Please, note that aggregator and all the collaborator nodes should have the same Python interpreter version as the machine used for defining the experiment. - -then cd to the workspace and run - -.. code-block:: python +:code:`Flexperiment.remove_experiment_data()` allows erasing the experiment's artifacts from the Director. - fx collaborator start -d data.yaml -n one +When the Experiment is finished +---------------------------------- -For more details, please refer to the TaskRunner API section. \ No newline at end of file +User may utilize the same Federation object to report another experiment or even schedule several experiments that +will be executed one by one. 
\ No newline at end of file From 7631268b5ef1a058f91adb6fd989d3ae551f3369 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Thu, 19 Aug 2021 11:02:36 +0300 Subject: [PATCH 38/54] Finished text description --- docs/bash_autocomplete_activation.rst | 10 ++++++++++ docs/source/openfl/components.rst | 8 +++++--- docs/source/workflow/director_based_workflow.rst | 16 ++++++++-------- 3 files changed, 23 insertions(+), 11 deletions(-) diff --git a/docs/bash_autocomplete_activation.rst b/docs/bash_autocomplete_activation.rst index a82c7f0e9a..8a64ffc582 100644 --- a/docs/bash_autocomplete_activation.rst +++ b/docs/bash_autocomplete_activation.rst @@ -14,18 +14,23 @@ If not use the instruction :ref:`install_initial_steps`. Create ~/.fx-autocomplete.sh script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + This step need to be done only one time when you don't have `~/.fx-autocomplete.sh` or `~/.fx-autocomplete.sh` have corrupted content. + .. code-block:: console $ _FX_COMPLETE=bash_source fx > ~/.fx-autocomplete.sh Check that command was executed correctly. + .. code-block:: console $ cat ~/.fx-autocomplete.sh Console output should look like example below (Click==8.0.1), but could be different depend on `Click https://click.palletsprojects.com/en/8.0.x/`_ version: + .. code-block:: console + _fx_completion() { local IFS=$'\n' local response @@ -57,15 +62,20 @@ Create ~/.fx-autocomplete.sh script Activate autocomplete feature ~~~~~~~~~~~~~~~~~~~~~ + This step should be done every time when you open a new terminal window. .. code-block:: console + $ source ~/.fx-autocomplete.sh Auto activation autocomplete ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + To save your time you can add autocomplete activation step to `~/.bashrc`. + .. code-block:: bash . ~/.fx-autocomplete.sh + Save `~/.bashrc`. Open new terminal to use updated `~/.bashrc`. 
diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index abadea0b59..a1e839fd26 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -22,7 +22,8 @@ Spawning components Aggregator =========== -The aggregator is a short-living entity, which means that its lifespan is limited by experiment execution time. It orchestrates collaborators according to the FL plan and performs model updates aggregation. +The aggregator is a short-living entity, which means that its lifespan is limited by experiment execution time. +It orchestrates collaborators according to the FL plan and performs model updates aggregation. The aggregator is spawned by the Director (described below) when a new experiment is submitted. @@ -48,5 +49,6 @@ Director support several concurrent frontend connections (yet experiments are ru Envoy ========= -Some text - +|productName| comes with another long-existing actor called Envoy. It runs on collaborator machines connected to a *Director*. +There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* needs exactly one *Shard Descriptor* to run. +When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator* diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index d557139af7..9079f8e042 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -315,16 +315,16 @@ User must subclass :code:`DataInterface` and implement the following methods: # so on, see the full list of methods below * Shard Descriptor setter and getter methods: - :code:`shard_descriptor(self, shard_descriptor)` setter is the most important method. It will be called during collaborator + :code:`shard_descriptor(self, shard_descriptor)` setter is the most important method. 
It will be called during the *Collaborator* initialization procedure with the local Shard Descriptor. Any logic that is triggered with the Shard Descriptor replacement must be also put here. -* :code:`get_train_loader(self, **kwargs)` will be called before training tasks execution. This method must return anything user expects to recieve in the training task with :code:`data_loader` contract argument. :code:`kwargs` dict holds the same information that was provided during :code:`DataInterface` initialization. +* :code:`get_train_loader(self, **kwargs)` will be called before training tasks execution. This method must return anything the user expects to receive in the training task with :code:`data_loader` contract argument. :code:`kwargs` dict holds the same information that was provided during :code:`DataInterface` initialization. * :code:`get_valid_loader(self, **kwargs)` - see the point above (just replace training with validation) -* :code:`get_train_data_size(self)` - return number of samples in local train dataset. Use the information provided by Shard Descriptor, take into account you train / validation split. +* :code:`get_train_data_size(self)` - return number of samples in local train dataset. Use the information provided by Shard Descriptor, take into account your train / validation split. * :code:`get_valid_data_size(self)` - return number of samples in local validation dataset. -User Dataset class should be instantiated to pass futher to the *Experiment* object. Dummy *Shard Descriptor* -(or custom local one) may be set up to test the augmentation or batching pipeline. +User Dataset class should be instantiated to pass further to the *Experiment* object. Dummy *Shard Descriptor* +(or a custom local one) may be set up to test the augmentation or batching pipeline. Keyword arguments used during initialization on the frontend node may be used during dataloaders construction on collaborator machines. 
@@ -337,11 +337,11 @@ Instances of interface classes :code:`(TaskInterface, DataInterface, ModelInterf This method: -* Compiles all provided setings to a Plan object. Plan is the central place where all actors in federation look up their parameters. +* Compiles all provided settings to a Plan object. The Plan is the central place where all actors in federation look up their parameters. * Saves plan.yaml to the :code:`plan/` folder inside the workspace. * Serializes interface objects on the disk. * Prepares :code:`requirements.txt` for remote Python environment setup. -* Compressess the whole workspace to an archive. +* Compresses the whole workspace to an archive. * Sends the experiment archive to the Director so it may distribute the archive across the Federation and start the *Aggregator*. Observing the Experiment execution @@ -358,5 +358,5 @@ and :code:`Flexperiment.get_last_model()` metods. When the Experiment is finished ---------------------------------- -User may utilize the same Federation object to report another experiment or even schedule several experiments that +Users may utilize the same Federation object to report another experiment or even schedule several experiments that will be executed one by one. 
\ No newline at end of file From 6e5ca3c86f3d08f6a3b885acecef43feed0d1c00 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Thu, 19 Aug 2021 11:59:46 +0300 Subject: [PATCH 39/54] rebased on develop --- ...eration.director_based.interactive_api.rst | 228 ------------------ docs/source/openfl/components.rst | 2 +- docs/source/utilities/pki.step_ca.rst | 4 +- 3 files changed, 3 insertions(+), 231 deletions(-) delete mode 100644 docs/running_the_federation.director_based.interactive_api.rst diff --git a/docs/running_the_federation.director_based.interactive_api.rst b/docs/running_the_federation.director_based.interactive_api.rst deleted file mode 100644 index 00c1c7d226..0000000000 --- a/docs/running_the_federation.director_based.interactive_api.rst +++ /dev/null @@ -1,228 +0,0 @@ -.. # Copyright (C) 2020-2021 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -.. _interactive_api: - -######################################################### -Experimental: |productName| Interactive Python API -######################################################### - -********************************* -Python Interactive API Concepts -********************************* - -Workspace -========== -To initialize the workspace, create an empty folder and a Jupyter notebook (or a Python script) inside it. Root folder of the notebook will be considered as the workspace. -If some objects are imported in the notebook from local modules, source code should be kept inside the workspace. -If one decides to keep local test data inside the workspace, :code:`data` folder should be used as it will not be exported. -If one decides to keep certificates inside the workspace, :code:`cert` folder should be used as it will not be exported. -Only relevant source code or resources should be kept inside the workspace, since it will be zipped and transferred to collaborator machines. - -Python Environment -=================== -Create a virtual Python environment. 
Please, install only packages that are required for conducting the experiment, since Python environment will be replicated on collaborator nodes. - -****************************************** -Certification -****************************************** -If you have trusted workspace and connection should not be encrypted you can use :code:`disable_tls` option while starting experiment. -Otherwise it is necessary to certify each node participating in the federation. Certificates allow to use mutual tls connection between nodes. -You can certify nodes by your own pki system or use pki provided by OpenFL. It is based on `step-ca `_ -as a server and `step `_ as a client utilities. They are downloaded from github during workspace setup. Regardless of the certification method, -paths to certificates on each node are provided at start of experiment. Pki workflow from OpenFL will be discussed below. - -OpenFL PKI workflow -=================== -Openfl PKI pipeline asumes creating local CA with https server which listen signing requests. -Certificates from each node can be signed by requesting to CA server with special token. -Token must be copied to each node by some secure way. Each step is considered in detail below. - -1. Create CA, i.e create root key pair, CA server config and other. - .. code-block:: console - - $ fx pki install -p --ca-url - | :code:`-p` - path to folder, which will contain ca files. - | :code:`--ca-url` - host and port which ca server will listen - When executing this command, you will be prompted for a password and password confirmation. The password will encrypt some ca files. - This command will also download `step-ca `_ and `step `_ binaries from github. - -2. Run CA https server. - .. code-block:: console - - $ fx pki run -p - | :code:`-p` - path to folder, which will contain ca files. - -3. Get token for some node. - - .. 
code-block:: console - - $ fx pki get-token -n - | :code:`-n` - subject name, fqdn for director, collaborator name for envoy or api name for api-layer node - - Run this command on ca side, from ca folder. Output is a token which contains JWT (json web token) from CA server and CA - root certificate concatenated together. This JWT have twenty-four hours time-to-live. - -4. Copy token to node side (director or envoy) by some secure channel and run certify command. - .. code-block:: console - - $ fx pki certify -n -t - | :code:`-n` - subject name, fqdn for director, collaborator name for envoy or api name for api-layer node - | :code:`-t` - output token from previous command - This command call step client, to connect to CA server over https. - Https is provided by root certificate which was copy with JWT. - Server authenticates client by JWT and client authenticates server by root certificate. - -Now signed certificate and private key are stored on current node. Signed certificate has one year time-to-live. You should certify all node that will participate in federation: director, all envoys and api-layer node. - -****************************************** -Defining a Federated Learning Experiment -****************************************** -Interactive API allows setting up an experiment from a single entrypoint - a Jupyter notebook or a Python script. -Defining an experiment includes setting up several interface entities and experiment parameters. - -Federation API -=================== -*Federation* entity is introduced to register and keep information about collaborators settings and their local data, as well as network settings to enable communication inside the federation. -Each federation is bound to some Machine Learning problem in a sense that all collaborators dataset shards should follow the same annotation format for all samples. Once you created a federation, it may be used in several subsequent experiments. 
- -To set up a federation, use Federation Interactive API. - -.. code-block:: python - - from openfl.interface.interactive_api.federation import Federation - -Federation API class should be initialized with the aggregator node FQDN and encryption settings. Someone may disable mTLS in trusted environments or provide paths to the certificate chain of CA, aggregator certificate and private key to enable mTLS. - -.. code-block:: python - - federation = Federation(central_node_fqdn: str, tls: bool, cert_chain: str, agg_certificate: str, agg_private_key: str) - -Federation's :code:`register_collaborators` method should be used to provide an information about collaborators participating in a federation. -It requires a dictionary object - :code:`{collaborator name : local data path}`. - -Experiment API -=================== - -*Experiment* entity allows registering training related objects, FL tasks and settings. -To set up an FL experiment someone should use the Experiment interactive API. - -.. code-block:: python - - from openfl.interface.interactive_api.experiment import FLExperiment - -*Experiment* is being initialized by taking federation as a parameter. - -.. code-block:: python - - fl_experiment = FLExperiment(federation=federation) - -To start an experiment user must register *DataLoader*, *Federated Learning tasks* and *Model* with *Optimizer*. There are several supplementary interface classes for these purposes. - -.. code-block:: python - - from openfl.interface.interactive_api.experiment import TaskInterface, DataInterface, ModelInterface - -Registering model and optimizer --------------------------------- - -First, user instantiate and initilize a model and optimizer in their favorite Deep Learning framework. Please, note that for now interactive API supports only *Keras* and *PyTorch* off-the-shelf. 
-Initialized model and optimizer objects then should be passed to the :code:`ModelInterface` along with the path to correct Framework Adapter plugin inside OpenFL package. If desired DL framework is not covered by existing plugins, someone can implement the plugin's interface and point :code:`framework_plugin` to the implementation inside the workspace. - -.. code-block:: python - - from openfl.interface.interactive_api.experiment import ModelInterface - MI = ModelInterface(model=model_unet, optimizer=optimizer_adam, framework_plugin=framework_adapter) - -Registering FL tasks ---------------------- - -We have an agreement on what we consider to be a FL task. -Interactive API currently allows registering only standalone functions defined in the main module or imported from other modules inside the workspace. -We also have requirements on task signature. Task should accept the following objects: - -1. model - will be rebuilt with relevant weights for every task by `TaskRunner` -2. :code:`data_loader` - data loader that will provide local data -3. device - a device to be used for execution on collaborator machines -4. optimizer (optional) - model optimizer, only for training tasks - -Moreover FL tasks should return a dictionary object with metrics :code:`{metric name: metric value for this task}`. - -:code:`Task Interface` class is designed to register task and accompanying information. -This class must be instantiated, then it's special methods may be used to register tasks. - -.. code-block:: python - - TI = TaskInterface() - - task_settings = { - 'batch_size': 32, - 'some_arg': 228, - } - @TI.add_kwargs(**task_settings) - @TI.register_fl_task(model='my_model', data_loader='train_loader', - device='device', optimizer='my_Adam_opt') - def foo(my_model, train_loader, my_Adam_opt, device, batch_size, some_arg=356) - ... - - -:code:`@TI.register_fl_task()` needs tasks argument names for (model, data_loader, device, optimizer (optional)) that constitute tasks 'contract'. 
-It adds the callable and the task contract to the task registry. - -:code:`@TI.add_kwargs()` method should be used to set up those arguments that are not included in the contract. - -Registering Federated DataLoader ---------------------------------- - -:code:`DataInterface` is provided to support a remote DataLoader initialization. - -It is initialized with User Dataset class object and all the keyword arguments can be used by dataloaders during training or validation. -User must subclass :code:`DataInterface` and implements several methods. - -* :code:`_delayed_init(self, data_path)` is the most important method. It will be called during collaborator initialization procedure with relevant :code:`data_path` (one that corresponds to the collaborator name that user registered in federation). User Dataset class should be instantiated with local :code:`data_path` here. If dataset initalization procedure differs for some of the collaborators, the initialization logic must be described here. Dataset sharding procedure for test runs should also be described in this method. User is free to save objects in class fields for later use. -* :code:`get_train_loader(self, **kwargs)` will be called before training tasks execution. This method must return anything user expects to recieve in the training task with :code:`data_loader` contract argument. :code:`kwargs` dict holds the same information that was provided during :code:`DataInterface` initialization. -* :code:`get_valid_loader(self, **kwargs)` - see the point above only with validation data -* :code:`get_train_data_size(self)` - return number of samples in local train dataset. -* :code:`get_valid_data_size(self)` - return number of samples in local validation dataset. - -Preparing workspace distribution ---------------------------------- -Now we may use :code:`Experiment` API to prepare a workspace archive for transferring to collaborator's node. 
In order to run a collaborator, we want to replicate the workspace and the Python environment. - -Instances of interface classes :code:`(TaskInterface, DataInterface, ModelInterface)` must be passed to :code:`FLExperiment.prepare_workspace_distribution()` method along with other parameters. - -This method: - -* Compiles all provided setings to a Plan object. This is the central place where all actors in federation look up their parameters. -* Saves plan.yaml to the :code:`plan/` folder inside the workspace. -* Serializes interface objects on the disk. -* Prepares :code:`requirements.txt` for remote Python environment setup. -* Compressess the workspace to an archive so it can be coppied to collaborator nodes. - -Starting the aggregator ---------------------------- - -As all previous steps done, the experiment is ready to start -:code:`FLExperiment.start_experiment()` method requires :code:`model_interface` object with initialized weights. - -It starts a local aggregator that will wait for collaborators to connect. - -Starting collaborators -======================= - -The process of starting collaborators has not changed. -User must transfer the workspace archive to a remote node and type in console: - -.. code-block:: python - - fx workspace import --archive ws.zip - -Please, note that aggregator and all the collaborator nodes should have the same Python interpreter version as the machine used for defining the experiment. - -then cd to the workspace and run - -.. code-block:: python - - fx collaborator start -d data.yaml -n one - -For more details, please refer to the TaskRunner API section. \ No newline at end of file diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index a1e839fd26..1ad6291625 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -51,4 +51,4 @@ Envoy |productName| comes with another long-existing actor called Envoy. It runs on collaborator machines connected to a *Director*. 
There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* needs exactly one *Shard Descriptor* to run. -When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator* +When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator*. diff --git a/docs/source/utilities/pki.step_ca.rst b/docs/source/utilities/pki.step_ca.rst index 621df07cc6..53f2509ec1 100644 --- a/docs/source/utilities/pki.step_ca.rst +++ b/docs/source/utilities/pki.step_ca.rst @@ -22,10 +22,10 @@ Token must be copied to each node by some secure way. Each step is considered in 1. Create CA, i.e create root key pair, CA server config and other. .. code-block:: console - $ fx pki install -p --password <123> --ca-url + $ fx pki install -p --ca-url | :code:`-p` - path to folder, which will contain ca files. - | :code:`--password` - password that will encrypts some ca files. | :code:`--ca-url` - host and port which ca server will listen + When executing this command, you will be prompted for a password and password confirmation. The password will encrypt some ca files. This command will also download `step-ca `_ and `step `_ binaries from github. 2. Run CA https server. 
From 5e5b8f3d0a8b2046cc547457870fe0caebf205a1 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Fri, 20 Aug 2021 18:26:47 +0300 Subject: [PATCH 40/54] resolving comments in pr --- docs/source/openfl/components.rst | 12 +++++++++--- docs/source/openfl/interface.rst | 12 ++++++++++++ docs/source/openfl/plugins.rst | 9 +++++++-- .../workflow/director_based_workflow.rst | 19 ++++++++++--------- 4 files changed, 38 insertions(+), 14 deletions(-) create mode 100644 docs/source/openfl/interface.rst diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index 1ad6291625..1733241f23 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -30,9 +30,15 @@ The aggregator is spawned by the Director (described below) when a new experimen Collaborator ============= -Collaborator is also a short living entity, it manages training the model on local data: executes assigned tasks, converts DL framework-specific tensor objects to |productName| inner representation, and exchanges model parameters with the aggregator. -Converting tensors is done by Framework adapter plugins. |productName| ships with Pytorch and TensorFlow 2 framework adapters, this list will be extended in the future. User is free to implement their adapter for the required DL framework enabling |productName| support for experiments using this framework. The adapter plugin interface is simple: there are two required methods to load and extract tensors from a model and optimizer. Model is loaded with relevant weights before every task and at the end of the training task, weights are extracted to be sent to the central node and aggregated. -Collaborator instance is created by Envoy (described below) when a new experiment is submitted. Every collaborator is a unique service as it is loaded with a local Shard Descriptor to perform tasks included in an FL experiment. 
+Collaborator is also a short living entity, it manages training the model on local data: executes assigned tasks, +converts DL framework-specific tensor objects to |productName| inner representation, and exchanges model parameters with the aggregator. +Converting tensors is done by Framework adapter plugins. |productName| ships with Pytorch and Tensorflow 2.x framework adapters. +These framework adapters are intended to be extensible, +and we encourage users to contribute new adapters for DL frameworks they would like to see supported in |productName|. +The adapter plugin interface is simple: there are two required methods to load and extract tensors from a model and optimizer. +Model is loaded with relevant weights before every task and at the end of the training task, weights are extracted to be sent to the central node and aggregated. +Collaborator instance is created by Envoy (described below) when a new experiment is submitted. +Every collaborator is a unique service as it is loaded with a local Shard Descriptor to perform tasks included in an FL experiment. .. _openfl_ll_components: diff --git a/docs/source/openfl/interface.rst b/docs/source/openfl/interface.rst new file mode 100644 index 0000000000..959f76aeb3 --- /dev/null +++ b/docs/source/openfl/interface.rst @@ -0,0 +1,12 @@ +.. # Copyright (C) 2020-2021 Intel Corporation +.. # SPDX-License-Identifier: Apache-2.0 + +****** +|productName| plugins +****** + +.. toctree:: + :maxdepth: 2 + + `...`_ + `...`_ \ No newline at end of file diff --git a/docs/source/openfl/plugins.rst b/docs/source/openfl/plugins.rst index 959f76aeb3..5504e80db5 100644 --- a/docs/source/openfl/plugins.rst +++ b/docs/source/openfl/plugins.rst @@ -2,11 +2,16 @@ .. # SPDX-License-Identifier: Apache-2.0 ****** -|productName| plugins +|productName| Plugin Components ****** .. toctree:: :maxdepth: 2 + framework_adapter_ `...`_ - `...`_ \ No newline at end of file + +.. 
_framework_adapter: + +Framework Adapter +###################### \ No newline at end of file diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 9079f8e042..4b75f0aa3e 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -40,13 +40,14 @@ that may be used to identify participants during experiment definition and execu 3. (Optional) Create certificates using Step-CA ================== -All communications inside a Federation may be encrypted with mTLS. User may adapt certificates provided by their organization -or utilize :ref:`PKI ` provided by |productName|. +The use of mTLS is strongly recommended for deployments in untrusted environments to establish participant identity and +to encrypt communication. Users may either import certificates provided by their organization or utilize +:ref:PKI provided by |productName|. 4. Start Director ================== -Director is a central component in the Federation. It should be started on a node with at least one open port. +Director is a central component in the Federation. It should be started on a node with at least two open ports. Learn more about the Director component here: :ref:`openfl_ll_components` Create Director workspace @@ -135,7 +136,7 @@ Alternatively, use the following command to establish a secured connection: ==================================== At this point, data scientists may register their experiments to be executed in the federation. -OpenFL provides a separate frontend Director’s client and :ref:`Interactive Python API ` +|productName| provides a separate frontend Director’s client and :ref:`Interactive Python API ` to register experiments. @@ -227,7 +228,7 @@ Registering model and optimizer First, user instantiate and initilize a model and optimizer in their favorite Deep Learning framework. 
Please, note that for now interactive API supports only *Keras* and *PyTorch* off-the-shelf. Initialized model and optimizer objects then should be passed to the :code:`ModelInterface` along with the -path to correct Framework Adapter plugin inside OpenFL package. If desired DL framework is not covered by +path to correct Framework Adapter plugin inside |productName| package. If desired DL framework is not covered by existing plugins, user can implement the plugin's interface and point :code:`framework_plugin` to the implementation inside the workspace. @@ -348,12 +349,12 @@ Observing the Experiment execution ---------------------------------- If the Experiment was accepted by the *Director* user can oversee its execution with -:code:`Flexperiment.stream_metrics()` method that will is able to print metrics from the FL tasks (and save tensorboard logs). +:code:`FLExperiment.stream_metrics()` method that is able to print metrics from the FL tasks (and save tensorboard logs). -When the Experiment is finished, user may retrieve trained models in the native format using :code:`Flexperiment.get_best_model()` -and :code:`Flexperiment.get_last_model()` metods. +When the Experiment is finished, user may retrieve trained models in the native format using :code:`FLExperiment.get_best_model()` +and :code:`FLExperiment.get_last_model()` methods. -:code:`Flexperiment.remove_experiment_data()` allows erasing the experiment's artifacts from the Director. +:code:`FLExperiment.remove_experiment_data()` allows erasing the experiment's artifacts from the Director.
When the Experiment is finished ---------------------------------- From bbc52b01316d7bc468e3074652886fe1af160559 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Fri, 20 Aug 2021 18:34:38 +0300 Subject: [PATCH 41/54] plugins test --- docs/source/openfl/plugins.rst | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/source/openfl/plugins.rst b/docs/source/openfl/plugins.rst index 5504e80db5..fe38bded0e 100644 --- a/docs/source/openfl/plugins.rst +++ b/docs/source/openfl/plugins.rst @@ -9,9 +9,17 @@ :maxdepth: 2 framework_adapter_ - `...`_ + serializer_plugin_ .. _framework_adapter: +Framework Adapter +###################### + +text + + +.. _serializer_plugin: + Framework Adapter ###################### \ No newline at end of file From 3a69dbb633703f6c21b969636849a3135d411218 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Fri, 20 Aug 2021 18:41:34 +0300 Subject: [PATCH 42/54] small rewrite --- docs/source/workflow/director_based_workflow.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 4b75f0aa3e..5b0d8c1249 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -169,8 +169,8 @@ Create a virtual Python environment. Please, install only packages that are requ Defining a Federated Learning Experiment ======================================== -Interactive API allows setting up an experiment from a single entrypoint - a Jupyter notebook or a Python script. -Defining an experiment includes setting up several interface entities and experiment parameters. +Interactive API allows registering and starting an FL experiment from a single entry point - a Jupyter notebook or a Python script. +An FL experiment definition process includes setting up several interface entities and experiment parameters.
Federation API ---------------- From bf4443a534cf7faf6708190716988b74f925edb5 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 23 Aug 2021 11:14:55 +0300 Subject: [PATCH 43/54] filled plugins section --- docs/source/openfl/components.rst | 4 +- docs/source/openfl/plugins.rst | 49 +++++++++++++++++-- .../workflow/director_based_workflow.rst | 2 +- .../framework_adapter_interface.py | 2 +- 4 files changed, 50 insertions(+), 7 deletions(-) diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index 1733241f23..65738229f2 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -32,10 +32,10 @@ Collaborator Collaborator is also a short living entity, it manages training the model on local data: executes assigned tasks, converts DL framework-specific tensor objects to |productName| inner representation, and exchanges model parameters with the aggregator. -Converting tensors is done by Framework adapter plugins. |productName| ships with Pytorch and Tensorflow 2.x framework adapters. +Converting tensors is done by :ref:`Framework adapter ` plugins. |productName| ships with Pytorch and Tensorflow 2.x framework adapters. These framework adapters are intended to be extensible, and we encourage users to contribute new adapters for DL frameworks they would like to see supported in |productName|. -The adapter plugin interface is simple: there are two required methods to load and extract tensors from a model and optimizer. + Model is loaded with relevant weights before every task and at the end of the training task, weights are extracted to be sent to the central node and aggregated. Collaborator instance is created by Envoy (described below) when a new experiment is submitted. Every collaborator is a unique service as it is loaded with a local Shard Descriptor to perform tasks included in an FL experiment. 
diff --git a/docs/source/openfl/plugins.rst b/docs/source/openfl/plugins.rst index fe38bded0e..8275f2faa4 100644 --- a/docs/source/openfl/plugins.rst +++ b/docs/source/openfl/plugins.rst @@ -16,10 +16,53 @@ Framework Adapter ###################### -text +Framework Adapter plugins enable |productName| support for Deep Learning frameworks usage in FL experiments. +All the framework-specific operations on model weights are isolated in this plugin so |productName| can be framework-agnostic. +The Framework adapter plugin interface is simple: there are two required methods to load and extract tensors from +a model and an optimizer. + +:code:`get_tensor_dict` method accepts a model and optionally an optimizer. It should return a dictionary :code:`{tensor_name : ndarray}` +that maps tensor names to tensors in the numpy representation. + +.. code-block:: python + + @staticmethod + def get_tensor_dict(model, optimizer=None) -> dict: + +:code:`set_tensor_dict` method accepts a tensor dictionary, a model and optionally an optimizer. It loads weights from tensor dictionary +to the model inplace. Tensor names in the dictionary matches corresponding names set in :code:`get_tensor_dict` + +.. code-block:: python + + @staticmethod + def set_tensor_dict(model, tensor_dict, optimizer=None, device='cpu') -> None: + +Implement :code:`serialization_setup` optional method if some preparation are required before the model serialization. +This method would be called on the frontend Python API during an FL experiment extraction to the Director side. + +.. code-block:: python + + def serialization_setup(): .. _serializer_plugin: -Framework Adapter -###################### \ No newline at end of file +Experiment Serializer +###################### + +Serializer plugins are used on the Frontend API to serialize the Experiment components and then on Envoys to deserialize them back. +Currently, the default serializer is based on pickling. 
+ +A Serializer plugin must implement :code:`serialize` method that creates a python object's representation on disk. + +.. code-block:: python + + @staticmethod + def serialize(object_, filename: str) -> None: + +As well as :code:`restore_object` that will load previously serialized object from disc. + +.. code-block:: python + + @staticmethod + def restore_object(filename: str): diff --git a/docs/source/workflow/director_based_workflow.rst b/docs/source/workflow/director_based_workflow.rst index 5b0d8c1249..3b7c2cd098 100644 --- a/docs/source/workflow/director_based_workflow.rst +++ b/docs/source/workflow/director_based_workflow.rst @@ -42,7 +42,7 @@ that may be used to identify participants during experiment definition and execu The use of mTLS is strongly recommended for deployments in untrusted environments to establish participant identity and to encrypt communication. Users may either import certificates provided by their organization or utilize -:ref:PKI provided by |productName|. +:ref:`PKI ` provided by |productName|. 4. Start Director ================== diff --git a/openfl/plugins/frameworks_adapters/framework_adapter_interface.py b/openfl/plugins/frameworks_adapters/framework_adapter_interface.py index d9a118ba3b..6107727176 100644 --- a/openfl/plugins/frameworks_adapters/framework_adapter_interface.py +++ b/openfl/plugins/frameworks_adapters/framework_adapter_interface.py @@ -16,7 +16,7 @@ def serialization_setup(): pass @staticmethod - def get_tensor_dict(model, optimizer=None): + def get_tensor_dict(model, optimizer=None) -> dict: """ Extract tensor dict from a model and an optimizer. 
From babd0670625a048c8c9c6370ffb3a33bb9d76b62 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Mon, 23 Aug 2021 11:51:45 +0300 Subject: [PATCH 44/54] typo fixes for plugins --- docs/source/openfl/plugins.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/source/openfl/plugins.rst b/docs/source/openfl/plugins.rst index 8275f2faa4..1793451fe4 100644 --- a/docs/source/openfl/plugins.rst +++ b/docs/source/openfl/plugins.rst @@ -22,22 +22,22 @@ The Framework adapter plugin interface is simple: there are two required methods a model and an optimizer. :code:`get_tensor_dict` method accepts a model and optionally an optimizer. It should return a dictionary :code:`{tensor_name : ndarray}` -that maps tensor names to tensors in the numpy representation. +that maps tensor names to tensors in the NumPy representation. .. code-block:: python @staticmethod def get_tensor_dict(model, optimizer=None) -> dict: -:code:`set_tensor_dict` method accepts a tensor dictionary, a model and optionally an optimizer. It loads weights from tensor dictionary -to the model inplace. Tensor names in the dictionary matches corresponding names set in :code:`get_tensor_dict` +:code:`set_tensor_dict` method accepts a tensor dictionary, a model, and optionally an optimizer. It loads weights from the tensor dictionary +to the model in place. Tensor names in the dictionary match corresponding names set in :code:`get_tensor_dict` .. code-block:: python @staticmethod def set_tensor_dict(model, tensor_dict, optimizer=None, device='cpu') -> None: -Implement :code:`serialization_setup` optional method if some preparation are required before the model serialization. +Implement :code:`serialization_setup` optional method if some preparation is required before the model serialization. This method would be called on the frontend Python API during an FL experiment extraction to the Director side. .. 
code-block:: python @@ -60,7 +60,7 @@ A Serializer plugin must implement :code:`serialize` method that creates a pytho @staticmethod def serialize(object_, filename: str) -> None: -As well as :code:`restore_object` that will load previously serialized object from disc. +As well as :code:`restore_object` that will load previously serialized object from disk. .. code-block:: python From 50016c99de8b3f4e0cb15d3ee431ae82fd98ca2f Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 24 Aug 2021 12:24:21 +0300 Subject: [PATCH 45/54] added a link to a shard descriptor interface --- docs/source/openfl/components.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index 65738229f2..feebcfea77 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -56,5 +56,6 @@ Envoy ========= |productName| comes with another long-existing actor called Envoy. It runs on collaborator machines connected to a *Director*. -There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* needs exactly one *Shard Descriptor* to run. +There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* needs exactly one +`*Shard Descriptor* `_ to run. When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator*. From 3d621570a3a480d83c88d8501ea5a00e64786c6b Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 24 Aug 2021 12:25:57 +0300 Subject: [PATCH 46/54] fix italic --- docs/source/openfl/components.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index feebcfea77..25b0230a82 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -57,5 +57,5 @@ Envoy |productName| comes with another long-existing actor called Envoy. 
It runs on collaborator machines connected to a *Director*. There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* needs exactly one -`*Shard Descriptor* `_ to run. +*`Shard Descriptor `_* to run. When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator*. From c5dbae92d0bc4edc25bd28800c2a1ebd9f56e768 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 24 Aug 2021 12:29:06 +0300 Subject: [PATCH 47/54] more fix italic --- docs/source/openfl/components.rst | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index 25b0230a82..2f6727a6de 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -22,23 +22,23 @@ Spawning components Aggregator =========== -The aggregator is a short-living entity, which means that its lifespan is limited by experiment execution time. -It orchestrates collaborators according to the FL plan and performs model updates aggregation. -The aggregator is spawned by the Director (described below) when a new experiment is submitted. +The *Aggregator* is a short-living entity, which means that its lifespan is limited by experiment execution time. +It orchestrates *Collaborators* according to the FL plan and performs model updates aggregation. +The *Aggregator* is spawned by the *Director* (described below) when a new experiment is submitted. Collaborator ============= -Collaborator is also a short living entity, it manages training the model on local data: executes assigned tasks, +*Collaborator* is also a short living entity, it manages training the model on local data: executes assigned tasks, converts DL framework-specific tensor objects to |productName| inner representation, and exchanges model parameters with the aggregator. Converting tensors is done by :ref:`Framework adapter ` plugins. 
|productName| ships with Pytorch and Tensorflow 2.x framework adapters. These framework adapters are intended to be extensible, and we encourage users to contribute new adapters for DL frameworks they would like to see supported in |productName|. Model is loaded with relevant weights before every task and at the end of the training task, weights are extracted to be sent to the central node and aggregated. -Collaborator instance is created by Envoy (described below) when a new experiment is submitted. -Every collaborator is a unique service as it is loaded with a local Shard Descriptor to perform tasks included in an FL experiment. +*Collaborator* instance is created by *Envoy* (described below) when a new experiment is submitted. +Every *collaborator* is a unique service as it is loaded with a local *Shard Descriptor* to perform tasks included in an FL experiment. .. _openfl_ll_components: @@ -48,14 +48,14 @@ Long-living components Director ========== -Director is a long-living entity; it is a central node of the federation and may take in several experiments (with the same data interface). When an experiment is reported director starts an aggregator and sends the experiment data to involved envoys; during the experiment, Director oversees the aggregator and updates the user on the status of the experiment. -Director runs two services: one for frontend users and another one for envoys. It can distribute an experiment reported with the frontend API across the federation and communicate back a trained model snapshot and metrics. -Director support several concurrent frontend connections (yet experiments are run one by one) +*Director* is a long-living entity; it is a central node of the federation and may take in several experiments (with the same data interface). 
When an experiment is reported, the *Director* starts an aggregator and sends the experiment data to the involved envoys; during the experiment, the *Director* oversees the aggregator and updates the user on the status of the experiment. +*Director* runs two services: one for frontend users and another one for envoys. It can distribute an experiment reported with the frontend API across the federation and communicate back a trained model snapshot and metrics. +*Director* supports several concurrent frontend connections (yet experiments are run one by one). Envoy ========= -|productName| comes with another long-existing actor called Envoy. It runs on collaborator machines connected to a *Director*. +|productName| comes with another long-living actor called *Envoy*. It runs on collaborator machines connected to a *Director*. There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* needs exactly one -*`Shard Descriptor `_* to run. +`Shard Descriptor `_ to run. When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator*.
From f02a9de47cd8230e9a73cb5d7a01dbe4952ca38e Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Wed, 25 Aug 2021 19:30:26 +0300 Subject: [PATCH 48/54] moved and fixed data splitters also restructured PKI and the old workflow --- docs/advanced_topics.rst | 1 - docs/data_splitting.rst | 57 -------------- docs/running_the_federation.agg_based.rst | 12 +-- ...t => running_the_federation.baremetal.rst} | 0 ...> running_the_federation.certificates.rst} | 4 +- ....rst => running_the_federation.docker.rst} | 0 ...st => running_the_federation.notebook.rst} | 0 docs/running_the_federation.rst | 2 +- ...=> running_the_federation.singularity.rst} | 0 ...=> running_the_federation.start_nodes.rst} | 0 docs/source/utilities/pki.cert_request.rst | 6 -- docs/source/utilities/pki.rst | 76 +++++++++++++++++-- docs/source/utilities/pki.step_ca.rst | 57 -------------- docs/source/utilities/splitters_data.rst | 61 ++++++++++++++- docs/source/utilities/utilities.rst | 2 +- 15 files changed, 139 insertions(+), 139 deletions(-) delete mode 100644 docs/data_splitting.rst rename docs/{running_the_federation.agg_based.baremetal.rst => running_the_federation.baremetal.rst} (100%) rename docs/{running_the_federation.agg_based.certificates.rst => running_the_federation.certificates.rst} (99%) rename docs/{running_the_federation.agg_based.docker.rst => running_the_federation.docker.rst} (100%) rename docs/{running_the_federation.agg_based.notebook.rst => running_the_federation.notebook.rst} (100%) rename docs/{running_the_federation.agg_based.singularity.rst => running_the_federation.singularity.rst} (100%) rename docs/{running_the_federation.agg_based.start_nodes.rst => running_the_federation.start_nodes.rst} (100%) delete mode 100644 docs/source/utilities/pki.cert_request.rst delete mode 100644 docs/source/utilities/pki.step_ca.rst diff --git a/docs/advanced_topics.rst b/docs/advanced_topics.rst index d5377c6d68..a8635563dc 100644 --- a/docs/advanced_topics.rst +++ b/docs/advanced_topics.rst 
@@ -15,4 +15,3 @@ Advanced Topics overriding_agg_fn bash_autocomplete_activation log_metric_callback - data_splitting diff --git a/docs/data_splitting.rst b/docs/data_splitting.rst deleted file mode 100644 index 33f0b326ef..0000000000 --- a/docs/data_splitting.rst +++ /dev/null @@ -1,57 +0,0 @@ -.. # Copyright (C) 2020-2021 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -.. _data_splitting: -=============================== -Specifying custom data splits -=============================== - -------------------------------- -Usage -------------------------------- -|productName| allows developers to use custom data splits **for single-node simulation**. -In order to do this, you should: - -Python API -========== - -Choose from predefined |productName| aggregation functions: - -- ``openfl.plugins.data_splitters.EqualNumPyDataSplitter`` (default) -- ``openfl.plugins.data_splitters.RandomNumPyDataSplitter`` -- ``openfl.component.aggregation_functions.LogNormalNumPyDataSplitter`` - assumes ``data`` argument as ``np.ndarray`` of integers (labels) -- ``openfl.component.aggregation_functions.DirichletNumPyDataSplitter`` - assumes ``data`` argument as ``np.ndarray`` of integers (labels) -Or create an implementation of :class:`openfl.plugins.data_splitters.NumPyDataSplitter` -and pass it to FederatedDataset constructor as either ``train_splitter`` or ``valid_splitter`` keyword argument. 
- - -CLI -==== - -Choose from predefined |productName| aggregation functions: - -- ``openfl.plugins.data_splitters.EqualNumPyDataSplitter`` (default) -- ``openfl.plugins.data_splitters.RandomNumPyDataSplitter`` -- ``openfl.component.aggregation_functions.LogNormalNumPyDataSplitter`` - assumes ``data`` argument as np.ndarray of integers (labels) -- ``openfl.component.aggregation_functions.DirichletNumPyDataSplitter`` - assumes ``data`` argument as np.ndarray of integers (labels) -Or create your own implementation of :class:`openfl.component.aggregation_functions.AggregationFunctionInterface`. -After defining the splitting behavior, you need to use it on your data to perform a simulation. - -``NumPyDataSplitter`` requires a single ``split`` function. -This function receives ``data`` - NumPy array required to build the subsets of data indices (see definition of :meth:`openfl.plugins.data_splitters.NumPyDataSplitter.split`). It could be the whole dataset, or labels only, or anything else. -``split`` function returns a list of lists of indices which represent the collaborator-wise indices groups. - - .. code-block:: python - X_train, y_train = ... # train set - X_valid, y_valid = ... # valid set - train_splitter = RandomNumPyDataSplitter() - valid_splitter = RandomNumPyDataSplitter() - # collaborator_count value is passed to DataLoader constructor - # shard_num can be evaluated from data_path - train_idx = train_splitter.split(y_train, collaborator_count)[shard_num] - valid_idx = valid_splitter.split(y_valid, collaborator_count)[shard_num] - X_train_shard = X_train[train_idx] - X_valid_shard = X_valid[valid_idx] - -.. note:: - By default, we shuffle the data and perform equal split (see :class:`openfl.plugins.data_splitters.EqualNumPyDataSplitter`). 
diff --git a/docs/running_the_federation.agg_based.rst b/docs/running_the_federation.agg_based.rst index 4326c1e7bc..4797e71b2a 100644 --- a/docs/running_the_federation.agg_based.rst +++ b/docs/running_the_federation.agg_based.rst @@ -16,10 +16,10 @@ First make sure you've installed the software :ref:`using these instructions `_ +as a server and `step `_ as a client utilities. They are downloaded from github during workspace setup. Regardless of the certification method, +paths to certificates on each node are provided at start of experiment. Pki workflow from OpenFL will be discussed below. + +OpenFL PKI workflow +=================== +The OpenFL PKI pipeline assumes creating a local CA with an HTTPS server that listens for signing requests. +Each node's certificate can be signed by sending a request to the CA server with a special token. +The token must be copied to each node over a secure channel. Each step is described in detail below. + +1. Create the CA, i.e. create the root key pair, the CA server config, and other CA files. + .. code-block:: console + + $ fx pki install -p --ca-url + | :code:`-p` - path to the folder that will contain the CA files. + | :code:`--ca-url` - host and port on which the CA server will listen + When executing this command, you will be prompted for a password and password confirmation. The password is used to encrypt some of the CA files. + This command will also download `step-ca `_ and `step `_ binaries from GitHub. + +2. Run the CA HTTPS server. + .. code-block:: console + + $ fx pki run -p + | :code:`-p` - path to the folder that will contain the CA files. + +3. Get a token for a node. + + .. code-block:: console + + $ fx pki get-token -n + | :code:`-n` - subject name: FQDN for the director, collaborator name for an envoy, or API name for the API-layer node + + Run this command on the CA side, from the CA folder. The output is a token that contains a JWT (JSON Web Token) from the CA server and the CA + root certificate concatenated together. The JWT has a time-to-live of twenty-four hours. + +4. 
Copy the token to the node (director or envoy) over a secure channel and run the certify command. + .. code-block:: console + + $ fx pki certify -n -t + | :code:`-n` - subject name: FQDN for the director, collaborator name for an envoy, or API name for the API-layer node + | :code:`-t` - the token output by the previous command + This command calls the step client to connect to the CA server over HTTPS. + HTTPS is secured by the root certificate that was copied along with the JWT. + The server authenticates the client by the JWT, and the client authenticates the server by the root certificate. + +The signed certificate and private key are now stored on the current node. The signed certificate has a time-to-live of one year. You should certify every node that will participate in the federation: the director, all envoys, and the API-layer node. + + + +.. _manual_certification: + + +Manual PKI +************ + +This solution is embedded into the Aggregator-based |productName| workflow. +Please refer to the :ref:`instruction_manual_certs` section. \ No newline at end of file diff --git a/docs/source/utilities/pki.step_ca.rst b/docs/source/utilities/pki.step_ca.rst deleted file mode 100644 index 53f2509ec1..0000000000 --- a/docs/source/utilities/pki.step_ca.rst +++ /dev/null @@ -1,57 +0,0 @@ -.. # Copyright (C) 2020-2021 Intel Corporation -.. # SPDX-License-Identifier: Apache-2.0 - -.. _semi_automatic_certification: - -****************************************** -Federation actors certification with Semi-automatic PKI -****************************************** - -If you have trusted workspace and connection should not be encrypted you can use :code:`disable_tls` option while starting experiment. -Otherwise it is necessary to certify each node participating in the federation. Certificates allow to use mutual tls connection between nodes. -You can certify nodes by your own pki system or use pki provided by OpenFL. It is based on `step-ca `_ -as a server and `step `_ as a client utilities. They are downloaded from github during workspace setup. 
Regardless of the certification method, -paths to certificates on each node are provided at start of experiment. Pki workflow from OpenFL will be discussed below. - -OpenFL PKI workflow -=================== -Openfl PKI pipeline asumes creating local CA with https server which listen signing requests. -Certificates from each node can be signed by requesting to CA server with special token. -Token must be copied to each node by some secure way. Each step is considered in detail below. - -1. Create CA, i.e create root key pair, CA server config and other. - .. code-block:: console - - $ fx pki install -p --ca-url - | :code:`-p` - path to folder, which will contain ca files. - | :code:`--ca-url` - host and port which ca server will listen - When executing this command, you will be prompted for a password and password confirmation. The password will encrypt some ca files. - This command will also download `step-ca `_ and `step `_ binaries from github. - -2. Run CA https server. - .. code-block:: console - - $ fx pki run -p - | :code:`-p` - path to folder, which will contain ca files. - -3. Get token for some node. - - .. code-block:: console - - $ fx pki get-token -n - | :code:`-n` - subject name, fqdn for director, collaborator name for envoy or api name for api-layer node - - Run this command on ca side, from ca folder. Output is a token which contains JWT (json web token) from CA server and CA - root certificate concatenated together. This JWT have twenty-four hours time-to-live. - -4. Copy token to node side (director or envoy) by some secure channel and run certify command. - .. code-block:: console - - $ fx pki certify -n -t - | :code:`-n` - subject name, fqdn for director, collaborator name for envoy or api name for api-layer node - | :code:`-t` - output token from previous command - This command call step client, to connect to CA server over https. - Https is provided by root certificate which was copy with JWT. 
- Server authenticates client by JWT and client authenticates server by root certificate. - -Now signed certificate and private key are stored on current node. Signed certificate has one year time-to-live. You should certify all node that will participate in federation: director, all envoys and api-layer node. diff --git a/docs/source/utilities/splitters_data.rst b/docs/source/utilities/splitters_data.rst index 1c0bed123b..2834050a4d 100644 --- a/docs/source/utilities/splitters_data.rst +++ b/docs/source/utilities/splitters_data.rst @@ -1,6 +1,61 @@ .. # Copyright (C) 2020-2021 Intel Corporation .. # SPDX-License-Identifier: Apache-2.0 -*********** -Data Splitters -*********** +.. _data_splitting: + +************************************ +Specifying Custom Data Splits +************************************ + +------------------------------- +Usage +------------------------------- + +|productName| allows developers to use custom data splits **for simulation runs on a single dataset**. +In order to do this, you should: + +Native Python API +========== + +Choose from predefined |productName| data splitters functions: + +- ``openfl.plugins.data_splitters.EqualNumPyDataSplitter`` (default) +- ``openfl.plugins.data_splitters.RandomNumPyDataSplitter`` +- ``openfl.component.aggregation_functions.LogNormalNumPyDataSplitter`` - assumes ``data`` argument as ``np.ndarray`` of integers (labels) +- ``openfl.component.aggregation_functions.DirichletNumPyDataSplitter`` - assumes ``data`` argument as ``np.ndarray`` of integers (labels) +Or create an implementation of :class:`openfl.plugins.data_splitters.NumPyDataSplitter` +and pass it to FederatedDataset constructor as either ``train_splitter`` or ``valid_splitter`` keyword argument. 
+ + +Using in Shard Descriptor +========================= + +Choose from predefined |productName| data splitters: + +- ``openfl.plugins.data_splitters.EqualNumPyDataSplitter`` (default) +- ``openfl.plugins.data_splitters.RandomNumPyDataSplitter`` +- ``openfl.component.aggregation_functions.LogNormalNumPyDataSplitter`` - assumes the ``data`` argument is an ``np.ndarray`` of integers (labels) +- ``openfl.component.aggregation_functions.DirichletNumPyDataSplitter`` - assumes the ``data`` argument is an ``np.ndarray`` of integers (labels) +Or create your own implementation of :class:`openfl.plugins.data_splitters.NumPyDataSplitter`. +After defining the splitting behavior, you need to use it on your data to perform a simulation. + +``NumPyDataSplitter`` requires a single ``split`` function. +This function receives ``data``, a NumPy array used to build the subsets of data indices (see definition of :meth:`openfl.plugins.data_splitters.NumPyDataSplitter.split`). It can be the whole dataset, the labels only, or anything else. +The ``split`` function returns a list of lists of indices, one list per collaborator. + +.. code-block:: python + + X_train, y_train = ... # train set + X_valid, y_valid = ... # valid set + train_splitter = RandomNumPyDataSplitter() + valid_splitter = RandomNumPyDataSplitter() + # collaborator_count value is passed to DataLoader constructor + # shard_num can be evaluated from data_path + train_idx = train_splitter.split(y_train, collaborator_count)[shard_num] + valid_idx = valid_splitter.split(y_valid, collaborator_count)[shard_num] + X_train_shard = X_train[train_idx] + X_valid_shard = X_valid[valid_idx] + +.. note:: + + By default, we shuffle the data and perform an equal split (see :class:`openfl.plugins.data_splitters.EqualNumPyDataSplitter`). 
diff --git a/docs/source/utilities/utilities.rst b/docs/source/utilities/utilities.rst index 7f40842853..3d5047de9d 100644 --- a/docs/source/utilities/utilities.rst +++ b/docs/source/utilities/utilities.rst @@ -6,7 +6,7 @@ ****** .. toctree:: - :maxdepth: 4 + :maxdepth: 2 pki splitters_data \ No newline at end of file From 48b835b902f359e7b5bd20fe26a907bb1f0bad5b Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Wed, 25 Aug 2021 19:45:39 +0300 Subject: [PATCH 49/54] fixes --- docs/source/openfl/components.rst | 2 +- docs/source/utilities/splitters_data.rst | 12 +++++------- 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index 2f6727a6de..8371ba8ccf 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -38,7 +38,7 @@ and we encourage users to contribute new adapters for DL frameworks they would l Model is loaded with relevant weights before every task and at the end of the training task, weights are extracted to be sent to the central node and aggregated. *Collaborator* instance is created by *Envoy* (described below) when a new experiment is submitted. -Every *collaborator* is a unique service as it is loaded with a local *Shard Descriptor* to perform tasks included in an FL experiment. +Every *Collaborator* is a unique service as it is loaded with a local *Shard Descriptor* to perform tasks included in an FL experiment. .. _openfl_ll_components: diff --git a/docs/source/utilities/splitters_data.rst b/docs/source/utilities/splitters_data.rst index 2834050a4d..d436de3c7e 100644 --- a/docs/source/utilities/splitters_data.rst +++ b/docs/source/utilities/splitters_data.rst @@ -4,18 +4,16 @@ .. 
_data_splitting: ************************************ -Specifying Custom Data Splits +Dataset Splitters ************************************ -------------------------------- -Usage -------------------------------- -|productName| allows developers to use custom data splits **for simulation runs on a single dataset**. -In order to do this, you should: +|productName| allows developers to specify custom data splits **for simulation runs on a single dataset**. + +You may apply data splitters differently depending on the |productName| workflow that you follow. Native Python API -========== +================== Choose from predefined |productName| data splitters functions: From 6f3c66d8dda4f36b1052310d091c65606fc26ea5 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Wed, 25 Aug 2021 19:49:56 +0300 Subject: [PATCH 50/54] moved the old workflow to sources --- docs/running_the_federation.rst | 2 +- docs/{ => source/workflow}/running_the_federation.agg_based.rst | 0 docs/{ => source/workflow}/running_the_federation.baremetal.rst | 0 .../workflow}/running_the_federation.certificates.rst | 0 docs/{ => source/workflow}/running_the_federation.docker.rst | 0 docs/{ => source/workflow}/running_the_federation.notebook.rst | 0 .../workflow}/running_the_federation.singularity.rst | 0 .../workflow}/running_the_federation.start_nodes.rst | 0 8 files changed, 1 insertion(+), 1 deletion(-) rename docs/{ => source/workflow}/running_the_federation.agg_based.rst (100%) rename docs/{ => source/workflow}/running_the_federation.baremetal.rst (100%) rename docs/{ => source/workflow}/running_the_federation.certificates.rst (100%) rename docs/{ => source/workflow}/running_the_federation.docker.rst (100%) rename docs/{ => source/workflow}/running_the_federation.notebook.rst (100%) rename docs/{ => source/workflow}/running_the_federation.singularity.rst (100%) rename docs/{ => source/workflow}/running_the_federation.start_nodes.rst (100%) diff --git a/docs/running_the_federation.rst 
b/docs/running_the_federation.rst index 58086874a0..ef028a7fb2 100644 --- a/docs/running_the_federation.rst +++ b/docs/running_the_federation.rst @@ -18,5 +18,5 @@ The high-level workflow is shown in the figure above. Note that once OpenFL is i .. toctree:: :maxdepth: 2 - running_the_federation.agg_based + source/workflow/running_the_federation.agg_based source/workflow/director_based_workflow diff --git a/docs/running_the_federation.agg_based.rst b/docs/source/workflow/running_the_federation.agg_based.rst similarity index 100% rename from docs/running_the_federation.agg_based.rst rename to docs/source/workflow/running_the_federation.agg_based.rst diff --git a/docs/running_the_federation.baremetal.rst b/docs/source/workflow/running_the_federation.baremetal.rst similarity index 100% rename from docs/running_the_federation.baremetal.rst rename to docs/source/workflow/running_the_federation.baremetal.rst diff --git a/docs/running_the_federation.certificates.rst b/docs/source/workflow/running_the_federation.certificates.rst similarity index 100% rename from docs/running_the_federation.certificates.rst rename to docs/source/workflow/running_the_federation.certificates.rst diff --git a/docs/running_the_federation.docker.rst b/docs/source/workflow/running_the_federation.docker.rst similarity index 100% rename from docs/running_the_federation.docker.rst rename to docs/source/workflow/running_the_federation.docker.rst diff --git a/docs/running_the_federation.notebook.rst b/docs/source/workflow/running_the_federation.notebook.rst similarity index 100% rename from docs/running_the_federation.notebook.rst rename to docs/source/workflow/running_the_federation.notebook.rst diff --git a/docs/running_the_federation.singularity.rst b/docs/source/workflow/running_the_federation.singularity.rst similarity index 100% rename from docs/running_the_federation.singularity.rst rename to docs/source/workflow/running_the_federation.singularity.rst diff --git 
a/docs/running_the_federation.start_nodes.rst b/docs/source/workflow/running_the_federation.start_nodes.rst similarity index 100% rename from docs/running_the_federation.start_nodes.rst rename to docs/source/workflow/running_the_federation.start_nodes.rst From 29bd5f9f26cedd9a7765e5a6792530de3e9177ba Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Fri, 27 Aug 2021 17:39:53 +0300 Subject: [PATCH 51/54] static diagram + fixes --- docs/source/openfl/components.rst | 2 +- docs/source/openfl/plugins.rst | 4 +- docs/source/utilities/pki.rst | 2 +- .../structurizr-1-Containers.svg | 1 + docs/structurizer_dsl/workspace.dsl | 97 ++++ docs/structurizer_dsl/workspace.json | 509 ++++++++++++++++++ 6 files changed, 611 insertions(+), 4 deletions(-) create mode 100644 docs/structurizer_dsl/structurizr-1-Containers.svg create mode 100755 docs/structurizer_dsl/workspace.dsl create mode 100644 docs/structurizer_dsl/workspace.json diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index 8371ba8ccf..fa9bbc0ee2 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -23,7 +23,7 @@ Aggregator =========== The *Aggregator* is a short-living entity, which means that its lifespan is limited by experiment execution time. -It orchestrates *Collaborators* according to the FL plan and performs model updates aggregation. +It orchestrates *Collaborators* according to the FL plan and performs model aggregation at the end of each round. The *Aggregator* is spawned by the *Director* (described below) when a new experiment is submitted. diff --git a/docs/source/openfl/plugins.rst b/docs/source/openfl/plugins.rst index 1793451fe4..61dc84e43e 100644 --- a/docs/source/openfl/plugins.rst +++ b/docs/source/openfl/plugins.rst @@ -37,8 +37,8 @@ to the model in place. 
Tensor names in the dictionary match corresponding names @staticmethod def set_tensor_dict(model, tensor_dict, optimizer=None, device='cpu') -> None: -Implement :code:`serialization_setup` optional method if some preparation is required before the model serialization. -This method would be called on the frontend Python API during an FL experiment extraction to the Director side. +If your new framework model cannot be directly serialized with pickle-type libraries, you can optionally +implement the :code:`serialization_setup` method to prepare the model object for serialization. .. code-block:: python diff --git a/docs/source/utilities/pki.rst b/docs/source/utilities/pki.rst index b52d33753f..933296b5b2 100644 --- a/docs/source/utilities/pki.rst +++ b/docs/source/utilities/pki.rst @@ -19,7 +19,7 @@ Certification of Actors in Federation with Semi-automatic PKI If you have trusted workspace and connection should not be encrypted you can use :code:`disable_tls` option while starting experiment. Otherwise it is necessary to certify each node participating in the federation. Certificates allow to use mutual tls connection between nodes. -You can certify nodes by your own pki system or use pki provided by OpenFL. It is based on `step-ca `_ +You can certify nodes by your own PKI system or use PKI provided by OpenFL. It is based on `step-ca `_ as a server and `step `_ as a client utilities. They are downloaded from github during workspace setup. Regardless of the certification method, paths to certificates on each node are provided at start of experiment. Pki workflow from OpenFL will be discussed below. 
diff --git a/docs/structurizer_dsl/structurizr-1-Containers.svg b/docs/structurizer_dsl/structurizr-1-Containers.svg new file mode 100644 index 0000000000..8f816d4e88 --- /dev/null +++ b/docs/structurizer_dsl/structurizr-1-Containers.svg @@ -0,0 +1 @@ +[SVG markup not preserved; recoverable text only. Title: Container diagram for OpenFL (exported Friday, 27 August 2021, 16:25 Moscow Standard Time). People: Data scientist (a person or group of people using OpenFL); Collaborator manager (data owner's representative controlling Envoy); Director manager. Containers: Python API component (a set of tools to setup register FL Experiments); Central node: Director (a long-living entity that can spawn aggregators), Aggregator (model server and collaborator orchestrator); Collaborator node: Envoy (a long-living entity that can adapt a local data set and spawn collaborators), Collaborator (actor executing tasks on local data inside one experiment). Relationship labels: "Launches. Sets up global Federation settings"; "Launches. Provides local dataset ShardDescriptors"; "Sends locally tuned tensors and training metrics"; "Provides FL Plans, Tasks, Models, DataLoaders"; "Sends tasks and initial tensors"; "Approves, Sends FL experiments"; "Communicates dataset info, Sends status updates"; "Creates an instance to maintain an FL experiment" (twice); "Sends information about the Federation. Returns training artifacts."; "Registers FL experiments".] 
\ No newline at end of file diff --git a/docs/structurizer_dsl/workspace.dsl b/docs/structurizer_dsl/workspace.dsl new file mode 100755 index 0000000000..b6af7bd8cb --- /dev/null +++ b/docs/structurizer_dsl/workspace.dsl @@ -0,0 +1,97 @@ + +workspace "OpenFL" "An open framework for Federated Learning." { + model { + group "Control" { + user = person "Data scientist" "A person or group of people using OpenFL" + shardOwner = person "Collaborator manager" "Data owner's representative controlling Envoy" + centralManager = person "Director manager" + governor = softwareSystem "Governor" "CCF-based system for corporate clients" + } + openfl = softwareSystem "OpenFL" "An open framework for Federated Learning" { + apiLayer = container "Python API component" "A set of tools to setup register FL Experiments" { + federationInterface = component "Federaion Interface" + experimentInterface = component "Experiment Interface" + # TaskInterface = component "" + } + + group "Central node" { + director = container "Director" "A long-living entity that can spawn aggregators" + aggregator = container "Aggregator" "Model server and collaborator orchestrator"{ + assigner = component "Task Assigner" "Decides the policy for which collaborators should run FL tasks" + grpcServer = component "gRPC Server" + } + } + group "Collaborator node" { + envoy = container "Envoy" "A long-living entity that can adapt a local data set and spawn collaborators" { + shardDescriptor = component "Shard Descriptor" "Data manager's interface aimed to unify data access" { + tags "Interface" + } + } + collaborator = container "Collaborator" "Actor executing tasks on local data inside one experiment" { + pluginManager = component "Plugin Manager" + taskRunner = component "Task Runner" + tensorDB = component "Tensor Data Base" + tensorCodec = component "TensorCodec" + grpcClient = component "gRPC Client" + frameworkAdapter = component "Framework Adapter" + } + } + } + config = element "Config file" + + # 
relationships between people and software systems + user -> openfl "Controls Fedarations. Provides FL plans, tasks, models, data" + governor -> openfl "Controls Fedarations" + + # relationships to/from containers + user -> apiLayer "Provides FL Plans, Tasks, Models, DataLoaders" + shardOwner -> envoy "Launches. Provides local dataset ShardDescriptors" + centralManager -> director "Launches. Sets up global Federation settings" + apiLayer -> director "Registers FL experiments" + director -> apiLayer "Sends information about the Federation. Returns training artifacts." + director -> aggregator "Creates an instance to maintain an FL experiment" + envoy -> collaborator "Creates an instance to maintain an FL experiment" + envoy -> director "Communicates dataset info, Sends status updates" + director -> envoy "Approves, Sends FL experiments" + aggregator -> collaborator "Sends tasks and initial tensors" + collaborator -> aggregator "Sends locally tuned tensors and training metrics" + + + # relationships to/from components + envoy -> taskRunner "Provides tasks' defenitions" + grpcClient -> taskRunner "Invokes some tasks for the round" + aggregator -> grpcClient "Communicates" + } + + views + theme default + + systemcontext openfl "SystemContext" { + include * + autoLayout + + } + + container openfl "Containers" { + include * + # include config + # autoLayout + } + + component collaborator "Collaborator" { + include * + autoLayout + } + + component apiLayer "API" { + include * + autoLayout + } + + component envoy "Envoy" { + include * + autoLayout + } + +} + diff --git a/docs/structurizer_dsl/workspace.json b/docs/structurizer_dsl/workspace.json new file mode 100644 index 0000000000..71ac98b00a --- /dev/null +++ b/docs/structurizer_dsl/workspace.json @@ -0,0 +1,509 @@ +{ + "id" : 1, + "name" : "OpenFL", + "description" : "An open framework for Federated Learning.", + "revision" : 0, + "lastModifiedDate" : "2021-08-27T13:28:56Z", + "lastModifiedAgent" : 
"structurizr-web/2475", + "properties" : { + "structurizr.dsl" : "CndvcmtzcGFjZSAiT3BlbkZMIiAiQW4gb3BlbiBmcmFtZXdvcmsgZm9yIEZlZGVyYXRlZCBMZWFybmluZy4iIHsKICAgIG1vZGVsIHsKICAgICAgICBncm91cCAiQ29udHJvbCIgewogICAgICAgICAgICB1c2VyID0gcGVyc29uICJEYXRhIHNjaWVudGlzdCIgIkEgcGVyc29uIG9yIGdyb3VwIG9mIHBlb3BsZSB1c2luZyBPcGVuRkwiCiAgICAgICAgICAgIHNoYXJkT3duZXIgPSBwZXJzb24gIkNvbGxhYm9yYXRvciBtYW5hZ2VyIiAiRGF0YSBvd25lcidzIHJlcHJlc2VudGF0aXZlIGNvbnRyb2xsaW5nIEVudm95IgogICAgICAgICAgICBjZW50cmFsTWFuYWdlciA9IHBlcnNvbiAiRGlyZWN0b3IgbWFuYWdlciIgCiAgICAgICAgICAgIGdvdmVybm9yID0gc29mdHdhcmVTeXN0ZW0gIkdvdmVybm9yIiAiQ0NGLWJhc2VkIHN5c3RlbSBmb3IgY29ycG9yYXRlIGNsaWVudHMiCiAgICAgICAgfQogICAgICAgIG9wZW5mbCA9IHNvZnR3YXJlU3lzdGVtICJPcGVuRkwiICJBbiBvcGVuIGZyYW1ld29yayBmb3IgRmVkZXJhdGVkIExlYXJuaW5nIiB7CiAgICAgICAgICAgIGFwaUxheWVyID0gY29udGFpbmVyICJQeXRob24gQVBJIGNvbXBvbmVudCIgIkEgc2V0IG9mIHRvb2xzIHRvIHNldHVwIHJlZ2lzdGVyIEZMIEV4cGVyaW1lbnRzIiB7CiAgICAgICAgICAgICAgICBmZWRlcmF0aW9uSW50ZXJmYWNlID0gY29tcG9uZW50ICJGZWRlcmFpb24gSW50ZXJmYWNlIgogICAgICAgICAgICAgICAgZXhwZXJpbWVudEludGVyZmFjZSA9IGNvbXBvbmVudCAiRXhwZXJpbWVudCBJbnRlcmZhY2UiCiAgICAgICAgICAgICAgICAjIFRhc2tJbnRlcmZhY2UgPSBjb21wb25lbnQgIiIKICAgICAgICAgICAgfQoKICAgICAgICAgICAgZ3JvdXAgIkNlbnRyYWwgbm9kZSIgewogICAgICAgICAgICAgICAgZGlyZWN0b3IgPSBjb250YWluZXIgIkRpcmVjdG9yIiAiQSBsb25nLWxpdmluZyBlbnRpdHkgdGhhdCBjYW4gc3Bhd24gYWdncmVnYXRvcnMiCiAgICAgICAgICAgICAgICBhZ2dyZWdhdG9yID0gY29udGFpbmVyICJBZ2dyZWdhdG9yIiAiTW9kZWwgc2VydmVyIGFuZCBjb2xsYWJvcmF0b3Igb3JjaGVzdHJhdG9yInsKICAgICAgICAgICAgICAgICAgICBhc3NpZ25lciA9IGNvbXBvbmVudCAiVGFzayBBc3NpZ25lciIgIkRlY2lkZXMgdGhlIHBvbGljeSBmb3Igd2hpY2ggY29sbGFib3JhdG9ycyBzaG91bGQgcnVuIEZMIHRhc2tzIgogICAgICAgICAgICAgICAgICAgIGdycGNTZXJ2ZXIgPSBjb21wb25lbnQgImdSUEMgU2VydmVyIgogICAgICAgICAgICAgICAgfQogICAgICAgICAgICB9CiAgICAgICAgICAgIGdyb3VwICJDb2xsYWJvcmF0b3Igbm9kZSIgewogICAgICAgICAgICAgICAgZW52b3kgPSBjb250YWluZXIgIkVudm95IiAiQSBsb25nLWxpdmluZyBlbnRpdHkgdGhhdCBjYW4gYWRhcHQgYSBsb2NhbCBkYXRhIHNldCBhbmQgc3Bhd24gY29sbGFib3JhdG9ycyIgew
ogICAgICAgICAgICAgICAgICAgIHNoYXJkRGVzY3JpcHRvciA9IGNvbXBvbmVudCAiU2hhcmQgRGVzY3JpcHRvciIgIkRhdGEgbWFuYWdlcidzIGludGVyZmFjZSBhaW1lZCB0byB1bmlmeSBkYXRhIGFjY2VzcyIgewogICAgICAgICAgICAgICAgICAgICAgICB0YWdzICJJbnRlcmZhY2UiCiAgICAgICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgY29sbGFib3JhdG9yID0gY29udGFpbmVyICJDb2xsYWJvcmF0b3IiICJBY3RvciBleGVjdXRpbmcgdGFza3Mgb24gbG9jYWwgZGF0YSBpbnNpZGUgb25lIGV4cGVyaW1lbnQiIHsKICAgICAgICAgICAgICAgICAgICBwbHVnaW5NYW5hZ2VyID0gY29tcG9uZW50ICJQbHVnaW4gTWFuYWdlciIKICAgICAgICAgICAgICAgICAgICB0YXNrUnVubmVyID0gY29tcG9uZW50ICJUYXNrIFJ1bm5lciIKICAgICAgICAgICAgICAgICAgICB0ZW5zb3JEQiA9IGNvbXBvbmVudCAiVGVuc29yIERhdGEgQmFzZSIKICAgICAgICAgICAgICAgICAgICB0ZW5zb3JDb2RlYyA9IGNvbXBvbmVudCAiVGVuc29yQ29kZWMiCiAgICAgICAgICAgICAgICAgICAgZ3JwY0NsaWVudCA9IGNvbXBvbmVudCAiZ1JQQyBDbGllbnQiCiAgICAgICAgICAgICAgICAgICAgZnJhbWV3b3JrQWRhcHRlciA9IGNvbXBvbmVudCAiRnJhbWV3b3JrIEFkYXB0ZXIiCiAgICAgICAgICAgICAgICB9CiAgICAgICAgICAgIH0KICAgICAgICB9CiAgICAgICAgY29uZmlnID0gZWxlbWVudCAiQ29uZmlnIGZpbGUiCgogICAgICAgICMgcmVsYXRpb25zaGlwcyBiZXR3ZWVuIHBlb3BsZSBhbmQgc29mdHdhcmUgc3lzdGVtcwogICAgICAgIHVzZXIgLT4gb3BlbmZsICJDb250cm9scyBGZWRhcmF0aW9ucy4gUHJvdmlkZXMgRkwgcGxhbnMsIHRhc2tzLCBtb2RlbHMsIGRhdGEiCiAgICAgICAgZ292ZXJub3IgLT4gb3BlbmZsICJDb250cm9scyBGZWRhcmF0aW9ucyIKCiAgICAgICAgIyByZWxhdGlvbnNoaXBzIHRvL2Zyb20gY29udGFpbmVycwogICAgICAgIHVzZXIgLT4gYXBpTGF5ZXIgIlByb3ZpZGVzIEZMIFBsYW5zLCBUYXNrcywgTW9kZWxzLCBEYXRhTG9hZGVycyIKICAgICAgICBzaGFyZE93bmVyIC0+IGVudm95ICJMYXVuY2hlcy4gUHJvdmlkZXMgbG9jYWwgZGF0YXNldCBTaGFyZERlc2NyaXB0b3JzIgogICAgICAgIGNlbnRyYWxNYW5hZ2VyIC0+IGRpcmVjdG9yICJMYXVuY2hlcy4gU2V0cyB1cCBnbG9iYWwgRmVkZXJhdGlvbiBzZXR0aW5ncyIKICAgICAgICBhcGlMYXllciAtPiBkaXJlY3RvciAiUmVnaXN0ZXJzIEZMIGV4cGVyaW1lbnRzIgogICAgICAgIGRpcmVjdG9yIC0+IGFwaUxheWVyICJTZW5kcyBpbmZvcm1hdGlvbiBhYm91dCB0aGUgRmVkZXJhdGlvbi4gUmV0dXJucyB0cmFpbmluZyBhcnRpZmFjdHMuIgogICAgICAgIGRpcmVjdG9yIC0+IGFnZ3JlZ2F0b3IgIkNyZWF0ZXMgYW4gaW5zdGFuY2UgdG8gbWFpbnRhaW4gYW4gRkwgZXhwZXJpbWVudCIKICAgICAgICBlbnZveSAtPiBjb2
xsYWJvcmF0b3IgIkNyZWF0ZXMgYW4gaW5zdGFuY2UgdG8gbWFpbnRhaW4gYW4gRkwgZXhwZXJpbWVudCIKICAgICAgICBlbnZveSAtPiBkaXJlY3RvciAiQ29tbXVuaWNhdGVzIGRhdGFzZXQgaW5mbywgU2VuZHMgc3RhdHVzIHVwZGF0ZXMiCiAgICAgICAgZGlyZWN0b3IgLT4gZW52b3kgIkFwcHJvdmVzLCBTZW5kcyBGTCBleHBlcmltZW50cyIKICAgICAgICBhZ2dyZWdhdG9yIC0+IGNvbGxhYm9yYXRvciAiU2VuZHMgdGFza3MgYW5kIGluaXRpYWwgdGVuc29ycyIKICAgICAgICBjb2xsYWJvcmF0b3IgLT4gYWdncmVnYXRvciAiU2VuZHMgbG9jYWxseSB0dW5lZCB0ZW5zb3JzIGFuZCB0cmFpbmluZyBtZXRyaWNzIgoKCiAgICAgICAgIyByZWxhdGlvbnNoaXBzIHRvL2Zyb20gY29tcG9uZW50cwogICAgICAgIGVudm95IC0+IHRhc2tSdW5uZXIgIlByb3ZpZGVzIHRhc2tzJyBkZWZlbml0aW9ucyIKICAgICAgICBncnBjQ2xpZW50IC0+IHRhc2tSdW5uZXIgIkludm9rZXMgc29tZSB0YXNrcyBmb3IgdGhlIHJvdW5kIgogICAgICAgIGFnZ3JlZ2F0b3IgLT4gZ3JwY0NsaWVudCAiQ29tbXVuaWNhdGVzIgogICAgfQoKICAgIHZpZXdzCiAgICAgICAgdGhlbWUgZGVmYXVsdAoKICAgICAgICBzeXN0ZW1jb250ZXh0IG9wZW5mbCAiU3lzdGVtQ29udGV4dCIgewogICAgICAgICAgICBpbmNsdWRlICoKICAgICAgICAgICAgYXV0b0xheW91dAogICAgICAgICAgICAKICAgICAgICB9CgogICAgICAgIGNvbnRhaW5lciBvcGVuZmwgIkNvbnRhaW5lcnMiIHsKICAgICAgICAgICAgaW5jbHVkZSAqCiAgICAgICAgICAgICMgaW5jbHVkZSBjb25maWcKICAgICAgICAgICAgIyBhdXRvTGF5b3V0CiAgICAgICAgfQoKICAgICAgICBjb21wb25lbnQgY29sbGFib3JhdG9yICJDb2xsYWJvcmF0b3IiIHsKICAgICAgICAgICAgaW5jbHVkZSAqCiAgICAgICAgICAgIGF1dG9MYXlvdXQKICAgICAgICB9CgogICAgICAgIGNvbXBvbmVudCBhcGlMYXllciAiQVBJIiB7CiAgICAgICAgICAgIGluY2x1ZGUgKgogICAgICAgICAgICBhdXRvTGF5b3V0CiAgICAgICAgfQoKICAgICAgICBjb21wb25lbnQgZW52b3kgIkVudm95IiB7CiAgICAgICAgICAgIGluY2x1ZGUgKgogICAgICAgICAgICBhdXRvTGF5b3V0CiAgICAgICAgfQoKfQoK" + }, + "configuration" : { }, + "model" : { + "people" : [ { + "id" : "2", + "tags" : "Element,Person", + "name" : "Collaborator manager", + "description" : "Data owner's representative controlling Envoy", + "relationships" : [ { + "id" : "26", + "tags" : "Relationship", + "sourceId" : "2", + "destinationId" : "13", + "description" : "Launches. 
Provides local dataset ShardDescriptors" + }, { + "id" : "27", + "tags" : "Relationship", + "sourceId" : "2", + "destinationId" : "5", + "description" : "Launches. Provides local dataset ShardDescriptors" + } ], + "group" : "Control", + "location" : "Unspecified" + }, { + "id" : "1", + "tags" : "Element,Person", + "name" : "Data scientist", + "description" : "A person or group of people using OpenFL", + "relationships" : [ { + "id" : "25", + "tags" : "Relationship", + "sourceId" : "1", + "destinationId" : "6", + "description" : "Provides FL Plans, Tasks, Models, DataLoaders" + }, { + "id" : "23", + "tags" : "Relationship", + "sourceId" : "1", + "destinationId" : "5", + "description" : "Controls Fedarations. Provides FL plans, tasks, models, data" + } ], + "group" : "Control", + "location" : "Unspecified" + }, { + "id" : "3", + "tags" : "Element,Person", + "name" : "Director manager", + "relationships" : [ { + "id" : "29", + "tags" : "Relationship", + "sourceId" : "3", + "destinationId" : "5", + "description" : "Launches. Sets up global Federation settings" + }, { + "id" : "28", + "tags" : "Relationship", + "sourceId" : "3", + "destinationId" : "9", + "description" : "Launches. 
Sets up global Federation settings" + } ], + "group" : "Control", + "location" : "Unspecified" + } ], + "softwareSystems" : [ { + "id" : "4", + "tags" : "Element,Software System", + "name" : "Governor", + "description" : "CCF-based system for corporate clients", + "relationships" : [ { + "id" : "24", + "tags" : "Relationship", + "sourceId" : "4", + "destinationId" : "5", + "description" : "Controls Fedarations" + } ], + "group" : "Control", + "location" : "Unspecified" + }, { + "id" : "5", + "tags" : "Element,Software System", + "name" : "OpenFL", + "description" : "An open framework for Federated Learning", + "location" : "Unspecified", + "containers" : [ { + "id" : "10", + "tags" : "Element,Container", + "name" : "Aggregator", + "description" : "Model server and collaborator orchestrator", + "relationships" : [ { + "id" : "40", + "tags" : "Relationship", + "sourceId" : "10", + "destinationId" : "20", + "description" : "Communicates" + }, { + "id" : "36", + "tags" : "Relationship", + "sourceId" : "10", + "destinationId" : "15", + "description" : "Sends tasks and initial tensors" + } ], + "group" : "Central node", + "components" : [ { + "id" : "11", + "tags" : "Element,Component", + "name" : "Task Assigner", + "description" : "Decides the policy for which collaborators should run FL tasks", + "size" : 0 + }, { + "id" : "12", + "tags" : "Element,Component", + "name" : "gRPC Server", + "size" : 0 + } ] + }, { + "id" : "15", + "tags" : "Element,Container", + "name" : "Collaborator", + "description" : "Actor executing tasks on local data inside one experiment", + "relationships" : [ { + "id" : "37", + "tags" : "Relationship", + "sourceId" : "15", + "destinationId" : "10", + "description" : "Sends locally tuned tensors and training metrics" + } ], + "group" : "Collaborator node", + "components" : [ { + "id" : "16", + "tags" : "Element,Component", + "name" : "Plugin Manager", + "size" : 0 + }, { + "id" : "21", + "tags" : "Element,Component", + "name" : "Framework 
Adapter", + "size" : 0 + }, { + "id" : "18", + "tags" : "Element,Component", + "name" : "Tensor Data Base", + "size" : 0 + }, { + "id" : "20", + "tags" : "Element,Component", + "name" : "gRPC Client", + "relationships" : [ { + "id" : "39", + "tags" : "Relationship", + "sourceId" : "20", + "destinationId" : "17", + "description" : "Invokes some tasks for the round" + } ], + "size" : 0 + }, { + "id" : "19", + "tags" : "Element,Component", + "name" : "TensorCodec", + "size" : 0 + }, { + "id" : "17", + "tags" : "Element,Component", + "name" : "Task Runner", + "size" : 0 + } ] + }, { + "id" : "6", + "tags" : "Element,Container", + "name" : "Python API component", + "description" : "A set of tools to setup register FL Experiments", + "relationships" : [ { + "id" : "30", + "tags" : "Relationship", + "sourceId" : "6", + "destinationId" : "9", + "description" : "Registers FL experiments" + } ], + "components" : [ { + "id" : "8", + "tags" : "Element,Component", + "name" : "Experiment Interface", + "size" : 0 + }, { + "id" : "7", + "tags" : "Element,Component", + "name" : "Federaion Interface", + "size" : 0 + } ] + }, { + "id" : "13", + "tags" : "Element,Container", + "name" : "Envoy", + "description" : "A long-living entity that can adapt a local data set and spawn collaborators", + "relationships" : [ { + "id" : "34", + "tags" : "Relationship", + "sourceId" : "13", + "destinationId" : "9", + "description" : "Communicates dataset info, Sends status updates" + }, { + "id" : "38", + "tags" : "Relationship", + "sourceId" : "13", + "destinationId" : "17", + "description" : "Provides tasks' defenitions" + }, { + "id" : "33", + "tags" : "Relationship", + "sourceId" : "13", + "destinationId" : "15", + "description" : "Creates an instance to maintain an FL experiment" + } ], + "group" : "Collaborator node", + "components" : [ { + "id" : "14", + "tags" : "Element,Component,Interface", + "name" : "Shard Descriptor", + "description" : "Data manager's interface aimed to unify data 
access", + "size" : 0 + } ] + }, { + "id" : "9", + "tags" : "Element,Container", + "name" : "Director", + "description" : "A long-living entity that can spawn aggregators", + "relationships" : [ { + "id" : "31", + "tags" : "Relationship", + "sourceId" : "9", + "destinationId" : "6", + "description" : "Sends information about the Federation. Returns training artifacts." + }, { + "id" : "35", + "tags" : "Relationship", + "sourceId" : "9", + "destinationId" : "13", + "description" : "Approves, Sends FL experiments" + }, { + "id" : "32", + "tags" : "Relationship", + "sourceId" : "9", + "destinationId" : "10", + "description" : "Creates an instance to maintain an FL experiment" + } ], + "group" : "Central node" + } ] + } ], + "customElements" : [ { + "id" : "22", + "tags" : "Element", + "name" : "Config file" + } ] + }, + "documentation" : { }, + "views" : { + "systemContextViews" : [ { + "softwareSystemId" : "5", + "key" : "SystemContext", + "paperSize" : "A4_Landscape", + "dimensions" : { + "width" : 3358, + "height" : 1454 + }, + "automaticLayout" : { + "implementation" : "Graphviz", + "rankDirection" : "TopBottom", + "rankSeparation" : 300, + "nodeSeparation" : 300, + "edgeSeparation" : 0, + "vertices" : false + }, + "enterpriseBoundaryVisible" : true, + "elements" : [ { + "id" : "1", + "x" : 2604, + "y" : 277 + }, { + "id" : "2", + "x" : 1854, + "y" : 277 + }, { + "id" : "3", + "x" : 1104, + "y" : 277 + }, { + "id" : "4", + "x" : 354, + "y" : 277 + }, { + "id" : "5", + "x" : 1479, + "y" : 877 + } ], + "relationships" : [ { + "id" : "29" + }, { + "id" : "27" + }, { + "id" : "24", + "vertices" : [ { + "x" : 954, + "y" : 681 + } ] + }, { + "id" : "23", + "vertices" : [ { + "x" : 2454, + "y" : 681 + } ] + } ] + } ], + "containerViews" : [ { + "softwareSystemId" : "5", + "key" : "Containers", + "dimensions" : { + "width" : 3104, + "height" : 2546 + }, + "externalSoftwareSystemBoundariesVisible" : true, + "elements" : [ { + "id" : "1", + "x" : 890, + "y" : 200 + }, { + 
"id" : "13", + "x" : 1740, + "y" : 1320 + }, { + "id" : "2", + "x" : 2470, + "y" : 1265 + }, { + "id" : "3", + "x" : 230, + "y" : 1270 + }, { + "id" : "15", + "x" : 1740, + "y" : 1855 + }, { + "id" : "6", + "x" : 880, + "y" : 760 + }, { + "id" : "9", + "x" : 880, + "y" : 1320 + }, { + "id" : "10", + "x" : 880, + "y" : 1855 + } ], + "relationships" : [ { + "id" : "28" + }, { + "id" : "26" + }, { + "id" : "37", + "vertices" : [ { + "x" : 1535, + "y" : 1940 + } ] + }, { + "id" : "25" + }, { + "id" : "36", + "vertices" : [ { + "x" : 1565, + "y" : 2090 + } ] + }, { + "id" : "35", + "vertices" : [ { + "x" : 1550, + "y" : 1530 + } ] + }, { + "id" : "34", + "vertices" : [ { + "x" : 1530, + "y" : 1360 + } ] + }, { + "id" : "33" + }, { + "id" : "32" + }, { + "id" : "31", + "vertices" : [ { + "x" : 1215, + "y" : 1185 + } ] + }, { + "id" : "30", + "vertices" : [ { + "x" : 995, + "y" : 1175 + } ] + } ] + } ], + "componentViews" : [ { + "key" : "Collaborator", + "automaticLayout" : { + "implementation" : "Graphviz", + "rankDirection" : "TopBottom", + "rankSeparation" : 300, + "nodeSeparation" : 300, + "edgeSeparation" : 0, + "vertices" : false + }, + "containerId" : "15", + "externalContainerBoundariesVisible" : true, + "elements" : [ { + "id" : "13", + "x" : 0, + "y" : 0 + }, { + "id" : "16", + "x" : 0, + "y" : 0 + }, { + "id" : "17", + "x" : 0, + "y" : 0 + }, { + "id" : "18", + "x" : 0, + "y" : 0 + }, { + "id" : "19", + "x" : 0, + "y" : 0 + }, { + "id" : "20", + "x" : 0, + "y" : 0 + }, { + "id" : "21", + "x" : 0, + "y" : 0 + }, { + "id" : "10", + "x" : 0, + "y" : 0 + } ], + "relationships" : [ { + "id" : "40" + }, { + "id" : "38" + }, { + "id" : "39" + } ] + }, { + "key" : "API", + "automaticLayout" : { + "implementation" : "Graphviz", + "rankDirection" : "TopBottom", + "rankSeparation" : 300, + "nodeSeparation" : 300, + "edgeSeparation" : 0, + "vertices" : false + }, + "containerId" : "6", + "externalContainerBoundariesVisible" : true, + "elements" : [ { + "id" : "7", + "x" : 
0, + "y" : 0 + }, { + "id" : "8", + "x" : 0, + "y" : 0 + } ] + }, { + "key" : "Envoy", + "automaticLayout" : { + "implementation" : "Graphviz", + "rankDirection" : "TopBottom", + "rankSeparation" : 300, + "nodeSeparation" : 300, + "edgeSeparation" : 0, + "vertices" : false + }, + "containerId" : "13", + "externalContainerBoundariesVisible" : true, + "elements" : [ { + "id" : "14", + "x" : 0, + "y" : 0 + } ] + } ], + "configuration" : { + "branding" : { }, + "styles" : { }, + "themes" : [ "https://static.structurizr.com/themes/default/theme.json" ], + "terminology" : { }, + "lastSavedView" : "Containers" + } + } +} \ No newline at end of file From b25c4f80058ecd2bec229b65f377d094f2d2343e Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 31 Aug 2021 12:40:10 +0300 Subject: [PATCH 52/54] alexey's changes --- docs/openfl.rst | 2 +- docs/source/openfl/components.rst | 14 +++++++++++--- docs/source/openfl/plugins.rst | 5 +++++ docs/structurizer_dsl/workspace.dsl | 4 ++-- 4 files changed, 19 insertions(+), 6 deletions(-) diff --git a/docs/openfl.rst b/docs/openfl.rst index 0270fc3ef2..40e1142ee0 100644 --- a/docs/openfl.rst +++ b/docs/openfl.rst @@ -9,5 +9,5 @@ OpenFL structure :maxdepth: 4 source/openfl/components - source/openfl/communication + .. source/openfl/communication source/openfl/plugins \ No newline at end of file diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index fa9bbc0ee2..93ff135d70 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -48,9 +48,15 @@ Long-living components Director ========== -*Director* is a long-living entity; it is a central node of the federation and may take in several experiments (with the same data interface). When an experiment is reported director starts an aggregator and sends the experiment data to involved envoys; during the experiment, Director oversees the aggregator and updates the user on the status of the experiment. 
-*Director* runs two services: one for frontend users and another one for envoys. It can distribute an experiment reported with the frontend API across the federation and communicate back a trained model snapshot and metrics. -*Director* support several concurrent frontend connections (yet experiments are run one by one) +*Director* is a long-living entity; it is a central node of the federation and may take in several experiments +(with the same data interface). When an experiment is reported, the Director starts an aggregator and sends +the experiment data to the involved envoys; during the experiment, the Director oversees the aggregator and updates +the user on the status of the experiment. +*Director* runs two services: one for frontend users and another one for envoys. It can distribute an experiment +reported with the frontend API across the federation and communicate back a trained model snapshot and metrics. +*Director* supports several concurrent frontend connections (yet experiments are run one by one). +To learn more about using the |productName| frontend Python API, please refer to :ref:`interactive_api`. + Envoy ========= @@ -59,3 +65,5 @@ Envoy There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* needs exactly one `Shard Descriptor `_ to run. When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator*. + +*Diagram* \ No newline at end of file diff --git a/docs/source/openfl/plugins.rst b/docs/source/openfl/plugins.rst index 61dc84e43e..2bc06bac45 100644 --- a/docs/source/openfl/plugins.rst +++ b/docs/source/openfl/plugins.rst @@ -11,6 +11,11 @@ framework_adapter_ serializer_plugin_ + +|productName| is designed to be a flexible and extensible framework. Plugins are interchangeable parts of +|productName| components. Different plugins support varying usage scenarios.
|productName| users are free to provide +their implementations of |productName| plugins to support desired behavior. + .. _framework_adapter: Framework Adapter diff --git a/docs/structurizer_dsl/workspace.dsl b/docs/structurizer_dsl/workspace.dsl index b6af7bd8cb..bfc5de60bb 100755 --- a/docs/structurizer_dsl/workspace.dsl +++ b/docs/structurizer_dsl/workspace.dsl @@ -8,7 +8,7 @@ workspace "OpenFL" "An open framework for Federated Learning." { governor = softwareSystem "Governor" "CCF-based system for corporate clients" } openfl = softwareSystem "OpenFL" "An open framework for Federated Learning" { - apiLayer = container "Python API component" "A set of tools to setup register FL Experiments" { + apiLayer = container "Python API component" "A set of tools to setup and register FL Experiments" { federationInterface = component "Federaion Interface" experimentInterface = component "Experiment Interface" # TaskInterface = component "" @@ -22,7 +22,7 @@ workspace "OpenFL" "An open framework for Federated Learning." 
{ } } group "Collaborator node" { - envoy = container "Envoy" "A long-living entity that can adapt a local data set and spawn collaborators" { + envoy = container "Envoy" "A long-living entity that can adapt a local dataset and spawn collaborators" { shardDescriptor = component "Shard Descriptor" "Data manager's interface aimed to unify data access" { tags "Interface" } From 1b2dbd8a0306aa8143f914449edb4d3d0f61c41e Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 31 Aug 2021 12:43:59 +0300 Subject: [PATCH 53/54] added a static diagram --- docs/source/openfl/components.rst | 2 +- docs/source/openfl/static_diagram.svg | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) create mode 100644 docs/source/openfl/static_diagram.svg diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index 93ff135d70..1ee87fd755 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -66,4 +66,4 @@ There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* n `Shard Descriptor `_ to run. When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator*. -*Diagram* \ No newline at end of file +.. 
figure:: static_diagram.svg \ No newline at end of file diff --git a/docs/source/openfl/static_diagram.svg b/docs/source/openfl/static_diagram.svg new file mode 100644 index 0000000000..8f816d4e88 --- /dev/null +++ b/docs/source/openfl/static_diagram.svg @@ -0,0 +1 @@ +Friday, 27 August 2021, 16:25 Moscow Standard Time. Container diagram for OpenFL. OpenFL [Software System]. Groups: Central node; Collaborator node. Data scientist [Person]: A person or group of people using OpenFL. Envoy [Container]: A long-living entity that can adapt a local data set and spawn collaborators. Collaborator manager [Person]: Data owner's representative controlling Envoy. Director manager [Person]. Collaborator [Container]: Actor executing tasks on local data inside one experiment. Python API component [Container]: A set of tools to setup register FL Experiments. Director [Container]: A long-living entity that can spawn aggregators. Aggregator [Container]: Model server and collaborator orchestrator. Relationships: Launches. Sets up global Federation settings; Launches. Provides local dataset ShardDescriptors; Sends locally tuned tensors and training metrics; Provides FL Plans, Tasks, Models, DataLoaders; Sends tasks and initial tensors; Approves, Sends FL experiments; Communicates dataset info, Sends status updates; Creates an instance to maintain an FL experiment; Creates an instance to maintain an FL experiment; Sends information about the Federation. Returns training artifacts.; Registers FL experiments.
\ No newline at end of file From 4d4361006e22593a79d37c4838739d648c5b8171 Mon Sep 17 00:00:00 2001 From: Igor Davidyuk Date: Tue, 31 Aug 2021 12:50:49 +0300 Subject: [PATCH 54/54] static diagram section --- docs/source/openfl/components.rst | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/docs/source/openfl/components.rst b/docs/source/openfl/components.rst index 1ee87fd755..84e8486ea7 100644 --- a/docs/source/openfl/components.rst +++ b/docs/source/openfl/components.rst @@ -10,8 +10,9 @@ .. toctree:: :maxdepth: 2 - `Spawning`_ - `Long-living`_ + `Spawning components`_ + `Long-living components`_ + `Static Diagram`_ .. _openfl_spawning_components: @@ -66,4 +67,8 @@ There is one to one mapping between *Envoys* and Dataset shards: every *Envoy* n `Shard Descriptor `_ to run. When the *Director* starts an experiment, *Envoy* will accept the experiment workspace, prepare the environment and start a *Collaborator*. + +Static Diagram +############## + .. figure:: static_diagram.svg \ No newline at end of file