Director envoy #104

Merged
187 commits
a60ed6f
Merge pull request #2 from intel/develop
igor-davidyuk Apr 23, 2021
c331ae0
Add new log level - METRIC. Write metrics on aggregator to tensorboard
maradionov May 20, 2021
4814acd
Refactor code. Add tensorboardX to setup.py
maradionov May 21, 2021
f46027e
fix flake8
maradionov May 21, 2021
2ba5fca
Return back global logger in native.py
maradionov May 21, 2021
3ed9f51
Delete raising exceptions in addLoggingLevel function
maradionov May 21, 2021
6913782
Modify log level checking
maradionov May 21, 2021
c286061
Apply suggestions from code review
maradionov May 21, 2021
6303568
Remove log level checking
maradionov May 24, 2021
574a160
Pass write_metric function through plan
maradionov May 24, 2021
769a3cb
Merge branch 'develop' into logging_updates
maradionov May 27, 2021
747b5c5
Pass write metric function also in workspace
maradionov May 27, 2021
eff999b
Add docs for log metric callback
maradionov May 27, 2021
dabf55a
Clear notebook output
maradionov May 27, 2021
36fb16d
Merge branch 'develop' into logging_updates
maradionov May 27, 2021
293d9a4
Add cfssl_scheme.md
maradionov Apr 14, 2021
8f605d5
Add diagram as png
maradionov Apr 14, 2021
b0139ab
Add description of CA to clients tls connestion
maradionov Apr 21, 2021
08d53d8
Fix cfssl scheme
maradionov Apr 22, 2021
c4c10bb
Merge pull request #6 from intel/develop
igor-davidyuk May 31, 2021
b5c11e4
Set 'metric' tag to train task result in interactive api
maradionov May 31, 2021
bd8afde
Log metric task result on aggregator additionally.
maradionov May 31, 2021
7c6c770
fix flake8
maradionov May 31, 2021
3916a1d
Remove default tensorboard option in defaults plan.yaml
maradionov May 31, 2021
5197e84
Remove tensorboardX dependency
maradionov May 31, 2021
79e0531
Fix some pep8 issues. Remove debugging grpc code.
maradionov Jun 1, 2021
179fd0e
VOC shard descriptor
igor-davidyuk Jun 1, 2021
95fcf39
Remove debugging grpc code.
maradionov Jun 1, 2021
0f3cc67
First draft working pki with step-ca
maradionov Jun 4, 2021
33b1cb5
Initial test experiments for new python API
aleksandr-mokrov May 17, 2021
d97ad9d
Director - envoy initial commit
aleksandr-mokrov May 28, 2021
738f6cd
Base shard descriptor
aleksandr-mokrov Jun 4, 2021
16022ae
Pass shard descriptor to get_collaborator and addition API
aleksandr-mokrov Jun 4, 2021
d3c3bab
Reorganize project structure for director and collaborator manager
aleksandr-mokrov Jun 4, 2021
8e3f87b
Move ca commands to separate file
maradionov Jun 4, 2021
1618c81
first LL adaption approach
igor-davidyuk Jun 4, 2021
9c16049
Merge branch 'll-merge' into director-envoy
igor-davidyuk Jun 4, 2021
ab684ba
Merge pull request #7 from aleksandr-mokrov/director-envoy
igor-davidyuk Jun 4, 2021
fa476f3
Merge branch 'develop' into logging_updates
maradionov Jun 7, 2021
032c12b
fix flake8
maradionov Jun 8, 2021
91d5f9c
First working example
igor-davidyuk Jun 9, 2021
3016bcc
Prepare an aggregator workspace and pass initial tensors
aleksandr-mokrov Jun 9, 2021
57436b3
Merge branch 'develop' into logging_updates
maradionov Jun 10, 2021
6fec331
fix flake8
maradionov Jun 10, 2021
fe776bf
Merge branch 'model-proto-ll' into director-envoy
igor-davidyuk Jun 10, 2021
75b9d8d
Merge pull request #8 from aleksandr-mokrov/director-envoy
igor-davidyuk Jun 10, 2021
243c54a
Adapted launching aggregator on director side
igor-davidyuk Jun 11, 2021
4399c5c
requirements fix
igor-davidyuk Jun 11, 2021
ee78fc8
Updated the Kvasir notebook
igor-davidyuk Jun 11, 2021
f9c218a
Add tensorboard to torch_cnn_mnist requirements
maradionov Jun 15, 2021
d63d4d7
Fix typo
maradionov Jun 15, 2021
4a5d8fb
Initial test experiments for new python API
aleksandr-mokrov May 17, 2021
fb4dc22
Director - envoy initial commit
aleksandr-mokrov May 28, 2021
37cb2e9
Base shard descriptor
aleksandr-mokrov Jun 4, 2021
3a1e2d5
Pass shard descriptor to get_collaborator and addition API
aleksandr-mokrov Jun 4, 2021
5538443
Reorganize project structure for director and collaborator manager
aleksandr-mokrov Jun 4, 2021
8ce29c5
VOC shard descriptor
igor-davidyuk Jun 1, 2021
172c812
first LL adaption approach
igor-davidyuk Jun 4, 2021
7137771
Prepare an aggregator workspace and pass initial tensors
aleksandr-mokrov Jun 9, 2021
40ac7fd
First working example
igor-davidyuk Jun 9, 2021
d8cd92a
CLI commands
aleksandr-mokrov Jun 17, 2021
09156b1
Merge branch 'director-envoy' into logging_updates
aleksandr-mokrov Jun 17, 2021
b5282d6
Merge pull request #3 from maradionov/logging_updates
aleksandr-mokrov Jun 17, 2021
c5a9f7b
Add tensorboard
aleksandr-mokrov Jun 18, 2021
63ae7d1
Flake8 and logging
aleksandr-mokrov Jun 18, 2021
76b4d93
Flake8
aleksandr-mokrov Jun 18, 2021
57134ea
Setup logging for experiment
aleksandr-mokrov Jun 18, 2021
611ac7c
Setup logging for experiment
aleksandr-mokrov Jun 18, 2021
5215a27
Change requirements for grpcio
aleksandr-mokrov Jun 18, 2021
6509327
Change requirements for grpcio
aleksandr-mokrov Jun 18, 2021
aa2dbb1
Change requirements for grpcio
aleksandr-mokrov Jun 18, 2021
7309737
Remove unused kwargs
aleksandr-mokrov Jun 18, 2021
f173298
Resolve aggregator serving
aleksandr-mokrov Jun 18, 2021
66bae5b
Resolve aggregator serving
aleksandr-mokrov Jun 18, 2021
95dca0c
flake8
aleksandr-mokrov Jun 19, 2021
83ccc2b
Merge branch 'director-envoy' of https://github.com/aleksandr-mokrov/…
igor-davidyuk Jun 21, 2021
4be4097
Working notebook
igor-davidyuk Jun 22, 2021
de67de4
Create Updated_Kvasir_with_Director.ipynb
igor-davidyuk Jun 22, 2021
a470212
lost fixes after merge
igor-davidyuk Jun 23, 2021
f2ee9ab
fixed notebook experimnet start
igor-davidyuk Jun 23, 2021
fdc3c16
Infrastructure for model retrieving
igor-davidyuk Jun 24, 2021
fe5d708
fixes to the director example
igor-davidyuk Jun 24, 2021
215c7d5
added requirements file for envoy
igor-davidyuk Jun 24, 2021
cf08c0c
Merge branch 'model-proto-ll' of https://github.com/igor-davidyuk/ope…
igor-davidyuk Jun 24, 2021
cff9e72
fixed metric logging
igor-davidyuk Jun 25, 2021
3b3adc4
Merge pull request #9 from igor-davidyuk/ll-get-trained-model
igor-davidyuk Jun 25, 2021
d9b915c
pkg-resources bug fix
igor-davidyuk Jun 25, 2021
529a2f2
pkg-resources fix
igor-davidyuk Jun 25, 2021
b1dfa1b
pip fix
igor-davidyuk Jun 25, 2021
a9f510e
Merge branch 'model-proto-ll' of https://github.com/igor-davidyuk/ope…
igor-davidyuk Jun 25, 2021
9f4846c
first try to run and stop aggregator async
igor-davidyuk Jun 28, 2021
4a2c027
added workspace cleaning in director and envoy
igor-davidyuk Jun 29, 2021
f5fd39a
removing workspace archive on Frontend API
igor-davidyuk Jun 29, 2021
107aee9
Update pki scheme to step-ca flow
maradionov Jun 11, 2021
8380aed
Draft ca integration
maradionov Jun 30, 2021
05892bd
Add ca to utils
maradionov Jul 1, 2021
e131206
Move ca funtions to separate component
maradionov Jul 1, 2021
7a28f85
Add certificates for api layer director and collaborator-manager
maradionov Jul 1, 2021
46e5047
Pass disable_tls option to all grpc clients and servers
maradionov Jul 2, 2021
a00bf21
Merge branch 'ca_integration' of https://github.com/maradionov/openfl…
igor-davidyuk Jul 2, 2021
1d692d6
New CLI and fixes for the step-ca component
igor-davidyuk Jul 5, 2021
cdb7bc3
Merge pull request #2 from igor-davidyuk/model-proto-ll
aleksandr-mokrov Jul 6, 2021
73731dc
flake8 fixes
igor-davidyuk Jul 6, 2021
fd11f49
Pass certificates name to collaborator and aggregator
maradionov Jul 6, 2021
3820673
Merge pull request #4 from igor-davidyuk/model-proto-ll
aleksandr-mokrov Jul 6, 2021
0f43fea
Merge branch 'director-envoy' into ca_integration
maradionov Jul 6, 2021
0a588fd
Fix rank_worldsize value
maradionov Jul 6, 2021
dcd33cf
Fix flake8
maradionov Jul 6, 2021
da095ae
flake8
aleksandr-mokrov Jul 7, 2021
1ad77bd
Separate install and run command for pki
maradionov Jul 9, 2021
7ad149d
Merge branch 'director-envoy' into ca_integration
maradionov Jul 9, 2021
28856da
Stream metrics to frontend RPC
igor-davidyuk Jul 9, 2021
6829446
UnpackWorkspace context manager
igor-davidyuk Jul 9, 2021
b9bc7b9
local changes
igor-davidyuk Jul 9, 2021
1cded21
Merge branch 'step_pki' into ca_integration
maradionov Jul 9, 2021
eaf04d4
Add step-ca pki mermaid scheme to documentation.
maradionov Jul 9, 2021
2d97e4a
Merge branch 'develop' into director-envoy
aleksandr-mokrov Jul 12, 2021
92918a7
Merge branch 'director-envoy' into pr/6
aleksandr-mokrov Jul 12, 2021
67ff80e
Merge pull request #7 from aleksandr-mokrov/pr/6
aleksandr-mokrov Jul 12, 2021
4723241
flake8
aleksandr-mokrov Jul 12, 2021
68aa44d
Merge pull request #8 from aleksandr-mokrov/pr/6
aleksandr-mokrov Jul 12, 2021
23fe1aa
Merge branch 'develop' into director-envoy
aleksandr-mokrov Jul 12, 2021
4e85590
Resolve merge conflicts
aleksandr-mokrov Jul 12, 2021
92cdd08
Merge branch 'director-envoy' into ca_integration
aleksandr-mokrov Jul 12, 2021
4ea604c
Merge pull request #5 from maradionov/ca_integration
aleksandr-mokrov Jul 12, 2021
7ae9456
Backward compatibility
aleksandr-mokrov Jul 12, 2021
ea7cfb0
Fix cert api
aleksandr-mokrov Jul 13, 2021
34fb1a7
make cert paths absolute
igor-davidyuk Jul 13, 2021
a02fb49
Merge pull request #9 from igor-davidyuk/director-envoy
aleksandr-mokrov Jul 13, 2021
f3abe86
Fix simultaneous requirements installation
aleksandr-mokrov Jul 13, 2021
8c4985c
Change default and max token and certificate durations
aleksandr-mokrov Jul 15, 2021
7aec2d7
Import order
aleksandr-mokrov Jul 15, 2021
3f79276
First implementation of API registry
igor-davidyuk Jul 16, 2021
011b617
Added metric logging on frontend
igor-davidyuk Jul 21, 2021
94506fd
Tensorboard logs: removed hardcoded name
igor-davidyuk Jul 21, 2021
f9edce1
Add description of pki workflow to documentation
maradionov Jul 21, 2021
45d0463
Merge pull request #11 from igor-davidyuk/api-registry-2
aleksandr-mokrov Jul 22, 2021
b63e413
Merge branch 'develop' into director-envoy
aleksandr-mokrov Jul 22, 2021
853edc3
Health check and flake8
aleksandr-mokrov Jul 22, 2021
df92aff
Fix pb2 import
aleksandr-mokrov Jul 22, 2021
a17f4c6
Add info about token live time and other.
maradionov Jul 22, 2021
d1bdee7
Merge pull request #12 from maradionov/ca_integration
aleksandr-mokrov Jul 23, 2021
44643ff
Add check for bin downloading responce
maradionov Jul 23, 2021
5021f54
Change link
aleksandr-mokrov Jul 23, 2021
822bcc7
Add ca_path as optional to certify command
aleksandr-mokrov Jul 23, 2021
8fb5e9c
Add ca_path as optional to certify command
aleksandr-mokrov Jul 23, 2021
ad6fc19
Replaced old shard registry by new
aleksandr-mokrov Jul 23, 2021
69b5b07
Fixes, renaming and error handling
igor-davidyuk Jul 26, 2021
8597ee8
Rename collaborator_manager -> envoy
igor-davidyuk Jul 27, 2021
ca58c4f
Merge pull request #13 from maradionov/ca_integration
aleksandr-mokrov Jul 27, 2021
e3d22ea
Merge pull request #14 from igor-davidyuk/director-unittests
aleksandr-mokrov Jul 27, 2021
4a64061
Configure a director address,fix certificates path checking, add dire…
aleksandr-mokrov Jul 27, 2021
d29be4b
Collaborator-manager renamed to envoy.
aleksandr-mokrov Jul 27, 2021
9c185a5
_read_image_ids is astatic method. No reason to call it from class.
aleksandr-mokrov Jul 27, 2021
105566b
Break after successful installation
aleksandr-mokrov Jul 27, 2021
5e315ba
Remove extra method.
aleksandr-mokrov Jul 27, 2021
93132d3
Additional certificate path checks.
aleksandr-mokrov Jul 27, 2021
fb60c8f
Separate a transport layer from the director. Running server details …
aleksandr-mokrov Jul 28, 2021
a936d0a
Director redesign
aleksandr-mokrov Jul 28, 2021
842e3c3
Fix pr comments
aleksandr-mokrov Jul 28, 2021
8dd2373
flake8
aleksandr-mokrov Jul 28, 2021
4672b91
Cleanup gitignore
aleksandr-mokrov Jul 28, 2021
9ead330
Cleanup
aleksandr-mokrov Jul 28, 2021
4c158b0
Small fixes by comments
aleksandr-mokrov Jul 28, 2021
118e3e2
Generate default experiment name with timestamp
aleksandr-mokrov Jul 28, 2021
f521f6a
Assign only online collaborator to experiment by default. GetRegister…
aleksandr-mokrov Jul 28, 2021
b480920
tests for components are added
igor-davidyuk Jul 28, 2021
52d44f6
created default shard_descriptor
igor-davidyuk Jul 29, 2021
56e6c5c
Refactor tensorboard metrics + default envoy files
igor-davidyuk Jul 29, 2021
b7de839
Add Dmitriy fixes for ca part
maradionov Jul 29, 2021
7ea94b8
Merge pull request #15 from igor-davidyuk/director-unittests
aleksandr-mokrov Jul 29, 2021
9ab7184
Merge pull request #16 from maradionov/ca_integration
aleksandr-mokrov Jul 29, 2021
7c89792
Fixes by pr comments
aleksandr-mokrov Jul 29, 2021
bdae2ee
Fixes by pr comments
aleksandr-mokrov Jul 29, 2021
48df455
Collaborator manager -> Envoy
aleksandr-mokrov Jul 30, 2021
1c8b487
Up -> up
aleksandr-mokrov Jul 30, 2021
c03864f
Collaborator selection for experiment by default changed
aleksandr-mokrov Jul 30, 2021
fc44f73
Gramma
aleksandr-mokrov Jul 30, 2021
2a0aa5f
Comments and doc strings
igor-davidyuk Jul 30, 2021
fc0a438
use delimiter for token
dmitryagapov Jul 30, 2021
c5584dc
fix typo
igor-davidyuk Jul 30, 2021
6fac3a2
typo fix
igor-davidyuk Jul 30, 2021
c69c156
Merge pull request #17 from igor-davidyuk/director-unittests
aleksandr-mokrov Jul 30, 2021
33db9f9
Merge pull request #19 from dmitryagapov/fix-pki-cert-token-creation
aleksandr-mokrov Jul 30, 2021
23f5035
Fix token encoding
aleksandr-mokrov Jul 30, 2021
1f10590
flake8
aleksandr-mokrov Jul 30, 2021
58c4479
Start federation fixes
aleksandr-mokrov Jul 30, 2021
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,8 +1,13 @@
*.egg-info
*.pkl
__pycache__
/build
/dist
.vscode
.ipynb_checkpoints
venv/*
.idea

*.jpg
*.crt
*.key
1 change: 1 addition & 0 deletions docs/advanced_topics.rst
@@ -14,5 +14,6 @@ Advanced Topics
compression_settings
overriding_agg_fn
bash_autocomplete_activation
log_metric_callback


69 changes: 69 additions & 0 deletions docs/log_metric_callback.rst
@@ -0,0 +1,69 @@
.. # Copyright (C) 2020-2021 Intel Corporation
.. # SPDX-License-Identifier: Apache-2.0

.. _log_metric_callback:
===============================
Metric logging callback
===============================

-------------------------------
Usage
-------------------------------
|productName| allows developers to use a custom metric logging function. This function is called on the aggregator node.
To define such a function, follow the instructions below.

Python API
==========
Define a function with the following signature:

.. code-block:: python

    def callback_name(node_name, task_name, metric_name, metric, round_number):
        """
        Write metric callback

        Args:
            node_name (str): Name of the node that generated the metric
            task_name (str): Name of the task
            metric_name (str): Name of the metric
            metric (np.ndarray): Metric value
            round_number (int): Round number
        """
        ...  # your code

CLI
====

Define the callback function in the ``src`` folder of your workspace, in the same way as for the Python API. Then provide the import path to your function in the ``aggregator`` section of the ``plan/plan.yaml`` file in your workspace, using the ``log_metric_callback`` key:

.. code-block:: yaml

    aggregator :
      defaults : plan/defaults/aggregator.yaml
      template : openfl.component.Aggregator
      settings :
        init_state_path : save/torch_cnn_mnist_init.pbuf
        best_state_path : save/torch_cnn_mnist_best.pbuf
        last_state_path : save/torch_cnn_mnist_last.pbuf
        rounds_to_train : 10
        log_metric_callback :
          template : src.mnist_utils.callback_name
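
The ``template`` value in the plan is a dotted import path. The following is a minimal sketch of how such a path could be resolved to a callable with ``importlib``, assuming the last component is the attribute name and everything before it is the module. ``resolve_template`` is a hypothetical helper written for illustration; how OpenFL itself loads the callback is not shown in this diff.

```python
import importlib


def resolve_template(template):
    """Resolve a dotted path such as 'package.module.attr' to an object.

    Hypothetical helper for illustration only, not an OpenFL function.
    """
    module_path, _, attr_name = template.rpartition('.')
    if not module_path:
        raise ValueError(f'expected a dotted path, got {template!r}')
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrated with a standard-library function, because the module
# 'src.mnist_utils' only exists inside a real workspace.
join = resolve_template('os.path.join')
print(join('save', 'model.pbuf'))
```

A typo in the plan would surface here as an ``ImportError`` or ``AttributeError`` at load time rather than in the middle of a training round.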



Example
=======================

Below is an example of a log metric callback that writes metric values to TensorBoard:

.. code-block:: python

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5)


    def write_metric(node_name, task_name, metric_name, metric, round_number):
        writer.add_scalar("{}/{}/{}".format(node_name, task_name, metric_name),
                          metric, round_number)

The full implementation can be found in ``openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb`` and in the ``torch_cnn_mnist`` workspace.
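
The documented signature does not tie the callback to TensorBoard. As a sketch, assuming a CSV file is an acceptable destination, the same five arguments can be appended as one row per metric. ``make_csv_metric_writer``, the column order, and the node and task names in the usage line are hypothetical; ``float(metric)`` assumes the metric is a scalar or a zero-dimensional array.

```python
import csv


def make_csv_metric_writer(path):
    """Return a callback with the documented signature that appends to a CSV file."""
    def write_metric(node_name, task_name, metric_name, metric, round_number):
        # Reopen in append mode on every call so earlier rows are already on disk.
        with open(path, 'a', newline='') as f:
            csv.writer(f).writerow(
                [round_number, node_name, task_name, metric_name, float(metric)]
            )
    return write_metric


# Illustrative call; the argument values are made up.
write_metric = make_csv_metric_writer('metrics.csv')
write_metric('aggregator', 'validate', 'acc', 0.91, 0)
```

Wrapping the path in a closure keeps the callback's signature unchanged, so it can be passed wherever ``write_metric`` from the example above is accepted.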
20 changes: 20 additions & 0 deletions docs/mermaid/pki_scheme.mmd
@@ -0,0 +1,20 @@
sequenceDiagram
Title: Collaborator Certificate Signing Flow
participant A as Aggregator
participant CA as CA
participant C as Collaborator
CA->>CA: 1. Create CA:<br>`step ca init --password-file pass_file`
CA->>CA: 2. Up HTTPS CA server:<br>`step_ca ca_config.json`
CA->>CA: 3. Generate JWK pair:<br>`step crypto jwk create pub.json priv.json --password-file pass_file`
CA->>CA: 4. Get JWT for aggregator:<br>`step ca token localhost --key priv.json --password-file pass_file --ca-url ca_url`
CA->>A: 5. Copy JWT to aggregator.
A->>CA: 6. Certify node:<br>`step ca certificate localhost agg.crt agg.key --token AbC1d2E..`
Note over A,CA: Get agg.crt
CA->>CA: 7. Get JWT for collaborator:<br>`step ca token col_name --key priv.json --password-file pass_file --ca-url ca_url`
CA->>C: 8. Copy JWT to collaborator.
C->>CA: 9. Certify node:<br>`step ca certificate col_name col_name.crt col_name.key --token AbC1d2E..`
Note over C,CA: Get col_name.crt
CA->>A: 10. Copy root_ca.crt to aggregator
Note over A,CA: This could be done at step 5 with token
CA->>C: 11. Copy root_ca.crt to collaborator
Note over C,CA: This could be done at step 8 with token
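
Steps 4 to 9 of the diagram rely on a short-lived token that binds a signing request to one subject. The following toy model sketches that idea only: it uses an HMAC over a JSON payload with a secret held by the CA as a stand-in for the JWK pair in the diagram, and it is not how step-ca issues or validates tokens. All function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(secret, subject, ttl_seconds, now=None):
    """CA side: create a signed token bound to a subject with an expiry time."""
    now = time.time() if now is None else now
    payload = json.dumps({'sub': subject, 'exp': now + ttl_seconds}).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + '.' + base64.urlsafe_b64encode(signature).decode())


def verify_token(secret, token, subject, now=None):
    """CA side: accept a signing request only for a valid, unexpired token."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split('.')
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False  # payload was altered or signed with another key
    claims = json.loads(payload)
    return claims['sub'] == subject and now < claims['exp']


secret = b'ca-only-secret'
token = issue_token(secret, 'col_name', ttl_seconds=60)
print(verify_token(secret, token, 'col_name'))
```

The subject check is what prevents a token issued for one collaborator from being used to certify a node under a different name.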
11 changes: 9 additions & 2 deletions docs/running_the_federation.certificates.rst
@@ -11,7 +11,8 @@ Therefore, security keys and certificates will need to be created for the
aggregator and collaborators
to negotiate the connection securely. For the :ref:`Hello Federation <running_the_federation>` demo
we will run the aggregator and collaborators on the same localhost server
so these configuration steps just need to be done once on that machine.
so these configuration steps just need to be done once on that machine. We have two pki
workflows: manual and semi-automatic (with step-ca).

.. note::

@@ -20,10 +21,16 @@ so these configuration steps just need to be done once on that machine.
.. _install_certs:

.. kroki:: mermaid/CSR_signing.mmd
:caption: Certificate generation and signing
:caption: Manual certificate generation and signing
:align: center
:type: mermaid

.. _install_certs:

.. kroki:: mermaid/pki_scheme.mmd
:caption: Step-ca certificate generation and signing
:align: center
:type: mermaid


.. _install_certs_agg:
52 changes: 52 additions & 0 deletions docs/running_the_federation.interactive_api.rst
@@ -23,6 +23,58 @@ Python Environment
===================
Create a virtual Python environment. Please, install only packages that are required for conducting the experiment, since Python environment will be replicated on collaborator nodes.

******************************************
Certification
******************************************
If you have a trusted workspace and the connection does not need to be encrypted, you can use the :code:`disable_tls` option when starting the experiment.
Otherwise, each node participating in the federation must be certified. Certificates make it possible to use mutual TLS connections between nodes.
You can certify nodes with your own PKI system or use the PKI provided by OpenFL. It is based on `step-ca <https://github.com/smallstep/certificates>`_
as the server and `step <https://github.com/smallstep/cli>`_ as the client utility. Both are downloaded from GitHub during workspace setup. Regardless of the certification method,
paths to the certificates on each node are provided when the experiment starts. The OpenFL PKI workflow is described below.

OpenFL PKI workflow
===================
The OpenFL PKI pipeline assumes a local CA with an HTTPS server that listens for signing requests.
A certificate for each node can be signed by sending a request to the CA server with a special token.
The token must be copied to each node in a secure way. Each step is described in detail below.

1. Create the CA, i.e. create the root key pair, the CA server config, and other files.

   .. code-block:: console

      $ fx pki install -p </path/to/ca/dir> --password <123> --ca-url <host:port>

   | :code:`-p` - path to the folder that will contain the CA files.
   | :code:`--password` - password used to encrypt some of the CA files.
   | :code:`--ca-url` - host and port on which the CA server will listen.

   This command also downloads the `step-ca <https://github.com/smallstep/certificates>`_ and `step <https://github.com/smallstep/cli>`_ binaries from GitHub.

2. Run the CA HTTPS server.

   .. code-block:: console

      $ fx pki run -p </path/to/ca/dir>

   | :code:`-p` - path to the folder that contains the CA files.

3. Get a token for a node.

   .. code-block:: console

      $ fx pki get-token -n <subject>

   | :code:`-n` - subject name: the FQDN for the director, the collaborator name for an envoy, or the API name for the API-layer node.

   Run this command on the CA side, from the CA folder. The output is a token that contains a JWT (JSON Web Token) from the CA server and the CA
   root certificate concatenated together. The JWT has a time-to-live of twenty-four hours.

4. Copy the token to the node (director or envoy) over a secure channel and run the certify command.

   .. code-block:: console

      $ fx pki certify -n <subject> -t <token>

   | :code:`-n` - subject name: the FQDN for the director, the collaborator name for an envoy, or the API name for the API-layer node.
   | :code:`-t` - the token output by the previous command.

   This command calls the step client to connect to the CA server over HTTPS.
   HTTPS is made possible by the root certificate that was copied along with the JWT.
   The server authenticates the client by the JWT, and the client authenticates the server by the root certificate.

The signed certificate and private key are now stored on the current node. The signed certificate has a time-to-live of one year. You should certify every node that will participate in the federation: the director, all envoys, and the API-layer node.
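
The four steps above can be collected in one place. The sketch below only builds the argument lists for the documented ``fx pki`` commands and prints them; it does not run anything. The helper name ``pki_commands`` and all argument values in the usage lines (directory, password, port, subject) are hypothetical. In practice the first three commands run on the CA host and ``certify`` runs on the node, with the token taken from the output of ``get-token``.

```python
import shlex


def pki_commands(ca_dir, password, ca_url, subject, token):
    """Return the four documented `fx pki` invocations as argv lists."""
    return {
        'install': ['fx', 'pki', 'install', '-p', ca_dir,
                    '--password', password, '--ca-url', ca_url],
        'run': ['fx', 'pki', 'run', '-p', ca_dir],
        'get-token': ['fx', 'pki', 'get-token', '-n', subject],
        'certify': ['fx', 'pki', 'certify', '-n', subject, '-t', token],
    }


# Placeholder values for illustration only.
commands = pki_commands('./ca', 'example-password', 'localhost:9123',
                        'envoy_one', '<token from get-token>')
for step, argv in commands.items():
    print(f'{step}: {shlex.join(argv)}')
```

Keeping the commands as argv lists rather than shell strings avoids quoting problems if they are later passed to ``subprocess.run``.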

******************************************
Defining a Federated Learning Experiment
******************************************
@@ -40,7 +40,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"After importing the required packages, the next step is setting up our openfl workspace. To do this, simply run the `fx.init()` command as follows:"
"After importing the required packages, the next step is setting Up our openfl workspace. To do this, simply run the `fx.init()` command as follows:"
]
},
{
58 changes: 32 additions & 26 deletions openfl-tutorials/Federated_PyTorch_UNET_Tutorial.ipynb
@@ -135,8 +135,11 @@
" def __init__(self, data_path, collaborator_count, collaborator_num, is_validation):\n",
" self.images_path = './data/segmented-images/images/'\n",
" self.masks_path = './data/segmented-images/masks/'\n",
" self.images_names = [img_name for img_name in sorted(listdir(\n",
" self.images_path)) if len(img_name) > 3 and img_name[-3:] == 'jpg']\n",
" self.images_names = [\n",
" img_name\n",
" for img_name in sorted(listdir(self.images_path))\n",
" if len(img_name) > 3 and img_name[-3:] == 'jpg'\n",
" ]\n",
"\n",
" self.images_names = self.images_names[collaborator_num:: collaborator_count]\n",
" self.is_validation = is_validation\n",
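
The slice ``[collaborator_num:: collaborator_count]`` in the hunk above gives each collaborator every ``collaborator_count``-th file name. A small sketch of that partitioning, assuming ``collaborator_num`` is zero-based (the diff does not show how it is passed in); ``shard`` and the file names are hypothetical.

```python
def shard(names, collaborator_num, collaborator_count):
    """Return the file names owned by one collaborator.

    Mirrors the notebook's filter and stride slice; collaborator_num is
    assumed to be zero-based here.
    """
    jpgs = [n for n in sorted(names) if len(n) > 3 and n[-3:] == 'jpg']
    return jpgs[collaborator_num::collaborator_count]


names = ['c.jpg', 'a.jpg', 'notes.txt', 'b.jpg', 'e.jpg', 'd.jpg']
shards = [shard(names, i, 2) for i in range(2)]
print(shards)  # [['a.jpg', 'c.jpg', 'e.jpg'], ['b.jpg', 'd.jpg']]
```

Because the list is sorted before slicing, every collaborator computes the same ordering and the shards are disjoint without any coordination.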
@@ -267,9 +270,9 @@
" return score.sum()\n",
"\n",
"\n",
"class double_conv(nn.Module):\n",
"class DoubleConv(nn.Module):\n",
" def __init__(self, in_ch, out_ch):\n",
" super(double_conv, self).__init__()\n",
" super(DoubleConv, self).__init__()\n",
" self.in_ch = in_ch\n",
" self.out_ch = out_ch\n",
" self.conv = nn.Sequential(\n",
@@ -286,33 +289,36 @@
" return x\n",
"\n",
"\n",
"class down(nn.Module):\n",
"class Down(nn.Module):\n",
" def __init__(self, in_ch, out_ch):\n",
" super(down, self).__init__()\n",
" super(Down, self).__init__()\n",
" self.mpconv = nn.Sequential(\n",
" nn.MaxPool2d(2), double_conv(in_ch, out_ch))\n",
" nn.MaxPool2d(2),\n",
" DoubleConv(in_ch, out_ch)\n",
" )\n",
"\n",
" def forward(self, x):\n",
" x = self.mpconv(x)\n",
" return x\n",
"\n",
"\n",
"class up(nn.Module):\n",
"class Up(nn.Module):\n",
" def __init__(self, in_ch, out_ch, bilinear=False):\n",
" super(up, self).__init__()\n",
" super(Up, self).__init__()\n",
" self.in_ch = in_ch\n",
" self.out_ch = out_ch\n",
" if bilinear:\n",
" self.up = nn.Upsample(\n",
" scale_factor=2, mode=\"bilinear\", align_corners=True)\n",
" else:\n",
" self.up = nn.ConvTranspose2d(\n",
" in_ch, in_ch // 2, 2, stride=2\n",
" self.Up = nn.Upsample(\n",
" scale_factor=2,\n",
" mode=\"bilinear\",\n",
" align_corners=True\n",
" )\n",
" self.conv = double_conv(in_ch, out_ch)\n",
" else:\n",
" self.Up = nn.ConvTranspose2d(in_ch, in_ch // 2, 2, stride=2)\n",
" self.conv = DoubleConv(in_ch, out_ch)\n",
"\n",
" def forward(self, x1, x2):\n",
" x1 = self.up(x1)\n",
" x1 = self.Up(x1)\n",
" diffY = x2.size()[2] - x1.size()[2]\n",
" diffX = x2.size()[3] - x1.size()[3]\n",
"\n",
@@ -327,15 +333,15 @@
"class UNet(nn.Module):\n",
" def __init__(self, n_channels=3, n_classes=1):\n",
" super().__init__()\n",
" self.inc = double_conv(n_channels, 64)\n",
" self.down1 = down(64, 128)\n",
" self.down2 = down(128, 256)\n",
" self.down3 = down(256, 512)\n",
" self.down4 = down(512, 1024)\n",
" self.up1 = up(1024, 512)\n",
" self.up2 = up(512, 256)\n",
" self.up3 = up(256, 128)\n",
" self.up4 = up(128, 64)\n",
" self.inc = DoubleConv(n_channels, 64)\n",
" self.down1 = Down(64, 128)\n",
" self.down2 = Down(128, 256)\n",
" self.down3 = Down(256, 512)\n",
" self.down4 = Down(512, 1024)\n",
" self.up1 = Up(1024, 512)\n",
" self.up2 = Up(512, 256)\n",
" self.up3 = Up(256, 128)\n",
" self.up4 = Up(128, 64)\n",
" self.outc = nn.Conv2d(64, n_classes, 1)\n",
"\n",
" def forward(self, x):\n",
@@ -558,4 +564,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
33 changes: 30 additions & 3 deletions openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb
@@ -49,7 +49,7 @@
"outputs": [],
"source": [
"#Setup default workspace, logging, etc.\n",
"fx.init('torch_cnn_mnist')"
"fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')"
]
},
{
@@ -124,6 +124,29 @@
" return F.cross_entropy(input=output,target=target)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we can define a metric logging function. It should have the signature shown below. You can use it to write metrics to TensorBoard or to any other logging backend."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torch.utils.tensorboard import SummaryWriter\n",
"\n",
"writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5)\n",
"\n",
"\n",
"def write_metric(node_name, task_name, metric_name, metric, round_number):\n",
" writer.add_scalar(\"{}/{}/{}\".format(node_name, task_name, metric_name),\n",
" metric, round_number)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -205,8 +228,12 @@
"metadata": {},
"outputs": [],
"source": [
"#Run experiment, return trained FederatedModel\n",
"final_fl_model = fx.run_experiment(collaborators,{'aggregator.settings.rounds_to_train':5})"
"# Run experiment, return trained FederatedModel\n",
"\n",
"final_fl_model = fx.run_experiment(collaborators, override_config={\n",
" 'aggregator.settings.rounds_to_train': 5,\n",
" 'aggregator.settings.log_metric_callback': write_metric,\n",
"})"
]
},
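
The ``override_config`` argument in the cell above uses dotted keys such as ``aggregator.settings.rounds_to_train``. A sketch of how dotted keys can be applied to a nested plan dictionary, assuming each dot separates one nesting level; ``apply_overrides`` is a hypothetical helper and is not taken from OpenFL, whose own handling is not part of this diff.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted-key overrides applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split('.')
        node = result
        for key in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


plan = {'aggregator': {'settings': {'rounds_to_train': 10}}}
updated = apply_overrides(plan, {
    'aggregator.settings.rounds_to_train': 5,
    'aggregator.settings.log_metric_callback': print,
})
print(updated['aggregator']['settings']['rounds_to_train'])  # 5
```

Working on a deep copy leaves the default plan untouched, so several experiments can be derived from the same base configuration.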
{
@@ -0,0 +1,4 @@
settings:
listen_ip: localhost
sample_shape: ['300', '400', '3']
target_shape: ['300', '400']
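
The shapes in this config are lists of quoted strings, so a consumer has to convert them before comparing against array shapes. A minimal sketch of that conversion, assuming every entry is a positive integer written as a string; ``parse_shape`` is a hypothetical helper, and the dictionary simply repeats the values from the YAML above.

```python
def parse_shape(raw):
    """Convert a list of numeric strings such as ['300', '400', '3'] to a tuple of ints."""
    shape = tuple(int(dim) for dim in raw)
    if any(dim <= 0 for dim in shape):
        raise ValueError(f'shape dimensions must be positive, got {shape}')
    return shape


settings = {
    'listen_ip': 'localhost',
    'sample_shape': ['300', '400', '3'],
    'target_shape': ['300', '400'],
}
print(parse_shape(settings['sample_shape']))  # (300, 400, 3)
```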
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml
@@ -0,0 +1,4 @@
#!/bin/bash
set -e
FQDN=$1
fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/"${FQDN}".key -oc cert/"${FQDN}".crt
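
The two start scripts differ only in whether TLS is disabled or certificate paths are passed. The sketch below builds either command line from the flags shown in the scripts, without executing it. ``director_start_argv`` is a hypothetical helper; the ``cert/<FQDN>.key`` and ``cert/<FQDN>.crt`` layout is copied from the second script.

```python
def director_start_argv(config, fqdn=None, cert_dir='cert'):
    """Build the `fx director start` command line used by the two scripts.

    With no FQDN the TLS-disabled form is returned; otherwise the
    certificate paths follow the cert/<FQDN>.key and cert/<FQDN>.crt layout.
    """
    argv = ['fx', 'director', 'start']
    if fqdn is None:
        return argv + ['--disable-tls', '-c', config]
    return argv + [
        '-c', config,
        '-rc', f'{cert_dir}/root_ca.crt',
        '-pk', f'{cert_dir}/{fqdn}.key',
        '-oc', f'{cert_dir}/{fqdn}.crt',
    ]


print(director_start_argv('director_config.yaml'))
print(director_start_argv('director_config.yaml', fqdn='director.example.com'))
```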