# Data Owner 03

Outline of what DO1 will do:

0. Set up a local SyftBox network for local experimentation (only needed when testing locally)
1. DO1 logs into the datasite as an admin
2. DO1 creates a Syft dataset
3. DO1 reviews and runs jobs submitted by data scientists on DO1's private data

## 0. Set up a local SyftBox network for local experimentation

This sets up a local SyftBox directory structure under `./local_syftbox_network` so that the whole flow can be tested locally. The structure is complete once all 3 clients have set up their datasites.

In [None]:
import os
from pathlib import Path

from syft_rds.orchestra import remove_rds_stack_dir, setup_rds_server

remove_rds_stack_dir(root_dir=Path("."), key="local_syftbox_network")

DO_EMAIL = "do1@openmined.org"
do_stack = setup_rds_server(
    email=DO_EMAIL, root_dir=Path("."), key="local_syftbox_network"
)

os.environ["SYFTBOX_CLIENT_CONFIG_PATH"] = str(do_stack.client.config_path)

## 1. DO1 logs into the datasite as an admin

In [None]:
do1 = do_stack.init_session(host=DO_EMAIL)

In [None]:
do1.is_admin

## 2. DO1 creates a dataset

First, DO1 prepares a dataset (here the `statpearls` corpus) with a mock (fake / synthetic) part and a real, private part.

In [None]:
from pathlib import Path

CORPUS_NAME = "statpearls"
DATASET_DIR = (
    Path(f"../data_processing/processed_data/{CORPUS_NAME}").expanduser().absolute()
)
PRIVATE_PATH = DATASET_DIR / "private"
MOCK_PATH = DATASET_DIR / "mock"
README_PATH = DATASET_DIR / "README.md"

assert DATASET_DIR.exists()
assert PRIVATE_PATH.exists()
assert MOCK_PATH.exists()
# The README is passed as description_path below, so check it as well
assert README_PATH.exists()
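
The assertions above only confirm that the directories exist. As an optional extra check, the sketch below compares the file layout of the private and mock directories, on the assumption that the mock part is meant to mirror the private part file for file. `relative_files` and `compare_layouts` are hypothetical helpers written for this notebook; they are not part of `syft_rds`.

```python
from pathlib import Path


def relative_files(root: Path) -> set[str]:
    """Return the POSIX-style relative paths of all files under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_layouts(private_dir: Path, mock_dir: Path) -> dict[str, list[str]]:
    """Report files that exist on only one side of a private/mock pair."""
    private_files = relative_files(private_dir)
    mock_files = relative_files(mock_dir)
    return {
        "only_in_private": sorted(private_files - mock_files),
        "only_in_mock": sorted(mock_files - private_files),
    }
```

Calling `compare_layouts(PRIVATE_PATH, MOCK_PATH)` returns two lists; if both are empty, the two directories contain the same relative file names. Only names are compared, never file contents.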

DO1 creates a Syft dataset. The mock part is uploaded to the datasite and is public to the SyftBox network, while the private part stays local and is never shared.

In [None]:
dataset = do1.dataset.create(
    name=CORPUS_NAME,
    path=PRIVATE_PATH,
    mock_path=MOCK_PATH,  # the public mock copy, not the private data
    description_path=README_PATH,
)
dataset.describe()

## 3. Review and Run Jobs

After the data scientist (DS) submits a job, it appears on DO1's datasite, where DO1 can review it.

In [None]:
jobs = do1.job.get_all(status="pending_code_review")
jobs

In [None]:
job = jobs[0]
job

In [None]:
# same as job.code.describe()
job.show_user_code()
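
When reviewing submitted code by eye, it can help to first list which modules the code imports. The sketch below does this with Python's standard `ast` module on a plain source string. `imported_modules` is a hypothetical helper, not part of `syft_rds`, and how to obtain the source text from a job object is not shown here, since this notebook only displays it through `job.show_user_code()`.

```python
import ast


def imported_modules(source: str) -> list[str]:
    """Return the sorted top-level module names imported by the given source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy.linalg as la" contributes "numpy"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z" contributes "x"; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return sorted(modules)
```

This only catches static `import` statements. Dynamic imports such as `importlib.import_module(...)` or `__import__(...)` are not detected, so the list supplements a manual review rather than replacing it.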

By calling `do1.run_private(job)`, DO1 runs the `syft_flwr` client code, which trains the model received from the aggregator on DO1's private data and then sends the updated model back to the aggregator. This repeats for multiple rounds.

In [None]:
res_job = do1.run_private(job)
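
To make the round structure concrete, here is a toy sketch of the train-and-send-back loop in plain Python. It is not the `syft_flwr` or Flower implementation: the model is a single weight for `y = weight * x`, each client takes one gradient step per round, and the aggregator takes an unweighted average of the client updates. All names (`local_update`, `run_rounds`) and the learning rate are illustrative assumptions.

```python
def local_update(weight: float, data: list[tuple[float, float]], lr: float = 0.05) -> float:
    """One gradient step on mean squared error for the model y = weight * x."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad


def run_rounds(client_data: list[list[tuple[float, float]]], rounds: int) -> float:
    """Toy aggregator loop: broadcast the weight, collect updates, average them."""
    global_weight = 0.0
    for _ in range(rounds):
        # Each client trains on its own data; only the updated weight leaves the client
        updates = [local_update(global_weight, data) for data in client_data]
        # Unweighted average of the client updates (a simplification)
        global_weight = sum(updates) / len(updates)
    return global_weight
```

For example, `run_rounds([[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]], rounds=50)` simulates two clients whose data follow `y = 3x`; the averaged weight approaches 3 without either client's data points being passed to the aggregator.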