# Part 5 - Welcome to the Sandbox

In the previous tutorials, we initialized our hook and all of our workers by hand every time. This gets tedious when you're just experimenting with or learning the interfaces. So, from here on out, we'll create these same variables with a special convenience function.

In [1]:
import torch
import syft as sy
sy.create_sandbox(globals())

Falling back to insecure randomness since the required custom op could not be found for the installed version of TensorFlow. Fix this by compiling custom ops. Missing file was '/opt/conda/lib/python3.7/site-packages/tf_encrypted/operations/secure_random/secure_random_module_tf_1.15.4.so'



Setting up Sandbox...
	- Hooking PyTorch
	- Creating Virtual Workers:
		- bob
		- theo
		- jason
		- alice
		- andy
		- jon
	Storing hook and workers as global variables...
	Loading datasets from SciKit Learn...
		- Boston Housing Dataset
		- Diabetes Dataset
		- Breast Cancer Dataset
		- Digits Dataset
		- Iris Dataset
		- Wine Dataset
		- Linnerud Dataset
	Distributing Datasets Amongst Workers...
	Collecting workers into a VirtualGrid...
Done!


### What does the sandbox give us?

As you can see above, we created several virtual workers and loaded several test datasets, distributing them across the workers so that we can practice privacy-preserving techniques such as Federated Learning.
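The log does not say how each dataset is divided among the workers. One reading that is consistent with the `torch.Size([84, 13])` shape reported for bob's Boston Housing tensor (the full dataset has 506 rows, and 506 // 6 = 84) is an equal contiguous split that drops the remainder. The sketch below illustrates that idea in plain Python; `shard_rows` is a hypothetical helper and the splitting rule is an assumption, not the library's documented behavior.

```python
def shard_rows(rows, worker_ids):
    """Split rows into equal contiguous shards, one per worker.

    Rows that do not divide evenly are dropped (an assumption made
    for this sketch, not a documented sandbox rule).
    """
    shard_size = len(rows) // len(worker_ids)
    return {
        worker_id: rows[i * shard_size:(i + 1) * shard_size]
        for i, worker_id in enumerate(worker_ids)
    }


worker_ids = ["bob", "theo", "jason", "alice", "andy", "jon"]
rows = list(range(506))  # stand-in for the 506 rows of Boston Housing
shards = shard_rows(rows, worker_ids)

# Every worker ends up with 84 rows under this rule.
print({worker_id: len(shard) for worker_id, shard in shards.items()})
```

With real tensors, each shard would then be tagged and sent to its worker, so that no single worker holds the whole dataset.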

We created six workers:

In [2]:
workers

[<VirtualWorker id:bob #objects:14>,
 <VirtualWorker id:theo #objects:14>,
 <VirtualWorker id:jason #objects:14>,
 <VirtualWorker id:alice #objects:14>,
 <VirtualWorker id:andy #objects:14>,
 <VirtualWorker id:jon #objects:14>]

We also populated several global variables that we can use right away:

In [3]:
hook

<syft.frameworks.torch.hook.hook.TorchHook at 0x7f1dd28361d0>

In [4]:
bob

<VirtualWorker id:bob #objects:14>
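Names such as `hook`, `bob`, and `workers` are available without any assignment on our side because we passed `globals()` into the sandbox call. The following is a minimal sketch of that general Python pattern, not PySyft's actual implementation; `create_toy_sandbox` and the dictionary-based workers are hypothetical stand-ins.

```python
def create_toy_sandbox(namespace, worker_names=("bob", "alice")):
    """Populate `namespace` with toy stand-ins for the sandbox variables."""
    workers = [{"id": name, "objects": {}} for name in worker_names]
    for worker in workers:
        # Each worker becomes reachable under its own name.
        namespace[worker["id"]] = worker
    # The full list is stored as well.
    namespace["workers"] = workers


# A plain dict is used here so the effect is easy to inspect.
# In a notebook, passing globals() would define the names directly.
scope = {}
create_toy_sandbox(scope)
print(sorted(scope))  # ['alice', 'bob', 'workers']
```

Because the function mutates the dictionary it receives, anything it stores there becomes a variable in the caller's scope when that dictionary is `globals()`.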

You can view the pre-populated datasets on a given worker by doing the following:

In [5]:
bob._objects

{1892113926: tensor([[6.3200e-03, 1.8000e+01, 2.3100e+00,  ..., 1.5300e+01, 3.9690e+02,
          4.9800e+00],
         [2.7310e-02, 0.0000e+00, 7.0700e+00,  ..., 1.7800e+01, 3.9690e+02,
          9.1400e+00],
         [2.7290e-02, 0.0000e+00, 7.0700e+00,  ..., 1.7800e+01, 3.9283e+02,
          4.0300e+00],
         ...,
         [4.4620e-02, 2.5000e+01, 4.8600e+00,  ..., 1.9000e+01, 3.9563e+02,
          7.2200e+00],
         [3.6590e-02, 2.5000e+01, 4.8600e+00,  ..., 1.9000e+01, 3.9690e+02,
          6.7200e+00],
         [3.5510e-02, 2.5000e+01, 4.8600e+00,  ..., 1.9000e+01, 3.9064e+02,
          7.5100e+00]])
 	Tags: #housing #data .. _boston_dataset: #boston #boston_housing 
 	Description: .. _boston_dataset:...
 	Shape: torch.Size([84, 13]),
 30647330823: tensor([24.0000, 21.6000, 34.7000, 33.4000, 36.2000, 28.7000, 22.9000, 27.1000,
         16.5000, 18.9000, 15.0000, 18.9000, 21.7000, 20.4000, 18.2000, 19.9000,
         23.1000, 17.5000, 20.2000, 18.2000, 13.6000, 19.6000, 15.2

# Part 2: Worker Search Functionality

One important aspect of remote data science is the ability to search for datasets on a remote machine. Think of a research lab that wants to query hospitals for, say, "radio" datasets.

In [6]:
x = torch.tensor([1,2,3,4,5]).tag("#radio", "#hospital1").describe("The input datapoints to the hospital1 dataset.")
y = torch.tensor([5,4,3,2,1]).tag("#radio", "#hospital2").describe("The input datapoints to the hospital2 dataset.")
z = torch.tensor([1,2,3,4,5]).tag("#fun", "#mnist").describe("The images in the MNIST training dataset.")

In [7]:
x

tensor([1, 2, 3, 4, 5])
	Tags: #radio #hospital1 
	Description: The input datapoints to the hospital1 dataset....
	Shape: torch.Size([5])

In [8]:
x = x.send(bob)
y = y.send(bob)
z = z.send(bob)

# This searches for an exact match within a tag or within the description
results = bob.search(["#radio"])

In [9]:
results

[tensor([1, 2, 3, 4, 5])
 	Tags: #radio #hospital1 
 	Description: The input datapoints to the hospital1 dataset....
 	Shape: torch.Size([5]),
 tensor([5, 4, 3, 2, 1])
 	Tags: #radio #hospital2 
 	Description: The input datapoints to the hospital2 dataset....
 	Shape: torch.Size([5])]

In [10]:
print(results[0].description)

The input datapoints to the hospital1 dataset.
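To make the matching rule concrete, here is a small plain-Python model of a worker's search. It assumes that an object is returned only when every query term is either one of its tags or appears in its description; `ToyWorker` and its methods are hypothetical and only mimic the behavior seen in the cells above.

```python
class ToyWorker:
    """A toy stand-in for a worker that stores tagged objects."""

    def __init__(self, worker_id):
        self.id = worker_id
        self.objects = []

    def store(self, value, tags=(), description=""):
        self.objects.append(
            {"value": value, "tags": set(tags), "description": description}
        )

    def search(self, query):
        # Accept a single string as well as a list of terms.
        if isinstance(query, str):
            query = [query]
        # Keep objects for which every term matches a tag or the description.
        return [
            obj for obj in self.objects
            if all(
                term in obj["tags"] or term in obj["description"]
                for term in query
            )
        ]


toy_bob = ToyWorker("bob")
toy_bob.store([1, 2, 3, 4, 5], tags=["#radio", "#hospital1"],
              description="The input datapoints to the hospital1 dataset.")
toy_bob.store([5, 4, 3, 2, 1], tags=["#radio", "#hospital2"],
              description="The input datapoints to the hospital2 dataset.")
toy_bob.store([1, 2, 3, 4, 5], tags=["#fun", "#mnist"],
              description="The images in the MNIST training dataset.")

print(len(toy_bob.search(["#radio"])))                # 2
print(len(toy_bob.search(["#radio", "#hospital1"])))  # 1
```

Under this assumption, adding terms to the query narrows the result set, which is why a two-term query is a handy way to pick out one specific dataset.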


Similarly, you can also search for datasets that are pre-populated on the sandbox workers.

In [11]:
boston_housing_results = bob.search(["#boston", "#housing"])

In [12]:
boston_housing_results

[tensor([[6.3200e-03, 1.8000e+01, 2.3100e+00,  ..., 1.5300e+01, 3.9690e+02,
          4.9800e+00],
         [2.7310e-02, 0.0000e+00, 7.0700e+00,  ..., 1.7800e+01, 3.9690e+02,
          9.1400e+00],
         [2.7290e-02, 0.0000e+00, 7.0700e+00,  ..., 1.7800e+01, 3.9283e+02,
          4.0300e+00],
         ...,
         [4.4620e-02, 2.5000e+01, 4.8600e+00,  ..., 1.9000e+01, 3.9563e+02,
          7.2200e+00],
         [3.6590e-02, 2.5000e+01, 4.8600e+00,  ..., 1.9000e+01, 3.9690e+02,
          6.7200e+00],
         [3.5510e-02, 2.5000e+01, 4.8600e+00,  ..., 1.9000e+01, 3.9064e+02,
          7.5100e+00]])
 	Tags: #housing #data .. _boston_dataset: #boston #boston_housing 
 	Description: .. _boston_dataset:...
 	Shape: torch.Size([84, 13]),
 tensor([24.0000, 21.6000, 34.7000, 33.4000, 36.2000, 28.7000, 22.9000, 27.1000,
         16.5000, 18.9000, 15.0000, 18.9000, 21.7000, 20.4000, 18.2000, 19.9000,
         23.1000, 17.5000, 20.2000, 18.2000, 13.6000, 19.6000, 15.2000, 14.5000,
         15

# Part 3: Virtual Grid

A Grid is simply a collection of workers that gives you some convenience functions for assembling a dataset.

In [13]:
grid = sy.PrivateGridNetwork(*workers)

In [14]:
results = grid.search("#boston")

In [15]:
boston_data = grid.search("#boston","#data")

In [16]:
boston_target = grid.search("#boston","#target")
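Conceptually, a grid search can be pictured as running the same query on every worker and collecting the hits per worker. The sketch below models that fan-out in plain Python. The result shape (a dictionary keyed by worker id), the tag-only matching, and the helper names `search_worker` and `grid_search` are assumptions made for illustration, not the library's confirmed behavior.

```python
def search_worker(worker, query):
    """Return the worker's objects whose tags contain every query term."""
    return [
        obj for obj in worker["objects"]
        if all(term in obj["tags"] for term in query)
    ]


def grid_search(workers, *query):
    """Run the same query on every worker and group the hits by worker id."""
    results = {}
    for worker in workers:
        matches = search_worker(worker, query)
        if matches:  # skip workers that hold nothing relevant
            results[worker["id"]] = matches
    return results


toy_workers = [
    {"id": "bob", "objects": [
        {"tags": {"#boston", "#data"}, "value": [0.1, 0.2]},
        {"tags": {"#boston", "#target"}, "value": [24.0]},
    ]},
    {"id": "alice", "objects": [
        {"tags": {"#iris", "#data"}, "value": [5.1]},
    ]},
]

toy_data = grid_search(toy_workers, "#boston", "#data")
toy_target = grid_search(toy_workers, "#boston", "#target")
print(sorted(toy_data))          # ['bob']
print(toy_target["bob"][0]["value"])  # [24.0]
```

Searching for the data and the target separately, as in the last two cells, yields two per-worker collections that can be paired up to form a distributed training set.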