<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#A-Custom-Block" data-toc-modified-id="A-Custom-Block-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>A Custom Block</a></span></li><li><span><a href="#The-Sequential-Block" data-toc-modified-id="The-Sequential-Block-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>The Sequential Block</a></span></li><li><span><a href="#Executing-Code-in-the-forward-Method" data-toc-modified-id="Executing-Code-in-the-forward-Method-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Executing Code in the forward Method</a></span></li><li><span><a href="#PARAMETER" data-toc-modified-id="PARAMETER-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>PARAMETER</a></span></li><li><span><a href="#Parameter-Access" data-toc-modified-id="Parameter-Access-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Parameter Access</a></span></li><li><span><a href="#Targeted-Parameters" data-toc-modified-id="Targeted-Parameters-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Targeted Parameters</a></span></li><li><span><a href="#All-Parameters-at-Once" data-toc-modified-id="All-Parameters-at-Once-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>All Parameters at Once</a></span></li><li><span><a href="#Collecting-Parameters-from-Nested-Blocks" data-toc-modified-id="Collecting-Parameters-from-Nested-Blocks-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Collecting Parameters from Nested Blocks</a></span></li><li><span><a href="#Parameter-Initialization" data-toc-modified-id="Parameter-Initialization-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Parameter Initialization</a></span></li><li><span><a href="#Custom-Initialization" data-toc-modified-id="Custom-Initialization-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>Custom Initialization</a></span></li><li><span><a href="#Tied-Parameters" data-toc-modified-id="Tied-Parameters-11"><span class="toc-item-num">11&nbsp;&nbsp;</span>Tied Parameters</a></span></li><li><span><a 
href="#Loading-and-Saving-Tensors" data-toc-modified-id="Loading-and-Saving-Tensors-12"><span class="toc-item-num">12&nbsp;&nbsp;</span>Loading and Saving Tensors</a></span></li><li><span><a href="#Loading-and-Saving-Model-Parameters" data-toc-modified-id="Loading-and-Saving-Model-Parameters-13"><span class="toc-item-num">13&nbsp;&nbsp;</span>Loading and Saving Model Parameters</a></span></li></ul></div>

In [1]:
import mxnet
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import np, npx
npx.set_np()

## A Custom Block

In [2]:
class MLP(nn.Block):
    def __init__(self,**kwargs):
        super().__init__(**kwargs)
        self.lr1=nn.Dense(50,activation='relu')
        self.lr2=nn.Dense(20,activation='relu')
        self.l3=nn.Dense(1)
        
    def forward(self,x):
        h1=self.lr1(x)
        h2=self.lr2(h1)
        output=self.l3(h2)
        return output

In [3]:
x = np.random.uniform(size=(5,3))
x

array([[0.5488135 , 0.5928446 , 0.71518934],
       [0.84426576, 0.60276335, 0.8579456 ],
       [0.5448832 , 0.8472517 , 0.4236548 ],
       [0.6235637 , 0.6458941 , 0.3843817 ],
       [0.4375872 , 0.2975346 , 0.891773  ]])

In [4]:
net=MLP()
net.initialize()

In [5]:
net(x)

array([[-0.00207408],
       [-0.00273135],
       [-0.00165528],
       [-0.00164686],
       [-0.00209926]])

In [6]:
net.summary(x)

--------------------------------------------------------------------------------
        Layer (type)                                Output Shape         Param #
               Input                                      (5, 3)               0
        Activation-1                                     (5, 50)               0
             Dense-2                                     (5, 50)             200
        Activation-3                                     (5, 20)               0
             Dense-4                                     (5, 20)            1020
             Dense-5                                      (5, 1)              21
               MLP-6                                      (5, 1)               0
Parameters in forward computation graph, duplicate included
   Total params: 1241
   Trainable params: 1241
   Non-trainable params: 0
Shared params in forward computation graph: 0
Unique parameters in model: 1241
---------------------------------------------------------

In [7]:
4*50  # Dense-2: (3 inputs + 1 bias) * 50 units

200

In [8]:
51*20  # Dense-4: (50 inputs + 1 bias) * 20 units

1020
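
The `Param #` column in the summary can be checked by hand: a dense layer with `in_units` inputs and `units` outputs holds an `(in_units, units)` weight matrix plus `units` biases, i.e. `(in_units + 1) * units` values. A small sketch in plain Python (the helper name `dense_param_count` is ours, not part of Gluon):

```python
def dense_param_count(in_units, units):
    """Weights (in_units * units) plus one bias per output unit."""
    return (in_units + 1) * units

# Layer sizes of the first MLP above: 3 -> 50 -> 20 -> 1
layer_sizes = [(3, 50), (50, 20), (20, 1)]
counts = [dense_param_count(i, u) for i, u in layer_sizes]
print(counts)       # [200, 1020, 21]
print(sum(counts))  # 1241
```

The three counts match the `Dense-2`, `Dense-4` and `Dense-5` rows, and their sum matches `Total params: 1241`.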

In [9]:
class Linear(nn.Block):
    def __init__(self,in_units,units):
        super().__init__()
        with self.name_scope():
            self.units=units
            self.in_units=in_units
            self.weight=self.params.get('weight',init=mxnet.init.Normal(sigma=0.5),
                                    shape=(self.in_units,self.units))
            self.bias=self.params.get('bias',init=mxnet.init.Normal(sigma=0.5),shape=(self.units,))
    def forward(self,x):
        # Affine transformation followed by ReLU
        linear=np.dot(x,self.weight.data())+self.bias.data()
        output=npx.relu(linear)
        return output

In [10]:
class MLP(nn.Block):
    def __init__(self):
        super().__init__()
        self.l1=Linear(units=7,in_units=3)
        self.l2=Linear(units=2,in_units=7)
        self.l3=Linear(units=4,in_units=2)
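    def forward(self,x):
        # Linear already applies ReLU, so the extra npx.relu calls leave values unchanged
        h1=npx.relu(self.l1(x))
        # Dropout is only active in training mode (e.g. inside autograd.record())
        h1=npx.dropout(data=h1,p=0.5)
        h2=npx.relu(self.l2(h1))
        output=self.l3(h2)
        return output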

In [11]:
mlp=MLP()
mlp.collect_params().initialize()

In [12]:
x = np.random.rand(5,3)

In [13]:
mlp.collect_params()

mlp1_ (
  Parameter linear0_weight (shape=(3, 7), dtype=<class 'numpy.float32'>)
  Parameter linear0_bias (shape=(7,), dtype=<class 'numpy.float32'>)
  Parameter linear1_weight (shape=(7, 2), dtype=<class 'numpy.float32'>)
  Parameter linear1_bias (shape=(2,), dtype=<class 'numpy.float32'>)
  Parameter linear2_weight (shape=(2, 4), dtype=<class 'numpy.float32'>)
  Parameter linear2_bias (shape=(4,), dtype=<class 'numpy.float32'>)
)

In [14]:
mlp.collect_params()['linear0_weight'].data()

array([[ 0.90511256,  0.1217143 ,  0.11279303, -0.9587378 ,  1.2084526 ,
        -0.36452514,  0.7567334 ],
       [-0.6180372 ,  0.5950067 , -0.62458515, -0.5837988 ,  0.50898963,
         0.3497631 ,  0.08811765],
       [-0.97152925, -0.11385177, -0.3536062 , -0.7891857 ,  0.33093628,
        -0.650588  ,  0.49574193]])

In [15]:
mlp(x)

array([[0.        , 0.7017477 , 0.        , 0.06274574],
       [0.        , 0.71423846, 0.        , 0.15500778],
       [0.        , 0.45687222, 0.        , 0.1652804 ],
       [0.        , 0.68415844, 0.        , 0.01396903],
       [0.        , 0.72619736, 0.        , 0.07520926]])

In [16]:
class MyDense(nn.Block):
    def __init__(self, units, in_units, **kwargs):
        super().__init__(**kwargs)
        self.weight = self.params.get('weight', shape=(in_units, units))
        self.bias = self.params.get('bias', shape=(units,))
    def forward(self, x):
        linear = np.dot(x, self.weight.data(ctx=x.ctx)) + self.bias.data(ctx=x.ctx)
        return npx.relu(linear)

In [17]:
net = nn.Sequential()
net.add(MyDense(8, in_units=6),
        MyDense(4, in_units=8))
net.initialize()

In [18]:
net(np.random.uniform(size=(2, 6)))

array([[0.01086344, 0.        , 0.03279249, 0.02536564],
       [0.00875435, 0.        , 0.0257882 , 0.02083994]])

## The Sequential Block

The following MySequential class delivers the same functionality as Gluon's default Sequential class:

In [19]:
class MySequential(nn.Block):
    def add(self, block):
        # Here, block is an instance of a Block subclass, and we assume it has
        # a unique name. We save it in the member variable _children of the
        # Block class, and its type is OrderedDict. When the MySequential
        # instance calls the initialize function, the system automatically
        # initializes all members of _children
        self._children[block.name] = block
    def forward(self, x):
        # OrderedDict guarantees that members will be traversed in the order
        # they were added
        for block in self._children.values():
            x = block(x)
        return x

In [20]:
net=MySequential()
net.add(nn.Dense(50,activation='relu'))
net.add(nn.Dense(20,activation='relu'))
net.add(nn.Dense(1))
net.initialize()

In [21]:
net(x)

array([[0.00104705],
       [0.00077609],
       [0.00030453],
       [0.00107107],
       [0.00115779]])
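
The chaining idea behind `MySequential` does not depend on Gluon. As a minimal sketch in plain Python, assuming each "block" is just a callable and using a hypothetical `TinySequential` class (not a Gluon API), an insertion-ordered dict is enough to feed each output into the next block:

```python
from collections import OrderedDict

class TinySequential:
    """Hypothetical stand-in for MySequential: chains plain callables."""
    def __init__(self):
        self._children = OrderedDict()

    def add(self, name, block):
        # Store under a unique name; insertion order is preserved
        self._children[name] = block

    def __call__(self, x):
        # Feed the output of each block into the next one
        for block in self._children.values():
            x = block(x)
        return x

pipeline = TinySequential()
pipeline.add('double', lambda v: 2 * v)
pipeline.add('shift', lambda v: v + 3)
pipeline.add('relu', lambda v: max(v, 0))
result = pipeline(5)   # (5 * 2) + 3 = 13, and relu leaves 13 unchanged
print(result)
```

Unlike this toy version, `MySequential` inherits from `nn.Block`, so the children it registers are also picked up by `initialize()`.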

## Executing Code in the forward Method
<h1><kbd>
You might have noticed that until now, all of the operations in our networks have acted upon our network's activations and its parameters. Sometimes, however, we might want to incorporate constant terms which are neither the result of previous layers nor updatable parameters. In Gluon, we call these constant parameters. Say, for example, that we want a layer that calculates the function f(x; w) = c · w⊤x, where x is the input, w is our parameter, and c is some specified constant that is not updated during optimization.
Declaring constants explicitly (via get_constant) makes this clear and helps Gluon to speed up execution. In the following code, we will implement a model that could not easily be assembled using only predefined layers and Sequential.
    
    (Source: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, page 202)
</kbd></h1>
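
To make the role of the constant concrete, here is a small worked sketch of f(x; w) = c · w⊤x in plain Python. The function names and the numbers are made up for illustration. The point is only that the gradient with respect to w is c · x, and that an update step changes w while c stays fixed.

```python
def f(x, w, c):
    """f(x; w) = c * (w . x)"""
    return c * sum(wi * xi for wi, xi in zip(w, x))

def grad_w(x, c):
    """Gradient of f with respect to w is c * x (it does not depend on w)."""
    return [c * xi for xi in x]

x = [1.0, 2.0]
w = [0.5, -1.0]
c = 3.0   # constant: used in the forward pass, never updated

y = f(x, w, c)      # 3 * (0.5 - 2.0) = -4.5
g = grad_w(x, c)    # [3.0, 6.0]

lr = 0.1
w_new = [wi - lr * gi for wi, gi in zip(w, g)]   # only w takes a gradient step
print(y, g, w_new, c)
```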

 

In [23]:
class FixedHiddenMLP(nn.Block):
    def __init__(self, **kwargs):
        super(FixedHiddenMLP, self).__init__(**kwargs)
        # Random weight parameters created with get_constant are not
        # updated during training (i.e., constant parameters)
        self.rand_weight = self.params.get_constant('rand_weight',np.random.uniform(size=(5, 2)))
        self.dense = nn.Dense(5, activation='relu')
    def forward(self,x):
        x=self.dense(x)
        x=npx.relu(np.dot(x,self.rand_weight.data()))
        while np.abs(x).mean()>1:
            x/=3
        return x.mean(axis=1)

In [24]:
net =FixedHiddenMLP()
net.initialize()

In [25]:
net(x)

array([0.        , 0.        , 0.        , 0.0040905 , 0.00340335])

## PARAMETER

In [26]:
network = nn.Sequential()
network.add(nn.Dense(5, activation='relu'))
network.add(nn.Dense(3, activation='relu'))
network.add(nn.Dense(10))
network.initialize() 

In [27]:
x = np.random.uniform(size=(2, 4))
network(x) 

array([[ 2.5490829e-06,  1.1651800e-05, -1.2539108e-05,  1.6888396e-05,
         1.8220503e-06,  4.1542698e-06, -1.5352751e-05, -5.7152911e-06,
        -1.0659906e-05,  1.6679809e-05],
       [-6.8636573e-06,  6.6575398e-05, -8.8636894e-05,  3.2865614e-04,
         1.3288157e-04,  6.8365924e-05, -3.8826469e-04,  1.1464535e-04,
        -1.4936542e-05,  1.9356799e-04]])

## Parameter Access

In [28]:
print(network)

Sequential(
  (0): Dense(4 -> 5, Activation(relu))
  (1): Dense(5 -> 3, Activation(relu))
  (2): Dense(3 -> 10, linear)
)

In [29]:
print(network[0].params)
print(network[1].params)
print(network[2].params)

dense7_ (
  Parameter dense7_weight (shape=(5, 4), dtype=float32)
  Parameter dense7_bias (shape=(5,), dtype=float32)
)
dense8_ (
  Parameter dense8_weight (shape=(3, 5), dtype=float32)
  Parameter dense8_bias (shape=(3,), dtype=float32)
)
dense9_ (
  Parameter dense9_weight (shape=(10, 3), dtype=float32)
  Parameter dense9_bias (shape=(10,), dtype=float32)
)


## Targeted Parameters

Accessing the values of the bias parameter of the first layer:

In [30]:
print(type(network[0].bias))
print(network[0].bias)
print(network[0].bias.data())

<class 'mxnet.gluon.parameter.Parameter'>
Parameter dense7_bias (shape=(5,), dtype=float32)
[0. 0. 0. 0. 0.]


Accessing the weight of the second layer:

In [31]:
print(network[1].weight)
print(network[1].weight.data())

Parameter dense8_weight (shape=(3, 5), dtype=float32)
[[-0.03604563  0.06130102  0.00140283  0.01695964 -0.03465375]
 [-0.05630658  0.02086442  0.05381045  0.0262625   0.03768177]
 [-0.06403377  0.02966186  0.00689322 -0.06247731  0.02004783]]


In addition to the value, each parameter also gives us access to its gradient. Because we have
not invoked backpropagation for this network yet, the gradient is still in its initial state (all zeros).

In [32]:
print(network[1].weight.grad())

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


## All Parameters at Once

In [33]:
print(network[0].collect_params())
print(network.collect_params())

dense7_ (
  Parameter dense7_weight (shape=(5, 4), dtype=float32)
  Parameter dense7_bias (shape=(5,), dtype=float32)
)
sequential1_ (
  Parameter dense7_weight (shape=(5, 4), dtype=float32)
  Parameter dense7_bias (shape=(5,), dtype=float32)
  Parameter dense8_weight (shape=(3, 5), dtype=float32)
  Parameter dense8_bias (shape=(3,), dtype=float32)
  Parameter dense9_weight (shape=(10, 3), dtype=float32)
  Parameter dense9_bias (shape=(10,), dtype=float32)
)


In [34]:
network.collect_params()['dense7_weight'].data()

array([[ 0.02726193,  0.04634436, -0.06751239,  0.02247506],
       [ 0.0641404 , -0.04866897,  0.04530007,  0.06944998],
       [ 0.05894835, -0.05596732, -0.03630256,  0.05139603],
       [ 0.02638481, -0.02880274,  0.06674667, -0.00905051],
       [-0.05220781,  0.04136392,  0.03453634,  0.02485117]])

In [35]:
network[0].weight.data()

array([[ 0.02726193,  0.04634436, -0.06751239,  0.02247506],
       [ 0.0641404 , -0.04866897,  0.04530007,  0.06944998],
       [ 0.05894835, -0.05596732, -0.03630256,  0.05139603],
       [ 0.02638481, -0.02880274,  0.06674667, -0.00905051],
       [-0.05220781,  0.04136392,  0.03453634,  0.02485117]])

## Collecting Parameters from Nested Blocks

In [36]:
def block1():
    net=nn.Sequential()
    net.add(nn.Dense(10,activation='relu'))
    net.add(nn.Dense(5,activation='relu'))
    return net
def block2():
    net=nn.Sequential()
    for _ in range(3):
        net.add(block1())
    return net
rgnet = nn.Sequential()
rgnet.add(block2())
rgnet.add(nn.Dense(1))
rgnet.initialize()
        

In [37]:
x = np.random.uniform(size=(5,3))
x

array([[0.06512146, 0.55264586, 0.04457111],
       [0.2512001 , 0.9132836 , 0.32405233],
       [0.3050467 , 0.7881646 , 0.5579874 ],
       [0.16245073, 0.9824449 , 0.08158111],
       [0.40044853, 0.5131847 , 0.6658714 ]])

In [38]:
rgnet(x)

array([[-3.2800735e-09],
       [-5.1647358e-09],
       [-3.2364771e-09],
       [-6.0090639e-09],
       [-9.3335528e-10]])

Let's see how the network is organized.

In [39]:
print(rgnet)

Sequential(
  (0): Sequential(
    (0): Sequential(
      (0): Dense(3 -> 10, Activation(relu))
      (1): Dense(10 -> 5, Activation(relu))
    )
    (1): Sequential(
      (0): Dense(5 -> 10, Activation(relu))
      (1): Dense(10 -> 5, Activation(relu))
    )
    (2): Sequential(
      (0): Dense(5 -> 10, Activation(relu))
      (1): Dense(10 -> 5, Activation(relu))
    )
  )
  (1): Dense(5 -> 1, linear)
)


In [40]:
print(rgnet.collect_params())

sequential2_ (
  Parameter dense10_weight (shape=(10, 3), dtype=float32)
  Parameter dense10_bias (shape=(10,), dtype=float32)
  Parameter dense11_weight (shape=(5, 10), dtype=float32)
  Parameter dense11_bias (shape=(5,), dtype=float32)
  Parameter dense12_weight (shape=(10, 5), dtype=float32)
  Parameter dense12_bias (shape=(10,), dtype=float32)
  Parameter dense13_weight (shape=(5, 10), dtype=float32)
  Parameter dense13_bias (shape=(5,), dtype=float32)
  Parameter dense14_weight (shape=(10, 5), dtype=float32)
  Parameter dense14_bias (shape=(10,), dtype=float32)
  Parameter dense15_weight (shape=(5, 10), dtype=float32)
  Parameter dense15_bias (shape=(5,), dtype=float32)
  Parameter dense16_weight (shape=(1, 5), dtype=float32)
  Parameter dense16_bias (shape=(1,), dtype=float32)
)


In [41]:
print(rgnet[0][0][1].bias.data())

[0. 0. 0. 0. 0.]


In [42]:
print(rgnet[0][0][0].bias.data())

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [43]:
print(rgnet[0][0][1].weight.data())

[[-0.06793067 -0.02857513  0.04238271 -0.02753913  0.0159183  -0.02017552
   0.03580332  0.04344229 -0.01702353  0.01086262]
 [ 0.01413071 -0.05946118 -0.00431622 -0.05904555  0.03154427 -0.01801983
  -0.05632765  0.03732275 -0.05355117  0.02641568]
 [ 0.04385129  0.02911753 -0.05524785  0.03740941  0.06503151 -0.02979862
   0.02976952  0.00675588  0.00335923  0.00606937]
 [ 0.05405652  0.03354855  0.05086622  0.06396187  0.03855976 -0.03108141
   0.04162101  0.04105943  0.03940867  0.02239587]
 [ 0.00975884  0.0112333   0.03175914  0.03848317 -0.00599905  0.06216455
   0.04231021 -0.0648632   0.03409437 -0.04936399]]
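
The printouts above show that `collect_params()` on the outer block gathers the parameters of every nested child, while indexing such as `rgnet[0][0][1]` walks down one level at a time. The same idea can be sketched in plain Python, assuming a toy representation in which a nested list plays the role of a `Sequential` and a dict plays the role of a layer's parameters (all names here are hypothetical and do not come from Gluon):

```python
def collect(block):
    """Flatten parameters from arbitrarily nested lists of layer dicts."""
    if isinstance(block, dict):      # a leaf layer: parameter name -> shape
        return dict(block)
    params = {}
    for child in block:              # a container: recurse into its children
        params.update(collect(child))
    return params

toy_net = [
    [   # inner container holding two sub-blocks of two layers each
        [{'a_weight': (10, 3), 'a_bias': (10,)}, {'b_weight': (5, 10), 'b_bias': (5,)}],
        [{'c_weight': (10, 5), 'c_bias': (10,)}, {'d_weight': (5, 10), 'd_bias': (5,)}],
    ],
    {'out_weight': (1, 5), 'out_bias': (1,)},   # final output layer
]

all_params = collect(toy_net)
print(len(all_params))               # 10
print(toy_net[0][0][1]['b_bias'])    # (5,)
```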


## Parameter Initialization

# Built-in Initialization

In [44]:
# Here `force_reinit` ensures that parameters are freshly initialized even if
# they were already initialized previously
network.initialize(init=mxnet.init.Normal(sigma=0.01),force_reinit=True)

In [45]:
network[0].weight.data()

array([[-0.01143898, -0.00064168,  0.00557283, -0.00845679],
       [ 0.00498488, -0.004527  , -0.00650531,  0.00110566],
       [-0.01809649, -0.00372469,  0.00590213, -0.00047768],
       [ 0.01075142, -0.01239304,  0.00349711,  0.00102358],
       [ 0.02841117, -0.01567159, -0.00675081, -0.01355863]])

In [46]:
network[0].weight.data()[4]

array([ 0.02841117, -0.01567159, -0.00675081, -0.01355863])

<b>Initializing all parameters to a given constant value of 0.5</b>

In [47]:
network.initialize(init=mxnet.init.Constant(0.5),force_reinit=True)

In [48]:
network[0].weight.data()

array([[0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5]])

In [49]:
network[0].weight.data()[1]

array([0.5, 0.5, 0.5, 0.5])

We can also apply different initializers for certain blocks.

In [50]:
network[0].initialize(mxnet.init.Xavier(),force_reinit=True)
network[1].initialize(mxnet.init.Normal(sigma=0.1),force_reinit=True)
network[2].initialize(mxnet.init.Constant(1),force_reinit=True)

In [51]:
network[0].weight.data()[:2]

array([[-0.29841286,  0.58742774,  0.782645  ,  0.35210955],
       [ 0.24438798,  0.44253576,  0.6221672 , -0.66769755]])

In [52]:
network[1].weight.data()[0:3]

array([[-0.043648  ,  0.15423739,  0.07354638, -0.01657329,  0.07273798],
       [ 0.03766086, -0.00454513,  0.14672141,  0.05573305,  0.08283143],
       [ 0.14260544, -0.07045387,  0.14156336, -0.01435713, -0.06850393]])

In [53]:
network[2].weight.data()[0:2]

array([[1., 1., 1.],
       [1., 1., 1.]])

## Custom Initialization

In [54]:
class MyInit(mxnet.init.Initializer):
    def _init_weight(self,name,data):
        print(name,data.shape)
        # Draw from U(-10, 10), then zero out entries whose magnitude is below 5
        data[:]=np.random.uniform(-10,10,data.shape)
        data*=np.abs(data)>=5

In [55]:
network.initialize(init=MyInit(),force_reinit=True)

dense7_weight (5, 4)
dense8_weight (3, 5)
dense9_weight (10, 3)


In [56]:
network[2].weight.data()[0:2]

array([[ 0.      ,  8.39476 , -7.311439],
       [ 9.21533 , -5.912525,  0.      ]])
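
`MyInit` draws each weight from U(-10, 10) and then multiplies by the boolean mask `|w| >= 5`, which zeroes every value whose magnitude is below 5. A plain-Python sketch of the same rule, using the standard `random` module rather than MXNet (the function name `my_init_values` is ours):

```python
import random

def my_init_values(n, seed=0):
    """Sample n values from U(-10, 10) and zero those with |v| < 5."""
    rng = random.Random(seed)
    values = [rng.uniform(-10, 10) for _ in range(n)]
    # Keep a value only if |v| >= 5, otherwise set it to 0 (same effect as the mask)
    return [v if abs(v) >= 5 else 0.0 for v in values]

weights = my_init_values(12)
print(weights)
```

Every entry is therefore either exactly 0 or has magnitude between 5 and 10, which is the pattern visible in the `network[2]` weights printed above.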

## Tied Parameters

In [57]:
net = nn.Sequential()
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.Dense(4, activation='relu')
net.add(nn.Dense(4, activation='relu'),
        shared,
        nn.Dense(4, activation='relu', params=shared.params),
        nn.Dense(10))
net.initialize()
X = np.random.uniform(size=(2, 20))
net(X)
# Check whether the parameters are the same
print(net[1].weight.data()[0] == net[2].weight.data()[0])
net[1].weight.data()[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[1].weight.data()[0] == net[2].weight.data()[0])

[ True  True  True  True]
[ True  True  True  True]


<h1><kbd> This example shows that the parameters of the second and third layers are tied. They are not just equal; they are represented by the same exact tensor. Thus, if we change one of the parameters, the other one changes, too. You might wonder: when parameters are tied, what happens to the gradients? Since the model parameters contain gradients, the gradients of the second hidden layer and the third hidden layer are added together during backpropagation.
</kbd></h1>

 (Source: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, page 212)
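
The claim that gradients of tied layers add up can be checked on a scalar toy model. In the sketch below (plain Python, made-up numbers), the same weight w is used in two consecutive "layers", y = w · (w · x). Treating the two uses as separate weights w1 and w2 gives the partial derivatives w2 · x and w1 · x. With w1 = w2 = w, their sum 2 · w · x should agree with a finite-difference estimate of dy/dw.

```python
def model(w1, w2, x):
    """Two stacked scalar 'layers': y = w2 * (w1 * x)."""
    return w2 * (w1 * x)

w, x = 1.5, 2.0

# Partial derivatives for each use of the weight, evaluated at w1 = w2 = w
grad_first_use = w * x     # dy/dw1 = w2 * x
grad_second_use = w * x    # dy/dw2 = w1 * x
tied_grad = grad_first_use + grad_second_use   # 2 * w * x

# Central finite difference on the tied model, moving both uses together
eps = 1e-6
numeric = (model(w + eps, w + eps, x) - model(w - eps, w - eps, x)) / (2 * eps)
print(tied_grad, numeric)
```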

## Loading and Saving Tensors

For individual tensors, we can directly invoke the load and save functions to read and write them,
respectively. Both functions require that we supply a name, and save requires as input the variable
to be saved.

In [58]:
x=np.random.rand(5,2)
npx.save('data/x-file',x)

Reading the data from the stored file back into memory (the result comes back as a list of arrays):

In [59]:
x=npx.load('data/x-file')
x

[array([[0.37129477, 0.4562225 ],
        [0.6757552 , 0.59618443],
        [0.91564703, 0.42880976],
        [0.09512343, 0.5551939 ],
        [0.48494202, 0.41693395]])]

In [60]:
y=np.random.uniform(size=(4,2))
y

array([[0.78078854, 0.40046972],
       [0.64849687, 0.6953465 ],
       [0.1265121 , 0.09285121],
       [0.8600266 , 0.16654207]])

## Loading and Saving Model Parameters

In [61]:
class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Dense(256, activation='relu')
        self.output = nn.Dense(10)
    def forward(self, x):
        return self.output(self.hidden(x))
net = MLP()
net.initialize()

In [62]:
X = np.random.uniform(size=(2, 20))
Y = net(X)

Next, we store the parameters of the model in a file named “mlp.params”.

In [63]:
net.save_parameters('saved_models/mlp.params')

To recover the model, we instantiate a clone of the original MLP model. Instead of randomly
initializing the model parameters, we read the parameters stored in the file directly.

In [64]:
clone=MLP()
clone.load_parameters('saved_models/mlp.params')

Since both instances have the same model parameters, the computational result of the same input
X should be the same. Let us verify this.

In [65]:
Y_clone=clone(X)
Y_clone==Y

array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True]])

In [66]:
def try_gpu(i=0): #@save
    """Return gpu(i) if exists, otherwise return cpu()."""
    return npx.gpu(i) if npx.num_gpus() >= i + 1 else npx.cpu()
def try_all_gpus(): #@save
    """Return all available GPUs, or [cpu()] if no GPU exists."""
    devices = [npx.gpu(i) for i in range(npx.num_gpus())]
    return devices if devices else [npx.cpu()]

In [67]:
try_gpu(), try_gpu(10), try_all_gpus()

(cpu(0), cpu(0), [cpu(0)])
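
The output above shows that `try_gpu(10)` falls back to the CPU on a machine without GPUs. The selection rule itself can be illustrated without MXNet. In this sketch, `pick_device` is a hypothetical helper that takes the GPU count as an argument (where `try_gpu` queries `npx.num_gpus()`) and returns a plain string instead of a device object:

```python
def pick_device(i=0, num_gpus=0):
    """Return 'gpu(i)' if GPU number i exists, otherwise 'cpu(0)'."""
    return f'gpu({i})' if num_gpus >= i + 1 else 'cpu(0)'

print(pick_device(0, num_gpus=0))    # cpu(0)
print(pick_device(1, num_gpus=2))    # gpu(1)
print(pick_device(10, num_gpus=2))   # cpu(0)
```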

By default, tensors are created on the CPU. We can query the device where the tensor is located.

In [68]:
x=np.array([3,2])
x.ctx

cpu(0)