that a single neuron (i) takes some set of inputs; (ii) generates a corresponding scalar output; and (iii) has a set of associated parameters that can be updated to optimize some objective function of interest.

Just like individual neurons, layers (i) take a set of inputs, (ii) generate corresponding outputs, and (iii) are described by a set of tunable parameters.
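A minimal sketch of the neuron/layer analogy, assuming PyTorch's `nn.Linear` as the concrete layer type. A single neuron corresponds to a linear layer with one output. A layer bundles several neurons that all read the same inputs. The sizes (20 inputs, 8 outputs) are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.rand(4, 20)  # a batch of 4 examples with 20 features each

# A single neuron: 20 inputs -> 1 scalar output per example
neuron = nn.Linear(20, 1)
# A layer: 20 inputs -> 8 outputs, i.e., 8 neurons reading the same inputs
layer = nn.Linear(20, 8)

print(neuron(X).shape)  # torch.Size([4, 1])
print(layer(X).shape)   # torch.Size([4, 8])

# Tunable parameters: one weight per input per neuron, plus one bias per neuron
print(sum(p.numel() for p in neuron.parameters()))  # 21
print(sum(p.numel() for p in layer.parameters()))   # 168
```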

The entire model takes in raw inputs (the features), generates outputs (the predictions), and possesses parameters (the combined parameters from all constituent layers).
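The claim that a model's parameters are the combined parameters of its layers can be checked directly. The following is a sketch that assumes fixed layer sizes (20 → 256 → 10) so the counts are known without running any data through the model. `count` is a hypothetical helper.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

def count(module):
    # Total number of scalar parameters in a module (hypothetical helper)
    return sum(p.numel() for p in module.parameters())

per_layer = [count(layer) for layer in model]  # iterating a Sequential yields its layers
print(per_layer)     # [5376, 0, 2570]  (ReLU has no parameters)
print(count(model))  # 7946 = 5376 + 0 + 2570
```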

While you might think that neurons, layers, and models give us enough abstractions to go about our business, it turns out that we often find it convenient to speak about components that are larger than an individual layer but smaller than the entire model.

![](https://d2l.ai/_images/blocks.svg)

```python
import torch
from torch import nn
from torch.nn import functional as F
```

```python
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

X = torch.rand(2, 20)
net(X).shape
```

```
torch.Size([2, 10])
```

Before implementing our own custom module, we briefly summarize the basic functionality that each module must provide:

1. Ingest input data as arguments to its forward propagation method.

2. Generate an output by having the forward propagation method return a value. Note that the output may have a different shape from the input. For example, the first fully connected layer in our model above ingests an input of arbitrary dimension but returns an output of dimension 256.

3. Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation method. Typically this happens automatically.

4. Store and provide access to those parameters necessary for executing the forward propagation computation.

5. Initialize model parameters as needed.
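Several items in the list above can be observed on the `nn.Sequential` network from earlier. This is a sketch that rebuilds the same network so it runs on its own. It shows lazy initialization (item 5), the change of shape from input to output (items 1 and 2), named access to the stored parameters (item 4), and automatic gradient computation (item 3).

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

# (5) Lazy initialization: the first layer's weight has no shape until data arrives
print(type(net[0].weight).__name__)  # UninitializedParameter

X = torch.rand(2, 20)
Y = net(X)  # (1)-(2): ingest the input and return an output of a different shape

print(Y.shape)              # torch.Size([2, 10])
print(net[0].weight.shape)  # torch.Size([256, 20]): input size inferred from X

# (4) Stored parameters are accessible by name
for name, p in net.named_parameters():
    print(name, tuple(p.shape))

# (3) Gradients are computed automatically by autograd
Y.sum().backward()
print(net[0].weight.grad.shape)  # torch.Size([256, 20])
```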

```python
class MLP(nn.Module):
    def __init__(self):
        # Call the constructor of the parent class nn.Module to perform
        # the necessary initialization
        super().__init__()
        self.hidden = nn.LazyLinear(256)
        self.out = nn.LazyLinear(10)

    # Define the forward propagation of the model, that is, how to return the
    # required model output based on the input X
    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))
```

```python
net = MLP()
net(X).shape
```

```
torch.Size([2, 10])
```
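`MLP` never lists its layers explicitly, so how does PyTorch find them? Assigning an `nn.Module` to an attribute inside `__init__` registers it automatically. The sketch below uses a hypothetical `TinyMLP` that mirrors `MLP` but uses fixed-size `nn.Linear` layers so its parameters exist before the first forward pass. It shows that module attributes are registered and ordinary attributes are not.

```python
import torch
from torch import nn
from torch.nn import functional as F

class TinyMLP(nn.Module):  # hypothetical: mirrors MLP above with fixed sizes
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # assigning a Module registers it
        self.out = nn.Linear(256, 10)
        self.note = "not a module"        # ordinary attributes are not registered

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = TinyMLP()
print([name for name, _ in net.named_children()])  # ['hidden', 'out']
print([name for name, _ in net.named_parameters()])
# ['hidden.weight', 'hidden.bias', 'out.weight', 'out.bias']
```

This is also why the constructor must call `super().__init__()` first. Without that call, the assignment `self.hidden = ...` raises an `AttributeError`, because the registries that `nn.Module` writes into do not exist yet.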

To build our own simplified MySequential, we just need to define two key methods:

1. A method for appending modules one by one to a list.

2. A forward propagation method for passing an input through the chain of modules, in the same order as they were appended.

```python
class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self.add_module(str(idx), module)

    def forward(self, X):
        for module in self.children():
            X = module(X)
        return X
```

```python
net = MySequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
net(X).shape
```

```
torch.Size([2, 10])
```
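As a quick sanity check, `MySequential` should behave like the built-in `nn.Sequential` when both wrap the very same layer objects. `add_module` registers each layer under the keys `"0"`, `"1"`, …, and `children()` yields them in insertion order. The sketch repeats the class so it runs on its own and uses fixed-size layers (an assumption for illustration).

```python
import torch
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self.add_module(str(idx), module)  # registered under "0", "1", ...

    def forward(self, X):
        for module in self.children():  # yields modules in insertion order
            X = module(X)
        return X

torch.manual_seed(0)
layers = [nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)]
mine = MySequential(*layers)
ref = nn.Sequential(*layers)  # same layer objects, hence the same parameters

X = torch.rand(2, 20)
print(torch.equal(mine(X), ref(X)))           # True
print([n for n, _ in mine.named_children()])  # ['0', '1', '2']
```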

```python
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Random weights stored as a plain tensor: they are not registered as
        # parameters, do not compute gradients, and therefore remain constant
        # during training
        self.rand_weight = torch.rand((20, 20))
        self.linear = nn.LazyLinear(20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(X @ self.rand_weight + 1)
        # Reuse the fully connected layer. This is equivalent to sharing
        # parameters between two fully connected layers
        X = self.linear(X)
        # Control flow
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()
```

```python
net = FixedHiddenMLP()
net(X)
```

```
tensor(-0.1621, grad_fn=<SumBackward0>)
```
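In `FixedHiddenMLP` above, `rand_weight` is a plain tensor attribute. That means it is not saved in `state_dict()` and is not moved by `.to(device)`. One alternative is `register_buffer`, which keeps the tensor constant (excluded from `parameters()`, so optimizers never touch it) while still tracking it with the module. The sketch below is a hypothetical variant, `FixedHiddenMLPBuffer`. It assumes a fixed 20-feature input so it can use `nn.Linear(20, 20)` instead of a lazy layer.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLPBuffer(nn.Module):  # hypothetical variant of FixedHiddenMLP
    def __init__(self):
        super().__init__()
        # A buffer is saved in state_dict and moved by .to(device), but it is
        # not returned by parameters(), so optimizers never update it
        self.register_buffer("rand_weight", torch.rand(20, 20))
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(X @ self.rand_weight + 1)
        X = self.linear(X)  # the same layer is reused, so its parameters are shared
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

net = FixedHiddenMLPBuffer()
print([n for n, _ in net.named_parameters()])  # ['linear.weight', 'linear.bias']
print([n for n, _ in net.named_buffers()])     # ['rand_weight']
print("rand_weight" in net.state_dict())       # True
print(net(torch.rand(2, 20)).shape)            # torch.Size([])
```

Because `self.linear` is called twice but registered once, only a single weight/bias pair appears in `named_parameters()`.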

```python
class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.LazyLinear(64), nn.ReLU(),
                                 nn.LazyLinear(32), nn.ReLU())
        self.linear = nn.LazyLinear(16)

    def forward(self, X):
        return self.linear(self.net(X))


chimera = nn.Sequential(NestMLP(), nn.LazyLinear(20), FixedHiddenMLP())
chimera(X)
```

```
tensor(-0.3452, grad_fn=<SumBackward0>)
```
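Nested modules form a tree, and `named_parameters()` walks that tree and builds dotted paths such as `0.net.2.weight`. The sketch below uses a trimmed-down chimera (only `NestMLP` plus one linear layer, with fixed sizes instead of lazy layers). This is an assumption made so the parameter names and shapes are known before any forward pass.

```python
import torch
from torch import nn

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20))

# named_parameters() recurses through submodules and joins names with dots
for name, p in chimera.named_parameters():
    print(name, tuple(p.shape))

# The same parameter is reachable by indexing and attribute access
print(chimera[0].net[2].weight.shape)  # torch.Size([32, 64])
```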

1. If you change MySequential to store modules in a Python list:

If `MySequential` stores its modules in a plain Python list, their parameters are not registered with the parent module. `nn.Module` registers a submodule only when a `Module` instance is assigned directly as an attribute (or added with `add_module`). A list assigned as an attribute is stored as an ordinary Python object, so the modules inside it are invisible to `parameters()`, `state_dict()`, and `.to(device)`. The forward pass still runs, but an optimizer built from `net.parameters()` receives none of those layers' parameters, so they are never trained, saved, or moved to the GPU.

To fix this issue, you can use `nn.ModuleList` instead of a Python list. `nn.ModuleList` is specifically designed to store a list of PyTorch modules and handles the registration of their parameters correctly.
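The difference can be observed directly. The two classes below are hypothetical containers written for this comparison: `ListSequential` keeps its layers in a Python list, and `ModuleListSequential` keeps them in `nn.ModuleList`. Both compute the same kind of forward pass, but only the second exposes its parameters.

```python
import torch
from torch import nn

class ListSequential(nn.Module):  # hypothetical: stores modules in a plain list
    def __init__(self, *args):
        super().__init__()
        self.layers = list(args)  # NOT registered as submodules

    def forward(self, X):
        for layer in self.layers:
            X = layer(X)
        return X

class ModuleListSequential(nn.Module):  # hypothetical: stores them in nn.ModuleList
    def __init__(self, *args):
        super().__init__()
        self.layers = nn.ModuleList(args)  # registered as submodules

    def forward(self, X):
        for layer in self.layers:
            X = layer(X)
        return X

def make_layers():
    return [nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)]

plain = ListSequential(*make_layers())
tracked = ModuleListSequential(*make_layers())

X = torch.rand(2, 20)
print(plain(X).shape, tracked(X).shape)  # both torch.Size([2, 10]): forward works
print(len(list(plain.parameters())))     # 0: an optimizer would see nothing to train
print(len(list(tracked.parameters())))   # 4
print(len(plain.state_dict()), len(tracked.state_dict()))  # 0 4
```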

2. Implement a parallel module:

Here's an implementation of a parallel module that takes two modules as arguments and returns the concatenated output of both networks in the forward propagation:

```python
import torch
import torch.nn as nn

class ParallelModule(nn.Module):
    def __init__(self, net1, net2):
        super().__init__()
        self.net1 = net1
        self.net2 = net2

    def forward(self, x):
        out1 = self.net1(x)
        out2 = self.net2(x)
        return torch.cat((out1, out2), dim=1)
```
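The same idea extends to any number of branches. The sketch below is a hypothetical generalization, `ParallelBranches`, that stores its branches in `nn.ModuleList` so they are all registered. It assumes every branch returns a 2D tensor with the same batch size, because `torch.cat(..., dim=1)` joins outputs along the feature axis.

```python
import torch
from torch import nn

class ParallelBranches(nn.Module):  # hypothetical generalization to N branches
    def __init__(self, *branches):
        super().__init__()
        self.branches = nn.ModuleList(branches)

    def forward(self, x):
        # Every branch sees the same input; outputs are joined along the feature axis
        return torch.cat([branch(x) for branch in self.branches], dim=1)

net = ParallelBranches(nn.Linear(10, 5), nn.Linear(10, 3), nn.Linear(10, 2))
print(net(torch.randn(4, 10)).shape)  # torch.Size([4, 10]), since 5 + 3 + 2 = 10
```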

3. Factory function to generate multiple instances of the same module and build a larger network:

Here's a factory function that generates multiple instances of the same module and builds a larger network from it:

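```python
import torch
import torch.nn as nn

def create_networks(module_class, num_instances, *args, **kwargs):
    # Each call to module_class(...) creates a fresh instance with its own parameters
    networks = [module_class(*args, **kwargs) for _ in range(num_instances)]
    return networks

def build_large_network(networks):
    # Chain the instances so that the output of one feeds the next
    return nn.Sequential(*networks)

# Example usage:
class MyNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 20),
            nn.ReLU(),
            nn.Linear(20, 10)  # output size matches input size so instances can be stacked
        )

    def forward(self, x):
        return self.layers(x)

num_instances = 3
networks = create_networks(MyNetwork, num_instances)
large_network = build_large_network(networks)
print(large_network(torch.randn(4, 10)).shape)  # torch.Size([4, 10])
```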

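In the example above, the `create_networks` function generates multiple independent instances of the same module, and the `build_large_network` function chains them into a larger network using `nn.Sequential`. Because the instances run one after another, each instance's output size must match the next instance's input size.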
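A design point worth making explicit is that a factory builds *independent* copies. Reusing one instance several times instead creates *shared* parameters, as with the reused `self.linear` in `FixedHiddenMLP`. The sketch below contrasts the two. `make_block` and `count` are hypothetical helpers, and the block sizes are arbitrary.

```python
import torch
from torch import nn

def make_block():  # hypothetical factory for one stackable block
    return nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 10))

def count(m):  # number of distinct scalar parameters
    return sum(p.numel() for p in m.parameters())

# Factory approach: three independent blocks, each with its own parameters
independent = nn.Sequential(*[make_block() for _ in range(3)])

# Reusing one instance three times: all positions share a single set of parameters
block = make_block()
shared = nn.Sequential(block, block, block)

print(count(independent))  # 1290 = 3 * 430
print(count(shared))       # 430: parameters() lists each shared tensor once
```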

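If "parallel" is instead meant as parallel execution on hardware, one option is the `DataParallel` class from the `torch.nn.parallel` module (also available as `nn.DataParallel`). It runs the network on multiple GPUs at the same time to speed up computation. The following example wraps the `ParallelModule` in `DataParallel`: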

```python
import torch
import torch.nn as nn

class ParallelModule(nn.Module):
    def __init__(self, net1, net2):
        super().__init__()
        self.net1 = net1
        self.net2 = net2

    def forward(self, x):
        out1 = self.net1(x)
        out2 = self.net2(x)
        return torch.cat((out1, out2), dim=1)

# Example networks
net1 = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 5)
)

net2 = nn.Sequential(
    nn.Linear(10, 30),
    nn.ReLU(),
    nn.Linear(30, 5)
)

# Create the parallel module
parallel_module = ParallelModule(net1, net2)

# Check if multiple GPUs are available
if torch.cuda.device_count() > 1:
    print("Using", torch.cuda.device_count(), "GPUs")
    parallel_module = nn.DataParallel(parallel_module)

# Move the parallel module to the available device (GPU or CPU)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
parallel_module.to(device)

# Create input data and move it to the available device
input_data = torch.randn(32, 10).to(device)

# Forward pass through the parallel module
output = parallel_module(input_data)
```

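In this example, we create a `ParallelModule` that takes two subnetworks, `net1` and `net2`. We then wrap the `ParallelModule` in `nn.DataParallel`. If more than one GPU is available, `nn.DataParallel` automatically distributes the computation across them. Otherwise, the module simply runs on a single GPU or on the CPU.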

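Note that `DataParallel` parallelizes over data, not over branches. On each forward call, it splits the input batch along the first dimension into smaller chunks and sends one chunk to each GPU, where a replica of the whole module (both `net1` and `net2`) processes it. Each GPU therefore handles part of the data at the same time, and `DataParallel` then gathers the outputs from all GPUs into the final output. It does not place `net1` and `net2` on different devices.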
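The idea behind splitting the batch can be illustrated on the CPU. This is a simulation of the concept only, not `DataParallel`'s actual implementation. The batch is cut into 4 chunks along dimension 0 (as if for 4 hypothetical devices), the same model runs on each chunk, and the results are concatenated in order. This matches the full-batch output because each example is processed independently. It would not hold exactly for layers that mix examples within a batch, such as batch normalization.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
batch = torch.randn(32, 10)

# Illustration only: split the batch into 4 chunks along dim 0 (as if for 4 devices),
# run the same model on each chunk, then concatenate the results in order
chunks = torch.chunk(batch, 4, dim=0)
gathered = torch.cat([model(c) for c in chunks], dim=0)

full = model(batch)
print(gathered.shape)                             # torch.Size([32, 5])
print(torch.allclose(gathered, full, atol=1e-6))  # True
```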
