# Paddle

## general docs

* https://www.paddlepaddle.org.cn/documentation/docs/en/guides
* High-level API: https://www.paddlepaddle.org.cn/documentation/docs/zh/tutorial/quick_start/high_level_api/high_level_api.html
* https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/index_cn.html

## Training tricks for PaddleClas

* https://paddleclas.readthedocs.io/en/latest/models/Tricks_en.html

## install problems

* Error: `core_avx.so: undefined symbol: _dl_sym, version GLIBC_PRIVATE`
  * Workaround noted: install the lighting version.

## Paddle dataset

### Dataset

#### map-style, MapDataset

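* A map-style dataset must subclass this base class (`paddle.io.Dataset`). A map-style dataset fetches a specific sample by key (index), and every map-style dataset must implement the following methods:
  1. `__getitem__`: return the sample at the given index; `paddle.io.DataLoader` uses this method to fetch samples by index.
  2. `__len__`: return the number of samples; <span style="color:red">`paddle.io.BatchSampler` needs the sample count to generate the index sequence.</span>
* In practice, subclass `Dataset` directly and implement these two methods.
* `MapDataset` adds a wrapper on top of this; for example, a plain list of samples can be passed in directly.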


#### iterable-style


### transformation

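* Transform inside `__getitem__` of a custom Dataset
  * Example: https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/beginner/data_preprocessing_cn.html
* Call `.map(fn)` on the dataset directly, where `fn` is the transform function (a `MapDataset` method; a plain `paddle.io.Dataset` has no `.map`)
* Pass a `collate_fn` to the DataLoader
  * See `paddlenlp/data/data_collator.py`, e.g. `DataCollatorWithPadding`, `DataCollatorForSeq2Seq`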
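  ```python
  # Excerpt in the style of PaddleNLP's Trainer; train_dataset, train_sampler and self.* come from that context.
  DataLoader(
      train_dataset,
      batch_sampler=train_sampler,
      collate_fn=self.data_collator,
      num_workers=self.args.dataloader_num_workers,
  )
  ```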

### DataLoader

* doc: https://www.paddlepaddle.org.cn/documentation/api/paddle/io/DataLoader_cn.html
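* Multi-process loading: DataLoader supports both single-process and multi-process data loading; when `num_workers` is greater than 0, data is loaded asynchronously by multiple worker processes.
* DataLoader currently supports both map-style and iterable-style datasets.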

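#### Disabling automatic batching
* DataLoader disables automatic batching when both `batch_size` and `batch_sampler` are None (`batch_size` defaults to 1, so pass `batch_size=None` explicitly). In that case each item read from `dataset` must already be a complete batch, and it is passed to `collate_fn` without any further processing.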

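#### What `collate_fn` does
* It defines how samples are combined into a batch.
* `collate_fn` (callable, optional): specifies how a list of samples is combined into mini-batch data. When `collate_fn` is None, the default stacks each field of the samples along dimension 0 (same as `np.stack(..., axis=0)`) to form the mini-batch. Default: None.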

### Paddle BatchSampler

* https://www.paddlepaddle.org.cn/documentation/docs/en/api/paddle/io/BatchSampler_en.html
* The batch sampler used by `paddle.io.DataLoader` should be a subclass of `paddle.io.BatchSampler`. BatchSampler subclasses should implement the following methods:
    * `__iter__`: return <span style="color:red">***mini-batch indices***</span> iteratively.
    * `__len__`: return the number of mini-batches in an epoch.

```python
from paddle.io import RandomSampler, BatchSampler, Dataset
import numpy as np

# init with dataset
class RandomDataset(Dataset):
    def __init__(self, num_samples):
        self.num_samples = num_samples

    def __getitem__(self, idx):
        image = np.random.random([784]).astype('float32')
        label = np.random.randint(0, 9, (1, )).astype('int64')
        return image, label

    def __len__(self):
        return self.num_samples

bs = BatchSampler(dataset=RandomDataset(100),
                  shuffle=False,
                  batch_size=16,
                  drop_last=False)

for batch_indices in bs:
    print("BatchSampler:", batch_indices)

# init with sampler
sampler = RandomSampler(RandomDataset(100))
bs = BatchSampler(sampler=sampler,
                  batch_size=8,
                  drop_last=True)

for batch_indices in bs:
    print("RandomSampler:", batch_indices)
```

```text
BatchSampler: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
BatchSampler: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
BatchSampler: [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
BatchSampler: [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
BatchSampler: [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
BatchSampler: [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
BatchSampler: [96, 97, 98, 99]
RandomSampler: [6, 47, 35, 37, 82, 58, 3, 11]
RandomSampler: [33, 56, 95, 45, 87, 10, 4, 52]
RandomSampler: [57, 41, 71, 55, 83, 96, 13, 97]
RandomSampler: [36, 99, 8, 69, 25, 92, 44, 73]
RandomSampler: [62, 89, 51, 72, 14, 66, 84, 98]
RandomSampler: [20, 75, 48, 68, 15, 43, 65, 39]
RandomSampler: [23, 40, 64, 16, 63, 76, 86, 50]
RandomSampler: [59, 22, 54, 46, 77, 53, 67, 0]
RandomSampler: [90, 34, 74, 18, 42, 80, 27, 32]
RandomSampler: [79, 70, 93, 60, 29, 17, 61, 5]
RandomSampler: [91, 9, 81, 31, 
```

```python
import paddle

data = paddle.vision.datasets.MNIST(mode='train')
```

```text
Cache file /home/jeffye/.cache/paddle/dataset/mnist/train-images-idx3-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/train-images-idx3-ubyte.gz
Begin to download
Download finished
Cache file /home/jeffye/.cache/paddle/dataset/mnist/train-labels-idx1-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/train-labels-idx1-ubyte.gz
Begin to download
Download finished
```


```python
type(data)
```

```text
paddle.vision.datasets.mnist.MNIST
```

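```python
import paddle

# `labels` was undefined in the original cell; an int64 tensor holding 0..9 is assumed,
# which matches the printed output below.
labels = paddle.arange(10)
labels = paddle.reshape(labels, shape=[-1, 1])
labels
```

```text
Tensor(shape=[10, 1], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       [[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8],
        [9]])
```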

## paddle layers 

### Cosine Similarity Operator - paddle.fluid.layers.nn.cos_sim(X, Y)

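```python
# paddle.fluid only exists in older Paddle releases; it has been removed in recent versions.
import paddle

x = paddle.rand(shape=[3, 7], dtype='float32')
y = paddle.rand(shape=[1, 7], dtype='float32')
out = paddle.fluid.layers.cos_sim(x, y)  # each row of x against the single row of y -> [3, 1]
print(out)
```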

```text
Tensor(shape=[3, 1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0.83597958],
        [0.93105304],
        [0.85824275]])
```




```python
x1 = [[0.8024077, 0.9927354, 0.27238318, 0.8344984],
      [0.48949873, 0.5797396, 0.65444374, 0.66510963],
      [0.1031398, 0.9614342, 0.08365563, 0.6796464],
      [0.1031398, 0.9614342, 0.08365563, 0.6796464],
      [0.10760343, 0.7461209, 0.7726148, 0.5801006]]
x2 = [[0.62913156, 0.1536727, 0.9847992, 0.04591406],
      [0.9098952, 0.15715368, 0.8671125, 0.3156102],
      [0.4427798, 0.54136837, 0.5276275, 0.32394758],
      [0.1031398, 0.9614342, 0.08365563, 0.6796464],
      [0.3769419, 0.8535014, 0.48041078, 0.9256797]]
x1 = paddle.to_tensor(x1)
x2 = paddle.to_tensor(x2)
```

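```python
# x1 @ x2^T equals the cosine-similarity matrix only if the rows of x1 and x2 are
# L2-normalized; they are not here, so these are raw dot products.
cosine_sim = paddle.matmul(x1, x2, transpose_y=True)
# Subtract a margin of 0.1 from the diagonal entries (the positive pairs) only.
margin_diag = paddle.full(shape=[cosine_sim.shape[0]],
                          fill_value=0.1,
                          dtype=paddle.get_default_dtype())
print(margin_diag)
print(paddle.diag(margin_diag))
cosine_sim = cosine_sim - paddle.diag(margin_diag)
```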

```text
Tensor(shape=[5], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.10000000, 0.10000000, 0.10000000, 0.10000000, 0.10000000])
Tensor(shape=[5, 5], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0.10000000, 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.10000000, 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.10000000, 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.10000000, 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.10000000]])
```


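```python
# Broadcast [5, 1, 4] against [1, 5, 4]: cosine along the last axis gives the
# [5, 5] matrix of every row of x1 against every row of x2.
x = x1.unsqueeze(1)
y = x2.unsqueeze(0)
print(x)
print(y)

cos_sim_func = paddle.nn.CosineSimilarity(axis=-1)
result = cos_sim_func(x, y)
print(result)
```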

```text
Tensor(shape=[5, 1, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[[0.80240768, 0.99273539, 0.27238318, 0.83449841]],

        [[0.48949873, 0.57973957, 0.65444374, 0.66510963]],

        [[0.10313980, 0.96143419, 0.08365563, 0.67964637]],

        [[0.10313980, 0.96143419, 0.08365563, 0.67964637]],

        [[0.10760343, 0.74612093, 0.77261478, 0.58010060]]])
Tensor(shape=[1, 5, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[[0.62913156, 0.15367270, 0.98479921, 0.04591406],
         [0.90989518, 0.15715368, 0.86711252, 0.31561020],
         [0.44277981, 0.54136837, 0.52762753, 0.32394758],
         [0.10313980, 0.96143419, 0.08365563, 0.67964637],
         [0.37694189, 0.85350138, 0.48041078, 0.92567968]]])
Tensor(shape=[5, 5], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0.52750373, 0.68519437, 0.90307665, 0.88645834, 0.94705576],
        [0.75573283, 0.83689672, 0.97151995, 0.78222287, 0.95629632],
        [0.23341376, 0.3
```

```python
paddle.diag(result)  # similarity of row i of x1 with row i of x2
```

```text
Tensor(shape=[5], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.52750373, 0.83689672, 0.75037485, 1.        , 0.92458993])
```

```python
# axis=0: similarity between corresponding columns of x1 and x2 (4 values)
cos_sim_func = paddle.nn.CosineSimilarity(axis=0)
result = cos_sim_func(x1, x2)
print(result)
```

```text
Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.87143707, 0.80905795, 0.79628360, 0.70043772])
```


```python
# axis=1: similarity between corresponding rows (5 values), same as the diagonal above
cos_sim_func = paddle.nn.CosineSimilarity(axis=1)
result = cos_sim_func(x1, x2)
print(result)
```

```text
Tensor(shape=[5], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.52750373, 0.83689672, 0.75037485, 1.        , 0.92458993])
```


```python
# First column only: matches the first value of the axis=0 result
cos_sim_func = paddle.nn.CosineSimilarity(axis=0)
result = cos_sim_func(x1[:, 0], x2[:, 0])
print(result)
```

```text
Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.87143707])
```


```python
# First row only: matches the first value of the axis=1 result
cos_sim_func = paddle.nn.CosineSimilarity(axis=0)
result = cos_sim_func(x1[0, :], x2[0, :])
print(result)
```

```text
Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [0.52750373])
```


## Paddle search operators

* Useful for implementing custom selection and indexing logic
* paddle/tensor/search.py
    * masked_select : https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/masked_select_cn.html#masked-select
    * topk
    * searchsorted
    * index_sample
    * sort
    * kthvalue
    * bucketize
    * index_select

## Paddle Train

### fp16 training


### Gradient accumulation in dygraph mode
* <https://www.paddlepaddle.org.cn/documentation/docs/en/guides/performance_improving/amp_en.html>


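```python
import time

import paddle

# Hypothetical toy setup so the cell runs on its own; the AMP guide defines its own
# SimpleNet, sizes and data, which are not reproduced here.
class SimpleNet(paddle.nn.Layer):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = paddle.nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.linear(x)

input_size, output_size = 1024, 1024
epochs, nums_batch, batch_size = 2, 50, 64
# Toy batches are created on the current device, so no manual copy of `label` is needed.
loader = [(paddle.randn([batch_size, input_size]), paddle.randn([batch_size, output_size]))
          for _ in range(nums_batch)]

mse = paddle.nn.MSELoss()  # loss function
model = SimpleNet(input_size, output_size)
optimizer = paddle.optimizer.SGD(learning_rate=0.0001, parameters=model.parameters())

accumulate_batchs_num = 10  # number of batches whose gradients are accumulated per update

# GradScaler performs (dynamic) loss scaling for fp16
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

train_time = 0
for epoch in range(epochs):
    for i, (data, label) in enumerate(loader):
        start_time = time.time()
        # AMP context: eligible ops run in fp16 (level O1)
        with paddle.amp.auto_cast(level='O1'):
            output = model(data)
            loss = mse(output, label)
        # Divide by the accumulation steps so the accumulated gradient is an average,
        # then scale the loss; gradients add up across iterations until clear_grad.
        scaled = scaler.scale(loss / accumulate_batchs_num)
        scaled.backward()

        # Update the parameters once every accumulate_batchs_num batches
        if (i + 1) % accumulate_batchs_num == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.clear_grad(set_to_zero=False)
        # record training loss and training time
        train_loss = loss.numpy()
        train_time += time.time() - start_time

print("loss:", train_loss)
print("Time consuming using AMP-O1 mode:{:.3f} sec".format(train_time / (epochs * nums_batch)))
# Output recorded in the original notes (with the AMP guide's own setup):
# loss: [0.6602017]
# Time consuming using AMP-O1 mode:0.113 sec
```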

### Using AMP with the high-level API
* https://www.paddlepaddle.org.cn/documentation/docs/zh/tutorial/quick_start/high_level_api/high_level_api.html


```python
import paddle
import paddle.nn as nn
import paddle.vision.transforms as T


def run_example_code():
    device = paddle.set_device('gpu')
    # Using high level API to define neural network
    net = nn.Sequential(nn.Flatten(1), nn.Linear(784, 200), nn.Tanh(), nn.Linear(200, 10))
    model = paddle.Model(net)
    # Define optimizer
    optim = paddle.optimizer.SGD(
        learning_rate=1e-3, parameters=model.parameters())
    # AMP configuration
    amp_configs = {
        "level": "O1",                    # AMP level: O1 or O2
        # Ops forced to run in fp16; a custom_black_list is supported as well
        "custom_white_list": {'conv2d'},
        "use_dynamic_loss_scaling": True  # dynamic loss scaling
    }
    model.prepare(optim,
                  paddle.nn.CrossEntropyLoss(),
                  paddle.metric.Accuracy(),
                  amp_configs=amp_configs)
    # prepare data
    transform = T.Compose([T.Transpose(), T.Normalize([127.5], [127.5])])
    data = paddle.vision.datasets.MNIST(mode='train', transform=transform)
    # use AMP training
    model.fit(data, epochs=2, batch_size=32, verbose=1)


if paddle.is_compiled_with_cuda():
    run_example_code()
```


### count parameters of a model/layer

```python
import numpy as np

def calculate_params(model):
    n_train = 0
    n_non_train = 0
    for p in model.parameters():
        if p.trainable:
            n_train += np.prod(p.shape)
        else:
            n_non_train += np.prod(p.shape)
    return n_train + n_non_train, n_train, n_non_train

```

### Display network structure

```python
import paddle
import paddle.nn as nn

class LeNet(nn.Layer):
    def __init__(self, num_classes=10):
        super(LeNet, self).__init__()
        self.num_classes = num_classes
        self.features = nn.Sequential(
            nn.Conv2D(
                1, 6, 3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2D(2, 2),
            nn.Conv2D(
                6, 16, 5, stride=1, padding=0),
            nn.ReLU(),
            nn.MaxPool2D(2, 2))

        if num_classes > 0:
            self.fc = nn.Sequential(
                nn.Linear(400, 120),
                nn.Linear(120, 84),
                nn.Linear(84, num_classes))

    def forward(self, inputs):
        x = self.features(inputs)

        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            x = self.fc(x)
        return x

lenet = LeNet()

params_info = paddle.summary(lenet, (1, 1, 28, 28))
print(params_info)

# list input demo


class LeNetListInput(LeNet):

    def forward(self, inputs):
        x = self.features(inputs[0])

        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            x = self.fc(x + inputs[1])
        return x


lenet_list_input = LeNetListInput()
input_data = [paddle.rand([1, 1, 28, 28]), paddle.rand([1, 400])]
params_info = paddle.summary(lenet_list_input, input=input_data)
print(params_info)

# dict input demo


class LeNetDictInput(LeNet):

    def forward(self, inputs):
        x = self.features(inputs['x1'])

        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            x = self.fc(x + inputs['x2'])
        return x


lenet_dict_input = LeNetDictInput()
input_data = {'x1': paddle.rand([1, 1, 28, 28]),
              'x2': paddle.rand([1, 400])}
params_info = paddle.summary(lenet_dict_input, input=input_data)
print(params_info)
```


```text
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
   Conv2D-1       [[1, 1, 28, 28]]      [1, 6, 28, 28]          60       
    ReLU-1        [[1, 6, 28, 28]]      [1, 6, 28, 28]           0       
  MaxPool2D-1     [[1, 6, 28, 28]]      [1, 6, 14, 14]           0       
   Conv2D-2       [[1, 6, 14, 14]]     [1, 16, 10, 10]         2,416     
    ReLU-2       [[1, 16, 10, 10]]     [1, 16, 10, 10]           0       
  MaxPool2D-2    [[1, 16, 10, 10]]      [1, 16, 5, 5]            0       
   Linear-1          [[1, 400]]            [1, 120]           48,120     
   Linear-2          [[1, 120]]            [1, 84]            10,164     
   Linear-3          [[1, 84]]             [1, 10]              850      
Total params: 61,610
Trainable params: 61,610
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward
```

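```python
# Print the name of every sublayer; these names can then be used to pick the layers to freeze.
# To understand how layers are nested, the source of paddle.summary() is a useful reference.
import paddle.nn as nn

for layer in lenet.sublayers():
    print(type(layer).__name__, layer._full_name)

for prefix, layer in lenet.named_sublayers():
    print(prefix, layer)


def unfreeze(layer):
    # Make every parameter of `layer` trainable again.
    for p in layer.parameters():
        p.trainable = True


def unfreeze_encoder_layers(model, suffix="_1"):
    # `model` is assumed to contain nn.TransformerEncoderLayer sublayers (it is not defined here).
    # Note: endswith("_1") also matches names such as "..._11" or "..._21".
    for layer in model.sublayers():
        if isinstance(layer, nn.TransformerEncoderLayer) and layer._full_name.endswith(suffix):
            unfreeze(layer)
            print(f'unfreeze layer {layer._full_name}')
```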


```text
Sequential sequential_0
Conv2D conv2d_0
ReLU re_lu_0
MaxPool2D max_pool2d_0
Conv2D conv2d_1
ReLU re_lu_1
MaxPool2D max_pool2d_1
Sequential sequential_1
Linear linear_0
Linear linear_1
Linear linear_2
features Sequential(
  (0): Conv2D(1, 6, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (1): ReLU()
  (2): MaxPool2D(kernel_size=2, stride=2, padding=0)
  (3): Conv2D(6, 16, kernel_size=[5, 5], data_format=NCHW)
  (4): ReLU()
  (5): MaxPool2D(kernel_size=2, stride=2, padding=0)
)
features.0 Conv2D(1, 6, kernel_size=[3, 3], padding=1, data_format=NCHW)
features.1 ReLU()
features.2 MaxPool2D(kernel_size=2, stride=2, padding=0)
features.3 Conv2D(6, 16, kernel_size=[5, 5], data_format=NCHW)
features.4 ReLU()
features.5 MaxPool2D(kernel_size=2, stride=2, padding=0)
fc Sequential(
  (0): Linear(in_features=400, out_features=120, dtype=float32)
  (1): Linear(in_features=120, out_features=84, dtype=float32)
  (2): Linear(in_features=84, out_features=10, dtype=float32)
)
fc.0 Linear(in_features
```

```python
import numpy as np
import paddle
import paddle.nn as nn

# max pool2d
x = paddle.to_tensor(np.random.uniform(-1, 1, [1, 3, 32, 32]).astype(np.float32))
max_pool = nn.MaxPool2D(kernel_size=2, stride=2)  # stride=2 halves H and W; stride=1 would give 31x31
output = max_pool(x)
output.shape  # [1, 3, 16, 16]

# for return_mask=True
# max_pool = nn.MaxPool2D(kernel_size=2, stride=2, padding=0, return_mask=True)
# output, max_indices = max_pool(x)
```

```text
[1, 3, 16, 16]
```

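```python
import paddle
import paddle.nn as nn

paddle.disable_static()  # dynamic graph mode is already the default in Paddle 2.x

x_var = paddle.uniform((1, 13, 64, 768), dtype='float32', min=-1., max=1.)

conv = nn.Conv2D(13, 13, (3, 768), padding=0)
pool = nn.MaxPool2D(kernel_size=3, stride=1)
flat = nn.Flatten()

# With padding=0 the conv output is [1, 13, 62, 1]; a 3-wide pool does not fit in width 1:
# y_var = pool(conv(x_var))
# y_np = y_var.numpy()
# print(y_np.shape)

conv = nn.Conv2D(13, 13, (3, 768), padding=1)
y_var = flat(pool(conv(x_var)))
y_np = y_var.numpy()
# By the output-size formula: conv -> [1, 13, 64, 3], pool -> [1, 13, 62, 1], flatten -> (1, 806).
# The recorded (1, 617396) below equals 13 * 62 * 766, i.e. a 3x3 conv kernel,
# so it appears to come from a different version of this cell.
print(y_np.shape)
```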


```text
(1, 617396)
```
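
The shapes in the two cells above follow the usual output-size formula for convolution and pooling (dilation 1, floor rounding), applied to each spatial dimension: $H_{out} = \lfloor (H_{in} + 2p - k)/s \rfloor + 1$. A small pure-Python helper makes the arithmetic checkable; the function names are illustrative.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length of one spatial dim for conv/pool (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_then_pool(h, w, conv_kernel, conv_padding, pool_kernel, pool_stride, channels):
    """Flattened size after a stride-1 Conv2D followed by a MaxPool2D with padding 0."""
    h = out_size(h, conv_kernel[0], 1, conv_padding)
    w = out_size(w, conv_kernel[1], 1, conv_padding)
    h = out_size(h, pool_kernel, pool_stride)
    w = out_size(w, pool_kernel, pool_stride)
    return channels * h * w


# MaxPool2D(kernel_size=2, stride=2) on 32x32 -> 16; with stride=1 -> 31.
print(out_size(32, kernel=2, stride=2), out_size(32, kernel=2, stride=1))
# Conv2D(13, 13, (3, 768), padding=1) then MaxPool2D(3, stride=1) on 64x768.
print(conv_then_pool(64, 768, (3, 768), 1, 3, 1, channels=13))
# The same pipeline with a 3x3 conv kernel instead.
print(conv_then_pool(64, 768, (3, 3), 1, 3, 1, channels=13))
```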


## PaddleHub

* https://www.paddlepaddle.org.cn/hublist

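```python
# Note: `!export` runs in a throwaway subshell, so it does not change the environment of the
# running kernel. Export LD_LIBRARY_PATH in the shell before starting Jupyter/Python instead.
!export LD_LIBRARY_PATH=/home/$USER/anaconda3/lib:$LD_LIBRARY_PATH
```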

```python
import paddle
import numpy as np

x_data = np.array([[[0, 1, 0],
                    [1, 0, 1]]]).astype("float32")
print(x_data.shape)
paddle.disable_static()
x = paddle.to_tensor(x_data, stop_gradient=False)
output = paddle.nn.functional.label_smooth(x)
print(output)
```

```text
Error: Can not import paddle core while this file exists: /home/jeffye/anaconda3/lib/python3.8/site-packages/paddle/fluid/libpaddle.so

ImportError: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
```
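
The error above comes from the environment, not from the call itself. For reference, `paddle.nn.functional.label_smooth` with its default `epsilon=0.1` and no `prior_dist` uses a uniform prior, $\tilde{y} = (1 - \epsilon)\,y + \epsilon / K$, where $K$ is the size of the last dimension (3 here). A NumPy sketch of that formula on the same input (`label_smooth_np` is an illustrative name):

```python
import numpy as np


def label_smooth_np(label, epsilon=0.1):
    """Uniform-prior label smoothing: (1 - eps) * y + eps / K, with K = last-dim size."""
    num_classes = label.shape[-1]
    return (1.0 - epsilon) * label + epsilon / num_classes


x_data = np.array([[[0, 1, 0], [1, 0, 1]]], dtype="float32")
print(label_smooth_np(x_data))  # zeros become 1/30, ones become 0.9 + 1/30
```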