Double downsampling in _make_fuse_layers in pose_hrnet.py only increases the output channels on the second downsample

For W32, the downsampling seems to be done like this:
input: 64x48 [32 channels] -> 32x24 [32 channels] -> 16x12 [128 channels]
Shouldn't the first downsample, from 64x48 to 32x24, widen the output to 64 channels, so that the higher-resolution information is preserved in the extra channels?
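
To make the question concrete, here is a small pure-Python sketch of the channel rule as I read it in `_make_fuse_layers`: every stride-2 3x3 conv keeps the source branch's channel count except the last one, which jumps to the target branch's count. The function name `trace_downsample_path` and the W32 channel list `[32, 64, 128, 256]` are my own assumptions for illustration, not names from the repo, and the code only traces shapes rather than building any layers.

```python
def trace_downsample_path(channels, j, i, height, width):
    """Trace (height, width, channels) through the j -> i downsample path.

    Assumes each step is a 3x3 conv with stride 2 and padding 1, and that
    only the last step switches to the target branch's channel count
    (my reading of _make_fuse_layers, not a copy of it).
    """
    assert j < i, "this sketch only covers the downsampling case (j < i)"
    h, w = height, width
    steps = [(h, w, channels[j])]
    for k in range(i - j):
        # Output size of a conv with kernel 3, stride 2, padding 1.
        h = (h + 2 * 1 - 3) // 2 + 1
        w = (w + 2 * 1 - 3) // 2 + 1
        is_last = k == i - j - 1
        out_channels = channels[i] if is_last else channels[j]
        steps.append((h, w, out_channels))
    return steps


# Assumed W32 branch widths, highest resolution first.
w32_channels = [32, 64, 128, 256]

for h, w, c in trace_downsample_path(w32_channels, 0, 2, 64, 48):
    print(f"{h}x{w} [{c} channels]")
```

This prints the same 32 -> 32 -> 128 sequence as above. What I would have expected instead is for the intermediate step to use `channels[j + k + 1]`, which would give 32 -> 64 -> 128.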