Hi, I made several interesting observations while using DLRM.
- The following code (lines 223-225), which is supposed to randomize the data, doesn't seem to work:
Lines 214 to 225 in fbc37eb:

```python
# create reordering
indices = np.arange(len(y))

if split == "none":
    # randomize all data
    if randomize == "total":
        indices = np.random.permutation(indices)
        print("Randomized indices...")

    X_int[indices] = X_int
    X_cat[indices] = X_cat
    y[indices] = y
```
Maybe the code here should instead be:

```python
self.X_int = X_int[indices]
self.X_cat = X_cat[indices]
self.y = y[indices]
```

But fortunately, this code path never seems to be triggered in the current version.
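As an aside, here is a minimal standalone sketch (not from the repository) of why the scatter form misbehaves: the right-hand side of `X_int[indices] = X_int` aliases the array being written into, and even setting aliasing aside, it scatters by the inverse permutation instead of gathering by `indices`:

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.arange(8)
indices = rng.permutation(len(y))

# Gather (the suggested fix): element i of the result is y[indices[i]].
y_gather = y[indices]

# Scatter (the original code): the right-hand side aliases the array
# being written into, so the result is generally not the intended shuffle.
y_scatter = y.copy()
y_scatter[indices] = y_scatter

print(y_gather)                             # the intended permutation of y
print(np.array_equal(y_scatter, y_gather))  # generally False
```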
- I got the following warning when running the code below (torch==1.10.1, numpy==1.21.5):

```
UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:201.)
```
Lines 328 to 333 in 9c2fda7:

```python
def collate_wrapper_criteo_offset(list_of_tuples):
    # where each tuple is (X_int, X_cat, y)
    transposed_data = list(zip(*list_of_tuples))
    X_int = torch.log(torch.tensor(transposed_data[0], dtype=torch.float) + 1)
    X_cat = torch.tensor(transposed_data[1], dtype=torch.long)
    T = torch.tensor(transposed_data[2], dtype=torch.float32).view(-1, 1)
```
Lines 399 to 404 in 9c2fda7:

```python
def collate_wrapper_criteo_length(list_of_tuples):
    # where each tuple is (X_int, X_cat, y)
    transposed_data = list(zip(*list_of_tuples))
    X_int = torch.log(torch.tensor(transposed_data[0], dtype=torch.float) + 1)
    X_cat = torch.tensor(transposed_data[1], dtype=torch.long)
    T = torch.tensor(transposed_data[2], dtype=torch.float32).view(-1, 1)
```
This might be a PyTorch bug (see pytorch/pytorch#13918), and I followed the suggestion there and modified the code to:

```python
X_int = torch.log(torch.tensor(np.array(transposed_data[0]), dtype=torch.float) + 1)
X_cat = torch.tensor(np.array(transposed_data[1]), dtype=torch.long)
T = torch.tensor(np.array(transposed_data[2]), dtype=torch.float32).view(-1, 1)
```

This modification roughly doubles training speed, from ~30 ms/it to ~15 ms/it on my machine (12 CPUs, 1x GTX 1060). I guess this is because the collate function is called for every batch during training. Anyway, I hope this can be useful for others training DLRM.
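To see the effect in isolation, here is a small micro-benchmark sketch (the batch size is made up; the 13 dense features mirror the Kaggle Criteo setup):

```python
import timeit

import numpy as np
import torch

# A batch like the one the collate wrapper receives: a Python list of
# per-sample arrays (2048 samples, 13 dense features each).
batch = [np.random.rand(13) for _ in range(2048)]

# Building the tensor directly from the list hits the slow path and warns;
# converting to a single ndarray first avoids both.
slow = timeit.timeit(lambda: torch.tensor(batch, dtype=torch.float), number=100)
fast = timeit.timeit(
    lambda: torch.tensor(np.array(batch), dtype=torch.float), number=100
)
print(f"list of ndarrays: {slow:.3f} s, single ndarray: {fast:.3f} s")
```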
- This is just a small optimization. It seems that `X_int`, `X_cat`, `y`, and the index arrays here are all `numpy.ndarray`:
Lines 247 to 259 in 9c2fda7:

```python
# create training, validation, and test sets
if split == 'train':
    self.X_int = [X_int[i] for i in train_indices]
    self.X_cat = [X_cat[i] for i in train_indices]
    self.y = [y[i] for i in train_indices]
elif split == 'val':
    self.X_int = [X_int[i] for i in val_indices]
    self.X_cat = [X_cat[i] for i in val_indices]
    self.y = [y[i] for i in val_indices]
elif split == 'test':
    self.X_int = [X_int[i] for i in test_indices]
    self.X_cat = [X_cat[i] for i in test_indices]
    self.y = [y[i] for i in test_indices]
```
So I rewrote the above code to:

```python
# create training, validation, and test sets
if split == 'train':
    self.X_int = X_int[train_indices]
    self.X_cat = X_cat[train_indices]
    self.y = y[train_indices]
elif split == 'val':
    self.X_int = X_int[val_indices]
    self.X_cat = X_cat[val_indices]
    self.y = y[val_indices]
elif split == 'test':
    self.X_int = X_int[test_indices]
    self.X_cat = X_cat[test_indices]
    self.y = y[test_indices]
```

This saves about 15 s when creating the Kaggle dataset on my machine.
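For reference, here is a standalone sketch of the difference (array sizes are made up for illustration): indexing an ndarray with an index array is a single vectorized gather, while the list comprehension pays Python-loop overhead for every row and produces a list of many small arrays instead of one contiguous array:

```python
import timeit

import numpy as np

# Illustrative sizes only: 1M rows of 13 dense features, 800k in the split.
X_int = np.random.rand(1_000_000, 13)
train_indices = np.random.permutation(len(X_int))[:800_000]

loop = timeit.timeit(lambda: [X_int[i] for i in train_indices], number=3)
fancy = timeit.timeit(lambda: X_int[train_indices], number=3)
print(f"list comprehension: {loop:.2f} s, fancy indexing: {fancy:.2f} s")
```

A side effect is that `self.X_int` etc. become single ndarrays rather than lists of per-row arrays.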