This is a collection of theoretical and practical questions, so I'll address each one separately:

1. **Initializing Weights**:
   No. If all the weights in a layer start at the same value (even one drawn once from a He distribution), every neuron in that layer computes the same output, receives the same gradient during backpropagation, and makes the same update. The neurons stay identical throughout training, so the layer behaves like a single neuron. The weights must be sampled independently to break this symmetry.
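
To see the symmetry problem concretely, here is a minimal sketch in plain Python. The tiny 2-3-1 tanh network, the single training example, and the helper name `train_constant_init` are all made up for illustration.

```python
import math

def train_constant_init(steps=50, lr=0.1, init=0.5):
    """Train a toy 2-3-1 tanh network whose weights all start at `init`."""
    x, y = [1.0, -2.0], 0.3                 # one hypothetical training example
    W = [[init, init] for _ in range(3)]    # hidden weights, one row per neuron
    v = [init] * 3                          # output weights, one per hidden neuron
    for _ in range(steps):
        # Forward pass
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
        err = sum(vj * hj for vj, hj in zip(v, h)) - y   # gradient of 0.5 * err**2
        # Backward pass (gradients computed before any weight is updated)
        grad_v = [err * hj for hj in h]
        grad_W = [[err * vj * (1 - hj ** 2) * xi for xi in x] for vj, hj in zip(v, h)]
        # Plain gradient descent update
        v = [vj - lr * g for vj, g in zip(v, grad_v)]
        W = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W, grad_W)]
    return W, v

W, v = train_constant_init()
print(W)  # the three rows (one per hidden neuron) are still identical
print(v)  # and so are the three output weights
```

The weights do change during training, but the three hidden neurons remain copies of each other.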

2. **Bias Initialization**:
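   Yes, initializing biases to 0 is common practice and works fine: the randomly initialized weights already break the symmetry between neurons, and the biases learn whatever non-zero values they need during training.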

3. **SELU vs. ReLU**:
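   - **Self-normalization**: Under the right conditions (a plain stack of dense layers, standardized inputs, LeCun normal initialization), SELU activations tend to keep a mean close to 0 and a standard deviation close to 1 from layer to layer, which helps combat the vanishing/exploding gradient problems.
   - **Leaky**: Like leaky ReLU, its gradient is non-zero for negative inputs, so it avoids dead neurons.
   - **Negative outputs**: It takes negative values for negative inputs, which pulls the average activation toward 0. Note that SELU is continuous but not differentiable at exactly 0, where its slope jumps from about 1.76 on the left to about 1.05 on the right.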
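
As a rough illustration of the first point, the sketch below feeds standard normal samples through SELU and ReLU and compares the output statistics. It uses only the standard library; the constants are the usual published SELU values, and the sample size and seed are arbitrary choices.

```python
import math
import random

ALPHA = 1.6732632423543772   # standard SELU alpha
SCALE = 1.0507009873554805   # standard SELU scale (lambda)

def selu(x):
    """Scaled exponential linear unit."""
    return SCALE * x if x > 0 else SCALE * ALPHA * (math.exp(x) - 1.0)

def relu(x):
    return max(0.0, x)

def mean_and_std(values):
    m = sum(values) / len(values)
    var = sum((val - m) ** 2 for val in values) / len(values)
    return m, math.sqrt(var)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(50_000)]   # standardized pre-activations

selu_mean, selu_std = mean_and_std([selu(val) for val in z])
relu_mean, relu_std = mean_and_std([relu(val) for val in z])
print(f"SELU: mean={selu_mean:.3f}, std={selu_std:.3f}")
print(f"ReLU: mean={relu_mean:.3f}, std={relu_std:.3f}")
```

SELU keeps the statistics close to mean 0 and standard deviation 1, while ReLU shifts the mean upward and shrinks the spread.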

4. **Activation Functions Use Cases**:
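   - **SELU**: A good default for deep stacks of plain dense layers, provided the conditions for self-normalization hold (standardized inputs, LeCun normal initialization, no skip connections, and `AlphaDropout` rather than regular dropout).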
   - **Leaky ReLU/variants**: Useful in deep networks to avoid dead neurons and combat vanishing gradient issues.
   - **ReLU**: A traditional default for many scenarios due to its simplicity.
   - **Tanh**: Useful when you need outputs between -1 and 1.
   - **Logistic (Sigmoid)**: Mostly used in binary classification problems in the output layer.
   - **Softmax**: For multi-class classification tasks in the output layer.
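
For the two output-layer activations in this list, a minimal standard-library sketch shows why they suit classification: sigmoid maps one score to a probability, and softmax maps a vector of scores to probabilities that sum to 1. The input scores are arbitrary example values.

```python
import math

def sigmoid(z):
    """Logistic function: probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn a list of class scores into probabilities that sum to 1."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(score - shift) for score in logits]
    total = sum(exps)
    return [e / total for e in exps]

p_positive = sigmoid(0.8)                    # binary classification output
class_probs = softmax([2.0, 1.0, -1.0])      # three-class output
print(p_positive)
print(class_probs, sum(class_probs))
```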

5. **High Momentum in SGD**:
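   If momentum is set too close to 1 (for example 0.99999), the optimizer builds up a lot of speed: with a constant gradient, the step size approaches the learning rate divided by (1 − momentum). It will likely overshoot the minimum, come back, and overshoot again, oscillating many times before settling, so it takes much longer to converge than with a value such as 0.9. Combined with a large learning rate, it can also diverge.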
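
The oscillation can be reproduced on a toy problem. The sketch below runs classic momentum updates on the one-dimensional function f(x) = 0.5 * x**2; the function, learning rate, tolerance, and the helper name `steps_to_converge` are illustrative choices, not anything specific to a real network.

```python
def steps_to_converge(momentum, lr=0.1, x0=1.0, tol=1e-3, max_steps=100_000):
    """Momentum SGD on f(x) = 0.5 * x**2. Returns (steps, overshoots)."""
    x, velocity, overshoots = x0, 0.0, 0
    for step in range(1, max_steps + 1):
        grad = x                                  # derivative of 0.5 * x**2
        velocity = momentum * velocity - lr * grad
        new_x = x + velocity
        if new_x * x < 0:                         # crossed the minimum at x = 0
            overshoots += 1
        x = new_x
        if abs(x) < tol and abs(velocity) < tol:  # settled near the minimum
            return step, overshoots
    return max_steps, overshoots

for beta in (0.5, 0.9, 0.99):
    steps, overshoots = steps_to_converge(beta)
    print(f"momentum={beta}: {steps} steps, {overshoots} overshoots")
```

On this toy problem, raising the momentum from 0.9 to 0.99 makes the optimizer cross the minimum many more times and take far more steps to settle.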

6. **Sparse Model Production**:
   - **Pruning**: Train the model normally, then set the weights with the smallest absolute values to zero (or remove whole neurons whose weights are all tiny).
   - **Regularization**: Train with strong L1 regularization, which pushes as many weights as possible toward zero.
   - **Dropout**: This does not produce a sparse model. It only deactivates random neurons during training, so it is a regularizer rather than a sparsification technique.
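
A minimal sketch of the pruning idea, on a flat list of weights rather than a real model. The function name `prune_by_magnitude` and the example weights are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0."""
    n_prune = int(len(weights) * sparsity)
    # Indices of the weights, ordered from smallest to largest absolute value
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

weights = [0.5, -0.01, 0.2, 0.03, -0.9, 0.001]
sparse_weights = prune_by_magnitude(weights, sparsity=0.5)
print(sparse_weights)  # [0.5, 0.0, 0.2, 0.0, -0.9, 0.0]
```

In practice the pruned model is usually fine-tuned for a few more epochs to recover any lost accuracy.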

7. **Dropout and Training/Inference**:
   Dropout usually slows down convergence during training, because each step "turns off" neurons at random and so trains a smaller random sub-network. It has no effect on inference speed, because it is only active during training. **MC Dropout**, on the other hand, keeps dropout active at inference time and averages the predictions of N stochastic forward passes, so inference is roughly N times slower.
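
The difference between the three modes can be sketched with a toy dropout function. This assumes the common "inverted dropout" formulation, where kept activations are scaled up during training so that inference needs no rescaling; here a single list of activations stands in for a whole network, and all names are illustrative.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: a no-op at inference, random masking plus rescaling in training."""
    if not training:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

def mc_dropout_mean(activations, rate, n_samples, rng):
    """Average n_samples stochastic passes: n_samples times the work of one pass."""
    totals = [0.0] * len(activations)
    for _ in range(n_samples):
        sample = dropout(activations, rate, training=True, rng=rng)
        totals = [t + s for t, s in zip(totals, sample)]
    return [t / n_samples for t in totals]

rng = random.Random(42)
acts = [1.0, 2.0, 3.0, 4.0]
inference_out = dropout(acts, 0.5, training=False, rng=rng)   # unchanged, no extra cost
mc_out = mc_dropout_mean(acts, 0.5, n_samples=2000, rng=rng)  # close to acts on average
print(inference_out)
print(mc_out)
```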

8. **Deep Neural Network on CIFAR10**:
   While I can't directly train a model here, I can provide a blueprint to follow:

   a. **DNN with 20 hidden layers**:
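   ```python
   from tensorflow import keras

   model = keras.models.Sequential()
   # CIFAR10 images are 32x32 RGB; flatten them for the dense layers.
   model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
   for _ in range(20):  # 20 hidden layers of 100 ELU units, He initialization
       model.add(keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"))
   model.add(keras.layers.Dense(10, activation="softmax"))  # one unit per class
   ```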

   b. **Nadam & Early Stopping**:
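   ```python
   # Assumes X_train, y_train, X_val, y_val are already loaded, with integer class labels.
   optimizer = keras.optimizers.Nadam(learning_rate=5e-5)
   # Stop once the validation loss has not improved for 10 epochs; keep the best weights.
   early_stopping = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
   model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy", metrics=["accuracy"])
   history = model.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), callbacks=[early_stopping])
   ```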

   c. **Batch Normalization**:
   Add a `BatchNormalization` layer after each dense layer, either before or after the activation function. This usually lets the model converge in fewer epochs and may produce a better model, but each epoch takes longer because of the extra computation.
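
One possible way to write this, as a sketch rather than the only valid layout: Batch Normalization is placed before each ELU activation, and the dense layers drop their bias because the normalization layer already learns an offset. The function name `build_bn_model` is made up, and the code assumes a TensorFlow installation that provides `tensorflow.keras`.

```python
from tensorflow import keras

def build_bn_model(n_hidden=20, n_units=100):
    """Dense stack with BatchNormalization before each ELU activation."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(32, 32, 3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.BatchNormalization())   # also normalizes the raw inputs
    for _ in range(n_hidden):
        # use_bias=False: BatchNormalization already learns an offset (beta)
        model.add(keras.layers.Dense(n_units, kernel_initializer="he_normal", use_bias=False))
        model.add(keras.layers.BatchNormalization())
        model.add(keras.layers.Activation("elu"))
    model.add(keras.layers.Dense(10, activation="softmax"))
    return model

bn_model = build_bn_model()
bn_model.summary()
```

The model can then be compiled and trained exactly as in step b, usually with a somewhat larger learning rate than the one that worked without Batch Normalization.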

   d. **Replacing with SELU**:
   Standardize input features, change the initializer to `lecun_normal`, and replace `elu` with `selu`.
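
A sketch of those three changes, assuming NumPy arrays for the data and a TensorFlow installation that provides `tensorflow.keras`. The helper names `standardize` and `build_selu_model` are hypothetical.

```python
import numpy as np
from tensorflow import keras

def standardize(X_train, *others):
    """Scale every array with the per-feature mean and std of the training set."""
    mean = X_train.mean(axis=0, keepdims=True)
    std = X_train.std(axis=0, keepdims=True) + 1e-7   # avoid division by zero
    return [(X - mean) / std for X in (X_train, *others)]

def build_selu_model(n_hidden=20, n_units=100):
    """Plain dense stack with SELU activations and LeCun normal initialization."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(32, 32, 3)))
    model.add(keras.layers.Flatten())
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(n_units, activation="selu",
                                     kernel_initializer="lecun_normal"))
    model.add(keras.layers.Dense(10, activation="softmax"))
    return model

selu_model = build_selu_model()
# Typical use (data arrays assumed to exist):
# X_train_s, X_val_s, X_test_s = standardize(X_train, X_val, X_test)
```

Note that the validation and test sets are scaled with the training set's statistics, not their own.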

   e. **Alpha Dropout & MC Dropout**:
   Add `AlphaDropout` layers to the SELU network (regular dropout would break self-normalization). For MC Dropout, run the trained model on the same inputs several times with dropout still enabled and average the predicted probabilities.
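
Here is a minimal sketch of both ideas. It assumes a TensorFlow/Keras version that ships `keras.layers.AlphaDropout`, puts a single dropout layer just before the output layer (one reasonable placement among several), and uses random noise in place of real images. The names `build_alpha_dropout_model` and `mc_dropout_predict` are made up for this example.

```python
import numpy as np
from tensorflow import keras

def build_alpha_dropout_model(rate=0.1, n_hidden=20, n_units=100):
    """SELU stack with one AlphaDropout layer before the output layer."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(32, 32, 3)))
    model.add(keras.layers.Flatten())
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(n_units, activation="selu",
                                     kernel_initializer="lecun_normal"))
    model.add(keras.layers.AlphaDropout(rate=rate))
    model.add(keras.layers.Dense(10, activation="softmax"))
    return model

def mc_dropout_predict(model, X, n_samples=10):
    """Average class probabilities over n_samples passes with dropout active."""
    # training=True keeps the dropout layer switched on during prediction
    probas = np.stack([np.asarray(model(X, training=True)) for _ in range(n_samples)])
    return probas.mean(axis=0)

mc_model = build_alpha_dropout_model()
X_demo = np.random.default_rng(0).standard_normal((8, 32, 32, 3)).astype("float32")
mean_probas = mc_dropout_predict(mc_model, X_demo, n_samples=10)
print(mean_probas.shape)
```

Calling the model with `training=True` affects every layer that behaves differently in training, so this shortcut is only safe for models like this one where dropout is the only such layer.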

Remember, the provided code is a blueprint. Depending on the platform, libraries' versions, and specific requirements, you might need to make adjustments. Always monitor the training process and adjust hyperparameters as needed.