1. **Advantages of a CNN over a Fully Connected DNN for Image Classification**:
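   - **Parameter Efficiency**: Convolutional layers are only partially connected and reuse the same weights at every position, so a CNN has far fewer parameters than a fully connected DNN. This reduces memory usage and speeds up training.
   - **Feature Hierarchies**: Stacked convolutional layers learn a spatial hierarchy of features: lower layers detect small, simple patterns such as edges, and higher layers combine them into larger, more complex ones.
   - **Translation Invariance**: Once a kernel has learned to detect a feature, it can detect that feature anywhere in the image. A fully connected DNN can only detect it at the location where it was learned.
   - **Reduced Overfitting**: With far fewer parameters to fit, plus the downsampling done by pooling layers, a CNN is less likely to overfit the training data.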
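
To put a number on the parameter-efficiency point, here is a minimal sketch comparing one dense layer with one convolutional layer. The 100 x 100 RGB input and the layer sizes are hypothetical values chosen for illustration only.

```python
def dense_params(n_inputs, n_neurons):
    """One weight per input plus one bias, for every neuron."""
    return (n_inputs + 1) * n_neurons


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """One weight per kernel cell per input channel plus one bias, for every filter."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


# Hypothetical input: a 100 x 100 RGB image (30,000 values once flattened).
n_inputs = 100 * 100 * 3

dense_count = dense_params(n_inputs, n_neurons=1000)
conv_count = conv_params(3, 3, in_channels=3, n_filters=100)

print(f"dense layer, 1000 neurons: {dense_count:,} parameters")
print(f"conv layer, 100 3x3 filters: {conv_count:,} parameters")
```

The convolutional count does not depend on the image size at all, which is why the gap widens as images get larger.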

2. **Parameters and Memory Requirements**:
   - **Parameters**:
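     - First Convolutional Layer: (3 x 3 x 3 + 1) x 100 = 2,800 (The input has 3 channels; the + 1 is the bias of each filter)
     - Second Convolutional Layer: (3 x 3 x 100 + 1) x 200 = 180,200
     - Third Convolutional Layer: (3 x 3 x 200 + 1) x 400 = 720,400
     - Total Parameters = 2,800 + 180,200 + 720,400 = 903,400
   - **Prediction RAM**:
     - 32-bit float = 4 bytes. The parameters take 903,400 x 4 bytes = 3,613,600 bytes or ~3.6 MB. On top of that, the feature maps of at least two consecutive layers must be held in memory at once, and their size depends on the input resolution and the stride.
   - **Training RAM for Mini-batch of 50 images**:
     - The parameters are shared by every image in the batch, so their ~3.6 MB is counted once, not 50 times. What scales with the batch size is activation memory: every layer's feature maps must be kept for the backward pass, for each of the 50 images. Training RAM is therefore roughly 50 x (feature map memory of all layers for one image), plus the parameters, their gradients, and any optimizer state.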
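
The arithmetic above can be sketched in a few lines. The text here does not state the input size, stride, or padding, so the 200 x 300 RGB input, stride 2, and "same" padding used below are assumptions made only to get concrete numbers. The function `estimate` is a hypothetical helper, not part of any library.

```python
import math

KERNEL = 3
FILTERS = [100, 200, 400]   # feature maps per convolutional layer
BYTES_PER_FLOAT = 4         # 32-bit floats


def estimate(height, width, in_channels=3, stride=2, batch_size=50):
    """Return (parameter count, prediction bytes, training bytes).

    Assumes "same" padding, so each layer outputs ceil(size / stride).
    The training figure is a lower bound: it ignores gradients,
    optimizer state, and the input images themselves.
    """
    params = 0
    fmap_bytes = []
    channels = in_channels
    for filters in FILTERS:
        params += (KERNEL * KERNEL * channels + 1) * filters
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        fmap_bytes.append(height * width * filters * BYTES_PER_FLOAT)
        channels = filters

    param_bytes = params * BYTES_PER_FLOAT
    # Prediction: a layer's output can be freed once the next layer is computed,
    # so the peak is the largest pair of consecutive feature maps.
    peak_pair = max(a + b for a, b in zip(fmap_bytes, fmap_bytes[1:]))
    predict_bytes = param_bytes + peak_pair
    # Training: every layer's output is kept for the backward pass, per image.
    train_bytes = param_bytes + batch_size * sum(fmap_bytes)
    return params, predict_bytes, train_bytes


params, predict_bytes, train_bytes = estimate(200, 300)
print(f"parameters: {params:,}")
print(f"prediction: ~{predict_bytes / 1e6:.1f} MB")
print(f"training (batch of 50): at least ~{train_bytes / 1e6:.0f} MB")
```

Under these assumptions, activations dominate the total and the parameters are a small share of it.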

3. **Solutions for GPU Out-of-Memory Issue**:
   - **Reduce Mini-batch Size**: Smaller batches consume less memory.
   - **Reduce Dimensions**: Reduce the image size or reduce the number of channels or feature maps.
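   - **Gradient Checkpointing**: Store only some of the activations during the forward pass and recompute the rest during the backward pass. This trades extra compute time for lower memory use.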
   - **Use a Simpler Model**: Use a model with fewer layers or fewer neurons per layer.
   - **Spread Across Multiple GPUs**: If available, use data parallelism or model parallelism.

4. **Max Pooling Layer vs Convolutional Layer**:
   - **Computational Efficiency**: A max pooling layer has no trainable parameters at all, whereas a convolutional layer with the same stride has many. Max pooling is therefore cheaper in both memory and compute.
   - **Feature Detection**: A max pooling layer keeps only the strongest activation in each local region. This shrinks the feature map, preserves the spatial hierarchy of the features, and gives some invariance to small shifts in the input.
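
A minimal pure-Python sketch of max pooling over a single feature map follows. The function name `max_pool_2d` and the sample values are illustrative. Note that the function has nothing to learn: its only settings are the window size and the stride.

```python
def max_pool_2d(fmap, size=2, stride=2):
    """Max pooling without padding over a 2D feature map given as a list of rows."""
    rows = (len(fmap) - size) // stride + 1
    cols = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(
                fmap[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 2],
    [3, 2, 4, 6],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[4, 5], [3, 9]]
```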

5. **Local Response Normalization Layer**:
   - It was used in some early CNN architectures like AlexNet. It normalizes each activation by the activity of the neurons at the same position in neighboring feature maps, so that a strongly activated neuron inhibits its neighbors. This pushes different feature maps to specialize in different features, which can improve generalization.
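
The normalization can be sketched as follows for a single spatial position, using the usual form b_i = a_i * (k + alpha * sum of a_j^2 over neighboring maps)^(-beta). The default hyperparameters below are the values commonly quoted for AlexNet and should be treated as illustrative. The function name is hypothetical.

```python
def local_response_norm(activations, r=5, alpha=1e-4, beta=0.75, k=2.0):
    """Normalize activations across feature maps at one spatial position.

    activations[i] is the output of feature map i at that position.
    Each value is divided by a term that grows with the squared
    activations of the r // 2 feature maps on either side of it.
    """
    n = len(activations)
    normalized = []
    for i, a in enumerate(activations):
        low = max(0, i - r // 2)
        high = min(n - 1, i + r // 2)
        squares = sum(activations[j] ** 2 for j in range(low, high + 1))
        normalized.append(a * (k + alpha * squares) ** -beta)
    return normalized


# Exaggerated settings to make the inhibition visible.
next_to_strong = local_response_norm([1.0, 10.0, 1.0], r=3, alpha=1.0, beta=1.0, k=1.0)
isolated = local_response_norm([1.0, 0.0, 0.0], r=3, alpha=1.0, beta=1.0, k=1.0)
print(next_to_strong[0], isolated[0])
```

In this toy case the same activation of 1.0 is suppressed far more when it sits next to a strongly activated map than when its neighbors are silent.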

6. **Main Innovations in Various Networks**:
   - **AlexNet**: Deeper and much larger than LeNet-5, used ReLU instead of tanh for faster training, used dropout for regularization, and was trained on GPUs for faster computation.
   - **GoogLeNet**: Introduced the inception module, which applies convolutions with several kernel sizes in parallel and concatenates their outputs, so one layer can capture patterns at multiple scales.
   - **ResNet**: Introduced skip (or residual) connections that skip one or more layers, enabling the training of very deep networks.
   - **SENet**: Introduced the "squeeze-and-excitation" operation that allowed the network to recalibrate the feature maps adaptively.
   - **Xception**: Used depthwise separable convolutions, which factor a regular convolution into a depthwise convolution (one spatial filter per input channel) followed by a pointwise 1 x 1 convolution that mixes the channels.
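
To illustrate why the depthwise separable factorization matters, here is a short sketch comparing weight counts. Biases are left out to keep the comparison simple, and the 3 x 3 kernel with 200 input and 400 output channels is just an example configuration.

```python
def standard_conv_weights(kernel, in_channels, out_channels):
    """Every filter spans all input channels."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_weights(kernel, in_channels, out_channels):
    """Depthwise: one kernel x kernel filter per input channel.
    Pointwise: a 1 x 1 convolution mapping in_channels to out_channels."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


standard = standard_conv_weights(3, 200, 400)
separable = separable_conv_weights(3, 200, 400)
print(f"standard:  {standard:,} weights")
print(f"separable: {separable:,} weights")
print(f"ratio: {standard / separable:.1f}x fewer")
```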

7. **Fully Convolutional Network (FCN)**:
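   - It's a network composed only of convolutional and pooling layers, with no dense layers. To convert a dense layer into a convolutional layer, use a convolutional layer with a kernel size equal to the size of the input feature map, "valid" padding, and one filter per neuron of the dense layer, then copy the dense weights into the kernels. The converted network can process images of any size.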
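
The equivalence can be checked with a small pure-Python sketch. All names, shapes, and weights below are made up for the demonstration: a 2 x 2 feature map with 2 channels feeds a dense layer of 3 neurons, and the same weights are reshaped into three 2 x 2 kernels.

```python
def dense(flat_inputs, weights, biases):
    """weights[n][i] is the weight from input i to neuron n."""
    return [
        sum(w * x for w, x in zip(weights[n], flat_inputs)) + biases[n]
        for n in range(len(biases))
    ]


def conv2d_valid(fmap, kernels, biases):
    """fmap[h][w][c]; kernels[f][i][j][c]; stride 1, no padding. Returns out[f][r][c]."""
    height, width = len(fmap), len(fmap[0])
    out = []
    for f, kernel in enumerate(kernels):
        kh, kw = len(kernel), len(kernel[0])
        plane = []
        for r in range(height - kh + 1):
            row = []
            for c in range(width - kw + 1):
                total = biases[f]
                for i in range(kh):
                    for j in range(kw):
                        for ch, w in enumerate(kernel[i][j]):
                            total += w * fmap[r + i][c + j][ch]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


H, W, C = 2, 2, 2
fmap = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
flat = [v for row in fmap for cell in row for v in cell]  # row-major flatten

weights = [
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, -1, 0, 1, 0, -1, 0, 1],
]
biases = [0, 1, -1]

# Reshape each neuron's weight vector into an H x W x C kernel.
kernels = [
    [[[w[(i * W + j) * C + ch] for ch in range(C)] for j in range(W)] for i in range(H)]
    for w in weights
]

dense_out = dense(flat, weights, biases)
conv_out = conv2d_valid(fmap, kernels, biases)
print(dense_out)                          # one value per neuron
print([plane[0][0] for plane in conv_out])  # one 1 x 1 map per filter
```

On a larger input the convolutional version simply slides the same kernels and returns a grid of outputs, which is what lets a fully convolutional network handle images of any size.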

8. **Technical Difficulty of Semantic Segmentation**:
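   - The main challenge is that a CNN gradually loses spatial information as the signal flows through it, because pooling layers and layers with a stride greater than 1 shrink the feature maps. Semantic segmentation needs a class for every pixel, so this lost resolution has to be restored, typically with upsampling layers such as transposed convolutions and with skip connections from earlier, higher-resolution layers. Telling apart individual objects of the same class is a separate task, instance segmentation.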

9. **CNN for MNIST**: Building a CNN from scratch would involve:
   - Defining the model architecture,
   - Compiling the model,
   - Training the model on the MNIST dataset,
   - Evaluating its performance on the held-out test set.
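
The four steps above can be sketched with Keras as follows. This is a minimal starting point, not a tuned solution: the layer sizes, dropout rates, optimizer, and epoch count are illustrative assumptions, and `build_model` and `train_and_evaluate` are hypothetical helper names.

```python
import tensorflow as tf


def build_model():
    """Define and compile a small CNN for 28 x 28 grayscale digits."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integers 0-9
        metrics=["accuracy"],
    )
    return model


def train_and_evaluate(epochs=5, batch_size=64):
    """Train on MNIST and return the test accuracy (downloads the dataset)."""
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    # Add a channel axis and scale pixel values to [0, 1].
    X_train = X_train[..., None].astype("float32") / 255.0
    X_test = X_test[..., None].astype("float32") / 255.0

    model = build_model()
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
              validation_split=0.1)
    _, accuracy = model.evaluate(X_test, y_test)
    return accuracy
```

Calling `train_and_evaluate()` runs the whole pipeline. From there, accuracy can usually be improved by experimenting with the architecture, data augmentation, or the learning rate schedule.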

10. **Transfer Learning**:
   - This exercise spans several steps, from building a dataset to fine-tuning a model. Fine-tuning typically means taking a pretrained model, replacing its output layer with one sized for the new classes, freezing the pretrained layers, and training only the new top layers on the new dataset. Once the top layers have converged, some of the upper pretrained layers can be unfrozen and trained with a lower learning rate.
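
The freeze-then-train step can be sketched with Keras as below. Xception is used as the base only because it is mentioned above; any pretrained image model would do. The helper name `build_transfer_model`, the 224 x 224 input size, and the learning rate are assumptions for illustration.

```python
import tensorflow as tf


def build_transfer_model(n_classes, weights="imagenet", input_shape=(224, 224, 3)):
    """Pretrained base with frozen weights plus a new, trainable classification head.

    Inputs are expected to be preprocessed already, for example with
    tf.keras.applications.xception.preprocess_input.
    """
    base = tf.keras.applications.Xception(
        weights=weights, include_top=False, input_shape=input_shape
    )
    base.trainable = False  # freeze every pretrained layer

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep batch norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model, base
```

After training the head with `model.fit(...)`, a common next step is to set `base.trainable = True` for some or all of the base, recompile with a much smaller learning rate, and continue training.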

Remember, when working with deep learning and neural networks, experimentation is key. The above answers provide guidance, but in practice, always validate