Records of papers I've read since 2018/04/15.
Papers-Reading

My reading notes on DL papers, along with my personal comments on each paper. There may well be mistakes, and I would really appreciate it if you pointed them out.

Neural Style Transfer

  • Neural Style Transfer: A Review ⭐️⭐️⭐️⭐️
    • Surveys the work on Neural Style Transfer up to May 2016.
  • Demystifying Neural Style Transfer
    • Proves that matching the Gram matrices is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with a second-order polynomial kernel.
    • Experiments with different kernels and parameters.
  • Fast Patch-based Style Transfer of Arbitrary Style
    • A more advanced version of "Fast" Neural Style Transfer that runs in real time and applies to an unlimited number of styles.
    • The drawback is that the quality of the stylized images is worse than "Fast" Neural Style, which in turn only applies to a finite set of styles.
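
The Gram/MMD equivalence claimed in "Demystifying Neural Style Transfer" is easy to check numerically. Below is a minimal numpy sketch (random matrices stand in for real CNN feature maps; all dimensions are made up): the Gram-matrix style loss equals N² times the biased MMD² under the second-order polynomial kernel k(a, b) = (aᵀb)².

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 6, 4                      # N feature vectors (spatial positions), C channels
F = rng.normal(size=(N, C))      # stand-in for the stylized image's features
H = rng.normal(size=(N, C))      # stand-in for the style image's features

# Gram-matrix style loss: || F^T F - H^T H ||_F^2
gram_loss = np.sum((F.T @ F - H.T @ H) ** 2)

def mmd2_poly2(X, Y):
    """Biased (V-statistic) MMD^2 with kernel k(a, b) = (a . b)^2."""
    kxx = (X @ X.T) ** 2
    kyy = (Y @ Y.T) ** 2
    kxy = (X @ Y.T) ** 2
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()

# The two losses agree up to the constant factor N^2
assert np.isclose(gram_loss, N**2 * mmd2_poly2(F, H))
```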

Generative Model

VAE

GAN for Image

  • Self-Attention Generative Adversarial Networks(!!Important)⭐️⭐️⭐️⭐️⭐️

    • Self-Attention GAN boosts the best published Inception score from 36.8 to 52.52 and reduces the Fréchet Inception Distance from 27.62 to 18.65 on the challenging ImageNet dataset.
    • Uses self-attention to learn long-range dependencies.
    • Several tricks inside:
      • Uses Spectral Normalization on both the generator and the discriminator; this proved more stable during training than SN-GAN, which normalizes the discriminator only.
      • Shows that the two-timescale update rule (TTUR) is an effective way to achieve faster convergence.
      • Indicates that the self-attention mechanism achieves better performance on middle-to-high-level feature maps (e.g., feat32 and feat64) than on low-level ones. The reason could be that the network receives more evidence from larger feature maps and enjoys more freedom to choose the conditions.
  • Conditional Generative Adversarial Nets⭐️⭐️⭐️⭐️

    • cGAN: you can embed extra information to control the generated result.
    • The information is fed into both the generator and the discriminator. This can be done by concatenating z (after an FC layer) with the label y (after an FC layer).
    • They experimented on MNIST generation with the digit class as the one-hot y, and on multimodal image tagging; for the tagging task in particular, they condition on an image by passing it through a pretrained CNN to obtain y.
  • Wasserstein GAN
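
The self-attention block of SAGAN can be sketched in a few lines of numpy. This is an illustrative reduction (single head, plain matrices instead of learned 1×1 convolutions, made-up dimensions), not the paper's implementation; note how the learned scale gamma starts at 0, so the block is initially an identity mapping.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv, gamma=0.0):
    """x: (C, N) feature map flattened over its N = H*W spatial positions."""
    q, k, v = Wq @ x, Wk @ x, Wv @ x        # 1x1-conv-style projections
    attn = softmax(q.T @ k, axis=1)         # (N, N): each row attends over all positions
    o = v @ attn.T                          # position i mixes values from every position j
    return x + gamma * o                    # residual; gamma is learned, initialised to 0

rng = np.random.default_rng(0)
C, N, Ck = 8, 16, 2                         # channels, positions, reduced key/query dim
x = rng.normal(size=(C, N))
Wq, Wk = rng.normal(size=(Ck, C)), rng.normal(size=(Ck, C))
Wv = rng.normal(size=(C, C))

assert np.allclose(self_attention(x, Wq, Wk, Wv, gamma=0.0), x)   # identity at init
y = self_attention(x, Wq, Wk, Wv, gamma=0.5)
assert y.shape == x.shape
```

Because the (N, N) attention map couples every position with every other, the output at one pixel can depend on distant pixels, which is the long-range dependency the paper targets.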
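
The cGAN conditioning described above amounts to a simple concatenation. A hypothetical numpy sketch — the layer sizes and weight matrices are invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(label, num_classes):
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

# Illustrative sizes, not the paper's
z_dim, y_dim, h_dim = 100, 10, 32

# Hypothetical fully connected projections for z and for the label y
Wz = rng.normal(size=(h_dim, z_dim)) * 0.01
Wy = rng.normal(size=(h_dim, y_dim)) * 0.01

z = rng.normal(size=z_dim)      # noise vector
y = one_hot(3, y_dim)           # condition: e.g. MNIST digit class "3"

# Conditioning by concatenation: both G and D receive [fc(z); fc(y)]
h = np.concatenate([np.maximum(Wz @ z, 0.0), np.maximum(Wy @ y, 0.0)])
assert h.shape == (2 * h_dim,)
```

For the tagging experiment, y would simply be replaced by the feature vector of a pretrained CNN instead of a one-hot label.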

GAN for Text&Audio generation

Attention

Speech

WaveNet

  • Pixel Recurrent Neural Networks(Best Paper of ICML2016) ⭐️⭐️⭐️⭐️
    • I quickly skimmed this paper. It introduces a new method that generates an image pixel by pixel with a sequence model, meaning the current pixel is predicted only from its previous pixels (namely the pixels above it and to its left). To achieve this, they introduce a mask that ensures the model cannot read later pixels.
    • The loss curve is much smoother and more interpretable compared to a GAN's.
  • Conditional Image Generation with PixelCNN Decoders ⭐️⭐️⭐️⭐️⭐️
    • An improvement on PixelRNN & PixelCNN that adds a gated activation unit.
    • Uses two stacks (vertical and horizontal) to avoid the blind spot caused by the mask.
    • Explores conditional image generation with this Gated PixelCNN; the samples do not look as good as a GAN's, but it is still another viable method, and it led to the famous WaveNet.
  • WaveNet: A Generative Model for Raw Audio ⭐️⭐️⭐️⭐️⭐️
    • A synthesis of the papers above, applying their methods to audio.
    • Keywords: fuses dilated causal convolutions, gated activation units, and a residual network with skip connections.
    • Based on Conditional WaveNet, they ran experiments on multi-speaker speech generation, TTS (text-to-speech), and music generation by feeding an additional input h. In speech generation it is a one-hot speaker ID, in TTS it is the text, and in music generation it is a tag describing the generated music, such as the instrument or the genre.
  • Parallel WaveNet: Fast High-Fidelity Speech Synthesis
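
The causality mask used by PixelRNN/PixelCNN can be sketched directly: the kernel is zeroed on the centre row's right half and on everything below, so a convolution at a given pixel reads only the pixels above it and to its left. A minimal numpy sketch (type "A" also hides the centre pixel, as in the first layer; kernel and image sizes are made up):

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """k x k mask that keeps pixels above, or left of, the centre.
    Type 'A' (first layer) also hides the centre pixel itself."""
    m = np.zeros((k, k))
    c = k // 2
    m[:c, :] = 1.0                  # every row above the centre
    m[c, :c] = 1.0                  # same row, strictly left of the centre
    if mask_type == "B":
        m[c, c] = 1.0               # deeper layers may read the current pixel
    return m

assert causal_mask(3, "A").tolist() == [[1, 1, 1],
                                        [1, 0, 0],
                                        [0, 0, 0]]

def masked_conv(img, w, mask):
    """'valid' 2-D correlation with a masked kernel."""
    k = w.shape[0]
    wm = w * mask
    H, W = img.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * wm)
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(6, 6))
w = rng.normal(size=(3, 3))
out = masked_conv(img, w, causal_mask(3, "A"))
assert out.shape == (4, 4)
```

Stacking such masked kernels leaves a blind spot to the upper right of the current pixel, which is exactly what the vertical/horizontal two-stack design of Gated PixelCNN fixes.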
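
The WaveNet keywords above (dilated causal convolution, gated activation units, residual and skip connections) fit in a short numpy sketch. Filter sizes and dilations here are invented; the final assertion checks causality, i.e. that perturbing a future sample never changes earlier outputs.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """1-D causal convolution: y[t] depends only on x[t], x[t - d], x[t - 2d], ..."""
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            j = t - k * dilation        # look only into the past
            if j >= 0:
                y[t] += w[k] * x[j]
    return y

def gated_activation(y_f, y_g):
    """WaveNet gated unit: tanh(filter) * sigmoid(gate)."""
    return np.tanh(y_f) * (1.0 / (1.0 + np.exp(-y_g)))

def wavenet_stack(x, w_f, w_g, dilations=(1, 2, 4, 8)):
    skips, h = [], x
    for d in dilations:                 # receptive field doubles per layer
        z = gated_activation(dilated_causal_conv(h, w_f, d),
                             dilated_causal_conv(h, w_g, d))
        skips.append(z)                 # skip connection to the output stack
        h = h + z                       # residual connection
    return np.sum(skips, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=32)
w_f, w_g = rng.normal(size=2), rng.normal(size=2)

# Causality check: perturbing a future sample leaves all earlier outputs unchanged
x2 = x.copy()
x2[20] += 1.0
assert np.allclose(wavenet_stack(x, w_f, w_g)[:20], wavenet_stack(x2, w_f, w_g)[:20])
```

Doubling the dilation each layer lets the receptive field grow exponentially with depth while keeping the per-layer cost constant, which is why WaveNet can model long raw-audio contexts.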

Tacotron

Deep Voice

Others

Speech Conversion(Voice Style Transfer)

Papers related with my current research.

Some most related work !

Random CNN

Self-Attention

Texture

WaveNet Based

  • A Wavenet for Speech Denoising.(ICASSP2018)
    • An end-to-end learning method for speech denoising based on WaveNet.
  • A Universal Music Translation Network(FAIR. 2018,May 21th)⭐️⭐️⭐️⭐️
    • Uses a WaveNet autoencoder to translate music across musical instruments, genres, and styles. All instruments share the same encoder but have different decoders.
    • Two major losses: one is the reconstruction loss between the decoder output and the ground truth; the other is an instrument-classification loss.
    • The results can be listened to on YouTube. Although the transfer results for known sources are not as good as a human musician's, for unknown sources (like whistling) the results are even better than the human's. (Maybe because humans are not as familiar with the melody?)
    • They distinguish their work from style transfer, because they believe that a melody played on a piano is not similar, beyond audio-texture differences, to the same melody sung by a chorus.
  • Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders(Submitted on 5 Apr 2017)
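
The shared-encoder / per-instrument-decoder setup with its two losses can be sketched abstractly. This is only a toy: plain linear maps stand in for the paper's WaveNet encoder and decoders, and all names and dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 8                             # input-window size, latent size (made up)

# One shared encoder; one decoder per instrument domain
enc = rng.normal(size=(H, D)) * 0.1
dec = {dom: rng.normal(size=(D, H)) * 0.1 for dom in ("piano", "violin")}
clf = rng.normal(size=(2, H)) * 0.1      # instrument classifier on the latent code

def losses(x, domain, domain_id):
    z = enc @ x                          # shared latent code for every instrument
    recon = dec[domain] @ z              # domain-specific decoder
    recon_loss = np.mean((recon - x) ** 2)
    logits = clf @ z
    p = np.exp(logits - logits.max())
    p = p / p.sum()
    clf_loss = -np.log(p[domain_id])     # instrument-classification loss
    return recon_loss, clf_loss

x = rng.normal(size=D)
recon_loss, clf_loss = losses(x, "piano", 0)
assert recon_loss >= 0.0 and clf_loss > 0.0
```

Translation then means encoding audio from one domain and decoding it with another domain's decoder, which only works because the encoder is shared across all instruments.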

VAE & GAN for Speech

License

This project is licensed under the terms of the MIT license.
