Style-conditioned music generation. #302
Image style transfer research over the past year has made it possible to generate a new image conditioned on a painting’s style. The original results were revelatory, and the more recent research directions have been very promising.
This exciting development serves well as an analogy to music. In that domain, we don’t yet have a great way to condition on the style of an artist or genre, let alone transfer styles while preserving content.
The problem, therefore, is to come up with a good way to do style-conditioned music generation, where style is either artist or genre. Ideally, this would also lead to a natural way to do style transfer. Magenta is interested in this because we think it would be a great tool for musicians and an interesting research direction into what the core parts of music are, and because it may require developing new techniques for music machine learning.
I am currently investigating this exact idea as an independent research project at university (not specifically in connection with Magenta, though I have found it to be a tremendous starting point). I am running tests on the pre-trained models provided with the recently published "call and response" MIDI interface. These preliminary tests involve attempting to get the models to recognize rudimentary patterns used in music, such as arpeggios or scales.
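As a rough illustration of the kind of test pattern described above, here is a minimal sketch that builds a scale and an arpeggio as lists of MIDI pitch numbers and splits each into a primer and the continuation one would hope a model reproduces. All function names here are hypothetical and not part of Magenta or the MIDI interface; how the primer is actually fed to a model is left out, and the matching score is just one simple assumption about how to judge a result.

```python
# Sketch only: builds simple test patterns as MIDI pitch numbers (60 = middle C).
# None of these helpers come from Magenta; they are hypothetical illustrations.

MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11, 12]  # semitone offsets from the root
MAJOR_TRIAD_STEPS = [0, 4, 7, 12]               # root, third, fifth, octave


def major_scale(root=60):
    """Ascending one-octave major scale starting at `root`."""
    return [root + step for step in MAJOR_SCALE_STEPS]


def major_arpeggio(root=60, repeats=1):
    """Up-and-down major arpeggio, repeated `repeats` times."""
    up = [root + step for step in MAJOR_TRIAD_STEPS]
    down = up[-2:0:-1]  # descend without repeating the top or bottom note
    return (up + down) * repeats


def split_primer(pattern, primer_len):
    """Split a pattern into (primer, expected continuation)."""
    return pattern[:primer_len], pattern[primer_len:]


def match_fraction(generated, expected):
    """Fraction of expected notes that the generated notes match, position by position."""
    if not expected:
        return 0.0
    hits = sum(1 for g, e in zip(generated, expected) if g == e)
    return hits / len(expected)


primer, expected = split_primer(major_scale(60), primer_len=4)
```

The primer would be passed to a model as its starting sequence, and `match_fraction` applied to whatever the model generates next; a score near 1.0 would suggest the pattern was continued rather than ignored.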
If at all possible, I would like to request the training data used for the pre-trained models, as I would like to investigate the possible effect of priming for each of them. The paper Generating Sequences with Recurrent Neural Networks by Graves (link) points to the effect that priming can have on stylistic generation. Training data can be chosen to represent a specific style, and that style can then be replicated in generation. This is shown in the Graves paper above and in A First Look at Music Composition using LSTM Recurrent Neural Networks by Eck and Schmidhuber (link), who found that LSTMs were capable of replicating a "bebop jazz" style when trained on chords in a similar style.
Obviously, priming cannot be the only solution to this, and it certainly would not be the most efficient one. Re-training the model each time one wants to replicate a different artistic style seems rather costly. However, if priming yields promising results, it could be used to develop more efficient models with a much better focus on stylistic generation. This is the goal of the aforementioned tests involving basic patterns.
These are of course the opinions of a first-year undergraduate with limited applicable experience in the fields of deep learning, generative models, and music theory, so this can all be taken with a significantly sized grain of salt. Any insight that the more experienced among this project's contributors have would be greatly appreciated.