Development goals: January-February 2016 #1399

Closed
fchollet opened this issue Jan 3, 2016 · 37 comments

@fchollet
Member

fchollet commented Jan 3, 2016

Last month, we delivered on our key development goals for the period. Keras has made great strides in code quality and documentation.

Here's an update with our new goals. On the one hand, we will continue improving the codebase and feature set of Keras. On the other, we will start focusing more on providing the community with a wealth of real applications, rather than just library features. As deep learning engineering becomes increasingly commoditized (most notably by Keras), Keras needs to move up the ladder of abstraction and start providing value at the application level in order to stay relevant for the next 5 years.

These applications will roughly fall into two categories:

  • end-to-end demonstrations of how to tackle simple ML problems with Keras (e.g. text classification from raw text files, image classification from raw images), packaged as standalone reusable applications
  • state-of-the-art examples targeted to researchers and advanced users, mostly in the generative / artistic domain.

Development:

  • fix ongoing issues with RNN masking (Masking passed to the backend RNN calls. #1310)
  • introduce abstract layers TimeDistributed, Highway, Residual (residual learning).
  • better support for inception-style architectures, and improvements to the Graph API

Applications:

  • a Keras model catalog, where each model would include Python code, a JSON configuration, and saved pre-trained weights in HDF5 (a serialization sketch follows this list)
  • a Keras blog introducing interesting Keras applications, hopefully on a weekly or biweekly basis
  • a DeepDream implementation
  • a NeuralStyle implementation
  • Caffe support (Caffe Conversion #921)
  • interactive dialogue and question answering system(s)
  • music generation implementation(s)
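
As a rough illustration of the catalog packaging, here is a minimal sketch assuming the current to_json() / save_weights() / load_weights() API; the toy architecture and file names are placeholders, not an actual catalog entry:

```python
# Hypothetical catalog entry: architecture as JSON, weights as HDF5.
from keras.models import Sequential, model_from_json
from keras.layers.core import Dense, Activation

# Toy placeholder model standing in for a real catalog entry (e.g. a convnet).
model = Sequential()
model.add(Dense(64, input_shape=(100,)))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

# Write the two catalog artifacts.
with open('catalog_model.json', 'w') as f:
    f.write(model.to_json())
model.save_weights('catalog_model.h5', overwrite=True)

# A user of the catalog rebuilds the model from JSON and loads the weights.
with open('catalog_model.json') as f:
    restored = model_from_json(f.read())
restored.load_weights('catalog_model.h5')
```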

As a closing note, I am noticing that the October-December period, rich in ML conferences, has seen the release of over 15 research papers using Keras for their experiments (plus an unknowable number of papers that used Keras without citing it; a majority of papers never cite the open source frameworks they use). This is a positive sign : )

@jfsantos
Contributor

jfsantos commented Jan 4, 2016

I have some code for music generation and a dataset that I can share as an example (currently not working due to the problems with masking, but this is already on the to-do list). It's similar to char-rnn but predicts musical "tokens" instead of characters.
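
Not the actual code, but a minimal sketch of the char-rnn-style setup described above, assuming a one-hot encoding over a vocabulary of musical tokens (all sizes are placeholders):

```python
# Sketch: next-token prediction over a musical "token" vocabulary, char-rnn style.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM

maxlen = 32        # training window length in tokens (placeholder)
vocab_size = 100   # number of distinct musical tokens (placeholder)

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, vocab_size)))
model.add(Dropout(0.2))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# X: one-hot windows, shape (n_samples, maxlen, vocab_size)
# y: one-hot next token, shape (n_samples, vocab_size)
# model.fit(X, y, batch_size=128, nb_epoch=10), then sample token by token.
```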

@fchollet
Member Author

fchollet commented Jan 4, 2016

@jfsantos sounds great. It would be neat to turn this into a reusable app (e.g. provide a folder with enough MIDI files or audio files in a certain style, and start generating MIDI tracks or audio files in that style). What is the "token" space you were using?

@jfsantos
Contributor

jfsantos commented Jan 4, 2016

The token space I used is ABC notation symbols. They are mostly used for representing music for a single instrument (mostly monophonic, even though there is a notation for chords). I don't know if there are a lot of datasets in this format, but there's the one I used (which contains ~25k tunes).

The code could probably be converted to use MIDI or another format instead of ABC. For other formats, we would need a parser. I considered using the parsers from music21 but that would add an external dependency to the example.

@fchollet
Member Author

fchollet commented Jan 4, 2016

MIDI would certainly be a better format to allow a wide range of people to play around with it. It's a good starting point. I think the killer app would involve learning from audio files and generating audio files, with some "clean" data representation in between (possibly derived from ABC). Previous attempts have been doing it completely wrong, but we could do it right.

@ozancaglayan
Contributor

Regarding masking, I'm trying to implement a feed-forward network using Graph like the following:

Embedding -> Flatten -> Dense -> ...

I'm padding my short sequences with 0 in both the inputs and outputs. If I set mask_zero=True for the embedding layer, the Flatten and Dense layers break, as they are not supposed to be used with masks. Changing keras/layers/core.py so that they derive from MaskedLayer instead of Layer makes the system at least train, but I'm not sure whether the inner parts are playing nicely with the masks. I assume this wouldn't be so simple to fix this way :)
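
For context, a minimal sketch of the graph described above (layer sizes are placeholders; exact Graph signatures may differ slightly depending on the Keras version). With mask_zero=True, the Embedding emits a mask that the downstream Flatten/Dense layers are not written to consume:

```python
# Sketch of the Embedding -> Flatten -> Dense graph described above.
from keras.models import Graph
from keras.layers.core import Dense, Flatten
from keras.layers.embeddings import Embedding

maxlen = 20        # padded sequence length (placeholder)
vocab_size = 5000  # placeholder

graph = Graph()
graph.add_input(name='input', input_shape=(maxlen,), dtype='int')
# mask_zero=True makes the 0-padding produce a mask, which Flatten and Dense
# (plain Layer subclasses, not MaskedLayer) do not expect to receive.
graph.add_node(Embedding(vocab_size, 64, input_length=maxlen, mask_zero=True),
               name='embed', input='input')
graph.add_node(Flatten(), name='flat', input='embed')
graph.add_node(Dense(1, activation='sigmoid'), name='out_dense', input='flat')
graph.add_output(name='output', input='out_dense')
graph.compile(optimizer='rmsprop', loss={'output': 'binary_crossentropy'})
```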

@Sandy4321

Could you recommend some papers/videos/books/code example links to study this further, please?


@farizrahman4u
Contributor

We need a K.tensordot which mimics Theano's batched_tensordot but also works on TensorFlow. Memory networks are impossible without dot merge.
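
For what it's worth, a minimal sketch of what such a backend function might look like, one version per backend file; the batch_dot name and the restriction to the plain matrix case are assumptions here, not an existing Keras API:

```python
# Hypothetical backend helper for a batched dot product.

# theano_backend.py (sketch)
import theano.tensor as T

def batch_dot(x, y, axes):
    # Theano already ships a batched tensor contraction.
    return T.batched_tensordot(x, y, axes=axes)

# tensorflow_backend.py (sketch)
import tensorflow as tf

def batch_dot(x, y, axes):
    # Only the common case is covered: contract the last axis of x with the
    # second-to-last axis of y, independently for each batch element.
    if list(axes) == [2, 1]:
        return tf.batch_matmul(x, y)
    raise NotImplementedError('general batched contractions not implemented')
```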

@fchollet
Member Author

fchollet commented Jan 5, 2016

That's true, but I think we can wait for TensorFlow to implement tensor contraction. Rolling out our own implementation would be inefficient.


@fchollet
Member Author

fchollet commented Jan 5, 2016

Could you recommend some papers/videos/books/code example links to study this further, please?

Study what, Keras? Here's a pretty good video intro: https://www.youtube.com/watch?v=Tp3SaRbql4k

@farizrahman4u
Contributor

Adding new apps is definitely a great step. What I would recommend is starting by making the current examples interactive. For example, after training babi_memnn, the user should be able to input a story and a question (as natural language text, not word indices) and query the model about it. Instead of each example being a single Python file, each should be a folder with subfolders train_data and test_data and separate scripts train.py and test.py. This will give absolute control to the user, at the cost of using save_weights and load_weights (train.py saves and test.py loads the HDF5 file). Also, there should be explicit examples for visualization.
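
A rough sketch of that layout, with a toy model standing in for the real example; build_model and the file names are placeholders, not existing example code:

```python
# Shared model definition (sketch); train.py and test.py would both import this.
from keras.models import Sequential
from keras.layers.core import Dense, Activation

def build_model():
    # Placeholder network; a real example would build e.g. the babi_memnn model here.
    model = Sequential()
    model.add(Dense(32, input_shape=(100,)))
    model.add(Activation('relu'))
    model.add(Dense(2))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model

# train.py would then do:
#   model = build_model()
#   model.fit(X_train, y_train, batch_size=32, nb_epoch=10)   # data from train_data/
#   model.save_weights('weights.h5', overwrite=True)
#
# test.py would do:
#   model = build_model()
#   model.load_weights('weights.h5')
#   ...vectorize user-typed stories/questions and call model.predict() interactively.
```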

@meanmee

meanmee commented Jan 6, 2016

I am really happy to hear about these things, and I think that for researchers, models with state-of-the-art performance are needed. If you want to make the examples interactive, I suggest giving users a GUI version. In my opinion, the people who use Keras do so for research or business, rather than for fun.
Anyway, for me, just a beginner in deep learning for one year (almost the same age as Keras?), it is time to publish papers, and I think many people are in the same situation.
I hope Keras will add some baseline models proposed in research papers, and I will contribute as much as I can.

@antoniosehk

I agree. I think people use Keras mostly for serious work (research/business) rather than for fun. I would expect Keras to support more state-of-the-art models rather than making the examples interactive.

@Sandy4321

Cool, just great, but too short. Could you share more links like this, please?


@farizrahman4u
Contributor

@Sandy4321 That video covers pretty much all the basics. Also check out the documentation and examples. If you need help with any specific problem, consider opening a new issue.

@fchollet
Member Author

Update on our progress so far:

  • fix ongoing issues with RNN masking: DONE
  • introduce abstract layers TimeDistributed, Highway, Residual (residual learning): There's a PR from @farizrahman4u, needs more work
  • better support for inception-style architectures, and improvements to the Graph API: TODO
  • a Keras model catalog: TODO
  • a Keras blog: TODO
  • a DeepDream implementation: DONE
  • a NeuralStyle implementation: DONE
  • Caffe support (Caffe Conversion #921): TODO, but @pranv might look at it
  • interactive dialogue and question answering system(s): TODO
  • music generation implementation(s): @jfsantos, now that masking is fixed, will you look at this?

@fchollet
Member Author

What blogging platform would you guys suggest for the Keras blog? Requirements:

  • should support custom domain names
  • decent default themes (e.g. not Blogger) with full customization capabilities
  • support for code snippet syntax highlighting or ability to add such support
  • not Tumblr nor Blogger

Maybe we'll end up falling back to GitHub for content management + S3 for hosting + a custom static site generator. Wouldn't be the first time for me.

Also, what hosting platform would you guys suggest for the (500+MB) weight files of a Keras model Zoo? Hosting on my personal S3 account (as I do for Keras datasets) would be prohibitively expensive.

@lukedeo
Contributor

lukedeo commented Jan 18, 2016

I mean, how many weight files are we expecting? A quick check on the AWS calculator shows that 10GB will run ~64 cents/mo.

@fchollet
Member Author

@lukedeo hosting would be inexpensive. It's downloads that are the problem. Keras has around 30k active users, so we could realistically expect several TB of downloads every month, which could cost hundreds of dollars a month.

@lukedeo
Contributor

lukedeo commented Jan 18, 2016

Yikes, I didn't realize Keras was at 30k! I remember reading that Rackspace doesn't charge based on bandwidth... might be an option.

@jfsantos
Contributor

@fchollet I'm going to test my music generation models this week. It's still based on a textual representation of music but it's a start.

Regarding blogging platforms, I recommend Pelican, a static site generator written in Python and aimed at blogs. There are plenty of templates to choose from, and it's fairly easy to write your own. It also has a plugin interface for adding generation of pages (e.g. I have one for generating a list of publications from a BibTeX file). We could host it on GitHub Pages (that's what I do for my website). Here's one of my blog posts using LaTeX rendering and code snippets.
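
For reference, a minimal pelicanconf.py sketch showing the kind of settings Pelican expects (all values here are placeholders):

```python
# pelicanconf.py (sketch): minimal Pelican settings for a blog.
AUTHOR = 'Keras contributors'   # placeholder
SITENAME = 'The Keras Blog'     # placeholder
SITEURL = ''                    # left empty during local development
PATH = 'content'                # Markdown/reST posts live here
TIMEZONE = 'UTC'
DEFAULT_LANG = 'en'
DEFAULT_PAGINATION = 10
THEME = 'notmyidea'             # the default theme bundled with Pelican
```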

@wb14123
Contributor

wb14123 commented Jan 19, 2016

What about just using GitHub Pages for the blog? It can be written in Markdown and version-controlled with git. Jekyll could be the tool to generate it.

@wb14123
Contributor

wb14123 commented Jan 19, 2016

About the QA system, I'd like to implement it with the seq2seq model in this paper. But it seems difficult to implement in Keras, since it's not easy to copy the encoder RNN's hidden state to the decoder. Maybe I can try to train the model in examples/addition_rnn.py with some movie subtitles and see the results.
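
For reference, a minimal sketch of the addition_rnn-style encoder-decoder being referred to: rather than copying the encoder's hidden state into the decoder, the encoder's output vector is repeated at every decoder time step via RepeatVector (layer sizes and sequence lengths are placeholders):

```python
# Sketch of an encoder-decoder in the style of examples/addition_rnn.py.
from keras.models import Sequential
from keras.layers.core import Activation, RepeatVector, TimeDistributedDense
from keras.layers.recurrent import LSTM

input_len, output_len, vocab_size = 30, 30, 10000  # placeholders

model = Sequential()
model.add(LSTM(256, input_shape=(input_len, vocab_size)))  # encoder: sequence -> vector
model.add(RepeatVector(output_len))                        # feed that vector to every decoder step
model.add(LSTM(256, return_sequences=True))                # decoder
model.add(TimeDistributedDense(vocab_size))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```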

@farizrahman4u
Contributor

@wb14123
Contributor

wb14123 commented Jan 19, 2016

@farizrahman4u Thanks. I've found this project before. It is awesome. But it has some custom layers, and I don't know if it is a good idea to use them in an example. I think it's better to just stack some existing layers in an example. Maybe merging your layers into upstream Keras would be a good idea?

@farizrahman4u
Contributor

@wb14123 As you said, custom layers. They are kind of hackish and do not work with TensorFlow, so I don't think they meet the Keras standards, hence the separate repo.

@fchollet
Member Author

@jfsantos thanks for the suggestions. Pelican + Github Pages sounds good, we'll probably do that.

@datnamer

Suggestions to increase appeal to industry:

  • integration with Blaze for learning across many backends (databases, out-of-core dataframes, etc.)
  • time series prediction

@Anmol6

Anmol6 commented Jan 22, 2016

Hey,
I've been using Keras for a couple of weeks now and I'd like to contribute in some way! I'd love to take on some sort of NLP-related example task. Also, this'd be my first open source project.

@farizrahman4u
Contributor

@Anmol6 Try adding multiple hops to the memory network example as mentioned in the paper. Should be a nice start.

@Anmol6

Anmol6 commented Jan 22, 2016

@farizrahman4u Which paper? And do you mean this example: https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py?

@farizrahman4u
Contributor

Yes, that one. But as you can see, there is only one memory hop, so it will work only for bAbI task 1. If you do multiple hops (at least 3), you can do this:

[image: babi]

You can get Theano code from https://github.com/npow/MemN2N

@Anmol6

Anmol6 commented Jan 28, 2016

I see, I'll try that out. Thanks!

@Anmol6

Anmol6 commented Jan 29, 2016

Hey, so I'm working on getting the multiple hops done. I'm having trouble figuring out how the code at https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py is employing this step outlined in the paper (if at all):

[image: step from the paper]

If that's not being used, could you explain the logic behind the model in the code? Thanks!

@farizrahman4u
Contributor

It's actually easier than you think. In memory hop 1, the output is a function of the question and the story. This is already done in the Keras example. In memory hop 2, the output is a function of the question, the story, and the output of hop 1.
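
To make the recursion concrete, here is a plain NumPy sketch of stacked hops along the lines of the MemN2N paper (not Keras code; the embeddings are random placeholders): each hop attends over the story with the current query vector, and its output becomes the next hop's query.

```python
# Conceptual NumPy sketch of multiple memory hops (MemN2N-style).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(story_m, story_c, u):
    # story_m, story_c: (n_sentences, dim) input/output memory embeddings
    # u: (dim,) question embedding, or the previous hop's output
    p = softmax(story_m.dot(u))   # attention weights over memories
    o = p.dot(story_c)            # weighted sum of output embeddings
    return u + o                  # becomes the next hop's query

dim, n_sentences, n_hops = 64, 10, 3
story_m = np.random.randn(n_sentences, dim)   # placeholder embeddings
story_c = np.random.randn(n_sentences, dim)
u = np.random.randn(dim)                      # placeholder embedded question

for _ in range(n_hops):
    u = memory_hop(story_m, story_c, u)
# A final Dense + softmax over u would then predict the answer word.
```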

@pasky
Contributor

pasky commented Jan 29, 2016

@farizrahman4u Maybe this should move into a more specific issue, but I was also confused about the bAbI example; it's not really obvious to me that it implements memory networks.

The match seems to correspond to the pre-softmax p vector, but I don't think there's any weighted sum going on, except if I'm confused by the embedding of memories to query_maxlen-dimensional space, which I didn't really understand.

The way I'd reproduce the MemN2N construction in the current framework would be to add a softmax activation to match, embed input_encoder_c to 64d, and compute a match-weighted sum of input_encoder_c elements by (A) RepeatVector(64)-ing the match to be able to dot-product, and (B) dot-producting the match and input_encoder_c. There shouldn't be any place where an LSTM enters at this point, as the shape at that point is just (batch, 64). Does that make sense? If the current construction is somehow equivalent to that, sorry for the noise; it's lost on me though.

However, this wouldn't really reproduce MemN2N anyway since it treats memories at a word level, picking the relevant words rather than relevant sentences, which is the story-to-memory segmentation the memory networks use. For that, we'd have to bump the dimensionality of the input and put each memory in a separate 2d tensor, then either use averaging or RNNs to get memory embeddings (which might be possible with the very latest git I guess?).

(P.S.: I work on a bunch of related Keras models that model sentence similarities (at the core that's what MemNNs do too), e.g. https://github.com/brmson/dataset-sts/blob/master/examples/anssel_kst1503.py but I already have some way more complicated ones (e.g. almost reproducing 1511.04108) in my notebooks that I hope to tweak and publish soon - once my deadlines pass during February, I'll be happy to clean up and contribute them to Keras as examples.)

@stale

stale bot commented May 23, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale bot closed this as completed Jun 22, 2017
@shwetgarg

@pasky Though I am very late to this thread, I completely agree that the current babi_memnn.py implementation does not treat memory at the sentence level. I am trying to implement end-to-end memory networks and would appreciate it if you could share the code you have written for this.
