Development goals: January-February 2016 #1399

Closed
fchollet opened this issue Jan 3, 2016 · 37 comments

@fchollet
Member

fchollet commented Jan 3, 2016

Last month, we delivered on our key development goals for the period. Keras has made great strides in code quality and documentation.

Here's an update with our new goals. On the one hand, we will continue improving the codebase and feature set of Keras. On the other, we will start focusing more on providing the community with a wealth of real applications, rather than just library features. As deep learning engineering becomes increasingly commoditized (most notably by Keras), Keras needs to move up the ladder of abstraction and start providing value at the application level in order to stay relevant for the next 5 years.

These applications will roughly fall into two categories:

  • end-to-end demonstrations of how to tackle simple ML problems with Keras (e.g. text classification from raw text files, image classification from raw images), packaged as standalone reusable applications
  • state-of-the-art examples targeted to researchers and advanced users, mostly in the generative / artistic domain.

Development:

  • fix ongoing issues with RNN masking (Masking passed to the backend RNN calls. #1310)
  • introduce abstract layers TimeDistributed, Highway, Residual (residual learning).
  • better support for inception-style architectures, and improvements to the Graph API

Applications:

  • a Keras model catalog, where each model would include Python code, a JSON configuration, and saved pre-trained weights in HDF5 (a serialization sketch follows this list)
  • a Keras blog introducing interesting Keras applications, hopefully on a weekly or biweekly basis
  • a DeepDream implementation
  • a NeuralStyle implementation
  • Caffe support (Caffe Conversion #921)
  • interactive dialogue and question answering system(s)
  • music generation implementation(s)
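
As a rough illustration of the catalog packaging, here is a minimal sketch assuming the current to_json() / save_weights() / load_weights() API; the toy architecture and file names are placeholders, not an actual catalog entry:

```python
# Hypothetical catalog entry: architecture as JSON, weights as HDF5.
from keras.models import Sequential, model_from_json
from keras.layers.core import Dense, Activation

# Toy placeholder model standing in for a real catalog entry (e.g. a convnet).
model = Sequential()
model.add(Dense(64, input_shape=(100,)))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

# Write the two catalog artifacts.
with open('catalog_model.json', 'w') as f:
    f.write(model.to_json())
model.save_weights('catalog_model.h5', overwrite=True)

# A user of the catalog rebuilds the model from JSON and loads the weights.
with open('catalog_model.json') as f:
    restored = model_from_json(f.read())
restored.load_weights('catalog_model.h5')
```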

As a closing note, I am noticing that the October-December period, rich in ML conferences, has seen the release of over 15 research papers using Keras for their experiments (plus an unknowable number of papers that used Keras without citing it; a majority of papers never cite the open source frameworks they use). This is a positive sign : )

@jfsantos
Contributor

jfsantos commented Jan 4, 2016

I have some code for music generation and a dataset that I can share as an example (currently not working due to the problems with masking, but this is already on the to-do list). It's similar to char-rnn but predicts musical "tokens" instead of characters.
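
Not the actual code, but a minimal sketch of the char-rnn-style setup described above, assuming a one-hot encoding over a vocabulary of musical tokens (all sizes are placeholders):

```python
# Sketch: next-token prediction over a musical "token" vocabulary, char-rnn style.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM

maxlen = 32        # training window length in tokens (placeholder)
vocab_size = 100   # number of distinct musical tokens (placeholder)

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, vocab_size)))
model.add(Dropout(0.2))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# X: one-hot windows, shape (n_samples, maxlen, vocab_size)
# y: one-hot next token, shape (n_samples, vocab_size)
# model.fit(X, y, batch_size=128, nb_epoch=10), then sample token by token.
```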

@fchollet
Member Author

fchollet commented Jan 4, 2016

@jfsantos sounds great. It would be neat to turn this into a reusable app (e.g. provide a folder with enough MIDI files or audio files in a certain style, and start generating MIDI tracks or audio files in that style). What is the "token" space you were using?

@jfsantos
Contributor

jfsantos commented Jan 4, 2016

The token space I used is ABC notation symbols. They are mostly used for representing music for a single instrument (mostly monophonic, even though there is a notation for chords). I don't know if there are a lot of datasets in this format, but there's the one I used (which contains ~25k tunes).

The code could probably be converted to use MIDI or another format instead of ABC. For other formats, we would need a parser. I considered using the parsers from music21 but that would add an external dependency to the example.

@fchollet
Member Author

fchollet commented Jan 4, 2016

MIDI would certainly be a better format to allow a wide range of people to play around with it. It's a good starting point. I think the killer app would involve learning from audio files and generating audio files, with some "clean" data representation in between (possibly derived from ABC). Previous attempts have been doing it completely wrong, but we could do it right.

@ozancaglayan
Contributor

Regarding masking, I'm trying to implement a feed-forward network using Graph like the following:

Embedding -> Flatten -> Dense -> ...

I'm padding my short sequences with 0 in both the inputs and outputs. If I set mask_zero=True for the embedding layer, the Flatten and Dense layers break, as they are not supposed to be used with masks. Changing keras/layers/core.py so that they derive from MaskedLayer instead of Layer makes the system at least train, but I'm not sure whether the inner parts are playing nicely with the masks. I assume this wouldn't be so simple to fix this way :)
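
For context, a minimal sketch of the graph described above (layer sizes are placeholders; exact Graph signatures may differ slightly depending on the Keras version). With mask_zero=True, the Embedding emits a mask that the downstream Flatten/Dense layers are not written to consume:

```python
# Sketch of the Embedding -> Flatten -> Dense graph described above.
from keras.models import Graph
from keras.layers.core import Dense, Flatten
from keras.layers.embeddings import Embedding

maxlen = 20        # padded sequence length (placeholder)
vocab_size = 5000  # placeholder

graph = Graph()
graph.add_input(name='input', input_shape=(maxlen,), dtype='int')
# mask_zero=True makes the 0-padding produce a mask, which Flatten and Dense
# (plain Layer subclasses, not MaskedLayer) do not expect to receive.
graph.add_node(Embedding(vocab_size, 64, input_length=maxlen, mask_zero=True),
               name='embed', input='input')
graph.add_node(Flatten(), name='flat', input='embed')
graph.add_node(Dense(1, activation='sigmoid'), name='out_dense', input='flat')
graph.add_output(name='output', input='out_dense')
graph.compile(optimizer='rmsprop', loss={'output': 'binary_crossentropy'})
```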

@Sandy4321

Could you recommend some papers/videos/books/code example links to study this further, please?


@farizrahman4u
Contributor

We need a K.tensordot which mimics Theano's batched_tensordot but also works on TensorFlow. Memory networks are impossible without dot merge.
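
For what it's worth, a minimal sketch of what such a backend function might look like, one version per backend file; the batch_dot name and the restriction to the plain matrix case are assumptions here, not an existing Keras API:

```python
# Hypothetical backend helper for a batched dot product.

# theano_backend.py (sketch)
import theano.tensor as T

def batch_dot(x, y, axes):
    # Theano already ships a batched tensor contraction.
    return T.batched_tensordot(x, y, axes=axes)

# tensorflow_backend.py (sketch)
import tensorflow as tf

def batch_dot(x, y, axes):
    # Only the common case is covered: contract the last axis of x with the
    # second-to-last axis of y, independently for each batch element.
    if list(axes) == [2, 1]:
        return tf.batch_matmul(x, y)
    raise NotImplementedError('general batched contractions not implemented')
```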

@fchollet
Member Author

fchollet commented Jan 5, 2016

That's true, but I think we can wait for TensorFlow to implement tensor contraction. Rolling out our own implementation would be inefficient.


@fchollet
Member Author

fchollet commented Jan 5, 2016

Could you recommend some papers/videos/books/code example links to study this further, please?

Study what, Keras? Here's a pretty good video intro: https://www.youtube.com/watch?v=Tp3SaRbql4k

@farizrahman4u
Contributor

Adding new apps is definitely a great step. What I would recommend is starting by making the current examples interactive. For example, after training babi_memnn, the user should be able to input a story and a question (as natural language text, not word indices) and query the model about it. Instead of each example being a single Python file, each should be a folder with subfolders train_data and test_data and separate scripts train.py and test.py. This will give absolute control to the user, at the cost of using save_weights and load_weights (train.py saves and test.py loads the HDF5 file). Also, there should be explicit examples for visualization.
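
A rough sketch of that layout, with a toy model standing in for the real example; build_model and the file names are placeholders, not existing example code:

```python
# Shared model definition (sketch); train.py and test.py would both import this.
from keras.models import Sequential
from keras.layers.core import Dense, Activation

def build_model():
    # Placeholder network; a real example would build e.g. the babi_memnn model here.
    model = Sequential()
    model.add(Dense(32, input_shape=(100,)))
    model.add(Activation('relu'))
    model.add(Dense(2))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model

# train.py would then do:
#   model = build_model()
#   model.fit(X_train, y_train, batch_size=32, nb_epoch=10)   # data from train_data/
#   model.save_weights('weights.h5', overwrite=True)
#
# test.py would do:
#   model = build_model()
#   model.load_weights('weights.h5')
#   ...vectorize user-typed stories/questions and call model.predict() interactively.
```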

@meanmee

meanmee commented Jan 6, 2016

I am really happy to hear about these things, and I think that for researchers, models with state-of-the-art performance are needed. If you want to make the examples interactive, I suggest giving users a GUI version. In my opinion, the people who use Keras do so for research or business, rather than for fun.
Anyway, for me, just a beginner in deep learning for one year (almost the same age as Keras?), it is time to publish papers, and I think many people are in the same situation.
I hope Keras will add some baseline models proposed in research papers, and I will contribute as much as I can.

@antoniosehk

I agree. I think people use Keras mostly for serious work (research/business) rather than for fun. I would expect Keras to support more state-of-the-art models rather than making the examples interactive.

@Sandy4321

Cool, just great, but too short. Could you share more links like this, please?


@farizrahman4u
Contributor

@Sandy4321 That video covers pretty much all the basics. Also check out the documentation and examples. If you need help with any specific problem, consider opening a new issue.

@fchollet
Member Author

Update on our progress so far:

  • fix ongoing issues with RNN masking: DONE
  • introduce abstract layers TimeDistributed, Highway, Residual (residual learning): There's a PR from @farizrahman4u, needs more work
  • better support for inception-style architectures, and improvements to the Graph API: TODO
  • a Keras model catalog: TODO
  • a Keras blog: TODO
  • a DeepDream implementation: DONE
  • a NeuralStyle implementation: DONE
  • Caffe support (Caffe Conversion #921): TODO, but @pranv might look at it
  • interactive dialogue and question answering system(s): TODO
  • music generation implementation(s): @jfsantos, now that masking is fixed, will you look at this?

@fchollet
Member Author

What blogging platform would you guys suggest for the Keras blog? Requirements:

  • should support custom domain names
  • decent default themes (e.g. not Blogger) with full customization capabilities
  • support for code snippet syntax highlighting or ability to add such support
  • not Tumblr nor Blogger

Maybe we'll end up falling back to GitHub for content management + S3 for hosting + a custom static site generator. Wouldn't be the first time for me.

Also, what hosting platform would you guys suggest for the (500+MB) weight files of a Keras model Zoo? Hosting on my personal S3 account (as I do for Keras datasets) would be prohibitively expensive.

@lukedeo
Contributor

lukedeo commented Jan 18, 2016

I mean, how many weight files are we expecting? A quick check on the AWS calculator shows that 10GB will run ~64 cents/mo.

@fchollet
Member Author

@lukedeo hosting would be inexpensive. It's downloads that are the problem. Keras has around 30k active users, so we could realistically expect several TB of downloads every month, which could cost hundreds of dollars a month.

@lukedeo
Contributor

lukedeo commented Jan 18, 2016

Yikes, I didn't realize Keras was at 30k! I remember reading that Rackspace doesn't charge based on bandwidth... might be an option.

@jfsantos
Contributor

@fchollet I'm going to test my music generation models this week. It's still based on a textual representation of music but it's a start.

Regarding blogging platforms, I recommend Pelican, a static site generator written in Python and aimed at blogs. There are plenty of templates to choose from, and it's fairly easy to write your own. It also has a plugin interface for adding generation of pages (e.g. I have one for generating a list of publications from a BibTeX file). We could host it on GitHub Pages (that's what I do for my website). Here's one of my blog posts using LaTeX rendering and code snippets.
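
For reference, a minimal pelicanconf.py sketch showing the kind of settings Pelican expects (all values here are placeholders):

```python
# pelicanconf.py (sketch): minimal Pelican settings for a blog.
AUTHOR = 'Keras contributors'   # placeholder
SITENAME = 'The Keras Blog'     # placeholder
SITEURL = ''                    # left empty during local development
PATH = 'content'                # Markdown/reST posts live here
TIMEZONE = 'UTC'
DEFAULT_LANG = 'en'
DEFAULT_PAGINATION = 10
THEME = 'notmyidea'             # the default theme bundled with Pelican
```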

@wb14123
Contributor

wb14123 commented Jan 19, 2016

What about just using GitHub Pages for the blog? It can be written in Markdown and version-controlled with git. Jekyll could be the tool to generate it.

@wb14123
Contributor

wb14123 commented Jan 19, 2016

About the QA system, I'd like to implement it with the seq2seq model in this paper. But it seems difficult to implement in Keras, since it's not easy to copy the encoder RNN's hidden state to the decoder. Maybe I can try to train the model in examples/addition_rnn.py with some movie subtitles and see the results.
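
For reference, a minimal sketch of the addition_rnn-style encoder-decoder being referred to: rather than copying the encoder's hidden state into the decoder, the encoder's output vector is repeated at every decoder time step via RepeatVector (layer sizes and sequence lengths are placeholders):

```python
# Sketch of an encoder-decoder in the style of examples/addition_rnn.py.
from keras.models import Sequential
from keras.layers.core import Activation, RepeatVector, TimeDistributedDense
from keras.layers.recurrent import LSTM

input_len, output_len, vocab_size = 30, 30, 10000  # placeholders

model = Sequential()
model.add(LSTM(256, input_shape=(input_len, vocab_size)))  # encoder: sequence -> vector
model.add(RepeatVector(output_len))                        # feed that vector to every decoder step
model.add(LSTM(256, return_sequences=True))                # decoder
model.add(TimeDistributedDense(vocab_size))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```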

@farizrahman4u
Contributor

@wb14123
Contributor

wb14123 commented Jan 19, 2016

@farizrahman4u Thanks. I've found this project before. It is awesome. But it has some custom layers, and I don't know if it is a good idea to use them in an example. I think it's better to just stack some existing layers in an example. Maybe merging your layers into upstream Keras would be a good idea?

@farizrahman4u
Contributor

@wb14123 As you said, custom layers. They are kind of hackish and do not work with TensorFlow, so I don't think they meet the Keras standards, hence the separate repo.

@fchollet
Member Author

@jfsantos thanks for the suggestions. Pelican + Github Pages sounds good, we'll probably do that.

@datnamer

Suggestions to increase appeal to industry:

  • integration with Blaze for learning across many backends (databases, out-of-core dataframes, etc.)
  • time series prediction

@Anmol6

Anmol6 commented Jan 22, 2016

Hey,
I've been using Keras for a couple of weeks now and I'd like to contribute in some way! I'd love to take on some sort of NLP-related example task. Also, this'd be my first open source project.

@farizrahman4u
Contributor

@Anmol6 Try adding multiple hops to the memory network example as mentioned in the paper. Should be a nice start.

@Anmol6

Anmol6 commented Jan 22, 2016

@farizrahman4u Which paper? And do you mean this example: https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py?

@farizrahman4u
Contributor

Yes, that one. But as you can see, there is only one memory hop, so it will work only for bAbI task 1. If you do multiple hops (at least 3), you can do this:

[image: babi]

You can get Theano code from https://github.com/npow/MemN2N

@Anmol6

Anmol6 commented Jan 28, 2016

I see, I'll try that out. Thanks!

@Anmol6

Anmol6 commented Jan 29, 2016

Hey, so I'm working on getting the multiple hops done. I'm having trouble figuring out how the code at https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py is employing this step outlined in the paper (if at all):

[image: step from the paper]

If that's not being used, could you explain the logic behind the model in the code? Thanks!

@farizrahman4u
Contributor

It's actually easier than you think. In memory hop 1, the output is a function of the question and the story. This is already done in the Keras example. In memory hop 2, the output is a function of the question, the story, and the output of hop 1.
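
To make the recursion concrete, here is a plain NumPy sketch of stacked hops along the lines of the MemN2N paper (not Keras code; the embeddings are random placeholders): each hop attends over the story with the current query vector, and its output becomes the next hop's query.

```python
# Conceptual NumPy sketch of multiple memory hops (MemN2N-style).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(story_m, story_c, u):
    # story_m, story_c: (n_sentences, dim) input/output memory embeddings
    # u: (dim,) question embedding, or the previous hop's output
    p = softmax(story_m.dot(u))   # attention weights over memories
    o = p.dot(story_c)            # weighted sum of output embeddings
    return u + o                  # becomes the next hop's query

dim, n_sentences, n_hops = 64, 10, 3
story_m = np.random.randn(n_sentences, dim)   # placeholder embeddings
story_c = np.random.randn(n_sentences, dim)
u = np.random.randn(dim)                      # placeholder embedded question

for _ in range(n_hops):
    u = memory_hop(story_m, story_c, u)
# A final Dense + softmax over u would then predict the answer word.
```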

@pasky
Contributor

pasky commented Jan 29, 2016

@farizrahman4u Maybe this should move into a more specific issue, but I was also confused about the bAbI example; it's not really obvious to me that it implements memory networks.

The match seems to correspond to the pre-softmax p vector, but I don't think there's any weighted sum going on, except if I'm confused by the embedding of memories to query_maxlen-dimensional space, which I didn't really understand.

The way I'd reproduce the MemN2N construction in the current framework would be to add a softmax activation to match, embed input_encoder_c to 64d, and compute a match-weighted sum of input_encoder_c elements by (A) RepeatVector(64)-ing the match to be able to dot-product, and (B) dot-producting the match and input_encoder_c. There shouldn't be any place where an LSTM enters at this point, as the shape at that point is just (batch, 64). Does that make sense? If the current construction is somehow equivalent to that, sorry for the noise; it's lost on me though.

However, this wouldn't really reproduce MemN2N anyway since it treats memories at a word level, picking the relevant words rather than relevant sentences, which is the story-to-memory segmentation the memory networks use. For that, we'd have to bump the dimensionality of the input and put each memory in a separate 2d tensor, then either use averaging or RNNs to get memory embeddings (which might be possible with the very latest git I guess?).

(P.S.: I work on a bunch of related Keras models that model sentence similarities (at the core that's what MemNNs do too), e.g. https://github.com/brmson/dataset-sts/blob/master/examples/anssel_kst1503.py but I already have some way more complicated ones (e.g. almost reproducing 1511.04108) in my notebooks that I hope to tweak and publish soon - once my deadlines pass during February, I'll be happy to clean up and contribute them to Keras as examples.)

@stale

stale bot commented May 23, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale bot closed this as completed Jun 22, 2017
@shwetgarg

@pasky Though I am very late to this thread, I completely agree that the current babi_memnn.py implementation does not treat memory at the sentence level. I am trying to implement end-to-end memory networks and would appreciate it if you could share the code you have written for this.
