
char_indices, indices_char for custom training text? #9

Closed

shiffman opened this issue Oct 26, 2017 · 11 comments

@shiffman (Member)

In class this week, to compare with my Markov chain example, I ran through training an LSTM model using train.py with itp.txt. I've updated the code in this repo to reflect this. The steps I took were:

  1. Run train.py
  2. Run json_checkpoints_var.py
  3. Swap out hamlet.js in the example for a new itp.js.

I realize this is a tiny, tiny dataset with very little training, so getting good results isn't really possible. However, I got nonsense results containing characters that aren't in the original dataset. I imagine this has something to do with char_indices and indices_char. Is there a way to auto-generate these during the training process? (They would be different depending on the characters used in the training data, yes?)

@cvalenzuela (Member)

The dictionary is auto-generated during the training process, but if you auto-generate it again at inference time there are discrepancies in how JavaScript and Python sort characters. The mapping must be identical on both sides for the LSTM to work.

This was my initial approach:

In Python, using hamlet.txt in train.py:

chars = sorted(list(set(text)))                           # unique chars, sorted by code point
char_indices = dict((c, i) for i, c in enumerate(chars))  # char -> index

returns:

{' ': 0, '!': 1, '&': 2, "'": 3, ',': 4, '-': 5, '.': 6, '1': 7, '2': 8, ':': 9, ';': 10, '?': 11, '[': 12, ']': 13, 'a': 14, 'b': 15, 'c': 16, 'd': 17, 'e': 18, 'f': 19, 'g': 20, 'h': 21, 'i': 22, 'j': 23, 'k': 24, 'l': 25, 'm': 26, 'n': 27, 'o': 28, 'p': 29, 'q': 30, 'r': 31, 's': 32, 't': 33, 'u': 34, 'v': 35, 'w': 36, 'x': 37, 'y': 38, 'z': 39}

But the equivalent in lstm.js:

let chars = Array.from(new Set(Array.from(text))).sort(); // unique chars, sorted by UTF-16 code unit
let char_indices = chars.reduce((acc, cur, i) => {
  acc[cur] = i; // char -> index
  return acc;
}, {});

returns:

{ '1': 8, '2': 9, '\n': 0, ' ': 1, '!': 2, '&': 3, '\'': 4, ',': 5, '-': 6, '.': 7, ':': 10, ';': 11, '?': 12, '[': 13, ']': 14, a: 15, b: 16, c: 17, d: 18, e: 19, f: 20, g: 21, h: 22, i: 23, j: 24, k: 25, l: 26, m: 27, n: 28, o: 29, p: 30, q: 31, r: 32, s: 33, t: 34, u: 35, v: 36, w: 37, x: 38, y: 39, z: 40 }

That's why I just copied the dictionary from train.py into the JS. I'm not sure if there's a better approach to this.

@shiffman (Member Author)

Ah, I see. Perhaps we could add something to either train.py or json_checkpoints_var.py that generates a JSON file with the character tables? That would at least remove the manual copying and be less prone to error.
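
A minimal sketch of that idea, assuming it runs at the end of train.py (the chars.json filename is an assumption, not something the repo currently uses):

import json

# Rebuild the vocabulary exactly as train.py does.
text = open('itp.txt').read()
chars = sorted(list(set(text)))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# Write both tables next to the checkpoints so lstm.js can load the exact
# mapping used in training instead of re-deriving (and re-sorting) it.
with open('chars.json', 'w') as f:
    json.dump({'char_indices': char_indices,
               'indices_char': indices_char}, f)

(JSON object keys are always strings, so indices_char's integer keys come out as "0", "1", … on the JS side.)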

@shiffman (Member Author)

I did a round of work on this in 1b8160d and 93e4fe8. I'm not getting good results, but this is most likely due to the tiniest dataset ever and training for only 50 epochs. @cvalenzuela, can you look over what I've done to training and examples/lstm_1 and see if you notice anything awry?

@shiffman (Member Author)

I also added a README with instructions on doing the training if you want to take a look.

https://github.com/ITPNYU/p5-deeplearn-js/blob/master/training/lstm/README.md

@cvalenzuela (Member)

nice!
I'm looking at the example now

@cvalenzuela (Member)

So I did some tests and I think I got better results. Here's what I did:

I trained a new model with the itp.txt data. In train.py I updated the length of the dictionary, which always needs to match the source text:

NLABELS = len(chars)  # dictionary length, derived from the source text

and trained it for 1000 epochs, since it's a really small source text.

In lstm.js, the onehot variable needs to have a shape that matches the dictionary length:

const onehot = track(deeplearn.Array2D.zeros([1, 32])); // 32 = dictionary length for itp.txt
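
For reference, a rough NumPy analogue of what that line sets up, just to show why the second dimension has to equal the dictionary length ('a' is an arbitrary example character):

import numpy as np

text = open('itp.txt').read()
chars = sorted(list(set(text)))
char_indices = dict((c, i) for i, c in enumerate(chars))

vocab_size = len(chars)             # 32 for itp.txt; avoids hard-coding
onehot = np.zeros((1, vocab_size))  # same shape as the Array2D above
onehot[0, char_indices['a']] = 1.0  # one-hot encode the character 'a'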

Here is what I'm getting from the lstm_1 example

[Screenshot: generated text output from the lstm_1 example, 2017-10-26 19:54]

I'll push the code now

@shiffman (Member Author)

awesome, that's great! Thank you! Can the onehot variable pull its shape dynamically?

@shiffman (Member Author)

Oh, and feel free to add any of these details to the README in train/!

@cvalenzuela (Member)

Maybe in the training process train.py could output just one file that has all the variables and objects, which could later be imported into the sketch file?
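
A sketch of what that single file could look like, written at the end of train.py (manifest.json and its keys are assumptions, not an existing format in this repo):

import json

text = open('itp.txt').read()
chars = sorted(list(set(text)))

manifest = {
    'nlabels': len(chars),  # lets lstm.js size the onehot tensor dynamically
    'char_indices': {c: i for i, c in enumerate(chars)},
    'indices_char': {i: c for i, c in enumerate(chars)},
}

with open('manifest.json', 'w') as f:
    json.dump(manifest, f)

Reading nlabels from a file like this at load time would also answer the question above about pulling the onehot shape dynamically.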

We could also try other RNN implementations that use words instead of characters, maybe using something like this, this, or this. I'll post how it goes.

@cvalenzuela (Member)

I updated the README in train/ to reflect these changes.

@shiffman (Member Author)

This is great! I made a small change in the code (2dbaf98) to skip saving the model at step 0. Or maybe it's a good idea to leave that in, since you get something immediately even if it's nonsense. In any case, I'm closing this issue for now! We can open new ones as things come up re: LSTM training and generation examples!
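
The guard amounts to something like this (a hypothetical sketch; the loop and the save stand-in are placeholders, not the actual diff in 2dbaf98):

num_steps = 1000
checkpoint_every = 100

for step in range(num_steps):
    # ... run one optimization step here ...
    # Save periodically, but skip the untrained step-0 checkpoint.
    if step > 0 and step % checkpoint_every == 0:
        print('saving checkpoint at step', step)  # stand-in for the real save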
