
Add nnseq function to quanteda.classifiers #9

Merged
merged 22 commits into master from mod_edits on May 15, 2019

Conversation

@pchest pchest (Collaborator) commented May 13, 2019

No description provided.

@kbenoit kbenoit (Contributor) left a comment

OK, I rewrote this pretty substantially, and fixed the CI so that it installs tensorflow before running the tests. I also fixed a few inconsistent behaviours.

@pchest I reverted the function signature to the one I'd developed in the (now removed) slstm function, since the signature only needs the arguments that are not passed through to keras::fit() via .... The rest are either hardwired or exposed as options in the function. This is now clearly documented in the Rd.

There were also some errors, especially in how the "one-hot encoding" (such a stupid term) of the outcome variable was done, so if you were using this in the paper before, I suspect the results were wrong. Check the diff carefully; in fact, it is worth studying closely, both to verify what I did and to learn from it.

Issues to note:

  • predict(tmod, type = "class") (for both models) now returns a factor whose levels are those from the training set, even when a single case is predicted (it will still carry as many levels as originally existed). This is how textmodel_nb() works, so I made the behaviour consistent with it; see the first sketch after this list.
  • The keras NN models are inherently stochastic, which means that setting the seed in R will not make the results reproducible. There is a discussion about this "problem" here, along with some suggestions, but the results, especially for the small datasets we use, are simply not deterministic. A slight problem from a replication standpoint!
  • I added an example using our new datasets to ?textmodel_nnseq for the immigration sentences; check it out, since the results are very encouraging. It is also as clean a workflow as I can provide as an example.
  • Model objects in keras operate differently from most other R objects, which are passed by value: they are just references to objects modified "in place". Our function seems to work now, with the (pointer?) attached to the fitted model object as object$seqfitted, and it still works when we save the object, restart R, and reload it, but this might cause some weirdness later. I guess we will solve those problems when we come to them; see the second sketch after this list.
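
For illustration, here is a minimal sketch (not code from the PR; the objects are made up) of the factor-level convention in the first bullet: predictions are rebuilt as a factor carrying the full set of training levels, even for a single case.

``` r
# training labels with three classes
train_y <- factor(c("Negative", "Neutral", "Positive"))

# suppose the model predicts class 3 for a single new document
pred_index <- 3L

# rebuild the prediction as a factor with all training levels retained
pred <- factor(levels(train_y)[pred_index], levels = levels(train_y))
pred
## [1] Positive
## Levels: Negative Neutral Positive
```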
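
And on the last bullet, a hedged sketch of why the reference semantics matter for saving. A keras model is an external pointer modified in place, so naively saving the containing object stores a pointer that is dead in a new session; keras::serialize_model() and unserialize_model() convert the model to and from an R raw vector that does survive. (The tmod object and $seqfitted slot follow the naming above; the saving strategy the package actually settles on may differ.)

``` r
library(keras)

# 'tmod' stands in for a fitted textmodel_nnseq object whose $seqfitted
# slot holds a keras model (an external pointer, modified in place)

# fragile: saveRDS(tmod, "tmod.rds") stores only the pointer, which is
# invalid after restarting R

# more robust: swap the pointer for a serialised raw vector before saving
tmod$seqfitted <- serialize_model(tmod$seqfitted)
saveRDS(tmod, "tmod.rds")

# in a fresh session, restore the live model from the raw vector
tmod <- readRDS("tmod.rds")
tmod$seqfitted <- unserialize_model(tmod$seqfitted)
```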

@pchest pchest (Collaborator, Author) commented May 15, 2019

Hey Ken, I'm glad that you found these errors and thank you for correcting them!

  • Regarding the one-hot encoding error, I'm having a bit of trouble identifying where it was. Your new code and the old code appear to have a fairly similar workflow, with yours using as.integer where the old code used as.numeric. Was the issue that to_categorical doesn't work properly if it's given a non-integer input?
  • That is an interesting point about the seeds. I read the link you posted, and apparently there is a function that may address this issue, use_session_with_seed, though it is not an ideal solution, as it disables GPU and parallel CPU operations; see the sketch after this list.
  • Your example works very nicely! I did a bit of additional analysis to see how well the model predicts immigration labels out of sample on the UK manifesto sentences. When I compared the test labels with the predicted labels, I got an F1 score of 0.56, which is quite high relative to the performance we've seen before from the seq model and even the SVM model! (A toy version of the F1 computation follows this list.)
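
For reference, a hedged sketch of the workaround from the second bullet, using use_session_with_seed() from the tensorflow R package; full determinism requires disabling the GPU and parallel CPU threads, which the function does by default (hence the performance cost).

``` r
library(tensorflow)

# set the R, Python, NumPy, and TensorFlow seeds in one call; by default
# this also disables GPU use and parallel CPU threads, both of which are
# sources of non-determinism
use_session_with_seed(42)

# ... fit the keras model here; repeated runs in fresh sessions seeded
# this way should produce identical results ...
```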
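
And a toy version of the F1 computation from the third bullet (made-up labels, not the actual manifesto data):

``` r
# binary F1 from predicted vs. true labels (toy data)
truth <- factor(c(1, 1, 0, 1, 0, 1, 0, 0), levels = c(0, 1))
pred  <- factor(c(1, 0, 0, 1, 1, 1, 0, 0), levels = c(0, 1))

tp <- sum(pred == 1 & truth == 1)  # true positives
fp <- sum(pred == 1 & truth == 0)  # false positives
fn <- sum(pred == 0 & truth == 1)  # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)
f1
## [1] 0.75
```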

@kbenoit kbenoit merged commit 0e275a0 into master May 15, 2019
@kbenoit kbenoit deleted the mod_edits branch May 15, 2019 04:51
@kbenoit kbenoit (Contributor) commented May 15, 2019

On the one-hot encoding error, it was at aa319ab#diff-c3df496465c72d7d97b36a144fd2093fL60. The class indexing runs from 0 to nclasses - 1, and the earlier code, by passing 1-based values, included an extra class; a sketch of the effect is below.
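
A minimal sketch (assuming the bug matched that description) of the behaviour: keras::to_categorical() expects 0-based integer labels, so passing 1-based factor codes makes it infer an extra class, adding an all-zero leading column.

``` r
library(keras)

y <- factor(c("neg", "pos", "neg", "pos"))

# buggy pattern: factor codes are 1-based, so to_categorical() infers
# max(y) + 1 = 3 classes and returns an extra, all-zero first column
dim(to_categorical(as.numeric(y)))
## [1] 4 3

# correct pattern: shift to 0-based codes, giving exactly nlevels columns
dim(to_categorical(as.integer(y) - 1))
## [1] 4 2
```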
