Trained model usage #1

goooroooX · 2018-01-12T15:55:22Z

Hi,
Could you please post a few lines of code showing how to check a domain name against the trained model and return the result (generated/non-generated)?
Thanks!

drhyrum · 2018-01-12T17:31:51Z

Thanks for your interest. This code is meant to reproduce the figures in the paper
https://arxiv.org/abs/1611.00791

but you can also query the trained model directly, as follows.

After you've trained the model on data X, y and built valid_chars
https://github.com/endgameinc/dga_predict/blob/master/dga_classifier/lstm.py#L28-L46

you can query the model for a domain such as "domain.xyz" with the following steps:
(1) remove the TLD from the domain
(2) encode the domain's characters as integer tokens and pad the sequence
(3) query the model

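# assumes you've already trained the model and have access to "model",
# "valid_chars" and "maxlen" (the padding length used during training)
import tldextract
from keras.preprocessing import sequence

query_domain = 'domain.xyz'
# (1) strip the TLD: 'domain.xyz' -> 'domain'
query_domain_stripped = tldextract.extract(query_domain).domain
# (2) map each character to its integer token and pad to maxlen
#     (raises KeyError for characters not seen during training)
query = sequence.pad_sequences([[valid_chars[y] for y in query_domain_stripped]], maxlen=maxlen)
# (3) score the padded sequence
print(model.predict(query))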

>> [[0.00203814]]
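For readers restricted to the standard library, steps (1) and (2) and the final generated/non-generated decision can be sketched without tldextract or Keras. This is a hypothetical sketch, not code from this repo: `strip_tld`, `encode_and_pad`, and `to_label` are illustrative names; `strip_tld` is a naive stand-in for tldextract that mishandles multi-part suffixes such as `co.uk`; `encode_and_pad` is meant to mirror the default behaviour of Keras' `pad_sequences` (pad and truncate at the front, pad value 0); and `to_label` assumes generated (DGA) domains were labeled 1 during training and uses an arbitrary 0.5 threshold.

```python
# Hypothetical stdlib-only sketch of steps (1)-(2) plus labeling.
# Assumptions (not from the repo):
# - strip_tld is a naive replacement for tldextract; it keeps only the
#   label directly left of the last dot, so "example.co.uk" -> "co" (wrong).
# - valid_chars maps each character to an integer >= 1 (0 is padding).
# - maxlen (> 0) is the padding length used at training time.
# - label 1 meant "generated" during training; 0.5 is an arbitrary threshold.


def strip_tld(domain: str) -> str:
    """Naively drop the TLD and any subdomains: 'www.domain.xyz' -> 'domain'."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def encode_and_pad(name: str, valid_chars: dict, maxlen: int) -> list:
    """Map characters to integer tokens, then truncate/pad at the front to
    exactly maxlen entries (like pad_sequences with padding='pre',
    truncating='pre', value=0)."""
    tokens = [valid_chars[c] for c in name]  # KeyError on unseen characters
    tokens = tokens[-maxlen:]  # 'pre' truncation keeps the last maxlen tokens
    return [0] * (maxlen - len(tokens)) + tokens


def to_label(score: float, threshold: float = 0.5) -> str:
    """Turn the model's score into a label (assumes 1 == generated)."""
    return "generated" if score > threshold else "non-generated"
```

With a toy alphabet standing in for the real mapping, e.g. `valid_chars = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz0123456789-")}`, calling `encode_and_pad(strip_tld("www.domain.xyz"), valid_chars, 10)` returns `[0, 0, 0, 0, 4, 15, 13, 1, 9, 14]`, and `to_label(0.00203814)` returns `"non-generated"`. In practice `valid_chars` and `maxlen` must be exactly the values used at training time, so it is worth saving them alongside the model.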

You can find more information in a related blog post:
https://www.endgame.com/blog/technical-blog/using-deep-learning-detect-dgas

goooroooX · 2018-01-16T13:11:12Z

Thank you for the sample.
Is it possible to avoid using external libraries (Keras)? I'm trying to implement a lightweight monitoring solution and am limited to native Python libraries in a sandbox.
Thanks!

drhyrum · 2018-01-16T16:19:41Z

This isn't straightforward, and beyond the scope of this repo.

One option: export the Keras model as a TensorFlow model, then investigate using something like https://github.com/riga/tfdeploy so that numpy is the only dependency. I'm not aware of a fail-safe method for the first step (exporting Keras to TensorFlow), but you might find some resources here:

Another route would be to create your own model from scratch using another framework that you find suitable. For example, I believe that numpy is the CPU backend for https://github.com/chainer/chainer. In that case, this repo would only serve as a guide (and a source of data) for rewriting and training your own model.
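If numpy is acceptable as the only dependency, the inference half of a small Embedding, LSTM, Dense(1), sigmoid network can also be written by hand. The sketch below assumes that architecture and Keras 2 weight conventions (the arrays returned by `model.get_weights()`, with LSTM gates ordered input, forget, cell, output); `lstm_predict` and the other helper names are hypothetical. Dropout is omitted because it does nothing at inference time, and no masking is applied, so padding tokens are processed like any other input. Older Keras 2 releases default the LSTM's `recurrent_activation` to `hard_sigmoid` while newer ones use `sigmoid`, so check the trained model's config and pass the matching function.

```python
import numpy as np

# Hypothetical numpy-only forward pass for Embedding -> LSTM -> Dense(1) -> sigmoid.
# Assumed Keras 2 weight shapes (as from model.get_weights()):
#   emb:     (max_features, emb_dim)
#   kernel:  (emb_dim, 4 * units)   gate blocks ordered i, f, c, o
#   rec:     (units, 4 * units)
#   bias:    (4 * units,)
#   dense_w: (units, 1)
#   dense_b: (1,)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hard_sigmoid(x):
    # Keras 2 definition: clip(0.2 * x + 0.5, 0, 1)
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)


def lstm_predict(tokens, emb, kernel, rec, bias, dense_w, dense_b,
                 recurrent_activation=hard_sigmoid):
    """Return the sigmoid score for one padded sequence of integer tokens."""
    units = rec.shape[0]
    h = np.zeros(units)  # hidden state
    c = np.zeros(units)  # cell state
    for t in tokens:  # no masking: padding zeros are fed through too
        x = emb[t]  # embedding lookup
        z = x @ kernel + h @ rec + bias  # all four gate pre-activations at once
        i = recurrent_activation(z[:units])
        f = recurrent_activation(z[units:2 * units])
        g = np.tanh(z[2 * units:3 * units])  # candidate cell values
        o = recurrent_activation(z[3 * units:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return float(sigmoid(h @ dense_w + dense_b)[0])
```

One way to use it: in an environment that has Keras, save the arrays from `model.get_weights()` (for example with `np.savez`) together with `valid_chars` and `maxlen`; in the sandbox, load them and, assuming they come back in the order listed above, call `lstm_predict(padded_tokens, *weights)`. Comparing a few scores against `model.predict` before relying on the port is a cheap sanity check.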

drhyrum closed this as completed Feb 13, 2018
