Snowman

This project focuses on the use of convolutional neural networks for the identification of malicious URLs. The modeling work is largely inspired by

The resulting models are deployed through a REST API so that model scoring and prediction are exposed for generic use.

Deployment

Once complete, your app will be served on a randomly generated subdomain such as "afternoon-bastion-75939.herokuapp.com". You can then query whatever endpoints and services you have set up. One note about the Heroku free tier: because it's free, Heroku will auto-hibernate your dyno after 30 minutes without requests to your app. While hibernating, your app can't respond quickly to new requests. However, any request that arrives while the dyno is asleep has the effect of waking it up, so the app is kicked back into action and can handle requests again within a few seconds.
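Because the first request after hibernation can fail or time out while the dyno wakes up, a client may want to retry. Here is a minimal sketch using only the standard library. The function name `score_url`, the `url` query parameter, and the retry settings are all assumptions for illustration, not something this project defines.

```python
import json
import time
import urllib.parse
import urllib.request


def score_url(base_url, candidate, retries=3, delay=5.0,
              opener=urllib.request.urlopen):
    """POST `candidate` to `<base_url>/model`, retrying while the dyno wakes up.

    `opener` is injectable so the function can be exercised without a network.
    """
    # "url" is an assumed parameter name; adjust it to match your endpoint.
    query = urllib.parse.urlencode({"url": candidate})
    request = urllib.request.Request(
        f"{base_url}/model?{query}", data=b"", method="POST"
    )
    last_error = None
    for attempt in range(retries):
        try:
            with opener(request, timeout=30) as response:
                return json.loads(response.read().decode("utf-8"))
        except OSError as exc:  # covers URLError, HTTPError, and socket timeouts
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay)  # give the sleeping dyno time to start
    raise RuntimeError(f"no response after {retries} attempts") from last_error
```

With a deployed app, a call might look like `score_url("https://afternoon-bastion-75939.herokuapp.com", "http://example.com/login")`, with the subdomain swapped for your own.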

Backend

For deploying the model behind a backend REST API, I'm using Python's Flask; more specifically, I'm mostly using flask_restful to define endpoints and actions. A previously trained model is deserialized and loaded. Then, when the endpoint "/model" is queried with a PUT or a POST, the provided query string is run against the model for scoring. As described above, hosting is done with Heroku.

Modeling

The model I'm using here is a convolutional neural network adapted for text classification. The convolutional kernels, pooling layers, and activations are all fairly standard. I suppose the slightly uncommon thing is how to use convolution with text data, which is why the first layer in the network is an embedding layer. Each character in a string is represented as a point in a low-dimensional vector space. Thus, a whole string (character sequence) gives us a numeric array, which I zero-pad to standardize the sizes across all strings in the training data. Then we can apply all the convolutions we desire to this numeric array. More details about these ideas can be found in the Yoon Kim paper (above), where this kind of network was applied to movie review sentiment classification.
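The preprocessing that feeds the embedding layer can be sketched in plain Python. This is an illustration under my own assumptions rather than the project's code: the helper names are hypothetical, index 0 is reserved for padding, index 1 for characters not seen in training, and padding is applied on the right.

```python
PAD, UNKNOWN = 0, 1  # assumed reserved indices


def build_vocab(strings):
    """Assign each character seen in the data an integer index starting at 2."""
    chars = sorted(set("".join(strings)))
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(string, vocab, max_len):
    """Turn a string into a fixed-length list of character indices.

    Strings longer than max_len are truncated; shorter ones are
    right-padded with zeros so every row has the same size.
    """
    ids = [vocab.get(ch, UNKNOWN) for ch in string[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


urls = ["abc.com", "ab.co/x", "a.co"]
vocab = build_vocab(urls)
max_len = max(len(u) for u in urls)
encoded = [encode(u, vocab, max_len) for u in urls]
```

Each row of `encoded` has length `max_len`. An embedding layer would then map every index to a learned low-dimensional vector, giving the 2-D numeric array that the convolutions operate on.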
