In machine learning, support vector machines (SVMs) are [supervised learning](http://en.wikipedia.org/wiki/Supervised_learning) models with associated learning [algorithms](http://en.wikipedia.org/wiki/Algorithm) that analyze data and recognize patterns, used for classification and regression analysis. More generally, machine learning deals with the construction and study of systems that can learn from data, rather than only following explicitly programmed instructions.
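To make the idea of an SVM concrete, the following is a minimal sketch of a *linear* SVM classifier. It is trained with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. This is an illustrative, swapped-in technique, not this project's implementation, and it supports no kernels. The function names `train_linear_svm` and `predict` are hypothetical. As a simplification, the bias is folded into the weights as a constant `1.0` feature, which means it is also regularized.

```python
import random


def train_linear_svm(points, labels, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    points: list of feature tuples; labels: list of +1 / -1 values.
    A constant 1.0 feature is appended, so the last weight acts as the bias.
    """
    rng = random.Random(seed)
    data = [(tuple(x) + (1.0,), y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks over time
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # step for the regularizer term: shrink the weights
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # hinge loss is active: move w toward classifying x correctly
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(w, tuple(x) + (1.0,)))
    return 1 if score >= 0 else -1
```

For example, `train_linear_svm([(2, 2), (-2, -2)], [1, -1])` learns a separating hyperplane for these two points. Calling `predict` on the result then classifies new points by the sign of the score.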
Applications for machine learning include:
- Object recognition
- Natural language processing
- Search engines
- Bioinformatics
- Stock market analysis
- Speech and handwriting recognition
- Sentiment analysis
- Recommender systems
- Sequence mining (a form of data mining)
- Computational advertising
- Computational finance
Donations are greatly appreciated. Smaller donations could fund a latte during a late night of meddling with code, while larger donations could fund further research by helping to cover the cost of the following:
- server(s): these could be made open to the public, running the machine-learning application.
- peripheral device(s): these device(s) could connect to the machine-learning server(s):
  - Raspberry Pi: these devices could communicate with the machine-learning server(s), or with other peripheral device(s).
  - XBee chip: these chips could implement the Zigbee wireless protocol, allowing peripheral device(s) to transmit data between one another, and finally to the machine-learning server(s).
  - sensor: multiple types of sensors could be connected via the Zigbee wireless protocol to other sensor(s), Raspberry Pi(s), or directly to the machine-learning server(s).
Please adhere to contributing.md when contributing code. Pull requests that deviate from contributing.md may be labelled as invalid and closed without being merged into master. These best practices help preserve the integrity of the codebase when code revisions or issues need to be reviewed.
This project implements Puppet's r10k module via a Vagrant plugin. A requirement of this implementation is a Puppetfile (already defined), which uses the following syntax:
#!/usr/bin/env ruby
## Install Module: stdlib (apt dependency)
mod 'stdlib',
  :git => "git@github.com:puppetlabs/puppetlabs-stdlib.git",
  :ref => "4.6.0"
## Install Module: apt (from master)
mod 'apt',
  :git => "git@github.com:puppetlabs/puppetlabs-apt.git"
...
Specifically, this uses the SSH syntax git@github.com:account/repo.git, rather than either of the following alternatives:
https://github.com/account/repo.git
git://github.com/account/repo.git
This allows r10k to clone the corresponding Puppet module(s) without being deterred by DDoS protection. However, to use the above syntax, SSH keys need to be generated and properly registered, both locally and on the GitHub account.
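To make the difference between the URL forms concrete, the following Python sketch rewrites an `https://` or `git://` GitHub URL into the SSH form used in the Puppetfile. The function name `github_https_to_ssh` is hypothetical and is not part of this project or of r10k.

```python
from urllib.parse import urlparse


def github_https_to_ssh(url):
    """Convert https://github.com/account/repo.git (or git://...) to git@github.com:account/repo.git."""
    parsed = urlparse(url)
    if parsed.scheme not in ("https", "git") or parsed.hostname != "github.com":
        raise ValueError("expected an https:// or git:// GitHub URL: %r" % url)
    # path looks like "/account/repo.git"; drop the leading slash
    path = parsed.path.lstrip("/")
    if not path.endswith(".git"):
        path += ".git"
    return "git@github.com:" + path
```

For example, `github_https_to_ssh("https://github.com/puppetlabs/puppetlabs-apt.git")` yields the same string that the Puppetfile uses for the apt module.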
The following steps walk through setting up SSH keys for GitHub:
$ cd ~/.ssh/
$ ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
Enter file in which to save the key (/Users/you/.ssh/id_rsa): [Press enter]
Enter passphrase (empty for no passphrase): [Type a passphrase]
Enter same passphrase again: [Type passphrase again]
$ ssh-agent -s
Agent pid 59566
$ ssh-add ~/.ssh/id_rsa
$ pbcopy < ~/.ssh/id_rsa.pub
Note: it is recommended to simply press Enter to keep the default value when asked to "Enter file in which to save the key". Also, if ssh-agent -s (the Git Bash variant) doesn't work in another terminal, then eval $(ssh-agent -s) should work.
Then, at the top of any GitHub page (after logging in), click Settings > SSH keys > Add SSH Keys, paste the above copied key into the Key field, and click Add key. Finally, to test the SSH connection, enter the following within the same terminal window used for the above commands:
$ ssh -T git@github.com
The authenticity of host 'github.com (207.97.227.239)' can't be established.
RSA key fingerprint is 16:27:ac:a5:76:28:2d:36:63:1b:56:4d:eb:df:a6:48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'github.com,192.30.252.130' (RSA) to the list of known hosts.
Hi jeff1evesque! You've successfully authenticated, but GitHub does not provide shell access.
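If the connection test needs to be scripted, for example before running vagrant up, the following Python sketch wraps the same `ssh -T git@github.com` command. Both function names are hypothetical. GitHub does not provide shell access, so ssh typically exits with a non-zero status even when authentication succeeds. For that reason, the sketch inspects the combined output for the "successfully authenticated" message instead of relying on the exit code. It assumes the GitHub host key has already been accepted, as shown above.

```python
import subprocess


def github_ssh_authenticated(output):
    """Return True if `ssh -T git@github.com` output reports a successful login."""
    return "successfully authenticated" in output


def check_github_ssh():
    """Run the SSH test against GitHub and report whether the key was accepted."""
    result = subprocess.run(
        ["ssh", "-T", "git@github.com"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr, where the greeting may appear
        universal_newlines=True,
    )
    return github_ssh_authenticated(result.stdout)
```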
Fork this project into your GitHub account. Then, clone your repository using one of the following approaches:
- simple clone: clone the remote master branch.
- commit hash: clone the remote master branch, then check out a specific commit hash.
- release tag: clone the remote branch associated with the desired release tag.
Simple clone:
cd /[destination-directory]
sudo git clone https://[account]@github.com/[account]/machine-learning.git
cd machine-learning
git remote add upstream https://github.com/[account]/machine-learning.git
Note: [destination-directory] corresponds to the desired directory path where the project repository resides. [account] corresponds to the GitHub username the repository is being cloned from. If the original repository was forked, use your GitHub username; otherwise, use jeff1evesque. These placeholders carry the same meaning in each of the clone approaches.
Commit hash:
cd /[destination-directory]
sudo git clone https://[account]@github.com/[account]/machine-learning.git
cd machine-learning
git remote add upstream https://github.com/[account]/machine-learning.git
# stop vagrant
vagrant halt
# ensure diffs don't prevent checkout, then checkout hash
git checkout -- .
git checkout [hash]
Note: the hash associated with a release can be found under the corresponding tag value on the release page.
Release tag:
cd /[destination-directory]
# clone release tag: master branch does not exist
sudo git clone -b [release-tag] --single-branch --depth 1 https://github.com/[account]/machine-learning.git
# enter the repository before adding the upstream remote
cd machine-learning
git remote add upstream https://github.com/[account]/machine-learning.git
# create master branch from remote master
git checkout -b master
git pull upstream master
# return to release tag
git checkout [release-tag]
Note: [release-tag] corresponds to the release tag value, used to distinguish between releases.
To proceed with the installation of this project, two dependencies need to be installed:
- Vagrant
- VirtualBox (with the Extension Pack)
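Before running vagrant up, a quick pre-flight check can confirm that both tools are installed. The following Python sketch looks up the `vagrant` and `VBoxManage` (VirtualBox's command-line tool) executables on PATH using `shutil.which`. The function name `missing_dependencies` is hypothetical.

```python
import shutil


def missing_dependencies(commands=("vagrant", "VBoxManage")):
    """Return the commands from `commands` that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]
```

An empty result from `missing_dependencies()` means both executables were found. This check does not detect the Extension Pack; `VBoxManage list extpacks` lists the installed extension packs.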
Once the necessary dependencies have been installed, execute the following command to build the virtual environment:
cd /path/to/machine-learning/
vagrant up
Depending on the network speed, the build can take between 10 and 15 minutes. So, grab a cup of coffee, and perhaps enjoy a danish while the virtual machine builds. Remember, the application is intended to run on localhost, where the Vagrantfile defines the exact port forwarding on the host machine.
Note: a more complete refresher on virtualization can be found on the Vagrant wiki page.
The following lines of the Vagrantfile indicate that the application is accessible via localhost:8080 on the host machine:
...
## Create a forwarded port mapping which allows access to a specific port
# within the machine from a port on the host machine. In the example below,
# accessing "localhost:8080" will access port 5000 on the guest machine.
config.vm.network "forwarded_port", guest: 5000, host: 8080
config.vm.network "forwarded_port", guest: 443, host: 8585
...
Alternatively, if SSL is configured, the application is accessible via https://localhost:8585 on the host machine.
Note: by convention, port 443 is used for SSL; the Vagrantfile forwards the guest's port 443 to port 8585 on the host.
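To see at a glance which host port reaches which guest port, the forwarded_port lines can be read straight from the Vagrantfile. The following Python sketch assumes each mapping is written on one line, in the same guest-then-host order shown above. Lines that are commented out are ignored. The name `forwarded_ports` is hypothetical.

```python
import re

# Matches uncommented lines such as:
#   config.vm.network "forwarded_port", guest: 5000, host: 8080
FORWARD_RE = re.compile(
    r'^\s*config\.vm\.network\s+"forwarded_port",\s*guest:\s*(\d+),\s*host:\s*(\d+)',
    re.MULTILINE,
)


def forwarded_ports(vagrantfile_text):
    """Map each host port to the guest port it forwards to."""
    return {
        int(host): int(guest)
        for guest, host in FORWARD_RE.findall(vagrantfile_text)
    }
```

For example, `forwarded_ports(open("Vagrantfile").read())` would map 8080 to 5000 and 8585 to 443 for the configuration above.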
The web interface, or GUI implementation, allows users to run the following sessions:
- data_new: store the provided dataset(s) within the implemented SQL database.
- data_append: append additional dataset(s) to an existing representation (from an earlier data_new session) within the implemented SQL database.
- model_generate: using previously stored dataset(s) (from an earlier data_new or data_append session), generate a corresponding model into the implemented NoSQL datastore.
- model_predict: using a previously stored model (from an earlier model_generate session) from the implemented NoSQL datastore, along with user-supplied values, generate a corresponding prediction.
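The sessions above depend on one another: appending needs an existing dataset, generating a model needs stored data, and predicting needs a generated model. The following Python sketch encodes those prerequisites and checks whether a planned sequence of sessions respects them. It is a simplification that ignores which specific dataset or model ids are involved. The names `PREREQUISITES` and `valid_session_order` are hypothetical.

```python
# A session may only run once at least one of its prerequisite
# sessions has completed earlier in the sequence.
PREREQUISITES = {
    "data_new": set(),
    "data_append": {"data_new"},
    "model_generate": {"data_new", "data_append"},
    "model_predict": {"model_generate"},
}


def valid_session_order(sessions):
    """Return True if each session in the sequence has a prerequisite earlier on."""
    completed = set()
    for session in sessions:
        required = PREREQUISITES[session]
        if required and not (required & completed):
            return False
        completed.add(session)
    return True
```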
When using the web interface, it is important to ensure that the CSV, XML, or JSON file(s) representing the corresponding dataset(s) are properly formatted. Poorly formatted dataset(s) will fail to produce the respective JSON dataset representation(s). As a result, the dataset(s) will not be stored in the corresponding database tables, and no model or prediction can be made.
The following syntax is acceptable:
Note: in JSON datasets, each dependent variable value is an array (square brackets), since each dependent variable may have multiple observations.
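A malformed JSON dataset can be caught before uploading it. The following Python sketch checks that the file parses and that each dependent variable maps to an array. It assumes a hypothetical layout in which the top-level object is keyed by dependent variable and each array element is an object of independent-variable values. The project's sample files remain the authority on the exact format. The function name `check_json_dataset` is hypothetical.

```python
import json


def check_json_dataset(text):
    """Return a list of problems found in a JSON dataset (empty if none).

    Assumed (hypothetical) layout:
        {"<dependent variable>": [{"<independent variable>": <number>, ...}, ...], ...}
    """
    try:
        dataset = json.loads(text)
    except ValueError as error:  # json.JSONDecodeError subclasses ValueError
        return ["not valid JSON: %s" % error]
    if not isinstance(dataset, dict):
        return ["top level must be an object keyed by dependent variable"]
    problems = []
    for dependent, observations in dataset.items():
        # each dependent variable may have multiple observations, hence an array
        if not isinstance(observations, list):
            problems.append("%s: value must be an array of observations" % dependent)
            continue
        for index, observation in enumerate(observations):
            if not isinstance(observation, dict):
                problems.append("%s[%d]: observation must be an object" % (dependent, index))
    return problems
```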
As mentioned earlier, the web application can be accessed after running the vagrant up command, by pointing a browser at localhost:8080 (or https://localhost:8585, with SSL) on the host machine.
The programmatic interface, or API, allows users to run the following sessions:
- data_new: store the provided dataset(s) within the implemented SQL database.
- data_append: append additional dataset(s) to an existing representation (from an earlier data_new session) within the implemented SQL database.
- model_generate: using previously stored dataset(s) (from an earlier data_new or data_append session), generate a corresponding model into the implemented NoSQL datastore.
- model_predict: using a previously stored model (from an earlier model_generate session) from the implemented NoSQL datastore, along with user-supplied values, generate a corresponding prediction.
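A POST request can be implemented in Python as follows:
import requests

endpoint_url = 'http://localhost:8080/load-data/'
headers = {'Content-Type': 'application/json'}
# replace with the JSON-encoded request body (see the data properties described below)
json_string_here = '{...}'
response = requests.post(endpoint_url, headers=headers, data=json_string_here)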
Note: the above POST request can equally be implemented in a different language.
Some additional sample files have been provided, which outline how the data attribute should be constructed for the above POST request:
Note: the content of each of the above files can be substituted for the data attribute above.
The following (non-exhaustive) properties define the above data attribute:
- model_id: the numeric id value of the generated model in the NoSQL datastore.
- model_type: the desired model type, which can be one of the following:
  - classification
  - regression
- session_id: the numeric id value that represents the dataset stored in the SQL database.
- session_type: one of the following session types:
  - data_new
  - data_append
  - model_generate
  - model_predict
- svm_dataset_type: one of the following dataset types:
  - json_string: indicates that the dataset is being sent via a POST request
- sv_kernel_type: the type of kernel to apply to the support vector model_type, one of the following:
  - linear
  - polynomial
  - rbf
  - sigmoid
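The allowed values listed above can be checked before a request is sent. The following Python sketch builds a JSON string for the `data` argument of `requests.post()` and rejects values outside those lists. It assumes, hypothetically, that all properties sit at the top level of a single JSON object; this may not match the structure of the sample files. The helper name `build_data_attribute` is hypothetical.

```python
import json

# Allowed values, as listed above.
MODEL_TYPES = {"classification", "regression"}
SESSION_TYPES = {"data_new", "data_append", "model_generate", "model_predict"}
DATASET_TYPES = {"json_string"}
KERNEL_TYPES = {"linear", "polynomial", "rbf", "sigmoid"}


def build_data_attribute(session_type, **properties):
    """Return a JSON string for the `data` argument of requests.post().

    Hypothetical helper: assumes the properties form one flat JSON object.
    """
    if session_type not in SESSION_TYPES:
        raise ValueError("unknown session_type: %r" % session_type)
    checks = {
        "model_type": MODEL_TYPES,
        "svm_dataset_type": DATASET_TYPES,
        "sv_kernel_type": KERNEL_TYPES,
    }
    for name, allowed in checks.items():
        if name in properties and properties[name] not in allowed:
            raise ValueError("invalid %s: %r" % (name, properties[name]))
    payload = dict(properties, session_type=session_type)
    return json.dumps(payload, sort_keys=True)
```

The result can be passed as `data=build_data_attribute('model_generate', session_id=1, model_type='classification', sv_kernel_type='rbf')` in the `requests.post()` call shown earlier.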
This project implements unit testing to validate logic in a consistent fashion. Currently, only high-level unit tests have been defined within pytest_svm_session.py and pytest_svr_session.py. These unit tests have been automated within corresponding Travis CI builds, using a series of Docker containers connected via a common Docker network.
Current unit tests cover the following sessions:
data_new
data_append
model_predict
model_generate
These tests can be executed manually as follows:
$ cd /path/to/machine-learning/
$ vagrant up
$ vagrant ssh
vagrant@vagrant-ubuntu-trusty-64:~$ cd /vagrant/test && py.test manual
============================================ test session starts =============================================
platform linux2 -- Python 2.7.6, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /vagrant/test/manual, inifile: pytest.ini
plugins: flask-0.10.0
collected 8 items
manual/programmatic_interface/pytest_svm_session.py ....
manual/programmatic_interface/pytest_svr_session.py ....
========================================= 8 passed in 7.82 seconds ==========================================
Note: future releases (i.e. milestone 1.0) will include more granular unit tests.
Note: every script within this repository, with the exception of Puppet (ERB) templates and a handful of open source libraries, has been linted via .travis.yml.