# ML-SQL package

## By: Neeraj Asthana (under Professor Robert Brunner)

### Summer 2016 UIUC

This Python package, **mlsql**, defines the parser and supporting functions for the ML-SQL language.

The two main functions in this package are **mlsql.repl()** and **mlsql.execute()**.

1. The **repl()** function starts a SQL-like shell for running ML-SQL commands interactively.

2. The **execute()** function can be imported by other modules to run ML-SQL commands given as strings.

The current keywords and functionalities are:

- READ *file* (with options to specify header existence and separator)
- SPLIT (into training, testing, and validation sets)
- CLASSIFY *predictors*, *labels* (with algorithms for SVM and Logistic Regression; hyperparameters can be specified as well)
- REGRESSION *predictors*, *labels* (with algorithms for Simple Linear Regression, Lasso, and Ridge)
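
The keywords above are chained into a single query string, as the examples in this notebook show. A minimal sketch of composing such a string follows. `build_query` and its parameters are hypothetical and not part of mlsql; the clause syntax is copied from the example queries, and values are passed as strings so the output matches those examples exactly.

```python
def build_query(path, sep=",", header=False, split=None, model=None):
    """Compose an ML-SQL query string (hypothetical helper, not part of mlsql)."""
    # READ clause: file path plus separator and header options.
    clauses = ['READ {} (sep="{}" header={})'.format(path, sep, header)]

    # Optional SPLIT clause: (train, test, validation) fractions as strings.
    if split is not None:
        train, test, validation = split
        clauses.append(
            "SPLIT (train = {}, test = {}, validation = {})".format(
                train, test, validation
            )
        )

    # Optional model clause: (keyword, predictors, label, algorithm).
    if model is not None:
        keyword, predictors, label, algorithm = model
        columns = ",".join(str(c) for c in predictors)
        clauses.append(
            "{} (predictors = ({}), label = {}, algorithm = {})".format(
                keyword, columns, label, algorithm
            )
        )
    return " ".join(clauses)


query = build_query(
    "iris.data",
    split=(".8", ".2", ".0"),
    model=("CLASSIFY", (1, 2, 3, 4), 5, "SVM"),
)
print(query)
```

The resulting string could then be passed to `execute()`, assuming the parser accepts single spaces between clauses as it does the continued lines in the examples.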

In [1]:
from mlsql import execute

In [2]:
# READ: load the file with an explicit separator and no header row
query1 = 'READ /home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data (sep="," header=False)'

execute(query1)

['/home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data', ',', 'False']
     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa
filename: /home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data
header: False
separator: ,
train size: 
test size: 
predictors: 
label: 
algorithm: 
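
The output above starts with the parsed tokens, followed by the first rows of the loaded table with columns numbered from 0 because `header=False`. The sketch below illustrates what the `sep` and `header` options mean, using only the standard library. `read_table` is a hypothetical function and the two sample rows are taken from the table above; how mlsql actually loads the file is not shown in this document.

```python
import csv
import io


def read_table(text, sep=",", header=False):
    """Parse delimited text into (column_names, rows); illustrative only."""
    # Skip empty lines so a trailing newline does not create an empty row.
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=sep) if r]
    if not rows:
        return [], []
    if header:
        # First line holds the column names.
        return rows[0], rows[1:]
    # No header row: number the columns 0, 1, 2, ...
    return list(range(len(rows[0]))), rows


sample = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
names, rows = read_table(sample, sep=",", header=False)
print(names)    # column numbers
print(rows[0])  # first data row, values still as strings
```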



In [3]:
# READ and SPLIT: load the file, then request an 80/20 train/test split
query2 = 'READ /home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data (sep="," header=False) \
            SPLIT (train = .8, test = .2, validation = .0)'

execute(query2)

['/home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data', ',', 'False', '.8', '.2', '.0']
     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa
filename: /home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data
header: False
separator: ,
train size: .8
test size: .2
predictors: 
label: 
algorithm: 
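
The SPLIT clause takes fractions for the training, testing, and validation sets. A minimal sketch of such a split is given below, assuming the rows are shuffled before being cut. `split_rows` and its `seed` parameter are hypothetical; the splitting logic inside mlsql is not shown in this document.

```python
import random


def split_rows(rows, train=0.8, test=0.2, validation=0.0, seed=0):
    """Shuffle rows and cut them into train/test/validation lists."""
    if abs(train + test + validation - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train * len(shuffled)))
    n_test = int(round(test * len(shuffled)))
    train_rows = shuffled[:n_train]
    test_rows = shuffled[n_train:n_train + n_test]
    # Whatever remains goes to the validation set.
    validation_rows = shuffled[n_train + n_test:]
    return train_rows, test_rows, validation_rows


# 150 stand-in rows, the size of the standard iris dataset.
train_rows, test_rows, validation_rows = split_rows(range(150))
print(len(train_rows), len(test_rows), len(validation_rows))
```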



In [4]:
# READ, SPLIT, and CLASSIFY: use an SVM with columns 1-4 as predictors and column 5 as the label
query3 = 'READ /home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data (sep="," header=False) \
            SPLIT (train = .8, test = .2, validation = .0) \
            CLASSIFY (predictors = (1,2,3,4), label = 5, algorithm = SVM)'

execute(query3)

['/home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data', ',', 'False', '.8', '.2', '.0', '1', '2', '3', '4', '5', 'SVM']
     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa
filename: /home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data
header: False
separator: ,
train size: .8
test size: .2
predictors: ['1', '2', '3', '4']
label: 5
algorithm: SVM
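
The query names predictors 1 to 4 and label 5, while the displayed table numbers its columns 0 to 4. This suggests that ML-SQL column positions are 1-based, although that is an inference from the output rather than documented behavior. Under that assumption, the parsed string tokens could be mapped to 0-based indexes as in this sketch (`to_zero_based` is a hypothetical helper):

```python
def to_zero_based(predictors, label):
    """Convert 1-based column positions (as strings) to 0-based integer indexes."""
    predictor_idx = [int(p) - 1 for p in predictors]
    label_idx = int(label) - 1
    if label_idx in predictor_idx:
        raise ValueError("label column must not also be a predictor")
    return predictor_idx, label_idx


# Tokens as they appear in the summary above.
predictor_idx, label_idx = to_zero_based(["1", "2", "3", "4"], "5")

# First row of the table shown above.
row = [5.1, 3.5, 1.4, 0.2, "Iris-setosa"]
features = [row[i] for i in predictor_idx]
target = row[label_idx]
print(features, target)
```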



In [5]:
# READ, SPLIT, and REGRESSION: use Lasso with columns 1, 2, and 4 as predictors and column 3 as the label
query4 = 'READ /home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data (sep="," header=False) \
            SPLIT (train = .8, test = .2, validation = .0) \
            REGRESSION (predictors = (1,2,4), label = 3, algorithm = LASSO)'

execute(query4)

['/home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data', ',', 'False', '.8', '.2', '.0', '1', '2', '4', '3', 'LASSO']
     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa
filename: /home/ubuntu/notebooks/ML-SQL/dataflows/Classification/iris.data
header: False
separator: ,
train size: .8
test size: .2
predictors: ['1', '2', '4']
label: 3
algorithm: LASSO
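
Each run above prints the parsed tokens followed by a summary of the fields they map to. The sketch below extracts the same fields with regular expressions to make that mapping explicit. It is not the parser shipped with mlsql; `parse_query`, the `task` key, and the `validation size` key are assumptions introduced for illustration, and only the clause forms used in this notebook are handled.

```python
import re

READ_RE = re.compile(
    r'READ\s+(\S+)\s*\(\s*sep\s*=\s*"([^"]*)"\s+header\s*=\s*(\w+)\s*\)'
)
SPLIT_RE = re.compile(
    r"SPLIT\s*\(\s*train\s*=\s*([\d.]+)\s*,\s*test\s*=\s*([\d.]+)\s*,"
    r"\s*validation\s*=\s*([\d.]+)\s*\)"
)
MODEL_RE = re.compile(
    r"(CLASSIFY|REGRESSION)\s*\(\s*predictors\s*=\s*\(([^)]*)\)\s*,"
    r"\s*label\s*=\s*(\d+)\s*,\s*algorithm\s*=\s*(\w+)\s*\)"
)


def parse_query(query):
    """Extract the summary fields from a query string; values stay as strings."""
    fields = {}
    read = READ_RE.search(query)
    if read is None:
        raise ValueError("query must contain a READ clause")
    fields["filename"], fields["separator"], fields["header"] = read.groups()

    split = SPLIT_RE.search(query)
    if split:
        (fields["train size"], fields["test size"],
         fields["validation size"]) = split.groups()

    model = MODEL_RE.search(query)
    if model:
        fields["task"] = model.group(1)
        fields["predictors"] = [p.strip() for p in model.group(2).split(",")]
        fields["label"] = model.group(3)
        fields["algorithm"] = model.group(4)
    return fields


fields = parse_query(
    'READ iris.data (sep="," header=False) '
    "SPLIT (train = .8, test = .2, validation = .0) "
    "REGRESSION (predictors = (1,2,4), label = 3, algorithm = LASSO)"
)
for key, value in fields.items():
    print("{}: {}".format(key, value))
```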

