# MLToolKit Example
## Using DataTools to Read and Write Data from Different Sources
Create Date: December 15, 2019; Last Update: December 31, 2019. 
Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
<hr>

### Current release: PyMLToolKit [v0.1.10]

MLToolKit (mltk) is a Python package providing a set of user-friendly functions to help build machine learning models in data science research, teaching, or production-focused projects. MLToolKit supports all stages of the machine learning application development process.

### Installation
```
pip install pymltoolkit
```
If the installation fails with dependency issues, run the above command with the `--no-dependencies` option:

```
pip install pymltoolkit --no-dependencies
```

In [1]:
import mltk

mltk==0.1.10
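
To confirm which release is installed without importing the package, the standard library can query the installed distribution's metadata. This is a minimal sketch, not part of MLToolKit: the helper name `installed_version` is hypothetical, and the distribution name `pymltoolkit` is assumed from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# 'pymltoolkit' is the name used in the pip command above (assumed to be
# the distribution name recorded in the package metadata).
found = installed_version('pymltoolkit')
print('pymltoolkit is not installed' if found is None else f'pymltoolkit=={found}')
```

Returning `None` instead of raising keeps the check usable at the top of a notebook without stopping execution.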

Some MLToolKit functions depend on a number of open-source Python libraries, such as:
- Data Manipulation: Pandas
- Machine Learning: Statsmodels, Scikit-learn, Catboost
- Deep Learning: Tensorflow
- Model Interpretability: Shap, Lime
- Server Framework: Flask
- Text Processing: BeautifulSoup, TextLab
- Database Connectivity: SQLAlchemy, PyODBC

The MLToolKit project acknowledges the creators and contributors of the above libraries for their contributions to the open-source community.
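
If the package was installed with `--no-dependencies`, it can help to see which of the libraries listed above are actually importable. The sketch below is an illustration, not an MLToolKit feature: the helper `missing_modules` is hypothetical, and the import names are the commonly used ones for each library. TextLab is left out because its import name is not given here.

```python
from importlib.util import find_spec

# Display name -> import name. These import names are assumptions based on
# each library's usual usage, not taken from MLToolKit itself.
OPTIONAL_MODULES = {
    'Pandas': 'pandas',
    'Statsmodels': 'statsmodels',
    'Scikit-learn': 'sklearn',
    'Catboost': 'catboost',
    'Tensorflow': 'tensorflow',
    'Shap': 'shap',
    'Lime': 'lime',
    'Flask': 'flask',
    'BeautifulSoup': 'bs4',
    'SQLAlchemy': 'sqlalchemy',
    'PyODBC': 'pyodbc',
}


def missing_modules(modules):
    """Return the sorted display names whose import name cannot be found."""
    return sorted(
        label for label, import_name in modules.items()
        if find_spec(import_name) is None
    )


print('Not installed:', missing_modules(OPTIONAL_MODULES))
```

`find_spec` only locates a module and does not import it, so the check stays fast even for heavy packages such as Tensorflow.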



## 1. CSV Files

### Read Data

In [2]:
data_connector = {
    'type' : 'csv', #{'snowflake', 'mysql', 'csv', 'pickle', 'hdf'}
    'connect_parameters' : {
        'file_path':r'C:\Projects\Data\incomedata.csv'
    }
}

data_object = {
    'identifiers' :{
        'dataset_label' : 'incomedata'
    }
}

execute_params = {
    'return_time'  : False, 
    'return_rowcount' : False 
}

Data = mltk.read_data(data_connector=data_connector, data_object=data_object, execute_params=execute_params)
Data.head()

csv C:\Projects\Data\incomedata.csv
read time is 0.091 s
read 32,561 records


| Unnamed: 0 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90 | ? | 77053 | HS-grad | 9 | Widowed | ? | Not-in-family | White | Female | 0 | 4356 | 40 | United-States | <=50K |
| 1 | 82 | Private | 132870 | HS-grad | 9 | Widowed | Exec-managerial | Not-in-family | White | Female | 0 | 4356 | 18 | United-States | <=50K |
| 2 | 66 | ? | 186061 | Some-college | 10 | Widowed | ? | Unmarried | Black | Female | 0 | 4356 | 40 | United-States | <=50K |
| 3 | 54 | Private | 140359 | 7th-8th | 4 | Divorced | Machine-op-inspct | Unmarried | White | Female | 0 | 3900 | 40 | United-States | <=50K |
| 4 | 41 | Private | 264663 | Some-college | 10 | Separated | Prof-specialty | Own-child | White | Female | 0 | 3900 | 40 | United-States | <=50K |
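
The read call above takes three dictionaries that follow the same shape for every CSV file. As a convenience, they can be built from a file path. This is only a sketch: `csv_read_arguments` is a hypothetical helper, not an MLToolKit function, and defaulting the dataset label to the file name is an assumption made for illustration.

```python
from pathlib import Path


def csv_read_arguments(file_path, dataset_label=None):
    """Build the keyword arguments used for mltk.read_data in this notebook."""
    path = Path(file_path)
    return {
        'data_connector': {
            'type': 'csv',
            'connect_parameters': {'file_path': str(path)},
        },
        'data_object': {
            # Assumption: use the file name without its extension as the label.
            'identifiers': {'dataset_label': dataset_label or path.stem},
        },
        'execute_params': {'return_time': False, 'return_rowcount': False},
    }


read_args = csv_read_arguments('incomedata.csv')
print(read_args['data_object'])

# With mltk installed and the file present, the call would be:
# Data = mltk.read_data(**read_args)
```

Keeping the dictionaries in one place makes it harder for the connector and the dataset label to drift apart when several files are read in the same notebook.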


### Write Data

In [3]:
data_connector = {
    'type' : 'csv', #{'snowflake', 'mysql', 'csv', 'pickle', 'hdf'}
    'connect_parameters' : {
        'file_path':r'C:\Projects\Data\incomedata_save.csv'
    }
}

data_object = {
    'identifiers' :{
        'dataset_label' : 'incomedata'
    },
    'structure_parameters' : {
        'index' : False
    },
    'format_parameters': {    
        'separator' : ',',
        'quoting' : 'ALL'
    }
}

execute_params = {
    'return_time'  : False, 
    'return_rowcount' : False 
}

mltk.write_data(Data , data_connector=data_connector, data_object=data_object, execute_params=execute_params)

write time is 0.153 s
write 32,561 records
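
The `format_parameters` above set `'separator'` to `','` and `'quoting'` to `'ALL'`. How MLToolKit applies them internally is not shown here. Assuming `'ALL'` corresponds to the standard library's `csv.QUOTE_ALL`, the sketch below shows what that quoting style does to a few values taken from the rows printed earlier. The helper `to_csv_text` is hypothetical.

```python
import csv
import io

# Header plus two records, using values from the first rows shown earlier.
rows = [
    ['age', 'workclass', 'income'],
    [90, '?', '<=50K'],
    [82, 'Private', '<=50K'],
]


def to_csv_text(rows, separator=',', quote_all=True):
    """Serialize rows to CSV text, optionally quoting every field."""
    buffer = io.StringIO()
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(buffer, delimiter=separator, quoting=quoting,
                        lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


quoted = to_csv_text(rows)
minimal = to_csv_text(rows, quote_all=False)
print(quoted)
print(minimal)
```

With every field quoted, numeric values are written as quoted text too, so a reader that infers types from quoting may load them as strings.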


## 2. HDF Files

## Cite as
```
@misc{mltk2019,
  author =  "Sumudu Tennakoon",
  title = "MLToolKit(mltk): A Simplified Toolkit for Unifying End-To-End Machine Learning Projects",
  year = 2019,
  publisher = "GitHub",
  howpublished = {\url{https://mltoolkit.github.io/mltk/}},
  version = "0.1.11"
}
```

<hr>
This notebook and related materials were developed by Sumudu Tennakoon to demonstrate JSON-MLS usage in the MLToolKit Python library and its interoperability with standard Python data analysis and machine learning packages (e.g. Pandas, Scikit-learn, Statsmodels, TensorFlow, Catboost).