# Abstract away your (machine learning) computation by creating your own Python library

<img src="mlautom.png">

Automated architecture selection for regression and classification tasks with just a couple of commands. The optimal architecture is chosen with Bayesian optimization.

In [78]:
!pip install ml-automated-123==1.0

Collecting ml-automated-123==1.0
  Downloading ml_automated_123-1.0-py3-none-any.whl (16 kB)
Collecting pytz>=2017.2
  Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Installing collected packages: ml-automated-123, pytz
  Attempting uninstall: pytz
    Found existing installation: pytz 2016.10
    Uninstalling pytz-2016.10:
      Successfully uninstalled pytz-2016.10
Successfully installed ml-automated-123-1.0 pytz-2019.3


ERROR: gitsome 0.8.4 has requirement pytz<2017.0,>=2016.3, but you'll have pytz 2019.3 which is incompatible.


In [5]:
import ml_automated_123




# House Prices: Advanced Regression Techniques

In [6]:
f_train = 'train_regression1.csv'
f_test = 'test_regression1.csv'

In [7]:
import pandas as pd

In [8]:
pd.read_csv(f_train)

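| Unnamed: 0 | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 12 | 2008 | WD | Normal | 250000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1455 | 1456 | 60 | RL | 62.0 | 7917 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 8 | 2007 | WD | Normal | 175000 |
| 1456 | 1457 | 20 | RL | 85.0 | 13175 | Pave | | Reg | Lvl | AllPub | ... | 0 | | MnPrv | | 0 | 2 | 2010 | WD | Normal | 210000 |
| 1457 | 1458 | 70 | RL | 66.0 | 9042 | Pave | | Reg | Lvl | AllPub | ... | 0 | | GdPrv | Shed | 2500 | 5 | 2010 | WD | Normal | 266500 |
| 1458 | 1459 | 20 | RL | 68.0 | 9717 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 4 | 2010 | WD | Normal | 142125 |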


# How does it work

In [9]:

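# Training: detect the problem type, tune the candidate models and store the best one in out_dir
ml_automated_123.automate(path=f_train,
                          ignore_cols=[],
                          out_dir='model')

# Predictions: use the model stored in model_dir to score the test file
preds = ml_automated_123.predict(f_test, model_dir='model')
print(preds)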


Problem type: REGRESSION
Optimizing RandomForest...
100%|██████████| 20/20 [00:04<00:00,  4.06trial/s, best loss: 744835675.2494552]
  Best score(s): loss: 7.45e+08 r2: 0.805
Optimizing XGBoost...
100%|██████████| 20/20 [00:14<00:00,  1.42trial/s, best loss: 689292605.5782048]
  Best score(s): loss: 6.89e+08 r2: 0.852
Best model selected: XGBRegressor(base_score=0.5, booster=None, colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=0.49156334309367344,
             eta=0.23142768481604303, gamma=0.04875271904207046, gpu_id=-1,
             importance_type='gain', interaction_constraints=None,
             learning_rate=0.21696291566695602, max_delta_step=0, max_depth=7,
             min_child_weight=9, missing=nan, monotone_constraints=None,
             n_estimators=100, n_jobs=-1, num_parallel_tree=1,
             objective='reg:linear', random_state=0, reg_alpha=0, reg_lambda=1,
             scale_pos_weight=1, silent=1, subsample=0.887306137662691,
             t

The motivation behind packaging this library was to automate the following steps, which also formed its initial parts:

1. Infer the supervised task (classification or regression) from the data
2. Pre-process the data automatically
3. Find the optimal algorithm and parameters for classification or regression with Bayesian optimization
4. Make the predictions and serialize all of the files (model, predictions, etc.) afterwards
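
As a rough illustration of the first step, the sketch below guesses the task from the values of the target column. The function name `infer_problem_type`, the `max_classes` threshold and the heuristic itself are assumptions made for this example and are not taken from the library; only the `REGRESSION` label mirrors the training log shown earlier.

```python
from numbers import Number


def infer_problem_type(target_values, max_classes=20):
    """Guess the supervised task from a target column (hypothetical heuristic).

    Non-numeric labels, or numeric targets with only a few distinct
    whole-number values, are treated as classification. Everything
    else is treated as regression.
    """
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column is empty")

    # Any non-numeric label (strings, booleans) means classification.
    # bool is a Number subclass, so it is checked explicitly.
    if any(isinstance(v, bool) or not isinstance(v, Number) for v in values):
        return "CLASSIFICATION"

    # Few distinct whole-number values look like class ids.
    distinct = set(values)
    all_integral = all(float(v).is_integer() for v in distinct)
    if all_integral and len(distinct) <= max_classes:
        return "CLASSIFICATION"
    return "REGRESSION"


print(infer_problem_type([0, 1, 1, 0, 1]))
print(infer_problem_type([100000 + 1500 * i for i in range(50)]))
```

A threshold like this is fragile on very small samples (five distinct prices would look like five classes), so a real implementation would probably also consider the ratio of distinct values to rows.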

The key idea behind the abstraction is the Factory pattern: it defines an interface or abstract class for creating an object but lets the subclasses decide which class to instantiate.

We applied it through 'package', 'Factory' and the `__init__.py` file in the outer folder. Simply put, we abstracted away all of the computation and the other folders: methods in the 'package' folder instantiate our computations through the Factory class and then modify and use them.
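
To make the idea concrete, here is a minimal sketch of a simple registry-based factory, a close relative of the pattern described above. Every name in it (`Computation`, `RegressionComputation`, `ClassificationComputation`, `ComputationFactory`, `candidate_models`) is hypothetical and chosen for illustration; it does not reproduce the library's actual classes.

```python
from abc import ABC, abstractmethod


class Computation(ABC):
    """Common interface every computation exposes (hypothetical name)."""

    @abstractmethod
    def candidate_models(self):
        """Return the names of the algorithms to optimize for this task."""


class RegressionComputation(Computation):
    def candidate_models(self):
        return ["RandomForestRegressor", "XGBRegressor"]


class ClassificationComputation(Computation):
    def candidate_models(self):
        return ["RandomForestClassifier", "XGBClassifier"]


class ComputationFactory:
    """Maps a problem type to the concrete class to instantiate."""

    _registry = {
        "REGRESSION": RegressionComputation,
        "CLASSIFICATION": ClassificationComputation,
    }

    @classmethod
    def create(cls, problem_type):
        try:
            return cls._registry[problem_type]()
        except KeyError:
            raise ValueError(f"unknown problem type: {problem_type!r}") from None


# Callers depend only on the factory and the Computation interface,
# never on the concrete classes.
computation = ComputationFactory.create("REGRESSION")
print(computation.candidate_models())
```

With this layout, supporting a new task only requires a new `Computation` subclass and one registry entry; the public functions that call the factory stay unchanged.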

# How to distribute your library to the world:
What is PyPI? The Python Package Index (PyPI) is a repository of software for the Python programming language. PyPI helps you find and install software developed and shared by the Python community.

We will follow these steps to publish a library (a directory containing all of our Python files) to PyPI:

1. Create a PyPI account
2. Gather all the code in a folder
3. Push this folder to a GitHub repo
4. Create a setup.py file that points to this repo and holds the other configuration
5. Build a wheel distribution and upload it to PyPI
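
Steps 4 and 5 might look roughly like the sketch below. The package name and version are taken from the install command at the top of this notebook; the description text, URL, author, dependency list and Python version are placeholders to replace with your own values, not the real configuration of this library.

```python
# setup.py (sketch; values marked "placeholder" are assumptions)
from setuptools import find_packages, setup

setup(
    name="ml-automated-123",   # distribution name used with `pip install`
    version="1.0",
    description="Automated model selection for regression and classification",  # placeholder
    url="https://github.com/<your-user>/<your-repo>",  # placeholder
    author="<your name>",      # placeholder
    packages=find_packages(),  # picks up every folder containing __init__.py
    install_requires=[],       # placeholder: list runtime dependencies here
    python_requires=">=3.6",   # placeholder
)

# Build and upload from the project root (shell commands, shown as comments):
#   python -m pip install --upgrade setuptools wheel twine
#   python setup.py sdist bdist_wheel
#   twine upload dist/*
```

The `twine upload` step asks for your PyPI credentials, which is why the account from step 1 has to exist first.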

In-depth tutorials:

https://realpython.com/pypi-publish-python-package/#preparing-your-package-for-publication 

https://medium.com/@joel.barmettler/how-to-upload-your-python-package-to-pypi-65edc5fe9c56