batch machine learning

Intro

A mini ML/DL project. It assumes the user has some programming experience. The directory structure is:

├── clf
│   └── nn
├── data
├── pipe
├── saved
├── models.py
├── train.py
└── predict.py
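  • data usually holds the datasets.
  • clf holds custom machine learning algorithms, such as a GridSearch SVC. Its subfolder nn holds deep learning algorithms such as neural networks.
  • pipe holds the predefined processing for each dataset, which means you can load and process your data from anywhere. For example, pipe/iload_aliatec.py processes the data for the ATEC payment risk task.
  • saved stores trained models and prediction output.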

Usage:

train.py

$ python train.py

Usage: train.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  classification  this for select classification model
  cluster         this for select cluster model
$ python train.py classification --help

Usage: train.py classification [OPTIONS]

  this for select classification model

Options:
  --method TEXT               Your method for training model
  --pipe TEXT                 Data Pipe Line File
  --cross-validation INTEGER  Cross Validation
  --help                      Show this message and exit.

$ python train.py classification --pipe pipe/iload_digits.py --method lg --method rbfsvc

[*] Now Training With LogisticRegression And Model Scores 0.9666666666666667
[*] Now Training With SVC        And Model Scores 0.9805555555555555
[+] Save it in saved/logisticregression.pkl
[+] Save it in saved/svc.pkl

$ python train.py classification --pipe pipe/iload_digits.py --method lg

[*] Now Training With LogisticRegression And Model Scores 0.9666666666666667
[!] saved/logisticregression.pkl Existed
[+] Save it in saved/logisticregression.pkl.second

$ python train.py classification --pipe pipe/iload_iris.py  --method lg --loss neg_log_loss
[*] Now Training With LogisticRegression Loss :  neg_log_loss
 And Model Scores 1.0
[+] Save it in saved/logisticregression.pkl
$ python train.py classification --pipe pipe/iload_iris.py

[!] Now We Will Use Default All Method
[*] Now Training With VotingClassifier And Model Scores 1.0
[*] Now Training With VotingClassifier And Model Scores 1.0
[*] Now Training With AdaBoostClassifier And Model Scores 0.9333333333333333
[*] Now Training With GaussianNB And Model Scores 1.0
[*] Now Training With XGBClassifier And Model Scores 1.0
[*] Now Training With LogisticRegression And Model Scores 1.0
[*] Now Training With SVC        And Model Scores 1.0
[*] Now Training With KNeighborsClassifier And Model Scores 0.9666666666666667
[*] Now Training With RandomForestClassifier And Model Scores 1.0
[*] Now Training With DecisionTreeClassifier And Model Scores 1.0
[*] Now Training With IGridSVC   And Model Scores 1.0
[!] saved/votingclassifier.pkl Existed
[+] Save it in saved/votingclassifier.pkl.second
[!] saved/votingclassifier.pkl Existed
[+] Save it in saved/votingclassifier.pkl.second
[!] saved/adaboostclassifier.pkl Existed
[+] Save it in saved/adaboostclassifier.pkl.second
[!] saved/gaussiannb.pkl Existed
[+] Save it in saved/gaussiannb.pkl.second
[!] saved/xgbclassifier.pkl Existed
[+] Save it in saved/xgbclassifier.pkl.second
[!] saved/logisticregression.pkl Existed
[+] Save it in saved/logisticregression.pkl.second
[!] saved/svc.pkl Existed
[+] Save it in saved/svc.pkl.second
[!] saved/kneighborsclassifier.pkl Existed
[+] Save it in saved/kneighborsclassifier.pkl.second
[!] saved/randomforestclassifier.pkl Existed
[+] Save it in saved/randomforestclassifier.pkl.second
[!] saved/decisiontreeclassifier.pkl Existed
[+] Save it in saved/decisiontreeclassifier.pkl.second
[!] saved/igridsvc.pkl Existed
[+] Save it in saved/igridsvc.pkl.second
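
The logs above suggest that each model is saved under its lowercased class name, and that an existing file is kept while the new model goes to a `.second` file. A minimal sketch of that behaviour, assuming plain `pickle` serialization; `save_model` is a hypothetical helper, not a function confirmed to exist in train.py, and overwriting an existing `.second` file is also an assumption:

```python
import pickle
from pathlib import Path


def save_model(model, out_dir="saved"):
    """Pickle `model` under out_dir, named after its lowercased class name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{type(model).__name__.lower()}.pkl"
    if target.exists():
        print(f"[!] {target} Existed")
        # Assumption: keep the first file and write to "<name>.pkl.second",
        # replacing any earlier ".second" file.
        target = target.with_name(target.name + ".second")
    with target.open("wb") as fh:
        pickle.dump(model, fh)
    print(f"[+] Save it in {target}")
    return target
```

Calling `save_model(clf)` twice with the same estimator type would leave both the `.pkl` and the `.pkl.second` file in the output directory.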

predict.py

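Loads the models already saved under the saved folder and runs predictions with them. By default the results are written to the saved folder, in files ending with the predict and proba suffixes respectively. You can also send the results to a custom output path, for example --out woqu.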

$ python predict.py predict  --help
Usage: predict.py predict [OPTIONS]

Options:
  --method TEXT  Your method for training model
  --pipe TEXT    Data Pipe Line File
  --out TEXT     Directory for save predict
  --help         Show this message and exit.

$ python predict.py predict  --pipe pipe/iload_iris.py --method saved/adaboostclassifier.pkl

 [####################################]  100% predict use model: AdaBoostClassifier

$ python predict.py predict  --pipe pipe/iload_iris.py --method saved/adaboostclassifier.pkl --method saved/decisiontreeclassifier.pkl
  
  [##################------------------]   50% predict use model: AdaBoostClassifier
  [####################################]  100% predict use model: DecisionTreeClassifier

$ python predict.py predict  --pipe pipe/iload_iris.py --method saved
Use Batch Models From /home/mour/MlDl/autoclf/saved
  [###---------------------------------]   10% predict use model: LogisticRegression
  [#######-----------------------------]   20% predict use model: AdaBoostClassifier
  [##########--------------------------]   30% predict use model: XGBClassifier
  [##############----------------------]   40% predict use model: SVC
  [##################------------------]   50% predict use model: GaussianNB
  [#####################---------------]   60% predict use model: KNeighborsClassifier
  [#########################-----------]   70% predict use model: VotingClassifier
  [############################--------]   80% predict use model: DecisionTreeClassifier
  [################################----]   90% predict use model: RandomForestClassifier
  [####################################]  100% predict use model: IGridSVC
$ python predict.py predict  --pipe pipe/iload_iris.py --method saved --out woqu
Use Batch Models From /home/mour/MlDl/autoclf/saved
  [###---------------------------------]   10% predict use model: LogisticRegression
  [#######-----------------------------]   20% predict use model: AdaBoostClassifier
  [##########--------------------------]   30% predict use model: XGBClassifier
  [##############----------------------]   40% predict use model: SVC
  [##################------------------]   50% predict use model: GaussianNB
  [#####################---------------]   60% predict use model: KNeighborsClassifier
  [#########################-----------]   70% predict use model: VotingClassifier
  [############################--------]   80% predict use model: DecisionTreeClassifier
  [################################----]   90% predict use model: RandomForestClassifier
  [####################################]  100% predict use model: IGridSVC
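
In the runs above, --method takes either individual model files or a directory of saved models. A minimal sketch of how such values could be expanded into a flat list of files; `resolve_model_files` is a hypothetical name, and picking up only `*.pkl` files (skipping `.pkl.second`) in sorted order is an assumption rather than the behaviour of predict.py:

```python
from pathlib import Path


def resolve_model_files(methods):
    """Expand each --method value into a list of pickled model files."""
    files = []
    for entry in methods:
        path = Path(entry)
        if path.is_dir():
            # A directory means "use every saved model inside it".
            # Assumption: only names ending in .pkl count as models.
            files.extend(sorted(path.glob("*.pkl")))
        else:
            # A single file is passed through unchanged.
            files.append(path)
    return files
```

For example, `resolve_model_files(["saved"])` would return every `.pkl` file in saved, while `resolve_model_files(["saved/svc.pkl"])` returns just that one path.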

Note

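  • If the data goes through a Pipeline when it is loaded and is then handed to a custom algorithm that applies its own Pipeline, unexpected errors may occur (an issue in sklearn itself). Use a Pipeline in only one of the two places: either when loading the data under the pipe folder, or inside the custom algorithm.

  • A data preprocessing file must follow a fixed format: the processing logic is defined in the iload_pipe function, and the prediction logic in ipredict_pipe.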
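
The naming convention in the second note can be checked when a pipe file is loaded by path. A minimal sketch using the standard `importlib` machinery; `load_pipe` is a hypothetical helper, and the README does not state what `iload_pipe` and `ipredict_pipe` take or return, so the sketch only verifies that both names exist:

```python
import importlib.util
from pathlib import Path


def load_pipe(pipe_file):
    """Import a pipe file by path and return its two hook functions."""
    path = Path(pipe_file)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Fail early if the file does not follow the required format.
    required = ("iload_pipe", "ipredict_pipe")
    missing = [name for name in required if not hasattr(module, name)]
    if missing:
        raise AttributeError(f"{path} does not define: {', '.join(missing)}")
    return module.iload_pipe, module.ipredict_pipe
```

A call such as `iload, ipredict = load_pipe("pipe/iload_iris.py")` would then give the training and prediction code one uniform entry point for every dataset.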

Todo

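  • Add a requirements.txt file
  • Automatic parameter search with Hyperopt
  • Distributed computing with Dask
  • Unit tests
  • Add cluster algorithms
  • Refactor the predict file
  • Pseudo-ETL project directory
  • Performance evaluation module
  • Function for creating classes dynamically
  • Custom nn functions
  • Custom clf functions
  • Support cross_validation for custom functions
  • Catch Ctrl+C to interrupt the current trainer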