@@ -471,6 +471,22 @@ Predicting with trained model:
471471| +-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
472472| | MultinomialNB | scikitmnb | `scikitmnb <https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB/ >`_ |
473473+----------------+-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
474+ | Clustering | KMeans | scikitkmeans | `scikitkmeans <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans/ >`_ |
475+ | +-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
476+ | | Birch | scikitbirch | `scikitbirch <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html#sklearn.cluster.Birch/ >`_ |
477+ | +-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
478+ | | MiniBatchKMeans | scikitmbkmeans | `scikitmbkmeans <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html#sklearn.cluster.MiniBatchKMeans/ >`_ |
479+ | +-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
480+ | | AffinityPropagation | scikitap | `scikitap <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation/ >`_ |
481+ | +-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
482+ | | MeanShift | scikitms | `scikitms <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MeanShift.html#sklearn.cluster.MeanShift/ >`_ |
483+ | +-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
484+ | | SpectralClustering | scikitsc | `scikitsc <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html#sklearn.cluster.SpectralClustering/ >`_ |
485+ | +-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
486+ | | AgglomerativeClustering | scikitac | `scikitac <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering/ >`_ |
487+ | +-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
488+ | | OPTICS | scikitoptics | `scikitoptics <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html#sklearn.cluster.OPTICS/ >`_ |
489+ +----------------+-------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
474490
475491
476492**Usage Example:**
@@ -512,14 +528,14 @@ Let us take a simple example:
512528 $ dffml train \
513529 -model scikitlr \
514530 -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
515- -model-predict Salary \
531+ -model-predict Salary:float:1 \
516532 -sources f=csv \
517533 -source-filename train.csv \
518534 -log debug
519535 $ dffml accuracy \
520536 -model scikitlr \
521537 -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
522- -model-predict Salary \
538+ -model-predict Salary:float:1 \
523539 -sources f=csv \
524540 -source-filename test.csv \
525541 -log debug
@@ -528,7 +544,7 @@ Let us take a simple example:
528544 dffml predict all \
529545 -model scikitlr \
530546 -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
531- -model-predict Salary \
547+ -model-predict Salary:float:1 \
532548 -sources f=csv \
533549 -source-filename /dev/stdin \
534550 -log debug
@@ -549,3 +565,92 @@ Let us take a simple example:
549565 }
550566 ]
551567
568+
569+ The example below uses the KMeans clustering model on a small, randomly generated dataset.
570+
571+ .. code-block :: console
572+
573+ $ cat > train.csv << EOF
574+ Col1, Col2, Col3, Col4
575+ 5.05776417, 8.55128116, 6.15193196, -8.67349666
576+ 3.48864265, -7.25952218, -4.89216256, 4.69308946
577+ -8.16207603, 5.16792984, -2.66971993, 0.2401882
578+ 6.09809669, 8.36434181, 6.70940915, -7.91491768
579+ -9.39122566, 5.39133807, -2.29760281, -1.69672981
580+ 0.48311336, 8.19998973, 7.78641979, 7.8843821
581+ 2.22409135, -7.73598586, -4.02660224, 2.82101794
582+ 2.8137247 , 8.36064298, 7.66196849, 3.12704676
583+ EOF
584+ $ cat > test.csv << EOF
585+ Col1, Col2, Col3, Col4, cluster
586+ -10.16770144, 2.73057215, -1.49351481, 2.43005691, 6
587+ 3.59705381, -4.76520663, -3.34916068, 5.72391486, 1
588+ 4.01612313, -4.641852 , -4.77333308, 5.87551683, 0
589+ EOF
590+ $ dffml train \
591+ -model scikitkmeans \
592+ -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
593+ -sources f=csv \
594+ -source-filename train.csv \
595+ -source-readonly \
596+ -log debug
597+ $ dffml accuracy \
598+ -model scikitkmeans \
599+ -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
600+ -model-tcluster cluster:int:1 \
601+ -sources f=csv \
602+ -source-filename test.csv \
603+ -source-readonly \
604+ -log debug
605+ 0.6365141682948129
606+ $ echo -e 'Col1,Col2,Col3,Col4\n6.09809669,8.36434181,6.70940915,-7.91491768\n' | \
607+ dffml predict all \
608+ -model scikitkmeans \
609+ -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
610+ -sources f=csv \
611+ -source-filename /dev/stdin \
612+ -source-readonly \
613+ -log debug
614+ [
615+ {
616+ "extra": {},
617+ "features": {
618+ "Col1": 6.09809669,
619+ "Col2": 8.36434181,
620+ "Col3": 6.70940915,
621+ "Col4": -7.91491768
622+ },
623+ "last_updated": "2020-01-12T22:51:15Z",
624+ "prediction": {
625+ "confidence": 0.6365141682948129,
626+ "value": 2
627+ },
628+ "src_url": "0"
629+ }
630+ ]
631+
632+ **NOTE**: `Transductive <https://scikit-learn.org/stable/glossary.html#term-transductive/ >`_ clusterers (scikitsc, scikitac, scikitoptics) cannot handle unseen data.
633+ Ensure that `predict` and `accuracy` for these algorithms use the training data.
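To illustrate the distinction in the note: an inductive clusterer such as KMeans keeps its cluster centers after training, so an unseen row can be assigned to the nearest center. A transductive clusterer only produces labels for the rows it was fitted on, so it has nothing to apply to new rows. The sketch below is plain Python, not DFFML or scikit-learn code; the function name `assign_to_nearest_center` and the center values are made up for illustration and are not the result of any real training run.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign_to_nearest_center(row, centers):
    """Return the index of the center closest to `row` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(row, centers[i]))


# Hypothetical centers, chosen by hand for illustration only.
centers = [
    (5.5, 8.5, 6.5, -8.0),
    (3.0, -7.5, -4.5, 4.0),
    (-9.0, 5.0, -2.5, 0.0),
]

# A row that was not part of the training data.
unseen_row = (3.59705381, -4.76520663, -3.34916068, 5.72391486)
print(assign_to_nearest_center(unseen_row, centers))
```

Without stored centers or some other fitted decision rule there is no equivalent step for unseen rows, which is why the note recommends using the training data with the transductive models.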
634+
635+ **Args**
636+
637+ - predict: Feature
638+
639+ - Label or the value to be predicted
640+ - Only used by classification and regression models
641+
642+ - tcluster: Feature
643+
644+ - True cluster, only used by clustering models
645+ - Passed with `accuracy` to return `mutual_info_score`
646+ - If not passed, `accuracy` returns `silhouette_score`
647+
648+ - features: List of features
649+
650+ - Features to train on
651+
652+ - directory: String
653+
654+ - default: /home/user/.cache/dffml/scikit-{Entrypoint}
655+ - Directory where state should be saved
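As a rough illustration of what the `tcluster` option measures, the following is a minimal pure-Python sketch of the mutual information (in nats) between a true labeling and a predicted labeling. It is not DFFML's or scikit-learn's implementation, and the helper name `mutual_info` is hypothetical. It shows one useful property: the score depends only on how the two labelings group the rows, so the true cluster ids do not have to match the predicted ids numerically.

```python
from collections import Counter
from math import log


def mutual_info(labels_true, labels_pred):
    """Mutual information (natural log) between two labelings of the same rows."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    joint_counts = Counter(zip(labels_true, labels_pred))
    mi = 0.0
    for (t, p), n_tp in joint_counts.items():
        # Each term is p(t, p) * log(p(t, p) / (p(t) * p(p))),
        # with probabilities estimated from counts.
        mi += (n_tp / n) * log(n_tp * n / (true_counts[t] * pred_counts[p]))
    return mi


# Same grouping with swapped ids: the score is still maximal (log 2 here).
same_grouping = mutual_info([0, 0, 1, 1], [1, 1, 0, 0])

# Groupings that carry no information about each other: the score is 0.
unrelated = mutual_info([0, 0, 1, 1], [0, 1, 0, 1])
```

When no true labels are available, a label-free measure such as the silhouette score is the usual alternative, which matches the fallback described for `accuracy` above.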
656+