<a id='toc'></a>
* [1. FeatureBase](#feature_base)
    * [1.2 Tests](#fb_tests)
* [2. NumericalFeature](#numerical_feature)
* [3. AggregatedFeature](#aggregated_feature)
* [4. CategoricalFeature](#categorical_feature)
* [5. CategoricalCombiner](#categorical_combiner)
* [6. CategoricalFeaturesManager](#features_storage)

In [1]:
import sys
import numpy as np
import pandas as pd
import scipy
_add_to_path = True

In [2]:
if _add_to_path:
    sys.path.append('../')
from ml.feature import *

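# Name prefixes used when building the names of derived features.
FEATURE_PREFIXES = {
    'CAT': '',     # Categorical feature
    'NUM': '',     # Numerical feature
    'LE':  '',     # LabelEncoded feature
    'OHE': 'Ohe',  # OneHotEncoded feature
    'CTR': 'Ctr',  # Counter feature
    'LOO': 'Loo',  # LeaveOneOut feature
    'FIL': 'Fil',  # Filtered feature
}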

In [3]:
print(FeatureKernel.__doc__)


    FeatureKernel is a class that implements the functionality shared by the FeatureBase classes.
    It is one of the attributes of FeatureBase instances (SparseFeatureBase and DenseFeatureBase).
    The operations it implements include:
        1) checking that feature values are valid
        2) reporting messages about invalid values
        3) preprocessing and postprocessing of features
        4) retrieving feature characteristics (size, format, etc.)
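
The docstring above describes a helper object that each feature holds and delegates to for validation, error reporting, pre/postprocessing, and shape or format queries. The sketch below illustrates that composition pattern only. `KernelSketch`, `DenseFeatureSketch`, and every method name in it are hypothetical and are not taken from `ml.feature`; the real `FeatureKernel` may be organised quite differently.

```python
import numpy as np
import scipy.sparse as sp


class KernelSketch:
    """Hypothetical stand-in for FeatureKernel; not the real class."""

    def __init__(self, name):
        self.name = name

    def preprocess(self, values):
        # Leave sparse input untouched; turn dense input into a column vector.
        if sp.issparse(values):
            return values
        return np.asarray(values).reshape(-1, 1)

    def check(self, values):
        # Validate the stored numbers and report problems with the feature name.
        data = values.data if sp.issparse(values) else values
        if data.dtype.kind not in "biuf":
            raise TypeError(f"{self.name}: values must be numeric, got {data.dtype}")
        if not np.isfinite(data).all():
            raise ValueError(f"{self.name}: values contain NaN or infinity")

    def describe(self, values):
        # Report basic characteristics: shape, storage format, dtype.
        return {
            "shape": values.shape,
            "is_sparse": sp.issparse(values),
            "dtype": str(values.dtype),
        }


class DenseFeatureSketch:
    """Hypothetical feature that keeps a kernel as an attribute."""

    def __init__(self, values, name):
        self.kernel = KernelSketch(name)
        self.values = self.kernel.preprocess(values)
        self.kernel.check(self.values)


feature = DenseFeatureSketch([1, 0, 1], name="f0")
print(feature.kernel.describe(feature.values))
```

Keeping the checks in a separate object means a sparse and a dense feature class could share them without inheriting from each other, which appears to be the motivation stated in the docstring.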
    


<a id='feature_base'></a>
## 1. FeatureBase<sup>[toc](#toc)</sup> <sup>[down](#fb_tests)</sup>

In [4]:
%run test_feature_base.py

.....
----------------------------------------------------------------------
Ran 5 tests in 0.027s

OK


<a id='numerical_feature'></a>
## 2. NumericalFeature<sup>[toc](#toc)</sup>

In [5]:
%run test_numerical_feature.py

.....
----------------------------------------------------------------------
Ran 5 tests in 0.020s

OK


<a id='aggregated_feature'></a>
## 3. AggregatedFeature<sup>[toc](#toc)</sup> <sup>[down](#categorical_feature)</sup>

In [6]:
%run test_aggregated_feature.py

AGGREGATED_FEATURE:
AGGR[[NumericalFeature: f0, (10,)][NumericalFeature: f1, (10,)][NumericalFeature: f2, (10,)][NumericalFeature: f3, (10,)][NumericalFeature: f4, (10,)][NumericalFeature: f5, (10,)]]

VALUES with sparse = True:
  (0, 0)	1
  (1, 0)	1
  (4, 0)	1
  (5, 0)	1
  (6, 0)	1
  (7, 0)	1
  (8, 0)	1
  (1, 1)	1
  (3, 1)	1
  (4, 1)	1
  (7, 1)	1
  (1, 2)	1
  (4, 2)	1
  (8, 2)	1
  (2, 3)	1
  (3, 3)	1
  (4, 3)	1
  (5, 3)	1
  (6, 3)	1
  (0, 4)	1
  (1, 4)	1
  (2, 4)	1
  (3, 4)	1
  (4, 4)	1
  (5, 4)	1
  (7, 4)	1
  (8, 4)	1
  (0, 5)	1
  (1, 5)	1
  (2, 5)	1
  (3, 5)	1
  (4, 5)	1
  (5, 5)	1
  (6, 5)	1
  (7, 5)	1
  (8, 5)	1
  (9, 5)	1

VALUES with sparse = False:
   f0  f1  f2  f3  f4  f5
0   1   0   0   0   1   1
1   1   1   1   0   1   1
2   0   0   0   1   1   1
3   0   1   0   1   1   1
4   1   1   1   1   1   1
5   1   0   0   1   1   1
6   1   0   0   1   0   1
7   1   1   0   0   1   1
8   1   0   1   0   1   1
9   0   0   0   0   0   1

VALUES for features ['f1', 'f2', 'f3']
   f1  f2
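
The output above shows the same aggregated values in two layouts: a sparse coordinate listing and a dense table with one column per feature. As a rough illustration of how those two views relate, the sketch below rebuilds the dense table from the printed values and stacks its columns into a SciPy sparse matrix. It uses plain pandas and SciPy only and does not call `AggregatedFeature`, whose internals are not shown here.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Dense values copied from the "VALUES with sparse = False" table above.
dense = pd.DataFrame(
    [[1, 0, 0, 0, 1, 1],
     [1, 1, 1, 0, 1, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 1, 1],
     [1, 1, 1, 1, 1, 1],
     [1, 0, 0, 1, 1, 1],
     [1, 0, 0, 1, 0, 1],
     [1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 1],
     [0, 0, 0, 0, 0, 1]],
    columns=["f0", "f1", "f2", "f3", "f4", "f5"],
)

# One sparse column per feature, stacked side by side into a single matrix.
columns = [sp.csc_matrix(dense[[name]].to_numpy()) for name in dense.columns]
aggregated = sp.hstack(columns, format="csc")

# Number of rows, number of features, and number of stored non-zero entries.
print(aggregated.shape, aggregated.nnz)

# Selecting a subset of features by name on the dense view.
selected = dense[["f1", "f2", "f3"]]
print(selected.head())
```

The sparse matrix stores only the non-zero cells, which is why the sparse listing above has one line per `1` in the dense table.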

<a id='categorical_feature'></a>
## 4. CategoricalFeature<sup>[toc](#toc)</sup>

In [7]:
%run test_categorical_feature.py

INITIAL FEATURE:
True values    :  [0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4]
Obtained values:  [0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4]

True CAT values:  ['A', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'E']
Obtained CATs  :  ['A', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'E']


COUNTER FEATURE
Initial feature:  [CategoricalFeature: f, (15,)]
Counter feature:  [NumericalFeature: Ctrf, (15,)]
[5 5 4 5 4 3 5 4 3 2 5 4 3 2 1]
Values of initial and counter features:
initial: 0 0 1 0 1 2 0 1 2 3 0 1 2 3 4
counter: 5 5 4 5 4 3 5 4 3 2 5 4 3 2 1
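
In the counter output above, each value is replaced by the number of times its label occurs in the feature (label `0` occurs five times, label `4` once). A minimal NumPy sketch that reproduces those numbers follows; it is an independent illustration and not the code used by `CategoricalFeature`.

```python
import numpy as np

# Label-encoded values copied from the "initial" row above.
labels = np.array([0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4])

# counts[i] is the frequency of uniques[i]; inverse maps each row to its unique label.
uniques, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)

# Look up the frequency of every row's label.
counter = counts[inverse]
print(counter)  # [5 5 4 5 4 3 5 4 3 2 5 4 3 2 1]
```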


FILTERED FEATURES
fil_feature props:  {'is_constant': False, 'is_numeric': True, 'is_label_encoded': True}
fil feature name:  [CategoricalFeature: Fil0_f, (15,)]
ctr feature name:  [NumericalFeature: CtrFil0_f, (15,)]
fil_feature: 0 0 1 0 1 2 0 1 2 3 0 1 2 3 4
cat_feature: A A B A B C A B C D A B C D E
ctr_feature: 5 5 4 5 4 3 5 4 3 2 5 4 3 2 1



fil_feature props:  {'is_constant': False, 'is_

FilOHE is constant:    False



OHE feature with omit_uniques = True and threshold = 3
Initial  feature name: [CategoricalFeature: f, (15,)]
Filtered feature name: [CategoricalFeature: Fil3_f, (15,)]
Initial feature values: [0 0 1 0 1 2 0 1 2 3 0 1 2 3 4]
Filtered fature values: [0 0 1 0 1 2 0 1 2 2 0 1 2 2 2]
threhold = 3   unique_label = 2
FilOHE feature name:   OheFil3_f[[NumericalFeature: A, (1, 15)][NumericalFeature: B, (1, 15)]]
FilOHE feature values:
 [[ 1.  0.]
 [ 1.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 0.  1.]
 [ 0.  0.]
 [ 1.  0.]
 [ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 1.  0.]
 [ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
FilOHE is constant:    False
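
Judging only from the printed values, filtering with a threshold seems to merge every label that occurs at most `threshold` times into one shared label, and one-hot encoding with `omit_uniques = True` seems to drop the column of that shared label. The sketch below encodes this inferred rule; `filter_rare` and `one_hot` are hypothetical helpers written for this note, and the library's actual rule may differ in cases the output does not cover.

```python
import numpy as np


def filter_rare(labels, threshold):
    """Merge labels occurring at most `threshold` times (rule inferred from the output)."""
    labels = np.asarray(labels)
    uniques, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    keep = counts > threshold
    rare_label = int(keep.sum())
    # Kept labels are renumbered 0..k-1 in sorted order; rare labels all become k.
    new_codes = np.where(keep, np.cumsum(keep) - 1, rare_label)
    return new_codes[inverse], rare_label


def one_hot(labels, n_labels, omit_label=None):
    """One indicator column per label, optionally dropping one column."""
    matrix = np.eye(n_labels)[labels]
    if omit_label is not None:
        matrix = np.delete(matrix, omit_label, axis=1)
    return matrix


labels = [0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4]
filtered, rare_label = filter_rare(labels, threshold=3)
print(filtered, rare_label)

# Drop the column of the merged label, as in the omit_uniques = True output.
ohe = one_hot(filtered, n_labels=rare_label + 1, omit_label=rare_label)
print(ohe.shape)
```

With `threshold=3` this yields the filtered values and the two-column matrix printed above; rows whose label was merged become all-zero rows.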



OHE feature with omit_uniques = False and threshold = 4
Initial  feature name: [CategoricalFeature: f, (15,)]
Filtered feature name: [CategoricalFeature: Fil4_f, (15,)]
Initial feature values: [0 0 1 0 1 2 0 1 2 3 0 1 2 3 4]
Filtered fature values: [0 0 1 0 1 1 0 1 1 1 0 1 1 1 1]
threhold = 4   unique_label = 1
FilOHE feature name:   [Nu

<a id='categorical_combiner'></a>
## 5. CategoricalCombiner<sup>[toc](#toc)</sup>

In [8]:
%run test_categorical_combiner.py

comb_feature: [CategoricalFeature: f1+f2+f3+f4, (12,)]
name = f1+f2+f3+f4, values = [6 0 3 7 4 5 5 7 1 2 3 6]
fil_feature:  [CategoricalFeature: Fil1_f1+f2+f3+f4, (12,)]
name = Fil1_f1+f2+f3+f4, values = [2 4 0 3 4 1 1 3 4 4 0 2]

degree = 1
new_features: OrderedDict([('f1', CategoricalFeature(f1; (12, 1))), ('f2', CategoricalFeature(f2; (12, 1))), ('f3', CategoricalFeature(f3; (12, 1)))])
  f1: 0 1 2 0 1 2 2 0 1 2 2 0
  f2: 0 1 0 1 0 1 1 1 1 0 0 0
  f3: 1 0 1 1 1 1 1 1 1 0 1 1

degree = 2
new_features: OrderedDict([('f1+f2', CategoricalFeature(f1+f2; (12, 1))), ('f1+f3', CategoricalFeature(f1+f3; (12, 1))), ('f2+f3', CategoricalFeature(f2+f3; (12, 1)))])
  f1+f2: 0 2 4 1 3 5 5 1 2 4 4 0
  f1+f3: 0 2 4 0 1 4 4 0 1 3 4 0
  f2+f3: 1 3 1 2 1 2 2 2 2 0 1 1

degree = 3
new_features: OrderedDict([('f1+f2+f3', CategoricalFeature(f1+f2+f3; (12, 1)))])
  f1+f2+f3: 0 2 6 1 4 7 7 1 3 5 6 0

degree = None
new_features: OrderedDict([('f1', CategoricalFeature(f1; (12, 1))), ('f2', CategoricalFeature
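
The `degree` output above lists every combination of that many input features, joined with `+`, where each distinct tuple of input labels gets its own label. A minimal pandas sketch of the idea follows. The `combine` function is hypothetical, and it numbers tuples in lexicographic order, so the rows are grouped in the same way as in the printed output but the label numbers themselves can differ.

```python
from collections import OrderedDict
from itertools import combinations

import pandas as pd


def combine(frame, degree):
    """Build one combined categorical column per `degree`-sized subset of columns."""
    combined = OrderedDict()
    for names in combinations(frame.columns, degree):
        # ngroup() assigns one integer to each distinct tuple of values.
        codes = frame.groupby(list(names), sort=True).ngroup()
        combined["+".join(names)] = codes.to_numpy()
    return combined


# Values copied from the degree = 1 output above.
frame = pd.DataFrame({
    "f1": [0, 1, 2, 0, 1, 2, 2, 0, 1, 2, 2, 0],
    "f2": [0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0],
    "f3": [1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
})

pairs = combine(frame, degree=2)
print(list(pairs))
print(pairs["f1+f2"])
```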

<a id='features_storage'></a>
## 6. CategoricalFeaturesManager<sup>[toc](#toc)</sup>

In [9]:
%run test_categorical_features_manager.py

SET UP: ['f1', 'f2', 'f3']

	f1 in manager = True
	f2 in manager = True
	f3 in manager = True

    f1  f2  f3
0    0   1   0
1    1   1   1
2    0   0   0
3    1   1   1
4    2   1   0
5    0   0   1
6    1   1   0
7    2   1   1
8    3   0   0
9    0   1   1
10   1   1   0
11   2   0   1
12   3   1   0
13   4   1   1

DELETING AND SETTING FEATURES:
del f1: ['f2', 'f3']
del f2: ['f3']
del f3: []
set f1: ['f1']
set f2: ['f1', 'f2']
set f3: ['f1', 'f2', 'f3']

COMBINING FEATURES:
[CategoricalFeature: f1, (14,)] [0 1 0 1 2 0 1 2 3 0 1 2 3 4]
[CategoricalFeature: f2, (14,)] [1 1 0 1 1 0 1 1 0 1 1 0 1 1]
[CategoricalFeature: f3, (14,)] [0 1 0 1 0 1 0 1 0 1 0 1 0 1]
[CategoricalFeature: f1+f2, (14,)] [1 2 0 2 4 0 2 4 6 1 2 3 5 7]
[CategoricalFeature: f1+f3, (14,)] [0 2 0 2 4 1 3 5 6 1 3 5 6 7]
[CategoricalFeature: f2+f3, (14,)] [3 2 0 2 3 1 3 2 0 2 3 1 3 2]
[CategoricalFeature: f1+f2+f3, (14,)] [ 3  5  0  5  8  1  4  7 10  2  4  6  9 11]
all: ['f1', 'f2', 'f3', 'f1+f2', 'f1+f3', 'f2+f3', 'f1
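
The membership checks and the DELETING AND SETTING FEATURES output above suggest that the manager behaves like an ordered, name-keyed container. The sketch below mimics that behaviour with an `OrderedDict`. `FeaturesManagerSketch` and its methods are hypothetical and only illustrate the container semantics; they are not the interface of `CategoricalFeaturesManager`.

```python
from collections import OrderedDict


class FeaturesManagerSketch:
    """Hypothetical ordered container of named features."""

    def __init__(self):
        self._features = OrderedDict()

    def __contains__(self, name):
        return name in self._features

    def __getitem__(self, name):
        return self._features[name]

    def __setitem__(self, name, values):
        self._features[name] = list(values)

    def __delitem__(self, name):
        del self._features[name]

    def names(self):
        # Names in insertion order.
        return list(self._features)


# Values copied from the table above.
data = {
    "f1": [0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4],
    "f2": [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}

manager = FeaturesManagerSketch()
for name, values in data.items():
    manager[name] = values
print("f1" in manager, manager.names())

# Delete the features one by one, then set them again.
for name in data:
    del manager[name]
    print("del", name, manager.names())
for name, values in data.items():
    manager[name] = values
    print("set", name, manager.names())
```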