## Counting array elements

In [2]:
import numpy as np

In [4]:
x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)

print(np.asarray((unique, counts)).T)

[[ 1  5]
 [ 2  3]
 [ 5  1]
 [25  1]]


In [5]:
from collections import Counter

In [7]:
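# The rows differ in length, so NumPy needs dtype=object (recent versions raise ValueError without it)
y = np.array([[1,2,3],[1,2,3,4],[1,2,3,4,5]], dtype=object)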

In [9]:
# One Counter per row; 'row' avoids reusing the name of the array x defined above
bags_of_words = [Counter(row) for row in y]

In [10]:
print(bags_of_words)

[Counter({1: 1, 2: 1, 3: 1}), Counter({1: 1, 2: 1, 3: 1, 4: 1}), Counter({1: 1, 2: 1, 3: 1, 4: 1, 5: 1})]


In [15]:
sumbags = sum(bags_of_words, Counter())

In [16]:
print(sumbags)

Counter({1: 3, 2: 3, 3: 3, 4: 2, 5: 1})


## Coefficient of determination

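The [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination), $R^2$, quantifies a model's performance. It is a very common statistic in regression analysis and is often used as a measure of how well a model predicts.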

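$R^2$ usually falls between 0 and 1 and gives the fraction of the variation in the **target variable** that the model's predictions account for. For a least-squares linear fit with an intercept, evaluated on its training data, it equals the squared correlation between predicted and actual values. A model with an $R^2$ of 0 does no better than always predicting the **mean** of the target variable, while a model with an $R^2$ of 1 predicts the target variable perfectly. A value between 0 and 1 indicates what proportion of the variation in the target variable can be explained by the **features**. $R^2$ can also be negative, in which case the model's predictions are worse than always predicting the mean of the target variable.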
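For reference, the standard definition behind these statements can be written as follows. The symbols are notation introduced here, not taken from the original text: $y_i$ are the actual values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the actual values.

$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}, \qquad
SS_{res} = \sum_i (y_i - \hat{y}_i)^2, \qquad
SS_{tot} = \sum_i (y_i - \bar{y})^2
$$

Always predicting the mean gives $SS_{res} = SS_{tot}$ and therefore $R^2 = 0$. A perfect prediction gives $SS_{res} = 0$ and $R^2 = 1$. Any model whose squared error exceeds $SS_{tot}$ gets a negative score.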

In [7]:
def performance_metric_1(y_true, y_predict):
    """ Calculates and returns the R^2 score between 
        true and predicted values, computed by hand with NumPy. """
    
    y_true = np.array(y_true)
    y_predict = np.array(y_predict)

    # Total sum of squares: spread of the true values around their mean
    sstot = np.sum((y_true - y_true.mean())**2)
    # Residual sum of squares: squared prediction errors
    ssres = np.sum((y_true - y_predict)**2)
    score = 1 - ssres/sstot
    
    # Return the score
    return score

In [8]:
from sklearn.metrics import r2_score

def performance_metric_2(y_true, y_predict):
    """ Calculates and returns the R^2 score between 
        true and predicted values using scikit-learn. """
    
    # r2_score takes the true values first, then the predictions
    score = r2_score(y_true, y_predict)
    
    # Return the score
    return score
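
The three cases described above (perfect prediction, always predicting the mean, and a prediction worse than the mean) can be checked with a small sketch. It is written in plain Python so that it runs without NumPy or scikit-learn installed. The helper name `r2_sketch` and the sample values are hypothetical and chosen only for illustration.

```python
def r2_sketch(y_true, y_predict):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot, in plain Python."""
    mean_true = sum(y_true) / len(y_true)
    # Total sum of squares: spread of the true values around their mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Residual sum of squares: squared prediction errors
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_predict))
    return 1 - ss_res / ss_tot


# Made-up target values, for illustration only
y_true = [3.0, -0.5, 2.0, 7.0, 4.2]
mean_value = sum(y_true) / len(y_true)

# Case 1: the predictions equal the targets
perfect = r2_sketch(y_true, y_true)
# Case 2: always predict the mean of the targets
mean_only = r2_sketch(y_true, [mean_value] * len(y_true))
# Case 3: always predict a constant far from the mean
worse = r2_sketch(y_true, [10.0] * len(y_true))

print(perfect, mean_only, worse)
```

The first score is 1, the second is 0, and the third is negative, matching the description of $R^2$ above. Like the functions above, the sketch divides by the total sum of squares, so it is undefined when all true values are identical.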