P041 Data Metric Calculation - MAE (mean absolute error)

In [1]:
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv("p041-predictions.csv")
df.head(5)

Unnamed: 0,y_true,y_pred
0,109.934283,113.175123
1,97.234714,93.383891
2,112.953771,106.184551
3,130.460597,136.57736
4,95.316933,105.626928


In [4]:
def mean_absolute_error(y_true,y_pred):
    # MAE: mean of the absolute differences |y_true - y_pred|
    return abs(y_true-y_pred).sum() / len(y_true)

In [5]:
mae = mean_absolute_error(df["y_true"],df["y_pred"])
mae

6.791794588005249
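As a quick sanity check, the same formula can be worked by hand on a tiny sample. This is a minimal sketch: the four values below are made up for illustration and are not taken from p041-predictions.csv, and the `_demo` names are hypothetical.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_true - y_pred).mean()

# Illustrative values only (not from the notebook's CSV file).
y_true_demo = [3.0, 5.0, 2.0, 7.0]
y_pred_demo = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0.0, 2.0, 1.0, so the mean is 3.5 / 4 = 0.875.
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)
```

Using `.mean()` is equivalent to `.sum() / len(...)` here; it just avoids repeating the length.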

P042 Data Metric Calculation - MSE (mean squared error)

In [9]:
import numpy as np
import pandas as pd

In [10]:
df = pd.read_csv("p042-predictions.csv")
df.head(10)

Unnamed: 0,y_true,y_pred
0,109.934283,113.175123
1,97.234714,93.383891
2,112.953771,106.184551
3,130.460597,136.57736
4,95.316933,105.626928
5,95.317261,104.630062
6,131.584256,123.192081
7,115.348695,112.256571
8,90.610512,93.923147
9,110.851201,120.606652


In [13]:
def mean_squared_error(y_true,y_pred):
    # MSE: mean of the squared differences (y_true - y_pred) ** 2
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

In [14]:
mse = mean_squared_error(df["y_true"],df["y_pred"])
mse

74.9471459408194
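Because MSE is expressed in squared units, its square root (RMSE) is often reported alongside it so the error is on the same scale as the target. A minimal sketch on made-up values (not from p042-predictions.csv; the `_demo` names are hypothetical):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return ((y_true - y_pred) ** 2).mean()

# Illustrative values only (not from the notebook's CSV file).
y_true_demo = [3.0, 5.0, 2.0, 7.0]
y_pred_demo = [2.5, 5.0, 4.0, 8.0]

# Squared errors are 0.25, 0.0, 4.0, 1.0, so the mean is 5.25 / 4 = 1.3125.
mse_demo = mean_squared_error(y_true_demo, y_pred_demo)

# RMSE brings the error back to the units of y_true.
rmse_demo = np.sqrt(mse_demo)
```

Note how the single error of 2.0 contributes 4.0 of the 5.25 total: squaring weights large errors more heavily than MAE does.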

P043 Data Metric Calculation - Sigmoid function

In [15]:
df = pd.DataFrame(
    data = np.random.randn(10),
    columns = ["var1"]
)
df

,var1
0,0.403684
1,1.668542
2,-0.626382
3,-1.310729
4,0.697787
5,-0.639653
6,0.153932
7,0.620324
8,0.541642
9,-1.39949


In [16]:
def sigmoid(x):
    # Logistic function: maps any real x into the open interval (0, 1)
    return 1 / (1+np.exp(-x))

In [17]:
sigmoid(np.array([1,2,3]))

array([0.73105858, 0.88079708, 0.95257413])

In [18]:
# sigmoid is vectorized, so it can be applied to the whole column at once
df["var1_sigmoid"] = sigmoid(df["var1"])

In [19]:
df

,var1,var1_sigmoid
0,0.403684,0.599572
1,1.668542,0.841381
2,-0.626382,0.348331
3,-1.310729,0.212365
4,0.697787,0.667697
5,-0.639653,0.345325
6,0.153932,0.538407
7,0.620324,0.650292
8,0.541642,0.632194
9,-1.39949,0.197897
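For inputs far below zero, `np.exp(-x)` overflows and NumPy emits a RuntimeWarning, even though the result still rounds to 0. One common workaround, sketched here under the assumption that the input can be converted to a float array, is to pick the algebraically equivalent form that only ever exponentiates a non-positive number. The name `stable_sigmoid` is hypothetical and not part of the notebook.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids overflow in exp() for large-magnitude inputs."""
    x = np.atleast_1d(x).astype(float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use exp(x) / (1 + exp(x)); exp(x) is below 1 here.
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

small = stable_sigmoid([1, 2, 3])          # same values as the plain formula
extreme = stable_sigmoid([-1000, 0, 1000])  # no overflow warning
```

For the moderate values in `var1`, both versions give the same results; the difference only matters for extreme inputs.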


P044 Data Metric Calculation - entropy function

In [20]:
df = pd.DataFrame(
    {
        "val_1" : np.arange(0.01, 1, 0.1),
        "val_2" : 1 - np.arange(0.01, 1, 0.1),
    }
)
df

,val_1,val_2
0,0.01,0.99
1,0.11,0.89
2,0.21,0.79
3,0.31,0.69
4,0.41,0.59
5,0.51,0.49
6,0.61,0.39
7,0.71,0.29
8,0.81,0.19
9,0.91,0.09


In [21]:
def entropy(x):
    # Shannon entropy in bits; x holds probabilities that are all > 0
    x = np.asarray(x, dtype=float)
    return -np.sum(x*np.log2(x))

In [22]:
df["entropy"] = df.apply(
    lambda x : entropy([x["val_1"], x["val_2"]]),
    axis=1
)
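Two reference points help check an entropy function: a fair two-way split should give exactly 1 bit, and a certain outcome should give 0. The plain formula returns `nan` for the second case because `0 * log2(0)` is undefined in floating point. The sketch below assumes the usual convention that zero-probability terms contribute nothing; `entropy_bits` is a hypothetical name, not part of the notebook.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, treating 0 * log2(0) as 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]  # drop zero-probability outcomes before taking the log
    return float(-np.sum(nonzero * np.log2(nonzero)))

fair_coin = entropy_bits([0.5, 0.5])     # maximum for two outcomes: 1 bit
certain = entropy_bits([1.0, 0.0])       # no uncertainty: zero entropy
skewed = entropy_bits([0.25, 0.75])      # somewhere in between
```

The probabilities in `val_1` and `val_2` start at 0.01 rather than 0, so the notebook's own data never hits the zero case.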



P045 Data Metric Calculation - accuracy_score (accuracy)

In [37]:
import numpy as np
import pandas as pd

In [38]:
from sklearn.metrics import accuracy_score

In [39]:
df = pd.read_csv("./p045-predictions.csv")

In [40]:
df.head(10)

Unnamed: 0,y_true,y_pred
0,1,0
1,0,0
2,1,1
3,2,2
4,1,1
5,0,0
6,1,1
7,1,0
8,0,0
9,1,1


In [41]:
accuracy = accuracy_score(df["y_true"], df["y_pred"])

In [42]:
accuracy

0.7241379310344828
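Accuracy is simply the fraction of rows where the prediction equals the true label, so it can be reproduced without scikit-learn. The sketch below uses only the ten rows shown by `df.head(10)` above, so its result differs from the full-file value; the `_head` names are hypothetical.

```python
import numpy as np

# The ten rows displayed by df.head(10), not the full file.
y_true_head = np.array([1, 0, 1, 2, 1, 0, 1, 1, 0, 1])
y_pred_head = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0, 1])

# Element-wise comparison gives booleans; their mean is the hit rate.
matches = y_true_head == y_pred_head
accuracy_head = matches.mean()  # 8 of 10 rows match
```

Rows 0 and 7 are the two misses, giving 0.8 on this subset.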

P046 Data Metric Calculation - confusion_matrix (confusion matrix)

In [43]:
from sklearn.metrics import confusion_matrix

In [44]:
df = pd.read_csv("./p046-predictions.txt")

In [47]:
df.head()

Unnamed: 0,y_true,y_pred
0,1,0
1,0,0
2,1,1
3,2,2
4,1,1


In [48]:
cm = confusion_matrix(df["y_true"],df["y_pred"])

In [49]:
cm

array([[ 6,  1,  0],
       [ 3, 10,  2],
       [ 0,  2,  5]], dtype=int64)
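In scikit-learn's convention, rows of the confusion matrix are true labels and columns are predicted labels, so the diagonal counts the correct predictions. A minimal sketch of what can be read off the matrix above (the array is copied by hand from the output; the derived variable names are hypothetical):

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted).
cm = np.array([[6, 1, 0],
               [3, 10, 2],
               [0, 2, 5]])

correct = np.diag(cm)                      # correctly classified count per class

# Overall accuracy: correct predictions over all samples (21 of 29).
accuracy_from_cm = correct.sum() / cm.sum()

# Recall per class: correct / number of true samples of that class (row sums).
recall = correct / cm.sum(axis=1)

# Precision per class: correct / number of predictions of that class (column sums).
precision = correct / cm.sum(axis=0)
```

The accuracy of 21/29 is about 0.724, which coincides with the `accuracy_score` result in P045; this suggests, though the notebook does not state it, that both files hold the same predictions.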