From cc238fa2ecae71280b6ed7f8b73f2d182ba9c0f0 Mon Sep 17 00:00:00 2001 From: DrDavidS Date: Wed, 18 Sep 2019 14:24:25 +0800 Subject: [PATCH] =?UTF-8?q?=E4=BF=AE=E6=94=B9=E4=BA=86p=E6=97=A0=E7=A9=B7?= =?UTF-8?q?=E6=97=B6=E5=AF=B9=E5=88=87=E6=AF=94=E9=9B=AA=E5=A4=AB=E8=B7=9D?= =?UTF-8?q?=E7=A6=BB=E7=9A=84=E8=AF=B4=E6=98=8E?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 在 p=\infty 情况下,这应当是闵可夫斯基距离的特例,即切比雪夫距离。这里原文应当是把两者搞混了。 --- .../3.KNearestNeighbors-checkpoint.ipynb" | 1304 +++++++++++++++++ .../3.KNearestNeighbors.ipynb" | 4 +- 2 files changed, 1306 insertions(+), 2 deletions(-) create mode 100644 "\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/.ipynb_checkpoints/3.KNearestNeighbors-checkpoint.ipynb" diff --git "a/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/.ipynb_checkpoints/3.KNearestNeighbors-checkpoint.ipynb" "b/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/.ipynb_checkpoints/3.KNearestNeighbors-checkpoint.ipynb" new file mode 100644 index 0000000..c474547 --- /dev/null +++ "b/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/.ipynb_checkpoints/3.KNearestNeighbors-checkpoint.ipynb" @@ -0,0 +1,1304 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 第3章 k近邻法" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1.$k$近邻法是基本且简单的分类与回归方法。$k$近邻法的基本做法是:对给定的训练实例点和输入实例点,首先确定输入实例点的$k$个最近邻训练实例点,然后利用这$k$个训练实例点的类的多数来预测输入实例点的类。\n", + "\n", + "2.$k$近邻模型对应于基于训练数据集对特征空间的一个划分。$k$近邻法中,当训练集、距离度量、$k$值及分类决策规则确定后,其结果唯一确定。\n", + "\n", + "3.$k$近邻法三要素:距离度量、$k$值的选择和分类决策规则。常用的距离度量是欧氏距离及更一般的**pL**距离。$k$值小时,$k$近邻模型更复杂;$k$值大时,$k$近邻模型更简单。$k$值的选择反映了对近似误差与估计误差之间的权衡,通常由交叉验证选择最优的$k$。\n", + "\n", + "常用的分类决策规则是多数表决,对应于经验风险最小化。\n", + "\n", + "4.$k$近邻法的实现需要考虑如何快速搜索k个最近邻点。**kd**树是一种便于对k维空间中的数据进行快速检索的数据结构。kd树是二叉树,表示对$k$维空间的一个划分,其每个结点对应于$k$维空间划分中的一个超矩形区域。利用**kd**树可以省去对大部分数据点的搜索, 从而减少搜索的计算量。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 距离度量" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "设特征空间$x$是$n$维实数向量空间 ,$x_{i}, x_{j} \\in \\mathcal{X}$,$x_{i}=\\left(x_{i}^{(1)}, x_{i}^{(2)}, \\cdots, x_{i}^{(n)}\\right)^{\\mathrm{T}}$,$x_{j}=\\left(x_{j}^{(1)}, x_{j}^{(2)}, \\cdots, x_{j}^{(n)}\\right)^{\\mathrm{T}}$\n", + ",则:$x_i$,$x_j$的$L_p$距离定义为:\n", + "\n", + "\n", + "$L_{p}\\left(x_{i}, x_{j}\\right)=\\left(\\sum_{i=1}^{n}\\left|x_{i}^{(i)}-x_{j}^{(l)}\\right|^{p}\\right)^{\\frac{1}{p}}$\n", + "\n", + "- $p= 1$ 曼哈顿距离\n", + "- $p= 2$ 欧氏距离\n", + "- $p= \\infty$ 切比雪夫距离" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import math\n", + "from itertools import combinations" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "def L(x, y, p=2):\n", + " # x1 = [1, 1], x2 = [5,1]\n", + " if len(x) == len(y) and len(x) > 1:\n", + " sum = 0\n", + " for i in range(len(x)):\n", + " sum += math.pow(abs(x[i] - y[i]), p)\n", + " return math.pow(sum, 1 / p)\n", + " else:\n", + " return 0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 课本例3.1" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "x1 = [1, 1]\n", + "x2 = [5, 1]\n", + "x3 = [4, 4]" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(4.0, '1-[5, 1]')\n", + "(4.0, '1-[5, 1]')\n", + "(3.7797631496846193, '1-[4, 4]')\n", + "(3.5676213450081633, '1-[4, 4]')\n" + ] + } + ], + "source": [ + "# x1, x2\n", + "for i in range(1, 5):\n", + " r = {'1-{}'.format(c): L(x1, c, p=i) for c in [x2, x3]}\n", + " print(min(zip(r.values(), r.keys())))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "python实现,遍历所有数据点,找出$n$个距离最近的点的分类情况,少数服从多数" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n", + "\n", + "from sklearn.datasets import load_iris\n", + "from sklearn.model_selection import train_test_split\n", + "from collections import Counter" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# data\n", + "iris = load_iris()\n", + "df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", + "df['label'] = iris.target\n", + "df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n", + "# data = np.array(df.iloc[:100, [0, 1, -1]])" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sepal lengthsepal widthpetal lengthpetal widthlabel
05.13.51.40.20
14.93.01.40.20
24.73.21.30.20
34.63.11.50.20
45.03.61.40.20
55.43.91.70.40
64.63.41.40.30
75.03.41.50.20
84.42.91.40.20
94.93.11.50.10
105.43.71.50.20
114.83.41.60.20
124.83.01.40.10
134.33.01.10.10
145.84.01.20.20
155.74.41.50.40
165.43.91.30.40
175.13.51.40.30
185.73.81.70.30
195.13.81.50.30
205.43.41.70.20
215.13.71.50.40
224.63.61.00.20
235.13.31.70.50
244.83.41.90.20
255.03.01.60.20
265.03.41.60.40
275.23.51.50.20
285.23.41.40.20
294.73.21.60.20
..................
1206.93.25.72.32
1215.62.84.92.02
1227.72.86.72.02
1236.32.74.91.82
1246.73.35.72.12
1257.23.26.01.82
1266.22.84.81.82
1276.13.04.91.82
1286.42.85.62.12
1297.23.05.81.62
1307.42.86.11.92
1317.93.86.42.02
1326.42.85.62.22
1336.32.85.11.52
1346.12.65.61.42
1357.73.06.12.32
1366.33.45.62.42
1376.43.15.51.82
1386.03.04.81.82
1396.93.15.42.12
1406.73.15.62.42
1416.93.15.12.32
1425.82.75.11.92
1436.83.25.92.32
1446.73.35.72.52
1456.73.05.22.32
1466.32.55.01.92
1476.53.05.22.02
1486.23.45.42.32
1495.93.05.11.82
\n", + "

150 rows × 5 columns

\n", + "
" + ], + "text/plain": [ + " sepal length sepal width petal length petal width label\n", + "0 5.1 3.5 1.4 0.2 0\n", + "1 4.9 3.0 1.4 0.2 0\n", + "2 4.7 3.2 1.3 0.2 0\n", + "3 4.6 3.1 1.5 0.2 0\n", + "4 5.0 3.6 1.4 0.2 0\n", + "5 5.4 3.9 1.7 0.4 0\n", + "6 4.6 3.4 1.4 0.3 0\n", + "7 5.0 3.4 1.5 0.2 0\n", + "8 4.4 2.9 1.4 0.2 0\n", + "9 4.9 3.1 1.5 0.1 0\n", + "10 5.4 3.7 1.5 0.2 0\n", + "11 4.8 3.4 1.6 0.2 0\n", + "12 4.8 3.0 1.4 0.1 0\n", + "13 4.3 3.0 1.1 0.1 0\n", + "14 5.8 4.0 1.2 0.2 0\n", + "15 5.7 4.4 1.5 0.4 0\n", + "16 5.4 3.9 1.3 0.4 0\n", + "17 5.1 3.5 1.4 0.3 0\n", + "18 5.7 3.8 1.7 0.3 0\n", + "19 5.1 3.8 1.5 0.3 0\n", + "20 5.4 3.4 1.7 0.2 0\n", + "21 5.1 3.7 1.5 0.4 0\n", + "22 4.6 3.6 1.0 0.2 0\n", + "23 5.1 3.3 1.7 0.5 0\n", + "24 4.8 3.4 1.9 0.2 0\n", + "25 5.0 3.0 1.6 0.2 0\n", + "26 5.0 3.4 1.6 0.4 0\n", + "27 5.2 3.5 1.5 0.2 0\n", + "28 5.2 3.4 1.4 0.2 0\n", + "29 4.7 3.2 1.6 0.2 0\n", + ".. ... ... ... ... ...\n", + "120 6.9 3.2 5.7 2.3 2\n", + "121 5.6 2.8 4.9 2.0 2\n", + "122 7.7 2.8 6.7 2.0 2\n", + "123 6.3 2.7 4.9 1.8 2\n", + "124 6.7 3.3 5.7 2.1 2\n", + "125 7.2 3.2 6.0 1.8 2\n", + "126 6.2 2.8 4.8 1.8 2\n", + "127 6.1 3.0 4.9 1.8 2\n", + "128 6.4 2.8 5.6 2.1 2\n", + "129 7.2 3.0 5.8 1.6 2\n", + "130 7.4 2.8 6.1 1.9 2\n", + "131 7.9 3.8 6.4 2.0 2\n", + "132 6.4 2.8 5.6 2.2 2\n", + "133 6.3 2.8 5.1 1.5 2\n", + "134 6.1 2.6 5.6 1.4 2\n", + "135 7.7 3.0 6.1 2.3 2\n", + "136 6.3 3.4 5.6 2.4 2\n", + "137 6.4 3.1 5.5 1.8 2\n", + "138 6.0 3.0 4.8 1.8 2\n", + "139 6.9 3.1 5.4 2.1 2\n", + "140 6.7 3.1 5.6 2.4 2\n", + "141 6.9 3.1 5.1 2.3 2\n", + "142 5.8 2.7 5.1 1.9 2\n", + "143 6.8 3.2 5.9 2.3 2\n", + "144 6.7 3.3 5.7 2.5 2\n", + "145 6.7 3.0 5.2 2.3 2\n", + "146 6.3 2.5 5.0 1.9 2\n", + "147 6.5 3.0 5.2 2.0 2\n", + "148 6.2 3.4 5.4 2.3 2\n", + "149 5.9 3.0 5.1 1.8 2\n", + "\n", + "[150 rows x 5 columns]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X+UXWV97/H3xzCaVIFcIVbIBINic+VXCUQQ40UFWzSkgQvIj1VtI1y59aLoouISaxGxV1CsUsq6WBArijcYKYYfClSh+BtwApgoiGLFZgZuiUECaIAQvvePvedkcjgzc/bMec7Ze5/Pa61ZM3ufffZ8n31gvtn7eb7Po4jAzMwM4Hm9DsDMzMrDScHMzBqcFMzMrMFJwczMGpwUzMyswUnBzMwanBTMzKzBScHMzBqcFMzMrGG71L9A0gxgCBiJiKVNry0HzgdG8l0XRcTnJjrfzjvvHPPnz08QqZlZfa1evfo3ETFnsuOSJwXgvcC9wA7jvP6ViHh3uyebP38+Q0NDHQnMzKxfSPp1O8clfXwkaRA4ApjwX/9mZlYOqfsULgA+ADw7wTHHSFoj6SpJ81odIOkUSUOShtavX58kUDMzS5gUJC0FHo6I1RMcdh0wPyL2Bb4FXN7qoIi4JCIWRcSiOXMmfSRmZmZTlLJPYTGwTNISYCawg6QrIuJtowdExIYxx18KfCJhPGZm07J582aGh4d58sknex3KuGbOnMng4CADAwNTen+ypBARZwJnAkh6A/D+sQkh379LRDyUby4j65A2Myul4eFhtt9+e+bPn4+kXofzHBHBhg0bGB4eZvfdd5/SObpepyDpHEnL8s3TJP1U0o+B04Dl3Y7HzKxdTz75JDvttFMpEwKAJHbaaadp3cl0Y0gqEXErcGv+81lj9jfuJszqZtVdI5x/0308+Ogmdp09izMOX8BRC+f2OiybprImhFHTja8rScGs36y6a4Qzr17Lps1bABh5dBNnXr0WwInBSs3TXJglcP5N9zUSwqhNm7dw/k339Sgiq4sbb7yRBQsWsMcee3Deeed1/PxOCmYJPPjopkL7zdqxZcsWTj31VG644QbuueceVqxYwT333NPR3+HHR2YJ7Dp7FiMtEsCus2f1IBrrlU73K91xxx3ssccevPzlLwfghBNO4JprrmHPPffsVMi+UzBL4YzDFzBrYMY2+2YNzOCMwxf0KCLrttF+pZFHNxFs7VdaddfIpO8dz8jICPPmbZ34YXBwkJGRqZ+vFScFswSOWjiXc4/eh7mzZyFg7uxZnHv0Pu5k7iMp+pUi4jn7Oj0ayo+PzBI5auFcJ4E+lqJfaXBwkHXr1jW2h4eH2XXXXad8vlZ8p2BmlsB4/UfT6Vd69atfzS9+8Qt+9atf8fTTT3PllVeybNmyyd9YgJOCmVkCKfqVtttuOy666CIOP/xwXvWqV3Hcccex1157TTfUbX9HR89mZmbA1iLFTle1L1myhCVLlnQixJacFMzMEqliv5IfH5mZWYOTgpmZNTgpmJlZg5OCmZk1OCmYmVmDk4L1vVV3jbD4vFvY/YNfZ/F5t0xrbhqz1E466SRe8pKXsPfeeyc5v5OC9bUUk5aZpbR8+XJuvPHGZOd3UrC+5sVwLKk1K+Eze8PZs7Pva1ZO+5SHHHIIL37xizsQXGsuXrO+5sVwLJk1K+G602Bz/t/SxnXZNsC+x/Uurkn4TsH6WopJy8wAuPmcrQlh1OZN2f4Sc1KwvubFcCyZjcPF9peEHx9ZX0s1aZkZOw5mj4xa7S8xJwXre1WctMwq4LCztu1TABiYle2fhhNPPJFbb72V3/zmNwwODvLRj36Uk08+eZrBbuWkYD3T6UXNzUpltDP55nOyR0Y7DmYJYZqdzCtWrOhAcONzUrCeGK0PGB0OOlofADgxWH3se1ypRxq14o5m6wnXB5iVk5OC9YTrA6yqIqLXIUxouvE5KVhPuD7AqmjmzJls2LChtIkhItiwYQMzZ86c8jncp2A9ccbhC7bpUwDXB1j5DQ4OMjw8zPr163sdyrhmzpzJ4ODUh706KVhPuD7AqmhgYIDdd9+912EklTwpSJoBDAEjEbG06bUXAF8EDgA2AMdHxAOpY7JycH2AWfl0407hvcC9wA4tXjsZ+G1E7CHpBOATwPFdiMmsVFyzYWWRtKNZ0iBwBPC5cQ45Erg8//kq4DBJShmTWdl4TQcrk9Sjjy4APgA8O87rc4F1ABHxDLAR2ClxTGal4poNK5NkSUHSUuDhiFg90WEt9j1nrJekUyQNSRoqc6+/2VS4ZsPKJOWdwmJgmaQHgCuBQyVd0XTMMDAPQNJ2wI7AI80niohLImJRRCyaM2dOwpDNus81G1YmyZJCRJwZEYMRMR84AbglIt7WdNi1wF/mPx+bH1POqhCzRLymg5VJ1+sUJJ0DDEXEtcBlwJck3U92h3BCt+Mx6zXXbFiZqGr/MF+0aFEMDQ31Ogwzs0qRtDoiFk12nCuarXY+vGotK25fx5YIZkiceNA8/u6ofXodllklOClYrXx41VquuO0/GttbIhrbTgxmk/MsqVYrK25vsSbuBPvNbFtOClYrW8bpIxtvv5lty0nBamXGOLOkjLffzLblpGC1cuJB8wrtN7NtuaPZamW0M9mjj8ymxnUKZmZ9wHUK1hN/fukP+f4vt05ftfgVL+bL7zy4hxH1jtdIsCpyn4J1THNCAPj+Lx/hzy/9YY8i6h2vkWBV5aRgHdOcECbbX2deI8GqyknBLAGvkWBV5aRgloDXSLCqclKwjln8ihcX2l9nXiPBqspJwTrmy+88+DkJoF9HHx21cC7nHr0Pc2fPQsDc2bM49+h9PPrISs91CmZmfcB1CtYTqcbmFzmv6wPMps5JwTpmdGz+6FDM0bH5wLT+KBc5b6oYzPqF+xSsY1KNzS9yXtcHmE2Pk4J1TKqx+UXO6/oAs+lxUrCOSTU2v8h5XR9gNj1OCtYxqcbmFzmv6wPMpscdzdYxox25nR75U+S8qWIw6xeuUzAz6wOuUyipMoyhLxpDGWI2s+5wUuiiMoyhLxpDGWI2s+5xR3MXlWEMfdEYyhCzmXWPk0IXlWEMfdEYyhCzmXWPk0IXlWEMfdEYyhCzmXWPk0IXlWEMfdEYyhCzmXWPO5q7qAxj6IvGUIaYzax7ktUpSJoJfAd4AVnyuSoiPtJ0zHLgfGAk33VRRHxuovO6TsHMrLgy1Ck8BRwaEU9IGgC+J+mGiLit6bivRMS7E8Zh0/ThVWtZcfs6tkQwQ+LEg+bxd0ftM+1jy1L/UJY4zMpg0qQg6QXAMcD8scdHxDkTvS+yW5An8s2B/Kta5dPGh1et5Yrb/qOxvSWisd38x77IsWWpfyhLHGZl0U5H8zXAkcAzwO/GfE1K0gxJdwMPA9+MiNtbHHaMpDWSrpI0r824rUtW3L6u7f1Fji1L/UNZ4jAri3YeHw1GxJuncvKI2ALsJ2k28DVJe0fET8Ycch2wIiKekvRXwOXAoc3nkXQKcArAbrvtNpVQbIq2jNPn1Gp/kWPLUv9QljjMyqKdO4UfSGr9ULhNEfEocCvw5qb9GyLiqXzzUuCAcd5/SUQsiohFc+bMmU4oVtAMqe39RY4tS/1DWeIwK4txk4KktZLWAK8D7pR0X/6YZ3T/hCTNye8QkDQLeBPws6ZjdhmzuQy4dyqNsHROPKj1E71W+4scW5b6h7LEYVYWEz0+WjrNc+8CXC5pBlnyWRkR10s6BxiKiGuB0yQtI+uveARYPs3faR022kHczoiiIseWpf6hLHGYlcWkdQqSvhQRb59sX7e4TsHMrLhO1ins1XTiGYzz7N8ml2pMfJH6gJTnLtK+Kl6LylmzEm4+BzYOw46DcNhZsO9xvY7KSmzcpCDpTOBDwCxJj43uBp4GLulCbLWTakx8kfqAlOcu0r4qXovKWbMSrjsNNucjqTauy7bBicHGNW5Hc0ScGxHbA+dHxA751/YRsVNEnNnFGGsj1Zj4IvUBKc9dpH1VvBaVc/M5WxPCqM2bsv1m45joTmH//Mevjvm5ISLuTBZVTaUaE1+kPiDluYu0r4rXonI2Dhfbb8bEfQp/n3+fCSwCfkz2+Ghf4HayoapWwK6zZzHS4o/edMfEz5Ba/tEbr24g1bmLtK+K16JydhzMHhm12m82jokeH70xIt4I/BrYPy8eOwBYCNzfrQDrJNWY+CL1ASnPXaR9VbwWlXPYWTDQlGQHZmX7zcbRzuij/xoRa0c3IuInkvZLGFNtpRoTX6Q+IOW5i7SviteickY7kz36yApop05hBdkEeFeQzXL6NuBFEXFi+vCey3UKZmbFdbJO4R3Au4D35tvfAS6eRmxWMWWoPbCKc71EZUyaFCLiSeAz+Zf1mTLUHljFuV6iUiaaEG9l/n1tPhHeNl/dC9F6qQy1B1ZxrpeolInuFEYfF013YjyrsDLUHljFuV6iUiYakvpQ/uNhwPMj4tdjv7oTnvVakfUGvDaBtTReXYTrJUqpnUV25gP/JOmXklZKeo+HpPaPMtQeWMW5XqJS2uloPgsaC+W8EzgDuACYMdH7rB7KUHtgFed6iUppp07hw8Bi4EXAXcD3gO+OebzUVa5TMDMrrpN1CkeTrYz2deDbwG35MNVaSzXevsh5y7IugGsPSqbuY/7r3r4ienAt2nl8tL+k7ckmwPsT4FJJ/xkRtZ0QL9V4+yLnLcu6AK49KJm6j/mve/uK6NG1mLSjWdLeZFNb/CVwPDAM3JIsohJINd6+yHnLsi6Aaw9Kpu5j/uveviJ6dC3aeXz0CbLHRhcCP4qIzUkjKoFU4+2LnLcs6wK49qBk6j7mv+7tK6JH12LSO4WIOCIiPhkRP+iHhADpxtsXOe948/93e10A1x6UTN3H/Ne9fUX06Fq0U6fQd1KNty9y3rKsC+Dag5Kp+5j/ureviB5di3YeH/WdVOPti5y3LOsCuPagZOo+5r/u7SuiR9di0jqFsnGdgplZcdOuU5B0HdmiOi1FxLIpxtbXXP9gVhHXnw6rvwCxBTQDDlgOSz89/fOWvA5josdHn+paFH3C9Q9mFXH96TB02dbt2LJ1ezqJoQJ1GBPNkvrtib66GWRduP7BrCJWf6HY/nZVoA5j0o5mSa8EzgX2BGaO7o+IlyeMq5Zc/2BWEbGl2P52VaAOo50hqf9MtibzM8AbgS8CX0oZVF25/sGsIjTOJNDj7W9XBeow2kkKsyLiZrKRSr+OiLOBQ9OGVU+ufzCriAOWF9vfrgrUYbRTp/CkpOcBv5D0bmAEeEnasOrJ9Q9mFTHamdzp0UcVqMNoZz2FVwP3ArOBjwE7Ap+MiNvSh/dcrlMwMyuuY+spRMSP8hM+DzgtIh5vM4CZwHeAF+S/56qI+EjTMS8g66M4ANgAHB8RD7Rz/qKK1gdUbQ2BIrUHdb8WSceBFxm7niqOlO0r+Rj6aSnatjpfiwm0M/poEVln8/b59kbgpIhYPclbnwIOjYgnJA0A35N0Q9MdxsnAbyNiD0knkM3IevxUGjKRovUBVVtDoEjtQd2vRdJx4EXGrqeKI2X7KjCGfsqKtq3O12IS7XQ0fx74XxExPyLmA6eSJYkJReaJfHMg/2p+VnUkcHn+81XAYVLnh8EUrQ+o2hoCRWoP6n4tko4DLzJ2PVUcKdtXgTH0U1a0bXW+FpNoJyk8HhHfHd2IiO8B7T5CmiHpbuBh4JsRcXvTIXOBdfl5nwE2Aju1OM8pkoYkDa1fv76dX72NovUBVVtDoEjtQd2vRdJx4EXGrqeKI2X7KjCGfsqKtq3O12IS7SSFOyT9k6Q3SHq9pP8D3Cppf0n7T/TGiNgSEfsBg8CB+SpuY7W6K3jOX7KIuCQiFkXEojlz5rQR8raK1gdUbQ2BIrUHdb8WSceBFxm7niqOlO2rwBj6KSvatjpfi0m0kxT2A/4I+AhwNvAq4LXA39Pm/EgR8ShwK/DmppeGgXkAkrYjG9n0SDvnLKJofUDV1hAoUntQ92uRdBx4kbHrqeJI2b4KjKGfsqJtq/O1mEQ7o4/eOJUTS5oDbI6IRyXNAt5E1pE81rVkaz//EDgWuCUSzOVdtD6gamsIFKk9qPu1SDoOvMjY9VRxpGxfBcbQT1nRttX5WkyinTqFPwQ+DuwaEW+RtCdwcERcNsn79iXrRJ5BdkeyMiLOkXQOMBQR1+bDVr8ELCS7QzghIv59ovO6TsHMrLiO1SkAXyAbbfQ3+fbPga8AEyaFiFhD9se+ef9ZY35+EnhrGzGYmVkXtNOnsHNErASehcYooWlOFVh+q+4aYfF5t7D7B7/O4vNuYdVdI70OycpgzUr4zN5w9uzs+5qVnTk2laIxlKF9VTtvzbRzp/A7STuRjwqS9BqyoaO1VbmCLeuOIgVNZSh+SlmwVbXivDJ8HhXRzp3C6WQdwq+Q9H2yaSnekzSqHqtcwZZ1R5GCpjIUP6Us2KpacV4ZPo+KaGf00Z2SXg8sIKsruC8iNiePrIcqV7Bl3VGkoKkMxU8pC7aqVpxXhs+jIia9U5D0VrI1FX4KHAV8ZbKitaqrXMGWdUeRgqYyFD+lLNiqWnFeGT6Pimjn8dHfRsTjkl4HHE42zPTitGH1VuUKtqw7ihQ0laH4KWXBVtWK88rweVREO0lh9OH6EcDFEXEN8Px0IfXeUQvncu7R+zB39iwEzJ09i3OP3sedzP1u3+Pgzy6EHecByr7/2YWtOyqLHFuGeIsen6p9VTtvDbVTvHY92WprbyJb92ATcEdE/HH68J7LxWtmZsV1snjtOLI5iz6VT1mxC3DGdAM0q70iC/KURdViLstCOGWJowPaGX30e+DqMdsPAQ+lDMqs8oosyFMWVYu5LLUHZYmjQ9rpUzCzooosyFMWVYu5LLUHZYmjQ5wUzFIosiBPWVQt5rLUHpQljg5xUjBLociCPGVRtZjLUntQljg6xEnBLIUiC/KURdViLkvtQVni6BAnBbMUln4aFp289V/ZmpFtl7HDdlTVYi5L7UFZ4uiQSesUysZ1CmZmxXWyTsEsjSqO7U4Vc6r6gCpeY+spJwXrjSqO7U4Vc6r6gCpeY+s59ylYb1RxbHeqmFPVB1TxGlvPOSlYb1RxbHeqmFPVB1TxGlvPOSlYb1RxbHeqmFPVB1TxGlvPOSlYb1RxbHeqmFPVB1TxGlvPOSlYb1RxbHeqmFPVB1TxGlvPuU7BzKwPtFun4DsFszUr4TN7w9mzs+9rVnb/vKliMCvIdQrW31KN5S9yXtcTWIn4TsH6W6qx/EXO63oCKxEnBetvqcbyFzmv6wmsRJwUrL+lGstf5LyuJ7AScVKw/pZqLH+R87qewErEScH6W6qx/EXO63oCK5FkdQqS5gFfBF4KPAtcEhH/0HTMG4BrgF/lu66OiAl711ynYGZWXBnWU3gG+OuIuFPS9sBqSd+MiHuajvtuRCxNGId1UxXn7y8ScxXbVwa+bpWRLClExEPAQ/nPj0u6F5gLNCcFq4sqjrd3PUF6vm6V0pU+BUnzgYXA7S1ePljSjyXdIGmvbsRjiVRxvL3rCdLzdauU5BXNkl4E/Avwvoh4rOnlO4GXRcQTkpYAq4BXtjjHKcApALvttlviiG3Kqjje3vUE6fm6VUrSOwVJA2QJ4csRcXXz6xHxWEQ8kf/8DWBA0s4tjrskIhZFxKI5c+akDNmmo4rj7V1PkJ6vW6UkSwqSBFwG3BsRLecAlvTS/DgkHZjHsyFVTJZYFcfbu54gPV+3Skn5+Ggx8HZgraS7830fAnYDiIjPAscC75L0DLAJOCGqNpe3bTXaaVilUSZFYq5i+8rA161SvJ6CmVkfKEOdgpWVx4xv6/rTYfUXILZkq54dsHz6q56ZVZSTQr/xmPFtXX86DF22dTu2bN12YrA+5LmP+o3HjG9r9ReK7TerOSeFfuMx49uKLcX2m9Wck0K/8ZjxbWlGsf1mNeek0G88ZnxbBywvtt+s5pwU+o3n7t/W0k/DopO33hloRrbtTmbrU65TMDPrA65T6KJVd41w/k338eCjm9h19izOOHwBRy2c2+uwOqfudQ11b18Z+BpXhpPCNK26a4Qzr17Lps3ZaJWRRzdx5tVrAeqRGOpe11D39pWBr3GluE9hms6/6b5GQhi1afMWzr/pvh5F1GF1r2uoe/vKwNe4UpwUpunBRzcV2l85da9rqHv7ysDXuFKcFKZp19mzCu2vnLrXNdS9fWXga1wpTgrTdMbhC5g1sG2h06yBGZxx+IIeRdRhda9rqHv7ysDXuFLc0TxNo53JtR19VPe58OvevjLwNa4U1ymYmfWBdusU/PjIrM7WrITP7A1nz86+r1lZjXNbz/jxkVldpawPcO1BbflOwayuUtYHuPagtpwUzOoqZX2Aaw9qy0nBrK5S1ge49qC2nBTM6iplfYBrD2rLScGsrlKuneF1OWrLdQpmZn3AdQpmZlaYk4KZmTU4KZiZWYOTgpmZNTgpmJlZg5OCmZk1OCmYmVmDk4KZmTUkSwqS5kn6N0n3SvqppPe2OEaSLpR0v6Q1kvZPFY9Ng+fNN+sbKddTeAb464i4U9L2wGpJ34yIe8Yc8xbglfnXQcDF+XcrC8+bb9ZXkt0pRMRDEXFn/vPjwL1A88LFRwJfjMxtwGxJu6SKyabA8+ab9ZWu9ClImg8sBG5vemkusG7M9jDPTRxIOkXSkKSh9evXpwrTWvG8+WZ9JXlSkPQi4F+A90XEY80vt3jLc2boi4hLImJRRCyaM2dOijBtPJ4336yvJE0KkgbIEsKXI+LqFocMA/PGbA8CD6aMyQryvPlmfSXl6CMBlwH3RsSnxznsWuAv8lFIrwE2RsRDqWKyKfC8+WZ9JeXoo8XA24G1ku7O930I2A0gIj4LfANYAtwP/B54R8J4bKr2Pc5JwKxPJEsKEfE9WvcZjD0mgFNTxWBmZsW4otnMzBqcFMzMrMFJwczMGpwUzMyswUnBzMwanBTMzKzBScHMzBqUlQpUh6T1wK97Hcc4dgZ+0+sgEnL7qqvObQO3rx0vi4hJJ4+rXFIoM0lDEbGo13Gk4vZVV53bBm5fJ/nxkZmZNTgpmJlZg5NCZ13S6wASc/uqq85tA7evY9ynYGZmDb5TMDOzBieFKZA0Q9Jdkq5v8dpySesl3Z1//Y9exDgdkh6QtDaPf6jF65J0oaT7Ja2RtH8v4pyKNtr2Bkkbx3x+lVpiTtJsSVdJ+pmkeyUd3PR6ZT87aKt9lf38JC0YE/fdkh6T9L6mY5J/fikX2amz9wL3AjuM8/pXIuLdXYwnhTdGxHjjot8CvDL/Ogi4OP9eFRO1DeC7EbG0a9F01j8AN0bEsZKeD/xB0+tV/+wmax9U9POLiPuA/SD7hycwAnyt6bDkn5/vFAqSNAgcAXyu17H00JHAFyNzGzBb0i69DqrfSdoBOIRsGVwi4umIeLTpsMp+dm22ry4OA34ZEc2Fusk/PyeF4i4APgA8O8Exx+S3dldJmteluDopgH+VtFrSKS1enwusG7M9nO+rgsnaBnCwpB9LukHSXt0MbppeDqwH/jl/vPk5SS9sOqbKn1077YPqfn5jnQCsaLE/+efnpFCApKXAwxGxeoLDrgPmR8S+wLeAy7sSXGctjoj9yW5VT5V0SNPrrZZZrcowtsnadifZdAB/DPwjsKrbAU7DdsD+wMURsRD4HfDBpmOq/Nm1074qf34A5I/FlgFfbfVyi30d/fycFIpZDCyT9ABwJXCopCvGHhARGyLiqXzzUuCA7oY4fRHxYP79YbJnmgc2HTIMjL0DGgQe7E500zNZ2yLisYh4Iv/5G8CApJ27HujUDAPDEXF7vn0V2R/R5mMq+dnRRvsq/vmNegtwZ0T8Z4vXkn9+TgoFRMSZETEYEfPJbu9uiYi3jT2m6fneMrIO6cqQ9EJJ24/+DPwp8JOmw64F/iIfCfEaYGNEPNTlUAtrp22SXipJ+c8Hkv0/sqHbsU5FRPw/YJ2kBfmuw4B7mg6r5GcH7bWvyp/fGCfS+tERdOHz8+ijDpB0DjAUEdcCp0laBjwDPAIs72VsU/CHwNfy/6+2A/5vRNwo6a8AIuKzwDeAJcD9wO+Bd/Qo1qLaaduxwLskPQNsAk6IalV4vgf4cv4I4t+Bd9Tksxs1Wfsq/flJ+gPgT4D/OWZfVz8/VzSbmVmDHx+ZmVmDk4KZmTU4KZiZWYOTgpmZNTgpmJlZg5OCWUH5TJytZshtub8Dv+8oSXuO2b5VUm3XI7beclIwK7+jgD0nPcqsA5wUrHbyyuWv55Oi/UTS8fn+AyR9O58M76bR6vP8X94XSPpBfvyB+f4D83135d8XTPR7W8TweUk/yt9/ZL5/uaSrJd0o6ReSPjnmPSdL+nkez6WSLpL0WrLK+POVzbH/ivzwt0q6Iz/+v3Xo0pm5otlq6c3AgxFxBICkHSUNkE2QdmRErM8Txf8GTsrf88KIeG0+Qd7ngb2BnwGHRMQzkt4EfBw4ps0Y/oZsGpSTJM0G7pD0rfy1/YCFwFPAfZL+EdgC/C3ZXD6PA7cAP46IH0i6Frg+Iq7K2wOwXUQcKGkJ8BHgTVO5UGbNnBSsjtYCn5L0CbI/pt+VtDfZH/pv5n9UZwBj54xZARAR35G0Q/6HfHvgckmvJJuJcqBADH9KNnni+/PtmcBu+c83R8RGAEn3AC8Ddga+HRGP5Pu/CvzRBOe/Ov++GphfIC6zCTkpWO1ExM8lHUA2R8y5kv6VbEbUn0bEweO9rcX2x4B/i4j/Lmk+cGuBMAQck6+mtXWndBDZHcKoLWT/H7aaEnkio+cYfb9ZR7hPwWpH0q7A7yPiCuBTZI9k7gPmKF/TV9KAtl2AZbTf4XVkM09uBHYkWxIRik9seBPwnjEzdi6c5Pg7gNdL+i+StmPbx1SPk921mCXnf2FYHe1D1jH7LLAZeFdEPC3pWOBCSTuS/bd/AfDT/D2/lfQDsnW3R/sZPkn2+Oh0smf8RXwsP/+aPDE8AIy7bnBEjEj6OHA72fz49wAb85evBC6VdBrZLKBmyXiWVOt7km4F3h8RQz2O40UR8UR+p/A14PMR0bxwu1lSfnxkVh5nS7qbbOGfX1HBpSSt+nynYGZmDb45Mg6qAAAAJklEQVRTMDOzBicFMzNrcFIwM7MGJwUzM2twUjAzswYnBTMza/j/pgzYEDEDapEAAAAASUVORK5CYII=\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n", + "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n", + "plt.xlabel('sepal length')\n", + "plt.ylabel('sepal width')\n", + "plt.legend()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "data = np.array(df.iloc[:100, [0, 1, -1]])\n", + "X, y = data[:,:-1], data[:,-1]\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "class KNN:\n", + " def __init__(self, X_train, y_train, n_neighbors=3, p=2):\n", + " \"\"\"\n", + " parameter: n_neighbors 临近点个数\n", + " parameter: p 距离度量\n", + " \"\"\"\n", + " self.n = n_neighbors\n", + " self.p = p\n", + " self.X_train = X_train\n", + " self.y_train = y_train\n", + "\n", + " def predict(self, X):\n", + " # 取出n个点\n", + " knn_list = []\n", + " for i in range(self.n):\n", + " dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n", + " knn_list.append((dist, self.y_train[i]))\n", + "\n", + " for i in range(self.n, len(self.X_train)):\n", + " max_index = knn_list.index(max(knn_list, key=lambda x: x[0]))\n", + " dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n", + " if knn_list[max_index][0] > dist:\n", + " knn_list[max_index] = (dist, self.y_train[i])\n", + "\n", + " # 统计\n", + " knn = [k[-1] for k in knn_list]\n", + " count_pairs = Counter(knn)\n", + "# max_count = sorted(count_pairs, key=lambda x: x)[-1]\n", + " max_count = sorted(count_pairs.items(), key=lambda x: x[1])[-1][0]\n", + " return max_count\n", + "\n", + " def score(self, X_test, y_test):\n", + " right_count = 0\n", + " n = 10\n", + " for X, y in zip(X_test, y_test):\n", + " label = self.predict(X)\n", + " if label == y:\n", + " right_count += 1\n", + " return right_count / len(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "clf = KNN(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.95" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "clf.score(X_test, y_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Test Point: 1.0\n" + ] + } + ], + "source": [ + "test_point = [6.0, 3.0]\n", + "print('Test Point: {}'.format(clf.predict(test_point)))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3XucVXW9//HXx3ESUIRCOgkDjZfipwJxGW/hJS8dvJAa3tOK9Bc//VlWJiXZ8ULH1PCkx3ykB9MyRZQ4hJcSzQtpmdgAI6ho4kllBn9HQkER9AB+fn+sNZthu2dmr9l77b3W2u/n4zGPmfXda3/ns9bW+bDW+n6+X3N3REREALardgAiIpIcSgoiIpKjpCAiIjlKCiIikqOkICIiOUoKIiKSo6QgIiI5SgoiIpKjpCAiIjnbx/0LzKwOaAba3H1C3muTgOlAW9h0g7v/oqv+dtllF29sbIwhUhGR7Fq0aNE/3H1gd/vFnhSAbwHLgZ07ef1ud/9GsZ01NjbS3NxclsBERGqFmb1azH6x3j4yswbgWKDLf/2LiEgyxP1M4Trge8AHXexzopktNbM5Zjak0A5mNtnMms2sefXq1bEEKiIiMSYFM5sAvOHui7rY7T6g0d1HAg8DtxXayd1nuHuTuzcNHNjtLTEREemhOJ8pjAOOM7NjgF7AzmZ2h7uf2b6Du6/psP/NwNUxxiMiCbRp0yZaW1t57733qh1KJvTq1YuGhgbq6+t79P7YkoK7TwWmApjZ54ALOyaEsH1Xd3893DyO4IG0iNSQ1tZW+vbtS2NjI2ZW7XBSzd1Zs2YNra2t7Lbbbj3qo+J1CmY2zcyOCzfPN7PnzOwZ4HxgUqXjEZHqeu+99xgwYIASQhmYGQMGDCjpqqsSQ1Jx9wXAgvDnSzq0564mRLJm3pI2pj/4IqvWbmRQ/95MGT+ME0YPrnZYiaSEUD6lnsuKJAWRWjNvSRtT5y5j46YtALSt3cjUucsAlBgk0TTNhUgMpj/4Yi4htNu4aQvTH3yxShGJFEdJQSQGq9ZujNQuxZs5ExobYbvtgu8zZ5bW39q1a/n5z3/eo/ded911bNiwobQA8lxyySU8/PDDXe6zYMECnnzyybL+3nZKCiIxGNS/d6R2Kc7MmTB5Mrz6KrgH3ydPLi0xJC0pTJs2jSOPPLLLfZQURFJmyvhh9K6v26atd30dU8YPq1JE2XDxxZD/N3jDhqC9py666CJefvllRo0axZQpU5g+fTr77rsvI0eO5NJLLwXg3Xff5dhjj+Uzn/kMw4cP5+677+b6669n1apVHHbYYRx22GGd9r/TTjvx3e9+lzFjxnDEEUfQPitDS0sLBxxwACNHjuSLX/wib731FgCTJk1izpw5QDDX26WXXsqYMWMYMWIEL7zwAq+88go33XQT1157LaNGjeKJJ57o+cEXoKQgEoMTRg/myokjGNy/NwYM7t+bKyeO0EPmEr32WrT2Ylx11VXssccetLS08PnPf56XXnqJp59+mpaWFhYtWsTjjz/O/PnzGTRoEM888wzPPvssRx11FOeffz6DBg3iscce47HHHuu0/3fffZcxY8awePFiDj30UC6//HIAvvKVr3D11VezdOlSRowYkWvPt8suu7B48WLOPfdcrrnmGhobGznnnHP4zne+Q0tLCwcffHDPD74AjT4SickJowcrCZTZ0KHBLaNC7eXw0EMP8dBDDzF69GgA1q9fz0svvcTBBx/MhRdeyPe//30mTJgQ6Q/xdtttx6mnngrAmWeeycSJE1m3bh1r167l0EMPBeCrX/0qJ598csH3T5w4EYCxY8cyd+7cUg6vuHhj/w0iImVyxRXQp8+2bX36BO3l4O5MnTqVlpYWWlpaWLFiBWeffTaf/vSnWbRoESNGjGDq1KlMmzatx78jah3BDjvsAEBdXR2bN2/u8e8tlpKCiKTGGWfAjBnwyU+CWfB9xoygvaf69u3LO++8A8D48eO59dZbWb9+PQBtbW288cYbrFq1ij59+nDmmWdy4YUXsnjx4g+9tzMffPBB7hnBnXfeyUEHHUS/fv346Ec/mnsecPvtt+euGqLGXG66fSQiqXLGGaUlgXwDBgxg3LhxDB8+nKOPPpovfelLHHjggUDwkPiOO+5gxYoVTJkyhe222476+npuvPFGACZPnszRRx/Nrrvu2ulzhR133JHnnnuOsWPH0q9fP+6++24AbrvtNs455xw2bNjA7rvvzi9/+cuiY/7CF77ASSedxD333MPPfvazsj5XMHcvW2eV0NTU5Fp5TSQ7li9fzl577VXtMGKz00475a48KqXQOTWzRe7e1N17dftIRERydPtIRKQM9t9/f95///1t2m6//faKXyWUSklBRKQMFi5cWO0QykK3j0REJEdJQUREcnT7SGqeFsMR2UpXClLT2hfDaVu7EWfrYjjzlrRVOzSpoPnz5zNs2DD23HNPrrrqqmqHU1VKClLTtBiObNmyhfPOO48HHniA559/nlmzZvH8889XO6yq0e0jqWlaDCd9yn277+mnn2bPPfdk9913B+C0007jnnvuYe+99y5XyKmiKwWpaVoMJ13iuN3X1tbGkCFDctsNDQ20tdXu7UMlBalpWgwnXeK43Vdoqp+oM5lmiW4fSU1rv+2g0UfpEMftvoaGBlauXJnbbm1tZdCgQT3uL+2UFKTmaTGc9BjUvzdtBRJAKbf79t13X1566SX+/ve/M3jwYO666y7uvPPOUsJMNd0+kqqZt6SNcVc9ym4X/Y5xVz2qYaDSrThu922//fbccMMNjB8/nr322otTTjmFffbZp9RQU0tXClIV7Q8M2+8Ptz8wBPSvdulUXLf7jjnmGI455phyhJh6SgpSFV09MFRSkK7odl+8dPtIqkL1ASLJpKQgVaH6AJFkUlKQqlB9gEgy6ZmCVIXqA0SSKfakYGZ1QDPQ5u4T8l7bAfg1MBZYA5zq7q/EHZMkgx4YiiRPJW4ffQtY3slrZwNvufuewLXA1RWIRyRxVLNRXWeddRYf//jHGT58eLVDqbpYk4KZNQDHAr/oZJfjgdvCn+cAR1gtTzoiNUlrOlTfpEmTmD9/frXDSIS4rxSuA74HfNDJ64OBlQDuvhlYBwyIOSaRRNGaDhEtnQ3XDofL+gffl84uuctDDjmEj33sY2UILv1iSwpmNgF4w90XdbVbgbYPTVloZpPNrNnMmlevXl22GEWSQDUbESydDfedD+tWAh58v+/8siQGCcR5pTAOOM7MXgHuAg43szvy9mkFhgCY2fZAP+DN/I7cfYa7N7l708CBA2MMWaTyVLMRwSPTYFNesty0MWiXsogtKbj7VHdvcPdG4DTgUXc/M2+3e4Gvhj+fFO7z4cnNRTJMNRsRrGuN1i6RVbxOwcymAc3ufi9wC3C7ma0guEI4rdLxiFSbajYi6NcQ3joq0C5lUZGk4O4LgAXhz5d0aH8POLkSMYgkmWo2inTEJcEzhI63kOp7B+0lOP3001mwYAH/+Mc/aGho4PLLL+fss88uMdh0UkWzZM4P5y1j1sKVbHGnzozT9x/Cv54wotphSTmMPCX4/si04JZRv4YgIbS399CsWbPKEFw2KClIpvxw3jLueOq13PYW99y2EkNGjDyl5CQgndOEeJIpsxYWuN/cRbuIbEtJQTJlSyeD1zprl2TQoMPyKfVcKilIptR1MktKZ+1Sfb169WLNmjVKDGXg7qxZs4ZevXr1uA89U5BMOX3/Ids8U+jYLsnU0NBAa2srmq2gPHr16kVDQ8+H6CopSKa0P0zW6KP0qK+vZ7fddqt2GBKytF2yNTU1eXNzc7XDEBFJFTNb5O5N3e2nKwUpqzNu/gt/fnnr9FXj9vgYM79+YBUjqp55S9pUpSypowfNUjb5CQHgzy+/yRk3/6VKEVWP1kiQtFJSkLLJTwjdtWeZ1kiQtFJSEImB1kiQtFJSEImB1kiQtFJSkLIZt0fh5Qw7a88yrZEgaaWkIGUz8+sHfigB1OrooxNGD+bKiSMY3L83Bgzu35srJ47Q6CNJPNUpiIjUANUpSFXENTY/Sr+qDxDpOSUFKZv2sfntQzHbx+YDJf1RjtJvXDGI1Ao9U5CyiWtsfpR+VR8gUholBSmbuMbmR+lX9QEipVFSkLKJa2x+lH5VHyBSGiUFKZu4xuZH6Vf1ASKl0YNmKZv2B7nlHvkTpd+4YhCpFapTEBGpAapTSKgkjKGPGkMSYhaRylBSqKAkjKGPGkMSYhaRytGD5gpKwhj6qDEkIWYRqRwlhQpKwhj6qDEkIWYRqRwlhQpKwhj6qDEkIWYRqRwlhQpKwhj6qDEkIWYRqRw9aK6gJIyhjxpDEmIWkcqJrU7BzHoBjwM7ECSfOe5+ad4+k4DpQFvYdIO7/6KrflWnICISXRLqFN4HDnf39WZWD/zJzB5w96fy9rvb3b8RYxxSoh/OW8ashSvZ4k6dGafvP4R/PWFEyfsmpf4hKXGIJEG3ScHMdgBOBBo77u/u07p6nweXIOvDzfrwK13l08IP5y3jjqdey21vcc9t5/+xj7JvUuofkhKHSFIU86D5HuB4YDPwboevbplZnZm1AG8Af3D3hQV2O9HMlprZHDMbUmTcUiGzFq4suj3Kvkmpf0hKHCJJUcztowZ3P6onnbv7FmCUmfUHfmtmw9392Q673AfMcvf3zewc4Dbg8Px+zGwyMBlg6NChPQlFemhLJ8+cCrVH2Tcp9Q9JiUMkKYq5UnjSzArfFC6Su68FFgBH5bWvcff3w82bgbGdvH+Guze5e9PAgQNLCUUiqjMruj3Kvkmpf0hKHCJJ0WlSMLNlZrYUOAhYbGYvhrd52tu7ZGYDwysEzKw3cCTwQt4+u3bYPA5Y3pODkPicvn/hO3qF2qPsm5T6h6TEIZIUXd0+mlBi37sCt5lZHUHyme3u95vZNKDZ3e8Fzjez4wieV7wJTCrxd0qZtT8gLmZEUZR9k1L/kJQ4RJKi2zoFM7vd3b/cXVulqE5BRCS6ctYp7JPXcR2d3PuX7sU1Jj5KfUCcfUc5vjSei9RZOhsemQbrWqFfAxxxCYw8pdpRSYJ1mhTMbCrwA6C3mb3d3gz8DzCjArFlTlxj4qPUB8TZd5TjS+O5SJ2ls+G+82FTOJJq3cpgG5QYpFOdPmh29yvdvS8w3d13Dr/6uvsAd59awRgzI64x8VHqA+LsO8rxpfFcpM4j07YmhHabNgbtIp3o6kphTPjjbzr8nOPui2OLKqPiGhMfpT4gzr6jHF8az0XqrGuN1i5C188U/i383gtoAp4huH00ElhIMFRVIhjUvzdtBf7olTomvs6s4B+9zuoG4uo7yvGl8VykTr+G4JZRoXaRTnR1++gwdz8MeBUYExaPjQVGAysqFWCWxDUmPkp9QJx9Rzm+NJ6L1DniEqjPS7L1vYN2kU4UM/rof7n7svYNd3/WzEbFGFNmxTUmPkp9QJx9Rzm+NJ6L1Gl/mKzRRxJBMXUKswgmwLuDYJbTM4Gd3P30+MP7MNUpiIhEV846ha8B5wLfCrcfB24sITZJmSTUHkjKqV4iNbpNCu7+HnBt+CU1Jgm1B5JyqpdIla4mxJsdfl8WToS3zVflQpRqSkLtgaSc6iVSpasrhfbbRaVOjCcploTaA0k51UukSldDUl8PfzwC+Ii7v9rxqzLhSbVFWW9AaxNIQZ3VRaheIpGKWWSnEfgPM3vZzGab2Tc1JLV2JKH2QFJO9RKpUsyD5ksgt1DO14EpwHVAXVfvk2xIQu2BpJzqJVKlmDqFHwLjgJ2AJcCfgCc63F6qKNUpiIhEV846hYkEK6P9Dvgj8FQ4TDXT4hpvH6XfpKwLoNqDhMn6mP+sH18UVTgXxdw+GmNmfQkmwPs8cLOZ/be7Z3ZCvLjG20fpNynrAqj2IGGyPuY/68cXRZXORbcPms1sOMHUFl8FTgVagUdjiygB4hpvH6XfpKwLoNqDhMn6mP+sH18UVToXxdw+uprgttH1wF/dfVOsESVAXOPto/SblHUBVHuQMFkf85/144uiSuei2ysFdz/W3X/i7k/WQkKA+MbbR+m3s/n/K70ugGoPEibrY/6zfnxRVOlcFFOnUHPiGm8fpd+krAug2oOEyfqY/6wfXxRVOhfF3D6qOXGNt4/Sb1LWBVDtQcJkfcx/1o8viiqdi27rFJJGdQoiItGVXKdgZvcRLKpTkLsf18PYaprqH0RS4v4LYNGvwLeA1cHYSTDhp6X3m/A6jK5uH11TsShqhOofRFLi/gug+Zat275l63YpiSEFdRhdzZL6x66+KhlkVqj+QSQlFv0qWnuxUlCH0e2DZjP7FHAlsDfQq73d3XePMa5MUv2DSEr4lmjtxUpBHUYxQ1J/SbAm82bgMODXwO1xBpVVqn8QSQnrZBLoztqLlYI6jGKSQm93f4RgpNKr7n4ZcHi8YWWT6h9EUmLspGjtxUpBHUYxdQrvmdl2wEtm9g2gDfh4vGFlk+ofRFKi/WFyuUcfpaAOo5j1FPYFlgP9gR8B/YCfuPtT8Yf3YapTEBGJrmzrKbj7X8MOtwPOd/d3igygF/A4sEP4e+a4+6V5++xA8IxiLLAGONXdXymm/6ii1gekbQ2BKLUHWT8XsY4DjzJ2Pa44IvQ7cyZcfDG89hoMHQpXXAFnnFGevlMn6rFl+Vx0oZjRR00ED5v7htvrgLPcfVE3b30fONzd15tZPfAnM3sg7wrjbOAtd9/TzE4jmJH11J4cSFei1gekbQ2BKLUHWT8XsY4DjzJ2Pa44IvQ7cyZMngwbNgTbr74abEMniSEFY+h7LOqxZflcdKOYB823Av/X3RvdvRE4jyBJdMkD68PN+vAr/17V8cBt4c9zgCPMyj8MJmp9QNrWEIhSe5D1cxHrOPAoY9fjiiNCvxdfvDUhtNuwIWgvte/UiXpsWT4X3SgmKbzj7k+0b7j7n4BibyHVmVkL8AbwB3dfmLfLYGBl2O9mYB0woEA/k82s2cyaV69eXcyv3kbU+oC0rSEQpfYg6+ci1nHgUcauxxVHhH5fe63Afl20p2EMfY9FPbYsn4tuFJMUnjaz/zCzz5nZoWb2c2CBmY0xszFdvdHdt7j7KKAB2C9cxa2jQlcFH/pL5u4z3L3J3ZsGDhxYRMjbilofkLY1BKLUHmT9XMQ6DjzK2PW44ojQ79ChhXftrD0NY+h7LOqxZflcdKOYpDAK+DRwKXAZsBfwWeDfKHJ+JHdfCywAjsp7qRUYAmBm2xOMbHqzmD6jiFofkLY1BKLUHmT9XMQ6DjzK2PW44ojQ7xVXQJ8+27b16RO0l9p36kQ9tiyfi24UM/rosJ50bGYDgU3uvtbMegNHEjxI7uhegrWf/wKcBDzqMczlHbU+IG1rCESpPcj6uYh1HHiUsetxxRGh3/aHyUWPPkrBGPoei3psWT4X3SimTuGfgB8Dg9z9aDPbGzjQ3W/p5n0jCR4i1xFckcx292lmNg1odvd7w2GrtwOjCa4QTnP3/+qqX9UpiIhEV7Y6BeBXBKON2scs/A24G+gyKbj7UoI/9vntl3T4+T3g5CJiEBGRCijmmcIu7j4b+AByo4RKnCow+eYtaWPcVY+y20W/Y9xVjzJvSVu1Q5IkWDobrh0Ol/UPvi+dXZ594xI1hiQcX9r6zZhirhTeNbMBhKOCzOwAgqGjmZW6gi2pjCgFTUkofoqzYCsBxXmJ6DeDirlSuIDggfAeZvZngmkpvhlrVFWWuoItqYwoBU1JKH6Ks2ArAcV5ieg3g4oZfbTYzA4FhhHUFbzo7ptij6yKUlewJZURpaApCcVPcRZsJaA4LxH9ZlC3VwpmdjLBmgrPAScAd3dXtJZ2qSvYksqIUtCUhOKnOAu2ElCcl4h+M6iY20f/4u7vmNlBwHiCYaY3xhtWdaWuYEsqI0pBUxKKn+Is2EpAcV4i+s2gYpJC+831Y4Eb3f0e4CPxhVR9J4wezJUTRzC4f28MGNy/N1dOHKGHzLVu5Cnwheuh3xDAgu9fuL7wg8oo+yYh3qj7x3V8aes3g4opXrufYLW1IwnWPdgIPO3un4k/vA9T8ZqISHTlLF47hWDOomvCKSt2BaaUGqBI5kVZkCcp0hZzUhbCSUocZVDM6KMNwNwO268Dr8cZlEjqRVmQJynSFnNSag+SEkeZFPNMQUSiirIgT1KkLeak1B4kJY4yUVIQiUOUBXmSIm0xJ6X2IClxlImSgkgcoizIkxRpizkptQdJiaNMlBRE4hBlQZ6kSFvMSak9SEocZaKkIBKHCT+FprO3/ivb6oLtJD6wbZe2mJNSe5CUOMqk2zqFpFGdgohIdOWsUxCJRxrHdscVc1z1AWk8x1JVSgpSHWkc2x1XzHHVB6TxHEvV6ZmCVEcax3bHFXNc9QFpPMdSdUoKUh1pHNsdV8xx1Qek8RxL1SkpSHWkcWx3XDHHVR+QxnMsVaekINWRxrHdccUcV31AGs+xVJ2SglRHGsd2xxVzXPUBaTzHUnWqUxARqQHF1inoSkFk6Wy4djhc1j/4vnR25fuNKwaRiFSnILUtrrH8UfpVPYEkiK4UpLbFNZY/Sr+qJ5AEUVKQ2hbXWP4o/aqeQBJESUFqW1xj+aP0q3oCSRAlBaltcY3lj9Kv6gkkQZQUpLbFNZY/Sr+qJ5AEia1OwcyGAL8GPgF8AMxw93/P2+dzwD3A38Omue7e5dM11SmIiESXhPUUNgPfdffFZtYXWGRmf3D35/P2e8LdJ8QYh1RSGufvjxJzGo8vCXTeUiO2pODurwOvhz+/Y2bLgcFAflKQrEjjeHvVE8RP5y1VKvJMwcwagdHAwgIvH2hmz5jZA2a2TyXikZikcby96gnip/OWKrFXNJvZTsB/At9297fzXl4MfNLd15vZMcA84FMF+pgMTAYYOnRozBFLj6VxvL3qCeKn85YqsV4pmFk9QUKY6e5z819397fdfX348++BejPbpcB+M9y9yd2bBg4cGGfIUoo0jrdXPUH8dN5SJbakYGYG3AIsd/eCcwCb2SfC/TCz/cJ41sQVk8QsjePtVU8QP523VInz9tE44MvAMjNrCdt+AAwFcPebgJOAc81sM7AROM3TNpe3bNX+0DBNo0yixJzG40sCnbdU0XoKIiI1IAl1CpJUGjO+rfsvgEW/At8SrHo2dlLpq56JpJSSQq3RmPFt3X8BNN+yddu3bN1WYpAapLmPao3GjG9r0a+itYtknJJCrdGY8W35lmjtIhmnpFBrNGZ8W1YXrV0k45QUao3GjG9r7KRo7SIZp6RQazR3/7Ym/BSazt56ZWB1wbYeMkuNUp2CiEgNUJ1CBc1b0sb0B19k1dqNDOrfmynjh3HC6MHVDqt8sl7XkPXjSwKd49RQUijRvCVtTJ27jI2bgtEqbWs3MnXuMoBsJIas1zVk/fiSQOc4VfRMoUTTH3wxlxDabdy0hekPvliliMos63UNWT++JNA5ThUlhRKtWrsxUnvqZL2uIevHlwQ6x6mipFCiQf17R2pPnazXNWT9+JJA5zhVlBRKNGX8MHrXb1vo1Lu+jinjh1UpojLLel1D1o8vCXSOU0UPmkvU/jA5s6OPsj4XftaPLwl0jlNFdQoiIjWg2DoF3T4SybKls+Ha4XBZ/+D70tnp6FuqRrePRLIqzvoA1R5klq4URLIqzvoA1R5klpKCSFbFWR+g2oPMUlIQyao46wNUe5BZSgoiWRVnfYBqDzJLSUEkq+JcO0PrcmSW6hRERGqA6hRERCQyJQUREclRUhARkRwlBRERyVFSEBGRHCUFERHJUVIQEZEcJQUREcmJLSmY2RAze8zMlpvZc2b2rQL7mJldb2YrzGypmY2JKx4pgebNF6kZca6nsBn4rrsvNrO+wCIz+4O7P99hn6OBT4Vf+wM3ht8lKTRvvkhNie1Kwd1fd/fF4c/vAMuB/IWLjwd+7YGngP5mtmtcMUkPaN58kZpSkWcKZtYIjAYW5r00GFjZYbuVDycOzGyymTWbWfPq1avjClMK0bz5IjUl9qRgZjsB/wl8293fzn+5wFs+NEOfu89w9yZ3bxo4cGAcYUpnNG++SE2JNSmYWT1BQpjp7nML7NIKDOmw3QCsijMmiUjz5ovUlDhHHxlwC7Dc3X/ayW73Al8JRyEdAKxz99fjikl6QPPmi9SUOEcfjQO+DCwzs5aw7QfAUAB3vwn4PXAMsALYAHwtxnikp0aeoiQgUiNiSwru/icKPzPouI8D58UVg4iIRKOKZhERyVFSEBGRHCUFERHJUVIQEZEcJQUREclRUhARkRwlBRERybGgVCA9zGw18Gq14+jELsA/qh1EjHR86ZXlYwMdXzE+6e7dTh6XuqSQZGbW7O5N1Y4jLjq+9MrysYGOr5x0+0hERHKUFEREJEdJobxmVDuAmOn40ivLxwY6vrLRMwUREcnRlYKIiOQoKfSAmdWZ2RIzu7/Aa5PMbLWZtYRf/7saMZbCzF4xs2Vh/M0FXjczu97MVpjZUjMbU404e6KIY/ucma3r8Pmlaok5M+tvZnPM7AUzW25mB+a9ntrPDoo6vtR+fmY2rEPcLWb2tpl9O2+f2D+/OBfZybJvAcuBnTt5/W53/0YF44nDYe7e2bjoo4FPhV/7AzeG39Oiq2MDeMLdJ1QsmvL6d2C+u59kZh8B+uS9nvbPrrvjg5R+fu7+IjAKgn94Am3Ab/N2i/3z05VCRGbWABwL/KLasVTR8cCvPfAU0N/Mdq12ULXOzHYGDiFYBhd3/x93X5u3W2o/uyKPLyuOAF529/xC3dg/PyWF6K4Dvgd80MU+J4aXdnPMbEiF4ionBx4ys0VmNrnA64OBlR22W8O2NOju2AAONLNnzOwBM9unksGVaHdgNfDL8PbmL8xsx7x90vzZFXN8kN7Pr6PTgFkF2mP//JQUIjCzCcAb7r6oi93uAxrdfSTwMHBbRYIrr3HuPobgUvU8Mzsk7/VCy6ymZRhbd8e2mGA6gM8APwPmVTrAEmwPjAFudPfRwLvARXn7pPmzK+b40vz5ARDeFjsO+E2hlwu0lfXzU1KIZhxwnJm9AtwFHG5md3Tcwd3XuPv74ebNwNjKhlg6d18Vfn+D4J7mfnm7tAIdr4AagFWABkzgAAAEB0lEQVSVia403R2bu7/t7uvDn38P1JvZLhUPtGdagVZ3XxhuzyH4I5q/Tyo/O4o4vpR/fu2OBha7+38XeC32z09JIQJ3n+ruDe7eSHB596i7n9lxn7z7e8cRPJBODTPb0cz6tv8M/DPwbN5u9wJfCUdCHACsc/fXKxxqZMUcm5l9wsws/Hk/gv9H1lQ61p5w9/8HrDSzYWHTEcDzebul8rOD4o4vzZ9fB6dT+NYRVODz0+ijMjCzaUCzu98LnG9mxwGbgTeBSdWMrQf+Cfht+P/V9sCd7j7fzM4BcPebgN8DxwArgA3A16oUa1TFHNtJwLlmthnYCJzm6arw/CYwM7wF8V/A1zLy2bXr7vhS/fmZWR/g88D/6dBW0c9PFc0iIpKj20ciIpKjpCAiIjlKCiIikqOkICIiOUoKIiKSo6QgElE4E2ehGXILtpfh951gZnt32F5gZpldj1iqS0lBJPlOAPbudi+RMlBSkMwJK5d/F06K9qyZnRq2jzWzP4aT4T3YXn0e/sv7OjN7Mtx/v7B9v7BtSfh9WFe/t0AMt5rZX8P3Hx+2TzKzuWY238xeMrOfdHjP2Wb2tzCem83sBjP7LEFl/HQL5tjfI9z9ZDN7Otz/4DKdOhFVNEsmHQWscvdjAcysn5nVE0yQdry7rw4TxRXAWeF7dnT3z4YT5N0KDAdeAA5x981mdiTwY+DEImO4mGAalLPMrD/wtJk9HL42ChgNvA+8aGY/A7YA/0Iwl887wKPAM+7+pJndC9zv7nPC4wHY3t33M7NjgEuBI3tyokTyKSlIFi0DrjGzqwn+mD5hZsMJ/tD/IfyjWgd0nDNmFoC7P25mO4d/yPsCt5nZpwhmoqyPEMM/E0yeeGG43QsYGv78iLuvAzCz54FPArsAf3T3N8P23wCf7qL/ueH3RUBjhLhEuqSkIJnj7n8zs7EEc8RcaWYPEcyI+py7H9jZ2wps/wh4zN2/aGaNwIIIYRhwYria1tZGs/0JrhDabSH4/7DQlMhdae+j/f0iZaFnCpI5ZjYI2ODudwDXENySeREYaOGavmZWb9suwNL+3OEggpkn1wH9CJZEhOgTGz4IfLPDjJ2ju9n/aeBQM/uomW3Ptrep3iG4ahGJnf6FIVk0guDB7AfAJuBcd/8fMzsJuN7M+hH8t38d8Fz4nrfM7EmCdbfbnzP8hOD20QUE9/ij+FHY/9IwMbwCdLpusLu3mdmPgYUE8+M/D6wLX74LuNnMzieYBVQkNpolVWqemS0ALnT35irHsZO7rw+vFH4L3Oru+Qu3i8RKt49EkuMyM2shWPjn76RwKUlJP10piIhIjq4UREQkR0lBRERylBRERCRHSUFERHKUFEREJEdJQUREcv4/C4hKF413pBoAAAAASUVORK5CYII=\n", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n", + "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n", + "plt.plot(test_point[0], test_point[1], 'bo', label='test_point')\n", + "plt.xlabel('sepal length')\n", + "plt.ylabel('sepal width')\n", + "plt.legend()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### scikit-learn实例" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.neighbors import KNeighborsClassifier" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", + " metric_params=None, n_jobs=None, n_neighbors=5, p=2,\n", + " weights='uniform')" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "clf_sk = KNeighborsClassifier()\n", + "clf_sk.fit(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.95" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "clf_sk.score(X_test, y_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "### sklearn.neighbors.KNeighborsClassifier\n", + "\n", + "- n_neighbors: 临近点个数\n", + "- p: 距离度量\n", + "- algorithm: 近邻算法,可选{'auto', 'ball_tree', 'kd_tree', 'brute'}\n", + "- weights: 确定近邻的权重" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### kd树" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**kd**树是一种对k维空间中的实例点进行存储以便对其进行快速检索的树形数据结构。\n", + "\n", + "**kd**树是二叉树,表示对$k$维空间的一个划分(partition)。构造**kd**树相当于不断地用垂直于坐标轴的超平面将$k$维空间切分,构成一系列的k维超矩形区域。kd树的每个结点对应于一个$k$维超矩形区域。\n", + "\n", + "构造**kd**树的方法如下:\n", + "\n", + "构造根结点,使根结点对应于$k$维空间中包含所有实例点的超矩形区域;通过下面的递归方法,不断地对$k$维空间进行切分,生成子结点。在超矩形区域(结点)上选择一个坐标轴和在此坐标轴上的一个切分点,确定一个超平面,这个超平面通过选定的切分点并垂直于选定的坐标轴,将当前超矩形区域切分为左右两个子区域\n", + "(子结点);这时,实例被分到两个子区域。这个过程直到子区域内没有实例时终止(终止时的结点为叶结点)。在此过程中,将实例保存在相应的结点上。\n", + "\n", + "通常,依次选择坐标轴对空间切分,选择训练实例点在选定坐标轴上的中位数\n", + "(median)为切分点,这样得到的**kd**树是平衡的。注意,平衡的**kd**树搜索时的效率未必是最优的。\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 构造平衡kd树算法\n", + "输入:$k$维空间数据集$T=\\{x_1,x_2,…,x_N\\}$,\n", + "\n", + "其中$x_{i}=\\left(x_{i}^{(1)}, x_{i}^{(2)}, \\cdots, x_{i}^{(k)}\\right)^{\\mathrm{T}}$ ,$i=1,2,…,N$;\n", + "\n", + "输出:**kd**树。\n", + "\n", + "(1)开始:构造根结点,根结点对应于包含$T$的$k$维空间的超矩形区域。\n", + "\n", + "选择$x^{(1)}$为坐标轴,以T中所有实例的$x^{(1)}$坐标的中位数为切分点,将根结点对应的超矩形区域切分为两个子区域。切分由通过切分点并与坐标轴$x^{(1)}$垂直的超平面实现。\n", + "\n", + "由根结点生成深度为1的左、右子结点:左子结点对应坐标$x^{(1)}$小于切分点的子区域, 右子结点对应于坐标$x^{(1)}$大于切分点的子区域。\n", + "\n", + "将落在切分超平面上的实例点保存在根结点。\n", + "\n", + "(2)重复:对深度为$j$的结点,选择$x^{(1)}$为切分的坐标轴,$l=j(modk)+1$,以该结点的区域中所有实例的$x^{(1)}$坐标的中位数为切分点,将该结点对应的超矩形区域切分为两个子区域。切分由通过切分点并与坐标轴$x^{(1)}$垂直的超平面实现。\n", + "\n", + "由该结点生成深度为$j+1$的左、右子结点:左子结点对应坐标$x^{(1)}$小于切分点的子区域,右子结点对应坐标$x^{(1)}$大于切分点的子区域。\n", + "\n", + "将落在切分超平面上的实例点保存在该结点。\n", + "\n", + "(3)直到两个子区域没有实例存在时停止。从而形成**kd**树的区域划分。" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# kd-tree每个结点中主要包含的数据结构如下\n", + "class KdNode(object):\n", + " def __init__(self, dom_elt, split, left, right):\n", + " self.dom_elt = dom_elt # k维向量节点(k维空间中的一个样本点)\n", + " self.split = split # 整数(进行分割维度的序号)\n", + " self.left = left # 该结点分割超平面左子空间构成的kd-tree\n", + " self.right = right # 该结点分割超平面右子空间构成的kd-tree\n", + "\n", + "\n", + "class KdTree(object):\n", + " def __init__(self, data):\n", + " k = len(data[0]) # 数据维度\n", + "\n", + " def CreateNode(split, data_set): # 按第split维划分数据集exset创建KdNode\n", + " if not data_set: # 数据集为空\n", + " return None\n", + " # key参数的值为一个函数,此函数只有一个参数且返回一个值用来进行比较\n", + " # operator模块提供的itemgetter函数用于获取对象的哪些维的数据,参数为需要获取的数据在对象中的序号\n", + " #data_set.sort(key=itemgetter(split)) # 按要进行分割的那一维数据排序\n", + " data_set.sort(key=lambda x: x[split])\n", + " split_pos = len(data_set) // 2 # //为Python中的整数除法\n", + " median = data_set[split_pos] # 中位数分割点\n", + " split_next = (split + 1) % k # cycle coordinates\n", + "\n", + " # 递归的创建kd树\n", + " return KdNode(\n", + " median,\n", + " split,\n", + " CreateNode(split_next, data_set[:split_pos]), # 创建左子树\n", + " CreateNode(split_next, data_set[split_pos + 1:])) # 创建右子树\n", + "\n", + " self.root = CreateNode(0, data) # 从第0维分量开始构建kd树,返回根节点\n", + "\n", + "\n", + "# KDTree的前序遍历\n", + "def preorder(root):\n", + " print(root.dom_elt)\n", + " if root.left: # 节点不为空\n", + " preorder(root.left)\n", + " if root.right:\n", + " preorder(root.right)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "# 对构建好的kd树进行搜索,寻找与目标点最近的样本点:\n", + "from math import sqrt\n", + "from collections import namedtuple\n", + "\n", + "# 定义一个namedtuple,分别存放最近坐标点、最近距离和访问过的节点数\n", + "result = namedtuple(\"Result_tuple\",\n", + " \"nearest_point nearest_dist nodes_visited\")\n", + "\n", + "\n", + "def find_nearest(tree, point):\n", + " k = len(point) # 数据维度\n", + "\n", + " def travel(kd_node, target, max_dist):\n", + " if kd_node is None:\n", + " return result([0] * k, float(\"inf\"),\n", + " 0) # python中用float(\"inf\")和float(\"-inf\")表示正负无穷\n", + "\n", + " nodes_visited = 1\n", + "\n", + " s = kd_node.split # 进行分割的维度\n", + " pivot = kd_node.dom_elt # 进行分割的“轴”\n", + "\n", + " if target[s] <= pivot[s]: # 如果目标点第s维小于分割轴的对应值(目标离左子树更近)\n", + " nearer_node = kd_node.left # 下一个访问节点为左子树根节点\n", + " further_node = kd_node.right # 同时记录下右子树\n", + " else: # 目标离右子树更近\n", + " nearer_node = kd_node.right # 下一个访问节点为右子树根节点\n", + " further_node = kd_node.left\n", + "\n", + " temp1 = travel(nearer_node, target, max_dist) # 进行遍历找到包含目标点的区域\n", + "\n", + " nearest = temp1.nearest_point # 以此叶结点作为“当前最近点”\n", + " dist = temp1.nearest_dist # 更新最近距离\n", + "\n", + " nodes_visited += temp1.nodes_visited\n", + "\n", + " if dist < max_dist:\n", + " max_dist = dist # 最近点将在以目标点为球心,max_dist为半径的超球体内\n", + "\n", + " temp_dist = abs(pivot[s] - target[s]) # 第s维上目标点与分割超平面的距离\n", + " if max_dist < temp_dist: # 判断超球体是否与超平面相交\n", + " return result(nearest, dist, nodes_visited) # 不相交则可以直接返回,不用继续判断\n", + "\n", + " #----------------------------------------------------------------------\n", + " # 计算目标点与分割点的欧氏距离\n", + " temp_dist = sqrt(sum((p1 - p2)**2 for p1, p2 in zip(pivot, target)))\n", + "\n", + " if temp_dist < dist: # 如果“更近”\n", + " nearest = pivot # 更新最近点\n", + " dist = temp_dist # 更新最近距离\n", + " max_dist = dist # 更新超球体半径\n", + "\n", + " # 检查另一个子结点对应的区域是否有更近的点\n", + " temp2 = travel(further_node, target, max_dist)\n", + "\n", + " nodes_visited += temp2.nodes_visited\n", + " if temp2.nearest_dist < dist: # 如果另一个子结点内存在更近距离\n", + " nearest = temp2.nearest_point # 更新最近点\n", + " dist = temp2.nearest_dist # 更新最近距离\n", + "\n", + " return result(nearest, dist, nodes_visited)\n", + "\n", + " return travel(tree.root, point, float(\"inf\")) # 从根节点开始递归" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 例3.2" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[7, 2]\n", + "[5, 4]\n", + "[2, 3]\n", + "[4, 7]\n", + "[9, 6]\n", + "[8, 1]\n" + ] + } + ], + "source": [ + "data = [[2,3],[5,4],[9,6],[4,7],[8,1],[7,2]]\n", + "kd = KdTree(data)\n", + "preorder(kd.root)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "from time import clock\n", + "from random import random\n", + "\n", + "# 产生一个k维随机向量,每维分量值在0~1之间\n", + "def random_point(k):\n", + " return [random() for _ in range(k)]\n", + " \n", + "# 产生n个k维随机向量 \n", + "def random_points(k, n):\n", + " return [random_point(k) for _ in range(n)] " + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Result_tuple(nearest_point=[2, 3], nearest_dist=1.8027756377319946, nodes_visited=4)\n" + ] + } + ], + "source": [ + "ret = find_nearest(kd, [3,4.5])\n", + "print (ret)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "time: 5.4623788 s\n", + "Result_tuple(nearest_point=[0.09929288205798159, 0.4954936771850429, 0.8005722800665575], nearest_dist=0.004597223680778027, nodes_visited=42)\n" + ] + } + ], + "source": [ + "N = 400000\n", + "t0 = clock()\n", + "kd2 = KdTree(random_points(3, N)) # 构建包含四十万个3维空间样本点的kd树\n", + "ret2 = find_nearest(kd2, [0.1,0.5,0.8]) # 四十万个样本点中寻找离目标最近的点\n", + "t1 = clock()\n", + "print (\"time: \",t1-t0, \"s\")\n", + "print (ret2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "----\n", + "参考代码:https://github.com/wzyonggege/statistical-learning-method\n", + "\n", + "中文注释制作:机器学习初学者\n", + "\n", + "微信公众号:ID:ai-start-com\n", + "\n", + "配置环境:python 3.5+\n", + "\n", + "代码全部测试通过。\n", + "![gongzhong](../gongzhong.jpg)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git "a/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/3.KNearestNeighbors.ipynb" "b/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/3.KNearestNeighbors.ipynb" index 99cd355..c474547 100644 --- "a/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/3.KNearestNeighbors.ipynb" +++ "b/\347\254\25403\347\253\240 k\350\277\221\351\202\273\346\263\225/3.KNearestNeighbors.ipynb" @@ -41,7 +41,7 @@ "\n", "- $p= 1$ 曼哈顿距离\n", "- $p= 2$ 欧氏距离\n", - "- $p= inf$ 闵式距离minkowski_distance " + "- $p= \\infty$ 切比雪夫距离" ] }, { @@ -1296,7 +1296,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.4" + "version": "3.6.8" } }, "nbformat": 4,