<a href="https://colab.research.google.com/github/roberthsu2003/machine_learning/blob/main/%E5%9F%BA%E6%9C%ACpackage/README.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Required packages**

`$ pip install jupyter numpy scipy matplotlib ipython scikit-learn pandas  mglearn`

Alternatively, `mglearn` can be used by downloading its folder and placing it in your working directory.


In [None]:
! pip install jupyter numpy scipy matplotlib ipython scikit-learn pandas  mglearn

In [None]:
#numpy
import numpy as np
x = np.array([[1, 2, 3],[4, 5, 6]])
print("x:\n{}".format(x))

In [None]:
#scipy
from scipy import sparse

eye = np.eye(4)
print("numpy array:\n{}".format(eye))

# Convert to a sparse matrix (CSR format)
sparse_matrix = sparse.csr_matrix(eye)
print("\nScipy sparse CSR matrix:\n{}".format(sparse_matrix))

In [None]:
# Build the same matrix using the COO (coordinate) format
data = np.ones(4)
row_indices = np.arange(4)
col_indices = np.arange(4)
eye_coo = sparse.coo_matrix((data,(row_indices, col_indices)))
print("COO representation:\n{}".format(eye_coo))

In [None]:
import matplotlib.pyplot as plt
x = np.linspace(-10, 10, 100)
y = np.sin(x)
plt.plot(x, y, marker="x")

In [None]:
## pandas
import pandas as pd

data ={
    'Name':["John", "Anna", "Peter", "Linda"],
    "Location":["New York", "Paris", "Berlin", "London"],
    "Age":[24, 13, 53, 33]
}

data_pandas = pd.DataFrame(data)
display(data_pandas)

# Select the rows where Age is greater than 30
display(data_pandas[data_pandas.Age > 30])

## First application: classifying Iris species
### Terms to understand
- supervised learning
- classification
- the possible outputs are called classes
- label

### 1. Get the data
- Use the dataset built into scikit-learn

In [None]:
from sklearn.datasets import load_iris
import pandas as pd
iris_dataset = load_iris()
iris_dataset

In [None]:
print("keys of iris_dataset:\n{}".format(iris_dataset.keys()))

In [None]:
# DESCR is a description of the dataset
print(iris_dataset['DESCR'][:193] + '\n...')

In [None]:
# target_names holds the three iris species we want to predict
print("target names: {}".format(iris_dataset['target_names']))

In [None]:
# feature_names holds the names of the features
print("feature names:\n{}".format(iris_dataset['feature_names']))

In [None]:
# data holds the feature values
# It is a NumPy ndarray
print("type of data:{}".format(type(iris_dataset['data'])))

In [None]:
# Check the shape of the ndarray
# 150 samples, each with 4 feature values
# Each row is called a sample; each of the 4 columns is called a feature
print("Shape of data:{}".format(iris_dataset['data'].shape))

In [None]:
# Show the first 5 samples (rows)
print("First five rows of data:\n{}".format(iris_dataset['data'][:5]))

In [None]:
# target records which species each iris belongs to
print("type of target:{}".format(type(iris_dataset['target'])))

In [None]:
# target holds the species of all 150 samples as a one-dimensional array
print("Shape of target: {}".format(iris_dataset['target'].shape))

In [None]:
# Contents of target
print('Target:\n{}'.format(iris_dataset['target']))

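### 2. Training and testing data
- The data must be split into two parts. The first is the data used to build the model (training data, or training set).
- The second is the data used to evaluate accuracy (test data, or test set).
- The training data cannot be used for evaluation because the model can simply memorize it. Predictions on data the model has already seen will look very accurate, which says nothing about how well the model handles new data.
- scikit-learn provides `train_test_split()`, which by default uses 75% of the data as the training set and the remaining 25% as the test set.
- By scikit-learn convention, the data is denoted with a capital `X` and the labels with a lowercase `y`.
- This follows the notation f(x)=y: feeding the data into the model yields the label as output.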
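
The default split described above can be checked directly. The following is a minimal sketch, assuming the Iris dataset and `random_state=0` as used elsewhere in this notebook; the variable names with a `2` suffix are only for illustration. It compares the default call with one where `test_size=0.25` is written out explicitly.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Default split: test_size is left unset, so 25% of the samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same split with the proportion written out explicitly (illustrative names).
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print("default split: ", X_train.shape, X_test.shape)
print("explicit split:", X_train2.shape, X_test2.shape)
```

Because both calls use the same `random_state`, they should produce the same shuffled split, which makes the result reproducible between runs.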

In [None]:
from sklearn.model_selection import train_test_split
# Capital X is the input data
# Lowercase y is the label data
# The data is shuffled before 75% of it is taken as the training set
X_train, X_test, y_train, y_test = train_test_split(iris_dataset['data'], iris_dataset['target'],random_state=0)
print('X_train shape:{}'.format(X_train.shape))
print('y_train shape:{}'.format(y_train.shape))
print('X_test shape:{}'.format(X_test.shape))
print('y_test shape:{}'.format(y_test.shape))

### 3. Inspect the data
- It is good practice in machine learning to inspect the training data.
- Inspecting the data can reveal anomalous values.
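
Besides plotting, a few numeric checks can help spot anomalies. This is a rough sketch rather than a complete data-quality procedure; the names `df`, `summary`, `missing`, and `class_counts` are chosen here for illustration and do not appear in the rest of the notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
df = pd.DataFrame(X_train, columns=iris.feature_names)

# Per-feature count, mean, std, min, max, and quartiles:
# implausible extremes (for example a negative length) would stand out here.
summary = df.describe()
print(summary)

# Number of missing values in each column.
missing = df.isna().sum()
print(missing)

# Number of training samples per class, to see whether any class is under-represented.
class_counts = pd.Series(y_train).value_counts().sort_index()
print(class_counts)
```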

In [None]:
#create dataframe from data in X_train
#label the columns using the strings in iris_dataset.feature_names
iris_dataframe = pd.DataFrame(X_train,columns=iris_dataset.feature_names)
iris_dataframe

In [None]:
from pandas.plotting import scatter_matrix
import mglearn
#create a scatter matrix from the dataframe, color by y_train

grr = scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15), marker='o', hist_kwds={'bins':20}, s=60, alpha=.8, cmap=mglearn.cm3)

### 4. Build the first model: k-nearest neighbors (KNN)
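
With `n_neighbors=1`, the idea is to find the training sample closest to a new point and return that sample's label. The sketch below illustrates this with plain NumPy, assuming Euclidean distance; `predict_1nn` is a hypothetical helper written for this illustration, not part of scikit-learn, and it ignores details such as tie handling and efficient neighbor search.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def predict_1nn(X_train, y_train, x_new):
    """Return the label of the training sample closest to x_new (hypothetical helper)."""
    # Euclidean distance from x_new to every training sample.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # The nearest neighbor is the sample with the smallest distance.
    return y_train[np.argmin(distances)]


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

x_new = np.array([5, 2.9, 1, 0.2])
manual = predict_1nn(X_train, y_train, x_new)

# Compare with scikit-learn's classifier using a single neighbor.
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
library = knn.predict(x_new.reshape(1, -1))[0]

print("manual:      ", iris.target_names[manual])
print("scikit-learn:", iris.target_names[library])
```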


In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train,y_train)

### 5. Make predictions

In [None]:
X_new = np.array([[5, 2.9, 1, 0.2]])
print("X_new.shape: {}".format(X_new.shape))

In [None]:
prediction = knn.predict(X_new)
print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(iris_dataset['target_names'][prediction]))

### 6. Evaluate the model

In [None]:
y_pred = knn.predict(X_test)
print("Test set predictions:\n {}".format(y_pred))

In [None]:
# Evaluate using ndarray operations: the fraction of predictions that match y_test
print("Test set scores:{:.2f}".format(np.mean(y_pred == y_test)))

In [None]:
# Evaluate using the model's own score() method
print("Test set score:{:.2f}".format(knn.score(X_test, y_test)))
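
Both evaluations above compute accuracy, the fraction of test samples predicted correctly. As a cross-check, the sketch below computes the same quantity a third way with `sklearn.metrics.accuracy_score`; it rebuilds the split and model so that it runs on its own, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Three ways to compute the same accuracy.
manual_accuracy = np.mean(y_pred == y_test)
metric_accuracy = accuracy_score(y_test, y_pred)
model_accuracy = knn.score(X_test, y_test)

n_correct = int((y_pred == y_test).sum())
print("{} of {} test samples predicted correctly".format(n_correct, len(y_test)))
print("accuracy: {:.2f}".format(metric_accuracy))
```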