## DBSCAN:

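#### DBSCAN aims to find the maximal sets of density-connected objects
<br>DBSCAN is a density-based clustering algorithm. Unlike partitioning and hierarchical clustering methods, it defines a cluster as a maximal set of density-connected points. It turns regions of sufficiently high density into clusters and can discover clusters of arbitrary shape in spatial databases that contain noise.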

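### Advantages:
1. Unlike K-means, DBSCAN does not need to know the number of clusters in advance.

2. Unlike K-means, DBSCAN can discover clusters of arbitrary shape.

3. DBSCAN can also identify noise points. It is robust to outliers and can even be used to detect them. Note: robustness means that a system keeps its characteristic behavior under perturbation or uncertainty.

4. DBSCAN is largely insensitive to the order of the samples in the database, that is, the input order of the patterns has little effect on the result. However, a border sample that lies between clusters may be assigned to either one, depending on which cluster is discovered first.

5. DBSCAN was designed for use with databases, where region queries can be accelerated, for example with an R*-tree.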

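### Disadvantages:
1. DBSCAN does not handle high-dimensional data well.

2. DBSCAN does not handle data sets with varying density well.

3. DBSCAN operates directly on the whole data set, and in the setup described here an R*-tree has to be built and a k-dist graph plotted before clustering, so the memory and I/O costs are considerable. When computing resources are limited and the data set is very large, the efficiency of DBSCAN suffers badly.
<br>(DBSCAN treats every not-yet-processed point returned by a region query as a seed point to be expanded in a later step. For the larger clusters in a large data set, this strategy makes the number of seed points keep growing, and the memory the algorithm needs grows quickly with it.)

4. Because DBSCAN describes density with global parameters, clustering quality is poor when the clusters have uneven densities or when the distances between clusters differ widely.
<br>(In that situation, if a small Eps is chosen to suit the denser clusters, objects in the sparser clusters have fewer than MinPts points in their Eps-neighborhood. These points are mistaken for border points and are not used to expand their cluster further, so a sparse cluster is split into several clusters of similar nature. Conversely, if a large Eps is chosen to suit the sparser clusters, dense clusters that lie close together are merged and the differences between them are ignored. It is therefore hard to pick one global Eps that gives a reasonably accurate clustering.)

5. DBSCAN is not fully deterministic: a border point reachable from more than one cluster can end up in any of them, depending on the order in which the data are processed.

6. The quality of DBSCAN depends on the distance measure used in the regionQuery(P, Eps) function. The most common choice is Euclidean distance. For high-dimensional data in particular, the so-called curse of dimensionality makes this measure close to useless, and it is hard to find a suitable value for Eps. Although there are algorithms based on Euclidean distance, without a good understanding of the data and its scale it is still hard to find a meaningful distance threshold Eps.

7. When density differences are large, no single MinPts-Eps combination suits all clusters at once, so DBSCAN cannot cluster the data well. (See disadvantage 4.)

8. DBSCAN is sensitive to its input parameters. Eps and MinPts are hard to determine, and a poor choice lowers clustering quality.

9. In classic DBSCAN the parameters Eps and MinPts stay fixed throughout clustering, which makes it hard for the algorithm to adapt to data sets of uneven density.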
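
The trade-off described in disadvantage 4 can be reproduced on a small synthetic layout. This is only a sketch under stated assumptions: the point coordinates, `eps` values and `min_samples=3` are hypothetical and chosen so the effect is easy to see; they are not taken from the food data set.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 1-D layout embedded in 2-D (y = 0 for every point):
# two dense groups with 0.1 spacing, separated by a 0.8 gap,
# and one sparse group with 1.0 spacing far away.
dense_a = np.arange(10) * 0.1
dense_b = 1.7 + np.arange(10) * 0.1
sparse_c = 10.0 + np.arange(10) * 1.0
x = np.concatenate([dense_a, dense_b, sparse_c])
X = np.column_stack([x, np.zeros_like(x)])

# Eps tuned to the dense groups: the sparse group is reported as noise (-1).
labels_small_eps = DBSCAN(eps=0.15, min_samples=3).fit_predict(X)

# Eps tuned to the sparse group: the two dense groups are merged into one cluster.
labels_large_eps = DBSCAN(eps=1.1, min_samples=3).fit_predict(X)

print(labels_small_eps)
print(labels_large_eps)
```

With the small `eps` the two dense groups are separated but the sparse group is lost; with the large `eps` the sparse group is recovered but the dense groups can no longer be told apart.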

### Improvements to the algorithm:

See the reference for details.

Reference:
https://www.itread01.com/content/1497934934.html

##### sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric='euclidean', metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=1)

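#### Parameters:

    1 eps (float, optional): neighborhood radius; a lower ``eps`` means a higher density is required to form a cluster;
    2 min_samples (int, optional): number of samples in a point's neighborhood required for that point to be a core point;
    3 metric (string or callable): distance metric between samples; a custom function can be used;
    4 metric_params=None (dict, optional): additional arguments for the metric function; added in version 0.19;
    5 algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, optional): algorithm used by the nearest-neighbors module to compute pointwise distances and find neighbors;
    6 leaf_size (int, optional (default = 30)): used by 'ball_tree' or 'kd_tree';
    7 p (float, optional): power of the Minkowski metric used to compute distances between points;
    8 n_jobs (int, optional (default = 1)): number of parallel jobs.

#### Attributes:

    1 core_sample_indices_: indices of the core samples, starting from 0;
    2 components_: the original data of the core samples;
    3 labels_: cluster label of each sample; -1 marks noise.

#### Methods:

    1 fit(X[, y, sample_weight]): run DBSCAN clustering on X;
    2 fit_predict(X[, y, sample_weight]): run DBSCAN clustering on X and return the cluster labels;
    3 get_params([deep]): get the parameters of the estimator;
    4 set_params(**params): set the parameters of the estimator.
    
Reference: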

https://www.itread01.com/p/518873.html
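
As a quick illustration of the parameters, attributes and methods listed above, here is a minimal sketch on toy data. The seven points and the values `eps=2`, `min_samples=2` are hypothetical and only serve to make the result easy to check by eye.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (hypothetical): two tight groups of three points and one isolated point.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [50, 50]], dtype=float)

model = DBSCAN(eps=2, min_samples=2, metric='euclidean')
model.fit(X)

print(model.labels_)               # cluster label per sample, -1 marks noise
print(model.core_sample_indices_)  # indices of the core samples
print(model.components_.shape)     # coordinates of the core samples
print(model.get_params()['eps'])   # read a parameter back

# fit_predict runs the clustering and returns the labels in one call
labels = DBSCAN(eps=2, min_samples=2).fit_predict(X)
```

The isolated point has no neighbor within `eps`, so it is the only sample labelled -1.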

In [533]:
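# General-purpose data package for loading, cleaning and visualizing data
import pandas as pd

# Data visualization
import matplotlib.pyplot as plt
%matplotlib inline

# DBSCAN clustering
from sklearn.cluster import DBSCAN

# High-level plotting package built on matplotlib that makes charts easier to create; it can be seen as a complement to matplotlib.
import seaborn as sns

# Fast operations on multi-dimensional arrays; an operation can be applied to a large array in one step.
import numpy as np

# Count how often each element occurs
from collections import Counter

# Loading and saving images
from PIL import Image

# Working with files and directories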
import os

In [534]:
os.getcwd()

'E:\\III\\III\\Workplace\\TopicProject\\DataProcessing'

#### Column names
Brand: Brand;
Food name: Name;
Brand + food name: BraName;
Intake: Intake;
Calories: Calories, Cal;
Protein: Protein;
Fat: Fat;
Saturated fat: Saturated fatty acid, SF;
Unsaturated fat: Unsaturated fat, USF;
Polyunsaturated fat: Polyunsaturated fat, PUSF;
Monounsaturated fat: Monounsaturated fat, MUSF;
Cholesterol: Total cholesterol, TC;
Carbohydrate: Carbohydrate, CHO;
Sugar: Sugar;
Fiber: Fiber;
Sodium: Na;
Potassium: K;
Source: Resource;
Update time: UpdateTime;
Trans fat: Trans fat, TF.

In [535]:
# Read the CSV text file
# This file was produced earlier by inner-joining the database files
food_csv = "../Data/Data_allFood_g_innerJoin.csv"
dfi = pd.read_csv(food_csv, encoding='utf-8')

### A quick look at the data

In [536]:
dfi.head()

Unnamed: 0,Brand,Name,BraName,Intake_g,Cal_kcal,Protein_g,Fat_g,SF_g,CHO_g,Sugar_g,Na_g,Resource,UpdateTime
0,,大豆/黃豆（水煮）,_大豆/黃豆（水煮）,453.59237,367.0,38.42,20.19,2.799,29.62,1.95,0.045,fatsecret中國,2008024
1,,烤豬肉,_烤豬肉,28.349523,54.0,5.94,3.21,1.176,0.0,0.0,0.049,fatsecret中國,20070821
2,,煮熟的萵苣菜（烹飪中加油）,_煮熟的萵苣菜（烹飪中加油）,100.0,69.0,1.74,5.38,1.025,4.75,0.34,0.585,fatsecret中國,20070821
3,好時,醇濃黑巧克力,好時_醇濃黑巧克力,100.0,570.0,9.2,37.6,,34.2,,0.008,fatsecret中國,20130331
4,,蟹肉餅/蟹肉條,_蟹肉餅/蟹肉條,63.0,106.0,14.0,5.03,0.965,0.42,0.18,0.217,fatsecret中國,20070821


In [537]:
dfi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6973 entries, 0 to 6972
Data columns (total 13 columns):
Brand         2823 non-null object
Name          6973 non-null object
BraName       6973 non-null object
Intake_g      6973 non-null float64
Cal_kcal      6973 non-null float64
Protein_g     6952 non-null float64
Fat_g         6944 non-null float64
SF_g          4516 non-null float64
CHO_g         6973 non-null float64
Sugar_g       3659 non-null float64
Na_g          6547 non-null float64
Resource      6973 non-null object
UpdateTime    6973 non-null int64
dtypes: float64(8), int64(1), object(4)
memory usage: 708.3+ KB


<font color = #708090> The myfitness data is problematic: the nutrient amounts should not add up to more than the intake, so it is not used. </font> 

#### Remove duplicate values in df['BraName']
```python
df = df.drop_duplicates(subset='BraName', keep='first')
```
dataframe.drop_duplicates(subset='column', keep='first', inplace=False)
<br>subset: the column(s) used to identify duplicates; keep: which duplicate to keep, {'first', 'last', False (drop all duplicates)}, default 'first';
<br>inplace: boolean, default False. Whether to drop duplicates in place or to return a copy.
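
The three `keep` options can be compared on a tiny table. This is a sketch with made-up rows; only the column names `BraName` and `Cal_kcal` come from this notebook.

```python
import pandas as pd

# Hypothetical mini table: 'x_tea' appears twice.
df = pd.DataFrame({'BraName': ['x_tea', 'x_tea', 'y_milk'],
                   'Cal_kcal': [10.0, 12.0, 60.0]})

first = df.drop_duplicates(subset='BraName', keep='first')  # keeps rows 0 and 2
last = df.drop_duplicates(subset='BraName', keep='last')    # keeps rows 1 and 2
none = df.drop_duplicates(subset='BraName', keep=False)     # keeps row 2 only

print(first, last, none, sep='\n\n')
```

Note that `keep=False` is the boolean `False`, not the string `'False'`.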

In [538]:
dfi = dfi.drop_duplicates(subset='BraName', keep='last')  # keep the last one: the official website is the primary source, the database is secondary

# Reset the index
dfi = dfi.reset_index(drop=True)

dfi.head()

Unnamed: 0,Brand,Name,BraName,Intake_g,Cal_kcal,Protein_g,Fat_g,SF_g,CHO_g,Sugar_g,Na_g,Resource,UpdateTime
0,,大豆/黃豆（水煮）,_大豆/黃豆（水煮）,453.59237,367.0,38.42,20.19,2.799,29.62,1.95,0.045,fatsecret中國,2008024
1,,烤豬肉,_烤豬肉,28.349523,54.0,5.94,3.21,1.176,0.0,0.0,0.049,fatsecret中國,20070821
2,,煮熟的萵苣菜（烹飪中加油）,_煮熟的萵苣菜（烹飪中加油）,100.0,69.0,1.74,5.38,1.025,4.75,0.34,0.585,fatsecret中國,20070821
3,好時,醇濃黑巧克力,好時_醇濃黑巧克力,100.0,570.0,9.2,37.6,,34.2,,0.008,fatsecret中國,20130331
4,,蟹肉餅/蟹肉條,_蟹肉餅/蟹肉條,63.0,106.0,14.0,5.03,0.965,0.42,0.18,0.217,fatsecret中國,20070821


#### Per-gram normalization
Nutrient amount / intake
```python
df[i] = df.apply(lambda x: x[i] / x['Intake_g'], axis=1)
```
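
The same per-gram division can also be written without a row-wise `apply`. The following is a sketch of a vectorized alternative using `DataFrame.div`; the two sample rows are hypothetical, and the column names follow this notebook.

```python
import pandas as pd

# Hypothetical two-row sample using the column names from this notebook.
df = pd.DataFrame({'Intake_g': [100.0, 50.0],
                   'Cal_kcal': [200.0, 25.0],
                   'Protein_g': [10.0, 5.0]})

cols = ['Intake_g', 'Cal_kcal', 'Protein_g']
# Divide every listed column by the intake of the same row
# (axis=0 aligns the divisor on the row index).
per_gram = df[cols].div(df['Intake_g'], axis=0)

print(per_gram)
```

Because the division happens column-wise in one step, it is usually much faster than calling a lambda once per row.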

In [539]:
# Create a new DataFrame for the calculation
dfa = dfi[['Intake_g', 'Cal_kcal', 'Protein_g', 'Fat_g', 'SF_g', 'CHO_g', 'Sugar_g','Na_g']]

In [540]:
list_ = ['Intake_g', 'Cal_kcal', 'Protein_g', 'Fat_g', 'SF_g', 'CHO_g', 'Sugar_g','Na_g']
for i in list_:
    dfi[i]  = dfa.apply(lambda x: x[i]  / x['Intake_g'], axis=1)

In [541]:
dfi

Unnamed: 0,Brand,Name,BraName,Intake_g,Cal_kcal,Protein_g,Fat_g,SF_g,CHO_g,Sugar_g,Na_g,Resource,UpdateTime
0,,大豆/黃豆（水煮）,_大豆/黃豆（水煮）,1.0,0.809097,0.084702,0.044511,0.006171,0.065301,0.004299,0.000099,fatsecret中國,2008024
1,,烤豬肉,_烤豬肉,1.0,1.904794,0.209527,0.113229,0.041482,0.000000,0.000000,0.001728,fatsecret中國,20070821
2,,煮熟的萵苣菜（烹飪中加油）,_煮熟的萵苣菜（烹飪中加油）,1.0,0.690000,0.017400,0.053800,0.010250,0.047500,0.003400,0.005850,fatsecret中國,20070821
3,好時,醇濃黑巧克力,好時_醇濃黑巧克力,1.0,5.700000,0.092000,0.376000,,0.342000,,0.000080,fatsecret中國,20130331
4,,蟹肉餅/蟹肉條,_蟹肉餅/蟹肉條,1.0,1.682540,0.222222,0.079841,0.015317,0.006667,0.002857,0.003444,fatsecret中國,20070821
5,,香瓜,_香瓜,1.0,0.260000,0.006500,0.001500,0.000400,0.063300,0.060900,0.000130,fatsecret中國,20130115
6,,豬仔包,_豬仔包,1.0,2.461538,0.079538,0.027077,0.005800,0.467538,0.002615,0.005477,fatsecret中國,20141214
7,,湯包,_湯包,1.0,1.960000,0.079300,0.061600,0.013560,0.291000,0.004400,0.003350,fatsecret中國,20151113
8,,砂糖曲奇餅（包括香草味）,_砂糖曲奇餅（包括香草味）,1.0,4.797259,0.051147,0.210938,0.054357,0.679024,0.377079,0.003563,fatsecret中國,2008024
9,,納豆,_納豆,1.0,2.120000,0.177200,0.110000,0.015910,0.143600,0.036000,0.000070,fatsecret中國,2008024


Fill missing values with 0
```python
df = df.fillna(0)
```
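
Calling `fillna(0)` on the whole DataFrame also writes 0 into text columns such as `Brand`. If that is not wanted, one option is to fill only the numeric columns. This is a sketch on a hypothetical two-row frame, not the notebook's actual preprocessing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing brand, one missing nutrient value.
df = pd.DataFrame({'Brand': [np.nan, 'Hershey'],
                   'SF_g': [1.2, np.nan]})

# Select the numeric columns and fill NaN with 0 only there.
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df[num_cols].fillna(0)

print(df)
```

The missing `Brand` stays missing, while the missing `SF_g` becomes 0.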

In [542]:
dfi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6973 entries, 0 to 6972
Data columns (total 13 columns):
Brand         2823 non-null object
Name          6973 non-null object
BraName       6973 non-null object
Intake_g      6973 non-null float64
Cal_kcal      6973 non-null float64
Protein_g     6952 non-null float64
Fat_g         6944 non-null float64
SF_g          4516 non-null float64
CHO_g         6973 non-null float64
Sugar_g       3659 non-null float64
Na_g          6547 non-null float64
Resource      6973 non-null object
UpdateTime    6973 non-null int64
dtypes: float64(8), int64(1), object(4)
memory usage: 708.3+ KB


In [543]:
dfi = dfi.fillna(0) 

In [544]:
dfi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6973 entries, 0 to 6972
Data columns (total 13 columns):
Brand         6973 non-null object
Name          6973 non-null object
BraName       6973 non-null object
Intake_g      6973 non-null float64
Cal_kcal      6973 non-null float64
Protein_g     6973 non-null float64
Fat_g         6973 non-null float64
SF_g          6973 non-null float64
CHO_g         6973 non-null float64
Sugar_g       6973 non-null float64
Na_g          6973 non-null float64
Resource      6973 non-null object
UpdateTime    6973 non-null int64
dtypes: float64(8), int64(1), object(4)
memory usage: 708.3+ KB


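#### Converting to percentages
Some rows in df have all three columns equal to 0, and 0 cannot be used as a divisor, so the following approach is avoided:
```python
df[i] = df.apply(lambda x: x[a] / (x[a] + x[b] + x[c]) * 100, axis=1)
```
Use this instead:
```python
list_col = ['a', 'b', 'c']
for col in list_col:
    for i in range(len(df)):
        total = df['a'][i] + df['b'][i] + df['c'][i]
        # Skip rows whose total is 0 to avoid dividing by zero
        if total != 0:
            df.loc[i, 'per' + col] = df[col][i] / total * 100
```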
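
A vectorized variant of the same idea avoids the row-by-row assignment altogether. This is a sketch, assuming that rows whose three macronutrients sum to 0 should end up as NaN; the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second row has all three macronutrients equal to 0.
df = pd.DataFrame({'Protein_g': [0.5, 0.0],
                   'Fat_g': [0.25, 0.0],
                   'CHO_g': [0.25, 0.0]})
cols = ['Protein_g', 'Fat_g', 'CHO_g']

total = df[cols].sum(axis=1)
# Replace zero totals with NaN so those rows produce NaN instead of dividing by zero.
pct = df[cols].div(total.where(total != 0), axis=0) * 100
pct.columns = ['per' + c for c in cols]
df = df.join(pct)

print(df)
```

The resulting columns are floating point from the start, so the percentages keep their decimals.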

In [545]:
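# Prepare the percentage columns first
list_col = ['Protein_g', 'Fat_g', 'CHO_g']
for i in list_col:
    newCol = "per" + i
    # Integer 0 gives int columns; the percentages shown further down look
    # truncated to whole numbers, likely for this reason
    dfi[newCol] = 0

# Quick check
dfi.head(3)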

Unnamed: 0,Brand,Name,BraName,Intake_g,Cal_kcal,Protein_g,Fat_g,SF_g,CHO_g,Sugar_g,Na_g,Resource,UpdateTime,perProtein_g,perFat_g,perCHO_g
0,0,大豆/黃豆（水煮）,_大豆/黃豆（水煮）,1.0,0.809097,0.084702,0.044511,0.006171,0.065301,0.004299,9.9e-05,fatsecret中國,2008024,0,0,0
1,0,烤豬肉,_烤豬肉,1.0,1.904794,0.209527,0.113229,0.041482,0.0,0.0,0.001728,fatsecret中國,20070821,0,0,0
2,0,煮熟的萵苣菜（烹飪中加油）,_煮熟的萵苣菜（烹飪中加油）,1.0,0.69,0.0174,0.0538,0.01025,0.0475,0.0034,0.00585,fatsecret中國,20070821,0,0,0


In [546]:
# Row-by-row computation: convert each macronutrient to a percentage
list_percol = ['perProtein_g', 'perFat_g', 'perCHO_g']
list_col = ['Protein_g', 'Fat_g', 'CHO_g']

for i in range (len(dfi['BraName'])):
    
    total = dfi['Protein_g'][i] + dfi['Fat_g'][i] + dfi['CHO_g'][i]
    dfi['perProtein_g'][i] = dfi['Protein_g'][i]/total *100
    dfi['perFat_g'][i]     = dfi['Fat_g'][i]/total *100
    dfi['perCHO_g'][i]     = dfi['CHO_g'][i]/total *100
#df    

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  

In [547]:
dfi.head()

Unnamed: 0,Brand,Name,BraName,Intake_g,Cal_kcal,Protein_g,Fat_g,SF_g,CHO_g,Sugar_g,Na_g,Resource,UpdateTime,perProtein_g,perFat_g,perCHO_g
0,0,大豆/黃豆（水煮）,_大豆/黃豆（水煮）,1.0,0.809097,0.084702,0.044511,0.006171,0.065301,0.004299,9.9e-05,fatsecret中國,2008024,43.0,22.0,33.0
1,0,烤豬肉,_烤豬肉,1.0,1.904794,0.209527,0.113229,0.041482,0.0,0.0,0.001728,fatsecret中國,20070821,64.0,35.0,0.0
2,0,煮熟的萵苣菜（烹飪中加油）,_煮熟的萵苣菜（烹飪中加油）,1.0,0.69,0.0174,0.0538,0.01025,0.0475,0.0034,0.00585,fatsecret中國,20070821,14.0,45.0,40.0
3,好時,醇濃黑巧克力,好時_醇濃黑巧克力,1.0,5.7,0.092,0.376,0.0,0.342,0.0,8e-05,fatsecret中國,20130331,11.0,46.0,42.0
4,0,蟹肉餅/蟹肉條,_蟹肉餅/蟹肉條,1.0,1.68254,0.222222,0.079841,0.015317,0.006667,0.002857,0.003444,fatsecret中國,20070821,71.0,25.0,2.0


In [548]:
dfi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6973 entries, 0 to 6972
Data columns (total 16 columns):
Brand           6973 non-null object
Name            6973 non-null object
BraName         6973 non-null object
Intake_g        6973 non-null float64
Cal_kcal        6973 non-null float64
Protein_g       6973 non-null float64
Fat_g           6973 non-null float64
SF_g            6973 non-null float64
CHO_g           6973 non-null float64
Sugar_g         6973 non-null float64
Na_g            6973 non-null float64
Resource        6973 non-null object
UpdateTime      6973 non-null int64
perProtein_g    6969 non-null float64
perFat_g        6969 non-null float64
perCHO_g        6969 non-null float64
dtypes: float64(11), int64(1), object(4)
memory usage: 871.8+ KB


## DBSCAN

#### eps : float, optional
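- Neighborhood radius around a point. If two points lie within eps of each other, they are neighbors and can end up in the same cluster.

#### min_samples : int, optional
- Minimum number of data points required in a point's eps-neighborhood for it to count as a core point, and hence the minimum needed to form a dense region.

DBSCAN requires two parameters: ε (eps) and the minimum number of points required to form a dense region (minPts). It starts from an arbitrary unvisited point and explores that point's ε-neighborhood. If the ε-neighborhood contains enough points, a new cluster is started; otherwise the point is labelled as noise. Note that this point may later be found in the ε-neighborhood of another point that does have enough points, in which case it is added to that cluster.

Reference: https://zh.wikipedia.org/wiki/DBSCAN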
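
The procedure described above can be sketched in a few lines of NumPy. This is a simplified illustration, not scikit-learn's implementation: the function names `region_query` and `dbscan_sketch` are made up for this example, the neighborhood search is brute force, and the neighborhood is assumed to include the point itself.

```python
import numpy as np

NOISE = -1
UNVISITED = -2

def region_query(X, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)

def dbscan_sketch(X, eps, min_pts):
    X = np.asarray(X, dtype=float)
    labels = np.full(len(X), UNVISITED)
    cluster = -1
    for i in range(len(X)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(X, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # former noise becomes a border point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(X, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]]
toy_labels = dbscan_sketch(X, eps=2, min_pts=2)
print(toy_labels)
```

The `seeds` list is the growing set of seed points mentioned in the disadvantages section: every core point adds its whole neighborhood to it.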

In [549]:
# Extract the columns to build the DataFrame used for clustering
df_k = dfi[['perProtein_g', 'perFat_g', 'perCHO_g']]
df_k.head()

Unnamed: 0,perProtein_g,perFat_g,perCHO_g
0,43.0,22.0,33.0
1,64.0,35.0,0.0
2,14.0,45.0,40.0
3,11.0,46.0,42.0
4,71.0,25.0,2.0


In [550]:
# Inspect missing values
df_k[df_k.isnull().values==True]

Unnamed: 0,perProtein_g,perFat_g,perCHO_g
1366,,,
1366,,,
1366,,,
2538,,,
2538,,,
2538,,,
4164,,,
4164,,,
4164,,,
4246,,,
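
In the output above each index is listed several times, presumably because the mask is element-wise, so a row matches once per missing cell. A row-level mask lists each affected row once. This is a sketch on a hypothetical two-row frame with the same column names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second row is missing in all three columns.
df_k = pd.DataFrame({'perProtein_g': [43.0, np.nan],
                     'perFat_g': [22.0, np.nan],
                     'perCHO_g': [33.0, np.nan]})

# One boolean per row: True if any column in that row is NaN.
missing_rows = df_k[df_k.isnull().any(axis=1)]

print(missing_rows)
```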


In [551]:
df_k = df_k.fillna(0)

In [552]:
# Inspect missing values
df_k[df_k.isnull().values==True]

Unnamed: 0,perProtein_g,perFat_g,perCHO_g


In [553]:
db = DBSCAN(eps = 4, min_samples = 1).fit(df_k)
labels = db.labels_

Counter(labels)

Counter({0: 6957, 1: 3, 2: 4, 3: 4, 4: 4, 5: 1})
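
With `min_samples=1` every sample is a core point on its own, so nothing can be labelled as noise, and here almost all samples fall into a single cluster. One common way to look for a more informative `eps` is the k-dist graph mentioned earlier. The helper below is a sketch: `k_distances` is a hypothetical name, and it relies on scikit-learn's `NearestNeighbors`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distances(X, k):
    """Sorted distance from each sample to its k-th nearest other sample."""
    # k + 1 neighbors are requested because, when the training data itself is
    # queried, each sample's closest match is itself at distance 0.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    distances, _ = nn.kneighbors(X)
    return np.sort(distances[:, -1])

# Hypothetical 1-D example
X = np.array([[0.0], [1.0], [3.0], [7.0]])
kd = k_distances(X, k=1)
print(kd)
```

For this notebook, `X` would be `df_k` and `k` would be tied to the intended `min_samples`. Plotting the returned array (for example with `plt.plot`) and looking for the point where the curve bends sharply gives a candidate `eps`.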

In [554]:
dfi["label"] = labels
dfi["label"].describe()

count    6973.000000
mean        0.006310
std         0.143577
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         5.000000
Name: label, dtype: float64
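
Cluster labels are identifiers rather than measurements, so statistics such as the mean or standard deviation of `label` say little. A summary of cluster sizes and the share of noise is usually more telling. The sketch below uses a hypothetical label array.

```python
from collections import Counter
import numpy as np

labels = np.array([0, 0, 0, 1, -1])   # hypothetical DBSCAN labels

sizes = Counter(labels.tolist())                  # samples per label
n_clusters = len(set(sizes) - {-1})               # -1 is noise, not a cluster
noise_fraction = sizes.get(-1, 0) / len(labels)   # share of samples labelled noise

print(sizes, n_clusters, noise_fraction)
```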

Variables for saving files

In [557]:
param = 3
Stand = 'Percent'
DBSN = 'eps4_minSamples1'
path = '..\\Output\\DataAnalysis_food_Clustering_DBSCAN_'

In [558]:
# Save to file
dfi.to_csv(path + Stand + '_' + DBSN +  '_' + str(param)+  'param' +'.csv', encoding='utf-8', index=False)

In [None]:
# Cal_kcal vs. others

title = 'Cal_kcal'
others = ['Protein_g', 'Fat_g', 'CHO_g', 'Sugar_g']

# Set up the figure
plt.figure(num=None, figsize=(12, 12), dpi=300, facecolor='w', edgecolor='k')

# Figure title: DBSCAN settings and the plotted column
plt.suptitle(DBSN + '\n' + title + ' vs. others', ha='center', va='bottom',
             fontsize=20)

# Subplots: one panel per remaining column
for n, other in enumerate(others, start=1):
    plt.subplot(2, 2, n)
    sns.scatterplot(x=dfi[title],
                    y=dfi[other],
                    hue=dfi["label"], palette="Accent")

# Save the figure
plt.savefig(path + Stand + '_' + DBSN + '_' + title + 'VsOthers' + '.jpg', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# Protein_g vs. others

title = 'Protein_g'
others = ['Cal_kcal', 'Fat_g', 'CHO_g', 'Sugar_g']

# Set up the figure
plt.figure(num=None, figsize=(12, 12), dpi=300, facecolor='w', edgecolor='k')

# Figure title: DBSCAN settings and the plotted column
plt.suptitle(DBSN + '\n' + title + ' vs. others', ha='center', va='bottom',
             fontsize=20)

# Subplots: one panel per remaining column
for n, other in enumerate(others, start=1):
    plt.subplot(2, 2, n)
    sns.scatterplot(x=dfi[title],
                    y=dfi[other],
                    hue=dfi["label"], palette="Accent")

# Save the figure
plt.savefig(path + Stand + '_' + DBSN + '_' + title + 'VsOthers' + '.jpg', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# Fat_g vs. others

title = 'Fat_g'
others = ['Cal_kcal', 'Protein_g', 'CHO_g', 'Sugar_g']

# Set up the figure
plt.figure(num=None, figsize=(12, 12), dpi=300, facecolor='w', edgecolor='k')

# Figure title: DBSCAN settings and the plotted column
plt.suptitle(DBSN + '\n' + title + ' vs. others', ha='center', va='bottom',
             fontsize=20)

# Subplots: one panel per remaining column
for n, other in enumerate(others, start=1):
    plt.subplot(2, 2, n)
    sns.scatterplot(x=dfi[title],
                    y=dfi[other],
                    hue=dfi["label"], palette="Accent")

# Save the figure
plt.savefig(path + Stand + '_' + DBSN + '_' + title + 'VsOthers' + '.jpg', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# CHO_g vs. others

title = 'CHO_g'
others = ['Cal_kcal', 'Protein_g', 'Fat_g', 'Sugar_g']

# Set up the figure
plt.figure(num=None, figsize=(12, 12), dpi=300, facecolor='w', edgecolor='k')

# Figure title: DBSCAN settings and the plotted column
plt.suptitle(DBSN + '\n' + title + ' vs. others', ha='center', va='bottom',
             fontsize=20)

# Subplots: one panel per remaining column
for n, other in enumerate(others, start=1):
    plt.subplot(2, 2, n)
    sns.scatterplot(x=dfi[title],
                    y=dfi[other],
                    hue=dfi["label"], palette="Accent")

# Save the figure
plt.savefig(path + Stand + '_' + DBSN + '_' + title + 'VsOthers' + '.jpg', dpi=300, bbox_inches='tight')
plt.show()

In [None]:
# Sugar_g vs. others

title = 'Sugar_g'
others = ['Cal_kcal', 'Protein_g', 'Fat_g', 'CHO_g']

# Set up the figure
plt.figure(num=None, figsize=(12, 12), dpi=300, facecolor='w', edgecolor='k')

# Figure title: DBSCAN settings and the plotted column
plt.suptitle(DBSN + '\n' + title + ' vs. others', ha='center', va='bottom',
             fontsize=20)

# Subplots: one panel per remaining column
for n, other in enumerate(others, start=1):
    plt.subplot(2, 2, n)
    sns.scatterplot(x=dfi[title],
                    y=dfi[other],
                    hue=dfi["label"], palette="Accent")

# Save the figure
plt.savefig(path + Stand + '_' + DBSN + '_' + title + 'VsOthers' + '.jpg', dpi=300, bbox_inches='tight')
plt.show()