## Divisive Clustering

- So far, we have studied Agglomerative Clustering, which is a bottom-up approach.
There is another approach to Hierarchical Clustering: Divisive Clustering.

- Divisive clustering starts with one all-inclusive cluster. At each step, it splits a cluster, and it stops when each cluster contains a single point (or when there are k clusters).

<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/021/193/original/Screenshot_2022-12-09_at_5.52.59_PM.png?1670587870">


- It is the complete opposite of the agglomerative approach.
- It is a top-down approach.
- It starts with one big cluster that contains all the data points.
- It then keeps dividing the points into smaller clusters until each data point is a cluster by itself.
- It takes the global distribution of the data into account, because the first splits are made while looking at all the points. Agglomerative clustering instead makes its early merge decisions based on the local distribution (nearby points).
- Divisive clustering is more complex, because it needs a separate clustering method to split each cluster until every data point is a singleton cluster.
- This algorithm also does not require the number of clusters to be specified in advance.
- At each step, a cluster is split using a flat clustering method, e.g. KMeans.
- At each step, among all the possible splits, we choose the one whose resulting clusters are the most dissimilar from each other.
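The top-down procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `two_means`, `sse` and `divisive` are hypothetical helper names, the tiny 2-means routine stands in for a real flat clustering method such as KMeans, and choosing the cluster with the largest within-cluster scatter as the next one to split is one possible rule, not the only one.

```python
import math

def two_means(points, n_iter=20):
    """Split points into two groups with a tiny 2-means (a stand-in for KMeans).

    Assumes the cluster contains at least two distinct points.
    """
    # Deterministic start: use the two points that are farthest apart
    c0, c1 = max(
        ((p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        key=lambda pair: math.dist(pair[0], pair[1]),
    )
    for _ in range(n_iter):
        # Assign every point to its nearest centre (ties go to the first group)
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        new_c0 = tuple(sum(x) / len(left) for x in zip(*left))
        new_c1 = tuple(sum(x) / len(right) for x in zip(*right))
        if (new_c0, new_c1) == (c0, c1):
            break  # centres stopped moving
        c0, c1 = new_c0, new_c1
    return left, right

def sse(cluster):
    """Sum of squared distances of the points to their centroid."""
    centroid = tuple(sum(x) / len(cluster) for x in zip(*cluster))
    return sum(math.dist(p, centroid) ** 2 for p in cluster)

def divisive(points, k):
    """Start with one all-inclusive cluster and split until there are k clusters.

    Assumes k is not larger than the number of distinct points.
    """
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumed rule: split the cluster with the largest scatter next
        target = max(clusters, key=sse)
        clusters.remove(target)
        clusters.extend(two_means(target))
    return clusters

# Hypothetical 2-D points forming three visible groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
clusters = divisive(points, k=3)
for cluster in clusters:
    print(cluster)
```

Calling `divisive(points, k=len(points))` would continue splitting until every point is a singleton cluster, which corresponds to the bottom of the hierarchy.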

## Stocks Data Pre-Processing

**Dataset - Top 50 NSE stocks**

- Contains company's stock symbol
- Name, Industry and other details

Let's import the dependencies.

In [None]:
  import pandas as pd
  import numpy as np
  from matplotlib import pyplot as plt
  plt.rcParams["figure.figsize"] = (18,10)
  import seaborn as sns

Reading the data

In [None]:
!wget "https://drive.google.com/uc?export=download&id=1giO5bbp3l0INVvTQIGJ7s_Ai5_TWNuIb" -O ind_nifty50list.csv


**Installing yfinance**


- **yfinance** is a library that helps us download market data from Yahoo Finance's API.

- Let's install it into our environment using pip.

- You can read more about it <a href="https://pypi.org/project/yfinance/">here</a>.


In [None]:
!pip install yfinance  

Successfully installed requests-2.28.1 yfinance-0.1.75


In [None]:
# fix-yahoo-finance is the older name of the yfinance project; we install it as well, as a fix for some glitches
!pip install fix-yahoo-finance  



**Importing Data**

- Purpose of this dataset: getting the list of stock symbols of the companies, which are stored in the 'Symbol' column of the data

In [None]:
stocks_df = pd.read_csv("./ind_nifty50list.csv")
list_of_symbols = list(stocks_df['Symbol'])
stocks_df.head()

Unnamed: 0,Company Name,Industry,Symbol,Series,ISIN Code
0,Adani Ports and Special Economic Zone Ltd.,SERVICES,ADANIPORTS,EQ,INE742F01042
1,Asian Paints Ltd.,CONSUMER GOODS,ASIANPAINT,EQ,INE021A01026
2,Axis Bank Ltd.,FINANCIAL SERVICES,AXISBANK,EQ,INE238A01034
3,Bajaj Auto Ltd.,AUTOMOBILE,BAJAJ-AUTO,EQ,INE917I01010
4,Bajaj Finance Ltd.,FINANCIAL SERVICES,BAJFINANCE,EQ,INE296A01024


**Now, we'll take symbols from original dataset**
- We'll add `.NS` at the end of every symbol to fetch data from Yahoo Finance based on company's symbol.

In [None]:
yf_symbols = [symbol + '.NS' for symbol in list_of_symbols]
yf_symbols

['ADANIPORTS.NS',
 'ASIANPAINT.NS',
 'AXISBANK.NS',
 'BAJAJ-AUTO.NS',
 'BAJFINANCE.NS',
 'BAJAJFINSV.NS',
 'BPCL.NS',
 'BHARTIARTL.NS',
 'BRITANNIA.NS',
 'CIPLA.NS',
 'COALINDIA.NS',
 'DIVISLAB.NS',
 'DRREDDY.NS',
 'EICHERMOT.NS',
 'GRASIM.NS',
 'HCLTECH.NS',
 'HDFCBANK.NS',
 'HDFCLIFE.NS',
 'HEROMOTOCO.NS',
 'HINDALCO.NS',
 'HINDUNILVR.NS',
 'HDFC.NS',
 'ICICIBANK.NS',
 'ITC.NS',
 'IOC.NS',
 'INDUSINDBK.NS',
 'INFY.NS',
 'JSWSTEEL.NS',
 'KOTAKBANK.NS',
 'LT.NS',
 'M&M.NS',
 'MARUTI.NS',
 'NTPC.NS',
 'NESTLEIND.NS',
 'ONGC.NS',
 'POWERGRID.NS',
 'RELIANCE.NS',
 'SBILIFE.NS',
 'SHREECEM.NS',
 'SBIN.NS',
 'SUNPHARMA.NS',
 'TCS.NS',
 'TATACONSUM.NS',
 'TATAMOTORS.NS',
 'TATASTEEL.NS',
 'TECHM.NS',
 'TITAN.NS',
 'UPL.NS',
 'ULTRACEMCO.NS',
 'WIPRO.NS']

**Now, we'll define some variables and fetch the corresponding data from Yahoo Finance for the companies in our list** 

> **NOTE**: The cell below takes a long time to run, because it queries Yahoo Finance once for every company.

In [None]:
import yfinance as yf

# Financial attributes to collect for every ticker
fields = [
    'marketCap', 'regularMarketVolume', 'earningsQuarterlyGrowth', 'bookValue',
    'totalRevenue', 'returnOnAssets', 'profitMargins', 'earningsGrowth',
]
stock_financials = {field: [] for field in fields}

for ticker in yf_symbols:
    # .info is fetched separately for each ticker
    stock_info = yf.Ticker(ticker).info
    for field in fields:
        # .get() stores None (shown as NaN in the DataFrame) when a field is missing
        stock_financials[field].append(stock_info.get(field))

**Q. What have we collected?**
We've collected attributes such as '**marketCap**' and '**regularMarketVolume**' for each of the companies listed in the variable '**yf_symbols**'.
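To make the structure of the collected data concrete, here is a small sketch that needs no network access. The tickers `AAA.NS` and `BBB.NS` and the `fake_info` dictionary are made up for illustration and merely stand in for what `yf.Ticker(ticker).info` returns. Reading fields with `dict.get` is an assumption of this sketch: it stores `None` for a missing field, which pandas later shows as `NaN`.

```python
# Hypothetical responses standing in for yf.Ticker(ticker).info
fake_info = {
    'AAA.NS': {'marketCap': 100, 'bookValue': 12.5},
    'BBB.NS': {'marketCap': 250},  # bookValue is missing for this ticker
}

fields = ['marketCap', 'bookValue']
collected = {field: [] for field in fields}

for ticker, info in fake_info.items():
    for field in fields:
        # dict.get returns None instead of raising KeyError for a missing field
        collected[field].append(info.get(field))

print(collected)
# {'marketCap': [100, 250], 'bookValue': [12.5, None]}
```

Each key ends up with one value per company, in the same order as the tickers, which is the shape `pd.DataFrame` expects.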

In [None]:
df = pd.DataFrame(stock_financials)
df.head()

Unnamed: 0,marketCap,regularMarketVolume,earningsQuarterlyGrowth,bookValue,totalRevenue,returnOnAssets,profitMargins,earningsGrowth
0,1725595123712,6085264,-0.179,180.288,159007899648,,0.28442,-0.207
1,3206982664192,774333,0.789,143.991,319870205952,,0.10876,0.788
2,2322266062848,7499675,0.859,400.581,433654104064,0.01474,0.37225,0.855
3,1022417240064,591488,-0.006,1033.043,348966289408,,0.17649,-0.007
4,4447297077248,991031,1.59,724.584,202325999616,,0.42614,1.586


In [None]:
df.shape

(50, 8)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   marketCap                50 non-null     int64  
 1   regularMarketVolume      50 non-null     int64  
 2   earningsQuarterlyGrowth  47 non-null     float64
 3   bookValue                49 non-null     float64
 4   totalRevenue             50 non-null     int64  
 5   returnOnAssets           17 non-null     float64
 6   profitMargins            50 non-null     float64
 7   earningsGrowth           47 non-null     float64
dtypes: float64(5), int64(3)
memory usage: 3.2 KB


**Downloading stock price data**
- Purpose of this data: Getting returns of the stocks 

In [None]:
import yfinance as yf

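stock_prices = yf.download(yf_symbols, start='2020-01-01')['Adj Close']
# The downloaded columns come back sorted by ticker, not in the order of yf_symbols,
# so select them in our order before relabelling them with the plain symbols
stock_prices = stock_prices[yf_symbols]
stock_prices.columns = list_of_symbols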

[*********************100%***********************]  50 of 50 completed


- 'Adj Close' is the adjusted closing price used in the stock market: the closing price after adjustments for all applicable splits and dividend distributions.

In [None]:
stock_prices.shape

(689, 50)

In [None]:
stock_prices.tail()

Date,ADANIPORTS,ASIANPAINT,AXISBANK,BAJAJ-AUTO,BAJFINANCE,BAJAJFINSV,BPCL,BHARTIARTL,BRITANNIA,CIPLA,...,SUNPHARMA,TCS,TATACONSUM,TATAMOTORS,TATASTEEL,TECHM,TITAN,UPL,ULTRACEMCO,WIPRO
2022-09-30 00:00:00+05:30,820.650024,3342.449951,733.200012,3527.75,1678.349976,7335.75,799.900024,304.799988,3843.050049,1114.949951,...,948.650024,802.849976,404.600006,99.300003,3004.550049,1008.599976,2606.949951,6255.100098,672.049988,394.25
2022-10-03 00:00:00+05:30,784.400024,3302.899902,722.75,3515.350098,1646.599976,7171.799805,803.349976,308.799988,3768.949951,1130.75,...,944.5,777.950012,397.649994,98.349998,2984.949951,1005.5,2574.199951,6242.549805,665.099976,394.5
2022-10-04 00:00:00+05:30,823.049988,3337.75,742.799988,3579.600098,1701.150024,7488.700195,808.700012,311.450012,3818.149902,1144.650024,...,944.549988,785.150024,407.899994,101.150002,3091.149902,1028.300049,2585.100098,6307.549805,689.950012,405.5
2022-10-06 00:00:00+05:30,824.099976,3328.949951,755.099976,3594.649902,1710.550049,7404.149902,788.599976,312.149994,3767.550049,1134.449951,...,953.900024,794.299988,414.100006,103.550003,3101.949951,1031.300049,2592.850098,6287.25,686.0,410.149994
2022-10-07 00:00:00+05:30,816.900024,3343.699951,755.700012,3603.550049,1699.0,7345.149902,793.099976,307.399994,3785.649902,1130.5,...,955.150024,780.900024,412.149994,103.300003,3064.899902,1023.849976,2730.5,6203.5,690.450012,408.100006


In [None]:
# Restrict the prices to the calendar year 2020
price_2020 = stock_prices.loc["2020-01-02 00:00:00":"2020-12-31 00:00:00"]

# % growth of a stock between two dates: (end_price / start_price - 1) * 100
# Note: the end date used here is 2020-08-04, so 'returns_2020' is the return from 2020-01-02 to 2020-08-04
stock_prices.loc['returns_2020'] = (price_2020.loc['2020-08-04 00:00:00'] / price_2020.loc['2020-01-02 00:00:00'] - 1)*100

stock_prices

Date,ADANIPORTS,ASIANPAINT,AXISBANK,BAJAJ-AUTO,BAJFINANCE,BAJAJFINSV,BPCL,BHARTIARTL,BRITANNIA,CIPLA,...,SUNPHARMA,TCS,TATACONSUM,TATAMOTORS,TATASTEEL,TECHM,TITAN,UPL,ULTRACEMCO,WIPRO
2020-01-01 00:00:00+05:30,373.521973,1770.856567,748.700012,2913.946289,930.323730,4214.786133,451.678009,401.235748,2883.702881,472.062744,...,422.185242,319.631714,184.449997,27.305548,2100.150635,705.708984,1143.308838,4032.545898,574.207214,245.750504
2020-01-02 00:00:00+05:30,378.961853,1768.338379,756.950012,2887.027588,942.936829,4229.478516,453.571259,397.889709,2896.747314,469.682129,...,422.817108,319.138245,193.750000,28.303785,2090.509521,709.366638,1144.150269,4210.910156,581.235352,246.345795
2020-01-03 00:00:00+05:30,378.318970,1729.577393,742.950012,2841.747314,927.089966,4177.084473,453.471588,394.584534,2882.896729,466.160767,...,432.197937,314.055328,191.100006,28.236650,2132.171631,717.747009,1128.407837,4185.515137,576.647583,249.123764
2020-01-06 00:00:00+05:30,376.044098,1685.878784,723.250000,2809.926025,897.018066,3981.102051,448.041077,382.873505,2867.005859,462.986542,...,427.677612,308.034790,185.649994,27.626617,2131.977783,713.394836,1147.120728,4123.910645,570.986023,250.165482
2020-01-07 00:00:00+05:30,380.593811,1702.913940,725.750000,2810.203613,902.270569,3992.009277,443.507385,376.222229,2880.809570,464.821655,...,433.947723,310.798309,184.699997,27.792992,2137.209717,719.599121,1148.457153,4208.231934,580.844910,253.191483
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-10-03 00:00:00+05:30,784.400024,3302.899902,722.750000,3515.350098,1646.599976,7171.799805,803.349976,308.799988,3768.949951,1130.750000,...,944.500000,777.950012,397.649994,98.349998,2984.949951,1005.500000,2574.199951,6242.549805,665.099976,394.500000
2022-10-04 00:00:00+05:30,823.049988,3337.750000,742.799988,3579.600098,1701.150024,7488.700195,808.700012,311.450012,3818.149902,1144.650024,...,944.549988,785.150024,407.899994,101.150002,3091.149902,1028.300049,2585.100098,6307.549805,689.950012,405.500000
2022-10-06 00:00:00+05:30,824.099976,3328.949951,755.099976,3594.649902,1710.550049,7404.149902,788.599976,312.149994,3767.550049,1134.449951,...,953.900024,794.299988,414.100006,103.550003,3101.949951,1031.300049,2592.850098,6287.250000,686.000000,410.149994
2022-10-07 00:00:00+05:30,816.900024,3343.699951,755.700012,3603.550049,1699.000000,7345.149902,793.099976,307.399994,3785.649902,1130.500000,...,955.150024,780.900024,412.149994,103.300003,3064.899902,1023.849976,2730.500000,6203.500000,690.450012,408.100006
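The percentage-growth formula used for the `returns_2020` row can be checked on a toy example. The helper name `percent_return` and the prices below are hypothetical, chosen only so that the arithmetic is easy to verify by hand.

```python
def percent_return(start_price, end_price):
    """Percentage change from start_price to end_price."""
    return (end_price / start_price - 1) * 100

# Hypothetical prices, not taken from the dataset
print(percent_return(200.0, 250.0))  # 25.0  (price rose by a quarter)
print(percent_return(200.0, 150.0))  # -25.0 (price fell by a quarter)
```

A negative value therefore means the stock was cheaper on the end date than on the start date.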


In [None]:
stock_prices = stock_prices.transpose()
stock_prices.head()

Date,2020-01-01 00:00:00+05:30,2020-01-02 00:00:00+05:30,2020-01-03 00:00:00+05:30,2020-01-06 00:00:00+05:30,2020-01-07 00:00:00+05:30,2020-01-08 00:00:00+05:30,2020-01-09 00:00:00+05:30,2020-01-10 00:00:00+05:30,2020-01-13 00:00:00+05:30,2020-01-14 00:00:00+05:30,...,2022-09-26 00:00:00+05:30,2022-09-27 00:00:00+05:30,2022-09-28 00:00:00+05:30,2022-09-29 00:00:00+05:30,2022-09-30 00:00:00+05:30,2022-10-03 00:00:00+05:30,2022-10-04 00:00:00+05:30,2022-10-06 00:00:00+05:30,2022-10-07 00:00:00+05:30,returns_2020
ADANIPORTS,373.521973,378.961853,378.31897,376.044098,380.593811,381.434509,387.517273,387.764557,386.08316,384.94574,...,863.400024,844.200012,827.099976,816.400024,820.650024,784.400024,823.049988,824.099976,816.900024,-16.891368
ASIANPAINT,1770.856567,1768.338379,1729.577393,1685.878784,1702.91394,1707.259033,1750.463867,1770.214722,1782.410889,1796.483398,...,3438.050049,3470.649902,3570.649902,3384.800049,3342.449951,3302.899902,3337.75,3328.949951,3343.699951,-3.813726
AXISBANK,748.700012,756.950012,742.950012,723.25,725.75,724.5,742.849976,740.049988,737.400024,747.900024,...,742.599976,737.5,716.450012,719.0,733.200012,722.75,742.799988,755.099976,755.700012,-43.305372
BAJAJ-AUTO,2913.946289,2887.027588,2841.747314,2809.926025,2810.203613,2829.906982,2854.374023,2868.711914,2862.098145,2869.082275,...,3574.5,3541.899902,3545.899902,3476.699951,3527.75,3515.350098,3579.600098,3594.649902,3603.550049,-0.566448
BAJFINANCE,930.32373,942.936829,927.089966,897.018066,902.270569,907.215576,931.990356,929.70874,937.9422,947.822266,...,1676.800049,1679.550049,1652.199951,1635.900024,1678.349976,1646.599976,1701.150024,1710.550049,1699.0,-34.809406


**Putting the data together**

In [None]:
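# The last column of the transposed frame is the 'returns_2020' row added earlier
returns_2020 = stock_prices.iloc[:, -1]
# Label the financials with the stock symbols (both frames follow the order of list_of_symbols)
df.index = stock_prices.index
df['return_2020'] = returns_2020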
df.head()

Unnamed: 0,marketCap,regularMarketVolume,earningsQuarterlyGrowth,bookValue,totalRevenue,returnOnAssets,profitMargins,earningsGrowth,return_2020
ADANIPORTS,1725595123712,6085264,-0.179,180.288,159007899648,,0.28442,-0.207,-16.891368
ASIANPAINT,3206982664192,774333,0.789,143.991,319870205952,,0.10876,0.788,-3.813726
AXISBANK,2322266062848,7499675,0.859,400.581,433654104064,0.01474,0.37225,0.855,-43.305372
BAJAJ-AUTO,1022417240064,591488,-0.006,1033.043,348966289408,,0.17649,-0.007,-0.566448
BAJFINANCE,4447297077248,991031,1.59,724.584,202325999616,,0.42614,1.586,-34.809406


**Checking for null values**

In [None]:
df.isna().sum()

marketCap                   0
regularMarketVolume         0
earningsQuarterlyGrowth     3
bookValue                   1
totalRevenue                0
returnOnAssets             33
profitMargins               0
earningsGrowth              3
return_2020                 0
dtype: int64

In [None]:
# Treat a missing returnOnAssets as 0, so that these rows are not dropped below
df['returnOnAssets'] = df['returnOnAssets'].fillna(0)

In [None]:
df.dropna(axis=0, inplace=True)
df.shape

(46, 9)

In [None]:
df.head()

Unnamed: 0,marketCap,regularMarketVolume,earningsQuarterlyGrowth,bookValue,totalRevenue,returnOnAssets,profitMargins,earningsGrowth,return_2020
ADANIPORTS,1725595123712,6085264,-0.179,180.288,159007899648,0.0,0.28442,-0.207,-16.891368
ASIANPAINT,3206982664192,774333,0.789,143.991,319870205952,0.0,0.10876,0.788,-3.813726
AXISBANK,2322266062848,7499675,0.859,400.581,433654104064,0.01474,0.37225,0.855,-43.305372
BAJAJ-AUTO,1022417240064,591488,-0.006,1033.043,348966289408,0.0,0.17649,-0.007,-0.566448
BAJFINANCE,4447297077248,991031,1.59,724.584,202325999616,0.0,0.42614,1.586,-34.809406


**Q. Should we Scale the values?**

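- The financial metrics are on very different scales: values of `marketCap` are around 10^12, while values of `profitMargins` are below 1.
- Clustering relies on distances, so without scaling, the features with large values would dominate.
- We therefore need to put all the features on the same scale.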
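As a rough sketch of what putting features on the same scale means, standardization subtracts each column's mean and divides by its standard deviation. The helper `standardize` and the two small columns below are hypothetical; the population standard deviation is used because that is the convention `StandardScaler` follows.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (ddof=0)
    return [(v - mu) / sigma for v in values]

# Hypothetical feature columns on very different scales
market_cap = [1.0e12, 3.0e12, 5.0e12]
profit_margin = [0.10, 0.30, 0.50]

print(standardize(market_cap))
print(standardize(profit_margin))
```

Both columns end up with (almost) identical standardized values, even though their raw magnitudes differ by a factor of about 10^13.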

We can use `StandardScaler` from `sklearn`

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Learn each column's mean and standard deviation, then standardize in one step
X = scaler.fit_transform(df)

In [None]:
scaled_df = pd.DataFrame(X, columns=df.columns, index=df.index)

In [None]:
scaled_df

Unnamed: 0,marketCap,regularMarketVolume,earningsQuarterlyGrowth,bookValue,totalRevenue,returnOnAssets,profitMargins,earningsGrowth,return_2020
ADANIPORTS,-0.374159,0.297884,-0.736736,-0.470171,-0.642466,-0.480395,1.20564,-0.777249,-0.360298
ASIANPAINT,0.122779,-0.573685,0.311996,-0.526754,-0.507004,-0.480395,-0.509499,0.321719,0.112946
AXISBANK,-0.174003,0.530001,0.387834,-0.126762,-0.411186,-0.163965,2.063209,0.39572,-1.316149
BAJAJ-AUTO,-0.610044,-0.603692,-0.549308,0.859167,-0.482502,-0.480395,0.151815,-0.556351,0.230456
BAJFINANCE,0.538849,-0.538123,1.1798,0.378318,-0.605988,-0.480395,2.589389,1.203102,-1.008703
BAJAJFINSV,-0.045028,-0.512041,0.076898,-0.356969,-0.195759,-0.480395,-0.858658,0.085358,-0.603871
BHARTIARTL,0.630661,0.287977,4.51451,-0.574809,0.259542,-0.480395,-1.128632,4.379617,-0.169372
BRITANNIA,-0.64718,-0.648503,-0.687983,-0.58566,-0.655178,-0.480395,-0.572281,-0.696621,1.206772
CIPLA,-0.646965,-0.50664,-0.586144,-0.348527,-0.594185,-0.480395,-0.448376,-0.592799,2.102
COALINDIA,-0.478156,1.701987,1.393231,-0.642087,0.182809,-0.480395,0.401968,1.426209,-0.96165


The data above is the same dataset that was used in the lecture to build the dendrogram in Hierarchical Clustering.

## Pros and Cons of different methods to update Proximity Matrix


- When updating the Proximity Matrix in Agglomerative clustering, the distance between two clusters can be measured in different ways (minimum distance, maximum distance, group average, Ward's distance).

**How can one know which one to use?**

- The points below show how each of these distance measures can or cannot handle certain distributions of data.

- Based on the distribution of your data, or on the domain you are working in, choose the measure that best fits your data.


### **Minimum Distance**

- Can handle non-elliptical shapes
<img src="https://drive.google.com/uc?export=view&id=1Hpa5f_DXuHlpckvA9TikTT7U7UcrlgeH">

- But, sensitive to noise and outliers
<img src="https://drive.google.com/uc?export=view&id=1HsIEDFSGeSyMWM4HOskfMit2Vfrt84BD">

### **Maximum Distance**

- Pros:
  - Less susceptible to noise and outliers

<img src="https://drive.google.com/uc?export=view&id=1gaLw7tsMa1UVxZD0Ly0uvA5Dr_TxLir3">

- Cons:
  - It tends to break large clusters
  - Biased towards globular clusters
  - This was also a limitation of K-Means

<img src="https://drive.google.com/uc?export=view&id=10NeziNAWbbLQOkNvKhOOTM7BymgFoFbf">

### **Tradeoff between MIN and MAX**

- Using **Group average**
- Using **Ward's distance** (Scikit-Learn's default linkage in agglomerative clustering)

- Pros:
  - Less susceptible to noise and outliers
- Cons:
  - Biased towards globular clusters
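The four ways of measuring the distance between two clusters can be compared on a toy example. This is a minimal sketch in plain Python: the function names and the points are hypothetical, and Ward's distance is computed here as the increase in within-cluster sum of squares caused by a merge.

```python
import math
from itertools import product

def pairwise(a, b):
    """All point-to-point Euclidean distances between clusters a and b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def min_distance(a, b):
    """Minimum distance: closest pair of points across the two clusters."""
    return min(pairwise(a, b))

def max_distance(a, b):
    """Maximum distance: farthest pair of points across the two clusters."""
    return max(pairwise(a, b))

def group_average(a, b):
    """Group average: mean of all cross-cluster distances."""
    d = pairwise(a, b)
    return sum(d) / len(d)

def ward_increase(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    ca = [sum(x) / len(a) for x in zip(*a)]
    cb = [sum(x) / len(b) for x in zip(*b)]
    return len(a) * len(b) / (len(a) + len(b)) * math.dist(ca, cb) ** 2

# Two hypothetical, well separated clusters
a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 2)]
print(min_distance(a, b))   # 3.0
print(max_distance(a, b))
print(group_average(a, b))
print(ward_increase(a, b))  # 9.0

# One hypothetical noise point lying between the clusters, attached to b
b_noisy = b + [(1, 1)]
print(min_distance(a, b_noisy))   # drops sharply: MIN is sensitive to noise
print(max_distance(a, b_noisy))   # unchanged: MAX ignores the noise point
print(group_average(a, b_noisy))  # changes moderately
```

The noise point pulls the minimum distance down from 3.0 to about 1.41, while the maximum distance stays the same, which matches the sensitivity to noise and outliers listed above.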