#### `Question: Randomly generate a matrix of shape (1 million, 2) and perform the operations below`:
- a. **Find the distance of each 2-dimensional data point from the centroid (i.e. the mean) of the dataset, and append the distances to the dataset as a new column.**
- b. **Given any data point, find its 3 nearest neighbors.**

### `a ---------------------------------------------------------------------------`

In [16]:
import numpy as np

#### `Generate an array`

In [17]:
arr = np.random.randint(1000,size = (1000000,2))
arr

array([[594,  10],
       [717, 956],
       [689, 667],
       ...,
       [604, 533],
       [250, 434],
       [539, 701]])

#### `Check the shape`

In [18]:
arr.shape

(1000000, 2)

#### `Find the centroid`

In [19]:
centroid = np.mean(arr,axis=0)  # mean of each of the 2 columns separately
centroid

array([500.088819, 499.696821])

#### `Use broadcasting to subtract the centroid from every row without a Python loop`

In [20]:
arr - centroid

array([[  93.911181, -489.696821],
       [ 216.911181,  456.303179],
       [ 188.911181,  167.303179],
       ...,
       [ 103.911181,   33.303179],
       [-250.088819,  -65.696821],
       [  38.911181,  201.303179]])
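
As a rough illustration of what broadcasting does here, the sketch below compares the broadcast subtraction with an explicit row-by-row loop on a tiny array. The names `pts`, `center`, `diff_broadcast`, and `diff_loop` and the sample values are made up for the example and are not part of the notebook.

```python
import numpy as np

# Small illustrative array (hypothetical values, not the notebook's data)
pts = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
center = pts.mean(axis=0)  # shape (2,)

# Broadcasting: the (2,) centroid is applied to every row of the (3, 2) array
diff_broadcast = pts - center

# Equivalent explicit loop, shown only for comparison
diff_loop = np.array([[row[0] - center[0], row[1] - center[1]] for row in pts])

same = np.allclose(diff_broadcast, diff_loop)
print(same)
```

Both versions do the same number of subtractions. The gain from broadcasting is that the loop runs inside NumPy's compiled code rather than in the Python interpreter, which usually matters a great deal at one million rows.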

#### `Find the distance between each point and centroid`

In [21]:
dist = np.sqrt(np.sum((arr - centroid)**2,axis=1))
dist

array([498.62038307, 505.23563969, 252.34458189, ..., 109.11752961,
       258.57395398, 205.02938785])
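
The same Euclidean distance can probably be written more compactly with `np.linalg.norm` or `np.hypot`. The sketch below checks that all three forms agree on a small random sample; `sample`, `c`, and the seed are assumptions made for the example, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)                   # fixed seed, for reproducibility only
sample = rng.integers(0, 1000, size=(1000, 2))   # hypothetical stand-in for `arr`
c = sample.mean(axis=0)                          # centroid of the sample

# Form used in the notebook: square, sum per row, square root
d_manual = np.sqrt(np.sum((sample - c) ** 2, axis=1))

# Row-wise 2-norm of the difference vectors
d_norm = np.linalg.norm(sample - c, axis=1)

# hypot(x, y) = sqrt(x**2 + y**2), applied to the two columns
d_hypot = np.hypot(sample[:, 0] - c[0], sample[:, 1] - c[1])

print(np.allclose(d_manual, d_norm), np.allclose(d_manual, d_hypot))
```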

#### `Verify the shape`

In [22]:
dist.shape

(1000000,)

#### `Change dimension for stacking`

In [23]:
dist = dist.reshape(-1,1)

### `The distance of every point from the centroid is in the third column`

In [24]:
a = np.hstack([arr,dist])
a

array([[594.        ,  10.        , 498.62038307],
       [717.        , 956.        , 505.23563969],
       [689.        , 667.        , 252.34458189],
       ...,
       [604.        , 533.        , 109.11752961],
       [250.        , 434.        , 258.57395398],
       [539.        , 701.        , 205.02938785]])
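
As an alternative to reshaping before `np.hstack`, `np.column_stack` treats a 1-D array as a column, so the reshape step can be skipped. The sketch below uses small made-up arrays (`points`, `distances`) in place of the notebook's `arr` and `dist`.

```python
import numpy as np

# Hypothetical small stand-ins for the notebook's `arr` and 1-D `dist`
points = np.array([[3, 4], [6, 8], [0, 0]])
distances = np.sqrt(np.sum(points.astype(float) ** 2, axis=1))  # distance from the origin, shape (3,)

# column_stack turns the 1-D array into a column before stacking
stacked = np.column_stack((points, distances))

# Same result via the reshape + hstack route used in the notebook
reshaped = np.hstack([points, distances.reshape(-1, 1)])

print(stacked)
print(stacked.dtype)
```

Either way, the integer coordinates are converted to floats, because a NumPy array holds a single dtype. This is why the first two columns in the output above print as `594.` rather than `594`.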

### `b -----------------------------------------------------------------------------`
[numpy.argsort](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html)

#### `Given any data point, find its 3 nearest neighbors.`

In [25]:
data_point = list(map(float,input().split()))         # read a point from the user as two space-separated numbers, e.g. "250 250"
dist = np.sqrt(np.sum((arr - data_point)**2,axis=1))  # distance from every point in arr to the input point
dist

250 250


array([419.44725533, 846.47799735, 605.48327805, ..., 453.21628391,
       184.        , 535.65100579])

In [31]:
dist = dist.reshape(-1,1)  # change dimension for stacking
b = np.hstack([arr,dist])     # horizontal stacking
b

array([[594.        ,  10.        , 419.44725533],
       [717.        , 956.        , 846.47799735],
       [689.        , 667.        , 605.48327805],
       ...,
       [604.        , 533.        , 453.21628391],
       [250.        , 434.        , 184.        ],
       [539.        , 701.        , 535.65100579]])

In [34]:
nearest_neighbor = b[b[:,2].argsort()]   # sort the rows by distance (third column)
print(nearest_neighbor[0:3,0:2])    # the 3 rows with the smallest distances, i.e. the 3 nearest neighbors

[[250. 250.]
 [250. 251.]
 [251. 250.]]
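
Sorting all one million rows is more work than needed when only three neighbors are wanted. A minimal sketch using `np.argpartition`, which selects the k smallest values without a full sort, might look like the following. The helper `k_nearest` and the `demo` array are hypothetical and not part of the notebook.

```python
import numpy as np

def k_nearest(points, query, k=3):
    """Return the k rows of `points` closest to `query`, and their distances."""
    query = np.asarray(query, dtype=float)
    # Squared distances preserve the ordering, so the square root can wait
    sq_dist = np.sum((points - query) ** 2, axis=1)
    # Indices of the k smallest squared distances, in no particular order
    nearest = np.argpartition(sq_dist, k - 1)[:k]
    # Order just those k indices by distance
    nearest = nearest[np.argsort(sq_dist[nearest])]
    return points[nearest], np.sqrt(sq_dist[nearest])

# Tiny made-up dataset for demonstration
demo = np.array([[0, 0], [10, 10], [1, 0], [0, 2], [5, 5]])
neighbors, neighbor_dist = k_nearest(demo, [0, 0], k=3)
print(neighbors)
print(neighbor_dist)
```

In the output above, the first neighbor is `[250. 250.]`, which is the query point itself, because that point happens to be in the dataset. If the query should not count as its own neighbor, one option is to request `k + 1` rows and drop the one at distance zero.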


## `End of Task -------------------------------------------------------------------- `