# How KNN Imputation Works

In [1]:
# Import the necessary libraries
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Marks for four students; np.nan marks a missing value.
# Named `data` rather than `dict` to avoid shadowing the built-in type.
data = {'Maths': [80, 90, np.nan, 95],
        'Chemistry': [60, 65, 56, np.nan],
        'Physics': [np.nan, 57, 80, 78],
        'Biology': [78, 83, 67, np.nan]}

before_imputation = pd.DataFrame(data)

print("Data Before performing imputation\n", before_imputation)

# Each missing value is replaced by the mean of that column over the
# 2 nearest rows (neighbours) that have a value in that column.
imputer = KNNImputer(n_neighbors=2)
after_imputation = imputer.fit_transform(before_imputation)

print("\n\nAfter performing imputation\n", after_imputation)

Data Before performing imputation
    Maths  Chemistry  Physics  Biology
0   80.0       60.0      NaN     78.0
1   90.0       65.0     57.0     83.0
2    NaN       56.0     80.0     67.0
3   95.0        NaN     78.0      NaN


After performing imputation
 [[80.  60.  68.5 78. ]
 [90.  65.  57.  83. ]
 [87.5 56.  80.  67. ]
 [95.  58.  78.  72.5]]
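
`fit_transform` returns a plain NumPy array, which is why the column names are missing from the printed result. A minimal sketch of wrapping the result back into a DataFrame follows; the variable names `before` and `after` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

before = pd.DataFrame({'Maths': [80, 90, np.nan, 95],
                       'Chemistry': [60, 65, 56, np.nan],
                       'Physics': [np.nan, 57, 80, 78],
                       'Biology': [78, 83, 67, np.nan]})

imputer = KNNImputer(n_neighbors=2)

# Re-attach the original column labels and row index to the imputed array
after = pd.DataFrame(imputer.fit_transform(before),
                     columns=before.columns,
                     index=before.index)
print(after)
```

Keeping the labels makes it easier to compare each filled cell with the hand calculation for its column.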


------------------------------------------------------------------
|Index| Maths | Chemistry | Physics | Biology | Similarity Score |
|-----|-------|-----------|---------|---------|------------------|
|0    |80     |60         |NaN      |78       |14.33             |
|1    |90     |65         |57       |83       |29.42             |
|2    |NaN    |56         |80       |67       |Target row        |
|3    |95     |NaN        |78       |NaN      |3.46              |
------------------------------------------------------------------

<div style="font-size: 150%;">

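Distance(x, y) = $\sqrt{\text{Weight} \times \sum_{i \in P} (x_i - y_i)^2}$

where $P$ is the set of coordinates (columns) that are present in both rows, and

Weight = $\frac{\text{Total number of coordinates}}{\text{Number of present coordinates}}$

A smaller distance means the two rows are more similar.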

</div>
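
The formula can be sketched as a small pure-Python function. This is an illustration under the notebook's convention (the column being imputed is dropped before the rows are passed in); the function name `nan_distance` is an assumption, not a library call.

```python
import math

def nan_distance(x, y):
    """Weighted Euclidean distance over the coordinates present in both rows."""
    # Keep only coordinate pairs where neither value is missing
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    if not pairs:
        return math.nan  # no shared coordinates, distance is undefined
    weight = len(x) / len(pairs)  # total coordinates / present coordinates
    return math.sqrt(weight * sum((a - b) ** 2 for a, b in pairs))

# Rows at index 2 and index 0 with the Maths column left out:
# (Chemistry, Physics, Biology)
row_2 = [56, 80, 67]
row_0 = [60, math.nan, 78]
print(nan_distance(row_2, row_0))  # sqrt(3/2 * (16 + 121))
```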



# *************************************************************

# *******Handle NaN For Maths*******

In [2]:
# The target row is index 2 (its Maths value is missing).
# Measure how close the target row is to every other row using the distance above.
# Start with the distance between index 2 and index 0.
# The target column (Maths) is not counted when computing the weight, so the total number of coordinates is 3.

------------------------------------------------------------------
|Index| Maths | Chemistry | Physics | Biology | Similarity Score |
|-----|-------|-----------|---------|---------|------------------|
|2    |NaN    |56         |80       |67       |Target row        |

----------------------------------------------------------
|Index| Chemistry | Physics | Biology | Similarity Score |
|-----|-----------|---------|---------|------------------|
|0    |60         |NaN      |78       |14.33             |
|1    |65         |57       |83       |29.42             |
|3    |NaN        |78       |NaN      |3.46              |
----------------------------------------------------------


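Distance(x, y) = $\sqrt{\text{Weight} \times \sum_{i \in P} (x_i - y_i)^2}$, where $P$ is the set of columns present in both rows


Chemistry and Biology are present in both rows (Physics is missing at index 0), so Weight = 3/2:

D = $\sqrt{\frac{3}{2} \times \left[(56 - 60)^2 + (67 - 78)^2\right]}$ = $\sqrt{\frac{3}{2} \times (16 + 121)}$ = $\sqrt{\frac{411}{2}}$ = 14.33
- similarity score value from target to index 0 --> (14.33)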

In [3]:
# Now find the distance from index 1
# Distance between index 2 to index 1

Chemistry, Physics and Biology are all present in both rows, so Weight = 3/3:

D = $\sqrt{\frac{3}{3} \times \left[(56 - 65)^2 + (80 - 57)^2 + (67 - 83)^2\right]}$ = $\sqrt{(-9)^2 + (23)^2 + (-16)^2}$ = $\sqrt{81 + 529 + 256}$ = $\sqrt{866}$ = 29.42
- similarity score value from target to index 1 --> (29.42)

In [4]:
# Now find the distance from index 3
# Distance between index 2 to index 3

Only Physics is present in both rows, so Weight = 3/1:

D = $\sqrt{\frac{3}{1} \times (80 - 78)^2}$ = $\sqrt{3 \times 4}$ = $\sqrt{12}$ = 3.46
- similarity score value from target to index 3 --> (3.46)
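
As a cross-check, scikit-learn exposes this metric as `nan_euclidean_distances`. One caveat: it takes the total number of coordinates to be all 4 columns rather than 3, so its values are larger than the hand-computed ones by a constant factor of $\sqrt{4/3}$. The ranking of the neighbours is unaffected. The sketch below assumes the same four rows as the example data.

```python
import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances

X = np.array([[80, 60, np.nan, 78],
              [90, 65, 57, 83],
              [np.nan, 56, 80, 67],
              [95, np.nan, 78, np.nan]], dtype=float)

# Distances from the target row (index 2) to rows 0, 1 and 3
d = nan_euclidean_distances(X[[2]], X[[0, 1, 3]])[0]
print(d)

# Rescale to the weight convention used in the hand calculation (3 coordinates)
print(d / np.sqrt(4 / 3))
```

After rescaling, the three values agree with the distances worked out by hand for index 0, index 1 and index 3.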

# Conclusion 
- The two rows closest to the target (index 2) are index 3 (3.46) and index 0 (14.33).
- The missing Maths value is filled with the mean of the Maths values at index 0 and index 3:
- (80 + 95)/2 = `87.5`
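
The selection step can be sketched in a few lines: sort the candidate rows by distance, keep the `k` nearest, and average their Maths values. The dictionaries below simply restate the numbers derived for the Maths column; their names are illustrative.

```python
# Distance from the target row (index 2) to each other row
distances = {0: 14.33, 1: 29.42, 3: 3.46}
# Maths value of each candidate row
maths = {0: 80, 1: 90, 3: 95}

k = 2
# Row indices ordered from nearest to farthest, truncated to k neighbours
nearest = sorted(distances, key=distances.get)[:k]
imputed = sum(maths[i] for i in nearest) / k

print(nearest, imputed)  # [3, 0] 87.5
```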

# *************************************************************

# *******Handle NaN For Chemistry*******

------------------------------------------------------------------
|Index| Maths | Chemistry | Physics | Biology | Similarity Score |
|-----|-------|-----------|---------|---------|------------------|
|0    |80     |60         |NaN      |78       |25.98             |
|1    |90     |65         |57       |83       |26.43             |
|2    |NaN    |56         |80       |67       |3.46              |
|3    |95     |NaN        |78       |NaN      |Target row (Chemistry)|
------------------------------------------------------------------

- <b>With index 0</b>

D = $\sqrt{\frac{3}{1} \times (95 - 80)^2}$ = $\sqrt{\text{3} \times 225} $ = $\sqrt{\text{675}} $ = 25.98
- similarity score value from target to index 0 --> (25.98)

- <b>With index 1</b>

D = $\sqrt{\frac{3}{2} \times \left[(95 - 90)^2 + (78 - 57)^2\right]}$ = $\sqrt{\frac{3}{2} \times (25 + 441)}$ = $\sqrt{\frac{1398}{2}}$ = 26.43
- similarity score value from target to index 1 --> (26.43)

- <b>With index 2</b>

D = $\sqrt{\frac{3}{1} \times (78 - 80)^2}$ = $\sqrt{\text{3} \times 4} $ = $\sqrt{\text{12}} $ = 3.46
- similarity score value from target to index 2 --> (3.46)

# Conclusion 
- The two rows closest to the `Target` (index 3) are index 2 (3.46) and index 0 (25.98).
- The missing Chemistry value is filled with the mean of the Chemistry values at index 0 and index 2:
- (60 + 56)/2 = `58`

# *************************************************************

# *******Handle NaN For Biology*******

------------------------------------------------------------------
|Index| Maths | Chemistry | Physics | Biology | Similarity Score |
|-----|-------|-----------|---------|---------|------------------|
|0    |80     |60         |NaN      |78       |25.98             |
|1    |90     |65         |57       |83       |26.43             |
|2    |NaN    |56         |80       |67       |3.46              |
|3    |95     |NaN        |78       |NaN      |Target row (Biology)|
------------------------------------------------------------------

- <b>With index 0</b>

D = $\sqrt{\frac{3}{1} \times (95 - 80)^2}$ = $\sqrt{\text{3} \times 225} $ = $\sqrt{\text{675}} $ = 25.98
- similarity score value from target to index 0 --> (25.98)

- <b>With index 1</b>

D = $\sqrt{\frac{3}{2} \times \left[(95 - 90)^2 + (78 - 57)^2\right]}$ = $\sqrt{\frac{3}{2} \times (25 + 441)}$ = $\sqrt{\frac{1398}{2}}$ = 26.43
- similarity score value from target to index 1 --> (26.43)

- <b>With index 2</b>

D = $\sqrt{\frac{3}{1} \times (78 - 80)^2}$ = $\sqrt{\text{3} \times 4} $ = $\sqrt{\text{12}} $ = 3.46
- similarity score value from target to index 2 --> (3.46)

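# Conclusion 
- The target row is again index 3, and only its Maths and Physics values are present, so the distances are the same as in the Chemistry case.
- The two rows closest to the `Target` are index 2 (3.46) and index 0 (25.98).
- The missing Biology value is filled with the mean of the Biology values at index 0 and index 2:
- (78 + 67)/2 = `72.5`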
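
Because Chemistry and Biology are both missing in the same row, the distances only need to be computed once. A vectorised NumPy sketch of that idea follows, using the notebook's weight convention (3 coordinates in total); it assumes every neighbour has a value in the column being filled, which holds for this data.

```python
import numpy as np

X = np.array([[80, 60, np.nan, 78],
              [90, 65, 57, 83],
              [np.nan, 56, 80, 67],
              [95, np.nan, 78, np.nan]], dtype=float)

target, others = 3, np.array([0, 1, 2])
n_total = X.shape[1] - 1  # the column being imputed is not counted

# Differences to the target row; NaN wherever either row is missing a value
diffs = X[others] - X[target]
present = ~np.isnan(diffs)
dist = np.sqrt(n_total / present.sum(axis=1) * np.nansum(diffs ** 2, axis=1))

# The two nearest rows, reused for both missing columns
nearest = others[np.argsort(dist)[:2]]
chemistry = X[nearest, 1].mean()
biology = X[nearest, 3].mean()
print(nearest, chemistry, biology)
```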


# *************************************************************

# *******Handle NaN For Physics*******

------------------------------------------------------------------
|Index| Maths | Chemistry | Physics | Biology | Similarity Score |
|-----|-------|-----------|---------|---------|------------------|
|0    |80     |60         |NaN      |78       |Target row (Physics)|
|1    |90     |65         |57       |83       |12.24             |
|2    |NaN    |56         |80       |67       |14.33             |
|3    |95     |NaN        |78       |NaN      |25.98             |
------------------------------------------------------------------

- with index 1

D = $\sqrt{\frac{3}{3} \times \left[(80 - 90)^2 + (60 - 65)^2 + (78 - 83)^2\right]}$ = $\sqrt{(-10)^2 + (-5)^2 + (-5)^2}$ = $\sqrt{\text{100+25+25}} $ = $\sqrt{\text{150}} $ = 12.24
- similarity score value from target to index 1 -->(12.24)

- with index 2

D = $\sqrt{\frac{3}{2} \times \left[(60 - 56)^2 + (78 - 67)^2\right]}$ = $\sqrt{\frac{3}{2} \times (16 + 121)}$ = $\sqrt{\frac{411}{2}}$ = 14.33
- similarity score value from target to index 2 --> (14.33)

- with index 3

D = $\sqrt{\frac{3}{1} \times (80 - 95)^2}$ = $\sqrt{\text{3} \times 225} $ = $\sqrt{\text{675}} $ = 25.98
- similarity score value from target to index 3 --> (25.98)

# Conclusion 
- The two rows closest to the `Target` (index 0) are index 1 (12.24) and index 2 (14.33).
- The missing Physics value is filled with the mean of the Physics values at index 1 and index 2:
- (57 + 80)/2 = `68.5`
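
Putting the four walkthroughs together, the whole procedure can be sketched as one function and compared with the `KNNImputer` output printed at the top. This is a simplified illustration of the hand calculation, not scikit-learn's implementation; the name `knn_impute_sketch` is hypothetical, and the weight uses the notebook's convention of excluding the column being imputed.

```python
import numpy as np

def knn_impute_sketch(X, k=2):
    """Fill each NaN with the column mean of the k nearest rows."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    filled = X.copy()
    for r, c in zip(*np.where(np.isnan(X))):
        scored = []
        for d in range(n_rows):
            # A donor must be another row with a value in column c
            if d == r or np.isnan(X[d, c]):
                continue
            diff = X[r] - X[d]          # NaN where either row is missing
            present = ~np.isnan(diff)
            if not present.any():
                continue                # no shared coordinates
            weight = (n_cols - 1) / present.sum()
            scored.append((np.sqrt(weight * np.nansum(diff ** 2)), d))
        nearest = [d for _, d in sorted(scored)[:k]]
        filled[r, c] = X[nearest, c].mean()
    return filled

X = [[80, 60, np.nan, 78],
     [90, 65, 57, 83],
     [np.nan, 56, 80, 67],
     [95, np.nan, 78, np.nan]]

result = knn_impute_sketch(X)
print(result)
```

Distances are always measured on the original data, so a value filled in for one cell never influences the neighbours chosen for another.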

# *************************************************************