## The problem

In the `GenRAPredValue.predict(...)` method, `denom=np.sum(neigh_sim[j])` is meant to sum the similarities across all neighbors of a query point. However, `j` is the index of an output (it ranges over `range(n_outputs)`), so it should not be used to index a query point.

Issues:
- If `n_outputs==1` (which is often the case) but `n_queries>1`, the predictions for every query point other than the one at index 0 are likely miscalculated, because they are divided by the first query point's denominator.
- `predict` raises an `IndexError` when `n_outputs>n_queries`, because `neigh_sim[j]` then indexes past the last query point; this is demonstrated below in this notebook.
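Both failure modes can be reproduced with plain NumPy. This is a minimal sketch that assumes the loop has the shape described above; `predict_buggy`, `neigh_sim`, and `neigh_y` are hypothetical stand-ins, not the library's actual code.

```python
import numpy as np


def predict_buggy(neigh_sim, neigh_y):
    """Hypothetical stand-in for the old loop.

    neigh_sim: (n_queries, n_neighbors) similarities
    neigh_y:   (n_queries, n_neighbors, n_outputs) neighbor targets
    """
    n_queries, _, n_outputs = neigh_y.shape
    y_pred = np.empty((n_queries, n_outputs))
    for j in range(n_outputs):
        # Bug: j is an output index, but it selects a query row here.
        denom = np.sum(neigh_sim[j])
        num = np.sum(neigh_y[:, :, j] * neigh_sim, axis=1)
        y_pred[:, j] = num / denom
    return y_pred


# Case 1: n_outputs == 1, n_queries == 3. Every neighbor target is 1.0,
# so a correct weighted average would be 1.0 for every query point.
neigh_sim = np.array([[0.5, 0.5],
                      [0.25, 0.25],
                      [1.0, 1.0]])
neigh_y = np.ones((3, 2, 1))
buggy = predict_buggy(neigh_sim, neigh_y)
print(buggy)  # only the first row is 1.0; rows 1 and 2 are 0.5 and 2.0

# Case 2: n_outputs (3) > n_queries (1) indexes past the last query row.
try:
    predict_buggy(np.ones((1, 2)), np.ones((1, 2, 3)))
except IndexError as exc:
    print("IndexError:", exc)
```

In the first case only query 0 is divided by its own sum of similarities; the other rows reuse that same denominator.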

## The proposed solution

Set `denom=np.sum(neigh_sim, axis=1)` once, outside the `for` loop.

Note that `denom` is now an array of floats (as opposed to a single float). Since `denom.shape == num.shape` (more specifically, `shape=(n_queries,)`), `num / denom` becomes an element-wise operation in which the numerator is unchanged and the denominator is each query point's own sum of similarities.
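The element-wise division can be checked against a per-query weighted average. This is a sketch under the same assumptions about the loop's structure; `predict_fixed` and the random inputs are hypothetical, and `np.average` is used only as an independent reference.

```python
import numpy as np


def predict_fixed(neigh_sim, neigh_y):
    """Hypothetical stand-in for the loop with the proposed fix.

    neigh_sim: (n_queries, n_neighbors) similarities
    neigh_y:   (n_queries, n_neighbors, n_outputs) neighbor targets
    """
    n_queries, _, n_outputs = neigh_y.shape
    # One denominator per query point, computed once outside the loop.
    denom = np.sum(neigh_sim, axis=1)  # shape (n_queries,)
    y_pred = np.empty((n_queries, n_outputs))
    for j in range(n_outputs):
        num = np.sum(neigh_y[:, :, j] * neigh_sim, axis=1)  # shape (n_queries,)
        y_pred[:, j] = num / denom  # element-wise, query by query
    return y_pred


rng = np.random.default_rng(0)
neigh_sim = rng.uniform(0.1, 1.0, size=(4, 3))  # 4 queries, 3 neighbors
neigh_y = rng.random((4, 3, 2))                 # 2 outputs

y_pred = predict_fixed(neigh_sim, neigh_y)

# Reference: similarity-weighted average, one query and one output at a time.
expected = np.array([
    [np.average(neigh_y[q, :, j], weights=neigh_sim[q]) for j in range(2)]
    for q in range(4)
])
print(np.allclose(y_pred, expected))
```

Because `denom` no longer depends on `j`, the function also works when `n_outputs` exceeds `n_queries`.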

### Example of error

Because `j` is an output index but the old code treats it as a query-point index, we can construct an example where `n_outputs>n_queries` and see the `predict` method raise an error.

Dimensions:
```
n_samples=100
n_features=1000
n_outputs=50
n_queries=1
n_neighbors=8
```

In [19]:
import numpy as np
import pandas as pd

from genra.rax.skl.reg import GenRAPredValue
# scikit example to see how it should behave
from sklearn.neighbors import KNeighborsRegressor

In [20]:
# random binary matrix of size 100 x 1000 (n_samples, n_features)
sample_X = pd.DataFrame(np.random.randint(0,2, size=(100, 1000)))
sample_X

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,990,991,992,993,994,995,996,997,998,999
0,0,0,1,1,1,1,1,0,0,1,...,0,0,1,0,1,1,1,0,1,0
1,0,1,1,0,0,0,1,0,1,0,...,1,0,0,0,1,0,1,1,0,0
2,1,0,0,1,0,1,1,1,0,0,...,1,1,1,0,0,0,0,1,1,0
3,0,0,1,1,0,0,0,0,0,0,...,0,1,1,0,1,0,1,0,1,0
4,0,1,1,0,1,1,1,1,0,1,...,1,0,1,0,0,0,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,1,1,0,0,1,1,1,0,1,0,...,1,1,1,0,0,0,1,1,0,1
96,1,1,1,1,1,0,0,0,1,0,...,0,0,0,0,1,0,0,0,1,0
97,1,0,1,1,1,0,0,1,0,0,...,0,0,1,1,1,1,1,0,0,1
98,1,1,1,0,0,0,0,0,1,0,...,1,0,1,0,1,0,1,0,0,0


In [21]:
# random matrix of size 100 x 50 (n_samples, n_outputs) - i.e., 50 different values to predict
sample_y = pd.DataFrame(np.random.random(size=(100, 50)))
sample_y

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49
0,0.452114,0.032065,0.364285,0.392233,0.355917,0.595651,0.895717,0.999875,0.908490,0.750081,...,0.378443,0.214993,0.869864,0.967591,0.568308,0.373858,0.238336,0.336907,0.735189,0.208528
1,0.616368,0.446414,0.124196,0.881161,0.706904,0.169106,0.506948,0.810981,0.273023,0.128275,...,0.840553,0.231098,0.585147,0.181303,0.110416,0.392077,0.189333,0.950604,0.306619,0.667940
2,0.357270,0.308349,0.321027,0.248440,0.029853,0.167774,0.781586,0.860615,0.143878,0.240694,...,0.223928,0.458134,0.206307,0.563044,0.053622,0.775777,0.337353,0.908520,0.082054,0.859696
3,0.004180,0.812562,0.388201,0.263393,0.212413,0.266511,0.915072,0.101571,0.974111,0.359393,...,0.739699,0.882037,0.236231,0.997473,0.652169,0.554754,0.541816,0.492460,0.231477,0.597966
4,0.176012,0.659479,0.976982,0.801375,0.272097,0.339287,0.992800,0.085688,0.036562,0.709115,...,0.935800,0.803035,0.869046,0.335666,0.691991,0.576968,0.801276,0.129568,0.670519,0.744683
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0.458014,0.303171,0.799521,0.959182,0.586324,0.834933,0.888921,0.592514,0.105495,0.291447,...,0.807293,0.796751,0.707114,0.830545,0.087533,0.651908,0.911708,0.725931,0.811368,0.051370
96,0.014264,0.289107,0.810863,0.279296,0.170788,0.932949,0.073788,0.105420,0.627173,0.133664,...,0.809498,0.052013,0.589336,0.177479,0.477017,0.122964,0.743014,0.056588,0.223094,0.356025
97,0.717929,0.138155,0.260905,0.680843,0.881228,0.830034,0.587989,0.766219,0.200880,0.660416,...,0.940266,0.590460,0.067708,0.194836,0.368581,0.667281,0.577572,0.915061,0.072751,0.537982
98,0.935746,0.767309,0.999281,0.337977,0.635719,0.988403,0.606621,0.691682,0.958037,0.812429,...,0.187976,0.736116,0.563236,0.616845,0.647653,0.793831,0.414686,0.844797,0.058041,0.288947


In [22]:
params = {
    "algorithm": "brute",
    "metric": "jaccard",
    "weights": lambda distances: 1 - distances,
    "n_neighbors": 8, # the exact value does not matter here; the error needs n_outputs > n_queries
}

scikit_model = KNeighborsRegressor(**params)
scikit_model.fit(sample_X, sample_y)

genra_model = GenRAPredValue(**params)
genra_model.fit(sample_X, sample_y)

In [23]:
query_X = pd.DataFrame(np.random.randint(0,2, size=(1, 1000)))
query_X

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,990,991,992,993,994,995,996,997,998,999
0,1,0,0,1,0,1,0,1,1,0,...,1,1,1,1,1,1,1,1,0,1


In [24]:
scikit_model.predict(query_X)



array([[0.49277155, 0.30522343, 0.51419298, 0.48658798, 0.5571723 ,
        0.42588411, 0.5239025 , 0.5611968 , 0.50391897, 0.61166363,
        0.43768643, 0.51528197, 0.68236618, 0.63598903, 0.52674253,
        0.45519885, 0.57564773, 0.5984484 , 0.40373024, 0.30511737,
        0.63870972, 0.50726512, 0.30408372, 0.68612973, 0.37461595,
        0.5468843 , 0.40992071, 0.30231952, 0.37225154, 0.52394841,
        0.49238869, 0.43564842, 0.61429304, 0.4612128 , 0.49802845,
        0.453821  , 0.50071817, 0.58059138, 0.41075284, 0.62730237,
        0.36509254, 0.45475719, 0.52427778, 0.63430034, 0.33792695,
        0.48620817, 0.42914794, 0.56057509, 0.69621025, 0.40077768]])

In [18]:
# this will error out
genra_model.predict(query_X)



IndexError: index 1 is out of bounds for axis 0 with size 1