# MLknn #22

Closed · opened this Issue Jul 2, 2015 · 4 comments

### fbenites commented Jul 2, 2015

Sorry, I do not have the time to make a pull request, but there is an "error" and I would like a function. In MLkNN's `compute_cond`, the first entry of `kneighbors` is also counted, but that entry is the instance itself; I would count only the neighbourhood:

```python
for instance in xrange(self.num_instances):
    # drop the first neighbour, which is the instance itself
    neighbors = self.knn.kneighbors(X[instance], self.k, return_distance=False)[0, 1:]
    for label in xrange(self.num_labels):
        delta = sum(y[neighbor][label] for neighbor in neighbors)
```

Also, rankings would be nice:

```python
def predict_rankings(self, X):
    result = np.zeros((len(X), self.num_labels), dtype='i8')
    ranks = np.zeros((len(X), self.num_labels))
    for instance in xrange(len(X)):
        neighbors = self.knn.kneighbors(X[instance], self.k, return_distance=False)
        for label in xrange(self.num_labels):
            delta = sum(self.predictions[neighbor][label] for neighbor in neighbors[0])
            p_true = self.prior_prob_true[label] * self.cond_prob_true[label][delta]
            p_false = self.prior_prob_false[label] * self.cond_prob_false[label][delta]
            prediction = (p_true >= p_false)
            ranks[instance][label] = p_true / (p_true + p_false)
            result[instance][label] = int(prediction)
    return ranks, result
```

Cheers
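(Side note, my own illustration rather than part of the original comment: when the query point is one of the training points, scikit-learn's `kneighbors` returns that point itself as the first neighbour, at distance zero, which is exactly what the `[0, 1:]` slice above is working around. A minimal sketch with toy data:)

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy 1-D training data
X = np.array([[0.0], [1.0], [2.0], [10.0]])
knn = NearestNeighbors(n_neighbors=2).fit(X)

# Querying a training point: the point itself comes back first (distance 0),
# so slicing off index 0 keeps only the true neighbourhood.
neighbors = knn.kneighbors([[1.0]], return_distance=False)
print(neighbors[0][0])  # 1 — the index of the query point itself
```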


### niedakh commented Jun 15, 2016

I'll investigate this; I need to read up on MLkNN. For now I've just converted the code to sparse matrices. Maybe @Antiavanti can comment on this: is it a mistake, or did you do it on purpose? I'll implement `predict_rankings` soon.

### niedakh commented Jun 18, 2016

So I've checked it some more. The fact that we take ourselves as a neighbour during fitting doesn't change much for the fitted conditional probabilities, as it is stable over all instances, and the deltas actually account for this one additional label that is always added. It's also a safeguard for the case where we have only one sample with a given label combination. Do you think this argument makes sense? As for predict rankings, I'm still working on it; it should be ready during the weekend :)
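(A toy illustration of the "stable over all instances" argument, my own sketch rather than code from the thread: including the instance itself shifts its delta by exactly its own label value, so the effect is systematic rather than noise.)

```python
import numpy as np

# Hypothetical toy label matrix: 4 instances, 1 label
y = np.array([[1], [1], [0], [1]])

# Suppose instance 0's neighbour list with self included is [0, 1, 2]
neighbors_with_self = [0, 1, 2]
neighbors_without_self = [1, 2]

delta_with = sum(y[n][0] for n in neighbors_with_self)       # counts own label too
delta_without = sum(y[n][0] for n in neighbors_without_self)

# The difference is exactly the instance's own label value
assert delta_with - delta_without == y[0][0]
print(delta_with, delta_without)  # 2 1
```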

### fbenites commented Jun 24, 2016

Hi, I ran into some errors with the instance-as-its-own-neighbour behaviour; that is why I came across this, but I am not sure exactly why anymore (it was about 4-5 months back). It could be that the implementation I used (in Matlab) gave different results, but within your implementation it is fine. The ranking issue is easy to fix as described in my post: the main point is to weight `p_true` by 1 divided by the sum of `p_true` and `p_false`. Cheers, Fernando
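(For concreteness, a one-line sketch of the normalisation described above, with made-up numbers: the ranking score is `p_true` divided by the sum of both posteriors.)

```python
# Hypothetical posterior components for one label
p_true, p_false = 0.06, 0.02

# Normalised score used as the ranking value
score = p_true / (p_true + p_false)
print(score)  # 0.75
```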

### niedakh commented Feb 8, 2017

Hi, so I've done some tests, and it turns out that @Antiavanti's implementation was indeed correct: I get better scores with the behaviour where an instance is its own closest neighbour. But for the sake of usefulness, I've added an extra parameter that allows ignoring a chosen number of initial neighbours. I am also adding a `predict_proba` function that assigns probabilities to labels instead of ranking them, as this is the general spirit of the scikit API; such a vector can easily be ranked. Will commit the changes in a moment.
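(To illustrate the "such a vector can easily be ranked" point, here is a minimal sketch of my own, with a made-up probability vector: `argsort` on the negated probabilities orders labels from most to least probable.)

```python
import numpy as np

# Hypothetical per-label probabilities for one instance,
# e.g. as returned by a predict_proba-style method
probs = np.array([0.2, 0.9, 0.5, 0.1])

# argsort of the negated probabilities gives label indices
# ordered from most to least probable
ranking = np.argsort(-probs)
print(ranking)  # [1 2 0 3]
```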