MLknn #22

Closed
fbenites opened this Issue Jul 2, 2015 · 4 comments

@fbenites
Contributor

fbenites commented Jul 2, 2015

Sorry, I do not have the time to open a pull request, but there is an "error" and I would like a new function:
in MLkNN's compute_cond, MLkNN also counts the first entry returned by kneighbors, which is the instance itself; I would count only the neighbourhood!
```python
for instance in xrange(self.num_instances):
    # [0, 1:] drops the first returned neighbour, which is the instance itself
    neighbors = self.knn.kneighbors(X[instance], self.k, return_distance=False)[0, 1:]
    for label in xrange(self.num_labels):
        delta = sum(y[neighbor][label] for neighbor in neighbors)
```
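For context, here is a standalone sketch of the behaviour in question, using scikit-learn's NearestNeighbors on made-up data: querying a training point returns that point itself as its first neighbour, so requesting k + 1 neighbours and dropping the first column yields k genuine neighbours.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[0.0], [1.0], [2.0], [10.0]])  # toy training data
knn = NearestNeighbors(n_neighbors=3).fit(X)

# Querying a training point returns that point as its own nearest neighbour:
neighbors = knn.kneighbors(X[0].reshape(1, -1), 3, return_distance=False)
# neighbors[0] starts with index 0, i.e. the query point itself.

# To get k genuine neighbours, request k + 1 and drop the first column:
k = 2
true_neighbors = knn.kneighbors(X[0].reshape(1, -1), k + 1,
                                return_distance=False)[0, 1:]
```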

A predict_rankings function would also be nice:

```python
def predict_rankings(self, X):
    result = np.zeros((len(X), self.num_labels), dtype='i8')
    ranks = np.zeros((len(X), self.num_labels))
    for instance in xrange(len(X)):
        neighbors = self.knn.kneighbors(X[instance], self.k, return_distance=False)
        for label in xrange(self.num_labels):
            delta = sum(self.predictions[neighbor][label] for neighbor in neighbors[0])
            p_true = self.prior_prob_true[label] * self.cond_prob_true[label][delta]
            p_false = self.prior_prob_false[label] * self.cond_prob_false[label][delta]
            prediction = (p_true >= p_false)
            ranks[instance][label] = p_true / (p_true + p_false)
            result[instance][label] = int(prediction)
    return ranks, result
```

Cheers

@niedakh niedakh modified the milestone: 0.0.2 Feb 4, 2016

@niedakh niedakh added the bug label Feb 26, 2016

@niedakh niedakh modified the milestones: 0.0.3, 0.0.2 Apr 1, 2016

Contributor

niedakh commented Jun 15, 2016

I'll investigate this - I need to read up on ML-kNN first; for now I've just converted the code to sparse matrices. Maybe @Antiavanti can comment on this: is it a mistake, or did you do it on purpose?

I'll implement predict_rankings soon.

Contributor

niedakh commented Jun 18, 2016

So I've checked it some more. The fact that we take the instance itself as a neighbour during fitting doesn't change much for the fitted conditional probabilities: the effect is stable over all instances, and the deltas already account for the extra label count that is always added. It's also a safeguard for the case where we have only one sample with a given label combination. Do you think this argument makes sense?
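To illustrate the argument with a toy example (made-up labels, not taken from the library): including the instance among its own neighbours shifts delta by the instance's own label value, which is the same systematic shift for every instance carrying that label, so the fitted tables stay internally consistent rather than being corrupted.

```python
import numpy as np

# Toy label matrix: three instances, two labels.
y = np.array([[1, 0],
              [1, 0],
              [0, 1]])

neighbors_with_self = [0, 1]     # instance 0's neighbours, itself included
neighbors_without_self = [1]     # itself excluded

label = 0
delta_with = sum(y[n][label] for n in neighbors_with_self)
delta_without = sum(y[n][label] for n in neighbors_without_self)
# delta_with - delta_without is exactly y[0][label]: a uniform shift,
# not a distortion that varies from instance to instance.
```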

As for predict_rankings, I'm still working on it; it should be ready during the weekend :)


Contributor

fbenites commented Jun 24, 2016

Hi,

I had some errors with the self-neighbour - that is why I came across this - but I'm not sure exactly why any more (it was about 4-5 months back).
It could be that the implementation I used (in MATLAB) gave different results, but within your implementation it is fine.
The ranking issue is easy to fix, as described in my post. The main point is to weight p_true by 1 divided by the sum of p_true and p_false.
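As a tiny worked example of that weighting (all numbers made up for illustration):

```python
# Hypothetical per-label quantities, in the spirit of the snippet above.
prior_prob_true, cond_prob_true = 0.6, 0.3
prior_prob_false, cond_prob_false = 0.4, 0.1

p_true = prior_prob_true * cond_prob_true      # about 0.18
p_false = prior_prob_false * cond_prob_false   # about 0.04

# Weighting p_true by 1 / (p_true + p_false) yields a score in [0, 1]
# that is comparable across labels and directly usable for ranking.
score = p_true / (p_true + p_false)            # roughly 0.82
```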

Cheers,
Fernando

Contributor

niedakh commented Feb 8, 2017

Hi,

so I've done some tests, and it turns out that @Antiavanti's implementation was indeed correct - I get better scores with the behaviour where the instance is its own closest neighbour. But for the sake of usefulness, I've added an extra parameter that lets you ignore a chosen number of the first neighbours. I am also adding a predict_proba function that assigns probabilities to labels instead of ranking them, as that is the general spirit of the scikit API. Such a vector can easily be ranked.
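For what it's worth, ranking such a probability vector is a one-liner (a sketch with made-up probabilities; one score per label is assumed, as in the predict_proba description above):

```python
import numpy as np

# Hypothetical per-label probabilities as a predict_proba-style method might return.
proba = np.array([0.10, 0.85, 0.40, 0.60])

# Label indices ordered from most to least probable:
ranking = np.argsort(-proba)
```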

Will commit the changes in a moment.

@niedakh niedakh closed this in 3ae1660 Feb 8, 2017
