-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added nearest neighbor algorithm - machine-learning #379
added nearest neighbor algorithm - machine-learning #379
Conversation
Pull Request Test Coverage Report for Build 710
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. it would be nice to add ml. added minor comments.
@@ -350,6 +350,8 @@ If you want to uninstall algorithms, it is as simple as: | |||
- [simplify_path](algorithms/unix/path/simplify_path.py) | |||
- [union-find](algorithms/union-find) | |||
- [count_islands](algorithms/union-find/count_islands.py) | |||
- [machine-learning](algorithms/machine-learning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets change the folder name to ml for simplicity.
As a package, we need to think how people are going to use it.
typing algorithms.ml
is better than typing algorithms.machine-learning
every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keon Good point! I rename the directory
trainSetAND = {(0,0) : 0, (0,1) :0, (1,0) : 0, (1,1) : 1} | ||
|
||
# train set for light or dark colors | ||
trainSetLight = {(11, 98, 237) : 'L', (3, 39, 96) : 'D', (242, 226, 12) : 'L', (99, 93, 4) : 'D', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's separate the test to the test folder.
|
||
# Some test cases | ||
|
||
# print(nearest_neighbor((1,1), trainSetAND)) # => 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
""" | ||
assert isinstance(x, tuple) and isinstance(tSet, dict) | ||
current_key = () | ||
MAX = 32768 # max value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for setting this to 32768?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keon I want a high max value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about np.inf ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it. Thanks
@keon Done. |
algorithms/ml/nearest_neighbor.py
Outdated
y {[tuple]} -- [vector] | ||
""" | ||
assert len(x) == len(y), "The vector must have same length" | ||
import math |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why import math?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@goswami-rahul for the function sqrt(...)
that I used before numpy
.
algorithms/ml/nearest_neighbor.py
Outdated
@@ -0,0 +1,41 @@ | |||
import numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using a 3rd party module here. numpy
will become this repo's requirement/dependency, which is not desirable. I think we can implement these without using numpy
, unless it is absolutely necessary (and I think that won't be the case), as these are meant to be minimal algorithms in Python.
By the way, great idea to implement ML algorithms here. I will also add some when I get time:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here free online course on edx : https://www.edx.org/course/machine-learning-fundamentals-uc-san-diegox-dse220x
In python 😁
@keon @goswami-rahul @danghai @SaadBenn |
@christianbender |
@christianbender @keon @danghai Moreover, I think this repo's objective is to provide minimal implementations of the algorithms. These algorithms are not very difficult to implement using built-in Python libraries, even if it would be less efficient. I guess we should be going with Simplicity over Efficiency, but it is just a matter of opinion :) Just my two cents. Cheers :) |
@goswami-rahul I agree with Rahul. Numpy based implementation could be a good candidate for another project, though (if I recall correctly, some people already have done this. but I cannot find it.) |
Good point! 👍 I will change my code again. |
@danghai @goswami-rahul @keon |
@christianbender Could you recheck the |
How can I recheck it? |
@christianbender click
It passes the |
@christianbender I fix this issue for you. Look my above commit. It passes all |
@danghai Thanks a lot. |
@danghai I merge it. |
nearest_neighbor(x, tSet)
and a (eulidean) distance functiondistance(x,y)
.numpy
library for calculating the absolute value of a vector.Have someone a tip for finding the k-nearest neighbors? I wrote a simple algorithm.
If creating a new file :
if done some changes :
[] other