added nearest neighbor algorithm - machine-learning #379

christianbender · 2018-07-24T14:28:31Z

Two functions: nearest_neighbor(x, tSet) and a (Euclidean) distance function distance(x, y).
I used the numpy library to calculate the absolute value (norm) of a vector.
Two training sets: one for the logical AND function and one for color analysis (dark/light color).
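Here is a minimal pure-Python sketch of the two functions described above. It assumes the training set is a dict that maps tuples to labels, as trainSetAND does in the diff further down. It uses math.sqrt instead of numpy. The loop body is an assumption and is not the exact code from this PR.

```python
import math


def distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    assert len(x) == len(y), "The vectors must have the same length"
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest_neighbor(x, tSet):
    """Return the label of the training vector closest to x.

    tSet maps vectors (tuples) to labels.
    """
    assert isinstance(x, tuple) and isinstance(tSet, dict)
    best_label = None
    min_d = float("inf")  # every finite distance is smaller than this
    for key, label in tSet.items():
        d = distance(x, key)
        if d < min_d:  # strict '<' keeps the first of several equally close keys
            min_d = d
            best_label = label
    return best_label


# Training set for the logical AND function.
trainSetAND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

print(nearest_neighbor((1, 1), trainSetAND))      # 1
print(nearest_neighbor((0.9, 0.8), trainSetAND))  # 1
```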

Does anyone have a tip for finding the k nearest neighbors? I wrote a simple algorithm.
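One common approach to the k-nearest-neighbors question is to take the k training vectors with the smallest distance and let them vote on the label. The sketch below is a hypothetical function, k_nearest_neighbors, not part of this PR. It assumes the same dict-of-tuples training set and uses only the standard library (heapq.nsmallest and collections.Counter).

```python
import heapq
import math
from collections import Counter


def distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def k_nearest_neighbors(x, tSet, k=3):
    """Classify x by majority vote among its k closest training vectors.

    tSet maps vectors (tuples) to labels. A tie in the vote goes to the
    label encountered first, and neighbors are ordered nearest first.
    """
    if not 1 <= k <= len(tSet):
        raise ValueError("k must be between 1 and the size of the training set")
    # nsmallest returns the k (vector, label) pairs closest to x, nearest first.
    neighbors = heapq.nsmallest(k, tSet.items(), key=lambda item: distance(x, item[0]))
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


trainSetAND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

print(k_nearest_neighbors((1, 1), trainSetAND, k=1))  # 1
print(k_nearest_neighbors((1, 1), trainSetAND, k=3))  # 0
```

With k=1, this reduces to plain nearest neighbor. The second call shows that the choice of k matters. On the four-point AND set, two of the three points nearest to (1, 1) are labelled 0, so k=3 outvotes the exact match.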

If creating a new file :
- added links to it in the README files ?
- included tests with it ?
- added description (overview of algorithm, time and space complexity, and possible edge cases) in docstrings ?
if done some changes :
- wrote short description in the PR explaining what the changes do ?
- Fixes #[issue number] if related to any issue
[] other

coveralls · 2018-07-24T14:32:26Z

Pull Request Test Coverage Report for Build 710

35 of 36 (97.22%) changed or added relevant lines in 2 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage increased (+0.4%) to 71.717%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
tests/test_ml.py	15	16	93.75%

Files with Coverage Reduction	New Missed Lines	%
algorithms/linkedlist/is_palindrome.py	1	93.33%

Totals
Change from base Build 685:	0.4%
Covered Lines:	4123
Relevant Lines:	5749

💛 - Coveralls

keon

Awesome. It would be nice to add ML. I added minor comments.

keon · 2018-07-24T20:37:12Z

README.md

@@ -350,6 +350,8 @@ If you want to uninstall algorithms, it is as simple as:
        - [simplify_path](algorithms/unix/path/simplify_path.py)
 - [union-find](algorithms/union-find)
    - [count_islands](algorithms/union-find/count_islands.py)
+- [machine-learning](algorithms/machine-learning)


Let's change the folder name to ml for simplicity.
As a package, we need to think about how people are going to use it.
Typing algorithms.ml is better than typing algorithms.machine-learning every time.
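The rename has a second benefit besides saving keystrokes. Each part of a dotted name in a plain `import` statement must be a valid Python identifier, and hyphens are not allowed. That makes `import algorithms.machine-learning` a syntax error. The check below uses the built-ins str.isidentifier and keyword.iskeyword. The helper name is_importable_name is hypothetical and exists only for illustration.

```python
import keyword


def is_importable_name(name):
    """True if `name` can be used as a package or module name in an import statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ("ml", "machine_learning", "machine-learning"):
    print(name, is_importable_name(name))
# ml True
# machine_learning True
# machine-learning False
```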

@keon Good point! I renamed the directory.

keon · 2018-07-24T20:37:42Z

algorithms/machine-learning/nearest_neighbor.py

+trainSetAND = {(0,0) : 0, (0,1) :0, (1,0) : 0, (1,1) : 1} 
+
+# train set for light or dark colors
+trainSetLight = {(11, 98, 237) : 'L', (3, 39, 96) : 'D', (242, 226, 12) : 'L', (99, 93, 4) : 'D',


Let's move the tests into the tests folder.
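Below is a sketch of what a separate test file such as tests/test_ml.py might look like. It uses unittest with the TestML class and test_nearest_neighbor method names that appear in the Travis log further down. The file defines stand-in versions of distance and nearest_neighbor so it can run on its own. In the repository, the test would import the real functions from algorithms.ml.nearest_neighbor instead.

```python
import math
import unittest


# Stand-ins so this file runs on its own; the real test would import
# distance and nearest_neighbor from algorithms.ml.nearest_neighbor.
def distance(x, y):
    assert len(x) == len(y), "The vectors must have the same length"
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest_neighbor(x, tSet):
    # min() over the keys picks the closest training vector.
    closest = min(tSet, key=lambda key: distance(x, key))
    return tSet[closest]


class TestML(unittest.TestCase):
    def setUp(self):
        # Training set for the logical AND function.
        self.trainSetAND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

    def test_nearest_neighbor(self):
        self.assertEqual(nearest_neighbor((1, 1), self.trainSetAND), 1)
        self.assertEqual(nearest_neighbor((0, 1), self.trainSetAND), 0)

    def test_distance(self):
        # sqrt(0**2 + 2**2 + 4**2) == sqrt(20)
        self.assertAlmostEqual(distance((1, 2, 3), (1, 0, -1)), math.sqrt(20))


if __name__ == "__main__":
    unittest.main()
```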

keon · 2018-07-24T20:38:21Z

algorithms/machine-learning/nearest_neighbor.py

+
+# Some test cases
+
+# print(nearest_neighbor((1,1), trainSetAND)) # => 1


keon · 2018-07-24T20:39:46Z

algorithms/machine-learning/nearest_neighbor.py

+    """
+    assert isinstance(x, tuple) and isinstance(tSet, dict)
+    current_key = ()
+    MAX = 32768 # max value 


Is there a reason for setting this to 32768?

@keon I want a high max value.

How about np.inf?

I changed it. Thanks
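A fixed cap like 32768 is risky for a simple reason. If every distance in the training set is larger than the cap, the `<` comparison never succeeds and no neighbor is chosen. Infinity (np.inf, or float("inf") without numpy) is larger than every finite float, so the first comparison always succeeds.

The example below uses hypothetical distance values. argmin_with_sentinel is an illustrative helper, not code from this PR.

```python
def argmin_with_sentinel(values, sentinel):
    """Index of the smallest value, or None if no value is below the sentinel."""
    best_index, best = None, sentinel
    for i, v in enumerate(values):
        if v < best:
            best_index, best = i, v
    return best_index


distances = [40000.0, 50000.0]  # hypothetical: every distance exceeds 32768

print(argmin_with_sentinel(distances, 32768))         # None: the fixed cap silently fails
print(argmin_with_sentinel(distances, float("inf")))  # 0
print(min(range(len(distances)), key=distances.__getitem__))  # 0, no sentinel needed
```

The last line shows an alternative design. `min()` with a `key` function avoids a sentinel value entirely.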

christianbender · 2018-07-24T21:18:30Z

@keon Done.

goswami-rahul · 2018-07-25T10:29:11Z

algorithms/ml/nearest_neighbor.py

+        y {[tuple]} -- [vector]
+    """
+    assert len(x) == len(y), "The vector must have same length"
+    import math


Why import math?

@goswami-rahul For the function sqrt(...), which I used before switching to numpy.
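The standard library's math.sqrt is enough to compute a Euclidean distance, so numpy is not strictly needed for this step. The sketch below assumes vectors are tuples of numbers. math.dist only exists from Python 3.8 on, which is newer than the Python 3.4 to 3.6 versions tested on Travis in this thread, so the call is guarded.

```python
import math


def distance(x, y):
    """Euclidean distance computed with the standard library only."""
    assert len(x) == len(y), "The vectors must have the same length"
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


print(distance((0, 0), (3, 4)))  # 5.0

# Python 3.8+ also provides math.dist, which computes the same quantity.
if hasattr(math, "dist"):
    print(math.dist((0, 0), (3, 4)))  # 5.0
```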

goswami-rahul · 2018-07-25T10:38:45Z

algorithms/ml/nearest_neighbor.py

@@ -0,0 +1,41 @@
+import numpy


We are using a 3rd party module here. numpy will become this repo's requirement/dependency, which is not desirable. I think we can implement these without using numpy, unless it is absolutely necessary (and I think that won't be the case), as these are meant to be minimal algorithms in Python.
By the way, great idea to implement ML algorithms here. I will also add some when I get time:)

Here is a free online course on edX: https://www.edx.org/course/machine-learning-fundamentals-uc-san-diegox-dse220x
In python 😁

christianbender · 2018-07-25T14:21:07Z

@keon @goswami-rahul @danghai @SaadBenn
Should we use the numpy module?
This module is widely used for machine learning in Python.
I can implement the parts above myself, too.

danghai · 2018-07-25T21:16:51Z

@christianbender
It would be nice to implement both :). One shows how to use the API of a library, and the other shows how to write the algorithms yourself.

goswami-rahul · 2018-07-27T09:41:02Z

@christianbender @keon @danghai
IMHO, a user who installs algorithms for a few specific topics, say arrays, dp or strings, might not want to install numpy, pandas or any other high-level library just to meet their needs.

Moreover, I think this repo's objective is to provide minimal implementations of the algorithms. These algorithms are not very difficult to implement using built-in Python libraries, even if the result is less efficient. I guess we should favor simplicity over efficiency, but it is just a matter of opinion :)
Collections of highly efficient, optimized algorithms already exist and are constantly being updated, but ours could be the simplest, most understandable and minimal collection written in Python from scratch!

Just my two cents. Cheers :)
Please share your thoughts on this.

keon · 2018-07-28T00:28:52Z

@goswami-rahul I agree with Rahul.
We should prioritize simplicity over efficiency.
If people want efficiency, they will go to other well known libraries like scikit-learn, scipy, tensorflow, pytorch etc...
People also want to learn the simplest form of an algorithm. I believe we can hit this niche.
Implementing both seems too much work & code to maintain.

A numpy-based implementation could be a good candidate for another project, though (if I recall correctly, some people have already done this, but I cannot find it).
I would be happy to contribute if any of you start it. :)

christianbender · 2018-07-28T15:51:05Z

@goswami-rahul

IMHO, a user who installs algorithms for a few specific topics, say arrays, dp or strings, might not want to install numpy, pandas or any other high-level library just to meet their needs.

Good point! 👍

@goswami-rahul @danghai @keon

I will change my code again.

christianbender · 2018-07-29T14:55:44Z

@danghai @goswami-rahul @keon
Done. I removed numpy.

danghai · 2018-08-01T06:51:11Z

@christianbender Could you recheck Travis? It fails on Python 3.4 and 3.6.

christianbender · 2018-08-01T22:20:44Z

@danghai

Could you recheck Travis? It fails on Python 3.4 and 3.6.

How can I recheck it?

danghai · 2018-08-01T22:34:05Z

@christianbender Click Details next to the Travis check below. It takes you directly to: https://travis-ci.org/keon/algorithms/builds/409541731?utm_source=github_status&utm_medium=notification
Select an environment, for example python3.4. The log will show:

ERROR: test_nearest_neighbor (test_ml.TestML)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/keon/algorithms/tests/test_ml.py", line 19, in test_nearest_neighbor
    self.assertEqual(nearest_neighbor((1,1), self.trainSetAND), 1)
  File "/home/travis/build/keon/algorithms/algorithms/ml/nearest_neighbor.py", line 35, in nearest_neighbor
    min_d = math.inf
AttributeError: 'module' object has no attribute 'inf'
----------------------------------------------------------------------
Ran 220 tests in 0.642s

It passes on Python 3.5 but fails on Python 3.4 and 3.6. Part of the reason is that not every feature is available in every version: math.inf, for example, was only added in Python 3.5, so it does not exist in 3.4. Our goal is to support all Python 3 versions, so I configured Travis to test each of them.
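The traceback above comes from math.inf. That constant was added to the math module in Python 3.5, so Python 3.4 raises AttributeError. The final commit in this PR switched to float("inf") instead. The sketch below shows that portable spelling and an optional getattr fallback. The names INF and INF_COMPAT are illustrative.

```python
import math

# math.inf exists only on Python 3.5+; float("inf") works on every Python 3.
INF = float("inf")

# Optional fallback that uses math.inf where it exists.
INF_COMPAT = getattr(math, "inf", float("inf"))

print(INF == INF_COMPAT)  # True
print(INF > 1e308)        # True: larger than any finite float
```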

danghai · 2018-08-02T03:54:16Z

@christianbender I fixed this issue for you; see my commit above. It now passes in all Travis environments.

christianbender · 2018-08-02T13:09:02Z

@danghai Thanks a lot.

christianbender · 2018-08-02T13:09:18Z

@danghai I'm merging it.

christianbender added 2 commits July 24, 2018 16:22

added nearest neighbor algorithm

dd52a82

changed the readme

343600f

christianbender changed the title ~~added nearest neighbor algorithm~~ added nearest neighbor algorithm - machine-learning Jul 24, 2018

christianbender requested review from keon, SaadBenn, goswami-rahul and danghai July 24, 2018 14:32

changed variable names

d38649e

keon reviewed Jul 24, 2018


christianbender added 2 commits July 24, 2018 23:06

renamed the directory

b9bf25f

added a test. removed test cases. renamed directory

19c2596

christianbender added 2 commits July 24, 2018 23:25

added test for distance(...)

4abf64c

added empty lines

a978773

goswami-rahul reviewed Jul 25, 2018


removed math. added numpy.inf

065898e

removed numpy

d5ded33

Replace min.inf by float(inf)

0840418

christianbender merged commit 010dce3 into keon:master Aug 2, 2018
