added nearest neighbor algorithm - machine-learning #379

Merged (10 commits) on Aug 2, 2018

christianbender
Collaborator

@christianbender christianbender commented Jul 24, 2018

  • Two functions: nearest_neighbor(x, tSet) and a (Euclidean) distance function distance(x, y).
  • I used the numpy library to calculate the norm (absolute value) of a vector.
  • Two training sets: one for the logical AND function and one for color analysis (dark/light color).

Does anyone have a tip for finding the k nearest neighbors? I wrote a simple algorithm.
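
One possible answer to the k-nearest-neighbors question, as a minimal sketch rather than the code in this PR: rank the training points by distance, keep the k closest, and take a majority vote over their labels. The name k_nearest_neighbors and the parameter k are assumptions made for illustration; the training set is the AND set quoted later in this review.

```python
import heapq
import math
from collections import Counter


def distance(x, y):
    """Euclidean distance between two equal-length tuples."""
    assert len(x) == len(y), "The vectors must have the same length"
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def k_nearest_neighbors(x, t_set, k=3):
    """Return the majority label among the k training points closest to x.

    t_set maps feature tuples to labels (hypothetical helper, not part of the PR).
    """
    # Iterating a dict yields its keys; nsmallest keeps the k closest ones.
    nearest = heapq.nsmallest(k, t_set, key=lambda point: distance(x, point))
    votes = Counter(t_set[point] for point in nearest)
    # most_common(1) returns a one-element list [(label, count)].
    return votes.most_common(1)[0][0]


train_set_and = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(k_nearest_neighbors((1, 1), train_set_and, k=1))  # 1
print(k_nearest_neighbors((1, 1), train_set_and, k=3))  # 0
```

With k=3 the two neighbors labelled 0 outvote the exact match, which shows why k has to be chosen with the size of the training set in mind.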

  • If creating a new file:

    • added links to it in the README files?
    • included tests with it?
    • added a description (overview of the algorithm, time and space complexity, and possible edge cases) in docstrings?
  • If making changes:

    • wrote a short description in the PR explaining what the changes do?
    • Fixes #[issue number] if related to any issue
  • [] other

@christianbender christianbender changed the title added nearest neighbor algorithm added nearest neighbor algorithm - machine-learning Jul 24, 2018
@coveralls

coveralls commented Jul 24, 2018

Pull Request Test Coverage Report for Build 710

  • 35 of 36 (97.22%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.4%) to 71.717%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
tests/test_ml.py | 15 | 16 | 93.75%

Files with Coverage Reduction | New Missed Lines | %
algorithms/linkedlist/is_palindrome.py | 1 | 93.33%

Totals | Coverage Status
Change from base Build 685 | 0.4%
Covered Lines | 4123
Relevant Lines | 5749

💛 - Coveralls

Owner

@keon keon left a comment

Awesome. It would be nice to add ML. I added some minor comments.

@@ -350,6 +350,8 @@ If you want to uninstall algorithms, it is as simple as:
- [simplify_path](algorithms/unix/path/simplify_path.py)
- [union-find](algorithms/union-find)
- [count_islands](algorithms/union-find/count_islands.py)
- [machine-learning](algorithms/machine-learning)
Owner

Let's change the folder name to ml for simplicity.
As a package, we need to think about how people are going to use it.
Typing algorithms.ml is better than typing algorithms.machine-learning every time.
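
A side note on the naming suggestion, offered as a quick check rather than anything stated in the review: each component of a dotted import path has to be a valid Python identifier, so a hyphenated folder name could not be written in an import statement at all. The underscore variant below is listed only for comparison and is not a name proposed in this PR.

```python
# Each dotted component of an import path must be a valid identifier.
candidates = ["ml", "machine_learning", "machine-learning"]
validity = {name: name.isidentifier() for name in candidates}

for name, ok in validity.items():
    print(name, "->", "importable name" if ok else "not a valid identifier")
```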

Collaborator Author

@keon Good point! I will rename the directory.

trainSetAND = {(0,0) : 0, (0,1) :0, (1,0) : 0, (1,1) : 1}

# train set for light or dark colors
trainSetLight = {(11, 98, 237) : 'L', (3, 39, 96) : 'D', (242, 226, 12) : 'L', (99, 93, 4) : 'D',
Owner

Let's move the tests to the tests folder.


# Some test cases

# print(nearest_neighbor((1,1), trainSetAND)) # => 1
Owner

same here.

"""
assert isinstance(x, tuple) and isinstance(tSet, dict)
current_key = ()
MAX = 32768 # max value
Owner

is there a reason for setting this to 32768?

Collaborator Author

@keon I want a high max value.

Owner

How about np.inf?

Collaborator Author

I changed it. Thanks
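
To illustrate why an infinite starting value is safer than a fixed cap such as 32768, here is a small sketch under assumed data. The helper nearest_key, its start parameter, and the far_points set are hypothetical and not taken from the PR; the loop only mirrors the "track the smallest distance so far" pattern discussed above.

```python
import math


def nearest_key(x, t_set, start):
    """Return the key of t_set closest to x, using `start` as the initial minimum."""
    best_key, min_d = None, start
    for point in t_set:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, point)))
        if d < min_d:
            best_key, min_d = point, d
    return best_key


# Hypothetical data whose distances to the query are all larger than 32768.
far_points = {(100000, 0): "a", (0, 200000): "b"}

print(nearest_key((0, 0), far_points, 32768))         # None: no distance beats the cap
print(nearest_key((0, 0), far_points, float("inf")))  # (100000, 0)
```

With the finite cap, no candidate is ever accepted once every distance exceeds it, so the search silently returns nothing; an infinite start value cannot be exceeded by any real distance.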

@christianbender
Collaborator Author

@keon Done.

y {[tuple]} -- [vector]
"""
assert len(x) == len(y), "The vector must have same length"
import math
Collaborator

why import math?

Collaborator Author

@goswami-rahul It is for the sqrt(...) function that I used before switching to numpy.

@@ -0,0 +1,41 @@
import numpy
Collaborator

We are using a 3rd party module here. numpy will become this repo's requirement/dependency, which is not desirable. I think we can implement these without using numpy, unless it is absolutely necessary (and I think that won't be the case), as these are meant to be minimal algorithms in Python.
By the way, great idea to implement ML algorithms here. I will also add some when I get time:)

@christianbender
Collaborator Author

@keon @goswami-rahul @danghai @SaadBenn
What about using the numpy module?
It is widely used for machine learning in Python.
I can also implement the parts above myself.

@danghai
Collaborator

danghai commented Jul 25, 2018

@christianbender
It would be nice to implement both :). One shows how to use the library's API, and the other shows how to write the algorithm yourself.

@goswami-rahul
Collaborator

goswami-rahul commented Jul 27, 2018

@christianbender @keon @danghai
IMHO, a user who installs algorithms for some specific algorithms, say arrays, dp, or strings, might not want to install numpy, pandas, or any other high-level library just for that.

Moreover, I think this repo's objective is to provide minimal implementations of the algorithms. These algorithms are not very difficult to implement using built-in Python libraries, even if the result is less efficient. I guess we should go with simplicity over efficiency, but it is just a matter of opinion :)
Collections of super efficient and optimized algorithms are already available and are constantly being updated, but ours could be the simplest, most understandable, and most minimal collection in Python, written from scratch!

Just my two cents. Cheers :)
Please share your thoughts on this.

@keon
Owner

keon commented Jul 28, 2018

@goswami-rahul I agree with Rahul.
We should prioritize simplicity over efficiency.
If people want efficiency, they will go to other well-known libraries like scikit-learn, scipy, tensorflow, pytorch, etc.
People also want to learn the simplest form of an algorithm. I believe we can fill this niche.
Implementing both seems like too much work and code to maintain.

A numpy-based implementation could be a good candidate for another project, though (if I recall correctly, some people have already done this, but I cannot find it).
I would be happy to contribute if any of you start it. :)

@christianbender
Collaborator Author

@goswami-rahul

IMHO, a user who installs algorithms for some specific algorithms, say arrays, dp, or strings, might not want to install numpy, pandas, or any other high-level library just for that.

Good point! 👍

@goswami-rahul @danghai @keon

I will change my code again.

@christianbender
Collaborator Author

@danghai @goswami-rahul @keon
Done. I removed numpy
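
For readers wondering what a dependency-free version can look like, here is a minimal sketch. It is an assumption about one possible shape, not the code that was merged: it compares squared distances, which preserves the ordering of Euclidean distances and needs no imports at all. The helper squared_distance is hypothetical, and the training data is limited to the four color entries visible in the diff snippet above.

```python
def squared_distance(x, y):
    """Squared Euclidean distance; enough for comparing which point is closer."""
    assert len(x) == len(y), "The vectors must have the same length"
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest_neighbor(x, t_set):
    """Return the label of the training point closest to x (sketch, not the PR code)."""
    assert isinstance(x, tuple) and isinstance(t_set, dict)
    # min() over the dict iterates its keys and picks the closest one.
    closest = min(t_set, key=lambda point: squared_distance(x, point))
    return t_set[closest]


# Only the four entries visible in the diff; 'L' = light, 'D' = dark.
train_set_light = {
    (11, 98, 237): 'L',
    (3, 39, 96): 'D',
    (242, 226, 12): 'L',
    (99, 93, 4): 'D',
}

print(nearest_neighbor((10, 40, 90), train_set_light))  # D
```

Skipping the square root is a small design choice: it changes no result here and avoids needing the math module for the comparison.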

@danghai
Collaborator

danghai commented Aug 1, 2018

@christianbender Could you recheck the Travis build? It fails on Python 3.4 and 3.6.

@christianbender
Collaborator Author

@danghai

Could you recheck the Travis build? It fails on Python 3.4 and 3.6.

How can I recheck it?

@danghai
Collaborator

danghai commented Aug 1, 2018

@christianbender Click Details next to the Travis check below. It takes you directly to: https://travis-ci.org/keon/algorithms/builds/409541731?utm_source=github_status&utm_medium=notification
Select an environment, for example python3.4. The log will show:

ERROR: test_nearest_neighbor (test_ml.TestML)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/keon/algorithms/tests/test_ml.py", line 19, in test_nearest_neighbor
    self.assertEqual(nearest_neighbor((1,1), self.trainSetAND), 1)
  File "/home/travis/build/keon/algorithms/algorithms/ml/nearest_neighbor.py", line 35, in nearest_neighbor
    min_d = math.inf
AttributeError: 'module' object has no attribute 'inf'
----------------------------------------------------------------------
Ran 220 tests in 0.642s

It passes on Python 3.5 but fails on Python 3.4 and Python 3.6. One reason is that some features are available in Python 3.5 but not in every other version. Our goal is for the code to work on all Python 3 versions, so I set up the Travis environments to test each of them.
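
Some background on the traceback above: math.inf was added in Python 3.5, so the attribute lookup fails on 3.4. The sketch below shows one portable way to obtain the same value. It is an assumption about how such a failure can be avoided, not a description of the fix that was actually committed, and the constant name INF is hypothetical.

```python
import math

# math.inf exists only on Python 3.5 and later; float("inf") works on every
# Python 3 version, so it serves as the fallback here.
INF = getattr(math, "inf", float("inf"))

print(INF == float("inf"))
print(math.isinf(INF))
```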

@danghai
Collaborator

danghai commented Aug 2, 2018

@christianbender I fixed this issue for you. Have a look at my commit above. It now passes in all Travis environments.

@christianbender
Collaborator Author

@danghai Thanks a lot.

@christianbender
Collaborator Author

@danghai I will merge it.

@christianbender christianbender merged commit 010dce3 into keon:master Aug 2, 2018
5 participants