Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New class BaseEncoder #314

Merged
merged 28 commits into from
Mar 13, 2019
Merged

New class BaseEncoder #314

merged 28 commits into from
Mar 13, 2019

Conversation

ctrl-z-9000-times
Copy link
Collaborator

@ctrl-z-9000-times ctrl-z-9000-times commented Mar 7, 2019

See issue #291

TODO:

  • Decide if we should force all encoders to implement serialization
    • Add serializable interface & implement it for all encoders in master branch (ScalarEnc)
  • Documentation
    • C++ docs, cleanup & proof read
    • Python docs
    • C++ example usage & unit test for example
    • Python example usage & unit test for example
  • Python bindings, convenience & compatibility methods:
    • getWidth Won't fix
    • encodeIntoArray Won't fix
    • encode(input) -> SDR
  • Document changes to the ScalarEncoder API in the API_ChangeLog

@dkeeney
Copy link

dkeeney commented Mar 7, 2019

Decide if we should force all encoders to implement serialization

I opinion is that if an encoder has state that would affect the outcome of a subsequent cycle then that state should be serialized. It could be argued that incoming data is not state because it is not the consequence of previous processing. I would think that most encoders are stateless.

The configuration parameters in the NetworkAPI have been included in serialization for everything although I question if it should because it 'normally' is not the consequence of previous processing...perhaps a topic for a separate discussion.

@breznak
Copy link
Member

breznak commented Mar 7, 2019

I opinion is that if an encoder has state that would affect the outcome of a subsequent cycle then that state should be serialized.

+1, serialize. For stateless (most?) encoders, this should be trivial. It is convenient that later the whole algoruthm/NetworkAPI network can be serialized.

@ctrl-z-9000-times
Copy link
Collaborator Author

OK, I will make it serialize. Another reason to want serialization is that many encoders are initialized using a random seed, which is not stored anywhere.

Copy link
Member

@breznak breznak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks!
Please add the Serializable,
I'm still not sure about the decode(), but that can be added later.

@dkeeney
Copy link

dkeeney commented Mar 7, 2019

are initialized using a random seed

That would qualify as a state. So yes, it should serialize.

@ctrl-z-9000-times
Copy link
Collaborator Author

I'd like to take the time with this PR to clean up the ScalarEncoder since the encoder API is changing. This will break the ScalarEncoder API Compatibility by:

  • Removing method getWidth
  • Removing methodencodeIntoArray
  • Changing the constructor to accept a parameter-structure.
  • Merging PeriodicScalarEncoder into ScalarEncoder

@dkeeney
Copy link

dkeeney commented Mar 9, 2019

That sound reasonable. ScalarEncoder would be a good place to try out the new base class.

@ctrl-z-9000-times
Copy link
Collaborator Author

I finished implementing the new API in the ScalarEncoder and debugging it. Now it fails because of the network API & regions, which I thought I updated correctly but are failing their unit tests. I don't really understand how the networkAPI works, and I don't have any experience debugging it. Any help would be appreciated.

@ctrl-z-9000-times ctrl-z-9000-times removed the help wanted Extra attention is needed label Mar 12, 2019
Copy link
Member

@breznak breznak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good base for the encoders! 👍
A few more known TODOs and some my comments.
Looks good and hope we can merge this soon

bindings/py/cpp_src/bindings/encoders/py_ScalarEncoder.cpp Outdated Show resolved Hide resolved
src/examples/hotgym/HelloSPTP.cpp Outdated Show resolved Hide resolved
tEnc.stop();
for(auto i = 0u; i < inputSDR.size; ++i) {
input[i] = (UInt) inputSDR.getDense()[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, you could use VectorHelpers::castVectorType<Byte, UInt>(inputSDR.data()) as used on other places of this PR.
PS: I'll be trying to avoid that by templating the SDR_dense_t 's datatype in SDR in another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be far easier & better to overhaul this example to use SDRs throughout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once encoders support SDR, it should go there. So it could be part of this PR.

src/nupic/encoders/BaseEncoder.hpp Show resolved Hide resolved
src/nupic/encoders/ScalarEncoder.cpp Outdated Show resolved Hide resolved
src/nupic/encoders/ScalarEncoder.cpp Show resolved Hide resolved
src/nupic/encoders/ScalarEncoder.hpp Outdated Show resolved Hide resolved
src/nupic/encoders/ScalarEncoder.hpp Outdated Show resolved Hide resolved
src/nupic/regions/ScalarSensor.hpp Show resolved Hide resolved
Copy link

@dkeeney dkeeney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed you are creating a fourth extension library for encoders.
Would the .py code be expecting to find the encoders in a separate extension library?
I would think they would be part of the algorithms library. If you think it makes since to do that then ok but you have some more files to change for deployment.

  • packaging/setup.py
  • __init__.py
  • and probably some others.

Copy link

@dkeeney dkeeney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  * Numenta Platform for Intelligent Computing (NuPIC)
 * Copyright (C) 2018, chhenning
 *               2019, David McDougall
 *
 * This program is free software: you can redistribute it and/or modify

I don't think we can change the copyright. There is a standard for what the copyright should look like. We can add our name after the copyright but we cannot include ourselves in the copyright.

Copy link
Member

@breznak breznak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the fixes 👍
for me good to go, please have a look at what David was saying and we can merge

@ctrl-z-9000-times
Copy link
Collaborator Author

Would the .py code be expecting to find the encoders in a separate extension library?

Previously python encoders existed at:
import nupic.encoders
Now they're at:
import nupic.bindings.encoders

files to change for deployment

The encoders are bundled into the algorithms library, even though they have their own python module. As long as algorithms.cpython-34m.so is available, so are the encoders.

@ctrl-z-9000-times
Copy link
Collaborator Author

I don't think we can change the copyright.

rhyolight answers this question at the end of this thread: https://discourse.numenta.org/t/what-is-a-community-fork/3112/10

Also: https://discourse.numenta.org/t/a-word-on-licensing/2023

There is an important distinction between copyright and a license to use the copyrighted work. We absolutely can not change the license. This all must be published under the Affero GNU Public License version 3.

However, I own the copyright for all of the things I've written. If someone was paying me to write this code, then as part of our contract they would own the copyright. We can't change the copyright of the existing code, since we did not write it. If we edit existing code then we gain the copyright for the portions which we add.

@ctrl-z-9000-times
Copy link
Collaborator Author

BTW: You're welcome to retroactively add your name to the copyright notices of the files you've added or improved :)

@dkeeney
Copy link

dkeeney commented Mar 12, 2019

# ----------------------------------------------------------------------
# Numenta Platform for Intelligent Computing (NuPIC)
# Copyright (C) 20XX, YOUR NAME HERE.  
# Copyright (C) 2013, Numenta, Inc.
# 

I stand corrected.

I worked for years as a software contractor where our contract specifically say we have NO rights and we cannot put our name on anything. When I signed the contributor agreement I assumed it contained similar wording but I do not remember if I actually read all of the fine print. Apparently a contributor does have some rights. Cool.

@ctrl-z-9000-times
Copy link
Collaborator Author

I'd like to merge this PR so that other tasks can move forward (#291 #278 #304, probable merge conflict with #320). It's not 100% done yet. Here is a checklist list of the remaining tasks which I will post to issue #258 Encoders in C++.

Outstanding Tasks for ScalarEncoder:

  • C++ example usage & unit test for example
  • Python example usage & unit test for example
  • Documentation could use more details, in-depth explanations, and notes for practical usage.
  • Python script to visually plot the inputs & outputs of the encoder
    • Extra Credit if it is an interactive program or has a graphical user interface

breznak
breznak previously approved these changes Mar 13, 2019
Copy link
Member

@breznak breznak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for a proper base design for encoders, and fitting it with the existing.
I'm good to merge 👍

@ctrl-z-9000-times
Copy link
Collaborator Author

I noticed you are creating a fourth extension library for encoders. [...] If you think it makes since to do that then ok but you have some more files to change for deployment.

Oops. I forgot about that change and I misunderstood your comment. I just made one last commit to include the encoders library in the install process. I don't know why this didn't fail in CI before now...

The reason I added a fourth extension library was to make it show up in the python module hierarchy like it does now.

Copy link

@dkeeney dkeeney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think you have all of the parts now. Looks good.

I assume that each encoder will have its own Region implementation for NetworkAPI. Is that correct?

@breznak breznak merged commit 96d1111 into master Mar 13, 2019
@breznak breznak deleted the encoder-baseclass branch March 13, 2019 15:36
@Thanh-Binh
Copy link

I think so!

@breznak breznak mentioned this pull request Mar 13, 2019
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants