Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database region #860

Merged
merged 14 commits into from
Jul 9, 2020
Merged

Database region #860

merged 14 commits into from
Jul 9, 2020

Conversation

Zbysekz
Copy link

@Zbysekz Zbysekz commented Jul 8, 2020

Closes #748
Closes htm-community/HTMpandaVis#4

  1. I have named the region "DatabaseOutRegion" instead of PlotRegion -> i think it is more general, since the region really writes data in prescribed format in the database and do nothing with plot in fact.

  2. Unit test was created, it is copy of napi_hello but strip down to bare bones. It writes 20 epochs to the db file, then reads how many rows are in whole database (should be 40 because we are storing raw anomaly & encoder bucket value). Hope this is enough?

  3. Napi_hello is extended by FileOutputRegion, (so the results are stored in csv) as discussed in A new Region for plots in C++ (for C++ and C# apps) #748 (comment)

  4. The DatabaseOutRegion have 10 inputs, called dataIn0 - dataIn9 user can use how many is needed. For each input table in SQL is created. The table name is dataStream_dataIn0 - dataStream_dataIn9. Table contains two columns, one is autoincrement iteration, second is the REAL value.

@Zbysekz
Copy link
Author

Zbysekz commented Jul 8, 2020

pandaDoc readme update htm-community/HTMpandaVis@37b95a4

@breznak
Copy link
Member

breznak commented Jul 8, 2020

Wow! Great job 👏
Reviewing now, a few high-level nits from the description

I have named the region "DatabaseOutRegion" instead of PlotRegion

nit, but imho the "database" is also not ideal, maybe RecordOut(put)Region or something like that fits best?

The DatabaseOutRegion have 10 inputs, called dataIn0 - dataIn9 user can use how many is needed.

looks enough in general, but as is with limits..is there a way how users can define more inputs? And a howto?

@Zbysekz
Copy link
Author

Zbysekz commented Jul 8, 2020

I have named the region "DatabaseOutRegion" instead of PlotRegion

nit, but imho the "database" is also not ideal, maybe RecordOut(put)Region or something like that fits best?

I am not sure... @dkeeney what do you think about this? Or FileOutputDB maybe?

The DatabaseOutRegion have 10 inputs, called dataIn0 - dataIn9 user can use how many is needed.

looks enough in general, but as is with limits..is there a way how users can define more inputs? And a howto?

There is general short description in the header

number of 10 is a constant defined in header. Do you think that is a problem? I welcome any suggestions. I was careful to not change much, since i don't want to break structure of regions by making this region too different from others.

Copy link
Member

@breznak breznak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! 💯
I like the tests, examples and all.

  • let's consider what's a self-explanatory region's name
  • please change some of the small coding style things below
  • IF there's a reasonable way to make the num inputs configurable?
  • please add some readme how the region is used with the visualization

Overall, it fits great and is approved from me. I'd like @dkeeney have it reviewed too as he's the expert on NAPI.

src/htm/regions/DatabaseOutRegion.hpp Outdated Show resolved Hide resolved
src/htm/regions/DatabaseOutRegion.hpp Outdated Show resolved Hide resolved
* the SQLite3 database file.
*
* Inputs can be now scalars (floats). they have names like:
* (dataIn0,dataIn1 ... up to MAX_NUMBER_OF_INPUTS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the MAX_NUMBER_OF_INPUTS should probably be defined in the header (?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made that const as you suggest, and also static, but if i use it only in cpp i think it shouldn't be in the header, since the header is aka interface. But maybe i can make it private class member? But that will not be so visible if its constant that can be changed

#include <htm/regions/DatabaseOutRegion.hpp>
#include <htm/utils/Log.hpp>

#define MAX_NUMBER_OF_INPUTS 10 // maximal number of inputs/scalar streams in the database
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we try to avoid defines, if possible. could you use const UInt MAX_NUM... = 10;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...as my prev. comment about "how users can modify this", it is already fine like this

  • if users really need more than 10 inputs, they can change it here and recompile the code.
  • would it be possible to make the MAX_NUM_OF_INPUTS part of the region's Spec, ie making it easier to change?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this parameter IS in the Spec. It is the number spec fields to create. The Spec is static, used before the Region is created and therefore this cannot be a parameter passed in at creation.

I personally don't have a problem with #define. But @breznak wants to pull us out of our old C ways and do things more C++ like.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I wonder if you need it at all.
These are used to declare the inputs so you can use links with them. I have not thought this through but what if you used one input and used the Fan-In capability of the Link. The links for data for all fields would be connected to the same input. The Fan-In facility will append the buffers together and are passed into the region as one big array.
Don't know....probably would be hard to separate them out into fields inside the region. Probably not a good idea.
The point is there might be another way to solve this problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this parameter IS in the Spec. It is the number spec fields to create. The Spec is static, used before the Region is created

so the other way, can we "read" it from the Spec and get it to the field here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I wonder if you need it at all.
These are used to declare the inputs so you can use links with them.

you need it for the fields in the DB (?)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need it for the fields in the DB (?)

We need to declare inputs so we can get data for each field of the record being written to the database. Ideally the names of those inputs should be the names of the db fields being stored. But since those names can be anything and could be different for each instance of the region, they could not be in the Spec. The Spec must be static and the same for all instances of the region, set at compile time. So that is why he has used generic names in the Spec.

I was trying to think of a way where the names of the inputs which correspond to DB fields, and their type, could be set during region creation if not even at runtime like it is in SQL.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to think of a way where the names of the inputs which correspond to DB fields, and their type, could be set during region creation if not even at runtime like it is in SQL.

Hmm yes i am thinking the same way, but unless we define new link datatype like tuple, we can't dynamically add the inputs.. I want to keep this simple. I have no idea now how to implement this better.

dkeeney
dkeeney previously approved these changes Jul 8, 2020
Copy link

@dkeeney dkeeney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool.
I did not find anything that would cause it not to work. I have a few suggestions ... mostly ideas for future PR's

#include <htm/regions/DatabaseOutRegion.hpp>
#include <htm/utils/Log.hpp>

#define MAX_NUMBER_OF_INPUTS 10 // maximal number of inputs/scalar streams in the database
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this parameter IS in the Spec. It is the number spec fields to create. The Spec is static, used before the Region is created and therefore this cannot be a parameter passed in at creation.

I personally don't have a problem with #define. But @breznak wants to pull us out of our old C ways and do things more C++ like.

insertData("dataStream_" + inp.first, inObj);
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still thinking about a DatabaseRegion ... I can see how a similar loop could be added here to iterate the outputs and populate them with data from the database. May be a problem doing both input and output at the same time with the same instance of the region ... but it is something to consider.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes i think it can be used for reading from the database with not much effort, but i see no use-case now. That's also why it would be good to name the region "DatabaseRegion" as you suggest.

src/htm/regions/DatabaseOutRegion.cpp Outdated Show resolved Hide resolved
#include <htm/regions/DatabaseOutRegion.hpp>
#include <htm/utils/Log.hpp>

#define MAX_NUMBER_OF_INPUTS 10 // maximal number of inputs/scalar streams in the database
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I wonder if you need it at all.
These are used to declare the inputs so you can use links with them. I have not thought this through but what if you used one input and used the Fan-In capability of the Link. The links for data for all fields would be connected to the same input. The Fan-In facility will append the buffers together and are passed into the region as one big array.
Don't know....probably would be hard to separate them out into fields inside the region. Probably not a good idea.
The point is there might be another way to solve this problem.

@dkeeney
Copy link

dkeeney commented Jul 8, 2020

I am not sure... @dkeeney what do you think about this? Or FileOutputDB maybe?

Actually I like "DatabaseRegion" with the idea that it might eventually handle both input and output. Along the same line, I have thought many times that the "FileOutputRegion" and "FileInputRegion" should be combined into a FileRegion ... one of those back burner projects.

However, I am not going to insist on "DatabaseRegion".

@Zbysekz
Copy link
Author

Zbysekz commented Jul 9, 2020

I am not sure... @dkeeney what do you think about this? Or FileOutputDB maybe?

Actually I like "DatabaseRegion" with the idea that it might eventually handle both input and output. Along the same line, I have thought many times that the "FileOutputRegion" and "FileInputRegion" should be combined into a FileRegion ... one of those back burner projects.

However, I am not going to insist on "DatabaseRegion".

Yes i am also for the renaming from "DatabaseOutRegion" to "DatabaseRegion" to have it more general, do you agree @breznak ?

Copy link

@dkeeney dkeeney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am impressed. This is very nicely done.


// Setup data flows between regions
net.link("encoder", "sp_global", "", "", "encoded", "bottomUpIn");
net.link("sp_global", "tm", "", "", "bottomUpOut", "bottomUpIn");
net.link("tm", "output", "UniformLink", "", "bottomUpOut", "dataIn");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "UniformLink", or anything in this argument, is ignored. This is a hold-over from the NuPIC code.

@dkeeney dkeeney merged commit 2681c2c into htm-community:master Jul 9, 2020
@breznak
Copy link
Member

breznak commented Jul 9, 2020

I'm joining the celebration 💯 This is great and professional implementation. Thank you @Zbysekz 👏 and both of you for tuning the details.

@Zbysekz
Copy link
Author

Zbysekz commented Jul 9, 2020

Thank you guys for thoughtful review and cooperation 👍 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement C++ baker for dash visualization only A new Region for plots in C++ (for C++ and C# apps)
3 participants