Skip to content

Tag searcher CLI executable#16

Merged
okofish merged 60 commits intomasterfrom
feature/cli
Mar 2, 2021
Merged

Tag searcher CLI executable#16
okofish merged 60 commits intomasterfrom
feature/cli

Conversation

@okofish
Copy link
Copy Markdown
Collaborator

@okofish okofish commented Feb 27, 2021

Changes

  • Add an executable target (enabled with the POST_TAG_SYSTEM_BUILD_CLI CMake option) that builds a command-line binary for running searches over ranges of initial conditions
  • Add a Dockerfile for building the CLI
  • Add PostTagFileReader and PostTagFileWriter utility base classes for parsing/serializing the ad-hoc binary data file formats that will be used to store "crib" sequences (known checkpoints), initial conditions, and evaluation results.
  • Add a PostTagCribFile file format struct and accompanying reader class
  • Add a PostTagResultFile file format struct and accompanying writer class

Comments

  • The executable is mostly scaffolding right now, to be filled in once the PostTagSearcher API (PostTagSearcher API #14) is implemented.
  • Boost.Program_options is required for command-line option parsing. Boost static/shared libraries and headers can be found in many package managers.
  • The executable can be built with static linking by enabling the POST_TAG_SYSTEM_CLI_STATIC_BUILD CMake option.

This change is Reviewable

Copy link
Copy Markdown
Owner

@maxitg maxitg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 18 of 21 files at r1, 10 of 10 files at r2.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @okofish)


CLI/arguments.cpp, line 54 at r1 (raw file):

    ("cribfile,f",    po::value<std::string>()->value_name("path"),
      "Path to crib file (list of known sequences)")
    ("cribsize,b",    po::value<uint32_t>()->value_name("size")

How is this going to be used? Doesn't the cribfile contain that already?


CLI/arguments.cpp, line 62 at r1 (raw file):

    ("initstart,s",   po::value<uint64_t>()->value_name("start"),
      "Starting initial condition")
    ("initcount,n",   po::value<uint64_t>()->default_value(1)->value_name("count")

Good idea. This will prevent people from being confused about end vs. end - 1.


CLI/arguments.cpp, line 66 at r1 (raw file):

      "Number of initial conditions")
    ("initoffset,e",  po::value<uint64_t>()->default_value(0)->value_name("offset"),
      "Initial condition offset (for use with zero-indexed array jobs)")

This description is not very clear to me. I think the comment in main.cpp is better. Or maybe it can say something like "shift all inits by offset * count".


CLI/main.cpp, line 23 at r1 (raw file):

}

void print_bits(std::vector<bool> bits) {

const std::vector<bool>& bits


CLI/main.cpp, line 56 at r1 (raw file):

      auto checkpoint = checkpoint_spec.states[i];
      print_bits(checkpoint.tape);
      printf(" - %u\n", checkpoint.headState);

We probably need a --verbose flag which would disable this, and also the printing of the results.


CLI/files/PostTagCribFile.cpp, line 7 at r2 (raw file):

using PostTagSystem::PostTagState;

PostTagCribFile PostTagCribFileReader::read_file() {

It would be nice to add a googletest for this using TestDatafiles/crib1.postcrib. (And the same for reading/writing other files.)


TestDatafiles/crib1.crib, line 0 at r1 (raw file):
Nit: It might be useful to add cases with tapes over 8 bits for testing.

Copy link
Copy Markdown
Collaborator Author

@okofish okofish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @maxitg and @okofish)


CLI/arguments.cpp, line 54 at r1 (raw file):

Previously, maxitg (Max Piskunov) wrote…

How is this going to be used? Doesn't the cribfile contain that already?

Good catch - this was left over from the initial spec; I'll remove it. Each sequence in cribfile is indeed prefixed with its length.


CLI/main.cpp, line 56 at r1 (raw file):

Previously, maxitg (Max Piskunov) wrote…

We probably need a --verbose flag which would disable this, and also the printing of the results.

This (and most of main.cpp) is just scaffolding for testing; once the searcher API is integrated there won't be anything printed except maybe a brief summary at the very end of the job.

It might be useful to have a --verbose flag that causes a status line to be printed every N seconds (or every N inits), but I think that would have to happen via some status callback that I would pass into your PostTagSearcher.


CLI/files/PostTagCribFile.cpp, line 7 at r2 (raw file):

Previously, maxitg (Max Piskunov) wrote…

It would be nice to add a googletest for this using TestDatafiles/crib1.postcrib. (And the same for reading/writing other files.)

That's a good idea. I'll do that once the file formats are more solidified.


TestDatafiles/crib1.crib, line at r1 (raw file):

Previously, maxitg (Max Piskunov) wrote…

Nit: It might be useful to add cases with tapes over 8 bits for testing.

Definitely; will do.

Copy link
Copy Markdown
Collaborator Author

@okofish okofish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 18 of 24 files reviewed, 5 unresolved discussions (waiting on @maxitg)


CLI/arguments.cpp, line 54 at r1 (raw file):

Previously, okofish (Jesse Friedman) wrote…

Good catch - this was left over from the initial spec; I'll remove it. Each sequence in cribfile is indeed prefixed with its length.

Done.


CLI/arguments.cpp, line 66 at r1 (raw file):
Changed to:

Shifts starting condition by offset * count (for use with zero-indexed array jobs)


CLI/main.cpp, line 23 at r1 (raw file):

Previously, maxitg (Max Piskunov) wrote…

const std::vector<bool>& bits

Done.

Copy link
Copy Markdown
Collaborator Author

@okofish okofish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 18 of 24 files reviewed, 5 unresolved discussions (waiting on @maxitg)


CLI/arguments.cpp, line 66 at r1 (raw file):

Previously, okofish (Jesse Friedman) wrote…

Changed to:

Shifts starting condition by offset * count (for use with zero-indexed array jobs)

Done.

Copy link
Copy Markdown
Owner

@maxitg maxitg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 6 files at r3.
Reviewable status: 22 of 24 files reviewed, 1 unresolved discussion (waiting on @maxitg)


CLI/main.cpp, line 56 at r1 (raw file):

Previously, okofish (Jesse Friedman) wrote…

This (and most of main.cpp) is just scaffolding for testing; once the searcher API is integrated there won't be anything printed except maybe a brief summary at the very end of the job.

It might be useful to have a --verbose flag that causes a status line to be printed every N seconds (or every N inits), but I think that would have to happen via some status callback that I would pass into your PostTagSearcher.

Good idea.

@okofish okofish force-pushed the feature/cli branch 5 times, most recently from 7e5ebed to 955b020 Compare March 1, 2021 08:51
NotEvaluated = 8,

// result file spec requires the reason to fit in 5 bits
MAX_DO_NOT_EXCEED = 31
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this out as it's my one modification to the library code.

@okofish okofish marked this pull request as ready for review March 1, 2021 19:55
@okofish okofish added the enhancement New feature or request label Mar 1, 2021
Copy link
Copy Markdown
Owner

@maxitg maxitg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 24 of 32 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 1 of 1 files at r10, 5 of 5 files at r11.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @maxitg and @okofish)


.circleci/config.yml, line 189 at r8 (raw file):

          path: ./LibraryResources/

  windows-cli-build:

Wouldn't it make more sense to build CLI in the above job, given that the setup is much more expensive than the build itself? Plus, we'll avoid code duplication for the "Install CMake" step.


CLI/arguments.cpp, line 68 at r8 (raw file):

    ("cribfile,f",    po::value<std::string>()->value_name("path.postcrib"),
      "Path to crib file (list of known sequences)")
    ("initsize,l",    po::value<uint16_t>()->default_value(30)->value_name("size")

I think initsize should be a required parameter. It does not really make much sense to specify the range without it.


CLI/main.cpp, line 120 at r8 (raw file):

    start += count * offset;

    std::cout << boost::format("Evaluating %u initial conditions, starting at %u...\n") % count % start;

"%u initial tapes", there are 3 * %u initial conditions.


CLI/main.cpp, line 124 at r8 (raw file):

    try {
      results = searcher.evaluateRange(tape_length, start, start + count, eval_params);

It's a bit weird that we are evaluating 3 * initcount inits. Would it make more sense to rename the flags to --inittapestart and --inittapecount?


FileFormats/CribFile.wl, line 90 at r8 (raw file):

    version,
    stream,
    tape -> 0

I don't think the default head state of zero makes much sense. Head state is equally as important as the tape.


FileFormats/InitFile.wl, line 99 at r8 (raw file):

    version,
    stream,
    tape -> 0

Same here wrt the default head state.


FileFormats/PostTagFileFormats.wl, line 1 at r8 (raw file):

BeginPackage["PostTagFileFormats`"]

Why not put all this into the main package (in the Kernel directory)? People like Stephen will probably want to use this, and they won't be able to unless we put it in the paclet (or make another one). Plus, this requires SetDirectory as without it the following does not work:

Get["~/git/PostTagSystem/FileFormats/PostTagFileFormats.wl"]

FileFormats/ResultFile.wl, line 133 at r8 (raw file):

(* Register import converter *)

ImportExport`RegisterImport[

Is it possible to make it automatically recognize .postresult extension?


libPostTagSystem/PostTagSearcher.hpp, line 26 at r7 (raw file):

Previously, okofish (Jesse Friedman) wrote…

Calling this out as it's my one modification to the library code.

Sounds good. 31 should be sufficient.


FileFormats/crib2.postcrib, line 0 at r8 (raw file):
Should this file be in the SampleFiles directory?

Copy link
Copy Markdown
Collaborator Author

@okofish okofish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 29 of 33 files reviewed, 11 unresolved discussions (waiting on @maxitg and @okofish)


.circleci/config.yml, line 189 at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

Wouldn't it make more sense to build CLI in the above job, given that the setup is much more expensive than the build itself? Plus, we'll avoid code duplication for the "Install CMake" step.

Makes sense; I'll try to get this working via buildLibraryResources.sh (but only on Windows for now, since we don't install Boost on macOS). This may take a few iterations of commit/check CircleCI/commit again.


CLI/arguments.cpp, line 68 at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

I think initsize should be a required parameter. It does not really make much sense to specify the range without it.

Done.


CLI/main.cpp, line 120 at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

"%u initial tapes", there are 3 * %u initial conditions.

Changed to %u initial condition tapes.


CLI/main.cpp, line 124 at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

It's a bit weird that we are evaluating 3 * initcount inits. Would it make more sense to rename the flags to --inittapestart and --inittapecount?

Rather than make the flags longer I changed the descriptions to include "tape"; let me know if this isn't enough.


FileFormats/CribFile.wl, line 90 at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

I don't think the default head state of zero makes much sense. Head state is equally as important as the tape.

Done.


FileFormats/InitFile.wl, line 99 at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

Same here wrt the default head state.

Done.


FileFormats/PostTagFileFormats.wl, line 1 at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

Why not put all this into the main package (in the Kernel directory)? People like Stephen will probably want to use this, and they won't be able to unless we put it in the paclet (or make another one). Plus, this requires SetDirectory as without it the following does not work:

Get["~/git/PostTagSystem/FileFormats/PostTagFileFormats.wl"]

I thought about this previously but I don't know how to use the "new" package format - can I safely just drop these files in the Kernel directory without modifications or do I have to rewrite them in the new style?

I would be OK with making another paclet in this folder; at least I know how to do that.


FileFormats/ResultFile.wl, line 133 at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

Is it possible to make it automatically recognize .postresult extension?

AFAIK only via this: https://resources.wolframcloud.com/FunctionRepository/resources/RegisterFormat

I could put in a utility function like RegisterPostTagFileFormats[] that calls that ResourceFunction. I'm not sure how I would feel about copying the big chunk of mysterious internal code from that ResourceFunction directly into these files. Then again, what's the worst that could happen...


FileFormats/crib2.postcrib, line at r8 (raw file):

Previously, maxitg (Max Piskunov) wrote…

Should this file be in the SampleFiles directory?

I don't even remember what that was for; deleted.

Copy link
Copy Markdown
Owner

@maxitg maxitg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r12.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @maxitg and @okofish)


.circleci/config.yml, line 189 at r8 (raw file):

Previously, okofish (Jesse Friedman) wrote…

Makes sense; I'll try to get this working via buildLibraryResources.sh (but only on Windows for now, since we don't install Boost on macOS). This may take a few iterations of commit/check CircleCI/commit again.

Why don't you just run your build step after buildLibraryResources? I think that would be more reliable, and it wouldn't be any slower because of caching.


FileFormats/PostTagFileFormats.wl, line 1 at r8 (raw file):

Previously, okofish (Jesse Friedman) wrote…

I thought about this previously but I don't know how to use the "new" package format - can I safely just drop these files in the Kernel directory without modifications or do I have to rewrite them in the new style?

I would be OK with making another paclet in this folder; at least I know how to do that.

It's quite simple actually. You cannot just copy these files there, but all you need to change is add Package["PostTagSystem`"] at the top of every file, remove all Begin/EndPackage and Begin/End. And then, put PackageExport["symbolName"] for all public symbols and PackageScope["symbolName"] for all symbols used in other files. The package system loads all files automatically in alphabetical order (so you might need to put A1$ in front of Utility).


FileFormats/ResultFile.wl, line 133 at r8 (raw file):

Previously, okofish (Jesse Friedman) wrote…

AFAIK only via this: https://resources.wolframcloud.com/FunctionRepository/resources/RegisterFormat

I could put in a utility function like RegisterPostTagFileFormats[] that calls that ResourceFunction. I'm not sure how I would feel about copying the big chunk of mysterious internal code from that ResourceFunction directly into these files. Then again, what's the worst that could happen...

I don't think we should be calling resource functions here. It's too unreliable. But with lines like this, it really is quite mysterious:

{fmt, True, False, False, False, False, $ext, $mime, None, {}}

We can copy that resource function to a new file, but maybe we shouldn't worry about it for now, given that we don't have much time left.

Copy link
Copy Markdown
Owner

@maxitg maxitg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to approve now so that I don't block you. I think it would be nice to merge FileFormats to the main package, but it can be done in a separate PR.

Reviewed 1 of 2 files at r13.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @maxitg and @okofish)

@okofish okofish merged commit e94f380 into master Mar 2, 2021
@okofish okofish deleted the feature/cli branch March 2, 2021 03:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants