ReaderInput and StreamInput are broken #6
Comments
To work around this and test patterns against file input please use |
The core issue is the entanglement of
Going forward I am going to try reducing input from byte streams only and provide UTF-8 decoder support in transducers that can be composed directly with UTF-8 input patterns using standard XML entity names, e.g.
These changes will take some time and I'll ask for patient use of the workaround prescribed in the previous comment in the meantime. |
This commit applies a stop-gap workaround for the IInput-related issues described in issue #6. It just applies to `jrte.Jrte.main()` the same logic as `test.FileRunner.main()` -- the input file path is specified as an input argument and the entire file contents are read into RAM. This workaround just obviates inclusion of jrte-HEAD-test.jar in classpath.

Tests have been extended to include runs with FileRunner.main() for benchmarking and Jrte.main() as for regular runs. Output equivalence checks for equivalent output from verbose gc interval times extraction via two different FSTs and one regex.

The `etc/sh/jrte.sh` script demonstrates how to use jrte to transduce a file.

```
java -cp jrte-HEAD.jar com.characterforming.jrte.Jrte \
    [--nil] <transducer-name> <input-filepath> <gearbox-filepath>
```

The `--nil` option presents a `!nil` signal to the transduction before presenting the file contents.

This is all that I intend to do with this issue #6 for now as a good fix will require replacing the entire IInput framework and transducing from raw byte streams; this will be undertaken in connection with issue #15.

Signed-off-by: jrte <jrte.project@gmail.com>
Please use |
This is almost fixed in dev@1887933. The main problem with segmented input is fixed in that commit. IInput is gone and the ITransductor.input() method can be used to segment input. Just call input() to push data onto the transductor input stack (LIFO) and call run() until the input stack is consumed; call input() and run() repeatedly to consume more input. Mark/reset will not work across input block boundaries for the time being. The same effect can be achieved by copying data that would otherwise be marked into a named value, then pushing the value onto the input stack at the reset point. Mark/reset, if fully implemented, will not copy data but will retain references to blocks with marked data. An ITransductor.hasMark() method will indicate whether or not the transductor has marked data. Any data buffers passed to a marking transductor's run() method will be retained on the input stack until the mark is reset. Callers must be aware that buffers passed to run() while the transductor is maintaining a mark MUST NOT be reused for data, at least until after the transductor stops marking. |
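The input()/run() cycle described above can be illustrated with a small self-contained sketch. `ToyTransductor` and `transduce` are hypothetical names invented for illustration; they are not the ribose/jrte `ITransductor` API, whose actual signatures are not shown in this thread. The toy only counts bytes where a real transductor would transduce them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class SegmentedInputSketch {
    /** Hypothetical stand-in for a transductor; NOT the ribose/jrte API. */
    public static final class ToyTransductor {
        private final Deque<byte[]> inputStack = new ArrayDeque<>(); // used as a LIFO stack
        private long consumed = 0;

        /** Push a block of data onto the input stack. */
        public void input(byte[] block) {
            inputStack.push(block);
        }

        /** Consume the top frame; return true while frames remain on the stack. */
        public boolean run() {
            byte[] top = inputStack.poll();
            if (top != null) {
                consumed += top.length; // a real transductor would transduce these bytes
            }
            return !inputStack.isEmpty();
        }

        public long consumed() {
            return consumed;
        }
    }

    /** Feed a stream to the toy transductor in segments of at most blockSize bytes. */
    public static long transduce(InputStream in, int blockSize) throws IOException {
        ToyTransductor t = new ToyTransductor();
        byte[] buffer = new byte[blockSize];
        int n;
        while ((n = in.read(buffer)) > 0) {
            t.input(Arrays.copyOf(buffer, n)); // copy so the read buffer can be refilled
            while (t.run()) {
                // keep running until the input stack is consumed
            }
        }
        return t.consumed();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        System.out.println(transduce(new ByteArrayInputStream(data), 4096)); // prints 10000
    }
}
```

Copying each segment before pushing it is a deliberately conservative choice in this sketch: it sidesteps the buffer-reuse hazard described for marking transductors, at the cost of one allocation per segment.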
Closing this. The fix works but with caveats. Mark/reset have no effect when there is more than one non-empty frame on the input stack. Input buffers passed into a marking transduction must not be reused (the transductor holds the original buffers in its mark stack and reuse will overwrite marked data). Marked buffers accumulate until reset() or unmark() is called. Call ITransductor.hasMark() before reusing data buffers for ITransductor.input(). Mark/reset are seldom needed and poorly implemented and will likely be deprecated and removed in the future. A better way would be to paste data that would otherwise be marked into a named value and push the named value to reset. |
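The buffer-reuse caveat can be sketched from the caller's side as follows. `ToyMarker` and `nextBuffer` are hypothetical names for illustration only, and the toy's `mark()`, `reset()`, `hasMark()` and `input()` are assumed shapes, not the real ribose signatures. The point is the pattern: check for a mark before refilling a buffer that was already handed over.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MarkedBufferSketch {
    /** Hypothetical stand-in that retains caller buffers while marking; NOT the ribose API. */
    public static final class ToyMarker {
        private final Deque<byte[]> markStack = new ArrayDeque<>();
        private boolean marking = false;

        public void mark() {
            marking = true;
        }

        /** Stop marking and release every retained buffer. */
        public void reset() {
            marking = false;
            markStack.clear();
        }

        public boolean hasMark() {
            return marking;
        }

        /** While marking, hold a reference to the caller's array (no copy is made). */
        public void input(byte[] block) {
            if (marking) {
                markStack.push(block);
            }
        }

        public int retained() {
            return markStack.size();
        }
    }

    /** Return a buffer that is safe to fill for the next input() call. */
    public static byte[] nextBuffer(ToyMarker t, byte[] previous) {
        // If a mark is held, the previous array may still be referenced: allocate a new one.
        return t.hasMark() ? new byte[previous.length] : previous;
    }

    public static void main(String[] args) {
        ToyMarker t = new ToyMarker();
        byte[] buffer = new byte[4096];
        t.mark();
        t.input(buffer);
        byte[] next = nextBuffer(t, buffer);
        System.out.println(next != buffer); // prints true: a fresh buffer was allocated
    }
}
```

Because retained buffers accumulate until the mark is released, a long-lived mark in this pattern means one allocation per input block, which is consistent with the suggestion above to prefer named values over mark/reset.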
Reopening this as it is not adequately tested. The cut boundaries imposed by ITransductor.limit() do not simulate marking at limit of physical buffer. |
These were overlooked when replaced by TCompile.{map,model} and the build broke when they were removed. Some mending followed from this. Mark/reset was reworked but still needs more testing so I reopened issue #6. I will rework FileRunner to allow segmented input for ribose but will still need to load entire input into heap (or maybe a direct buffer off-heap) for regex runs.

BaseTarget.{map,model} were replaced by TRun.{map,model} in a previous commit. These are generated by running TCompile on ginr automata compiled from patterns/test (ant -f build.xml ribose).

Signed-off-by: jrte <jrte.project@gmail.com>
This is fixed in |
Fixed and merged into master with javadoc cleanup (f2f9039). |
These are the available BaseInput subclasses that wrap stdin for simple text transduction processes. Neither of them works unless the entire input is preloaded into a single buffer.
Input from stdin is limited to <4k and larger files must be preloaded, as is done in jrte.test.com.characterforming.jrte.test.FileRunner.
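For illustration, preloading an entire input stream into a single buffer can be sketched with the standard library alone. `PreloadSketch` and `preload` are hypothetical names and this is not the FileRunner code; it assumes Java 9 or later for `InputStream.readAllBytes()`.

```java
import java.io.IOException;
import java.io.InputStream;

public class PreloadSketch {
    /** Read the whole stream into one byte array (requires Java 9+). */
    public static byte[] preload(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    public static void main(String[] args) throws IOException {
        // Preload all of stdin before handing it to a transduction as a single buffer.
        byte[] input = preload(System.in);
        System.err.println("preloaded " + input.length + " bytes");
    }
}
```

This trades memory for simplicity: the whole input sits on the heap, so it is only a stop-gap for inputs that fit comfortably in RAM.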