-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for string values and many of BBC BASIC's string functions #56
Conversation
Quote quotes by doubling them up. TIME$ is no longer a special case for EQUS. Introduces a Value type that can be a string or a double. This uses its own string type. Alternatives were to not use a union, or to bring in a big dependency like boost to handle complex unions.
This is moderately complicated because it needed support for functions with more than one parameter.
This worked in VS2010 but gives an error in newer compilers.
Excellent! I haven't had a play with it yet but it sounds very promising. |
Having had a very quick play with this, does command line option -D support defining string variables? I couldn't get it to work, but it may be I'm failing to get the shell escaping correct. |
Another drive-by comment: It doesn't seem to like me creating string variables which end in "$". Would it be easy to allow this? I don't think it's particularly important to be like BBC BASIC and insist on a "$" suffix for string variables, but if it's simple it would be nice to allow it. The existence of TIME$ as a string variable feels slightly odd (although I appreciate it's backwards compatibility) when normal variable names can't contain $.
|
Apart from TIME$ and functions, I think the only place you can currently have a $ after something that resembles an identifier is in things like "LDA$FE". As long as that doesn't get broken it shouldn't be hard. I'll have a look. I meant to do something with -D but forgot. The obvious syntax is -D could always treat options as strings but this would break existing builds. If it guesses the type then this will prevent the use of strings that look like numbers. The simplest way is to introduce a new switch for string symbols, maybe |
Or some ad-hoc syntax like |
I'm not all that keen on -D symbol=$value - it just feels wrong to have stuff on the right hand side of the "=" which isn't part of the value. Then again, this really this isn't that different to the "=?" conditional assignment syntax which doesn't bother me at all. I think a stronger argument against that would be the special treatment of $ in Bourne shells for variable expansion, requiring more platform-specific escaping to work around. Speaking of Bourne shells, I expect there'd be similar-but-different awkwardness with quotes as with your Windows example. Using a trivial C program to just dump out argv:
-S seems like the best option, but it does seem a little bit disappointing not to be able to use -D naturally. |
Just for completeness: -D could define a string symbol if and only if the argument can't be interpreted as an integer, but I think that's pretty ugly and confusing. [Edit: Sorry, just noticed you did mention this option above anyway.] |
On Windows I'm not keen on -S either, but I'll do that for now. |
Thanks for adding those! Another (easy - well, the coding bit, anyway :-) ) suggestion: would it be worth adding BEEBASM_VERSION$="1.10" as a default string symbol? Or maybe BEEBASM_STRING_VERSION="1.10"? And perhaps we could have a BEEBASM_INTEGER_VERSION=110 as well? I really don't know, but I thought I'd mention it. Perhaps something to discuss on stardot later rather than mix in with this. |
PS Glad to see it looks like the "$" support wasn't too hard, although I guess knowing where to make the change is a big part of it and that effort isn't visible in the diff. :-) |
The error reporting is poor for now.
There was a discussion about version numbers on stardot recently. All the proposed solutions (including mine) had downsides so I'm disinclined to mix that in here. The "$" thing was easy when I realised there was already similar code for "%". I've done a first go of the ASM thing. It's great for obfuscating code
The error reporting is terrible for now. I'm not sure how useful it is? I've put it here for now. |
It's that discussion which made me mention it here, and you're right, probably best not to mix the two. Is it correct that "LDA" is quoted in the last line of your ASM example? I'd have expected:
but maybe that's part of your point about obfuscation. :-) I am sure I've had one or two rare cases where this would have been helpful, but of course I can't think of them now. What I thought was one and started to write up here turned out to not be quite the same thing... The best example I can think of right now would also "require" an equivalent of the C macro preprocessor # operator to work:
Edit: I put "require" in quotes because you could just require the caller to pass the argument as a string and use asm for all references to it inside the macro; it's just a bit nicer for the caller not to have to do this. |
Aha, I think I've remembered one of the "good" examples. (YMMV!) Imagine we've got some data in memory and in an ideal world we'd just access it normally via (zp),Y:
Unfortunately there's a hole in the middle of the data we're addressing via ptr - in my specific case (Ozmoo for Acorn), we've got a sideways RAM bank paged in but we've got no shadow RAM, so our block of data starts at (say) &6000 and stretches up to &C000, but we need to "ignore" the 1K of mode 7 screen at &7C00. What was "lda (ptr),y" now turns into something like (assuming ptr is page-aligned, just to keep things simpler):
and we need to do something very similar for "sta (ptr),y" and "ora (ptr),y" and "and (ptr),y" etc. We can wrap the above code inside an "lda_zp_y_skipping_screen" macro, but we need to duplicate the screen-skipping logic in ora_zp_y_skipping_screen and and_zp_y_skipping_screen etc. With something like your ASM, we can write:
and then do:
Maybe this isn't that convincing, I don't know. As I've written it here you could probably factor the skipping screen logic out into a helper macro to avoid a lot of the duplication even if you didn't have your new ASM instruction, but once the screen skipping logic gets more intricate I think the ASM advantage grows. If the ASM implementation doesn't significantly complicate beebasm, I think it's probably a nice thing to have, even if its use allows programs written using beebasm to be confusing. |
Sorry for the comment spam, and for not testing this myself (yet)... Does it work if you do:
If so, does this mean we could/should support EVAL() as a string function? If not, does it matter if this works or not? |
I'm sure it's right because it runs! ASM takes two strings and LDA is a number so it needs to be converted to a string or quoted. ASM evaluates the contents of the second parameter, which can include symbol names. The implementation is in the mungre/asm branch.. If I added BASIC's EVAL function you could pass "zp" as a string parameter and write
And you've just come to the same place! Yes, the "foo+1" does work. EVAL should be straightforward. I'll give it a go. |
Also, thanks for the "good" example. I didn't want to shove ASM in just because I could, but it does look like it will be useful. |
Thanks for the explanation on ASM. I've now built the code and you're right that it works, but I find it very confusing. It seems like the meaning of a quoted expression means something different in this context to elsewhere:
I would really have expected to need to write:
I wonder if my mental model of this is just wrong, but I hope you can at least see where I'm coming from, even if you don't agree with me. On a possibly awkward tangent, I note that:
doesn't work as I might wish. (ASM "LDA", "#4" does work.) Would it be hard to make this work? Edit: I suspect the answer is yes, otherwise ASM could take a single argument and we'd do things like |
Cheers, that is a bug. The TIME$ parsing was eating an extra character. Yes, automating some tests is on my list of things I might get around to one day. |
Thanks for fixing that! I suspect the big problem with automating the tests would be doing it in a cross-platform way. In a simple-ish project of mine I just wrote a shell script to do some basic tests and diff the output against a master set of output, but for beebasm that's not likely to be acceptable. Edit: Just maybe the github automated test stuff which we recently started using has something that could help. Anyway, just a thought, there's no urgency to work on this of course. I think I've found another bug, probably a bit obscure but I do have a project using this... The README says that when using the "-o" option to assemble in a more traditional fashion instead of generating a disc image, SAVE doesn't need a filename. This doesn't seem to work any more:
|
Thanks, I made the rookie mistake of believing the syntax description in a comment. The cross-platform thing is what has put me off. chuckie checks that the ssd hasn't changed but it uses a batch file. I think all of the positive (i.e. non-failing) tests can be expressed as "is this ssd unchanged?" Windows and Unix both have binary compare tools (comp and cmp respectively). A makefile can test its platform by testing environment variables, and then choose the right command. Well, probably, nmake is really limited. A much simpler option might be to include this in beebasm. Provide a "reference ssd" on the command-line and check that the new ssd is identical. Fail if not. This doesn't feel like it really belongs there, but it would be handy. Edit: of course, there are a few things (like SAVE to the file system) that couldn't be tested with this, but it covers the vast majority. It is scruffy though. |
:-) I'm always a bit dubious about my ability to write portable makefiles (Unix variants are bad enough, even without then supporting Windows), but I'm sure what you say is right. It would be a shame not to include the "deliberately failing" tests but something would be better than nothing, I guess. https://www.freecodecamp.org/news/what-are-github-actions-and-how-can-you-automate-tests-and-slack-notifications/#part-1-automating-tests suggests the github actions stuff (which we started using when I merged Dave Lambley's pull request #54 to proposed-updates) might offer something here. But I must admit this is an area of github I know very little about; Dave did the work to start using this feature and it seems pretty nifty, but that's about all I can say. (Edit: That link is probably a bit irrelevant, beyond showing the concept of github running tests exists - it's a bit Node.js-y...) I have built all of my own projects I can think of with the latest mungre/strings code and they all seem to build fine. However, I think there is still a lingering bug because I can't build Prince of Persia (https://github.com/kieranhj/pop-beeb). The build there is set up for Windows and it may be some of my hacks to build it on Linux are actually at fault, but could you please give it a go and see if it works for you? I get:
|
It is broken, I'll have a look. For future reference, if you put I don't know anything much about github actions either. Fundamentally, it seems like it just coordinates running stuff. That could be a test, but it doesn't know anything about it. I suppose a common choice for sane cross-platform scripting is python. It's not standard on Windows but I guess it's pretty common for even non-Python developers (like me) to have it installed. It would be easy to write a bit of Python to find and run tests, and it should be portable. |
Python sounds like a pretty good plan - and as long as the standard build either doesn't run the tests or gracefully skips them if Python isn't installed, it wouldn't be the end of the world if someone didn't have it. You are probably right about how github actions work, but it does look as though Python is available (https://github.com/marketplace/actions/setup-python) so if we did have a Python test harness it could probably be integrated there fairly easily. Thanks for taking a look at PoP, I hope it's not a tricky one... |
When an undefined symbol is evaluated beebasm skips the rest of the expression and throws an exception. Previously, a comma was unequivocally the end of an expression but this is not true with multi-parameter functions. LineParser::SkipExpression needs to keep track of the bracket count to handle commas correctly. This only caused problems when an undefined symbol was surrounded with brackets (like in a function call), and there were multiple expressions in a comma-separated list, like in an EQUB command.
Thanks for finding that. The hairiest change was the one to support multiple parameters in functions so that's exactly where it was. I should have looked there first. It was failing to complete an EQUB list (so everything moved in memory) when undefined symbols were used in a function call. The commit has more details. |
Nice one, with the latest code all of my projects build fine! Thanks very much. Are you going to post on stardot about this - I'd expect a pretty enthusiastic response :-) - and ask for more testers or have you had enough of fixing bugs for one weekend? :-) (If you do post, it might be worth mentioning your branch also includes all the existing proposed-updates changes.) I haven't looked over the code yet, I will see if I can do so over the next few days just on the off-chance there's anything dodgy that might catch my eye. |
PS Don't forget to credit yourself for this work in README.md! :-) |
Thank you. Your suggestions and testing have been really helpful. It's much better than it was two days ago. There's no rush to have a look at the code. I will post on stardot. It seems pretty solid now... I've committed an exe so people can try it easily. I don't like cluttering up the repo with multiple exes so I've put it in mungre/strings-exe, which can be deleted later. |
I wrote a simple test runner in python. It works on Windows and Linux. The tests are only the existing examples. There is a README explaining what it does. |
Nice one! I've had a quick look at this and it seems like a pretty decent framework. Would it be worth opening a new pull request for this just so the discussion doesn't get mixed in with the string value support? It probably doesn't matter if it's just the two of us talking about it. :-) A few miscellaneous thoughts; I appreciate you've probably thought of some or all of these yourself and just haven't wanted to deal with them :-) :
|
PS FWIW the test harness worked fine for me on Ubuntu 20.04.2. |
PPS If it's easy, it would maybe be nice to support both Python 2 and 3. But it makes perfect sense to not worry about this yet, and maybe once the test harness is "done" and no longer being modified a lot it could be tweaked to work on both. |
Discussion of testing moved here. |
I've been over the diff between proposed-updates and mungre/strings, not in great depth. I have a few thoughts, which I hope (but don't guarantee!) aren't based on misconceptions:
|
Glancing over the open issues, I think this work addresses issue #33. I just gave this a quick test and macros certainly seem to support string arguments, am I right? |
Cheers, I've fixed STRINGS$ in the README. Yes, macros support string parameters and the ASM directive handles the comment on issue 33. Yes, shifting of signed ints is undefined for many values. Funnily enough, I'd just noticed the same problem in the binary number parser. The BASIC V shifts are described here. It has both logical and arithmetic shifts right. This seems like a reasonable thing to copy. I'll do that. LEFT$ and RIGHT$ were skipped as unnecessary, but I might as well finish the job and add them in. |
Further to my last commit, I'm not sure what local-forward-branch-3 is testing. I've been adding some simple tests of value parsing and noticed these things: I expected print to write 32-bit integers cleanly but print(&7fffffff) produces 2.14748e+009. Can you see any harm in changing this (for print and STR$)? I was surprised that &ffffffff isn't negative, but it's always been like that so it's too late to change it. However the % parsing can produce negative numbers, and quietly truncates the value to 32 bits (undefined behaviour notwithstanding). The silent truncation can go but I'm not sure about making the negative behaviour consistent with the hex parsing. What do you think? Edit: this comment was supposed to be on the testing pr so I've repeated it there. |
I can't figure out what local-forward-branch-3 is testing either. I haven't tried bisecting this fully, but it does build just fine with the commit immediately before the test case was added, and with v1.08 as well. If I had to guess I'd suspect there was a problem with beebasm using abs instead of zp addressing in macros at some point (probably as a side-effect of some other change I made regarding scoped variables), but I really can't be sure. I think we should probably remove all the local-forward-branch-* tests from the examples directory (as I did on ZornsLemma/testing). My gut feeling would be to keep local-forward-branch-3 as a test just in case, but if you want to get rid of it I won't be too upset. (Is the old retro software forum still around? I know it's superficially present on stardot, but I was trying to find some beebasm posts I made about structured assembler way back and I couldn't find them on there, so I'm not sure it's complete. If it is about, there might well be a post from me on there about local-forward-branch-3.) I can't see any obvious downside to changing the way print displays large integers, but of course it's hard to rule out someone using this in a way that will get broken. I hadn't really thought about & and % generating negative numbers until now. It doesn't sound like a bad idea to make % consistent with & but of course someone could have written some code which depends on it; my gut feeling is that isn't very likely but I really don't know. Perhaps the best suggestion I could offer would be to make it consistent and post to stardot mentioning it in passing; probably no one will care, and if they do at least you haven't sneaked an incompatible change in without mentioning it. Edit: OK, I'll do the same. :-) |
I fixed the undefined shift behaviour, but I realised I can't do logical shift right because |
The README has the new functions.
There's still no way of using a string as an assembler mnemonic or an addressing mode, which I know some people would like. This could be handled with an ASSEMBLE command which takes two strings (the mnemonic and the parameters) and assembles them. Anyway, that's something that could be built on this.
This is quite a big change so comments are very welcome.