Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for string values and many of BBC BASIC's string functions #56

Merged
merged 29 commits into from
Aug 10, 2021

Conversation

mungre
Copy link
Collaborator

@mungre mungre commented Jul 1, 2021

The README has the new functions.

There's still no way of using a string as an assembler mnemonic or an addressing mode, which I know some people would like. This could be handled with an ASSEMBLE command which takes two strings (the mnemonic and the parameters) and assembles them. Anyway, that's something that could be built on this.

This is quite a big change so comments are very welcome.

Quote quotes by doubling them up.

TIME$ is no longer a special case for EQUS.

Introduces a Value type that can be a string or a double.  This uses its
own string type.  Alternatives were to not use a union, or to bring in a
big dependency like boost to handle complex unions.
This is moderately complicated because it needed support for functions
with more than one parameter.
This worked in VS2010 but gives an error in newer compilers.
@ZornsLemma
Copy link
Collaborator

Excellent! I haven't had a play with it yet but it sounds very promising.

@ZornsLemma
Copy link
Collaborator

Having had a very quick play with this, does command line option -D support defining string variables? I couldn't get it to work, but it may be I'm failing to get the shell escaping correct.

@ZornsLemma
Copy link
Collaborator

ZornsLemma commented Jul 1, 2021

Another drive-by comment: It doesn't seem to like me creating string variables which end in "$". Would it be easy to allow this? I don't think it's particularly important to be like BBC BASIC and insist on a "$" suffix for string variables, but if it's simple it would be nice to allow it. The existence of TIME$ as a string variable feels slightly odd (although I appreciate it's backwards compatibility) when normal variable names can't contain $.

steven@riemann:~/src/beebasm3$ cat s1.beebasm 
foo="Hello "
bar=TIME$
mike$="X"
PRINT foo+bar
steven@riemann:~/src/beebasm3$ ./beebasm -do z.ssd -v -i s1.beebasm
s1.beebasm:3: error: Unrecognised token.

mike$="X"
^

@mungre
Copy link
Collaborator Author

mungre commented Jul 1, 2021

Apart from TIME$ and functions, I think the only place you can currently have a $ after something that resembles an identifier is in things like "LDA$FE". As long as that doesn't get broken it shouldn't be hard. I'll have a look.

I meant to do something with -D but forgot. The obvious syntax is -D symbol="value" but this gets mangled by the argv parsing. On Windows you would need -D symbol=""""value"""". That is eight quotes in total and it's not portable so cross-platform makefiles would be a difficulty.

-D could always treat options as strings but this would break existing builds. If it guesses the type then this will prevent the use of strings that look like numbers. The simplest way is to introduce a new switch for string symbols, maybe -S?

@mungre
Copy link
Collaborator Author

mungre commented Jul 1, 2021

Or some ad-hoc syntax like -D symbol=$value or -D$? Or require a $ on the symbol name -D symbol$=value, though I'm not superkeen on this one.

@ZornsLemma
Copy link
Collaborator

I'm not all that keen on -D symbol=$value - it just feels wrong to have stuff on the right hand side of the "=" which isn't part of the value. Then again, this really this isn't that different to the "=?" conditional assignment syntax which doesn't bother me at all. I think a stronger argument against that would be the special treatment of $ in Bourne shells for variable expansion, requiring more platform-specific escaping to work around.

Speaking of Bourne shells, I expect there'd be similar-but-different awkwardness with quotes as with your Windows example. Using a trivial C program to just dump out argv:

$ ./a.out -D foo="bar" # natural syntax doesn't give desired result
0 ./a.out
1 -D
2 foo=bar
$ ./a.out -D 'foo="bar"' # one possible unnatural syntax which does give desired result
0 ./a.out
1 -D
2 foo="bar"

-S seems like the best option, but it does seem a little bit disappointing not to be able to use -D naturally.

@ZornsLemma
Copy link
Collaborator

ZornsLemma commented Jul 1, 2021

Just for completeness: -D could define a string symbol if and only if the argument can't be interpreted as an integer, but I think that's pretty ugly and confusing. [Edit: Sorry, just noticed you did mention this option above anyway.]

@mungre
Copy link
Collaborator Author

mungre commented Jul 1, 2021

On Windows 'foo="bar"' gives 'foo=bar'. foo='bar' preserves the single quotes but I suspect it won't on Unix, and if the string contains a space you have to double-quote it as well.

I'm not keen on -S either, but I'll do that for now.

@ZornsLemma
Copy link
Collaborator

Thanks for adding those!

Another (easy - well, the coding bit, anyway :-) ) suggestion: would it be worth adding BEEBASM_VERSION$="1.10" as a default string symbol? Or maybe BEEBASM_STRING_VERSION="1.10"? And perhaps we could have a BEEBASM_INTEGER_VERSION=110 as well? I really don't know, but I thought I'd mention it. Perhaps something to discuss on stardot later rather than mix in with this.

@ZornsLemma
Copy link
Collaborator

PS Glad to see it looks like the "$" support wasn't too hard, although I guess knowing where to make the change is a big part of it and that effort isn't visible in the diff. :-)

The error reporting is poor for now.
@mungre
Copy link
Collaborator Author

mungre commented Jul 1, 2021

There was a discussion about version numbers on stardot recently. All the proposed solutions (including mine) had downsides so I'm disinclined to mix that in here.

The "$" thing was easy when I realised there was already similar code for "%".

I've done a first go of the ASM thing. It's great for obfuscating code

LDA=4
ASM="LDA"
ASM ASM, "LDA"   \ LDA 4

The error reporting is terrible for now. I'm not sure how useful it is? I've put it here for now.

@ZornsLemma
Copy link
Collaborator

ZornsLemma commented Jul 1, 2021

It's that discussion which made me mention it here, and you're right, probably best not to mix the two.

Is it correct that "LDA" is quoted in the last line of your ASM example? I'd have expected:

ASM ASM, LDA

but maybe that's part of your point about obfuscation. :-)

I am sure I've had one or two rare cases where this would have been helpful, but of course I can't think of them now. What I thought was one and started to write up here turned out to not be quite the same thing...

The best example I can think of right now would also "require" an equivalent of the C macro preprocessor # operator to work:

macro munge_thing zp
    if zp == hot_zp_1 or zp == hot_zp_2
         ; inline munging, we don't want to waste time for this hot address
        lda zp
        ; imagine a lot of code here
        sta zp
    else
        ; Performance isn't critical here and we don't want to waste space on inline munging.
        asm "jsr", "munge_"+#zp+"_subroutine"
    fi
endmacro

munge_thing hot_zp_1 ; generates inline code
munge_thing boring_zp ; generates jsr munge_boring_zp_sub

Edit: I put "require" in quotes because you could just require the caller to pass the argument as a string and use asm for all references to it inside the macro; it's just a bit nicer for the caller not to have to do this.

@ZornsLemma
Copy link
Collaborator

Aha, I think I've remembered one of the "good" examples. (YMMV!)

Imagine we've got some data in memory and in an ideal world we'd just access it normally via (zp),Y:

    lda (ptr),y
    clc:adc #42
    iny:ora (ptr),y:dey
    sta (ptr),y

Unfortunately there's a hole in the middle of the data we're addressing via ptr - in my specific case (Ozmoo for Acorn), we've got a sideways RAM bank paged in but we've got no shadow RAM, so our block of data starts at (say) &6000 and stretches up to &C000, but we need to "ignore" the 1K of mode 7 screen at &7C00.

What was "lda (ptr),y" now turns into something like (assuming ptr is page-aligned, just to keep things simpler):

    lda ptr+1:cmp #&7c:bcc below_screen
    clc:adc #4
.below_screen
    sta temp_ptr+1
    lda ptr:sta temp_ptr
    lda (temp_ptr),y

and we need to do something very similar for "sta (ptr),y" and "ora (ptr),y" and "and (ptr),y" etc. We can wrap the above code inside an "lda_zp_y_skipping_screen" macro, but we need to duplicate the screen-skipping logic in ora_zp_y_skipping_screen and and_zp_y_skipping_screen etc. With something like your ASM, we can write:

macro do_zp_y_skipping_screen opcode, zp
    if opcode != "lda"
        pha
    fi
    lda zp+1:cmp #&7c:bcc below_screen
    clc:adc #4 
.below_screen
    sta temp_ptr+1
    lda zp:sta temp_ptr
    if opcode != "lda"
        pla
    fi
    asm opcode, temp_ptr
endmacro

and then do:

    do_zp_y_skipping_screen "lda", ptr
    clc:adc #42
    iny:do_zp_y_skipping_screen "ora", ptr:dey
    do_zp_y_skipping_screen "sta", ptr

Maybe this isn't that convincing, I don't know. As I've written it here you could probably factor the skipping screen logic out into a helper macro to avoid a lot of the duplication even if you didn't have your new ASM instruction, but once the screen skipping logic gets more intricate I think the ASM advantage grows.

If the ASM implementation doesn't significantly complicate beebasm, I think it's probably a nice thing to have, even if its use allows programs written using beebasm to be confusing.

@ZornsLemma
Copy link
Collaborator

Sorry for the comment spam, and for not testing this myself (yet)...

Does it work if you do:

foo=&70
asm "lda", "foo+1" ; lda &71

If so, does this mean we could/should support EVAL() as a string function? If not, does it matter if this works or not?

@mungre
Copy link
Collaborator Author

mungre commented Jul 1, 2021

I'm sure it's right because it runs! ASM takes two strings and LDA is a number so it needs to be converted to a string or quoted. ASM evaluates the contents of the second parameter, which can include symbol names. The implementation is in the mungre/asm branch..

If I added BASIC's EVAL function you could pass "zp" as a string parameter and write

macro munge_thing zp
    if zp == hot_zp_1 or zp == hot_zp_2
         ; inline munging, we don't want to waste time for this hot address
        lda EVAL(zp)
        ; imagine a lot of code here
        sta EVAL(zp)
    else
        ; Performance isn't critical here and we don't want to waste space on inline munging.
        asm "jsr", "munge_"+zp+"_subroutine"
    fi
endmacro

And you've just come to the same place! Yes, the "foo+1" does work. EVAL should be straightforward. I'll give it a go.

@mungre
Copy link
Collaborator Author

mungre commented Jul 1, 2021

Also, thanks for the "good" example. I didn't want to shove ASM in just because I could, but it does look like it will be useful.

@ZornsLemma
Copy link
Collaborator

ZornsLemma commented Jul 1, 2021

Thanks for the explanation on ASM. I've now built the code and you're right that it works, but I find it very confusing. It seems like the meaning of a quoted expression means something different in this context to elsewhere:

.START
VAL=4
ASM "LDA", "VAL" ; LDA 4
ASM "LDA", "VAL+1" ; LDA 5
EQUS "VAL" ; emits 'V', 'A', 'L', not '4'
EQUS "VAL+1" ; emits 'V', 'A', 'L', '+', '1', not '5'
.END
SAVE "X", START, END

I would really have expected to need to write:

ASM "LDA", STR$(VAL)

I wonder if my mental model of this is just wrong, but I hope you can at least see where I'm coming from, even if you don't agree with me.

On a possibly awkward tangent, I note that:

ASM "LDA #", "4"

doesn't work as I might wish. (ASM "LDA", "#4" does work.) Would it be hard to make this work? Edit: I suspect the answer is yes, otherwise ASM could take a single argument and we'd do things like ASM "LDA "+STR$(VAL) or ASM "LD"+register+" #42"

@mungre
Copy link
Collaborator Author

mungre commented Jul 3, 2021

Cheers, that is a bug. The TIME$ parsing was eating an extra character.

Yes, automating some tests is on my list of things I might get around to one day.

@ZornsLemma
Copy link
Collaborator

ZornsLemma commented Jul 3, 2021

Thanks for fixing that!

I suspect the big problem with automating the tests would be doing it in a cross-platform way. In a simple-ish project of mine I just wrote a shell script to do some basic tests and diff the output against a master set of output, but for beebasm that's not likely to be acceptable. Edit: Just maybe the github automated test stuff which we recently started using has something that could help. Anyway, just a thought, there's no urgency to work on this of course.

I think I've found another bug, probably a bit obscure but I do have a project using this... The README says that when using the "-o" option to assemble in a more traditional fashion instead of generating a disc image, SAVE doesn't need a filename. This doesn't seem to work any more:

steven@riemann:~/src/beebasm3$ cat save2.beebasm 
org &900

.start
    lda #42
    jmp &ffee
.end

SAVE start, end
steven@riemann:~/src/beebasm3$ ./beebasm -o save2.out -v -i save2.beebasm
save2.beebasm:8: error: Type mismatch.

SAVE start, end
          ^

@mungre
Copy link
Collaborator Author

mungre commented Jul 3, 2021

Thanks, I made the rookie mistake of believing the syntax description in a comment.

The cross-platform thing is what has put me off. chuckie checks that the ssd hasn't changed but it uses a batch file.

I think all of the positive (i.e. non-failing) tests can be expressed as "is this ssd unchanged?" Windows and Unix both have binary compare tools (comp and cmp respectively). A makefile can test its platform by testing environment variables, and then choose the right command. Well, probably, nmake is really limited.

A much simpler option might be to include this in beebasm. Provide a "reference ssd" on the command-line and check that the new ssd is identical. Fail if not. This doesn't feel like it really belongs there, but it would be handy.

Edit: of course, there are a few things (like SAVE to the file system) that couldn't be tested with this, but it covers the vast majority. It is scruffy though.

@ZornsLemma
Copy link
Collaborator

ZornsLemma commented Jul 3, 2021

:-)

I'm always a bit dubious about my ability to write portable makefiles (Unix variants are bad enough, even without then supporting Windows), but I'm sure what you say is right. It would be a shame not to include the "deliberately failing" tests but something would be better than nothing, I guess.

https://www.freecodecamp.org/news/what-are-github-actions-and-how-can-you-automate-tests-and-slack-notifications/#part-1-automating-tests suggests the github actions stuff (which we started using when I merged Dave Lambley's pull request #54 to proposed-updates) might offer something here. But I must admit this is an area of github I know very little about; Dave did the work to start using this feature and it seems pretty nifty, but that's about all I can say. (Edit: That link is probably a bit irrelevant, beyond showing the concept of github running tests exists - it's a bit Node.js-y...)

I have built all of my own projects I can think of with the latest mungre/strings code and they all seem to build fine. However, I think there is still a lingering bug because I can't build Prince of Persia (https://github.com/kieranhj/pop-beeb). The build there is set up for Windows and it may be some of my hacks to build it on Linux are actually at fault, but could you please give it a go and see if it works for you? I get:

game/grafix.asm:111: error: Fatal error: the second assembler pass has generated different code to the first.

.chtablist_LO EQUB LO(chtable1),LO(chtable2),LO(chtable3),LO(chtable4)
 ^

Call stack:
pop-beeb.asm:910

@mungre
Copy link
Collaborator Author

mungre commented Jul 3, 2021

It is broken, I'll have a look. For future reference, if you put EQUB &18,&04,&01,&20,&04,&6B,&63 in your version.txt you get an identical ssd.

I don't know anything much about github actions either. Fundamentally, it seems like it just coordinates running stuff. That could be a test, but it doesn't know anything about it.

I suppose a common choice for sane cross-platform scripting is python. It's not standard on Windows but I guess it's pretty common for even non-Python developers (like me) to have it installed. It would be easy to write a bit of Python to find and run tests, and it should be portable.

@ZornsLemma
Copy link
Collaborator

Python sounds like a pretty good plan - and as long as the standard build either doesn't run the tests or gracefully skips them if Python isn't installed, it wouldn't be the end of the world if someone didn't have it. You are probably right about how github actions work, but it does look as though Python is available (https://github.com/marketplace/actions/setup-python) so if we did have a Python test harness it could probably be integrated there fairly easily.

Thanks for taking a look at PoP, I hope it's not a tricky one...

When an undefined symbol is evaluated beebasm skips the rest of the
expression and throws an exception.  Previously, a comma was
unequivocally the end of an expression but this is not true with
multi-parameter functions.  LineParser::SkipExpression needs to keep
track of the bracket count to handle commas correctly.  This only caused
problems when an undefined symbol was surrounded with brackets (like in
a function call), and there were multiple expressions in a comma-separated
list, like in an EQUB command.
@mungre
Copy link
Collaborator Author

mungre commented Jul 3, 2021

Thanks for finding that. The hairiest change was the one to support multiple parameters in functions so that's exactly where it was. I should have looked there first.

It was failing to complete an EQUB list (so everything moved in memory) when undefined symbols were used in a function call. The commit has more details.

@ZornsLemma
Copy link
Collaborator

Nice one, with the latest code all of my projects build fine! Thanks very much.

Are you going to post on stardot about this - I'd expect a pretty enthusiastic response :-) - and ask for more testers or have you had enough of fixing bugs for one weekend? :-) (If you do post, it might be worth mentioning your branch also includes all the existing proposed-updates changes.)

I haven't looked over the code yet, I will see if I can do so over the next few days just on the off-chance there's anything dodgy that might catch my eye.

@ZornsLemma
Copy link
Collaborator

PS Don't forget to credit yourself for this work in README.md! :-)

@mungre
Copy link
Collaborator Author

mungre commented Jul 3, 2021

Thank you. Your suggestions and testing have been really helpful. It's much better than it was two days ago.

There's no rush to have a look at the code.

I will post on stardot. It seems pretty solid now...

I've committed an exe so people can try it easily. I don't like cluttering up the repo with multiple exes so I've put it in mungre/strings-exe, which can be deleted later.

@mungre
Copy link
Collaborator Author

mungre commented Jul 6, 2021

I wrote a simple test runner in python. It works on Windows and Linux. The tests are only the existing examples. There is a README explaining what it does.

@ZornsLemma
Copy link
Collaborator

Nice one! I've had a quick look at this and it seems like a pretty decent framework.

Would it be worth opening a new pull request for this just so the discussion doesn't get mixed in with the string value support? It probably doesn't matter if it's just the two of us talking about it. :-)

A few miscellaneous thoughts; I appreciate you've probably thought of some or all of these yourself and just haven't wanted to deal with them :-) :

  • It's a good idea to run the tests as part of 'make all', but it would probably be better if the output wasn't so verbose in that case - just a "tests passed" output would be nice.
  • Will "make all" fail cleanly if you don't have python3 installed? I suppose it probably doesn't matter, because you can always just do "make code", and if you asked for "all" you should probably get "all".
  • For the "failing" tests at least, it would be nice if there was an ability to give a reference file showing the expected output from beebasm. For example, the errorlinenumber* tests were (IIRC) added because beebasm was correctly failing to assemble some things, but the line number in the error message was wrong - without a reference file, merely noting that the test fails to assemble isn't really "enough". (I appreciate this isn't particularly documented anywhere, but I'm using it as an example to motivate this change more than anything else.)

@ZornsLemma
Copy link
Collaborator

PS FWIW the test harness worked fine for me on Ubuntu 20.04.2.

@ZornsLemma
Copy link
Collaborator

PPS If it's easy, it would maybe be nice to support both Python 2 and 3. But it makes perfect sense to not worry about this yet, and maybe once the test harness is "done" and no longer being modified a lot it could be tweaked to work on both.

@mungre
Copy link
Collaborator Author

mungre commented Jul 6, 2021

Discussion of testing moved here.

@ZornsLemma
Copy link
Collaborator

I've been over the diff between proposed-updates and mungre/strings, not in great depth. I have a few thoughts, which I hope (but don't guarantee!) aren't based on misconceptions:

  • README.md uses "STRINGS$" once; I think it should be "STRING$" everywhere.
  • LineParser::EvalShift{Left,Right} will use the shift operators on negative ints, which I think is undefined behaviour. Could/should we do something to make this behaviour well-defined? (What does BASIC V do?)
  • I assume it's a deliberate decision not to implement LEFT$ and RIGHT$ rather than just an accidental omission? I guess these are not strictly necessary when you have LEN$() and MID$().

@ZornsLemma
Copy link
Collaborator

Glancing over the open issues, I think this work addresses issue #33. I just gave this a quick test and macros certainly seem to support string arguments, am I right?

@mungre
Copy link
Collaborator Author

mungre commented Jul 13, 2021

Cheers, I've fixed STRINGS$ in the README.

Yes, macros support string parameters and the ASM directive handles the comment on issue 33.

Yes, shifting of signed ints is undefined for many values. Funnily enough, I'd just noticed the same problem in the binary number parser. The BASIC V shifts are described here. It has both logical and arithmetic shifts right. This seems like a reasonable thing to copy. I'll do that.

LEFT$ and RIGHT$ were skipped as unnecessary, but I might as well finish the job and add them in.

@mungre
Copy link
Collaborator Author

mungre commented Jul 13, 2021

Further to my last commit, I'm not sure what local-forward-branch-3 is testing.

I've been adding some simple tests of value parsing and noticed these things:

I expected print to write 32-bit integers cleanly but print(&7fffffff) produces 2.14748e+009. Can you see any harm in changing this (for print and STR$)?

I was surprised that &ffffffff isn't negative, but it's always been like that so it's too late to change it. However the % parsing can produce negative numbers, and quietly truncates the value to 32 bits (undefined behaviour notwithstanding). The silent truncation can go but I'm not sure about making the negative behaviour consistent with the hex parsing. What do you think?

Edit: this comment was supposed to be on the testing pr so I've repeated it there.

@ZornsLemma
Copy link
Collaborator

ZornsLemma commented Jul 13, 2021

I can't figure out what local-forward-branch-3 is testing either. I haven't tried bisecting this fully, but it does build just fine with the commit immediately before the test case was added, and with v1.08 as well. If I had to guess I'd suspect there was a problem with beebasm using abs instead of zp addressing in macros at some point (probably as a side-effect of some other change I made regarding scoped variables), but I really can't be sure.

I think we should probably remove all the local-forward-branch-* tests from the examples directory (as I did on ZornsLemma/testing). My gut feeling would be to keep local-forward-branch-3 as a test just in case, but if you want to get rid of it I won't be too upset.

(Is the old retro software forum still around? I know it's superficially present on stardot, but I was trying to find some beebasm posts I made about structured assembler way back and I couldn't find them on there, so I'm not sure it's complete. If it is about, there might well be a post from me on there about local-forward-branch-3.)

I can't see any obvious downside to changing the way print displays large integers, but of course it's hard to rule out someone using this in a way that will get broken.

I hadn't really thought about & and % generating negative numbers until now. It doesn't sound like a bad idea to make % consistent with & but of course someone could have written some code which depends on it; my gut feeling is that isn't very likely but I really don't know.

Perhaps the best suggestion I could offer would be to make it consistent and post to stardot mentioning it in passing; probably no one will care, and if they do at least you haven't sneaked an incompatible change in without mentioning it.

Edit: OK, I'll do the same. :-)

@mungre
Copy link
Collaborator Author

mungre commented Jul 13, 2021

I fixed the undefined shift behaviour, but I realised I can't do logical shift right because value>>>shift already means value>>(>shift). The arithmetic shift right is quite weird. For example &80000000>>5=-&4000000. As values to be shifted, numbers with bit 31 set are treated as negative.

@mungre mungre merged commit 0c4ef18 into proposed-updates Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants