Add BASIC Thomson tokenizer #5479

Merged on Aug 13, 2019 (3 commits)

@hadess hadess commented Aug 13, 2019

Based on the code in filtbas.cpp and @pulkomandy's bastok.cpp for some info about the format.

Note that while I can write and read a BASIC file, I've not fully checked whether each token is correctly selected (I think there's a problem with GO vs. GOSUB).

This command should work:
imgtool get thom_fd inondation-d-additions.fd INONDATI.BAS TEST.BAS --filter=thombas7
as it matches the expected syntax:
Usage: imgtool get <format> <imagename> <filename> [newname] [--filter=filter] [--fork=fork]
but it does not, because imgtool allows fewer arguments ("maxargs") than that.

Increase the maximum number of arguments by 2 to cater for --filter and
--fork being passed.
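
As a rough illustration of why the bound matters, here is a minimal sketch of an argument-count check. `CommandSpec`, `arg_count_ok`, and the field names are hypothetical and not taken from the imgtool sources. The old upper bound of 4 is an assumption inferred from the four positional arguments in the usage line.

```cpp
#include <string>
#include <vector>

// Hypothetical command descriptor; names are illustrative, not imgtool's own.
struct CommandSpec {
    std::string name;
    int min_args;  // required arguments after the command name
    int max_args;  // upper bound, counting optional --name=value switches too
};

// True when the number of supplied arguments is within the command's bounds.
bool arg_count_ok(const CommandSpec &cmd, const std::vector<std::string> &args) {
    const int n = static_cast<int>(args.size());
    return n >= cmd.min_args && n <= cmd.max_args;
}

// "get" takes <format> <imagename> <filename> [newname]: 3 required, 4 positional.
// Assumption: the old bound covered only those four positional arguments.
const CommandSpec get_before{"get", 3, 4};
// Raised by 2 so that --filter= and --fork= can both be passed.
const CommandSpec get_after{"get", 3, 6};

// The command line from the description, minus "imgtool get".
const std::vector<std::string> example_args{
    "thom_fd", "inondation-d-additions.fd", "INONDATI.BAS", "TEST.BAS",
    "--filter=thombas7"};
```

With these values, `arg_count_ok(get_before, example_args)` rejects the five-argument command line, while `arg_count_ok(get_after, example_args)` accepts it.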
The line:
10 LIMIT$=STR$(LIMIT(N))
was not getting tokenised properly because the loop looking for tokens
wasn't exited and consumed consecutive tokens.

So $ was getting detected, token_shift and token_value were set, the
cursor position got incremented, then = got detected on the next
iteration of the loop.

We should instead exit the loop, and write what we already have.

Closes: mamedev#5478
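
The loop fix described above can be sketched roughly as follows. This is a toy model, not the code in filtbas.cpp: the token table, the byte values 0x90 and 0x91, and the `stop_at_first_match` switch are all made up for illustration, and the `token_shift` handling is left out. Characters that match no token are assumed to be copied through unchanged.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Token {
    const char *text;
    std::uint8_t value;
};

// Toy table; the byte values are placeholders, not Thomson BASIC's.
static const Token kTokens[] = {{"$", 0x90}, {"=", 0x91}};

// Tokenise one line. With stop_at_first_match == false the table scan keeps
// going after a hit, which reproduces the bug: the cursor has already moved,
// so a later table entry can match the *next* characters and overwrite
// token_value before anything is written.
std::vector<std::uint8_t> tokenise(const std::string &line, bool stop_at_first_match) {
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    while (pos < line.size()) {
        int token_value = -1;
        for (const Token &t : kTokens) {
            const std::string text = t.text;
            if (line.compare(pos, text.size(), text) == 0) {
                token_value = t.value;
                pos += text.size();
                if (stop_at_first_match)
                    break;  // the fix: leave the scan and write what we have
            }
        }
        if (token_value >= 0)
            out.push_back(static_cast<std::uint8_t>(token_value));
        else
            out.push_back(static_cast<std::uint8_t>(line[pos++]));  // plain character
    }
    return out;
}
```

For the input `A$=B`, the fixed variant emits one byte for `$` and one for `=`, whereas the buggy variant emits only the `=` token because the `$` value is overwritten before it is written out.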
@pulkomandy

My reverse engineering efforts are based on some tool for CoCo BASIC and tested only on the one or two BASIC files I needed to view, so use with caution. Also there are several BASIC versions for Thomson machines and they may use different tokens.

IIRC, GOSUB is encoded as GO and SUB, and GOTO is encoded as GO and TO (the same TO is used in FOR loops as well). In other words, there isn't a 1:1 match between tokens and keywords; rather, each token is substituted with a string, and the substituted strings together form the whole keyword (but I don't know how this is actually implemented by the ROM).
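
A minimal sketch of that per-token substitution, under loud assumptions: the byte values below are placeholders rather than the real Thomson token numbers, `detokenise` is a hypothetical helper, and any byte not in the table is assumed to be a literal character.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Placeholder token values, chosen for illustration only.
static const std::map<std::uint8_t, std::string> kTokenText = {
    {0x80, "GO"}, {0x81, "TO"}, {0x82, "SUB"}, {0x83, "FOR"},
};

// Expand each token byte to its string; copy every other byte as a character.
// Keywords such as GOSUB are never looked up as a whole: they simply fall out
// of two adjacent substitutions.
std::string detokenise(const std::vector<std::uint8_t> &bytes) {
    std::string out;
    for (std::uint8_t b : bytes) {
        const auto it = kTokenText.find(b);
        if (it != kTokenText.end())
            out += it->second;
        else
            out += static_cast<char>(b);
    }
    return out;
}
```

Under these placeholder values, the bytes `0x80 0x82` followed by ` 100` expand to `GOSUB 100`, and the same `0x81` byte serves both `GOTO` and the `TO` of a `FOR` loop.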

@hadess

hadess commented Aug 13, 2019

> My reverse engineering efforts are based on some tool for CoCo BASIC and tested only on the one or two BASIC files I needed to view, so use with caution. Also there are several BASIC versions for Thomson machines and they may use different tokens.

I understand, and I only used it as another source for understanding the file format. I was also thrown by one of the comments that said the 6809 was little endian ;)

> IIRC, GOSUB is encoded as GO and SUB, and GOTO is encoded as GO and TO (the same TO is used in FOR loops as well). In other words, there isn't a 1:1 match between tokens and keywords; rather, each token is substituted with a string, and the substituted strings together form the whole keyword (but I don't know how this is actually implemented by the ROM).

Yes! I realised that afterwards :)

My test was to write a rather large, non-trivial BASIC file to a floppy image, verify that it could be read in an emulator, then read it back and compare the original file with the tokenised then de-tokenised source. It seems to work, and I'd be happy to do some bug fixing in that area.

@rb6502 rb6502 merged commit 82644e5 into mamedev:master Aug 13, 2019