Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Motorola 68k assembly lexer (Devpac syntax) #909

Merged
merged 4 commits into from
Aug 10, 2018

Conversation

nguillaumin
Copy link
Contributor

Added new lexer for Motorola 68000 assembly, for the Devpac syntax.
There are numerous assembler for the 68000 with slightly different
syntax. Devpac is the only one I know, but it could be used as a starting
point to implement the other variants.

It doesn't cover the "older" Devpac syntax where comments can be put
after the last operand, at the end of the line, without a separator (;
or *). This would require the lexer to know the semantics of the
language, and that was a bit too much for my first lexer 😉

I started it from the NASM lexer, but it's a bit simpler.

I initially associated it with *.asm files but that conflicts with
NASM. I also associated it with the text/x-asm mime type, but I
removed it as I'm not sure if that makes sense (why this one, and not
NASM?)

As a result there are no tests, since there's nothing to test in terms
of file/mime association.

Added new lexer for Motorola 68000 assembly, for the Devpac syntax.
There are numerous assembler for the 68000 with slightly different
syntax. Devpac is the only one I know, but it could be used as a starting
point to implement the other variants.

It doesn't cover the "older" Devpac syntax where comments can be put
after the last operand, at the end of the line, without a separator (`;`
or `*`). This would require the lexer to know the semantics of the
language, and that was a bit too much for my first lexer 😉

I started it from the NASM lexer, but it's a bit simpler.

I initially associated it with `*.asm` files but that conflicts with
NASM. I also associated it with the `text/x-asm` mime type, but I
removed it as I'm not sure if that makes sense (why this one, and not
NASM?)

As a result there are no tests, since there's nothing to test in terms
of file/mime association.
@nguillaumin
Copy link
Contributor Author

This is my first attempt and I'm not especially familiar with Ruby, so let me know if there's anything I can improve, thanks!

@dblessing
Copy link
Collaborator

Thanks for the PR @nguillaumin.

How is/isn't the Motorola 68000 assembler related to NASM? As this is currently largely a copy/paste of NASM I think we should consider how they're related and see if subclasses/delegating would be appropriate here.

Note from the README:

Please don't submit lexers that are largely copy-pasted from other files.

Happy to help you through this if you can help me understand the relationship/differences between the two.

@dblessing dblessing added the author-action The PR has been reviewed but action by the author is needed label Jun 13, 2018
@nguillaumin
Copy link
Contributor Author

nguillaumin commented Jun 14, 2018

Hi,

NASM is an assembler for the x86 platform (i.e. PCs). This syntax is for assembly of Motorola 68000 processors which is a completely different architecture. It was for example used on older computers like Atari ST and Amiga.

As such, there's not much that can be shared between the two, as they have completely different instructions and syntax.

The only thing that's common is that it's assembler so it's mostly single-line instruction oriented and lacks control structures or higher level constructs, but I don't think it would make sense to try to refactor that. It would be similar to trying to use a common base for C and Java just because they both use braces and have if and for statements, for example.

I don't believe it's largely a copy/paste of NASM, if you diff the two you'll see they differ quite significantly. Apologies if my comment was misleading, I just wanted to mention I used the NASM one as a starting point. I followed the guide which suggested starting from an existing one (which is a good advice IMHO).

I hope this helps?

Copy link
Collaborator

@dblessing dblessing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments. Thanks for your contribution @nguillaumin and for explaining. I commented on why I thought it a copy/paste.

def self.keywords_type
@keywords_type ||= Set.new %w(
dc ds dcb
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent the keywords two spaces.

@keywords_type ||= Set.new %w(
  dc ds dcb
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this throughout, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

end

state :inline_whitespace do
rule /[ \t\r]+/, Text
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the space before the \t to match on a space character? If so, use \s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the semantics differ because \s will match more than just a space (It will match [ \t\r\n\f]) but to be honest I don't really understand this part and using \s works, so fine by me.

state :expr_whitespace do
rule /\n+/m, Text, :expr_bol
mixin :whitespace
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What added value does expr_whitespace have? The first rule is identical to whitespace and then it mixes in whitespace.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the rule that made me think it was a direct copy/paste of nasm. It has this state and I'm not sure why it's needed. It seems to be a paradigm that has traveled through about 4 different lexers. Unless we know why it's needed, let's leave it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, this is indeed coming from NASM and I can't say I understand it fully. I removed it.

@dblessing
Copy link
Collaborator

dblessing commented Jun 20, 2018 via email

@hanklank
Copy link

hanklank commented Jul 3, 2018

First, thanks for the job of implementing m68k-syntax, I was just about to read up on how to do it and was glad to see this. However, I think the should be called m68k just, even if it is initially only tested with one 68k-compiler (devpac). Why? Because m68k for the other compilers have almost the very same syntax in most regards, and It wouldn't make sense, that for example, I next would add an m68k-asmone, m68k-vasm. Let's try to have all the m68k-support in the same lexer. So I suggest to drop the devpac suffix. If I get the time, I'll look into adding full support for code written in some of the others.

@nguillaumin
Copy link
Contributor Author

I'm only familiar with the Devpac one and I assumed other syntaxes would be different enough to warrant a separate lexer. If we can support all syntaxes in a single lexer though, that's great! 👍

@hanklank
Copy link

hanklank commented Jul 5, 2018

I will test your highlighting on a few codebases I have in other 68k assemblers, and see what I can find , and try to suggest additions or minor changes. I noticed Vim and Atom has m68k-highlighting so one could also peek there to get some inspiration. I have never done Ruby before, but I think I understand enough of the lexers now to be able to contribute. Also, we should think about how to identify a file, maybe we could make the guessing better first give it the common extensions (s,i, asm, 68k, more?), and then guess by instructions in the code like move (nasm doesn't have this I think) etc.

Rouge only supports utf-8 though as most codebases in m68k are not (unless they are converted on github/lab etc), but I guess utf-8 only is a design decision and will never change? - Asking @jneen

@nguillaumin
Copy link
Contributor Author

I'm keen to get something merge soon-ish as I need the m68k syntax for another project. So even if we intend to make improvements later, I'll probably just remove the devpac suffix on this branch so that it can be merged as-is, as an initial version, if everyone is ok with that?

@nguillaumin
Copy link
Contributor Author

So I've pushed a commit that does that, removing the Devpac suffix. I think it's good to go (Subsequent improvements can be made on a separate PR). The PR still says "Changes requested" but I think I have addressed these? Let me know if that weren't the case, thanks!

@hanklank
Copy link

Sound good, I don't know if/when I get the time to contribute more to it, so just merge, I'm happy with the devpac suffix removed.

@hanklank
Copy link

@nguillaumin FYI, I've started working on improvements, almost done, after testing on various source codes and looking at the 68k spec. It is covering more cases. The cases with with free style comments possible to follow any opcode or operand made me have to refactor a your contribution a bit and make it more of a flow with a few states. Also, I added a test with s,i, m68k and also looking for move.(bsw) and register d0 in the source which is very characteristic of 68k, and should be good enough to identify a 68k file. I still suggest however this PR is merged first and then I'll add my contribution in a separate PR later, as not to confuse, or delay this PR even longer. @dblessing

@nguillaumin
Copy link
Contributor Author

The build is failing, but I don't think that's related to my changes from what I can tell...

@nguillaumin
Copy link
Contributor Author

Is there anything I can do to help progress this? Thanks!

@dblessing dblessing merged commit 3d2dc84 into rouge-ruby:master Aug 10, 2018
@nguillaumin
Copy link
Contributor Author

Excellent, thanks!

@pyrmont pyrmont removed the author-action The PR has been reviewed but action by the author is needed label Jun 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants