Add symbol output #16

Closed
milanvidakovic opened this issue Sep 23, 2019 · 10 comments

Comments

@milanvidakovic

Hi, thanks for the great product! I have been using it for a couple of years.
I think one feature is missing: a symbol list. When assembling the output file, there could be an additional option, for example: -sym symbols.txt
That option would generate a symbol list (the symbols.txt file) containing the addresses of all labels in the assembled output. Something like this:
0x00000000 start:
0x000000fa main:
etc.
That feature would help me implement a debugger that shows labels instead of addresses.
I hope I was clear enough.
Best regards,
Milan

@hlorenzi
Owner

I'm so glad you're finding it useful! As I understand it, you're using it for teaching at a university? I'd love to know more details!

A symbol file output was definitely in the plans. Do you have a particular need for that syntax? I was thinking something like this:

start = 0x0
main = 0xfa
main.sublabel = 0x2abc

...which would mirror the source code syntax (except that sublabels have to be written in full), and should still be easy for other people to write a regex to extract values.
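As a rough illustration of the regex extraction mentioned above, here is a minimal Python sketch that reads the proposed name = 0xVALUE lines into a dictionary. It assumes symbol names consist of letters, digits, underscores, and dots (for full sublabel paths) and that values are always written in hex; parse_symbols is a hypothetical helper, not part of the assembler.

```python
import re

# One "name = 0xVALUE" entry per line; dots allow full sublabel paths
# such as "main.sublabel" (assumed naming rules, hex-only values).
SYMBOL_LINE = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=\s*0x([0-9A-Fa-f]+)\s*$")


def parse_symbols(text: str) -> dict:
    """Return a {name: value} map, ignoring lines that do not match."""
    symbols = {}
    for line in text.splitlines():
        match = SYMBOL_LINE.match(line)
        if match:
            symbols[match.group(1)] = int(match.group(2), 16)
    return symbols


example = """start = 0x0
main = 0xfa
main.sublabel = 0x2abc
"""
print(parse_symbols(example))  # {'start': 0, 'main': 250, 'main.sublabel': 10940}
```

Lines that don't match, such as blank lines or comments, are simply skipped, which keeps the reader tolerant of small additions to the format.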

Also, could you please check the LogiSim output formats? I've tried reading LogiSim's documentation online, and adapting the code from your fork, but since I've never used that app, I'm not sure if the outputs are correct.

@milanvidakovic
Author

Hi, yes, I am using it in two demonstration projects for my students: 1) a computer built in LogiSim (yes, your LogiSim formats are correct) and 2) an FPGA-based computer (this one would use the symbols if/when you add them). My students were impressed to see a custom assembler for both platforms. Thanks again for that!
Regarding the symbol format, your proposal is fine. I don't need any particular syntax; yours is perfectly usable for me.

@hlorenzi
Owner

hlorenzi commented Oct 3, 2019

I've added new command line options -s and --symbol for this!
Currently it outputs address labels as well as any variables you defined (with =). Let me know if this is a problem.

@milanvidakovic
Author

Thank you! I will start on the debugger part of my emulator to introduce symbols. This is excellent software!

@milanvidakovic
Author

milanvidakovic commented Oct 5, 2019

Hi,
I have just implemented symbols in my FPGA emulator and it works (almost) perfectly! Thanks again for this great assembler!

However, I have encountered a funny problem: sometimes in my assembler code I enter a plain number instead of a label:
mov.w r0, 25
However, the same code also has a variable (VK_P, for example) whose value is 25, so my debugger translates the instruction incorrectly as:
mov.w r0, VK_P

Is it possible to extend your symbol file with a list of the addresses where each symbol is actually used?

If there were a list of addresses where the VK_P symbol was actually used, my debugger would know not to show VK_P on lines where it wasn't used.

For example:
draw_next_line = 0xb946 (0xb230, 0xb2a4)

The example above would mean that the symbol draw_next_line has the value 0xb946 and was used at addresses 0xb230 and 0xb2a4. Those would be the addresses in the assembled code where draw_next_line was actually used.
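To make the proposed usage list concrete, the following Python sketch parses lines in the suggested name = 0xVALUE (0xUSE, ...) form. This format is only the proposal from this comment, not something the assembler emits; parse_symbol_usages and the choice to treat a missing parenthesized list as "no recorded uses" are assumptions for illustration.

```python
import re

# Proposed (not implemented) format: "name = 0xVALUE (0xUSE, 0xUSE, ...)".
# The parenthesized usage list is treated as optional.
LINE = re.compile(
    r"^\s*([A-Za-z_][\w.]*)\s*=\s*(0x[0-9A-Fa-f]+)\s*(?:\(([^)]*)\))?\s*$"
)


def parse_symbol_usages(text: str) -> dict:
    """Return {name: (value, [addresses where the symbol was used])}."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        name, value, uses = match.group(1), int(match.group(2), 16), match.group(3)
        # int(..., 16) accepts the "0x" prefix and surrounding whitespace.
        addresses = [int(a, 16) for a in (uses or "").split(",") if a.strip()]
        result[name] = (value, addresses)
    return result


print(parse_symbol_usages("draw_next_line = 0xb946 (0xb230, 0xb2a4)"))
```

Keeping the value and the usage addresses together means an older reader that only wants values can ignore the second tuple element.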

I don't know if I was clear enough, so please ask me more details about this feature request.

@hlorenzi
Owner

hlorenzi commented Oct 8, 2019

Yeah, I had the feeling this would be a problem... I'm not exactly sure how to solve this right now.

My first instinct was to make the symbol output differentiate between address labels and variables, like so:

; labels
draw_next_line = 0xb946
some_other_label = 0x8080

; variables
VK_P = 0x19
some_var = 0x55

...but you'd probably want the debugger to show variable names as well as address labels, when you actually use them. So this distinction might not be very useful in your case.
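A sketch of how a debugger might read the sectioned layout above, assuming the ; labels and ; variables comment lines act as section headers. This layout was only a suggestion in this comment, and parse_sections is a hypothetical helper. Keeping the two dictionaries apart would let a debugger, for example, consult only labels when displaying jump targets.

```python
import re

ENTRY = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=\s*0x([0-9A-Fa-f]+)\s*$")


def parse_sections(text: str) -> dict:
    """Split the sketched format into {"labels": {...}, "variables": {...}}.

    Assumes "; labels" and "; variables" comment lines start each section;
    entries under any other (or no) header are ignored.
    """
    sections = {"labels": {}, "variables": {}}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(";"):
            header = stripped[1:].strip()
            current = header if header in sections else None
            continue
        match = ENTRY.match(line)
        if match and current is not None:
            sections[current][match.group(1)] = int(match.group(2), 16)
    return sections


example = """; labels
draw_next_line = 0xb946
some_other_label = 0x8080

; variables
VK_P = 0x19
some_var = 0x55
"""
parsed = parse_sections(example)
print(parsed["variables"])  # {'VK_P': 25, 'some_var': 85}
```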

About your solution involving usage listings, I think it might not completely solve the issue in the general case? Imagine I had an instruction like add r0, 25, VK_P: you wouldn't be able to tell which 25 was the one that actually came from a label. That said, this might be the best solution so far.

Another idea is to use the annotated output format, which lists addresses, bytes, and the source code excerpts that generated them. This annotated file should be easy to parse, too. What do you think?

@hlorenzi hlorenzi reopened this Oct 8, 2019
@hlorenzi hlorenzi changed the title from "Feature request" to "Add symbol output" Oct 8, 2019
@milanvidakovic
Author

milanvidakovic commented Oct 8, 2019

Hi,
here are my thoughts:
Some variables and labels can occur several times in the executable. For instance:

VIDEO_A = VIDEO + 15*160
VIDEO_C = VIDEO_A + 1
mov.w r1, hello  		; r1 holds the address of the "HELLO WORLD" string
mov.w r2, VIDEO_C		; r2 points to the character part of the video memory
mov.w r4, VIDEO_A		; r4 points to the attribute part of the video memory
mov.w r5, VIDEO_A		; r5 points to the attribute part of the video memory
...
hello: 
#str "Hello\0"

The resulting executable would look like this:
location: content

0xB014: 01 C0 00 00 B0 7C	; mov.w r1, hello (0x0000B07C)
0xB01A: 02 C0 00 00 0D 61	; mov.w r2, VIDEO_C  (0x00000D61)
0xB020: 04 C0 00 00 0D 60	; mov.w r4, VIDEO_A  (0x00000D60)
0xB026: 05 C0 00 00 0D 60	; mov.w r5, VIDEO_A  (0x00000D60)
...
0xB07C: 48 65 6C 6C 6F 00	; label hello: "Hello\0"

01 C0, 02 C0, 04 C0, and 05 C0 are the opcodes for the mov.w rX, number instructions.
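Assuming the mov.w layout shown in this listing (a 2-byte opcode followed by a 32-bit big-endian immediate), the operand of each instruction starts two bytes after the instruction address. This Python sketch, using a hypothetical decode_mov_w helper, decodes one listing line and returns the opcode, the operand address, and the operand value.

```python
def decode_mov_w(line: str):
    """Decode one listing line such as '0xB020: 04 C0 00 00 0D 60 ; comment'.

    Assumes the mov.w layout from the listing: a 2-byte opcode followed by
    a 32-bit big-endian immediate operand.
    """
    addr_text, rest = line.split(":", 1)
    address = int(addr_text, 16)
    # Drop any trailing "; comment" before reading the hex bytes.
    data = bytes(int(b, 16) for b in rest.split(";")[0].split())
    opcode = data[:2].hex(" ").upper()
    operand_address = address + 2  # immediate starts right after the opcode
    operand = int.from_bytes(data[2:6], "big")
    return opcode, operand_address, operand


print(decode_mov_w("0xB020: 04 C0 00 00 0D 60\t; mov.w r4, VIDEO_A"))
```

For the 0xB020 line this yields opcode "04 C0", operand address 0xB022, and value 0xD60, matching the VIDEO_A occurrence listed in the thread.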

Now, the symbols are:
variables:

VIDEO = 0x400
VIDEO_A = 0xd60
VIDEO_C = 0xd61

labels:
hello = 0xb07c

The occurrences of symbols in the executable are at the following addresses:

hello = 0xb016				; it goes from 0xb016 to 0xb019 (four bytes, 32 bits)
VIDEO_A = 0xb022, 0xb028		; VIDEO_A is used in two mov.w instructions
VIDEO_C = 0xb01c			; VIDEO_C is used in one mov.w instruction

So, whenever your assembler replaces a label or variable with its actual value, it would be nice if it also recorded the exact address where that symbol was used, in a format similar or identical to the one above.

Is it possible for your code to record the exact address of each occurrence of a symbol? For example, in the code above, the mov.w r4, VIDEO_A instruction is written at address 0xB020, with the following bytes starting at that address: 04 C0 00 00 0D 60.
So the actual number, 00 00 0D 60, starts at 0xB022. You can see that in my proposal:
VIDEO_A = 0xb022, 0xb028 ; VIDEO_A is used in two mov.w instructions, at two memory locations.
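The debugger side of this proposal can be sketched in a few lines of Python: invert the usage map into operand address to symbol name, and substitute a name only when the assembler recorded a use at that exact address. The map below copies the occurrences listed above, with hello at 0xB016 (0xB014 plus the 2-byte opcode); build_operand_names and render_operand are hypothetical helpers.

```python
# Occurrences from the list above: symbol -> addresses where its value was emitted.
USAGES = {
    "hello": [0xB016],  # 0xB014 + 2 opcode bytes
    "VIDEO_A": [0xB022, 0xB028],
    "VIDEO_C": [0xB01C],
}


def build_operand_names(usages: dict) -> dict:
    """Invert the usage map into {operand address: symbol name}."""
    return {addr: name for name, addrs in usages.items() for addr in addrs}


def render_operand(operand_address: int, value: int, names: dict) -> str:
    """Show a symbol name only if the assembler recorded a use at this address."""
    return names.get(operand_address, f"0x{value:X}")


names = build_operand_names(USAGES)
print(render_operand(0xB022, 0xD60, names))  # VIDEO_A
print(render_operand(0xB040, 0xD60, names))  # 0xD60 (no recorded use here)
```

A mov.w whose operand happens to equal 0xD60 but sits at an address with no recorded use is still shown as a plain number, which is exactly the VK_P case from earlier in the thread.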
Regarding your example:
add r0, 25, VK_P
that should not be a problem either, since VK_P would be stored in memory at a certain address, just like in my examples above. I don't have that particular instruction, but let's suppose I did:

VK_P = 0x51
add r0, 25, VK_P

The executable would be like this:
0xB000: 00 50 00 00 00 19 00 00 00 51
00 50 would be the add opcode, 00 00 00 19 would be the number 25, and 00 00 00 51 would be VK_P.
The VK_P symbol would appear at address 0xB006.
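Continuing the example, this sketch shows why per-address usages also handle the add r0, 25, VK_P case, even in the worst case where VK_P is 25 and both immediates contain identical bytes. It assumes the hypothetical layout described here (a 2-byte add opcode followed by two 32-bit big-endian immediates); decode_add is an illustrative helper.

```python
def decode_add(address: int, data: bytes, names: dict) -> str:
    """Render the hypothetical 'add r0, imm, imm' layout sketched above:
    a 2-byte opcode followed by two 32-bit big-endian immediates."""
    operands = []
    for offset in (2, 6):  # each immediate's offset within the instruction
        value = int.from_bytes(data[offset:offset + 4], "big")
        operands.append(names.get(address + offset, str(value)))
    return "add r0, " + ", ".join(operands)


# Worst case: VK_P = 25, so both immediates contain identical bytes.
code = bytes.fromhex("0050 00000019 00000019")
names = {0xB006: "VK_P"}  # usage map says VK_P was emitted at 0xB006
print(decode_add(0xB000, code, names))  # add r0, 25, VK_P
print(decode_add(0xB000, code, {}))     # add r0, 25, 25
```

Because the lookup key is the operand's address rather than its value, two equal immediates in the same instruction stay distinguishable.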

Huh, so many words. I hope I was at least somewhat clear about this topic.

@hlorenzi
Owner

hlorenzi commented Jan 9, 2020

I've been thinking a lot about this, but I still haven't got the time to work on a solution. The crux of the problem seems to be keeping track of named variables as they go through arbitrarily-complex expressions in the body of an instruction definition (since those can even be blocks of code with multiple expressions in sequence). Perhaps something can be done for the simpler cases of a single expression with clean variable usage.
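One possible reading of the "simpler cases" idea, sketched in Python under stated assumptions: this is not how the assembler's internals work, and Sym, Num, Add, and emit_operand are hypothetical. While evaluating an operand expression, a usage is recorded only when the whole expression is a single bare symbol; compound expressions such as VIDEO_A + 1 record nothing.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Sym:  # reference to a named symbol
    name: str


@dataclass
class Num:  # literal number
    value: int


@dataclass
class Add:  # left + right
    left: "Expr"
    right: "Expr"


Expr = Union[Sym, Num, Add]


def evaluate(expr: Expr, symbols: dict) -> int:
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Sym):
        return symbols[expr.name]
    return evaluate(expr.left, symbols) + evaluate(expr.right, symbols)


def emit_operand(expr: Expr, operand_address: int, symbols: dict, usages: dict) -> int:
    """Evaluate an operand, recording a usage only for a bare symbol."""
    if isinstance(expr, Sym):
        usages.setdefault(expr.name, []).append(operand_address)
    return evaluate(expr, symbols)


symbols = {"VIDEO_A": 0xD60}
usages = {}
emit_operand(Sym("VIDEO_A"), 0xB022, symbols, usages)               # recorded
emit_operand(Add(Sym("VIDEO_A"), Num(1)), 0xB01C, symbols, usages)  # not recorded
print({k: [hex(a) for a in v] for k, v in usages.items()})  # {'VIDEO_A': ['0xb022']}
```

Skipping compound expressions errs on the side of showing plain numbers, so the debugger never names an operand the assembler can't vouch for.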

@milanvidakovic
Author

Thanks for still thinking about this. I agree that the simplest cases should be done. That certainly works for me.

@hlorenzi
Owner

hlorenzi commented May 3, 2023

I'll close this since there's now an annotated output format which should do more-or-less what you're describing! Feel free to open this issue again if you still need help!

@hlorenzi hlorenzi closed this as completed May 3, 2023

2 participants