Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format Address tables outside the binary #130

Closed
BacchusFLT opened this issue May 7, 2022 · 13 comments
Closed

Format Address tables outside the binary #130

BacchusFLT opened this issue May 7, 2022 · 13 comments
Labels
enhancement New feature or request

Comments

@BacchusFLT
Copy link

The current project I work with is a c64 paint program.
The binary is $0800 to $bfff (thereabout), It uses $2000 for the bitmap (inside the scope of the binary), $0400 is the screen memory (which is one of the colour matrixes in c64 multicolour bitmap mode) and the colour memory is at the hardcoded address of $d800. The program I am disassembling has table to point to the first address of every row of the bitmap, one table for the screen and one for the colour memory.

Making an Address table of this is perfectly fine as it is pointing to a segment of the actual binary:

Bitmap_Lo $00,$40,$80 ...
Bitmap_Hi $20,$21,$22 ...

However the two other ones will not accept this:

Screen_Lo $00,$28,$50 ...
Screen_Hi $04,$04,$04

Colour_Lo $00,$28,$50 ...
Colour_Hi $d8,$d8,$d8

I take it as they are outside the scope of the binary. The references inside the scope of the binary turn to Labels, but if outside, isn't it "just" to make them Project symbols? (I know the use of "just" is totally unfair as suggesting is so easy and implementing could be a nuance, so pardon the use of the word).

@fadden fadden added the enhancement New feature or request label May 7, 2022
@fadden
Copy link
Owner

fadden commented May 7, 2022

I'm going to address this in two parts: how external addresses could be handled by the address table formatter, and how I think you should handle this specific situation with the current implementation.


The Format Address Table feature only works with "internal" addresses. Part of the reason for this is that, if you've set it up wrong (e.g. the low/high bytes are reversed), you'll get a bunch of incorrect addresses, and clicking "OK" will create tons of junk in your project symbol table. The other part is that sometimes tables of addresses have "special" entries in them (like $0000 or $ff00) that tell the code to do something special. Creating labels for those would be wrong, and working around the "junk" entries would be annoying.

One possibility would be to have the formatter look for existing labels to match to. External locations are generally either known addresses (like ROM entry points) or memory regions with an explicit size, so having those declared in the project symbol table or in a .SYM65 file would make sense. This would allow the formatter to apply external labels to table entries while avoiding the problems associated with auto-generating external symbols.


Let's consider a chunk of memory that looks like this:

4af8: 00 28 50 78+ ColMemLo        .bulk $00,$28,$50,$78,$a0,$c8,$f0,$18,$40,$68,$90,$b8,$e0,$08,$30,$58
                                    +    $80,$a8,$d0,$f8,$20,$48,$70,$98,$c0
4b11: f8 20 48                     .bulk $f8,$20,$48
4b14: d8 d8 d8 d8+ ColMemHi        .bulk $d8,$d8,$d8,$d8,$d8,$d8,$d8,$d9,$d9,$d9,$d9,$d9,$d9,$da,$da,$da
                                    +    $da,$da,$da,$da,$db,$db,$db,$db,$db
4b2d: 0d 0e 0e                     .bulk $0d,$0e,$0e

25 sets of pointers to a contiguous range of 40-byte entries, spanning $d800-dbe7. (Followed by another 3 bytes of pointers to $0df8/0e20/$0e48, which I'm ignoring.) Start by creating a project symbol "ColorMemArea" at $d800 with a width of 25*40=1000 (hit F6, click New Symbol):

bug130-1

Save that and go back to the code. Select the ColMemLo line, then Edit Operand. Set the format to Single Bytes, and in the Symbolic Reference field type "ColorMemArea". Make sure the "Low" radio button is selected.

bug130-2

Do the same on the ColMemHi line, but this time select the "High" radio button.

You should end up with:

4af8: 00           ColMemLo        .dd1  <ColorMemArea
4af9: 28                           .dd1  <ColorMemArea+40
4afa: 50                           .dd1  <ColorMemArea+80
4afb: 78                           .dd1  <ColorMemArea+120
4afc: a0                           .dd1  <ColorMemArea+160
[...]
4b10: c0                           .dd1  <ColorMemArea+192
4b11: f8 20 48                     .bulk $f8,$20,$48
4b14: d8           ColMemHi        .dd1  >ColorMemArea
4b15: d8                           .dd1  >ColorMemArea
4b16: d8                           .dd1  >ColorMemArea
4b17: d8                           .dd1  >ColorMemArea
4b18: d8                           .dd1  >ColorMemArea
4b19: d8                           .dd1  >ColorMemArea
4b1a: d8                           .dd1  >ColorMemArea
4b1b: d9                           .dd1  >ColorMemArea+1
[...]

(The default display settings will show that last line as (>ColorMemArea)+1. If you prefer the cc65 syntax shown above, Edit > Settings > Display Format, change Expression Style to cc65, or just use the Quick Set at the bottom to configure the values in that tab for cc65 format. The difference is because the operator precedence rules in cc65 allow the expression to be written without parenthesis, and SourceGen tries to output the simplest form possible.)

FWIW, you don't actually need to specify a width on the ColorMemArea to make this work, because you're setting the symbol explicitly, but if you format a pointer somewhere else as an address then having the range would allow it to be resolved automatically. If Format Address Table were changed to look for external addresses, it would also use the range to know that it should use that symbol for all entries.

Use the same approach for the screen pointers... define the region (addr=$0400, len=1000), then format the pointer table bytes as low/high parts of a pointer to a single symbol.

If you actually want separate symbols for each address, you will have to do a lot of typing. You can create the symbols pretty quickly in a .SYM65 file with a text editor, but you'd have to reference them manually on every byte with the data operand editor. If Format Address Table matched against existing external address symbols, you'd still have to do the first (easy) part, but the second (painful) part would be automated.

@BacchusFLT
Copy link
Author

BacchusFLT commented May 8, 2022 via email

@fadden
Copy link
Owner

fadden commented May 8, 2022

Whereas the high byte
is properly incremented in the HighByte table, the lowbytes are always
referencing the first page. Item 8 says ScreenMem+24, but the correct
reference is ScreenMem+280.

If you try to write ".byte <label+280", cc65 will report Error: Range error because it's not an 8-bit value. You have to write ".byte <(label+280)", which seems kind of ugly, though this varies by assembler. In any event, the disassembler doesn't remember that it's part of a 16-bit value, so it doesn't have the option of doing it the other way.

Auto-generated project symbols could be a thing. You'd still edit them as project symbols, but doing so could "promote" them to regular project symbols, the same way it works when you edit an auto-generated label. Some amount of un-formatting might be needed to clean up situations where a project symbol wasn't actually wanted.

I don't think they'd help here though. You'll notice that the labels generated by the table formatter are "Txxx" rather than "Lxxx"; that's because they're permanent "user" labels rather than auto-labels. This is necessary because the disassembler doesn't remember that two bytes in two different places are meant to be combined to form a 16-bit address... the structure of the table isn't stored. (This is why you can't have FOO+$118 instead of FOO+$18... you need to know which high byte the $18 is supposed to be paired with, and the table formatter doesn't record this relationship.) Without that knowledge, we aren't able to auto-generate symbols, and so would need to generate permanent project symbols for each table entry. The situation could be different if the table were formed from a series of 16-bit values, since we could just mark those as address references, but I don't want to have different behavior in different situations.

So, to really be able to use auto-generated project symbols, we'd need to record the address table's structure in the disassembly. This can get messy if the user wants to edit parts of the table to be different, e.g. some entries have special values rather than address parts. Things are significantly simpler if table formatting is a one-time operation rather than one step in layers of changes that must be applied every time (like "flattening" changes in a photo editor vs. keeping multiple layers).

I think, to get the kind of symbols you want, we'd need to add a "generate external address labels" checkbox that would populate the project symbol table with symbols (maybe "ETxxx").


the OK button is there but pressing OK nothing happens

It looks like you sent an image but github ate it... I think you need to drag & drop images into the issue reporting mechanism to include them, e-mailing them doesn't work. However, I think I see the problem.

There's a button in the bottom-right corner of the SourceGen main window that says something like "40 messages (40 warning/error)". Click that if you're not already seeing the message window. In my copy of your project, the first few messages tell you that the labels you've generated are hidden and being ignored.

bug130-3

If you double-click the first message, it'll take you to this line:

0a00: 00 00 00 00+ L0A00           .ds   1728

The labels don't appear because there's a large explicitly-defined region that spans them. If you create labels and then define a region, the Data Operand Editor will carefully break the region into smaller pieces so the labels don't get swallowed. The Address Table Formatter works the other way around, though, so it's possible to create labels inside formatted regions. (It really ought to give you a less subtle warning when this happens.)

Select the line at $0a00 and use the operand editor to change the format back to Default. Your labels will appear and some of the warnings will vanish.

The other warnings are mostly about formatting that appears in the middle of instructions. It looks like some things that used to be formatted as single-byte data items turned into code. You can clean these up by double-clicking the warning and then using Actions > Remove Formatting on the line in question. (It also removes hidden labels, so don't use it on $0a00 or you'll clear the labels embedded in the .ds ... https://6502bench.com/sgmanual/mainwin.html#remove-formatting .)

@BacchusFLT
Copy link
Author

BacchusFLT commented May 9, 2022 via email

@fadden
Copy link
Owner

fadden commented May 10, 2022

Summing up the proposed changes to the Format Address Table feature:

  • Look for matches with platform/project symbols for external addresses. Show these in the preview, and generate symbolic references to them when the table is formatted.
  • Add another checkbox (yikes!) that causes new project symbols ("ETxxxx") to be generated for external addresses that don't have an associated symbol.
  • Show a warning (probably in the label preview window) when the labels we're about to generate would end up hidden inside an existing multi-byte data area. Maybe don't generate those labels?
    • Need to explore how this interacts with the "tag as code start point" feature.

Edit: for external symbol references to work without being annoying we'd also want project symbol renames to refactor all existing references. (I think this is already on the to-do list.) Remember to check for overlapping user labels.

@BacchusFLT
Copy link
Author

BacchusFLT commented May 10, 2022 via email

@BacchusFLT
Copy link
Author

BacchusFLT commented May 10, 2022 via email

@fadden
Copy link
Owner

fadden commented May 10, 2022

The last one would mean that if I have defined "ColourMemory" at $d800, and
being 1000 bytes big, you would warn if I wanted to have a reference to
that 1000 byte space?

If you defined a project/platform symbol for an external address at $d800 with width 1000, the formatter would generate an appropriate symbolic reference (e.g. "ColourMemory+40"). If the program itself spanned $d800 and you created a .bulk that covered 1000 bytes, you'd get a warning, because you can't tell the assembler that you have a single 1000-byte item and also define a label in the middle of it. (The alternative would be to have the address formatter try to split up the .bulk declaration, but that gets messy in a hurry, e.g. what if the target is the middle of a 3-byte instruction.) If the area is covered by an auto-generated .string or .fill, rather than an explicit value, the byte scanner will automatically split the region around the label, like it does when you change the area at $0a00 back to Default format.

SourceGen tries really hard to prevent you from doing confusing or conflicting things, like formatting only the middle byte of an instruction or putting a label on an address in the middle of a string, but there are ways to do it. (Even if the UI were bulletproof, the project files are JSON and can be edited by hand.) The philosophy for dealing with these situations is to leave them in place but warn you that something you told SourceGen to do is being modified or ignored.

Suggestion: The size of the error segment seems fixed. It would be
desirable to be able to select size.

I don't understand what you mean. Are you referring to the window size?

@BacchusFLT
Copy link
Author

BacchusFLT commented May 10, 2022 via email

@fadden
Copy link
Owner

fadden commented May 10, 2022

I get a sense of the complexity, but I don't really understand that
splitting a string would be any different than splitting a bulk segment.

Suppose SourceGen loads a file that says there is a formatted string at $1000 that is 50 bytes long, and also says that there's a label at $1010. SourceGen can:

  1. Ignore the label, and format the string like it was told to do.
  2. Ignore the string directive, and output the label like it was told to do.
  3. Do fancy things to resolve it into two separate strings. It was not told to do this.

My approach has been to do 1 or 2, and let you resolve the conflict, rather than try to resolve it in a way that you might not expect. Conflicts can become arbitrarily complex -- imagine there are 3 or 4 overlapping directives -- and the odds of SourceGen doing something undesirable are fairly high for all but the simplest cases.

The "graceful splitting" mechanism is performed by the editor at the time the edit is made. If you highlight 50 bytes with a label in the middle and open the data operand editor, at the top it will say "50 bytes in two regions". When you format it as a bulk region, it splits it at the label, generating two independent .bulk directives.

(When a region is set to default format, the data scanner looks for runs of identical bytes and character strings, and generates .fill/.str pseudo-ops. These are discarded and regenerated when you add or remove a label. This is why setting the bytes in the region at $0a00 to "default" format allows the labels to appear... the data analyzer re-scans everything, and breaks things up when labels are found. The auto-generated formatting isn't even stored in the project file.)

Poking in the middle of a string is something I see a lot of.

In the normal case, you want a single string without a label in the middle, e.g.:

          lda  MyString
          lda  MyString+8
MyString  .str "this is a nice string"

If you want to provide an explicit label to the assembler, the string has to be split into separate lines:

          lda  MyString
          lda  SubString
MyString  .str "this is "
SubString .str "a nice string"

If the project file has both the string format and the label, SourceGen can't know which was wanted, so when it sees a label in the middle of a string it tells you there's a conflict and asks you to resolve it. The default behavior is to ignore the label, because that's nicer than ignoring the string directive.

In source code you could write something like SubString = MyString+8, giving you a nice symbol without breaking up the string. SourceGen doesn't provide a way to define arbitrary equates.

If I wanted an overview of the 35 issues I had, I could see five of
them as the view is static.

I can't remember if there was an issue with the layout that caused me to omit the resize or if I just didn't bother. There's an argument for splitting this into a stand-alone window with "fix it" buttons that remove formatting and hidden labels. (I think it's too cramped for buttons in its current layout unless your monitor is fairly wide.) In any event, the goal should be to get to zero warnings so the window disappears entirely.

Update: according to comments in MainWindow.xaml, I was having trouble making the WPF UI work correctly. The splitter behaved strangely when the window was hidden and un-hidden.

@BacchusFLT
Copy link
Author

BacchusFLT commented May 11, 2022 via email

@fadden
Copy link
Owner

fadden commented May 11, 2022

Is there a way for me to know the discarded labels?

For hidden labels, the symbol is shown in the "Context" column of the warning message.

@fadden
Copy link
Owner

fadden commented May 18, 2022

Action items moved to https://github.com/fadden/6502bench/wiki/TO-DO-List .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants