Fix un-rendered tags (#106)
* <sup>, <sub>, <kbd> and <u>
copyrat90 committed Apr 23, 2024
1 parent 414ee68 commit a44c1ea
Showing 13 changed files with 30 additions and 30 deletions.
2 changes: 1 addition & 1 deletion content/affine.md
@@ -630,7 +630,7 @@ INLINE s32 lu_cos(uint theta)
{ return sin_lut[((theta>>7)+128)&0x1FF]; }
```
Now, note the angle range: 0-10000h. Remember you don't *have* to use 360 degrees for a circle; in fact, on computers it's better to divide the circle in a power of two instead. In this case, the angle is in 2^16^ parts for compatibility with BIOS functions, which is brought down to a 512 range inside the look-up functions.
Now, note the angle range: 0-10000h. Remember you don't *have* to use 360 degrees for a circle; in fact, on computers it's better to divide the circle in a power of two instead. In this case, the angle is in 2<sup>16</sup> parts for compatibility with BIOS functions, which is brought down to a 512 range inside the look-up functions.
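
The index arithmetic in the look-up functions can be sketched on its own. This is only an illustration under assumptions: `SIN_LUT_SIZE`, `sin_index` and `cos_index` are made-up names, and the table itself is left out. It shows how a 2<sup>16</sup>-part angle is brought down to a 512-entry range, and why cosine is just sine with a quarter-circle (128 entries) head start.

```c
#include <stdint.h>

#define SIN_LUT_SIZE 512u   /* hypothetical name; 512 entries, as in the text */

/* Map an angle in [0, 0x10000) to a table index in [0, 512).
   0x10000 / 512 = 128 = 2^7, hence the shift by 7. */
static inline uint32_t sin_index(uint32_t theta)
{
    return (theta >> 7) & (SIN_LUT_SIZE - 1u);
}

/* Cosine is sine shifted by a quarter circle: 512/4 = 128 entries. */
static inline uint32_t cos_index(uint32_t theta)
{
    return ((theta >> 7) + SIN_LUT_SIZE / 4u) & (SIN_LUT_SIZE - 1u);
}
```

Because the table size is a power of two, the wrap-around at a full circle is a single AND with `0x1FF` rather than a modulo or a comparison.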
### Initialization
8 changes: 4 additions & 4 deletions content/asm.md
@@ -738,7 +738,7 @@ It is also possible to load/store bytes and halfwords. The opcodes for loads are

All the things you can do with `ldr/str`, you can do with the byte and halfword versions as well: PC-relative, indirect, pre/post-indexing it's all there … with one exception. The signed-byte load (`ldsb`) and _all_ of the halfword loads and stores cannot do shifted register-loads. Only `ldrb` has the complete functionality of the word instructions. The consequence is that signed-byte or halfword arrays may require extra instructions to keep the offset and index in check.

Oh, one more thing: alignment. In C, you could rely on the compiler to align variables to their preferred boundaries. Now that you're taking over from the compiler, it stands to reason that you're also in charge of alignment. This can be done with the ‘.align _n_’ directive, with aligns the next piece of code or data to a 2^n^ boundary. Actually, you're supposed to properly align code as well, something I'm taking for granted in these snippets because it makes things easier.
Oh, one more thing: alignment. In C, you could rely on the compiler to align variables to their preferred boundaries. Now that you're taking over from the compiler, it stands to reason that you're also in charge of alignment. This can be done with the ‘.align _n_’ directive, with aligns the next piece of code or data to a 2<sup>n</sup> boundary. Actually, you're supposed to properly align code as well, something I'm taking for granted in these snippets because it makes things easier.

<pre><code class="language-armasm hljs"> mov r2, #1
@ Byte loads
@@ -1032,7 +1032,7 @@ Bit-operations like `orr` or `and` don't affect it because they operate purely o

You may find it odd that `-cs` is the code for unsigned higher or same. As mentioned, a comparison is essentially a subtraction, but when you subtract, say 7−1, there doesn't really seem to be a carry here. The key here is that subtractions are in fact forms of additions: 7−1 is actually 7+0xFFFFFFFF, which would cause an overflow into the carry bit. You can also think of subtractions as starting out with the carry bit set.
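
To see the carry appear, here is a small C model of the flag. It is only a sketch: `carry_after_sub` is a made-up helper, and it treats *a*−*b* as *a* + ~*b* + 1 in 64 bits so that the bit falling off the top of the 32-bit result can be inspected.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the carry flag after `cmp a, b` (hypothetical helper).
   a - b is computed as a + ~b + 1; the carry is bit 32 of that sum. */
static bool carry_after_sub(uint32_t a, uint32_t b)
{
    uint64_t sum = (uint64_t)a + (uint64_t)(uint32_t)~b + 1u;
    return (sum >> 32) != 0;
}
```

With this model, 7−1 sets the carry and 1−7 leaves it clear, which is why a set carry after a comparison means 'unsigned higher or same'.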

The overflow flag indicates _signed_ overflow (the carry bit would be unsigned overflow). Note, this is _not_ merely a sign change, but a sign change the wrong way. For example, an addition of two positive numbers _should_ always be positive, but if the numbers are big enough (say, 2^30^, see {@tbl:overflow}) then the results of the lower 30 bits may overflow into bit 31, therefore changing the sign and you'll have an incorrect addition. For subtraction, there can be a similar problem. Short of doing the full operation and checking whether the signs are correct, there isn't a simple way of figuring out what counts as overflow, but fortunately you don't have to. Usually overflow is only important for signed comparisons, and the condition mnemonics themselves should provide you with enough information to pick the right one.
The overflow flag indicates _signed_ overflow (the carry bit would be unsigned overflow). Note, this is _not_ merely a sign change, but a sign change the wrong way. For example, an addition of two positive numbers _should_ always be positive, but if the numbers are big enough (say, 2<sup>30</sup>, see {@tbl:overflow}) then the results of the lower 30 bits may overflow into bit 31, therefore changing the sign and you'll have an incorrect addition. For subtraction, there can be a similar problem. Short of doing the full operation and checking whether the signs are correct, there isn't a simple way of figuring out what counts as overflow, but fortunately you don't have to. Usually overflow is only important for signed comparisons, and the condition mnemonics themselves should provide you with enough information to pick the right one.
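
For the curious, 'doing the full operation and checking whether the signs are correct' could look something like this in C. It's a sketch with a made-up name (`overflow_after_add`), working on the raw 32-bit patterns: overflow happens when both operands have the same sign bit and the result has the other one.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the overflow flag after a 32-bit addition (hypothetical helper).
   ~(a ^ b) has bit 31 set when the operands' signs agree;
   (a ^ sum) has bit 31 set when the result's sign differs from a's. */
static bool overflow_after_add(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;   /* wraps modulo 2^32 */
    return ((~(a ^ b) & (a ^ sum)) >> 31) != 0;
}
```

Adding 2<sup>30</sup> to itself gives 0x80000000, a negative number made out of two positive ones, so the check fires; −1 + 1 changes sign too, but in the right direction, so it doesn't.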

<div class="lblock">
<table id="tbl:overflow">
@@ -1098,7 +1098,7 @@ DivSafe:
bx lr
```

The numerator and denominator will be in registers r0 and r1, respectively. The `cmp` checks whether the denominator is zero. If it's not, no branch is taken, the swi 6 is executed and the function returns afterwards. If it is zero, the `beq` will take the code to `.Ldiv_bad`. The two instructions there set r0 to either INT_MAX (2^31^−1 = 0x7FFFFFFF) or INT_MIN (−2^31^ = 0x80000000), depending on whether r0 is positive or negative. If it's a little hard to see that, `mvn` inverts bits, so the first line after `.Ldiv_bad` sets r0 to INT_MAX. The second line we've seen before: ‘`r0, asr #31`’ does a sign-extension in to all other bits, giving 0 or −1 for positive and negative numbers, respectively, giving INT_MAX− −1 = INT_MIN for negative values of r0. Little optimizing tricks like these decide if you're fit to be an assembly programmer; if not you could just as well let the compiler do them, because it does know. (It's where I got the ‘`asr #31`’ thing from in the first place.)
The numerator and denominator will be in registers r0 and r1, respectively. The `cmp` checks whether the denominator is zero. If it's not, no branch is taken, the swi 6 is executed and the function returns afterwards. If it is zero, the `beq` will take the code to `.Ldiv_bad`. The two instructions there set r0 to either INT_MAX (2<sup>31</sup>−1 = 0x7FFFFFFF) or INT_MIN (−2<sup>31</sup> = 0x80000000), depending on whether r0 is positive or negative. If it's a little hard to see that, `mvn` inverts bits, so the first line after `.Ldiv_bad` sets r0 to INT_MAX. The second line we've seen before: ‘`r0, asr #31`’ does a sign-extension in to all other bits, giving 0 or −1 for positive and negative numbers, respectively, giving INT_MAX− −1 = INT_MIN for negative values of r0. Little optimizing tricks like these decide if you're fit to be an assembly programmer; if not you could just as well let the compiler do them, because it does know. (It's where I got the ‘`asr #31`’ thing from in the first place.)
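
If the two-instruction trick is still hard to see, this C sketch mimics it on bit patterns. The name `div_bad_bits` is made up for the occasion, and the sign mask is written out with a conditional instead of a shift, because right-shifting negative values is implementation-defined in C.

```c
#include <stdint.h>

/* Bit pattern that the .Ldiv_bad path leaves in r0 (hypothetical helper). */
static uint32_t div_bad_bits(int32_t num)
{
    uint32_t int_max = ~0x80000000u;               /* like mvn: 0x7FFFFFFF        */
    uint32_t sign = (num < 0) ? 0xFFFFFFFFu : 0u;  /* what `asr #31` gives: 0, -1 */
    return int_max - sign;                         /* INT_MAX - (-1) wraps around */
}
```

For a non-negative numerator this returns 0x7FFFFFFF (INT_MAX); for a negative one the subtraction wraps to 0x80000000, the bit pattern of INT_MIN.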

Now in this case I used a branch, but in truth, it wasn't even necessary. The non-branch part consists of one instruction, and the branched part of two, so using conditional instructions throughout would have been both shorter and faster:

@@ -1834,7 +1834,7 @@ The directives you'd use for data will generally tell you what the datatypes are

A very important and sneaky issue is alignment. **You** are responsible for aligning code and data, not the assembler. In C, the compiler did this for you and the only times you might have had problems was with [alignment mismatches](bitmaps.html#ssec-data-align) when casting, but here both code _and_ data can be misaligned; in assembly, the assembler just strings your code and data together as it finds it, so as soon as you start using anything other than words you have the possibility of mis-alignments.

Fortunately, alignment is very easy to do: ‘`.align `_`n`_’ aligns to the next 2^n^ byte boundary and if you don't like the fact that _n_ is a power here, you can also use ‘`.balign `_`m`_’, which aligns to _m_ bytes. These will update the current location so that the next item of business is properly aligned. Yes, it applies to the _next_ item of code/data; it is not a global setting, so if you intend to have mixed data-sizes, be prepared to align things often.
Fortunately, alignment is very easy to do: ‘`.align `_`n`_’ aligns to the next 2<sup>n</sup> byte boundary and if you don't like the fact that _n_ is a power here, you can also use ‘`.balign `_`m`_’, which aligns to _m_ bytes. These will update the current location so that the next item of business is properly aligned. Yes, it applies to the _next_ item of code/data; it is not a global setting, so if you intend to have mixed data-sizes, be prepared to align things often.
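
What the two directives do to the current location can be modelled in a few lines of C. This is a sketch, not what the assembler actually runs; `align_pow2` and `align_bytes` are made-up names standing in for ‘`.align `_`n`_’ and ‘`.balign `_`m`_’.

```c
#include <stdint.h>

/* Like `.align n`: round addr up to the next 2^n byte boundary. */
static uint32_t align_pow2(uint32_t addr, unsigned n)
{
    uint32_t mask = (1u << n) - 1u;
    return (addr + mask) & ~mask;
}

/* Like `.balign m`: round addr up to the next multiple of m bytes. */
static uint32_t align_bytes(uint32_t addr, uint32_t m)
{
    return (addr + m - 1u) / m * m;
}
```

So a location of 3 becomes 4 under either `align_pow2(3, 2)` or `align_bytes(3, 4)`, while a location that is already aligned stays where it is.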

Here are a few examples of how these things would work in practice. Consider it standard boilerplate material for the creation and use of symbols.

6 changes: 3 additions & 3 deletions content/bitmaps.md
@@ -20,7 +20,7 @@ In {@fig:link-sm} you can find a bitmap of one of the game characters that made

A bitmap is little more than a *w*×*h* matrix of colors (or color-indices), where *w* is the number of columns (the width) and *h* the number of rows (the height). A particular pixel can be referred to with a coordinate pair: (*x*, *y*). By the way, the y-axis of the GBA points *down*, not up. So pixel (0, 0) is in the top-left corner. In memory, the lines of the bitmap are laid out sequentially, so that the following rule holds: in a *w×h* bitmap, the pixel (*x, y*) is the (*w×y + x*)-th pixel. This is true for all C matrices, by the way.

{*@fig:link-big} shows how this works. This is a *w*=24 by *h*=24 bitmap, at 8bpp (8 <span class="underline">B</span>its <span class="underline">P</span>er <span class="underline">P</span>ixel (=1 byte)). The numbers in yellow indicate the memory locations; you can count them for yourself if you don't believe me. The first pixel, (0, 0), can be found at location 0. The *last* pixel of the *first* row (23, 0) is at *w*−1 (=23 in this case). The first pixel of the second row (0, 1) is at *w* (=24) etc, etc, till the last pixel at *w×h*−1.
{*@fig:link-big} shows how this works. This is a *w*=24 by *h*=24 bitmap, at 8bpp (8 <u>B</u>its <u>P</u>er <u>P</u>ixel (=1 byte)). The numbers in yellow indicate the memory locations; you can count them for yourself if you don't believe me. The first pixel, (0, 0), can be found at location 0. The *last* pixel of the *first* row (23, 0) is at *w*−1 (=23 in this case). The first pixel of the second row (0, 1) is at *w* (=24) etc, etc, till the last pixel at *w×h*−1.
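
In code, the rule is a one-liner. The sketch below assumes an 8bpp bitmap stored as a plain byte array; `pixel_index` and `get_pixel8` are illustrative names, not functions from any library.

```c
#include <stdint.h>
#include <stddef.h>

/* Position of pixel (x, y) in a bitmap that is w pixels wide. */
static size_t pixel_index(size_t x, size_t y, size_t w)
{
    return w * y + x;
}

/* Read one pixel from an 8bpp bitmap (1 byte per pixel). */
static uint8_t get_pixel8(const uint8_t *bm, size_t w, size_t x, size_t y)
{
    return bm[pixel_index(x, y, w)];
}
```

For the 24×24 bitmap, `pixel_index(23, 0, 24)` is 23, `pixel_index(0, 1, 24)` is 24 and `pixel_index(23, 23, 24)` is 575, i.e. *w×h*−1.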

<div class="cblock">
<table id="fig:link-big">
@@ -835,7 +835,7 @@ Potential problems during compilation or linking:

Data alignment is about the ‘natural’ memory addresses of variables. It is often beneficial to have a variable of a certain length to start at an address divisible by that length. For example, a 32-bit variable likes to be put at addresses that are a multiple of 4. Processors themselves also have certain preferred alignments. Addressing will work faster if you stick to their native types and alignment (say, 32-bit everything for 32-bit CPUs). For PCs it is not required to do any of this, it'll just run slower. For RISC systems, however, things *must* be aligned properly or data gets mangled.

In most cases, the compiler will align things for you. It will put all halfwords on even boundaries and words on quad-byte boundaries. As long as you stick to the normal programming rules, you can remain completely oblivious to this alignment stuff. Except that you *won't* always stick to the rules. In fact, C is a language that allows you to break the rules whenever you feel like it. It trusts you to know what you're doing. Whether that trust is always justified is another matter <span class="kbd">:P</span>
In most cases, the compiler will align things for you. It will put all halfwords on even boundaries and words on quad-byte boundaries. As long as you stick to the normal programming rules, you can remain completely oblivious to this alignment stuff. Except that you *won't* always stick to the rules. In fact, C is a language that allows you to break the rules whenever you feel like it. It trusts you to know what you're doing. Whether that trust is always justified is another matter <kbd>:P</kbd>
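
If you want to know whether an address sits on its 'natural' boundary before you start bending the rules, a check like the following would do. It is a sketch: `is_aligned` is a made-up helper and it assumes `size` is a power of two (2 for halfwords, 4 for words).

```c
#include <stdint.h>
#include <stdbool.h>

/* True if ptr is a multiple of size; size must be a power of two. */
static bool is_aligned(const void *ptr, uintptr_t size)
{
    return ((uintptr_t)ptr & (size - 1u)) == 0;
}
```

A `u32` array always passes `is_aligned(arr, 4)`; a pointer one byte into it does not.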

The best example of breaking the rules is pointer casting. For example, most graphics converters will output the data as `u16` arrays, so you can copy it to VRAM with a simple `for` loop. You can speed up copying by roughly 160% if you copy by words (32-bit) rather than halfwords (16-bit). Run the *[txt_se2](text.html#ssec-demo-se2)* demo and see for yourself. All you have to do for this is one or two pointer casts, as shown here.

@@ -1045,4 +1045,4 @@ This chapter also discussed a few things about handling data, a very important t

Before continuing with further chapters, this may be a good time to do some experimenting with data: try changing the data arrays and see what happens. Look at the different data interpretations, different casts, and maybe some intentional errors as well, just to see what kinds of problems you might face at some point. It's better to make mistakes early, while programs are still short and simple and you have fewer potential problems.

Or not, of course <span class="kbd">:P</span>. Maybe it's worth waiting a little longer with that; or at least until we've covered basic input, which allows for much more interesting things than just passive images.
Or not, of course <kbd>:P</kbd>. Maybe it's worth waiting a little longer with that; or at least until we've covered basic input, which allows for much more interesting things than just passive images.
2 changes: 1 addition & 1 deletion content/edmake.md
@@ -53,7 +53,7 @@ Be sure that the devkitARM and msys bin directories are in the system path, or c

I never really knew about PN until it started coming with devkitARM, but it looks really good. I haven't used it that much myself, but only because I am still content with context. That said, PN is probably the better editor, and as it may come with the toolchain, chances are you'll have it already.

For all its benefits, I should say this though: by default, it seems to ignore the desktop color scheme. This may not sound like a big deal, but because the background color defaulted to a hard white, I literally couldn't even look at the thing for more than a minute. When I first tried to fix this in the options, it seemed that you could only change this on a type-by-type basis instead of globally. Took me a while to figure out I'd been looking in the wrong place <span class="kbd">:P</span> all along. Look under Tools-\>Options-\>Styles, not under Tools-\>Options-\>Schemes.
For all its benefits, I should say this though: by default, it seems to ignore the desktop color scheme. This may not sound like a big deal, but because the background color defaulted to a hard white, I literally couldn't even look at the thing for more than a minute. When I first tried to fix this in the options, it seemed that you could only change this on a type-by-type basis instead of globally. Took me a while to figure out I'd been looking in the wrong place <kbd>:P</kbd> all along. Look under Tools-\>Options-\>Styles, not under Tools-\>Options-\>Schemes.

To add commands for makefiles, go to Tools-\>Options-\>Tools (@fig:pn-make), and select the ‘Make’. Then add 2 commands for ‘make build’ and ‘make clean’

2 changes: 1 addition & 1 deletion content/fixed.md
@@ -372,7 +372,7 @@ This is actually a subset of the scaling problems of multiplication and division

One way of covering for the extra scale is not to correct after the multiplication, but before it; though you will lose some accuracy in the process. A good compromise would be to right-shift both operands by half the full shift.

Fixed divisions have a similar problem called underflow. As a simple example of this, consider what happens in integers division *a*/*b* if *b*\>*a*. That's right: the result would be zero, even though a fraction would be what you would like. To remedy this behaviour, the numerator is scaled up by *M* first (which may or may not lead to an overflow problem <span class="kbd">:P</span>).
Fixed divisions have a similar problem called underflow. As a simple example of this, consider what happens in integers division *a*/*b* if *b*\>*a*. That's right: the result would be zero, even though a fraction would be what you would like. To remedy this behaviour, the numerator is scaled up by *M* first (which may or may not lead to an overflow problem <kbd>:P</kbd>).
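
A small sketch of these scale corrections, assuming 8 fractional bits so that *M* = 256. The names (`FIX_SHIFT`, `fixed`, `fx_mul`, `fx_mul_pre`, `fx_div`) are picked for this example, and the values are kept non-negative because right-shifting negative numbers is implementation-defined in C.

```c
#include <stdint.h>

#define FIX_SHIFT 8          /* assumed number of fractional bits; M = 1<<8 */
typedef int32_t fixed;

/* Multiply, then correct the doubled scale afterwards (may overflow). */
static fixed fx_mul(fixed a, fixed b)
{
    return (a * b) >> FIX_SHIFT;
}

/* The compromise: shift both operands by half the shift beforehand.
   Less risk of overflow, but the low bits of each operand are lost. */
static fixed fx_mul_pre(fixed a, fixed b)
{
    return (a >> (FIX_SHIFT / 2)) * (b >> (FIX_SHIFT / 2));
}

/* Scale the numerator up by M first to avoid underflow. */
static fixed fx_div(fixed a, fixed b)
{
    return (a * (1 << FIX_SHIFT)) / b;
}
```

With 1.0 stored as 256 and 2.0 as 512, a plain `256/512` gives 0, but `fx_div(256, 512)` gives 128, which is 0.5 in this format.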

As you can see, the principles of fixed-point math aren't that difficult or magical. But you do have to keep your head: a missed or misplaced shift and the whole thing crumbles. If you're working on a new algorithm, consider doing it with floats first (preferably on a PC), and convert to fixed-point only when you're sure the algorithm itself works.

4 changes: 2 additions & 2 deletions content/lab.md
@@ -98,9 +98,9 @@ void foo()

Note that I intend the routine to be in IWRAM (and compiled as ARM code) because it's so **very f%#\$@\*g slow**! Or perhaps I shouldn't say slow, just costly.

Think of how a basic sort works. You have *N* elements to sort. In principle, each of these has to be checked with every other element, so that the routine's speed is proportional to *N*^2^, usually expressed as *O*(*N*^2^), where the *O* stands for order of magnitude. For sorting, *O*(*N*^2^) is bad. For example, when *N*=128, you would be looking at 16k checks. Times the number of cycles that the actual checks and updates would take. Not pleasant.
Think of how a basic sort works. You have *N* elements to sort. In principle, each of these has to be checked with every other element, so that the routine's speed is proportional to *N*<sup>2</sup>, usually expressed as *O*(*N*<sup>2</sup>), where the *O* stands for order of magnitude. For sorting, *O*(*N*<sup>2</sup>) is bad. For example, when *N*=128, you would be looking at 16k checks. Times the number of cycles that the actual checks and updates would take. Not pleasant.

Fortunately, there are faster methods, you'd want at least an *O*(*N*·log~2~(*N*)) for sorting algorithms, and as you can see from the aforementioned wiki, there are plenty of those and shellsort is one of them. Unfortunately, even this can be quite expensive. Again, with *N*=128 this is still about 900, and you can be sure the multiplier can be high, as in 80+. With ARM+IWRAM, I can manage to bring that down to 20-30, and a simple exercise in assembly gives me an acceptable 13 to 22 × *N*·log~2~(*N*).
Fortunately, there are faster methods, you'd want at least an *O*(*N*·log<sub>2</sub>(*N*)) for sorting algorithms, and as you can see from the aforementioned wiki, there are plenty of those and shellsort is one of them. Unfortunately, even this can be quite expensive. Again, with *N*=128 this is still about 900, and you can be sure the multiplier can be high, as in 80+. With ARM+IWRAM, I can manage to bring that down to 20-30, and a simple exercise in assembly gives me an acceptable 13 to 22 × *N*·log<sub>2</sub>(*N*).
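
The numbers for *N*=128 are easy to verify. The helpers below are made up for this check and only count comparisons in the idealized sense used above; they say nothing about the cycles per check.

```c
#include <stdint.h>

/* Integer log2, rounded down (exact for powers of two). */
static uint32_t log2_floor(uint32_t n)
{
    uint32_t r = 0;
    while (n >>= 1)
        r++;
    return r;
}

/* N^2: every element checked against every other one. */
static uint32_t checks_quadratic(uint32_t n)
{
    return n * n;
}

/* N*log2(N): the target for a decent sorting algorithm. */
static uint32_t checks_nlogn(uint32_t n)
{
    return n * log2_floor(n);
}
```

`checks_quadratic(128)` is 16384 (the 16k above) and `checks_nlogn(128)` is 128·7 = 896, the 'about 900'. At 13 to 22 cycles per unit, that puts the assembly version somewhere between 11648 and 19712 cycles.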

:::note The Big O Notation
