IE-0026: Units and number bases #26

Merged
merged 1 commit into from
Jul 13, 2023

zedlopez
Collaborator

@zedlopez commented Jul 13, 2023

Proposal: IE-0026: Units and number bases
Authors: Graham Nelson
Language feature name: --
Status: Draft
Related proposals: --
Implementation: In progress

@zedlopez merged commit 702f558 into ganelson:main on Jul 13, 2023
@zedlopez
Collaborator Author

zedlopez commented Jul 14, 2023

Disallowing the leading 0 in literals is a wonderful thing. Now we can say

$1.99 specifies a monetary value with parts dollars and cents (2 digits).

and head off the previous gotcha where $1.099 > $1.98.
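A minimal Python sketch of the gotcha and the fix, for illustration only. It assumes the old behaviour amounted to reading the cents part as a plain integer (so "$1.099" becomes 1 dollar and 99 cents), and models "(2 digits)" as requiring exactly two cents digits. The function names are hypothetical, not Inform internals.

```python
import re


def parse_money_lenient(text: str) -> int:
    """Return a value in cents, reading the cents part as a plain integer.

    Hypothetical model of the old behaviour: "$1.099" is read as
    1 dollar and 099 = 99 cents, so it silently equals $1.99.
    """
    m = re.fullmatch(r"\$(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"not a monetary value: {text!r}")
    dollars, cents = int(m.group(1)), int(m.group(2))
    if cents > 99:
        raise ValueError(f"cents out of range: {text!r}")
    return dollars * 100 + cents


def parse_money_strict(text: str) -> int:
    """Return a value in cents, requiring exactly two cents digits."""
    m = re.fullmatch(r"\$(\d+)\.(\d{2})", text)
    if m is None:
        raise ValueError(f"not a monetary value: {text!r}")
    return int(m.group(1)) * 100 + int(m.group(2))
```

With the lenient reading, "$1.099" compares greater than "$1.98" (199 cents against 198); with the strict reading it is simply rejected.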

It feels a lot like it's time to expose MAX_POSITIVE_NUMBER and MIN_NEGATIVE_NUMBER to the daylight, so that one has the option of (0 up to maximum integer) or something like that.

It'd be nice to be able to use values of these kinds in assertions, so that one had the option of making constants, for instance:

salary cap is always $100.00

(and because I clearly don't think that compiling I7 is complicated enough yet...)

Arithmetically, Inform currently assumes that in a unit with multiple parts, subsequent parts are in effect fractions of earlier ones. In $1.23, the cents are 1/100th of the dollars. It therefore makes total sense that $0.99 + $0.01 = $1.00, and that 3 times $1.13 is $3.39. That is a common use case: minutes and seconds, pounds and ounces, etc. On the other hand, if a unit represents, say, coordinate pairs making up a 2-dimensional vector, then that is not what you want. I'm thinking about whether there's a comprehensible and viable way to offer users a choice.
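The carrying behaviour can be pictured with a Python sketch that assumes, as a simplification, that a multi-part value is stored as one integer counted in its smallest part. Adding or scaling that integer and then splitting it back into parts produces exactly the carries described above. `pack` and `unpack` are illustrative names only, not Inform internals.

```python
def pack(parts: list[int], radices: list[int]) -> int:
    """Combine parts (most significant first) into one integer.

    radices[i] is how many units of part i+1 make one unit of part i,
    e.g. [100] for dollars/cents or [60, 60] for hours/minutes/seconds.
    """
    value = parts[0]
    for part, radix in zip(parts[1:], radices):
        if not 0 <= part < radix:
            raise ValueError(f"part {part} out of range 0..{radix - 1}")
        value = value * radix + part
    return value


def unpack(value: int, radices: list[int]) -> list[int]:
    """Split a non-negative integer back into parts, most significant first."""
    parts = []
    for radix in reversed(radices):
        value, part = divmod(value, radix)
        parts.append(part)
    parts.append(value)
    return parts[::-1]
```

For example, `unpack(pack([0, 99], [100]) + pack([0, 1], [100]), [100])` gives `[1, 0]`, i.e. $1.00, and `unpack(3 * pack([1, 13], [100]), [100])` gives `[3, 39]`.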

Writing "with independent parts" instead of plain "with parts" seems like a lucid way this could be specified:

A coordinate is a kind of value.
x<x-val>/y<y-val> specifies a coordinate with independent parts x-val (up to 8 digits) and y-val (up to 8 digits).

(That implicitly forbids having both related and independent parts within the same specification but I feel pretty okay about that.)

It's not a completely straightforward business, though: what does it mean to increment a vector, or to loop from vector A to vector B?

Forbidding increment/decrement for something with independent parts seems reasonable.
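For contrast, here is a Python sketch of one possible reading of "with independent parts": a hypothetical `Coordinate` type whose parts add and subtract component by component with no carrying, and which refuses to be incremented rather than guessing what "next" means. This is an illustration of the idea, not a proposed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    """A two-part value whose parts never carry into each other."""
    x: int
    y: int

    def __add__(self, other: "Coordinate") -> "Coordinate":
        return Coordinate(self.x + other.x, self.y + other.y)

    def __sub__(self, other: "Coordinate") -> "Coordinate":
        return Coordinate(self.x - other.x, self.y - other.y)

    def incremented(self) -> "Coordinate":
        # There is no single obvious "next" coordinate, so refuse rather than guess.
        raise TypeError("increment is not defined for independent parts")
```

Here `Coordinate(3, 99) + Coordinate(0, 1)` is `Coordinate(3, 100)`: the y part simply grows, where a dollars-and-cents value would have carried into the first part.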

Something like "repeat with x and y running from start point to end point" would be very cool. (I don't think implementing it needs to be any kind of priority; I doubt it would get much real-world use... but it would still be very cool.)

[When I wrote that, I was imagining that x running from the x-val of start point to the x-val of end point would be the outer loop, and y running from the y-val of start point to the y-val of end point would be the inner loop. But I'm sure that if it were implemented that way, it would be about 15 minutes before someone was outraged that they couldn't specify the order. Maybe it'd be best to forbid increment, decrement, and repeat.]
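The ordering problem in that bracketed note shows up even in a tiny Python sketch: the two obvious readings of a hypothetical "repeat with x and y running from start point to end point" visit the same points, but in different orders. The function names are illustrative and nothing here reflects a settled Inform design.

```python
from itertools import product

Point = tuple[int, int]  # (x, y); needs Python 3.9+ for this alias syntax


def x_outer(start: Point, end: Point) -> list[Point]:
    """x is the outer loop and y the inner loop."""
    xs = range(start[0], end[0] + 1)
    ys = range(start[1], end[1] + 1)
    return list(product(xs, ys))


def y_outer(start: Point, end: Point) -> list[Point]:
    """y is the outer loop and x the inner loop."""
    xs = range(start[0], end[0] + 1)
    ys = range(start[1], end[1] + 1)
    return [(x, y) for y, x in product(ys, xs)]
```

From (0, 0) to (1, 1), `x_outer` yields (0, 0), (0, 1), (1, 0), (1, 1) while `y_outer` yields (0, 0), (1, 0), (0, 1), (1, 1), so a language feature would have to pick one or let the author choose.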

<"0">x<numeric> specifies a hex literal with parts numeric (up to 7 hexadecimal digits).
when play begins: let h be 0xBADCAFE;
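A rough Python model of how a literal like 0xBADCAFE might be matched against that template: a fixed "0x" prefix followed by one to seven hexadecimal digits. Accepting lowercase digits is an assumption here. One side effect worth noting (an observation, not something stated in the thread) is that seven hex digits top out at 0xFFFFFFF, which stays below 2**31 and so never reaches the sign bit of a 32-bit signed integer. `parse_hex_literal` is a hypothetical name.

```python
import re

# "0x" prefix, then 1 to 7 hex digits (upper or lower case assumed acceptable).
HEX_LITERAL = re.compile(r"0x([0-9A-Fa-f]{1,7})")


def parse_hex_literal(text: str) -> int:
    """Return the value of a 0x-prefixed literal with at most 7 hex digits."""
    m = HEX_LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a 7-digit hex literal: {text!r}")
    return int(m.group(1), 16)
```

Under this model 0xBADCAFE parses, but an eight-digit literal such as 0xDEADBEEF is rejected.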

This is going to be fun.

@ganelson
Owner

Do they serve 0xDEADBEEF at the 0xBADCAFE?

> It'd be nice to be able to use values of these kinds in assertions, so that one had the option of making constants, for instance:
>
> salary cap is always $100.00
>
> (and because I clearly don't think that compiling I7 is complicated enough yet...)

This already works.

An agent is a kind of value. <"00">7 specifies an agent.
The troublesome agent is always 007.

To begin: showme the troublesome agent.

outputs

"troublesome agent" = agent: 007

You've always been able to use literal notation in assertions. For example:

The Conservatoire is a room.

Time signature is a kind of value. 1/99 time specifies a time signature with parts beats
and bar length (without leading zeros).

A musical score is a kind of thing. A musical score has a time signature. A musical
score usually has time signature 4/4 time.

In the Conservatoire is a musical score called Chopin's Mazurka in F Sharp Minor.
The Mazurka has time signature 3/4 time.

When play begins: showme the time signature of Chopin's Mazurka in F Sharp Minor.

outputs

"time signature of Chopin's Mazurka in F Sharp Minor" = time signature: 3/4 time

@zedlopez
Collaborator Author

zedlopez commented Jul 16, 2023

Grid reference is a kind of value. SP <easting> <northing> specifies a grid
reference with parts
	easting (45000 to 54999, corresponding to lengths) and
	northing (5 digits, 3000 to 8999, corresponding to lengths).

I had idly imagined a second-order unit whose parts were units, but I truly thought it was too much to hope for.

@curiousdannii added the formal-proposal label (A formal proposal that has been accepted for consideration by the core Inform team) on Jul 20, 2023
@zedlopez
Collaborator Author

zedlopez commented Aug 7, 2023

It occurs to me that this approaches the ability to handle the time kind of value without it needing special-casing. The big thing missing is that there are a variety of ways to express time literals. Independent of whether it'd ever be a good idea to redo time, might it be worthwhile to allow multiple literal templates that could specify the same kind of value? (Honest question, I'm not sure it's a good idea, just one that seemed interesting enough to throw at the wall.)

(It also occurs to me that the new syntax could accommodate the am/pm syntax by making the first character a base 2 value with digits a and p, which is pretty neat. Handling the fact that meridian * 720 should then be added to the value is a different story.)
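The arithmetic behind that last remark, as a Python sketch. It assumes a time of day is counted in minutes since midnight, so a meridian digit of 0 (a) or 1 (p) contributes meridian * 720 minutes. It also shows a wrinkle the digit trick alone doesn't cover: 12 o'clock needs special handling, since 12:05 am is 5 minutes past midnight. The function name is hypothetical.

```python
def minutes_since_midnight(hours: int, minutes: int, meridian_char: str) -> int:
    """Convert a 12-hour clock reading to minutes since midnight.

    meridian_char acts as a base-2 digit whose digits are "a" (0) and "p" (1).
    """
    if meridian_char not in ("a", "p"):
        raise ValueError("meridian must be 'a' or 'p'")
    if not (1 <= hours <= 12 and 0 <= minutes <= 59):
        raise ValueError("not a valid 12-hour time")
    meridian = 1 if meridian_char == "p" else 0
    # hours % 12 maps 12 am to 0 and 12 pm to 0 + 720.
    return (hours % 12) * 60 + minutes + meridian * 720
```

So 11:02 am is 662, 12:05 am is 5, 12:05 pm is 725, and 11:59 pm is 1439.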

@curiousdannii
Collaborator

@zedlopez I've never worked out how casting works, but this might be a situation in which it could be useful. Rather than having a single kind and multiple representations, there could be multiple kinds which can be automatically cast to the main one?

@ganelson
Owner

ganelson commented Aug 8, 2023

I did consider the possibility of having multiple number kinds which cast silently into each other in order to have what amounted to "number which prefers to be printed in hexadecimal by default", but in the end it just seemed a sin to have multiple kinds where there was no real semantic difference between them.

Inform does actually feature the ability for multiple notations to represent the same kind: that's how "24km" and "4.3mm" can both notate the same kind "length", for example. I think with enough use of a crowbar, I could just about get time into this framework, but I'm not sure we gain much.

Fundamentally, time is weird because we are using it both for "time of day" (11:02 am) and "elapsed time" (17 minutes) at the same time, a compromise which only mostly works.

@CelticMinstrel

> I think with enough use of a crowbar, I could just about get time into this framework, but I'm not sure we gain much.

I somewhat disagree. I think the kind of machinery that would permit time to fit in this framework could be the same kind of machinery that could permit defining a "temperature" kind that can be expressed in either Celsius or Fahrenheit, which means more is gained than just fitting time into this framework.

Time is a kind of value. Temperature point is a kind of value.

23 hours 59 minutes specifies a time with parts hours and minutes.
59 minutes specifies a time with parts minutes.
[probably needs a bit more work here to disallow the 0 value, but close enough for this example, maybe?]
12:59 AM specifies a time with parts hours and minutes.
12:59 PM specifies a time with parts hours and minutes offset by 12 hours.

100.0 degrees Celsius specifies a temperature point.
100.0 degrees Fahrenheit specifies a temperature point scaled up by 1.8 and offset by 32.
100.0 Kelvin specifies a temperature point offset by 273.15.

Syntax like that might be worthwhile even if time ultimately isn't adjusted to be just another unit. It does have a few other oddities besides the AM/PM distinction.

Additional note: for this kind of syntax, scaled up/down by x and offset by y would probably need to be different from offset by x and scaled up/down by y. (This would be needed in order to allow defining Fahrenheit as the base unit and Celsius as a derived unit: 100.0 degrees Celsius specifies a temperature point offset by -32 and scaled down by 1.8.) And I guess it also might not hurt to allow offset by x and scaled up/down by y and offset by z while you're at it? (Probably only if it simplifies the implementation rather than complicating it.)
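A quick Python check of why the order of "scaled" and "offset" matters. The hypothetical `Notation` class records how a written literal relates to the stored base value, and `Fraction` is used only so the comparisons are exact. With Celsius as the base, Fahrenheit is "scaled up by 1.8 and offset by 32"; with Fahrenheit as the base, Celsius is "offset by -32 and scaled down by 1.8". Applying the second pair of operations in the other order gives a different, wrong answer.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Notation:
    """How a written literal relates to the stored base value.

    scale_then_offset: written = base * scale + offset
    offset_then_scale: written = (base + offset) * scale
    """
    scale: Fraction
    offset: Fraction
    order: str = "scale_then_offset"

    def to_written(self, base: Fraction) -> Fraction:
        if self.order == "scale_then_offset":
            return base * self.scale + self.offset
        return (base + self.offset) * self.scale

    def to_base(self, written: Fraction) -> Fraction:
        # Undo the two steps in reverse order.
        if self.order == "scale_then_offset":
            return (written - self.offset) / self.scale
        return written / self.scale - self.offset


# Celsius base: written Fahrenheit = C * 1.8 + 32.
FAHRENHEIT = Notation(Fraction(9, 5), Fraction(32))
# Fahrenheit base: written Celsius = (F - 32) / 1.8.
CELSIUS_FROM_F = Notation(Fraction(5, 9), Fraction(-32), "offset_then_scale")
```

Both `FAHRENHEIT.to_written(Fraction(100))` and `CELSIUS_FROM_F.to_base(Fraction(100))` give 212, whereas scaling 212 by 5/9 before subtracting 32 gives 772/9, not 100.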

The choice of the name "temperature point", and the time compromise you mentioned, also raise the idea that there could be merit in a way to define a pair of kinds T and D such that T + D = T, T - D = T, T - T = D, and T + T is an error (and D is just a fully-fledged arithmetic type). I don't have a good idea for what this might look like.
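One way to picture the T/D pairing is the familiar point-versus-displacement split, the same idea as Python's `datetime` and `timedelta` (subtracting two datetimes gives a timedelta, and adding two datetimes raises `TypeError`). Here is a minimal Python sketch with hypothetical classes `TimeOfDay` (T) and `Duration` (D). Wrap-around at midnight is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duration:
    """D: an elapsed time in minutes; ordinary arithmetic applies."""
    minutes: int

    def __add__(self, other: "Duration") -> "Duration":
        if not isinstance(other, Duration):
            return NotImplemented
        return Duration(self.minutes + other.minutes)

    def __sub__(self, other: "Duration") -> "Duration":
        if not isinstance(other, Duration):
            return NotImplemented
        return Duration(self.minutes - other.minutes)


@dataclass(frozen=True)
class TimeOfDay:
    """T: a point on the clock, in minutes since midnight (no wrap-around)."""
    minutes: int

    def __add__(self, other: Duration) -> "TimeOfDay":
        # T + D = T. T + T returns NotImplemented, so Python raises TypeError.
        if not isinstance(other, Duration):
            return NotImplemented
        return TimeOfDay(self.minutes + other.minutes)

    def __sub__(self, other):
        # T - D = T and T - T = D.
        if isinstance(other, Duration):
            return TimeOfDay(self.minutes - other.minutes)
        if isinstance(other, TimeOfDay):
            return Duration(self.minutes - other.minutes)
        return NotImplemented
```

With these definitions, 11:02 plus 17 minutes is `TimeOfDay(679)`, the difference between two times is a `Duration`, and adding two times is a type error rather than a silently meaningless number.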

@CelticMinstrel

After reading the full proposal, I have a couple of other comments.

First of all, I realize that C has hexadecimal, decimal, octal, and binary literals, but I personally don't think that in and of itself is a good reason to offer exactly that same set in another language. Octal in particular feels overused to me. If I had proposed this, I almost certainly would've excluded it as a possible named base… but you could also go the opposite direction and support more named bases. For example, ternary or duodecimal are ones that I've often seen come up in general discussions of number bases. You could even follow the example of cardinals and ordinals and define a name for every base up to 12 (but also keep hexadecimal as a bonus because it's so common).

> Only base 10 numbers are printed as signed, and only when no digit count is required. "[N in decimal]" and "[N in base 10]" are equivalent to "[N]" and print its value as a signed decimal number.
> Thus, -1 prints as FFFFFFFF in hex, -1 in decimal, 4294967295 in unsigned decimal, 37777777777 in octal, 11111111111111111111111111111111 in binary, 102002022201221111210 in base 3 and 1Z141Z3 in base 36.
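The quoted examples can be reproduced in Python by reinterpreting the value as an unsigned 32-bit word, which is what two's complement printing amounts to, and then converting to the requested base. `to_base_unsigned` is an illustrative name, not part of Inform.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base_unsigned(n: int, base: int, bits: int = 32) -> str:
    """Print n as an unsigned two's-complement word of the given width."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    word = n & ((1 << bits) - 1)  # e.g. -1 becomes 0xFFFFFFFF
    if word == 0:
        return "0"
    out = []
    while word:
        word, d = divmod(word, base)
        out.append(DIGITS[d])
    return "".join(reversed(out))
```

For -1 this gives FFFFFFFF, 4294967295, 37777777777, thirty-two 1s, 102002022201221111210 and 1Z141Z3 in bases 16, 10, 8, 2, 3 and 36 respectively.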

I feel like this goes against the grain for the Inform way of doing things (for example, the fact that it eschews programmer notation for specifying exponents on real numbers and instead uses standard scientific notation). Why do we have to expose the fact that the underlying system represents negative numbers as two's complement? Mathematically, there's no rule that a hexadecimal or binary number cannot be negative. I think I would strongly prefer Inform to go against the programming norm here as well and print -1 as -1 in every base; and for another example, -16 would print as -10 in hex, -16 in decimal, -20 in octal, -10000 in binary, -121 in base 3, and -G in base 26. (This would of course mean you'd need "as unsigned" variants for every base.)
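The alternative suggested here, a leading minus sign in every base, is just as easy to express. A Python sketch with the illustrative name `to_base_signed`, using the same 0-9, A-Z digit set as base 36:

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base_signed(n: int, base: int) -> str:
    """Print n in the given base, with a leading minus sign if negative."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        return "-" + to_base_signed(-n, base)
    if n == 0:
        return "0"
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(DIGITS[d])
    return "".join(reversed(out))
```

This prints -1 as -1 in every base, and -16 as -10, -16, -20, -10000, -121 and -G in bases 16, 10, 8, 2, 3 and 26. Unsigned printing would then be a separate, explicit request.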

> It is now also possible to write binary, octal or hexadecimal literals. These are unsigned, …

Again, why? Why can't a binary or hexadecimal literal be negative? It just doesn't make any sense. And while I'm here, what about real hexadecimal numbers (or indeed binary or octal ones)?

To begin:
	let N be a real number;
	now N is hexadecimal 1.21d;
	now N is hexadecimal FADE.D2A x 16 ^ 22;
	now N is octal 12.6;
	now N is octal 77.12 x 8 ^ 12;
	now N is binary 101.1;
	now N is binary 11011.001 x 2 ^ 6;

Hexadecimal float literals are already precedented in some languages, too. (I've never seen one that offers binary or octal real literals though.)
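A Python sketch of what such literals could evaluate to. It treats each digit after the point as a negative power of the base and the "x B ^ E" suffix as an ordinary multiplier. The `parse_base_real` name is hypothetical, and `Fraction` keeps the results exact. For comparison, C99 hexadecimal floating constants and Python's `float.fromhex` use a `p` exponent that scales by a power of 2, not of 16, so a base-matched suffix like the one in the examples above would be a design choice of its own.

```python
from fractions import Fraction


def parse_base_real(text: str, base: int, exponent: int = 0) -> Fraction:
    """Exact value of e.g. "FADE.D2A" in the given base, times base ** exponent."""
    whole, _, frac = text.partition(".")
    if not (whole or frac) or not all(c.isalnum() for c in whole + frac):
        raise ValueError(f"not an unsigned base-{base} real: {text!r}")
    value = Fraction(int(whole, base) if whole else 0)
    for position, ch in enumerate(frac, start=1):
        digit = int(ch, base)  # raises ValueError for a digit invalid in this base
        value += Fraction(digit, base ** position)
    return value * Fraction(base) ** exponent
```

Under this reading, binary 101.1 is 11/2, octal 12.6 is 43/4, and hexadecimal 1.21d is 4637/4096, the same value Python's `float.fromhex("1.21d")` produces.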

> The option digits "TEXT" tells Inform to use the supplied digit characters in place of the regular ones. The text must contain exactly the number of characters which equals the number base (8 for octal, and so on), must have no repeats, and must not use spaces or square brackets.

This feels like a huge missed opportunity.

A thunk is a kind of value. @zzz specifies a thunk with parts data (digits "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/").

After all, converting a number to base 26 and converting data to base64 are essentially the same operation. Mind you, it may indeed not be worthwhile, since (if I'm not mistaken) this machinery is all limited to 32-bit integers, so expressing them as base64 probably isn't very useful.
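Supplying a custom digit string is just positional notation with an arbitrary alphabet, as a Python sketch shows. It assumes the standard Base64 alphabet from RFC 4648 (A to Z, a to z, 0 to 9, then + and /). Note that this is plain radix conversion of a single integer, not RFC 4648 byte encoding, which works on 6-bit groups of a byte string and pads with "=". The sketch also bears out the closing point: any 32-bit value fits in at most six base-64 digits. The helper names are hypothetical.

```python
BASE64_DIGITS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def to_digits(n: int, digits: str) -> str:
    """Write a non-negative integer using the given digit characters."""
    base = len(digits)
    if base < 2 or len(set(digits)) != base:
        raise ValueError("need at least two distinct digit characters")
    if n < 0:
        raise ValueError("negative values are not handled in this sketch")
    if n == 0:
        return digits[0]
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(digits[d])
    return "".join(reversed(out))


def from_digits(text: str, digits: str) -> int:
    """Read back a string written with the given digit characters."""
    base = len(digits)
    value = 0
    for ch in text:
        d = digits.find(ch)
        if d < 0:
            raise ValueError(f"{ch!r} is not one of the supplied digits")
        value = value * base + d
    return value
```

For instance, 2**32 - 1 comes out as "D/////", and with the 26-letter alphabet A to Z the number 26 comes out as "BA".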
