Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntegerLiteralExpr#asInt produces NumberFormatException for 2147483648 literal #2483

Open
oxisto opened this issue Jan 14, 2020 · 22 comments · May be fixed by #2485
Open

IntegerLiteralExpr#asInt produces NumberFormatException for 2147483648 literal #2483

oxisto opened this issue Jan 14, 2020 · 22 comments · May be fixed by #2485
Assignees

Comments

@oxisto
Copy link

@oxisto oxisto commented Jan 14, 2020

We have been running into an issue while parsing the expression -2147483648. According to the Java Spec (https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10.1), the largest number a decimal literal can be is 2147483648, but only in combination with the unary expression, since it then represents the lowest number in Java -2147483648 (-2^31). Larger numbers or the use of it without the unary expression is considered a compile error.

JavaParser will (correctly) deduce that -2147483648 is a unary expression containing a literal. However, by calling asInt on the IntegerLiteralExpr it will (correctly) produce a NumberFormatExpression since it cannot represent the value of the literal 2147483648 in a Java int. However, the combination itself is a valid Java construct which we cannot correctly parse without an exception.

Quick fix would probably be to return the literal as a long, not sure if this would introduce other problems.

@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 14, 2020

Ohh man... (I'm gonna apologize ahead of time for this rant)
Javas JDKs int/long parsing is a complete mess, and
(I wonder if this terrible misbehavior causes this issue in JP)

Here's some tweets about it (earlier tweets from Oct 2019):
https://twitter.com/edefazio/status/1179408601657614337

Fun w/ Javas' Integer.parseInt:

assertEquals(15, 0xF);
assertEquals(-268435456, 0xF0000000);
assertEquals(15, Integer.parseInt("F",16));
assertEquals( -268435456, Integer.parseInt("F0000000",16));
java.lang.NumberFormatException: For input string: "F0000000"

To fix it... use "parseUnsignedInt() method... because whats more intuitive than to call parseUnsignedInt which returns a signed int?
assertEquals(15, Integer.parseUnsignedInt("F",16));
assertEquals( -268435456, Integer.parseUnsignedInt("F0000000",16));

...And further back (about parsing Longs)

https://twitter.com/edefazio/status/1131275299398799367
Long n=0x8000000000000000L;
assertEquals(n,Long.parseLong("0x8000000000000000L"));
assertEquals(n,Long.parseLong("8000000000000000L",16));
assertEquals(n,Long.decode("0x8000000000000000L"));
assertEquals(n,Long.decode("0x8000000000000000"));

assertEquals(n, new BigInteger("8000000000000000", 16).longValue());

So... in general, the problem (as I see it) is... the code used by java (the JDK)
to parse the source code with int and long literal values is NOT available(probably written in C++), and instead when we use Integer.parseInt or Long.parseLong you have to be really careful
because it routinely barfs on large or negative numbers...

also couple that with literals which are defined as/with
0b01110010101 (binary)
0xDEADBEEF (hex)
1_000_000 (with '_' separators)
-negative unary expressions
1000L, 10000000L (L,l, postfixes)
which don't work easily with Integer/Long.parseXXX() as is (they have to be manually "decoded" before calling parseInt/parseLong with the radix)

What I want... is ANY String that represents an(int/long) literal somewhere in Java source code
I should be able to call a method that returns me the value (i.e. either a wrapper Number or primitive)
Instead you wander into a quagmire of inconsistencies (and dont get me started on Floating point numbers with Scientific Notation / NaNs/etc.

(sorry end rant)

@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 14, 2020

Here's how I did parsing integers/longs (not integrated into JP):

    /**
     * A local number format we can use to compare number literals...
     * we need this because number can have different syntax
     */
    private static final NumberFormat NF = NumberFormat.getInstance();

    public static Long parseLong(LiteralStringValueExpr les) {
        return parseNumber(les.getValue()).longValue();
    }

    public static Long parseLong(IntegerLiteralExpr ile) {
        Number n = parseNumber(ile.getValue());
        return n.longValue();
    }

    public static Long parseLong(LongLiteralExpr lle) {
        Number n = parseNumber(lle.getValue());
        return n.longValue();
    }

    public static Integer parseInt(IntegerLiteralExpr ile) {
        return parseInt(ile.getValue());
    }

    public static Long parseLong(String s) {
        Number n = parseNumber(s);
        return n.longValue();
    }

    /**
     * @param s
     * @return
     */
    public static Integer parseInt(String s) {
        Number n = parseNumber(s);
        return n.intValue();
    }

    /**
     * This String can represent a Float or a Double, since we ONLY have
     * DoubleLiteralExpr to model both Float and Double Literals
     *
     * @param floatOrDbl
     * @return
     */
    public static Number parseFloatOrDouble(String floatOrDbl) {
        floatOrDbl = floatOrDbl.trim().replace("_", "");
        if (floatOrDbl.endsWith("f") || floatOrDbl.endsWith("F")) {
            return Float.parseFloat(floatOrDbl.substring(0, floatOrDbl.length() - 1));
        }
        if (floatOrDbl.endsWith("d") || floatOrDbl.endsWith("D")) {
            return Double.parseDouble(floatOrDbl.substring(0, floatOrDbl.length() - 1));
        }
        return Double.parseDouble(floatOrDbl);
    }

    /**
     * This String can represent a Float or a Double, since we ONLY have
     * DoubleLiteralExpr to model both Float and Double Literals
     *
     * @param dblExpr
     * @return
     */
    public static Number parseFloatOrDouble(DoubleLiteralExpr dblExpr) {
        return parseFloatOrDouble(dblExpr.toString());
    }

    /**
     * Parses and returns the number from the String
     *
     * NOTE this is public for testing, but only really used internally
     * @param s
     * @return
     */
    public static Number parseNumber(String s) {
        // Long l = 0xAAA_BBBL; this is valid
        String str = s.trim();
        if (str.startsWith("0x") || str.startsWith("0X")) {
            if (str.endsWith("L") || str.endsWith("l")) {
                // Im a simple man... and If I can represent some long as a string
                // in some form in Java, JAVA should provide a parser that (given this string)
                // returns me the appropriate long value... but NOOOOOOOO
                // they have to just screw it all up and waste my time with this
                // bullshit
                if (str.equals("0x0L") || str.equals("0x0l") || str.equals("0X0L")) {
                    return 0L;
                }
                return new BigInteger(str.replace("L", "").replace("l", "").replace("0x", "").replace("0X", "").replace("_", ""), 16).longValue();
            }
            return Integer.parseUnsignedInt(str.substring(2).replace("_", ""), 16);
        }
        if (str.startsWith("0b") || str.startsWith("0B")) {
            if (str.endsWith("L") || str.endsWith("l")) {
                String subSt = str.substring(2, str.length() - 1);
                return Long.parseUnsignedLong(subSt.replace("_", ""), 2);
            }
            return Integer.parseInt(str.substring(2).replace("_", ""), 2);
        }
        try {
            return NF.parse(str.replace("_", "").replace("F", "f").replace("D", "d"));
        } catch (Exception e) {
            throw new RuntimeException("" + e);
        }
    }
@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 14, 2020

OK... seems like this (particular) issue has more to do with a VERY specific case
on how JP treats UnaryExprs when it comes to Parsing Numbers
(i.e. it treats the UnaryExpr as the parent to the IntLiteralExpr in ALL negative literal cases...i.e.:

i = -2147483648;

//com.github.javaparser.ast.expr.UnaryExpr "-2147483648"
//com.github.javaparser.ast.expr.IntegerLiteralExpr "2147483648"

j = -1;
//com.github.javaparser.ast.expr.UnaryExpr "-1"
//com.github.javaparser.ast.expr.IntegerLiteralExpr "1"

so we have a situation where (BECAUSE we broke apart the int value from the sign, it now represents an invalid int value, and calling the IntegerLiteralExpr toInt() breaks things)

So, just spitballin it here: (and I recommend neither approach)

we (JP) COULD have toInt on IntegerLiteralExpr check if it's parent is a UnaryExpr (with the "-" MINUS) and specifically use that to call the correct method /return the value.
however... I'm not sure if that follows inline with how things are done in other cases (it COULD be a problem if we did this then unknowingly called because the IntegerLiteralExpr node its actually knowing more about it's context... which seems wrong and could lead to hard to find errors)

the more involved and cumbersome approach would be to post process UnaryExprs that are parents of IntegerLiteralExprs and condense them to simply be IntegerLiteralExprs... (this also has drawbacks & I dunno if @matazoid wants to do this as a special case because it could have a huge impact)

(a more simple practical approach for the time being)
In the interum,(to get past your particular problem) instead of directly calling toInt() on the IntegerLiteralExpr, maybe you (in the program using JP) could manually evaluate the
UnaryExprs (i.e. intercept those, check if they have a single child that is a IntegerLiteralExpr and evaluate it manually)

just a thought

@matozoid

This comment has been minimized.

Copy link
Member

@matozoid matozoid commented Jan 14, 2020

Wow... That is quite some detective work! Uhhhhhh, I'd say that the methods like toInt are helpers and are supposed to deliver the correct value. I think that you are saying that they will currently never return a negative value? That would be a bug then, and grabbing the parent node to see if it is a unary minus would be a correct way to fix this.

@oxisto

This comment has been minimized.

Copy link
Author

@oxisto oxisto commented Jan 14, 2020

I think that you are saying that they will currently never return a negative value? That would be a bug then, and grabbing the parent node to see if it is a unary minus would be a correct way to fix this.

Hmm not sure about this either, because it seems that according to the language specification, a literal is always a positive number (with a defined maximum depending on int and long). So if you want to assign a negative "value" to something from a parser point of view you always have the combination of unary expression + literal. The stupid thing is that to express the lowest negative int value this way you need a literal with a with a value that does not fit in an int :D

@matozoid

This comment has been minimized.

Copy link
Member

@matozoid matozoid commented Jan 14, 2020

I'll leave this to @edefazio then, he seems very passionate about this issue ;-)

@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 14, 2020

?thanks?
no really I just worry that if you start changing what toInt() does it opens the door to have crazy hard to find bugs happen...

.. instead of changing how toInt() works on IntegerLiteralExpr...

I think we should add a new method on IntegerLiteralExpr that evaluates the literal in a "context sensitive fashion"

To explain why using a different example:
If I have the following code:

CompilationUnit cu = StaticJavaParser.parse("class C{ int i = -1; }");
List<IntegerLiteralExpr> neg = cu.findAll(IntegerLiteralExpr.class, i-> i.asInt() < 0 );

If I use latest and run this query today I would EXPECT the neg list to return ALL int literals that are less than 0, in the above example, we would probably EXPECT a single instance (-1)...

however,
because we split the expression "-1" into a UnaryExpr parent (Operator.MINUS)
AND a child IntLiteralExpr 1

we will never find any negative numbers....

if we add a method, we arent responsible for breaking the existing code, and we can add this new method literalAsSignedInt() functionality to IntegerLiteralExpr

neg = cu.findAll(IntegerLiteralExpr.class, i-> i.literalAsSignedInt() < 0 );
assertEquals(1, neg.size());

here's the code I'm suggesting (it'll be in IntegerLiteralExpr)
(I dont want to CHANGE toInts behaviour)

       /**
        * context sensitive conversion of this expression to an int (considers whether this
        * node is the child of a UnaryExpr with Operator.MINUS.)
        * 
        * also handles a particularly nasty integration bug where we represent Integer.MIN_VALUE 
        * as a combination of a parent (UnaryExpr Operator.MINUS) 
        * and a child which will be a literal "2147483648"... together they represent a valid int
        *  value, however, evaluating this String "2147483648" as an int independently
        * represents a value that is outside of Int range (-2147483648 to 2147483647)
        * and (in this case) `toInt()` throws an exception
        */
       public int literalAsSignedInt(){
             return literalAsSignedInt(this);
       }

       /**
	 * This is context sensitive (returning a negative value if it is a child of 
	 * a UnaryExpr that has the minus sign)
	 *  
	 * @param ile an integer
	 * @return
	 */
	static int literalAsSignedInt(IntegerLiteralExpr ile) {
		if(ile.getParentNode().isPresent() && 
		   ile.getParentNode().get() instanceof UnaryExpr &&
		   ((UnaryExpr)ile.getParentNode().get()).getOperator() == UnaryExpr.Operator.MINUS){
			
			//check underflow case:
			String result = ile.getValue().replace("_", "");
			if( result.equals("2147483648") ) {
				return Integer.MIN_VALUE; 
			}
			return -(ile.asInt());
		}
		return ile.asInt();
	}
@oxisto

This comment has been minimized.

Copy link
Author

@oxisto oxisto commented Jan 15, 2020

After some thinking the best option would be to have an approach similar how the C++ specification does it, in which the literal will automatically be converted to the next higher class of storage, i.e. if it does not fit into an int it will be a long and so on. In Java we could produce a similar thing with the Number interface and a generic asNumber() function in the literal classes. I will try to implement this in a PR in the next couple of weeks.

@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 15, 2020

@oxisto I am not sure I subscribe to this idea...

In every single case asNumber would return an Integer (except the one case where youd return (Integer.MIN_VALUE) which will return a Long.

@matazoid and I are discussing on Gitter (& we can continue the conversation there if you like)
(at the moment seems like we are leaning toward just updating toInt() to take into account the nodes' context (if it's a child of Unary -) and leaving it as an int on the API.

@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 15, 2020

At the moment, our thinking is that we add a @deprecated tag to asInt() asLong(), asDouble()
on IntegerLiteralExpr, LongLiteralExpr, and DoubleLiteralExpr
and add new methods which will take into account the context (Note these will be instance methods
evalAsInt()
evalAsLong()
evalAsDouble()
on the corresponding classes, static is just for external testing):

     
	public static int evalAsInt(IntegerLiteralExpr ile) {
		if(ile.getParentNode().isPresent() && 
		   ile.getParentNode().get() instanceof UnaryExpr &&
		   ((UnaryExpr)ile.getParentNode().get()).getOperator() == UnaryExpr.Operator.MINUS){
			
			//check underflow case:
			String result = ile.getValue().replace("_", "");
			if( result.equals("2147483648") ) {
				return Integer.MIN_VALUE;
			}
			return -(ile.asInt());
		}
		return ile.asInt();
	}
	
	public static long evalAsLong(LongLiteralExpr lle) {
		if(lle.getParentNode().isPresent() && 
		   lle.getParentNode().get() instanceof UnaryExpr &&
		   ((UnaryExpr)lle.getParentNode().get()).getOperator() == UnaryExpr.Operator.MINUS){
			
			//check underflow case:
			String result = lle.getValue().replace("_", "");
			if( result.equalsIgnoreCase("9223372036854775808L") ) {
				return Long.MIN_VALUE;
			}
			return -(lle.asLong());
		}
		return lle.asLong();
	}
	public static double evalAsDouble(DoubleLiteralExpr lle) {
		if(lle.getParentNode().isPresent() && 
		   lle.getParentNode().get() instanceof UnaryExpr &&
		   ((UnaryExpr)lle.getParentNode().get()).getOperator() == UnaryExpr.Operator.MINUS){
			return -(lle.asDouble());
		}
		return lle.asDouble();
	}
@oxisto

This comment has been minimized.

Copy link
Author

@oxisto oxisto commented Jan 15, 2020

It is your call of course, but I would actually advise against this, because you are changing the semantics of the literal expression. Fact is that if you consider the code fragment -1 it is really two expressions:

  • a unary exp -
  • and a literal 1.

If you now eval the literal to -1 you still end up with an AST of:

  • a unary exp -
  • and an (evaluated) literal of -1.

This is extremely confusing in my opinion. On the plus side you of course you would then be able to quickly find the usage of -1. Probably a toin coss in the end :)

Maybe I can join the conversation on Gitter tomorrow, I wanted to dig deeper in how the Java compiler actually handles this.

Update: The jdk Compiler condenses both into a single literal expression with the negative value. Might be the most sensible approach.

@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 16, 2020

sorry I think i just restated what you said here in Gitter chat, shoulda looked here first....

yes I think we are looking at these scenarios... I think unifying the UnaryExpr + IntegerLiteralExpr into a single IntegerLiteralExpr is probably the most correct thing to do, but I also recognise that this might be alot of work for the parser code so I think we need to ask or "sell" @matozoid on the idea. thanks for your input (highly appreciated) btw

@matozoid

This comment has been minimized.

Copy link
Member

@matozoid matozoid commented Jan 16, 2020

Note that the Java compiler does its parsing with one goal: give the compiler what it needs as quickly as possible. That's another goal than JavaParser, which wants to give everyone a general AST that has everything to their job done. There are many things that JP does different from javac.

@matozoid

This comment has been minimized.

Copy link
Member

@matozoid matozoid commented Jan 16, 2020

Looks like the "eval" naming uncovered the real problem here: toInt is doing a tiny bit of interpretation of the AST, it is the start of an interpreter. If we consider the UnaryExpr next to the IntegerLiteralExpr, shouldn't we also consider the EnclosedExpr so that -(1) works? And -(1+0) - should we remove that zero? It looks to me like toInt's promise to give the value of the literal in a non-confusing way cannot be fulfilled without a lot of work. Maybe take another approach (that was suggested before) and just return long for toInt so that this issue is fixed, but also rename the method to something that says "careful, doesn't include the sign!"

@oxisto

This comment has been minimized.

Copy link
Author

@oxisto oxisto commented Jan 16, 2020

just return long for toInt so that this issue is fixed

The same applies for the long literal expression as well btw, in this case with -9223372036854775808 where 9223372036854775808 does not fit in a long anymore and needs a BigInteger :(

@MysterAitch

This comment has been minimized.

Copy link
Collaborator

@MysterAitch MysterAitch commented Jan 16, 2020

I'm not fully clued in (get the disclaimer in early ;) ) but would it be appropriate to introduce a method such as toNumberPart() (indicating it is only the "number part" not the sign) or toModuloNumber()/toAbsoluteNumber() (more explicitly indicating it will return a positive number).

The interface of such a method is that that it will return a value of type Number and is documented that it will first attempt to return the same concrete number type as the one in the source code (but will "scale up" as required)..?

This can then leave the more specific toInt etc methods who shall throw an exception in the cases mentioned in this issue, without necessarily needing to deprecate them or change them in any way.

@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 16, 2020

OK... I am going to endeavor to explain why I think unifying the UnaryExpr + IntegerLiteralExpr is the best solution to this problem (and my earlier ideas while tactical were not strategic & holistic)

At first, I thought the idea was that we wanted to minimize the "scope" of the problem, because (when we have a situation where we have a UnaryExpr parent and IntegerLiteralExpr child) a bug manifests itself in a single use case Integer.MIN_VALUE.

However doing some use cases, I think the issue of having a parent UnaryExpr with child IntegerLiteralExpr has other drawbacks that we should consider holistically...
(in BOTH evaluating and modifying the AST)

we CAN add or deprecate new methods to change what we return (i.e. a long, or Long or Number), but to me this seems like we're kicking the can down the road because, this particular bug can manifest in more confusing and harder to identify ways (especially if we consider modifying the AST):

first off, @MysterAitch why I think returning a Number (while it does fix the issue at hand) but might not be a fix to this problem... if because we'll still have this weird phenomenon:

(in the Long.MIN_VALUE case) within a LongLiteralExpr, we'll still get a NumberFormatException (or we'd have to return a BigInteger) in this situation (because breaking apart the UnaryExpr sign from the LongLiteralExpr

CompilationUnit cu = StaticJavaParser.parse("class C { long l = -9223372036854775808L; }");

...we break apart this number into this parent child:
com.github.javaparser.ast.expr.UnaryExpr "-9223372036854775808L" (parent)
com.github.javaparser.ast.expr.LongLiteralExpr "9223372036854775808L" (child)
and if we ask for the value of the child LongLiteralExpr we can't b/c 9223372036854775808 is outside the Long range (-9223372036854775808, 9223372036854775807) & wed have to return a BigInteger (ugg) ...this'd be the ONLY case where BigInteger would be returned, (as mentioned above by @oxisto)

Mutation & Atomicity
the other scenario where I think modelling and evaluating things as (sometimes a parent/child (UnaryExpr/IntegerLiteralExpr node, sometimes just a IntegerLiterlExpr node), is mutation... since the great part about JP is that the nodes ARE mutable, we can always MANUALLY set the value setInt(), setLong() with an int or long, but then you need to also change those methods to do strange things to be consistent (i,e.:

CompilationUnit cu = StaticJavaParser.parse("class C { int = -100; }");
cu.findFirst(IntegerLiteralExpr.class).get().setInt(100);

above... when we setInt() we would need to change the code internally to REMOVE the UnaryExpr Operator.MINUS parent node, otherwise the assignment has no effect, currently if we do this:

CompilationUnit cu = StaticJavaParser.parse("class C { int i = -100; }");
cu,findFirst(IntegerLiteralExpr.class).get().setInt(100); //we thought we change -100 to 100

the result code is:

class C {int i = -100;}

but what we expected is:

class C {int i = 100;}

...to have the setInt() method to modify the AST it now needs to do scary things like replacing the parent node with a modified version of the child (Collapse them from a parent/child UnaryExpr/IntegerLiteralExpr to a IntegerLiteralExpr.

Non-uniformity (parsing code verses building code manually)
Because IntLiteralExpr is manually buildable or parseable from source, we have (2) separate ways of representing the same thing, and that makes it "non-uniform" (i.e. opens up all kinds of corner cases)

FieldDeclaration parsed = (FieldDeclaration)
	StaticJavaParser.parseBodyDeclaration("int i = -1;");
Ast.describe(parsed );
/*  here we describe the AST for  "int i = -1
com.github.javaparser.ast.body.FieldDeclaration "int i = -1;"
com.github.javaparser.ast.body.VariableDeclarator "i = -1;
com.github.javaparser.ast.type.PrimitiveType "int"
com.github.javaparser.ast.expr.SimpleName "i"
com.github.javaparser.ast.expr.UnaryExpr "-1"
com.github.javaparser.ast.expr.IntegerLiteralExpr "1"
*/		

FieldDeclaration cloned = parsed.clone();
cloned.getVariable(0).setInitializer(new IntegerLiteralExpr("-1"));
Ast.describe(cloned );
/*
com.github.javaparser.ast.body.FieldDeclaration "int i = -1;"
com.github.javaparser.ast.body.VariableDeclarator "i = -1"
com.github.javaparser.ast.type.PrimitiveType "int"
com.github.javaparser.ast.expr.SimpleName "i"
com.github.javaparser.ast.expr.IntegerLiteralExpr "-1"
 */

So... The above problems just go away if we unify the UnaryExpr with the child to only have a single node (IntLiteralExpr, LongLiteralExpr, DoubleLiteralExpr) represent a single literal and even though the number is stored as a string:

  1. we dont run into the overflow problem. with Integer.MIN_VALUE, Long.MIN_VALUE, or Double
  2. we can modify it "atomically" (no need to check parent or modify AST)
  3. it will be more uniform whether we read it from source of create manually
  4. we can read it and evaluate the string in place (if we parse it from a source string or create it manually it'd be the same)

I can't seem to find flaws with the (Unify) approach for Literals, but perhaps people can explain some cases where my logic is faulty (assuming I did a good enough job explaining)

@matozoid

This comment has been minimized.

Copy link
Member

@matozoid matozoid commented Jan 17, 2020

Okay, my last comment was about how this approach is flawed and I see nothing to counter it, so I'm not yet convinced.

Note that JP defines literals according to the JLS right now: there should not be a minus. In your suggestion, I think we should let setInt generate a UnaryExpr when the number is negative :-)

@edefazio

This comment has been minimized.

Copy link

@edefazio edefazio commented Jan 17, 2020

@matozoid
I ran a program with this:

		System.out.println( Integer.MIN_VALUE );
		System.out.println( -( Integer.MIN_VALUE) );	

and while it compiles, runs, it returns the wrong value...
-2147483648
-2147483648

The EnclosedExpr situations you mentioned,
-( 1)
-(1 + 0)
highlight the point I am failing to make. There should be ONE way to represent these syntactic grouping of characters in a AST, ( -( 1) -(1 + 0) ) so we can evaluate holistically.

There should also be ONLY ONE way to represent the expression: -1 but right now there are (2). and If I try to read through an AST to find ALL -1 s I have to account for:

  1. IntLiteralExprs that have the value "-1" (possibly because it was created manually)
  2. UnaryExprs that have Operator.MINUS and a child that is an IntegerLiteralExpr that has a value "1" (the way it is being parsed right now)

BECAUSE we are considering to have IntegerLiteralExprs eval or toInt methods to be "context aware"/ know about their parent (and NOT contain the sign on the literal) we create an issue... I mean if I did this... why wouldnt we ALSO consider evaling this when we called toInt():
100 >>> 3
because it follows the same logic...

Separating the UnaryExpr and `IntegerLiteralExpr / LongLiteralExpr / DoubleLiteralExpr
we are allowing a modelling issue to occur and manifests itself in many ways. specifically there are multiple (non-uniform) ways to represent the exact same letter for letter syntactic code (negative literal numbers).

I think I have failed...(maybe someone else can take this one)

@MysterAitch

This comment has been minimized.

Copy link
Collaborator

@MysterAitch MysterAitch commented Jan 18, 2020

I need some additional time to digest this but I can see where @edefazio is coming from.

The naming of IntLiteralExpr (and others) in edge cases like this is something that has the potential to cause significant headache until the nuances of number representation in binary and within a syntax tree are highlighted.

For example, to a lay-reader if asked about which part of int x = 10; is the IntLiteralExpr, I suppose that most will correctly pick out the 10 part.

If asked to do the same for int x = -10; (and as a follow-up, what happens if we swap the IntLiteralExpr part with -20 or 50), things get a bit muddier because it's not immediately clear whether the "expression" includes the sign -- -10 IS a "literal integer" after all, so why wouldn't -10 become 50 and -10 become -20 (as opposed to --20 or whatever?)....

Given this, perhaps the solution is to do away with IntLiteralExpr (and others) and make it blatant that this is (should be?) an unsigned value. Similarly, in the set methods, perhaps a potential solution here is to only accept unsigned values -- either erroring or raising warnings when passed a signed (or negative) value?

@oxisto

This comment has been minimized.

Copy link
Author

@oxisto oxisto commented Jan 18, 2020

It is not necessarily constrained to „unsigned“/positive value since you can represent negative values directly in an int/long literal by using the binary or hexadecimal representation such as 0xffff_ffff (this is -1) without any unary expression.

Hopefully I have some time over the weekend to propose the solution where we add a asNumber which returns the value of it in an appropriate type that scales „up“ with the content in a PR.

@matozoid

This comment has been minimized.

Copy link
Member

@matozoid matozoid commented Jan 18, 2020

It seems like every solution has its pro's and cons, so I welcome any PR now :-) Just as long as we can give users a fool-proof, obvious way to get that literal turned into a numeric value.

@oxisto oxisto linked a pull request that will close this issue Jan 18, 2020
1 of 3 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.