I am afraid I have trouble understanding the training data and I cannot find any explanation in the README or paper. #23

jparsert · 2022-03-22T22:32:31Z

For example, what does a line like this mean:
2|sub Y' add INT+ 4 mul INT+ 2 pow add x sqrt INT+ 3 INT- 1 add mul INT+ 2 ln add x sqrt INT+ 3 mul INT+ 4 x
What is "sub Y'"? What are the operators "INT+" and "INT-": are they just integer addition/subtraction? And why do you differentiate between those and sub/add? Also, where do I get the training pairs? There is only one expression per line, not two.


Ryan-Rhys · 2023-01-21T00:40:15Z

@jparsert I don't suppose you figured this out?

jparsert · 2023-01-21T00:42:03Z

No, I did not. Do you have any further information?

Ryan-Rhys · 2023-01-21T01:27:26Z

After some digging it seems as though the tab separates the (x, y) pairs and each line is given in the "Polish notation" for converting a tree (like the one given in section 2.1 of the paper) into a sequence:

https://en.wikipedia.org/wiki/Polish_notation

There's a small note in section 2.2.

I'm not sure what "sub Y'" is, but I think it can be ignored.
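To make the tab/Polish-notation reading concrete, here is a minimal sketch (not code from the repository) that splits a line on the tab and turns each side into a nested tuple. The arity table and the names `parse_prefix` and `parse_pair` are made up for illustration, and the integer handling assumes that a sign token (INT+ or INT-) is followed by one or more digit tokens.

```python
# Operator arities: an assumption covering only the tokens seen in this thread.
ARITY = {"add": 2, "sub": 2, "mul": 2, "div": 2, "pow": 2,
         "sqrt": 1, "exp": 1, "ln": 1}


def parse_prefix(tokens, pos=0):
    """Parse the expression starting at tokens[pos]; return (tree, next_pos)."""
    tok = tokens[pos]
    if tok in ARITY:
        children = []
        pos += 1
        for _ in range(ARITY[tok]):
            child, pos = parse_prefix(tokens, pos)
            children.append(child)
        return (tok, *children), pos
    if tok in ("INT+", "INT-"):
        # Assumption: a sign token is followed by one or more digit tokens.
        end = pos + 1
        while end < len(tokens) and tokens[end].isdigit():
            end += 1
        value = int("".join(tokens[pos + 1:end]))
        return (-value if tok == "INT-" else value), end
    return tok, pos + 1  # anything else is treated as a variable, e.g. x


def parse_pair(line):
    """Split one line on the tab and parse both sides into trees."""
    x_text, y_text = line.rstrip("\n").split("\t")
    x_tree, _ = parse_prefix(x_text.split())
    y_tree, _ = parse_prefix(y_text.split())
    return x_tree, y_tree


x_tree, y_tree = parse_pair("pow x INT+ 3\tmul div INT+ 1 INT+ 4 pow x INT+ 4")
print(x_tree)  # ('pow', 'x', 3)
print(y_tree)  # ('mul', ('div', 1, 4), ('pow', 'x', 4))
```

Because each operator consumes a fixed number of arguments, the prefix sequence needs no parentheses, which is why the files contain none.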

Ryan-Rhys · 2023-01-21T01:46:57Z

Is my interpretation of the first line in prim_fwd_test

frieders · 2023-01-21T21:36:40Z

Ryan is spot on, this is how to parse these things.

sub Y' and everything that comes before it can be ignored. I guess the number is just the count of identical trees they generated (because it seems that in their setup they couldn't exclude generating the same tree multiple times), which I assume has implications for the probability of occurrence, but it can be ignored for this question. INT+ 5 of course just means the integer 5.

The trick is to start reading the expression from the right (which corresponds to the bottom of the tree). Here are some further expressions, you get the hang of them after parsing a few:

Line 2 from the file prim_fwd.test:

pow x INT+ 3         mul div INT+ 1 INT+ 4 pow x INT+ 4

which, following Ryan, we can write as

(pow x INT+ 3, mul div INT+ 1 INT+ 4 pow x INT+ 4)

which is in normal LaTeX notation

(x^{3}, \frac{1}{4}\cdot x^{4})

Line 8 from the file prim_fwd.test:

add pow x INT+ 2 exp x	     add mul div INT+ 1 INT+ 3 pow x INT+ 3 exp x

we can write as

(add pow x INT+ 2 exp x, add mul div INT+ 1 INT+ 3 pow x INT+ 3 exp x)

which is in normal LaTeX notation

(x^{2}+\exp(x), \frac{1}{3}\cdot x^{3}+\exp(x))

You can check that these conversions are correct by noticing that integrating the left-hand side always gives the right-hand side (which is what this dataset is about). With this information it becomes clear that you actually just need to understand either side of the pair (X, Y), since you can integrate X to get Y, or differentiate Y to get X.
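The "differentiate Y to get X" check can also be done numerically. The sketch below is an illustration only (standard library, not code from the repository): it evaluates a prefix sequence by scanning it from the right with a stack, then compares a central finite difference of Y against X at one sample point. The function names, the sample point, and the tolerance are arbitrary choices, and only the operators seen in this thread are handled.

```python
import math

BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
    "pow": lambda a, b: a ** b,
}
UNARY = {"sqrt": math.sqrt, "exp": math.exp, "ln": math.log}


def eval_prefix(expr, x):
    """Evaluate a prefix expression at x, reading the tokens right to left."""
    stack, digits = [], []
    for tok in reversed(expr.split()):
        if tok.isdigit():
            digits.insert(0, tok)  # digits are met last-to-first
        elif tok in ("INT+", "INT-"):
            value = int("".join(digits))
            digits.clear()
            stack.append(-value if tok == "INT-" else value)
        elif tok in BINARY:
            a = stack.pop()  # first operand (it sits closest to the operator)
            b = stack.pop()  # second operand
            stack.append(BINARY[tok](a, b))
        elif tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        elif tok == "x":
            stack.append(x)
        else:
            raise ValueError(f"unknown token: {tok}")
    (result,) = stack  # a well-formed expression leaves exactly one value
    return result


def derivative_matches(x_expr, y_expr, at=1.3, h=1e-5, tol=1e-4):
    """Check dY/dx == X at a single point using a central difference."""
    slope = (eval_prefix(y_expr, at + h) - eval_prefix(y_expr, at - h)) / (2 * h)
    return abs(slope - eval_prefix(x_expr, at)) < tol


print(derivative_matches("pow x INT+ 3",
                         "mul div INT+ 1 INT+ 4 pow x INT+ 4"))  # True
print(derivative_matches("add pow x INT+ 2 exp x",
                         "add mul div INT+ 1 INT+ 3 pow x INT+ 3 exp x"))  # True
```

A single sample point is only a sanity check; a symbolic tool would be needed to confirm a pair exactly.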

(@jparsert greetings from another place from Oxford ;) )

jparsert · 2023-01-25T20:16:31Z

I did already figure out the Polish notation bit and also wrote a parser that returns a nice AST. If you are interested I can give you that, although it's really not much, only a couple of lines of code. Do you intend to extend or improve this work?

friederrr · 2023-03-08T20:39:49Z

I don't intend to improve this particular line of work; we were mainly interested in this data to evaluate ChatGPT (see https://arxiv.org/abs/2301.13867) because it is unlikely to have been directly in the training data of an LLM that has been trained on publicly scraped mathematical data. But it would be worth a thought to perhaps automate this, for which an AST would then be useful. I don't really have a specific interest in going along this route at the moment, though if you are interested in having a chat, please do let me know.

f-charton · 2023-03-08T21:31:24Z

The prefix_to_infix() function in src/char_sp.py should do this for you. It takes a prefix (i.e. Polish) notation sequence and rewrites it "as written by humans", with parentheses. To use it, you need to load char_sp.py, then instantiate the class CharSPEnvironment, then call prefix_to_infix() on the prefix notation list.

As for the line
2|sub Y' add INT+ 4 mul INT+ 2 pow add x sqrt INT+ 3 INT- 1 add mul INT+ 2 ln add x sqrt INT+ 3 mul INT+ 4 x
The 2 | is just a line number and a separator.
sub Y' add INT+ 4 mul INT+ 2 pow add x sqrt INT+ 3 INT- 1 is the input sequence. It reads
sub (Y') (add (INT+ 4) (mul (INT+ 2) (pow (add x (sqrt (INT+ 3))) (INT- 1))))
which corresponds (in infix notation) to
Y' - (4 + (2 * (x + sqrt(3))^-1)), with =0 assumed at the end of the expression (I think we mention this in the paper),
i.e. Y' = 4 + 2 / (x + sqrt(3)). We want to integrate the function 4 + 2/(x + sqrt(3)).

add mul INT+ 2 ln add x sqrt INT+ 3 mul INT+ 4 x is the output sequence, which means 2 * ln(x + sqrt(3)) + 4*x

sub, add and mul are the binary operations -, + and *.
INT+ and INT- are the positive and negative signs used when writing integers.
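Putting the pieces of this explanation together, a rough standalone sketch for reading such a line might look as follows. This is not the repository's prefix_to_infix(); `to_infix` and `parse_line` are hypothetical names, only the operators seen in this thread are covered, and the input and output sequences are assumed to be separated by a tab (which the rendering of this thread does not show).

```python
BINARY = {"add": "+", "sub": "-", "mul": "*", "div": "/", "pow": "^"}
UNARY = {"sqrt", "exp", "ln"}


def to_infix(tokens, pos=0):
    """Convert the prefix expression at tokens[pos]; return (text, next_pos)."""
    tok = tokens[pos]
    if tok in BINARY:
        left, pos = to_infix(tokens, pos + 1)
        right, pos = to_infix(tokens, pos)
        return f"({left} {BINARY[tok]} {right})", pos
    if tok in UNARY:
        arg, pos = to_infix(tokens, pos + 1)
        return f"{tok}({arg})", pos
    if tok in ("INT+", "INT-"):
        # Assumption: the sign token is followed by one or more digit tokens.
        end = pos + 1
        while end < len(tokens) and tokens[end].isdigit():
            end += 1
        digits = "".join(tokens[pos + 1:end])
        return ("-" if tok == "INT-" else "") + digits, end
    return tok, pos + 1  # variable such as x or Y'


def parse_line(line):
    """Return (line_number, integrand, solution) as infix strings."""
    number, rest = line.rstrip("\n").split("|", 1)
    source, target = rest.split("\t")  # assumed tab between input and output
    src_tokens = source.split()
    # "sub Y' f" encodes Y' - f = 0, so dropping the prefix leaves f.
    if src_tokens[:2] == ["sub", "Y'"]:
        src_tokens = src_tokens[2:]
    integrand, _ = to_infix(src_tokens)
    solution, _ = to_infix(target.split())
    return int(number), integrand, solution


line = ("2|sub Y' add INT+ 4 mul INT+ 2 pow add x sqrt INT+ 3 INT- 1"
        "\tadd mul INT+ 2 ln add x sqrt INT+ 3 mul INT+ 4 x")
number, integrand, solution = parse_line(line)
print(number)     # 2
print(integrand)  # (4 + (2 * ((x + sqrt(3)) ^ -1)))
print(solution)   # ((2 * ln((x + sqrt(3)))) + (4 * x))
```

The output is fully parenthesized on purpose, so no operator-precedence rules are needed to read it back.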

friederrr · 2023-03-08T21:51:41Z

Thanks!

After concentrating a bit we did figure out the puzzle that was 2|sub Y' add INT+ 4 mul INT+ 2 pow add x sqrt INT+ 3 INT- 1 for us. ;)

Having the prefix_to_infix() function already coded, maybe we will proceed after all to do the automatic evaluation ...
