Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support embellishments and non-punctuation delimiters #42

Merged
merged 24 commits into from
Nov 26, 2023

Conversation

theseanl
Copy link
Contributor

@theseanl theseanl commented Sep 6, 2023

Previously, it couldn't attach arguments for macros whose argspec

  • involves embellishments e{_}, E{^}{default},
  • or non-punctuation delimiters such as r^_

This PR adds support for it. The second use-case may be marginal, I initially tried to make embellishments work, then realized that the same code would enable the second feature and unifies some codes.

Since a single embellishment argspec can represent multiple arguments, such as in e{^_} where each token ^ and _ would represents 2 arguments,
gobbleSingleArguments signature was changed to return Ast.Arguments | Ast.Arguments[] | null.

Currently, this just puts embellishment tokens as openMark in Ast. Would it be preferable to have a new separate property for embellishments?

Next goal would be to support/test macro expansions for such arguments. Once it's done, I'd like to work on #24.

Please take a look and provide comments, and feel free to make your own changes!

@theseanl theseanl changed the title feat: support embellishments and non-brace delimiters feat: support embellishments and non-punctuation delimiters Sep 6, 2023
@siefkenj
Copy link
Owner

siefkenj commented Sep 6, 2023

Thanks for the contribution!

It looks like you have some tests failing...

Here are some thoughts:

  1. It would be really nice not to change the signature of gobbleSingleArgument, but I cannot think of a good way around it. One possibility is to change it to always return [] | null
  2. I think more of the existing infrastructure can be reused. The algorithm for e{x} should be very similar to tx m. I think you can reuse the algorithms for grabbing those arguments. A possible idea for e{xyz} is that you keep a record of whether x,y,z has been captured. If one of them has been, you can re-call the function with y,z (or whatever hasn't yet been captured). This could potentially result in a way to keep gobbleSingleArgument returning a singleton. If it always returned the first argument it could gobble, then gobbleArgument would know that on a type=="embellishment" argument, it would need to call until nothing was gobbled. Then then it could rearrange the argument list to appear in the order specified by the oringinal macro spec.
  3. It would be nice to keep e{^} consistent with the parsing of ^. Currently ^ parses as a macro with escapeToken == "^". I think it's reasonable to parse e{x} as
{
                    "type": "argument",
                    "content": [
              {
                    "type": "macro",
                    "content": "^",
                    "escapeToken": "",
                    "args": [
                        {
                            "type": "argument",
                            "content": [
                                {
                                    "type": "string",
                                    "content": "2"
                                }
                            ],
                            "openMark": "{",
                            "closeMark": "}"
                        }
                    ]
                }
                ]
                openMark: "",
                closeMark: ""

That solution is not exactly elegant, though...

@theseanl
Copy link
Contributor Author

theseanl commented Sep 6, 2023

Regarding re-using existing algorithm for t

I wasn't aware that optionalToken shares similar traits. However, the way I see it is the other way around: Parsing optionalToken should use the new infrastructure findBracePositions.
A convoluted example, but the current algorithm for t can't correctly figure out arguments for the following:

\NewDocumentCommand{\asdf}{ t^ r__ }{\IfBooleanTF{#1}{123}{456} #2}
\asdf^_789_

It should be expanded to 123 789. The new findBracePositions algorithm was implemented basically to solve cases like this.
I will fix this and include it as a test case.

Signature of gobbleSingleArgument

One idea I have thought was to artificially decompose e{^-_} into e{^} e{-} e{_}. However it seemed that signatures are intimately tied to the raw .tex source and ArgSpec is assumed to be a faithful representation of it, there is even a warning

This function caches results. Don't mutate the returned AST!

A possible idea for e{xyz} is that you keep a record of whether x,y,z has been captured. If one of them has been, you can re-call the function with y,z (or whatever hasn't yet been captured). This could potentially result in a way to keep gobbleSingleArgument returning a singleton

It seems doable, but then the caller of gobbleSingleArgument should maintain some kind of state, about embellishment tokens we have matched until now, its index, and so on. gobbleSingleArgument will need a 4th argument startPosInStartPos, which would indicate the starting position inside the current node's content (if it is Ast.String node). Maintaining state would be done inside the body of gobbleArguments, but it doesn't look like the right place for that logic, because any serious logic for argspec is in gobbleSingleArgument function body...

If you prefer to have its return type Ast.Arguments[] | null, then I'll refactor that.

Tests

I guess it should be passing now. For some reason, tests in unified-latex-cli fail due to timeout on my machine, so I can't test it but the rest of tests were passing on my end.

Besides, I have made some tests for until argspecs skip, because they were failing on my end in the current HEAD.

It appears that e{{_}} has the same effect as e{_}. e{{{_}}} triggers an
error. This commit reflects this behavior in `gobbleSingleArgument`
function.
@siefkenj
Copy link
Owner

siefkenj commented Sep 6, 2023

I think the real issue with changing e{xy} to e{x} e{y} is that, I believe that e{xy} allows x/y to occur in either order, where e{x} e{y} would not.

The mutation warning is just that: a warning against mutating. However, if you create a new ArgSpec or make a deep copy and mutate the deep copy, it will be fine.

I'm leaning more towards making gobbleSingleArgument no return an array and moving some looping logic into gobbleArgument.

@theseanl
Copy link
Contributor Author

theseanl commented Sep 8, 2023

I think the real issue with changing e{xy} to e{x} e{y} is that, I believe that e{xy} allows x/y to occur in either order, where e{x} e{y} would not.

I wasn't aware of that! Thank you for pointing it out. I've updated code to implement this behavior.
Now given with this order-agnostic behavior of embellishment, moving the looping logic to gobbleArguments feels slightly more natural. However it's still annoying to pass around matched embellishment token and its index between gobbleArguments and gobbleSingleArgument, so I kept the signature in the end. The distinction of gobbleArguments and gobbleSingleArgument may be unnecessary.

Also I've fixed optionalToken and added a test for \asdf^_789_ as promised.

gobbleSingleArguments return values
@siefkenj
Copy link
Owner

siefkenj commented Sep 8, 2023

FYI, I just merged #45 which switches the tests to vitest. They might run successfully on your computer now :-)

@theseanl
Copy link
Contributor Author

theseanl commented Sep 9, 2023

until also manifests the same issue as optionalToken, so I went on to fix that as well :)
If you look at the code, it should be obvious where I am going. We will be able to handle another edge case: #46.
How does it look? What's your thought on gobbleSingleArgument? Also could you elaborate more on escapeToken == "^" (if this is still an issue)?

@siefkenj
Copy link
Owner

Thanks for this! I left a few comments. Also, please run Prettier on all modified files.

I'd still like to move some of the complexity to gobbleArguments and have gobbleSingleArugment only return one thing rather than an array.

@theseanl
Copy link
Contributor Author

theseanl commented Sep 13, 2023

I've added eslint and added curly rule which enforces statements in if ... else to be enclosed in a separate block.
I couldn't find a prettier config, and I doubt it's being consistently used in codes in this repository, for example prettier seems to have some peculiar rules for trailing commas and it is not consistently enforced.
I've tried to add one that matches the overall coding style of the repository. It revealed some obvious "error"s (in the sense of prettier's) whose fix was trivial.

Regarding refactoring gobbleSingleArgument, I think it's doable and I'll try to do it soon.

.prettierrc.json Outdated Show resolved Hide resolved
.eslintrc.js Outdated Show resolved Hide resolved
packages/structured-clone/index.ts Show resolved Hide resolved
@siefkenj
Copy link
Owner

Thanks for the linter updates! Let me know when this is ready for another review :-)

Now gobbleSingleArgument gobbles single argument for each run, and
may mutates its parameter argspec to remove gobbled arguments.
@theseanl
Copy link
Contributor Author

Refactored gobbleSingleArgument. I don't really like the additional complexity introduced by artificially separating logic into two places, but I guess I can live with it. Now gobbleArguments always create a structuredClone of the argSpec because gobbleSingleArgument may mutate it. It may as well check for spec.type === 'embellishment' and create a clone only in that case, but I wanted to make gobbleArgument agnostic to individual argspec types.

@theseanl
Copy link
Contributor Author

Also I think the current tsconfig is somewhat faulty,

cd packages/unified-latex
npm run build

fails with TS errors which shouldn't be an error, and describe and expect isn't resolved in vscode.

@siefkenj
Copy link
Owner

Also I think the current tsconfig is somewhat faulty,

cd packages/unified-latex
npm run build

fails with TS errors which shouldn't be an error, and describe and expect isn't resolved in vscode.

Did you do a general build first? If you haven't built the dependencies, it will fail.

I have vitest/global listed in tsconfig.build.json. That is supposed to explain to typescript that describe and it, etc.. all exist. I don't know why it doesn't work...

@theseanl
Copy link
Contributor Author

Turned out that doing a general build fixes the issue with npm run build. It initially failed, so I fixed some type errors in structured-clone that were causing the issue.

Regarding tsconfig, tsconfig.build.json explicitly excludes any test files, and any other tsconfig inherits from it, so IDE think test files are not part of the project and doesn't see vitest/global defined in tsconfig while trying to resolve types.

IMO, tsconfig.build.json should inherit from tsconfig.json, and excluding test files should be done only in tsconfig.build.json file. Each package directory should contain two tsconfig.build.json and tsconfig.json, and npm run build command should invoke tsc with tsc -b tsconfig.build.json. I'm not confident that I'll be able to change every relevant part, so it'd be nice if you could fix it :)

Copy link
Owner

@siefkenj siefkenj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really close!

@@ -18,7 +18,7 @@ export const argumentParser: ArgumentParser = (nodes, startPos) => {
const { argument: optionalArg, nodesRemoved: optionalArgNodesRemoved } =
gobbleSingleArgument(nodes, argSpecO, startPos);

let codeArg: Argument | null = null;
let codeArg: Argument | Argument[] | null = null;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed anymore, right?

} = gobbleSingleArgument(nodes, OPTIONAL_ARGUMENT_ARG_SPEC, pos) as {
argument: Argument;
nodesRemoved: number;
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is casting necessary?

type: "embellishment";
embellishmentTokens: string[];
embellishmentTokens: (Group | string)[];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a group needed here? It seems like maybe the parser should be fixed to never return a group; e{xy} should return embellishment tokens ["x", "y"]. Or is the group for default args..?

Copy link
Contributor Author

@theseanl theseanl Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Group here is for e{{x}{y}}. I agree with you that this should be fixed at pegjs level alongside with #46. But working on pegjs would broaden the scope of this pr too much, I'd prefer to fix it in another PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just merged #49 which should mean that groups are no longer returned as embellishmentTokens.

const tokens = normalizeEmbellishmentTokens(
argSpec.embellishmentTokens
);
argSpec.embellishmentTokens = tokens; // ArgSpec is mutated here
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't mutate argSpec in this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for example, are you OK with mutating it in gobbleArguments function? What exactly is the goal you want by not mutating it here?

}
);
currPos = bracePos[1] + 1;
matchNum = i + 1; // 1-based indices
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can move this complexity to gobbleArguments. Since gobbleArguments knows that it's an embellishment and openMark === <embellishment token>, gobbleArguments can put the arguments in the correct order once they're acquired.

@siefkenj
Copy link
Owner

IMO, tsconfig.build.json should inherit from tsconfig.json, and excluding test files should be done only in tsconfig.build.json file. Each package directory should contain two tsconfig.build.json and tsconfig.json, and npm run build command should invoke tsc with tsc -b tsconfig.build.json. I'm not confident that I'll be able to change every relevant part, so it'd be nice if you could fix it :)

Could you do a separate PR that shows what you're talking about in a couple of the workspaces? The tsconfig business is really a pain and I've been trying to figure out how to get it right for a long time...

Alternatively, importing describe, it and friends in each test file will get rid of the typescript warnings.

@siefkenj
Copy link
Owner

siefkenj commented Sep 17, 2023 via email

@siefkenj
Copy link
Owner

siefkenj commented Sep 17, 2023 via email

@siefkenj
Copy link
Owner

siefkenj commented Oct 8, 2023

@theseanl Were you planning on finishing the last changes?

@theseanl
Copy link
Contributor Author

theseanl commented Oct 9, 2023

I need to eventually get it done, because it is needed for my other works, but I've been rather busy and less motivated to refactor an already working code to meet a goalpost that keeps moving. Conceptual distinction of singleton and multiple objects feel artificial to me, I even think that such a tendency may be influenced by one's linguistic backgrounds.. I'd be ok if you take over and address the points you've mentioned in reviews. I'll try to find some time within a few weeks.

@siefkenj
Copy link
Owner

@theseanl Thanks for your hard work! I have fixed the last of the issues. I couldn't figure out how to get pushing to your branch working, so I created PR #57 You can either pull from that branch, or I can close this PR and merge #57 instead.

@siefkenj siefkenj merged commit 8764415 into siefkenj:main Nov 26, 2023
1 check passed
@theseanl
Copy link
Contributor Author

Thank you for picking up the slack!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants