Implement QqLaTeXFormatter #2

Open
ischurov opened this issue Dec 1, 2016 · 15 comments

@ischurov
Owner

ischurov commented Dec 1, 2016

Implement simple qqDoc → LaTeX formatter.

@dorofeefff
Collaborator

I added a 'qqlatex' file to the qqmbr directory. There are a few issues that I don't yet know how to resolve:

  1. What is the correspondence between qq tags and LaTeX tags? So far, I've added handlers for h1, h2, and paragraph, and I assume that h1 = chapter, h2 = section, paragraph = subsection. Is that correct or do you have something else in mind?
  2. I can't deal with labels correctly. Take a look at 'handle_paragraph' in qqlatex. It doesn't work, and I assume that is because the command 'tag.find('\label')' is incorrect. I don't know how to resolve it though.
  3. The program ignores tabs that are present in the original text. Is that OK or do we need to fix it? The original tabs are probably lost during parsing, but I guess we could recreate them. So far I can't, and each '\begin{...}' starts at the beginning of a new line (see the screenshot attached). Should I add some kind of variable for 'levels', i.e. how many generations deep a given tag is? Then we could do something like:
    return ' ' * level + 'begin...'
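
The 'level' idea can be sketched roughly as follows. This is only an illustration: it works on plain nested lists (a string, or a list whose first item is the tag name), not on the real qqmbr tree, and `format_node`, `level` and `indent_step` are hypothetical names chosen for this sketch.

```python
def format_node(node, level=0, indent_step=4):
    """Render a nested-list tree as LaTeX-like text with indentation.

    `node` is either a string or a list ['tagname', child, child, ...].
    All names here are hypothetical, not part of qqmbr.
    """
    pad = " " * (indent_step * level)
    if isinstance(node, str):
        # Re-indent every non-empty line of plain text.
        return "".join(pad + line.strip() + "\n"
                       for line in node.splitlines() if line.strip())
    name, *children = node
    # Children are rendered one level deeper than their parent.
    body = "".join(format_node(child, level + 1, indent_step)
                   for child in children)
    return f"{pad}\\begin{{{name}}}\n{body}{pad}\\end{{{name}}}\n"


tree = ["theorem", "Some statement\n", ["proof", "Obvious.\n"]]
indented = format_node(tree, indent_step=2)
print(indented)
```

Here every tag increases the level; which tags should actually do so is a separate decision.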

@ischurov
Owner Author

ischurov commented Dec 9, 2016

  1. I believe we have to introduce a dictionary (as an attribute of the QqLaTeXParser class) for this correspondence. It can be initialized as you propose, but the user will be able to set it differently. One can also add a boolean option to the constructor of QqLaTeXParser, like h1_is_section: if it is True, then h1=section, h2=subsection, h3=subsubsection, h4=paragraph. It can be False by default.
  2. Try tag.find('label') and let me know if it does not work. No backslash is needed here because we are searching by tag name, and the backslash is not part of the name.
  3. Yes, these indents are lost during parsing; this is by design. I believe it is worth adding a way to restore them. I'd propose adding an integer option like indent_step, with a corresponding attribute of the parser object, to specify how many spaces one indent step is. As the parser is recursive, you have to pass the current level to the handle/format functions as an argument each time you invoke them. We also have to decide which tags will increase this level (I suppose enumerated environments and equation-like environments, just to begin with). Anyway, this feature is not a top priority right now. The top priority is to make the output TeX'able.
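
A minimal sketch of the dictionary with the h1_is_section option might look like this. The class name `LaTeXHeadingMap` is hypothetical, and the default mapping for h3 and h4 when h1_is_section is False is an assumption (only h1 = chapter and h2 = section were mentioned above).

```python
class LaTeXHeadingMap:
    """Hypothetical helper: maps qqDoc heading tags to LaTeX commands."""

    def __init__(self, h1_is_section=False, tag_to_latex=None):
        if h1_is_section:
            levels = ["section", "subsection", "subsubsection", "paragraph"]
        else:
            # Assumed continuation of h1 = chapter, h2 = section.
            levels = ["chapter", "section", "subsection", "subsubsection"]
        self.tag_to_latex = {
            f"h{i}": command for i, command in enumerate(levels, start=1)
        }
        # User-supplied entries override the defaults.
        if tag_to_latex:
            self.tag_to_latex.update(tag_to_latex)


default_map = LaTeXHeadingMap()
article_map = LaTeXHeadingMap(h1_is_section=True)
custom_map = LaTeXHeadingMap(tag_to_latex={"paragraph": "subsection"})
```

Keeping the mapping as a plain instance attribute means a user can still change single entries after construction.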

@dorofeefff
Collaborator

dorofeefff commented Dec 16, 2016

  1. Dictionary added. Seems to work ok. I didn't really understand how to do the default thing. Could you elaborate please?
  2. I modified find() as you said, but it still doesn't work. I guess we are applying find() to the wrong object. See the screenshot attached. Here tag == root[0]
    [Screenshot, 2016-12-16 13:51:50]
  3. I'll postpone this question for now then.

@ischurov
Owner Author

  1. It seems that you forgot to commit the new version, so it's difficult to answer.
  2. It seems that the parser doesn't know that label is an allowed tag. The standard protocol for passing this information to the parser is as follows. The formatter's uses_tags() method should return a set of all tags that the formatter knows how to handle. Then we should feed the parser instance with this set. In QqHTMLFormatter.uses_tags() we use some hacks to get the list of allowed tags from the handlers and their docstrings. You can check how it is done there and port it (most probably just copy-paste) to the new formatter.
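
One simple way to build such a set is to collect the names of all handle_* methods. The sketch below is not the code of QqHTMLFormatter (which also looks at docstrings); `SketchFormatter` and its handlers are hypothetical and only show the idea.

```python
class SketchFormatter:
    """Hypothetical formatter; only the tag-discovery idea is shown."""

    def handle_h1(self, tag):
        return ""

    def handle_label(self, tag):
        return ""

    def uses_tags(self):
        """Return the set of tag names that have a handle_<name> method."""
        prefix = "handle_"
        return {
            name[len(prefix):]
            for name in dir(self)
            if name.startswith(prefix) and callable(getattr(self, name))
        }


formatter = SketchFormatter()
allowed = formatter.uses_tags()
```

The resulting set is what would then be passed to the parser as its allowed tags.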

P.S. It is better not to include code as a screenshot in bug reports or questions: you can just use markdown to format code properly in the editor. Otherwise it is nearly impossible to reproduce your code, for example, as I can't copy-paste it.

@dorofeefff
Collaborator

dorofeefff commented Feb 4, 2017

Some updates

  1. I changed the structure of handlers and committed it to my branch - you can see it updated and the detailed description of the changes is added as well

  2. Labels refuse to work properly. Check handle_simple. The way labels are handled there does not work. I also tried several other options that are used in qqhtml, but they don't work either, apparently because they rely on other functions that are not defined in QqLaTeXFormatter. I don't think we need them anyway, and there should be a simple way to handle labels. But I don't know how.

  3. A new issue is that in reality we want to translate a string \paragraph newpar into \paragraph{newpar}. However, I don't know how we can use a part of content (or is 'newpar' a part of content?) to add it to the handled tag.

@ischurov
Owner Author

ischurov commented Feb 5, 2017

0

It is a good habit to mention the issue number in the commit description (like "Related: #2") and/or to mention the commit id in a comment on the issue (like "see 51c6cec"). GitHub turns such mentions into hyperlinks, which makes it easy to follow what was changed and why.

1

Good. See my comment there.

2

If you say that something doesn't work, it is a good idea to provide a code sample that shows what you mean: your settings, input values, actual behaviour and desired behaviour. Currently, I see that label is not in the set returned by uses_tags() for the formatter.

formatter = QqLaTeXFormatter()
'label' in formatter.uses_tags()
# False

So it will not be parsed in the default settings. If I add it to parser's allowed_tags explicitly, I have:

parser = QqParser(allowed_tags=formatter.uses_tags() | {'label'})
tree = parser.parse(r"""
\h1 Hello \label test
""")
tree.as_list()
# ['_root', '\n', ['h1', 'Hello ', ['label', 'test\n']]]
# looks good here

Now if I try to format it, I get

print(formatter.format(tree))
...
/Users/user/prj/qqmbr/qqmbr/qqlatex.py in handle_simple(self, tag)
     60 \{{{name}}} \{{{label}}}
     61 {content}
---> 62 """.format(name=self.tag_to_latex[tag.name], content=self.format(tag), label = tag.find('label'))
     63 
     64     def handle_begin_end(self, tag):

KeyError: 'label'

This is due to the fact that you invoke self.format(…) on the tag h1: the formatting proceeds recursively and tries to format every sub-tag of h1, including label. As there is no special handler defined for label, it uses handle_simple as the default handler, which fails because label is not among the keys of tag_to_latex. If we add label to tag_to_latex, this code will work but produce strange results.

The fundamental problem here is inconsistency between LaTeX and qqDoc handling of labels. There are two different cases in LaTeX:

\section{Some section}\label{some:label}

and

\begin{theorem}\label{some:label}
Some theorem
\end{theorem}

In qqDoc, in both cases the label tag belongs to the tag it labels (h1 in the first case and theorem in the second case), but in LaTeX it's different: in the second case \label is located inside the environment, and in the first case it is located just after the section command.

So we have to handle these two cases differently. This is what your code in handle_simple tries to do. But we have to keep the other case in mind.

Actually, there are two possibilities:

  1. Assign an empty handler to label. (Actually, I'm not sure it is a good idea to use handle_simple as the default handler. Many tags that generate no output of their own but only affect how their parents are rendered exist or can appear, so I'd rather stick with the decision that the default handler is always an empty handler, i.e. one that returns an empty string.) Then you have to handle label manually everywhere it can appear (particularly, in enumerable environments). (You also have to fix the code of handle_simple: at least you need \label\{{{label}}}.)

  2. Add label to the list of tags that are processed by handle_simple. In this case you have to remove the label tag from the simple tag currently being processed before you invoke recursive formatting of that tag. This can probably be done by deleting it, but it's better to check. In this case, you have to do nothing extra to handle labels in environments.

I leave it to you to decide what is better.
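
To make the first possibility concrete, here is a rough sketch on plain nested lists in the style of the as_list() output above. It is not qqmbr code: `SECTION_LIKE`, `ENVIRONMENTS`, `find_label` and `format_node` are hypothetical names, and the theorem tree is hand-written for the example rather than produced by the parser.

```python
SECTION_LIKE = {"h1": "section", "h2": "subsection"}  # assumed mapping
ENVIRONMENTS = {"theorem"}                            # assumed set


def find_label(node):
    """Return the text of the first direct 'label' child, or None."""
    for child in node[1:]:
        if isinstance(child, list) and child[0] == "label":
            return "".join(c for c in child[1:] if isinstance(c, str)).strip()
    return None


def format_children(node):
    """Format children; 'label' and unknown tags produce no output."""
    return "".join(child if isinstance(child, str) else format_node(child)
                   for child in node[1:])


def format_node(node):
    name = node[0]
    label = find_label(node)
    label_part = f"\\label{{{label}}}" if label else ""
    content = format_children(node).strip()
    if name in SECTION_LIKE:
        # \label goes right after the sectioning command.
        return f"\\{SECTION_LIKE[name]}{{{content}}}{label_part}\n"
    if name in ENVIRONMENTS:
        # \label goes inside the environment.
        return f"\\begin{{{name}}}{label_part}\n{content}\n\\end{{{name}}}\n"
    return ""  # default handler: empty string


section = format_node(["h1", "Hello ", ["label", "test\n"]])
theorem = format_node(["theorem", ["label", "some:label"], "Some theorem\n"])
```

The parent decides where the label goes, and the label itself never prints anything, so no KeyError can come from the recursion.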

3

You can check how this string is parsed by printing the result of parsing.

tree = parser.parse(r"""
\paragraph Some text here
    More text.
""")
tree.as_list()
# ['_root', '\n', ['paragraph', 'Some text here\nMore text.\n']]

So you see that string Some text here is attached to tag paragraph. This is an expected behaviour according to the specs:

The rest of a line where block tag begins will be attached to that tag either, but it is handled a bit differently if it contains other valid block tags or a separator character.

So what's the problem here?

@dorofeefff
Collaborator

dorofeefff commented Feb 23, 2017

  1. Comment resolved

  2. Tags are (almost) working now. I took into account the discrepancy between LaTeX and qq tags, so the two types are processed differently. The only problem that I have now is the following. Suppose I have a body of text like this:

\h1 Section Two
Some text
\theorem \label 2x2
I know this!

Because Section Two is a child of \h1, when I format h1 (see content=self.format(tag) in handle_simple), Section Two will appear again. Hence, the resulting text will be:
\{section} \label{Section Two}
Section Two
Some text
\begin{theorem} \label{2x2}
I know this!
\end{theorem}

  3. I actually tried to explain intuitively (and unclearly) the discrepancy between LaTeX and qq tags that you outlined above
  • By the way, how do you include several lines of code here? :)

dorofeefff added a commit that referenced this issue Feb 23, 2017
Addressed last comment in Implement QqLaTeXFormatter #2
@ischurov
Owner Author

First of all,

\h1 Section Two
Some text

is parsed into

["_root", ["h1", "Section Two"], "Some text"]

so Section Two is a part of h1 and Some text is not a part of it (as there's no indent here).

It should be translated to

\section{Section Two}
Some text

So \{section} … Some text is incorrect output regardless of label processing.

Please, see also my comments on your commit 9248ad8.

Multiline code snippets are created with triple backticks, see the docs.

@dorofeefff
Collaborator

dorofeefff commented Mar 17, 2017

Most of your comments are resolved in commit 0361785
The one thing that I did not manage to fix is the following:

\h1 Section Two
Some text

is now translated to

\section{Section Two}
Section Two

It is true that the preceding code is parsed into
["_root", ["h1", "Section Two"], "Some text"]
but when I use content=self.format(tag), then Section Two gets included into content (see line 79 for an example).

@ischurov
Owner Author

Indeed, why do you need {content} in the following snippet?

return """
\{name}{caption} {label}
{content}
""".format(name=self.tag_to_latex[tag.name], content=self.format(tag),
           label = label_string, caption = caption_string)

You have a tag like

\h1 Section Two

It has to be translated to

\section{Section Two}

So the whole content of a tag should go inside the curly brackets in \section{…}. (However, it has to be properly formatted first, as it may contain other tags, like \h1 Proof of \ref[Theorem|thm:main], that should be translated to LaTeX as well, not left as-is.)
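
As a rough illustration of that point, the sketch below formats the children first and only then wraps the result in \section{…}. It works on plain nested lists, not on the real qqmbr tree; `format_node` is a hypothetical name, and the way the ref argument is represented here (a single "Text|label" string) is an assumption.

```python
def format_node(node):
    """Recursively turn a nested-list tree into LaTeX text (sketch only)."""
    if isinstance(node, str):
        return node
    name, *children = node
    # Format the children first, so nested tags are already LaTeX here.
    inner = "".join(format_node(child) for child in children)
    if name == "h1":
        # The whole formatted content goes inside the braces.
        return "\\section{" + inner.strip() + "}\n"
    if name == "ref":
        # Assumed raw form "Text|label", as in \ref[Theorem|thm:main].
        text, _, target = inner.partition("|")
        return text + " \\ref{" + target + "}"
    return inner


heading = ["h1", "Proof of ", ["ref", "Theorem|thm:main"]]
latex = format_node(heading)
```

Nothing is emitted after the closing brace, so the heading text cannot appear a second time.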

@ischurov
Owner Author

Btw, I added a test case related to the issue we discussed. We have to create a test case for every piece of behaviour we need and then make the program pass them all (this is called test-driven development). After that, if we make any change to the program, we can check whether it breaks something. You can look at the other tests in the test folder.

Please, if you see that something does not work as expected in your code, first add a test that shows how it should work; then it is easier to discuss it, to find the particular place in the program that doesn't work (using a debugger), and so on.

@dorofeefff
Collaborator

dorofeefff commented Mar 23, 2017

Ok-ok! I see my mistake: I thought that

\h1 Some section
    Some text

should be translated to

\section{Some section}
Some text

But I see now that you don't use tabs after \h1. With the indent, the text is part of the tag, so it is translated (not by mistake) to

\section{Some sectionSome text}

In the commit 8c772f0 I resolved this issue for handle_simple.

About cross-referencing: just to clarify, \ref[Theorem|thm:main] should be translated to Theorem \ref{thm:main} ? (this does not work yet)

Also, I don't know how to create tests, test, or debug...

@ischurov
Owner Author

This is why tests are essential: code is better than descriptions. To add a test, just add some functions to the corresponding files in the test folder (they actually have to be methods of the corresponding class, look at my tests there; their names should also begin with test_). To run a test, you can use the Run menu in PyCharm (or right-click on the file and pick "run tests…").
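
For orientation, a test of this kind could look roughly like the sketch below. It uses the standard unittest module; `format_heading` is a hypothetical stand-in for the formatter call, and the class and method names are made up for this example, not taken from the existing tests.

```python
import unittest


def format_heading(text):
    """Stand-in for the formatter under test (hypothetical)."""
    return "\\section{" + text.strip() + "}"


class TestQqLaTeXSketch(unittest.TestCase):
    # Method names must start with test_ to be picked up by the runner.
    def test_h1_becomes_section(self):
        self.assertEqual(format_heading("Section Two"),
                         "\\section{Section Two}")

    def test_surrounding_whitespace_is_dropped(self):
        self.assertEqual(format_heading(" Intro\n"), "\\section{Intro}")


# Run the tests programmatically; an IDE or `python -m unittest` also works.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestQqLaTeXSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each method states one expected behaviour, so a failing test points directly at the case that broke.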

We can begin with \ref[Theorem|thm:main] → Theorem \ref{thm:main}, yes. (I would slightly prefer to have the word "Theorem" as part of the link as well, but I'm not sure how to do it easily in LaTeX; feel free to investigate.)
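
One option worth checking for the linked variant is the hyperref package, whose \hyperref[label]{text} command makes arbitrary text a link and whose starred \ref* prints the number without a nested link. The sketch below only builds the output strings; `format_ref` and its `link_text` flag are hypothetical, the "Text|label" input form is an assumption, and the generated document would need \usepackage{hyperref}.

```python
def format_ref(argument, link_text=False):
    """Turn 'Theorem|thm:main' into LaTeX; sketch under stated assumptions."""
    text, separator, target = argument.partition("|")
    if not separator:
        # No text part given: treat the whole argument as the label.
        return "\\ref{" + argument + "}"
    if link_text:
        # Whole phrase is the link; \ref* avoids a link inside a link.
        return "\\hyperref[" + target + "]{" + text + " \\ref*{" + target + "}}"
    return text + " \\ref{" + target + "}"


plain = format_ref("Theorem|thm:main")
linked = format_ref("Theorem|thm:main", link_text=True)
```

The plain form needs no extra package, so it is a reasonable default until the hyperref variant has been tried in a real document.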

@dorofeefff
Collaborator

dorofeefff commented Mar 23, 2017

See commit 6577e99

  1. I added handle_ref, and it works sometimes. Specifically, when I feed it a line like \ref[Theorem|thm:one], it returns Theorem \ref{thm:one}, which is correct. However, when I want to format a reference within another tag, like \h1 proof of \ref[Theorem|thm:one], it raises the following error:
QqError: ('New block tag open during inline mode on line %i: %s', (2, '\\separator'))

Any idea?

  2. Test for \ref added

@ischurov
Owner Author

Hmm, this seems to be a parser bug. I'll dig into it.
