Semantic comments #16

Open
stasm opened this issue Jan 23, 2017 · 26 comments
Labels
FUTURE Ideas and requests to consider after Fluent 1.0 semantic comments

Comments

@stasm
Contributor

stasm commented Jan 23, 2017

Having a way to semantically describe a message would benefit tooling. It would allow tools to better inform the user what they can do in the translation, and give hints and suggestions.

Perhaps we could consider using something similar to JSDoc, in particular the @param tag: http://usejsdoc.org/tags-param.html. JSDoc conveniently allows specifying the type, the description, and the default value, which tools could use to display an example of a formatted translation.

# @param {number} [$num = 4] Number of new messages
new-messages = { $num ->
   *[one] You have 1 new message.
    [other] You have { $num } new messages.
}
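
As a rough illustration of how a tool might read the JSDoc-style line above, here is a minimal Python sketch. The regular expression and the `parse_param` helper are assumptions made for this example; nothing in the proposal fixes the grammar this precisely.

```python
import re

# Hypothetical pattern for the JSDoc-like line proposed above:
#   # @param {number} [$num = 4] Number of new messages
PARAM_RE = re.compile(
    r"^#\s*@param\s+\{(?P<type>\w+)\}\s+"
    r"\[\$(?P<name>[\w-]+)\s*=\s*(?P<default>[^\]]+)\]\s*"
    r"(?P<description>.*)$"
)


def parse_param(line):
    """Return the type, name, default and description, or None if no match."""
    match = PARAM_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["default"] = info["default"].strip()
    return info


info = parse_param("# @param {number} [$num = 4] Number of new messages")
```

A tool could then use `info["default"]` as the sample value when rendering a preview of the message.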

Would it make sense to make this meta-information first-class? Rust differentiates between regular comments (//) and doc comments (///). We could do something similar by making the @ sigil special:

@param {number} [$num = 4] Number of new messages
new-messages = { $num ->
   *[one] You have 1 new message.
    [other] You have { $num } new messages.
}

This is possibly related to #7.

@stasm
Contributor Author

stasm commented Jan 26, 2017

Some more thoughts on how this relates to #7: we could introduce our custom @-tags for language-specific meta information which would be ignored by tools like compare-locales:

@meta masculine
brand-name = Firefox

@stasm
Contributor Author

stasm commented Jan 26, 2017

We could also introduce versioning for messages. This would allow making small changes to the original copy without having to change the identifier. The default and implicit revision would be revision 0:

accept = Accept the Terms and Conditions.

and some time later:

@rev 1
accept = Please accept the Terms and Conditions.
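
To make the revision idea concrete, here is a small Python sketch of how a tool could compare revisions between the source and a translation. The helper names and the "list of lines per entry" input are assumptions for illustration only.

```python
import re

# Hypothetical: matches a standalone "@rev N" line as in the example above.
REV_RE = re.compile(r"^@rev\s+(\d+)\s*$")


def revision_of(entry_lines):
    """Return the revision declared by an `@rev N` line, or 0 if absent."""
    for line in entry_lines:
        match = REV_RE.match(line.strip())
        if match:
            return int(match.group(1))
    return 0  # the default, implicit revision


def is_outdated(source_lines, translation_lines):
    """A translation is outdated when its revision is behind the source's."""
    return revision_of(translation_lines) < revision_of(source_lines)


source = ["@rev 1", "accept = Please accept the Terms and Conditions."]
translation = ["accept = (translation of the revision 0 text)"]
needs_update = is_outdated(source, translation)
```

Note that this only works if the developer remembers to bump the revision whenever the meaning of the source changes.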

@stasm
Contributor Author

stasm commented Jan 27, 2017

Some more thoughts on how this relates to #7: we could introduce our custom @-tags for language-specific meta information which would be ignored by tools like compare-locales

In #7 (comment) @Pike suggested that we separate semantic comments and grammatical data due to their having different owners.

@stasm stasm added the syntax label Feb 16, 2017
@mathjazz
Contributor

I love that. One possible use case is to define maximum string length (we already use it in .lang files, e.g. for translating promotional tweets).

@gunchleoc

We could also introduce versioning for messages. This would allow making small changes to the original copy without having to change the identifier.

I'll bet you 100€ that some developer will forget to version a string and that we'll end up with significant changes in the source language that will be missing from the target languages - while you might still control this in QA somehow for Mozilla's core products, you can't count on every extension developer to be aware of the problem. The only way that works reliably in my book is the way gettext handles this: If the string content has changed, it will become fuzzy. It is annoying for localizers if strings become invalid due to a typo having been fixed in the source language, but it's the lesser of 2 evils.
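
The content-based alternative can be sketched in a few lines of Python. This is not how gettext itself is implemented; it only illustrates the principle that any change to the source text invalidates the translation, with no manual version bump involved. The function names and the idea of storing a digest next to each translation are assumptions.

```python
import hashlib


def source_digest(value):
    """Fingerprint of the source string a translation was made against."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def is_fuzzy(current_source, digest_at_translation_time):
    """True when the source text changed since the translation was written."""
    return source_digest(current_source) != digest_at_translation_time


stored = source_digest("Accept the Terms and Conditions.")
changed = is_fuzzy("Please accept the Terms and Conditions.", stored)
```

The trade-off is the one described above: a fixed typo in the source marks the translation as fuzzy too.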

@stasm stasm mentioned this issue Oct 24, 2017
@stasm
Contributor Author

stasm commented Oct 24, 2017

In #59 @zbraniecki asked about the possibility of using semantic comments for tags. I'm pasting my reply below to keep the discussion in one place.

My understanding of the scope of semantic comments is that they would be a place to put extra data available to tools rather than the runtime. In fact they wouldn't be parsed at runtime at all. This would make it impossible to use them for tags.

@zbraniecki
Collaborator

Even before we move forward with this, can we get a consensus on what we'd like such comments to look like so that we can start writing such comments even without them being "semantic" yet?

I'm trying to decide between:

# Description
#
# Variables
#   $num (String) - Description

and:

# Description
#
#   $num {String}: Description

or something in between? Thoughts?

@stasm
Contributor Author

stasm commented Nov 10, 2017

Even before we move forward with this, can we get a consensus on what we'd like such comments to look like (…)?

Isn't this the exact goal of this issue? :)

Some prior art:

JSDoc:

#  @param {number} $num - The $num value.

Some Python projects use this style:

#  :param $num: The $num value.
#  :type $num: number

I like simple ideas of the form:

# $num (Number) - Description

But without a @param or :param how would we introduce other information like max-length or revision? Could we just use @ for data not related to arguments?

# $num (Number) Description
# @max-length 140
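
A minimal Python sketch of how the mixed style above could be read, assuming variables start with `$` and every other piece of data uses an `@` tag. The patterns and helper names are hypothetical, and the length check only shows what a tool might do with `@max-length`.

```python
import re

# Hypothetical patterns for the two kinds of lines shown above.
VARIABLE_RE = re.compile(
    r"^#\s*\$(?P<name>[\w-]+)\s+\((?P<type>\w+)\)\s*(?:-\s*)?(?P<description>.*)$"
)
TAG_RE = re.compile(r"^#\s*@(?P<tag>[\w-]+)\s+(?P<value>.*)$")


def parse_comment(lines):
    """Split comment lines into variable descriptions and @-tags."""
    variables, tags = {}, {}
    for line in lines:
        line = line.strip()
        match = VARIABLE_RE.match(line)
        if match:
            variables[match["name"]] = {
                "type": match["type"],
                "description": match["description"],
            }
            continue
        match = TAG_RE.match(line)
        if match:
            tags[match["tag"]] = match["value"].strip()
    return variables, tags


def exceeds_max_length(text, tags):
    """Check a translation against an optional @max-length tag."""
    limit = tags.get("max-length")
    return limit is not None and len(text) > int(limit)


variables, tags = parse_comment(["# $num (Number) Description", "# @max-length 140"])
```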

@flodolo
Contributor

flodolo commented Nov 10, 2017

If the goal is to parse comments and extract information about parameters, I feel like we should enforce @param. If it's only about documenting the parameters, then your last example looks good to me.

@zbraniecki
Collaborator

One good use of semantic comments would be to tell a localization tool like Pontoon what context the string will be used in.

The particular case is a message like this:

key = Click on <a>my</a> link to paste <a> into the textbox

The latter <a> should be displayed literally, so we probably want to use at least &lt;a>, and without knowing whether the key goes to alert() or to the DOM, it's impossible for the tool or localizer to know how to encode it.

Semantic comments could make it trivial:

# @env: html
key = Click on <a>my</a> link to paste <a> into the textbox

and to prevent having to place it in front of every string, we could use group comments and resource comments to annotate the whole file.
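
A short Python sketch of both ideas: picking the effective `@env` from message, group, and resource annotations, and encoding literal text accordingly. The precedence order (most specific wins) and the function names are assumptions, not something settled in this thread.

```python
import html


def effective_env(message_tags, group_tags, resource_tags):
    """Most specific annotation wins: message, then group, then resource."""
    for tags in (message_tags, group_tags, resource_tags):
        if "env" in tags:
            return tags["env"]
    return None


def encode_literal(text, env):
    """Encode text that must be displayed literally in the target environment."""
    if env == "html":
        # html.escape turns "<" into "&lt;" (and also escapes ">" and "&").
        return html.escape(text, quote=False)
    return text  # e.g. a plain-text target such as alert()


env = effective_env({}, {"env": "html"}, {})
encoded = encode_literal("<a>", env)
```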

@zbraniecki
Collaborator

zbraniecki commented Mar 23, 2018

Semantic Comments v1 proposal

(updated: April 5 2018)

Description

Semantic comments are a concept that brings basic machine-readable structure to comments. The idea is to design a set of patterns that can be codified and that enable algorithmic interpretation of a comment.
Such models allow comments to be parsed and the data from them to be used in tooling.

The core design goal is to develop rules that are easy for humans to interpret naively and to memorize with minimal overhead, while at the same time allowing computers to assign meaning.

Semantic comments may serve several high level roles:

  1. They can unify the way meta information is being stored and presented improving readability of the comments and reducing cognitive load on the reader.
  2. They can help tools present contextual information from the developer to localizers in a well organized manner.
  3. They can help tools interpret the translation and provide additional checks and UI options.

In principle, the nature of the data stored in the comments is limited - runtime parsers should be able to skip comments without parsing them and failure to retrieve information from the comment should not result in any serious reduction in usability of the system.

Experience from other programming languages shows that some form of semantic comments is helpful in most languages, from JavaScript, Python, and C++ to Rust and CSS.

Below is my initial proposal for the first version of Semantic Comments.

Title Line

It would be useful to be able to capture a short description of the section for UI tools to use when operating on long lists of strings.

A great example of such a use case is the current preferences.ftl with hundreds of messages clustered into sections.

My proposal is to identify the title line of any comment as fitting into one of two conditions:

  1. Being the only non-empty line of the comment
  2. Being the first non-empty line of the comment with a blank line right below it.
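
The two conditions can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the comment is passed in as a list of raw lines that still carry their `#` sigils.

```python
def strip_sigil(line):
    """Drop the leading '#', '##' or '###' sigil and surrounding whitespace."""
    return line.strip().lstrip("#").strip()


def title_line(comment_lines):
    """Return the title of a comment per the two conditions above, or None."""
    lines = [strip_sigil(line) for line in comment_lines]
    non_empty = [line for line in lines if line]
    # Condition 1: the only non-empty line of the comment.
    if len(non_empty) == 1:
        return non_empty[0]
    # Condition 2: the first non-empty line, with a blank line right below it.
    for index, line in enumerate(lines):
        if line:
            followed_by_blank = index + 1 < len(lines) and lines[index + 1] == ""
            return line if followed_by_blank else None
    return None


title = title_line([
    "## Privacy Section - Site Data",
    "##",
    "## This section will contain several messages",
    "## that should be translated by a lawyer if possible.",
])
```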

That means that the following two are title lines:

## Privacy Section - Site Data

## Privacy Section - Site Data
##
## This section will contain several messages
## that should be translated by a lawyer if possible.

And the result in Pontoon, for example, may look like this

Meta-information

Meta information should, by definition, be an open-ended system. This means that while we can specify the syntax around it and define a set of known values, the system should also be open to extension with new keys in the future. For that reason, I believe that a key-value param system would work well.

The initial uses of meta information may provide details such as the context in which the message is used, the communication style to use for the message, any legal requirements associated with it (branding policy etc.), the UI toolkit it will use, the string version, and so on.

Another use case is to instruct the tooling about any soft limitations imposed on the translation. For example, we may want to instruct the localization tool that a given file/group/message should remain simple: no formatters, no variants, etc.
Such a limitation may result from some dumbing-down conversion that will happen to the file later in the release cycle, but a semantic comment may be useful early on to help CAT tools.

JSDoc has a nice system of block and inline tags. Copying it, it may look like this:

### Preferences
###
### @license LGPL
### @toolkit HTML


## General Section - Site Data


# @context Window Title
sitedata-exceptions-title =
    .label = Exceptions
    .accesskey = E


# @context Menu button
sitedata-exceptions =
    .label = Exceptions
    .accesskey = E

# @policy "Firefox" should be treated as a brand and kept in English,
#           while "Home" and "(Default)" can be localized.
home-mode-choice-default =
    .label = Firefox Home (Default)

Since the system is open-ended, the first step is only to define the syntax for it. I'd like it to be:

@name value
@name value {type}
@name value {type} - description

with all three being optional.
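
One possible reading of this syntax, sketched in Python. It assumes the three optional parts are the value, the `{type}`, and the ` - description`, that a value never contains `{`, and that a description is introduced by a hyphen surrounded by whitespace. None of this is fixed by the proposal; the regular expression and `parse_tag` are illustrative only, and multi-line values such as the `@policy` example are not handled.

```python
import re

# Hypothetical grammar: @name [value] [{type}] [- description]
TAG_RE = re.compile(
    r"^@(?P<name>[\w.-]+)"
    r"(?:\s+(?P<value>[^{]*?))?"
    r"(?:\s*\{(?P<type>[^}]*)\})?"
    r"(?:\s+-\s+(?P<description>.*))?$"
)


def parse_tag(line):
    """Parse one block tag (sigil already stripped) into its four fields."""
    match = TAG_RE.match(line.strip())
    if match is None:
        return None
    # Normalize missing or empty parts to None.
    return {
        key: (value.strip() or None) if value else None
        for key, value in match.groupdict().items()
    }


tag = parse_tag("@arg $name {string} - Name of a search engine.")
```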

Variables

Variables could either be a particular type of block tags, or a separate thing:

# @arg $name {string} - Name of a search engine.
search-keyword-warning-engine = You have chosen a keyword that is currently in use by “{ $name }”. Please select another.

or:

# $name {string} - Name of a search engine.
search-keyword-warning-engine = You have chosen a keyword that is currently in use by “{ $name }”. Please select another.

I'm not very opinionated here, and we could start with the former and maybe one day add the latter as a convenience mechanism ("($.*)" becomes a "@arg $1").
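
The convenience rewrite could look like this in Python. Note that `$` has to be escaped in a real regular expression, since an unescaped `$` anchors the end of the line; the `expand_shorthand` name is made up for this sketch, and the input is assumed to be a comment line with its `#` sigil already stripped.

```python
import re

# "($.*)" from the text above, with the dollar sign escaped.
SHORTHAND_RE = re.compile(r"^(\$.*)$")


def expand_shorthand(comment_line):
    """Rewrite a bare `$name ...` line into the explicit `@arg $name ...` form."""
    return SHORTHAND_RE.sub(r"@arg \1", comment_line)


expanded = expand_shorthand("$name {string} - Name of a search engine.")
```

Lines that already start with `@arg`, or that are plain prose, pass through unchanged.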

Syntax coloring / validation

The last item I'd like us to consider is syntax coloring and augmentation.

There are three areas where we may end up placing a syntax from another programming language:

  1. In the DOM Overlay
  2. As an argument (style: width: 15em)
  3. In a comment

For the DOM Overlay, I believe that a block tag @toolkit HTML on a file/section/message comment should be sufficient for any form of syntax highlighting to kick in.

For arguments it's a bit more tricky, and I thought we could annotate it like this:

# This string is currently used only in Firefox 60 and will be removed when not
# needed for x-channel. See bug 1445686 for details.
#
# @toolkit.style CSS
search-input =
    .title = My Window Title
    .style = width: 15.4em

to allow us to specify that the .style attribute is in CSS.

Finally, in the comment, I really like the RST way:

# This message will be displayed inside a :html:`<span/>` with :css:`font-weight: bold`.
#
# @arg $name {string} - Name of a search engine.
search-keyword-warning-engine = You have chosen a keyword that is currently in use by “{ $name }”. Please select another.

This could be introduced gradually: for now we could specify "`" as the sigil for code only and let tooling guess the syntax highlighting for it, and one day extend it with a :XXX: prefix to allow specifying the language if needed.
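
A small Python sketch of that gradual approach: one pattern that accepts bare backtick spans and, optionally, an RST-style `:lang:` prefix. The pattern and `code_spans` are assumptions for illustration.

```python
import re

# Backtick-delimited code, optionally preceded by a :lang: role.
CODE_SPAN_RE = re.compile(r"(?::(?P<lang>\w+):)?`(?P<code>[^`]+)`")


def code_spans(comment):
    """Return (language or None, code) pairs found in a comment."""
    return [(m["lang"], m["code"]) for m in CODE_SPAN_RE.finditer(comment)]


spans = code_spans(
    "This message will be displayed inside a :html:`<span/>` "
    "with :css:`font-weight: bold`."
)
```

A span without a prefix comes back with `None` as its language, which is where a tool would fall back to guessing.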

The total outcome might look like this

@zbraniecki
Collaborator

@mathjazz @Pike @stasm @flodolo @adngdb @hkasemir - can I get your feedback here please? I'd like to start unifying our comments around those concepts if they're approved.

@stasm
Contributor Author

stasm commented Apr 4, 2018

Nice work, @zbraniecki! This is very much in line with what I had in mind for this, thanks for adding substance and providing details.

Title line

I see your point and I like the Pontoon mockup. I'm not sure this practice needs to be codified as a rule. Pontoon could simply show the first line of the comment truncated to fit the UI and the effect would be the same, I think?

Meta information

This looks great and using @prop name makes sense to me.

Variables

I like basing the syntax on JSDoc. One thing that I didn't see in your proposal is the syntax for example values. In my original comment I used JSDoc's syntax for default values of optional params:

# @param {number} [$num = 4] Number of new messages

I'm not sure this would be a good fit for Fluent. There is no notion of optional parameters/arguments so the brackets [ ] are not required. They also add visual clutter. Perhaps we could add this information to the type? Like so:

# @param $name {Type, "example value"} - Description

Which would give us:

# @param $num {Number, 4} - Number of notifications.
# @param $username {String, "Anne"} - User's first name.

An alternative inspired by TypeScript, Rust and a few others:

# @param $num: Number (4) - Number of notifications.
# @param $username: String ("Anne") - User's first name.
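
To show what the example values buy us, here is a Python sketch that reads the first of the two forms above (`{Type, example}`) and substitutes the examples into a simple pattern. The regular expressions and helper names are hypothetical, and the preview only handles plain `{ $var }` placeables, not select expressions.

```python
import re

# Hypothetical pattern for: # @param $name {Type, "example value"} - Description
PARAM_RE = re.compile(
    r"^#\s*@param\s+\$(?P<name>[\w-]+)\s+"
    r"\{(?P<type>\w+),\s*(?P<example>[^}]+)\}\s+-\s+(?P<description>.*)$"
)
PLACEABLE_RE = re.compile(r"\{\s*\$([\w-]+)\s*\}")


def example_values(comment_lines):
    """Collect the example value declared for each variable."""
    examples = {}
    for line in comment_lines:
        match = PARAM_RE.match(line.strip())
        if match:
            examples[match["name"]] = match["example"].strip().strip('"')
    return examples


def preview(pattern, examples):
    """Fill simple placeables with example values; leave unknown ones as-is."""
    return PLACEABLE_RE.sub(
        lambda m: examples.get(m.group(1), m.group(0)), pattern
    )


examples = example_values([
    "# @param $num {Number, 4} - Number of notifications.",
    "# @param $username {String, \"Anne\"} - User's first name.",
])
sample = preview("You have { $num } new messages.", examples)
```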

IIUC any such deviation will make our comment syntax incompatible with JSDoc. Should we try to maintain the compatibility? Or is that a non-goal?

Syntax coloring

I recommend sticking to Markdown rather than adding features from RST. As such, I think comment contents should simply be allowed to be valid Markdown. This would make it possible to use backticks for inline code fragments, without any syntax highlighting (like `this`). While this is a limitation, it's not a big one. GitHub comments work quite well despite it :)

@flodolo
Contributor

flodolo commented Apr 4, 2018

Title Line

I don't see a great advantage in having a title for section comments. If we do have one, I would prefer something more explicit than relying on position and empty lines, i.e.

## @title Privacy Section - Site Data

## @title Privacy Section - Site Data
##
## This section will contain several messages
## that should be translated by a lawyer if possible.

Which would make it fall into the next group.

Meta-information

I'm trying to imagine how we could practically use this information, but I'm failing. For example, for us @policy is implicit, since we put brand names in specific files and paths. @context is normally part of the content of the comment itself.

I think we need some valid use cases to justify the added complexity of parsing these comments.

Variables

I agree that we should standardize this type of information, and I'm fine with the @arg approach.

We could even go as far as failing some tests if a string has placeables but no associated comments.
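
Such a test could be as simple as the following Python sketch, which assumes the `@arg $name` form and treats the comment and message value as plain strings. The patterns and the `undocumented_variables` helper are illustrative, not an existing compare-locales check.

```python
import re

# Variables used in the message value, e.g. "{ $name }" or "{ $num ->".
PLACEABLE_RE = re.compile(r"\{\s*\$([\w-]+)")
# Variables documented in the comment via "@arg $name".
ARG_RE = re.compile(r"@arg\s+\$([\w-]+)")


def undocumented_variables(comment, value):
    """Return the variables used in the value but not described in the comment."""
    used = set(PLACEABLE_RE.findall(value))
    documented = set(ARG_RE.findall(comment))
    return sorted(used - documented)


message = (
    "You have chosen a keyword that is currently in use by "
    "\u201c{ $name }\u201d. Please select another."
)
missing = undocumented_variables("", message)
```

A test run would fail whenever the returned list is not empty.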

Syntax coloring / validation

I don't think there's value in highlighting syntax in comments (last part of the proposal). It adds a ton of complexity for little gain.

I'm not sure if there should be highlighting in strings either, but I'd be more open about that.

@toolkit.style CSS

I think this should be something more like

@validation .style CSS

Which could be used to both validate the attribute externally (compare-locales), and highlight strings in Pontoon.

@zbraniecki
Collaborator

(Title Line) I'm not sure this practice needs to be codified as a rule.

I think there is a value.

IIUC any such deviation will make our comment syntax incompatible with JSDoc. Should we try to maintain the compatibility? Or is that a non-goal?

non-goal

As such, I think comment contents should simply be allowed to be valid Markdown. This would make it possible to use backticks for inline code fragments, without any syntax highlighting (like `this`).

I agree about backticks, but I'd be concerned if we tried to say that all markdown syntax is supported in our comments. AFAIK Markdown supports much more and tying us to markdown seems a bit excessive (and adds a strong dependency).

@zbraniecki
Collaborator

I don't see a great advantage in having a title for section comments. If we do have one, I would prefer something more explicit than relying on position and empty lines, i.e.

I'm not opposed to using @title here. I think it may be redundant (since the position and blank line communicate the same thing to the human reader and can be easily parsed), but we could start with an explicit param and consider adding implicit support later.

I think we need some valid use cases to justify the added complexity of parsing these comments.

If I read your statement correctly it starts with "I don't understand" and finishes with "and thus I believe the proposal is invalid" :) I'm happy to answer your questions and explain further, but I do believe the examples listed are valid.

I think this should be something more like
@validation .style CSS

Hmm, how would you denote the syntax of the value then?

@flodolo
Contributor

flodolo commented Apr 4, 2018

If I read your statement correctly it starts with "I don't understand" and finishes with "and thus I believe the proposal is invalid" :) I'm happy to answer your questions and explain further, but I do believe the examples listed are valid.

Uhm, where did I say “I don't understand”? You gave a few examples:

  • @license: I'm not sure that would work, from a legal perspective, given that we had to copy and paste the same license header to all files for a while now.
  • I've explained why I believe that @context and @policy are not going to be useful in our case.

I'm not against them, in fact I suggested to use @title (and potentially @validation), and I agree on using @arg. The disagreement is more about the open nature you're suggesting.

Hmm, how would you denote the syntax of the value then?

@validation CSS or @validation value CSS? The latter would more intuitively apply only to the value, in case there are more attributes.

On the same subject, I see these should apply only to individual strings, not file wide.

@stasm
Contributor Author

stasm commented Apr 4, 2018

I think there is a value.

What's the value that you're seeing? :) In particular, what is the value over what I suggested:

Pontoon could simply show the first line of the comment truncated to fit the UI and the effect would be the same, I think?

I agree about backticks, but I'd be concerned if we tried to say that all markdown syntax is supported in our comments.

You're right, we should be explicit about only supporting a strict subset of Markdown.

@Pike
Contributor

Pike commented Apr 4, 2018

I think we should split this up. This is way too big to reason about at this point.

High-level comments:

  • @foo: I don't think that JSDoc is a good example for us. There's a couple of overlapping concepts, and the most important @param one is not well defined. That doesn't mean that we should avoid syntax overlap, but I also think we should be strict in how we talk about this.
  • group and resource comments: Right now, Pontoon doesn't do a good job of showing group comments, and I'd like to avoid adding comment area for developers if we don't have a good way to show them.

Suggestions:

I'd recommend having this issue focus on the @foo syntax, and how to parse that. I'd split out individual foos to individual issues. (Yes, having use cases helps with the general syntax, but only so far.)
I'd recommend splitting out group and resource comment handling into a completely different thread, and possibly having Pontoon be the driving force behind changes to that. That effort might also be something to be done post-translate-view-refactor. Or something based on more realistic interactive mock-ups that let us experience how the comments in a group get shown as you translate entity for entity.

@zbraniecki
Collaborator

I'm not sure that would work, from a legal perspective, given that we had to copy and paste the same license header to all files for a while now.

I've recently seen several conversations on #developers indicating that this is no longer true. I'd like to verify that so I'll seek further confirmation, but in general, it's a per-project policy and a header like that may be useful. Please remember that we're designing syntax not just for Gecko.

For example, for us @policy is implicit, since we put brand names in specific files and paths. @context is normally part of the content of the comment itself.

That's not always true. We have a lot of branding related policies in other files and assuming that all brands will end up in separate FTL files is IMHO not going to hold. Having a parameter to provide policy information seems like a low hanging fruit.

Regarding @context - the basic premise behind semantic comments is to extract pieces of the "content of the comment" into bits that are interpretable by software.

For example, while a comment may currently contain contextual information, it would be hard or impossible for Pontoon to work out whether such a comment contains any contextual information and which part of the comment does so.
Having a @context parameter lets tooling take a message that is semantically described as carrying contextual information and present it in a form that is more relevant to the reader.

Examples may be screenshots of the UI, or even more semantic information such as whether it is a title, message, button label, etc., which could be further used by the tool to improve the graphical representation of the message and help the localizer understand how to translate.

A particular example here is that knowing the context of the message may help Pontoon prioritize the messages in translation memory which share the same context over ones that have the same English value but different context. Those are of course just examples.
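
As a sketch of that prioritization (not Pontoon's actual translation memory logic), a tool could simply sort candidates so that matches sharing the message's `@context` come first. The dictionary shape and the `rank_tm_matches` name are assumptions for this example.

```python
def rank_tm_matches(matches, context):
    """Put translation-memory matches with the same context first.

    `matches` is a list of dicts with a "context" key. sorted() is stable,
    so the original order is kept within each group.
    """
    return sorted(matches, key=lambda match: match.get("context") != context)


candidates = [
    {"source": "Exceptions", "target": "(menu wording)", "context": "Menu button"},
    {"source": "Exceptions", "target": "(title wording)", "context": "Window Title"},
    {"source": "Exceptions", "target": "(untagged wording)"},
]
ranked = rank_tm_matches(candidates, "Window Title")
```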

On the same subject, I see these should apply only to individual strings, not file wide.

Agree.

I'd split out individual foos to individual issues.

Agree. I'll file an issue per proposal, assuming that we're past the stage where a single issue for all elements of the proposal makes it easier to discuss them.

Thanks!

@flodolo
Contributor

flodolo commented Apr 4, 2018

Please remember that we're designing syntax not just for Gecko.

I think that's one point that I tend to forget in such discussions.

And translation memory definitely seems like an interesting application; the challenge would be making sure values for these are chosen consistently.

@zbraniecki
Collaborator

Based on a conversation with Stas, I added an example of meta-data about simple strings that instructs CAT tools to warn against using any placeables in a given file/group/message.

@stasm stasm added syntax and removed syntax labels May 15, 2018
@stasm stasm added this to the Syntax FUTURE milestone May 18, 2018
@stasm stasm added the FUTURE Ideas and requests to consider after Fluent 1.0 label May 23, 2018
@stasm stasm removed this from the Syntax FUTURE milestone May 23, 2018
@Pike
Contributor

Pike commented May 30, 2018

We talked about modularization of fluent specs, and I think semantic comments would be a good example. Should we have a repo for just semantic comments (projectfluent/comments or projectfluent/semantic-comments), with issues to move individual aspects forward, and possibly a markdown file per spec/facet?

@stasm
Contributor Author

stasm commented May 30, 2018

I'd prefer to keep everything in a single repo and use labels and projects. We can add a new file in the spec/ directory. There's already a draft there of how errors should be handled (to be revised, for sure).

@zbraniecki
Collaborator

Separated out into issues. Skipped colors for now.

@stasm
Contributor Author

stasm commented Oct 16, 2018

I created a GitHub project for tracking the design and implementation of semantic comments: https://github.com/projectfluent/fluent/projects/5. @zbraniecki, should we close this issue given that we now have separate issues for each proposal?
