Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search extensions fail validation if search parameter values are localized messages #3911

Open
aminomancer opened this issue Sep 13, 2021 · 14 comments

Comments

@aminomancer
Copy link
Member

aminomancer commented Sep 13, 2021

I uploaded some extensions to the add-on dev hub as normal. I'll just discuss one example for simplicity's sake.

This extension's manifest is very similar to Firefox's built-in search extensions, because my intention was to replace the built-in engines with my own engines where I can control the icon. Some of the built-in engines are fine and have x-suggestions URLs (like Amazon) though others lack suggestions when an API is actually available. I've been meaning to file a bug about this on Firefox but it slipped my mind since I don't know what component to file it under.

I also wanted to replace the icons so they look consistent with built-in icons. That is, replace colorful ICO images with SVG files with context properties, so the icon can respond to themes.

Anyway, here is the form of my version of the Amazon extension. As you can see it's nearly identical to the built-in extension.

{
    "name": "Amazon (Search)",
    "description": "__MSG_extensionDescription__",
    "manifest_version": 2,
    "version": "1.3",
    "applications": {
        "gecko": {
            "id": "amazondotcom@aminomancer"
        }
    },
    "hidden": true,
    "default_locale": "en",
    "icons": {
        "16": "favicon.svg"
    },
    "web_accessible_resources": ["favicon.svg"],
    "chrome_settings_overrides": {
        "search_provider": {
            "keyword": "@ama",
            "name": "Amazon",
            "search_url": "__MSG_searchUrl__",
            "search_form": "__MSG_searchForm__",
            "suggest_url": "__MSG_suggestUrl__",
            "search_url_get_params": "__MSG_searchUrlGetParams__"
        }
    }
}

And here are some of those localized messages:

{
  "extensionName": {
    "message": "Amazon.com"
  },
  "extensionDescription": {
    "message": "Amazon.com Search"
  },
  "searchUrl": {
    "message": "https://www.amazon.com/exec/obidos/external-search/"
  },
  "searchForm": {
    "message": "https://www.amazon.com/exec/obidos/external-search/?field-keywords={searchTerms}&ie={inputEncoding}&mode=blended&tag=mozilla-20&sourceid=Mozilla-search"
  },
  "suggestUrl": {
    "message": "https://completion.amazon.com/search/complete?q={searchTerms}&search-alias=aps&mkt=1"
  },
  "searchUrlGetParams": {
    "message": "field-keywords={searchTerms}&ie={inputEncoding}&mode=blended&tag=mozilla-20&sourceid=Mozilla-search"
  }
}

Again, you can see how they match the built-in extension's messages.

And here are my results after uploading to AMO developer hub:

"/chrome_settings_overrides/search_provider/search_url" should match pattern "^(https://|http://(localhost|127\.0\.0\.1|\[::1\])(:\d*)?(/|$)).*$"
"/chrome_settings_overrides/search_provider/search_url" should match format "url"
"/chrome_settings_overrides/search_provider/suggest_url" should match pattern "^$|^(https://|http://(localhost|127\.0\.0\.1|\[::1\])(:\d*)?(/|$)).*$"
"/chrome_settings_overrides/search_provider/search_form" should match pattern "^(https://|http://(localhost|127\.0\.0\.1|\[::1\])(:\d*)?(/|$)).*$"
"/chrome_settings_overrides/search_provider/search_form" should match format "url"

Please correct me if I'm mistaken and this is just an issue with my syntax. But I have used the same syntax before with no problems, and for most of these search engines I just copied the l10n files from Firefox's source code. It seems to only become a problem when I try to define a search parameter with a l10n message.

It seems like the linter isn't capable of interpreting the localization syntax and finding message values? Idk. Anyway, I made 4 of these search extensions. I don't really need the extensions to be signed, it's not a big deal. But it would be nice to not have to tell my users to disable the signing requirement, and put up with the alarming "unverified" message in about:addons.

Of course one way or another, if they're using my search extensions, they're already messing with their Firefox profile or installation pretty heavily. But I think this might be a bigger issue beyond just my applications, because shouldn't there be lots of search extensions with localized search parameters? If I want to provide search suggestions in the user's language, declaring one suggest_url is not sufficient. I can't think of any way to deliver localized suggestions without this feature, since it's not like there's any infrastructure set up for dispatching or receiving l10n information with the initial request. (to my knowledge)

Thanks 🙏

┆Issue is synchronized with this Jira Task

@Rob--W
Copy link
Member

Rob--W commented Sep 30, 2021

This is indeed a bug in the linter.

These properties are declared as localizable: https://searchfox.org/mozilla-central/rev/d10188f9b4f1c4974264f3925505a0498d346c57/browser/components/extensions/schemas/chrome_settings_overrides.json#49,70,89,127

The correct thing to do here is to enumerate all translations and check that the translations match the specified format

@aminomancer
Copy link
Member Author

Well there doesn't seem to be any value I can provide for the translated values that doesn't cause the same error on AMO. I haven't checked every one because some of these extensions have hundreds of translation files. Here's another example that AMO rejects:

{
    "extensionName": {
        "message": "Google Images"
    },
    "extensionDescription": {
        "message": "Google Images"
    },
    "searchUrl": {
        "message": "https://www.google.com/search"
    },
    "searchForm": {
        "message": "https://www.google.com/search?client=firefox-b-d&q={searchTerms}&tbm=isch"
    },
    "suggestUrl": {
        "message": "https://www.google.com/complete/search?client=firefox&q={searchTerms}&tbm=isch"
    },
    "searchUrlGetParams": {
        "message": "client=firefox-b-d&q={searchTerms}&tbm=isch"
    }
}

I don't think there's any other valid way I can express those url patterns anyway. But like I mentioned in the first post, those same values work perfectly fine if I write them directly in manifest.json. So far the problem only seems to arise when I write them in l10n files and reference them as messages. Idk if that's what you meant. Like, is this pattern format only used to check URL values if they are translated messages? That wouldn't make any sense in my mind.

If I had to guess based on this behavior, where the URLs work in manifest.json but don't work in localized messages, I would assume the linter is just literally parsing the URLs as a string value "__MSG_searchUrl__" rather than as a key named __MSG_searchUrl__ with a string value "https://www.amazon.com/exec/obidos/external-search/". But how could something like that go unnoticed until this? Idk. Anyway I still can't find any way to make any functional URL work in a localized message.

@Rob--W
Copy link
Member

Rob--W commented Oct 1, 2021

But how could something like that go unnoticed until this? Idk.

Apparently nobody tried it before, or didn't bother with reporting an issue.

Anyway I still can't find any way to make any functional URL work in a localized message.

It's not possible until the linter supports it. In my previous comment I sketched what the linter should do, patches are welcome if you're interested in unblocking this quickly.

@aminomancer
Copy link
Member Author

Is there some other way to squeeze this through the linter? Is it possible to, like, register the search engine dynamically rather than through the manifest? I forgot about the issue and just wasted another day making another search extension with 80 languages lol. Now I have 6 extensions waiting around to publish since I can't seem to get them signed without removing the l10n support. I don't want to force english onto everyone since it defeats the purpose of most of these extensions, but I don't know of any other way to register a localized search engine than to use __MSG_blah

@Rob--W
Copy link
Member

Rob--W commented Dec 5, 2021

There is no work-around available. Given your (JS) skills and motivation, the fastest way to achieve your goal is to contribute a patch to the linter that accepts the format. Two months ago I described how the patch would look like ;)

@aminomancer
Copy link
Member Author

I understand what you said but it's vague and I have no prior familiarity with this project, so I don't think it's at all likely I will figure this out on my own. I cloned the repo but I can't even follow the test's code path from beginning to end. I think I see where it begins, by searching for "should match pattern" and I think I understand the basic structure of the project. But as far as I can tell, all the locale messages are evaluated separately, agnostic to what they're being used for, and the manifest is evaluated separately, agnostic to the values or types of the messages it references. Actually I get the impression it wasn't expected that locale messages would have any other type than string?

I'm probably missing something, but I don't see any existing methods for enumerating all matching locale messages and iterating over them with a parser. Instead it seems like LocaleMessagesJSONParser is oriented toward parsing an entire file all at once, not property by property. But this is a huge project and it hurts my brain just trying to find what any one function does, so I probably overlooked something if not several things.

I don't really have substantial programming skills. I am just in a very early state of learning javascript. This linter is far beyond anything I've ever worked on in terms of complexity. I mean, I welcome the opportunity to learn more but I don't really have the faintest clue of where to start. Normally I learn by imitation of something concrete, and I can't see anything in the linter that does anything comparable to this. I.e., validating a given property's localized value by validating all the matching locale messages according to the property's pattern or format value in the schema, rather than according to some other validator for locale messages.

@Rob--W
Copy link
Member

Rob--W commented Dec 7, 2021

Admittedly, my previous comment was very concise and didn't have any details on relevant locations in the addon-linter's source code. So I'll try to add some more detail. Feel free to ask follow-up questions (or stop working on this issue).

In #3911 (comment), I linked to Firefox source code that shows the definition of the schema. The schema has the "proprocess": "localize" property that is not used. Consequently, the raw message value is used, which fails to validate because the manifest key contains the "pattern" property that specifies the expected format.

To solve this, we would have to find the relevant localized string when needed (i.e. when"preprocess":"localize" is present), and ideally do that for every locale (but at minimum for default_locale, to start with).

There are a few possible approaches to do that:

  1. Parse the manifest file for every locale.
    • Pro: may be easier to implement. Con: May be computationally expensive.
  2. Parse the manifest file for the default locale.
    • Pro: easier to implement (even easier than option 1), not as computationally expensive as the previous option. Con: does not validate all locales (but on the other hand, we do currently not support localized manifests at all, and supporting one is better than none).
    • The JSON is parsed and validated at src/parsers/manifestjson.js (lines 83-126).
      Some manifest keys related to localization are checked at src/parsers/manifestjson.js (lines 573-625). (Side note: as you can see from a few lines below this referenced code, the validator may edit the parsedJSON object as part of validation, as an implementation detail).
  3. Add a custom keyword parser (for the preprocess keyword) that takes care of the validation.
  4. Add a custom keyword that modifies the input

@aminomancer
Copy link
Member Author

So the preprocess property is not currently used anywhere at all? I was trying to find where it was used but the only place I could find was here, which seems meaningless. Can you point me to where exactly the linter calls the pattern/format validation for a given property? I think there's a gap in my understanding of how schema properties are validated. If I knew where a schema property's value was checked against the schema property's pattern then I think I could better understand the sequence of events and where a new keyword parser would fit into that.

@Rob--W
Copy link
Member

Rob--W commented Dec 10, 2021

So the preprocess property is not currently used anywhere at all?

Not in the addons-linter. That's why we have this issue.

I was trying to find where it was used but the only place I could find was here, which seems meaningless.

This is a link to Firefox's source code, not the linter. It's not necessary to read Firefox's source code to work on this patch (only the schemas, but those are regularly imported into this repository), merely an understanding of the localization feature is sufficient. Since you authored a localized extension, it's clear that you have at least some basic understanding of _locales/<lang>/messages.json, default_locale and localized properties.

(but if you are interested in getting a better understanding of Firefox's source code, or even contributing to it, then I'm willing to assist)

Can you point me to where exactly the linter calls the pattern/format validation for a given property? I think there's a gap in my understanding of how schema properties are validated. If I knew where a schema property's value was checked against the schema property's pattern then I think I could better understand the sequence of events and where a new keyword parser would fit into that.

I'll link source code below (and already did in my last comment), but it may be easier to understand if you were to add some console.log calls, build the library and see the effect of running the linter. You could start with logging the details of deprecated to get a better understanding of how a keyword is processed, and follow up by adding a validator.addKeyword("preprocessor", ...) call as suggested in my last comment at point 3 (or even 4).

The pattern property validation is done by ajv (documentation: https://ajv.js.org/json-schema.html#pattern), initialized at

const validator = ajv({
allErrors: true,
errorDataPath: 'property',
jsonPointers: true,
verbose: true,
schemas: this.schemas,
schemaId: 'auto',
});
validator.addMetaSchema(jsonSchemaDraft06);
ajvMergePatch(validator);
this._addCustomFormats(validator);
this._addCustomKeywords(validator);

We use ajv version 6.x, whose source code is available at https://github.com/ajv-validator/ajv/blob/v6.12.6/lib/ajv.js.

If you aren't able to get the feature to work via the ajv library, then an alternative way to solve the issue is to "manually" preprocess the manifest, before anything is passed to ajv at all. The logic would be something along the lines of "if default_locale is set, check if there are any properties that need localizing, and if so, read the locales and localize them before validating the parsedJSON (manifest.json)". See previous comment, point 2 for links to relevant source code locations if you'd like to take this approach. The disadvantage of this approach is that it requires code for every property (which increases the maintenance burden), but if you are somehow not able to implement the feature in a generic way, a specific way is better than nothing at all.

@aminomancer
Copy link
Member Author

aminomancer commented Dec 11, 2021

This is a link to Firefox's source code, not the linter. It's not necessary to read Firefox's source code to work on this patch (only the schemas, but those are regularly imported into this repository), merely an understanding of the localization feature is sufficient. Since you authored a localized extension, it's clear that you have at least some basic understanding of _locales/<lang>/messages.json, default_locale and localized properties.

(but if you are interested in getting a better understanding of Firefox's source code, or even contributing to it, then I'm willing to assist)

Yeah that makes sense. Thanks. I understand that the module I linked is unrelated to the linter, I was just wondering if it was significant that I couldn't find the preprocess property anywhere else. I guess it was just intended for future expansion or something. Actually I am pretty familiar with firefox's frontend code, but I have very little experience with node, so this is all new to me.

If you aren't able to get the feature to work via the ajv library, then an alternative way to solve the issue is to "manually" preprocess the manifest, before anything is passed to ajv at all. The logic would be something along the lines of "if default_locale is set, check if there are any properties that need localizing, and if so, read the locales and localize them before validating the parsedJSON (manifest.json)". See previous comment, point 2 for links to relevant source code locations if you'd like to take this approach. The disadvantage of this approach is that it requires code for every property (which increases the maintenance burden), but if you are somehow not able to implement the feature in a generic way, a specific way is better than nothing at all.

My first instinct would be to create some one-off logic for localized properties in the manifest, the way you described, since I don't need to learn a new library to do that. But that would add a lot of redundancy, especially if I intend to validate every message in every language (which seems like it would be mandatory for my patch to be accepted, I imagine it may be possible to abuse otherwise), and I think I get how it works for the deprecated property, so I'm trying to add a preprocess keyword to _addCustomKeywords. My basic understanding is that if I add a keyword "preprocess" to the validator, then when it hits a property in the schema definition with the preprocess property, it will use the validate function I pass to addKeyword?

validator.addKeyword('preprocess', {
  validate: function validate(
    message,
    propValue,
    schema,
    dataPath
  ) {
    console.log('message :>> ', message);
    console.log('propValue :>> ', propValue);
    console.log('schema :>> ', schema);
    console.log('dataPath :>> ', dataPath);
    return true;
  }
});

But when I try this for example, I get the console messages but the localized URL/component properties still get flagged as errors, presumably by something else. So I guess even if I make this validator function correctly, I'm still missing something crucial. Edit: to be clear, I'm not sure if this is because 1) the properties with preprocess are still being validated by something else in addition to my keyword validator, or 2) there's a timing issue. All I can see is my own console messages and the validation summary at the end. Not sure where else I would need to place console messages to find out. Tomorrow I'll try a more scattershot approach.

Aside from that, in principle I can make whatever translations need to be made for that particular property inside that validator function and only return true if every translation passes the relevant test. And I'm pretty sure by imitating the other source code, I can figure out how to retrieve schema information and all the localized messages for the property being validated. I'm just not really sure how to call the pattern testing functions from within a keyword validator function. The existing keywords in validator.js have much simpler validator functions than what is needed here, and they don't seem to perform any string test anyway.

I will need to retrieve the localized messages, then for each message, perform the same pattern/format validation that would otherwise have been applied to the __MSG_*__ string itself. But I don't know how to access that pattern/format validation yet. I can access the pattern string from the schema value passed to the validate function. But I'm sure their intention is not for me to recreate my own match function, so I must be missing something. I read most of the ajv API documentation but I'm still not sure how to access a particular test or pass the translated propValue to it.

@Rob--W
Copy link
Member

Rob--W commented Dec 13, 2021

This is a link to Firefox's source code, not the linter. It's not necessary to read Firefox's source code to work on this patch (...)
Yeah that makes sense. Thanks. I understand that the module I linked is unrelated to the linter, I was just wondering if it was significant that I couldn't find the preprocess property anywhere else.

Just to satisfy your curiosity:

  1. The Context class in Schemas.jsm defines a base implementation with a dummy implementation for preprocessors.localize that simply returns the input, at https://searchfox.org/mozilla-central/rev/5909d5b7f3e247dddff8229e9499db017eb438e2/toolkit/components/extensions/Schemas.jsm#383-387,400-404.
  2. When the manifest file is loaded with localization-awareness, the localize method is overridden, at https://searchfox.org/mozilla-central/rev/5909d5b7f3e247dddff8229e9499db017eb438e2/toolkit/components/extensions/Extension.jsm#949,952,958,962-965,967

If you aren't able to get the feature to work via the ajv library, then an alternative way to solve the issue is to "manually" preprocess the manifest, before anything is passed to ajv at all. The logic would be something along the lines of "if default_locale is set, check if there are any properties that need localizing, and if so, read the locales and localize them before validating the parsedJSON (manifest.json)". See previous comment, point 2 for links to relevant source code locations if you'd like to take this approach. The disadvantage of this approach is that it requires code for every property (which increases the maintenance burden), but if you are somehow not able to implement the feature in a generic way, a specific way is better than nothing at all.

My first instinct would be to create some one-off logic for localized properties in the manifest, the way you described, since I don't need to learn a new library to do that. But that would add a lot of redundancy, especially if I intend to validate every message in every language (which seems like it would be mandatory for my patch to be accepted, I imagine it may be possible to abuse otherwise), and I think I get how it works for the deprecated property, so I'm trying to add a preprocess keyword to _addCustomKeywords. My basic understanding is that if I add a keyword "preprocess" to the validator, then when it hits a property in the schema definition with the preprocess property, it will use the validate function I pass to addKeyword?

validator.addKeyword('preprocess', {
  validate: function validate(
    message,
    propValue,
    schema,
    dataPath
  ) {
    console.log('message :>> ', message);
    console.log('propValue :>> ', propValue);
    console.log('schema :>> ', schema);
    console.log('dataPath :>> ', dataPath);
    return true;
  }
});

But when I try this for example, I get the console messages but the localized URL/component properties still get flagged as errors, presumably by something else.

That's probably because the validator runs the validation logic for every defined keyword. Even if you define a custom keyword, then the default logic for pattern (in ajv) would still run.

I'm not sure if it is possible to disable validation of other properties.
One thing that I would explore is to use the modifier to return the (default) localization, and repeat the validation logic with the other localized versions.

@aminomancer
Copy link
Member Author

One thing that I would explore is to use the modifier to return the (default) localization, and repeat the validation logic with the other localized versions.

What do you mean by the modifier? I thought I could only return booleans. I can use a keyword validator to change what is passed to the ajv pattern validator?

@Rob--W
Copy link
Member

Rob--W commented Dec 15, 2021

One thing that I would explore is to use the modifier to return the (default) localization, and repeat the validation logic with the other localized versions.

What do you mean by the modifier? I thought I could only return booleans. I can use a keyword validator to change what is passed to the ajv pattern validator?

Point 4 of #3911 (comment). It's not just the brief answer there, but also the references to Github issues/comments at https://stackoverflow.com/questions/50876024/modify-data-using-ajv-for-json-schema#comments-50894529

For more documentation, see this section (which documents the feature for the ajv version that we are using): https://github.com/ajv-validator/ajv/tree/v6.12.6#user-content-defining-custom-keywords

@stale
Copy link

stale bot commented Jun 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this bug should stay open, please comment on the issue with further details. Thank you for your contributions.

@stale stale bot added the state: stale label Jun 18, 2023
@stale stale bot removed the state: stale label Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants