Implementing Firefox Sync #1573

Closed
jerone opened this Issue Jun 24, 2012 · 26 comments

@jerone
Contributor
jerone commented Jun 24, 2012

Find a way to implement Firefox Sync to sync Greasemonkey settings and all user scripts.

https://developer.mozilla.org/en/Firefox_Sync
http://docs.services.mozilla.com/

Keep up the great work.

@arantius
Collaborator

https://groups.google.com/d/topic/greasemonkey-users/-pTrnYXrREg/discussion

It's been brought up before.

http://blog.mozilla.org/services/2011/06/15/enabling-quotas-for-firefox-sync/

The quota is probably 25MB. We have no data at all on how much space would be required for this feature.

@jerone
Contributor
jerone commented Jun 25, 2012

That's a valid problem. Without knowing how much space those scripts require in the first place, it's hard to know whether it's even remotely possible.

Is it possible to set up some statistics?

I found out that I'm using 4 MB of my total data (and I consider myself a heavy Firefox user), and all my scripts (30 of them) use 2 MB of space.

@anubeon
anubeon commented Jun 27, 2012

Surely there's no need to synchronise the actual scripts; they should be easy enough to reinstall. All that needs to be synchronised is a simple XML (or plain text) file detailing the locations of the scripts to be reinstalled, and perhaps (if applicable) additional XML files detailing each script's settings (which admittedly could cause problems along the lines you envision vis-à-vis sync quotas). Greasemonkey could then parse these files and reinstall where appropriate. I'd imagine that this is how Firefox Sync handles synchronising add-ons.

One problem I can foresee would be the fringe case of custom, locally stored scripts. I suppose then synchronising the actual script(s) would be necessary, unless Greasemonkey could nudge the user into publishing any local/unpublished scripts to userscripts.org (perhaps to a private space, though that depends on how charitable the webmasters at userscripts.org are feeling) for the purposes of synchronising (warning the user that any local scripts not published to userscripts.org will not be synchronised)

Just a few thoughts.

P.S.: It might be worth checking out how Stylish Sync (a companion extension for Stylish/Firefox) implements this. Would it make sense implementing this outside of the main Greasemonkey extension? Perhaps to serve as a companion extension to both Greasemonkey AND Scriptish (AND any other forks that may or may not arise).

@jerone
Contributor
jerone commented Jun 27, 2012

It does look like only a list of add-on locations is synced, rather than the add-ons themselves. The same system could be used for user scripts (those that are neither local-only nor altered).

Scripts altered by the user could be flagged with a boolean in the settings. If the script is altered and the boolean is set, the whole script should be synced instead of the install URL.

The problem of having different sets of user scripts in different locations could be solved with sets, or simply with a per-script checkbox that controls whether that script is synced.

Although this feature could be very handy, I understand that there are many different situations, each of which requires a different solution.
But I think in the cloud-based era we live in now, we can't ignore this.
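
A minimal sketch of the selection rule described above: send the install URL for unaltered scripts, the whole script for altered ones, and nothing when the per-script checkbox is off. The field names (`install_url`, `altered`, `sync_enabled`, `source`) and the payload shape are assumptions for illustration, not Greasemonkey's actual config schema. Treating a script with no install URL like an altered one is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Script:
    # All field names here are hypothetical, for illustration only.
    name: str
    install_url: Optional[str]
    source: str
    altered: bool = False       # set when the user edits the script locally
    sync_enabled: bool = True   # the per-script "sync this" checkbox


def build_sync_payload(script: Script) -> Optional[dict]:
    """Return what would be sent to Sync for one script, or None to skip it."""
    if not script.sync_enabled:
        return None
    if script.altered or script.install_url is None:
        # Altered (or URL-less) scripts cannot be re-downloaded, so send the content.
        return {"name": script.name, "kind": "content", "source": script.source}
    # Unaltered scripts only need their install URL.
    return {"name": script.name, "kind": "url", "url": script.install_url}
```

The appeal of this split is that the common case costs only a few bytes of quota per script, and only altered scripts carry their full content.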

@arantius
Collaborator
arantius commented Oct 4, 2012

FWIW I still think this is difficult to accomplish, but am personally more interested in it.

I'm thinking that a "good enough" solution (at first?) would be to sync the download URLs of installed scripts, and simply re-download them at each installation. Syncing GM preferences is straightforward because there are a small/fixed number of them. Then there are script preferences, which could potentially be extremely large.

@arantius
Collaborator
arantius commented Oct 7, 2012

What I'm thinking now is:

  1. Add an "installed by" field to scripts, either user (implied default when non existent) or sync (or update ?).
  2. Allow the config object (which specifies all the settings) to be exported to Firefox Sync. (This includes all applicable user settings; enabled/disabled status, user 'cludes, download url, etc.)
    • Only scripts that were installed by user. *1
    • But not scripts from file:// which will not be available elsewhere.
    • And not scripts that have been locally modified (see #3, their hashes will never match).
  3. When sending scripts to Sync, include a hash *2 of each installed file. When installing from Sync, download but only install if all files' hashes match.

*1 I think? Implications of this either way must be considered.
*2 Be super careful to do this crypto correctly! Hashing? And which algorithm? Should we really have a signature not a hash? Or is a simple content hash really enough for this usage?
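
A rough sketch of the check in step 3, assuming SHA-256 as the content hash. The algorithm is exactly the open question in footnote 2, so treat that choice as a placeholder. The function names and the `{file name: bytes}` shape are hypothetical.

```python
import hashlib
from typing import Dict


def file_hashes(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file name to the hex SHA-256 digest of its content."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def safe_to_install(downloaded: Dict[str, bytes], synced_hashes: Dict[str, str]) -> bool:
    """Install only if the downloaded file set and every digest match the synced record."""
    return file_hashes(downloaded) == synced_hashes
```

Comparing the whole mapping means a missing or extra file fails the check just like a changed one. This only detects that content differs from what was synced; it does not settle whether a signature is needed instead.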

Lots of other info to consider in https://developer.mozilla.org/en-US/docs/Firefox_Sync/JavaScript_Client_API including a request to reach out to the developers.

@arantius
Collaborator

I've reached out as suggested and the answer was (apparently not visible on the read only mirror of the list so pasted):

Random notes from scanning that thread:

Building a sync engine for Greasemonkey would be pretty straightforward if
you have hooks for script installation/uninstallation operations. The
add-ons engine is more complicated, because it has to handle non-restartless
add-ons and additional state.

• Yes, this is similar to add-on sync. That engine only syncs add-ons
available from AMO[0], so the content of the add-on doesn't need to be
stored. You're probably best off taking the same approach.
• See
http://dxr.mozilla.org/mozilla-central/services/sync/modules/engines/addons.js.html.
• You need either a single record (like we do with prefs) or one record
per “something” (probably per script). I suggest the latter. If you do the
latter, you need to name the records, but the names of each record should
not be identifiable — that is, if I can see Sync's HTTP logs, I shouldn't be
able to tell which user scripts you have installed. That rules out simple
hashing of public data.

All that said, you should be aware that we are part-way through two sets
of changes to Sync.

Firstly, we're implementing a next-generation protocol. This has some
crypto changes, but should be fairly transparent to engine code. (One nice
thing it will probably add is a 'salts' resource, so you can do things like
non-identifiable hashed names.)

Secondly, we're refactoring the internals to be callback-driven, and
eventually use a timestamp-based interface for retrieving changes, rather
than the current observer-based change-tracker approach. This stands a
decent chance of breaking third-party sync engines, so we'll be hand-holding
through that change.

These will probably both start to trickle out in Q1 or Q2 of next year
(but don't quote me on that).

If you structure your code appropriately, these changes should be
straightforward for you, so I wouldn't be too concerned about you developing
against the current API. The majority of your work will be in designing your
record format, choosing how to apply them, reconcile conflicts, handle
errors, track changes, etc.

@johan
Collaborator
johan commented Oct 15, 2012

Syncing URLs instead of script content sounds like the right thing to do, though it does create corner cases we'd have to think through carefully, e.g.: failing to download a synced script might cause a new "half-installed" script state, unless we can make things work by just failing that script and have it get re-attempted next sync (which may be problematic when installed scripts depend on the actions of other scripts – I still wish we'd ended up implementing some of the @depend stuff we discussed a few years ago, so that kind of metadata was visible to GM).

It sounds (by that reach-out) like our script records would have to be named something like SHA1(user_install_timestamp + download_url) (and sync the former, of course, rather than creating a new install timestamp for when the synced version finally got installed), so some non-public bit of data salts our sync name.
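
Spelled out, that naming idea might look like the sketch below. The function name and the exact way the two values are concatenated are assumptions; only the SHA1(user_install_timestamp + download_url) shape comes from the discussion.

```python
import hashlib


def record_id(user_install_timestamp: int, download_url: str) -> str:
    """Hex SHA-1 over the private install timestamp followed by the public URL."""
    material = f"{user_install_timestamp}{download_url}".encode("utf-8")
    return hashlib.sha1(material).hexdigest()
```

Because the timestamp is synced along with the record, every machine derives the same name for the same script. One caveat: a timestamp is a fairly guessable salt, so this is likely weaker than a random per-user value would be.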

If "Only scripts that were installed by user" would get their config object synced, would that imply that tweaks to that script's enablement status (for instance) done at another machine, wouldn't get synced back?

@arantius
Collaborator

e g: failing to download a synced script might cause a new "half-installed" script state

My refactored RemoteScript should handle this fine. Downloading and installing are separate steps. If the download fails the install won't start. And this might be a common case, e.g. I have scripts installed at work, from URLs that will only resolve on the corporate network and will otherwise fail to install.

... make things work by just failing that script and have it get re-attempted next sync

Which will be semi-bad if it will fail every time because that other computer you're syncing to is never (e.g.) at work, like in my above example. Probably a local failure counter, and a permanent give-up after N failures (in a row, without success)?
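
Roughly what such a counter could look like, as a sketch only. The class name, the default of 3 for N, and keying by download URL are all assumptions.

```python
class RetryTracker:
    """Counts consecutive download failures per script URL (local state, not synced)."""

    def __init__(self, max_failures: int = 3):  # the value of N is an assumption
        self.max_failures = max_failures
        self._failures = {}

    def record_failure(self, url: str) -> None:
        self._failures[url] = self._failures.get(url, 0) + 1

    def record_success(self, url: str) -> None:
        # A success resets the streak, so only failures "in a row" count.
        self._failures.pop(url, None)

    def should_attempt(self, url: str) -> bool:
        return self._failures.get(url, 0) < self.max_failures
```

Keeping the counter local matters here: the machine at work keeps installing fine, while the machine that can never resolve the URL stops retrying on its own.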

It sounds (by that reach-out) like our script records would have to be named something like SHA1(user_install_timestamp + download_url)

More reading and planning is definitely necessary to figure out the right way to do this. One possibility might just be waiting for the "salts service" which sounds like it handles doing this right.

If "Only scripts that were installed by user" would get their config object synced, would that imply that tweaks to that script's enablement status (for instance) done at another machine, wouldn't get synced back?

Yeah, that was probably a poor idea. Probably sync everything if sync is enabled.

@arantius
Collaborator

Moving this to the 1.6 milestone. I want to do #1651 before this.

  • I want to consider the privacy policy more, which is required for Sync, and a really good idea for stats gathering. (At least, for using the default servers Mozilla provides. https://developer.mozilla.org/en-US/docs/Firefox_Sync/JavaScript_Client_API "... Further, you agree (a) to maintain and link to (including on websites from which your Third Party Client may be downloaded) a separate, conspicuous, and reasonably detailed privacy policy detailing how data collected or transmitted by your Third Party Client is managed and protected ...")
  • I want to know some of the data that #1651 could gather, like number and size of scripts actually in use.
  • If Mozilla is going to provide a good/tested/working tool for doing salting properly, I'd prefer to use that. If we start with our own then that rolls out, migration would be difficult at best.
@SunRunAway

Hello, I think this feature is great, do you have a plan to implement it?

@arantius
Collaborator
arantius commented May 6, 2013

It's complex for a variety of reasons, so it keeps slipping, but yes. That's what this issue exists for.

@voltagex

All I can do is add a +1

@arantius
Collaborator

This has obviously been delayed quite a bit. I hope to get to it soonish.

I just parsed out the script and value sizes from the past two months of anonymous data submissions and:

Metric        Bytes
Script Mean   415693.577834
Script Median 105752.5
Script Max    60191412
Value Mean    98083.3976765
Value Median  0.0
Value Max     198395736

The standard sort of long tail distribution.

Bytes of scripts per user, graphed (note the log scale):

[Graph: bytes of scripts per user, log scale (screenshot from 2013-08-27 14:40:36)]

Worst case is ~60MB. All but 209 (of 61,375) fit within 5MB.

Bytes of values per user, graphed (again log scale):

[Graph: bytes of values per user, log scale (screenshot from 2013-08-27 14:43:43)]

Worst case is ~200MB. All but 176 fit in 5MB.


I went into this thinking something like:

  • Sync just the downloadURL of installed scripts.
  • When new script comes in from Sync and fails to install, sync that fact.
  • When a new installed script has been marked with "install failed" flag, then Sync the entire script and its contents.

Hopefully the majority of scripts are public and will just work. One extra round trip will sync the full content of private scripts across, for a minimum of used quota.
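
The three steps above can be sketched as an exchange between two machines. The record fields (`downloadURL`, `install_failed`, `content`) and both function names are hypothetical, and `try_install` stands in for whatever actually downloads and installs a script.

```python
from typing import Callable


def receiver_step(record: dict, try_install: Callable[[str], bool]) -> dict:
    """On the receiving machine: use synced content if present, else try the URL."""
    if record.get("content") is not None:
        # Full content arrived, so no download is needed.
        return {**record, "install_failed": False}
    ok = try_install(record["downloadURL"])
    return {**record, "install_failed": not ok}


def sender_step(record: dict, local_content: str) -> dict:
    """On the machine that has the script: attach content only after a reported failure."""
    if record.get("install_failed") and record.get("content") is None:
        return {**record, "content": local_content}
    return record
```

Public scripts never leave the URL-only path, so only the scripts that actually failed elsewhere cost any real quota.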

I personally have ~10MB of Sync quota used today, almost all of which is history. And the default quota on Mozilla's servers is 25MB (or at least it was ... where is this listed?).

But maybe for the 99% case simple sync of all content will just work. Disable sync when there's too much (>10MB?) combined script-and-value data? Hide that behind a preference in case heavy users have extra quota/their own Sync server? Looking at sum of script and value, 99.75% would fit in 10MB and 99.09% within 5MB.

If we disable Sync for some users due to a threshold, then we need to surface this in the UI somehow. And we need to be especially careful about users who start to Sync and then go over this limit: what happens there?

@arantius
Collaborator

Or possibly: ordered by total size (script + resources + values), sync as many scripts as will fit within a limit? Hide the limit in an about:config entry that can be tweaked?
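
As a sketch of that idea, assuming "ordered by total size" means smallest first (which fits the most scripts under the limit). The function name is hypothetical, and the sizes passed in are taken to be script + resources + values already summed.

```python
from typing import Dict, List


def pick_scripts_to_sync(sizes: Dict[str, int], limit_bytes: int) -> List[str]:
    """Take scripts smallest-first until the next one would exceed the limit."""
    chosen = []
    used = 0
    # Sort by size, then by name so that ties are resolved the same way everywhere.
    for name, size in sorted(sizes.items(), key=lambda item: (item[1], item[0])):
        if used + size > limit_bytes:
            break  # everything after this is at least as large
        chosen.append(name)
        used += size
    return chosen
```

For example, with sizes of 100 KB, 500 KB, 4 MB and 7 MB and a 5 MB limit, the first three are synced and the 7 MB script is left out.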

@jerone
Contributor
jerone commented Aug 29, 2013

My preference would go to:

  • Sync just the downloadURL for all not-altered scripts.
  • Sync whole content folder for all altered scripts.

I'm thinking that the majority of the scripts are not altered.

When syncing for the first time, I would show a clear message saying that only X amount of data can be synced.
When a new script is added or an existing script is altered, re-count the total size and show a message when it exceeds the max size.

For users still exceeding the max size, a checkbox list should do: let the user decide which scripts to sync.

I also read that the limit is about 25 MB. People can request an increase, so making the limit configurable is better.

@rnewman
rnewman commented Sep 27, 2013

For add-ons, we explicitly only sync those hosted on AMO, precisely because they're world-reachable. You probably want to follow a similar pattern: sync URLs for the large subset of scripts that are hosted somewhere, and then think about whether you want to basically rebuild Dropbox for the rest!

You shouldn't try to skate close to the quota — aim for all users fitting under a MB, with most well below that, and you will avoid a bunch of worries. Even if one doesn't hit a quota, storing and moving data is best avoided.

@arantius arantius added a commit to arantius/greasemonkey that referenced this issue Sep 27, 2013
@arantius arantius Basic support for Firefox Sync of User Scripts.
Refs #1573
5baeac4
@arantius arantius added a commit to arantius/greasemonkey that referenced this issue Oct 2, 2013
@arantius arantius Add Sync for script values.
Hidden behind a default-off preference.

Refs #1573
e9d947c
@arantius
Collaborator
arantius commented Oct 4, 2013

I recently got a Sync Error bar and the Browser Console says:

[20:45:17.017] 1380847517017 Sync.Store.Greasemonkey WARN Encountered exception: Component returned failure code: 0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH) [nsIChannel.asyncOpen] Stack trace: resource://greasemonkey/remoteScript.js:531 < resource://greasemonkey/remoteScript.js:295 < resource://greasemonkey/remoteScript.js:273 < resource://greasemonkey/sync.js:85 < Store_applyIncoming()@resource://services-sync/engines.js:267 < applyIncomingBatch()@resource://services-sync/engines.js:235 < doApplyBatch()@resource://services-sync/engines.js:814 < resource://services-sync/engines.js:925 < resource://services-sync/record.js:625 < Channel_onDataAvail()@resource://gre/modules/services-sync/resource.js:542 < file:unknown

No more debugging info available.

@arantius
Collaborator
arantius commented Oct 8, 2013

Got a bunch of Sync errors in 2013.10.03 nightly. https://gist.github.com/arantius/6878488

@arantius arantius added a commit to arantius/greasemonkey that referenced this issue Oct 10, 2013
@arantius arantius Add user 'cludes to Sync.
Refs #1573
974f14a
@arantius
Collaborator

Still needs testing. Might cause failures/unexpected behaviors if you have a Sync'ed script that is temporarily not accessible when Sync wants to add it. Still off by default for testing.

But scripts sync, by URL. As do user includes/excludes and enabled status. Done enough for now.

@arantius arantius added a commit to arantius/greasemonkey that referenced this issue Oct 17, 2013
@arantius arantius Sync enabled status of GM Sync and other prefs.
Refs #1573
2ed3b9a
@arantius arantius closed this Oct 17, 2013
@bakkerthehacker

This still has some issues involving Sync + master password. When Sync and master password are both enabled and Greasemonkey is installed, the master password window appears immediately when Firefox is launched. This occurs even if the GM sync check box is left unselected.

This can have odd and far-reaching issues. I only found out about this because I was using the Master Password+ add-on here: (https://addons.mozilla.org/en-US/firefox/addon/master-password). I set MP+ to prompt for a password when I started Firefox. But since GM was also prompting, the two conflicted and Firefox refused to start.

It seems to me that sync is not fully disabled by GM and it is causing issues when GM tries to obtain the sync encryption passwords.

@rnewman
rnewman commented Dec 27, 2013

Attempting to retrieve credentials will trigger a MP prompt.

Generally you should be watching for observer notifications — e.g., weave:engine:start-tracking.

If you're following this pattern, things should work fine:

https://hg.mozilla.org/mozilla-central/file/default/services/sync/Weave.js#l30

@arantius
Collaborator

Thanks, this is #1847 .

@arantius
Collaborator
arantius commented Jan 3, 2014

Whoops, no. It is not #1847. Have reported #1852 to track the fix.

@arantius
Collaborator
arantius commented Jan 3, 2014

@rnewman does "this pattern" refer to the import action of various scripts inside resource://services-sync/? Otherwise, as far as I can tell we do already exactly that:

https://github.com/greasemonkey/greasemonkey/blob/master/modules/sync.js#L27

Check for .ready and if it is not truthy, add an observer over weave:service:ready (and wait for it to fire) before we register our engine with the weave service.

Please comment over at #1852.
