Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"flatpak install" doesn't use static deltas when downloading repository data #3412

Closed
clopez opened this issue Feb 12, 2020 · 7 comments · May be fixed by ostreedev/ostree#2005
Closed

"flatpak install" doesn't use static deltas when downloading repository data #3412

clopez opened this issue Feb 12, 2020 · 7 comments · May be fixed by ostreedev/ostree#2005

Comments

@clopez
Copy link

clopez commented Feb 12, 2020

Linux distribution and version

Debian 10 (Buster) and Fedora 31 (tested both)

Flatpak version

1.2.5 (Debian) and 1.4.3 (Fedora)

Description of the problem

Flatpak documentation advertises that you can generate a repository with static deltas to make downloads much faster.

From: https://docs.flatpak.org/en/latest/hosting-a-repository.html

Flatpak supports something called static deltas. These are single files that contain all the data needed to go between two revisions (or from nothing to a revision). Creating such deltas will take up more space on the server, but will make downloads much faster. This can be done with the flatpak build-update-repo --generate-static-deltas option.

From: https://blogs.gnome.org/alexl/2017/02/10/maintaining-a-flatpak-repository/

Additionally, as objects are whole files, we still have to download the entire file even if just one byte in the file changed.
In order to solve this flatpak supports something called static deltas. These are pre-generated delta files (using bsdiff) that contain all the data needed to go from one version to another (or from nothing to a version). If these files are available they are used instead of the individual objects, which allows pulls to be much more efficient.
You can generate static deltas for the latest versions of all apps by passing --generate-static-deltas to flatpak build-update-repo, and I recommend everyone do this for production repositories.

The problem is that the command flatpak install seems unable to use static deltas in practice when downloading the repository data, and it downloads all the objects one by one, making the download to be very slow for a repository that contains lot of small objects (lot of small files), even when the repository has enabled static deltas.

Downloading the very same repository with ostree pull instead of with flatpak install works as expected and static deltas are used, therefore making the download much faster.

Steps to reproduce

1. test: flatpak install (static deltas not working)

Try to download the following test repository with flatpak install and see how the download speed its slow (less than 2Mbytes/sec).

rm -fr /tmp/test-fp
mkdir /tmp/test-fp
export FLATPAK_USER_DIR=/tmp/test-fp
cd /tmp/test-fp
flatpak remote-add --user webkit-sdk https://software.igalia.com/flatpak-refs/webkit-sdk.flatpakrepo 
flatpak install --user webkit-sdk org.webkit.Sdk 0.1 --assumeyes --no-related

On top of the obvious slow download, you can verify that flatpak its downloading the (many small) individual objects instead of the deltas by checking the many HTTP petitions its doing, either with wireshark/tcpdump or by passing the environment variable OSTREE_DEBUG_HTTP=1 and counting the GET petitions it reports the debug log.

rm -fr /tmp/test-fp
mkdir /tmp/test-fp
export FLATPAK_USER_DIR=/tmp/test-fp
cd /tmp/test-fp
flatpak remote-add --user webkit-sdk https://software.igalia.com/flatpak-refs/webkit-sdk.flatpakrepo
OSTREE_DEBUG_HTTP=1 flatpak install --user webkit-sdk org.webkit.Sdk 0.1 --assumeyes --no-related 2>&1 | grep GET | wc -l

And the last command (wc -l) reports (after a long while): 85751 (that its, it did more than 85k GET petitions!, that its clearly not using deltas)

2. test: ostree pull (static deltas working)

Now let's do the same but instead of flatpak install let's use ostree pull to pull the very same repository.

rm -fr /tmp/test-fp
mkdir /tmp/test-fp
export FLATPAK_USER_DIR=/tmp/test-fp
cd /tmp/test-fp
flatpak remote-add --user webkit-sdk https://software.igalia.com/flatpak-refs/webkit-sdk.flatpakrepo
ostree --repo=repo pull webkit-sdk runtime/org.webkit.Sdk/x86_64/0.1

Now you can see the download speed much faster. Then you can also see how OSTree prints its downloading a few deltas only. On top of that you can verify its the case by debugging it like previously:

rm -fr /tmp/test-fp
mkdir /tmp/test-fp
export FLATPAK_USER_DIR=/tmp/test-fp
cd /tmp/test-fp
flatpak remote-add --user webkit-sdk https://software.igalia.com/flatpak-refs/webkit-sdk.flatpakrepo
OSTREE_DEBUG_HTTP=1 ostree --repo=repo pull webkit-sdk runtime/org.webkit.Sdk/x86_64/0.1 2>&1 | grep GET | wc -l

In this case (with ostree pull), the last command completes much faster and (wc -l) only reports: 295 GET petitions (because it used deltas for downloading)

This very same issue its reproducible with other flatpak repositories, like for example, the LibreOffice one on FlatHub, but there the issue its less obvious because the LibreOffice flatpak repository has much less objects than the one from this example.
The one from this example has 85744 objects with an average size of 10kb per object, meanwhile the LibreOffice one (app/org.libreoffice.LibreOffice/x86_64/stable) has "only" 5669 objects with an average size of 46Kb per object.

Downloading the objects individually instead of using the static deltas its a problem for this kind of flatpak repositories with so many small objects. I would like "flatpak install" to support downloading it using static deltas like "ostree pull" does.

@alexlarsson
Copy link
Member

What happens is that to be able to get good progress reporting for the extra data case we first pull just the commit, which is a OSTREE_REPO_PULL_FLAGS_COMMIT_ONLY call here:
https://github.com/flatpak/flatpak/blob/master/common/flatpak-dir.c#L4947

But, this triggers a suboptimal behaviour when we're pulling the entire thing later here:
https://github.com/ostreedev/ostree/blob/master/src/libostree/ostree-repo-pull.c#L3429

Basically instead of using the from-scratch delta it does a regular pull because it thinks we have a mostly up-to-date pull just because the ref exists in the local repo, even though we really only have the commit object...

@alexlarsson
Copy link
Member

The ostree change which causes this is:
ostreedev/ostree@e7305bb

which is in Ostree v2018.8.

The flatpak change seems to be 0007edb which is in 0.9.3.

alexlarsson added a commit to alexlarsson/flatpak that referenced this issue Feb 12, 2020
As noticed in flatpak#3412 we
regressed at some point and are no longer using from-scratch deltas.
This is caused by an optimization in ostree where it decides to not
use a from-scratch deltas if theres is *some* version of the ref
locally available.

This conflicts with some code in flatpak that pulls *only* the commit
object in order to look for extra data size information so that we can
get the progress reporting right. Unfortunately the existance of
just the object triggers the above causing us to *never* use from-scratch
deltas.

We fix this by throwing away the partial pull in an aborted ostree
transaction.
alexlarsson added a commit to alexlarsson/flatpak that referenced this issue Feb 12, 2020
As noticed in flatpak#3412 we
regressed at some point and are no longer using from-scratch deltas.
This is caused by an optimization in ostree where it decides to not
use a from-scratch deltas if theres is *some* version of the ref
locally available.

This conflicts with some code in flatpak that pulls *only* the commit
object in order to look for extra data size information so that we can
get the progress reporting right. Unfortunately the existance of
just the object triggers the above causing us to *never* use from-scratch
deltas.

We fix this by throwing away the partial pull in an aborted ostree
transaction.
alexlarsson added a commit to alexlarsson/flatpak that referenced this issue Feb 13, 2020
As noticed in flatpak#3412 we
regressed at some point and are no longer using from-scratch deltas.
This is caused by an optimization in ostree where it decides to not
use a from-scratch deltas if theres is *some* version of the ref
locally available.

This conflicts with some code in flatpak that pulls *only* the commit
object in order to look for extra data size information so that we can
get the progress reporting right. Unfortunately the existance of
just the object triggers the above causing us to *never* use from-scratch
deltas.

We fix this by throwing away the partial pull in an aborted ostree
transaction.
alexlarsson added a commit to alexlarsson/flatpak that referenced this issue Feb 13, 2020
As noticed in flatpak#3412 we
regressed at some point and are no longer using from-scratch deltas.
This is caused by an optimization in ostree where it decides to not
use a from-scratch deltas if theres is *some* version of the ref
locally available.

This conflicts with some code in flatpak that pulls *only* the commit
object in order to look for extra data size information so that we can
get the progress reporting right. Unfortunately the existance of
just the object triggers the above causing us to *never* use from-scratch
deltas.

We fix this by throwing away the partial pull in an aborted ostree
transaction.
alexlarsson added a commit that referenced this issue Feb 13, 2020
As noticed in #3412 we
regressed at some point and are no longer using from-scratch deltas.
This is caused by an optimization in ostree where it decides to not
use a from-scratch deltas if theres is *some* version of the ref
locally available.

This conflicts with some code in flatpak that pulls *only* the commit
object in order to look for extra data size information so that we can
get the progress reporting right. Unfortunately the existance of
just the object triggers the above causing us to *never* use from-scratch
deltas.

We fix this by throwing away the partial pull in an aborted ostree
transaction.
@alexlarsson
Copy link
Member

Fixed in 1.6.2

@clopez
Copy link
Author

clopez commented Feb 13, 2020

Thanks!
I have a question: Would this be back-ported to older stable branches like 1.4.x or 1.2.x?

@alexlarsson
Copy link
Member

Not sure, i mainly do backports for security things.

@nanonyme
Copy link
Contributor

@alexlarsson could you please at least consider the complexity of the backport? This most likely has a significant impact on Flathub network usage for runtime downloads.

@clopez
Copy link
Author

clopez commented Feb 14, 2020

I also think this is worth a backport. Not only the load on flathub would be reduced but the user-experience of new installs would be very much improved.
If this helps, I did a backport of the patch to 1.2.5 and built debian packages for debian 10. The backport was easy, just only one reject that was easy to handle

alexlarsson added a commit to alexlarsson/flatpak that referenced this issue Feb 14, 2020
As noticed in flatpak#3412 we
regressed at some point and are no longer using from-scratch deltas.
This is caused by an optimization in ostree where it decides to not
use a from-scratch deltas if theres is *some* version of the ref
locally available.

This conflicts with some code in flatpak that pulls *only* the commit
object in order to look for extra data size information so that we can
get the progress reporting right. Unfortunately the existance of
just the object triggers the above causing us to *never* use from-scratch
deltas.

We fix this by throwing away the partial pull in an aborted ostree
transaction.

(cherry picked from commit b371ef9)
alexlarsson added a commit that referenced this issue Feb 14, 2020
As noticed in #3412 we
regressed at some point and are no longer using from-scratch deltas.
This is caused by an optimization in ostree where it decides to not
use a from-scratch deltas if theres is *some* version of the ref
locally available.

This conflicts with some code in flatpak that pulls *only* the commit
object in order to look for extra data size information so that we can
get the progress reporting right. Unfortunately the existance of
just the object triggers the above causing us to *never* use from-scratch
deltas.

We fix this by throwing away the partial pull in an aborted ostree
transaction.

(cherry picked from commit b371ef9)
dbnicholson pushed a commit to endlessm/flatpak that referenced this issue Feb 18, 2020
As noticed in flatpak/flatpak#3412 we
regressed at some point and are no longer using from-scratch deltas.
This is caused by an optimization in ostree where it decides to not
use a from-scratch deltas if theres is *some* version of the ref
locally available.

This conflicts with some code in flatpak that pulls *only* the commit
object in order to look for extra data size information so that we can
get the progress reporting right. Unfortunately the existance of
just the object triggers the above causing us to *never* use from-scratch
deltas.

We fix this by throwing away the partial pull in an aborted ostree
transaction.

(cherry picked from commit b371ef9)

https://phabricator.endlessm.com/T29334
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants