
object storage #4445

Open
uhthomas opened this issue Oct 11, 2023 · 21 comments
Labels
enhancement New feature or request

Comments

@uhthomas
Member

Object storage support has been widely requested (#1683) and is something we're keen to support. The limitations imposed by object storage happen to be beneficial for data resilience and consistency, as they make features like the storage template infeasible. Issues like orphaned assets (#2877) or asset availability (#4442) would be resolved completely.

As discussed on the orphaned assets issue (#2877), I'd like to propose a new storage layout designed for object storage, with scalability and resilience as priorities.

.
└── <asset id>/
    ├── <original asset filename>
    ├── sidecar.xml
    └── thumbnails/
        ├── small.jpg
        └── large.webp

Where:

  • <asset id> is a unique ID for an asset, ideally a random UUID. UUIDv7 may be beneficial due to its natural ordering; if not, UUIDv4 should be sufficient.
  • <original asset filename> is the original filename of an asset, as it was uploaded.

The above structure should scale efficiently while remaining resilient and flexible. The unique 'directory' for an asset can contain additional files like edits, colour profiles, thumbnails or anything else.
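As an illustration only (not part of the proposal itself), the layout maps naturally onto flat object keys, since in object storage a "directory" is just a shared key prefix. The `objectKeys` helper below is hypothetical:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: derive the object keys for one asset under the
// proposed layout. The "directory" is just a shared key prefix.
function objectKeys(assetId: string, originalFilename: string) {
  return {
    original: `${assetId}/${originalFilename}`,
    sidecar: `${assetId}/sidecar.xml`,
    thumbnail: (name: string) => `${assetId}/thumbnails/${name}`,
  };
}

const keys = objectKeys(randomUUID(), "IMG_0001.HEIC");
// keys.original is "<asset id>/IMG_0001.HEIC"
// keys.thumbnail("small.jpg") is "<asset id>/thumbnails/small.jpg"
```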

The original file and filename are preserved in case of an unlikely full database loss, in which scenario it should be possible to restore most information. This property is also good for humans, or if a full export is required. A directory of vague filenames without extensions would be quite unhelpful. I feel this strikes a good balance between legibility and resilience.

I have also considered content-addressable storage (CAS), as it would save space in the event of duplicate uploads, but consider it impractical due to complexity and the previous concern of legibility. I believe this should instead be deferred to the underlying storage provider, which can make much better decisions about how to store opaque binary blobs.

Part of this effort will require some changes to the storage interfaces (#1011), and the actual object storage implementation should use the AWS S3 SDK (docs). Most, if not all, object storage systems have an S3-compatible API.
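For illustration, a hedged sketch of what an upload might look like; the bucket name and helper here are assumptions, not the actual implementation. The request parameters can be built independently of the SDK call:

```typescript
// Hypothetical helper: build the PutObject parameters for an asset under
// the proposed layout. "immich" as a bucket name is a placeholder.
function putObjectParams(
  bucket: string,
  assetId: string,
  originalFilename: string,
  body: Uint8Array | string,
) {
  return {
    Bucket: bucket,
    Key: `${assetId}/${originalFilename}`,
    Body: body,
  };
}

// With @aws-sdk/client-s3 (SDK v3) the upload would then be, roughly:
//
//   import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
//   const s3 = new S3Client({ region: "us-east-1" });
//   await s3.send(new PutObjectCommand(putObjectParams("immich", id, name, data)));
//
// S3-compatible providers are targeted by pointing the client at a custom
// `endpoint` in the S3Client configuration.
```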

@tonya11en

The advantages aren't clear to me after reading; can you elaborate? The pitch seems to be that it fixes orphaned assets and availability during storage migrations, but I'm failing to see how object storage fixes this as opposed to anything else (storing photos in the DB, a flat filesystem indexed by the DB, etc.)

@uhthomas
Member Author

uhthomas commented Nov 8, 2023

@tonya11en This is possible with a regular file system, but there has been a lot of pushback against implementing the proposal for it. The current model is fundamentally incompatible with object storage, so the proposed safe and efficient structure is required.

It may be possible to introduce a configuration option to completely disable storage migration and use this proposal for block storage too, but I am not sure it's worth the confusion right now. I'd much rather implement object storage and gather feedback.

I have started work on this, so hopefully I can show it soon.

@jrasm91
Contributor

jrasm91 commented Nov 9, 2023

I think the TL;DR is that if you never move the file after it is uploaded, you get a simpler system.

It has been discussed several times before and we have no immediate plans to drop support for the storage template feature.

@pinpox

pinpox commented Dec 26, 2023

Apart from the technical benefits, object storage can be rented much more cheaply from providers like Backblaze, scales without re-partitioning drives, and has become fairly standard for these applications, as it makes deployment in clouds a lot easier and cleaner.

I'm eagerly awaiting S3 support in Immich so I can migrate all my photos. Currently running a self-hosted Nextcloud instance on a small VPS with external S3 storage via Backblaze.

So, TL;DR: please strongly consider adding native support 🙂

@uhthomas
Member Author

#5917 will help, as it allows storage migration to be disabled.

@janbuchar

janbuchar commented Dec 31, 2023

To sum up discussion from Discord:

  • @zackpollard mentioned that having many nested folders would make listing all files in the library slow on HDDs
    • however, it is desirable to have the same directory layout for both object storage and local storage
    • not just for simplicity's sake, I can see myself wanting to move my library from rclone-mounted S3 to native S3
    • listing all files is done quite often, for example on the Repair page in the administration
  • renaming, however, is slow (and potentially expensive) in cloud storage - it's always copy+delete
    • the current directory layout needs to move every single uploaded file though
  • disabling storage template migration for cloud storage seems like a reasonable thing to do
  • wouldn't a middle ground approach where we store uploaded files in the local storage and upload them to cloud storage once metadata are extracted be sufficient? (I believe this was not answered by the team)
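To make the copy+delete point concrete: S3-style APIs have no rename primitive, so a move is a server-side CopyObject followed by a DeleteObject. A hedged sketch (bucket and keys are placeholders):

```typescript
// Hypothetical sketch: the two requests behind a "rename" in S3-style
// object storage. Note CopySource is "<source bucket>/<source key>".
function renameCommands(bucket: string, fromKey: string, toKey: string) {
  return {
    copy: { Bucket: bucket, CopySource: `${bucket}/${fromKey}`, Key: toKey },
    del: { Bucket: bucket, Key: fromKey },
  };
}

// With @aws-sdk/client-s3 this would run as, roughly:
//   await s3.send(new CopyObjectCommand(cmds.copy));
//   await s3.send(new DeleteObjectCommand(cmds.del));
// Two billable requests (plus a full data copy for large objects) per moved
// file, which is why a layout that never moves files is attractive here.
```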

It is evident that there is pushback from the team against radical changes to the directory layout that may hinder performance. What would an MVP for cloud storage support look like?

@uhthomas
Member Author

uhthomas commented Jan 2, 2024

listing all files is done quite often

I would argue this is not the case: listing files is not a normal part of Immich's operation. It is only used for the repair page, which should run infrequently (if ever). There was also discussion of backups and how they may take a while, but I would also argue those should be an infrequent operation. Regardless, it seems important to some users, so we should try to optimise for this case. @bo0tzz proposed we move forward with an object storage implementation and answer some of these questions later, so as to make progress, which I agree with.

wouldn't a middle ground approach where we store uploaded files in the local storage and upload them to cloud storage once metadata are extracted be sufficient? (I believe this was not answered by the team)

I don't think this would be sensible. The whole point of object storage support is to be fast and reliable. We should try to understand how to read directly from object storage rather than add additional complexity (i.e. persisting things in multiple places).

@janbuchar

listing all files is done quite often

I would argue this is not the case at all and listing files is not a normal part of operation for Immich at all.

I can't be the judge of that, but it looks like there is no consensus about this among the developers, so a conservative approach seems correct.

wouldn't a middle ground approach where we store uploaded files in the local storage and upload them to cloud storage once metadata are extracted be sufficient? (I believe this was not answered by the team)

I don't think this would be sensible. The whole point of object storage support is to be fast and reliable. We should try to understand how to read directly from object storage rather than add additional complexity (i.e persisting things in multiple places).

I believe that this complexity is inherent to the problem though. Object storage can be the long-term destination for the assets, and assets can be delivered directly from there. However, operations such as metadata extraction and thumbnail generation work with the local filesystem and it would be difficult to change that.

@bo0tzz
Member

bo0tzz commented Jan 2, 2024

However, operations such as metadata extraction and thumbnail generation work with the local filesystem and it would be difficult to change that.

This is true, but I think the best approach there would be for the microservices instances to keep a cache folder that they download files into, rather than having files go to local storage first before being uploaded to S3.

@janbuchar

However, operations such as metadata extraction and thumbnail generation work with the local filesystem and it would be difficult to change that.

This is true, but I think the best approach there would be for the microservices instances to keep a cache folder that they download files into, rather than having files go to local storage -first- before being uploaded to S3.

If I understand correctly, the two proposed ways of operation for the microservices are very similar: check whether the target asset is present in the local filesystem (it doesn't matter if we call it a cache) and, if not, fetch it from the object storage. Then proceed with whatever the microservice does.

If the uploads folder is on the local filesystem, we 1) save ourselves one roundtrip to the object storage and 2) won't need to rename the uploaded file after we extract the metadata. What are the advantages of the object-storage-first approach?

@bo0tzz
Member

bo0tzz commented Jan 3, 2024

The advantage is consistency, knowing for a fact that if an asset is in Immich, it is absolutely also in the object storage. It also means that the server and microservices instances can be decoupled further, no longer requiring a shared filesystem.

@janbuchar

The advantage is consistency, knowing for a fact that if an asset is in Immich, it is absolutely also in the object storage. It also means that the server and microservices instances can be decoupled further, no longer requiring a shared filesystem.

Fair enough. What would be the way forward with object storage support though?

  • use the proposed storage for both object storage and local filesystem, ignoring the performance concerns?
  • have a different storage layout for object storage and local filesystem?
  • something entirely different?

@bo0tzz
Member

bo0tzz commented Jan 4, 2024

The past few days have seen significant discussion of the object storage topic amongst the maintainer team. There's no full consensus yet, but one thing that seems clear is that there will be a need for significant refactoring around how we store and handle files before object storage can be approached directly. That means things like abstracting the current (filesystem) storage backend behind a common interface and using file streams throughout the code base rather than accessing paths directly. (I'll let @jrasm91 chime in on what other refactors might be needed.)

@aries1980

As a workaround, maybe https://github.com/efrecon/docker-s3fs-client could help? Has anyone tried mounting S3 with FUSE?

@janbuchar

As a workaround, maybe https://github.com/efrecon/docker-s3fs-client could help? Has anyone tried mounting S3 with FUSE?

I currently run Immich with the rclone Docker volume driver and it is perfectly usable.

@LawyZheng

Is it possible to support S3 storage as an external library?
Something went wrong when I tried to use s3fs-client to share the volume between my host and container.
So maybe embed rclone/s3fs in the Docker image?
Use rclone to mount the S3 bucket as a local folder, and the rest would work the same.

@Underknowledge

As a workaround, maybe https://github.com/efrecon/docker-s3fs-client could help? Has anyone tried mounting S3 with FUSE?

I guess it could work, but I'd say the real benefit of using S3 is having the files on remote S3 storage.
You could, for example, use presigned URLs to avoid piping the data through the Immich instance (where I host the instance I have only 8 Mbit upload).
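For reference, a presigned URL is generated server-side and lets the client download straight from the bucket. A hedged sketch, with the bucket, key and expiry as placeholders:

```typescript
// Hypothetical helper: collect what is needed to presign a direct download.
function presignRequest(bucket: string, key: string, expiresIn = 3600) {
  if (expiresIn <= 0) throw new Error("expiry must be positive");
  return { command: { Bucket: bucket, Key: key }, options: { expiresIn } };
}

// With the AWS SDK v3 this becomes, roughly:
//
//   import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
//   import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
//
//   const req = presignRequest("immich", "<asset id>/IMG_0001.HEIC");
//   const url = await getSignedUrl(s3, new GetObjectCommand(req.command), req.options);
//
// The URL expires after `expiresIn` seconds, and the asset bytes never pass
// through the Immich server.
```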

@xangelix

xangelix commented May 12, 2024

For anyone using FUSE mount options, please consider https://github.com/yandex-cloud/geesefs.
It should be dramatically faster and much more POSIX-compatible.

Hoping for official support though! FUSE is always less than ideal.

@mdafer

mdafer commented May 12, 2024

For all using FUSE mount options-- please consider https://github.com/yandex-cloud/geesefs It should be dramatically faster and dramatically more posix compatible.

Hoping for official support though! FUSE is always very un-ideal.

Thanks for the suggestion. I configured the rclone volume plugin yesterday and it was not usable at all: most thumbnails were missing and many original files were either missing or corrupted...

I'm gonna try this one today based on your suggestion.

Really looking forward to having native S3-compatible storage support!

Thank you Immich team for this amazing software :)

@dislazy

dislazy commented May 13, 2024

Immich is indeed amazing software and the experience is very good. We live in the cloud era and always want more backups, so support for S3 (or S3-compatible object storage) feels like a great fit, and it can also effectively prevent data loss.

@pinpox

pinpox commented May 13, 2024

Immich team: I would be willing to contribute time or money to this feature, since S3 support is something I need personally. Is there a roadmap for this? Could it be broken up into tasks I can tackle as a contributor?
If this is something you as a team would rather implement internally, would it be possible to set up a bounty or similar specifically for this?

I would love to help out with this, let me know how to make it possible!
