Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add middleware to override storage class. #666

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

gaul
Copy link
Owner

@gaul gaul commented Jul 20, 2024

This is best-effort and some storage classes do not map exactly, particularly for non-S3 object stores. Fixes #625.

@gaul gaul requested a review from timuralp July 20, 2024 14:36
@gaul
Copy link
Owner Author

gaul commented Jul 20, 2024

@michaelcourcy could you take a look at this to see if it meets your needs?

@michaelcourcy
Copy link

Hi @gaul, is there any reason that you chose the word tier instead of storageClass in the way you name your constant and class ?

public static final String PROPERTY_TIER_BLOBSTORE = "s3proxy.tier-blobstore";
public final class TierBlobStore extends ForwardingBlobStore

Maybe tier relates to something else, something more global that make sense in the context of s3proxy ?

@michaelcourcy
Copy link

michaelcourcy commented Jul 21, 2024

Ok I understand S3 is a more specific use case of jcloud. org.jclouds.blobstore.domain.Tier map to org.jclouds.s3.domain.StorageClass.

ARCHIVE -> REDUCED_REDUDANCY
INFREQUENT -> STANDARD_IA
STANDARD -> STANDARD

@gaul
Copy link
Owner Author

gaul commented Jul 21, 2024

Using Tier is probably confusing to users -- perhaps the user-facing interface should only reference S3 storage classes? I imagine most users will use S3-S3 mapping although S3-other object store will probably require understanding the jclouds concepts. BTW the jclouds ARCHIVE tier actually maps to S3 deep archive storage class. Unfortunately there will be some lossy conversions unless we add further support to jclouds.

@michaelcourcy
Copy link

BTW the jclouds ARCHIVE tier actually maps to S3 deep archive storage class.

Ah I missed the mapping code

         case STANDARD: return StorageClass.STANDARD;
         case INFREQUENT: return StorageClass.STANDARD_IA;
         case COOL: return StorageClass.STANDARD_IA;
         case COLD: return StorageClass.GLACIER_IR;
         case ARCHIVE: return StorageClass.DEEP_ARCHIVE;

Unfortunately there will be some lossy conversions unless we add further support to jclouds.

It's already a lot and far enough for starting this new feature IMO.

I will try to test it against AWS S3 and then later against CEPH. It's been a long time I haven't build a maven project.

@gaul
Copy link
Owner Author

gaul commented Jul 21, 2024

Thinking about this some more, anything that requires reading the source code is a non-starter. I think this would work better if the user specifies the desired storage class in S3 terms and S3Proxy should log when it does a lossy conversion. If we round-trip the StorageClass -> Tier -> StorageClass we have lossless conversions for:

      STANDARD(Tier.STANDARD),
      STANDARD_IA(Tier.COOL),
      GLACIER_IR(Tier.COLD),
      DEEP_ARCHIVE(Tier.ARCHIVE);

And lossy conversions for:

      INTELLIGENT_TIERING(Tier.STANDARD),
      REDUCED_REDUNDANCY(Tier.STANDARD),
      ONEZONE_IA(Tier.COOL),
      GLACIER(Tier.ARCHIVE),

Logging the conversions makes even more sense for non-S3 object stores. But I imagine most users will use S3 and be happy with the lossless mapping of the first four storage classes.

@michaelcourcy
Copy link

michaelcourcy commented Jul 21, 2024

I think this would work better if the user specifies the desired storage class in S3 terms

If user proxy against a s3 for sure but if he proxies against Azure or GCP then this s3 mapping may be confusing and against the spirit of the project ? (I'm asking I'm really not sure)

anything that requires reading the source code is a non-starter.

Sure but we'll add a wiki page

@michaelcourcy
Copy link

@gaul My test fails but maybe I missed the configuration.

I checkout tier-blobstore branch and build

mvn package

s3proxy.properties I introduce s3proxy.tier-blobstore=INFREQUENT

s3proxy.tier-blobstore=INFREQUENT
s3proxy.endpoint=http://127.0.0.1:8080
s3proxy.authorization=aws-v2-or-v4
s3proxy.identity=local-identity
s3proxy.credential=local-credential 
jclouds.provider=aws-s3
jclouds.identity=<redacted>
jclouds.credential=<redacted>

launch s3proxy ./s3proxy --properties s3proxy.properties

then check my configuration works

export AWS_ACCESS_KEY_ID=local-identity
export AWS_SECRET_ACCESS_KEY=local-credential
aws s3 --endpoint-url http://127.0.0.1:8080  ls s3://mcourcy-testia/
                           PRE k10/
                           PRE mic/

And try to put a file

echo "put me in test" > test.txt
aws s3 --endpoint-url http://127.0.0.1:8080 cp test.txt s3://mcourcy-testia/test.txt

But when I check the storageclass of test.txt it's STANDARD instead of STANDARD_IA

aws  s3api head-object --endpoint-url http://127.0.0.1:8080 --bucket mcourcy-testia --key test.txt

I get

aws  s3api head-object --endpoint-url http://127.0.0.1:8080 --bucket mcourcy-testia --key test.txt
{
    "LastModified": "2024-07-21T14:52:10+00:00",
    "ContentLength": 15,
    "ETag": "\"a99a4b4b9e8204c1489f9f37dd21ec1f\"",
    "ContentEncoding": "",
    "ContentType": "text/plain",
    "Metadata": {},
    "StorageClass": "STANDARD"
}

I confirm by checking in AWS console
image

@michaelcourcy
Copy link

Also tried with

s3proxy.tier-blobstore=COOL

But did not work.

@gaul
Copy link
Owner Author

gaul commented Jul 21, 2024

Sorry please try the latest proposed PR -- previously it lacked the changes to Main.java to actually accept the configuration.

@michaelcourcy
Copy link

It works now !
Here is the configuration I used

s3proxy.endpoint=http://127.0.0.1:8080
s3proxy.authorization=aws-v2-or-v4
s3proxy.identity=local-identity
s3proxy.credential=local-credential
s3proxy.tier-blobstore=INFREQUENT
jclouds.provider=aws-s3
jclouds.identity=<redacted>
jclouds.credential=<redacted>

@michaelcourcy
Copy link

I will test it against CEPH S3.

I created this wiki page to document it https://github.com/gaul/s3proxy/wiki/Middleware-Tier-blobstore

Copy link

@michaelcourcy michaelcourcy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the little change I suggest in the comments but not a blocker IMO.

* s3proxy.tier-blobstore = VALUE
*
* VALUE can be anything from org.jclouds.blobstore.domain.Tier, e.g.,
* STANRDARD, COOL, COLD, ARCHIVE. org.jclouds.s3.domain.StorageClass

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I use COOL in the configuration

s3proxy.tier-blobstore=COOL

I get this error message

Exception in thread "main" java.lang.IllegalArgumentException: No enum constant org.jclouds.blobstore.domain.Tier.COOL
        at java.base/java.lang.Enum.valueOf(Enum.java:293)
        at org.jclouds.blobstore.domain.Tier.valueOf(Tier.java:24)
        at org.gaul.s3proxy.Main.parseMiddlewareProperties(Main.java:278)
        at org.gaul.s3proxy.Main.main(Main.java:125)

Look like we are limited by the enum values of org.jclouds.blobstore.domain.Tier which are

STANDARD
INFREQUENT
ARCHIVE

I would suggest to change this comment accordingly unless I pick up the wrong version of jcloud in the maven dependencies.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely that's what happened in this version there is COOL and COLD
https://github.com/apache/jclouds/blob/master/blobstore/src/main/java/org/jclouds/blobstore/domain/Tier.java
I did not figure out why when I built s3proxy but that does not matter please ignore my comment.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jclouds 2.6.0 only has a few Tier constants but 2.7.0-SNAPSHOT includes a wider variety.

@gaul gaul changed the title Add TierBlobStore to override Tier during putBlob Add middleware to override storage class. Jul 22, 2024
@gaul
Copy link
Owner Author

gaul commented Jul 22, 2024

This now maps and logs S3 storage classes instead of jclouds Tiers as previously discussed.

blobStore = storageClassBlobStore;
System.err.println("Requested tier: tier");
System.err.println("Overriden to: " +
StorageClass.fromTier(storageClassBlobStore.getTier()));
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works for S3-S3 mappings and is confusing for other storage backends.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wiki accordingly.

This is best-effort and some storage classes do not map exactly,
particularly for non-S3 object stores.  Fixes #625.
Copy link
Collaborator

@timuralp timuralp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Unfortunate that we're limited by the jclouds set of storage classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a middleware to change the storageclass
3 participants