Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 Implementation #267

Closed

Conversation

duyquangnguyenhac
Copy link
Contributor

@duyquangnguyenhac duyquangnguyenhac commented Jun 5, 2020

#Description:
With consideration of the past S3-related commits, I opened this pull request in order to work on it full time for this year's GSOC program. As of now, the work is not done, but I've created a mock S3 storage client based on the current storage implementation with all of the same methods, but with S3 implementation. This is so I could understand the concept of the task at hand, and make sure I'm heading towards the right direction.

The S3 class created is mostly functional with several bugs to be fixed, there are several RSpec tests I've made to ensure its functionality. If you would like to run those RSpec tests, you can by providing your own credentials manually in the default config section. This is the way to do it as of now because I haven't done any changes to the CLI. Eventually, the tests will move away from VCR because VCR testings are not exactly optimal. The class can also successfully run with gem_pusher tests with small fixes to the expected parameters.

#Bugs at hand

  • The @content uploaded to the S3 servers are encoded in UTF-8 which is different from the binary encoding on the filesystem files which causes the tests on modules such as the gem_pusher to fail.
  • Issues accessing the gemstash_env variables for S3 storage.

#Features to be implemented:

  • Implementation into the CLI
  • In-depth tests for the classes
  • Abstractualized storage class as presented Abstract local storage #264
  • Stub-testing instead of VCR

I will abide by the code of conduct.

James added 10 commits June 5, 2020 10:30
I manually use gemstash_env config to store AWS credentials. This is temporary, I will change it when I get to modify CLI.
Similar to the original storage.rb resource class
Since S3 doesn't have proper directory structure, I use the object name to mimic a similar directory structure on S3 storage. The directory won't exist until there is an resource stored under the same directory path.
I also complete the methods similar to the original methods with S3 implementation.
The tests is now using vcr gem which makes actual API calls and record the response that was made. If you want to run these tests, provide your own credentials and it will make actual changes on your AWS bucket to test whether the methods work.
Remove pry and byebug
Right now the tests uses VCR, but I know that is not optimal. I'll change it later.
@@ -3,3 +3,7 @@
source "https://rubygems.org"

gemspec

gem 'aws-sdk-s3', '~> 1'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to lock to ~> 1.67

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think this should be now a runtime dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0944f59

@@ -27,4 +27,6 @@ module Gemstash
autoload :Web, "gemstash/web"
autoload :WebError, "gemstash/http_client"
autoload :VERSION, "gemstash/version"
autoload :S3, "gemstash/S3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why same name for different paths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because s3 class and s3 resource class have the same location in s3_storage.rb
Refractored in 8c8713b

extend Gemstash::Env::Helper
attr_reader :folder, :client, :bucket

VERSION = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this STORAGE_VERSION?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 8c8713b

attr_reader :folder, :client, :bucket

VERSION = 1
class VersionTooNew < StandardError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call it VersionTooNewError ?

VERSION = 1
class VersionTooNew < StandardError
def initialize(folder, version)
super("Gemstash storage version #{Gemstash::S3::VERSION} does " \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this error means?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take a look at the initialization at line 23, it is used to make sure that when the S3 Storage class is originally initialized, the resource version is checked. If root = false, or if the initialization isn't the original initialization, but merely adding on to a previous initialization, then it is not checked. I'm unsure how the addition of a root variable helps, but I think it improves performance so that you don't have to check the version if it isn't the root initialization.


begin
@S3resource.object(content_filename(key)).delete
rescue StandardError => e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to catch the specific error that @S3resource.object(content_filename(key)).delete can cause so we don't so we can rescue properly

Copy link
Contributor Author

@duyquangnguyenhac duyquangnguyenhac Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit #3278315


begin
@S3resource.object(properties_filename).delete unless content?
rescue StandardError => e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved above ^

return if @properties && !force
return unless @S3resource.object(properties_filename).exists?

properties_file = @S3resource.object(properties_filename).get.body
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename this to properties_content since is not a file but the body

end

def save_file(filename)
content = yield
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need to yield the content and to pass it as an argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight, this method yield is certainly a bit unnecessary. I leave it as is because it was already implemented in the previous storage class so I didn't want to cause unexpected errors. At some point, I will refactor the entire storage class to make it simpler, and more straightforward.

check_resource_version
end

def save_properties(props)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can set the default of props as props = {}

Before, I needed to pass a gemstash_env variable to access config variables. By adding a 'include gemstash::env' line, and fixing the variable names, I no longer need add gemstash_env to the initialization.
Before, fetching data from S3 results in the data to be encoded in a different encoding than filesystem. This fixes so that it matches the binary encoding on file system files.
Add delete_with_prefix method which ensures that all the test files are deleted after use. Change variable to @storage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants