New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluemix Cloud Object Storage Support [Python] #28
Conversation
required credentials
python/README.md
Outdated
The class CloudObjectStorage allows you to connect to bluemix cos. You can connect to bluemix using api keys | ||
as follows: | ||
|
||
```pythonw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be python
python/README.md
Outdated
``` | ||
|
||
Alternatively, you can connect to bluemix cos using IAM token. Example: | ||
```pythonw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be python
python/README.md
Outdated
```pythonw | ||
import ibmos2spark | ||
|
||
# @hidden_cell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we add the # @hidden_cell
we should 1) make a link to some docs to describe what it is, and 2) make a separate code snippet to encourage developers/users to separate hidden cells from the other parts of the code that doesn't need to be hidden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need that at all actually. It is a DSX thing. I took it out
python/ibmos2spark/osconfig.py
Outdated
|
||
''' | ||
This class allows you to connect to a cloud object storage (COS) instance. It also support connecting to a cos instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cos -> COS
python/ibmos2spark/osconfig.py
Outdated
* endpoint | ||
* access_key | ||
* secret_key | ||
credentials: a dictionary with the required keys to connect to cos. The required keys differ according |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cos -> COS
python/ibmos2spark/osconfig.py
Outdated
* access_key | ||
* secret_key | ||
credentials: a dictionary with the required keys to connect to cos. The required keys differ according | ||
to the type of cos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cos -> COS
python/ibmos2spark/osconfig.py
Outdated
* secret_key | ||
credentials: a dictionary with the required keys to connect to cos. The required keys differ according | ||
to the type of cos. | ||
- for cos type "softlayer_cos" the following key are required: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cos
python/ibmos2spark/osconfig.py
Outdated
* endpoint | ||
* access_key | ||
* secret_key | ||
- for cos type "bluemix_cos", here are the required/optional key: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cos
@bassel-zeidan @gadamc |
@bassel-zeidan @gadamc I don't think we should mention somewhere "Bluemix Cloud Object Storage Support" If it's only title - then it's fine, but if customers will see it somewhere in the code - it's wrong. We have IBM Cloud Object Storage", not "Bluemix Cloud Object Storage Support" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of comments on this PR, which would probably also apply to the R/Scala versions as well.
-
I'd like to get your comment about the named args in the README. Seems that it would be a little more helpful to include them.
-
What you think about the options regarding the maze of "if" statements, which I also commented on below?
Our options seem to be
a. do nothing and merge now
b. fix now and merge afterwards.
c. create a new issue to address code structure and merge now.
I'd prefer "b", but will accept answer "c".
} | ||
|
||
configuration_name = 'os_bluemix_cos_config' | ||
cos = ibmos2spark.CloudObjectStorage(sc, credentials, configuration_name, 'bluemix_cos') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use named args here instead? configuration_name=configuration_name, cos_type='bluemix_cos'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course having the args name would work but i think this form is quite nicer and minimised. The user who is using our lib knows Python and knows how params work in it. I think this form is fine especially because we explained in details all params and what they do.
} | ||
|
||
configuration_name = 'os_bluemix_cos_config' | ||
cos = ibmos2spark.CloudObjectStorage(sc, credentials, configuration_name, 'bluemix_cos', 'iam_token') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use named args here instead? configuration_name=configuration_name, cos_type='bluemix_cos', auth_method='iam_token'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
|
||
bucket_name = 'bucket_name' | ||
object_name = 'file_name' | ||
data = sc.textFile(cos.url(object_name, bucket_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cos.url(object_name, bucket_name)
is a confusing order (logically opposite order of typical folder/file hierarchy structure) and different from the container_name, object_name order of the softlayer/bluemix objects.
I know why it's there (historically that was how it was rolled out) and if we rearranged the order we'd break things...
I guess the questions are, how much would we break and how easy to repair? I guess it would break too many of the current deployments and usage of this lib in DSX accounts??
It's also really unfortunate that this is also different from the same functionality in the Scala and R versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the bucket_name at the end is due to the fact that this param is optional and in python it has to go to the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could think about it if we want to change this in a next release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but it's only optional because when this was first supported, we used a fixed configuration name. I was opposed to that because it didn't support multiple configurations like the other implementations, but it was pushed ahead anyways and released too quickly, in my opinion.
@@ -170,37 +170,50 @@ def url(self, container_name, object_name): | |||
|
|||
class CloudObjectStorage(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of the complicated maze of "if" statements.
Can we think of a cleaner solution that could be more easily extensible/configurable in the future?
We could subclass CloudObjectStorage and create a new class for each connection type. And/Or perhaps make a factory class/method for building the correct class type based on the input parameters.
I'm open to ideas here to simplify this. We could also consider incorporating the bluemix
and softlayer
classes into this code refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think how it is right now is quite clean and clear. Honestly, i think having a new class for Bluemix COS would be confusing since it is really the same with different creds.
@gadamc I will get the updates in. Let's collect feedback from the users with the next version and improve in case needed in next iterations. |
No description provided.