Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluemix Cloud Object Storage Support [Python] #28

Merged
merged 14 commits into from Oct 9, 2017

Conversation

bassel-zeidan
Copy link
Collaborator

No description provided.

@bassel-zeidan bassel-zeidan changed the title Bluemix Cloud Object Storage Support [Python] Bluemix Cloud Object Storage Support [Python] [Don't merge] Oct 2, 2017
python/README.md Outdated
The class CloudObjectStorage allows you to connect to bluemix cos. You can connect to bluemix using api keys
as follows:

```pythonw
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be python

python/README.md Outdated
```

Alternatively, you can connect to bluemix cos using IAM token. Example:
```pythonw
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be python

python/README.md Outdated
```pythonw
import ibmos2spark

# @hidden_cell
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we add the # @hidden_cell we should 1) make a link to some docs to describe what it is, and 2) make a separate code snippet to encourage developers/users to separate hidden cells from the other parts of the code that doesn't need to be hidden.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need that at all actually. It is a DSX thing. I took it out


'''
This class allows you to connect to a cloud object storage (COS) instance. It also support connecting to a cos instance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cos -> COS

* endpoint
* access_key
* secret_key
credentials: a dictionary with the required keys to connect to cos. The required keys differ according
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cos -> COS

* access_key
* secret_key
credentials: a dictionary with the required keys to connect to cos. The required keys differ according
to the type of cos.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cos -> COS

* secret_key
credentials: a dictionary with the required keys to connect to cos. The required keys differ according
to the type of cos.
- for cos type "softlayer_cos" the following key are required:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cos

* endpoint
* access_key
* secret_key
- for cos type "bluemix_cos", here are the required/optional key:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cos

@gilv
Copy link

gilv commented Oct 3, 2017

@bassel-zeidan @gadamc
I think ( we better check ) - it need to IBM COS and not just COS

@bassel-zeidan
Copy link
Collaborator Author

@gadamc @gilv i did some updates to the description and readme file

@bassel-zeidan bassel-zeidan changed the title Bluemix Cloud Object Storage Support [Python] [Don't merge] Bluemix Cloud Object Storage Support [Python] Oct 4, 2017
@gilv
Copy link

gilv commented Oct 4, 2017

@bassel-zeidan @gadamc I don't think we should mention somewhere "Bluemix Cloud Object Storage Support" If it's only title - then it's fine, but if customers will see it somewhere in the code - it's wrong. We have IBM Cloud Object Storage", not "Bluemix Cloud Object Storage Support"

Copy link
Collaborator

@gadamc gadamc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of comments on this PR, which would probably also apply to the R/Scala versions as well.

  1. I'd like to get your comment about the named args in the README. Seems that it would be a little more helpful to include them.

  2. What you think about the options regarding the maze of "if" statements, which I also commented on below?

Our options seem to be
a. do nothing and merge now
b. fix now and merge afterwards.
c. create a new issue to address code structure and merge now.

I'd prefer "b", but will accept answer "c".

}

configuration_name = 'os_bluemix_cos_config'
cos = ibmos2spark.CloudObjectStorage(sc, credentials, configuration_name, 'bluemix_cos')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use named args here instead? configuration_name=configuration_name, cos_type='bluemix_cos'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course having the args name would work but i think this form is quite nicer and minimised. The user who is using our lib knows Python and knows how params work in it. I think this form is fine especially because we explained in details all params and what they do.

}

configuration_name = 'os_bluemix_cos_config'
cos = ibmos2spark.CloudObjectStorage(sc, credentials, configuration_name, 'bluemix_cos', 'iam_token')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use named args here instead? configuration_name=configuration_name, cos_type='bluemix_cos', auth_method='iam_token'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


bucket_name = 'bucket_name'
object_name = 'file_name'
data = sc.textFile(cos.url(object_name, bucket_name))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cos.url(object_name, bucket_name) is a confusing order (logically opposite order of typical folder/file hierarchy structure) and different from the container_name, object_name order of the softlayer/bluemix objects.

I know why it's there (historically that was how it was rolled out) and if we rearranged the order we'd break things...

I guess the questions are, how much would we break and how easy to repair? I guess it would break too many of the current deployments and usage of this lib in DSX accounts??

It's also really unfortunate that this is also different from the same functionality in the Scala and R versions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the bucket_name at the end is due to the fact that this param is optional and in python it has to go to the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could think about it if we want to change this in a next release

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but it's only optional because when this was first supported, we used a fixed configuration name. I was opposed to that because it didn't support multiple configurations like the other implementations, but it was pushed ahead anyways and released too quickly, in my opinion.

@@ -170,37 +170,50 @@ def url(self, container_name, object_name):

class CloudObjectStorage(object):
Copy link
Collaborator

@gadamc gadamc Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the complicated maze of "if" statements.

Can we think of a cleaner solution that could be more easily extensible/configurable in the future?
We could subclass CloudObjectStorage and create a new class for each connection type. And/Or perhaps make a factory class/method for building the correct class type based on the input parameters.
I'm open to ideas here to simplify this. We could also consider incorporating the bluemix and softlayer classes into this code refactoring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think how it is right now is quite clean and clear. Honestly, i think having a new class for Bluemix COS would be confusing since it is really the same with different creds.

@bassel-zeidan
Copy link
Collaborator Author

@gadamc I will get the updates in. Let's collect feedback from the users with the next version and improve in case needed in next iterations.

@bassel-zeidan bassel-zeidan merged commit fc92a7f into master Oct 9, 2017
@bassel-zeidan bassel-zeidan deleted the bluemix_cos_support_python branch October 9, 2017 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants