From fa0345aa0a27f8bd0140eeff5afb28a8a3c218f4 Mon Sep 17 00:00:00 2001
From: Bassel Zeidan
Date: Thu, 13 Jul 2017 15:08:33 +0200
Subject: [PATCH] Update readme file

---
 python/README.md | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/python/README.md b/python/README.md
index e6fac7d..4ae6ffe 100644
--- a/python/README.md
+++ b/python/README.md
@@ -1,16 +1,16 @@
 # ibmos2spark
 
-The package sets Spark Hadoop configurations for connecting to 
+The package sets Spark Hadoop configurations for connecting to
 IBM Bluemix Object Storage and Softlayer Account Object Storage instances.
 This package uses the new [stocator](https://github.com/SparkTC/stocator) driver, which implements
 the `swift2d` protocol, and is available
-on the latest IBM Apache Spark Service instances (and through IBM Data Science Experience). 
+on the latest IBM Apache Spark Service instances (and through IBM Data Science Experience).
 
-Using the `stocator` driver connects your Spark executor nodes directly 
+Using the `stocator` driver connects your Spark executor nodes directly
 to your data in object storage.
 This is an optimized, high-performance method to connect Spark to your data. All IBM Apache Spark kernels
-are instantiated with the `stocator` driver in the Spark kernel's classpath. 
-You can also run this locally by installing the [stocator driver](https://github.com/SparkTC/stocator) 
-and adding it to your local Apache Spark kernel's classpath. 
+are instantiated with the `stocator` driver in the Spark kernel's classpath.
+You can also run this locally by installing the [stocator driver](https://github.com/SparkTC/stocator)
+and adding it to your local Apache Spark kernel's classpath.
 
 ## Installation
 
@@ -21,22 +21,38 @@ pip install --user --upgrade ibmos2spark
 ## Usage
 
 The usage of this package depends on *from where* your Object Storage instance was created. This package
-is intended to connect to IBM's Object Storage instances obtained from Bluemix or Data Science Experience
-(DSX) or from a separate account on IBM Softlayer. The instructions below show how to connect to
-either type of instance.
+is intended to connect to IBM's Object Storage instances (Swift OS). Such an instance can be obtained from Bluemix or Data Science Experience (DSX) or from a separate account on IBM Softlayer. The package also supports IBM Cloud Object Storage (COS).
+The instructions below show how to connect to each type of instance.
 
 The connection setup is essentially the same. But the difference for you is how you deliver the
 credentials. If your Object Storage was created with Bluemix/DSX, with a few clicks on the side-tab
 within a DSX Jupyter notebook, you can obtain your account credentials in the form of a Python dictionary.
 If your Object Storage was created with a Softlayer account, each part of the credentials will
-be found as text that you can copy and paste into the example code below.
+be found as text that you can copy and paste into the example code below.
+
+### CloudObjectStorage / Data Science Experience
+```python
+import ibmos2spark
+
+credentials = {
+  'endpoint': 'https://s3-api.objectstorage.softlayer.net/', #just an example. Your URL might be different
+  'access_key': '',
+  'secret_key': ''
+}
+
+cos = ibmos2spark.CloudObjectStorage(sc, credentials) #sc is the SparkContext instance
+
+bucket_name = 'some_bucket_name'
+object_name = 'file1'
+data = sc.textFile(cos.url(object_name, bucket_name))
+```
 
 ### Bluemix / Data Science Experience
 
 ```python
 import ibmos2spark
 
-#To obtain these credentials in IBM Spark, click the "insert to code" 
+#To obtain these credentials in IBM Spark, click the "insert to code"
 #button below your data source found on the panel to the right of your notebook.
 
 credentials = {
@@ -78,7 +94,7 @@ data = sc.textFile(slos.url(container_name, object_name))
 ```
 
 
-## License 
+## License
 
 Copyright 2016 IBM Cloud Data Services
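
The final hunk above shows only the tail of the README's Softlayer (Swift) example, with the `slos` object already constructed. For context, here is a minimal sketch of that flow. It assumes a notebook environment where the SparkContext `sc` already exists, as in the README's other snippets; the `ibmos2spark.softlayer(...)` argument order is an assumption based on the package's Softlayer documentation and may differ by version, and every credential value is a hypothetical placeholder:

```python
import ibmos2spark

# All values below are hypothetical placeholders -- copy the real ones
# from your Softlayer Object Storage account credentials page.
auth_url = 'https://your.softlayer.auth.url/auth/v1.0/'
tenant   = 'MY_TENANT_ID'
username = 'MY_USERNAME'
password = 'MY_PASSWORD'

# 'my_config_name' is an arbitrary label for this configuration; the
# softlayer(...) argument order here is assumed, not confirmed by this patch.
slos = ibmos2spark.softlayer(sc, 'my_config_name', auth_url, tenant, username, password)

container_name = 'some_container'
object_name = 'file1'

# slos.url() builds a swift2d:// URL that the stocator driver resolves,
# matching the line shown in the final hunk of the patch.
data = sc.textFile(slos.url(container_name, object_name))
print(data.take(5))  # first five lines of the object, to confirm the connection
```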