
Commit

Merge pull request #39 from ibm-watson-data-lab/update_readmes
Okay - going to merge this in without a review in order to move this along. I hope any documentation problems will be flushed out by the next release.
Adam Cox committed Oct 26, 2017
2 parents c0d7517 + 7450363 commit 0dbe3d3
Showing 7 changed files with 431 additions and 253 deletions.
8 changes: 1 addition & 7 deletions python/README.md
@@ -92,12 +92,6 @@ data = sc.textFile(data_url)
Alternatively, you can connect to an IBM Bluemix COS using IAM token. Set the `auth_method` to `iam_token` and
provide the appropriate values in the credentials.

If you do not provide a `configuration_name`,
a default value will be used (`service`). However, if you are reading or
writing to multiple Object Storage instances you will need to define separate `configuration_name`
values for each Object Storage instance. Otherwise, only one connection will be
configured at a time, potentially causing errors and confusion.
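The reason distinct names matter is that the configuration name is embedded in the object URL that Spark reads, so each URL selects its own set of Hadoop credentials. The `object_url` helper below is hypothetical, for illustration only; `ibmos2spark` builds these URLs for you via its `url` method:

```python
def object_url(container, object_name, configuration_name="service"):
    """Illustrative only: build a stocator-style swift2d URL.

    The configuration name in the host portion tells Spark/stocator which
    set of Hadoop credentials to use for this path.
    """
    return "swift2d://{}.{}/{}".format(container, configuration_name, object_name)

# Two Object Storage instances with distinct configuration names resolve to
# distinct URL namespaces, so both can be read in the same Spark session.
url_a = object_url("container_a", "data.csv", configuration_name="instanceA")
url_b = object_url("container_b", "data.csv", configuration_name="instanceB")
print(url_a)  # swift2d://container_a.instanceA/data.csv
```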


```python
import ibmos2spark
```
@@ -170,7 +164,7 @@ This is the service described in the **middle right** pane in the image above (and was previously referred to
as Softlayer Swift Object Storage). [Documentation is here](https://ibm-public-cos.github.io/crs-docs/)

Note below that credentials are not passed in as a dictionary, like in the other implementations.
Rather, each piece of information is supplied as a separate, required argument when instantiating
a new `softlayer` object.


163 changes: 107 additions & 56 deletions r/sparklyr/README.md
@@ -1,27 +1,43 @@
# ibmos2sparklyr

The `ibmos2sparklyr` package facilitates data read/write connections between Apache Spark clusters and the various
[IBM Object Storage services](https://console.bluemix.net/catalog/infrastructure/object-storage-group).

Using the [`stocator`](https://github.com/SparkTC/stocator) driver, which implements the `swift2d` protocol,
connects your Spark executor nodes directly to your data in object storage.
This is an optimized, high-performance method to connect Spark to your data.
![IBM Object Storage Services](fig/ibm_objectstores.png "IBM Object Storage Services")

### Object Storage Documentation

This package expects a SparkContext instantiated by sparklyr. It has been tested
to work with IBM RStudio from DSX, though it should work with other Spark
installations that utilize the [swift2d/stocator](https://github.com/SparkTC/stocator) driver.
* [Cloud Object Storage](https://www.bluemix.net/docs/services/cloud-object-storage/getting-started.html) **Not Yet Supported.**
* [Cloud Object Storage (IaaS)](https://ibm-public-cos.github.io/crs-docs/) **Not Yet Supported.**
* [Object Storage OpenStack Swift (IaaS)](https://ibm-public-cos.github.io/crs-docs/)
* [Object Storage OpenStack Swift for Bluemix](https://www.ng.bluemix.net/docs/services/ObjectStorage/index.html)



## Requirements

* Apache Spark with `stocator` library

The easiest way to install the `stocator` library with Apache Spark is to
[pass the Maven coordinates at launch](https://spark-packages.org/package/SparkTC/stocator).
Other installation options are described in the [`stocator` documentation](https://github.com/SparkTC/stocator).
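For example, a local launch might pull `stocator` from Maven Central like this (the version shown is illustrative; check the `stocator` releases for a current one):

```shell
# Launch a local Spark shell with the stocator package on the classpath;
# adjust the version to a current stocator release.
spark-shell --packages com.ibm.stocator:stocator:1.0.9
```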


## Apache Spark at IBM

The `stocator` library is pre-installed and available on

* [Apache Spark through IBM Bluemix](https://console.bluemix.net/catalog/services/apache-spark)
* [IBM Analytics Engine (Beta)](https://console.bluemix.net/catalog/services/ibm-analytics-engine)
* [IBM Data Science Experience](https://datascience.ibm.com)

## Installation

```
library(devtools)
devtools::install_url("https://github.com/ibm-cds-labs/ibmos2spark/archive/<version>.zip", subdir= "r/sparklyr/", dependencies = FALSE)
```

where `version` should be a tagged release, such as `1.0.2`.


###### WARNING

@@ -33,65 +49,100 @@ where RVERSION is the newest install of R (currently 3.3) and delete the `sparklyr` folder.
After deleting, choose File->Quit Session to refresh your R kernel. These steps will refresh your
sparklyr package to the special DSX version.


## Usage

The instructions below demonstrate how to use this package to retrieve data from the various
IBM Object Storage services.

These instructions will refer to the image at the top of this README.

### Cloud Object Storage

This is the service described on the **far left** in the image above. This service is also called IBM Bluemix Cloud Object Storage (COS) in various locations. [Documentation is here](https://www.bluemix.net/docs/services/cloud-object-storage/getting-started.html).

Not Yet Implemented.

### Cloud Object Storage (IaaS)

This is the service described in the **middle left** pane in the image above. This service is sometimes referred to
as the Softlayer IBM Cloud Object Storage service.
[Documentation is here](https://ibm-public-cos.github.io/crs-docs/).

Not Yet Implemented.

### Object Storage OpenStack Swift (IaaS)

This is the service described in the **middle right** pane in the image above (and was previously referred to
as Softlayer Swift Object Storage). [Documentation is here](https://ibm-public-cos.github.io/crs-docs/).

Note below that credentials are not passed in as a list of key-value pairs, like in the other implementations.
Rather, each piece of information is supplied as a separate, required argument when instantiating
a new `softlayer` object.

```
library(ibmos2sparklyr)
configurationname = "softlayerOScon" # can be any name you like (allows for multiple configurations)

slconfig = softlayer(sparkcontext=sc,
                     name=configurationname,
                     auth_url="https://identity.open.softlayer.com",
                     tenant = "XXXXX",
                     username = "XXXXX",
                     password = "XXXXX"
)

container = "my_container"          # name of your object store container
object = "my_data.csv"              # name of object that you want to retrieve in the container
spark_object_name = "dataFromSwift" # name to assign to the new spark object

data = sparklyr::spark_read_csv(sc, spark_object_name, slconfig$url(container, object))
```

### Object Storage OpenStack Swift for Bluemix

This is the service described in the **far right** pane in the image above.
This was previously referred to as Bluemix Swift Object Storage in this documentation. It is
referred to as ["IBM Object Storage for Bluemix" in the Bluemix documentation](https://console.bluemix.net/docs/services/ObjectStorage/os_works_public.html). It has also been referred to as
"OpenStack Swift (Cloud Foundry)".

Credentials are passed as a list of key-value pairs, and the `bluemix` object is used to configure
the connection to this Object Storage service.

If you do not provide a `configurationName`, a default value will be used (`service`). However, if
you are reading from or writing to multiple Object Storage instances, you will need to define a
separate `configurationName` for each instance. Otherwise, only one connection will be configured
at a time, potentially causing errors and confusion.

```
library(ibmos2sparklyr)
configurationname = "bluemixOScon" # can be any name you like (allows for multiple configurations)

# In DSX notebooks, the "insert to code" feature will insert this credentials list for you
creds = list(
    auth_url="https://identity.open.softlayer.com",
    region="dallas",
    project_id = "XXXXX",
    user_id="XXXXX",
    password="XXXXX")

bmconfig = bluemix(sparkcontext=sc, name=configurationname, credentials = creds)

container = "my_container"          # name of your object store container
object = "my_data.csv"              # name of object that you want to retrieve in the container
spark_object_name = "dataFromSwift" # name to assign to the new spark object

data = sparklyr::spark_read_csv(sc, spark_object_name, bmconfig$url(container, object))
```
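To make the multiple-configuration point concrete, here is a hedged sketch (assuming the `creds` list and Softlayer credential values from the examples above) of reading from two Object Storage instances in one session by giving each its own configuration name:

```
library(ibmos2sparklyr)

# One configuration name per Object Storage instance
bmconfig = bluemix(sparkcontext=sc, name="bluemixOScon", credentials = creds)
slconfig = softlayer(sparkcontext=sc, name="softlayerOScon",
                     auth_url="https://identity.open.softlayer.com",
                     tenant = "XXXXX", username = "XXXXX", password = "XXXXX")

# Distinct names yield distinct swift2d URLs, so both reads can coexist
bmdata = sparklyr::spark_read_csv(sc, "dataFromBluemix", bmconfig$url("container_a", "a.csv"))
sldata = sparklyr::spark_read_csv(sc, "dataFromSoftlayer", slconfig$url("container_b", "b.csv"))
```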



## License

Copyright 2017 IBM Cloud Data Services

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Binary file added r/sparklyr/fig/ibm_objectstores.png
