Update readme file
bassel-zeidan committed Jul 13, 2017
1 parent 3e6e6ea commit fa0345a
Showing 1 changed file, python/README.md, with 28 additions and 12 deletions.
# ibmos2spark

The package sets Spark Hadoop configurations for connecting to IBM Bluemix Object Storage and Softlayer account Object Storage instances. This package uses the new [stocator](https://github.com/SparkTC/stocator) driver, which implements the `swift2d` protocol and is available on the latest IBM Apache Spark service instances (and through IBM Data Science Experience).


Using the `stocator` driver connects your Spark executor nodes directly to your data in object storage. This is an optimized, high-performance method of connecting Spark to your data. All IBM Apache Spark kernels are instantiated with the `stocator` driver in the Spark kernel's classpath. You can also run this locally by installing the [stocator driver](https://github.com/SparkTC/stocator) and adding it to your local Apache Spark kernel's classpath.
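For a local run, one way to put the stocator jar on the classpath is via `spark-submit`. This is only a sketch: the package coordinates, version, and job filename below are assumptions, so check the stocator releases for the values that match your setup.

```shell
# Hypothetical coordinates and version; adjust to the stocator release you use.
spark-submit \
  --packages com.ibm.stocator:stocator:1.0.24 \
  --conf spark.hadoop.fs.swift2d.impl=com.ibm.stocator.fs.ObjectStoreFileSystem \
  my_spark_job.py
```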

## Installation

```
pip install --user --upgrade ibmos2spark
```
## Usage

The usage of this package depends on *from where* your Object Storage instance was created. This package is intended to connect to IBM's Object Storage instances (Swift OS), which can be obtained from Bluemix, Data Science Experience (DSX), or a separate account on IBM Softlayer. The package also supports IBM Cloud Object Storage (COS).
The instructions below show how to connect to either type of instance.
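Either way, objects are addressed through stocator-style URLs that the package's `url()` helpers build for you. As a rough illustration of the Swift OS scheme only, a minimal sketch (not the package's actual implementation) that composes container, configuration name, and object name:

```python
def swift2d_url(container_name, object_name, configuration_name):
    """Compose a stocator swift2d URL: swift2d://<container>.<config>/<object>."""
    return "swift2d://{0}.{1}/{2}".format(container_name, configuration_name, object_name)

print(swift2d_url("my_container", "data.csv", "my_bluemix_os"))
# swift2d://my_container.my_bluemix_os/data.csv
```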

The connection setup is essentially the same in each case; the difference is how you deliver the credentials. If your Object Storage was created with Bluemix/DSX, you can obtain your account credentials in the form of a Python dictionary with a few clicks on the side tab within a DSX Jupyter notebook. If your Object Storage was created with a Softlayer account, each part of the credentials will be found as text that you can copy and paste into the example code below.

### CloudObjectStorage / Data Science Experience
```python
import ibmos2spark

credentials = {
    'endpoint': 'https://s3-api.objectstorage.softlayer.net/',  #just an example; your URL might be different
    'access_key': '',
    'secret_key': ''
}

cos = ibmos2spark.CloudObjectStorage(sc, credentials) #sc is the SparkContext instance

bucket_name = 'some_bucket_name'
object_name = 'file1'
data = sc.textFile(cos.url(object_name, bucket_name))
```
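The `cos.url()` helper above resolves to a `cos://` URL that stocator understands. As a hedged sketch of that scheme, the layout and the `service` name component below are assumptions for illustration; the package's own `url()` helper is authoritative:

```python
def cos_url(object_name, bucket_name, service_name="service"):
    # Assumed layout: cos://<bucket>.<service>/<object>
    return "cos://{0}.{1}/{2}".format(bucket_name, service_name, object_name)

print(cos_url("file1", "some_bucket_name"))
# cos://some_bucket_name.service/file1
```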

### Bluemix / Data Science Experience

```python
import ibmos2spark

#To obtain these credentials in IBM Spark, click the "insert to code"
#button below your data source found on the panel to the right of your notebook.

credentials = {
  #paste the credentials dictionary produced by "insert to code" here;
  #it typically includes fields such as auth_url, project_id, region,
  #user_id, username and password
}

configuration_name = 'my_bluemix_os'  #you can give any name you like

bmos = ibmos2spark.bluemix(sc, credentials, configuration_name)  #sc is the SparkContext instance

container_name = 'some_container_name'
object_name = 'file1'
data = sc.textFile(bmos.url(container_name, object_name))
```

### Softlayer

```python
import ibmos2spark

#Your Softlayer Object Storage credentials are copied and pasted here
#as plain text. The parameter list below is indicative; check the
#package documentation for the exact softlayer() signature.
auth_url = ''
tenant = ''
username = ''
password = ''

configuration_name = 'my_softlayer_os'  #you can give any name you like

slos = ibmos2spark.softlayer(sc, configuration_name, auth_url, tenant, username, password)

container_name = 'some_container_name'
object_name = 'file1'
data = sc.textFile(slos.url(container_name, object_name))
```


## License

Copyright 2016 IBM Cloud Data Services

