Merge pull request #37 from ibm-watson-data-lab/update_readmes
Update readmes
Showing 5 changed files with 226 additions and 66 deletions.
@@ -1,17 +1,71 @@
# ibmos2spark

The package sets Spark Hadoop configurations for connecting to
IBM Bluemix Object Storage and SoftLayer Account Object Storage instances. This package uses the new [stocator](https://github.com/SparkTC/stocator) driver, which implements the `swift2d` protocol and is available
on the latest IBM Apache Spark Service instances (and through IBM Data Science Experience).
The `ibmos2spark` library facilitates data read/write connections between Apache Spark clusters and the various
[IBM Object Storage services](https://console.bluemix.net/catalog/infrastructure/object-storage-group).

![IBM Object Storage Services](fig/ibm_objectstores.png "IBM Object Storage Services")

The `stocator` driver connects your Spark executor nodes directly
to your data in object storage.
This is an optimized, high-performance way to connect Spark to your data. All IBM Apache Spark kernels
are instantiated with the `stocator` driver on the Spark kernel's classpath.
You can also run this locally by installing the [stocator driver](https://github.com/SparkTC/stocator)
and adding it to your local Apache Spark kernel's classpath.

### Object Storage Documentation

* [Cloud Object Storage](https://www.bluemix.net/docs/services/cloud-object-storage/getting-started.html)
* [Cloud Object Storage (IaaS)](https://ibm-public-cos.github.io/crs-docs/)
* [Object Storage OpenStack Swift (IaaS)](https://ibm-public-cos.github.io/crs-docs/)
* [Object Storage OpenStack Swift for Bluemix](https://www.ng.bluemix.net/docs/services/ObjectStorage/index.html)

This repository contains separate packages for `python`, `R` and `scala`.
You will find their documentation within the sub-folders.

## Requirements

* Apache Spark with `stocator` library

The easiest way to install the `stocator` library with Apache Spark is to
[pass the Maven coordinates at launch](https://spark-packages.org/package/SparkTC/stocator).
Other installation options are described in the [`stocator` documentation](https://github.com/SparkTC/stocator).
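
As a sketch of what passing Maven coordinates at launch looks like, the helper below assembles a `pyspark` (or `spark-submit`) command line using Spark's `--packages` option, which takes a comma-separated list of `groupId:artifactId:version` coordinates. The function name `spark_launch_args` is hypothetical, and the stocator coordinates in the usage line (`com.ibm.stocator:stocator` with the placeholder version `X.Y.Z`) are an assumption; check the spark-packages page or the `stocator` documentation for the current coordinates.

```python
import shlex
from typing import List


def spark_launch_args(*coordinates: str, entry_point: str = "pyspark") -> List[str]:
    """Return an argv list that asks Spark to fetch Maven packages at launch.

    Each coordinate must look like groupId:artifactId:version. Spark's
    --packages option takes a comma-separated list of such coordinates and
    puts the resolved jars on the driver and executor classpaths.
    """
    if not coordinates:
        raise ValueError("at least one Maven coordinate is required")
    for coord in coordinates:
        parts = coord.split(":")
        # Reject anything that is not exactly three non-empty parts.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected groupId:artifactId:version, got {coord!r}")
    return [entry_point, "--packages", ",".join(coordinates)]


if __name__ == "__main__":
    # ASSUMPTION: stocator's Maven coordinates; "X.Y.Z" is a placeholder,
    # not a real release number.
    argv = spark_launch_args("com.ibm.stocator:stocator:X.Y.Z")
    print(shlex.join(argv))
```

Running the script prints `pyspark --packages com.ibm.stocator:stocator:X.Y.Z`; substitute a real release number before launching.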

## Apache Spark at IBM

The `stocator` and `ibmos2spark` libraries are pre-installed and available on:

* [Apache Spark through IBM Bluemix](https://console.bluemix.net/catalog/services/apache-spark)
* [IBM Analytics Engine (Beta)](https://console.bluemix.net/catalog/services/ibm-analytics-engine)
* [IBM Data Science Experience](https://datascience.ibm.com)

## Languages

The library is implemented for use in [Python](python), [R](r) and [Scala/Java](scala).

## Details

This library only does two things.

1. [Uses the `SparkContext.hadoopConfiguration` object to set the appropriate keys](https://github.com/SparkTC/stocator#configuration-keys) to define a connection to an object storage service.
2. Provides the caller with a URL for an object in their object store, which is typically passed to a `SparkContext` method (such as `textFile`) to retrieve data.
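
To make the two steps above concrete, here is a minimal, Spark-free sketch of what such a configuration step could look like for a Keystone V3 (Bluemix Swift) service. It is not the `ibmos2spark` source: `FakeHadoopConf`, `configure_swift2d` and `swift2d_url` are hypothetical names, a dict stands in for the Hadoop configuration (in PySpark the real object is commonly reached via `sc._jsc.hadoopConfiguration()`), and the `fs.swift2d.*` key names and the `swift2d://<container>.<name>/<object>` URL form reflect our reading of the stocator configuration documentation, so verify them there before relying on them.

```python
from typing import Dict


class FakeHadoopConf:
    """Hypothetical dict-backed stand-in for a Hadoop Configuration object."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self.values[key] = value


def configure_swift2d(hconf, name: str, credentials: Dict[str, str], public: bool = False) -> None:
    """Step 1 (sketch): set stocator keys for one named Keystone V3 service.

    ASSUMPTION: key names follow the stocator Keystone V3 docs; verify them.
    """
    prefix = f"fs.swift2d.service.{name}"
    hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
    hconf.set(prefix + ".auth.url", credentials["auth_url"] + "/v3/auth/tokens")
    hconf.set(prefix + ".auth.method", "keystoneV3")
    hconf.set(prefix + ".tenant", credentials["project_id"])
    hconf.set(prefix + ".username", credentials["user_id"])
    hconf.set(prefix + ".password", credentials["password"])
    hconf.set(prefix + ".region", credentials["region"])
    # Whether to use the service's public endpoint, stored as "true"/"false".
    hconf.set(prefix + ".public", str(public).lower())


def swift2d_url(container: str, name: str, obj: str) -> str:
    """Step 2 (sketch): build swift2d://<container>.<name>/<object>."""
    return f"swift2d://{container}.{name}/{obj}"


if __name__ == "__main__":
    conf = FakeHadoopConf()
    # Placeholder credentials; real values come from your service credentials.
    creds = {
        "auth_url": "https://identity.open.softlayer.com",
        "project_id": "my-project-id",
        "region": "my-region",
        "user_id": "my-user-id",
        "password": "my-password",
    }
    configure_swift2d(conf, "my_bluemix_objectstore", creds)
    for key in sorted(conf.values):
        print(key)
    print(swift2d_url("some_name", "my_bluemix_objectstore", "file_name"))
```

The last printed line, `swift2d://some_name.my_bluemix_objectstore/file_name`, is the kind of string that gets handed to `sc.textFile`. Keeping the service name in the URL's host part is what lets several differently configured object stores coexist in one Spark session.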

### Example Usage

The following code demonstrates how to use this library in Python and connect to the Cloud Object Storage
service, described in the far left pane of the image above.

```python
import ibmos2spark

# Fill in these values from your Object Storage service credentials.
credentials = {
    'auth_url': 'https://identity.open.softlayer.com',  # your URL might be different
    'project_id': '',
    'region': '',
    'user_id': '',
    'username': '',
    'password': '',
}

configuration_name = 'my_bluemix_objectstore'  # you can give any name you like

bmos = ibmos2spark.bluemix(sc, credentials, configuration_name)  # sc is the SparkContext instance

container_name = 'some_name'
object_name = 'file_name'

# Build the URL for the object, then read it as an RDD of lines.
data_url = bmos.url(container_name, object_name)

data = sc.textFile(data_url)
```
@@ -0,0 +1,9 @@
# Development

We [follow this process](https://github.com/gadamc/release-python) to release new versions to PyPI.

# Code Standards

We do not currently have any specific coding standards in place, but please try to match our style
if you open a pull request that fixes a bug or adds a feature.