# Load data into a notebook using different Data Connectors

Before you can start analyzing your data, you have to load it from its data source. This reference notebook shows you how to use Data Connectors to load data that is stored in different data sources.

The notebook sample code shows you how to use Data Connectors to load data into a Python notebook. You can copy and paste these code snippets into the notebook you are developing.


In [1]:
from ingest import Connectors


In [7]:
# `from ingest import Connectors` binds only the name Connectors, so inspect that name
dir(Connectors)

## Load data from Amazon S3

In [None]:
from ingest import Connectors
from pyspark.sql import SQLContext

# `sc` is the SparkContext that the notebook environment is expected to provide
sqlContext = SQLContext(sc)

S3loadoptions = { 
                  Connectors.AmazonS3.ACCESS_KEY          : '***********',
                  Connectors.AmazonS3.SECRET_KEY          : '***********',
                  Connectors.AmazonS3.SOURCE_BUCKET       : '***********',
                  Connectors.AmazonS3.SOURCE_FILE_NAME    : 'addresses3.csv',
                  Connectors.AmazonS3.SOURCE_INFER_SCHEMA : '1',
                  Connectors.AmazonS3.SOURCE_FILE_FORMAT  : 'csv'}

S3DF = sqlContext.read.format('com.ibm.spark.discover').options(**S3loadoptions).load()
S3DF.printSchema()
S3DF.show(5)

Your dataset from Amazon S3 is now loaded into the notebook as a DataFrame, and you can begin analyzing it.
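The access key and secret key in the load options are masked in this notebook. One way to keep real credentials out of a shared notebook is to read them from environment variables. The sketch below is an illustration only: the helper `require_env` and the variable names `S3_ACCESS_KEY` and `S3_SECRET_KEY` are hypothetical and are not part of the Data Connectors API.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`; fail loudly if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError("Environment variable %s is not set" % name)
    return value


# Hypothetical variable names, given defaults here only so that the sketch runs on its own.
os.environ.setdefault("S3_ACCESS_KEY", "example-access-key")
os.environ.setdefault("S3_SECRET_KEY", "example-secret-key")

access_key = require_env("S3_ACCESS_KEY")
secret_key = require_env("S3_SECRET_KEY")
```

The resulting values could then stand in for the masked strings assigned to `Connectors.AmazonS3.ACCESS_KEY` and `Connectors.AmazonS3.SECRET_KEY`.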

## Save the DataFrame to Amazon S3

After you analyze and transform the DataFrame, you can save the modified DataFrame back to Amazon S3 with the code below. This sample assumes that the DataFrame S3DF from the previous cell is the transformed dataset to save.

In [None]:
S3saveoptions = { Connectors.AmazonS3.ACCESS_KEY        : '***********',
                  Connectors.AmazonS3.SECRET_KEY        : '***********',
                  Connectors.AmazonS3.TARGET_BUCKET     : '***********',
                  Connectors.AmazonS3.TARGET_FILE_NAME  : 'addresses4.csv',
                  Connectors.AmazonS3.TARGET_WRITE_MODE : 'write'}

NewS3DF = S3DF.write.format("com.ibm.spark.discover").options(**S3saveoptions).save()

## Load data from dashDB

In [None]:
dashDBloadOptions = { Connectors.DASHDB.HOST              : '***********',
                      Connectors.DASHDB.DATABASE          : 'BLUDB',
                      Connectors.DASHDB.USERNAME          : '***********',
                      Connectors.DASHDB.PASSWORD          : '***********',
                      Connectors.DASHDB.SOURCE_TABLE_NAME : 'DASH105036.TABLE1'}

dashdbDF = sqlContext.read.format("com.ibm.spark.discover").options(**dashDBloadOptions).load()
dashdbDF.printSchema()
dashdbDF.show()

## Save the DataFrame to dashDB

After you analyze and transform the DataFrame, you can save the modified DataFrame back to dashDB with the code below. This sample assumes that the DataFrame dashdbDF from the previous cell is the transformed dataset to save.


In [None]:
dashdbsaveoption = { 
                     Connectors.DASHDB.HOST              : '***********',
                     Connectors.DASHDB.DATABASE          : 'BLUDB',
                     Connectors.DASHDB.USERNAME          : '***********',
                     Connectors.DASHDB.PASSWORD          : '***********',
                     Connectors.DASHDB.TARGET_TABLE_NAME : 'DASH105036.TABLE2',
                     Connectors.DASHDB.TARGET_WRITE_MODE : 'merge' }

NewdashDBDF = dashdbDF.write.format("com.ibm.spark.discover").options(**dashdbsaveoption).save()
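
The load and save option dictionaries for dashDB repeat the same host, database, and credential entries. A minimal sketch of building both from one shared dictionary follows. It uses plain strings as stand-in keys so that it runs anywhere; in the notebook the keys would be the `Connectors.DASHDB.*` constants shown above. The helper name `merge_options` is hypothetical.

```python
def merge_options(common, specific):
    """Return a new dict holding the shared connection settings plus the
    source- or target-specific settings; neither input dict is modified."""
    merged = dict(common)
    merged.update(specific)
    return merged


# Stand-in string keys (assumption): replace them with Connectors.DASHDB.* constants.
common = {"host": "example-host", "database": "BLUDB"}

load_options = merge_options(common, {"source_table": "DASH105036.TABLE1"})
save_options = merge_options(
    common, {"target_table": "DASH105036.TABLE2", "write_mode": "merge"}
)
```

Keeping the connection settings in one place means a changed host or password only has to be edited once.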

## Load data from SoftLayer Object Storage

In [None]:
softlayerobjloadoptions = { 
    Connectors.SoftLayerObjectStorage.ACCESS_KEY          : '***********',
    Connectors.SoftLayerObjectStorage.SECRET_KEY          : '***********',
    Connectors.SoftLayerObjectStorage.URL                 : '***********',
    Connectors.SoftLayerObjectStorage.SOURCE_CONTAINER    : '***********',
    Connectors.SoftLayerObjectStorage.SOURCE_FILE_NAME    : 'users.avro', 
    Connectors.SoftLayerObjectStorage.SOURCE_FILE_FORMAT  : 'avro', 
    Connectors.SoftLayerObjectStorage.SOURCE_INFER_SCHEMA : '1'  }
    
softlyobjDF =  sqlContext.read.format("com.ibm.spark.discover").options(**softlayerobjloadoptions).load()
softlyobjDF.printSchema()
softlyobjDF.show()

## Save the DataFrame to SoftLayer Object Storage

After you analyze and transform the DataFrame, you can save the modified DataFrame back to SoftLayer Object Storage with the code below. This sample assumes that the DataFrame softlyobjDF from the previous cell is the transformed dataset to save.

In [None]:
softlayerobjsaveoptions = {
    Connectors.SoftLayerObjectStorage.ACCESS_KEY         : '***********', 
    Connectors.SoftLayerObjectStorage.SECRET_KEY         : '***********', 
    Connectors.SoftLayerObjectStorage.URL                : '***********', 
    Connectors.SoftLayerObjectStorage.TARGET_CONTAINER   : '***********', 
    Connectors.SoftLayerObjectStorage.TARGET_FILE_NAME   : 'newusers.avro', 
    Connectors.SoftLayerObjectStorage.TARGET_FILE_FORMAT : 'avro', 
    Connectors.SoftLayerObjectStorage.TARGET_WRITE_MODE  : 'write'}
    
NewsoftlyobjDF = softlyobjDF.write.format("com.ibm.spark.discover").options(**softlayerobjsaveoptions).save()

## Load data from Cloudant

In [None]:
Cloudantloadoptions = {
    Connectors.Cloudant.HOST            : '***********', 
    Connectors.Cloudant.PORT            : '443', 
    Connectors.Cloudant.SSL             : 'yes', 
    Connectors.Cloudant.SOURCE_DATABASE : 'sample' , 
    Connectors.Cloudant.USERNAME        : '***********' ,
    Connectors.Cloudant.PASSWORD        : '***********'}

cloudantDF = sqlContext.read.format("com.ibm.spark.discover").options(**Cloudantloadoptions).load()
cloudantDF.printSchema()
cloudantDF.show()
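
Each masked value (`'***********'`) has to be replaced with a real setting before a connector can reach its data source. The sketch below lists the options that still hold a masked value. It assumes that a placeholder is any non-empty string made up only of asterisks, and it uses plain string keys as stand-ins for the `Connectors.Cloudant.*` constants; `unfilled_keys` is a hypothetical helper.

```python
def unfilled_keys(options):
    """Return, in sorted order, the keys whose values consist only of '*' characters."""
    return sorted(
        key
        for key, value in options.items()
        if isinstance(value, str) and value and set(value) == {"*"}
    )


# Stand-in keys (assumption): in the notebook these would be Connectors constants.
example_options = {
    "host": "***********",
    "port": "443",
    "ssl": "yes",
    "password": "***********",
}
missing = unfilled_keys(example_options)
```

Checking `missing` before calling `load()` gives an early reminder of which settings still need real values.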

## Save the DataFrame to Cloudant

After you analyze and transform the DataFrame, you can save the modified DataFrame back to Cloudant with the code below. This sample assumes that the DataFrame cloudantDF from the previous cell is the transformed dataset to save.

In [None]:
Cloudantsaveoptions = {
    Connectors.Cloudant.HOST                 : '***********',
    Connectors.Cloudant.PORT                 : '443',
    Connectors.Cloudant.SSL                  : 'yes',
    Connectors.Cloudant.TARGET_DATABASE      : 'sample',
    Connectors.Cloudant.USERNAME             : '***********',
    Connectors.Cloudant.PASSWORD             : '***********',
    Connectors.Cloudant.TARGET_DOCUMENT_TYPE : 'json'}

NewcloudantDF = cloudantDF.write.format("com.ibm.spark.discover").options(**Cloudantsaveoptions).save()

## Load data from Bluemix Object Storage

In [None]:
objectstoreloadOptions = {
        Connectors.BluemixObjectStorage.AUTH_URL            : '***********',
        Connectors.BluemixObjectStorage.USERID              : '***********',
        Connectors.BluemixObjectStorage.PASSWORD            : '***********',
        Connectors.BluemixObjectStorage.PROJECTID           : '***********',
        Connectors.BluemixObjectStorage.REGION              : 'dallas',
        Connectors.BluemixObjectStorage.SOURCE_CONTAINER    : 'newstage1',
        Connectors.BluemixObjectStorage.SOURCE_FILE_NAME    : 'BlocPower_T.csv',
        Connectors.BluemixObjectStorage.SOURCE_INFER_SCHEMA : '1'}
                     
objectstoreDF = sqlContext.read.format("com.ibm.spark.discover").options(**objectstoreloadOptions).load()
objectstoreDF.printSchema()
objectstoreDF.show(5)

## Save the DataFrame to Bluemix Object Storage

After you analyze and transform the DataFrame, you can save the modified DataFrame back to Bluemix Object Storage with the code below. This sample assumes that the DataFrame objectstoreDF from the previous cell is the transformed dataset to save.

In [None]:
objectstoresaveOptions = {
        Connectors.BluemixObjectStorage.AUTH_URL          : '***********',
        Connectors.BluemixObjectStorage.USERID            : '***********',
        Connectors.BluemixObjectStorage.PASSWORD          : '***********',
        Connectors.BluemixObjectStorage.PROJECTID         : '***********',
        Connectors.BluemixObjectStorage.REGION            : 'dallas',
        Connectors.BluemixObjectStorage.TARGET_CONTAINER  :  'newstage1',
        Connectors.BluemixObjectStorage.TARGET_FILE_NAME  : 'NewBlocPower_T.csv',
        Connectors.BluemixObjectStorage.TARGET_WRITE_MODE : 'write'}
                     
NewobjectstoreDF = objectstoreDF.write.format("com.ibm.spark.discover").options(**objectstoresaveOptions).save()

## Load data from LocalFS

If you uploaded files to the notebook environment, you can load them into your Python notebook with the code below.

In [None]:
localfsloadoptions = {
      Connectors.LocalFS.SOURCE_FILE_NAME    : '/gpfs/global_fs01/cluster/ys1-dwspark-dal09-env4-011.bluemix.net/user/sc93-b1ddac2d470956-05b1d10fb12b/data/test.csv',
      Connectors.LocalFS.SOURCE_INFER_SCHEMA : '1',
      Connectors.LocalFS.SOURCE_FILE_FORMAT  : 'csv'}

LocalfsDF = sqlContext.read.format("com.ibm.spark.discover").options(**localfsloadoptions).load()
LocalfsDF.printSchema()
LocalfsDF.show()

## Save the DataFrame to LocalFS

You can also save a DataFrame to a LocalFS path with the code below. This sample assumes that the DataFrame LocalfsDF from the previous cell is the one to save.

In [None]:
localfssaveoptions = {
      Connectors.LocalFS.TARGET_FILE_NAME : '/gpfs/global_fs01/cluster/ys1-dwspark-dal09-env4-011.bluemix.net/user/sc93-b1ddac2d470956-05b1d10fb12b/data/test.csv',
      Connectors.LocalFS.TARGET_WRITE_MODE : 'write',
      Connectors.LocalFS.TARGET_FILE_FORMAT : 'csv'}

NewLocalfsDF = LocalfsDF.write.format("com.ibm.spark.discover").options(**localfssaveoptions).save()
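
The save sample above targets the same path that the load sample reads from. If you would rather keep the source file and write the result next to it, you can derive a new file name first. This is a sketch that uses only the Python standard library; the helper name `with_suffix` and the `_new` suffix are illustrative choices, and the path below is a shortened, hypothetical example.

```python
import posixpath


def with_suffix(path, suffix="_new"):
    """Insert `suffix` between the file name and its extension."""
    root, ext = posixpath.splitext(path)
    return root + suffix + ext


# Hypothetical path; substitute the path of the file you loaded.
target_path = with_suffix("/user/example/data/test.csv")
```

The value of `target_path` could then be assigned to `Connectors.LocalFS.TARGET_FILE_NAME` in the save options.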

## Load data from IBM BigInsights HDFS

In [None]:
HDFSloadOptions = {
    Connectors.HdfsBigInsights.URL : 'https://bi-hadoop-prod-4161.bi.services.us-south.bluemix.net:8443/gateway/default/webhdfs/v1/',
    Connectors.HdfsBigInsights.USERNAME : '***********', 
    Connectors.HdfsBigInsights.PASSWORD : '***********',
    Connectors.HdfsBigInsights.SOURCE_FILE_NAME : 'token.csv',
    Connectors.HdfsBigInsights.SOURCE_FILE_FORMAT : 'csv',
    Connectors.HdfsBigInsights.SOURCE_INFER_SCHEMA : '1'
  }

hdfsDF = sqlContext.read.format("com.ibm.spark.discover").options(**HDFSloadOptions).load()
hdfsDF.printSchema()
hdfsDF.show()

## Save the DataFrame to IBM BigInsights HDFS

After you analyze and transform the DataFrame that you loaded above, you can save the modified DataFrame back to HDFS with the code below. This sample assumes that the DataFrame hdfsDF from the previous cell is the transformed dataset to save to IBM BigInsights HDFS.

In [None]:
HDFSsaveOptions = {
    Connectors.HdfsBigInsights.URL : 'https://bi-hadoop-prod-4161.bi.services.us-south.bluemix.net:8443/gateway/default/webhdfs/v1/',
    Connectors.HdfsBigInsights.USERNAME : '***********', 
    Connectors.HdfsBigInsights.PASSWORD : '***********',
    Connectors.HdfsBigInsights.TARGET_FILE_NAME : 'token2.csv',
    Connectors.HdfsBigInsights.TARGET_WRITE_MODE : 'write'
}

NewHdfsDF = hdfsDF.write.format("com.ibm.spark.discover").options(**HDFSsaveOptions).save()

## Load data from Hive

In [None]:
HiveloadOptions = { Connectors.Hive.HOST              : '***********',
                    Connectors.Hive.PORT              : '****',
                    Connectors.Hive.DATABASE          : 'BLUDB',
                    Connectors.Hive.USERNAME          : '***********',
                    Connectors.Hive.PASSWORD          : '***********',
                    Connectors.Hive.SOURCE_TABLE_NAME : '***********'}

HiveDF = sqlContext.read.format("com.ibm.spark.discover").options(**HiveloadOptions).load()
HiveDF.printSchema()
HiveDF.show()

## Load data from Amazon Redshift

In [None]:
RedshiftloadOptions = { 
                     Connectors.Redshift.HOST              : '***********',
                     Connectors.Redshift.PORT              : '***********',
                     Connectors.Redshift.DATABASE          : 'BLUDB',
                     Connectors.Redshift.USERNAME          : '***********',
                     Connectors.Redshift.PASSWORD          : '***********',
                     Connectors.Redshift.SOURCE_TABLE_NAME : '***********'}

RedshiftDF = sqlContext.read.format("com.ibm.spark.discover").options(**RedshiftloadOptions).load()
RedshiftDF.printSchema()
RedshiftDF.show()

## Save the DataFrame to Amazon Redshift

In [None]:
Redshiftsaveoption = { 
                     Connectors.Redshift.HOST              : '***********',
                     Connectors.Redshift.PORT              : '***********',
                     Connectors.Redshift.DATABASE          : 'BLUDB',
                     Connectors.Redshift.USERNAME          : '***********',
                     Connectors.Redshift.PASSWORD          : '***********',
                     Connectors.Redshift.TARGET_TABLE_NAME : 'DASH105036.TABLE2',
                     Connectors.Redshift.TARGET_TABLE_ACTION : 'merge'}

NewRedshiftDF = RedshiftDF.write.format("com.ibm.spark.discover").options(**Redshiftsaveoption).save()

## Load data from DB2

In [None]:
DB2loadOptions = { 
                     Connectors.DB2.HOST              : '***********',
                     Connectors.DB2.PORT              : '***********',
                     Connectors.DB2.DATABASE          : 'BLUDB',
                     Connectors.DB2.USERNAME          : '***********',
                     Connectors.DB2.PASSWORD          : '***********',
                     Connectors.DB2.SOURCE_TABLE_NAME : '***********'}

DB2DF = sqlContext.read.format("com.ibm.spark.discover").options(**DB2loadOptions).load()
DB2DF.printSchema()
DB2DF.show()

## Save the DataFrame to DB2

In [None]:
DB2saveoption = { 
                     Connectors.DB2.HOST              : '***********',
                     Connectors.DB2.PORT              : '***********',
                     Connectors.DB2.DATABASE          : 'BLUDB',
                     Connectors.DB2.USERNAME          : '***********',
                     Connectors.DB2.PASSWORD          : '***********',
                     Connectors.DB2.TARGET_TABLE_NAME : 'DASH105036.TABLE2',
                     Connectors.DB2.TARGET_TABLE_ACTION : 'merge',
                     Connectors.DB2.TARGET_WRITE_MODE : 'insert'}

NewDB2DF = DB2DF.write.format("com.ibm.spark.discover").options(**DB2saveoption).save()

## Load data from Informix

In [None]:
InformixloadOptions = { 
                     Connectors.Informix.HOST              : '***********',
                     Connectors.Informix.PORT              : '***********',
                     Connectors.Informix.SERVER            : '***********',
                     Connectors.Informix.DATABASE          : 'BLUDB',
                     Connectors.Informix.USERNAME          : '***********',
                     Connectors.Informix.PASSWORD          : '***********',
                     Connectors.Informix.SOURCE_TABLE_NAME : '***********'}

InformixDF = sqlContext.read.format("com.ibm.spark.discover").options(**InformixloadOptions).load()
InformixDF.printSchema()
InformixDF.show()

## Save the DataFrame to Informix

In [None]:
Informixsaveoption = { 
                     Connectors.Informix.HOST              : '***********',
                     Connectors.Informix.PORT              : '***********',
                     Connectors.Informix.SERVER            : '***********',
                     Connectors.Informix.DATABASE          : 'BLUDB',
                     Connectors.Informix.USERNAME          : '***********',
                     Connectors.Informix.PASSWORD          : '***********',
                     Connectors.Informix.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.Informix.TARGET_TABLE_ACTION : 'merge'}

NewInformixDF = InformixDF.write.format("com.ibm.spark.discover").options(**Informixsaveoption).save()

## Load data from Watson Analytics

In [None]:
WatsonAnalyticsloadOptions = { 
                     Connectors.WatsonAnalytics.CLIENT_ID        : '***********',
                     Connectors.WatsonAnalytics.SECRET_ID        : '***********',
                     Connectors.WatsonAnalytics.CUSTOM_URL       : '***********',
                     Connectors.WatsonAnalytics.USERNAME         : '***********',
                     Connectors.WatsonAnalytics.PASSWORD         : '***********',
                     Connectors.WatsonAnalytics.SOURCE_FILE_NAME : '***********'}

WatsonAnalyticsDF = sqlContext.read.format("com.ibm.spark.discover").options(**WatsonAnalyticsloadOptions).load()
WatsonAnalyticsDF.printSchema()
WatsonAnalyticsDF.show()

## Save the DataFrame to Watson Analytics

In [None]:
WatsonAnalyticssaveoption = { 
                     Connectors.WatsonAnalytics.CLIENT_ID              : '***********',
                     Connectors.WatsonAnalytics.SECRET_ID              : '***********',
                     Connectors.WatsonAnalytics.CUSTOM_URL            : '***********',
                     Connectors.WatsonAnalytics.USERNAME          : '***********',
                     Connectors.WatsonAnalytics.PASSWORD          : '***********',
                     Connectors.WatsonAnalytics.TARGET_FILE_NAME : '********',
                     Connectors.WatsonAnalytics.TARGET_WA_META_DATA : '***********',
                     Connectors.WatsonAnalytics.TARGET_WRITE_MODE : '*****'}

NewWatsonAnalyticsDF = WatsonAnalyticsDF.write.format("com.ibm.spark.discover").options(**WatsonAnalyticssaveoption).save()

## Load data from SQL Server

In [None]:
SqlServerloadOptions = { 
                     Connectors.SqlServer.HOST              : '***********',
                     Connectors.SqlServer.PORT              : '***********',
                     Connectors.SqlServer.DATABASE          : 'BLUDB',
                     Connectors.SqlServer.USERNAME          : '***********',
                     Connectors.SqlServer.PASSWORD          : '***********',
                     Connectors.SqlServer.SOURCE_TABLE_NAME : '***********'}

SqlServerDF = sqlContext.read.format("com.ibm.spark.discover").options(**SqlServerloadOptions).load()
SqlServerDF.printSchema()
SqlServerDF.show()

## Save the DataFrame to SQL Server

In [None]:
SqlServersaveoption = { 
                     Connectors.SqlServer.HOST              : '***********',
                     Connectors.SqlServer.PORT              : '***********',
                     Connectors.SqlServer.DATABASE          : 'BLUDB',
                     Connectors.SqlServer.USERNAME          : '***********',
                     Connectors.SqlServer.PASSWORD          : '***********',
                     Connectors.SqlServer.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.SqlServer.TARGET_TABLE_ACTION : 'merge'}

NewSqlServerDF = SqlServerDF.write.format("com.ibm.spark.discover").options(**SqlServersaveoption).save()

## Load data from MySQL

In [None]:
MySQLloadOptions = { 
                     Connectors.MySQL.HOST              : '***********',
                     Connectors.MySQL.PORT              : '***********',
                     Connectors.MySQL.DATABASE          : 'BLUDB',
                     Connectors.MySQL.USERNAME          : '***********',
                     Connectors.MySQL.PASSWORD          : '***********',
                     Connectors.MySQL.SOURCE_TABLE_NAME : '***********'}

MySQLDF = sqlContext.read.format("com.ibm.spark.discover").options(**MySQLloadOptions).load()
MySQLDF.printSchema()
MySQLDF.show()

## Save the DataFrame to MySQL

In [None]:
MySQLsaveoption = { 
                     Connectors.MySQL.HOST              : '***********',
                     Connectors.MySQL.PORT              : '***********',
                     Connectors.MySQL.DATABASE          : 'BLUDB',
                     Connectors.MySQL.USERNAME          : '***********',
                     Connectors.MySQL.PASSWORD          : '***********',
                     Connectors.MySQL.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.MySQL.TARGET_TABLE_ACTION : 'merge'}

NewMySQLDF = MySQLDF.write.format("com.ibm.spark.discover").options(**MySQLsaveoption).save()

## Load data from Netezza

In [None]:
NetezzaloadOptions = { 
                     Connectors.Netezza.HOST              : '***********',
                     Connectors.Netezza.PORT              : '***********',
                     Connectors.Netezza.DATABASE          : 'BLUDB',
                     Connectors.Netezza.USERNAME          : '***********',
                     Connectors.Netezza.PASSWORD          : '***********',
                     Connectors.Netezza.SOURCE_TABLE_NAME : '***********'}

NetezzaDF = sqlContext.read.format("com.ibm.spark.discover").options(**NetezzaloadOptions).load()
NetezzaDF.printSchema()
NetezzaDF.show()

## Save the DataFrame to Netezza

In [None]:
Netezzasaveoption = { 
                     Connectors.Netezza.HOST              : '***********',
                     Connectors.Netezza.PORT              : '***********',
                     Connectors.Netezza.DATABASE          : 'BLUDB',
                     Connectors.Netezza.USERNAME          : '***********',
                     Connectors.Netezza.PASSWORD          : '***********',
                     Connectors.Netezza.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.Netezza.TARGET_TABLE_ACTION : 'merge',
                     Connectors.Netezza.TARGET_WRITE_MODE : 'insert'}

NewNetezzaDF = NetezzaDF.write.format("com.ibm.spark.discover").options(**Netezzasaveoption).save()

## Load data from Oracle

In [None]:
OracleloadOptions = { 
                     Connectors.Oracle.HOST              : '***********',
                     Connectors.Oracle.PORT              : '***********',
                     Connectors.Oracle.SID               : 'BLUDB',
                     Connectors.Oracle.SERVICE_NAME      : '***********',
                     Connectors.Oracle.USERNAME          : '***********',
                     Connectors.Oracle.PASSWORD          : '***********',
                     Connectors.Oracle.SOURCE_TABLE_NAME : '***********'}

OracleDF = sqlContext.read.format("com.ibm.spark.discover").options(**OracleloadOptions).load()
OracleDF.printSchema()
OracleDF.show()

## Save the DataFrame to Oracle

In [None]:
Oraclesaveoption = { 
                     Connectors.Oracle.HOST              : '***********',
                     Connectors.Oracle.PORT              : '***********',
                     Connectors.Oracle.SID               : '***********',
                     Connectors.Oracle.SERVICE_NAME      : '***********',
                     Connectors.Oracle.USERNAME          : '***********',
                     Connectors.Oracle.PASSWORD          : '***********',
                     Connectors.Oracle.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.Oracle.TARGET_TABLE_ACTION : 'merge'}

NewOracleDF = OracleDF.write.format("com.ibm.spark.discover").options(**Oraclesaveoption).save()

## Load data from Greenplum

In [None]:
GreenplumloadOptions = { 
                     Connectors.Greenplum.HOST              : '***********',
                     Connectors.Greenplum.PORT              : '***********',
                     Connectors.Greenplum.DATABASE          : 'BLUDB',
                     Connectors.Greenplum.USERNAME          : '***********',
                     Connectors.Greenplum.PASSWORD          : '***********',
                     Connectors.Greenplum.SOURCE_TABLE_NAME : '***********'}

GreenplumDF = sqlContext.read.format("com.ibm.spark.discover").options(**GreenplumloadOptions).load()
GreenplumDF.printSchema()
GreenplumDF.show()

## Save the DataFrame to Greenplum

In [None]:
Greenplumsaveoption = { 
                     Connectors.Greenplum.HOST              : '***********',
                     Connectors.Greenplum.PORT              : '***********',
                     Connectors.Greenplum.DATABASE          : 'BLUDB',
                     Connectors.Greenplum.USERNAME          : '***********',
                     Connectors.Greenplum.PASSWORD          : '***********',
                     Connectors.Greenplum.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.Greenplum.TARGET_TABLE_ACTION : 'merge'}

NewGreenplumDF = GreenplumDF.write.format("com.ibm.spark.discover").options(**Greenplumsaveoption).save()

## Load data from PostgreSQL

In [None]:
PostgreSQLloadOptions = { 
                     Connectors.PostgreSQL.HOST              : '***********',
                     Connectors.PostgreSQL.PORT              : '***********',
                     Connectors.PostgreSQL.DATABASE          : 'BLUDB',
                     Connectors.PostgreSQL.USERNAME          : '***********',
                     Connectors.PostgreSQL.PASSWORD          : '***********',
                     Connectors.PostgreSQL.SOURCE_TABLE_NAME : '***********'}

PostgreSQLDF = sqlContext.read.format("com.ibm.spark.discover").options(**PostgreSQLloadOptions).load()
PostgreSQLDF.printSchema()
PostgreSQLDF.show()

## Save the DataFrame to PostgreSQL

In [None]:
PostgreSQLsaveoption = { 
                     Connectors.PostgreSQL.HOST              : '***********',
                     Connectors.PostgreSQL.PORT              : '***********',
                     Connectors.PostgreSQL.DATABASE          : 'BLUDB',
                     Connectors.PostgreSQL.USERNAME          : '***********',
                     Connectors.PostgreSQL.PASSWORD          : '***********',
                     Connectors.PostgreSQL.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.PostgreSQL.TARGET_TABLE_ACTION : 'merge'}

NewPostgreSQLDF = PostgreSQLDF.write.format("com.ibm.spark.discover").options(**PostgreSQLsaveoption).save()

## Load data from PostgreSQL on Compose

In [None]:
PostgreSQLComposeloadOptions = { 
                     Connectors.PostgreSQLCompose.HOST              : '***********',
                     Connectors.PostgreSQLCompose.PORT              : '***********',
                     Connectors.PostgreSQLCompose.DATABASE          : 'BLUDB',
                     Connectors.PostgreSQLCompose.USERNAME          : '***********',
                     Connectors.PostgreSQLCompose.PASSWORD          : '***********',
                     Connectors.PostgreSQLCompose.SOURCE_TABLE_NAME : '***********'}

PostgreSQLComposeDF = sqlContext.read.format("com.ibm.spark.discover").options(**PostgreSQLComposeloadOptions).load()
PostgreSQLComposeDF.printSchema()
PostgreSQLComposeDF.show()

## Save the DataFrame to PostgreSQL on Compose

In [None]:
PostgreSQLComposesaveoption = { 
                     Connectors.PostgreSQLCompose.HOST              : '***********',
                     Connectors.PostgreSQLCompose.PORT              : '***********',
                     Connectors.PostgreSQLCompose.DATABASE          : 'BLUDB',
                     Connectors.PostgreSQLCompose.USERNAME          : '***********',
                     Connectors.PostgreSQLCompose.PASSWORD          : '***********',
                     Connectors.PostgreSQLCompose.TARGET_TABLE_NAME   : 'TABLE2',
                     Connectors.PostgreSQLCompose.TARGET_WRITE_MODE   : 'insert',
                     Connectors.PostgreSQLCompose.TARGET_TABLE_ACTION : 'append'}

NewPostgreSQLComposeDF = PostgreSQLComposeDF.write.format("com.ibm.spark.discover").options(**PostgreSQLComposesaveoption).save()

## Load data from Salesforce.com

In [None]:
SalesforceloadOptions = { 
                     Connectors.Salesforce.USERNAME          : '***********',
                     Connectors.Salesforce.PASSWORD          : '***********',
                     Connectors.Salesforce.SOURCE_TABLE_NAME : '***********'}

SalesforceDF = sqlContext.read.format("com.ibm.spark.discover").options(**SalesforceloadOptions).load()
SalesforceDF.printSchema()
SalesforceDF.show()

## Save the DataFrame to Salesforce.com

In [None]:
Salesforcesaveoption = { 
                     Connectors.Salesforce.USERNAME          : '***********',
                     Connectors.Salesforce.PASSWORD          : '***********',
                     Connectors.Salesforce.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.Salesforce.TARGET_TABLE_ACTION : 'append'}

NewSalesforceDF = SalesforceDF.write.format("com.ibm.spark.discover").options(**Salesforcesaveoption).save()

## Load data from Sybase

In [None]:
SybaseloadOptions = { 
                     Connectors.Sybase.HOST              : '***********',
                     Connectors.Sybase.PORT              : '***********',
                     Connectors.Sybase.DATABASE          : 'BLUDB',
                     Connectors.Sybase.USERNAME          : '***********',
                     Connectors.Sybase.PASSWORD          : '***********',
                     Connectors.Sybase.SOURCE_TABLE_NAME : '***********'}

SybaseDF = sqlContext.read.format("com.ibm.spark.discover").options(**SybaseloadOptions).load()
SybaseDF.printSchema()
SybaseDF.show()

## Save the DataFrame to Sybase

In [None]:
Sybasesaveoption = { 
                     Connectors.Sybase.HOST              : '***********',
                     Connectors.Sybase.PORT              : '***********',
                     Connectors.Sybase.DATABASE          : 'BLUDB',
                     Connectors.Sybase.USERNAME          : '***********',
                     Connectors.Sybase.PASSWORD          : '***********',
                     Connectors.Sybase.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.Sybase.TARGET_TABLE_ACTION : 'append'}

NewSybaseDF = SybaseDF.write.format("com.ibm.spark.discover").options(**Sybasesaveoption).save()

## Load data from SybaseIQ

In [None]:
SybaseIQloadOptions = { 
                     Connectors.SybaseIQ.HOST              : '***********',
                     Connectors.SybaseIQ.PORT              : '***********',
                     Connectors.SybaseIQ.DATABASE          : 'BLUDB',
                     Connectors.SybaseIQ.USERNAME          : '***********',
                     Connectors.SybaseIQ.PASSWORD          : '***********',
                     Connectors.SybaseIQ.SOURCE_TABLE_NAME : '***********'}

SybaseIQDF = sqlContext.read.format("com.ibm.spark.discover").options(**SybaseIQloadOptions).load()
SybaseIQDF.printSchema()
SybaseIQDF.show()

## Save the DataFrame to SybaseIQ

In [None]:
SybaseIQsaveoption = { 
                     Connectors.SybaseIQ.HOST              : '***********',
                     Connectors.SybaseIQ.PORT              : '***********',
                     Connectors.SybaseIQ.DATABASE          : 'BLUDB',
                     Connectors.SybaseIQ.USERNAME          : '***********',
                     Connectors.SybaseIQ.PASSWORD          : '***********',
                     Connectors.SybaseIQ.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.SybaseIQ.TARGET_TABLE_ACTION : 'append'}

NewSybaseIQDF = SybaseIQDF.write.format("com.ibm.spark.discover").options(**SybaseIQsaveoption).save()

## Load data from SQLDB

In [None]:
SQLDBloadOptions = { 
                     Connectors.SQLDB.HOST              : '***********',
                     Connectors.SQLDB.PORT              : '***********',
                     Connectors.SQLDB.DATABASE          : 'BLUDB',
                     Connectors.SQLDB.USERNAME          : '***********',
                     Connectors.SQLDB.PASSWORD          : '***********',
                     Connectors.SQLDB.SOURCE_TABLE_NAME : '***********'}

SQLDBDF = sqlContext.read.format("com.ibm.spark.discover").options(**SQLDBloadOptions).load()
SQLDBDF.printSchema()
SQLDBDF.show()

## Save the DataFrame to SQLDB

In [None]:
SQLDBsaveoption = { 
                     Connectors.SQLDB.HOST              : '***********',
                     Connectors.SQLDB.PORT              : '***********',
                     Connectors.SQLDB.DATABASE          : 'BLUDB',
                     Connectors.SQLDB.USERNAME          : '***********',
                     Connectors.SQLDB.PASSWORD          : '***********',
                     Connectors.SQLDB.TARGET_TABLE_NAME : 'TABLE2',
                     Connectors.SQLDB.TARGET_TABLE_ACTION : 'append'}

NewSQLDBDF = SQLDBDF.write.format("com.ibm.spark.discover").options(**SQLDBsaveoption).save()