[#91] [Connector] Add RestfulAPI connector to Gimel #93

Merged
merged 1 commit into from May 18, 2018

Dee-Pac commented May 16, 2018

Make sure you have checked all steps below.

GitHub Issue

Fixes #91

The connector reads data from, and writes data to, any source system that exposes a RESTful API. Examples:

  • Youtube
  • Wiki
  • Twitter
  • Github
  • Many more...

Checklist:

  • This pull request updates the documentation
  • This pull request changes library dependencies
  • Title of the PR is of format (example) : [#25][Github] Add Pull Request Template

What is the purpose of this pull request?

  • This gives Gimel the capability to use SQL or the Data API to read from and write to any storage that supports a REST API.

How was this change validated?

  • Building & testing locally
# ------- Bootstrap Gimel and storage ecosystems Locally ----------

# quickstart/start-gimel

# ------- Build Gimel ----------

build/gimel

# ------- Copy Gimel to Docker ----------

docker cp \
/Users/dmohanakumarchan/workspace/eclipse_luna/oss/gimel/gimel-dataapi/gimel-sql/target/gimel-sql-1.2.0-SNAPSHOT-uber.jar \
spark-master:/tmp/gimel-sql-1.2.0-SNAPSHOT-uber.jar

# ------- Start spark shell , this command is available once you Bootstrap Gimel --------

docker exec -it spark-master bash -c "export USER=dmohanakumarchan; export SPARK_HOME=/spark/; /spark/bin/spark-shell --jars /tmp/gimel-sql-1.2.0-SNAPSHOT-uber.jar"
  • Spark Snippet
// Imports

import com.paypal.gimel._
import com.paypal.gimel.common.catalog.{DataSetProperties,Field}
val dataset = DataSet(spark);

// Setting catalog provider as user

spark.conf.set("gimel.catalog.provider" , "USER");
spark.conf.set("gimel.logging.level" , "CONSOLE");

// Properties, that can go into either Hive TBLPROPERTIES or as a Map programmatically

val baseDetailsYoutube = Map(
  "gimel.restapi.baseURL" -> "https://www.googleapis.com/youtube"
  , "gimel.restapi.apiVersion" -> "v3"
  , "gimel.restapi.accessKey" -> "YOUR_KEY"
  , "gimel.restapi.url.pattern" -> "gimel.restapi.pattern.subscriptions"
  , "gimel.restapi.videoId" -> "F7C0xojv2fE" 
  , "gimel.restapi.channelId" -> "UCXe1qKfGweMKTnmRrMw9yOg"
  , "gimel.restapi.pattern.subscriptions" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/subscriptions/?channelId={gimel.restapi.channelId}&part=snippet%2CcontentDetails&key={gimel.restapi.accessKey}"
  , "gimel.restapi.pattern.channelsById" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels/?id={gimel.restapi.channelId}&part=snippet%2CcontentDetails%2Cstatistics&key={gimel.restapi.accessKey}"
  , "gimel.restapi.pattern.channelsByUserName" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels?key={gimel.restapi.accessKey}&forUsername={gimel.restapi.userName}&part=id"
  , "gimel.restapi.pattern.commentsByChannelId" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/comments/?parentId={gimel.restapi.channelId}&part=snippet&key={gimel.restapi.accessKey}"
  , "gimel.restapi.pattern.commentThreads" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/commentThreads/?videoId={gimel.restapi.videoId}&part=snippet%2Creplies&key={gimel.restapi.accessKey}"
)

// Constructing DataSetProperties object programmatically

val dataSetProperties = DataSetProperties("RESTAPI",Array(),Array(),baseDetailsYoutube)

// Setting dataSetProperties

val props = Map("youtube.dataSetProperties" ->dataSetProperties )

// Data API - Read

val urlData = dataset.read("youtube",  props)

// With payload parsing enabled, the response is parsed into a DataFrame with one column per top-level field

spark.conf.set("gimel.restapi.parse.payload" , "true");
val urlData = dataset.read("youtube",  props)
urlData.printSchema
root
 |-- etag: string (nullable = true)
 |-- items: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- contentDetails: struct (nullable = true)
 |    |    |    |-- activityType: string (nullable = true)
 |    |    |    |-- newItemCount: long (nullable = true)
 |    |    |    |-- totalItemCount: long (nullable = true)
 |    |    |-- etag: string (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- kind: string (nullable = true)
 |    |    |-- snippet: struct (nullable = true)
 |    |    |    |-- channelId: string (nullable = true)
 |    |    |    |-- description: string (nullable = true)
 |    |    |    |-- publishedAt: string (nullable = true)
 |    |    |    |-- resourceId: struct (nullable = true)
 |    |    |    |    |-- channelId: string (nullable = true)
 |    |    |    |    |-- kind: string (nullable = true)
 |    |    |    |-- thumbnails: struct (nullable = true)
 |    |    |    |    |-- default: struct (nullable = true)
 |    |    |    |    |    |-- url: string (nullable = true)
 |    |    |    |    |-- high: struct (nullable = true)
 |    |    |    |    |    |-- url: string (nullable = true)
 |    |    |    |    |-- medium: struct (nullable = true)
 |    |    |    |    |    |-- url: string (nullable = true)
 |    |    |    |-- title: string (nullable = true)
 |-- kind: string (nullable = true)
 |-- nextPageToken: string (nullable = true)
 |-- pageInfo: struct (nullable = true)
 |    |-- resultsPerPage: long (nullable = true)
 |    |-- totalResults: long (nullable = true)
 

// With payload parsing disabled, the resulting DataFrame has just one column: "payload"

spark.conf.set("gimel.restapi.parse.payload" , "false");
val urlData = dataset.read("youtube",  props)
urlData.printSchema
root
 |-- payload: string (nullable = true)



// Adding additional runtime props as example to showcase overriding options

spark.conf.set("gimel.restapi.url.pattern","gimel.restapi.pattern.channelsById")
spark.conf.set("gimel.restapi.channelId", "UCXe1qKfGweMKTnmRrMw9yOg")
spark.conf.set("gimel.restapi.parse.payload" , "false");
val urlData = dataset.read("youtube",  props)
urlData.collect.foreach(println)

// Override all properties and just set the complete-URL directly

spark.conf.set("gimel.restapi.url","https://www.googleapis.com/youtube/v3/activities/?maxResults=10&channelId=UC_x5XG1OV2P6uZZ5FSM9Ttw&part=snippet%2CcontentDetails&key=YOUR_KEY")
val urlData = dataset.read("youtube",  props)
urlData.collect.foreach(println)
  • GSQL
val ddl = """
|create external table pcatalog.youtube
|(payload string )
|LOCATION '/tmp/youtube'
|TBLPROPERTIES
|(
|  'gimel.restapi.baseURL' = 'https://www.googleapis.com/youtube'
|  ,'gimel.restapi.apiVersion' = 'v3'
|  ,'gimel.restapi.accessKey' = 'YOUR_KEY'
|  ,'gimel.restapi.url.pattern' = 'gimel.restapi.pattern.subscriptions'
|  ,'gimel.restapi.videoId' = 'F7C0xojv2fE' 
|  ,'gimel.restapi.channelId' = 'UCXe1qKfGweMKTnmRrMw9yOg'
|  ,'gimel.restapi.pattern.subscriptions' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/subscriptions/?channelId={gimel.restapi.channelId}&part=snippet%2CcontentDetails&key={gimel.restapi.accessKey}'
|  ,'gimel.restapi.pattern.channelsById' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels/?id={gimel.restapi.channelId}&part=snippet%2CcontentDetails%2Cstatistics&key={gimel.restapi.accessKey}'
|  ,'gimel.restapi.pattern.channelsByUserName' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels?key={gimel.restapi.accessKey}&forUsername={gimel.restapi.userName}&part=id'
|  ,'gimel.restapi.pattern.commentsByChannelId' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/comments/?parentId={gimel.restapi.channelId}&part=snippet&key={gimel.restapi.accessKey}'
|  ,'gimel.restapi.pattern.commentThreads' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/commentThreads/?videoId={gimel.restapi.videoId}&part=snippet%2Creplies&key={gimel.restapi.accessKey}'
|)
|""".stripMargin

val gsql = com.paypal.gimel.sql.GimelQueryProcessor.executeBatch(_:String,spark)
gsql: String => org.apache.spark.sql.DataFrame = <function1>

// Create DDL
gsql(ddl)

// Set Catalog Provider Hive
gsql("set gimel.catalog.provider=HIVE")
gsql("select * from pcatalog.youtube")
  • DataSet.write
gsql("set gimel.catalog.provider=HIVE")
gsql("set gimel.logging.level=CONSOLE")
val payload = """
     | {
     |   "clusterDescription": "Test1",
     |   "clusterId": 666,
     |   "clusterName": "Test1"
     | }""".stripMargin
     
     
val rdd = spark.sparkContext.parallelize(Seq(payload))
val df = spark.sqlContext.read.json(rdd)

spark.sql("set gimel.restapi.url=http://pcatalog_host:8080/cluster/cluster")
import com.paypal.gimel._
import com.paypal.gimel.common.catalog.{DataSetProperties,Field}
val dataset = DataSet(spark);
dataset.write("pcatalog.data_api_clusters",df)
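
The snippets above rely on two behaviours: `gimel.restapi.url.pattern` names another property whose value is a URL template with `{property}` placeholders, and runtime settings (including a complete `gimel.restapi.url`) take priority over the dataset's own properties. The following is a minimal sketch of how such resolution could work, assuming those semantics as inferred from the examples. `RestUrlSketch`, `effectiveProps`, `resolvePattern`, and `buildUrl` are hypothetical names for illustration and are not the connector's actual code.

```scala
import scala.util.matching.Regex

// Hypothetical illustration only; not Gimel's actual implementation.
object RestUrlSketch {

  // Matches placeholders such as {gimel.restapi.baseURL}
  private val Placeholder: Regex = """\{([^{}]+)\}""".r

  /** Assumed precedence: runtime properties win over dataset properties. */
  def effectiveProps(datasetProps: Map[String, String],
                     runtimeProps: Map[String, String]): Map[String, String] =
    datasetProps ++ runtimeProps

  /** Replaces every {key} in the pattern with props(key); fails on a missing key. */
  def resolvePattern(pattern: String, props: Map[String, String]): String =
    Placeholder.replaceAllIn(pattern, m =>
      Regex.quoteReplacement(
        props.getOrElse(
          m.group(1),
          throw new IllegalArgumentException(s"No value for placeholder ${m.group(1)}"))))

  /** A complete gimel.restapi.url, if present, bypasses pattern resolution. */
  def buildUrl(props: Map[String, String]): String =
    props.get("gimel.restapi.url") match {
      case Some(url) => url
      case None =>
        // gimel.restapi.url.pattern holds the *name* of the property carrying the template
        val patternKey = props("gimel.restapi.url.pattern")
        resolvePattern(props(patternKey), props)
    }
}
```

Under these assumptions, `RestUrlSketch.buildUrl(RestUrlSketch.effectiveProps(baseDetailsYoutube, Map("gimel.restapi.url.pattern" -> "gimel.restapi.pattern.channelsById")))` would select the `channelsById` template instead of `subscriptions`, mirroring the runtime override shown in the Spark snippet.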

Commit Guidelines

  • My commits all reference GH issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@Dee-Pac Dee-Pac force-pushed the Dee-Pac:fix-91 branch from c190895 to 0449cd5 May 16, 2018

@Dee-Pac Dee-Pac requested review from prabhu1984 , bthilakraj and theromit May 16, 2018

@Dee-Pac Dee-Pac force-pushed the Dee-Pac:fix-91 branch from 0449cd5 to 8a69edf May 16, 2018

@Dee-Pac Dee-Pac requested a review from rampallydheeraj May 16, 2018

@Dee-Pac Dee-Pac force-pushed the Dee-Pac:fix-91 branch 3 times, most recently from 3e05233 to 2c8a83b May 17, 2018

## Create Hive Table pointing to Rest table

The following hive table points to a Cassandra


@bthilakraj

bthilakraj May 17, 2018

Contributor

It has Cassandra here

,'gimel.restapi.pattern.commentThreads' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/commentThreads/?videoId={gimel.restapi.videoId}&part=snippet%2Creplies&key={gimel.restapi.accessKey}'
)
```


@bthilakraj

bthilakraj May 17, 2018

Contributor

Can we just mention this as accessKey, i.e. tell users to generate an access key for YouTube and point to the YouTube link, rather than providing this access key?


@Dee-Pac

Dee-Pac May 17, 2018

Contributor

@bthilakraj Thanks for identifying - I've masked it.

@Dee-Pac Dee-Pac force-pushed the Dee-Pac:fix-91 branch 2 times, most recently from 82cbac7 to 25ba34c May 17, 2018

@Dee-Pac Dee-Pac requested a review from laxpatil May 17, 2018

@Dee-Pac Dee-Pac force-pushed the Dee-Pac:fix-91 branch 4 times, most recently from 3ca8176 to 7a86f6a May 17, 2018

@Dee-Pac Dee-Pac force-pushed the Dee-Pac:fix-91 branch from 7a86f6a to 8eee7be May 18, 2018

@rampallydheeraj
Contributor

rampallydheeraj left a comment

Looks good to me 👍

@rampallydheeraj rampallydheeraj merged commit 0892c84 into paypal:master May 18, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed