[#91] [Connector] Add RestfulAPI connector to Gimel #93

Merged
merged 1 commit into from May 18, 2018
@@ -0,0 +1,211 @@

* [Rest API](#rest-api)
* [Note](#note)
* [Create Hive Table pointing to Rest table](#create-hive-table-pointing-to-rest-table)
* [Catalog Properties](#catalog-properties)
* [Common imports](#common-imports-in-all-rest-api-usages)
* [Sample API usage](#rest-api-usage)
* [Sample API usage GSQL](#rest-api-usage-gsql)


--------------------------------------------------------------------------------------------------------------------



# Rest API

## Note

* Experimental API: there is no production use case for it yet.

--------------------------------------------------------------------------------------------------------------------

## Create Hive Table pointing to Rest table

The following Hive table points to a REST API:

```sql
create external table pcatalog.youtube
(payload string )
LOCATION '/tmp/youtube'
TBLPROPERTIES
(
'gimel.restapi.baseURL' = 'https://www.googleapis.com/youtube'
,'gimel.restapi.apiVersion' = 'v3'
,'gimel.restapi.accessKey' = 'YOURKEY'
,'gimel.restapi.url.pattern' = 'gimel.restapi.pattern.subscriptions'
,'gimel.restapi.videoId' = 'F7C0xojv2fE'
,'gimel.restapi.channelId' = 'UCXe1qKfGweMKTnmRrMw9yOg'
,'gimel.restapi.pattern.subscriptions' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/subscriptions/?channelId={gimel.restapi.channelId}&part=snippet%2CcontentDetails&key={gimel.restapi.accessKey}'
,'gimel.restapi.pattern.channelsById' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels/?id={gimel.restapi.channelId}&part=snippet%2CcontentDetails%2Cstatistics&key={gimel.restapi.accessKey}'
,'gimel.restapi.pattern.channelsByUserName' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels?key={gimel.restapi.accessKey}&forUsername={gimel.restapi.userName}&part=id'
,'gimel.restapi.pattern.commentsByChannelId' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/comments/?parentId={gimel.restapi.channelId}&part=snippet&key={gimel.restapi.accessKey}'
,'gimel.restapi.pattern.commentThreads' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/commentThreads/?videoId={gimel.restapi.videoId}&part=snippet%2Creplies&key={gimel.restapi.accessKey}'
)
```

bthilakraj (Contributor) commented on May 17, 2018:

Can we just mention this as accessKey --> as just saying generate an access key for youtube and point to youtube link rather providing this access key

Dee-Pac (Contributor) commented on May 17, 2018:

@bthilakraj Thanks for identifying - I've masked it.

--------------------------------------------------------------------------------------------------------------------

## Catalog Properties

| Property | Mandatory? | Description | Example | Default |
|----------|------------|-------------|------------|-------------------|
| gimel.restapi.parse.payload | N | If set to true, the resulting DataFrame shows the JSON payload parsed into fields | true/false | false |
| gimel.restapi.use.payload | N | If set to true, only the payload column of the DataFrame is used when writing via POST/PUT | true/false | false |
| gimel.restapi.url | N | If set, this URL is used directly for reads and writes; all other properties except the two above are ignored | complete URL | Empty |
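
The connector's URL resolution code is not shown in this document, so the following is only a minimal sketch of how the properties above could combine, inferred from the property names and examples. `RestUrlSketch`, `fillTemplate`, and `resolveUrl` are hypothetical names and are not part of Gimel. The sketch assumes that a non-empty `gimel.restapi.url` takes precedence, and that otherwise the pattern named by `gimel.restapi.url.pattern` is looked up and each `{property}` placeholder is replaced with that property's value.

```scala
import scala.util.matching.Regex

// Hypothetical helper, NOT part of Gimel: illustrates one way the request URL
// could be derived from the gimel.restapi.* properties.
object RestUrlSketch {
  // Matches a {placeholder} and captures the property name inside the braces.
  private val Placeholder: Regex = """\{([^{}]+)\}""".r

  /** Replaces each {key} in the template with props(key); unknown keys are left as-is. */
  def fillTemplate(template: String, props: Map[String, String]): String =
    Placeholder.replaceAllIn(template, m =>
      // quoteReplacement keeps '$' and '\' in property values from being read as group references
      Regex.quoteReplacement(props.getOrElse(m.group(1), m.matched)))

  /** Assumed precedence: a non-empty gimel.restapi.url wins, otherwise the named pattern is filled in. */
  def resolveUrl(props: Map[String, String]): String =
    props.get("gimel.restapi.url").filter(_.nonEmpty).getOrElse {
      val patternKey = props("gimel.restapi.url.pattern") // e.g. "gimel.restapi.pattern.channelsById"
      fillTemplate(props(patternKey), props)
    }
}
```

Under these assumptions, calling `RestUrlSketch.resolveUrl` with the `TBLPROPERTIES` values from the table definition would fill in the `subscriptions` pattern, and adding a `gimel.restapi.url` entry would bypass the pattern entirely.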

--------------------------------------------------------------------------------------------------------------------



## Common Imports in all Rest API Usages

```scala
import com.paypal.gimel._
import com.paypal.gimel.common.catalog.{DataSetProperties,Field}
val dataset = DataSet(spark);
```


## Rest API Usage

```scala
// Setting catalog provider as user
spark.conf.set("gimel.catalog.provider" , "USER");
spark.conf.set("gimel.logging.level" , "CONSOLE");
// Properties, that can go into either Hive TBLPROPERTIES or as a Map programmatically
val baseDetailsYoutube = Map(
"gimel.restapi.baseURL" -> "https://www.googleapis.com/youtube"
, "gimel.restapi.apiVersion" -> "v3"
, "gimel.restapi.accessKey" -> "YOURKEY"
, "gimel.restapi.url.pattern" -> "gimel.restapi.pattern.subscriptions"
, "gimel.restapi.videoId" -> "F7C0xojv2fE"
, "gimel.restapi.channelId" -> "UCXe1qKfGweMKTnmRrMw9yOg"
, "gimel.restapi.pattern.subscriptions" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/subscriptions/?channelId={gimel.restapi.channelId}&part=snippet%2CcontentDetails&key={gimel.restapi.accessKey}"
, "gimel.restapi.pattern.channelsById" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels/?id={gimel.restapi.channelId}&part=snippet%2CcontentDetails%2Cstatistics&key={gimel.restapi.accessKey}"
, "gimel.restapi.pattern.channelsByUserName" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels?key={gimel.restapi.accessKey}&forUsername={gimel.restapi.userName}&part=id"
, "gimel.restapi.pattern.commentsByChannelId" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/comments/?parentId={gimel.restapi.channelId}&part=snippet&key={gimel.restapi.accessKey}"
, "gimel.restapi.pattern.commentThreads" -> "{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/commentThreads/?videoId={gimel.restapi.videoId}&part=snippet%2Creplies&key={gimel.restapi.accessKey}"
)
// Constructing DataSetProperties object programmatically
val dataSetProperties = DataSetProperties("RESTAPI",Array(),Array(),baseDetailsYoutube)
// Setting dataSetProperties
val props = Map("youtube.dataSetProperties" ->dataSetProperties )
// Data API - Read
val urlData = dataset.read("youtube", props)
// With payload parsing enabled, the response JSON is parsed into DataFrame fields
spark.conf.set("gimel.restapi.parse.payload" , "true");
val urlData = dataset.read("youtube", props)
urlData.printSchema
root
|-- etag: string (nullable = true)
|-- items: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- contentDetails: struct (nullable = true)
| | | |-- activityType: string (nullable = true)
| | | |-- newItemCount: long (nullable = true)
| | | |-- totalItemCount: long (nullable = true)
| | |-- etag: string (nullable = true)
| | |-- id: string (nullable = true)
| | |-- kind: string (nullable = true)
| | |-- snippet: struct (nullable = true)
| | | |-- channelId: string (nullable = true)
| | | |-- description: string (nullable = true)
| | | |-- publishedAt: string (nullable = true)
| | | |-- resourceId: struct (nullable = true)
| | | | |-- channelId: string (nullable = true)
| | | | |-- kind: string (nullable = true)
| | | |-- thumbnails: struct (nullable = true)
| | | | |-- default: struct (nullable = true)
| | | | | |-- url: string (nullable = true)
| | | | |-- high: struct (nullable = true)
| | | | | |-- url: string (nullable = true)
| | | | |-- medium: struct (nullable = true)
| | | | | |-- url: string (nullable = true)
| | | |-- title: string (nullable = true)
|-- kind: string (nullable = true)
|-- nextPageToken: string (nullable = true)
|-- pageInfo: struct (nullable = true)
| |-- resultsPerPage: long (nullable = true)
| |-- totalResults: long (nullable = true)
// With payload parsing disabled, the resulting DataFrame has just one column: "payload"
spark.conf.set("gimel.restapi.parse.payload" , "false");
val urlData = dataset.read("youtube", props)
urlData.printSchema
root
|-- payload: string (nullable = true)
// Adding additional runtime props as example to showcase overriding options
spark.conf.set("gimel.restapi.url.pattern","gimel.restapi.pattern.channelsById")
spark.conf.set("gimel.restapi.channelId", "UCXe1qKfGweMKTnmRrMw9yOg")
spark.conf.set("gimel.restapi.parse.payload" , "false");
val urlData = dataset.read("youtube", props)
urlData.collect.foreach(println)
// Override all properties and just set the complete-URL directly
spark.conf.set("gimel.restapi.url","https://www.googleapis.com/youtube/v3/activities/?maxResults=10&channelId=UC_x5XG1OV2P6uZZ5FSM9Ttw&part=snippet%2CcontentDetails&key=YOURKEY")
val urlData = dataset.read("youtube", props)
urlData.collect.foreach(println)
```

--------------------------------------------------------------------------------------------------------------------

## Rest API Usage GSQL

```scala
val ddl = """
|create external table pcatalog.youtube
|(payload string )
|LOCATION '/tmp/youtube'
|TBLPROPERTIES
|(
| 'gimel.restapi.baseURL' = 'https://www.googleapis.com/youtube'
| ,'gimel.restapi.apiVersion' = 'v3'
| ,'gimel.restapi.accessKey' = 'YOURKEY'
| ,'gimel.restapi.url.pattern' = 'gimel.restapi.pattern.subscriptions'
| ,'gimel.restapi.videoId' = 'F7C0xojv2fE'
| ,'gimel.restapi.channelId' = 'UCXe1qKfGweMKTnmRrMw9yOg'
| ,'gimel.restapi.pattern.subscriptions' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/subscriptions/?channelId={gimel.restapi.channelId}&part=snippet%2CcontentDetails&key={gimel.restapi.accessKey}'
| ,'gimel.restapi.pattern.channelsById' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels/?id={gimel.restapi.channelId}&part=snippet%2CcontentDetails%2Cstatistics&key={gimel.restapi.accessKey}'
| ,'gimel.restapi.pattern.channelsByUserName' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/channels?key={gimel.restapi.accessKey}&forUsername={gimel.restapi.userName}&part=id'
| ,'gimel.restapi.pattern.commentsByChannelId' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/comments/?parentId={gimel.restapi.channelId}&part=snippet&key={gimel.restapi.accessKey}'
| ,'gimel.restapi.pattern.commentThreads' = '{gimel.restapi.baseURL}/{gimel.restapi.apiVersion}/commentThreads/?videoId={gimel.restapi.videoId}&part=snippet%2Creplies&key={gimel.restapi.accessKey}'
|)
|""".stripMargin
val gsql = com.paypal.gimel.sql.GimelQueryProcessor.executeBatch(_:String,spark)
// gsql: String => org.apache.spark.sql.DataFrame = <function1>
// Create DDL
gsql(ddl)
// Set Catalog Provider Hive
gsql("set gimel.catalog.provider=HIVE")
gsql("select * from pcatalog.youtube")
```

--------------------------------------------------------------------------------------------------------------------

Binary file not shown.
@@ -27,6 +27,7 @@ Contents
| <img src="images/spark.png" width="90" height="40" /> | 2.2.0 | | | This is the recommended version |
| <img src="images/hadoop.png" width="120" height="40" /> | 2.7.3 | | | This is the recommended version |
| <img src="images/csv.png" width="60" height="60" /> | 2.7.3 | PRODUCTION | [CSV Reader Doc](gimel-connectors/hdfs-csv.md) | CSV Reader & Writer for HDFS |
| <img src="images/restapi.png" width="150" height="60" /> | 2.7.3 | PRODUCTION WITH LIMITATIONS | [Restful/Web-API Doc](gimel-connectors/restapi.md) | <br>Allows Accessing Data<br>- to any source supporting<br>- Rest API<br> |
| <img src="images/alluxio.png" width="120" height="40" /> | 2.7.3 | PRODUCTION WITH LIMITATIONS | [Cross-Cluster Doc](gimel-connectors/hdfs-crosscluster.md) | <br>Allows Accessing Data<br>- Across Clusters<br>- Alluxio<br> |
| <img src="images/kafka.png" width="100" height="40" /> | 0.10.2 | PRODUCTION | [Kafka Doc](gimel-connectors/kafka.md) | V0.10.2 is the PayPal's Supported Version of Kafka|
| <img src="images/hbase.png" width="100" height="35" /> | 1.2 | PRODUCTION WITH LIMITATIONS | [HBASE Doc](gimel-connectors/hbase.md) | Leverages SHC Connector internally & also supports Batch/Get/Puts |
@@ -502,6 +502,41 @@ class GimelServiceUtilities(userProps: Map[String, String] = Map[String, String]

}

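/**
* Makes an HTTPS PUT call to the URL and returns the status code along with the response body.
*
* @param url URL to send the PUT request to
* @param data Request body, sent with Content-type application/json (defaults to an empty string)
* @return (HTTPS status code, response body)
*/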
def httpsPut(url: String, data: String = ""): (Int, String) = {
logger.info(s"PUT request -> $url and data -> ${data}")
try {
val urlObject: URL = new URL(url)
val conn: HttpsURLConnection = urlObject.openConnection().asInstanceOf[HttpsURLConnection]
conn.setRequestProperty("Content-type", "application/json")
conn.setRequestMethod("PUT")
conn.setDoOutput(true)

val wr: DataOutputStream = new DataOutputStream(conn.getOutputStream())
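// writeBytes keeps only the low byte of each char, which corrupts non-ASCII data; write UTF-8 bytes instead
wr.write(data.getBytes(java.nio.charset.StandardCharsets.UTF_8))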
wr.close()

val in: BufferedReader = new BufferedReader(new InputStreamReader(conn.getInputStream()))
val response = in.lines.collect(Collectors.toList[String]).toArray().mkString("")
in.close()

logger.info(s"PUT response is: $response")
(conn.getResponseCode, response)
} catch {
case e: Throwable =>
logger.error(e.getStackTraceString)
e.printStackTrace()
throw e
}

}

/**
* Post Implementation
*
@@ -0,0 +1,96 @@
<?xml version="1.0" encoding="UTF-8"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>gimel-dataapi</artifactId>
<groupId>com.paypal.gimel</groupId>
<version>1.2.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>gimel-restapi</artifactId>
<version>1.2.0-SNAPSHOT</version>

<dependencies>
<dependency>
<groupId>com.paypal.gimel</groupId>
<artifactId>gimel-common</artifactId>
<version>1.2.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.binary.version}</artifactId>
<version>${scalatest.version}</version>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<relocations>
<relocation>
<pattern>com.google.common</pattern>
<shadedPattern>gimel-shaded.com.google.common</shadedPattern>
</relocation>
<relocation>
<pattern>com.sun.jersey</pattern>
<shadedPattern>gimel-shaded.com.sun.jersey</shadedPattern>
</relocation>
<relocation>
<!-- Shading hadoop transitive dependency packages -->
<pattern>org.apache.hadoop</pattern>
<shadedPattern>gimel-shaded.org.apache.hadoop</shadedPattern>
</relocation>
</relocations>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
<executions>
<execution>
<id>gimel-shading</id>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>