
paws Athena wrapper request #195

Closed
DyfanJones opened this issue Sep 28, 2019 · 24 comments

@DyfanJones (Member)

Hi all,
First off, this is a great package and I am really excited to start working with it. I am currently working on a wrapper of Python's boto3 to create an R DBI interface into Athena (https://github.com/DyfanJones/RAthena). I am really keen to do the same type of wrapper using the paws package as the driver package instead.

I just have a quick question first:

  • Are you comfortable with me starting to build the DBI wrapper package using paws as the driver? I am asking this as I don't want to interrupt/stop future plans of the paws project.

Many thanks

@adambanker (Member)

That sounds like a really cool tool and we'd be thrilled if you used Paws for it! Let us know if you have any questions about Paws along the way.

@DyfanJones (Member Author)

Hi all,
From my understanding, the paws package sets credentials through environment variables, even down to profiles that are set in the .aws directory. For example, you use the environment variable AWS_PROFILE to change profiles.

My question is: if I want to assume a role using profile "x", and then use that new role, do I need to set those temporary credentials in the environment variables? Or is there a way to feed them into the application I am using, similar to Python's boto3 package, i.e.

Method 1:

# set aws profile
Sys.setenv("AWS_PROFILE" = "x")

sts <- paws::sts()
Role <- sts$assume_role(RoleArn = "arn:aws:sts::made_up_arn_role",
                        RoleSessionName = "example_session")

Sys.setenv(AWS_ACCESS_KEY_ID = Role$Credentials$AccessKeyId)
Sys.setenv(AWS_SECRET_ACCESS_KEY = Role$Credentials$SecretAccessKey)
Sys.setenv(AWS_SESSION_TOKEN = Role$Credentials$SessionToken)

# simple example of using the athena application
athena <- paws::athena()
athena$list_named_queries()

OR

Method 2:

# set aws profile
Sys.setenv("AWS_PROFILE" = "x")

sts <- paws::sts()
Role <- sts$assume_role(RoleArn = "arn:aws:sts::made_up_arn_role",
                        RoleSessionName = "example_session")

athena <- paws::athena(AWS_ACCESS_KEY_ID = Role$Credentials$AccessKeyId,
                       AWS_SECRET_ACCESS_KEY = Role$Credentials$SecretAccessKey,
                       AWS_SESSION_TOKEN = Role$Credentials$SessionToken)

# simple example of using the athena application
athena$list_named_queries()

@adambanker (Member)

It is interesting that you bring this up, as this is a feature we have been actively working on. In its current state, you would have to use method 1. However, we are getting close to rolling out support for passing in credentials using method 2. We hope to add that ability this week.
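
For reference, here is a rough sketch of the kind of interface we have in mind for method 2; the argument names below are provisional and may change before release:

# provisional sketch -- credential argument names may change before release
athena <- paws::athena(config = list(
  credentials = list(
    creds = list(
      access_key_id     = Role$Credentials$AccessKeyId,
      secret_access_key = Role$Credentials$SecretAccessKey,
      session_token     = Role$Credentials$SessionToken
    )
  )
))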

@DyfanJones (Member Author)

Thanks for getting back in touch, this is great news. I will keep an eye on your package. Currently I have a template for the initial dbGetQuery wrapper. I will keep the method basic for now and update and adapt it to align with your upcoming changes :)

I will push to GitHub soon, feel free to check out the progress of the wrapper. I am thinking of calling the package paws.athena to reflect the paws package being the driver.

@DyfanJones (Member Author)

Hi all,

Are there any plans to implement upload_file for s3 (in paws.storage), similar to boto3's upload_file method?

import boto3

session = boto3.Session()
s3 = session.resource("s3")

s3.Bucket("somebucket").upload_file(Filename="local/iris.csv", Key="iris.csv")

So for paws, something like:

s3 <- paws::s3()

s3$upload_file(Bucket = "some_bucket",
               Filename = "local/iris.csv",
               Key = "iris.csv")

The reason for this request is that I am trying to create a dbWriteTable method and Athena is unable to read raw file types (to my knowledge).

In my other wrapper, I write out csv, tsv or parquet files, upload them into s3 and then register the DDL in Athena.
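
For context, a minimal sketch of that flow with paws (the bucket name, keys and DDL below are made up for illustration):

# minimal sketch of the dbWriteTable flow; bucket, keys and DDL are made up
s3 <- paws::s3()
athena <- paws::athena()

# 1. write the data frame out to a local csv
t <- tempfile(fileext = ".csv")
write.csv(iris, t, row.names = FALSE)

# 2. upload the raw file bytes to s3
s3$put_object(Body = readBin(t, "raw", n = file.size(t)),
              Bucket = "some_bucket", Key = "iris/iris.csv")

# 3. register the table in athena with a CREATE EXTERNAL TABLE DDL
athena$start_query_execution(
  QueryString = "CREATE EXTERNAL TABLE IF NOT EXISTS default.iris (...)",
  ResultConfiguration = list(OutputLocation = "s3://some_bucket/output/"))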

@adambanker (Member) commented Sep 29, 2019

You can use the s3$put_object function to upload the csv files to S3. An example of this in action can be found here: S3 Example.

@DyfanJones (Member Author)

Thanks @adambanker, that example did help.

@DyfanJones (Member Author)

Hi all,

I am having difficulty uploading objects to a partitioned file structure using the current method:

S3 <- paws::s3()

# create tempfile
t <- tempfile()
write.table(mtcars, t, sep = ",", row.names = FALSE, quote=FALSE)

# prepare data
t_con <- file(t, "rb")
obj <- readBin(t_con, "raw", n = file.size(x))

# upload data to s3
S3$put_object(Body = obj, Bucket = "mybucket", Key = "mtcars/timestamp=20090420/mtcars.csv")

# close file connection
close(t_con)

Returned error:

Error: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method.

The reason to create a partitioned file structure is to support partitioned tables in Athena; please see the following for more details:
https://docs.aws.amazon.com/athena/latest/ug/partitions.html

Is there another method for creating partitioned file structures in s3 using the paws package?

@adambanker (Member)

At first glance, on the line:

obj <- readBin(t_con, "raw", n = file.size(x))

you set n = file.size(x) but x is undefined. Do you get the same error if you change it to:

obj <- readBin(t_con, "raw", n = file.size(t))

@DyfanJones (Member Author)

Hi @adambanker, thanks for your reply. My apologies, there was an error in my example code. After changing it I still get the same error when using S3$put_object. After further digging, the AWS docs say the "=" might require special handling, and might need to be URL encoded or referenced as HEX.

https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata
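
As a possible workaround I may try percent-encoding just the "=" in the key before uploading (an untested guess on my part):

# untested guess: percent-encode the "=" in the key before upload
key <- "mtcars/timestamp=20090420/mtcars.csv"
encoded_key <- gsub("=", "%3D", key, fixed = TRUE)
S3$put_object(Body = obj, Bucket = "mybucket", Key = encoded_key)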

@davidkretch (Member)

Ah, sorry about that. We'll fix that tonight and have the newest version on CRAN within the next couple days.

@DyfanJones (Member Author)

@davidkretch Perfect, so far I am really impressed with this package and the response time. I should have a working DBI interface into Athena really soon with all the improvements you guys have been making.

@DyfanJones (Member Author)

Hi all,

Sorry to be a pain, but I am coming across this issue: I am trying to delete objects using delete_objects.

Here is some example code:

S3 <- paws::s3()
S3$delete_objects(Bucket = "mybucket",
                  Delete = list(Objects = list(list(Key = "subfolder/file_want_to_delete")),
                                Quiet = F))

The returned error:

$Deleted
list()

$RequestCharged
character(0)

$Errors
$Errors[[1]]
$Errors[[1]]$Key
[1] "subfolder/file_want_to_delete"

$Errors[[1]]$VersionId
character(0)

$Errors[[1]]$Code
[1] "NoSuchVersion"

$Errors[[1]]$Message
[1] "The specified version does not exist."

From my understanding, if a VersionId is not provided then it should just delete the object (or add a delete marker). However it looks like it is passing VersionId character(0) and returning the above error.

The reason delete_objects is interesting is that it can be used in combination with list_objects to delete objects by prefix, for example:

S3 <- paws::s3()

bucket <- "mybucket"
content <- S3$list_objects(Bucket = bucket,
                           Prefix = "subfolder/")
content_list <- lapply(content$Contents, function(x) list(Key = x$Key))

S3$delete_objects(Bucket = bucket,
                  Delete = list(Objects = content_list, Quiet = F))
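
One thing the wrapper will also need to handle: list_objects returns at most 1,000 keys per call, so for larger prefixes I would have to page through the results, roughly like this (sketch, untested):

# sketch (untested): page through list_objects for prefixes with > 1000 keys
content_list <- list()
marker <- ""
repeat {
  content <- S3$list_objects(Bucket = bucket, Prefix = "subfolder/", Marker = marker)
  content_list <- c(content_list,
                    lapply(content$Contents, function(x) list(Key = x$Key)))
  if (!isTRUE(content$IsTruncated)) break
  marker <- content$Contents[[length(content$Contents)]]$Key
}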

@davidkretch (Member)

We have a fix ready for this as well. We'll submit it to CRAN probably within the next day unless more comes up.

@davidkretch (Member)

@DyfanJones I'm going to close this issue if it's alright with you -- we've got another issue that covers setting the credentials as you discuss above and we're leaving that one open till we finish with it.

@davidkretch (Member) commented Oct 3, 2019

@DyfanJones The bug fixes for the S3 key names and deleting multiple objects are now in the version of paws.common on CRAN. paws.common is paws' low-level API interaction package. If you install the latest version of it from CRAN, let us know if you run into further issues.
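
That is, assuming a standard CRAN setup:

install.packages("paws.common")
packageVersion("paws.common")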

@DyfanJones (Member Author)

@davidkretch Thanks for the update :) I have run my unit tests and the paws.athena wrapper is working with the update 👍 The good news is the package can now partition Athena tables with dbWriteTable and create history tables. I have developed an initial way to assume roles (by updating the system variables). When you guys have implemented a method to pass credentials to paws objects I will update the package accordingly :)

@davidkretch (Member)

@DyfanJones Cool, glad to hear it. We'll keep you posted.

Also, about the name -- we think the name paws.athena might be a little confusing, since Paws is already made up of a bunch of different packages, all of which start with "paws." (paws, paws.common, paws.compute, etc.).

I think it would be nice to have a common name pattern for helpful interfaces or abstraction layers to the AWS services e.g. Athena, but I don't know what that would be yet. I am not one for coming up with good names.

@DyfanJones (Member Author)

@davidkretch No worries :) I am happy to change the name to help prevent confusion. This is why I asked you guys in the first place :D

I will change the package name to noctua (Latin for owl). The owl is the symbol of Athena, which seems fitting :)

@davidkretch (Member)

@DyfanJones Thanks -- that is a cool name.

@DyfanJones (Member Author)

Hi all,
Just wanted to let you both know that the Athena DBI wrapper noctua is now on CRAN. Thanks for your quick responses. I will keep up to date with the developments of the paws package and help to promote the package through my personal blog and R-bloggers.

@davidkretch (Member)

Nice! We'll also link to noctua in the paws readme.

@DyfanJones (Member Author)

Just to let you know, here is the blog post: https://www.r-bloggers.com/an-amazon-sdk-for-r/ (hope you enjoy the read)

@davidkretch (Member)

Awesome! Thanks for the shout out! Also, if you want to contact us directly, you can email us at david.kretch@gmail.com and adam.banker39@gmail.com.
