
databricks-connect : Relative path in absolute URI #2883

Closed
martindut opened this issue Jan 11, 2021 · 4 comments · Fixed by #2884
martindut commented Jan 11, 2021

I can successfully connect to Azure Databricks from my local PC with databricks-connect, but I cannot seem to get the path to a mounted Azure Data Lake Gen2 working. The code below all works fine if I run it in a notebook on Databricks, but running it from my local RStudio doesn't work.

source_delta <- "'dbfs:/mnt/investmentaccountingdata_dev/delta/bronze/rawdata/holdings/taxhld/v1.0'"
source_delta <- "/dbfs/mnt/investmentaccountingdata_dev/delta/bronze/rawdata/holdings/taxhld/v1.0"
source_delta <- "/mnt/investmentaccountingdata_dev/delta/bronze/rawdata/holdings/taxhld/v1.0"

df_bronze <- sparklyr::spark_read_delta(sc, path = source_delta, name = tbl_name_br, memory = FALSE, overwrite = TRUE)

All three paths give me the same error:

Error: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: C:%5Cinndx%5Cprojects%5Cmartindut%5Cr_taxadmin%5C'dbfs:%5Cmnt%5Cinvestmentaccountingdata_dev%5Cdelta%5Cbronze%5Crawdata%5Choldings%5Ctaxhld%5Cv1.0'

It seems to try to translate the path into a local path on the C: drive.
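The `%5C` sequences in the error message are percent-encoded backslashes, which is why the path looks garbled: decoding a fragment of it shows the local Windows working directory being joined onto the `dbfs:/` path. A quick illustration (Python here purely to decode the string; the fragment is taken from the error above):

```python
from urllib.parse import unquote

# The error message percent-encodes backslashes as %5C. Decoding a
# fragment shows the local working directory that got prepended to
# the dbfs:/ path, which is what makes the resulting URI invalid.
fragment = "C:%5Cinndx%5Cprojects"
print(unquote(fragment))  # C:\inndx\projects
```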

This code works perfectly, but that's not how I want to run it:

source_delta <- "dbfs:/mnt/investmentaccountingdata_dev/delta/bronze/rawdata/holdings/taxhld/v1.0"
DBI::dbExecute(sc, paste0("CREATE TABLE ", tbl_name_br, " USING DELTA LOCATION '", source_delta, "'"))

I also tried this, but cannot get it working, even when I set the configs:

source_delta <- "abfss://investmentaccountingdata@xxxxxxxx.dfs.core.windows.net/delta/bronze/rawdata/holdings/taxhld/v1.0"
df_bronze <- sparklyr::spark_read_delta(sc, path = source_delta, name = tbl_name_br, memory = FALSE, overwrite = TRUE)

Error: com.databricks.service.SparkServiceRemoteException: Configuration property xxxxxxx.dfs.core.windows.net not found.

I tried setting the configs:

config$fs.azure.account.auth.type.xxxxxxxx.dfs.core.windows.net <- "OAuth"
config$fs.azure.account.oauth.provider.type.xxxxxxxx.dfs.core.windows.net <- "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
config$fs.azure.account.oauth2.client.id.xxxxxxxx.dfs.core.windows.net <- "xxxxxxxxxxxxxxxxxxxx"
config$fs.azure.account.oauth2.client.secret.xxxxxxxx.dfs.core.windows.net <- "xxxxxxxx"
config$fs.azure.account.oauth2.client.endpoint.xxxxxxxx.dfs.core.windows.net <- endpoint

but no luck

First prize would be getting the path to the mounted drive working when I run the code from my local PC connected with databricks-connect. Thanks.

martindut (Author):

The problem seems to be in sparklyr:::spark_normalize_path: it passes the path back unchanged only if it contains "://" (two forward slashes); otherwise it runs normalizePath on the path. In my case the path is "dbfs:/......", which has only one forward slash, so normalizePath is run on it.
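For illustration only, a hypothetical Python sketch of the heuristic described above (not sparklyr's actual code): checking for "://" misses URIs like `dbfs:/mnt/...`, whose scheme is followed by a single slash, whereas extracting the scheme per RFC 3986 classifies them correctly.

```python
from urllib.parse import urlparse

def is_uri_naive(path: str) -> bool:
    # The pre-fix heuristic described in the comment above:
    # only paths containing "://" are treated as URIs.
    return "://" in path

def is_uri_by_scheme(path: str) -> bool:
    # RFC 3986: a URI begins with "scheme:"; the "//authority"
    # part is optional, so "dbfs:/mnt/..." is a valid URI.
    return urlparse(path).scheme != ""

print(is_uri_naive("dbfs:/mnt/data"))      # False -> would get normalizePath'd (the bug)
print(is_uri_by_scheme("dbfs:/mnt/data"))  # True  -> would be passed through unchanged
```

Note that a real implementation would also need to special-case Windows drive letters, since a path like `C:\projects` parses as scheme `c` under the scheme-only rule.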

martindut (Author):

Sorry, I did not mean to close this. I would really like this fixed.

@martindut martindut reopened this Jan 11, 2021
@yitao-li yitao-li linked a pull request Jan 11, 2021 that will close this issue

yitao-li commented Jan 11, 2021

@martindut Yep, this is indeed a bug (and I blame it on people overlooking the rules governing how URIs work in RFC 3986).

I think #2884 should fix it.

@yitao-li yitao-li self-assigned this Jan 11, 2021
martindut (Author):

> @martindut Yep this is indeed a bug (and I blame it on ppl overlooking rules governing how URIs work in RFC3986).
>
> I think #2884 should fix it.

Thanks for the feedback
