[BUG] - Databricks absolute path must be written differently #516
Comments
Thank you for reporting this @ametwalli1
Thank you very much @hayssams, using However, I have another problem here. When doing the import with this setting, Starlake pushes all the source files without filtering by domain. Previously this was handled by default in the path function of DatasetArea.scala, which appended the /$domain variable to the end of the path, but it no longer does so. So in my situation, to create the domain directories in my pending area, I thought using Do you know how I can use variables in my cluster environment?
Could you try {{domain}} instead and let me know?
dbfs:///mnt/pending_area/{{domain}} also creates a literal {{domain}} directory. A better way to solve this would be to add /$domain at the end of the area path in DatasetArea.scala, line 50.
Can we set up a call at 5:30 PM?
Sure! Here is my email: a.metwalli@groupeonepoint.com
Description
I am trying to distribute my ingestion areas across several containers in an Azure Data Lake mounted on a DBFS path on the Databricks platform (e.g. /mnt/starlake).
To do that, I set the environment variable
SL_AREA_PENDING=dbfs://mnt/pending_area
for the pending area, for example. The problem is that when you specify an absolute path on Databricks with the filesystem scheme included, you have to write it with a single '/' after the colon.
%fs ls dbfs://mnt/starlake
gives this error: IllegalArgumentException: Hostname not allowed in dbfs uri. Please use 'dbfs:/' instead of 'dbfs://' in uri: dbfs://mnt/starlake
While
%fs ls dbfs:/mnt/starlake
works correctly. However, I can't set the pending area to
SL_AREA_PENDING=dbfs:/mnt/pending_area
because, according to your code (DatasetArea.scala, line 49), a path is recognized as absolute only if it contains "://". So if I set the variable this way, Starlake treats the path as relative and concatenates SL_DATASETS with SL_AREA_PENDING: "dbfs:/mnt/datasets_area/dbfs:/mnt/pending_area"
This will cause an error at runtime.
So it might be useful to change this condition to contains(":/"), or to add a specific condition for Databricks.
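To illustrate the behaviour described above, here is a minimal Scala sketch. It is not the actual DatasetArea.scala code: the object name AreaPathSketch and the functions isAbsoluteStrict, isAbsoluteRelaxed and resolve are hypothetical, and the datasets root is assumed to be dbfs:/mnt/datasets_area as in the concatenated path above.

```scala
// Hypothetical sketch only; names and structure are assumptions,
// not the real Starlake implementation.
object AreaPathSketch {
  // Check as described in this issue: absolute only if the path contains "://".
  def isAbsoluteStrict(path: String): Boolean = path.contains("://")

  // Proposed check: also accepts a single-slash scheme such as "dbfs:/mnt/...".
  // Any path containing "://" still matches, since "://" contains ":/".
  def isAbsoluteRelaxed(path: String): Boolean = path.contains(":/")

  // Keep absolute areas untouched; otherwise append the area to the datasets root.
  def resolve(datasets: String, area: String, isAbsolute: String => Boolean): String =
    if (isAbsolute(area)) area
    else s"${datasets.stripSuffix("/")}/$area"

  def main(args: Array[String]): Unit = {
    val datasets = "dbfs:/mnt/datasets_area" // assumed value of SL_DATASETS
    val pending  = "dbfs:/mnt/pending_area"  // value of SL_AREA_PENDING

    // Strict check: prints dbfs:/mnt/datasets_area/dbfs:/mnt/pending_area
    println(resolve(datasets, pending, isAbsoluteStrict))
    // Relaxed check: prints dbfs:/mnt/pending_area
    println(resolve(datasets, pending, isAbsoluteRelaxed))
  }
}
```

With the relaxed check, the Databricks-style path is kept as is, while paths written with "://" are still treated as absolute.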