
Issues when trying to connect to deltalake #1

Closed
nfoerster2 opened this issue Jan 16, 2024 · 6 comments

@nfoerster2

Your PR is especially interesting to us. We tried to test the Delta Lake write against a local Azurite blob store, but have not managed to get it working yet. If we use a local path for location everything looks fine, but as soon as we refer to the Azurite-emulated blob storage we get this message:
IO Error: Cannot open file "abfs://deltalake/vw_cf_errors": No such file or directory

{{ config(
    materialized='external',
    plugin = 'delta',
    location = 'abfs://deltalake/vw_cf_errors',
    mode = "overwrite",
    storage_options = {
        "account_name": "devstoreaccount1",
        "account_key": "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==",
        "endpoint": "http://127.0.0.1:10000/",
        "connection_string": "AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;DefaultEndpointsProtocol=http;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;QueueEndpoint=http://127.0.0.1:10001/devstoreaccount1;TableEndpoint=http://127.0.0.1:10002/devstoreaccount1;"
    }
) }}
select * from {{source('az','errors')}}

We added the storage_options later on, but nothing changed. We also tried:

    location = 'abfs://deltalake/vw_cf_errors',
    location = 'abfss://deltalake/vw_cf_errors',
    location = 'abfs://deltalake@http://127.0.0.1:10000/devstoreaccount1/vw_cf_errors',
    location = 'abfss://deltalake@http://127.0.0.1:10000/devstoreaccount1/vw_cf_errors',
    location = 'abfs://deltalake@http://127.0.0.1:10000/devstoreaccount1',
    location = 'abfss://deltalake@http://127.0.0.1:10000/devstoreaccount1',
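For comparison, a sketch of the location URL forms that the Rust object_store crate (which delta-rs uses underneath) documents for Azure. None of them embed `http://` in the authority; the emulator endpoint is normally selected via storage_options rather than the URL. The exact paths below are illustrative only, not verified against Azurite:

```python
# Hedged sketch: Azure location URL forms as documented by object_store
# (used by delta-rs). The account comes from storage_options or from the
# <container>@<account> authority; "http://" never appears in the path.
candidate_locations = [
    "az://deltalake/vw_cf_errors",     # account taken from storage_options
    "abfss://deltalake/vw_cf_errors",  # same, abfss scheme
    "abfss://deltalake@devstoreaccount1.dfs.core.windows.net/vw_cf_errors",
]
assert all("http://" not in loc for loc in candidate_locations)
```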

The container exists. Any idea what is going wrong?

@milicevica23
Collaborator

milicevica23 commented Jan 16, 2024

Hi @nfoerster2,
I never tried it with the emulator, so I don't know it by heart.
My guess is that we would have to try it out first with delta-rs on its own; if we can make it work there, then it should also work here.
Here https://delta-io.github.io/delta-rs/usage/loading-table/ it is stated that they use this Rust package option https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variant.UseEmulator, and I would argue that such a config is missing.
You can see here that I pass the storage options through to deltalake/delta-rs: https://github.com/milicevica23/dbt-duckdb/blob/43fce24b93627e72f680004987bcfb33b0e2aedc/dbt/adapters/duckdb/plugins/delta.py#L105

I hope that helps you move forward; otherwise I can try it out and come back to you when I know more.
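A minimal sketch of what testing that standalone could look like from Python, assuming the delta-rs storage_options keys map onto object_store's `AzureConfigKey` names (`azurite_storage_options` is a hypothetical helper, not part of dbt-duckdb):

```python
# Hypothetical helper: build delta-rs storage_options for Azurite.
# Key names are assumed to follow object_store's AzureConfigKey;
# "use_emulator" corresponds to AzureConfigKey::UseEmulator from the
# docs linked above.
def azurite_storage_options(account_name: str, account_key: str) -> dict:
    return {
        "account_name": account_name,
        "account_key": account_key,
        "use_emulator": "true",  # route requests to the local emulator
    }

# The well-known Azurite development credentials:
opts = azurite_storage_options(
    "devstoreaccount1",
    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==",
)
```

These options would then be passed as `storage_options` to `deltalake.write_deltalake(...)` for a standalone write test against a running Azurite.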

@nfoerster2
Author

Hey @milicevica23,

thanks for your fast feedback.

I already tested reading from a Delta Lake on the emulator, and I defined parquet tables on the emulator as sources; both worked. However, the Jinja code (it is Jinja, the stuff with `{{`?) does not seem to pick up the environment variables correctly; specifying the storage options makes no difference. I tried to debug it, but with the abfs path set, dbt never even reaches the storage function you wrote. From the failure message it seems the tmp file cannot be written, either because the location definition is wrong or because the blob storage cannot be reached.

I also checked it on real blob storage: same issue. But your Azure example should work?

@milicevica23
Collaborator

milicevica23 commented Jan 16, 2024

Hi @nfoerster2,
Last week I tried to incorporate the delta function into the standard materialization; before that I had one which was not disturbed by the other external implementations. It may be that I broke the flow, so let me check it on my side and come back with some answers.

My example worked the week before, but I haven't tried the newest one with Azure. I will also have to write some unit tests with Azurite, so thank you for speeding me up :D

@milicevica23
Collaborator

milicevica23 commented Jan 16, 2024

Hi @nfoerster2,
you were right in your assumptions. The problem is in the current implementation: we save the parquet file independently of the plugin, so when you write an abfss location the DuckDB command COPY ... TO 'abfss://...' is called, and DuckDB cannot open that path on your local machine. When I find a bit of free time I will post in the initial pull request a way forward and how we have to refactor to solve that problem.
In short: we have an intermediate step which saves the data to a parquet file, and that does not work unless the location points to the local filesystem.
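A pure-Python analogy of that failure mode (illustrative only, not the actual dbt-duckdb code): the intermediate COPY step needs a path the process can open like a local file, so an abfs:// URL fails the same way a nonexistent local path would:

```python
import os
import tempfile

def can_open_for_write(path: str) -> bool:
    """Return True if `path` can be opened like a local file."""
    try:
        with open(path, "w"):
            return True
    except OSError:  # e.g. "No such file or directory"
        return False

# A real local path works...
local_path = os.path.join(tempfile.mkdtemp(), "tmp.parquet")
print(can_open_for_write(local_path))  # True

# ...but an abfs:// URL is not a local path, mirroring the
# "IO Error: Cannot open file" message seen above.
print(can_open_for_write("abfs://deltalake/vw_cf_errors"))  # False
```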

Independently of that, I added the external_table materialization back to the dbt-duckdb branch and made it work with Azure and Azurite (I don't like the way it has to be written, but it works), and I pushed both examples to the repository. You can test and experiment with the plugin without the burden of the old implementation. Please be aware that in the near future I will refactor the code, and external_table should then be rewritten to just external.

Please feel free to open issues for everything you find along the way, and be aware that there are still things to implement. Every piece of feedback is valuable, makes an impact, and moves this work forward :)
Thank you

@nfoerster2
Author

@milicevica23 you are totally right, switching external to external_table solves the issue for now. I understood your explanation and will keep track of your PR. Thank you very much for the fast response and solution.

@milicevica23
Collaborator

Feel free to open a new issue for any missing functionality or problems you find along the way 🙏
