Could DuckDBClient load (CSV) files by URL? #379

Open
ericemc3 opened this issue Jul 18, 2023 · 1 comment

@ericemc3

I'd like a DuckDBClient equivalent of:
gentoo = d3.csv("https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381", d3.autoType)

Neither of the following works:

DuckDBClient.of({
    gentoo: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
})

nor

DuckDBClient.of({
    gentoo: {
      file: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
    }
})

But this, which mimics the FileAttachment interface, does work:

db = {
  // Mimic the fields of a FileAttachment: url(), mimeType and name
  const gentoo = {
    url: () => "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381",
    mimeType: 'text/csv',
    name: 'gentoo'
  }

  return DuckDBClient.of({
    gentoo: {file: gentoo}
  })
}

although it's rather complicated to remember.

I'd love something simpler and more intuitive, like:

DuckDBClient.of({
    gentoo: {
      url: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381",
      fileType: "csv"
    }
})

where fileType could also be 'json', for instance (or 'parquet', 'arrow', ...).
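Until something like this exists, the FileAttachment workaround can be wrapped in a small helper that takes the proposed `{url, fileType}` shape. This is a minimal sketch, assuming DuckDBClient keeps accepting the `url()`/`mimeType`/`name` object shown in the workaround; `urlAttachment` is a hypothetical name, not part of DuckDBClient. Only CSV, TSV and JSON are mapped here; Parquet and Arrow are left out because how DuckDBClient recognizes those types is not covered in this issue.

```javascript
// Assumed mapping from short file types to standard MIME types.
// Parquet/Arrow are intentionally omitted (detection by DuckDBClient unverified).
const MIME_TYPES = new Map([
  ["csv", "text/csv"],
  ["tsv", "text/tab-separated-values"],
  ["json", "application/json"]
]);

// Hypothetical helper, not part of DuckDBClient: builds the same
// url()/mimeType/name object as the hand-written workaround.
// `name` is presumably used as the registered file name, so pick a
// distinct one per table.
function urlAttachment({url, fileType, name}) {
  const mimeType = MIME_TYPES.get(String(fileType).toLowerCase());
  if (mimeType === undefined) {
    throw new Error(`unsupported fileType: ${fileType}`);
  }
  return {
    url: () => url, // a function returning the URL, as in the workaround
    mimeType,
    name
  };
}
```

In a notebook cell this would presumably be used as `db = DuckDBClient.of({gentoo: {file: urlAttachment({url, fileType: "csv", name: "gentoo"})}})`, where `url` is the EDI data viewer address above.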

@mbostock
Member

mbostock commented Aug 9, 2023

That sounds reasonable to me. 👍

In theory, we could also make a HEAD request for the file to get its MIME type; the type could then be optional whenever the Content-Type response header is present. That might allow this:

DuckDBClient.of({
  gentoo: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
})

Or this:

DuckDBClient.of({
  gentoo: {
    url: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
  }
})
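A minimal sketch of the HEAD-request idea in plain JavaScript. `inferMimeType` and `attachmentFromUrl` are hypothetical names, not part of DuckDBClient, and the injectable `fetchFn` exists only so the logic can be exercised without a network. It assumes the server answers HEAD requests, allows them cross-origin (CORS), and sends a meaningful Content-Type. A server that replies with something generic, such as `application/octet-stream`, would still need an explicit type, so the sketch accepts an optional `mimeType` override.

```javascript
// Hypothetical: ask the server for the resource's MIME type via HEAD.
// Returns e.g. "text/csv", or null when the type cannot be determined.
async function inferMimeType(url, fetchFn = globalThis.fetch) {
  const response = await fetchFn(url, {method: "HEAD"});
  if (!response.ok) return null; // treat HTTP errors as "type unknown"
  const header = response.headers.get("content-type");
  if (!header) return null;
  // Drop parameters: "text/csv; charset=utf-8" -> "text/csv"
  return header.split(";")[0].trim().toLowerCase();
}

// Hypothetical: build the url()/mimeType/name object from the workaround,
// preferring an explicit mimeType and falling back to the HEAD response.
async function attachmentFromUrl(url, name, {mimeType, fetchFn} = {}) {
  const type = mimeType ?? (await inferMimeType(url, fetchFn));
  if (!type) throw new Error(`could not determine MIME type of ${url}`);
  return {url: () => url, mimeType: type, name};
}
```

In a notebook cell this might be used as `db = DuckDBClient.of({gentoo: {file: await attachmentFromUrl(url, "gentoo")}})`; whether DuckDBClient then handles the returned type depends on which MIME types it supports.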
