load might support a cache mechanism #6

Open
femtotrader opened this issue Jul 19, 2018 · 5 comments


@femtotrader

Hello,

http://www.david-anthoff.com/jl4ds/stable/fileio.html shows usage such as

df = load("https://raw.githubusercontent.com/davidanthoff/CSVFiles.jl/master/test/data.csv") |> DataFrame

Maybe a cache mechanism (such as https://github.com/helgee/RemoteFiles.jl or https://github.com/oxinabox/DataDeps.jl) should be integrated into Queryverse, or at least its usage should be better documented.

A possible API could be

df = load(RemoteFile("https://raw.githubusercontent.com/davidanthoff/CSVFiles.jl/master/test/data.csv", "data.csv")) |> DataFrame

or

df = load(RemoteFile("https://raw.githubusercontent.com/davidanthoff/CSVFiles.jl/master/test/data.csv")) |> DataFrame

The file would then be stored as data.csv (the default filename) inside a default directory.

Pinging @oxinabox and @helgee

But this is probably not the right place to suggest this... maybe FileIO?

Kind regards

@oxinabox

oxinabox commented Jul 19, 2018

Using DataDeps with this should be trivial.

Something like this (not tested, so it may have typos):

using DataDeps

function __init__()
    register(DataDep("Queryverse Tests", 
        "Testing data for queryverse", 
        "https://raw.githubusercontent.com/davidanthoff/CSVFiles.jl/master/test/data.csv"
    ))
end

df = load(datadep"Queryverse Tests/data.csv") |> DataFrame

In fact, one of the examples on my blog uses DataDeps with ExcelFiles.jl, which is part of Queryverse, right?

@oxinabox

I guess I am missing why this needs to be part of Queryverse or FileIO.jl.

Queryverse's built-in functionality for working directly on URLs is just a convenience for when it is convenient (small data or single-use scripts).
Beyond that, the user can just use other packages for external cached fetching,
and such packages will address a wider range of surrounding issues than trying to capture this functionality as an extension of the existing convenience functionality.

To be clear, is it just a docs thing, to improve discoverability across the Julia ecosystem?

@femtotrader
Author

Yes, I think a documentation effort is needed in Queryverse to highlight DataDeps.

@femtotrader
Author

femtotrader commented Jul 19, 2018

But I think some parameters of DataDep (such as Message) should be optional.

@femtotrader
Author

femtotrader commented Jul 19, 2018

With RemoteFiles.jl, this can be achieved using:

using Queryverse
using RemoteFiles
data = RemoteFile("https://raw.githubusercontent.com/davidanthoff/CSVFiles.jl/master/test/data.csv")
download(data)
df = load(path(data)) |> DataFrame

But maybe a better API should be defined; currently, calling load directly on a RemoteFile fails:

julia> load(RemoteFile("https://raw.githubusercontent.com/davidanthoff/CSVFiles.jl/master/test/data.csv")) |> DataFrame
ERROR: MethodError: no method matching load(::RemoteFiles.RemoteFile)
Closest candidates are:
  load(::Union{AbstractString, IO}, ::Any...; options...) at /Users/femto/.julia/v0.6/FileIO/src/loadsave.jl:113
  load(::FileIO.Formatted{F}, ::Any...; options...) where F at /Users/femto/.julia/v0.6/FileIO/src/loadsave.jl:167

Maybe a load method for RemoteFile should be defined; see helgee/RemoteFiles.jl#3.
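
A minimal sketch of what such a method might look like, built only from the download and path calls used above. This is an assumption about a possible design, not an existing API in FileIO, Queryverse, or RemoteFiles.jl; the decision to skip the download when the file already exists locally is also an assumption.

```julia
using FileIO
using RemoteFiles

# Hypothetical method (not part of FileIO or RemoteFiles.jl):
# fetch the remote file once, then delegate to the regular path-based load.
function FileIO.load(rf::RemoteFile, args...; options...)
    # Only download if no local copy exists yet (assumed caching policy).
    isfile(path(rf)) || download(rf)
    return load(path(rf), args...; options...)
end
```

With this in place, load(RemoteFile(url)) |> DataFrame would no longer raise the MethodError. Since it extends a FileIO function for a RemoteFiles type, such a method would more properly live in one of those two packages than in user code.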
