Consider renaming CsvProvider #48
Related to this, I like the fact that we have a JsonValue that's useful on its own, and then build a type provider on top of it to add type safety. This way we always have access to the underlying JsonValue for edge cases when needed. The XmlProvider is similar, giving access to the underlying XElement.
But:
I've been writing a lot of R code lately, so I'll try to convert some of it to use FSharp.Data instead, to get a feel for what would work better as a JsonValue-like API.
With the latest changes from #122, we already have a decent enough dynamic API. I did a comparison between using the type provider, using the dynamic API, and using R here: https://gist.github.com/ovatsus/5354187

One advantage the dynamic version has is that we can slice the columns directly (https://gist.github.com/ovatsus/5354187#file-csvfile-fsx-L45), but we could eventually do something similar with the typed version. In both cases, computing the average by column is not easy, unless we treat a CSV file as supporting matrix-like operations, and that is hard to do unless all the columns are of the same type.

The R code is still more concise when filtering and mapping over the datasets, so I think we have a lot of room for improvement here. FMat manages to get a Matlab/R-like syntax; maybe we could get some of that too. A possible idea would be something like this: https://gist.github.com/ovatsus/5355630. I'm using the dynamic API and hardcoded a few things to make it look like the typed API. But even if we could make that work in the type provider version, I'm not very happy with it either. @tpetricek, do you have any bright ideas?
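To make the column-average point concrete, here is a minimal Python sketch (not FSharp.Data code) of a dynamic, string-typed row representation. The sample data and the helpers `read_rows`, `column`, and `column_means` are hypothetical. Slicing a column is trivial, but averaging "by column" only works for the columns whose values are all numeric, so the sketch has to skip the others.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data: a header row plus mixed text and numeric columns.
SAMPLE = """name,height,weight
alice,1.70,60
bob,1.82,85
carol,1.65,55
"""

def read_rows(text, sep=","):
    """Parse delimited text into a list of dicts, one per row; every value stays a string."""
    return list(csv.DictReader(io.StringIO(text), delimiter=sep))

def column(rows, name):
    """Slice a single column out of the dynamically typed rows."""
    return [row[name] for row in rows]

def try_float(values):
    """Return the values as floats, or None if any value is not numeric."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None

def column_means(rows):
    """Average every column that is entirely numeric; silently skip the rest."""
    if not rows:
        return {}
    means = {}
    for name in rows[0].keys():
        numbers = try_float(column(rows, name))
        if numbers is not None:
            means[name] = mean(numbers)
    return means

rows = read_rows(SAMPLE)
print(column(rows, "name"))  # ['alice', 'bob', 'carol']
print(column_means(rows))
```

The skip-non-numeric rule is one arbitrary choice; a typed API would instead know each column's type up front and could expose averaging only on numeric columns.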
I think the API is good enough for now, and the CSV name is not ideal but it's OK, so I'm closing this. Let's keep things minimal until we have more real-world feedback.
I know this suggestion is a little bold, but thinking about it, CsvProvider currently works not only with CSV files but also with tab-separated files and other similar textual formats. In the future it might well support more formats of tabular data (like xls/xlsx, HDF5/netCDF4, .rdata, .mat, etc.), either directly or as plugins (I have some ideas about how to make that work without changing the API or creating dependencies). But the inference and generation of typed properties is the same across all these formats.
Both the R tools and the various Python libraries that work with all these kinds of files usually call the function read.table or read_table, even though they also have variants called read.csv or read_csv whose main job is to set the default separator to ','.
Do you think renaming CsvProvider to TabularDataProvider would be a good idea? Or are people expecting that name? We could always make the same type provider available under additional names (as we do with Freebase and WorldBank, which have two versions each).
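The argument rests on the inference step being independent of the separator. A minimal Python sketch of that idea follows. It is not FSharp.Data's implementation, and `read_table`, `read_csv`, `read_tsv`, and `infer_type` are hypothetical names echoing the R/pandas convention. One shared reader infers a type per column, and the format-specific entry points are thin wrappers that only change the default separator.

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest type (int, then float, else str) that fits every value."""
    for candidate in (int, float):
        try:
            for v in values:
                candidate(v)
            return candidate
        except ValueError:
            continue  # some value did not fit; try the next, wider type
    return str

def read_table(text, sep):
    """Format-agnostic reader: split on `sep`, then infer one type per column."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    types = [infer_type(col) for col in zip(*body)]
    typed_rows = [
        {name: t(value) for name, t, value in zip(header, types, row)}
        for row in body
    ]
    return types, typed_rows

def read_csv(text):
    # Only the default separator differs; inference is shared with every format.
    return read_table(text, sep=",")

def read_tsv(text):
    return read_table(text, sep="\t")

csv_text = "id,score,label\n1,2.5,a\n2,3.0,b\n"
tsv_text = csv_text.replace(",", "\t")
print(read_csv(csv_text) == read_tsv(tsv_text))  # True
```

Under this design, supporting a new format means adding only a new way to produce rows of cells, which is why a format-neutral name for the provider would fit.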