write DataFrame to CSV #27
Comments
Is anyone taking a stab at this? It seems like something fairly manageable and useful to get started with. I'm guessing that RFC 4180 and R's write.csv should be enough for a decent start. |
Up for grabs! Go for it! You may also find the Python module's design and http://docs.python.org/library/csv.html Fortunately, writing CSV is much easier than reading! But definitely some Thanks! On Thu, Aug 9, 2012 at 2:41 AM, fpepin notifications@github.com wrote:
|
I'll take a look and see what I can come up with. |
Julia's csvwrite might be good to check out, if you haven't seen it already. |
Code found here: https://github.com/JuliaLang/julia/blob/master/base/datafmt.jl. Real CSV encoding and decoding is unfortunately significantly harder. And unstandardized. I've generally found that tab-separated values (TSV) works better with UNIX commands (though sadly, not all of them; I'm looking at you, Maybe we should create a TSV standard. My proposal is binary data (encoding implies interpretation and this is just a way to express tabular data — if you know something is text or a number, that's the next level up and requires interpretation), where tabs ('\t') delimit fields and newlines ('\n') delimit rows. Embedded tabs, newlines and backslashes get backslash escaped. That's a pretty damned simple format and completely general — any kind of data can be encoded, even binary. And it's trivial to scan and break into pieces: tab characters always delimit fields and newlines always delimit rows. CR ('\r') and any other newlineish characters are just literals. Friends don't let friends end lines with that crap. This isn't DOS or Mac OS 7. |
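A minimal sketch of the TSV proposal above, in Python purely for illustration (it is not code from Julia or from this package). It treats fields as raw bytes, uses tab between fields and newline after each row, and assumes that "backslash escaped" means the two-character sequences `\t`, `\n` and `\\`; the proposal does not spell out the exact escape sequences, so that reading is an assumption. The function names are hypothetical.

```python
# Hypothetical helpers; the escape sequences \t, \n and \\ are an assumed
# reading of "backslash escaped" in the proposal above.

def escape_field(field: bytes) -> bytes:
    # Escape backslashes first so the escapes added below are not re-escaped.
    return (field.replace(b"\\", b"\\\\")
                 .replace(b"\t", b"\\t")
                 .replace(b"\n", b"\\n"))

_UNESCAPES = {ord("t"): b"\t", ord("n"): b"\n", ord("\\"): b"\\"}

def unescape_field(field: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(field):
        byte = field[i]
        if byte == 0x5C and i + 1 < len(field):  # 0x5C is the backslash
            nxt = field[i + 1]
            # Unknown escapes are kept as-is (backslash plus the next byte).
            out += _UNESCAPES.get(nxt, bytes([0x5C, nxt]))
            i += 2
        else:
            out.append(byte)
            i += 1
    return bytes(out)

def encode_table(rows):
    # Every row ends with a newline; fields are joined with tabs.
    return b"".join(
        b"\t".join(escape_field(f) for f in row) + b"\n" for row in rows
    )

def decode_table(data: bytes):
    # After escaping, a raw tab or newline can only be a delimiter,
    # so plain splitting is enough to break the data into pieces.
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final newline
    return [[unescape_field(f) for f in line.split(b"\t")] for line in lines]
```

Because CR is not special in this scheme, a field containing `b"\r"` passes through unchanged, which matches the "just literals" rule in the proposal.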
I dig the standard, especially considering the ease of implementation. I'd
I laughed. --Josh On Sun, Aug 12, 2012 at 11:28 PM, Stefan Karpinski <notifications@github.com
Joshua Holbrook |
One other thought: Do you specify how to parse the entries into some --Josh On Sun, Aug 12, 2012 at 11:47 PM, Joshua Holbrook
Joshua Holbrook |
Someone on G+ linked to this, actually in reference to Julia, yesterday: http://xkcd.com/927/ Python does do a pretty good job of (on reading) auto-detecting tabular data separators/line endings, and in Universal mode, character sets too. And it does a reasonable job of defining dialects for writing too. For a minimum viable pair of routines, perhaps we define a type that specifies the separator/terminator/encoding, instantiate some pre-defined dialects (Excel, Unix CSV, TSV, etc.), and use it for both reading and writing? Auto-detection can wait for another day... |
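As a rough sketch of the "one type for both directions" idea above, here is a small Python illustration. The `Dialect` class, its field names, and the predefined instances are hypothetical names invented for this example (they are not the API of Python's csv module or of any Julia package), and the sketch deliberately ignores quoting and auto-detection.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dialect:
    # Hypothetical type: holds the separator/terminator/encoding triple.
    separator: str
    terminator: str
    encoding: str = "utf-8"

# Assumed pre-defined dialects, named after the ones mentioned above.
EXCEL = Dialect(separator=",", terminator="\r\n")
UNIX_CSV = Dialect(separator=",", terminator="\n")
TSV = Dialect(separator="\t", terminator="\n")

def write_rows(rows, dialect: Dialect) -> bytes:
    # No quoting here: fields must not contain the separator or terminator.
    text = "".join(
        dialect.separator.join(row) + dialect.terminator for row in rows
    )
    return text.encode(dialect.encoding)

def read_rows(data: bytes, dialect: Dialect):
    text = data.decode(dialect.encoding)
    lines = text.split(dialect.terminator)
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final terminator
    return [line.split(dialect.separator) for line in lines]
```

The same `Dialect` value is passed to both functions, so a file written with `TSV` can be read back with `TSV` without repeating the individual settings.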
I agree with Harlan: we should start by making a DelimitedData type that lets you use commas, tabs and whitespace. While full out CSV is hard to get right, I think you'll get a long way just by accounting for quotation rules. I've always preferred TSV as a way to avoid quotations, but I'm pretty sure the majority of data we want to read will use commas. |
I really like the idea of a DelimitedData type. The extra advantage is that this can be used for both reading and writing and can encode the many parameters: delimiter, end of file, quoting/escape mechanism. Then it's just a matter of defining the main types. Based on the RFC, standard csv shouldn't be too hard but it's complex enough that once I get that one, the other ones will basically be free. As for printing numeric types, csvwrite uses print_shortest for floats and print for the rest, which seems like a decent start to me. I'm not worried too much about it because we're not really going to be using this to communicate from Julia to Julia, so just having a reasonably intuitive printed version is fine, especially if the user can overload it. |
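For the writing side, the quoting rules in RFC 4180 are short: a field that contains the delimiter, a double quote, CR, or LF is wrapped in double quotes, and embedded double quotes are doubled. The Python sketch below illustrates that rule together with the "shortest float, plain print for everything else" idea; the function names are hypothetical, and Python's `repr` for floats only stands in for `print_shortest` here as an assumption about comparable behavior.

```python
def format_value(value) -> str:
    # Floats use repr (shortest string that round-trips in Python 3);
    # everything else uses its ordinary printed form.
    if isinstance(value, float):
        return repr(value)
    return str(value)

def quote_field(text: str, delimiter: str = ",") -> str:
    # Quote only when needed, following the RFC 4180 rules.
    if any(ch in text for ch in (delimiter, '"', "\r", "\n")):
        return '"' + text.replace('"', '""') + '"'
    return text

def write_csv_row(values, delimiter: str = ",") -> str:
    # RFC 4180 terminates each record with CRLF.
    fields = (quote_field(format_value(v), delimiter) for v in values)
    return delimiter.join(fields) + "\r\n"
```

For example, `write_csv_row(["a", "x,y", 3])` yields `a,"x,y",3` followed by CRLF. A user who wants a different printed form for some type could swap in their own `format_value`.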
While there's still more to be done, |
I'm going to close this because I think |