Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle invalid CSV data with Serde #237

Closed
wants to merge 1 commit into from

Conversation

Projects
None yet
3 participants
@alisha17
Copy link
Contributor

alisha17 commented Jul 11, 2017

fixes #231

@alisha17 alisha17 force-pushed the alisha17:invalid_data branch from 8af57ab to ca6f3bd Jul 11, 2017

@budziq
Copy link
Collaborator

budziq left a comment

@alisha17 almost perfect!
Just few nits below:

  • please also update the master TOC in intro.md
  • please check why the travis tests are failing
  • please update the description with fixes #231 so the issue autocloses with this PR
id: u64,
}
fn run() -> Result<(), Box<Error>> {

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

please avoid using Box use the error_chain instead

Ok(())
}
fn main() {

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

please use quick_main from error-chain crate

@@ -8,6 +8,7 @@
| [Encode a string as application/x-www-form-urlencoded][ex-urlencoded] | [![url-badge]][url] | [![cat-encoding-badge]][cat-encoding] |
| [Encode and decode hex][ex-hex-encode-decode] | [![data-encoding-badge]][data-encoding] | [![cat-encoding-badge]][cat-encoding] |
| [Encode and decode base64][ex-base64] | [![base64-badge]][base64] | [![cat-encoding-badge]][cat-encoding] |
| [Handle invalid CSV data with Serde][ex-invalid-csv] | [![csv-badge]][csv] | [![cat-encoding-badge]][cat-encoding] |

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

please also add the serde badge

<a name="ex-invalid-csv"></a>
## Handle invalid CSV data with Serde

[![csv-badge]][csv] [![cat-encoding-badge]][cat-encoding]

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

please add serde badge

#[serde(rename_all = "PascalCase")]
struct Record {
Name: String,
#[serde(deserialize_with = "csv::invalid_option")]

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

this serde macro is important in avoiding deserialization errors. pleas mention it in description and add links to docs.

use std::process;
#[derive(Debug, Deserialize)]
#[serde(rename_all = "PascalCase")]

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

If this serde macro is important then please mention it in description and add link to docs

[![csv-badge]][csv] [![cat-encoding-badge]][cat-encoding]

We get deserialization errors if invalid CSV data like empty field
is encountered. Serde converts any such errors to a None value, when applied

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

if mentioning identifiers, please linkify them to docs

let data = "\
Name,Place,id
Mark,,46
Ashley,Zurich,92";

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

please add a linebreak here

}
fn run() -> Result<(), Box<Error>> {
let data = "\

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

do we need this linebreak here?

fn run() -> Result<(), Box<Error>> {
let data = "\
Name,Place,id
Mark,,46

This comment has been minimized.

@budziq

budziq Jul 11, 2017

Collaborator

this field is actually valid csv the place will be an empty string
please see the https://docs.rs/csv/1.0.0-beta.3/csv/fn.invalid_option.html for suggestion on invalid data from serde point of view.

@alisha17 alisha17 force-pushed the alisha17:invalid_data branch from 24136f3 to c1f5086 Jul 15, 2017

@budziq

This comment has been minimized.

Copy link
Collaborator

budziq commented Jul 15, 2017

Hi @alisha17 thanks for all the submissions!

Unfortunately I'll not be able to follow up with anything resembling review for the next two weeks due to holidays 😞 . I may drop some comments when Im on mobile later and I've asked other maintainers to drop from time to time and look into the PR's.


[![csv-badge]][csv] [![serde-badge]][serde] [![cat-encoding-badge]][cat-encoding]

We get deserialization errors if [`invalid CSV data`] is encountered. [`Serde`] is

This comment has been minimized.

@brson

brson Jul 19, 2017

Contributor

Please remove the backticks from "invalid CSV data" - backticks are for formatting text as code and this is just regular text. Same with "Serde".


We get deserialization errors if [`invalid CSV data`] is encountered. [`Serde`] is
essential to convert any such errors to a [`None`] value, when applied to
the [`Option`] field.

This comment has been minimized.

@brson

brson Jul 19, 2017

Contributor

The second sentence here, "Serde is essential to convert any errors ..." seems off to me. This examples looks to be about using csv::invalid_option to deal with bogus CSV data.

I might instead word this as something like

CSV files often contain invalid data. For these cases the csv crate provides a custom deserializer, csv::invalid_option, which automatically converts invalid data to None values.

Place a link on csv::invalid_option, but remove the links from Some and None - those are very fundamental types that don't need links here IMO.

@brson

This comment has been minimized.

Copy link
Contributor

brson commented Jul 19, 2017

I added some further feedback about the description of this test. This needs a rebase.

@alisha17 alisha17 force-pushed the alisha17:invalid_data branch 2 times, most recently from 56bf92d to bffa320 Jul 20, 2017

@budziq
Copy link
Collaborator

budziq left a comment

Nice! just few more details and we are golden.

provides a custom deserializer, [`csv::invalid_option`], which automatically
converts invalid data to None values.

[`csv::invalid_option`]: https://docs.rs/csv/*/csv/fn.invalid_option.html

This comment has been minimized.

@budziq

budziq Jul 20, 2017

Collaborator

please move this link to the bottom of the file under referencje section


```rust
# #[macro_use]
#extern crate error_chain;

This comment has been minimized.

@budziq

budziq Jul 20, 2017

Collaborator

missing space after #

fn run() -> Result<()> {
let data = "name,place,id
mark,\"\",46

This comment has been minimized.

@budziq

budziq Jul 20, 2017

Collaborator

the value here is actually valid ( string containing """").
please note that in the example from docs has a i32 type trying to be deserialized both from such string and xyz which are both invalid i32.

another approach would be to actually leave out one of the fields (remove the , too).

This comment has been minimized.

@alisha17

alisha17 Jul 23, 2017

Author Contributor

If we leave out one field, it will actually give error of fields: "found record with 2 fields, but the previous record has 3 fields".

This comment has been minimized.

@budziq

budziq Jul 23, 2017

Collaborator

Hmm the docs seem of then. I'll lok into it after holidays.

ok then exchange the Optional field to non string one as almost anything you throw at it will produce a string.

fn run() -> Result<()> {
let data = "name,place,id
mark,\"\",46
ashley,zurich,92";

This comment has been minimized.

@budziq

budziq Jul 20, 2017

Collaborator

I would add two more records. one valid and another invalid in another way.

name: String,
#[serde(deserialize_with = "csv::invalid_option")]
place: Option<String>,
id: u64,

This comment has been minimized.

@budziq

budziq Jul 20, 2017

Collaborator

how about making all of the fields optional as in the docs for csv::invalid_option?

@alisha17 alisha17 force-pushed the alisha17:invalid_data branch from cd0db61 to 6d81608 Jul 23, 2017

@budziq
Copy link
Collaborator

budziq left a comment

one final thing and we are good to go.

#[derive(Debug, Deserialize)]
struct Record {
name: Option<String>,

This comment has been minimized.

@budziq

budziq Jul 23, 2017

Collaborator

do the Option's here make sense without csv::invalid_option? these might obfuscate the example.

This comment has been minimized.

@alisha17

alisha17 Jul 23, 2017

Author Contributor

I think it actually doesn't really matter. But yes, in the example it's better to keep Option with invalid field only. I will fix it.

This comment has been minimized.

@budziq

budziq Jul 23, 2017

Collaborator

yep the code is still valid but newb reader might get confused when seeing more Options than the relevant one. 👍

@alisha17 alisha17 force-pushed the alisha17:invalid_data branch from e43448d to 4fa3a38 Jul 24, 2017

@budziq

budziq approved these changes Jul 24, 2017

Copy link
Collaborator

budziq left a comment

@alisha17 Nicely done!

As previously merging will have to wait a little

@brson

This comment has been minimized.

Copy link
Contributor

brson commented Jul 25, 2017

I rebased and merged manually. Thanks @alisha17

@brson brson closed this Jul 25, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.