Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for space delimited data #14

Closed
Shimuuar opened this issue Oct 20, 2012 · 9 comments
Closed

Add support for space delimited data #14

Shimuuar opened this issue Oct 20, 2012 · 9 comments

Comments

@Shimuuar
Copy link
Contributor

It would be nice to add support for space-delimited data. It appears quite frequienty in the context of data analysis and require only minimal changes to the parser/encoder. I've started working on it.

What do you think about this idea? And how do you prefer to structure the code. Should new functions be placed into separate modules?

@tibbe
Copy link
Collaborator

tibbe commented Oct 20, 2012

In principle I'm not opposed, but it might be tricky as spaces are also allowed inside individual fields, even non-escaped ones.

@Shimuuar
Copy link
Contributor Author

There is no problem for the decoding. Space is delimitier that's all.
Encoding does have problem. Field with spaces must be escaped.

@Shimuuar
Copy link
Contributor Author

Shimuuar commented Nov 4, 2012

Encoding is indeed a problem. CSV and space delimited data require different escaping which is handled by ToField class. Documentation is silent about that so it's possible to define buggy instances. I can see two options.

  1. Remove escaping from ToField and do it elsewhere. Only downside is performance hit — it will try to escape things which do not need escaping.
  2. Change ToField type class and choose escaping using type tag.
class ToField tag a where
  toField :: tag -> a -> Field

data CSV
data SpaceDelim

instance ToField CSV ByteString where toField _ = escapeCSV
instance ToField SpaceDelim ByteString where toField  _ = escapeSpace

--  Int do not need escaping
instance ToField a Int where toField _ = encodeInt

@tibbe
Copy link
Collaborator

tibbe commented Nov 4, 2012

The right thing is probably to handle escaping outside toField. I should probably have done it that way to begin with. I will file a separate bug so I remember to do it before the next major release, which is almost ready (I'm working on streaming parsing).

@tibbe
Copy link
Collaborator

tibbe commented Nov 12, 2012

I've done (1). Can you check if what you want to do now work in HEAD?

@Shimuuar
Copy link
Contributor Author

Thnak you. It looks like it should. I'm quite busy at the moment so it
can take some time to complete

@tibbe
Copy link
Collaborator

tibbe commented Nov 21, 2012

I'm curious if there's any more work needed in the cassava for this to work. Can't you just set decDelimiter to space or do you need any other nobs (e.g. more control over escaping)?

@Shimuuar
Copy link
Contributor Author

No. I think everything could be implemented in the current state. But
I'm quite busy now and probably wouldn't have time to work on it till
~20 December.

Changing decDelimiter won't work. Format for space delimited data is
subtly different. Separators are not single character but one or more
space or tab. Also leading and trailing whitespaces should be dropped.
I already implemented this part. What's left is encoding and
streaming.

@tibbe
Copy link
Collaborator

tibbe commented Nov 21, 2012

I will close this bug then. If there are more features you need (e.g. a fleshed-out DecodeOptions record), please open a new feature request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants