Soliciting FAQ material #74

Closed
johnkerl opened this Issue Oct 9, 2015 · 18 comments

@johnkerl
Owner

johnkerl commented Oct 9, 2015

What I have so far:

  • No output at all: record separator is CRLF but file data contains LF line endings
  • Fields not being picked out: field separator doesn't match data, e.g. the file is TSV but FS is comma (default)
  • mlr put '$y=string($x);$z=$y.$y' gives (error) on numeric data such as x=123 while mlr put '$z=string($x).string($x)' does not
  • How to handle data in application log files, e.g. 2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [search] various sorts of data {& punctuation} hits=1 status=0 time=2.378

Any other roadblocks/stumpers/head-scratch-moments?
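The first two items above both come down to a separator that never matches the data. A minimal Python sketch of that effect follows; it is not Miller's reader, and the helper names split_records and split_fields are hypothetical. It only shows what naive splitting does when the separator is wrong, not why Miller itself prints nothing.

```python
def split_records(text, rs):
    """Split text on the record separator, dropping one trailing empty record."""
    records = text.split(rs)
    if records and records[-1] == "":
        records.pop()
    return records


def split_fields(record, fs):
    """Split one record on the field separator."""
    return record.split(fs)


# Data with LF line endings (illustrative values, not from the issue).
lf_data = "a,b,c\n1,2,3\n4,5,6\n"

# Expecting CRLF: the separator never occurs, so the whole file is one "record".
crlf_view = split_records(lf_data, "\r\n")
# Expecting LF: three records, as intended.
lf_view = split_records(lf_data, "\n")

# A TSV line read with a comma field separator stays in one piece.
tsv_line = "x\ty\tz"
comma_view = split_fields(tsv_line, ",")
tab_view = split_fields(tsv_line, "\t")

print(len(crlf_view), len(lf_view))
print(comma_view, tab_view)
```

In both cases the fix is on the command line rather than in the data: tell the reader which record and field separators the file actually uses.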

@johnkerl

Owner

johnkerl commented Oct 9, 2015

This will become a new page within http://johnkerl.org/miller/doc

@johnkerl

Owner

johnkerl commented Oct 9, 2015

@jungle-boogie

Contributor

jungle-boogie commented Oct 9, 2015

I would say chaining together several statements would be a good FAQ.

Group by, count, and give a sum of the last column.

For instance:
mlr --icsv --opprint count-distinct -f Status,Payment_Type then sort -nr count then stats1 -a sum -f Amount

This isn't working
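One possible reason, sketched in Python under stated assumptions: in a chain, each verb sees only what the previous verb emitted. If the counting step emits just the grouping fields plus a count, a later sum over Amount has no Amount field to work on. The functions below are hypothetical stand-ins, not Miller's implementation, and the records are made-up sample data.

```python
def count_distinct(records, fields):
    """Emit one record per distinct combination of `fields`, with a count."""
    counts = {}
    for r in records:
        key = tuple(r[f] for f in fields)
        counts[key] = counts.get(key, 0) + 1
    return [dict(zip(fields, key), count=n) for key, n in counts.items()]


def count_and_sum(records, group_fields, sum_field):
    """Group once and carry both the count and the sum of `sum_field`."""
    acc = {}
    for r in records:
        key = tuple(r[f] for f in group_fields)
        n, total = acc.get(key, (0, 0.0))
        acc[key] = (n + 1, total + float(r[sum_field]))
    return [
        dict(zip(group_fields, key), count=n, sum=total)
        for key, (n, total) in acc.items()
    ]


# Made-up sample records; amounts carry no currency symbol here.
records = [
    {"Status": "Disputed", "Payment_Type": "Checking", "Amount": "10.0"},
    {"Status": "Disputed", "Payment_Type": "Checking", "Amount": "20.5"},
    {"Status": "Settled", "Payment_Type": "Card", "Amount": "5.0"},
]

counted = count_distinct(records, ["Status", "Payment_Type"])
summed = count_and_sum(records, ["Status", "Payment_Type"], "Amount")

print(counted)  # no Amount key survives this step
print(summed)
```

The second function shows the shape of the intended result: count and sum computed over the same grouping in a single step, instead of summing after the amounts have already been dropped.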

@johnkerl

Owner

johnkerl commented Oct 9, 2015

@jungle-boogie

Contributor

jungle-boogie commented Oct 9, 2015

nice work!

What's the difference between icsv and csv?

@johnkerl

Owner

johnkerl commented Oct 9, 2015

--icsv means CSV input, --ocsv means CSV output, and --csv means both. There's more info at mlr -h. Let me know if that could be clearer (I think it could be).
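The naming rule can be sketched in a few lines of Python. This is an illustration of the convention only, not Miller's option parser; resolve_formats and KNOWN_FORMATS are hypothetical names, and the DKVP default is an assumption stated in the code.

```python
# Format names used in this thread; the real tool supports more.
KNOWN_FORMATS = {"csv", "csvlite", "pprint", "dkvp"}


def resolve_formats(flags, default="dkvp"):
    """Map flags like --icsv / --opprint / --csv to (input, output) formats.

    Assumption: both sides default to DKVP when not specified.
    """
    input_format = output_format = default
    for flag in flags:
        name = flag.lstrip("-")
        if name in KNOWN_FORMATS:
            # Bare format name: applies to both input and output.
            input_format = output_format = name
        elif name[:1] == "i" and name[1:] in KNOWN_FORMATS:
            input_format = name[1:]
        elif name[:1] == "o" and name[1:] in KNOWN_FORMATS:
            output_format = name[1:]
        else:
            raise ValueError(f"unrecognized format flag: {flag}")
    return input_format, output_format


print(resolve_formats(["--csv"]))
print(resolve_formats(["--icsv", "--opprint"]))
print(resolve_formats(["--opprint"]))
```

The last call is the case worth noticing: giving only an output flag leaves the input side at its default.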

@helix84

helix84 commented Oct 13, 2015

Hi, I just wanted to note that the background image containing text makes actual site text hard to read (not just on the FAQ page).

@jungle-boogie

Contributor

jungle-boogie commented Oct 13, 2015

Hi @helix84,

What do you think would be a more suitable background?

@johnkerl

Owner

johnkerl commented Oct 14, 2015

@helix84 there are a few options here ... I can lighten the background image with content as-is; lighten the text layer of the image, or remove it entirely; or remove the entire background image. The last would sadden my internal amateur graphic designer, but legibility must win out for technical writing. ;)

@helix84

helix84 commented Oct 14, 2015

I'm now looking at the site on a different screen so I may be mistaken, but it seems to me you lightened the background. It's much more legible now. I now see another issue - 3 scrollbars (menu, content, site).

@johnkerl

Owner

johnkerl commented Oct 14, 2015

No, I haven't changed the background. It must be your screen, which is why I particularly value your feedback -- it needs to be legible for everyone.

There is intended to be separate scrolling for the left navpane & the main body. But I'm far from being a CSS expert. It's imperfect for me & I'm tempted to have just a single scroll ... especially if it behaves worse for other folks ... can you include a screenshot so I can see what the issue is?

Thank you!! :)

@jungle-boogie

Contributor

jungle-boogie commented Oct 19, 2015

Miller's filter will print matching records, but can it be used to exclude records from the output? Cut can exclude but I think that excludes the whole record, not specific data.

@johnkerl

Owner

johnkerl commented Oct 19, 2015

You can change your == to !=, < to >=, and so on. But adding a -x to negate the sense of the filter (exclude rather than include) is a simple and intuitive idea. Later this evening. :)
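Both approaches amount to the same thing: flip the test. A small Python sketch of an exclude switch, assuming a hypothetical filter_records helper and made-up records (this is not how Miller implements -x):

```python
def filter_records(records, predicate, exclude=False):
    """Keep records matching `predicate`; with exclude=True keep the rest."""
    return [r for r in records if bool(predicate(r)) != exclude]


# Made-up records for illustration.
records = [
    {"Name": "John", "Description": "Reason: Payment Stopped"},
    {"Name": "Fred", "Description": "Reason: Customer Advises Not Authorized"},
    {"Name": "Bob", "Description": "Reason: Payment Stopped"},
]


def is_stopped(record):
    return record["Description"] == "Reason: Payment Stopped"


included = filter_records(records, is_stopped)
excluded = filter_records(records, is_stopped, exclude=True)

print([r["Name"] for r in included])
print([r["Name"] for r in excluded])
```

Every record lands in exactly one of the two lists, so the switch saves rewriting a long expression by hand.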

@johnkerl

Owner

johnkerl commented Oct 20, 2015

@jungle-boogie

Contributor

jungle-boogie commented Oct 20, 2015

Hi @johnkerl,

excellent work on the exclude!

I may be missing something really obvious but it seems that I can't get the output in csv format, just opprint.

If you take the sample data from here:
http://johnkerl.org/miller/doc/faq.html#How_do_I_do_arithmetic_on_fields_with_currency_symbols?

If you run this:
mlr --opprint filter -x '$3 == "Reason: Payment Stopped"' sample.csv
you get 7 records with 2 excluded, as expected.

And now this:
mlr --csv --rs lf filter -x '$3 == "Reason: Payment Stopped"' sample.csv

I don't get any data even though my file is ASCII:
% file sample.csv
sample.csv: ASCII text

Any variation that includes CSV (icsv, csv and csvlite) also doesn't seem to give me any output.

% mlr --csv filter -v -x '$3 == "Reason: Payment Stopped"' sample.csv
== (operator):
3 (field_name).
Reason: Payment Stopped (literal).

Looks like the filter works correctly, though.

@johnkerl

Owner

johnkerl commented Oct 21, 2015

When you do mlr --opprint then you're specifying pretty-print output format but not specifying the input format, so it defaults to DKVP. When the DKVP reader encounters a field of the form name=value then it uses that name-value pair, but absent the pair-separator, it'll use the positional index as the field name. So with DKVP input, all the fields are named 1, 2, 3, etc., there are ten records in the file, and it's appropriate to filter on $3.

When you specify CSV format, then the field names are taken from the CSV header line and the field values are taken from the subsequent data lines. With CSV format, there are nine records in the file, your third column is named Description, and it's appropriate to filter on $Description:

$ mlr --icsv --rs lf --opprint filter -x '$Description == "Reason: Payment Stopped"' sample.csv 
EventOccurred EventType    Description                               Status   PaymentType NameonAccount TransactionNumber Amount
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    John          1                 $230.36
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Fred          2                 $32.25
10/1/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Bob           3                 $39.02
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Alice         4                 $57.54
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Jungle        5                 $230.36
10/2/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Joseph        7                 $188.19
10/2/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Joseph        8                 $188.19
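
The difference between the two readers can be sketched in Python. This is a simplified illustration of the behavior described above, not Miller's code: the function names are hypothetical, the CSV split is naive (no quoting), and the three lines are abbreviated stand-ins for sample.csv.

```python
def parse_dkvp_line(line, ifs=",", ips="="):
    """Read key=value pairs; a field without the pair separator is keyed by position."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        if ips in field:
            key, value = field.split(ips, 1)
        else:
            key, value = str(position), field
        record[key] = value
    return record


def parse_csv_lines(lines, ifs=","):
    """Use the first line as field names; naive split, no quote handling."""
    header = lines[0].split(ifs)
    return [dict(zip(header, line.split(ifs))) for line in lines[1:]]


# Abbreviated stand-in for the sample file: one header line, two data lines.
lines = [
    "EventOccurred,EventType,Description",
    "10/1/2015,Charged Back,Reason: Payment Stopped",
    "10/1/2015,Charged Back,Reason: Customer Advises Not Authorized",
]

# DKVP view: every line is a record, header included, keyed "1", "2", "3".
dkvp_records = [parse_dkvp_line(line) for line in lines]
# CSV view: the header supplies the keys, so there is one record fewer.
csv_records = parse_csv_lines(lines)

print(len(dkvp_records), sorted(dkvp_records[1]))
print(len(csv_records), sorted(csv_records[0]))
```

The same third column is reachable as "3" in the first view and as "Description" in the second, which is why a filter written for one input format matches nothing under the other.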
@jungle-boogie

Contributor

jungle-boogie commented Oct 21, 2015

That gave me a hint about what I was trying to do; I just didn't take it far enough.

This is what I was trying to accomplish:
mlr --rs lf --csv filter -x '$Description == "Reason: Payment Stopped"' sample.csv

@johnkerl

Owner

johnkerl commented Apr 15, 2017

No updates in a year and a half -- closing this as it's not an effective avenue for people to use.

@johnkerl johnkerl closed this Apr 15, 2017
