Using Miller in a join prepipe #347

mfernandez-turnto · 2020-07-06T23:47:57Z

I guess this is more of a question than an issue, but I don't know of another place to ask. BTW, Miller is the most wonderful thing that I've encountered in a long while; I use it every day.

Here is the thing: let's say I have two files that I want to join on a column named identifier; let's call them file1 and file2.

    mlr --csv join -u --lp f2_ --rp f1_ -j identifier -f file2 file1

works great if neither file contains a list of values in identifier, which would require nest. If just one of them (for simplicity, let's say file1) has nested values in identifier, it is quite easy to fix this without creating additional files:

    mlr --csv nest --evar ',' -f identifier then join -u --lp f2_ --rp f1_ -j identifier -f file2 file1

But when both files have comma-separated lists in identifier, I have to resort to producing an intermediate file. I thought:

    mlr --csv nest --evar ',' -f identifier then join -u --prepipe 'mlr --csv nest --evar "," -f identifier' --lp f2_ --rp f1_ -j identifier -f file2 file1

could do what I wanted, but it did not work.
Is there a standard way of doing that?


johnkerl · 2020-07-07T02:40:14Z

> Miller is the most wonderful thing that I've encountered in a long while, I use it every day.

Thanks!!! :)

> Is there a standard way of doing that?

No, I think not -- I never thought of this combination of nest and join. :^/

mfernandez-turnto · 2020-07-07T18:27:28Z

Well, the temp file is OK. Still, I can't figure out why I cannot use mlr itself as a prepipe. Is it because of the "<" redirect restriction? Just curious, if you know offhand.

sonicdoe · 2020-11-05T13:07:09Z

I have a similar use case and am currently working around the temporary file using process substitution:

    mlr --csv join -f <(mlr --csv cat left.csv) -j id right.csv
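
Applied to the original question, where both files hold comma-separated lists in identifier, the same approach might look like the sketch below. It assumes bash (or another shell with process substitution). The sample data and the temporary directory are made up for illustration; the Miller flags are the ones already used earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: the sample data is hypothetical; the flags come from the thread.
workdir="$(mktemp -d)"

# Both files carry comma-separated lists in the identifier column.
printf 'identifier,a\n"x,y",1\nz,2\n' > "$workdir/file1"
printf 'identifier,b\n"y,z",3\n'      > "$workdir/file2"

joined=""
if command -v mlr >/dev/null 2>&1; then
  # Explode file1 in the main chain; explode file2 (the left file) inside
  # a process substitution, so no intermediate file is written by hand.
  joined="$(mlr --csv nest --evar ',' -f identifier \
    then join -u --lp f2_ --rp f1_ -j identifier \
      -f <(mlr --csv nest --evar ',' -f identifier "$workdir/file2") \
    "$workdir/file1")"
  printf '%s\n' "$joined"
else
  echo "mlr not found on PATH; skipping" >&2
fi

rm -rf "$workdir"
```

With this sample data only the identifiers y and z appear on both sides, so the join should emit two paired records. Unpaired records are dropped unless flags such as --ul or --ur are added.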

johnkerl · 2022-01-01T00:27:29Z

@mfernandez-turnto @sonicdoe it looks like process substitution is the right thing to do -- ? I'm closing this out, but please let me know if I'm mistaken and we can re-open -- thank you!

sonicdoe · 2022-01-02T14:50:09Z

I agree, especially because process substitution keeps all of Miller’s flexibility. Should we document this in Questions about joins, though?

railgauge · 2023-04-20T21:49:23Z

Process substitution with mlr is not working for me on Windows with Cygwin.

    uname -r
    3.4.6-1.x86_64

    csvdb="$(printf "a,b,c\n1,2,3\n4,5,6\n7,8,9")"

    # https://miller.readthedocs.io/en/latest/streaming-and-memory/
    # Fully streaming verbs
    mlr --csv cat <(echo "$csvdb")
    mlr: open /proc/self/fd/11: The system cannot find the path specified..

    # Non-streaming, retaining all records
    mlr --csv unsparsify <(echo "$csvdb")
    mlr: open /proc/self/fd/11: The system cannot find the path specified..

    # Process substitution works with commands other than mlr
    cat <(echo "$csvdb")
    a,b,c
    1,2,3
    4,5,6
    7,8,9

    # Works as expected when using a real file
    printf "$csvdb" > ./csvdb.csv

    mlr --csv cat ./csvdb.csv
    a,b,c
    1,2,3
    4,5,6
    7,8,9

    mlr --csv unsparsify ./csvdb.csv
    a,b,c
    1,2,3
    4,5,6
    7,8,9

The same mlr commands work as expected for me if I switch to a Linux machine. Since process substitution works with everything except mlr, is there something different about the Windows build?
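
Until the cause is pinned down, a fallback that avoids the /proc/self/fd/N path entirely could look like the sketch below. One plausible reading, which is an assumption and not confirmed in this thread, is that a native Windows mlr.exe cannot open Cygwin's virtual /proc paths. The sketch uses a temporary file with a relative path, like the ./csvdb.csv case reported as working above, and also shows standard input as an alternative; the stdin variant has not been tested on Cygwin in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: a fallback for shells where mlr cannot open <(...) paths.
csvdb="$(printf 'a,b,c\n1,2,3\n4,5,6\n7,8,9')"

# Relative path in the current directory, like ./csvdb.csv in the report.
tmpfile="$(mktemp ./csvdb.XXXXXX)"
printf '%s\n' "$csvdb" > "$tmpfile"

from_file=""
from_stdin=""
if command -v mlr >/dev/null 2>&1; then
  # Option 1: a real (temporary) file instead of a process substitution.
  from_file="$(mlr --csv cat "$tmpfile")"
  # Option 2: standard input; Miller reads stdin when no file is named.
  from_stdin="$(printf '%s\n' "$csvdb" | mlr --csv cat)"
  printf '%s\n' "$from_file"
else
  echo "mlr not found on PATH; skipping" >&2
fi

# Remove the temporary file once Miller has read it.
rm -f "$tmpfile"
```

The stdin variant only covers a single input stream, so for a join where both sides need preprocessing, the temporary file remains the more general fallback.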

johnkerl · 2023-04-20T21:54:03Z

@railgauge I'll check it out.

Windows is definitely different in many ways -- see also https://miller.readthedocs.io/en/latest/miller-on-windows/ -- but Cygwin smooths out many of those differences.

Can we first check, what's your mlr version output?

railgauge · 2023-04-21T00:01:02Z

Thanks!
I have observed this behavior with mlr 6.7.0-dev (git clone https://github.com/johnkerl/miller) and mlr 6.7.0 compiled from source.

I only compiled from source because I had a separate issue: when I tried using Miller from choco install miller, or from the latest pre-compiled binary downloaded from GitHub with wget https://github.com/johnkerl/miller/releases/download/v6.7.0/miller-6.7.0-windows-amd64.zip via Cygwin zsh, I always got an error about a file not being found even though it does exist. Specifying the full file paths doesn't seem to help either, e.g.:

    /cygdrive/c/Users/username/Downloads/tmp/mlr.exe --csv cat /cygdrive/c/Users/username/Downloads/tmp/csvdb.csv
    C:\Users\username\Downloads\tmp\mlr.exe :  The system cannot find the path specified.

    /cygdrive/c/ProgramData/chocolatey/bin/mlr.exe --csv cat ./csvdb.csv
    C:\ProgramData\chocolatey\lib\miller\tools\mlr.exe :  The system cannot find the path specified.

The Windows path includes C:\ProgramData\chocolatey\bin

This seems to be something about paths, since the error states it is looking for a Windows path rather than a Cygwin path. If I use the choco or pre-compiled binaries in PowerShell, they work with regular file input (also mlr 6.7.0); they just don't work from bash/zsh/fish unless I compile from source.

In zsh I just tried $(cygpath -w ./mlr.exe) --csv cat ./csvdb.csv with the pre-compiled Miller and it produced the expected output, so I think this supports the path theory. Still no luck getting process substitution to work.

johnkerl closed this as completed Jan 1, 2022

johnkerl reopened this Jan 2, 2022

johnkerl added the needs-documentation label Jan 2, 2022

johnkerl changed the title ~~using miller in a join prepipe~~ Using Miller in a join prepipe Mar 6, 2023
