Using Miller in a join prepipe #347

Open
mfernandez-turnto opened this issue Jul 6, 2020 · 8 comments

Comments

@mfernandez-turnto

mfernandez-turnto commented Jul 6, 2020

I guess this is more of a question than an issue, but I don't know of another place to ask. BTW, Miller is the most wonderful thing that I've encountered in a long while; I use it every day.

Here is the thing: let's say I have two files that I want to join on a column named identifier; let's call them file1 and file2.

    mlr --csv join -u --lp f2_ --rp f1_ -j identifier -f file2 file1

works great if neither file contains a list of values in identifier that would require nest. If just one of them (for simplicity, let's say file1) has nested values in identifier, it is quite easy to fix this without creating additional files:

    mlr --csv nest --evar ',' -f identifier then join -u --lp f2_ --rp f1_ -j identifier -f file2 file1

but when both files have commas I have to resort to producing an intermediate file. I thought:

    mlr --csv nest --evar ',' -f identifier then join -u --prepipe 'mlr --csv nest --evar "," -f identifier' --lp f2_ --rp f1_ -j identifier -f file2 file1

could do what I wanted, but it did not work.
Is there a standard way of doing that?

@johnkerl
Owner

johnkerl commented Jul 7, 2020

Miller is the most wonderful thing that I've encountered in a long while; I use it every day.

Thanks!!! :)

Is there a standard way of doing that?

No, I think not -- I never thought of this combination of nest and join. :^/

@mfernandez-turnto
Author

Well, the temp file is OK. Still, I can't figure out why I cannot use mlr itself as a prepipe. Is it because of the "<" redirect restriction? Just curious, in case you know offhand.

@sonicdoe
Contributor

sonicdoe commented Nov 5, 2020

I have a similar use case and am currently working around the temporary file using process substitution:

    mlr --csv join -f <(mlr --csv cat left.csv) -j id right.csv
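
Applied to the nest-plus-join case from the original question, the same idea might look like the sketch below. It assumes bash (or another shell that supports <(...)) and a Miller build that can open the path the shell hands it. The sample files and the explode helper are hypothetical and exist only for illustration; the Miller flags are the ones already used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: both files carry comma-separated lists in "identifier".
workdir=$(mktemp -d)
printf 'identifier,x\n"a,b",1\nc,2\n' > "$workdir/file1"
printf 'identifier,y\n"b,c",3\nd,4\n' > "$workdir/file2"

# Hypothetical helper: explode the identifier lists of one file into separate records.
explode() {
  mlr --csv nest --evar ',' -f identifier "$1"
}

if command -v mlr >/dev/null 2>&1; then
  # file1 (main input) is exploded by the leading nest in the verb chain;
  # file2 (join's -f input) is exploded inside the process substitution.
  mlr --csv nest --evar ',' -f identifier \
    then join -u --lp f2_ --rp f1_ -j identifier \
      -f <(explode "$workdir/file2") "$workdir/file1"
else
  echo "mlr not found; skipping the join" >&2
fi
```

No intermediate file is written for the exploded file2; the shell exposes the inner mlr's output as a readable path for the duration of the outer command.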

@johnkerl
Owner

johnkerl commented Jan 1, 2022

@mfernandez-turnto @sonicdoe it looks like process substitution is the right thing to do -- ? I'm closing this out but please let me know if I'm mistaken and we can re-open -- thank you!

@johnkerl johnkerl closed this as completed Jan 1, 2022
@sonicdoe
Contributor

sonicdoe commented Jan 2, 2022

I agree, especially because process substitution keeps all of Miller’s flexibility. Should we document this in Questions about joins, though?

@johnkerl johnkerl reopened this Jan 2, 2022
@johnkerl johnkerl changed the title from "using miller in a join prepipe" to "Using Miller in a join prepipe" Mar 6, 2023
@railgauge

Process substitution with mlr is not working for me on Windows with Cygwin.

    uname -r
    3.4.6-1.x86_64

    csvdb="$(printf "a,b,c\n1,2,3\n4,5,6\n7,8,9")"

    # https://miller.readthedocs.io/en/latest/streaming-and-memory/
    # Fully streaming verbs
    mlr --csv cat <(echo "$csvdb")
    mlr: open /proc/self/fd/11: The system cannot find the path specified..

    # Non-streaming, retaining all records
    mlr --csv unsparsify <(echo "$csvdb")
    mlr: open /proc/self/fd/11: The system cannot find the path specified..

    # Process substitution works with commands other than mlr
    cat <(echo "$csvdb")
    a,b,c
    1,2,3
    4,5,6
    7,8,9

    # Works as expected when using a real file
    printf "$csvdb" > ./csvdb.csv

    mlr --csv cat ./csvdb.csv
    a,b,c
    1,2,3
    4,5,6
    7,8,9

    mlr --csv unsparsify ./csvdb.csv
    a,b,c
    1,2,3
    4,5,6
    7,8,9

The same mlr commands work as expected for me if I switch to a Linux machine. Since process substitution works with everything except mlr, is there something different about the Windows build?
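
Until that is understood, one possible way around it is sketched below. It rests on an unconfirmed assumption: that the failure comes from mlr being unable to open Cygwin-only paths such as /proc/self/fd/11. Under that assumption, a single input can be fed on stdin, and a join's left input can be staged in a real file addressed by a relative path, which is the form that worked in the transcript above. The staged file name and the join field a are illustrative only.

```shell
#!/usr/bin/env bash
csvdb="$(printf 'a,b,c\n1,2,3\n4,5,6\n7,8,9')"

# Stage the data in a real file in the current directory, so the path stays
# relative and does not point into Cygwin-only locations such as /proc.
tmp=$(mktemp ./csvdb.XXXXXX)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' "$csvdb" > "$tmp"

if command -v mlr >/dev/null 2>&1; then
  # Single input: stdin avoids handing mlr a file path at all.
  printf '%s\n' "$csvdb" | mlr --csv cat

  # Join: the left input needs a path, so pass the staged file;
  # the right input still arrives on stdin.
  printf '%s\n' "$csvdb" | mlr --csv join -j a -f "$tmp"
else
  echo "mlr not found; skipping" >&2
fi
```

This gives up the no-temp-file property of process substitution, but the trap removes the staged file when the script exits.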

@johnkerl
Owner

@railgauge I'll check it out.

Windows is definitely different in many ways -- see also https://miller.readthedocs.io/en/latest/miller-on-windows/ -- but Cygwin smooths out many of those differences.

Can we first check, what's your mlr version output?

@railgauge

railgauge commented Apr 21, 2023

Thanks!
I have observed this behavior with mlr 6.7.0-dev (git clone https://github.com/johnkerl/miller) and mlr 6.7.0 compiled from source.

I only compiled from source because I had a separate issue: when I tried using Miller installed via choco install miller, or the latest pre-compiled binary downloaded from GitHub with wget https://github.com/johnkerl/miller/releases/download/v6.7.0/miller-6.7.0-windows-amd64.zip, in Cygwin zsh, I always got an error about a file not being found even though it does exist. It doesn't seem to help if I specify the full file paths either, e.g.:

    /cygdrive/c/Users/username/Downloads/tmp/mlr.exe --csv cat /cygdrive/c/Users/username/Downloads/tmp/csvdb.csv
    C:\Users\username\Downloads\tmp\mlr.exe :  The system cannot find the path specified.

    /cygdrive/c/ProgramData/chocolatey/bin/mlr.exe --csv cat ./csvdb.csv
    C:\ProgramData\chocolatey\lib\miller\tools\mlr.exe :  The system cannot find the path specified.

The Windows path includes C:\ProgramData\chocolatey\bin

This seems to be something about paths, since the error states it is looking for a Windows path rather than a Cygwin path. If I use the choco or pre-compiled binaries in PowerShell they work with regular file input (also mlr 6.7.0), just not from bash/zsh/fish unless I compile from source.

In zsh I just tried $(cygpath -w ./mlr.exe) --csv cat ./csvdb.csv with the pre-compiled miller and this was able to produce expected output, so I think this supports the path theory. Still no luck with getting process substitution to work.
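
The cygpath invocation above could be wrapped roughly as follows. This is only a sketch: native_path is a hypothetical helper name, ./mlr.exe is assumed to be where the pre-compiled binary sits, and outside Cygwin (where cygpath does not exist) the helper simply returns the path unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a path to Windows form when cygpath is
# available, otherwise return it unchanged (e.g. on Linux).
native_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -w "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Assumed location of the pre-compiled binary, as in the command above.
mlr_bin=./mlr.exe

if [ -x "$mlr_bin" ]; then
  # Same call as above, with the binary's path converted before it is run.
  "$(native_path "$mlr_bin")" --csv cat ./csvdb.csv
else
  echo "$mlr_bin not found; skipping" >&2
fi
```

It does not address the process substitution failure, which still hands mlr a /proc path.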
