# Streams, Pipes, and Redirection 

1. What are streams?
2. How do they work?
3. The power of operating on streams

## Terminal Streams:
Input (stdin) → Command → Output (stdout)
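
As a rough sketch of this flow in Python, a filter reads lines from an input stream and writes results to an output stream. The name `upper_filter` is hypothetical and chosen only for illustration; when no streams are passed, it falls back to `sys.stdin` and `sys.stdout`, the process's standard streams.

```python
import io
import sys


def upper_filter(source=None, sink=None):
    """Copy lines from source to sink, upper-casing each one."""
    # Resolve the defaults at call time so redirected streams are respected.
    source = sys.stdin if source is None else source
    sink = sys.stdout if sink is None else sink
    for line in source:
        sink.write(line.upper())


# In-memory streams stand in for the terminal in this demo.
demo_in = io.StringIO("hello\nstreams\n")
demo_out = io.StringIO()
upper_filter(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because the function only depends on "something to read from" and "something to write to", the same code would work with the keyboard, a file, or another program's output.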

## Pipe (" | "): 
Passes the standard output of one program to the standard input of another program.

In [1]:
!ls

README.md          Streams.ipynb      code               pictures
Streams copy.ipynb bill_unpaid1.csv   data


In [2]:
!ls | wc

       7       8      79
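
What the shell does for `|` can be approximated from Python with the standard `subprocess` module. This is only a sketch: `run_pipeline` is a hypothetical helper, and two small Python child processes stand in for `ls` and `wc` so the example does not depend on which tools are installed.

```python
import subprocess
import sys


def run_pipeline(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` and return the final output as text."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE, text=True)
    # Close our copy of the pipe so only the second process holds the read end.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output


# Stand-in for a producer such as `ls`: prints three lines.
producer = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
# Stand-in for a line counter such as `wc -l`: counts lines on stdin.
counter = [sys.executable, "-c",
           "import sys; print(len(sys.stdin.readlines()))"]
print(run_pipeline(producer, counter))
```

Neither child process knows about the other; each one just reads its standard input or writes its standard output, and the pipe connects them.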


## Input/Output Redirection ("<" or ">"):
Connects a process's standard input or standard output to a file.

In [3]:
!head -5 ./data/Bill.csv > ./data/bill_shortcut.csv
!cat ./data/bill_shortcut.csv

Name,Bill,Time,Pay
Jams,20,16-Jul,paid
Connie,30,30-Jul,paid
David,20,1-Aug,paid
Jim,30,20-Aug,paid
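
The `<` and `>` operators can be mimicked in Python by handing open file objects to `subprocess.run`. The sketch below makes several assumptions: `run_redirected` is a hypothetical helper, the files live in a temporary directory rather than `./data`, and a small Python child process stands in for `sort`.

```python
import os
import subprocess
import sys
import tempfile


def run_redirected(cmd, stdin_path=None, stdout_path=None):
    """Run cmd with stdin and/or stdout attached to files, like `<` and `>`."""
    stdin_file = open(stdin_path, "rb") if stdin_path else None
    # Mode "wb" truncates the target first, matching the behaviour of `>`.
    stdout_file = open(stdout_path, "wb") if stdout_path else None
    try:
        subprocess.run(cmd, stdin=stdin_file, stdout=stdout_file, check=True)
    finally:
        for handle in (stdin_file, stdout_file):
            if handle is not None:
                handle.close()


# Hypothetical demo files in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.txt")
dst = os.path.join(workdir, "out.txt")
with open(src, "w") as f:
    f.write("b\na\nc\n")

# Equivalent in spirit to: sort < in.txt > out.txt
sorter = [sys.executable, "-c",
          "import sys; sys.stdout.writelines(sorted(sys.stdin))"]
run_redirected(sorter, stdin_path=src, stdout_path=dst)
with open(dst) as f:
    print(f.read(), end="")
```

The child process is unaware that it is reading from or writing to a file; it still sees only its standard input and standard output.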


In [4]:
!sort < ./data/bill_shortcut.csv

Connie,30,30-Jul,paid
David,20,1-Aug,paid
Jams,20,16-Jul,paid
Jim,30,20-Aug,paid
Name,Bill,Time,Pay


## Append:
Redirects a program's output to a file, appending to the end of the file instead of overwriting it.

In [5]:
!grep Wei ./data/Bill.csv >> ./data/bill_shortcut.csv
!cat ./data/bill_shortcut.csv

Name,Bill,Time,Pay
Jams,20,16-Jul,paid
Connie,30,30-Jul,paid
David,20,1-Aug,paid
Jim,30,20-Aug,paid
Wei,20,10-Sep,unpaid
Wei,30,12-Sep,unpaid
Wei,20,15-Sep,unpaid
Wei,30,17-Sep,unpaid
Wei,20,20-Sep,unpaid
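
The difference between `>` and `>>` corresponds to the difference between Python's `"w"` and `"a"` file modes. A minimal sketch, using a hypothetical temporary file rather than the notebook's data:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical demo file

with open(path, "w") as f:   # like `>`: truncate, then write
    f.write("first\n")
with open(path, "w") as f:   # a second `>` discards the earlier content
    f.write("second\n")
with open(path, "a") as f:   # like `>>`: keep the content, add to the end
    f.write("third\n")

with open(path) as f:
    print(f.read(), end="")
```

One consequence worth keeping in mind: running an appending command twice adds its output twice, so re-executing the `grep ... >>` cell above would duplicate the appended rows.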


## More complex, real-world examples:

In [6]:
!cat ./data/input.txt

1112223333
8006927753
800myapple


In [7]:
!python ./code/split_num.py < ./data/input.txt > ./data/output.txt
!cat ./data/output.txt

(111) 222-3333
(800) 692-7753
(800) mya-pple
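
The source of `split_num.py` is not shown here. The following is a hypothetical sketch of what such a filter could look like, inferred only from the input and output above; the names `format_number` and `main` are assumptions, not the original script.

```python
import io
import sys


def format_number(raw):
    """Format a 10-character string as (AAA) BBB-CCCC."""
    chars = raw.strip()
    if len(chars) != 10:
        raise ValueError(f"expected 10 characters, got {len(chars)}: {chars!r}")
    return f"({chars[:3]}) {chars[3:6]}-{chars[6:]}"


def main(source=None, sink=None):
    """Read numbers line by line from source and write formatted lines to sink."""
    source = sys.stdin if source is None else source
    sink = sys.stdout if sink is None else sink
    for line in source:
        if line.strip():  # skip blank lines
            sink.write(format_number(line) + "\n")


# Demo with an in-memory stream holding the same lines as input.txt above.
demo_out = io.StringIO()
main(io.StringIO("1112223333\n8006927753\n800myapple\n"), demo_out)
print(demo_out.getvalue(), end="")
```

A script written this way never opens a file itself. The shell decides where its input comes from and where its output goes, which is what lets `<` and `>` work without any changes to the code.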


In [8]:
!csvcut -c 1 -e latin1 ./data/bill_shortcut.csv | tail -n +2 | sort | uniq

Connie
David
Jams
Jim
Wei
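
For comparison, roughly the same steps (take the first column, drop the header, sort, remove duplicates) could be sketched in plain Python. The helper name `unique_first_column` is hypothetical, and the sample data is an abbreviated copy of the rows shown earlier.

```python
import csv
import io


def unique_first_column(stream):
    """Return the sorted, distinct values of the first CSV column, skipping the header."""
    reader = csv.reader(stream)
    next(reader, None)  # drop the header row, like `tail -n +2`
    # A set removes duplicates (like `uniq` after `sort`); sorted() orders the result.
    return sorted({row[0] for row in reader if row})


sample = io.StringIO(
    "Name,Bill,Time,Pay\n"
    "Jams,20,16-Jul,paid\n"
    "Connie,30,30-Jul,paid\n"
    "Wei,20,10-Sep,unpaid\n"
    "Wei,30,12-Sep,unpaid\n"
)
print(unique_first_column(sample))
```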


## The power of operating on streams:

In [9]:
# Data processing in Python with pandas.
import pandas as pd

df = pd.read_csv('./data/Bill.csv')
df = df[df['Pay'] == 'unpaid']
df = df.iloc[0:5]
df.to_csv('./data/bill_unpaid1.csv', sep='\t', index=False)
df


|   | Name | Bill | Time | Pay |
|---|------|------|------|-----|
| 4 | Wei | 20 | 10-Sep | unpaid |
| 5 | Wei | 30 | 12-Sep | unpaid |
| 6 | Wei | 20 | 15-Sep | unpaid |
| 7 | Wei | 30 | 17-Sep | unpaid |
| 8 | Wei | 20 | 20-Sep | unpaid |


In [10]:
# Using streams and I/O redirection.
!grep 'unpaid' ./data/Bill.csv | head -5 > ./data/bill_unpaid2.csv
!cat './data/bill_unpaid2.csv'

Wei,20,10-Sep,unpaid
Wei,30,12-Sep,unpaid
Wei,20,15-Sep,unpaid
Wei,30,17-Sep,unpaid
Wei,20,20-Sep,unpaid
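
The streaming behaviour of `grep ... | head -5` has a close analogue in Python generators: each stage pulls lines from the previous one on demand, and the pipeline stops as soon as enough lines have been produced. In this sketch the `grep` function is a hypothetical stand-in that does plain substring matching, and the sample rows are abbreviated from the data above.

```python
import io
from itertools import islice


def grep(pattern, lines):
    """Yield only the lines that contain pattern (plain substring match)."""
    return (line for line in lines if pattern in line)


sample = io.StringIO(
    "Name,Bill,Time,Pay\n"
    "Jim,30,20-Aug,paid\n"
    "Wei,20,10-Sep,unpaid\n"
    "Wei,30,12-Sep,unpaid\n"
)
# Equivalent in spirit to: grep 'unpaid' | head -5
first_unpaid = list(islice(grep("unpaid", sample), 5))
print("".join(first_unpaid), end="")
```

Nothing is loaded into memory as a whole; lines flow through one at a time, which is the same reason shell pipelines cope well with large files.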


In [11]:
# Even in a more complicated case, one line is still enough.
# `sort` runs before `uniq -c` because `uniq` only merges adjacent duplicate lines.
!csvcut -c 1 -e latin1 ./data/Bill.csv | tail -n +2 | tr 'A-Z' 'a-z' | sort | uniq -c | sort -r -n

   5 wei
   1 jim
   1 jams
   1 david
   1 connie
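
To make the comparison concrete, the counting pipeline could be sketched in Python with `collections.Counter`. The helper name `count_names` and the sample rows are assumptions for illustration; ties are broken by name in reverse order, which is one reasonable choice rather than a guaranteed match for `sort -r -n` in every locale.

```python
import csv
import io
from collections import Counter


def count_names(stream):
    """Count lower-cased first-column values, most frequent first."""
    reader = csv.reader(stream)
    next(reader, None)  # skip the header row
    counts = Counter(row[0].lower() for row in reader if row)
    # Sort by count, then by name, both descending.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)


sample = io.StringIO(
    "Name,Bill,Time,Pay\n"
    "Jim,30,20-Aug,paid\n"
    "Wei,20,10-Sep,unpaid\n"
    "Connie,30,30-Jul,paid\n"
    "Wei,30,12-Sep,unpaid\n"
)
for name, count in count_names(sample):
    print(count, name)
```

The Python version is explicit and easy to extend, while the shell one-liner gets the same kind of result by composing small, general-purpose tools.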
