## Using stdopen
`stdopen` is designed to handle the logic of using STDIN/STDOUT or a file, so it is best used in scripts that can take input from STDIN or a file and can output to STDOUT or a file. It can also write to a temp file and then seamlessly copy it over to its desired location upon success, or delete it upon failure. It is a very simple context manager function that just implements this logic. Because it is a function and not a class, it must be used solely as a context manager.
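
The file-or-standard-stream decision described above can be pictured with a few lines of standard-library Python. This is only a minimal sketch of the idea, not the actual `stdopen` implementation: the name `open_or_std` is hypothetical, and the sketch leaves out temp files and custom open methods.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_or_std(filename, mode="rt"):
    """Hypothetical helper: yield a file object or a standard stream."""
    if filename in (None, "", "-"):
        # No real file requested, so hand back STDIN or STDOUT and
        # leave it open when the with block exits
        yield sys.stdout if "w" in mode else sys.stdin
    else:
        # A real path: open it and guarantee it is closed afterwards
        fobj = open(filename, mode)
        try:
            yield fobj
        finally:
            fobj.close()
```

Because the helper is a generator wrapped by `contextmanager`, it only makes sense inside a `with` statement, which mirrors the restriction stated above.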

In [1]:
# The only import you need to worry about in your code is
# `import stdopen`. All the others are for illustration
from contextlib import contextmanager
from stdopen.example_data import examples
import stdopen
import csv
import gzip
import io
import sys
import tempfile
import os
import shutil
import time

In [2]:
# These are all the available example datasets. We will use
# test_text_path and compressed_test_text_path
examples.list_datasets()

[('dummy_data', 'A dummy dataset function that returns a small list.'),
 ('dummy_load_data',
  'A dummy dataset function that loads a string from a file.'),
 ('test_text_path', 'Return a file path to a compressed test file.'),
 ('compressed_test_text_path',
  'Return a file path to a compressed test file.')]

In [3]:
DELIMITER = "\t"

### Reading files
Reading from a file is pretty similar to using a normal `open` call, and if you know that is all you will ever want to do then there is no advantage to using `stdopen`. However, if the input may sometimes be a file name and sometimes STDIN, then there is.

In [4]:
# A regular file (no real difference to using open)
with stdopen.open(examples.get_data("test_text_path")) as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print(row)

['colA', 'colB', 'colC', 'colD']
['row1colA', 'row1colB', 'row1colC', 'row1colD']
['row2colA', 'row2colB', 'row2colC', 'row2colD']
['row3colA', 'row3colB', 'row3colC', 'row3colD']
['row4colA', 'row4colB', 'row4colC', 'row4colD']
['row5colA', 'row5colB', 'row5colC', 'row5colD']
['row6colA', 'row6colB', 'row6colC', 'row6colD']


What if the input file is gzip compressed? In that case you can supply the open function through the `method` argument.

In [5]:
# A compressed file
with stdopen.open(examples.get_data("compressed_test_text_path"), method=gzip.open) as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print(row)

['colA', 'colB', 'colC', 'colD']
['row1colA', 'row1colB', 'row1colC', 'row1colD']
['row2colA', 'row2colB', 'row2colC', 'row2colD']
['row3colA', 'row3colB', 'row3colC', 'row3colD']
['row4colA', 'row4colB', 'row4colC', 'row4colD']
['row5colA', 'row5colB', 'row5colC', 'row5colD']
['row6colA', 'row6colB', 'row6colC', 'row6colD']
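
In a script the compression is often not known in advance. As a rough sketch, the opener could be chosen from the file extension; `pick_open_method` is a hypothetical helper and not part of `stdopen`.

```python
import gzip


def pick_open_method(path):
    """Hypothetical helper: choose an opener from the file extension."""
    # gzip.open and the built-in open both accept (path, mode) arguments
    return gzip.open if str(path).endswith(".gz") else open
```

The returned function could then be passed as the `method` argument in the same way as `gzip.open` above, assuming the built-in `open` is also an acceptable value for it.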


### Reading from STDIN
`stdopen.open` can be easily re-configured to read from STDIN by changing the file name to either of:

* `None`
* `''` (empty string)
* `'-'` - similar to the unix command line

To demonstrate this we will have to mock some input from STDIN. You can ignore the `fake_stdin` function in your code. 

In [6]:
@contextmanager
def fake_stdin():
    """Temporarily override STDIN and restore it afterwards
    """
    # Slurp all the input in, closing the gzip file when done
    with gzip.open(
        examples.get_data("compressed_test_text_path"), 'rt'
    ) as infile:
        data = io.StringIO(infile.read())
    # Backup and override STDIN
    bak = sys.stdin
    sys.stdin = data
    try:
        # Yield anything
        yield True
    finally:
        # Restore, even if the with block raised
        sys.stdin = bak

In [7]:
# You can ignore this call
with fake_stdin():
    # Should also work with None and ''
    with stdopen.open('-') as infile:
        reader = csv.reader(infile, delimiter=DELIMITER)
        for row in reader:
            print(row)

['colA', 'colB', 'colC', 'colD']
['row1colA', 'row1colB', 'row1colC', 'row1colD']
['row2colA', 'row2colB', 'row2colC', 'row2colD']
['row3colA', 'row3colB', 'row3colC', 'row3colD']
['row4colA', 'row4colB', 'row4colC', 'row4colD']
['row5colA', 'row5colB', 'row5colC', 'row5colD']
['row6colA', 'row6colB', 'row6colC', 'row6colD']
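
In a command line script, the file name usually comes from an argument parser. The sketch below (standard library only, with a hypothetical `build_parser` function and argument name) makes the input file optional and falls back to `'-'`, one of the STDIN values listed above.

```python
import argparse


def build_parser():
    """Hypothetical parser for a script that reads a file or STDIN."""
    parser = argparse.ArgumentParser(
        description="Print the rows of a delimited file"
    )
    # nargs="?" makes the positional optional; "-" is used when omitted
    parser.add_argument("infile", nargs="?", default="-",
                        help="input file path, or '-' for STDIN")
    return parser


# No arguments given: fall back to the STDIN marker
print(build_parser().parse_args([]).infile)
# An explicit path is passed through unchanged
print(build_parser().parse_args(["data.txt"]).infile)
```

The parsed `infile` value can then be handed to `stdopen.open` in place of the hard-coded `'-'` used in the cell above.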


### Writing files
`stdopen` allows you to write via temp files as well. This is useful if you do not want the output file to appear until after the process has been successful.

In [8]:
# We will write each of these elements to file
my_list = ['A', 'B', 'C', 'D']

# Create a working directory in your home dir
# We can delete this later
working_dir = tempfile.mkdtemp(prefix='stdopen', dir=os.environ['HOME'])

# Test file
test_file = os.path.join(working_dir, "my_test_file.txt")

try:
    # Ensure the test file does not exist
    os.unlink(test_file)
except FileNotFoundError:
    pass

Here we demonstrate that the output file does not exist until the context manager has exited.

In [9]:
# Write to the output file via a tmp file
with stdopen.open(test_file, 'wt', use_tmp=True, tmpdir=working_dir) as outfile:
    for i in my_list:
        print("Output file exists:", os.path.exists(test_file))
        outfile.write(i)
        time.sleep(1)
print("Output file exists:", os.path.exists(test_file))

Output file exists: False
Output file exists: False
Output file exists: False
Output file exists: False
Output file exists: True
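
The behaviour shown above follows a common write-then-move pattern. The following is a minimal standard-library sketch of that pattern, under the assumption that the temp file is moved on success and removed on failure; `write_via_tmp` is a hypothetical name and this is not how `stdopen` is necessarily implemented.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def write_via_tmp(path, tmpdir=None):
    """Hypothetical helper: write to a temp file, move it on success."""
    # delete=False because we decide ourselves whether to move or remove it
    tmp = tempfile.NamedTemporaryFile("wt", dir=tmpdir, delete=False)
    try:
        yield tmp
    except BaseException:
        # The with block failed: discard the partial output and re-raise
        tmp.close()
        os.unlink(tmp.name)
        raise
    else:
        # Success: close first so all data is flushed, then move into place
        tmp.close()
        shutil.move(tmp.name, path)
```

With this pattern a reader of the destination path never sees a half-written file: either the complete output appears after the block exits, or nothing appears at all.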


### Writing to STDOUT
As with STDIN, we can redirect to STDOUT by setting the file name to `None`, `''`, or `'-'`. The cell below illustrates writing to STDOUT in text mode. In non-Jupyter settings we can also write to STDOUT in binary mode.

In [10]:
# Write to STDOUT in text mode
with stdopen.open('-', 'wt') as outfile:
    for i in my_list:
        outfile.write(i)
        time.sleep(1)

A

B

C

D
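
For the binary case mentioned above, a regular Python process exposes the byte layer of STDOUT as `sys.stdout.buffer`. How `stdopen` handles binary mode internally is not shown here, so the sketch below only illustrates the plain standard-library mechanism, with a hypothetical `write_chunks` helper and an in-memory buffer standing in for STDOUT.

```python
import io
import sys


def write_chunks(stream, chunks):
    """Hypothetical helper: write byte chunks to a binary stream."""
    for chunk in chunks:
        stream.write(chunk)
    stream.flush()


# In a regular Python process the byte layer of STDOUT could be used:
#     write_chunks(sys.stdout.buffer, [b"A\n", b"B\n"])
# Jupyter may replace sys.stdout with an object that lacks this layer,
# so an in-memory buffer is used for the demonstration instead
demo = io.BytesIO()
write_chunks(demo, [b"A\n", b"B\n"])
print(demo.getvalue())
```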

In [11]:
# Delete the temp dir
shutil.rmtree(working_dir)