# Working with gzipped files in Python

Since biological data files are often very large text files, they are frequently compressed with tools such as gzip. Compression reduces their storage requirements dramatically. When reading from disk is the bottleneck, it can even speed up processing, because less data has to be read from disk (at the cost of some CPU time for decompression).
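To get a feel for how much gzip can shrink text, here is a small sketch using the standard library's `gzip.compress()` on made-up, highly repetitive sequence-like text. The sizes depend entirely on the data. Real files usually compress less than this artificial example.

```python
import gzip

# Hypothetical example data: highly repetitive text, loosely resembling
# a sequence file. Real compression ratios depend on the actual data.
text = "ACGTACGTTTGACCA\n" * 10_000
raw = text.encode("utf-8")
compressed = gzip.compress(raw)

print(f"uncompressed: {len(raw)} bytes")
print(f"gzip:         {len(compressed)} bytes")
print(f"ratio:        {len(raw) / len(compressed):.1f}x")

# Decompressing gives back exactly the original bytes (gzip is lossless).
assert gzip.decompress(compressed) == raw
```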

Opening a gzipped file in Python is as easy as importing the `gzip` module and calling `gzip.open()` in place of the usual `open()`. Note that `gzip.open()` opens files in binary mode (`"rb"`) by default, so you need to pass the mode `"rt"` to read the file as text.


```python
import gzip

f = gzip.open("big_data_file.gz", "rt")  # open and read as text
for line in f:
    # Each line already ends with "\n", so suppress print's own newline.
    # You probably want to do something more interesting than print here...
    print(line, end="")
f.close()
```
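To see why the mode matters, the sketch below writes a tiny gzipped file to a temporary directory (the name `example.txt.gz` is hypothetical) and reads its first line back twice: once with the default binary mode, which yields `bytes`, and once with `"rt"`, which yields `str`. The `encoding` argument is optional in text mode and is passed here only to make the decoding explicit.

```python
import gzip
import os
import tempfile

# Create a small gzipped file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# Default mode ("rb"): lines come back as bytes objects.
with gzip.open(path) as f:
    binary_first = f.readline()

# Text mode ("rt"): lines come back as str, decoded with the given encoding.
with gzip.open(path, "rt", encoding="utf-8") as f:
    text_first = f.readline()

print(type(binary_first), binary_first)    # <class 'bytes'> b'first line\n'
print(type(text_first), repr(text_first))  # <class 'str'> 'first line\n'
```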



Alternatively, you can use Python's `with` statement, which closes the file automatically when the nested block of code finishes. One advantage of this is that the file is closed regardless of how the block exits, i.e. even if it is left early via `return`, `break` or `continue`, or because an exception is raised.

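```python
import gzip

with gzip.open("big_data_file.gz", "rt") as f:
    for line_number, line in enumerate(f, start=1):
        if line_number == 1:
            # Assumes the barcode is the last ":"-separated field of the header
            line = line.rstrip("\n")
            barcode = line.split(":")[-1]
            print(barcode)
            break  # only the first line is needed; the file is still closed
```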
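The claim that `with` closes the file however the block exits can be checked directly. In this sketch (the file name `demo.txt.gz` and the helper `read_first_line` are hypothetical), one block is left early via `return` and another via an exception; in both cases the file object reports `closed` afterwards.

```python
import gzip
import os
import tempfile

# Hypothetical temporary file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt.gz")
with gzip.open(path, "wt") as f:
    f.write("a\nb\nc\n")


def read_first_line(path):
    """Return the first line plus the file object so it can be inspected later."""
    with gzip.open(path, "rt") as f:
        for line in f:
            return line, f  # return leaves the with block early


line, handle = read_first_line(path)
print(repr(line), handle.closed)  # 'a\n' True

# An exception raised inside the block also closes the file before propagating.
try:
    with gzip.open(path, "rt") as g:
        raise ValueError("simulated parsing error")
except ValueError:
    pass
print(g.closed)  # True
```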

# Writing to gzipped files

Writing directly to a gzip-compressed file is just as simple as reading from one.

```python
import gzip

with gzip.open("output.txt.gz", "wt") as outfile:  # "wt" writes text; use "wb" for binary
    outfile.write("Test to Write\n")  # write() does not add a newline itself
```
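A quick way to check that writing works is a round trip: write several lines, then read them back and compare. This sketch uses a hypothetical temporary path and made-up record strings.

```python
import gzip
import os
import tempfile

# Hypothetical output location for this sketch.
path = os.path.join(tempfile.mkdtemp(), "output.txt.gz")

lines = ["first record", "second record", "third record"]

# "wt" = write text; unlike print(), write() does not add newlines itself.
with gzip.open(path, "wt", encoding="utf-8") as outfile:
    for item in lines:
        outfile.write(item + "\n")

# Read the file back to check that the round trip preserved the content.
with gzip.open(path, "rt", encoding="utf-8") as infile:
    read_back = [line.rstrip("\n") for line in infile]

print(read_back == lines)  # True
```

`gzip.open()` also accepts a `compresslevel` argument (0 to 9, default 9). Lower values write faster but produce larger files.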