# Reading and writing files in Python

## Native Python

### Opening Files

To open a file in Python, use the built-in `open(...)` function. When you open a file, you need to tell Python what you want to do with it by specifying the mode argument. Possible modes are:

* __r__ - Read only.

* __w__ - Write only. Anything previously in the file will be erased, and the file is created if it does not exist.

* __a__ - Append. Anything previously in the file is kept, and new things can be written at the end of the file. 

* __r+__ - Read and write. The file must already exist.


If you do not supply a mode argument to the `open(...)` function, it defaults to `'r'` (read only).
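As a rough illustration of how the modes differ, here is a minimal sketch written for Python 3. The file name `modes_demo.txt` and the temporary directory are assumptions made for the example, so that no existing file is touched.

```python
import os
import tempfile

# Work in a throwaway directory; "modes_demo.txt" is a hypothetical name.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

f = open(path, 'w')      # 'w' creates the file, or erases an existing one
f.write('first\n')
f.close()

f = open(path, 'a')      # 'a' keeps the existing contents and writes at the end
f.write('second\n')
f.close()

f = open(path)           # no mode given, so the file is opened read only
contents = f.read()
f.close()

f = open(path, 'r+')     # 'r+' reads and writes; writing starts at the beginning
f.write('FIRST')         # overwrites the first five characters in place
f.seek(0)                # jump back to the start before reading
updated = f.read()
f.close()

print(contents)
print(updated)
```

Note that `'r+'` overwrites existing characters rather than inserting new ones, which is why only the first word changes.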


### Writing 

You can write to a file using the file object's `write(...)` method. The content to be written must be given as a single `str`
argument. Invisible characters such as newlines must be written explicitly in the string. 




In [10]:
# Open a file in the current directory with write access - if the file 
# already exists its contents will be overwritten. 

fileForWriting = open('platonicWisdom', 'w')
fileForWriting.write('"The beginning is the most important part of the work."\n')
# Invoke the close() method to ensure that all output operations are completed
# and acquired system resources are released.
fileForWriting.close()

In [11]:
# Open the previously created "platonicWisdom" file with append access. 
# The file already exists and its contents will be appended to. 
fileForAppending = open('platonicWisdom', 'a')
anotherPlatoQuote = '"Wise men speak because they have something to say; Fools because they have to say something."\n'
fileForAppending.write(anotherPlatoQuote)
fileForAppending.close()

In [12]:
# Assign a literal list containing three str-type elements to the 
# identifier "theBestPlatoQuotes"
theBestPlatoQuotes = ['"I have hardly ever known a mathematician who was capable of reasoning."\n', 
                     '"Honesty is for the most part less profitable than dishonesty."\n', 
                     '"He was a wise man who invented beer."\n']

fileForAppending = open('platonicWisdom', 'a')

# Invoke the writelines(...) method to insert multiple strings of characters 
# into the file. 
# A list of str-type variables is provided as the single required argument.
fileForAppending.writelines(theBestPlatoQuotes)
fileForAppending.close()

# Alternatively, you can call write() multiple times before you close the file. 
fileForAppending = open('platonicWisdom', 'a')
fileForAppending.write('"A good decision is based on knowledge and not on numbers."\n')
fileForAppending.write('"Opinions is the medium between knowledge and ignorance."\n')
fileForAppending.write('"There is no harm in repeating a good thing."\n')
fileForAppending.close()
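The cells above call `close()` by hand each time. A common alternative is the `with` statement, which closes the file automatically when the block ends. The sketch below assumes Python 3 and uses a hypothetical file name in a temporary directory, so it leaves the `platonicWisdom` file alone.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'quotes_demo.txt')

quotes = ['"There is no harm in repeating a good thing."\n',
          '"He was a wise man who invented beer."\n']

# The file is closed automatically when the with-block ends,
# even if an exception is raised inside it.
with open(path, 'w') as out_file:
    out_file.writelines(quotes)

# The file object still exists here, but it is already closed.
print(out_file.closed)
```

This removes the risk of forgetting `close()` and losing buffered output.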


### Reading


There are several built-in ways to read a file.

* `read(...)` - Reads the whole file at once into a single string.
* `readline(...)` - Reads a single line from the file. 
* `readlines(...)` - Reads all remaining lines from the file and returns them as a list, one entry per line. 
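Besides these three methods, a file object can also be iterated over directly, which yields one line at a time without loading the whole file into memory. The following is a minimal sketch assuming Python 3. The file name is hypothetical and the file is created in a temporary directory just for the demonstration.

```python
import os
import tempfile

# Create a small hypothetical file to read from.
path = os.path.join(tempfile.mkdtemp(), 'reading_demo.txt')
with open(path, 'w') as out_file:
    out_file.write('"A good decision is based on knowledge and not on numbers."\n')
    out_file.write('"There is no harm in repeating a good thing."\n')

cleaned = []
with open(path, 'r') as in_file:
    # Iterating over a file object yields one line at a time.
    for line in in_file:
        # Each line keeps its trailing newline; strip it so that
        # print() does not produce double-spaced output.
        cleaned.append(line.rstrip('\n'))

for quote in cleaned:
    print(quote)
```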

In [14]:
fileForReading = open('platonicWisdom', 'r')
stuff = fileForReading.read()
fileForReading.close()
print(stuff)


"The beginning is the most important part of the work."
"Wise men speak because they have something to say; Fools because they have to say something."
"I have hardly ever known a mathematician who was capable of reasoning."
"Honesty is for the most part less profitable than dishonesty."
"He was a wise man who invented beer."
"A good decision is based on knowledge and not on numbers."
"Opinions is the medium between knowledge and ignorance."
"There is no harm in repeating a good thing."



In [15]:
fileForReading = open('platonicWisdom', 'r')

# Extract a single line from the file. The extracted line is returned
# as a str-type object.
firstLine = fileForReading.readline()

print("First Line:\n" + firstLine)

# Subsequent calls to readline() will start from the current position
# in the file, so to extract the second line, simply invoke readline()
# again.
secondLine = fileForReading.readline()

print("Second Line:\n" + secondLine)

fileForReading.close()



First Line:
"The beginning is the most important part of the work."

Second Line:
"Wise men speak because they have something to say; Fools because they have to say something."



In [19]:
fileForReading = open('platonicWisdom', 'r')

# Invoke the readlines() method to extract the file contents as
# a list of str-type instances.
allTheQuotes = fileForReading.readlines()
fileForReading.close()

print("List of Quotes:")
print(allTheQuotes)
print("")

# Use the list slicing syntax to extract a subset of the extracted
# lines. A negative start index counts from the end of the list, so
# [-3:] selects the last three elements.
selectedQuotes = allTheQuotes[-3:]

print("Selected Quotes:")
for quote in selectedQuotes:
    # Each quote still ends in "\n" and print adds another newline,
    # which is why the output is double-spaced.
    print(quote)




List of Quotes:
['"The beginning is the most important part of the work."\n', '"Wise men speak because they have something to say; Fools because they have to say something."\n', '"I have hardly ever known a mathematician who was capable of reasoning."\n', '"Honesty is for the most part less profitable than dishonesty."\n', '"He was a wise man who invented beer."\n', '"A good decision is based on knowledge and not on numbers."\n', '"Opinions is the medium between knowledge and ignorance."\n', '"There is no harm in repeating a good thing."\n'] 

Selected Quotes:
"A good decision is based on knowledge and not on numbers."

"Opinions is the medium between knowledge and ignorance."

"There is no harm in repeating a good thing."



## NumPy

For tabular data, a more convenient way to read a file is NumPy's `genfromtxt(...)` function, which parses the columns directly into an array. See this page for some good examples of how to use some of the parameters: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.io.genfromtxt.html 

__Parameters:__

* __fname__  - The name of the file you want to read
* __dtype__ - The datatype of the resulting array. Defaults to `float`; if set to `None`, the type of each column is determined automatically
* __comments__ - Specify the character that indicates a comment. Everything on a line after this character is ignored
* __delimiter__ - Specify the delimiter of your file, default is whitespace. 
* __skip_header__ - The number of lines to skip at the beginning of the file. 
* __skip_footer__ - The number of lines to skip at the end of the file
* converters - A set of functions that convert the data of a column to a value
* __missing_values__ - The set of strings that correspond to missing data
* __filling_values__ - The set of values to be used for missing data
* __usecols__ - Specify which columns should be read
* __names__ - Specify the names of each column 
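* excludelist - A list of names to exclude from the field names
* deletechars - A string of invalid characters that must be deleted from the field names
* defaultfmt - A format used to define default field names, such as "f%i"
* autostrip - Boolean value indicating whether or not to strip whitespace from variables. 
* replace_space - The character used to replace whitespace in field names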
* case_sensitive - Specify whether field names should be case sensitive or not
* __unpack__ - Transpose the array so that the columns are returned as separate variables. 
* loose - Boolean value; if True, invalid values do not raise errors
* invalid_raise - Raise an exception if inconsistency in number of columns is detected
* max_rows - Maximum number of rows to read
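
To make a few of these parameters concrete, here is a small sketch assuming Python 3 and a reasonably recent NumPy. The table contents and column names are made up for the example, and an in-memory `StringIO` object stands in for a real file (`genfromtxt` accepts file-like objects as well as file names).

```python
from io import StringIO
import numpy as np

# Made-up comma-separated table with a comment line and one missing value.
text = StringIO(
    "# id,flux,error\n"
    "1,10.5,0.1\n"
    "2,,0.2\n"
    "3,12.0,0.3\n"
)

table = np.genfromtxt(
    text,
    delimiter=',',                   # columns are separated by commas
    comments='#',                    # text after '#' is ignored
    names=['id', 'flux', 'error'],   # hypothetical column names
    filling_values=-1.0,             # value used where data is missing
)

# Columns can now be selected by name.
print(table['flux'])
```

Because `dtype` is left at its default, every column is read as a float, and the empty field in the second row is replaced by the filling value.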
    


In [1]:
import numpy as np

# Attempt to read the full file. With the default dtype (float), the
# text in the first column cannot be converted and is read as nan.
data = np.genfromtxt('star.txt')

print(data)

# Read a subset of the columns, and save the results into different variables. 
date, mag, err = np.genfromtxt('star.txt', usecols=(1,2,3), unpack=True)

print(date)

[[           nan 5.64462941e+04 1.09640000e+01 8.40000000e-02]
 [           nan 5.64462417e+04 1.09820000e+01 8.57000000e-02]
 [           nan 5.64461956e+04 1.09750000e+01 8.32000000e-02]
 [           nan 5.64461470e+04 1.09720000e+01 8.31000000e-02]
 [           nan 5.64460936e+04 1.11000000e+01 8.17000000e-02]
 [           nan 5.64460390e+04 1.12350000e+01 9.64000000e-02]
 [           nan 5.64459864e+04 1.11030000e+01 8.36000000e-02]
 [           nan 5.64459315e+04 1.10520000e+01 8.59000000e-02]
 [           nan 5.64458878e+04 1.09870000e+01 8.25000000e-02]
 [           nan 5.64458312e+04 1.09450000e+01 8.06000000e-02]
 [           nan 5.64457824e+04 1.09160000e+01 9.54000000e-02]
 [           nan 5.64457186e+04 1.09630000e+01 8.42000000e-02]
 [           nan 5.64462941e+04 1.08370000e+01 3.10000000e-02]
 [           nan 5.64462417e+04 1.08380000e+01 3.22000000e-02]
 [           nan 5.64461956e+04 1.08470000e+01 3.12000000e-02]
 [           nan 5.64461470e+04 1.08720000e+01 3.270000

In [2]:
# Specify the data type (and name) of each column.
# 'U2' is a two-character text string.
datatype = [('filter', 'U2'), ('date', float), ('mag', float), ('err', float)]
data = np.genfromtxt('star.txt', dtype=datatype)
print(data[0])
print(data['mag'])

('I1', 56446.2941, 10.964, 0.084)
[10.964 10.982 10.975 10.972 11.1   11.235 11.103 11.052 10.987 10.945
 10.916 10.963 10.837 10.838 10.847 10.872 10.976 11.111 11.013 10.954
 10.902 10.871 10.855 10.845]
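
Once the columns have names, rows can be selected with a boolean mask, for example to separate the two filters. The sketch below assumes Python 3 and a recent NumPy. It uses four rows copied from the `star.txt` values shown in this document, held in a `StringIO` object instead of the real file, and the `'U2'` text type so that the comparison with a normal string works.

```python
from io import StringIO
import numpy as np

# Four rows taken from the star.txt values shown in this document.
sample = StringIO(
    "I1 56446.2941 10.964 0.084\n"
    "I1 56446.2417 10.982 0.0857\n"
    "I2 56446.2417 10.838 0.0322\n"
    "I2 56446.1956 10.847 0.0312\n"
)

datatype = [('filter', 'U2'), ('date', float), ('mag', float), ('err', float)]
data = np.genfromtxt(sample, dtype=datatype)

# Boolean mask that is True for the rows observed in the I1 filter.
is_i1 = data['filter'] == 'I1'

# Use the mask to pick out the matching magnitudes and average them.
mean_i1 = data['mag'][is_i1].mean()
print(mean_i1)
```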


### Writing in NumPy

In [5]:
# Say we want to create a new file with an extra column.

# First let's create our new column, in this case the "phase" of each observation,
# using 0.53568192 as the period.
phase = np.mod(data['date']/0.53568192, 1)
print(phase)

# Now we create a new array containing all of the data we want. 
# zip() is wrapped in list() so that NumPy receives a sequence of rows.
data_save = np.array(list(zip(data['filter'], data['date'], phase, data['mag'], data['err'])),
                    dtype=[('1', 'U2'), ('2', float), ('3', float), ('4', float), ('5', float)])

# Now save this into a new file. 
np.savetxt('star-new.txt', data_save, fmt='%2s %10.4f %6.4f %6.3f %5.3f')

[0.78185532 0.68403608 0.59797755 0.50725206 0.40756604 0.30563988
 0.20744729 0.10496109 0.02338283 0.91772311 0.82662428 0.70752375
 0.78185532 0.68403608 0.59797755 0.50725206 0.40756604 0.30563988
 0.20744729 0.10496109 0.02338283 0.91772311 0.82662428 0.70752375]
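
A quick way to check a file written with `savetxt` is to read it straight back with `genfromtxt`. The following sketch assumes Python 3. It uses two observations copied from the values shown in this document and a hypothetical output file in a temporary directory, rather than the real `star-new.txt`.

```python
import os
import tempfile
import numpy as np

# Two observations taken from the star.txt values shown in this document.
date = np.array([56446.2941, 56446.2417])
mag = np.array([10.964, 10.982])

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'star-check.txt')

# column_stack turns the two 1-D arrays into one 2-column array.
# The header line is written with a leading "# ", so it is treated
# as a comment when the file is read again.
np.savetxt(path, np.column_stack((date, mag)), fmt='%10.4f %6.3f', header='date mag')

# Read the file back, one variable per column.
date_back, mag_back = np.genfromtxt(path, unpack=True)
print(np.allclose(date, date_back))
print(np.allclose(mag, mag_back))
```

The values only survive the round trip to the precision given in `fmt`, so the format should keep as many decimals as the data needs.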


## Astropy Table

In [12]:
from astropy.table import Table

t = Table.read('star.txt', format='ascii')

print(t)

col1    col2     col3   col4 
---- ---------- ------ ------
  I1 56446.2941 10.964  0.084
  I1 56446.2417 10.982 0.0857
  I1 56446.1956 10.975 0.0832
  I1  56446.147 10.972 0.0831
  I1 56446.0936   11.1 0.0817
  I1  56446.039 11.235 0.0964
  I1 56445.9864 11.103 0.0836
  I1 56445.9315 11.052 0.0859
  I1 56445.8878 10.987 0.0825
  I1 56445.8312 10.945 0.0806
 ...        ...    ...    ...
  I2 56446.2417 10.838 0.0322
  I2 56446.1956 10.847 0.0312
  I2  56446.147 10.872 0.0327
  I2 56446.0936 10.976 0.0329
  I2  56446.039 11.111  0.029
  I2 56445.9864 11.013 0.0313
  I2 56445.9315 10.954 0.0308
  I2 56445.8878 10.902 0.0296
  I2 56445.8312 10.871 0.0301
  I2 56445.7824 10.855   0.03
  I2 56445.7186 10.845 0.0349
Length = 24 rows


In [16]:
# Sometimes you will get data in a format that is not easily read in a
# text editor. Astropy's Table class is built to read such tables easily.
# In this example, I've downloaded multi-epoch data from the Gaia mission
# for a single star. Gaia provides files in VOTable format.

t = Table.read('star-data.vot', format='votable')

print(t.columns)
print(t)
print(t['mag'])

<TableColumns names=('solution_id','source_id','band','transit_id','time','mag','flux','flux_error','flux_over_error','rejected_by_photometry','rejected_by_variability','other_flags','p1','p2','p3','pf')>
   solution_id          source_id      band ...  p2  p3         pf        
                                            ...  d   d          d         
------------------ ------------------- ---- ... --- --- ------------------
369295549951641967 1546016672688675200   BP ...  --  -- 0.5995735454969419
369295549951641967 1546016672688675200   BP ...  --  -- 0.5995735454969419
369295549951641967 1546016672688675200   BP ...  --  -- 0.5995735454969419
369295549951641967 1546016672688675200   BP ...  --  -- 0.5995735454969419
369295549951641967 1546016672688675200   BP ...  --  -- 0.5995735454969419
369295549951641967 1546016672688675200   BP ...  --  -- 0.5995735454969419
369295549951641967 1546016672688675200   BP ...  --  -- 0.5995735454969419
369295549951641967 1546016672688675200   BP .