# Intro to I/O

I/O stands for input/output and is extremely important for accomplishing our science goals. This is not the same type of input/output we discussed with functions. Rather, in this case I/O refers to how we retrieve external data and how we write data to external files. This is very common as data is often far too large to hand type into our programs, and we would rather not copy-paste terminal output into permanent storage files. 

Python gives us easy access to files on our computer for reading and writing. 

Accessing a file is as simple as invoking the built-in open() function. This creates a file input or output stream which we can save to a variable. When we open a file, we must tell Python what we plan to do with it: read or write. If a file is opened in read-mode, you will not be able to write to it. 

**CAUTION** Opening a file (which already exists) in write-mode will delete the contents of the file, immediately and irrevocably, without warning. Always be careful when opening a file in write-mode. This is why the default behavior of open() is to use read-mode.

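If we want to write to a file, we can pass 'w' as the second argument to open() (with or without the keyword mode=). If we want to write to a file which exists *without* deleting its contents first, we can use the mode 'a' (append-mode) or 'r+'. Append-mode adds new text to the end of the file, but it does not allow reading. The mode 'r+' allows both reading and writing without wiping the file first; its writes start at the beginning of the file and overwrite what is there unless we first move to the end.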
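To make the differences between the modes concrete, here is a minimal sketch. The file name "demo.txt" and the temporary directory are assumptions made for illustration only, so that no real file is at risk.

```python
import os
import shutil
import tempfile

# Work in a throwaway directory; "demo.txt" is a hypothetical file name.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")

# 'w' creates the file (or empties it if it already exists)
f = open(path, "w")
f.write("first\n")
f.close()

# 'a' keeps the existing contents and adds new text at the end
f = open(path, mode="a")
f.write("second\n")
f.close()

# 'r+' allows reading (and writing) without emptying the file first
f = open(path, "r+")
after_append = f.read()
f.close()

# Opening with 'w' again wipes everything written so far
f = open(path, "w")
f.close()

f = open(path)  # default mode 'r'
after_rewrite = f.read()
f.close()

print(repr(after_append))
print(repr(after_rewrite))

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```

The first print shows both lines, while the second shows an empty string because the last 'w' open erased the contents.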

Once we are done using a file, it is good coding habit to close it using the .close() method. This ensures that Python disconnects from the file, allowing other processes on your computer to access it without issue. Generally, Python will close files automatically when the session terminates, but it's always good to keep track of what files you have open. There is a simple block syntax built into Python that can handle file cleanup and variable assignment for us using the 'with' keyword, which we'll introduce below.
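As a small sketch of manual cleanup, the snippet below checks the stream's .closed attribute before and after calling .close(). The file "notes.txt" is hypothetical and lives in a temporary directory; the try/finally pattern is one possible way to guarantee the close, not the only one.

```python
import os
import shutil
import tempfile

# "notes.txt" is a hypothetical file created only for this illustration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "notes.txt")

out_file = open(path, "w")
try:
    out_file.write("some text\n")
    print(out_file.closed)  # False: the file is still open
finally:
    # This line runs even if the write above raises an error
    out_file.close()

print(out_file.closed)  # True: the file has been closed

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```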

The final consideration for file I/O goes back to filesystems and working directories. Whenever you launch Python, you do so from some working directory. Referencing files in Python is identical to how you'd reference files in bash given your working directory. Files in the same directory can be accessed simply with their file names. Files outside the working directory require either an absolute or relative path. Keep this in mind if you ever get a FileNotFoundError. Double check that you are pointing to the correct directory.
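The following sketch shows one way to check where Python is looking for a file. The path "no_such_dir/missing.dat" is a made-up name that is assumed not to exist, so the open() call is expected to fail.

```python
import os

# Show the directory that relative file names are resolved against
print(os.getcwd())

# Hypothetical relative path that should not exist on your machine
file_name = os.path.join("no_such_dir", "missing.dat")

# The absolute path Python would actually look for
print(os.path.abspath(file_name))

try:
    in_file = open(file_name)
    in_file.close()
    found = True
except FileNotFoundError:
    # Raised when no file exists at the given path
    found = False
    print("Could not find", file_name, "- check the working directory")
```

Printing the absolute path is often the quickest way to spot that the notebook was launched from a different directory than expected.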

With all that said, let's open our first file. The Week_05 directory contains a data file named 'wavelength.dat', which we will use for this example.

In [None]:
# The open() function returns an I/O stream so we have to save it to a variable if we want to use it.
# Note we can use the file name directly if it lives in the same directory we launched this notebook from.
# If you have an error trying to open it, you may be in a different working directory. Run !pwd to check and 
#  modify the file name accordingly
# We need to pass our file name as a string. Otherwise, Python will try to interpret it as a variable name
# Note that 'file' is not a reserved word in Python 3 (it was a built-in name in Python 2),
#  but a descriptive name like 'in_file' makes the code easier to read
# Since we give no 'mode' argument, it uses the default value 'r' and opens the file in read-only mode.
in_file = open("wavelength.dat")
print(type(in_file))

# To read a file in its entirety, use the .read() method. 
# This returns the file's contents as 1 large string
x = in_file.read()
print(type(x))
print(x)

# Get in the habit of closing files
in_file.close()

Now let's look at the 'with' syntax we teased before.

In [None]:
# The with keyword should be followed by some file stream object, like the one returned by open()
# We then use the 'as' keyword to give the file stream a name
# The block of code following the 'with' statement will be executed with the local variable named after 'as'
# The file will then be automatically cleaned-up/closed at the end of the block, ensuring we don't forget.
with open("wavelength.dat") as in_file:
    print(in_file.readline())

# Notice we've used a new method, readline(), which returns just 1 line of the file rather than the whole thing

You'll notice we used readline() instead of read() in the above example. Often, we don't want to read an entire file all at once. We usually want to process each line independently. We can do that with readline(). Each call to readline() returns the file's contents up to and including the next newline character as a string. It also advances the *pointer* of the file stream to just after the last character read. This means that when we call readline() again, it will read the next line, rather than the first one again. In general, our file pointer only moves forward in the file as we read it. It can be moved back or to specific locations, but this is rarely necessary. When the pointer reaches the end of the file, readline() will return only empty strings for all subsequent calls.
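The pointer behavior described above can be sketched with a small stand-in file. The file name "lines.txt" and its three lines are invented for this example; a for loop over the file and the .seek() method are shown as optional extras alongside readline().

```python
import os
import shutil
import tempfile

# Build a small stand-in file; its name and contents are made up for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "lines.txt")
with open(path, "w") as out_file:
    out_file.write("one\ntwo\nthree\n")

with open(path) as in_file:
    first = in_file.readline()   # 'one\n' (the newline is part of the string)
    second = in_file.readline()  # 'two\n' (the pointer moved past line one)

    # A for loop over the file continues from the current pointer position
    remaining = []
    for line in in_file:
        remaining.append(line.strip())  # strip() removes the trailing newline

    at_end = in_file.readline()  # '' because the pointer is at the end

    in_file.seek(0)              # move the pointer back to the start
    again = in_file.readline()   # 'one\n' once more

print(repr(first), repr(second), remaining, repr(at_end), repr(again))

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```

Looping directly over the file is a common alternative to calling readline() repeatedly when every line gets the same treatment.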


## Writing

Writing is very simple. We open the file just as before, except now we set the mode to 'w' (or 'a' if we want to add to an existing file). Remember that using open() with mode='w' will immediately delete the file if it exists and replace it with a blank file. This cannot be undone.

We then use the .write() method to push text into the file. Note that unlike print(), write() does not automatically add a newline character at the end. We can add those using \n. Also note that we can use f-strings to write data rather than fixed text to the file. Unlike print(), however, write() accepts only a single string, so numbers and other values must first be converted with str() or an f-string.
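Here is a minimal sketch of writing numeric data with .write(). The file name "values.dat", the header text, and the numbers are all made up for illustration. It also shows that handing write() a bare number raises a TypeError.

```python
import os
import shutil
import tempfile

# "values.dat" is a hypothetical output file in a throwaway directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "values.dat")

values = [1.5, 2.25, 3.0]  # made-up numbers
raised = False

with open(path, "w") as out_file:
    out_file.write("# index value\n")
    for i, v in enumerate(values):
        # write() needs a string, so format the numbers with an f-string
        out_file.write(f"{i} {v:.2f}\n")

    try:
        out_file.write(3.14)  # a float, not a string
    except TypeError:
        raised = True  # nothing is written by the failed call

with open(path) as in_file:
    contents = in_file.read()

print(contents)
print(raised)

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```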

As before, and more importantly now, we must remember to close the file when we're done.

Let's practice writing a new file.

In [None]:
# Same 'with-as' syntax
with open("output.dat", 'w') as out_file:
    out_file.write("Hello, file") # No newline here, so the next write continues on the same line
    out_file.write("A second line\n") # Note the newline character
    out_file.write("A third")

In [None]:
# We are using mode='w' with a file that already exists. What's going to happen?
with open("output.dat", mode='w') as out_file:
    out_file.write("Goodbye, file")
    out_file.write("Next\n")
    out_file.write("Last line")

# Exercise

Read the data from 'wavelength.dat' and compute the redshifts of the sources assuming the rest wavelength is $1216\AA$. Write a new file that contains the redshift data. For an added challenge, try writing a file that adds the redshift as a new column alongside the old data. 

Hint: Take a look at the file first and plan how you want to process it. Also, be careful with the header!
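
For reference, redshift is defined from the observed and rest wavelengths as shown below. This assumes the values in the file are observed wavelengths given in angstroms, which you should confirm by inspecting the file.

$$
z = \frac{\lambda_{\rm obs} - \lambda_{\rm rest}}{\lambda_{\rm rest}} = \frac{\lambda_{\rm obs}}{1216\,\AA} - 1
$$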