# Lecture 8 - CME 193 - Python scripts
So far, we've been working in Jupyter notebooks, which are nice and interactive. However, they can be clunky if you have lots of code, and it can be annoying to run one cell at a time. For large codebases, most people work with Python scripts. The code in the cells below is meant to be run as Python scripts, not in Jupyter notebooks: each snippet is saved in a file that ends in `.py` and run from the command line.

As a first example, save the following code in a file called `main.py`.

In [None]:
# main.py
import numpy as np
print(np.pi)

To run this script, open a terminal, change into the directory with `main.py`, and run

    python main.py
    
If all goes well, then you should see $\pi$ printed!

Running Python scripts is very similar to running a cell in a notebook, where Python executes each line in the file in a sequence. One difference is that anything you want to output needs to be explicitly `print`ed. Otherwise, you won't see any output.
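Here is a minimal sketch of that difference, using a made-up variable `x`. In a notebook, a cell that ends in a bare expression displays its value; in a script, the same line produces no output.

```python
# main.py
x = 2 + 3

# A notebook cell ending in this line would display the value of x.
# In a script, the expression is evaluated and then discarded.
x

# Printing explicitly works in both notebooks and scripts.
print(x)
```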

Another difference is that with Python scripts, it's sometimes useful to accept command line arguments. For example, if your script saves its output to a file, it might be useful to accept a filename, like:

    python main.py output.txt

To access the command line arguments from a Python script, you can use `sys.argv`, which is a list, in this case `['main.py', 'output.txt']`. We can access `output.txt` using the standard indexing notation, like `sys.argv[1]`.

In [None]:
# main.py
import sys
print(sys.argv)
print(sys.argv[1])
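Note that `sys.argv[1]` raises an `IndexError` if the script is run without any arguments. Here is a minimal sketch of one way to guard against that. The helper name `get_output_path` and the default filename are hypothetical, chosen only for illustration.

```python
# main.py
import sys

def get_output_path(argv, default='output.txt'):
    """Return the first command line argument, or `default` if none was given.

    This helper and its default filename are hypothetical, for illustration only.
    """
    # argv[0] is always the script name, so user arguments start at index 1.
    if len(argv) > 1:
        return argv[1]
    return default

print(get_output_path(sys.argv))
```

With this version, `python main.py` falls back to the default, while `python main.py results.txt` uses the filename that was passed.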

Reading elements of this list works fine if you only have one or two arguments. For anything more complicated, Python comes with a built-in library called `argparse`. This is beyond the scope of this class, but here is an example that accepts a command line argument called `num_iters`, which defaults to `100` if nothing is passed. You can pass a different value like this:

    python main.py --num_iters 10000

In [None]:
# main.py
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--num_iters', default=100, type=int, help='number of iterations')
args = parser.parse_args()
print(args.num_iters)
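`argparse` can also handle positional arguments, such as the output filename from earlier. The sketch below combines one positional argument with the optional `--num_iters` flag. The function name `build_parser` and the argument name `output` are assumptions made for this example. Passing a list to `parse_args` makes it parse that list instead of the real command line, which is handy for trying things out.

```python
# main.py
import argparse

def build_parser():
    # build_parser is a hypothetical helper name, used here for illustration.
    parser = argparse.ArgumentParser(description='Example script')
    # Positional argument: required, and identified by position rather than by a flag.
    parser.add_argument('output', help='file to write results to')
    # Optional argument with a default value.
    parser.add_argument('--num_iters', default=100, type=int, help='number of iterations')
    return parser

# Parse an explicit list of arguments instead of sys.argv[1:].
args = build_parser().parse_args(['results.txt', '--num_iters', '10'])
print(args.output, args.num_iters)
```

In a real script you would call `parse_args()` with no arguments and run it as `python main.py results.txt --num_iters 10`.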

One thing that is easy in Jupyter notebooks but tricky in Python scripts is plotting. If you're lucky, the plot will still pop up, but one thing you can always do is save the plot to a file, as below.

In [None]:
# main.py
import pandas as pd

# The following two lines might be necessary, depending on your operating system. If you get
# an error message initially, you can uncomment these lines to see if it works.
# import matplotlib
# matplotlib.use('Agg')

import matplotlib.pyplot as plt

# Load the abalone dataset from lecture 7.
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data',
                   header=None, names=['sex', 'length', 'diameter', 'height', 'weight', 'shucked_weight',
                                       'viscera_weight', 'shell_weight', 'rings'])

df.plot('weight', 'rings', kind='scatter')

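# Save the plot as an image. Do this before plt.show(): once the plot window is closed,
# the figure may be cleared, and a later plt.savefig() would write a blank image.
plt.savefig('figure.png')

# This line might pop open a plot, but it might not on some computers.
plt.show()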

Sometimes it's useful to save things to disk, after we've done a long calculation for example. Here are several ways to do so:

In [None]:
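import numpy as np

# Save a Pandas dataframe (df is the dataframe loaded in the plotting example).
df.to_csv('data.csv')

# Save a NumPy array. The filename comes first, then the array.
arr = np.arange(10)
np.save('data.npy', arr)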

# Write text to a file. Warning: the 'w' means overwrite, which deletes anything already
# existing in that file! To append to the file instead, use 'a' instead of 'w'.
f = open('test.txt', 'w')
f.write('hello world')
f.close()

# This is how to read a file (you can treat f as a list of lines that we can iterate over).
f = open('test.txt', 'r')
for line in f:
    print(line)
f.close()

# Closing a file when we're done releases it back to the operating system and makes sure
# everything written has actually reached the disk. This is alternate syntax that automatically
# closes the file after executing all the indented code inside the with block.
with open('test.txt', 'r') as f:
    for line in f:
        print(line)
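To see the difference between the `'w'` and `'a'` modes, here is a small self-contained sketch. It works in a temporary directory (an assumption made so that nothing on your disk gets overwritten), and the file contents are made up for the example.

```python
import os
import tempfile

# Work in a temporary directory so that no existing file is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.txt')

    # 'w' creates the file, or empties it if it already exists.
    with open(path, 'w') as f:
        f.write('first line\n')

    # 'a' keeps the existing contents and adds to the end.
    with open(path, 'a') as f:
        f.write('second line\n')

    # Each line read from a file keeps its trailing newline; rstrip removes it.
    with open(path, 'r') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)
```

Opening the file a second time with `'w'` instead of `'a'` would have left only the second line.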

Often it makes sense to split Python scripts into multiple files. For example, suppose we have another file called `other.py`, where we define useful functions.

In [None]:
# other.py
import numpy as np

def compute_pi():
    return np.pi

You can access the function using the `import` command, just like other libraries.

In [None]:
# main.py
import other
print(other.compute_pi())

# Alternatively, import just the function:
from other import compute_pi
print(compute_pi())

# To import a file in a subdirectory, for example, computations/other.py, we would use:
import computations.other as other
print(other.compute_pi())

You might encounter this weird block of code, `if __name__ == '__main__':`, in other people's Python code. It denotes code that is executed only if the file is run directly (`python other.py`), and not if it is merely imported. That way, `other.py` works both as a standalone script that can be run and as a library file that can be imported.

In [None]:
# other.py
import numpy as np

def compute_pi():
    return np.pi

# Note that this will get executed even if this file is imported.
print('Computing pi...')

if __name__ == '__main__':
    # This only gets executed if this file is run with `python other.py`.
    print(compute_pi())
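A common convention, not something the code above requires, is to put the script-only code in a function and call it from the guarded block. In the sketch below, the function name `main` is chosen for illustration, and `math.pi` from the standard library stands in for `np.pi` so that the example runs without NumPy.

```python
# other.py
import math

def compute_pi():
    return math.pi

def main():
    # Hypothetical entry point: everything the script does when run directly.
    print('Computing pi...')
    print(compute_pi())

if __name__ == '__main__':
    # Runs for `python other.py`, but not for `import other`.
    main()
```

Keeping the script logic inside a function means that importing `other` has no side effects, while `other.main()` can still be called explicitly from another file.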