# Project Iteration

Real code projects change over time as developers add features, refactor code to improve organization, and write tests to make sure their code is working as intended. These iterative changes can make the difference between projects delivering reliable results or being abandoned. We will discuss some important ways that projects can be developed over time to improve usability, flexibility, and extensibility.

## Python scripts

Python scripts are useful for running programs that do not need to be *interactive*; that is, once they are started, they do not require the user to do anything. They can be very powerful for completing repetitive tasks that are not well-suited to Jupyter notebooks.

We have focused on Jupyter notebooks, which are very flexible and well-suited for data analysis and visualization. However, they are relatively complicated to run. First, you have to open an IDE like Visual Studio Code or a web browser. Then you have to set up a kernel to run the notebook. Finally, you have to execute the code you want, either by running all cells or by running individual cells manually.

Python scripts, in contrast, can be executed by running one command in a terminal. The script *interface* can be written to be flexible, allowing the user to easily change options that affect how the program runs.

### Script types

There are two basic kinds of scripts: script files and installed scripts. 

Script files are individual `.py` files that can be run using the `python` command. To use them, you must either be in the same directory where they are or know the full path to them.

Installed scripts work like any other command installed on your computer, like `python` or `pip`. You can run them just by typing the name of the script, so you don't have to specify the full path to their location. They can be executed anytime regardless of what your current directory is.

## Script files

Script files are simple to write; you just start by creating a file with a `.py` extension.

### A very simple script

Let's try making a simple script. We'll start with a script that just prints "Hello world."

Create a new file in the main directory of this project called `hello.py`, with the following contents:

```python
print("Hello world.")
```

This isn't much of a script, but it technically qualifies. Run it by opening a terminal and typing `python hello.py`.

### Using arguments

Most scripts take at least one *argument*. Arguments are used to specify something about how the script will run. Arguments come after the name of the script. Arguments are separated by spaces:

```
python myscript.py argument1 argument2 argument3 ...
```

In the code for the script, we can fetch any arguments that the user supplied using the `sys` module. The `sys.argv` variable holds whatever arguments have been passed into the current script.

Let's edit `hello.py` to take one argument, which we will call `user`.

```python
import sys
user = sys.argv[1]  # sys.argv[0] is the name of the script; arguments come after
print(f"Hello {user}.")
```

Now we can call our script with an argument. For example, we can greet Dave using `python hello.py Dave`.
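
If the user forgets the argument, `sys.argv[1]` does not exist and the script stops with an `IndexError`. As a minimal sketch of one way to handle this, the hypothetical helper `get_user` below (not part of the course code) checks the length of the argument list first and exits with a usage message if the argument is missing. It takes the list as an input so that we can try it out with example lists.

```python
import sys


def get_user(argv):
    """Return the first argument after the script name, or exit with a usage message."""
    if len(argv) < 2:
        # argv[0] is the script name, so a length of 1 means no arguments were given
        sys.exit(f"Usage: python {argv[0]} USER")
    return argv[1]


# Simulate running "python hello.py Dave"
user = get_user(["hello.py", "Dave"])
print(f"Hello {user}.")
```

In a script, you would call `get_user(sys.argv)` to check the arguments that were actually passed in the terminal.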

### Exercise: script file

Write a new script called `describe.py` that loads a CSV file using Polars and runs `describe` to display an overview. 

Your script should take one argument that gives the path to a CSV file. For example, from the main project directory, the path to the Osth2019 dataset is `src/datascipsych/data/Osth2019.csv`. Your script should read the CSV using Polars, get a description of the dataset using the `describe` function, and print it.

Use your script to print a description of the Osth2019 dataset.

## Installed script commands

Installed scripts are commands that have been installed into your virtual environment, so that you can access them anywhere. They are a little more complicated to set up, however.

First, we need a function in one of the modules in our package that we want to turn into a script. We can use the `hello` function in the `cli` module ("cli" stands for command-line interface). The function looks like this:

```python
import sys


def hello():
    """Print a greeting."""
    if len(sys.argv) > 1:
        user = sys.argv[1]
    else:
        user = "world"
    print(f"Hello {user}.")
```

We now have an optional `user` argument. If it is not specified (which we can tell because `sys.argv` has fewer than two items), we will use the default setting, `"world"`.

To make the `hello` function available as a command, we must have settings in the `pyproject.toml` file to indicate the name we want for the new command and where the function can be found:

```
[project.scripts]
hello = "datascipsych.cli:hello"
```

To specify where the function is, start with the full dotted path to the module (`datascipsych.cli`, meaning the `cli` module in the `datascipsych` package), then add a colon, and finally the name of the function (`hello`). That gives us `"datascipsych.cli:hello"`. The text on the left side of the equal sign indicates what the new command should be named (`hello`).
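
To get a feel for how a specifier like this is used, here is a rough sketch of the lookup: the part before the colon is imported as a module, and the part after the colon is looked up inside that module. The function `resolve_entry_point` is a hypothetical name made up for this illustration; it is not the code that `pip` actually runs. The example uses the standard-library `json` module so that it works without installing anything.

```python
import importlib


def resolve_entry_point(spec):
    """Return the object named by a "module:attribute" string."""
    module_name, _, attr_path = spec.partition(":")
    # Import the module named before the colon
    obj = importlib.import_module(module_name)
    # Look up the name (or dotted names) given after the colon
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


dumps = resolve_entry_point("json:dumps")
print(dumps([1, 2, 3]))  # [1, 2, 3]
```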

Run `pip install -e .` to install the datascipsych package, including our `hello` command. When installing the package, `pip` will find our `hello` function and make it into a command that we can call from the terminal.

Open a terminal and try running `hello` and `hello Dave`. Note that, unlike when we used a script file, now we are using an installed command. That is why we don't have to write `python` first or give a full filename this time; instead we can just type `hello`.

### Using Click

Packages have been developed to make it easier to create new command-line tools using Python. Click allows you to quickly add more advanced features like optional inputs, just by adding a few lines of code before a function.

In the `cli.py` module, we have another function called `hello_click`:

```python
import click


@click.command()
@click.option("--user", default="world", help="User to greet.")
def hello_click(user):
    print(f"Hello {user}.")
```

The `@click` statements are an example of what is called a *function decorator*. A decorator is written on the line before a function definition, starting with `@`, and it modifies or wraps the function that follows. In this case, the Click package uses decorators to turn an ordinary function into a command that can get inputs from the terminal and feed them into the function.
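
To see how decorators work in general, here is a small sketch that does not use Click. The decorator `shout` and the function `greet` are hypothetical names made up for this example. A decorator is just a function that takes a function as input and returns a new function to use in its place.

```python
import functools


def shout(func):
    """Decorator that upper-cases the string returned by func."""

    @functools.wraps(func)  # keep the name and docstring of the original function
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()

    return wrapper


@shout
def greet(user):
    """Return a greeting."""
    return f"Hello {user}."


print(greet("Dave"))  # HELLO DAVE.
```

Writing `@shout` before the definition has the same effect as running `greet = shout(greet)` after it.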

In `pyproject.toml`, this line under `[project.scripts]` sets up a new command called `hello-click`:

```
hello-click = "datascipsych.cli:hello_click"
```

Run `pip install -e .` to install the new command.

In the terminal, run `hello-click --help`. You should see a message showing the options for running the command. Click automatically puts this message together for us, based on how we have set up the function. You can customize the user with the `--user` flag. For example, try `hello-click --user Dave`.

Click has a lot of features to make it easier to define inputs to Python scripts. See the [website](https://click.palletsprojects.com/en/stable/) for details.

## Using unit tests to ensure code correctness

There are many ways to test code to ensure it is working as expected. Testing code is easier when it is well-organized into functions.

### Using assert statements

The `assert` statement makes it easy to run a check of some assumption or output from a function.
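
An `assert` statement is followed by an expression that should be true. If it is, nothing happens; if it is not, Python raises an `AssertionError`. As a minimal sketch, the example below tests a hypothetical `greeting` function (made up for this illustration) that returns a greeting instead of printing it, which makes the output easy to check.

```python
def greeting(user="world"):
    """Return a greeting for the given user."""
    return f"Hello {user}."


def test_greeting():
    """Check the default greeting and a greeting with a custom user."""
    assert greeting() == "Hello world."
    # An optional message after the comma is shown if the check fails
    assert greeting("Dave") == "Hello Dave.", "custom user not used"


test_greeting()  # runs silently if all checks pass
```

Grouping related checks into a function whose name starts with `test_` also allows test runners such as pytest to find and run them automatically.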

## Sharing Python packages

Python packages can be shared with others through PyPI and GitHub. Both methods make it possible for others to install your package using Pip.

### Sharing through the Python Package Index

Python packages can be published to the official Python Package Index (PyPI) to make them easily accessible to users. Packages hosted there can be installed by just running `pip install [packagename]`, where `[packagename]` is the name of your package. For example, my package for analysis of free-recall data, Psifr, can be installed by running `pip install psifr`. If you have followed the directions in this course for setting up an installable package with a `pyproject.toml` file, you have already done most of the work necessary to host a package on PyPI. See the [Python Packaging User Guide](https://packaging.python.org/en/latest/tutorials/packaging-projects/) for details.

### Sharing through GitHub

Users can also install packages from GitHub. For example, to install Psifr from the latest code on GitHub:

```bash
pip install psifr@git+https://github.com/mortonne/psifr
```

To install a package from PyPI, we only have to indicate the name of the package (for example, `psifr`). When installing from GitHub, we need to specify more information. First, we indicate the name the package should be installed under using `psifr@`. Next, `git+` indicates that we want to access a Git repository. Finally, we have the URL for the GitHub webpage for the project we want to install: `https://github.com/mortonne/psifr`. See the Pip documentation page on [VCS support](https://pip.pypa.io/en/stable/topics/vcs-support/) for details.

The full specifier, `psifr@git+https://github.com/mortonne/psifr`, can also be used in a dependency list in a `pyproject.toml` file.
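
To make the three parts of the specifier concrete, this sketch splits it apart using plain string methods. The function `split_vcs_requirement` is a hypothetical helper for illustration only; it is not how Pip parses requirements, and it assumes the simple `name@vcs+url` form used here.

```python
def split_vcs_requirement(spec):
    """Split a "name@vcs+url" string into its three parts."""
    # Everything before the first "@" is the package name
    name, _, location = spec.partition("@")
    # The location starts with the version control system, then "+", then the URL
    vcs, _, url = location.partition("+")
    return name.strip(), vcs.strip(), url.strip()


parts = split_vcs_requirement("psifr@git+https://github.com/mortonne/psifr")
print(parts)  # ('psifr', 'git', 'https://github.com/mortonne/psifr')
```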

### Using a third-party package

Using Pip to install Psifr from PyPI or GitHub makes it so we can now run code from modules in the Psifr package. For example, this code will load some sample data and convert it to a Polars DataFrame. After installing Psifr, uncomment the code to run it. In Visual Studio Code, you can uncomment a block of code by highlighting the code and running `Edit > Toggle Line Comment`.

```python
# import polars as pl
# from psifr import fr
# df = fr.sample_data("Morton2013")
# data = pl.DataFrame(fr.merge_free_recall(df))
# data.head()
```

## Sharing Jupyter notebooks

### Static notebooks on GitHub

GitHub makes it easy to show people your latest results. Jupyter notebooks are automatically rendered, allowing visitors to see your code and the results that you got the last time you ran it. However, notebooks are only updated when you push changes to GitHub, and users will not be able to edit and run code themselves.

For example, the sample project has a [notebook on GitHub](https://github.com/mortonne/datascipsych-project/blob/main/jupyter/replication.ipynb) that you can view. The output of each cell shows what the results were the last time the notebook was run and changes were pushed to GitHub.

### Executable notebooks on Binder

The [Binder](https://mybinder.org/) service lets you create an interactive notebook that you can share with anyone, to let them run your code interactively. This can be a convenient way to share results with a collaborator without them having to clone your project, create a Python virtual environment, install your project, open your notebook, and specify the kernel.

To use Binder, you provide the URL for a GitHub repository and the path to a Jupyter notebook relative to the main directory of your repository. After you fill in a form with information about the repository and the location of the notebook you want to run, Binder will create an environment to run the notebook and open an interface where you can make edits and run code.

Binder currently does not support installing dependencies from a `pyproject.toml` file, unfortunately. You can either create a file in your main project directory called `requirements.txt` with one dependency per line, or you can instruct users to add code at the top of the notebook to install any necessary dependencies.

For example, to install Polars, users can add this line at the top of the notebook:

```
!pip install polars
```

The `!` tells Jupyter to run the rest of the line as a shell command, as if you had typed it in a terminal, instead of as Python code. It lets you run `pip` directly from inside a notebook to install Polars into the environment that is running the notebook.

To use a module from a code project on GitHub (for example, if the notebook is designed to use a module defined in the same project), you can use the same method as in the Sharing through GitHub section. For example:

```
!pip install project@git+https://github.com/mortonne/datascipsych-project
```

### Google Colab

The [Google Colab](https://colab.research.google.com/) website is another option for hosting interactive notebooks. The default kernel has many common data science packages already installed, so setup is often easier.

## Soliciting bug reports and improvements