# Assignment 14

For this assignment, instead of uploading a notebook, you will clone an existing code repository on GitHub, make changes to your local copy, create a zip file with your changed code, and upload it to Canvas.

First, clone the repository you will be working from. In Visual Studio Code, open a new window (`File > New Window`). In the new window, under Start on the Welcome page, select `Clone Git Repository...` and enter: `https://github.com/mortonne/datascipsych-refactor`

Choose a destination folder on your computer to clone the project to. Create a new Python virtual environment (`View > Command Palette`, then `Python: Create Environment...`, followed by `Venv`, and select a Python 3.12 interpreter). Set your interpreter to the virtual environment (`View > Command Palette`, then `Python: Select Interpreter...`, and select the virtual environment).

Follow the directions below to make changes to the project. Note that you cannot sync your changes with GitHub, because you do not have permission to edit the `datascipsych-refactor` project. However, you can make commits to your local copy. When you are finished, create a zip file of your whole project directory and upload it to Canvas.

If you get stuck, read through the other notebooks in this directory, ask us for help in class, or ask other students for help in class or on the weekly discussion board.

## Problem: standardize code formatting (2 points)

Imagine that you have downloaded a project that someone else wrote and see that it does not use Black formatting. Run Black on the code in `src/refactor/analysis.py` using either the VSCode extension for Black or the Black command-line tool.

### Reformat a module file (1.5 points)

To use the Black extension, open the Extensions tab in VSCode and install the Black Formatter extension from Microsoft. Then open the `analysis.py` file, right click on the code, and run `Format Document`.

To use the Black command-line tool, open the terminal, use Pip to install Black (`pip install black`), and then run `black src/refactor/analysis.py`.

### Commit your change (0.5 points)

Make a commit to the project with the formatting changes to `analysis.py`, with this message: `style: black reformatting`.

## Problem: revert part of a change (2 points)

After looking at `analysis.py` again, you decide that the way Black formatted the input to `pl.DataFrame` in the `get_sample_data` function is actually not quite how you want it. Placing each number on its own line seems like overkill and is harder to read than if they were all on the same line. Black formatting is great, but as the [Python style guide](https://peps.python.org/pep-0008/) says, "A foolish consistency is the hobgoblin of little minds." Sometimes, standardized formatting can end up being harder to read than a more customized solution.

### Revert a function to a previous version (1.5 points)

Look in your commit history in the Source Control tab (you may have to expand the Graph pane). Click on the commit where you changed the code to use Black styling; this will show the older version on the left, and the changes you made on the right. Copy the old version of the `df = pl.DataFrame...` code. Open the `analysis.py` file in the editor and replace the newer, Black-formatted version of DataFrame creation in `get_sample_data` with the older version you just copied.

### Commit your change (0.5 points)

Make a commit to the project with the changes to `analysis.py`, with this message: `style: revert reformatting of get_sample_data`.

## Problem: move a function in a notebook to a module (2 points)

Open the notebook in `jupyter/exploration.ipynb`. Note that there is a function, `condition_accuracy`, which is used to calculate accuracy in each condition in the sample dataset. You decide to move the function definition to a code module, where it can be reused more easily.

### Refactor by moving the function to a module (1.5 points)

Move the `condition_accuracy` function in the second cell to `src/refactor/analysis.py`. Edit the second cell in `exploration.ipynb` to use the `analysis.condition_accuracy` function. Run all cells in the notebook to test out your new code.

### Commit your changes (0.5 points)

Make a commit to the project with the changes to `analysis.py` and `exploration.ipynb`, with the message `refactor: move function to module`.

## Problem: use a dependency installed from GitHub (2 points)

Sometimes, we may want to get access to a Python package that is not on the Python Package Index (PyPI), which Pip installs packages from by default. We can also install Python packages that are hosted on GitHub. You decide to add the datascipsych package as a dependency, to get access to the datasets included in that package.

### Install the datascipsych package (0.5 points)

Install the datascipsych package from GitHub by adding it as a dependency in `pyproject.toml`. You can write the dependency as `"datascipsych@git+https://github.com/mortonne/datascipsych"`. To install it, open the terminal and run: `pip install -e .`

### Use a dependency function in the notebook (1 point)

Edit the `exploration.ipynb` notebook to add a cell at the bottom with the following code:

```python
from datascipsych import datasets
file = datasets.get_dataset_file("Morton2013")
fr_data = pl.read_csv(file, null_values="n/a")
fr_data.head()
```

Restart the kernel and run all cells to test the new code.

### Commit your changes (0.5 points)

Commit your changes to `pyproject.toml` and `exploration.ipynb` with the message `feat: load dataset from datascipsych package`.

## Problem: write a Python script (2 points)

You see that there is a function in `analysis.py` for converting Excel files to CSV files, but there isn't a convenient way to use the function. You decide to write a script that can be used to convert simple Excel files to CSV format.

### Write a script (1 point)

Create a new file in the main directory of the project called `convert.py`. Write a script that imports the `analysis` module from the `refactor` package. Call the `analysis.convert_excel_to_csv` function, passing it the first command-line argument given to the script.
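
As a rough illustration of reading the first argument to a script, the sketch below uses `sys.argv` from the standard library. The helper name `first_argument` is hypothetical, and the call to `analysis.convert_excel_to_csv` is deliberately left out, because its exact signature is not shown in this assignment.

```python
import sys


def first_argument(argv):
    """Return the first user-supplied argument, or None if there is none."""
    # argv[0] is the script name (e.g. "convert.py"), so the first
    # argument typed after it on the command line is argv[1].
    if len(argv) < 2:
        return None
    return argv[1]


# Simulate running: python convert.py src/refactor/data/data.xlsx
excel_file = first_argument(["convert.py", "src/refactor/data/data.xlsx"])
print(excel_file)

# In a real script, pass the actual argument list instead:
script_arg = first_argument(sys.argv)
```

In your own script, the value read from `sys.argv` is what you would hand to the conversion function.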

### Use the script (0.5 points)

Use your new script to convert the Excel file in the project. Run `python convert.py src/refactor/data/data.xlsx`. You should now see a CSV file in `src/refactor/data/data.csv`.

### Commit your changes (0.5 points)

Commit your new `convert.py` file and `data.csv` file with the message `feat: conversion script and converted data file`.

## Problem: document script usage (2 points)

To make your script easier to discover, you decide to document it in the project `README.md` file.

### Document script usage (1.5 points)

Edit the `README.md` file to add a second-level header with the text "Conversion script" (0.5 points). Under that header, add the text: "The convert.py script can be used to convert Excel files to CSV format. To run it:" (0.5 points). Below that text, add a fenced code block containing `python convert.py path_to_excel_file.xlsx`. See the [Markdown Cheat Sheet](https://www.markdownguide.org/cheat-sheet/) for Markdown syntax.

### Commit your changes (0.5 points)

Commit your new documentation in `README.md` with the message `docs: convert script usage`.

## Problem (graduate students): make an improved script (4 points)

You decide to make a version of the conversion script that is installable, so that you can run it from any directory, not just the main project directory. [Installable scripts](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#creating-executable-scripts) are a feature of `pyproject.toml` files.

You also decide to use the [Click](https://click.palletsprojects.com/en/stable/) package, which helps with quickly making flexible scripts that can take in multiple inputs and have helpful documentation. Click uses *function decorators* to add command line support to Python functions. Decorators are functions that modify functions. They are added to a function by placing `@decorator_name` in lines immediately before a function definition. Don't worry too much about the details of decorators; you can just follow the examples on the Click website.
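
To make the idea of a decorator concrete, here is a minimal sketch that does not use Click at all. The decorator name `announce` and the function `add` are made up for illustration; Click's own decorators do more, but they are applied with the same `@` syntax.

```python
import functools


def announce(func):
    """Decorator that prints a message before calling the wrapped function."""

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


@announce  # equivalent to writing: add = announce(add)
def add(a, b):
    """Add two numbers."""
    return a + b


result = add(2, 3)  # prints "Calling add", then returns 5
print(result)
```

The decorated `add` still returns the same value as before; the decorator only adds behavior around the call.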

### Define inputs to the conversion script (2 points)

First, add `click` as a dependency for the project in `pyproject.toml` (0.5 points). 

Edit the `convert_excel_to_csv` function in `analysis.py` to use Click to set it up as a command (see documentation on [adding parameters](https://click.palletsprojects.com/en/stable/quickstart/#adding-parameters)). Add a required argument, `excel_file`. Add an option, `--csv-file`, with the help string "Path to converted CSV file". You should have three `@click` function decorators above the `convert_excel_to_csv` function definition (0.5 points each).

### Set up your script (1 point)

Add the following code to your `pyproject.toml` file, below the `[project]` table:

```toml
[project.scripts]
convert-excel = "refactor.analysis:convert_excel_to_csv"
```

This indicates that a new script called `convert-excel` should be created, calling `analysis.convert_excel_to_csv` with the arguments you set up using Click.
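
Script entries in `pyproject.toml` are entry points: the value names a module to import, then a colon, then the object to look up inside that module. The sketch below shows how such a value is resolved, using `importlib.metadata.EntryPoint` from the standard library. It targets a standard-library function as a stand-in so that it runs without installing anything; the entry name `demo` is made up.

```python
from importlib.metadata import EntryPoint

# Stand-in entry point pointing at os.path.basename. "console_scripts" is
# the group that command-line scripts are registered under.
entry = EntryPoint(name="demo", value="os.path:basename", group="console_scripts")

print(entry.module)  # the part before the colon: module to import
print(entry.attr)  # the part after the colon: object inside that module

func = entry.load()  # imports the module and returns the object
print(func("src/refactor/data/data.xlsx"))
```

When the project is installed, Pip generates a small launcher that performs this lookup and calls the resulting function.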

### Test and commit your changes (1 point)

To install your script, open a terminal and run `pip install -e .`. To see the documentation, run `convert-excel --help`. To test your script, run `convert-excel src/refactor/data/data.xlsx --csv-file data.csv`. That should create a `data.csv` file in the current directory.

Commit your new script interface, with changes to `pyproject.toml` and `analysis.py`, with the message `feat: improved convert-excel script`.

## Problem (graduate students): add unit tests to check your code (4 points)

You decide that it would be a good idea to check that your code is working correctly. You decide to use the [pytest package](https://docs.pytest.org/en/stable/) to run tests.

### Create a unit test (2 points)

First, add `pytest` as a dependency for the project in `pyproject.toml` (0.5 points). In the terminal, run `pip install -e .` to install it.

Create a subdirectory in the main project directory called `tests`. Create a file in `tests` called `test_analysis.py`.

In `test_analysis.py`, add this code (1.5 points):

```python
import polars as pl
from refactor import analysis


def test_condition_accuracy():
    """Test accuracy by condition calculation."""
    df = pl.DataFrame(
        {"condition": [1, 1, 1, 1, 2, 2, 2, 2], "correct": [0, 1, 1, 0, 1, 0, 1, 1]}
    )
    m = analysis.condition_accuracy(df)
    assert m["condition"].equals(pl.Series([1, 2]))
    assert m["correct"].equals(pl.Series([0.5, 0.75]))
```

This code sets up a test case for which we have manually calculated the correct answer (0.5 accuracy in condition 1, 0.75 accuracy in condition 2).
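
If you want to confirm the manually calculated values, the following sketch recomputes them in plain Python, without Polars. It is only a check of the arithmetic, not how `condition_accuracy` is implemented.

```python
# Same data as in the test above
condition = [1, 1, 1, 1, 2, 2, 2, 2]
correct = [0, 1, 1, 0, 1, 0, 1, 1]

# Collect the 0/1 scores for each condition
scores = {}
for cond, score in zip(condition, correct):
    scores.setdefault(cond, []).append(score)

# Accuracy is the mean of the 0/1 scores within a condition
accuracy = {cond: sum(vals) / len(vals) for cond, vals in scores.items()}
print(accuracy)  # {1: 0.5, 2: 0.75}
```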

You can run your test by opening a terminal and running `pytest`. The test should pass.

### Create another unit test (1.5 points)

Create another unit test to check the `analysis.get_sample_data` function. Create a function called `test_sample_data`. It should call `get_sample_data` to set up a sample DataFrame called `df`. Use an assertion to check that `df.shape` is `(30, 3)`.

You can run both tests by opening a terminal and running `pytest`. Both tests should pass.

### Commit your changes (0.5 points)

Commit your change to `pyproject.toml` and unit tests in `tests/test_analysis.py` with the message: `test: unit tests of analysis module`.