# Overview

This is meant as a supporting document for Sherry and Emmit's group meeting about basic git control, package management and testing. 

## First steps - building a new repo

In general there are two ways to set up a new git repo. If you already have some code that you would like to turn into a fresh repo, you can start from the command line. If you have a plan for a new repo but haven't written anything yet, it's likely easier to create an empty repo on the GitHub website and clone it. We'll start with the command-line route for existing code. In this case you can begin by initializing git and adding everything locally.


In [None]:
!git init
!git add .
!git commit -m "First commit"
!git branch -M main

This will initialize local git management and make the first commit for the new project. After this we can create a new github repository and push to that. 

In [None]:
# Replace the URL below with your own repository
!git remote add origin https://github.com/sherryli59/Ising_demo.git
# Push main; -u also sets origin/main as its upstream, so no separate --set-upstream-to step is needed
!git push -u origin main

These commands register the new remote (change the URL to your preferred location) and push the existing work to it.

### Starting from GitHub

If you start from GitHub instead, create the repository on the website first and then clone it locally with


In [None]:
!git clone https://github.com/sherryli59/Ising.git

## Basics of pushing and pulling

Once you have some changes that you are ready to add to an existing project, you need to do a few things to add your changes to github. You start by adding them to your local git repository with 

In [None]:
git add some_file.ext

You can add any number of files and use wildcard matching as usual. You can also prefix mv or rm with git (git mv, git rm) so that moves and deletions are recorded by git. You can see all the changes made with

In [None]:
git status 

and you can stage every single change at once with

In [None]:
git add -A 

This stages your changes locally. You then need to commit them locally with

In [None]:
git commit -m "some commit message"

You will need to have some commit message. At this point you can push your changes to the remote repository with 

In [None]:
git push 

If your local repository is behind the remote one, then you will need to use 

In [None]:
git pull

to bring yourself up to date with the remote repository. If you have changes locally that you want to discard (because you did something wrong or because you want to merge and ignore your local changes) you can use

In [None]:
git restore some_file.ext

to discard the local changes and return the file to its last staged state (or to the last commit if nothing was staged). This command does not contact the remote. It is a dangerous operation, as it removes local data, and should be used sparingly and thoughtfully.

Cloning from GitHub is the simpler of the two setup methods, so it is generally preferred for new projects. With either method, at this point we can begin using git to manage our changes as desired.

## Building a package

One useful way to manage our code once we have it built is to create a package that we can install locally. This can be done by writing a setup.py file (an example can be found in the ising project's root directory). Once this is done we can install the package locally in editable mode, so that changes to the underlying code are picked up by Python without reinstalling. Before installing the package we should create a new conda environment with

In [None]:
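# Include python and pip so that pip installs into this environment
!conda create -n ising python pip -y
# Run the next two commands in a terminal: each `!` line runs in its own subshell, so activation does not persist in the notebook
!conda init
!conda activate ising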

It's helpful to build a package out of our project from the very beginning so we don't have to deal with relative import drama. One way to do this is through a setup.py file. We can install the package using pip, and the dependencies listed in setup.py will automatically be added to our conda environment. The -e flag installs the package in an editable state so that you don't have to reinstall it every time you edit it locally.

In [None]:
!pip install -e .
# alternatively, `python setup.py develop` also installs in development mode (no sudo needed inside a conda environment), but setuptools has deprecated it in favor of `pip install -e .`
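
One quick way to confirm that the editable install points at your working copy is to ask Python where it would import the package from. The sketch below is only an illustration: `package_location` is a hypothetical helper, and "ising" is assumed to be the package name declared in setup.py.

```python
import importlib.util


def package_location(name):
    """Return the file that `import name` would load, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "ising" is the package name assumed in this document; adjust it to match setup.py
print(package_location("ising"))
```

After an editable install, the printed path should normally sit inside your project directory rather than in site-packages.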

# Other useful git features

One good option to make git usage more consistent is to use pre-commit hooks. These are local scripts that are executed whenever a commit is made. They are stored in .git/hooks/pre-commit (there is an existing file called pre-commit.sample that provides basic functionality), and the file must be executable for git to run it. One of the better uses of pre-commit hooks is to automatically run a code formatter such as black or autopep8. An example of a hook that checks Python files with black, and aborts the commit if any of them would be reformatted, is below.

In [None]:
#!/bin/bash

# Find all Python files in the repository
python_files=$(git ls-files | grep '\.py$')

# Check if there are any Python files
if [ -n "$python_files" ]; then
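    # Check the formatting of each Python file with black
    for file in $python_files; do
        black --check "$file"
        # black --check exits non-zero if the file would be reformatted or black hit an error
        if [[ $? -ne 0 ]]; then
            echo "Error: $file failed the black check; run 'black $file' and re-add it"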
            exit 1
        fi
    done
fi

# Continue with the commit
exit 0

### .gitignore

.gitignore is a file that usually sits in the root directory of a git project. It contains a list of directories and file patterns that will not be included in commits unless you force them to be added. This is particularly useful for excluding \_\_pycache__ and output files that might be large. To exclude something, put its name on a separate line in .gitignore; patterns use shell-style glob matching (for example *.log), not regular expressions.

# Unit testing 

Unit testing is a broad topic, but there are two main packages that can be used for unit testing in python. There is unittest, which is the built-in python unit testing suite, and pytest, which is a separate package. Both work fine; pytest is generally a bit easier to use and is what we will be describing here.

## Unit testing philosophy

Generally there are a few ways to design unit tests. Emmit's preferred approach, which is probably not the best, is to have two kinds of unit tests: some smaller ones that check core functionality, and one large one that guarantees that the entire program continues to produce the same results after you've made changes.

For the latter, generally the best bet is to seed the random number generator, either instantiate the entire system and run some dynamics or run the entire process on random data, and store the output. You can then run the exact same process in the testing suite with the same seed and compare the result against the stored output. This guarantees that any changes you make to the underlying code base do not change the behavior that you actually care about (usually trajectory information). If you make changes that are expected to change the code's functionality, then you will want to rerun the data-generating file, since you would expect the trajectory to change. You may have more than one of these if you have different simulation conditions, or you may just have one for the most complicated simulations you are likely to encounter.
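
The seed-and-compare workflow described above can be sketched as follows. This is a toy illustration, not the ising project's code: `run_simulation` is a hypothetical stand-in (a seeded 1D random walk) for whatever dynamics you actually run, and the JSON layout is an assumption.

```python
import json
import random
from pathlib import Path


def run_simulation(seed, n_steps=100):
    """Hypothetical stand-in for the real dynamics: a seeded 1D random walk."""
    rng = random.Random(seed)  # local generator, so the seed fully fixes the output
    position = 0
    trajectory = []
    for _ in range(n_steps):
        position += rng.choice((-1, 1))
        trajectory.append(position)
    return trajectory


def save_reference(path, seed):
    """Data-generating step: rerun only when a change in behavior is intended."""
    data = {"seed": seed, "trajectory": run_simulation(seed)}
    Path(path).write_text(json.dumps(data))


def matches_reference(path):
    """Rerun the simulation with the stored seed and compare with the stored output."""
    data = json.loads(Path(path).read_text())
    return run_simulation(data["seed"]) == data["trajectory"]
```

In a test suite, the reference file would be generated once with `save_reference` and committed, and a single test would assert `matches_reference(...)` on that file. For floating-point trajectories an approximate comparison may be more appropriate than exact equality.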

Smaller unit tests are generally most useful when you have some sort of mathematical or physical guarantee that should hold true. If you have a conserved property, then you may want a unit test that will guarantee that property is actually being conserved as you make development decisions to implement new features.
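
As a minimal sketch of a conservation test, consider a toy lattice gas in which a move exchanges the occupancy of two sites, so the particle number must never change. The `swap_move` function here is hypothetical and stands in for whatever update rule your code implements.

```python
import random


def swap_move(lattice, rng):
    """Hypothetical particle-conserving move: exchange the occupancy of two random sites."""
    i = rng.randrange(len(lattice))
    j = rng.randrange(len(lattice))
    lattice[i], lattice[j] = lattice[j], lattice[i]


def test_particle_number_is_conserved():
    rng = random.Random(0)
    lattice = [rng.choice((0, 1)) for _ in range(64)]  # 1 = occupied, 0 = empty
    n_particles = sum(lattice)
    for _ in range(1000):
        swap_move(lattice, rng)
    # The conserved quantity must be unchanged after any number of moves
    assert sum(lattice) == n_particles
```

If a later feature accidentally creates or destroys particles, this test fails immediately, regardless of what the trajectory looks like.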

For most scientific code, I don't find unit tests that assert object types, or intermediate range testing, to be useful. You mostly just need to know if your changes are actually changing code behavior for different simulation types. Testing the things that you need to know and knowing if your changes change observable behavior is generally sufficient for most purposes and avoids having to write lots of tests.

## Basic pytest commands

Pytest (and unittest) use an assertion framework. You define tests, and within each test you assert that some statement is true. If all statements are true, the test will pass; if any are false, the test will fail. The other basic structure is to use the

In [None]:
@pytest.fixture

decorator before functions that you want to run for every test case or things you need to set up. This lets you clean up the code and not have to repeatedly define the same function many times. Fixtures are also very flexible for setting up more complicated testing regimes with variable dependencies for different tests. Any sort of advanced fixture is far beyond the scope of this talk, but pytest makes it easy to implement a variety of testing regimes. 
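
A basic fixture and two assertion-based tests might look like the sketch below. All names (`make_lattice`, `lattice`, `total_magnetization`) are hypothetical and chosen only for illustration; they are not taken from the ising package.

```python
import pytest


def make_lattice(size=4):
    """Hypothetical setup helper: a size x size lattice with every spin up (+1)."""
    return [[1] * size for _ in range(size)]


@pytest.fixture
def lattice():
    # pytest runs this for every test that lists `lattice` as an argument
    return make_lattice()


def total_magnetization(spins):
    """Sum of all spins on the lattice."""
    return sum(sum(row) for row in spins)


def test_all_up_magnetization(lattice):
    assert total_magnetization(lattice) == 16


def test_flipping_one_spin(lattice):
    lattice[0][0] = -1  # each test gets a fresh lattice, so this change does not leak
    assert total_magnetization(lattice) == 14
```

Saving this as a file whose name starts with test_ and running pytest in the same directory should collect both tests. Keeping the setup in a plain helper (`make_lattice`) also lets you reuse it outside of pytest.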

# Git branches
Having multiple branches in a Git repository lets us isolate features: each branch can represent a separate feature, bug fix, or experiment. We can work on these features independently without affecting the main codebase. In the next exercise, we want to duplicate the main branch into another branch that implements a lattice gas instead. To do this, we can run

In [None]:
!git checkout -b lattice_gas main

We will add a new script to the ising package that implements a simple lattice gas model, ising/lattice_gas.py, as well as a testing script, test/test_lattice_gas.py. Both scripts are on the lattice_gas branch.