# *autodiff_NARS* Documentation
#### Group members: Nora Hallqvist, Anna Midgley, Robin Robinson, Sebastian Weisshaar

### Table of Contents:
1. Introduction
2. Background
3. How to Use *autodiff_NARS*
4. Software Organization
5. Implementation Details
6. Extension: Reverse Mode
7. Broader Impact and Inclusivity Statement
8. Future Extensions/Applications


# 1. Introduction

There are four main approaches to computing derivatives: manual, symbolic, numerical, and automatic differentiation. Manual differentiation means deriving the derivative expression by hand and then coding it. It is time-consuming and error-prone, especially as systems become more complex. Numerical differentiation approximates the derivative with finite differences of the original function evaluated at nearby points. It is easy to implement, but it suffers from truncation and round-off errors. Symbolic differentiation uses packages such as Mathematica to produce the exact derivative expression. It addresses the weaknesses of manual and numerical differentiation; however, it frequently results in “expression swell”, where the representation of the derivative is much larger than that of the original function. Symbolic differentiation does not exploit the fact that many sub-expressions are shared between the different derivative expressions, which makes its evaluation inefficient. Symbolic and manual differentiation also require the model to be defined as a closed-form expression, which limits the expressivity of the model because it cannot use branches or loops. The fourth technique, automatic differentiation (AD), obtains the derivative to machine precision by breaking the expression down into elementary arithmetic operations and elementary functions and applying the chain rule to the derivatives of these operations.
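The trade-off between truncation and round-off error in numerical differentiation can be illustrated with a short experiment. The sketch below is not part of *autodiff_NARS*; the helper `forward_difference` and the chosen step sizes are assumptions made purely for illustration.

```python
import math

def forward_difference(f, x, h):
    """First-order finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x)) / h

exact = math.cos(1.0)  # exact derivative of sin(x) at x = 1

# Absolute error for a large, a moderate, and a tiny step size
errors = {h: abs(forward_difference(math.sin, 1.0, h) - exact)
          for h in (1e-1, 1e-8, 1e-15)}

for h, err in errors.items():
    print(f"h = {h:.0e}  absolute error = {err:.2e}")
```

With the large step the truncation error dominates, while with the tiny step the subtraction `f(x + h) - f(x)` loses almost all significant digits, so the moderate step gives the smallest error of the three. Automatic differentiation avoids having to choose a step size at all.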


# 2. Background

### **2.1 Automatic Differentiation**

The key idea of automatic differentiation is to decompose a calculation into elementary functions and to evaluate the derivative of the whole function by combining the derivatives of the elementary functions using the chain rule.

Important components of automatic differentiation include the chain rule, the computational graph, the evaluation trace, the seed vector and dual numbers, all of which are described below:


### **2.2 Chain Rule**

The **chain rule** is a method to differentiate composite functions. It is especially useful in automatic differentiation when extended to the multivariate case.

For example, suppose we have a multivariate function $f(u_{1}(t),u_{2}(t))$ and we want to evaluate the derivative of $f$ with respect to $t$:

$$ \frac{\partial f(u_{1}(t),u_{2}(t))}{\partial t} = \frac{\partial f}{\partial u_{1}}\frac{\partial u_{1}}{\partial t} + \frac{\partial f}{\partial u_{2}}\frac{\partial u_{2}}{\partial t}$$

The above can also be represented in vector notation. Replacing $t$ with $x \in R^{m}$, the chain rule can more elegantly be written in terms of the differential vector operator $\nabla$ with respect to the vector $x$.

$$\nabla_{x}f = \frac{\partial f}{\partial u_{1}}\nabla_{x}u_{1}  + \frac{\partial f}{\partial u_{2}}\nabla_{x}u_{2}  \text{ where } f = f(u_{1}(x_{1},...,x_{m}),u_{2}(x_{1},...,x_{m}))$$


Automatic differentiation commonly involves several dependent variables (i.e. intermediate steps), so we need to generalize the chain rule to support a function $f$ of $n$ other functions, i.e.

$$f(u_{1}(x_{1},...,x_{m}),u_{2}(x_{1},..,x_{m}),..,u_{n}(x_{1},..,x_{m}))$$

Now the gradient of $f$ is instead given by: 

$$\nabla_{x}f = \sum_{i=1}^{n} \frac{\partial f}{\partial u_{i}(x)} \nabla_{x}u_{i}(x) \text{ where } x \in R^{m}$$
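As a quick numerical check of the generalized chain rule, the sketch below evaluates the sum for a small example and compares it with the gradient obtained by differentiating the composed function directly. The inner functions $u_1 = x_1 x_2$, $u_2 = \sin(x_1)$ and the outer function $f = u_1 u_2$ are arbitrary choices made for illustration.

```python
import numpy as np

x = np.array([1.5, -0.5])            # evaluation point (x_1, x_2)

# Inner functions u_1(x) = x_1 * x_2 and u_2(x) = sin(x_1), with their gradients
u = np.array([x[0] * x[1], np.sin(x[0])])
grad_u = np.array([[x[1], x[0]],            # gradient of u_1
                   [np.cos(x[0]), 0.0]])    # gradient of u_2

# Outer function f(u_1, u_2) = u_1 * u_2 and its partials with respect to u_i
df_du = np.array([u[1], u[0]])

# Chain rule: grad f = sum_i (df/du_i) * grad u_i
grad_f = sum(df_du[i] * grad_u[i] for i in range(len(u)))

# Differentiating f(x) = x_1 * x_2 * sin(x_1) directly, for comparison
grad_direct = np.array([x[1] * np.sin(x[0]) + x[0] * x[1] * np.cos(x[0]),
                        x[0] * np.sin(x[0])])
print(grad_f, grad_direct)
```

Both gradients agree up to floating-point rounding, which is the property automatic differentiation relies on at every intermediate step.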




### **2.3 Computational Graph**

Automatic differentiation can be represented as a graph structure. This is done by decomposing the function into its primitive operations (e.g. binary arithmetic operations and unary operations). The computational graph is then constructed by representing each elementary operation as a node and connecting each node to the nodes it depends on with an edge.

By convention the nodes are represented by the following variables:  
- Input variables: $v_{i-m} = x_{i}$, $i = 1,...,m$
- Intermediate variables: $v_{i}$, $i = 1,...,n$

An example of a computational graph is shown below for the function $f(x)=sin(x_1x_2) + x_2$,

![](./images/computationalgraph_example.png)
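One possible way to hold such a graph in code is a dictionary that maps each node to its operation and the nodes it depends on. The sketch below is an illustrative assumption, not the data structure used by *autodiff_NARS*; the names `graph`, `operations` and `evaluate` are hypothetical.

```python
import math

# Each node: name -> (operation, names of the nodes it depends on)
graph = {
    "v-1": ("input", []),
    "v0":  ("input", []),
    "v1":  ("mul", ["v-1", "v0"]),
    "v2":  ("sin", ["v1"]),
    "v3":  ("add", ["v2", "v0"]),
}

operations = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sin": math.sin,
}

def evaluate(graph, inputs):
    """Evaluate every node in order; the dict above is already topologically sorted."""
    values = dict(inputs)
    for name, (op, parents) in graph.items():
        if op != "input":
            values[name] = operations[op](*(values[p] for p in parents))
    return values

values = evaluate(graph, {"v-1": 2.0, "v0": 3.0})
print(values["v3"])  # value of f at (x_1, x_2) = (2, 3)
```

Because every node only refers to nodes defined before it, a single pass over the dictionary evaluates the whole function.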

### **2.4 Evaluation Trace**

The evaluation trace stores the intermediate results $v_{i}$ evaluated at a given point, and includes a tangent trace, which records the directional derivative of each intermediate variable $v_{i}.$ The final derivative of the original function is calculated by combining derivatives of the intermediate results by using the chain rule.

Below is an example of an evaluation trace table using the previous example function and computational graph:  
  
| **Primal Trace**  | **$f(x)$**              |
|-------------------|-------------------------|
| $v_{-1} = x_1$    | $v_{-1} = x_1$          |
| $v_{0} = x_2$     | $v_{0} = x_2$           |
| $v_1 = v_{-1}v_0$ | $v_1 = x_1x_2$          |
| $v_2 = sin(v_1)$  | $v_2 = sin(x_1x_2)$     |
| $v_3 = v_2 +v_0$  | $v_3 = sin(x_1x_2)+x_2$ |
  
<br>

| **Tangent Trace**                                                               | $\frac{\partial f}{\partial x_1}$ --> Direction: $p = [1, 0]^T$ | $\frac{\partial f}{\partial x_2}$ --> Direction: $p = [0, 1]^T$          |
|---------------------------------------------------------------------------------|-----------------------------------------------------------------|--------------------------------------------------------------------------|
| $D_pv_{-1} = p_1$                                                               | $D_pv_{-1} = 1$                                                 | $D_pv_{-1} = 0$                                                          |
| $D_pv_{0} = p_2$                                                                | $D_pv_{0} = 0$                                                  | $D_pv_{0} = 1$                                                           |
| $D_pv_{1} = v_{-1}D_pv_{0} + v_{0}D_pv_{-1} = x_1p_2 + x_2 p_1$                 | $D_pv_{1} =x_2 p_1 = x_2$                                       | $D_pv_{1} =x_1 p_2 = x_1$                                                |
| $D_pv_{2} = cos(v_1)D_pv_{1}= cos(x_1 x_2)(x_1p_2 + x_2 p_1)$            | $D_pv_{2} = cos(x_1x_2)(x_2 p_1) = cos(x_1 x_2)(x_2)$    | $D_pv_{2} = cos(x_1 x_2)(x_1 p_2) = cos(x_1 x_2)(x_1)$                   |
| $D_pv_{3} = D_pv_{2} + D_pv_{0} = cos(x_1 x_2)(x_1p_2 + x_2 p_1) + p_2$  | $D_pv_{3} = cos(x_1 x_2)(x_2)$                                  | $D_pv_{3} = cos(x_1 x_2)(x_1 p_2) + p_2 = cos(x_1 x_2)(x_1) + 1$  |
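The two tables can be reproduced numerically by carrying the primal and tangent values side by side. The following is a minimal sketch for this one function only; the helper `traces` and the evaluation point $(2, 3)$ are assumptions chosen for illustration.

```python
import math

def traces(x1, x2, p):
    """Return the primal and tangent traces of f(x) = sin(x1*x2) + x2 for seed p."""
    v = {-1: x1, 0: x2}
    dv = {-1: p[0], 0: p[1]}

    v[1] = v[-1] * v[0]
    dv[1] = v[-1] * dv[0] + v[0] * dv[-1]   # product rule

    v[2] = math.sin(v[1])
    dv[2] = math.cos(v[1]) * dv[1]          # chain rule through sin

    v[3] = v[2] + v[0]
    dv[3] = dv[2] + dv[0]                   # sum rule
    return v, dv

v, dv_x1 = traces(2.0, 3.0, p=[1, 0])   # tangent trace for df/dx1
_, dv_x2 = traces(2.0, 3.0, p=[0, 1])   # tangent trace for df/dx2
print(v[3], dv_x1[3], dv_x2[3])
```

The last tangent entries match the closed-form results in the table: $x_2\cos(x_1x_2)$ for the first direction and $x_1\cos(x_1x_2) + 1$ for the second.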


  
### **2.5 Seed vector**
The sections above describe how to calculate the derivative of a function using a computational graph. However, we also have to consider the direction of the derivative: we could be interested in the derivative with respect to a specific input variable or in some other direction. To select the direction, we pass a seed vector into the graph. The seed vector describes the direction of the desired derivative and is propagated through every step of the derivative calculation. For example, if we are interested in $\frac{\partial f(x_1,x_2)}{\partial x_1}$ we would use $[1,0]^T$ as our seed vector.
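The role of the seed vector can be sketched for the running example $f(x) = \sin(x_1x_2) + x_2$. The function `directional_derivative` below is a hand-written illustration, not the package's implementation; it shows that unit seed vectors recover the individual partial derivatives, while a general seed gives the gradient projected onto that direction.

```python
import numpy as np

def directional_derivative(x, p):
    """Forward-mode tangent of f(x) = sin(x1*x2) + x2 in direction p."""
    x1, x2 = x
    p1, p2 = p
    d_v1 = x1 * p2 + x2 * p1          # tangent of v1 = x1 * x2
    d_v2 = np.cos(x1 * x2) * d_v1     # tangent of v2 = sin(v1)
    return d_v2 + p2                  # tangent of v3 = v2 + x2

x = [2.0, 3.0]

# One forward pass per unit seed vector recovers the full gradient
gradient = np.array([directional_derivative(x, e) for e in np.eye(2)])

# A general seed gives the gradient projected onto that direction
p = [0.6, 0.8]
d_p = directional_derivative(x, p)
print(gradient, d_p)
```

This is why forward mode needs one pass per input variable to assemble a full gradient or Jacobian.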
  

### **2.6 Dual Numbers**

A dual number is given by the expression:

$$z = a + b \epsilon \text{  such that  }  \epsilon \neq 0 \text{ and }  \epsilon^{2}=0$$


Similar to complex numbers, dual numbers adhere to the following rules for addition and multiplication:

$$z_{1} + z_{2} = (a_{1}  + a_{2})  + (b_{1} + b_{2}) \epsilon$$
$$z_{1} z_{2} = (a_{1} *a_{2})  + (a_{1}*b_{2} + a_{2}*b_{1}) \epsilon$$


Let the real parts be equal to the functions $f_{1}(x) \text{ and } f_{2}(x)$ (i.e. $a_{1} = f_{1}(x) \text{ and } a_{2} = f_{2}(x)$) and the dual parts be equal to their respective derivatives (i.e. $b_{1} = f_{1}'(x) \text{ and } b_{2} = f_{2}'(x)$). Then we get the following expressions:

$$z_{1} + z_{2} = (f_{1}  + f_{2})  + (f_{1}'(x) + f_{2}'(x)) \epsilon$$
$$z_{1} z_{2} = (f_{1}f_{2})  + (f_{1}f_{2}'(x) + f_{2}f_{1}'(x)) \epsilon$$


By using dual numbers in forward automatic differentiation, both the primal and tangent trace can be carried as a pair. This can be seen by letting the functions represent the intermediate results in the primal trace (i.e $f_{i}(x) = v_{i}$) and the functions' derivatives be equal to the tangent trace (i.e $f_{i}'(x) = D_{p}v_{i})$.

For dual numbers to be useful in automatic differentiation, a function applied to a dual number must satisfy the chain rule. In general, any analytic function can be extended to the dual numbers by looking at the function's Taylor series around $a$:

$$f(z) = f(a+b\epsilon) = f(a) + f'(a)b\epsilon + \frac{f''(a)}{2!}(b\epsilon)^{2} + \dots = f(a) + bf'(a)\epsilon$$

Note: all higher-order terms disappear since $\epsilon^{2}=0$.

Applying the function $f$ on the dual number $z_{i} = v_{i} + D_{p}v_{i}\epsilon$, we see that any analytic function applied to a dual number returns another dual number:

$$f(z_{i}) = f(v_{i}+ D_{p}v_{i}\epsilon) = f(v_{i}) + f'(v_{i})D_{p}v_{i}\epsilon$$

From the above expression we can see that the dual part of the result is the dual part of $z_{i}$ multiplied by $f'(v_{i})$, **i.e. the chain rule**.
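The rules above translate almost directly into code. The class below is a minimal sketch of dual-number arithmetic, written only to illustrate this section; `Dual` and the dual-aware `sin` are hypothetical names and are not part of *autodiff_NARS*.

```python
import math

class Dual:
    """Minimal dual number a + b*eps with eps**2 == 0 (illustrative only)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real * other.real,
                    self.real * other.dual + other.real * self.dual)

    __rmul__ = __mul__

def sin(z):
    """Extend sin to dual numbers: f(a + b*eps) = f(a) + b*f'(a)*eps."""
    return Dual(math.sin(z.real), math.cos(z.real) * z.dual)

# Derivative of f(x) = x * sin(x) + 2 at x = 1.5: seed the dual part with 1
x = Dual(1.5, 1.0)
y = x * sin(x) + 2
print(y.real, y.dual)
```

The real part of `y` is the function value and the dual part is the derivative $\sin(x) + x\cos(x)$, obtained without ever writing that expression down.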






# 3. How to Use *autodiff_NARS*

### **3.1 Installation**
To install the *autodiff_NARS* package, the user should use the command `pip install -i https://test.pypi.org/simple/ autodiff-NARS`.

### **3.2 How to Use autodiff_NARS**
The *autodiff_NARS* package provides methods to perform automatic differentiation of both univariate and multivariate functions. The user should import the `autodiff` and `functions` modules. Any unary function (e.g. `sin` or `log`) that appears in the function to be differentiated must be the version imported from the `functions` module.

If the function has a one-dimensional output, it should be a callable object; if it has a multidimensional output, it should be a list of callable objects. The user should instantiate the AutoDiff class with this function. The user can then use this object to compute either the function value (`autodiff.f(x)`) or the derivative (`autodiff.df(x, method='backward')`) at input x.

If $x$ is one-dimensional then it must be a scalar (float or integer); if it is multidimensional then it must be an array or list. As shown in the examples below, the derivative can be computed using either **forward** or **reverse** mode by specifying the method type. If no seed vector is passed to `autodiff.df`, the output is the derivative in the scalar case or the Jacobian in the multivariate case. To obtain a specific directional derivative, the user can specify a seed vector, for example `autodiff.df(x, seed=[1, 0])`. In the multivariate function case, the output will be of type numpy ndarray.
<br>

### **3.3 How to interact with package**

The figure below shows the output type for the different combinations of function type (i.e. callable, list) and input type (i.e. scalar, list). In addition, we provide a demo for each scenario.


#### Figure: Output scenarios for different input and function types:
![](./images/scenario_table.jpg)


### **Scenario 1:** Single (callable) Function + Scalar Input 

#### $f: \mathbb{R} \to \mathbb{R}$


**Note:**
1. If the input $x$ is a scalar, the argument of the user-defined function $f(x)$ is not a list/array and hence should not be indexed (i.e. do not write $x[0]$).
2. The user-defined function $f(x)$ should not be placed in a list.

Demo: 

- Aim: find value and derivative of $f(x) = \log(x) + \sin(x)$ at $x=1$ using both forward and backward modes.

```Python
from autodiff_NARS import autodiff as AD
from autodiff_NARS.functions import sin, log

# user specified function 
def f(x):
    return log(x) + sin(x)

x_val = 1 # input is a scalar and not in a list

ad = AD.AutoDiff(f) # instantiate the AutoDiff class

function_value = ad.f(x=x_val) # function value at x = 1
df_forward = ad.df(x=x_val) # derivative using forward mode (default mode)
df_backward = ad.df(x=x_val, method='backward') # derivative using backward mode

print(function_value)
# > 0.8414709848078965
print(df_forward)
# > 1.5403023058681398
print(df_backward)
# > 1.5403023058681398
```

### **Scenario 2:** Single Function + Multivariate Input

#### $f: \mathbb{R}^{m} \to \mathbb{R}$

**Note:**
1. In order to find a partial derivative, the user must input a seed vector

Demo:

- Aim: find the function value, Jacobian using reverse mode, and partial derivative $\frac{\partial f}{\partial x_1}$  of $f(\boldsymbol{x})=log(x_1) + sin(x_2) + x_3$ at $x = [1, 3 , 5]$ using forward mode.

```Python
from autodiff_NARS import autodiff as AD
from autodiff_NARS.functions import sin, log

# user specified function
def f(x):
    return log(x[0]) + sin(x[1]) + x[2]

x_val = [1, 3, 5]
ad = AD.AutoDiff(f) # instantiate the AutoDiff class

function_value = ad.f(x=x_val) # function value at x=[1,3,5]
dfdx1 = ad.df(x=x_val, seed=[1, 0, 0]) # partial derivative of f wrt x1 (forward mode)
df_backward = ad.df(x=x_val, method='backward') # Jacobian using backward mode

print(function_value)
# > 5.141120008059867
print(dfdx1)
# > [1.]
print(df_backward)
# > [[ 1.        -0.9899925  1.       ]]
```

### **Scenario 3:** Multivariate Function + Scalar Input

#### $f: \mathbb{R} \to \mathbb{R}^{n}$

**Note:**
1. If the input $x$ is a single scalar, the argument of the user-defined functions is not a list/array and hence should not be indexed (i.e. do not write $x[0]$).
2. To define a multivariate function, the user-defined component functions should be placed in a list.

  
Demo:
- Aim: find the function value and Jacobian of $f(x)=\begin{bmatrix}
\log(x) + \sin(x) + x\\ 
\sinh(x)*\exp(x) - x\\ 
x*\sin(x)
\end{bmatrix}$ at $x = 3$ using both forward and backward modes.

```Python
from autodiff_NARS import autodiff as AD
from autodiff_NARS.functions import sin,log, sinh, exp

#user specified function 
def f1(x):
    return log(x) + sin(x) + x

def f2(x):
    return sinh(x) * exp(x) - x

def f3(x):
    return x * sin(x)

x_val = 3 # input is a scalar and not in a list
ad = AD.AutoDiff([f1, f2, f3]) # instantiate the AutoDiff class

function_value = ad.f(x=x_val) # function value at x=3
df_forward = ad.df(x=x_val) # Jacobian using forward mode
df_backward = ad.df(x=x_val, method='backward') # Jacobian using backward mode

print(function_value)
# > [  4.2397323  198.21439675   0.42336002]
print(df_forward)
# > [[ 3.43340837e-01]
#    [ 4.02428793e+02]
#    [-2.82885748e+00]]
print(df_backward)
# > [[ 3.43340837e-01]
#    [ 4.02428793e+02]
#    [-2.82885748e+00]]
```

### **Scenario 4:** Multivariate Function + Multivariate Input

#### $f: \mathbb{R}^{m} \to \mathbb{R}^{n}$

**Note:**
1. To define a multivariate function, the user-defined component functions should be placed in a list.
2. To find a partial derivative, the user must input a seed vector.

Demo:
- Aim: find the function value, Jacobian using reverse mode, and partial derivative $\frac{\partial f}{\partial x_1}$  of $f(\boldsymbol{x})=\begin{bmatrix}
\log(x_1) + \sin(x_2) + x_3\\ 
\sinh(x_1)*\exp(x_2) - x_3\\ 
x_1*\sin(x_2)/x_3
\end{bmatrix}$ at $x = [1,3,5]$ using forward mode.

```Python
from autodiff_NARS import autodiff as AD
from autodiff_NARS.functions import sin,log, sinh, exp

#user specified function 
def f1(x):
    return log(x[0]) + sin(x[1]) + x[2]

def f2(x):
    return sinh(x[0]) * exp(x[1]) - x[2]

def f3(x):
    return x[0] * sin(x[1]) / x[2]

x_val = [1, 3, 5]
ad = AD.AutoDiff([f1, f2, f3]) # instantiate the AutoDiff class

function_value = ad.f(x=x_val) # function value at x=[1,3,5]
dfdx1 = ad.df(x=x_val, seed=[1, 0, 0]) # partial derivative of f wrt x1 (forward mode)
df_backward = ad.df(x=x_val, method='backward') # Jacobian using backward mode

print(function_value)
# > [ 5.14112001 18.60454697  0.028224  ]
print(dfdx1)
# > [1.00000000e+00 3.09936031e+01 2.82240016e-02]
print(df_backward)
# > [[ 1.00000000e+00 -9.89992497e-01  1.00000000e+00]
#    [ 3.09936031e+01  2.36045470e+01 -1.00000000e+00]
#    [ 2.82240016e-02 -1.97998499e-01 -5.64480032e-03]]
```


### **3.4 Implementation Example:** Newton's Method

Below we demonstrate how to use the AutoDiff package with Newton's method, for both scalar and multivariate functions. We show two different implementations: in the first we call the function value method $f$ and the derivative method $df$ of the AutoDiff class directly, while in the second we make use of the implemented dunder method \_\_call\_\_.

### Implementation 1: Scalar Function 
- In this example, we use Newton's method to find a root of the scalar function $f(x) = x^{2}-3$. 
- We call the function value and derivative methods of the AutoDiff class directly. 

```Python
import numpy as np
from autodiff_NARS import autodiff as AD

# function to find the root of
def f(x):
    return x**2 -3

ad = AD.AutoDiff(f) 

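# Newton's method
def newton(x0, max_iter=10000, tol=1e-6):
    x = x0

    for i in range(max_iter):
        # Newton update: x <- x - f(x) / f'(x)
        x -= ad.f(x) / ad.df(x)

        # stop once the residual is small enough
        if np.abs(ad.f(x)) < tol:
            return x

    return False # no convergence within max_iter iterations

sol = newton(10)
print(sol)
# > 1.7320508082191834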
```

### Implementation 2: Multivariate Function
- In this example, we use Newton's method to find a root of the function $f(\boldsymbol{x})= \begin{bmatrix}
x_1^{2}x_2^{3} -x_1x_2^{3} -1\\ 
x_1^{3} - x_1x_2^{3}-4
\end{bmatrix}$
- In addition, we make use of the implemented dunder method, \_\_call\_\_, which returns the function value and Jacobian (for a multivariate function). Note that in this example, we used the default method, 'forward', however the user can specify the method as 'backward' if they choose to calculate the Jacobian using reverse mode.


```Python
import numpy as np
from autodiff_NARS import autodiff as AD

def f1(x):
    return (x[0]**2)*(x[1]**3) - x[0]*x[1]**3-1

def f2(x):
    return (x[0]**3) - x[0]*x[1]**3-4


ad = AD.AutoDiff([f1,f2])

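def newton(x0, max_iter=10000, tol=1e-6):
    x = x0

    for i in range(max_iter):
        # __call__ returns the function value and the Jacobian at x
        f, df = ad(x)

        # Newton update: x <- x - J^{-1} f(x)
        x -= np.linalg.multi_dot([np.linalg.inv(df), f])

        # stop once the residual norm is small enough
        if np.linalg.norm(ad.f(x)) < tol:
            return x

    return False # no convergence within max_iter iterations

sol = newton([1, 1])
print(sol)
# > [1.7476236  0.91472321]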
```

# 4. Software Organization

### **4.1 Directory Structure**

The directory has the following structure:

![](./images/root_structure.png)

### **4.2 Module Overview**

- **\_\_init__.py**:   
    <br>    
    - Imports all of the necessary modules including dependencies of the autodiff (numpy, random, copy, Callable, AutoDiff and Node) and the unary functions.  
<br>
- **autodiff.py**: Contains the AutoDiff & Node class.  
<br>  
    1. **AutoDiff Class**  
        <br>
        - The AutoDiff class is the main interface between the user and the package. 
        - The class takes in a function defined by the user (either scalar or multivariate).
        - It provides the user the functionality to compute function evaluation and the derivative (Jacobian in multivariate case) using either forward or reverse mode. The default method is set to forward and returns the Jacobian if a seed is not specified.  
    <br>
    2. **Node Class**  
        <br>
        - The Node class is the main data structure. It stores the information of a specific node in the computational graph, including name, value, child, parents, and adjoint. 
        - In addition, the Node class overloads the binary arithmetic operations.  
<br>         
- **functions.py**: Contains defined unary functions for a **node**, **float** or **int**  
    <br>
    - The following functions are included: sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, sqrt, exp (any base), log (any base), sigmoid
    - A valid input to the functions include an object of the Node class, float or int  
<br>           
- **test_autodiff.py**:  
<br>    
    - Includes tests covering the AutoDiff and Node classes   
<br>   
- **test_functions.py**:    
<br>      
    - Includes tests covering all the unary functions

### **4.3 Test Suite**
The test suite lives in the *tests* folder in the root directory. The folder contains Python files that test the AutoDiff and Node classes in the autodiff module, as well as the functions module. For the functions module, the unary functions are tested individually and in combination with each other. Using GitHub Actions, the tests run automatically in a container whenever a new commit is pushed to the project. The tests can also be executed using the test harness script, which is run with the command `./tests/run_tests.sh -argument` where argument is either 'coverage' or 'test'.

### **4.4 Package Distribution**
The package is distributed using PyPI with PEP 517/518. We have added a pyproject.toml file to our project, which enables us to build our package using a PEP 517 builder and distribute it using PyPI. Developers can install our package using `pip install -i https://test.pypi.org/simple/ autodiff-NARS`

# 5. Implementation Details

### **5.1 Core Data Structures, Classes and Attributes**
  
The table below lists all of the classes required to implement our package, with their respective methods and attributes. The core data structure is the Node, which is used for both forward and reverse mode.   
  
| **Class**      | **AutoDiff**               | **Node**                                                                                      |  
|----------------|----------------------------|------------------------------------------------------------------------------------------------------|  
| **Attributes** &emsp;&emsp;| x_dim, f_dim, function, input_nodes, output_nodes       | name, value, child, parent, for_deriv, back_deriv, adjoint                                                                                            |
| **Methods**    | f, df, \_forward, \_backward, \_\_call\_\_ &emsp;&emsp;| \_\_add\_\_, \_\_radd\_\_, \_\_mul\_\_, \_\_rmul\_\_, \_\_truediv\_\_, \_\_rtruediv\_\_, \_\_pow\_\_, \_\_rpow\_\_, \_\_sub\_\_, \_\_rsub\_\_, \_new_name|   
  
<br>  
As mentioned earlier, the Node class overloads the binary arithmetic operations. Unary functions such as 'sin' or 'log' are defined in the functions.py file for the Node data structure. These functions work for both forward and reverse mode of AD, because both rely on the Node data structure.   
  
We do not use dual numbers. Dual numbers as the primary data structure can only be used in forward mode, not in reverse mode. To avoid redundant code and the definition of two different data structures, we instead implemented the Node data structure, which can be used for both forward and reverse mode. 

We would like our package to be able to compute the derivative of both scalar and vector functions that can depend on both scalar and vector parameters. To ensure this, we have two requirements for how the user defines the function. The first is that the input of the function must have the same size as the number of parameters (m). The second is that a function with several outputs must be passed to AutoDiff as a list of functions whose length equals the number of outputs (n); a function with a single output is passed as a single callable.      
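To make the idea of a node-based data structure with overloaded operators more concrete, the sketch below shows a heavily simplified, forward-mode-only stand-in. It is an assumption-laden illustration rather than the package's actual code: `SketchNode`, `sin` and `jacobian` are hypothetical names, and only the attribute name `for_deriv` is borrowed from the table above.

```python
import math
import numpy as np

class SketchNode:
    """Hypothetical stand-in for a graph node: a value plus a forward derivative."""

    def __init__(self, value, for_deriv=0.0):
        self.value = value
        self.for_deriv = for_deriv

    def __add__(self, other):
        other = other if isinstance(other, SketchNode) else SketchNode(other)
        return SketchNode(self.value + other.value,
                          self.for_deriv + other.for_deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, SketchNode) else SketchNode(other)
        return SketchNode(self.value * other.value,
                          self.value * other.for_deriv + other.value * self.for_deriv)

    __rmul__ = __mul__

def sin(x):
    """Unary function that accepts either a SketchNode or a plain number."""
    if isinstance(x, SketchNode):
        return SketchNode(math.sin(x.value), math.cos(x.value) * x.for_deriv)
    return math.sin(x)

def jacobian(functions, x):
    """Forward-mode Jacobian of n callables at an m-dimensional point x."""
    jac = np.zeros((len(functions), len(x)))
    for j in range(len(x)):                      # one pass per input direction
        nodes = [SketchNode(xi, float(i == j)) for i, xi in enumerate(x)]
        for i, f in enumerate(functions):
            jac[i, j] = f(nodes).for_deriv
    return jac

J = jacobian([lambda x: x[0] * x[1], lambda x: sin(x[0]) + x[1]], [2.0, 3.0])
print(J)
```

A list of n callables evaluated at an input of size m yields an n-by-m Jacobian here, which mirrors the two requirements on the user-defined function described above.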
  

### **5.2 External Dependencies**

At this stage we depend on the NumPy library and pytest-cov. We use NumPy to define the unary functions, to ensure that the functions are evaluated efficiently and accurately. pytest-cov is used to create test coverage reports. An intentional decision was made not to depend on other libraries, so that the package has lightweight requirements. 

# 6. Extension: Reverse Mode

We have implemented **Reverse Mode** functionality. In **Reverse Mode**, the chain rule is not applied explicitly during the forward pass. Instead, the partial derivatives of each node with respect to its parent node(s) are calculated in the forward pass, and the chain rule is reconstructed in the reverse pass. The chain rule is "built up" by traversing the computational graph backwards, which recovers the partial derivatives of the $i$-th output $f_i$ with respect to the $m$ input variables $v_{j-m}$ with $j=1,2,..,m$. For each node $v_{i}$ the goal is to calculate the adjoint $\bar{v}_{i}$ by summing over all its children, i.e.

$$\bar{v}_{i} = \frac{\partial f}{\partial v_{i}} = \sum_{j \text{ a child of i } } \frac{\partial f}{\partial v_{j}}\frac{\partial v_{j}}{\partial v_{i}}$$


Consequently, to implement **Reverse Mode** we keep track of a node's parents, its children, its partial derivatives with respect to its parent node(s), and its adjoint. All of these are stored as attributes of the Node class. Further, the unary functions are constructed to return a dictionary of partial derivatives with respect to the node's parent node(s). Reverse mode is selected with `ad.df(x, method='backward')`. Examples of how we expect the user to use this functionality can be seen in the demos in Section 3.3.
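The adjoint formula can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the implementation in autodiff.py: `RevNode`, `topological_order` and `backward` are hypothetical names, each node stores its local partial derivatives in a tuple rather than a dictionary, and only addition, multiplication and `sin` are supported.

```python
import math

class RevNode:
    """Hypothetical node storing a value, its parents with local partials, and an adjoint."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # tuples of (parent_node, d(self)/d(parent))
        self.adjoint = 0.0

    def __add__(self, other):
        return RevNode(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return RevNode(self.value * other.value,
                       ((self, other.value), (other, self.value)))

def sin(node):
    return RevNode(math.sin(node.value), ((node, math.cos(node.value)),))

def topological_order(output):
    """Return nodes so that every node appears after all of its parents."""
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)
    return order

def backward(output):
    """Reverse pass: accumulate adjoints from the output back to the inputs."""
    output.adjoint = 1.0
    for node in reversed(topological_order(output)):
        for parent, local_partial in node.parents:
            parent.adjoint += node.adjoint * local_partial

# f(x1, x2) = sin(x1 * x2) + x2 at (2, 3)
x1, x2 = RevNode(2.0), RevNode(3.0)
f = sin(x1 * x2) + x2
backward(f)
print(x1.adjoint, x2.adjoint)
```

Processing the nodes in reverse topological order guarantees that a node's adjoint is complete (all of its children have contributed) before it is passed on to its parents, so a single reverse pass yields the whole gradient.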

# 7. Broader Impact and Inclusivity Statement

### **7.1 Broader Impact**
Our software enables a user to efficiently compute the derivative of a complex function. As derivatives are a cornerstone of many computational fields, the uses of our package are widespread. There is a risk that the package could be used to negatively impact society. For example, one can use AD to find the optimal adversarial perturbation to a street sign to confuse self-driving vehicles, or to optimize the energy efficiency of nuclear weapons. There are many other examples of misuse that would result in harm to human life. There are also examples of how the user may use the code to contribute positively to society, for example by training a neural network that leads to the discovery of a beneficial drug, or by optimizing a chemical process to reduce carbon emissions. We are uncertain of what users of our package will build with it. We will stay informed about who is using our code, and for what purposes, by monitoring pulls from GitHub. If we see unethical actions, we will report them, and in extreme cases remove our package from PyPI. 

### **7.2 Inclusivity**
We as a team encourage and highly value code contributions from all coders, whatever their background. Examples of contributions that we value include translation of documentation into other languages, help with outreach and onboarding of new contributors, and development of tutorials and other educational material. We do, however, acknowledge that various barriers exist for underrepresented groups in accessing the code base and contributing towards it. The main barrier we identify is language. Our package is solely documented in English. However, we believe that anyone willing to put in the extra work of running our explanations through an internet-based translator should get a good understanding of our software.

We are in the initial stages of the project, but plan to translate the documentation into other languages and to develop more in-depth tutorials on how to use the code, to improve accessibility. Currently, pull requests are categorized and then assigned to the team member responsible for the area, for example testing by Sebastian. New pull requests are checked weekly, reviewed, and either approved or denied. If a pull request is denied, the team member may recommend adjustments to it. This process provides a defense against harmful changes to the package. It is also a chance to interact with developers willing to contribute to our package. To remove bias and increase fairness, we will try to keep pull requests anonymous. 

# 8. Future Extensions/Applications

### **8.1 Future**
At a high level, the next extension we would like to make is to implement various optimization methods that use derivatives. Our package could then be used in a variety of scientific projects for the optimization of some objective function. Examples of optimization routines that we would like to implement are Newton’s method and gradient descent. Since we aim for our package to be used in research, it is essential that we also improve the efficiency of the package. One way we plan to do this is to merge computational graphs, to prevent the same node from being computed multiple times. Since machine learning is a rapidly growing area of research in which AD plays an integral role in backpropagation during the training of neural networks, we would like to create tutorials that showcase this application.