## Implementation

The implemented ANFIS is based on a Takagi-Sugeno-Kang (TSK) inference system, that is, the rules have the form:

\begin{equation}
R_i: \text{if } x_1 \text{ is } X^{(i)}_1 \text{ and } \cdots \text{ and } x_n \text{ is } X^{(i)}_n, \text{ then } y_i = a^{(i)}_0 + \sum_{j=1}^n a^{(i)}_jx_j.
\end{equation}

The standard model starts with all possible rules of the form shown above, although there is an option to supply initial rules. That is, given $n$ input variables, where the $j$-th variable has $m_j$ fuzzy sets defined on its universe of discourse, the total number of initial rules is the product $m_1 m_2 \cdots m_n$ of the numbers of fuzzy sets. The model then selects the rules according to their activation strength.
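The rule count described above can be checked with a short sketch. The variable names and set counts below are hypothetical, and `enumerate_rules` is an illustrative helper, not a function of this project; it only shows that taking every combination of fuzzy sets yields the product of the set counts.

```python
from itertools import product
from math import prod

# Hypothetical number of fuzzy sets per input variable (illustration only).
n_sets_per_input = {"temperature": 3, "humidity": 2}


def enumerate_rules(n_sets):
    """Return every antecedent combination as a tuple of fuzzy-set indices."""
    return list(product(*(range(k) for k in n_sets.values())))


rules = enumerate_rules(n_sets_per_input)
n_rules = len(rules)

# The number of enumerated rules equals the product of the set counts.
print(n_rules, prod(n_sets_per_input.values()))
```

Each tuple holds one fuzzy-set index per input variable, in the order the variables were declared.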

The input and output variables of the system are described in a dictionary called `VARIABLES`. It has two keys:
- `'inputs'`
- `'outputs'`

The value of each key pairs every variable with its information, where the information of each variable is itself given as a dictionary. The required information is:
- `'n_sets'`: integer containing the number of fuzzy sets defined for that variable;
- `'terms'`: list of strings representing the linguistic terms for each fuzzy set of the variable (in ascending order, e.g., low, medium and high);
- `'universe_of_discourse'`: list representing the limits of the interval where the variable is defined.

The dictionary `VARIABLES` is given in the example.
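As a rough illustration of the layout described above (not the project's own example), the dictionary might look like the sketch below. The variable names, terms, and ranges are hypothetical, and the sketch assumes that each of `'inputs'` and `'outputs'` maps a variable name to its information dictionary. `check_variables` is an illustrative helper that only verifies the three documented fields agree with each other.

```python
# Hypothetical variable names, terms, and ranges, chosen only for illustration.
VARIABLES = {
    "inputs": {
        "temperature": {
            "n_sets": 3,
            "terms": ["low", "medium", "high"],
            "universe_of_discourse": [0.0, 40.0],
        },
        "humidity": {
            "n_sets": 2,
            "terms": ["dry", "humid"],
            "universe_of_discourse": [0.0, 100.0],
        },
    },
    "outputs": {
        "fan_speed": {
            "n_sets": 3,
            "terms": ["slow", "medium", "fast"],
            "universe_of_discourse": [0.0, 1.0],
        },
    },
}


def check_variables(variables):
    """Raise ValueError if an entry contradicts the documented fields."""
    for group in ("inputs", "outputs"):
        for name, info in variables[group].items():
            # One linguistic term is expected per fuzzy set.
            if len(info["terms"]) != info["n_sets"]:
                raise ValueError(f"{name}: expected {info['n_sets']} terms")
            # The universe of discourse is a [lower, upper] interval.
            low, high = info["universe_of_discourse"]
            if not low < high:
                raise ValueError(f"{name}: empty universe of discourse")
    return True


check_variables(VARIABLES)
```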

The weights are updated after each training sample the model predicts. The algorithm used is fixed-step gradient descent, and the error function used for the gradient calculation is the squared error of the current sample, scaled by $\frac{1}{2}$, that is,

\begin{equation}
E = \frac{1}{2}(\hat y - y)^2,
\end{equation}

where $\hat y$ is the predicted value and $y$ is the actual value. For a TSK fuzzy rule-based system (FRBS), the prediction is

\begin{equation}
\hat y = \frac{\sum_i w_iy_i}{\sum_i w_i},
\end{equation}

where $w_i$ is the firing strength of the $i$-th rule and $y_i$ is the output function of the $i$-th rule. By convention, every output function has the form

\begin{equation}
y_i = b_i + \sum_{j=1}^n a^{(i)}_jx_j,
\end{equation}

where $b_i$ (the constant term $a^{(i)}_0$ of the rule definition) and $a^{(i)}_j$ are the coefficients of the $i$-th rule, and $x_j$ is the $j$-th input variable. Since $\partial E / \partial \hat y = \hat y - y$ and $\partial \hat y / \partial y_i = \overline{w}_i$, the chain rule gives the partial derivatives of $E$ with respect to these coefficients:

\begin{equation}
\frac{\partial E}{\partial b_i} = (\hat y - y) \overline{w}_i \text{ and } \frac{\partial E}{\partial a^{(i)}_j} = (\hat y - y) \overline{w}_i x_j,
\end{equation}

where $\overline{w}_i = w_i / \sum_k w_k$ is the normalized firing strength.
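The prediction and the update implied by the derivatives above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the project's code: the function names, the learning rate `lr`, and the array layout (one row of coefficients per rule, with column 0 holding $b_i$) are all hypothetical, and the firing strengths are taken as already computed.

```python
import numpy as np


def tsk_output(w, coeffs, x):
    """Return the prediction and the normalized firing strengths.

    w: (n_rules,) firing strengths.
    coeffs: (n_rules, n + 1) float array, column 0 is b_i (assumed layout).
    x: (n,) input values.
    """
    y_rules = coeffs[:, 0] + coeffs[:, 1:] @ x  # y_i for every rule
    w_bar = w / w.sum()                         # normalized firing strengths
    return float(w_bar @ y_rules), w_bar


def sgd_step(w, coeffs, x, y, lr=0.01):
    """One fixed-step gradient descent update on a single sample."""
    y_hat, w_bar = tsk_output(w, coeffs, x)
    err = y_hat - y
    grad = np.empty_like(coeffs)
    grad[:, 0] = err * w_bar                    # dE/db_i
    grad[:, 1:] = err * np.outer(w_bar, x)      # dE/da_j for rule i
    return coeffs - lr * grad, grad


# Hypothetical values: three rules, two inputs.
w = np.array([0.2, 0.6, 0.2])
coeffs = np.array([[1.0, 0.5, -0.5],
                   [0.0, 1.0, 1.0],
                   [2.0, 0.0, 0.5]])
x = np.array([1.0, 2.0])
y = 1.5

y_hat, _ = tsk_output(w, coeffs, x)
new_coeffs, grad = sgd_step(w, coeffs, x, y, lr=0.1)
new_y_hat, _ = tsk_output(w, new_coeffs, x)
print(abs(y_hat - y), abs(new_y_hat - y))  # error before and after the step
```

Only the consequent coefficients are updated in this sketch; the firing strengths `w` are held fixed during the step.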