<h1 align="center">Numerical Computation</h1>

<h3 align="center">Basics of order theory</h3>

Let "$\leq$" be a binary relation over a set $X$, defined by some set $X_{\leq} \subset X \times X$. 
<br>
If $(x,y) \in X_{\leq}$, then we say that $x$ is **related** to $y$, and denote $x \leq y$.

$\textbf{Definition}$. A **partial order** is a binary relation "$\leq$" over a set $X$ with the following properties:

&emsp; 1. order is **reflexive**: $x \leq x$, for all $x\in X$,
<br> &emsp; &emsp; i.e.  for all $x \in X$ we have $(x,x) \in X_{\leq}$;
<br> &emsp; 2. order is **transitive**: if $x \leq y$ and $y \leq z$, then $x \leq z$,
<br> &emsp; &emsp; i.e. if $(x,y) \in X_{\leq}$ and $(y,z) \in X_{\leq}$, then $(x,z) \in X_{\leq}$;
<br> &emsp; 3. order is **anti-symmetric**: if $x \leq y$ and $y \leq x$, then $x = y$,
<br> &emsp; &emsp; i.e. if $(x,y) \in X_{\leq}$ and $(y,x) \in X_{\leq}$, then $x = y$.


$\textbf{Definition}$. A **linear order** is a partial order under which **every pair of elements is comparable**, i.e.:

&emsp; 4. order is **connex** (total): either $x \leq y$ or $y \leq x$, for all $x, y \in X$,
<br> &emsp; &emsp; i.e. if $x,y \in X$, then $(x,y) \in X_{\leq}$ or $(y,x) \in X_{\leq}$.
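The four axioms can be checked mechanically on a small finite set. The following is a minimal sketch, assuming the relation is stored as a Python set of pairs playing the role of $X_{\leq}$; the helper names (`is_reflexive`, `is_connex`, etc.) and the two example relations are illustrative choices, not part of the notes.

```python
from itertools import product


def is_reflexive(elements, rel):
    # Axiom 1: (x, x) belongs to the relation for every x.
    return all((x, x) in rel for x in elements)


def is_transitive(rel):
    # Axiom 2: (x, y) and (y, z) in the relation imply (x, z) in the relation.
    return all((x, z) in rel
               for (x, y), (y2, z) in product(rel, rel) if y == y2)


def is_antisymmetric(rel):
    # Axiom 3: (x, y) and (y, x) in the relation imply x == y.
    return all(x == y for (x, y) in rel if (y, x) in rel)


def is_connex(elements, rel):
    # Axiom 4: every pair of elements is comparable.
    return all((x, y) in rel or (y, x) in rel
               for x, y in product(elements, elements))


X = {1, 2, 3, 6}
# "x divides y": a partial order in which 2 and 3 are not comparable.
divides = {(x, y) for x, y in product(X, X) if y % x == 0}
# The usual order on integers: a linear order.
usual = {(x, y) for x, y in product(X, X) if x <= y}

for name, rel in (("divides", divides), ("usual", usual)):
    print(name,
          is_reflexive(X, rel), is_transitive(rel),
          is_antisymmetric(rel), is_connex(X, rel))
```

Under these assumptions, divisibility passes axioms 1 to 3 but fails axiom 4, so it is a partial order that is not linear, while the usual order passes all four.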




<h3 align="center">Ordered field</h3>

Let set $X$ be a field, i.e. $X \equiv \mathcal{F}$.

$\textbf{Definition}$. A field $\mathcal{F}$ is called an **ordered field** if a **linear order** is defined for its elements, and this order is **consistent with the addition and multiplication** operations, i.e.:


&emsp; 5. order is consistent with **addition**: if $x \leq y$, then for any $z$ we have: $x + z \leq y + z$,
<br> &emsp; &emsp; i.e. if $(x,y) \in X_{\leq}$ and $z \in X$, then $(x+z,y+z) \in X_{\leq}$;

&emsp; 6. order is consistent with **multiplication**: if $0 \leq x$ and $0 \leq y$, then $0\leq x\cdot y$,
<br> &emsp; &emsp; i.e. if $x,y \in X$ and $(0,x) \in X_{\leq}$ and $(0,y) \in X_{\leq}$, then $(0, x \cdot y)\in X_{\leq}$.

In other words, a field $\mathcal{F}$ is an **ordered field** if all $6$ axioms are satisfied.

<h3 align="center">Examples 1</h3>

Let's consider the field $Z_p$, where $p$ is a prime number and $Z_p$ is the set of all non-negative integers less than $p$:

$$Z_p = \{0, 1, 2, ..., p-1\}.$$

The sum and product in $Z_p$ are defined in the following way:

&emsp; $\bullet$ $\alpha + \beta$ is the least non-negative remainder obtained by dividing the (ordinary) sum by $p$;
<br> &emsp; $\bullet$ $\alpha \cdot \beta$ is the least non-negative remainder obtained by dividing the (ordinary) product by $p$.

Is the field $Z_p$ an **ordered field** in the "natural" sense?

$\textbf{Solution}$. 

Let's consider $Z_3$ and assume that it's an **ordered field**. The "natural" order says that $0 \leq 1$, $1 \leq 2$ and $0 \leq 2$.
<br>
If, for instance, we take $x = 1$, $y=2$ and $z = 1$, then from the axiom of **consistency with addition** $(5)$ 
<br> we have: $1 + 1 \leq 2 + 1$, i.e. $2 \leq 0$, since $1 + 1 = 2$ and $2 + 1 = 0$ in $Z_3$. Together with $0 \leq 2$, anti-symmetry $(3)$ would give $0 = 2$, which is false. So $Z_3$ is not an ordered field under the natural order; the same argument with $x = p-2$, $y = p-1$ and $z = 1$ works for any prime $p$.
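The counterexample can also be found by brute force. The sketch below assumes the "natural" order is the usual comparison of the representatives $0, 1, \dots, p-1$; the function name `violations_of_addition_axiom` is made up for this illustration.

```python
def violations_of_addition_axiom(p):
    """Return all triples (x, y, z) in Z_p with x <= y but x + z > y + z,
    where sums are reduced modulo p and compared as ordinary integers."""
    return [(x, y, z)
            for x in range(p)
            for y in range(p)
            for z in range(p)
            if x <= y and (x + z) % p > (y + z) % p]


for p in (2, 3, 5, 7):
    bad = violations_of_addition_axiom(p)
    # Any non-empty list means axiom (5) fails for the natural order on Z_p.
    print(p, len(bad))
```

For $p = 3$ the list contains the triple $(1, 2, 1)$ used in the solution above.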

<h3 align="center">Examples 2</h3>

Is the complex field $\mathbb{C}$ an **ordered field**?

$\textbf{Solution}$. 

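Let's assume that the complex field $\mathbb{C}$ is an **ordered field**.
By connexity $(4)$, for any $x$ either $0 \leq x$ or $x \leq 0$; in the second case, adding $-x$ to both sides gives $0 \leq -x$ by axiom $(5)$.
<br> From the axiom of **consistency with multiplication** $(6)$, applied to $x$ or to $-x$, we get $0 \leq x \cdot x$, i.e. the square of any element is non-negative. 
<br>In particular, $0 \leq 1^2 = 1$ and $0 \leq i^2 = -1$. Adding $1$ to both sides of $0 \leq -1$ gives $1 \leq 0$, so $1 = 0$ by anti-symmetry $(3)$, which is a contradiction. Hence $\mathbb{C}$ cannot be made into an **ordered field**.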

<h3 align="center">Normed vector spaces</h3>

$\textbf{Definition}$. A **norm** on a vector space $\mathcal{V}$ over an ordered field $\mathcal{F}$ is a function $\left \| \text{ . } \right \|: \mathcal{V} \to \mathcal{F}$ with the following properties:

&emsp; &ensp; $\bullet$ $\left \| x \right \| \geq 0$, for all $x \in \mathcal{V}$;
<br>
&emsp; &ensp; $\bullet$ $\left \| x \right \| = 0$ implies that $x = 0$;
<br>
&emsp; &ensp; $\bullet$ $\left \| \alpha x \right \| = |\alpha| \cdot \left \| x \right \|$, for all $x \in \mathcal{V}$ and $\alpha \in \mathcal{F}$;
<br>
&emsp; &ensp; $\bullet$ $\left \| x + y \right \| \leq \left \| x \right \| + \left \| y \right \|$, for all $x,y \in \mathcal{V}$.

In other words, the definition states that:
<br>
&emsp; &ensp; $\bullet$ norm is **nonnegative**;
<br>
&emsp; &ensp; $\bullet$ norm is **strictly positive**;
<br>
&emsp; &ensp; $\bullet$ norm is **homogeneous**;
<br>
&emsp; &ensp; $\bullet$ norm satisfies the **triangle inequality**.


$\textbf{Definition}$. A **normed vector space** $(\mathcal{V}, \left \| \text{ . } \right \|)$ is a vector space $\mathcal{V}$ equipped with a norm $\left \| \text{ . } \right \|$.

<h3 align="center">Examples of Norm</h3>

Let $\mathcal{V} = \mathbb{R}^n$ be a vector space over the ordered field $\mathbb{R}$, and let $p \geq 1$ be a real number.
<br>
The $p$-**norm** of a vector $x = (\xi_1, ..., \xi_n)$, also known as the **Hölder norm**, is defined as:
$$\left \| x \right \|_{p} = \left ( \sum_{i=1}^{n} |\xi_i|^p \right )^{\frac{1}{p}}.$$

&emsp; &ensp; $\bullet$ When $p=1$, we get the $L_1$ norm, also known as the **Manhattan norm** or **Taxicab norm**:
$$\left \| x \right \|_{1} = \sum_{i=1}^{n} |\xi_i|.$$
&emsp; &ensp; $\bullet$ When $p=2$, we get the $L_2$ norm, also known as the **Euclidean norm**: 
$$\left \| x \right \|_{2} = \sqrt{ \sum_{i=1}^{n} |\xi_i|^2}.$$
&emsp; &ensp; $\bullet$ In the limit $p \to +\infty$, we get the $L_\infty$ norm, also known as the **infinity norm** or **maximum norm**: 

$$\left \| x \right \|_{\infty} = \max_{i} |\xi_i|.$$
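The three special cases can be computed with one small function. This is a minimal sketch using only the Python standard library; the name `p_norm` and the convention of passing `math.inf` for the maximum norm are assumptions made for the illustration.

```python
import math


def p_norm(x, p):
    """p-norm of a vector given as a sequence of real numbers, for p >= 1.

    Pass p = math.inf to get the maximum norm.
    """
    if p == math.inf:
        return max(abs(xi) for xi in x)
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = [3.0, -4.0]
print(p_norm(x, 1))         # Taxicab norm: |3| + |-4|
print(p_norm(x, 2))         # Euclidean norm: sqrt(9 + 16)
print(p_norm(x, math.inf))  # Maximum norm: max(|3|, |-4|)
print(p_norm(x, 100))       # Large p is already close to the maximum norm
```

For the vector $(3, -4)$ the three norms are $7$, $5$ and $4$, and the value for $p = 100$ illustrates how the $p$-norm approaches the maximum norm as $p$ grows.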




<h3 align="center">Visualisation of Norm</h3>

A visual representation of these norms is provided by the set of elements $x \in \mathbb{R}^n$ for which $\left \| x \right \| = 1$, the so-called **unit sphere**.
In the planar case, i.e. for $\mathbb{R}^2$, the unit sphere for different norms is shown below:

<img src="images/RM_Norm.png" alt="Example" />

<h3 align="center">Exercises 14.1</h3>

Prove the norm properties of the **Taxicab**, **Euclidean** and **Maximum** norms.

<h3 align="center">Operator norm</h3>

Let $(\mathcal{U}, \left \| \text{ . } \right \|_u)$ and $(\mathcal{V}, \left \| \text{ . } \right \|_v)$ be normed vector spaces over an ordered field $\mathcal{F}$.

$\textbf{Definition}$. A linear transformation $A: \mathcal{U} \to \mathcal{V}$ is called a **bounded transformation** if and only if there exists a scalar $c\in \mathcal{F}$ such that:

$$\left \| A x \right \|_v \leq c \cdot \left \| x \right \|_u,$$
for all $x\in \mathcal{U}$. 

$\textbf{Definition}$. The **norm** of a bounded transformation $A: \mathcal{U} \to \mathcal{V}$ is the scalar $\left \| A \right \| \in \mathcal{F}$ defined by:

$$\left \| A \right \| = \sup \left \{\left \| Ax \right \|_v: x\in \mathcal{U} \text{ and } \left \| x \right \|_u =1 \right \} = \sup \left \{ \frac{\left \| Ax \right \|_v}{ \left \| x \right \|_u }: x\in \mathcal{U} \text { and } x \neq 0 \right \}.$$

$\textbf{Statement}$. If $\mathcal{U}$ is finite-dimensional, then every linear transformation $A: \mathcal{U} \to \mathcal{V}$ is a bounded transformation, i.e. it has a norm (for $\mathcal{F} = \mathbb{R}$ the supremum over the unit sphere is then attained, i.e. it is a maximum).


<h3 align="center">Matrix norms</h3>

$\bullet$ **Matrix norms induced by transformation norms**:
<br> &ensp;
These norms treat an $m \times n$ matrix $A$ as a linear transformation from an $n$-dimensional vector space $\mathcal{U}$ to 
<br> &ensp;an $m$-dimensional vector space $\mathcal{V}$, and define the matrix norm as the corresponding transformation norm.

&ensp; For example, if the $p$-norm for vectors is used for both vector spaces, then the matrix norm is defined as:

$$\left \| A \right \|_p = \sup_{x\neq 0} \frac{\left \| Ax \right \|_p}{\left \| x \right \|_p}.$$

$\bullet$ **Entrywise matrix norms**:
<br> &ensp;
These norms treat an $m \times n$ matrix $A$ as a vector of size $m \cdot n$, and one of the familiar vector norms is used.

&ensp; For example, using the $p$-norm for vectors, we get:
$$\left \| A \right \|_p = \left ( \sum_{i=1}^{m} \sum_{j=1}^{n} |\alpha_{ij}|^p \right )^{1/p}.$$


$\bullet$ **Schatten matrix norms**:
<br> &ensp; 
The **Schatten** $p$-**norms** arise when applying the $p$-norm to the vector of **singular values** of a matrix $A$.
<br> &ensp;
If the singular values of the matrix $A$ are denoted by $\sigma_i$, then the **Schatten** $p$-norm is defined by:

$$\left \| A \right \|_p = \left ( \sum_{i=1}^{\min\{m,n\}} \sigma_i^p(A) \right )^{1/p}.$$

<h3 align="center">Matrix norms induced by transformation norms</h3>

In the special cases of  $ p = 1, 2, \infty$, the **induced matrix norms** can be computed or estimated by:

$\bullet$ $p = 1$:
$$\left \| A \right \|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^{m} |\alpha_{ij}|,$$
&ensp; which is called **L1 norm** and is simply the maximum absolute column sum of the matrix $A$;

$\bullet$ $p = 2$:
$$\left \| A \right \|_2 = \max_{i} \sigma_{i} = \sigma_{max}(A),$$
&ensp; where $\sigma_{i}$ are the **singular values** of the matrix $A$.
<br>
&ensp; This norm is called **L2 norm**, or **Spectral norm**, of the matrix $A$.

$\bullet$ $p = \infty$:
$$\left \| A \right \|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |\alpha_{ij}|,$$
&ensp; which is called the **L$_\infty$ norm**, or **Infinity norm**, and is simply the maximum absolute row sum of the matrix $A$.



<h3 align="center">Entrywise matrix norms</h3>

In the special cases of  $ p = 1, 2, \infty$, the **entrywise matrix norms** can be computed or estimated by:

$\bullet$ $p = 1$:
$$\left \| A \right \|_{sum} = \sum_{i=1}^{m} \sum_{j=1}^{n} |\alpha_{ij}|,$$
&ensp; which is called the **L$_{sum}$ norm** and is simply the sum of the absolute values of all elements of the matrix $A$;

$\bullet$ $p = 2$:
$$\left \| A \right \|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |\alpha_{ij}|^2},$$
&ensp; which is called the **Frobenius norm** or the **Hilbert–Schmidt norm** of the matrix $A$;

$\bullet$ $p = \infty$:
$$\left \| A \right \|_{max}  = \max_{i,j} |\alpha_{ij}|,$$
&ensp; which is called the **Maximum norm** and is simply the maximum absolute value of an element of the matrix $A$.



<h3 align="center">Schatten matrix norms</h3>

In the special cases of  $ p = 1, 2, \infty$, the **Schatten matrix norm** can be computed or estimated by:

$\bullet$ $p = 1$:
$$\left \| A \right \|_* = \sum_{i=1}^{\min\{m,n\}} \sigma_i(A) = \operatorname{Tr}(\sqrt{A^*A}),$$
&ensp; 
which is called the **trace norm**, or **nuclear norm**, of the matrix $A$;


$\bullet$ $p = 2$:
$$\left \| A \right \|_F = \sqrt{ \sum_{i=1}^{\min\{m,n\}} \sigma_i^2(A) } = \sqrt{\operatorname{Tr}(A^*A)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |\alpha_{ij}|^2},$$
&ensp; 
which yields the **Frobenius norm** of the matrix $A$.


$\bullet$ $p = \infty$:
$$\left \| A \right \|_2 = \sigma_{max} (A)$$

which yields the **L2 norm**, or **Spectral norm**, of the matrix $A$.



<h3 align="center">Examples of norm equivalence</h3>

Let $A$ be an $m \times n$ matrix of rank $r$, then the following inequalities hold:

- $\left \| A \right \|_1 \leq \sqrt{m}\left \| A \right \|_2 \leq  \sqrt{mn} \left \| A \right \|_1;$

- $\left \| A \right \|_2 \leq \left \| A \right \|_F \leq  \sqrt{r} \left \| A \right \|_2;$

- $\left \| A \right \|_\infty \leq \sqrt{n} \left \| A \right \|_2 \leq  \sqrt{mn} \left \| A \right \|_\infty;$

- $\left \| A \right \|_F \leq \left \| A \right \|_* \leq  \sqrt{r} \left \| A \right \|_F;$

- $\left \| A \right \|_{max} \leq \left \| A \right \|_2 \leq  \sqrt{mn} \left \| A \right \|_{max};$

Another useful inequality between matrix norms is:

- $\left \| A \right \|_2 \leq \sqrt{\left \| A \right \|_1 \left \| A \right \|_\infty};$

<h3 align="center">Examples</h3>

For example, for:
$$A = \begin{bmatrix}
5  &  6 & 3  \\
-1 &  0 & 1  \\
1  &  3 & -1 \\
\end{bmatrix},$$
we have:

$\bullet$ $\left \| A \right \|_1 = \max(5 + |-1| + 1; 6 + 0 + 3; 3 + 1 + |-1|) = \max(7; 9; 5) = 9;$
<br>$\bullet$ $\left \| A \right \|_2 = \max(8.73; 2.32; 1.19) = 8.73$ (the singular values of $A$, rounded to two decimals);
<br>$\bullet$ $\left \| A \right \|_\infty = \max(5 + 6 + 3; |-1| + 0 + 1; 1 + 3 + |-1|) = \max(14; 2; 5) = 14;$
<br>$\bullet$ $\left \| A \right \|_{sum} = 5 + |-1| + 1 + 6 + 0 + 3 + 3 + 1 + |-1| = 21;$
<br>$\bullet$ $\left \| A \right \|_F = \sqrt{5^2 + (-1)^2 + 1^2 + 6^2 + 0^2 + 3^2 + 3^2 + 1^2 + (-1)^2} = \sqrt{83} \approx 9.11;$
<br>$\bullet$ $\left \| A \right \|_{max} = \max(5; |-1|; 1; 6; 0; 3; 3; 1; |-1|) = 6;$
<br>$\bullet$ $\left \| A \right \|_{*} = 8.73 + 2.32 + 1.19 \approx 12.2.$
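These values can be reproduced numerically. The sketch below uses only the Python standard library and stores the matrix as a list of rows; all function names are illustrative. The spectral norm is approximated by power iteration on $A^T A$, which is a substitute technique chosen here to avoid a full singular value decomposition, and it assumes the all-ones starting vector is not orthogonal to the dominant singular direction.

```python
import math

A = [[5.0, 6.0, 3.0],
     [-1.0, 0.0, 1.0],
     [1.0, 3.0, -1.0]]


def induced_1_norm(M):
    # Maximum absolute column sum.
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))


def induced_inf_norm(M):
    # Maximum absolute row sum.
    return max(sum(abs(a) for a in row) for row in M)


def sum_norm(M):
    return sum(abs(a) for row in M for a in row)


def frobenius_norm(M):
    return math.sqrt(sum(a * a for row in M for a in row))


def max_norm(M):
    return max(abs(a) for row in M for a in row)


def matvec(M, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in M]


def spectral_norm(M, iterations=200):
    # Power iteration on M^T M; its largest eigenvalue is sigma_max squared.
    Mt = [list(col) for col in zip(*M)]
    v = [1.0] * len(M[0])
    for _ in range(iterations):
        w = matvec(Mt, matvec(M, v))
        length = math.sqrt(sum(wi * wi for wi in w))
        if length == 0.0:
            return 0.0
        v = [wi / length for wi in w]
    # v is (approximately) a unit right singular vector, so |Mv| = sigma_max.
    return math.sqrt(sum(c * c for c in matvec(M, v)))


print(induced_1_norm(A), induced_inf_norm(A), sum_norm(A), max_norm(A))
print(frobenius_norm(A), spectral_norm(A))
```

A useful sanity check on any computed singular values is that their squares must sum to $\left \| A \right \|_F^2 = 83$ and their product must equal $|\det A| = 24$.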

<h3 align="center">Metric spaces</h3>

$\textbf{Definition}$. A **metric** on a set $X$ is a **distance function** $d: X \times X \to \mathcal{F}$ ($\mathcal{F}$ is an **ordered field**), with the following properties:

&emsp; &ensp; $\bullet$ $d(x, y) \geq 0$  for all $x,y \in X$, and $d(x,y) = 0$ if and only if $x = y$;
<br>
&emsp; &ensp; $\bullet$ $d(x, y) = d(y, x)$ for all $x,y \in X$;
<br>
&emsp; &ensp; $\bullet$ $d(x, y)\leq{d(x,z)+d(z,y)}$ for all $x, y, z \in X$.

In other words, the definition states that:

&emsp; &ensp; $\bullet$ distance is **nonnegative**, and the only point at zero distance from $x\in X$ is $x$ itself;
<br>
&emsp; &ensp; $\bullet$ distance is a **symmetric** function;
<br>
&emsp; &ensp; $\bullet$ distance satisfies the **triangle inequality**.


$\textbf{Definition}$. A **metric space** $(X,d)$ is a set $X$ equipped with a metric $d$.




<h3 align="center">Metric versus Norm</h3>

$\textbf{Statement}$. A **normed vector space** $(\mathcal{V}, \left \| \text{ . } \right \|)$ is a **metric space** $(\mathcal{V}, d)$ with the metric:

$$d(x,y) = \left \| x - y \right \|,$$

in other words, if you have **norm**, you have a **metric**.

The converse is not always true. For example, let $\delta$ be the discrete metric:

$$\delta(x, y) = \left\{\begin{matrix}
0 \text{ for } x = y \\
1 \text{ for }  x \neq y\\
\end{matrix}\right. .$$

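Then $\delta$ does not satisfy the **homogeneity property** that every metric induced by a norm must have:

$$\delta(\alpha x, \alpha y)= |\alpha| \cdot \delta(x,y).$$

For instance, for $\alpha = 2$ and $x \neq y$ we get $\delta(2x, 2y) = 1$, while $|\alpha| \cdot \delta(x,y) = 2$.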
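The difference between the two metrics can be seen on a concrete pair of vectors. This is a small sketch, assuming vectors are Python lists and using the Taxicab norm as the example norm; the helper names `induced_metric`, `discrete_metric` and `scale` are hypothetical.

```python
def taxicab_norm(x):
    return sum(abs(xi) for xi in x)


def induced_metric(norm):
    # Build d(x, y) = ||x - y|| from a given norm.
    def d(x, y):
        return norm([xi - yi for xi, yi in zip(x, y)])
    return d


def discrete_metric(x, y):
    return 0 if x == y else 1


def scale(alpha, x):
    return [alpha * xi for xi in x]


d1 = induced_metric(taxicab_norm)
x, y, alpha = [1.0, 2.0], [4.0, -2.0], 3.0

# The norm-induced metric scales with |alpha| ...
print(d1(scale(alpha, x), scale(alpha, y)), abs(alpha) * d1(x, y))
# ... while the discrete metric stays at 1 for any two distinct points.
print(discrete_metric(scale(alpha, x), scale(alpha, y)),
      abs(alpha) * discrete_metric(x, y))
```

The first line prints two equal numbers, the second line prints two different ones, which is exactly the failure of homogeneity for the discrete metric.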

<h3 align="center">Basics of topology</h3>

Let $(X,d)$ be a metric space over an ordered field $\mathbb{R}$.

$\textbf{Definition}$. An **open ball** of radius $r > 0$ centered at $x \in X$ 
is the set $B(x, r) = \{ y| d(x, y) < r \}$.

$\textbf{Statement}  \space \textbf{1}$. For any $x \in X$ and any $r > 0$, and for any element $y \in B(x, r)$, there exists $r' > 0$ such that $B(y, r') \subset B(x, r)$.

$\textbf{Proof}$. Let's consider $x \in X$, $r > 0$, $y \in B(x, r)$, and $l = d(x, y)$. By the definition of the open ball,
$l = d(x, y) \lt r$, therefore $r' = r - l \gt 0$. If we take $z \in B(y, r')$, then, from the triangle inequality, we have $d(x, z) \leq d(x, y) + d(y, z)$. But $d(x, y) = l$ and $d(y, z) \lt r' = r - l$, therefore $d(x, z) \lt l + r - l = r$, which implies that $z \in B(x, r)$. So each element of $B(y, r')$ belongs to $B(x, r)$.

$\textbf{Definition}$. The set $U \subset X$ is open in the metric space $(X, d)$ if for every point $x \in U$ there exists $r > 0$ such that $B(x, r) \subset U$.

$\textbf{Statement}  \space \textbf{2}$.
<br> &emsp; $\bullet$ For any family $\mathcal{U}$ of sets that are open in $(X, d)$, the union $V = \bigcup_{U \in \mathcal{U}} U$ is open in $(X, d)$;
<br> &emsp; $\bullet$ For any two sets $U, V \subset X$ that are open in $(X, d)$, the intersection $W = U \cap V$ is open in $(X, d)$;
<br> &emsp; $\bullet$ $X$ is open in $(X, d)$;
<br> &emsp; $\bullet$ $\emptyset$ is open in $(X, d)$.

$\textbf{Proof}$. 
<br> $\bullet$ Let's consider any point $x \in V$. By the definition of the union, there exists an open set $U_x \in \mathcal{U}$ such that $x \in U_x$, therefore for some $r > 0$: $B(x, r) \subset U_x$. But $U_x \subset V$, therefore  $B(x, r) \subset U_x \subset V$, i.e. $B(x, r) \subset V$;
<br> $\bullet$ Let's consider any point $x \in W$. Then $x \in U$ and $x \in V$, therefore there exist $r_u > 0$ and $r_v > 0$ such that $B(x, r_u) \subset U$ and $B(x, r_v) \subset V$. Let's denote $r = \min(r_u, r_v) > 0$, then $B(x, r) \subset U$ and $B(x, r) \subset V$, hence $B(x, r) \subset W$;
<br> $\bullet$ Let's consider any point $x \in X$, then by definition, for any $r > 0$ we have $B(x, r) \subset X$, i.e. $X$ is open in $(X, d)$;
<br> $\bullet$ $\emptyset$ is open because it contains no points, so the condition in the definition holds vacuously.

$\textbf{Statement}  \space \textbf{3}$. For any finite collection $U_1, ..., U_n$ of open sets in $(X, d)$, the set $V =\bigcap_{i=1}^{n}U_i$ is open in $(X, d)$.

$\textbf{Proof}$. 

Let's consider any point $x \in V$. Then $x \in U_i$ for all $i = \overline{1, n}$, therefore there exist $r_1, r_2, ..., r_n > 0$ such that $B(x, r_i) \subset U_i$ for all $i = \overline{1, n}$. 

Let's denote $r = \min(r_1, r_2, ..., r_n)$. Since this is the minimum of finitely many positive numbers, $r>0$. Therefore $B(x, r) \subset B(x, r_i) \subset U_i$ for all $i = \overline{1, n}$, and $B(x, r) \subset V$.

$\textbf{Definition}$. A subset $U \subset X$ of a metric space $(X, d)$ is called an **open subset** if it is open in $(X, d)$.

$\textbf{Definition}$. An **open neighbourhood** of a point $x \in X$ is an open set $V \subset X$ such that $x \in V$.

$\textbf{Statement}  \space \textbf{4}$. The set $U \subset X$ is open if and only if for each point $x \in U$ there exists the open neighbourhood $V$ of $x$ such that $V \subset U$.

$\textbf{Proof}$. 

$\bullet$ Let's assume that $U$ is open, then by definition, for any point $x \in U$, there exists $r > 0$ such that $B(x, r) \subset U$. By Statement 1 the ball $B(x, r)$ is an open set, so $V=B(x, r) \subset U$ is the required open neighbourhood.

$\bullet$ Let's assume that for each point $x \in U$ there exists an open neighbourhood $V$ of $x$ such that $V \subset U$.
<br>
Since $V$ is an open set containing $x$, there exists $r > 0$ such that $B(x, r) \subset V$. But $V \subset U$, therefore $B(x, r) \subset U$, Q.E.D.

$\textbf{Definition}$. The set $V \subset X$ is **closed** if $U = X - V$ is open in $(X, d)$.

$\textbf{Statement}  \space \textbf{5}$. If the set $U \subset X$ is open in $(X, d)$, then $V = X - U$ is closed in $(X, d)$.

$\textbf{Proof}$. 
<br>
Let's consider $V = X - U$. Since $U = X - V$ is open in $(X, d)$, then, by definition, we get that $V$ is closed.

$\textbf{Definition}$. A **closed ball** of radius $r > 0$ centered at $x \in X$ is the set $\overline{{B}(x, r)} = \{y | d(x, y) \le r \}$.

$\textbf{Statement}  \space \textbf{6}$. A closed ball is a closed subset of $(X, d)$.

$\textbf{Proof}$.
<br>
Let's take some $x \in X$ and $r_x > 0$. Let's denote $U = \overline{{B}(x, r_x)}$ and $V = X - U$, and consider any point $y \in V$. From the definition of $V$ we have that $d(x,y) = d_{xy}> r_x$. 

Now let's take $r_y = d_{xy} - r_x > 0$, and consider the open ball $B(y, r_y)$. If $z \in B(y, r_y)$, then $d(y, z)=d_{yz} < r_y \rightarrow d_{yz} + r_x < r_y + r_x \rightarrow d_{yz} + r_x < d_{xy} \rightarrow r_x < d_{xy} - d_{yz}$.
<br>
But from the triangle inequality we have $d_{xy} - d_{yz} \leq d_{xz}$, therefore $r_x < d_{xz}$, which means that $z \notin U$, i.e. $z \in V$ and $B(y, r_y) \subset V$. Hence $V$ is open and $U$ is closed.

$\textbf{Statement}  \space \textbf{7}$.
<br>$\bullet$ For any family $\mathcal{V}$ of sets that are closed in $(X, d)$, the intersection $\bigcap_{V \in \mathcal{V}} V$ is closed in $(X, d)$;
<br>$\bullet$ For any two sets $V_1, V_2 \subset X$ that are closed in $(X, d)$, the union $V_1 \cup V_2$ is closed in $(X, d)$;
<br>$\bullet$ $X$ is closed in $(X, d)$;
<br>$\bullet$ $\emptyset$ is closed in $(X, d)$.

$\textbf{Proof}$.
<br> For each $V \in \mathcal{V}$ let's consider $U = X - V$ and define the family $\mathcal{U} = \{X - V | V \in \mathcal{V}\}$. 
<br> By definition, each such $U$ is an open set, and $X - \bigcap_{V \in \mathcal{V}} V = \bigcup_{U \in \mathcal{U}} U$, so the first item reduces to **statement (2)**. In the same way, the identities $X - (V_1 \cup V_2) = (X - V_1) \cap (X - V_2)$, $X - X = \emptyset$ and $X - \emptyset = X$ reduce the remaining items to **statement (2)**.


$\textbf{Statement}  \space \textbf{8}$. For any finite collection $U_1, ..., U_n$ of closed sets in $(X, d)$, the set $\bigcup_{i=1}^{n}U_i$ is closed in $(X, d)$.

$\textbf{Proof}$.
<br> Let's denote $V_1 = U_1$ and $V_{i+1} = V_i \cup U_{i+1}$ for $i = \overline{1, n-1}$. The set $V_1$ is closed, and if $V_i$ is closed then, from the **statement (7)**, $V_{i+1}$ is closed as a union of two closed sets. By induction, all $V_i$ are closed in $(X, d)$, including $V_n=\bigcup_{i=1}^{n}U_i$.


$\textbf{Definition}$. The **closure** of the set $A \subset X$ is the set $Cl(A) = \bigcap\{F \subset X | A \subset F \text{ and } F \text{ is closed in } (X, d)\}$ (the smallest closed set which contains $A$).

$\textbf{Definition}$. The **frontier** of the set $A \subset X$ is the set $Cl(A) - Int(A)$, where $Int(A)$ denotes the interior of $A$, i.e. the largest open set contained in $A$.

$\textbf{Definition}$. The set $A \subset X$ in the metric space $(X, d)$ is called **bounded** if there exists $r \in \mathbb{R}$ such that $d(x, y) < r$ for all $x, y \in A$.

<h3 align="center">Sequences and limits</h3>

Let $(X,d)$ be a metric space over an ordered field $\mathbb{R}$.

$\textbf{Definition}$. A **sequence** is an **enumerated** collection of objects in which repetitions are allowed.
<br>
In other words, a **sequence** is a **function** from $\mathbb{N}$ (or from an initial segment $\{1, \dots, N\}$ of $\mathbb{N}$) to some set $A$; it need not be injective, because repetitions are allowed.

We write a finite sequence as $(x_n)_{n=1}^{N} = (x_1, x_2, \dots, x_N)$ for some $N \in \mathbb{N}$.
<br>
For an infinite sequence, we write $(x_n)_{n=1}^{\infty} = (x_1, x_2, x_3, \dots)$.

$\textbf{Definition}$. We say that point $x \in X$ is a **limit of the sequence** $(x_n)_{n=1}^{\infty}$ and write:

$$\lim_{n \to \infty}{x_n} = x,$$

if for any open set $U \subset X$, such that $x \in U$, there exists $N \in \mathbb{N}$, such that $x_n \in U$ for any $n > N$.

$\textbf{Statement}  \space \textbf{9}$. $\lim_{n\to\infty}{x_n}= x \in X$ if and only if for any $r > 0$ there exists $N \in \mathbb{N}$ such that: $x_n \in B(x, r)$ for any $n > N$.

$\textbf{Proof}$.
 - Let's assume that $x \in X$ is a limit of the sequence $(x_n)_{n=1}^{\infty}$, and for a chosen $r > 0$ consider $U = B(x,r)$. Since $U$ is an open set containing $x$, by the definition of the limit there exists $N \in \mathbb{N}$ such that $x_n \in U=B(x,r)$ for every $n>N$.
<br>
 - Now let's assume that $x \in X$ and that for every $r > 0$ there exists $N \in \mathbb{N}$ such that $x_n \in B(x, r)$ for every $n > N$. Let's consider any open set $U \subset X$ such that $x \in U$. Since $U$ is an open set, there exists $r_0 > 0$ such that $B(x, r_0) \subset U$. By the assumption, for $r_0$ there must exist $N \in \mathbb{N}$ such that $x_n \in B(x,r_0)$ for every $n>N$. But $B(x, r_0) \subset U$, therefore $x_n \in U$ for all $n > N$.

$\textbf{Definition}$. A **Cauchy sequence** is a sequence $(x_n)_{n=1}^{\infty} = (x_1, x_2, x_3, \dots)$ such that for any $r > 0$ there exists $N \in \mathbb{N}$ such that $d(x_n, x_m) < r$ for any $n, m > N$. A sequence that has a limit in $X$ is called a **convergent sequence**.

$\textbf{Statement}  \space \textbf{10}$. If $x = \lim_{n\to\infty}{x_n}$ and $x \in X$, then $(x_n)_{n=1}^{\infty} = (x_1, x_2,  x_3, \dots)$ is a Cauchy sequence.

$\textbf{Proof}$.
<br>
Let's assume that $x\in X$ is a limit of the sequence $(x_i)_{i=1}^{\infty}$, i.e. $x = \lim_{n\to\infty}{x_n}$. Then for every open ball $U=B(x,r_0)\subset X$ there exists $n_0 \in \mathbb{N}$ such that $x_i \in B(x,r_0)$ for every $i > n_0$.

Let's take $i, j > n_0$, then from the definition of the open ball $B(x, r_0) = \{y|d(x, y) \lt r_0 \}$ and from the triangle inequality, we have $d(x_i, x_j) \leq d(x_i,x) + d(x,x_j) < 2r_0$. Taking $r_0 = r/2$ we get, for the corresponding $n_0 \in \mathbb{N}$, that $d(x_i,x_j)<r$ for all $i, j > n_0$, i.e. $(x_i)_{i=1}^{\infty}$ is a Cauchy sequence.

<h3 align="center">Limit properties</h3>

For sequences of real numbers (with the limits on the right-hand sides assumed to exist) and a constant $c \in \mathbb{R}$:

- The limit of a sequence is unique.
- $\lim _{n\to \infty }(a_{n}\pm b_{n})=\lim _{n\to \infty }a_{n}\pm \lim _{n\to \infty }b_{n}$
- $\lim _{n\to \infty }ca_{n}=c\cdot \lim _{n\to \infty }a_{n}$
- $\lim _{n\to \infty }(a_{n}\cdot b_{n})=(\lim _{n\to \infty }a_{n})\cdot (\lim _{n\to \infty }b_{n})$
- $\lim _{n\to \infty }\left({\frac {a_{n}}{b_{n}}}\right)={\frac {\lim \limits _{n\to \infty }a_{n}}{\lim \limits _{n\to \infty }b_{n}}}$ provided $\lim _{n\to \infty }b_{n}\neq 0$
- $\lim _{n\to \infty }a_{n}^{p}=\left[\lim _{n\to \infty }a_{n}\right]^{p}$, provided both sides are defined
- If $a_{i}\leq b_{i}$ for some $n_0 \in \mathbb{N}$ and all $i > n_0$, then ${\displaystyle \lim _{n\to \infty }a_{n}\leq \lim _{n\to \infty }b_{n}}$
- If $a_{i}\leq c_{i}\leq b_{i}$ for some $n_0 \in \mathbb{N}$ and all $i > n_0$, and $\lim _{n\to \infty }a_{n}=\lim _{n\to \infty }b_{n}=L$, then $\lim _{n\to \infty }{c_n} = L$
- If a sequence is bounded and monotonic, then it is convergent.
- A sequence is convergent if and only if every subsequence is convergent.

<h3 align="center">Examples</h3>

Let's consider the sequence:
$$x_n = \left\{\begin{matrix}
-\frac{1}{2} \text{ if } n \text{ is odd} \\
 \frac{1}{2} \text{ if } n \text{ is even} \\
\end{matrix}\right. .$$

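This sequence is bounded, since $|x_n| \leq \frac{1}{2}$ for all $n$.
<br>
However, $|x_n - x_{n+1}| = 1$ for every $n$, so it is not a Cauchy sequence. Since every convergent sequence is a Cauchy sequence, this sequence is bounded but does not converge.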
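A quick numerical look at this sequence, as a sketch only: a finite check cannot prove divergence, but it illustrates why the Cauchy condition fails. The names `x`, `d`, `consecutive` and `bounded` are chosen for this example, and the metric is assumed to be the usual distance $|a - b|$ on $\mathbb{R}$.

```python
def x(n):
    # The sequence from the example: -1/2 for odd n, 1/2 for even n.
    return -0.5 if n % 2 == 1 else 0.5


def d(a, b):
    # Usual distance on the real line.
    return abs(a - b)


terms = range(1, 1001)
bounded = all(abs(x(n)) <= 0.5 for n in terms)
consecutive = [d(x(n), x(n + 1)) for n in terms]

print(bounded)           # every checked term lies in [-1/2, 1/2]
print(min(consecutive))  # neighbouring terms never get closer than this
```

Because neighbouring terms stay at a fixed positive distance, no choice of $N$ can make $d(x_n, x_m)$ smaller than, say, $r = \frac{1}{2}$ for all $n, m > N$.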

<h3 align="center">Limits of functions</h3>

Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces over the ordered field $\mathbb{R}$.
<br>
Let's consider a subset $S \subset X$ and some function $f:S \to Y$.

$\textbf{Definition}$. The **limit of a function** $f$ at a limit point $x \in X$ of the subset $S$ is a point $y \in Y $ such that any one of the following conditions holds:

1) for each open neighborhood $V \subset Y$ of $y$, there exists an open neighborhood $U \subset X$ of $x$ such that $f(U \cap S) \subset V$;
<br>
2) for each open ball $B(y, r_y) \subset Y$ there exists an open ball $B(x, r_x) \subset X$ such that $f(B(x, r_x) \cap S) \subset B(y, r_y)$;
<br>
3) for any $r_y > 0$ there exists $r_x > 0$ such that every $z \in S$ with $d_X(x, z) < r_x$ satisfies $d_Y(f(z), y) < r_y$.

We write: $\lim_{z \to x}f(z) = y$.

$\textbf{Statement} \space \textbf{11}$. These three definitions of the limit of a function are equivalent.

$\textbf{Proof}$.

$1 \rightarrow 2$: Let's consider an open ball $B(y, r_y) \subset Y$. Since it's open, we may use it as the open neighborhood $V = B(y, r_y)$ of $y \in Y$. By condition 1, there exists an open neighborhood $U \subset X$ of $x$ such that $f(U \cap S) \subset V$. Since $U$ is open, we can find $r_x > 0$ such that the open ball $B(x,r_x) \subset U$. Therefore, $f(B(x, r_x) \cap S) \subset f(U \cap S) \subset B(y, r_y)$.

$2 \rightarrow 3$: Let's take any $r_y > 0$ and consider the open ball $B(y,r_y) \subset Y$. By condition 2, there exists a ball $B(x,r_x) \subset X$ such that $f(B(x,r_x) \cap S) \subset B(y,r_y)$. Therefore, if we take $z \in B(x,r_x) \cap S$ (which means that $z \in S$ and $d_X(x,z)<r_x$), then $f(z) \in B(y,r_y)$ (which means that $d_Y(f(z),y)< r_y$).

$3 \rightarrow 1$: Let's consider some open neighborhood $V \subset Y$ of $y$. Since $V$ is open, there exists $r_y > 0$ such that $B(y, r_y) \subset V$. By condition 3, for this $r_y$ there exists $r_x > 0$ such that for every $s \in S$ with $d_X(x,s)<r_x$ we have $d_Y(f(s),y)<r_y$. The set of all such $s$ is exactly $B(x,r_x) \cap S$, so $f(B(x,r_x) \cap S) \subset B(y,r_y) \subset V$. The open ball $U = B(x,r_x)$ is an open neighborhood of $x$, and $f(U \cap S) \subset V$, as required.

<h3 align="center">Continuous functions</h3>

$\textbf{Definition}$. The function $f:(X, d_X) \to (Y, d_Y)$ is called continuous at the point $x_0 \in X$ if one of the following equivalent conditions holds: 

1. $\forall \varepsilon >0 \text{ } \exists \delta >0 \text { such that } \forall x \in X \text { with } d_X(x,x_0) < \delta \text{ we have } d_Y(f(x), f(x_0)) < \varepsilon;$

2. for every open neighborhood $V \subset Y$ of $f(x_0)$ there exists an open neighborhood $U \subset X$ of $x_0$ such that $f(U) \subset V$,

and we write $\lim_{x \to x_0}f(x) = f(x_0)$.


$\textbf{Definition}$. The function $f:(X, d_X) \to (Y, d_Y)$ is continuous if it is continuous at every $x \in X$.

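$\textbf{Statement} \space \textbf{12}$.
<br> 1) A function $f:(X, d_X) \to (Y, d_Y)$ is continuous if and only if for every open set $V \subset Y$ the preimage $f^{-1}(V)$ is open in $X$;
<br> 2) A function $f:(X, d_X) \to (Y, d_Y)$ is continuous if and only if for every closed set $V \subset Y$ the preimage $f^{-1}(V)$ is closed in $X$;
<br> 3) The composition of two continuous functions is a continuous function.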

$\textbf{Proof}$.
<br>
1) Let $V \subset Y$ be an open set and let's consider $x \in U = f^{-1}(V)$. If the function $f:(X, d_X) \to (Y, d_Y)$ is continuous, then, since $V$ is an open neighborhood of $f(x)$, there exists an open neighborhood $U' \subset X$ of $x$ such that $f(U') \subset V$, i.e. $U' \subset f^{-1}(V) = U$. So every point of $U$ has an open neighborhood contained in $U$, therefore, by Statement 4, $U = f^{-1}(V)$ is an open set in $X$.


Now let's assume that for every open set $V \subset Y$ the set $f^{-1}(V)$ is open in $X$, and take any $x_0 \in X$. Then for any open neighborhood $V \subset Y$ of $f(x_0)$ the set $U = f^{-1}(V)$ is open and contains $x_0$, so it is an open neighborhood of $x_0$ with $f(U) \subset V$. Hence $f$ is continuous at $x_0$.

2) Let's assume that $f$ is continuous and consider any closed set $V \subset Y$; then its complement $\bar{V} = Y \setminus V$ is an open set. By 1), $U = f^{-1}(\bar{V})$ is an open set in $X$. But $f^{-1}(V) = X \setminus U$, therefore it's a closed set in $X$.

Now let's assume that for every closed set $V \subset Y$ the set $f^{-1}(V)$ is closed in $X$. Any open set $W \subset Y$ is the complement of the closed set $Y \setminus W$, and $f^{-1}(W) = X \setminus f^{-1}(Y \setminus W)$ is open as the complement of a closed set. Therefore, from 1), we get that the function $f$ is continuous.

3) Let's assume that $f:(X, d_X) \to (Y, d_Y)$ and $g:(Y, d_Y) \to (Z, d_Z)$ are continuous functions.
<br>Let's introduce the following notation:
<br>
The composition of the functions $f$ and $g$ is $h=g \circ f:(X, d_X) \to (Z, d_Z)$.
<br>
For any $x_0 \in X$ let $y_0 = f(x_0)$ and $z_0 = g(f(x_0))=h(x_0)$.

Now consider any open neighborhood $W \subset Z$ of $z_0$. Since $g$ is a continuous function, there exists an open neighborhood $V \subset Y$ of $y_0$ such that $g(V) \subset W$. On the other hand, $f$ is a continuous function, so for the open neighborhood $V$ of $y_0$ there exists an open neighborhood $U \subset X$ of $x_0$ such that $f(U)\subset V$. Therefore $h(U) = g(f(U)) \subset g(V) \subset W $, i.e. $h$ is a continuous function.

<h3 align="center">Derivatives</h3>

Let $U \subset \mathbb{R}$ be an open subset and $f:U \to \mathbb{R}$.

$\textbf{Definition}$. The function $f$ has a **derivative** at the point $x_0 \in U$ if the following limit exists:
$$\lim_{h \to 0}\frac{f(x_0 + h) - f(x_0)}{h}.$$
The value of this limit is called the **derivative at the point** $x_0$ and is denoted by $f'(x_0)$.

$\textbf{Definition}$. A function is **differentiable** if it has a derivative at every point $x\in U$.

<center><img src="images/RM_Derivative.gif" width="550" height="300" alt="Example" /></center>

<h3 align="center">Derivative Properties</h3>

Let $f:\mathbb{R} \to \mathbb{R}$ and $g:\mathbb{R} \to \mathbb{R}$ be differentiable functions, then for any $x\in \mathbb{R}$:
<br> &emsp; $\bullet$ $(f + g)'(x) = f'(x) + g'(x)$;
<br> &emsp; $\bullet$ $(af)'(x) = af'(x)$ for any constant $a \in \mathbb{R}$;
<br> &emsp; $\bullet$ $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$;
<br> &emsp; $\bullet$ $(g \circ f)'(x) = g'(f(x)) f'(x)$ (also known as **chain rule**).

Derivatives of some known functions:

| Derivatives of Power Functions                  | Derivatives of trigonometric functions                          | Derivatives of inverse trigonometric functions                  |
|-------------------------------------------------|-----------------------------------------------------------------|-----------------------------------------------------------------|
|                     $(c)'=0$                    |                  $\left(\sin x\right)'=\cos x$                  |     $\left(\arcsin x\right)'={\dfrac {1}{\sqrt {1-x^{2}}}}$     |
|          $\left(x^{a}\right)'=ax^{a-1}$         |                  $\left(\cos x\right)'=-\sin x$                 |     $\left(\arccos x\right)'=-{\dfrac {1}{\sqrt {1-x^{2}}}}$    |
|         $\left(a^{x}\right)'=a^{x}\ln a$        |  $\left(\operatorname {tg} x\right)'={\dfrac {1}{\cos ^{2}x}}$  |  $\left(\operatorname {arctg} x\right)'={\dfrac {1}{1+x^{2}}}$  |
| $\left(\log _{a}x\right)'={\dfrac {1}{x\ln a}}$ | $\left(\operatorname {ctg} x\right)'=-{\dfrac {1}{\sin ^{2}x}}$ | $\left(\operatorname {arcctg} x\right)'=-{\dfrac {1}{1+x^{2}}}$ |
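The rules and table entries above can be spot-checked numerically. The following sketch compares a central-difference estimate with the product rule, the chain rule, and the table entry for $\operatorname{arctg} x$. The helper `derivative`, the choice $f = \sin$, $g = \exp$ and the point $x = 0.7$ are assumptions made only for this example.

```python
import math

def derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

f, df = math.sin, math.cos   # f and its known derivative
g, dg = math.exp, math.exp   # g and its known derivative
x = 0.7

# Product rule: (fg)'(x) = f'(x) g(x) + f(x) g'(x)
product_numeric = derivative(lambda t: f(t) * g(t), x)
product_rule = df(x) * g(x) + f(x) * dg(x)

# Chain rule: (g o f)'(x) = g'(f(x)) f'(x)
chain_numeric = derivative(lambda t: g(f(t)), x)
chain_rule = dg(f(x)) * df(x)

# Table entry: (arctg x)' = 1 / (1 + x^2)
arctan_numeric = derivative(math.atan, x)
arctan_table = 1 / (1 + x**2)

print(product_numeric, product_rule)
print(chain_numeric, chain_rule)
print(arctan_numeric, arctan_table)
```

Agreement up to roughly the accuracy of the finite-difference estimate is a sanity check, not a proof of the rules.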

<h3 align="center">Partial derivatives</h3>

Let $\overrightarrow{f}:\mathbb{R}^n \to \mathbb{R}^m$ be a vector function between two normed vector spaces.

$\textbf{Definition}$. The **partial derivative** of a function $\overrightarrow{f} = f(x_1, \dots, x_n)$ in the direction $x_i$ at the point $(a_1, \dots, a_n)$ is defined as:

$$
\frac{\partial f}{\partial x_i}(a_1, \ldots, a_n) = \lim_{h \to 0}\frac{f(a_1, \ldots, a_i+h,\ldots,a_n) - f(a_1,\ldots, a_i, \dots,a_n)}{h}
$$

All the variables are fixed except $x_i$. That choice of fixed values determines a function of one variable

$$f_{a_1,\ldots,a_{i-1},a_{i+1},\ldots,a_n}(x_i) = f(a_1,\ldots,a_{i-1},x_i,a_{i+1},\ldots,a_n)$$

and by definition,

$$\frac{df_{a_1,\ldots,a_{i-1},a_{i+1},\ldots,a_n}}{dx_i}(a_i) = \frac{\partial f}{\partial x_i}(a_1,\ldots,a_n)$$
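The idea of freezing all variables except $x_i$ translates directly into code: perturb only the $i$-th coordinate and form a difference quotient. This is a minimal sketch. The helper `partial_derivative`, the symmetric (central) quotient used instead of the one-sided quotient from the definition, and the example function $f(x, y) = x^2 y + y^3$ are assumptions for illustration.

```python
def partial_derivative(f, a, i, h=1e-6):
    """Central-difference estimate of df/dx_i at the point a (i is 0-based)."""
    forward = list(a)
    backward = list(a)
    forward[i] += h    # only the i-th coordinate moves
    backward[i] -= h   # all other coordinates stay fixed
    return (f(*forward) - f(*backward)) / (2 * h)

def f(x, y):
    return x**2 * y + y**3

a = (1.0, 2.0)
df_dx = partial_derivative(f, a, 0)  # exact value: 2*x*y = 4
df_dy = partial_derivative(f, a, 1)  # exact value: x**2 + 3*y**2 = 13

print(df_dx, df_dy)
```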

<h3 align="center">Jacobian matrix</h3>

Let $\overrightarrow{f} = (f_1, \dots, f_m):\mathbb{R}^n \to \mathbb{R}^m$ be a function such that each of its partial derivatives exists on $\mathbb{R}^n$.
<br>
This function takes a vector $\overrightarrow{x}\in \mathbb{R}^n$ as input and produces the vector $\overrightarrow{f}(\overrightarrow{x})\in \mathbb{R}^m$ as an output.

$\textbf{Definition}$. The Jacobian matrix of $\overrightarrow{f}$ is defined to be an $m \times n$ matrix, denoted by $\mathbf{J}$, whose entries are $\mathbf{J}_{ij} = \frac{\partial f_i}{ \partial x_j}$:

$$\mathbf{J} = \begin{bmatrix}
\frac{\partial \overrightarrow{f}}{\partial x_1} & \cdots  &  \frac{\partial \overrightarrow{f}}{\partial x_n}
\end{bmatrix}
= 
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n}\\ 
\vdots & \ddots & \vdots\\ 
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}$$

Generalized **chain rule** for the Jacobian:
<br> &emsp; $\bullet$ if $\overrightarrow{f} : \mathbb{R}^n \to \mathbb{R}^m$ and $\overrightarrow{g} : \mathbb{R}^m \to \mathbb{R}^l$ are differentiable, then for any $x \in \mathbb{R}^n$:
$$\mathbf{J}_{\overrightarrow{g} \circ \overrightarrow{f}}(x) = \mathbf{J}_{\overrightarrow{g}}(\overrightarrow{f}(x)) \mathbf{J}_{\overrightarrow{f}}(x)
$$
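The chain rule for Jacobians can be checked numerically by building each Jacobian column by column from finite differences. The sketch below is illustrative only: the helpers `jacobian` and `matmul` and the maps `f` ($\mathbb{R}^2 \to \mathbb{R}^3$) and `g` ($\mathbb{R}^3 \to \mathbb{R}^2$) are hypothetical examples, not part of the text.

```python
def jacobian(func, x, h=1e-6):
    """m x n matrix of central-difference estimates of d func_i / d x_j."""
    columns = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        # j-th column: derivative of every component with respect to x_j
        columns.append([(p - m) / (2 * h) for p, m in zip(fp, fm)])
    # transpose so that rows are indexed by the component i
    return [list(row) for row in zip(*columns)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def f(x):  # R^2 -> R^3
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

def g(y):  # R^3 -> R^2
    return [y[0] + y[1] * y[2], y[0] * y[2]]

x = [1.0, 2.0]
J_composite = jacobian(lambda t: g(f(t)), x)            # J of g o f at x
J_product = matmul(jacobian(g, f(x)), jacobian(f, x))   # J_g(f(x)) J_f(x)

print(J_composite)
print(J_product)
```

Note the shapes: $\mathbf{J}_{\overrightarrow{g}}$ is $2 \times 3$ and $\mathbf{J}_{\overrightarrow{f}}$ is $3 \times 2$, so the product is $2 \times 2$, matching the Jacobian of the composition.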


<h3 align="center">Directional derivative</h3>

Let $U$ be an open subset of $\mathbb{R}^n$. 

$\textbf{Definition}$. The **directional derivative** of a function $f: U \subset \mathbb{R}^n \to \mathbb{R}$ along a vector $\mathbf{v} \in \mathbb{R}^n$ is defined as:
$$\nabla_{\mathbf{v}}{f}(\mathbf{x}) = \lim_{h \rightarrow 0}{\frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h}}.$$ 

<h3 align="center">Total derivative</h3>

Let $U$ be an open subset of $\mathbb{R}^n$. 

$\textbf{Definition}$. A function $f: U \to \mathbb{R}^m$ is said to be (**totally**) **differentiable** at a point $a \in U$, if there exists a linear transformation $df_a: \mathbb{R}^n \to \mathbb{R}^m$, such that:

$$ \lim_{x\rightarrow a}\frac{\|f(x)-f(a)-df_a(x-a)\|}{\|x-a\|}=0.$$


<h3 align="center">Gradient</h3>

Let $f :\mathbb{R}^n \to \mathbb{R}$ be a function such that each of its partial derivatives exists on $\mathbb{R}^n$.

$\textbf{Definition}$. The gradient of the function $f$, denoted by $\nabla f$, is the vector of its partial derivatives, i.e. its $1 \times n$ Jacobian matrix:

$$ \nabla f = \mathbf{J} = 
\begin{bmatrix}
\frac{\partial f}{\partial x_1} & \cdots  &  \frac{\partial f}{\partial x_n}
\end{bmatrix}
$$
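Gradient properties (for differentiable $f, g:\mathbb{R}^n \to \mathbb{R}$, a point $a \in \mathbb{R}^n$ and constants $\alpha, \beta \in \mathbb{R}$):
<br> &emsp; $\bullet$ $\nabla\left(\alpha f+\beta g\right)(a) = \alpha \nabla f(a) + \beta\nabla g (a)$;
<br> &emsp; $\bullet$ $\nabla (fg)(a) = f(a)\nabla g(a) + g(a)\nabla f(a)$;
<br> &emsp; $\bullet$ $(f\circ g)'(c) = \nabla f(a)\cdot g'(c)$, where here $g:\mathbb{R} \to \mathbb{R}^n$ is a differentiable curve and $a = g(c)$ (**chain rule**).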

The **relationship** between the **gradient** and the **directional derivative**:

$$\big(\nabla f(x)\big)\cdot \mathbf{v} = \nabla_{\mathbf v}f(x).$$

The **relationship** between the **gradient** and the **total derivative** (if the function is differentiable at $x$):

$$\nabla f(x)\cdot v = df_x(v).$$
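The identity $\big(\nabla f(x)\big)\cdot \mathbf{v} = \nabla_{\mathbf v}f(x)$ can be illustrated numerically by estimating both sides independently. This is a sketch under assumptions: the helpers `gradient` and `directional_derivative`, the function $f(x_1, x_2) = x_1^2 + 3 x_1 x_2$, the point and the direction are all chosen only for the example.

```python
def gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def directional_derivative(f, x, v, h=1e-6):
    """Central-difference estimate of the derivative of f at x along v."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return (f(xp) - f(xm)) / (2 * h)

def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

x = [1.0, 2.0]
v = [0.6, 0.8]

lhs = sum(g * vi for g, vi in zip(gradient(f, x), v))  # gradient dotted with v
rhs = directional_derivative(f, x, v)                  # derivative along v

print(lhs, rhs)
```

For this example the exact gradient at $(1, 2)$ is $(8, 3)$, so both sides should be close to $8 \cdot 0.6 + 3 \cdot 0.8 = 7.2$.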

<h3 align="center">Geometrical interpretation of gradient</h3>

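The gradient points in the direction of steepest ascent, and its magnitude gives the rate of increase in that direction. Indeed, for any unit vector $\mathbf{v}$:

$$\nabla_{\mathbf v}f(x) = \big(\nabla f(x)\big)\cdot \mathbf{v} = \|\nabla f(x)\| \cos\theta,$$

where $\theta$ is the angle between $\nabla f(x)$ and $\mathbf{v}$. This value is largest when $\theta = 0$, i.e. when $\mathbf{v}$ points along the gradient.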

<center><img src="images/RM_Gradient.gif" width="600" height="300" alt="Example" /></center>

<h3 align="center">Extrema</h3>

$\textbf{Definition}$. A function $f: X \to \mathbb{R}$ on a metric space $X$ has a **global**, or **absolute**, **maximum** at point $x\in X$, if:

$$f(x) \geq f({x}') \text{ for any } {x}' \in X.$$

$\textbf{Definition}$. A function $f: X \to \mathbb{R}$ on a metric space $X$ has a **global**, or **absolute**, **minimum** at point $x\in X$, if:

$$f(x) \leq f({x}') \text{ for any } {x}' \in X.$$

$\textbf{Definition}$. A function $f: X \to \mathbb{R}$ on a metric space $X$ has a **local**, or **relative**, **maximum** at point $x\in X$, 
<br> 
if there exists an open neighborhood of $x$, call it $U_x$, such that:

$$f(x) \geq f({x}') \text{ for any } {x}' \in U_x.$$

$\textbf{Definition}$. A function $f: X \to \mathbb{R}$ on a metric space $X$ has a **local**, or **relative**, **minimum** at point $x\in X$, 
<br>
if there exists an open neighborhood of $x$, call it $U_x$, such that:

$$f(x) \leq f({x}') \text{ for any } {x}' \in U_x.$$

<h3 align="center">Fermat's theorem</h3>

Let $U$ be an open subset of $\mathbb{R}^n$ and suppose that a function $f: U \to \mathbb{R}$ is **differentiable** at a point $x \in U$. 

$\textbf{Fermat's theorem}$. If $f$ has a **local extremum** (**maximum** or **minimum**) at $x$, then: 

$$\nabla f(x) = 0.$$

<center><img src="images/RM_Extrema.jpg" width="1500" height="300" alt="Example" /></center>

<h3 align="center">Hessian  matrix</h3>

Let $f:\mathbb{R}^n \to \mathbb{R}$ be a function such that all **second partial derivatives** exist on $\mathbb{R}^n$.
<br>
This function takes a vector $\overrightarrow{x}\in \mathbb{R}^n$ as input and produces a scalar $f(\overrightarrow{x})\in \mathbb{R}$ as an output.

$\textbf{Definition}$. The Hessian matrix of $f$ is a square $n \times n$ matrix, denoted by $\mathbf{H}$, whose $\mathbf{H}_{ij} = \frac{\partial^2 f}{ \partial x_i \partial x_j}$:
$$\mathbf{H} = 
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}\\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\ 
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
Let $x$ be a **critical point** of $f$, i.e. $\nabla f(x) = 0$, and suppose the second partial derivatives of $f$ are continuous near $x$. If the **Hessian matrix**:
<br> &emsp; $\bullet$ is **positive definite** at $x$, then $f$ attains an isolated **local minimum** at $x$;
<br> &emsp; $\bullet$ is **negative definite** at $x$, then $f$ attains an isolated **local maximum** at $x$;
<br> &emsp; $\bullet$ has **both positive and negative eigenvalues** at $x$, then $x$ is a **saddle point** for $f$.
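The test above can be sketched in code for functions of two variables, where the eigenvalues of the symmetric $2 \times 2$ Hessian have a closed form. Everything here is an illustrative assumption: the helpers `hessian_2d`, `symmetric_eigenvalues_2x2` and `classify`, the finite-difference step `h`, the tolerance `tol`, and the three example functions, each of which has a critical point at the origin.

```python
import math

def hessian_2d(f, x, y, h=1e-4):
    """Central-difference estimate of the 2 x 2 Hessian of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

def symmetric_eigenvalues_2x2(H):
    """Closed-form eigenvalues (smaller, larger) of a symmetric 2 x 2 matrix."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

def classify(H, tol=1e-6):
    """Classify a critical point from the signs of the Hessian eigenvalues."""
    low, high = symmetric_eigenvalues_2x2(H)
    if low > tol:
        return "local minimum"   # positive definite
    if high < -tol:
        return "local maximum"   # negative definite
    if low < -tol and high > tol:
        return "saddle point"    # eigenvalues of both signs
    return "inconclusive"        # some eigenvalue is (numerically) zero

def bowl(x, y):
    return x**2 + 2 * y**2

def hill(x, y):
    return -x**2 - y**2

def saddle(x, y):
    return x**2 - y**2

for func in (bowl, hill, saddle):
    print(func.__name__, classify(hessian_2d(func, 0.0, 0.0)))
```

The `"inconclusive"` branch reflects that the test says nothing when the Hessian is only semidefinite; the code assumes, rather than checks, that the point passed in is a critical point.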


<h3 align="center">Gradient descent method</h3>

**Gradient descent** is:
<br> &emsp; $\bullet$ a first-order iterative optimization algorithm for finding a local minimum of a differentiable function;
<br> &emsp; $\bullet$ based on the observation that $f(x)$ decreases **fastest** if one goes in the direction of the **negative gradient.**

It **starts with a guess** $x_0$ for a local minimum of $f$ and considers the **sequence** $x_0, x_1, x_2, \dots$ such that:
$$x_{n+1} = x_n  - \gamma \nabla f(x_n),$$
where $\gamma > 0$ is the **step size**, which is allowed to change at every iteration.
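The update rule can be written as a short loop. The sketch below makes several simplifying assumptions that are not prescribed by the text: the step size is kept constant (`step_size=0.1`) rather than changed per iteration, the gradient is supplied analytically by the caller, and iteration stops once the gradient norm falls below a tolerance. The function name `gradient_descent` and the quadratic test function are hypothetical.

```python
def gradient_descent(grad, x0, step_size=0.1, max_iters=1000, tol=1e-8):
    """Iterate x_{n+1} = x_n - step_size * grad(x_n) from the guess x0."""
    x = list(x0)
    for _ in range(max_iters):
        g = grad(x)
        # stop when the gradient is numerically zero (Fermat's condition)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x

def grad_f(x):
    # gradient of f(x, y) = (x - 1)^2 + 2 * (y + 3)^2, minimized at (1, -3)
    return [2 * (x[0] - 1), 4 * (x[1] + 3)]

x_min = gradient_descent(grad_f, [0.0, 0.0])
print(x_min)
```

With a constant step the method can diverge if the step is too large for the curvature of $f$. For this quadratic, any step below $0.5$ converges, which is why `0.1` is a safe illustrative choice.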

<center><img src="images/RM_Gradient_Descent_2.png" width="650" height="300" alt="Example" /></center>

<center><img src="images/RM_Gradient_Descent.gif" width="650" height="300" alt="Example" /></center>
