# <center>Vectors, matrices, tensors</center>
### <center>Alfred Galichon (NYU & ScPo)</center>
## <center>'math+econ+code' masterclass on optimal transport and economic applications</center>
#### <center>With python code examples</center>
© 2023 by Alfred Galichon with contributions by Clément Montes. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/team).

**If you reuse material from this masterclass, please cite as:**<br>
Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim

# Vectors, matrices and tensors in NumPy

* Unlike R or Matlab, Python has no built-in matrix algebra interface. Fortunately, the NumPy library provides powerful matrix capabilities, on par with R or Matlab. Here is a quick introduction to vectorization, operations on vectors and matrices, higher-dimensional arrays, Kronecker products and sparse matrices, etc. in NumPy.

* This is *not* a tutorial on Python itself. There are plenty of good ones available on the web.

* First, we load numpy (with its widely used alias):

In [1]:
import numpy as np

## Vectors

In NumPy, an `array` is built from a list of numbers as follows:

In [2]:
u = np.array([1,2,3])
print(u)
v = np.array([3,2,5])
print(v)

[1 2 3]
[3 2 5]


One can then add arrays as:

In [3]:
print(np.array([1,2,3])+np.array([3,2,5]))

[4 4 8]


Note the difference between the + operator when applied to numpy arrays vs. when applied to lists:

In [4]:
[1,2,3]+[3,2,5]

[1, 2, 3, 3, 2, 5]

In the latter case, it returns list concatenation.
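The same contrast shows up with the `*` operator. The following is a minimal sketch (the names `u_list`, `u_arr` and `v_arr` are introduced here for illustration only): on a list, `*` repeats the list, while on arrays it multiplies entry by entry, and `@` gives the dot product.

```python
import numpy as np

u_list = [1, 2, 3]
u_arr = np.array(u_list)

# On a list, * repeats the list; on an array, it multiplies entrywise.
repeated = u_list * 2
doubled = u_arr * 2
print(repeated)  # [1, 2, 3, 1, 2, 3]
print(doubled)   # [2 4 6]

# Entrywise product and dot product of two arrays.
v_arr = np.array([3, 2, 5])
entrywise = u_arr * v_arr
dot = u_arr @ v_arr
print(entrywise)  # [ 3  4 15]
print(dot)        # 22
```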

## Matrices

To input matrices in NumPy, one simply inputs a list of rows, which are themselves represented as lists.

In [5]:
A = np.array([[11,12],[21,22],[31,32]])
A

array([[11, 12],
       [21, 22],
       [31, 32]])

The `shape` attribute of an array indicates the dimensions of that array.

In [6]:
A.shape

(3, 2)

Let's change that attribute and see what happens.

In [7]:
A.shape=(6)
A

array([11, 12, 21, 22, 31, 32])

By removing the shape attribute, or rather, by setting it to $6$ instead of $(3,2)$, we took a glimpse at how the matrix is represented in the computer's memory: the rows ($[11,12],[21,22]$, and $[31,32]$) are listed one after another. This is the *row-major order*, used by default in Python, but also in C, as opposed to the *column-major order*, used in Fortran, R and Matlab.

## Vectorization and memory order

* Matrices in all mathematical software are represented in a *vectorized* way, as a sequence of numbers in the computer's memory. This representation can involve either stacking the rows or stacking the columns.

* Different programming languages can use either of the two stacking conventions:
    + Stacking the rows (row-major order) is used by `C`, and is the default convention for Python (NumPy). A matrix $M$ is represented by varying the last index first, i.e. a $2\times2$ matrix will be represented as $vec_C\left(M\right) = \left(M_{11}, M_{12}, M_{21}, M_{22}\right).$
    + Stacking the columns (column-major order) is used by `Fortran`, `Matlab`, `R`, and most underlying core linear algebra libraries (like BLAS). A $2\times2\times2$ 3-dimensional array $A$ will be represented by varying the first index first, then the second, i.e. $vec_F\left(A\right) = \left( A_{111}, A_{211}, A_{121}, A_{221}, A_{112}, A_{212}, A_{122}, A_{222} \right)$.

The command `flatten()` provides the vectorized representation of a matrix.

In [8]:
A.flatten()

array([11, 12, 21, 22, 31, 32])

Remember, NumPy represents matrices by **varying the last index first**.
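To make the two conventions concrete, here is a small sketch comparing both vectorizations of the same matrix. The names `M`, `vec_c` and `vec_f` are chosen here for illustration. It also checks where entry $(i,j)$ lands in each vector, using 0-based indices.

```python
import numpy as np

M = np.array([[11, 12], [21, 22], [31, 32]])
nrows, ncols = M.shape

vec_c = M.flatten()            # row-major (C) order, the default
vec_f = M.flatten(order='F')   # column-major (Fortran) order
print(vec_c)  # [11 12 21 22 31 32]
print(vec_f)  # [11 21 31 12 22 32]

# Position of entry (i, j) in each vectorization (0-based indices).
i, j = 2, 1
assert vec_c[i * ncols + j] == M[i, j]
assert vec_f[j * nrows + i] == M[i, j]
```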

In order to reshape the matrix `A`, one modifies its `shape` attribute. The following reshapes the matrix `A` into a row vector. 

In [9]:
A.shape = 1,6
A

array([[11, 12, 21, 22, 31, 32]])

The previous output evidences the fact that Python uses the row-major order: rows are stacked one after the other. 
To reshape the vector into a column vector, do:

In [10]:
A.shape = 6,1
A

array([[11],
       [12],
       [21],
       [22],
       [31],
       [32]])

Equivalently, one could have set `A.shape=6,-1`, where NumPy would replace `-1` by the integer needed for the shape to be consistent with the number of entries (in this case, `1`). 
Another way to reshape is to use the method `reshape`, which leaves the original array unchanged and returns a new array object with the requested shape (a view on the same data whenever possible, rather than a copy).

In [11]:
A1=np.array(range(6))
A2 = A1.reshape(3,2)
print("A1=\n", A1)
print("A2=\n",A2)

A1=
 [0 1 2 3 4 5]
A2=
 [[0 1]
 [2 3]
 [4 5]]
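Whether a reshaped array shares its memory with the original matters as soon as one writes into it. The sketch below uses `np.shares_memory` to check this for `reshape` and `flatten`; the variable names are illustrative.

```python
import numpy as np

base = np.arange(6)
reshaped = base.reshape(3, 2)   # a view on the same memory when possible
flattened = reshaped.flatten()  # flatten always returns a copy

print(np.shares_memory(base, reshaped))   # True
print(np.shares_memory(base, flattened))  # False

reshaped[0, 0] = 100   # writing through the view changes `base` too
print(base[0])         # 100
flattened[1] = -1      # the copy is independent
print(base[1])         # 1
```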


Note that `NumPy` also supports the column-major order, but you have to specifically ask for it, by passing the optional argument `order='F'`, where 'F' stands for `Fortran`.

In [12]:
A3 = np.array(range(6)).reshape(3,2, order='F')
A3

array([[0, 3],
       [1, 4],
       [2, 5]])

## Tensors

We now introduce multi-dimensional arrays or *tensors*.

In [13]:
nbx,nby,nbz=3,5,4
T_xyz =np.array([i*1.0 for i in range(nbx*nby*nbz) ])
T_x_y_z = T_xyz.reshape((nbx,nby,nbz))
print(T_x_y_z)

[[[ 0.  1.  2.  3.]
  [ 4.  5.  6.  7.]
  [ 8.  9. 10. 11.]
  [12. 13. 14. 15.]
  [16. 17. 18. 19.]]

 [[20. 21. 22. 23.]
  [24. 25. 26. 27.]
  [28. 29. 30. 31.]
  [32. 33. 34. 35.]
  [36. 37. 38. 39.]]

 [[40. 41. 42. 43.]
  [44. 45. 46. 47.]
  [48. 49. 50. 51.]
  [52. 53. 54. 55.]
  [56. 57. 58. 59.]]]


The tensor is printed as a list of `nbx` blocks, one for each value of x; within each block, each row corresponds to a value of y and runs over the values of z. This is consistent with the row-major order representation: under that representation, when elements of a tensor are listed, *the last index varies first*.

In [14]:
print(T_x_y_z[0,0,:])
print(T_x_y_z[0,1,:])
print(T_x_y_z[0,2,:])

[0. 1. 2. 3.]
[4. 5. 6. 7.]
[ 8.  9. 10. 11.]


We obtain the submatrix `T_x_y_z[0,:,:]` by:

In [15]:
T_x_y_z[0,:,:]

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.],
       [16., 17., 18., 19.]])
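The rule "the last index varies first" gives an explicit formula for the position of entry $(x,y,z)$ in the vectorized tensor. The sketch below rebuilds a tensor with the same shape as above (under the illustrative name `T`) and compares the formula with NumPy's own index conversion functions.

```python
import numpy as np

nbx, nby, nbz = 3, 5, 4
T = np.arange(nbx * nby * nbz, dtype=float).reshape((nbx, nby, nbz))

x, y, z = 2, 3, 1
flat_index = (x * nby + y) * nbz + z   # last index varies first
print(flat_index, T[x, y, z])          # 53 53.0

# NumPy provides the same conversion in both directions.
assert np.ravel_multi_index((x, y, z), T.shape) == flat_index
assert np.unravel_index(flat_index, T.shape) == (x, y, z)
```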

## Kronecker product


Given two matrices $A$ of size $m \times n$ and $B$ of size $p \times q$, the Kronecker product $A \otimes B$ is the matrix of size $mp \times nq$, defined blockwise as:

\begin{align*}
A \otimes B = \begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n}B \\
a_{21}B & a_{22}B & \cdots & a_{2n}B \\
\vdots  & \vdots  & \ddots & \vdots  \\
a_{m1}B & a_{m2}B & \cdots & a_{mn}B 
\end{bmatrix}
\end{align*}

Let's try an example:

In [16]:
A = np.eye(3)
B = np.array([[1,2,3],[4,5,6]])
AXB = np.kron(A, B)
print("A=\n",A,"\nB=\n",B,'\nkron(A,B)=\n',AXB)

A=
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]] 
B=
 [[1 2 3]
 [4 5 6]] 
kron(A,B)=
 [[1. 2. 3. 0. 0. 0. 0. 0. 0.]
 [4. 5. 6. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 2. 3. 0. 0. 0.]
 [0. 0. 0. 4. 5. 6. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 2. 3.]
 [0. 0. 0. 0. 0. 0. 4. 5. 6.]]


A very important identity is
\begin{align*}
vec_C\left(AXB\right) = \left(  A\otimes B^\top\right)  vec_C\left(X\right),
\end{align*}
where $vec_C$ is the vectorization under the C (row-major) order.

In [17]:
X = np.array([[3,2],[7,1],[2,4]])
print('vec(A @ X @ B)       = ',(A @ X @ B).flatten())
print('kron(A,B.T) @ vec(X) = ', np.kron(A,B.T)@(X.flatten() ))

vec(A @ X @ B)       =  [11. 16. 21. 11. 19. 27. 18. 24. 30.]
kron(A,B.T) @ vec(X) =  [11. 16. 21. 11. 19. 27. 18. 24. 30.]
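Since the example above uses an identity matrix for $A$, it may be worth checking the identity on generic matrices. The sketch below does so with random matrices (the names and sizes are arbitrary choices made here), and also checks the column-major counterpart, $vec_F(AXB) = (B^\top \otimes A)\, vec_F(X)$, which is the form usually stated for column-stacking languages.

```python
import numpy as np

rng = np.random.default_rng(0)
A_rand = rng.normal(size=(4, 3))
X_rand = rng.normal(size=(3, 2))
B_rand = rng.normal(size=(2, 5))
P = A_rand @ X_rand @ B_rand

# Row-major (C) vectorization: vec_C(AXB) = (A kron B^T) vec_C(X)
lhs_c = P.flatten()
rhs_c = np.kron(A_rand, B_rand.T) @ X_rand.flatten()

# Column-major (Fortran) vectorization: vec_F(AXB) = (B^T kron A) vec_F(X)
lhs_f = P.flatten(order='F')
rhs_f = np.kron(B_rand.T, A_rand) @ X_rand.flatten(order='F')

print(np.allclose(lhs_c, rhs_c), np.allclose(lhs_f, rhs_f))  # True True
```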


## Broadcasting


**Broadcasting** allows one to combine tensors of different shapes: an extra dimension of size one is added where needed, and the array is then virtually repeated along that dimension.



In [18]:
u_i = np.array([1,2,3])
v_j = np.array([0.1,0.2,.03])
u_i[:,None]+v_j[None,:]

array([[1.1 , 1.2 , 1.03],
       [2.1 , 2.2 , 2.03],
       [3.1 , 3.2 , 3.03]])

Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations.

In [19]:
A = 10*np.array([[1],[2],[3]]) #Simplest broadcasting
B =  np.array([1,2])
print('A=\n',A)
print('B=\n',B)
print('A+B=\n',A+B)

A=
 [[10]
 [20]
 [30]]
B=
 [1 2]
A+B=
 [[11 12]
 [21 22]
 [31 32]]
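The "certain constraints" can be checked without building any array. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; it returns the shape resulting from broadcasting, and incompatible shapes raise a `ValueError`.

```python
import numpy as np

# Shapes are compared from the last axis backwards; two sizes are
# compatible when they are equal or when one of them is 1.
shape_1 = np.broadcast_shapes((3, 1), (2,))
shape_2 = np.broadcast_shapes((3, 1, 4), (5, 1))
print(shape_1)  # (3, 2)
print(shape_2)  # (3, 5, 4)

# Shapes (3,) and (2,) are not compatible, so the sum fails.
try:
    np.ones(3) + np.ones(2)
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```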


The operation `A[:,np.newaxis]` creates a new dimension. In fact, `np.newaxis` is a constant whose value equals `None`, so `A[:,None]` yields the same result.

In [20]:
v = np.array([3,4,5])
print(v)
print(v[:,np.newaxis])
print(v[np.newaxis,:])
print(v[None,:])

[3 4 5]
[[3]
 [4]
 [5]]
[[3 4 5]]
[[3 4 5]]


## Reduction

One can decrease the dimension of an array by summing all entries across a given dimension.
Working with the row-major vectorization $vec_C(T)$ of the tensor, this is done by:

$(\sum_{y}T_{xyz})_{xz}=\left( I_{X}\otimes 1^\top_{Y}\otimes I_{Z}\right) vec_C\left(T\right)$<br>
and $\left( I_{X}\otimes 1^\top_{Y}\otimes I_{Z}\right)$  is a $XZ \times XYZ$ matrix.


This is equivalent to NumPy's partial sum `T_x_y_z.sum(axis = 1)`.

In [21]:
print(T_x_y_z.sum(axis=1).flatten())
print(np.kron(np.eye(nbx), np.kron(np.ones((1,nby) ), np.eye(nbz)) ) @ T_x_y_z.flatten())

[ 40.  45.  50.  55. 140. 145. 150. 155. 240. 245. 250. 255.]
[ 40.  45.  50.  55. 140. 145. 150. 155. 240. 245. 250. 255.]
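The same construction works for the other two axes by moving the row of ones to the corresponding position in the Kronecker product. A sketch, rebuilding a tensor of the same shape under the illustrative name `T`, and using the fact that $I_Y \otimes I_Z$ is the identity of size $|Y||Z|$:

```python
import numpy as np

nbx, nby, nbz = 3, 5, 4
T = np.arange(nbx * nby * nbz, dtype=float).reshape((nbx, nby, nbz))
vec_T = T.flatten()

# Sum over x: (1_X^T kron I_Y kron I_Z), a YZ x XYZ matrix
sum_over_x = np.kron(np.ones((1, nbx)), np.eye(nby * nbz)) @ vec_T
# Sum over z: (I_X kron I_Y kron 1_Z^T), a XY x XYZ matrix
sum_over_z = np.kron(np.eye(nbx * nby), np.ones((1, nbz))) @ vec_T

print(np.allclose(sum_over_x, T.sum(axis=0).flatten()))  # True
print(np.allclose(sum_over_z, T.sum(axis=2).flatten()))  # True
```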


## Expansion

Now, $1_{Y}$ is the column vector of dimension $|Y|$ with unit entries. 
The matrix $\left( I_{X}\otimes 1_{Y} \otimes I_{Z}\right)$  is a $XYZ \times XZ$ matrix, which creates an extra dimension.


For $T$ a tensor of dimension (nbx,nbz), we have:<br>
$ \left( I_{X}\otimes   1_{Y} \otimes I_{Z}\right) vec_C\left(T\right) = vec_C\left(\bar{T}\right) $<br>
where $\bar{T}_{xyz} = T_{xz}$.

In [22]:
B = np.array([[1,2,3],[4,5,6]])
nbx,nbz = B.shape
nby=2
(np.kron(np.kron(np.eye(nbx),np.ones((nby,1))),np.eye(nbz)) @ B.flatten() ). reshape( (nbx,nby,nbz) )

array([[[1., 2., 3.],
        [1., 2., 3.]],

       [[4., 5., 6.],
        [4., 5., 6.]]])

In [23]:
nbx,nby,nbz = 2,4,3
T2_x_z = np.array([[1,2,3],[4,5,6]]) 
print( (np.kron(np.eye(nbx), np.kron(np.ones((nby,1) ), np.eye(nbz)) ) @ T2_x_z.flatten()).reshape((nbx,nby,nbz)))
print(T2_x_z[:,None,:]+np.zeros((nbx,nby,nbz)))

[[[1. 2. 3.]
  [1. 2. 3.]
  [1. 2. 3.]
  [1. 2. 3.]]

 [[4. 5. 6.]
  [4. 5. 6.]
  [4. 5. 6.]
  [4. 5. 6.]]]
[[[1. 2. 3.]
  [1. 2. 3.]
  [1. 2. 3.]
  [1. 2. 3.]]

 [[4. 5. 6.]
  [4. 5. 6.]
  [4. 5. 6.]
  [4. 5. 6.]]]
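In practice one rarely forms the Kronecker matrix just to expand a tensor. As a sketch of two alternatives, `np.broadcast_to` returns a read-only view without copying data, while `np.repeat` returns a writable copy. The names `T_x_z`, `expanded_view` and `expanded_copy` are illustrative.

```python
import numpy as np

nbx, nby, nbz = 2, 4, 3
T_x_z = np.array([[1, 2, 3], [4, 5, 6]])

# Read-only view: no data is copied
expanded_view = np.broadcast_to(T_x_z[:, None, :], (nbx, nby, nbz))
# Writable copy: the data is physically repeated along the new axis
expanded_copy = np.repeat(T_x_z[:, None, :], nby, axis=1)

print(expanded_view.shape)                           # (2, 4, 3)
print(np.array_equal(expanded_view, expanded_copy))  # True
print(expanded_view.flags.writeable)                 # False
```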


# Multiplication 

### Multiplication of tensors

There are several ways to multiply two tensors using NumPy. The most commonly used is the following.

In [24]:
A = np.ones((2,2))
B = 3*np.eye(2)
A@B #@ is left associative. If you have A@B@C, it will compute (A@B)@C

array([[3., 3.],
       [3., 3.]])

Note that `np.matmul(A,B)` would give the same result, but `np.matmul(np.matmul(A,B),C)` is more difficult to read than `A@B@C`.
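As an illustration of the other ways to multiply, the sketch below compares `@` with `np.dot` and `np.einsum` on two matrices, then applies `@` to 3-dimensional arrays, where it multiplies the matrices held in the last two axes, one pair per value of the leading index. All names and sizes are chosen here for the example.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

prod_at = A @ B
prod_einsum = np.einsum('ij,jk->ik', A, B)
print(np.allclose(prod_at, np.dot(A, B)), np.allclose(prod_at, prod_einsum))  # True True

# Stacks of matrices: one product per value of the first index
stack_A = np.arange(24.0).reshape(4, 2, 3)
stack_B = np.arange(48.0).reshape(4, 3, 4)
stack_prod = stack_A @ stack_B
print(stack_prod.shape)                                     # (4, 2, 4)
print(np.allclose(stack_prod[1], stack_A[1] @ stack_B[1]))  # True
```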

### Multiplication by a scalar

In [25]:
4*np.eye(2)

array([[4., 0.],
       [0., 4.]])

The above expression, like the definition of `B` in the previous section, is a multiplication by a scalar. It is the simplest form of broadcasting allowed by NumPy, and one of the reasons why this library is more powerful (and much faster) than plain lists. Broadcasting was covered in more detail earlier in this notebook.

# Sparse matrices in Scipy

Sparse matrices are available in the `sparse` module of the `scipy` library. 

In [26]:
import scipy.sparse as spr

In [27]:
n = 1000

print('size of sparse identity matrix of size '+str(n) +' in MB = ' + str(spr.identity(n).data.nbytes  / (1024**2)))  # bytes used by the stored values

print('size of dense identity matrix of size '+str(n) +' in MB  = ' + str(spr.identity(n).todense().nbytes  / (1024**2)))

size of sparse identity matrix of size 1000 in MB = 0.00762939453125
size of dense identity matrix of size 1000 in MB  = 7.62939453125


Working with sparse matrices requires less storage: while a dense matrix stores every coefficient (8 bytes each for `float64` entries), a sparse matrix only stores the nonzero coefficients, together with their positions. This is very convenient when working with very large matrices that have few nonzero entries.

In [28]:
spr.identity(1000).data.nbytes  , spr.identity(1000).todense().nbytes 

(8000, 8000000)
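A sparse matrix stores positions as well as values, so a fair memory comparison should count the index arrays too. The sketch below does this for the CSR format, which keeps three arrays (`data`, `indices`, `indptr`). The exact byte count depends on the integer type SciPy picks for the indices, so only the comparison is printed.

```python
import numpy as np
import scipy.sparse as spr

n = 1000
I_csr = spr.identity(n, format='csr')

# A CSR matrix stores three arrays: values, column indices, row pointers.
sparse_bytes = I_csr.data.nbytes + I_csr.indices.nbytes + I_csr.indptr.nbytes
dense_bytes = I_csr.toarray().nbytes

print(I_csr.nnz)                   # 1000 stored coefficients
print(dense_bytes)                 # 8000000
print(sparse_bytes < dense_bytes)  # True
```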

## Creating sparse matrices...

### ... with standard forms

In [29]:
I5 = spr.identity(5) # spr.eye(5) also works
I5

<5x5 sparse matrix of type '<class 'numpy.float64'>'
	with 5 stored elements (1 diagonals) in DIAgonal format>

You can convert your sparse matrix into a dense one in order to visualise it. 

In [30]:
I5.todense()

matrix([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])

### ... from a dense matrix

Let's create a dense matrix and make it sparse.

In [31]:
# import uniform module to create random numbers
from scipy.stats import uniform

In [32]:
np.random.seed(seed=42)
dense_matrix = uniform.rvs(size=16, loc = 0, scale=2) #List of 16 random draws between 0 and 2
dense_matrix = np.reshape(dense_matrix, (4, 4))
dense_matrix

array([[0.74908024, 1.90142861, 1.46398788, 1.19731697],
       [0.31203728, 0.31198904, 0.11616722, 1.73235229],
       [1.20223002, 1.41614516, 0.04116899, 1.9398197 ],
       [1.66488528, 0.42467822, 0.36364993, 0.36680902]])

In [33]:
dense_matrix[dense_matrix < 1] = 0 #Arbitrary criterion
dense_matrix

array([[0.        , 1.90142861, 1.46398788, 1.19731697],
       [0.        , 0.        , 0.        , 1.73235229],
       [1.20223002, 1.41614516, 0.        , 1.9398197 ],
       [1.66488528, 0.        , 0.        , 0.        ]])

In [34]:
sparse_matrix = spr.csr_matrix(dense_matrix)
print(sparse_matrix) #Prints the (row, column) position of each nonzero coefficient, followed by its value.

  (0, 1)	1.9014286128198323
  (0, 2)	1.4639878836228102
  (0, 3)	1.1973169683940732
  (1, 3)	1.7323522915498704
  (2, 0)	1.2022300234864176
  (2, 1)	1.416145155592091
  (2, 3)	1.9398197043239886
  (3, 0)	1.6648852816008435


### ... from scratch

You can create two arrays containing respectively the row indices and the column indices of the nonzero coefficients.
A third array gives the values of these coefficients. The result is as follows:

In [35]:
# row indices
row_ind = np.array([0, 1, 1, 3, 4])
# column indices
col_ind = np.array([0, 2, 4, 3, 4])
# coefficients
data = np.array([1, 2, 3, 4, 5], dtype=float)

mat_coo = spr.coo_matrix((data, (row_ind, col_ind)))
print(mat_coo)

  (0, 0)	1.0
  (1, 2)	2.0
  (1, 4)	3.0
  (3, 3)	4.0
  (4, 4)	5.0
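The coordinate (COO) format is convenient for assembling a matrix, while CSR is better suited to arithmetic. The sketch below, on a small hypothetical example with the shape passed explicitly, converts one into the other. Note that a position listed twice in COO format is summed during the conversion.

```python
import numpy as np
import scipy.sparse as spr

rows = np.array([0, 1, 1, 1])
cols = np.array([0, 2, 2, 0])
vals = np.array([1.0, 2.0, 3.0, 4.0])   # entry (1, 2) appears twice

coo = spr.coo_matrix((vals, (rows, cols)), shape=(2, 3))
csr = coo.tocsr()   # duplicates are summed during the conversion

print(csr.toarray())
# [[1. 0. 0.]
#  [4. 0. 5.]]
print(csr.indptr)   # [0 1 3]: row r occupies data[indptr[r]:indptr[r+1]]
```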


The common operations seen above (addition, matrix product, Kronecker product) also work with sparse matrices.

In [36]:
I5 = spr.identity(5)
I5 + np.ones((5,5))

matrix([[2., 1., 1., 1., 1.],
        [1., 2., 1., 1., 1.],
        [1., 1., 2., 1., 1.],
        [1., 1., 1., 2., 1.],
        [1., 1., 1., 1., 2.]])

In [37]:
I5 + np.diag([1.,2.,3.,4.,5.])

matrix([[2., 0., 0., 0., 0.],
        [0., 3., 0., 0., 0.],
        [0., 0., 4., 0., 0.],
        [0., 0., 0., 5., 0.],
        [0., 0., 0., 0., 6.]])

In [38]:
I5 @ np.diag([1.,2.,3.,4.,5.])

array([[1., 0., 0., 0., 0.],
       [0., 2., 0., 0., 0.],
       [0., 0., 3., 0., 0.],
       [0., 0., 0., 4., 0.],
       [0., 0., 0., 0., 5.]])

In [39]:
kron_product = spr.kron(I5 , 10 * np.array([[1,2],[3,4]]))

In [40]:
kron_product.todense()

matrix([[10., 20.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [30., 40.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0., 10., 20.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0., 30., 40.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0., 10., 20.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0., 30., 40.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0., 10., 20.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0., 30., 40.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., 10., 20.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., 30., 40.]])

# Torch

In [41]:
import torch

In [42]:
A_i_j = torch.tensor( [[1,2],[3,4],[5,6]])
A_i_j

tensor([[1, 2],
        [3, 4],
        [5, 6]])

In [43]:
A_i_j[:,1]

tensor([2, 4, 6])

In [44]:
B_i_j = torch.tensor( [[2,1],[6,2],[4,3]])
A_i_j + B_i_j

tensor([[3, 3],
        [9, 6],
        [9, 9]])

# Automatic differentiation

Assume you want to compute the value of $y = (x_0+1)^2+2x_1$ when $(x_0,x_1)=(2,1)$. In order to do that, torch builds a computational graph, here a tree, which is represented as:

y=sum<br>
├──&nbsp;**2<br>
│&nbsp;&nbsp;&nbsp;└──&nbsp;sum<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;├──&nbsp;1<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└──&nbsp;x_0<br>
└──&nbsp;mul<br>
&nbsp;&nbsp;&nbsp;&nbsp;├──&nbsp;2<br>
&nbsp;&nbsp;&nbsp;&nbsp;└──&nbsp;x_1<br>


To compute the *value* of $y$, we do *forward propagation*: we input the values of $x_0$ and of $x_1$ at the leaves, and we move up the tree towards the root. This is done by:

In [45]:
x_i = torch.tensor([2.0,1.0])
y = (x_i[0]+1)**2 + 2 * x_i[1]
y

tensor(11.)

However, in order to compute the derivatives of $y$ with respect to the entries of $x$, we start at the root, and we descend the tree towards the leaves. This can be done automatically in `torch` -- you simply need to add one parameter `requires_grad = True` when setting up the tensor. 

In [46]:
x_i =  torch.tensor([2.0,1.0],requires_grad = True)
y = (x_i[0]+1)**2+2 * x_i[1]
y.backward()
print(x_i.grad)

tensor([6., 2.])


Here the `backward` command computed the gradient, which is stored in a `grad` attribute. Here is a glimpse of how this network is actually organized:

In [47]:
print(y.grad_fn.next_functions)
print(y.grad_fn.next_functions[0][0].next_functions)
# etc.

((<PowBackward0 object at 0x0000020F76AB7D30>, 0), (<MulBackward0 object at 0x0000020F76AB7C40>, 0))
((<AddBackward0 object at 0x0000020F76AB7FA0>, 0),)
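To see what the forward and backward passes amount to on this tree, here is a hand-written sketch in plain Python, with no torch involved. Each node's value is computed from the leaves up, then the derivative of $y$ with respect to each node is propagated from the root down using the chain rule. The intermediate names (`s`, `p`, `m`) are introduced here for illustration.

```python
# Forward pass: evaluate each node of the tree, from the leaves to the root.
x0, x1 = 2.0, 1.0
s = x0 + 1.0        # inner 'sum' node
p = s ** 2          # '**2' node
m = 2.0 * x1        # 'mul' node
y = p + m           # root 'sum' node

# Backward pass: propagate dy/d(node) from the root to the leaves.
dy_dy = 1.0
dy_dp = dy_dy * 1.0       # y = p + m
dy_dm = dy_dy * 1.0
dy_ds = dy_dp * 2.0 * s   # p = s**2
dy_dx0 = dy_ds * 1.0      # s = x0 + 1
dy_dx1 = dy_dm * 2.0      # m = 2 * x1

print(y, (dy_dx0, dy_dx1))  # 11.0 (6.0, 2.0)
```

The values agree with the `tensor(11.)` and `tensor([6., 2.])` obtained with torch above.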


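We can build slightly more complicated functions, such as a log-sum-exp expression of the kind that appears in logistic regression. Note that `backward` *accumulates* gradients into the `grad` attribute, so we reset it first; otherwise the new gradient would be added to the one computed in the previous cell.

In [48]:
x_i.grad.zero_()  # reset the gradient accumulated by the previous call to backward
f = x_i.sum() - torch.log( torch.exp(x_i).sum())
f.backward()
print(x_i.grad)

tensor([0.2689, 0.7311])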


Here is how we can use models from Torch to run a linear regression. 

In [49]:
import torch.optim as optim
import torch.nn as nn


X = torch.tensor([[1.0], [2.0], [3.0]])
Y = torch.tensor([[2.0], [4.0], [6.0]])

# Define a simple regression model
model = nn.Linear(1, 1)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(500):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, Y)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 50 == 0:
        print(f'Epoch [{epoch}/500], Loss: {loss.item()}')


Epoch [0/500], Loss: 2.1106984615325928
Epoch [50/500], Loss: 0.1719595342874527
Epoch [100/500], Loss: 0.1351642906665802
Epoch [150/500], Loss: 0.10625144094228745
Epoch [200/500], Loss: 0.08352325111627579
Epoch [250/500], Loss: 0.06565685570240021
Epoch [300/500], Loss: 0.05161232873797417
Epoch [350/500], Loss: 0.04057201370596886
Epoch [400/500], Loss: 0.03189336135983467
Epoch [450/500], Loss: 0.02507101185619831


Now, logistic regression:

In [50]:
import torch
import torch.nn as nn
import torch.optim as optim

# Prepare the dataset
# Features (X) and Labels (y)
# Assume we have 3 features for each instance and 3 classes
X = torch.tensor([[1.0, 2.0, 3.0], 
                  [1.0, 3.0, 2.0], 
                  [4.0, 5.0, 6.0], 
                  [6.0, 5.0, 4.0]], requires_grad=True)  # example features

y = torch.tensor([0, 1, 2, 2])  # example labels (3 classes: 0, 1, and 2)

# Build the Logistic Regression Model
class SoftmaxRegressionModel(nn.Module):
    def __init__(self, input_size, num_classes):
        super(SoftmaxRegressionModel, self).__init__()
        self.linear = nn.Linear(input_size, num_classes)  # Output size is num_classes

    def forward(self, x):
        out = self.linear(x)  # No softmax here, PyTorch's cross-entropy function includes it
        return out

input_size = X.shape[1]  # Number of input features
num_classes = 3  # Number of output classes
model = SoftmaxRegressionModel(input_size, num_classes)

# Define the Loss Function and Optimizer
criterion = nn.CrossEntropyLoss()  # This includes softmax
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training the Model
num_epochs = 1000  # Number of iterations

for epoch in range(num_epochs):
    # Forward pass: Compute predicted y by passing x to the model
    outputs = model(X)

    # Compute loss
    loss = criterion(outputs, y)

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')

# Making Predictions
with torch.no_grad():  # We don't need gradients for making predictions
    new_data = torch.tensor([[2.0, 3.0, 4.0]])  # New data point with 3 features
    outputs = model(new_data)
    _, predicted = torch.max(outputs.data, 1)
    print(f'Predicted class for input {new_data}: {predicted.item()}')

for param_tensor in model.parameters():
    print(param_tensor)

Epoch [100/1000], Loss: 0.8159709572792053
Epoch [200/1000], Loss: 0.6501827239990234
Epoch [300/1000], Loss: 0.5333539247512817
Epoch [400/1000], Loss: 0.4475764334201813
Epoch [500/1000], Loss: 0.38273656368255615
Epoch [600/1000], Loss: 0.33249256014823914
Epoch [700/1000], Loss: 0.29270243644714355
Epoch [800/1000], Loss: 0.26058661937713623
Epoch [900/1000], Loss: 0.23423263430595398
Epoch [1000/1000], Loss: 0.21229243278503418
Predicted class for input tensor([[2., 3., 4.]]): 0
Parameter containing:
tensor([[-6.9156e-01, -6.4423e-01,  9.4417e-01],
        [-6.3537e-01,  1.0929e+00, -6.5314e-01],
        [ 1.0754e+00, -2.5664e-04, -2.4126e-01]], requires_grad=True)
Parameter containing:
tensor([ 0.4769,  0.3183, -0.4636], requires_grad=True)
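The comment in the code above notes that the cross-entropy criterion includes the softmax. As a sketch of what that loss computes with its default settings (the mean over observations of minus the log of the softmax probability of the correct class), here is a NumPy version. The function name `cross_entropy` and the test scores are hypothetical, chosen so that the expected value is easy to check by hand.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean of -log softmax(logits)[label] over the rows of `logits`."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.zeros((4, 3))            # hypothetical scores: all classes tied
labels = np.array([0, 1, 2, 2])
loss = cross_entropy(logits, labels)
print(np.isclose(loss, np.log(3)))   # True: each class has probability 1/3
```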


In [51]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
skl_model = LogisticRegression(multi_class='multinomial', solver='lbfgs',fit_intercept=False,penalty=None)
skl_model.fit(X.detach().numpy(), y)
print(skl_model.coef_)

[[-6.15391882 -3.93751403  7.07746034]
 [-4.68446332  8.37413249 -5.18121697]
 [10.83838214 -4.43661846 -1.89624337]]


In [54]:
from anytree import Node, RenderTree

root = Node('sum',parent=None)
square = Node('**2',parent= root)
sum_n = Node('sum',parent=square)
one = Node('1',parent= sum_n)
x0 = Node('x_0',parent = sum_n)
mul = Node('mul',parent = root)
two = Node('2', parent = mul)
x1 = Node('x_1',parent = mul)

for pre, fill, node in RenderTree(root):
    print("%s%s" % (pre, node.name))


sum
├── **2
│   └── sum
│       ├── 1
│       └── x_0
└── mul
    ├── 2
    └── x_1
