In [1]:
using LinearAlgebra

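Consider an ordinary linear regression task with an $m \times n$ feature matrix $A$ and an output vector $b$, where we want to learn the weights $w$. The least squares solution ($\underset{w}{\text{min}}||b-Aw||^2$) of this task is $w = (A^T A)^{-1} A^T b$. This closed form is valid, and the minimizer is unique, only if $A^T A$ is invertible. For this $n \times n$ matrix to be invertible, rank($A$) must be $n$, i.e. the columns of $A$ must be linearly independent. Two examples with $m>n$ are given below. In the first, $A$ has full column rank. In the second, the third column is the sum of the first two, so the rank drops to 2 and the inversion fails.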

In [10]:
A = [1 0 1; 0 1 1; 1 0 1; 2 1.5 3]

4×3 Array{Float64,2}:
 1.0  0.0  1.0
 0.0  1.0  1.0
 1.0  0.0  1.0
 2.0  1.5  3.0

In [11]:
inv(transpose(A)*A)

3×3 Array{Float64,2}:
  17.5   16.0  -19.0
  16.0   16.0  -18.0
 -19.0  -18.0   21.0

In [12]:
A = [1 0 1; 0 1 1; 1 0 1; 2 1.5 3.5]

4×3 Array{Float64,2}:
 1.0  0.0  1.0
 0.0  1.0  1.0
 1.0  0.0  1.0
 2.0  1.5  3.5

In [13]:
inv(transpose(A)*A)

SingularException: SingularException(3)
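
The rank condition can be checked directly before attempting the inversion. The following is a minimal sketch, not part of the original cells: the names `A_full`, `A_def`, `w_normal` and `w_qr` and the output vector `b` are assumptions introduced only for illustration. It compares the rank of the two matrices above and, for the full-rank one, the normal-equation solution with the result of Julia's backslash solver.

```julia
using LinearAlgebra

# Full column rank: the third column is not a combination of the first two.
A_full = [1 0 1; 0 1 1; 1 0 1; 2 1.5 3]
# Rank deficient: the third column equals the sum of the first two.
A_def = [1 0 1; 0 1 1; 1 0 1; 2 1.5 3.5]

println(rank(A_full))  # 3, so transpose(A_full)*A_full is invertible
println(rank(A_def))   # 2, so transpose(A_def)*A_def is singular

# Hypothetical outputs, chosen only for illustration.
b = [1.0, 2.0, 3.0, 4.0]

# Normal-equation solution, solved as a linear system instead of forming the inverse.
w_normal = (transpose(A_full) * A_full) \ (transpose(A_full) * b)
# Backslash on the rectangular matrix returns the least squares solution directly.
w_qr = A_full \ b

println(isapprox(w_normal, w_qr; atol=1e-8))
```

Solving the linear system with `\` rather than calling `inv` is usually preferred numerically, although both express the same formula.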

When $m<n$, rank($A$) is at most $m$, which is less than $n$, so the condition cannot be satisfied. Intuitively, this means that more than one $w$ minimizes the objective. Therefore, an additional criterion is needed to favor some solutions over others. This is achieved by adding a regularization term, and the appropriate criterion differs from application to application.
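
The non-uniqueness can be seen numerically. The sketch below assumes a small hypothetical problem with $m=2$ and $n=3$ (the matrix, the vector and the names `w1`, `w2`, `N` are illustrative, not from the examples in this notebook). It takes one minimizer from the pseudoinverse and builds a second one by adding a null-space vector of $A$, which leaves $Aw$ unchanged.

```julia
using LinearAlgebra

# Hypothetical underdetermined problem: m = 2 observations, n = 3 weights.
A = [1.0 0.0 1.0; 0.0 1.0 1.0]
b = [1.0, 2.0]

# One minimizer: the minimum-norm solution given by the pseudoinverse.
w1 = pinv(A) * b

# Any vector in the null space of A can be added without changing A*w.
N = nullspace(A)          # columns form a basis of {z : A*z = 0}
w2 = w1 + 5.0 * N[:, 1]   # a different weight vector

println(norm(A * w1 - b))            # residual of the first solution
println(norm(A * w2 - b))            # same residual, up to rounding
println(norm(w1), " vs ", norm(w2))  # but the weights differ in size
```

Both vectors fit the data equally well, so the objective alone cannot choose between them; a criterion such as a small norm of $w$ does.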

A simple example is a penalty on the 2-norm of $w$: $\underset{w}{\text{min}}||Aw-b||^2 + \alpha ||w||^2$ for $\alpha > 0$. Taking the gradient of the objective w.r.t. $w$ and setting it to the zero vector, $0 = 2 A^T A w - 2 A^T b + 2 \alpha w$, leads to $w = (A^T A + \alpha I)^{-1} A^T b$. Adding $\alpha I$ forces the regularized Gram matrix $A^T A + \alpha I$ to be positive definite, and hence invertible, since $x^T (A^T A + \alpha I) x = ||Ax||^2 + \alpha ||x||^2 > 0$ for every $x \neq 0$. An example is given below:

In [41]:
A = randn(4,6)
b = randn(4)
alfa = 0.1
# alfa*I adds alfa to every diagonal entry of the Gram matrix
w = inv(transpose(A)*A + alfa*I)*transpose(A)*b

6-element Array{Float64,1}:
  2.0936845085708597
 -1.9215995697441337
 -0.5548507046716599
 -2.215379891904948 
  0.8435287188611905
  3.4535147912784514
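
The properties used in the derivation can be verified on a random instance. This is a minimal sketch under the same setup ($4 \times 6$ random $A$, $\alpha = 0.1$); the names `alpha`, `G` and `grad` are introduced here for illustration. It checks that $A$ alone cannot have rank $n$, that the regularized Gram matrix is positive definite, and that the gradient of the objective vanishes at the computed $w$.

```julia
using LinearAlgebra

A = randn(4, 6)
b = randn(4)
alpha = 0.1  # regularization strength; any positive value works here

# Regularized Gram matrix; I is the identity (UniformScaling) from LinearAlgebra.
G = transpose(A) * A + alpha * I
w = G \ (transpose(A) * b)

# Gradient of ||Aw - b||^2 + alpha*||w||^2 at w; it should vanish at the minimizer.
grad = 2 * transpose(A) * (A * w - b) + 2 * alpha * w

println(rank(A))                  # at most 4, which is less than n = 6
println(isposdef(Symmetric(G)))   # positive definite once alpha*I is added
println(norm(grad))               # close to zero
```

Because `A` and `b` are drawn with `randn`, the printed numbers change from run to run, but the three properties hold for every draw.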