In [1]:
import numpy as np
import pandas as pd

## 1. Dimensionality Reduction
### Part A) 
Here is a table of 1-5 star ratings for five movies (M, N, P, Q, R) by three raters (A, B, C).
<table>
    <thead><th>Rater</th><th>M</th><th>N</th><th>P</th><th>Q</th><th>R</th></thead>
    <tbody>
        <tr><td>A</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>
        <tr><td>B</td><td>2</td><td>3</td><td>2</td><td>5</td><td>3</td></tr>
        <tr><td>C</td><td>5</td><td>5</td><td>5</td><td>3</td><td>2</td></tr>
    </tbody>
</table>
Normalize the ratings by subtracting the average for each row and then subtracting the average for each column in the resulting table.

### Create a DataFrame for the table

In [2]:
df = pd.DataFrame(
    {"M": [1, 2, 5], "N": [2, 3, 5], "P": [3, 2, 5], "Q": [4, 5, 3], "R": [5, 3, 2]}
)

Row-wise average

In [3]:
df.mean(axis=1)

0    3.0
1    3.0
2    4.0
dtype: float64

Normalize by subtracting row-wise mean from each element

In [4]:
df = df.sub(df.mean(axis=1), axis=0)
df

     M    N    P    Q    R
0 -2.0 -1.0  0.0  1.0  2.0
1 -1.0  0.0 -1.0  2.0  0.0
2  1.0  1.0  1.0 -1.0 -2.0


Column-wise average

In [5]:
df.mean(axis=0)

M   -0.666667
N    0.000000
P    0.000000
Q    0.666667
R    0.000000
dtype: float64

Normalize by subtracting column mean from each element

In [6]:
df = df.sub(df.mean(axis=0), axis=1)
df

          M    N    P         Q    R
0 -1.333333 -1.0  0.0  0.333333  2.0
1 -0.333333  0.0 -1.0  1.333333  0.0
2  1.666667  1.0  1.0 -1.666667 -2.0
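The two normalization steps above can be folded into one helper. This is a minimal sketch: the name `double_center` and the rater labels used as the index are assumptions introduced here, not part of the original cells. It also checks that every row mean and every column mean is zero after both steps.

```python
import numpy as np
import pandas as pd


def double_center(ratings):
    """Subtract each row's mean, then each column's mean of the result."""
    row_centered = ratings.sub(ratings.mean(axis=1), axis=0)
    return row_centered.sub(row_centered.mean(axis=0), axis=1)


ratings = pd.DataFrame(
    {"M": [1, 2, 5], "N": [2, 3, 5], "P": [3, 2, 5], "Q": [4, 5, 3], "R": [5, 3, 2]},
    index=["A", "B", "C"],  # rater labels (assumed here for readability)
)
normalized = double_center(ratings)
print(normalized.round(3))

# After the second step both sets of means vanish (up to floating-point error)
print(np.allclose(normalized.mean(axis=0), 0))
print(np.allclose(normalized.mean(axis=1), 0))
```

The row means stay at zero after the column step because the column means of a row-centered table always sum to zero.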


### Part B) 
This is a table giving the profile of three items:
<table>
    <tr><td>A</td><td>1</td><td>0</td><td>1</td><td>0</td><td>1</td><td>2</td></tr>
    <tr><td>B</td><td>1</td><td>1</td><td>0</td><td>0</td><td>1</td><td>6</td></tr>
    <tr><td>C</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>2</td></tr>
</table>
The first five attributes are Boolean, and the last is an integer "rating." Assume that the scale factor for the rating is α. Compute, as a function of α, the cosine distances between each pair of profiles. For each of α = 0, 0.5, 1, and 2, determine the cosine of the angle between each pair of vectors.

In [7]:
def cal_len(vec):
    # Euclidean (L2) length of a vector
    return np.sqrt(np.sum(np.square(vec)))


def dot_prod(vec1, vec2):
    return np.dot(vec1, vec2)


def cosine(vec1, vec2):
    # Cosine of the angle: dot product divided by the product of the lengths
    len_mult = cal_len(vec1) * cal_len(vec2)
    return dot_prod(vec1, vec2) / len_mult

In [8]:
for alpha in [0, 0.5, 1, 2]:
    arr = np.array(
        [
            [1, 0, 1, 0, 1, 2 * alpha],
            [1, 1, 0, 0, 1, 6 * alpha],
            [0, 1, 0, 1, 0, 2 * alpha],
        ]
    )
    print("\n", "*" * 10, "Scale factor:", alpha, "*" * 10)
    print("Cosine of angle between A and B", cosine(arr[0], arr[1]))
    print("Cosine of angle between A and C", cosine(arr[0], arr[2]))
    print("Cosine of angle between B and C", cosine(arr[1], arr[2]))


 ********** Scale factor: 0 **********
Cosine of angle between A and B 0.6666666666666667
Cosine of angle between A and C 0.0
Cosine of angle between B and C 0.40824829046386296

 ********** Scale factor: 0.5 **********
Cosine of angle between A and B 0.7216878364870323
Cosine of angle between A and C 0.2886751345948129
Cosine of angle between B and C 0.6666666666666667

 ********** Scale factor: 1 **********
Cosine of angle between A and B 0.8473185457363233
Cosine of angle between A and C 0.6172133998483676
Cosine of angle between B and C 0.8498365855987975

 ********** Scale factor: 2 **********
Cosine of angle between A and B 0.9460945407607455
Cosine of angle between A and C 0.8651809126974003
Cosine of angle between B and C 0.9525793444156805
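The question also asks for the cosines as a function of α. Expanding the dot products and lengths of A = [1, 0, 1, 0, 1, 2α], B = [1, 1, 0, 0, 1, 6α], and C = [0, 1, 0, 1, 0, 2α] by hand gives closed forms, sketched below. The function names `cos_ab`, `cos_ac`, and `cos_bc` are introduced here for illustration only.

```python
import numpy as np


def cos_ab(alpha):
    # A.B = 2 + 12a^2, |A|^2 = 3 + 4a^2, |B|^2 = 3 + 36a^2
    a2 = alpha ** 2
    return (2 + 12 * a2) / np.sqrt((3 + 4 * a2) * (3 + 36 * a2))


def cos_ac(alpha):
    # A.C = 4a^2, |A|^2 = 3 + 4a^2, |C|^2 = 2 + 4a^2
    a2 = alpha ** 2
    return (4 * a2) / np.sqrt((3 + 4 * a2) * (2 + 4 * a2))


def cos_bc(alpha):
    # B.C = 1 + 12a^2, |B|^2 = 3 + 36a^2, |C|^2 = 2 + 4a^2
    a2 = alpha ** 2
    return (1 + 12 * a2) / np.sqrt((3 + 36 * a2) * (2 + 4 * a2))


for alpha in [0, 0.5, 1, 2]:
    print(alpha, cos_ab(alpha), cos_ac(alpha), cos_bc(alpha))
```

All three cosines approach 1 as α grows, since the rating component then dominates every vector.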


### Part C
In this question, all columns will be written in their transposed form, as rows, to make the typography simpler. Matrix M has three rows and two columns, and the columns form an orthonormal basis. One of the columns is [2/7,3/7,6/7]. There are many options for the second column [x,y,z]. Write down those constraints on x, y, and z. Then, identify in the list below the one column that could be [x,y,z]. All components are computed to three decimal places, so the constraints may be satisfied only to a close approximation.

#### Constraints for orthonormality

u = [2/7, 3/7, 6/7]

v = [x, y, z]

The columns must be orthogonal, so the dot product u · v must be zero: 2x + 3y + 6z = 0.

The second column must have unit length: x² + y² + z² = 1.

In [9]:
u = np.array([2 / 7, 3 / 7, 6 / 7])

In [10]:
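def find_orthoonal_vector(input_vec):
    x = np.random.randn(3)  # take a random vector
    # Subtract the projection of x onto input_vec; dividing by
    # input_vec.dot(input_vec) keeps this correct for non-unit inputs
    x -= x.dot(input_vec) / input_vec.dot(input_vec) * input_vec
    return x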


def find_random__orothgonal_unit_vectors(input_vec):
    x = find_orthoonal_vector(input_vec)
    x /= np.linalg.norm(x)
    print(np.round(x))
    return x


for i in range(10):
    v = find_random__orothgonal_unit_vectors(u)
    print(
        "Dot product:", np.round(u.dot(v)), "Length:", np.round(np.linalg.norm(v)), "\n"
    )

[-0.  1. -0.]
Dot product: 0.0 Length: 1.0 

[-1. -0.  0.]
Dot product: 0.0 Length: 1.0 

[ 1.  1. -1.]
Dot product: -0.0 Length: 1.0 

[ 0.  1. -0.]
Dot product: -0.0 Length: 1.0 

[ 1. -1.  0.]
Dot product: 0.0 Length: 1.0 

[-1.  1.  0.]
Dot product: 0.0 Length: 1.0 

[-1.  1. -0.]
Dot product: 0.0 Length: 1.0 

[ 0.  1. -1.]
Dot product: -0.0 Length: 1.0 

[ 1.  0. -0.]
Dot product: 0.0 Length: 1.0 

[ 0. -1.  0.]
Dot product: 0.0 Length: 1.0 
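Since the question asks which listed column could be [x, y, z], it may be more direct to test candidates against the two constraints with a tolerance that matches three-decimal rounding. The sketch below assumes a helper name (`could_be_second_column`) and candidate vectors that are purely hypothetical; they are not the options from the original question.

```python
import numpy as np

u = np.array([2 / 7, 3 / 7, 6 / 7])


def could_be_second_column(v, u, tol=1e-3):
    """Return True if v is orthogonal to u and has unit length, within tol."""
    v = np.asarray(v, dtype=float)
    is_orthogonal = abs(u.dot(v)) < tol
    is_unit = abs(np.linalg.norm(v) - 1) < tol
    return bool(is_orthogonal and is_unit)


# Hypothetical candidates, chosen only to exercise both constraints
candidates = [
    [6 / 7, 2 / 7, -3 / 7],  # orthogonal to u and unit length
    [1.0, 0.0, 0.0],         # unit length but not orthogonal to u
    [0.6, 0.0, -0.2],        # orthogonal to u but not unit length
]
for v in candidates:
    print(np.round(v, 3), could_be_second_column(v, u))
```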



### Part D) 
Suppose we have three points in a two dimensional space: (1,1), (2,2), and (3,4). We want to perform PCA on these points, so we construct a 2-by-2 matrix whose eigenvectors are the directions that best represent these three points. Construct this matrix.

In [11]:
const = 0.01  # convergence threshold for the change between iterates


def calculate_eignvalue(M, Xk):
    # Power iteration: repeatedly multiply by M and normalize
    for i in range(5):
        MXk = np.matmul(M, Xk)
        frob_norm = np.linalg.norm(MXk)  # Frobenius norm of M @ Xk
        XkPlus1 = MXk / frob_norm
        if np.linalg.norm(Xk - XkPlus1) < const:
            break
        Xk = XkPlus1

    Xk = XkPlus1
    # Estimate the eigenvalue as x^T M x for the unit vector x
    lambd = np.matmul(np.matmul(np.transpose(Xk), M), Xk)
    return Xk, np.round(lambd, 3)


def calculate_MStar(M, lam, x):
    # Deflation: subtract the contribution of the eigenpair just found
    mult = lam * np.matmul(x, np.transpose(x))
    return M - mult


# M = np.array([[3, 2], [2, 6]])
# Xk = np.array([[1], [1]])
# Xk, lam = calculate_eignvalue(M, Xk)
# print("EigenVector:\n", Xk, "\nEginValue:", lam)
# mStar = calculate_MStar(M, lam, Xk)
## print(mStar)
# Xk, lam = calculate_eignvalue(mStar, Xk)
# print("EigenVector:\n", Xk, "\nEginValue:", lam)

In [12]:
# M = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
# MtM = np.matmul(np.transpose(M), M)
# print("MtM", MtM)
# ident = np.array([[1], [1]])
# Xk, lam = calculate_eignvalue(MtM, Xk)
# print("EigenVector:\n", Xk, "\nEginValue:", lam)
# mStar = calculate_MStar(MtM, lam, Xk)
# Xk, lam = calculate_eignvalue(mStar, ident)
# print("EigenVector:\n", Xk, "\nEginValue:", lam)
# print("*************************************")
M = np.array([[1, 1], [2, 2], [3, 4]])
MtM = np.matmul(np.transpose(M), M)
print("MtM", MtM)
ident = np.array([[1], [1]])  # starting vector of ones for power iteration
Xk, lam = calculate_eignvalue(MtM, ident)
print("EigenVector:\n", Xk, "\nEigenValue:", lam)
mStar = calculate_MStar(MtM, lam, Xk)
Xk, lam = calculate_eignvalue(mStar, ident)
print("EigenVector:\n", Xk, "\nEigenValue:", lam)

MtM [[14 17]
 [17 21]]
EigenVector:
 [[0.63180316]
 [0.77512887]] 
EigenValue: [[34.857]]
EigenVector:
 [[ 0.77486565]
 [-0.63212595]] 
EigenValue: [[0.143]]
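The matrix the question asks for is MᵀM = [[14, 17], [17, 21]], built from the points without mean-centering, as in the cell above. As a cross-check on the power-iteration results, the sketch below uses `np.linalg.eigh`, which applies because MᵀM is symmetric. The sign of each eigenvector is arbitrary and may differ from the power-iteration output.

```python
import numpy as np

points = np.array([[1, 1], [2, 2], [3, 4]])
MtM = points.T @ points  # 2-by-2 matrix whose eigenvectors give the principal directions

# eigh returns eigenvalues in ascending order for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(MtM)
order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first

for lam, vec in zip(eigenvalues[order], eigenvectors[:, order].T):
    print("Eigenvalue:", np.round(lam, 3), "Eigenvector:", np.round(vec, 3))
```

The eigenvalues should sum to the trace of MᵀM (35) and multiply to its determinant (5), which is a quick sanity check on either method.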


### Part E) 
Identify the vector that is orthogonal to the vector [1,2,3].

In [13]:
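vec = np.array([1, 2, 3])
# Pass a unit-length vector so the projection step removes the full
# component along vec
orth = find_orthoonal_vector(vec / np.linalg.norm(vec))
print("Orthogonal Vector:", np.round(orth, 3))
print("Normalized orthogonal Vector:", np.round(orth / np.linalg.norm(orth), 3))
print("Dot product with vec:", np.round(vec.dot(orth), 3))

The printed vector differs on every run because the starting vector is random; the dot product with vec should be 0 up to rounding.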
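A deterministic alternative to the random construction is the cross product, which always yields a vector orthogonal to both of its inputs. This is a sketch under the assumption that any fixed helper axis not parallel to [1, 2, 3] will do; the axis [1, 0, 0] is an arbitrary choice made here.

```python
import numpy as np

vec = np.array([1, 2, 3])

# Cross product with a helper axis that is not parallel to vec
orth_1 = np.cross(vec, [1, 0, 0])
# A second direction, orthogonal to both vec and orth_1
orth_2 = np.cross(vec, orth_1)

print("orth_1:", orth_1, "dot with vec:", vec.dot(orth_1))
print("orth_2:", orth_2, "dot with vec:", vec.dot(orth_2))
```

Every vector orthogonal to [1, 2, 3] is a linear combination of `orth_1` and `orth_2`, so a candidate answer can be checked either by its dot product with `vec` or by expressing it in this basis.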


### Part F) 
Consider the diagonal matrix M =

1	0	0<br/>
0	2	0<br/>
0	0	0<br/>
Compute its Moore-Penrose pseudoinverse.


To compute Σ⁺, the Moore-Penrose pseudoinverse of a diagonal matrix Σ: if the ith diagonal element of Σ is σ ≠ 0, replace it by 1/σ; if the ith element is 0, leave it as 0.

ANS: 
<table>
    <tr><td>1</td><td>0</td><td>0</td></tr>
    <tr><td>0</td><td>1/2</td><td>0</td></tr>
    <tr><td>0</td><td>0</td><td>0</td></tr>
   </table>
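The rule for diagonal matrices can be sketched in a few lines and compared against NumPy's general-purpose `np.linalg.pinv`. The variable names below are introduced here for illustration.

```python
import numpy as np

M = np.diag([1.0, 2.0, 0.0])
diag = np.diag(M)  # extract the diagonal entries

# Invert the nonzero entries and leave the zeros as zeros
inv_diag = np.divide(1.0, diag, out=np.zeros_like(diag), where=diag != 0)
M_pinv = np.diag(inv_diag)

print(M_pinv)
# Compare with NumPy's SVD-based pseudoinverse
print(np.allclose(M_pinv, np.linalg.pinv(M)))
```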



## Problem 2: 
Exercise 11.3.2 : Use the SVD from Fig. 11.7. Suppose Leslie assigns rating 3
to Alien and rating 4 to Titanic, giving us a representation of Leslie in “movie
space” of [0, 3, 0, 0, 4]. Find the representation of Leslie in concept space. What
does that representation predict about how well Leslie would like the other
movies appearing in our example data?

Ratings after adding Leslie's ratings

In [14]:
df = pd.DataFrame(
    {
        "User": ["Joe", "Jim", "John", "Jack", "Jill", "Jenny", "Jane", "Leslie"],
        "Matrix": [1, 3, 4, 5, 0, 0, 0, 0],
        "Alien": [1, 3, 4, 5, 0, 0, 0, 3],
        "Star Wars": [1, 3, 4, 5, 0, 0, 0, 0],
        "Casablanca": [0, 0, 0, 0, 4, 5, 2, 0],
        "Titanic": [0, 0, 0, 0, 4, 5, 2, 4],
    }
)
df.set_index("User", drop=True, inplace=True)
df

        Matrix  Alien  Star Wars  Casablanca  Titanic
User                                                 
Joe          1      1          1           0        0
Jim          3      3          3           0        0
John         4      4          4           0        0
Jack         5      5          5           0        0
Jill         0      0          0           4        4
Jenny        0      0          0           5        5
Jane         0      0          0           2        2
Leslie       0      3          0           0        4


We can map Leslie into “concept space” by multiplying Leslie's rating vector by the matrix V
of the decomposition.

In [15]:
# Each row holds one concept's movie weights (the transpose of V from Fig. 11.7)
V = np.array([[0.58, 0.58, 0.58, 0, 0], [0, 0, 0, 0.71, 0.71]])
q = np.array([[0, 3, 0, 0, 4]])  # Leslie's ratings in movie space
print("V:\n", V)
print("q:\n", q)
# Map Leslie into concept space
leslie_concept = np.matmul(q, np.transpose(V))
print("qV", leslie_concept)
# Map the concept-space representation back into movie space
movie_space_leslie = np.matmul(leslie_concept, V)
print("leslie_movie_space:\n", movie_space_leslie)

V:
 [[0.58 0.58 0.58 0.   0.  ]
 [0.   0.   0.   0.71 0.71]]
q:
 [[0 3 0 0 4]]
qV [[1.74 2.84]]
leslie_movie_space:
 [[1.0092 1.0092 1.0092 2.0164 2.0164]]


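From the movie-space numbers above, we can conclude that Leslie likes both types of movies, science fiction as well as romance. The numbers show that Leslie leans more toward romance (about 2.02 for the romance movies versus about 1.01 for the science-fiction movies). We can predict that Leslie would also like the other movies in our dataset (Casablanca, Matrix, and Star Wars), with Casablanca predicted to score higher than Matrix and Star Wars.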
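The hard-coded entries 0.58 and 0.71 are rounded values. As a cross-check, the sketch below recomputes the decomposition with `np.linalg.svd`, assuming the matrix in Fig. 11.7 is the seven-user table above without Leslie's row. The signs of the singular vectors returned by NumPy are arbitrary, but the projection back into movie space does not depend on them.

```python
import numpy as np

# Ratings of the seven original users (Leslie excluded); assumed to match Fig. 11.7
ratings = np.array(
    [
        [1, 1, 1, 0, 0],
        [3, 3, 3, 0, 0],
        [4, 4, 4, 0, 0],
        [5, 5, 5, 0, 0],
        [0, 0, 0, 4, 4],
        [0, 0, 0, 5, 5],
        [0, 0, 0, 2, 2],
    ],
    dtype=float,
)

_, singular_values, Vt = np.linalg.svd(ratings, full_matrices=False)
Vt_top = Vt[:2]  # keep the two concepts with nonzero singular values

leslie = np.array([0, 3, 0, 0, 4], dtype=float)
concept = leslie @ Vt_top.T    # Leslie in concept space
movie_space = concept @ Vt_top  # mapped back into movie space

print("Singular values:", np.round(singular_values[:2], 1))
print("Leslie in movie space:", np.round(movie_space, 2))
```

With unrounded singular vectors the movie-space values come out near 1 for the science-fiction movies and near 2 for the romance movies, in line with the rounded computation above.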