# **US financial market | Linear algebra**
*Original dataset source: Content property of Economática, financial information platform*
<br>*Used dataset source: **[us_2022q1_service_industries.csv](https://github.com/myrosandrade89/IA95022/tree/main/Statistics/Reto/dataset)***
<br>*Author: Myroslava Sánchez Andrade A01730712*
<br>*Creation date: 05/10/2022*
<br>*Last updated: 13/10/2022*

---
## **Overview**
The purpose of this repository is to analyse, using linear algebra, the financial statements for the first quarter of 2022 of all US public service-industry companies listed on the New York Stock Exchange and NASDAQ. The explanatory (independent) variables are `['epsp', 'medium firm', 'big firm', 'profit margin', 'book/market', 'short leverage']` and the dependent variable is `['f1 stock return']`.

---
## **Configuration**

In [1]:
# Importing the necessary libraries
import pandas as pd
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

In [3]:
# Importing the dataset
us_2022q1_service_industries = pd.read_csv('data/us_2022q1_service_industries.csv')
us_2022q1_service_industries

Unnamed: 0,index,firm,q,medium firm,big firm,epsp,profit margin,book/market,short leverage,f1 stock return
0,1,AAWW,2022q1,1,0,0.032303,0.078591,0.090121,0.104785,-0.093966
1,2,ABM,2022q1,1,0,0.024686,0.039252,-0.620570,0.013874,-0.008219
2,3,ABNB,2022q1,0,1,-0.000173,-0.012454,-3.134384,0.003722,-0.418310
3,5,ACCD,2022q1,1,0,-0.029390,-0.368574,-0.317665,0.005126,-0.863745
4,6,ACHC,2022q1,0,1,0.010327,0.098657,-0.827813,0.009602,0.077769
...,...,...,...,...,...,...,...,...,...,...
644,784,ZNGA,2022q1,0,1,-0.002341,-0.035446,-1.281077,0.003055,-0.230480
645,785,ZS,2022q1,0,1,-0.002950,-0.392936,-4.144927,0.008119,-0.308016
646,786,ZUO,2022q1,1,0,-0.018348,-0.387928,-2.419384,0.029738,-0.481159
647,787,ZVO,2022q1,0,0,-0.266288,-0.120666,-0.907766,0.000000,-0.625444


In [4]:
# Defining the dataset of the variables
us_2022q1_service_industries_variables = us_2022q1_service_industries[['medium firm', 'big firm', 'epsp', 'profit margin', 'book/market', 'short leverage', 'f1 stock return']]
us_2022q1_service_industries_variables

Unnamed: 0,medium firm,big firm,epsp,profit margin,book/market,short leverage,f1 stock return
0,1,0,0.032303,0.078591,0.090121,0.104785,-0.093966
1,1,0,0.024686,0.039252,-0.620570,0.013874,-0.008219
2,0,1,-0.000173,-0.012454,-3.134384,0.003722,-0.418310
3,1,0,-0.029390,-0.368574,-0.317665,0.005126,-0.863745
4,0,1,0.010327,0.098657,-0.827813,0.009602,0.077769
...,...,...,...,...,...,...,...
644,0,1,-0.002341,-0.035446,-1.281077,0.003055,-0.230480
645,0,1,-0.002950,-0.392936,-4.144927,0.008119,-0.308016
646,1,0,-0.018348,-0.387928,-2.419384,0.029738,-0.481159
647,0,0,-0.266288,-0.120666,-0.907766,0.000000,-0.625444


---
## **Exploratory analysis**

#### ***Variance-Covariance matrix***

In [4]:
# Defining the global variables
x = us_2022q1_service_industries_variables
x_transpose = x.T
n = x.shape[0]
matrix_one = np.full((n, 1), 1)

In [5]:
# Calculating variance-covariance matrix
var_cov_matrix = (1 / (n - 1)) * (x_transpose.dot(x) - (1 / n) * (x_transpose.dot(matrix_one)).dot(x_transpose.dot(matrix_one).T))
var_cov_matrix

Unnamed: 0,medium firm,big firm,epsp,profit margin,book/market,short leverage,f1 stock return
medium firm,0.218113,-0.112271,0.005541,-0.242756,0.009881,-0.002491,-0.00165
big firm,-0.112271,0.227782,0.004596,0.375262,-0.239489,0.000397,0.013645
epsp,0.005541,0.004596,0.009382,0.106124,-0.013917,-0.000397,0.010968
profit margin,-0.242756,0.375262,0.106124,170.943724,-0.181628,0.019931,0.5163
book/market,0.009881,-0.239489,-0.013917,-0.181628,1.238903,-0.004713,-0.027044
short leverage,-0.002491,0.000397,-0.000397,0.019931,-0.004713,0.006409,0.001658
f1 stock return,-0.00165,0.013645,0.010968,0.5163,-0.027044,0.001658,0.141972


In [6]:
# Proving that the calculation is correct
x.cov()

Unnamed: 0,medium firm,big firm,epsp,profit margin,book/market,short leverage,f1 stock return
medium firm,0.218113,-0.112271,0.005541,-0.242756,0.009881,-0.002491,-0.00165
big firm,-0.112271,0.227782,0.004596,0.375262,-0.239489,0.000397,0.013645
epsp,0.005541,0.004596,0.009382,0.106124,-0.013917,-0.000397,0.010968
profit margin,-0.242756,0.375262,0.106124,170.943724,-0.181628,0.019931,0.5163
book/market,0.009881,-0.239489,-0.013917,-0.181628,1.238903,-0.004713,-0.027044
short leverage,-0.002491,0.000397,-0.000397,0.019931,-0.004713,0.006409,0.001658
f1 stock return,-0.00165,0.013645,0.010968,0.5163,-0.027044,0.001658,0.141972


**Variance:** the variance of a variable $X$ is the average of the squared deviations of each individual value $X_i$ from the mean of $X$ (the sample version computed above divides by $n - 1$ rather than $n$).

**Covariance:** it measures how two variables vary together; it is the average of the products of the deviations of a variable $X$ and a variable $Y$ from their respective means. Because its magnitude depends on the units of both variables, only its sign is easy to interpret.

In the above calculation we can observe the variances on the diagonal and the covariances in the off-diagonal entries.
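
As a small illustration of the matrix formula used above, the sketch below applies it to hypothetical toy data (the array `X` is made up for the example and is not taken from the dataset) and reads the variances and the covariance from the result.

```python
import numpy as np

# Hypothetical toy data: 4 observations (rows) of 2 variables (columns)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 4.0]])
n = X.shape[0]
ones = np.ones((n, 1))

# S = (X'X - (1/n) (X'1)(X'1)') / (n - 1)
column_sums = X.T @ ones
S = (X.T @ X - (column_sums @ column_sums.T) / n) / (n - 1)

variances = np.diag(S)   # diagonal entries: variance of each column
covariance_xy = S[0, 1]  # off-diagonal entry: covariance between the two columns
```

The result can be compared with `np.cov(X, rowvar=False)`, which also divides by $n - 1$.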

#### ***Correlation matrix***

In [7]:
# Defining the global variables
variance = np.diag(var_cov_matrix).reshape(1, 7)
denominator = np.sqrt(variance) * np.sqrt(variance.T) # Product of the standard deviations of each pair of variables

In [8]:
# Calculating the correlation matrix
corr_matrix = var_cov_matrix / denominator
corr_matrix

Unnamed: 0,medium firm,big firm,epsp,profit margin,book/market,short leverage,f1 stock return
medium firm,1.0,-0.503697,0.122485,-0.039756,0.019008,-0.066632,-0.009375
big firm,-0.503697,1.0,0.099425,0.060138,-0.450825,0.010402,0.075877
epsp,0.122485,0.099425,1.0,0.083799,-0.129089,-0.051197,0.300534
profit margin,-0.039756,0.060138,0.083799,1.0,-0.012481,0.019042,0.104803
book/market,0.019008,-0.450825,-0.129089,-0.012481,1.0,-0.052887,-0.064483
short leverage,-0.066632,0.010402,-0.051197,0.019042,-0.052887,1.0,0.054977
f1 stock return,-0.009375,0.075877,0.300534,0.104803,-0.064483,0.054977,1.0


In [9]:
# Proving that the calculation is correct
x.corr()

Unnamed: 0,medium firm,big firm,epsp,profit margin,book/market,short leverage,f1 stock return
medium firm,1.0,-0.503697,0.122485,-0.039756,0.019008,-0.066632,-0.009375
big firm,-0.503697,1.0,0.099425,0.060138,-0.450825,0.010402,0.075877
epsp,0.122485,0.099425,1.0,0.083799,-0.129089,-0.051197,0.300534
profit margin,-0.039756,0.060138,0.083799,1.0,-0.012481,0.019042,0.104803
book/market,0.019008,-0.450825,-0.129089,-0.012481,1.0,-0.052887,-0.064483
short leverage,-0.066632,0.010402,-0.051197,0.019042,-0.052887,1.0,0.054977
f1 stock return,-0.009375,0.075877,0.300534,0.104803,-0.064483,0.054977,1.0


**Correlation:** is a standardized measure of the linear relationship between two variables (the covariance scaled by the product of their standard deviations), so it always lies between -1 and 1.

In the above calculation we can see that the diagonal of the matrix is full of 1s, which makes sense since the correlation of a variable with itself is 1 (100%). Off-diagonal values close to 1 or -1 indicate a strong positive or negative linear relationship between two variables (when x goes up, y tends to go up for a positive correlation and down for a negative one). Off-diagonal values close to 0 indicate a weak linear relationship.

Since we are looking for variables to predict the stock returns, a set of independent variables with very high correlation could cause an unreliable estimation of the beta coefficients (multicollinearity). In this case, the largest correlation in absolute value between independent variables is `-0.5037` between `'medium firm'` and `'big firm'` (two dummy variables for firm size), followed by `-0.4508` between `'book/market'` and `'big firm'`; both are tolerable.
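
The scaling from covariance to correlation, and the search for the most correlated pair, can be sketched as follows. The covariance matrix `S` is a hypothetical example, not the one computed from the dataset.

```python
import numpy as np

# Hypothetical covariance matrix of three variables (not taken from the dataset)
S = np.array([[4.0,  3.0,  0.0],
              [3.0,  9.0, -9.0],
              [0.0, -9.0, 16.0]])

std = np.sqrt(np.diag(S))    # standard deviations: 2, 3 and 4
R = S / np.outer(std, std)   # each covariance divided by the product of the two standard deviations

# Largest correlation in absolute value, ignoring the diagonal of ones
off_diagonal = np.abs(R - np.eye(len(R)))
i, j = np.unravel_index(np.argmax(off_diagonal), off_diagonal.shape)
strongest = R[i, j]
```

Here the strongest pair is variables 1 and 2, with a correlation of -0.75.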

#### ***Detection of leverage points***

In [10]:
# Defining the beta 0 coefficient variable (a column of ones for the intercept)
# Working on an explicit copy avoids pandas' SettingWithCopyWarning
x = x.copy()
x['beta0 coefficient'] = 1
x_variables = x[['medium firm', 'big firm', 'epsp', 'profit margin', 'book/market', 'short leverage', 'beta0 coefficient']]


In [11]:
# Calculating the hat matrix
def calculate_hat_matrix(x):
    x_variables = x[['medium firm', 'big firm', 'epsp', 'profit margin', 'book/market', 'short leverage', 'beta0 coefficient']]
    x_variables_transpose = x_variables.T
    hat_matrix = x_variables.dot(np.linalg.inv(x_variables_transpose.dot(x_variables)).dot(x_variables_transpose))
    return hat_matrix

In [12]:
hat_matrix = calculate_hat_matrix(x)
hat_matrix

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,639,640,641,642,643,644,645,646,647,648
0,0.010067,0.006281,-0.003488,0.006484,0.002343,0.005062,0.000751,0.003699,-0.002086,0.014652,...,-0.000977,-0.003705,-0.001806,0.001307,-0.000224,0.001010,-0.005881,0.001962,-0.003002,0.001645
1,0.006281,0.005609,-0.001221,0.005751,0.001308,0.005032,0.000660,0.004572,-0.000804,-0.002303,...,0.000064,-0.002053,-0.000446,0.003394,0.000283,0.000779,-0.002337,0.003481,-0.001061,-0.000176
2,-0.003488,-0.001221,0.007014,-0.001673,0.002287,-0.000270,0.003525,0.000658,0.006056,-0.000419,...,0.000731,0.007777,0.005602,0.002807,0.004339,0.003280,0.009018,0.002333,0.001323,-0.001744
3,0.006484,0.005751,-0.001673,0.006550,0.001802,0.005146,0.000960,0.004699,-0.001176,-0.004591,...,-0.000597,-0.002910,-0.000571,0.002833,0.000318,0.001196,-0.003235,0.003163,0.000952,0.001870
4,0.002343,0.001308,0.002287,0.001802,0.006717,0.000363,0.005606,-0.000354,0.002967,-0.000804,...,-0.000162,0.000744,0.003662,-0.002532,0.004889,0.005843,0.000325,-0.002303,-0.000959,0.000783
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
644,0.001010,0.000779,0.003280,0.001196,0.005843,0.000251,0.005224,-0.000084,0.003594,-0.001839,...,-0.000058,0.002118,0.004098,-0.001439,0.004801,0.005377,0.002113,-0.001334,-0.000027,0.000570
645,-0.005881,-0.002337,0.009018,-0.003235,0.000325,-0.000566,0.002572,0.001045,0.007381,0.001105,...,0.001075,0.010895,0.006390,0.005102,0.004046,0.002113,0.012753,0.004342,0.002154,-0.002782
646,0.001962,0.003481,0.002333,0.003163,-0.002303,0.004508,-0.001110,0.005361,0.001527,0.001648,...,0.000176,0.003629,0.000923,0.007516,-0.000373,-0.001334,0.004342,0.007257,0.001853,-0.000710
647,-0.003002,-0.001061,0.001323,0.000952,-0.000959,0.000093,-0.000180,0.001230,0.000565,-0.008856,...,0.002639,0.001230,0.000784,0.001347,-0.000218,-0.000027,0.002154,0.001853,0.014376,0.010896


In [13]:
# Calculating the leverages of the hat matrix
def calculate_leverages(hat_matrix):
    leverages = np.diagonal(hat_matrix)
    return leverages

In [14]:
leverages = calculate_leverages(hat_matrix)

In [15]:
# Storing unusual Xs indexes in an array
def calculate_unusual_x(leverages):
    leverages_mean = (leverages.sum()) / leverages.shape[0]
    unusual_x = np.nonzero(leverages > (3 * leverages_mean))
    return unusual_x

In [16]:
# Storing unusual Xs indexes in an array
unusual_x = calculate_unusual_x(leverages)
unusual_x

(array([  9,  20,  48, 182, 236, 343, 372, 418, 420, 422, 455, 465, 488,
        516, 542, 598, 624], dtype=int64),)

The **leverage** $h_{ii}$ is a number between 0 and 1 that measures the distance between the $x_i$ data point and the mean of the $x$ values of all $n$ data points. The sum of the $h_{ii}$ equals $k + 1$, the number of parameters including the intercept.

The **hat matrix** (an $n \times n$ matrix) was calculated using the formula $H = X(X'X)^{-1}X'$. Its diagonal contains the leverages, which allow us to determine whether the $x$ values are extreme and therefore potentially influential in the regression analysis.

To determine extreme $x$ values, a common rule is to flag observations whose leverage $h_{ii}$ is more than 3 times the mean leverage: $h_{ii} > 3\left(\frac{\sum_{i=1}^n h_{ii}}{n}\right) = \frac{3(k + 1)}{n}$

A data point $x_i$ with high leverage may or may not be influential: it has a large influence only if it noticeably changes the estimated regression function.
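
To make the rule concrete, here is a minimal sketch on hypothetical toy data (one predictor plus an intercept column, with one deliberately extreme $x$ value). The variable names are illustrative only.

```python
import numpy as np

# Hypothetical predictor values; the last one is far from the rest
x_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0])
X = np.column_stack([x_values, np.ones_like(x_values)])  # predictor + intercept column

# Hat matrix H = X (X'X)^-1 X' and its diagonal (the leverages)
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# The leverages sum to the number of parameters (here k + 1 = 2)
total_leverage = h.sum()

# Flag observations whose leverage exceeds 3 times the mean leverage
threshold = 3 * h.mean()
flagged = np.nonzero(h > threshold)[0]
```

Only the last observation (index 9) is flagged: its leverage is close to 1, while the threshold is $3 \cdot 2 / 10 = 0.6$.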

#### ***Detection of outliers***

In [17]:
# Global functions
def y_hat(hat_matrix, y_variable):
    return hat_matrix.dot(y_variable)

def calculate_mse(y_variable, y_predictions):
    return np.square(y_variable - y_predictions).sum() / y_variable.shape[0]

In [18]:
# Defining the global variables
y_variable = x['f1 stock return']
y_predictions = y_hat(hat_matrix, y_variable)
mse = calculate_mse(y_variable, y_predictions)
mse

0.12706871213116497

In [19]:
# Calculating the outliers
def calculate_outliers(y_variables, y_predictions, mse):
    
    # Calculating the residuals
    residuals = y_variable - y_predictions
    
    # Calculating the standardize residuals
    standardize_residuals = residuals / np.sqrt(mse * (1 - leverages)) 
    standardize_residuals = np.array(standardize_residuals)
    
    # Storing outliers indexes in an array
    outliers = np.nonzero(np.absolute(standardize_residuals) > 3)
    return outliers

In [20]:
# Storing the outliers in an array
outliers = calculate_outliers(y_variable, y_predictions, mse)
outliers

(array([131, 447, 461, 488, 544, 585, 624], dtype=int64),)

An **outlier** is a data point whose $y$ response does not follow the general trend of the rest of the data. Two measures were used to identify outliers: `residuals` and `standardized residuals`.

The **ordinary residuals** are defined for each observation as the difference between the observed $y$ and the predicted $y$: $e_i = y_i - \hat{y}_i$

The **standardized residuals** are defined for each observation as the ordinary residual divided by an estimate of its standard deviation: ${r_i} = {e_i\over\sqrt{MSE(1-h_{ii})}}$. The standardized residual therefore depends on both the mean squared error (MSE) and the leverage $h_{ii}$. Here `calculate_mse` divides the sum of squared residuals by $n$; the usual regression estimate divides by $n - k - 1$, which gives slightly smaller standardized residuals. An observation with a standardized residual larger than 3 in absolute value is considered to be an outlier.
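
The sketch below computes standardized residuals on hypothetical toy data. It is not the notebook's function: `standardized_residuals` is an illustrative helper that takes the leverages as an explicit argument, so it does not depend on variables defined elsewhere, and it assumes the MSE is computed with $n - p$ degrees of freedom.

```python
import numpy as np

def standardized_residuals(y, y_hat, leverages, n_params):
    """Residuals divided by sqrt(MSE * (1 - h_ii)), with MSE = SSE / (n - n_params)."""
    residuals = y - y_hat
    mse = np.sum(residuals ** 2) / (len(y) - n_params)
    return residuals / np.sqrt(mse * (1 - leverages))

# Hypothetical data: 20 points on a straight line, one response moved far off the trend
x_values = np.arange(1.0, 21.0)
y = 2.0 * x_values + 1.0
y[10] += 30.0

X = np.column_stack([x_values, np.ones_like(x_values)])
H = X @ np.linalg.inv(X.T @ X) @ X.T

r = standardized_residuals(y, H @ y, np.diag(H), n_params=X.shape[1])
outliers = np.nonzero(np.abs(r) > 3)[0]
```

With this data only the shifted observation (index 10) exceeds the threshold of 3.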

---
## **Multicollinearity**

In [21]:
# VIF dataframe
vif_data = pd.DataFrame()
vif_data["x variables"] = x_variables.columns
vif_data

Unnamed: 0,x variables
0,medium firm
1,big firm
2,epsp
3,profit margin
4,book/market
5,short leverage
6,beta0 coefficient


In [22]:
# Calculating VIF for each feature
vif_data["VIF"] = [variance_inflation_factor(x_variables.values, i) for i in range(len(x_variables.columns))]
vif_data

Unnamed: 0,x variables,VIF
0,medium firm,1.505328
1,big firm,1.858203
2,epsp,1.063746
3,profit margin,1.011408
4,book/market,1.364984
5,short leverage,1.013401
6,beta0 coefficient,3.824185


Multicollinearity occurs when two or more independent variables are highly correlated with each other. It can make the estimated coefficients unreliable, so the variables involved must be detected and, if necessary, discarded.

To detect multicollinearity, the **Variance Inflation Factor (VIF)** was used. This method regresses each independent variable on all the others and computes $VIF = {1\over 1 - R^2}$, where $R^2$ is the coefficient of determination of that regression. A higher VIF denotes stronger collinearity; generally, a VIF above 5 indicates high multicollinearity.
<br>As we see from the results above, every explanatory variable has a VIF below 2, so there is no multicollinearity between the independent variables. (The value for `beta0 coefficient` refers to the constant column, not to an explanatory variable.)
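
To show what the VIF formula does, the following sketch computes it by hand with NumPy on hypothetical random data. The helper `vif` is illustrative and assumes that an intercept is included in each auxiliary regression.

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress it on the other columns plus an intercept, then 1 / (1 - R^2)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([others, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    r_squared = 1 - (residuals @ residuals) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r_squared)

# Hypothetical data: c is almost a copy of a, b is unrelated to both
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
X = np.column_stack([a, b, c])

vifs = [vif(X, j) for j in range(X.shape[1])]
```

The columns `a` and `c` get a VIF far above 5, while `b` stays close to 1.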

---
## **Transformations**

In [23]:
def drop_outliers_unusualx(outliers, unusual_x, data):
    
    # Dropping the intersection indexes of the arrays unusual_x and outliers from the data
    while((outliers_unusualx:=np.intersect1d(unusual_x, outliers)).size > 0):

        # Dropping the outliers and unusual_x 
        data = data.drop(outliers_unusualx, axis=0).reset_index(drop=True)
        
        # Recalculating the variables for the outliers and unusual x
        hat_matrix = calculate_hat_matrix(data)
        unusual_x = calculate_unusual_x(calculate_leverages(hat_matrix))
        y_variable = data['f1 stock return']
        y_predictions = y_hat(hat_matrix, data['f1 stock return'])
        outliers = calculate_outliers(y_variable, y_predictions, calculate_mse(y_variable, y_predictions))
        
    # Once there are no more intersections, the remaining outliers are dropped from the data
    data = data.drop(np.asarray(outliers)[0], axis=0).reset_index(drop=True)
    
    return data

In [24]:
x = drop_outliers_unusualx(outliers, unusual_x, x)
x

Unnamed: 0,medium firm,big firm,epsp,profit margin,book/market,short leverage,f1 stock return,beta0 coefficient
0,1,0,0.032303,0.078591,0.090121,0.104785,-0.093966,1
1,1,0,0.024686,0.039252,-0.620570,0.013874,-0.008219,1
2,0,1,-0.000173,-0.012454,-3.134384,0.003722,-0.418310,1
3,1,0,-0.029390,-0.368574,-0.317665,0.005126,-0.863745,1
4,0,1,0.010327,0.098657,-0.827813,0.009602,0.077769,1
...,...,...,...,...,...,...,...,...
627,0,1,-0.002341,-0.035446,-1.281077,0.003055,-0.230480,1
628,0,1,-0.002950,-0.392936,-4.144927,0.008119,-0.308016,1
629,1,0,-0.018348,-0.387928,-2.419384,0.029738,-0.481159,1
630,0,0,-0.266288,-0.120666,-0.907766,0.000000,-0.625444,1


Since there was no multicollinearity, there was no need to drop any column. On the other hand, since there were some extreme values in the independent and dependent variables, they were handled with the following steps:
- Identify the intersection between the outliers and the unusual x (the set of all indexes that belong to both arrays).
- Drop the rows in the intersection from the dataset and reset the index.
- Recalculate the variables for the outliers and unusual x (hat_matrix, unusual_x, y_variable, y_predictions, outliers).
- **Repeat the above steps until there are no more intersections.**
- Drop the remaining outliers.
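
The steps above can be sketched in a self-contained way with NumPy. This is only an illustration on hypothetical toy data: the helpers `leverages_and_std_residuals` and `drop_extreme_rows` are made-up names, and the MSE is divided by $n$ as an assumption that mirrors `calculate_mse`.

```python
import numpy as np

def leverages_and_std_residuals(X, y):
    """Return the leverages h_ii and the standardized residuals of an OLS fit."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    residuals = y - H @ y
    mse = np.sum(residuals ** 2) / len(y)  # assumption: divide by n
    return h, residuals / np.sqrt(mse * (1 - h))

def drop_extreme_rows(X, y):
    """Drop rows that are both outliers and unusual x until none remain, then drop the outliers left."""
    while True:
        h, r = leverages_and_std_residuals(X, y)
        unusual_x = h > 3 * h.mean()
        outliers = np.abs(r) > 3
        both = unusual_x & outliers
        if not both.any():
            keep = ~outliers
            return X[keep], y[keep]
        X, y = X[~both], y[~both]

# Hypothetical data: 40 points near the line y = 1 + 2x, plus one extreme point
x_values = np.append(np.linspace(-2.0, 2.0, 40), 15.0)
noise = np.append(0.5 * np.sin(1.7 * np.arange(40)), 0.0)
y = 1.0 + 2.0 * x_values + noise
y[-1] = -40.0  # far from the trend, and at an extreme x value

X = np.column_stack([x_values, np.ones_like(x_values)])
X_clean, y_clean = drop_extreme_rows(X, y)
```

In this toy example the extreme point is removed in the first pass and the remaining 40 points are kept.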

---
## **Model | OLS derivation**

In [25]:
# Defining the variables
y_variable = x[['f1 stock return']]
x_variables = x[['medium firm', 'big firm', 'epsp', 'profit margin', 'book/market', 'short leverage', 'beta0 coefficient']]
x_variables_transpose = x_variables.T

#### ***Beta coefficients***

In [26]:
# Calculating the beta coefficients
beta_coefficients = (np.linalg.inv(x_variables_transpose.dot(x_variables))).dot(x_variables_transpose.dot(y_variable))
beta_coefficients

array([[-8.63034548e-02],
       [-6.10872417e-02],
       [ 3.79616492e+00],
       [ 1.16807895e-03],
       [ 2.44535601e-03],
       [ 4.45913537e-01],
       [-2.33662655e-01]])

#### ***Standard errors of betas***

In [27]:
# Defining the y_prediction
y_predict = pd.DataFrame()
y_predict['f1 stock return'] = x_variables.dot(beta_coefficients)

# Defining the global variables
n = y_predict.shape[0]
matrix_one = np.full((n, 1), 1)

In [28]:
# Calculating the error
error = y_variable - y_predict
error_transpose = error.T
error

Unnamed: 0,f1 stock return
0,0.056337
1,0.213320
2,-0.116885
3,-0.433289
4,0.330945
...,...
627,0.074971
628,0.004906
629,-0.098431
630,0.621453


In [29]:
# Calculating the covariance variance matrix of the error
cov_var_error = (1 / (n - 1)) * (error_transpose.dot(error) - (1 / n) * (error_transpose.dot(matrix_one)).dot(error_transpose.dot(matrix_one).T))
cov_var_error['f1 stock return'][0]

0.1100727297517122

In [30]:
# Calculating the covariance-variance matrix of the betas
cov_var_betas = cov_var_error['f1 stock return'][0] * np.linalg.inv(x_variables_transpose.dot(x_variables))
cov_var_betas

array([[ 1.22107714e-03,  7.82717981e-04, -2.19715737e-03,
         1.21138122e-06,  1.22330380e-04,  3.76103015e-04,
        -5.42906603e-04],
       [ 7.82717981e-04,  1.47532354e-03, -2.41043524e-03,
        -7.65708574e-07,  2.59102504e-04,  2.27909887e-04,
        -4.63976025e-04],
       [-2.19715737e-03, -2.41043524e-03,  9.22403604e-02,
        -3.85877600e-05,  2.46277009e-04,  4.44704765e-03,
         2.37303594e-03],
       [ 1.21138122e-06, -7.65708574e-07, -3.85877600e-05,
         1.02345912e-06, -3.50665032e-07, -4.69484103e-06,
         3.65049929e-07],
       [ 1.22330380e-04,  2.59102504e-04,  2.46277009e-04,
        -3.50665032e-07,  1.90634049e-04,  1.60683756e-04,
         1.13748006e-04],
       [ 3.76103015e-04,  2.27909887e-04,  4.44704765e-03,
        -4.69484103e-06,  1.60683756e-04,  2.76934341e-02,
        -8.50946863e-04],
       [-5.42906603e-04, -4.63976025e-04,  2.37303594e-03,
         3.65049929e-07,  1.13748006e-04, -8.50946863e-04,
         7.0888587

In [31]:
# Calculating the standard errors of the betas
standard_errors_betas = np.sqrt(np.diagonal(cov_var_betas))
standard_errors_betas

array([0.03494391, 0.03840994, 0.30371098, 0.00101166, 0.01380703,
       0.16641344, 0.02662491])
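
As a compact recap of the calculation above, the sketch below derives the standard errors from $Var(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$ on hypothetical random data. One assumption differs from the cells above: the residual variance is divided by $n - p$ (the residual degrees of freedom, which is also what statsmodels uses) instead of $n - 1$, so the two approaches give slightly different standard errors.

```python
import numpy as np

# Hypothetical data: two predictors plus an intercept column
rng = np.random.default_rng(42)
n = 100
X = np.column_stack([rng.normal(size=n), rng.normal(size=n), np.ones(n)])
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

# OLS coefficients: beta = (X'X)^-1 X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y

# Residual variance with n - p degrees of freedom (assumption, see above)
residuals = y - X @ beta
p = X.shape[1]
sigma2 = (residuals @ residuals) / (n - p)

# Covariance matrix of the betas and their standard errors
cov_beta = sigma2 * XtX_inv
std_errors = np.sqrt(np.diag(cov_beta))
```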

In [32]:
# Proving that the calculation is correct
X = sm.add_constant(x_variables)
mkmodel = sm.OLS(y_variable, X).fit()

print(mkmodel.summary())

                            OLS Regression Results                            
Dep. Variable:        f1 stock return   R-squared:                       0.212
Model:                            OLS   Adj. R-squared:                  0.205
Method:                 Least Squares   F-statistic:                     28.07
Date:                Thu, 13 Oct 2022   Prob (F-statistic):           9.49e-30
Time:                        22:44:53   Log-Likelihood:                -198.98
No. Observations:                 632   AIC:                             412.0
Df Residuals:                     625   BIC:                             443.1
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
medium firm          -0.0863      0.03

- General results:
After fitting the multiple regression model, we can observe that the R-squared (the share of the variance of the stock return explained by the model, not the error) is about 21%. The model is jointly significant (F-statistic of 28.07, p-value of 9.49e-30), but roughly 79% of the variation in the annual stock returns remains unexplained, so its predictions should be read with caution.

- Independent variables analysis:
We can observe that there are 4 positive coefficients (epsp, profit margin, book/market and short leverage), which means that, holding the other variables constant, a one-unit increase in any of them raises the predicted annual stock return by the size of its coefficient.
On the other hand, there are 3 negative coefficients (medium firm, big firm and the beta 0 coefficient). The two firm-size dummies lower the predicted return relative to a small firm, and the beta 0 coefficient is the predicted return of a small firm when all other variables are 0.

*For the variables: medium firm, epsp, short leverage and beta 0 coefficient*

Their t-value is greater than 2 in absolute value. The t-value is the estimated coefficient minus its value under the null hypothesis, divided by the standard error of the coefficient; here the null hypothesis is that the true coefficient is 0. The t-value therefore tells us how many standard errors the estimate is away from 0. For a normal distribution, about 95% of the probability lies within 2 standard deviations of the mean, so the null hypothesis is rejected at the 5% level when the t-value is at least 2 in absolute value.
The p-value is the probability of obtaining a t-value at least as extreme as the observed one if the null hypothesis were true (it is not the probability that the null hypothesis is correct). For these variables the p-value is below 0.05, so we reject the hypothesis that their coefficients are 0.


*For the variables: big firm, profit margin and book/market*

The t-value is less than 2 in absolute value, so we cannot reject the null hypothesis that the coefficient is 0 at the 5% level: the data do not provide enough evidence that these variables affect the stock return.
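
The t-values discussed above can be recomputed from the beta coefficients and the standard errors obtained earlier in the notebook. This is a minimal sketch: it uses the manually derived standard errors, so the values may differ slightly from those in the statsmodels summary.

```python
# Beta coefficients and standard errors copied from the notebook outputs above
names = ['medium firm', 'big firm', 'epsp', 'profit margin',
         'book/market', 'short leverage', 'beta0 coefficient']
betas = [-8.63034548e-02, -6.10872417e-02, 3.79616492e+00, 1.16807895e-03,
         2.44535601e-03, 4.45913537e-01, -2.33662655e-01]
std_errors = [0.03494391, 0.03840994, 0.30371098, 0.00101166,
              0.01380703, 0.16641344, 0.02662491]

# t-value under the null hypothesis beta = 0: (estimate - 0) / standard error
t_values = {name: beta / se for name, beta, se in zip(names, betas, std_errors)}

# Rule of thumb used in the text: |t| > 2
significant = [name for name, t in t_values.items() if abs(t) > 2]
```

With these numbers `significant` contains medium firm, epsp, short leverage and the beta 0 coefficient.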

- Small firm: 
annual_stock_return = 3.7962(epsp) + 0.0012(profit_margin) + 0.0024(book/market) + 0.4459(short_leverage) - 0.2337

- Medium firm: 
annual_stock_return = 3.7962(epsp) + 0.0012(profit_margin) + 0.0024(book/market) + 0.4459(short_leverage) - 0.2337 - 0.0863

- Big firm: 
annual_stock_return = 3.7962(epsp) + 0.0012(profit_margin) + 0.0024(book/market) + 0.4459(short_leverage) - 0.2337 - 0.0611
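
The three equations can be wrapped in a single function. The sketch below is illustrative (the name `predict_stock_return` and the `firm_size` argument are assumptions) and uses the rounded coefficients shown above.

```python
def predict_stock_return(epsp, profit_margin, book_market, short_leverage, firm_size='small'):
    """Predicted annual stock return using the rounded coefficients of the fitted model."""
    # Firm-size dummies: small firms are the baseline (both dummies equal to 0)
    size_effect = {'small': 0.0, 'medium': -0.0863, 'big': -0.0611}[firm_size]
    return (3.7962 * epsp
            + 0.0012 * profit_margin
            + 0.0024 * book_market
            + 0.4459 * short_leverage
            - 0.2337
            + size_effect)

# First row of the dataset (AAWW, a medium firm)
prediction = predict_stock_return(0.032303, 0.078591, 0.090121, 0.104785, firm_size='medium')
```

For this row the prediction is about -0.150, against an observed return of -0.093966.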