# Variables Research (Part IV - Sales-related Variables)

### `Misael Ramirez - A00821781`

<img src="autlan-logo.png" alt="Autlan logo">


In this notebook, we research the financial forecast of variables whose driver is directly related to sales. Sales are divided into the following divisions:
1. *Mining-Metallurgical Manganese Products*
2. *Energy*
3. *Precious Metals*

Nonetheless, the contribution of the **energy** division to sales is effectively nil, since the energy it produces is consumed by the mines owned by Autlán to extract minerals, manganese for the most part. As stated in the 2Q14 report:
> "The electricity generation of the Atexcaco hydroelectric plant was favored by the constant flow of water during the dry months, so its generation grew 56%, contributing 30% of Autlán's needs and representing savings in the first half of the year of 40.2 million MXN"

**Source:** Autlan

Now we will focus on variables that also use sales as a driver, but that are more related to the way Autlán operates based on the forecasted sales.

In [2]:
# import required libraries 
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl

# setting up future plots
sns.set_style(  
    "darkgrid",  
    {  
        'legend.frameon': True,  
    }  
)
mpl.rc('figure', figsize=(14, 8))

## Cost of Goods Sold Forecast (2020-2025)

We fit a linear regression of COGS on Sales, using the historical data for training and the forecasted Sales as the input for the prediction. The COGS for FY20 is calculated from the Sales we estimated for that same year through our research.

The way we calculate gradients will be different: instead of modeling the monetary amount of COGS for each of the following years, the forecast model works directly on the gradients. COGS as a percentage of Sales does not vary much over time (not even the FY20 percentage, which comes from forecasted COGS and Sales). This can be seen in its standard deviation over time, which is `5.2415` percentage points. We must keep in mind, though, that each percentage point represents millions of dollars.

We will use the following logic for the gradients:

>**The years that the team considered an economic slowdown (FY20-FY22) will show a constant gradient of `plus 2 percent` in COGS, until the years of economic recovery, where COGS will show a constant gradient of `minus 2 percent`. It might not seem significant, but these are percentage movements on hundreds of millions of dollars.**

The resulting forecast shows a tendency very similar to the previous five years of analysis.
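The gradient rule above can be sketched as a small compounding loop. This is a minimal illustration, not the spreadsheet's actual formula: the helper name `apply_gradients` is hypothetical, the base value is the FY20 COGS from the regression in this section (rounded to 212.629), and it assumes the plus 2 percent gradient is applied for FY21 and FY22 and the minus 2 percent gradient afterwards.

```python
def apply_gradients(base, gradients):
    """Compound a base value through a list of year-over-year gradients."""
    values = [base]
    for g in gradients:
        values.append(values[-1] * (1 + g))
    return values

# FY20 COGS from the regression (rounded); gradients are assumptions
# taken from the rule quoted above: +2% in the slowdown, -2% in recovery
cogs_fy20 = 212.629
gradients = [0.02, 0.02, -0.02, -0.02]

cogs_path = apply_gradients(cogs_fy20, gradients)
for year, cogs in zip(range(2020, 2025), cogs_path):
    print("FY{}: {:.3f}".format(year, cogs))
```

Under these assumptions the loop gives roughly 216.88, 221.22 and 216.79 for FY21 to FY23, in line with the COGS column of the `COGS_2` sheet displayed later in this section.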

In [3]:
# import required libraries
from sklearn.linear_model import LinearRegression 

# create data for model
data = pd.read_excel('other_vars.xlsx', sheet_name='COGS_1')
data = data.drop('Date', axis=1)

Y_train = data['COGS'].values.reshape(-1, 1)
X_train = data['Sales'].values.reshape(-1, 1)
X_test = np.array([320.35]).reshape(-1, 1)

print('COGS FY2014-FY2019: \n{}\n'.format(Y_train))
print('Sales FY2014-FY2019:\n{}\n'.format(X_train))
print('Predicted Sales FY20: {}\n'.format(X_test))

# create model
reg = LinearRegression()
reg.fit(X_train,Y_train)
y_pred = reg.predict(X_test)
print('COGS FY20: {}'.format(y_pred))

COGS FY2014-FY2019: 
[[244.723]
 [189.118]
 [152.038]
 [196.723]
 [275.567]
 [285.362]]

Sales FY2014-FY2019:
[[338.04 ]
 [267.729]
 [230.711]
 [359.34 ]
 [413.504]
 [420.128]]

Predicted Sales FY20: [[320.35]]

COGS FY20: [[212.62949882]]
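For readers without access to `other_vars.xlsx`, the same one-variable fit can be sketched with the values printed above. This version uses `np.polyfit` instead of scikit-learn (a swapped-in but equivalent least-squares fit) and also reports the slope, intercept and R² on the training data, which the original cell does not print.

```python
import numpy as np

# FY2014-FY2019 values copied from the cell output above
sales = np.array([338.04, 267.729, 230.711, 359.34, 413.504, 420.128])
cogs = np.array([244.723, 189.118, 152.038, 196.723, 275.567, 285.362])

# degree-1 least-squares fit; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(sales, cogs, 1)

sales_fy20 = 320.35
cogs_fy20 = slope * sales_fy20 + intercept

# coefficient of determination on the training data
residuals = cogs - (slope * sales + intercept)
r_squared = 1 - residuals.var() / cogs.var()

print("slope={:.4f}, intercept={:.4f}".format(slope, intercept))
print("COGS FY20: {:.4f}, R^2: {:.3f}".format(cogs_fy20, r_squared))
```

With only six observations the fit is sensitive to single years, so the R² is best read as a rough indication rather than a validation of the model.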


In [10]:
print('Standard Deviation: {}'.format(np.std([72.39, 70.64, 65.90, 54.75, 66.64, 67.92, 66])))
data = pd.read_excel('other_vars.xlsx', sheet_name='COGS_2')
display(data)

Standard Deviation: 5.241501965767337


| Unnamed: 0 | Date | Sales | COGS | % |
|---|---|---|---|---|
| 0 | 2014-12-31 | 338.04 | 244.723 | 0.7239 |
| 1 | 2015-12-31 | 267.729 | 189.118 | 0.7064 |
| 2 | 2016-12-31 | 230.711 | 152.038 | 0.659 |
| 3 | 2017-12-31 | 359.34 | 196.723 | 0.5475 |
| 4 | 2018-12-31 | 413.504 | 275.567 | 0.6664 |
| 5 | 2019-12-31 | 420.128 | 285.362 | 0.6792 |
| 6 | 2020-12-31 | 320.35 | 212.629 | 0.66374 |
| 7 | 2021-12-31 | 317.90969 | 216.88158 | 0.682211 |
| 8 | 2022-12-31 | 315.851145 | 221.219212 | 0.700391 |
| 9 | 2023-12-31 | 325.75642 | 216.794827 | 0.665512 |
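The standard deviation above is computed from hand-typed, rounded percentages. As a sketch, the same figure can be derived directly from the Sales and COGS values (FY14-FY19 actuals plus the FY20 forecast). Note that `np.std` defaults to the population formula (`ddof=0`); the sample version is shown for comparison.

```python
import numpy as np

# FY14-FY19 actuals plus the FY20 forecast, copied from the table above
sales = np.array([338.04, 267.729, 230.711, 359.34, 413.504, 420.128, 320.35])
cogs = np.array([244.723, 189.118, 152.038, 196.723, 275.567, 285.362, 212.629])

cogs_pct = 100 * cogs / sales            # COGS as a percentage of Sales

std_population = np.std(cogs_pct)        # ddof=0, NumPy's default
std_sample = np.std(cogs_pct, ddof=1)    # ddof=1, sample estimate

print(np.round(cogs_pct, 2))
print("population std: {:.2f}, sample std: {:.2f}".format(std_population, std_sample))
```

With these inputs the population figure lands close to the `5.2415` printed above; the small difference comes from rounding in the hard-coded list.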


## General Expenses Forecast (2020-2025)

General Expenses (GE) depend on the internal operations of Autlán, but they show low-variance gradients similar to those of cost of goods sold. As a result, GE for FY20 is calculated from the Sales we estimated for that same year through our research.

The way we calculate gradients will be different: instead of modeling the monetary amount of GE for each of the following years, the forecast model works directly on the gradients. GE as a percentage of Sales does not vary much over time (not even the FY20 percentage, which comes from forecasted GE and Sales): it stays between 11% and 14% of Sales in FY14-FY19. The standard deviation of the historical GE amounts, computed below, is `6.6149`; we should keep in mind that these amounts are in millions of dollars.

We will use the following logic for the gradients:

>**The years that the team considered an economic slowdown (FY20-FY22) will show a constant gradient of `plus 1 percent` in GE, until the years of economic recovery, where GE will show a constant gradient of `minus 1 percent`. It might not seem significant, but these are percentage movements on hundreds of millions of dollars.**

The resulting forecast shows a tendency very similar to the previous five years of analysis.

In [5]:
# import required libraries
from sklearn.linear_model import LinearRegression 

# create data for model
data = pd.read_excel('other_vars.xlsx', sheet_name='GE_1')
data = data.drop('Date', axis=1)

Y_train = data['GE'].values.reshape(-1, 1)
X_train = data['Sales'].values.reshape(-1, 1)
X_test = np.array([320.35]).reshape(-1, 1)

print('GE FY2014-FY2019: \n{}\n'.format(Y_train))
print('Sales FY2014-FY2019:\n{}\n'.format(X_train))
print('Predicted Sales FY20: {}\n'.format(X_test))

# create model
reg = LinearRegression()
reg.fit(X_train,Y_train)
y_pred = reg.predict(X_test)
print('GE FY20: {}'.format(y_pred))

GE FY2014-FY2019: 
[[40.693]
 [36.314]
 [31.707]
 [39.654]
 [43.98 ]
 [52.983]]

Sales FY2014-FY2019:
[[338.04 ]
 [267.729]
 [230.711]
 [359.34 ]
 [413.504]
 [420.128]]

Predicted Sales FY20: [[320.35]]

GE FY20: [[39.36026898]]


In [11]:
print('Standard Deviation: {}'.format(np.std([40.69, 36.31, 31.71, 39.65, 43.98, 52.98])))
data = pd.read_excel('other_vars.xlsx', sheet_name='GE_2')
display(data)

Standard Deviation: 6.614914125586882


| Unnamed: 0 | Date | Sales | General Expenses | % |
|---|---|---|---|---|
| 0 | 2014-12-31 | 338.04 | 40.693 | 0.12 |
| 1 | 2015-12-31 | 267.729 | 36.314 | 0.14 |
| 2 | 2016-12-31 | 230.711 | 31.707 | 0.14 |
| 3 | 2017-12-31 | 359.34 | 39.654 | 0.11 |
| 4 | 2018-12-31 | 413.504 | 43.98 | 0.11 |
| 5 | 2019-12-31 | 420.128 | 52.983 | 0.13 |
| 6 | 2020-12-31 | 320.35 | 39.36 | 0.122866 |
| 7 | 2021-12-31 | 317.90969 | 40.1472 | 0.126285 |
| 8 | 2022-12-31 | 315.851145 | 40.950144 | 0.12965 |
| 9 | 2023-12-31 | 325.75642 | 40.131141 | 0.123194 |
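As a sanity check on the gradient logic, the sketch below recovers the year-over-year gradients implied by the forecast GE values in the table, together with GE as a fraction of Sales. The arrays are copied from the FY20-FY23 rows; nothing else is assumed. With these values the steps come out at about 2% in magnitude, so the rate stated in the text is worth confirming against the spreadsheet.

```python
import numpy as np

# FY20-FY23 rows copied from the table above
years = np.array([2020, 2021, 2022, 2023])
sales = np.array([320.35, 317.90969, 315.851145, 325.75642])
ge = np.array([39.36, 40.1472, 40.950144, 40.131141])

# year-over-year gradient: (GE_t - GE_{t-1}) / GE_{t-1}
implied_gradient = np.diff(ge) / ge[:-1]

# GE as a fraction of Sales, comparable to the '%' column
ge_pct_of_sales = ge / sales

for year, g in zip(years[1:], implied_gradient):
    print("FY{}: {:+.2%}".format(year, g))
print(np.round(ge_pct_of_sales, 6))
```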


## Capital Expenditure Forecast (2020-2025)

Due to the economic uncertainty, unnecessary risks will not be taken. Based on that, we assume a low CAPEX, similar to 2016 and 2017, for the years of economic turmoil defined by the team (2020-2022). It is important to highlight that those years came before the acquisition of Metallorum, which resulted in a 55% increase in CAPEX in 2018. CAPEX is not low-variance, since it depends on acquisitions and investment decisions that are directly related to the operations affected by the sales volume (e.g. Metallorum).

In conclusion, we will apply the following logic for the gradients:

>**The forecast will use a low-variance gradient, as in FY19, with a base gradient that increases by 1% of the previous gradient each year, as the period of economic turmoil ends and the environment opens up to more investment opportunities for Autlán and to improvements in the management of Metallorum's operations. It might not seem significant, but these are percentage movements on hundreds of millions of dollars. We follow this model in order to keep a constant investment that displays the most normal fluctuations for the company.**
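The compounding rule above can be written in closed form: each year's CAPEX% is the previous one times 1.01. The sketch below assumes the base is the FY20 regression output from this section rounded to 0.159; the variable names are illustrative only.

```python
import numpy as np

capex_pct_fy20 = 0.159     # FY20 CAPEX% from the regression, rounded (assumed base)
growth = 0.01              # each year's value is 1% above the previous one
years = np.arange(2020, 2025)

# closed-form compounding: base * (1 + growth) ** n
capex_pct = capex_pct_fy20 * (1 + growth) ** np.arange(len(years))

for year, pct in zip(years, capex_pct):
    print("FY{}: {:.6f}".format(year, pct))
```

Multiplying each CAPEX% by the forecasted Sales of the same year would then give the monetary CAPEX, assuming CAPEX% is defined as CAPEX over Sales.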

In [19]:
# import required libraries
from sklearn.linear_model import LinearRegression 

# create data for model
data = pd.read_excel('other_vars.xlsx', sheet_name='CAPEX_1')
data = data.drop('Date', axis=1)

Y_train = data['CAPEX%'].values.reshape(-1, 1)
X_train = data['Sales'].values.reshape(-1, 1)
X_test = np.array([320.35]).reshape(-1, 1)

print('CAPEX FY2015-FY2019: \n{}\n'.format(Y_train))
print('Sales FY2015-FY2019:\n{}\n'.format(X_train))
print('Predicted Sales FY20: {}\n'.format(X_test))

# create model
reg = LinearRegression()
reg.fit(X_train,Y_train)
y_pred = reg.predict(X_test)
print('CAPEX FY20: {}'.format(y_pred))

CAPEX FY2015-FY2019: 
[[0.1034]
 [0.0291]
 [0.0699]
 [0.5559]
 [0.1832]]

Sales FY2015-FY2019:
[[267.729]
 [230.711]
 [359.34 ]
 [413.504]
 [420.128]]

Predicted Sales FY20: [[320.35]]

CAPEX FY20: [[0.15906041]]


In [20]:
data = pd.read_excel('other_vars.xlsx', sheet_name='CAPEX_2')
display(data)

print('Standard Deviation: {}'.format(np.std([10.34, 2.91, 6.99, 55.59, 18.32])))

| Unnamed: 0 | Date | Sales | CAPEX% |
|---|---|---|---|
| 0 | 2015-12-31 | 338.04 | 0.1034 |
| 1 | 2016-12-31 | 267.729 | 0.0291 |
| 2 | 2017-12-31 | 230.711 | 0.0699 |
| 3 | 2018-12-31 | 359.34 | 0.5559 |
| 4 | 2019-12-31 | 413.504 | 0.1832 |
| 5 | 2020-12-31 | 420.128 | 0.159 |
| 6 | 2021-12-31 | 320.35 | 0.16059 |
| 7 | 2022-12-31 | 317.90969 | 0.162196 |
| 8 | 2023-12-31 | 315.851145 | 0.163818 |
| 9 | 2024-12-31 | 325.75642 | 0.165456 |


Standard Deviation: 19.064468521309482
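Standard deviations of series on different scales are hard to compare directly. One way to back the claim that CAPEX is not low-variance is the coefficient of variation (standard deviation divided by mean). The sketch below applies it to the two percentage lists used in this notebook's standard deviation cells; the helper name is hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()

# percentage lists copied from the standard deviation cells in this notebook
cogs_pct = [72.39, 70.64, 65.90, 54.75, 66.64, 67.92, 66]
capex_pct = [10.34, 2.91, 6.99, 55.59, 18.32]

cv_cogs = coefficient_of_variation(cogs_pct)
cv_capex = coefficient_of_variation(capex_pct)
print("CV COGS%: {:.2f}, CV CAPEX%: {:.2f}".format(cv_cogs, cv_capex))
```

A value well below 1 for COGS% and around 1 for CAPEX% is consistent with treating the former as stable and the latter as driven by one-off investment decisions.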


## Accounts Receivable Days (Sales)

> Low-variance gradients, similar to cost of goods sold

> Follow a gradient logic similar to COGS: low first, high at the end
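The day-based metrics in this notebook (receivable, inventory and supplier days) are conventionally a balance divided by its driver and scaled to days, which is why each heading names Sales or COGS. The sketch below shows that standard ratio. The function name, the 365-day convention and the balance used in the example are assumptions for illustration, not values taken from Autlán's reports.

```python
def days_outstanding(balance, driver, days_in_year=365):
    """Standard 'days' ratio: balance / annual driver * days in the year.

    balance: e.g. accounts receivable, inventory or accounts payable
    driver:  annual Sales (receivables) or annual COGS (inventory, payables)
    """
    if driver <= 0:
        raise ValueError("driver must be positive")
    return balance / driver * days_in_year

# hypothetical receivables balance against the FY19 Sales from this notebook
example_days = days_outstanding(balance=65.0, driver=420.128)
print("Receivable days: {:.2f}".format(example_days))
```

Some analysts use 360 days instead of 365; the `days_in_year` parameter leaves that choice explicit.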

In [21]:
# import required libraries
from sklearn.linear_model import LinearRegression 

# create data for model
data = pd.read_excel('other_vars.xlsx', sheet_name='D2_1')
data = data.drop('Date', axis=1)

Y_train = data['D. CxC'].values.reshape(-1, 1)
X_train = data['Sales'].values.reshape(-1, 1)
X_test = np.array([320.35]).reshape(-1, 1)

print('D. CxC FY2015-FY2019: \n{}\n'.format(Y_train))
print('Sales FY2015-FY2019:\n{}\n'.format(X_train))
print('Predicted Sales FY20: {}\n'.format(X_test))

# create model
reg = LinearRegression()
reg.fit(X_train,Y_train)
y_pred = reg.predict(X_test)
print('D. CxC FY20: {}'.format(y_pred))

D. CxC FY2015-FY2019: 
[[56.26]
 [76.21]
 [75.54]
 [67.64]
 [58.17]]

Sales FY2015-FY2019:
[[267.729]
 [230.711]
 [359.34 ]
 [413.504]
 [420.128]]

Predicted Sales FY20: [[320.35]]

D. CxC FY20: [[67.20862387]]


In [23]:
data = pd.read_excel('other_vars.xlsx', sheet_name='D2_2')
display(data)

print('Standard Deviation: {}'.format(np.std([56.26, 76.21, 75.54, 67.64, 58.17])))

| Unnamed: 0 | Date | Sales | D. CxC |
|---|---|---|---|
| 0 | 2015-12-31 | 267.729 | 56.26 |
| 1 | 2016-12-31 | 230.711 | 76.21 |
| 2 | 2017-12-31 | 359.34 | 75.54 |
| 3 | 2018-12-31 | 413.504 | 67.64 |
| 4 | 2019-12-31 | 420.128 | 58.17 |
| 5 | 2020-12-31 | 320.35 | |
| 6 | 2021-12-31 | 317.90969 | |
| 7 | 2022-12-31 | 315.851145 | |
| 8 | 2023-12-31 | 325.75642 | |
| 9 | 2024-12-31 | 353.382093 | |


Standard Deviation: 8.380970349547837


## Inventory Days (COGS)

> Low-variance gradients, similar to cost of goods sold

> Follow a gradient logic similar to COGS

In [25]:
# import required libraries
from sklearn.linear_model import LinearRegression 

# create data for model
data = pd.read_excel('other_vars.xlsx', sheet_name='D3_1')
data = data.drop('Date', axis=1)

Y_train = data['D. Inventarios'].values.reshape(-1, 1)
X_train = data['COGS'].values.reshape(-1, 1)
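# NOTE: 320.35 is the FY20 Sales forecast; the feature in this cell is COGS,
# whose FY20 forecast in the D3_2 sheet is 212.629
X_test = np.array([320.35]).reshape(-1, 1)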

print('D. Inventarios FY15-FY19: \n{}\n'.format(Y_train))
print('COGS FY15-FY19:\n{}\n'.format(X_train))
print('Predicted COGS FY20: {}\n'.format(X_test))

# create model
reg = LinearRegression()
reg.fit(X_train,Y_train)
y_pred = reg.predict(X_test)
print('D. Inventarios FY20: {}'.format(y_pred))

D. Inventarios FY15-FY19: 
[[108.05]
 [121.95]
 [103.35]
 [121.84]
 [146.55]]

COGS FY15-FY19:
[[216.686]
 [179.917]
 [221.194]
 [309.716]
 [341.905]]

Predicted COGS FY20: [[320.35]]

D. Inventarios FY20: [[131.96651762]]
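The cell above feeds 320.35, the FY20 Sales forecast, into a model whose feature is COGS. The sketch below refits the same line with `np.polyfit` on the printed values and compares that input with the FY20 COGS forecast of 212.629 listed in the `D3_2` sheet. Which input the team intended is an assumption to confirm; the helper name is hypothetical.

```python
import numpy as np

# FY15-FY19 values copied from the cell output above
cogs = np.array([216.686, 179.917, 221.194, 309.716, 341.905])
inventory_days = np.array([108.05, 121.95, 103.35, 121.84, 146.55])

# degree-1 least-squares fit; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(cogs, inventory_days, 1)

def predict_days(cogs_value):
    """Inventory days predicted by the fitted line for a given COGS."""
    return slope * cogs_value + intercept

days_at_sales_input = predict_days(320.35)    # input used in the cell above
days_at_cogs_input = predict_days(212.629)    # FY20 COGS forecast from D3_2

print("Input 320.35  -> {:.2f} days".format(days_at_sales_input))
print("Input 212.629 -> {:.2f} days".format(days_at_cogs_input))
```

Because the fitted slope is positive, the lower COGS input yields a noticeably lower inventory-days estimate than the figure printed above.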


In [27]:
data = pd.read_excel('other_vars.xlsx', sheet_name='D3_2')
display(data)

print('Standard Deviation: {}'.format(np.std([108.05, 121.95, 103.35, 121.84, 146.55])))

| Unnamed: 0 | Date | COGS | D. Inventarios |
|---|---|---|---|
| 0 | 2015-12-31 | 216.686 | 108.05 |
| 1 | 2016-12-31 | 179.917 | 121.95 |
| 2 | 2017-12-31 | 221.194 | 103.35 |
| 3 | 2018-12-31 | 309.716 | 121.84 |
| 4 | 2019-12-31 | 341.905 | 146.55 |
| 5 | 2020-12-31 | 212.629 | |
| 6 | 2021-12-31 | 216.88158 | |
| 7 | 2022-12-31 | 221.219212 | |
| 8 | 2023-12-31 | 216.794827 | |
| 9 | 2024-12-31 | 212.458931 | |


Standard Deviation: 15.043337927468098


## Accounts Payable Days (COGS)

> Low-variance gradients, similar to cost of goods sold

> Follow a gradient logic similar to COGS

In [None]:
# import required libraries
from sklearn.linear_model import LinearRegression 

# create data for model
# ASSUMPTION: the sheet name 'D4_1' follows the 'D2_1' / 'D3_1' pattern used above
data = pd.read_excel('other_vars.xlsx', sheet_name='D4_1')
data = data.drop('Date', axis=1)

Y_train = data['D. Proveedores'].values.reshape(-1, 1)
X_train = data['COGS'].values.reshape(-1, 1)
# FY20 COGS forecast, as listed in the D4_2 sheet
X_test = np.array([212.629]).reshape(-1, 1)

print('D. Proveedores FY15-FY19: \n{}\n'.format(Y_train))
print('COGS FY15-FY19:\n{}\n'.format(X_train))
print('Predicted COGS FY20: {}\n'.format(X_test))

# create model
reg = LinearRegression()
reg.fit(X_train,Y_train)
y_pred = reg.predict(X_test)
print('D. Proveedores FY20: {}'.format(y_pred))

In [28]:
data = pd.read_excel('other_vars.xlsx', sheet_name='D4_2')
display(data)

print('Standard Deviation: {}'.format(np.std([59.154, 96.296, 104.122, 119.385, 150.894])))

| Unnamed: 0 | Date | COGS | D. Proveedores |
|---|---|---|---|
| 0 | 2015-12-31 | 216.686 | 59.154 |
| 1 | 2016-12-31 | 179.917 | 96.296 |
| 2 | 2017-12-31 | 221.194 | 104.122 |
| 3 | 2018-12-31 | 309.716 | 119.385 |
| 4 | 2019-12-31 | 341.905 | 150.894 |
| 5 | 2020-12-31 | 212.629 | |
| 6 | 2021-12-31 | 216.88158 | |
| 7 | 2022-12-31 | 221.219212 | |
| 8 | 2023-12-31 | 216.794827 | |
| 9 | 2024-12-31 | 212.458931 | |


Standard Deviation: 29.95619213718593
