<img src="logounav.png" width="150" style="float: right;">

**Analysis of the influence of hyperparameters in a neural network that predicts salaries from Glassdoor job offers for data scientists.**<br>
Author: Lucía Colín Cosano

This notebook explores the use of neural networks. For this purpose, a dataset of data scientist job offers published on Glassdoor has been selected.

The notebook covers the following points:
- Definition of the problem and description of the variables.
- Data loading.
- Exploratory analysis of the data.
- Transformation of the variables.
- Visualization of distributions, identification of outliers and correlation matrix.
- Modeling and analysis of results. Modification of hyperparameters and implementation of different architectures.
- Conclusions.

### DEFINITION OF THE PROBLEM AND DESCRIPTION OF THE VARIABLES.

**Glassdoor** is a Web site that provides information about companies, jobs and salaries. It allows users to search for and rate companies, read reviews written by current and former employees, and compare salaries and benefits in different industries and geographic regions. It also has a job interview section, where users can read about the experience of other candidates in the company's selection process. 

Glassdoor is used by professional job seekers to learn about working conditions and company cultures, and by companies to attract and retain talent.

The available dataset has the variables described below:
- **Index:** contains the number of the observation.
- **Job title:** refers to the position being offered.
- **Salary estimate:** salary range of the position expressed in dollars.
- **Job description:** description of the activity to be carried out.
- **Rating:** rating that the job offer has received.
- **Company name:** Company name.
- **Location:** Location of the company's offices where the activity is to be carried out.
- **Headquarters:** where the boss to whom you report is located.
- **Size:** size of the company. This is expressed in different ranges, not giving an exact amount.
- **Founded:** Year in which the company was founded.
- **Type of ownership:** what type of company it is (public, private...).
- **Industry:** industry in which you work.
- **Sector:** field in which the data scientist will work.
- **Revenue:** total revenue of the company.
- **Competitors**: list of direct competitors.

Knowing the salary range in an industry or for a specific job is **important when making decisions** about a job offer, as it makes it possible to set expectations, negotiate salaries, compare job offers and plan finances properly. The goal is to predict the **average salary** using neural networks.

### DATA LOADING.

First, the libraries needed to solve the problem are imported and the database is read.

In [None]:
import pandas as pd
import numpy as np

from skimpy import skim
import ydata_profiling

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

from sklearn.pipeline import Pipeline
from sklearn.model_selection import RepeatedStratifiedKFold

from sklearn.feature_selection import RFE
from sklearn.feature_selection import RFECV
from sklearn.tree import DecisionTreeClassifier
from collections import Counter
from sklearn.metrics import accuracy_score,recall_score,f1_score

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import optimizers

import shap
import ipywidgets as widgets

import os
import random as rn

In [None]:
df=pd.read_csv('DS_jobs.csv')

### EXPLORATORY DATA ANALYSIS

Once the dataset has been read, an exploratory analysis of the samples is performed. The **skim** function and the **ydata_profiling** report are used first, as they give a quick overview of the dataset. The existence of null values and of columns with a single unique value is checked.

In [None]:
#skim(df)

In [None]:
#report = ydata_profiling.ProfileReport(df)
#report

The following information is obtained from the report:
- **Data shape:** there are 15 variables and 672 observations. 
- There are no **duplicate observations**.
- There are no columns with a **single value**.
- The columns with **few values** are those that refer to **categories**, so it makes sense.
- It is a **complete database in terms of null values** although it requires numerous transformations.
- The **anomalous values** in the variables are identified with a -1. As they occur in categorical variables, they are assigned to a category called "Unknown".

The report offers more information than this, but it is more useful to review it again once the variables have been transformed.
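The checks listed above can also be reproduced directly with pandas. The following is only a rough sketch: the toy frame and its column names are invented for illustration and are not the Glassdoor data.

```python
import pandas as pd

# Toy data, invented for illustration only (not the Glassdoor dataset)
toy = pd.DataFrame({
    "Rating": [3.5, 4.1, -1.0, 4.1],
    "Sector": ["Finance", "-1", "Health Care", "Finance"],
    "Constant": ["x", "x", "x", "x"],
})

n_rows, n_cols = toy.shape                     # data shape
n_duplicates = int(toy.duplicated().sum())     # fully repeated rows
n_nulls = int(toy.isna().sum().sum())          # explicit missing values
single_valued = [c for c in toy.columns if toy[c].nunique() == 1]

# -1 acts as a sentinel for "unknown", both as a number and as a string
sentinel_counts = toy.astype(str).isin(["-1", "-1.0"]).sum()

print(n_rows, n_cols, n_duplicates, n_nulls, single_valued)
print(sentinel_counts)
```

Counting the sentinel separately matters because a column can contain no null values and still hide unknown entries behind -1.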

Secondly, variables that will be irrelevant for the modeling are eliminated, such as **Job Title**, since the study is carried out on a single profession, **Job Description**, due to the difficulty of processing such a large number of words, **Competitors**, since these are very numerous and different between companies and **index**.

In [None]:
print("Before", df.shape)
# errors='ignore' keeps the cell re-runnable if the index column is already absent
df = df.drop(columns=['index'], errors='ignore')
df = df.drop(columns=['Job Title', 'Job Description', 'Competitors'])
print("After", df.shape)

In addition, the profiling report shows that company names (variable **Company Name**) are rarely repeated. Encoding the names would therefore be very costly, since it would greatly increase the dimensionality of the problem, while the same information is largely reflected in other variables such as the sector in which the company operates, the year of foundation, the number of employees and the total revenue.

For this reason, it was decided to dispense with the **Company Name** variable for this particular study.

In [None]:
df = df.drop(['Company Name'], axis=1)

### FEATURE TRANSFORMATION

The dataset needs numerous transformations before it can be analyzed properly. To begin with, the variables that are expressed as ranges are decomposed:
- **Salary Estimate**
- **Size**
- **Revenue** 

Each column is divided into two values, minimum and maximum, and intervals consistent with the magnitude of each variable are defined. To choose these intervals with sound criteria, the variables are first represented graphically.
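As an alternative to splitting on whitespace, the two bounds can be captured with a regular expression. This is a minimal sketch; the example strings are assumptions about the format of the column, and the names `size_min` and `size_max` are chosen only for this illustration.

```python
import pandas as pd

# Hypothetical examples of range strings; the real column may contain other formats
sizes = pd.Series(["51 to 200 employees", "10000+ employees", "Unknown"])

# Capture the lower bound and, when present, the upper bound
bounds = sizes.str.extract(r"^(?P<size_min>\d+)\+?(?: to (?P<size_max>\d+))?")
bounds = bounds.apply(pd.to_numeric)  # unmatched groups stay as NaN

print(bounds)
```

Open-ended ranges and unknown values end up as NaN in `size_max`, so they can be handled explicitly instead of being mixed with the bounded ranges.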

In [None]:
size = df["Size"].str.split(expand=True)
size.columns = ['Size_min', 'str1','Size_max','str2']
df = pd.concat([df, size], axis=1)

df = df.drop(['Size'], axis=1)
df = df.drop(['str1'], axis=1)
df = df.drop(['str2'], axis=1)

In [None]:
sns.countplot(x='Size_min', data=df,palette='crest',order=['Unknown','-1','1','51','201','501','1001','5001','10000+'])
plt.xlabel('Minimum number of employees')
plt.ylabel('Frequency')
plt.title('Minimum number of employees of the companies')
plt.show()

In [None]:
sns.countplot(x='Size_max', data=df,palette='crest',order=['50','200','500','1000','5000','10000'])
plt.xlabel('Maximum number of employees')
plt.ylabel('Frequency')
plt.title('Maximum number of employees of the companies')
plt.show()

In [None]:
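def classify_companies(num_empleados):
    # 0 marks a missing upper bound (missing values are filled with 0)
    if num_empleados <= 0:
        return "Unknown"
    elif num_empleados < 500:
        return "Small business"
    # <= so that an upper bound of exactly 1000 is not left unclassified
    elif num_empleados <= 1000:
        return "Medium business"
    else:
        return "Large business"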


In [None]:
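# The open-ended range '10000+' has no upper bound after the split.
# 10001 is an arbitrary placeholder above the "large" threshold so it is not filled with 0.
df.loc[df['Size_min'] == '10000+', 'Size_max'] = 10001
df['Size_max'] = df['Size_max'].fillna(0).astype(int)
df['Size'] = df['Size_max'].apply(classify_companies)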

In [None]:
sns.countplot(x='Size', data=df,palette='crest', order=["Unknown","Small business",'Medium business','Large business'])

plt.xlabel('')
plt.ylabel('Frequency')
plt.title('Type of company analyzed')
plt.show()

In [None]:
df.loc[ df['Size'] =='Unknown', 'Size'] = 0
df.loc[df['Size'] =='Small business', 'Size'] = 1
df.loc[df['Size'] =='Medium business', 'Size']   = 2
df.loc[ df['Size'] =='Large business', 'Size'] = 3
df['Size'] = df['Size'].astype(int)

The **Type of ownership** column has an irregular format: when the company is public or private, the value is preceded by the word "Company" and a hyphen. The column is split so that only the type of company is kept. The existing categories are then analyzed and grouped so as to reduce their number to **5**. This grouping helps the model generalize, since otherwise the number of observations of each type would be too small. Therefore, more general labels are used.

In [None]:
owner = df["Type of ownership"].str.split('-',expand=True)
owner.columns = ['Type_ownership1', 'Type_ownership2']
df = pd.concat([df, owner], axis=1)

In [None]:
df['Type_ownership'] = np.where(df['Type_ownership1'] != 'Company ', df['Type_ownership1'], df['Type_ownership2'])
df = df.drop(['Type_ownership1'], axis=1)
df = df.drop(['Type_ownership2'], axis=1)
df['Type_ownership'] = df['Type_ownership'].fillna(0)

In [None]:
df['Type_ownership']

In [None]:
tabla = df['Type_ownership'].value_counts()
tabla = tabla.reset_index()
tabla.columns = ['Type of ownership', 'Frequency']
tabla

In [None]:
df['Type_ownership'] = df['Type_ownership'].replace(['', 'Other Organization','Unknown'], 'Unknown')
df['Type_ownership'] = df['Type_ownership'].replace(['Self','Private Practice / Firm','Contract'], ' Private')
df['Type_ownership'] = df['Type_ownership'].replace(['College / University','Hospital','Government'], ' Public')
df['Type_ownership'] = df['Type_ownership'].replace(['Subsidiary or Business Segment'], 'Business')

In [None]:
tabla = df['Type_ownership'].value_counts()
tabla = tabla.reset_index()
tabla.columns = ['Type of ownership', 'Frequency']
tabla

In [None]:
sns.countplot(x='Type_ownership', data=df,palette='crest')
plt.xlabel('Type of ownership')
plt.ylabel('Frequency')
plt.title('Number of offers by type of ownership')
plt.show()

In [None]:
df['Size_min'] = pd.to_numeric(df['Size_min'], errors='coerce')
df['Size_max'] = pd.to_numeric(df['Size_max'], errors='coerce')

df_publicvsprivate = df[df['Type_ownership'].isin([' Public', ' Private'])]
fig, axs = plt.subplots(ncols=2, figsize=(12,5))

sns.violinplot(x='Type_ownership', y='Size_min', data=df_publicvsprivate,palette='crest',ax=axs[0])
sns.violinplot(x='Type_ownership', y='Size_max', data=df_publicvsprivate,palette='crest',ax=axs[1])
plt.ylabel('Size_max')
plt.show()

In [None]:
pivot_table = df.pivot_table(index='Type_ownership', columns='Size', values='Rating', aggfunc='mean')
print(pivot_table)

In [None]:
df.loc[ df['Type_ownership'] == 'Unknown', 'Type_ownership'] = 0
df.loc[df['Type_ownership'] == 'Business', 'Type_ownership'] = 1
df.loc[df['Type_ownership'] == 'Nonprofit Organization', 'Type_ownership']   = 2
df.loc[ df['Type_ownership'] == ' Public', 'Type_ownership'] = 3
df.loc[ df['Type_ownership'] == ' Private','Type_ownership'] = 4
df['Type_ownership'] = df['Type_ownership'].astype(int)

As for the **longevity** of the company, it is expressed through the year of foundation. It is convenient to transform this variable into age ranges. For this purpose, the `datetime` module is used to obtain the current year each time the notebook is run, and the year of foundation is subtracted from it. Five segments are defined (four age ranges plus "Unknown"), which do not span the same number of years but are chosen so that each contains a similar number of observations.
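The binning can also be written with `pd.cut`. The sketch below is an illustration rather than the notebook's implementation: the function name `tenure_band`, the short labels and the explicit `current_year` argument are assumptions introduced here.

```python
import pandas as pd

def tenure_band(founded, current_year):
    """Bin company age into four ranges; -1 marks an unknown year of foundation."""
    age = current_year - founded
    bands = pd.cut(
        age,
        bins=[-float("inf"), 15, 25, 45, float("inf")],
        labels=["<15", "15-25", "25-45", "45+"],
        right=False,  # intervals are [lower, upper), matching "<" comparisons
    ).astype(object)
    # Overwrite the band wherever the year of foundation is unknown
    return bands.where(founded != -1, "Unknown")

founded = pd.Series([2015, 2000, 1990, 1950, -1])
print(tenure_band(founded, current_year=2023).tolist())
```

Passing the reference year explicitly keeps the bands stable between runs, whereas reading the current year from the clock shifts companies between ranges as time passes.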

In [None]:
from datetime import datetime

def def_seniority(founded):
    # Get the current year
    year_actual = datetime.now().year

    # Compute the age of the company
    company_tenure = year_actual - founded

    # Define the age ranges (-1 marks an unknown year of foundation)
    if founded == -1:
        return "Unknown"
    elif company_tenure < 15:
        return "Menos de 15 años"
    elif company_tenure < 25:
        return "15-25 años"
    elif company_tenure < 45:
        return "25-45 años"
    else:
        return "45 años o más"

In [None]:
df['company_tenure'] = df['Founded'].apply(def_seniority)

sns.countplot(x='company_tenure', data=df, palette='crest')
plt.xlabel('Age range')
plt.ylabel('Number of companies')
plt.title('Age of the companies')

sns.set_style('whitegrid')
sns.despine(left=True)
plt.tight_layout()
plt.show()

In [None]:
df.loc[ df['company_tenure'] == 'Unknown', 'company_tenure'] = 0
df.loc[df['company_tenure'] == 'Menos de 15 años', 'company_tenure'] = 1
df.loc[df['company_tenure'] == '15-25 años', 'company_tenure']   = 2
df.loc[ df['company_tenure'] == '25-45 años', 'company_tenure'] = 3
df.loc[ df['company_tenure'] == '45 años o más','company_tenure'] = 4
df['company_tenure'] = df['company_tenure'].astype(int)
df = df.drop(['Type of ownership'], axis=1)

Data related to the industry and sector in which the company operates are available in the database. It has been decided to use the **Sector** variable for the study since the number of categories is smaller and it is easier to group. The same methodology that has been used for the **Type_ownership** is followed.

In [None]:
tabla = df['Sector'].value_counts()
tabla = tabla.reset_index()
tabla.columns = ['Sector', 'Frequency']
tabla

In [None]:
df['Sector'] = df['Sector'].replace(['Biotech & Pharmaceuticals','Health Care'], 'Health')
# 'Government' is grouped under 'Engineering', so it is not repeated in the later groups
df['Sector'] = df['Sector'].replace(['Information Technology','Aerospace & Defense','Government','Telecommunications','Oil, Gas, Energy & Utilities','Agriculture & Forestry','Construction, Repair & Maintenance'], 'Engineering')
df['Sector'] = df['Sector'].replace(['Consumer Services','Business Services','Finance'], 'Business')
df['Sector'] = df['Sector'].replace(['Transportation & Logistics','Manufacturing'], 'Logistics')
df['Sector'] = df['Sector'].replace(['Accounting & Legal','Real Estate','Insurance'], 'Law')
df['Sector'] = df['Sector'].replace(['-1','Non-Profit','Retail', 'Media','Travel & Tourism','Education'], 'Other')

In [None]:
tabla = df['Sector'].value_counts()
tabla = tabla.reset_index()
tabla.columns = ['Sector', 'Frequency']
tabla

In [None]:
sns.countplot(x='Sector', data=df,palette='crest')

# add axis labels and a title to the plot
plt.xlabel('Sector')
plt.ylabel('Frequency')
plt.title('Number of offers by company sector')

# show the plot
plt.show()

When choosing a sector in which to work, and to assess the quality of the offers published on Glassdoor, it is interesting to compare the ratings given by users across sectors. In most cases the typical rating is similar, while its spread varies from one sector to another.

In [None]:
sns.boxplot(x = 'Sector', y = 'Rating', data = df,palette='crest')
plt.tight_layout()
plt.show()

In [None]:
df.loc[ df['Sector'] =='Logistics', 'Sector'] = 0
df.loc[df['Sector'] =='Law', 'Sector'] = 1
df.loc[df['Sector'] =='Health', 'Sector']   = 2
df.loc[ df['Sector'] =='Other', 'Sector'] = 3
df.loc[ df['Sector'] =='Business', 'Sector'] = 4
df.loc[ df['Sector'] =='Engineering', 'Sector'] = 5
df['Sector'] = df['Sector'].astype(int)

Two columns of the dataset deal with locations: one with the location of the company and the other with the location of the boss to whom the worker reports. First, the **Location** column is transformed using the geopy library. The **get_coordinates** function returns the coordinates of each city, which are then broken down into latitude and longitude.

In [None]:
!pip install geopy

In [None]:
from geopy.geocoders import Nominatim

In [None]:
geolocator = Nominatim(user_agent="my_app")

In [None]:
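def get_coordinates(city):
    location = geolocator.geocode(city + ", USA")
    # geocode returns None when the query cannot be resolved
    if location is None:
        return (np.nan, np.nan)
    return (location.latitude, location.longitude)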

In [None]:
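# Geocode each distinct city once and map the result back to the rows
coordenadas_por_ciudad = {ciudad: get_coordinates(ciudad) for ciudad in df['Location'].unique()}
df['Coordenadas'] = df['Location'].map(coordenadas_por_ciudad)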

In [None]:
df[['Latitud', 'Longitud']] = pd.DataFrame(df['Coordenadas'].tolist(), index=df.index)

In [None]:
frec_location = df['Location'].value_counts()
categorias_comunes = frec_location.head(5).index.tolist()

In [None]:
datos_filtrados = df.loc[df['Location'].isin(categorias_comunes)]
sns.countplot(x='Location', data=datos_filtrados,palette='crest')

In [None]:
!pip install folium

In [None]:
import folium
mapa = folium.Map(location=[39.8283, -98.5795], zoom_start=4)
frecuencias = df['Location'].value_counts()

# One marker per city, labelled with the number of offers published there
ciudades = df.drop_duplicates(subset='Location').dropna(subset=['Latitud', 'Longitud'])
for _, row in ciudades.iterrows():
    ciudad = row['Location']
    ofertas = frecuencias[ciudad]

    marcador = folium.Marker(location=[row['Latitud'], row['Longitud']],
                             tooltip=f'{ofertas} offers in {ciudad}')
    marcador.add_to(mapa)
mapa

As for **Headquarters**, a new binary column is created that takes the value 1 when the location of the boss and that of the company coincide, and 0 otherwise.

In [None]:
def location_comparer(df, col1, col2, new_col_name):
    # create a new column set to 0 by default
    df[new_col_name] = 0
    
    # compare the two columns and set the new column to 1 where they are equal
    df.loc[df[col1] == df[col2], new_col_name] = 1
    
    return df

In [None]:
df = location_comparer(df, 'Location', 'Headquarters', 'Unique_location')

In [None]:
sns.countplot(x='Unique_location', data=df,palette='crest')

# add axis labels and a title to the plot
plt.xlabel('Matching location')
plt.ylabel('Frequency')
plt.title('Boss and employee work in the same place')

# show the plot
plt.show()
df = df.drop(['Headquarters'], axis=1)

The variables **Salary Estimate** and **Revenue** are expressed as ranges stored as text. The salary range is parsed into numeric minimum, maximum and average values, while **Revenue** is dropped in this study.
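Parsing a salary range can be sketched with a regular expression. The pattern below assumes strings of the form `$<min>K-$<max>K`, optionally followed by a note in parentheses; the function name `parse_salary` and the sample strings are hypothetical.

```python
import re

# Assumed format: "$<min>K-$<max>K", optionally followed by a note in parentheses
SALARY_RE = re.compile(r"\$(\d+)K\s*-\s*\$(\d+)K")

def parse_salary(text):
    """Return (min, max, average) in thousands of dollars, or None if no range is found."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return low, high, (low + high) / 2

print(parse_salary("$100K-$150K (some note)"))
```

Returning `None` for strings that do not match makes unexpected formats visible instead of silently producing a wrong number.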

In [None]:
# Extract the lower and upper bounds (in thousands of dollars) from strings of the form
# "$<min>K-$<max>K", optionally followed by a note in parentheses
salary = df['Salary Estimate'].str.extract(r'\$(\d+)K\s*-\s*\$(\d+)K').astype(int)
df['min_salary'] = salary[0]
df['max_salary'] = salary[1]
df['avg_salary'] = (df['min_salary'] + df['max_salary']) / 2

In [None]:
df = df.drop(['Salary Estimate'], axis=1)
df = df.drop(['Revenue'], axis=1)

In [None]:
sns.histplot(x='avg_salary', data=df)

# add axis labels and a title to the plot
plt.xlabel('Average salary (thousands of dollars)')
plt.ylabel('Frequency')
plt.title('Distribution of the average salary')

# show the plot
plt.show()

The variables that are not necessary for the modeling are eliminated and a copy of the dataset is made up to this point, which will be needed later.

In [None]:
df = df.drop(['Location'], axis=1)
df = df.drop(['Size_min'], axis=1)
df = df.drop(['Size_max'], axis=1)
df = df.drop(['Industry'], axis=1)
df = df.drop(['Coordenadas'], axis=1)
df = df.drop(['Founded'], axis=1)

### VISUALIZATION OF DISTRIBUTIONS, IDENTIFICATION OF OUTLIERS AND CORRELATION

Once the transformation of the variables has been completed, a visualization of the quantitative variables is made to better understand the current state of the database.

In [None]:
# sns.distplot is deprecated; histplot with kde=True draws the histogram and the density curve
sns.histplot(df['min_salary'], kde=True, stat='density')
plt.show()

sns.histplot(df['max_salary'], kde=True, stat='density')
plt.show()

The existence of outliers is analyzed once the variables have been transformed. They are removed, although, being so few, they are not expected to influence the model.
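The usual interquartile-range rule for flagging outliers can be sketched with NumPy as follows. The function name and the sample values are made up for illustration, and `np.percentile` interpolates between observations, so its quartiles may differ slightly from those of a hand-written percentile function.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

sample = [10, 12, 11, 13, 12, 11, 50]   # invented values
mask = iqr_outlier_mask(sample)
print(mask)
```

A larger `k` makes the rule more tolerant, which can be preferable when the dataset is small and every observation counts.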

In [None]:
colsNumeros = ['min_salary','max_salary','avg_salary','Rating','Size']
fig,ax=plt.subplots(2,3,figsize=(12,10))
ax=ax.flatten()
for index, col in enumerate(colsNumeros):
    sns.boxplot(y=col, data=df, color='b', ax=ax[index])
# hide the unused sixth panel
ax[-1].set_visible(False)
plt.tight_layout(pad=0.5, w_pad=1, h_pad=5.0)

In [None]:
def get_percentile(df, percentile_rank, column):
    # Sort by the column in ascending order and reset the indices
    df = df.sort_values(by=column).reset_index(drop=True)

    # Position of the requested percentile in the sorted column
    index = int((len(df.index)-1) * percentile_rank / 100.0)

    return df.at[index, column]

def interquartile_range(df,column):
    p75 = get_percentile(df, 75,column)  # 75th percentile
    p25 = get_percentile(df, 25,column)  # 25th percentile
    return p75 - p25  # Interquartile range

def get_outliers(df,column,k=1.5):
    # Compute the 25th percentile, the 75th percentile and the IQR
    p25 = get_percentile(df, 25,column)
    p75 = get_percentile(df, 75,column)
    iqr = interquartile_range(df,column)

    # "Minimum non-outlier value": 25th percentile - k * IQR
    min_val = p25 - k*iqr
    # "Maximum non-outlier value": 75th percentile + k * IQR
    max_val = p75 + k*iqr

    # Index labels of the rows outside [min_val, max_val]
    outliers = df[(df[column] < min_val) | (df[column] > max_val)].index
    return outliers

def detect_outliers(columns,df):
    outlier_indices = []

    for column in columns:
        outlier_indices.extend(get_outliers(df,column))

    # Return only after every column has been checked, without repeated indices
    return sorted(set(outlier_indices))

In [None]:
print(len(df))
df = df.drop(detect_outliers(colsNumeros,df),axis = 0).reset_index(drop = True)
print(len(df))
df.head()

Once the outliers have been eliminated, the correlation between the variables is analyzed.

In [None]:
fig = plt.figure(figsize=(12,8))
sns.heatmap(df.corr(),annot=True)

After analyzing the correlation matrix, the minimum and maximum salary variables are eliminated, since the information they carry is already included in the avg_salary variable.

In [None]:
df = df.drop(['min_salary'], axis=1)
df = df.drop(['max_salary'], axis=1)

In [None]:
df.to_csv('datos_glassdoor.csv', index=False)

In [None]:
df1 = df.copy() # used later for modeling

### MODELING AND ANALYSIS OF RESULTS

In [None]:
df = df.sample(frac=1, random_state=20)
df = df.drop(columns=['df_index'], errors='ignore')  # drop the leftover index column if present

In [None]:
df.head()

In [None]:
target = 'avg_salary'
df_original = df.copy(deep=True)
df

In [None]:
import matplotlib.pyplot as plt

def plot_history_train(history, string):
    plt.plot(history.history[string])
    plt.plot(history.history['val_'+string])
    plt.xlabel("Epochs")
    plt.ylabel(string)
    plt.legend([string, 'val_'+string])
    plt.show()

#### MODEL A - REGRESSION MODEL WITH NEURAL NETWORKS

For the first proposed model, a basic structure is chosen, with the same number of neurons in each layer except for the last one, which must correspond to the result to be obtained.

Firstly, the variables are scaled.
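Min-max scaling maps each column to the range [0, 1] with `(x - min) / (max - min)`, and the transformation can be inverted to express predictions and errors in the original units. The sketch below uses plain Python and made-up salary values; the helper names are hypothetical.

```python
def minmax_fit(values):
    """Return the (min, max) pair needed to scale and unscale a column."""
    return min(values), max(values)

def minmax_scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]

def minmax_unscale(scaled, lo, hi):
    return [s * (hi - lo) + lo for s in scaled]

salaries = [50.0, 100.0, 150.0]            # made-up values, in thousands of dollars
lo, hi = minmax_fit(salaries)
scaled = minmax_scale(salaries, lo, hi)    # [0.0, 0.5, 1.0]
restored = minmax_unscale(scaled, lo, hi)  # back to the original values

# An absolute error measured on the scaled target converts back via the column range
mae_scaled = 0.25
mae_original = mae_scaled * (hi - lo)      # 25.0 (thousands of dollars)
```

Because the target is scaled as well, the MAE values reported during training are fractions of the salary range rather than dollars. scikit-learn's `MinMaxScaler` offers `inverse_transform` for the same purpose.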

In [None]:
scaler_dict = {}

for col_name in df.columns:
    # is_numeric_dtype covers int32, int64 and float64, so the check does not depend on the platform
    if pd.api.types.is_numeric_dtype(df[col_name]):
        print(col_name + ' ' + str(df[col_name].dtype))
        scaler = MinMaxScaler(feature_range=(0, 1))
        scaler = scaler.fit(df[col_name].values.reshape(-1, 1))
        df[col_name] = scaler.transform(df[col_name].values.reshape(-1, 1))
        scaler_dict[col_name] = scaler
scaler_dict

Because the number of observations is small compared to those typically used for neural network modeling, the test set is 20% of the data, while the validation set is 10% of the remaining training data. The composition of these samples has a great influence on the results obtained.
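Since the validation set is cut from what remains after the test split, its share of the full dataset is smaller than its nominal size. A short sketch of the arithmetic, with a hypothetical helper name:

```python
def effective_split(test_size, val_size):
    """Fractions of the full dataset when the validation set is cut from the training part."""
    test = test_size
    val = (1 - test_size) * val_size
    train = 1 - test - val
    return train, val, test

train, val, test = effective_split(test_size=0.2, val_size=0.1)
print(f"train={train:.2f}, val={val:.2f}, test={test:.2f}")
```

With the values used here, roughly 72% of the observations are used for training, 8% for validation and 20% for testing.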

In [None]:
test_size = 0.2
val_size = 0.1
epochs = 20
batch_size = 128

In [None]:
train_df, test_df = train_test_split(df
                                     , test_size = test_size, random_state=120)
train_df, val_df = train_test_split(train_df
                                    , test_size = val_size, random_state=120)

# Form np arrays of labels and features.
train_features = np.array(train_df[train_df.columns.difference([target])])
val_features = np.array(val_df[val_df.columns.difference([target])])
test_features = np.array(test_df[test_df.columns.difference([target])])

train_labels = np.array(train_df[[target]])
val_labels = np.array(val_df[[target]])
test_labels = np.array(test_df[[target]])

input_len = train_features.shape[1]

Before adjusting the parameters it is important to check that a simple neural network works. 

In [None]:
def make_model():
    # create model
    model = Sequential()
    model.add(Dense(units = input_len
                    , input_dim = input_len
                    , kernel_initializer='normal'
                    , activation='relu'))

    model.add(Dense(1, activation='relu', kernel_initializer='normal'))
    
    # Compile model    
    model.compile(optimizer = tf.keras.optimizers.RMSprop(0.01)
                  , loss='mse' 
                  , metrics=['mae', 'mse'])
    
    return model

model = make_model()
model.summary()

In [None]:
history = model.fit(train_features,
                    train_labels,
                    batch_size = batch_size,
                    epochs = epochs,
                    validation_data = (val_features, val_labels), 
                    verbose = 0)

In [None]:
plot_history_train(history, "mse")
plot_history_train(history, "loss")
plot_history_train(history, "mae")

It is verified that the neural network works, so a network that is as configurable as possible is defined in order to subsequently perform a grid search.

The parameters for regression with neural networks are defined:

- activation function of the last layer: a linear activation is the usual choice for regression; here relu is implemented instead, which restricts the predictions to non-negative values.
- loss function: mean squared error.
- chosen metrics: mse and mae.
- number of neurons at the output: 1, since the result of the model is a single variable.
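For reference, the loss, the metrics and the output activation listed above can be written out in a few lines of plain Python. This is only an illustrative sketch with invented values; Keras computes these quantities internally.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relu(x):
    """ReLU activation: negative inputs are clipped to zero."""
    return max(0.0, x)

y_true = [0.0, 0.5, 1.0]     # invented scaled targets
y_pred = [0.0, 0.25, 0.5]    # invented predictions

print(mse(y_true, y_pred), mae(y_true, y_pred))
print(relu(-0.3), relu(0.4))
```

MSE penalizes large residuals more strongly than MAE, which is why the two are usually read together: MAE is easier to interpret, while MSE is the quantity being minimized.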

In [None]:
def make_model(dense_layers=1, dense_dropout=0.0, RMS=0.01, verbose=False, seed=42):
    np.random.seed(seed)
    tf.random.set_seed(seed)

    # create model
    model = Sequential()
    for i in range(0, dense_layers):
        model.add(Dense(units=input_len, input_dim=input_len, kernel_initializer='normal', activation='relu'))
        model.add(Dropout(dense_dropout))

    model.add(Dense(1, activation='relu', kernel_initializer='normal'))

    # Compile model    
    model.compile(optimizer=tf.keras.optimizers.RMSprop(RMS), loss='mse', metrics=['mae', 'mse'])
    
    if verbose:
        print('dense_layers:', dense_layers)
        print('RMS:', RMS)
        model.summary()  # summary() prints by itself and returns None
        
    return model


A random seed has been added to the model definition function to ensure reproducibility of the results. Setting the seeds ensures that the same sequences are generated in each model run. This is essential to be able to reproduce the same results in different runs and to make accurate comparisons between different configurations.
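The principle can be illustrated with Python's standard `random` module: a generator initialised with the same seed always produces the same sequence. The helper `draw` is hypothetical; the NumPy and TensorFlow generators seeded in the notebook follow the same idea.

```python
import random

def draw(seed, n=3):
    """Draw n pseudo-random numbers from a generator initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

same_a = draw(22)
same_b = draw(22)
other = draw(23)

print(same_a == same_b)   # identical seeds give identical sequences
print(same_a == other)    # a different seed gives a different sequence
```

In a neural network the seed mainly fixes the initial weights and the shuffling of the batches, so two configurations can be compared without that source of variation.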

In [None]:
def make_model(dense_layers=1, dense_dropout=0.0, RMS=0.01, verbose=False, seed=22):
    os.environ['PYTHONHASHSEED'] = '0'
    np.random.seed(seed)
    rn.seed(seed)
    tf.random.set_seed(seed)

    # create model
    model = Sequential()
    model.add(Dense(units=10, input_dim=input_len, kernel_initializer='normal', activation='relu'))

    # Add dense layers
    for _ in range(dense_layers):
        model.add(Dense(units=10, kernel_initializer='normal', activation='relu'))
        model.add(Dropout(dense_dropout))

    model.add(Dense(1, activation='relu', kernel_initializer='normal'))

    # Compile model    
    model.compile(optimizer=tf.keras.optimizers.RMSprop(RMS), loss='mse', metrics=['mae', 'mse'])

    if verbose:
        print('dense_layers:', dense_layers)
        print('RMS:', RMS)
        model.summary()  # summary() prints by itself and returns None

    return model


When looking for the hyperparameters of the neural network, it is important to start with those that will have the greatest weight in the result obtained from the network.

First of all, the **number of dense layers** is determined.
Dense layers are the simplest layers of neural networks. In these layers, each neuron is connected to every neuron of the previous layer, so increasing the number of layers increases the number of connections and, with it, the number of weights that are updated during training.

To try to determine the optimal number, different models are defined with fixed parameters in which only the number of layers varies.

For this purpose, a number of layers between 2 and 6 is used, in order not to increase the size of the network excessively. It is important to analyze both the metric value and the resulting graphs.
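The search evaluates every combination of the candidate values, which can be generated with `itertools.product`. A compact sketch follows; the helper name `expand_grid` is an assumption made for this example.

```python
import itertools

def expand_grid(grid):
    """Turn {"param": [values, ...]} into a list with one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*grid.values())]

grid = {"dense_layers": [2, 3, 4, 5, 6], "epochs": [10], "batch_size": [32]}
combos = expand_grid(grid)

print(len(combos))   # 5
print(combos[0])     # {'dense_layers': 2, 'epochs': 10, 'batch_size': 32}
```

The number of models is the product of the list lengths, so varying one hyperparameter at a time, as done here, keeps the search small.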

In [None]:
dense_layers = [2,3,4,5,6]
batch_size = [32]
epochs = [10]
dense_dropout = [0.0]
RMS = [0.001]

# make a list of dictionaries containing every possible 
# combination in the grid as a (smaller) dictionary 
import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, combo)) for combo in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , dense_dropout = dense_dropout
                   , RMS = RMS
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

In [None]:
results.sort_values(['val_mae'],ascending = True)

When the number of dense layers is 2, the model does not learn. The loss function stays constant, which indicates that the weights are not being updated.

Since the results obtained with 3 and 4 layers are practically the same, 3 dense layers are used, to avoid an unnecessary increase in the size of the network.

Although the optimal way to proceed would be to continue with the parameters that have a greater influence on the model, the previous graphs show that the number of epochs is insufficient: the curves begin to descend but training ends before they stabilize.

In [None]:
dense_layers = [3]
batch_size = [120]
epochs = [20,40,60,80]
dense_dropout = [0.0]
RMS = [0.001]

import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, combo)) for combo in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , RMS = RMS
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

In [None]:
results.sort_values(['val_mae'],ascending = True)

The graphs show that 20 epochs are still insufficient; only when the number is increased to 40 does the loss function have enough epochs to stabilize. With 60 and 80 epochs the curves and metrics are good, but the extra training is unnecessary, so it is preferable to stop earlier.
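
The idea of stopping once the curve has stabilized can also be expressed as a rule instead of a fixed epoch count. The sketch below is an illustration under assumptions: the function `early_stop_epoch`, the `patience` value, and the validation curve are hypothetical and are not taken from the notebook's runs.

```python
def early_stop_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value by more than min_delta for `patience` consecutive epochs.
    If that never happens, the last epoch is returned.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation curve: improves for 6 epochs, then plateaus.
curve = [0.30, 0.22, 0.17, 0.14, 0.12, 0.11] + [0.11] * 10
print(early_stop_epoch(curve, patience=5))  # 11
```

In Keras, the `EarlyStopping` callback passed to `model.fit` through the `callbacks` argument applies a rule of this kind during training.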

Next, the batch size is modified. A small batch keeps only a few samples in memory at a time and makes each weight update cheap, but the gradient estimated from so few samples is noisy, so the network may fail to learn features and details that are significant for the prediction. A very large batch gives a smoother gradient estimate, but it needs more memory and performs fewer weight updates per epoch, so the loss can take more epochs to stabilize.
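
To make the trade-off concrete, the number of weight updates in one epoch is the number of batches needed to cover the training set. The short sketch below computes it for the batch sizes tested next; the training-set size `n_train` is a hypothetical value, not the size of the notebook's data.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over the training data."""
    return math.ceil(n_samples / batch_size)

n_train = 500  # hypothetical training-set size, not the notebook's
for bs in (16, 32, 64, 128):
    print(bs, updates_per_epoch(n_train, bs))
# 16 32
# 32 16
# 64 8
# 128 4
```

With the same number of epochs, a batch size of 16 gives the network eight times as many updates as a batch size of 128.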

In [None]:
dense_layers = [3]
batch_size = [16,32,64,128]
epochs = [40]
dense_dropout = [0.0]
RMS = [0.001]

# make a list of dictionaries containing every possible 
# combination in the grid as a (smaller) dictionary 
import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, param_grid)) for param_grid in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , RMS = RMS
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

In [None]:
results.sort_values(['val_mae'],ascending = True)

When the batch_size values are modified, the loss function takes more or less time to stabilize depending on the batch size. This is consistent with the effect of batch size on the network's ability to identify predictive features, as discussed above.

Therefore, a batch size of 64 is selected.

The final model is shown below.

In [None]:
dense_layers = [3]
batch_size = [64]
epochs = [40]
dense_dropout = [0.0]
RMS = [0.001]

# make a list of dictionaries containing every possible 
# combination in the grid as a (smaller) dictionary 
import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, param_grid)) for param_grid in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , RMS = RMS
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

In [None]:
results.sort_values(['val_mae'],ascending = True)

Finally, the results obtained are checked.
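
Before errors can be read in dollars, predictions made on the scaled target have to be mapped back to the original units, which the next cells do with the fitted scaler's `inverse_transform`. The following is a minimal pure-Python sketch of that round trip, assuming the target was min-max scaled to the range 0 to 1; the helper names and the salary values are hypothetical.

```python
def minmax_fit(values):
    """Return the (min, max) pair that defines the scaling."""
    return min(values), max(values)

def minmax_transform(values, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

def minmax_inverse(scaled, lo, hi):
    """Map values from [0, 1] back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

salaries = [60.0, 85.0, 110.0, 160.0]  # hypothetical, thousands of dollars
lo, hi = minmax_fit(salaries)
scaled = minmax_transform(salaries, lo, hi)
print(scaled)                           # [0.0, 0.25, 0.5, 1.0]
print(minmax_inverse([0.75], lo, hi))   # [135.0]
```

A prediction of 0.75 on the scaled target therefore corresponds to 135 in the original units of this example.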

In [None]:
shap_test_df = test_df.copy(deep = True)

In [None]:
preds = model.predict(test_features)
original_target = 'original_' + target
pred_target = 'pred_' + target

test_df[original_target] = scaler_dict[target].inverse_transform(test_df[target].values.reshape(-1, 1))
test_df[pred_target] = scaler_dict[target].inverse_transform(preds)


# Show the original and predicted targets side by side.
test_df[[original_target, pred_target]]

In [None]:
test_df['tot_mean_target'] = test_df[original_target].mean()
mean_baseline_error = abs((test_df[original_target]-test_df['tot_mean_target'])).mean()
std_baseline_error = abs((test_df[original_target]-test_df['tot_mean_target'])).std()

print('mean baseline error: '+ str(mean_baseline_error))
print('std baseline error: '+ str(std_baseline_error))
print('_______________________________________________')
mean_model_error = abs((test_df[original_target]-test_df[pred_target])).mean()
std_model_error = abs((test_df[original_target]-test_df[pred_target])).std()

print('mean model error: '+ str(mean_model_error))
print('std model error: '+ str(std_model_error))

**MODEL B - OTHER ARCHITECTURE**

This model arranges the number of neurons in a melon-like shape: the hidden layers first become wider and then gradually become narrower.

This network is implemented because of the following advantages:

- Ability to capture complex features: the neural network has a larger number of neurons in the intermediate layers, which gives it a greater ability to capture complex features and patterns in the data.

- Hierarchical feature extraction: as the hidden layers become wider and then narrower, the neural network can learn features at different levels of abstraction. Initial layers can capture simpler and more local features, while later layers can learn more abstract and global features.

- Increased generalization capability: gradually reducing the size of the hidden layers can help avoid overfitting. It limits the network's ability to memorize the training data and encourages better generalization to new data.

In [None]:
def make_model(dense_layers=1, dense_dropout=0.0, RMS=0.01, verbose=False, seed=20):
    os.environ['PYTHONHASHSEED'] = '0'
    np.random.seed(seed)
    rn.seed(seed)
    tf.random.set_seed(seed)
    session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
    tf.compat.v1.set_random_seed(seed)
    sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
    tf.compat.v1.keras.backend.set_session(sess)

    # create model
    model = Sequential()
    model.add(Dense(units=10, input_dim=input_len, kernel_initializer='normal', activation='relu'))

    # Add melon-shaped layers
    for i in range(1, dense_layers + 1):
        if i <= (dense_layers + 1) // 2:
            units = i + 1
        else:
            units = dense_layers + 1 - i
        model.add(Dense(units=units, kernel_initializer='normal', activation='relu'))
        model.add(Dropout(dense_dropout))

    model.add(Dense(1, activation='relu', kernel_initializer='normal'))

    # Compile model    
    model.compile(optimizer=tf.keras.optimizers.RMSprop(RMS), loss='mse', metrics=['mae', 'mse'])

    if verbose:
        print('dense_layers:', dense_layers)
        print('RMS:', RMS)
        print(model.summary())

    return model


In this case, larger numbers of dense layers are tested so that the aforementioned structure can be appreciated.
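
To see the shape that each setting produces, the width rule from the loop in `make_model` above can be evaluated on its own. The helper name `melon_units` is hypothetical; the rule inside it is copied from that loop.

```python
def melon_units(dense_layers):
    """Hidden-layer widths produced by the rule in make_model for Model B."""
    units = []
    for i in range(1, dense_layers + 1):
        if i <= (dense_layers + 1) // 2:
            units.append(i + 1)                 # widening half
        else:
            units.append(dense_layers + 1 - i)  # narrowing half
    return units

for n in (6, 10, 11):
    print(n, melon_units(n))
# 6 [2, 3, 4, 3, 2, 1]
# 10 [2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
# 11 [2, 3, 4, 5, 6, 7, 5, 4, 3, 2, 1]
```

Two details are worth keeping in mind when reading the results: the widest layer has only about half as many units as there are layers, and the stack is preceded by the fixed 10-unit input layer, so the network first narrows from 10 units to 2 before widening again.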

In [None]:
dense_layers = [6,7,8,9,10,11]
batch_size = [64]
epochs = [40]
dense_dropout = [0.0]
RMS = [0.001]

# make a list of dictionaries containing every possible 
# combination in the grid as a (smaller) dictionary 
import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, param_grid)) for param_grid in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , RMS = RMS
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

In [None]:
results.sort_values(['val_mae'],ascending = True)

In this case, all network architectures learn, and the validation curve lies above the training curve. In general, the number of epochs is not sufficient to train the network. A total of 10 dense layers is selected.

In [None]:
dense_layers = [10]
batch_size = [120]
epochs = [20,40,60,80]
dense_dropout = [0.0]
RMS = [0.001]

import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, param_grid)) for param_grid in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , RMS = RMS
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

In [None]:
results.sort_values(['val_mae'],ascending = True)

When the number of epochs is increased, the behavior is similar to that of the previous structure, but here using only 40 epochs could be risky for the model. For this architecture, 50 is selected as the number of epochs.

Finally, for this architecture the learning rate of the RMSprop optimizer (the `RMS` parameter) is varied to analyze its effect on the results of the network.
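
As a rough intuition for what this parameter controls, the RMSprop update can be run by hand on a toy problem. The sketch below minimizes f(w) = w² with the three learning rates tested next; the function name, the starting point, and the step count are hypothetical, and `rho` and `eps` are set to commonly used defaults. It illustrates the update rule only and says nothing about how the network in this notebook behaves.

```python
def rmsprop_minimize(lr, steps=100, w0=1.0, rho=0.9, eps=1e-7):
    """Minimize f(w) = w**2 with the RMSprop update rule; return the final w."""
    w, v = w0, 0.0
    for _ in range(steps):
        g = 2.0 * w                        # gradient of w**2
        v = rho * v + (1.0 - rho) * g * g  # running average of squared gradients
        w -= lr * g / (v ** 0.5 + eps)     # step scaled by the gradient's RMS
    return w

final_w = {lr: rmsprop_minimize(lr) for lr in (0.001, 0.01, 0.1)}
for lr, w in final_w.items():
    print(lr, w)
```

Because each step is divided by the running root mean square of the gradient, its size is roughly proportional to the learning rate. After 100 steps the smallest rate has barely moved away from the starting point, while the largest rate has come much closer to the minimum at 0.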

In [None]:
dense_layers = [10]
batch_size = [64]
epochs = [50]
dense_dropout = [0.0]
RMS = [0.001,0.01,0.1]

import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, param_grid)) for param_grid in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , RMS = RMS
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

It can be seen that when this value changes, training does not really work for the sample used, so a different split or more samples would be needed to obtain consistent results.

In [None]:
results.sort_values(['val_mae'],ascending = True)

Next, the same architecture is trained with different dropout values.
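
As a reminder of what the dropout rate does during training, the sketch below applies inverted dropout to a list of activations: each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate) so that the expected sum is unchanged. The function name and the activation values are hypothetical; the rates are the ones tested in the next cell.

```python
import random

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(20)  # fixed seed so the sketch is repeatable
layer_out = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for rate in (0.0, 0.1, 0.5):
    print(rate, inverted_dropout(layer_out, rate, rng))
```

With a rate of 0.0 the activations pass through unchanged. With a rate of 0.5, about half of them are dropped on each pass, which is a strong constraint for layers as narrow as the ones used here.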

In [None]:
dense_layers = [10]
batch_size = [64]
epochs = [50]
dense_dropout = [0.0,0.1,0.5]
RMS = [0.001]

# make a list of dictionaries containing every possible 
# combination in the grid as a (smaller) dictionary 
import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, param_grid)) for param_grid in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , RMS = RMS,dense_dropout=dense_dropout
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

In [None]:
results.sort_values(['val_mae'],ascending = True)

**MODEL C - FEATURE ENGINEERING**

Feature engineering had already been carried out earlier because of the format of the available variables. To further improve the accuracy of the network, the categorical variables are now encoded using mean encoding.

Taking the previous model as a basis, the selected variables are transformed with mean encoding. Instead of assigning an arbitrary label or number to each category, mean encoding uses the target variable: each category is replaced by the mean of the target over the observations in that category.
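
Because mean encoding uses the target, computing it on all rows lets information from the validation and test rows reach the features. The cells below compute the means on the whole of `df1`; a common variant, sketched here in pure Python with hypothetical data, learns the means from the training rows only and falls back to the global training mean for unseen categories. The helper names are hypothetical, and this is an alternative to the notebook's cells, not a description of them.

```python
from collections import defaultdict

def fit_mean_encoding(categories, targets):
    """Learn the mean target per category from training rows only."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    mapping = {c: sums[c] / counts[c] for c in sums}
    global_mean = sum(targets) / len(targets)
    return mapping, global_mean

def apply_mean_encoding(categories, mapping, global_mean):
    """Encode categories; unseen ones fall back to the global training mean."""
    return [mapping.get(c, global_mean) for c in categories]

# Hypothetical rows: sector and average salary in thousands of dollars.
train_sector = ["Tech", "Tech", "Finance", "Health"]
train_salary = [120.0, 100.0, 90.0, 70.0]
test_sector = ["Finance", "Tech", "Retail"]  # "Retail" never appears in training

mapping, global_mean = fit_mean_encoding(train_sector, train_salary)
print(apply_mean_encoding(test_sector, mapping, global_mean))  # [90.0, 110.0, 95.0]
```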

In [None]:
print(df1.groupby(['company_tenure'])['avg_salary'].mean())
Mean_encoded_subject = df1.groupby(['company_tenure'])['avg_salary'].mean().to_dict()
df1['company_tenure'] =  df1['company_tenure'].map(Mean_encoded_subject)

In [None]:
print(df1.groupby(['Size'])['avg_salary'].mean())
Mean_encoded_subject = df1.groupby(['Size'])['avg_salary'].mean().to_dict()
df1['Size'] =  df1['Size'].map(Mean_encoded_subject)

In [None]:
print(df1.groupby(['Sector'])['avg_salary'].mean())
Mean_encoded_subject = df1.groupby(['Sector'])['avg_salary'].mean().to_dict()
df1['Sector'] =  df1['Sector'].map(Mean_encoded_subject)

In [None]:
print(df1.groupby(['Type_ownership'])['avg_salary'].mean())
Mean_encoded_subject = df1.groupby(['Type_ownership'])['avg_salary'].mean().to_dict()
df1['Type_ownership'] =  df1['Type_ownership'].map(Mean_encoded_subject)

In [None]:
scaler_dict = {}

for col_name in df1.columns:
    if (df1[col_name].dtype == 'int32') or (df1[col_name].dtype == 'float64'):
        print(col_name + ' ' + str(df1[col_name].dtype))
        scaler = MinMaxScaler(feature_range=(0, 1))
        scaler = scaler.fit(df1[col_name].values.reshape(-1, 1))
        df1[col_name] = scaler.transform(df1[col_name].values.reshape(-1, 1))
        scaler_dict[col_name] = scaler
scaler_dict

In [None]:
df1 = df1.sample(frac=1)
train_df, test_df = train_test_split(df1
                                     , test_size = test_size, random_state=120)
train_df, val_df = train_test_split(train_df
                                    , test_size = val_size, random_state=120)

# Form np arrays of labels and features.
train_features = np.array(train_df[train_df.columns.difference([target])])
val_features = np.array(val_df[val_df.columns.difference([target])])
test_features = np.array(test_df[test_df.columns.difference([target])])

train_labels = np.array(train_df[[target]])
val_labels = np.array(val_df[[target]])
test_labels = np.array(test_df[[target]])

input_len = train_features.shape[1]

In [None]:
def make_model(dense_layers=1, dense_dropout=0.0, RMS=0.01, verbose=False, seed=20):
    os.environ['PYTHONHASHSEED'] = '0'
    np.random.seed(seed)
    rn.seed(seed)
    tf.random.set_seed(seed)
    session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
    tf.compat.v1.set_random_seed(seed)
    sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
    tf.compat.v1.keras.backend.set_session(sess)

    # create model
    model = Sequential()
    model.add(Dense(units=10, input_dim=input_len, kernel_initializer='normal', activation='relu'))

    # Add melon-shaped layers
    for i in range(1, dense_layers + 1):
        if i <= (dense_layers + 1) // 2:
            units = i + 1
        else:
            units = dense_layers + 1 - i
        model.add(Dense(units=units, kernel_initializer='normal', activation='relu'))
        model.add(Dropout(dense_dropout))

    model.add(Dense(1, activation='relu', kernel_initializer='normal'))

    # Compile model    
    model.compile(optimizer=tf.keras.optimizers.RMSprop(RMS), loss='mse', metrics=['mae', 'mse'])

    if verbose:
        print('dense_layers:', dense_layers)
        print('RMS:', RMS)
        print(model.summary())

    return model


In [None]:
dense_layers = [10]
batch_size = [64]
epochs = [50]
dense_dropout = [0.0,0.1,0.5]
RMS = [0.001]

# make a list of dictionaries containing every possible 
# combination in the grid as a (smaller) dictionary 
import itertools

param_grid = dict(dense_layers = dense_layers
                    , RMS = RMS
                    , batch_size = batch_size
                    , epochs = epochs
                    , dense_dropout=dense_dropout
                 )

keys = param_grid.keys()
values = (param_grid[key] for key in keys)
param_grid = [dict(zip(keys, param_grid)) for param_grid in itertools.product(*values)]

print('Proposed ' + str(len(param_grid)) + ' models')

for j in range(0, len(param_grid)):
    
    dense_layers = param_grid[j].get("dense_layers")
    dense_dropout = param_grid[j].get("dense_dropout")
    RMS = param_grid[j].get("RMS")
    batch_size = param_grid[j].get("batch_size")
    epochs = param_grid[j].get("epochs")
    
    model = make_model(dense_layers = dense_layers
                   , RMS = RMS,dense_dropout=dense_dropout
                   , verbose = 1)
    
    history = model.fit(train_features,
                        train_labels,
                        batch_size = batch_size,
                        epochs = epochs,
                        validation_data = (val_features, val_labels), 
                        verbose = 0)
    
    plot_history_train(history, 'mae')
    mae = pd.DataFrame.from_dict(history.history)['val_mae'].iloc[-1]
    param_grid[j].update( {"val_mae":mae})
    
results = pd.DataFrame(param_grid)

### CONCLUSIONS

A neural network can be a useful tool for both employers and job applicants, since it provides a deeper understanding of salary trends in a specific field or industry. Employers can use it to assess how competitive their salaries are compared with those of other employers in the same field, and applicants can use it to estimate the salary they can expect to receive.