# Multilayer Perceptron

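Parameters:
* one hidden layer with three neurons, to limit model capacity and avoid overfitting;
* L-BFGS as the solver, which tends to converge faster and perform better than the stochastic solvers on small datasets;
* an L2 regularization strength (alpha) of 0.05, which penalizes large weights;
* a constant learning rate (scikit-learn only reads `learning_rate` when `solver="sgd"`, so this setting has no effect with L-BFGS);
* and logistic (sigmoid) activation.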
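The claim about `learning_rate` can be checked directly. The sketch below fits the configuration above twice on synthetic data, changing only `learning_rate`. The random data and the fixed `random_state=1` are assumptions added so that both runs start from the same initial weights.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (assumption): 30 samples, 20 features in [0, 1]
rng = np.random.default_rng(0)
X = rng.random((30, 20))
y = X @ rng.random(20) * 10

def fit_predict(learning_rate):
    # Same settings as the notebook, plus a fixed random_state (assumption)
    model = MLPRegressor(solver="lbfgs", hidden_layer_sizes=(3,), alpha=0.05,
                         learning_rate=learning_rate, activation="logistic",
                         max_iter=10000, random_state=1)
    return model.fit(X, y).predict(X)

# learning_rate is only used by the "sgd" solver, so L-BFGS results should match
same = np.allclose(fit_predict("constant"), fit_predict("adaptive"))
print(same)
```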

In [1]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import mean_absolute_error
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler, StandardScaler

plt.style.use('fivethirtyeight')

In [2]:
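# Utils
def average(lst):
    """Arithmetic mean of a list (here, a list of 1-element prediction arrays)."""
    return sum(lst) / len(lst)

# Window lengths, in days before each poll's release date (one CSV per window)
days = [1, 2, 3, 4, 5, 6, 7, 14, 21, 28]
# IMPORTANT: seeds to try (e.g., as random_state for MLPRegressor)
seeds = [1, 2, 3, 4, 5]
# StandardScaler: zero mean, unit variance; MinMaxScaler: rescales each feature to [0, 1]
scaler = StandardScaler()
normalizer = MinMaxScaler()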
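In the cells shown here, `seeds` is defined but never passed to the model, so each run of `MLPRegressor` starts from a different random initialization. One possible way to use the list is sketched below: fit one model per seed via `random_state` and report the mean and spread of the predictions. The helper `predict_over_seeds` and the synthetic data are hypothetical, not part of the original notebook.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

seeds = [1, 2, 3, 4, 5]

def predict_over_seeds(X_train, y_train, X_test, seeds):
    """Hypothetical helper: fit one MLP per seed and average the predictions."""
    preds = []
    for seed in seeds:
        model = MLPRegressor(solver="lbfgs", hidden_layer_sizes=(3,), alpha=0.05,
                             activation="logistic", max_iter=10000,
                             random_state=seed)  # seed fixes the initial weights
        model.fit(X_train, y_train)
        preds.append(model.predict(X_test))
    preds = np.vstack(preds)  # shape: (n_seeds, n_test)
    return preds.mean(axis=0), preds.std(axis=0)

# Example on synthetic data (assumption): last row held out as the test row
rng = np.random.default_rng(0)
X = rng.random((25, 20))
y = 40 + 10 * X[:, 0]
mean_pred, spread = predict_over_seeds(X[:-1], y[:-1], X[-1:], seeds)
print(mean_pred, spread)
```

Averaging over seeds reduces the dependence of a single prediction on one lucky or unlucky initialization; the spread gives a rough sense of that sensitivity.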

### Columns Description

*Each column counts activity over a window of days before a given poll's release date (e.g., a value of 24 for XPosts in file `1_1.csv` is the number of X posts by candidate 1 during the 1 day before that date).*

* XPosts: Total number of posts on X (Twitter)
* Xcomments: Total number of comments on X
* XRts: Total number of retweets (RTs) on X
* Xlikes: Total number of likes on X
* XCommsPPost: Average number of comments per post on X
* XRTsPPost: Average number of retweets per post on X
* XlikesPPost: Average number of likes per post on X

* FBPosts: Total number of posts on Facebook
* FBReactions: Total number of reactions on Facebook
* FBComments: Total number of comments on Facebook
* FBShares: Total number of shares on Facebook
* FBCommsPPost: Average number of comments per post on Facebook
* FBReactsPPost: Average number of reactions per post on Facebook
* FBSharesPPost: Average number of shares per post on Facebook

* IGPosts: Total number of posts on Instagram
* IGLikes: Total number of likes on Instagram
* IGLikesPPost: Average number of likes per post on Instagram

* YTPosts: Total number of posts on YouTube
* YTViews: Total number of views on YouTube
* YTViewsPPost: Average number of views per post on YouTube

* Target: the reported vote share for the candidate
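The per-post columns are averages of the totals. A minimal sketch of how such a column could be derived, assuming it is simply the total divided by the number of posts (the notebook does not show how New_DB computes it). The two rows are made-up values, and windows with zero posts are mapped to NaN rather than dividing by zero.

```python
import numpy as np
import pandas as pd

# Hypothetical totals for two poll dates (not real data)
df = pd.DataFrame({"XPosts": [24, 0], "Xcomments": [480, 0]})

# Assumption: per-post average = total / number of posts;
# zero-post windows become NaN instead of a division by zero
df["XCommsPPost"] = df["Xcomments"] / df["XPosts"].replace(0, np.nan)
print(df)
```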

In [3]:
# Columns to read from each CSV (see New_DB), grouped by platform
x_columns = ['XPosts', 'Xcomments', 'XRts', 'Xlikes', 'XCommsPPost', 'XRTsPPost', 'XlikesPPost']
fb_columns = ['FBPosts', 'FBReactions', 'FBComments', 'FBShares', 'FBReactsPPost', 'FBCommsPPost', 'FBSharesPPost']
ig_columns = ['IGPosts', 'IGLikes', 'IGLikesPPost']
yt_columns = ['YTPosts', 'YTViews', 'YTViewsPPost']

target = ['Target']

# Feature sets of decreasing size
feature_columns_all = x_columns + fb_columns + ig_columns + yt_columns  # all platforms
feature_columns_notall = x_columns + fb_columns  # X and Facebook only
testing_columns = list(x_columns)  # X only

columns = feature_columns_all + target

In [4]:
# Helper: horizontal bar chart of the `length` most important features
def plot_features(columns, importances, length):
    df = (
        pd.DataFrame({"features": columns, "feature_importance": importances})
        .sort_values("feature_importance", ascending=False)
        .reset_index(drop=True)
    )
    sns.barplot(x="feature_importance", y="features", data=df[:length], orient="h")
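`MLPRegressor` has no `feature_importances_` attribute, so the `importances` argument of `plot_features` has to come from somewhere else. One model-agnostic option is permutation importance from `sklearn.inspection`, sketched below on synthetic data where only `f0` drives the target. The data and feature names are assumptions; the resulting `columns` and `importances` could then be passed to `plot_features`.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

# Synthetic data (assumption): only the first of three features drives y
rng = np.random.default_rng(0)
X = rng.random((60, 3))
y = 50 * X[:, 0] + rng.normal(0, 0.5, 60)
columns = ["f0", "f1", "f2"]  # hypothetical feature names

model = MLPRegressor(solver="lbfgs", hidden_layer_sizes=(3,), alpha=0.05,
                     activation="logistic", max_iter=10000, random_state=1).fit(X, y)

# Importance = mean drop in the model's R^2 when one column is shuffled
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
importances = result.importances_mean

ranking = (
    pd.DataFrame({"features": columns, "feature_importance": importances})
    .sort_values("feature_importance", ascending=False)
    .reset_index(drop=True)
)
print(ranking)
```

With the notebook's data, computing the importances on the held-out rows rather than the training rows would give a less optimistic picture, although here only one row per file is held out.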

In [5]:
print(f"Number of features including all platforms: {len(feature_columns_all)}")
print(f"Number of features including only Facebook and X: {len(feature_columns_notall)}")
print(f"Number of features including only X: {len(testing_columns)}")

Number of features including all platforms: 20
Number of features including only Facebook and X: 14
Number of features including only X: 7


## Predictions

Model creation

In [6]:
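# 3 logistic hidden units, L-BFGS solver, L2 penalty alpha=0.05 (learning_rate only applies to solver="sgd")
regr = MLPRegressor(solver="lbfgs", hidden_layer_sizes=(3,), alpha=0.05, learning_rate="constant", activation="logistic", max_iter=10000)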
regr
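To see how small this network is, the sketch below fits the same architecture to synthetic data with 20 features (matching `feature_columns_all`) and counts the trainable parameters through the `coefs_` and `intercepts_` attributes. The random data and `random_state=1` are assumptions; only the shapes matter here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data (assumption): 15 rows, 20 features like feature_columns_all
rng = np.random.default_rng(0)
X = rng.random((15, 20))
y = rng.random(15) * 50

regr = MLPRegressor(solver="lbfgs", hidden_layer_sizes=(3,), alpha=0.05,
                    activation="logistic", max_iter=10000, random_state=1).fit(X, y)

# coefs_[i] is the weight matrix from layer i to layer i + 1
print([w.shape for w in regr.coefs_])       # [(20, 3), (3, 1)]
print([b.shape for b in regr.intercepts_])  # [(3,), (1,)]
n_params = sum(w.size for w in regr.coefs_) + sum(b.size for b in regr.intercepts_)
print(n_params)  # 20*3 + 3 + 3*1 + 1 = 67
```

Sixty-seven parameters is still a lot relative to a handful of poll rows per file, which is why the hidden layer is kept at three units and an L2 penalty is applied.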

Xóchitl Gálvez

In [12]:
model1 = regr
predictions = []

features_included = feature_columns_all

for i in days:
    # Read Gálvez's CSV for the i-day window
    data = pd.read_csv(f'../galvez/2_{i}.csv', usecols=columns, encoding="utf-8")
    # Train on every row except the last; the last row is the one to predict
    training = data.iloc[:-1]
    testing = data.iloc[[-1]]  # double brackets keep a one-row DataFrame

    # Features and target columns as NumPy arrays
    X_train = training[features_included].values
    X_test = testing[features_included].values
    y_train = training[target].values.ravel()  # flatten to a 1D array
    y_test = testing[target].values.ravel()

    # Fit the scaler on the training rows only, then apply it to the test row
    x_train_scaled = normalizer.fit_transform(X_train)
    x_test_scaled = normalizer.transform(X_test)

    regr.fit(x_train_scaled, y_train)  # each fit restarts training from scratch
    prediction = regr.predict(x_test_scaled)
    predictions.append(prediction)
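The same loop body is repeated for each candidate, changing only the file path. A possible refactoring is sketched below as a hypothetical helper, `predict_last_row`, that takes an already-loaded DataFrame. It wraps `MinMaxScaler` and a cloned copy of the model in a scikit-learn `Pipeline`, so the scaler is still fit on the training rows only and the shared `regr` object is not refitted. The synthetic frame is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

def predict_last_row(data, features, target, model):
    """Hypothetical helper: train on all rows but the last, predict the last one."""
    training, testing = data.iloc[:-1], data.iloc[[-1]]
    # The pipeline fits MinMaxScaler on the training rows only, then reuses it
    pipe = make_pipeline(MinMaxScaler(), clone(model))
    pipe.fit(training[features].values, training[target].values.ravel())
    return pipe.predict(testing[features].values)[0]

# Usage with a synthetic frame (assumption); in the notebook, data would come
# from pd.read_csv(f'../galvez/2_{i}.csv', usecols=columns) for each i in days
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((12, 3)), columns=["XPosts", "Xlikes", "FBPosts"])
demo["Target"] = 20 + 5 * demo["XPosts"]
mlp = MLPRegressor(solver="lbfgs", hidden_layer_sizes=(3,), alpha=0.05,
                   activation="logistic", max_iter=10000, random_state=1)
pred = predict_last_row(demo, ["XPosts", "Xlikes", "FBPosts"], ["Target"], mlp)
print(pred)
```

Each candidate cell would then reduce to a list comprehension over `days` that reads the CSV and calls the helper.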

In [9]:
len(predictions)

10

In [10]:
average(predictions)

array([21.27534617])
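Because each entry of `predictions` is a 1-element array, `average` returns an array rather than a number. The sketch below, using three made-up prediction values, flattens the list to get a scalar mean and also reports the standard deviation across window lengths, which shows how much the forecast depends on the choice of window.

```python
import numpy as np

# Hypothetical per-window predictions (one 1-element array per entry in days)
predictions = [np.array([20.0]), np.array([22.0]), np.array([21.0])]

stacked = np.concatenate(predictions)  # flatten to shape (n_windows,)
print(stacked.mean())                  # 21.0, same value as average(predictions)
print(stacked.std(ddof=1))             # 1.0, sample std across windows
```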

Claudia Sheinbaum

In [13]:
model1 = regr
predictions = []

features_included = feature_columns_all

for i in days:
    # Read Sheinbaum's CSV for the i-day window
    data = pd.read_csv(f'../claudia/1_{i}.csv', usecols=columns, encoding="utf-8")
    # Train on every row except the last; the last row is the one to predict
    training = data.iloc[:-1]
    testing = data.iloc[[-1]]  # double brackets keep a one-row DataFrame

    # Features and target columns as NumPy arrays
    X_train = training[features_included].values
    X_test = testing[features_included].values
    y_train = training[target].values.ravel()  # flatten to a 1D array
    y_test = testing[target].values.ravel()

    # Fit the scaler on the training rows only, then apply it to the test row
    x_train_scaled = normalizer.fit_transform(X_train)
    x_test_scaled = normalizer.transform(X_test)

    regr.fit(x_train_scaled, y_train)  # each fit restarts training from scratch
    prediction = regr.predict(x_test_scaled)
    predictions.append(prediction)

In [14]:
len(predictions), average(predictions)

(10, array([48.83820267]))

Álvarez Máynez

In [15]:
model1 = regr
predictions = []

features_included = feature_columns_all

for i in days:
    # Read Álvarez Máynez's CSV for the i-day window
    data = pd.read_csv(f'../maynez/3_{i}.csv', usecols=columns, encoding="utf-8")
    # Train on every row except the last; the last row is the one to predict
    training = data.iloc[:-1]
    testing = data.iloc[[-1]]  # double brackets keep a one-row DataFrame

    # Features and target columns as NumPy arrays
    X_train = training[features_included].values
    X_test = testing[features_included].values
    y_train = training[target].values.ravel()  # flatten to a 1D array
    y_test = testing[target].values.ravel()

    # Fit the scaler on the training rows only, then apply it to the test row
    x_train_scaled = normalizer.fit_transform(X_train)
    x_test_scaled = normalizer.transform(X_test)

    regr.fit(x_train_scaled, y_train)  # each fit restarts training from scratch
    prediction = regr.predict(x_test_scaled)
    predictions.append(prediction)

In [16]:
len(predictions), average(predictions)

(10, array([7.92547477]))