# Purpose
* We load the data from the Continuous Survey of Spanish Households (CSSH).
* We select the CSSH variables that correspond to variables in the Spanish living conditions (SLC) data and compute their statistics. These statistics are used as the parameters of the conjugate priors in our article.

In [3]:
import pandas as pd
import numpy as np

echhogares18 = pd.read_csv("../data/mixed/spanish_living_conditions/ECHHogares_2018.csv", sep="\t")
echhogares18.name = "ECHHogares_2018"
print(echhogares18.shape)

(100542, 27)


We select three variables from the CSSH data that have an equivalent in the SLC data.

In [4]:
home_ownership = echhogares18.REGVI # Discrete
family_members = echhogares18.TAMTOHO # Continuous
home_rooms = echhogares18.HABVI # Continuous

## Priors

In [5]:
pseudocounts = 1 # Prior strength

#### home_ownership


Since <code>home_ownership</code> is a discrete (categorical) variable, its conjugate prior is a Dirichlet distribution. The Dirichlet parameters are the estimated category frequencies multiplied by <code>pseudocounts</code>, which sets the strength of the prior. We follow the interpretation given on <a href="https://en.wikipedia.org/wiki/Dirichlet_distribution#Conjugate_to_categorical/multinomial">Wikipedia</a>.

Note: this is not necessary here, but when only a few instances are available, Laplace (additive) smoothing is advisable so that no category receives a zero parameter (Dirichlet parameters must be strictly positive).
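A minimal sketch of Laplace smoothing applied before scaling by `pseudocounts`, assuming the set of categories is known in advance. The helper name `dirichlet_prior`, its `smoothing` argument and the toy sample are hypothetical, chosen only to show a category that is never observed.

```python
import pandas as pd


def dirichlet_prior(values, categories, pseudocounts=1.0, smoothing=1.0):
    """Hypothetical helper: Dirichlet parameters from (smoothed) category frequencies.

    smoothing=1.0 is Laplace (add-one) smoothing; smoothing=0.0 gives the
    plain relative frequencies used in the cell below.
    """
    # Count every category, including those absent from the sample (count 0)
    counts = values.value_counts().reindex(categories, fill_value=0)
    # Add the smoothing constant to each count, then renormalize to frequencies
    freqs = (counts + smoothing) / (counts.sum() + smoothing * len(categories))
    # Scale the smoothed frequencies by the prior strength
    return freqs * pseudocounts


# Toy sample in which category 4 never occurs
sample = pd.Series([1, 1, 2, 3, 1])
print(dirichlet_prior(sample, categories=[1, 2, 3, 4]))
```

Without smoothing, category 4 would get a parameter of 0, which is not a valid Dirichlet parameter; with add-one smoothing it gets 1/9.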

In [6]:
home_ownership_freqs = home_ownership.value_counts()/home_ownership.count()
print(home_ownership_freqs*pseudocounts)

1    0.531549
2    0.272612
3    0.145899
4    0.049939
Name: REGVI, dtype: float64
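To illustrate how these parameters act as a prior: with a categorical likelihood the Dirichlet prior is conjugate, so the posterior is again Dirichlet, with parameters equal to the prior parameters plus the observed count of each category. A minimal sketch using the (rounded) values printed above; the observed counts are made up for illustration.

```python
import numpy as np

# Dirichlet prior for home_ownership codes 1-4, rounded from the output above
prior_alpha = np.array([0.531549, 0.272612, 0.145899, 0.049939])

# Hypothetical observed counts per category in a new sample
counts = np.array([12, 5, 2, 1])

# Conjugate update: posterior parameters = prior parameters + observed counts
posterior_alpha = prior_alpha + counts

# Posterior mean of each category probability
posterior_mean = posterior_alpha / posterior_alpha.sum()
print(posterior_alpha)
print(posterior_mean.round(4))
```

With `pseudocounts = 1` the prior weighs as much as a single extra observation; larger values pull the posterior mean more strongly toward the CSSH frequencies.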


#### family_members


Since we treat <code>family_members</code> as a continuous variable, we assume Gaussianity. Its conjugate prior in our model (unknown mean and precision) is therefore a Gaussian-Gamma (Normal-gamma) distribution. To set its parameters we follow the <a href="https://en.wikipedia.org/wiki/Normal-gamma_distribution#Interpretation_of_parameters">interpretation of parameters</a> given on Wikipedia.
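Spelled out, that interpretation gives the following parameters, in the order of `gg_param1` to `gg_param4`. Here ν stands for `pseudocounts`, x̄ and s² for the sample mean and sample variance, and τ̂ = 1/s² for the sample precision; the symbols μ₀, λ, α, β are Wikipedia's (Gamma with rate β), not necessarily those of our article.

```latex
\mu_0 = \bar{x}, \qquad
\lambda = \nu, \qquad
\alpha = \frac{\nu}{2}, \qquad
\beta = \alpha\, s^2 = \frac{\nu}{2\,\hat{\tau}}, \qquad
\mathbb{E}[\tau] = \frac{\alpha}{\beta} = \frac{1}{s^2} = \hat{\tau}
```

In words: the mean is treated as estimated from λ = ν pseudo-observations with sample mean μ₀, and the precision as estimated from 2α = ν pseudo-observations with sample variance β/α = s², so the prior's expected precision equals the sample precision.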

In [7]:
import statistics as stats

mean = stats.mean(family_members)
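# Precision is the reciprocal of the sample variance (not of the standard deviation)
precision = 1 / stats.variance(family_members)
print("Mean: " + str(mean))
print(f"Precision: {precision:.4f}")

# Normal-gamma parameters, following the Wikipedia interpretation
gg_param1 = mean                              # mu0: sample mean
gg_param2 = pseudocounts                      # lambda: mean seen in `pseudocounts` observations
gg_param3 = pseudocounts / 2.0                # alpha: precision seen in 2*alpha observations
gg_param4 = pseudocounts / (2.0 * precision)  # beta = alpha * sample variance
print("\nGaussian-Gamma parameters")
print(gg_param1)
print(gg_param2)
print(gg_param3)
print(f"{gg_param4:.4f}")

Mean: 2.5228063893696167
Precision: 0.6298

Gaussian-Gamma parameters
2.5228063893696167
1
0.5
0.7939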
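For completeness, a minimal sketch of the standard Normal-gamma conjugate update (unknown mean and precision) that such a prior undergoes when combined with n Gaussian observations, using the update formulas from the same Wikipedia article. The function name, the prior values and the sample of household sizes are hypothetical and only illustrative.

```python
import numpy as np


def normal_gamma_update(mu0, lam, alpha, beta, x):
    """Hypothetical helper: posterior (mu, lambda, alpha, beta) after observing x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    # Posterior mean: precision-weighted blend of prior mean and sample mean
    mu_n = (lam * mu0 + n * xbar) / (lam + n)
    lam_n = lam + n
    alpha_n = alpha + n / 2.0
    # beta grows with the within-sample spread and the prior/sample mean mismatch
    beta_n = (beta
              + 0.5 * np.sum((x - xbar) ** 2)
              + lam * n * (xbar - mu0) ** 2 / (2.0 * (lam + n)))
    return mu_n, lam_n, alpha_n, beta_n


# Illustrative prior with pseudocounts = 1 and a made-up sample
mu_n, lam_n, alpha_n, beta_n = normal_gamma_update(2.5, 1.0, 0.5, 0.8, [1, 2, 2, 3, 4])
print(mu_n, lam_n, alpha_n, beta_n)
```

With `pseudocounts = 1` the prior counts as one observation for the mean (λ = 1) and for the precision (2α = 1), so even five observations dominate the posterior.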


#### home_rooms

Like <code>family_members</code>, <code>home_rooms</code> is treated as a continuous variable, so we assume Gaussianity and estimate the parameters of its prior in the same way.
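Because both continuous variables go through the same steps, they can be factored into one function and applied per column. A minimal sketch, assuming β = α·s² as in the Wikipedia interpretation; the helper name `normal_gamma_prior` and the toy data frame standing in for the CSSH columns `TAMTOHO` and `HABVI` are hypothetical.

```python
import statistics as stats

import pandas as pd


def normal_gamma_prior(values, pseudocounts=1.0):
    """Hypothetical helper: (mu0, lambda, alpha, beta) for a Normal-gamma prior."""
    mean = stats.mean(values)
    variance = stats.variance(values)  # sample variance (n - 1 denominator)
    alpha = pseudocounts / 2.0
    # beta = alpha * sample variance, so that E[precision] = alpha / beta = 1 / variance
    return mean, pseudocounts, alpha, alpha * variance


# Toy data standing in for the CSSH columns
toy = pd.DataFrame({"TAMTOHO": [1, 2, 2, 3, 4], "HABVI": [3, 4, 4, 5, 7]})
priors = {
    col: normal_gamma_prior(toy[col].astype(float).tolist())
    for col in ["TAMTOHO", "HABVI"]
}
for col, params in priors.items():
    print(col, params)
```

Converting each column to a list of Python floats keeps the `statistics` functions away from NumPy scalar types.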

In [8]:
import statistics as stats

mean = stats.mean(home_rooms)
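# Precision is the reciprocal of the sample variance (not of the standard deviation)
precision = 1 / stats.variance(home_rooms)
print("Mean: " + str(mean))
print(f"Precision: {precision:.4f}")

# Normal-gamma parameters, following the Wikipedia interpretation
gg_param1 = mean                              # mu0: sample mean
gg_param2 = pseudocounts                      # lambda: mean seen in `pseudocounts` observations
gg_param3 = pseudocounts / 2.0                # alpha: precision seen in 2*alpha observations
gg_param4 = pseudocounts / (2.0 * precision)  # beta = alpha * sample variance
print("\nGaussian-Gamma parameters")
print(gg_param1)
print(gg_param2)
print(gg_param3)
print(f"{gg_param4:.4f}")

Mean: 5.442680670764457
Precision: 0.5438

Gaussian-Gamma parameters
5.442680670764457
1
0.5
0.9194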
