**<center><font size=5>LGBT Survey Analysis</font></center>**

<center><img src="https://i.ibb.co/qFF3K6f/equal-2495950-1920.jpg" alt="equal-2495950-1920" border="0" width="700"></center>

*<center>Image by <a href="https://pixabay.com/users/Wokandapix-614097/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=2495950">Wokandapix</a> from <a href="https://pixabay.com/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=2495950">Pixabay</a></center>*

***
**author**: Ruslan Klymentiev

**date**: 20th July, 2019

**Table of Contents**
- <a href='#intro'>1. Project overview and objectives</a> 
    - <a href='#survey'>1.1. The aim of the survey</a>
    - <a href='#data'>1.2. Data set overview</a>
- <a href='#bi'>2. Choropleth map visualization of responses</a>
- <a href='#score'>3. Country 'suitable' scores</a>
    - <a href='#method'>3.1. Scoring methodology</a>
    - <a href='#dl'>3.2. Daily Life</a>
    - <a href='#ra'>3.3. Rights Awareness</a>
    - <a href='#disc'>3.4. Discrimination</a>
    - <a href='#vah'>3.5. Violence and Harassment</a>
    - <a href='#overall'>3.6. Overall rank</a>
- <a href='#lbgt'>4. What the LGBT community says</a>
    - <a href='#satisfied'>4.1. Do people feel satisfied in EU countries?</a>
    - <a href='#open'>4.2. Are people being open about their orientation?</a>
    - <a href='#comf'>4.3. What would allow people to live more comfortably?</a>
- <a href='#end'>5. Conclusions</a>

**Note: I've hidden all the code blocks because they took up a lot of space and were somewhat distracting. However, for some reason the `Code` buttons do not always appear for me, so I cannot always expand the code while reading through the notebook. If you experience the same problem and would like to see the code, I pushed the [notebook](https://nbviewer.jupyter.org/github/ruslan-kl/lgbt/blob/master/lgbt-survey-analysis.ipynb) to a [GitHub repo](https://github.com/ruslan-kl/lgbt).**

In [1]:
import numpy as np 
import pandas as pd 
import os
import flag
import pycountry
import json

from plotly.utils import PlotlyJSONEncoder
import plotly.graph_objs as go
from plotly import tools
import colorlover as cl

from IPython.display import clear_output, display

pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_colwidth', 1000)

In [2]:
def plot_table(df):
    values = []
    for col_name in df.columns:
        values.append(df[col_name])
        
    trace0 = go.Table(
        header = dict(
            values = ['<b>'+x.upper()+'</b>' for x in df.columns],
            line = dict(color = 'black'),
            fill = dict(color = 'yellow'),
            align = ['center'],
            font = dict(color = 'black', size = 9)
        ),
        cells = dict(
            values = values,
            align = 'center',
            font = dict(color = 'black', size = 11)
        ))

    fig = go.Figure([trace0])
    return fig
    
    
def calculate_and_plot_total_score(df, title):
    temp_df = df[df['CountryName'] != 'Average']
    temp_df = temp_df.merge(subset_size_df, how='left', on=['CountryName', 'CountryFlag', 'subset'])
    temp_df['perc_ratio'] = temp_df['percentage']*temp_df['subset_weight'] / 100
    
    # calculate the score
    temp_df[title+' Score'] = temp_df['weight'] * temp_df['perc_ratio']
    
    # get the average
    score_df = temp_df.groupby(['CountryName', 'CountryFlag'], as_index=False)[[title+' Score']].mean()
    score_df = score_df[['CountryFlag', 'CountryName', title+' Score']]
    score_df = score_df.sort_values([title+' Score'], ascending=False).reset_index(drop=True)
    score_df[title+' Rank'] = score_df[title+' Score'].rank(method='dense', ascending=False).astype('int')
    score_df[title+' Score'] = score_df[title+' Score'].apply(lambda x: round(x,4))
        
    # plot the score_df
    fig = plot_table(score_df[[title+' Rank', 'CountryFlag', 'CountryName', title+' Score']])
    return fig, score_df


def show_InEx_questions(df, included=True):
    temp_df = df.groupby(['question_label', 'question_code'], as_index=False)[['weight']].max()
    if included:
        display(pd.DataFrame(
            temp_df[['question_label', 'question_code']][~temp_df['weight'].isnull()].drop_duplicates(), 
            columns=['question_label', 'question_code']).rename(columns={'question_label': 'Included Questions'})
               )
    else:
        display(pd.DataFrame(
            temp_df[['question_label', 'question_code']][temp_df['weight'].isnull()].drop_duplicates(), 
            columns=['question_label', 'question_code']).rename(columns={'question_label': 'Excluded Questions'})
               )

# <a id='intro'>1. Project overview and objectives</a>

The main purpose of this project is to visualize the results of a 2012 survey conducted among 93,000 LGBT people in the EU countries and Croatia. I try to estimate an overall "suitability" score (in other words, how good is this country for the LGBT community?) by assigning weights to answers and averaging the scores within each question block. Then I look at some particular questions to explore how satisfied the LGBT community is and what its members think would improve their lives in the countries they live in.

## <a id='survey'>1.1. The aim of the survey</a>

> *The aim of the EU LGBT survey was to obtain robust and comparable data that would allow a better understanding of how lesbian, gay, bisexual and transgender (LGBT) people living in the European Union (EU) and Croatia experience the enjoyment of fundamental rights. The survey collected data from 93,079 people across the EU and Croatia through an anonymous online questionnaire, collecting the views, perceptions, opinions and experiences of persons aged 18 years or over, who self-identify as lesbian, gay, bisexual or transgender. The topics related to various fundamental rights issues with an emphasis on experienced discrimination, violence and harassment. The survey and all related activities covered the 27 current EU Member States as well as Croatia. FRA designed the questionnaire and finalised it in consultation with its Scientific Committee, relevant stakeholders and civil society organisations, as well as independent academics and national experts with expertise in the area of discrimination on grounds of sexual orientation and
gender identity.*
>
> *The survey asked a range of questions about LGBT people’s experiences including:*
> * *public perceptions and responses to homophobia and/or transphobia;*
> * *discrimination;*
> * *rights awareness;*
> * *safe environment;*
> * *violence and harassment;*
> * *the social context of being an LGBT person;*
> * *personal characteristics, including age and income group.*

*Taken from [EU LGBT survey technical report. Methodology, online survey, questionnaire and sample](https://fra.europa.eu/sites/default/files/eu-lgbt-survey-technical-report_en.pdf)*

## <a id='data'>1.2. Data set overview</a>

The data set consists of 5 .csv files that represent 5 blocks of questions.

The schema of all the tables is identical:

| Variable | Note/Example |
|:-:|:-:|
| `CountryCode` | name of the country |
| `subset` | `Lesbian`, `Gay`, `Bisexual women`, `Bisexual men` or `Transgender` |
| `question_code` | unique code ID for the question |
| `question_label` | full question text |
| `answer` | answer given |
| `percentage` | % |
| `notes` | `[0]`: small sample size; `[1]`: NA due to small sample size; `[2]`: missing value |


* The total number of countries that participated in the survey is 28
* Answer formats differ between questions: they can be binary (`Yes-No`), numerical (`1-10`) or on a scale (`Always-Often-Never`)

In [3]:
# get the list of countries ID (like Germany - DE)
countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_2

# data import
daily_life_df = pd.read_csv('data/LGBT_Survey_DailyLife.csv')
rights_awareness_df = pd.read_csv('data/LGBT_Survey_RightsAwareness.csv')
violence_harassment_df = pd.read_csv('data/LGBT_Survey_ViolenceAndHarassment.csv')
discrimination_df = pd.read_csv('data/LGBT_Survey_Discrimination.csv')
subset_size_df = pd.read_csv('data/LGBT_Survey_SubsetSize.csv')

In [4]:
# data cleaning
def clean_data(df):
    df.rename(columns={'CountryCode': 'CountryName'}, inplace=True)
    codes = [countries.get(country, 'Unknown code') for country in df['CountryName']]
    df['CountryID'] = codes
    df.loc[df['CountryName'] == 'Czech Republic', 'CountryID'] = 'CZ'
    df['CountryFlag'] = df['CountryID'].apply(lambda x: x+flag.flagize(':'+x+':'))
    df.loc[df['notes'] == ' [1] ', 'notes'] = '[1]'
    df.loc[df['notes'] == '[1]', 'percentage'] = np.nan  # np.NaN alias was removed in NumPy 2.0
    df['percentage'] = df['percentage'].astype('float')
    return df


daily_life_df = clean_data(daily_life_df)
rights_awareness_df = clean_data(rights_awareness_df)
violence_harassment_df = clean_data(violence_harassment_df)
discrimination_df = clean_data(discrimination_df)

In [5]:
overview_df = pd.DataFrame({
    'Data set': [], 
    'Total number of questions': [], 
    'Number of records with small sample size': [],
    '% of total records(0)': [],
    'Number of missing values due to the small sample size': [],
    '% of total records(1)': [],
    'Number of missing values': [],
    '% of total records(2)': []
})


def data_overview(df, df_name=''):
    global overview_df
    temp = [[
        df_name, 
        df['question_label'].nunique(), 
        np.sum(df['notes'] == '[0]'),
        round(np.sum(df['notes'] == '[0]') * 100/ len(df), 1),
        np.sum(df['notes'] == '[1]'),
        round(np.sum(df['notes'] == '[1]') * 100/ len(df), 1),
        np.sum(df['notes'] == '[2]'),
        round(np.sum(df['notes'] == '[2]') * 100/ len(df), 1),
    ]]
    temp_df = pd.DataFrame(temp, columns=overview_df.columns)
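    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    overview_df = pd.concat([overview_df, temp_df], ignore_index=True)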
    
    
data_overview(daily_life_df, df_name='Daily Life')
data_overview(rights_awareness_df, df_name='Rights Awareness')
data_overview(violence_harassment_df, df_name='Violence and Harassment')
data_overview(discrimination_df, df_name='Discrimination')
display(overview_df.set_index(['Data set']))

| Data set | Total number of questions | Number of records with small sample size | % of total records(0) | Number of missing values due to the small sample size | % of total records(1) | Number of missing values | % of total records(2) |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Daily Life | 50.0 | 13447.0 | 39.5 | 1849.0 | 5.4 | 0.0 | 0.0 |
| Rights Awareness | 10.0 | 785.0 | 20.8 | 130.0 | 3.4 | 0.0 | 0.0 |
| Violence and Harassment | 47.0 | 25072.0 | 55.3 | 6897.0 | 15.2 | 0.0 | 0.0 |
| Discrimination | 32.0 | 5782.0 | 36.7 | 1198.0 | 7.6 | 0.0 | 0.0 |


# <a id='bi'>2. Choropleth map visualization of responses</a>

This visualization lets you explore the responses to a single question by country. The dashboard was built with **Microsoft Power BI**. The original map visualization can be found [here](https://fra.europa.eu/en/publications-and-resources/data-and-maps/survey-fundamental-rights-lesbian-gay-bisexual-and).

<iframe width="700" height="800" src="https://app.powerbi.com/view?r=eyJrIjoiMzI4MzMzN2QtYTA5NC00MTZkLTllYTAtMWMzOWQxNjlmZjI5IiwidCI6ImMzNWFiZTIwLTI1N2QtNDcxZi04ZDI3LWU3MTI5ZjA5MjJmNSIsImMiOjl9" frameborder="0" allowFullScreen="true"></iframe>

# <a id='score'>3. Country 'suitable' scores</a>

In this section I score each country based on the survey answers to find out which country is "most suitable" for the LGBT community. Each country gets a score in 4 blocks, **Daily Life**, **Discrimination**, **Violence and Harassment** and **Rights Awareness** (I didn't include **Transgender Specific Questions** here since that block covers transgender respondents only), and a **final score**.

## <a id='method'>3.1. Scoring methodology</a>

First of all, the proportions of lesbian, gay, bisexual and transgender respondents are not equal among countries. To 'normalize' for this, I set a weight for each subset:

\begin{align*}
\textrm{Weight}_{\text{subset}} = \frac{\text{# Subset for a Country}}{\text{# Total for a Country}}
\end{align*}

In [6]:
subset_size_df.rename(columns={'Lesbian women': 'Lesbian', 'Gay men':'Gay'}, inplace=True)

for column in subset_size_df.loc[:,"Lesbian":].columns:
    subset_size_df[column + ' weight'] = subset_size_df[column] / subset_size_df['N']

    
subset_size_df = subset_size_df.round(2)
subset_size_df = subset_size_df.merge(daily_life_df[['CountryID', 'CountryFlag', 'CountryName']], how='left')
subset_size_df = subset_size_df.drop_duplicates().reset_index(drop=True)

In [7]:
clmnstkp = ['Lesbian', 'Gay', 'Bisexual women', 'Bisexual men', 'Transgender']

    
subset_size_df.loc[subset_size_df['CountryID'] == 'EU Total', 'CountryName'] = 'EU Total'
subset_size_df.loc[subset_size_df['CountryID'] == 'EU Total', 'CountryFlag'] = 'EU Total'

subset_size_df = subset_size_df[['CountryName', 'CountryFlag', 'N'] + clmnstkp + [x + ' weight' for x in clmnstkp]]

Final `Subset Weight` values look like this:

In [8]:
fig = plot_table(subset_size_df)

fig.write_html("plotly-output/1.html")
# fig.show()

In [9]:
subset_size_df = subset_size_df[['CountryName', 'CountryFlag'] + [x + ' weight' for x in clmnstkp]]

subset_size_df = pd.melt(
    subset_size_df, 
    id_vars=['CountryName', 'CountryFlag'], 
    value_vars=list(subset_size_df.columns[2:]),
    var_name='subset', 
    value_name='subset_weight'
).sort_values(['CountryName'])

subset_size_df['subset'] = subset_size_df['subset'].apply(lambda x: x.replace(' weight', ''))
subset_size_df.head()

| | CountryName | CountryFlag | subset | subset_weight |
|:-:|:-:|:-:|:-:|:-:|
| 1 | Austria | AT🇦🇹 | Lesbian | 0.17 |
| 117 | Austria | AT🇦🇹 | Transgender | 0.07 |
| 30 | Austria | AT🇦🇹 | Gay | 0.61 |
| 59 | Austria | AT🇦🇹 | Bisexual women | 0.06 |
| 88 | Austria | AT🇦🇹 | Bisexual men | 0.09 |


After calculating the `Subset Weight` values, I get a new `Percent` value for each subset's responses by multiplying the original `Percent` value by the `Subset Weight`.

\begin{align*}
\textrm{Percent}_{\textrm{weighted}} = \textrm{Percent} \times \textrm{Weight}_{\textrm{subset}}
\end{align*}
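As a rough illustration of this weighting step, here is a minimal sketch that applies Austria's subset weights from the table above to one answer option. The percentages are made up for the example and the column name `percent_weighted` is my own; only the weights come from the data.

```python
import pandas as pd

# Austria's subset weights, as shown in the table above.
weights = pd.DataFrame({
    "subset": ["Lesbian", "Gay", "Bisexual women", "Bisexual men", "Transgender"],
    "subset_weight": [0.17, 0.61, 0.06, 0.09, 0.07],
})

# Hypothetical share (in %) of each subset that picked one answer option.
answers = pd.DataFrame({
    "subset": ["Lesbian", "Gay", "Bisexual women", "Bisexual men", "Transgender"],
    "percentage": [8.0, 10.0, 5.0, 12.0, 20.0],
})

# Percent (weighted) = Percent * Subset Weight, one row per subset.
merged = answers.merge(weights, on="subset", how="left")
merged["percent_weighted"] = merged["percentage"] * merged["subset_weight"]

print(merged)
```

A large subset (here `Gay`, weight 0.61) contributes far more to the country figure than a small one, even when a smaller subset reports a higher raw percentage.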

Next I add a `Response Weight`, which shows how 'good' the answer is. Let's look at an imaginary example with two questions for `Italy`:

| Country | Question | Answer | Percent (Weighted) |
|:-:|:-:|:-:|:-:|
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Very widespread | 25 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Fairly widespread | 15 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Don't know | 10 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Fairly rare | 30 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Very rare | 20 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| Yes | 30 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| Don't know | 20 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| No | 50 |



The first step is to add a weight to each answer in the range `[-1, 1]`, with `-1` being negative and `1` being positive. Looking at the first question, `In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender?`, the answer `Very rare` is the best possible scenario among all the answer options, while `Very widespread` is the worst. So I assign the weight `-1` to `Very widespread` and `1` to `Very rare`. The weights of the remaining answers are split evenly (`-0.5` for `Fairly widespread` and `0.5` for `Fairly rare`). For example, if there are 6 answer options, the weights look like this: `[-1, -0.66, -0.33, 0.33, 0.66, 1]`. The answer option `Don't know` gets `np.NaN`.

*Note: at first I thought the `Don't know` answer weight should be `0`, but then I changed it to `np.NaN` so that it doesn't affect the total score, since that answer is not really helpful. If you think it should be `0`, I would love to hear your reasons.*
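The even split of response weights described above can be sketched as a small helper. This is only an illustration: the function name `symmetric_weights` is hypothetical and is not used elsewhere in the notebook, and it assumes an even number of ordered answer options (worst first) with `Don't know` already excluded.

```python
def symmetric_weights(n_options):
    """Evenly spaced weights from -1 to 1 that skip 0 (worst answer first)."""
    if n_options < 2 or n_options % 2 != 0:
        raise ValueError("expected an even number of answer options (at least 2)")
    half = n_options // 2
    # Positive half: 1/half, 2/half, ..., 1
    positive = [k / half for k in range(1, half + 1)]
    # Negative half mirrors the positive one: -1, ..., -1/half
    negative = [-w for w in reversed(positive)]
    return negative + positive


print(symmetric_weights(4))
print([round(w, 2) for w in symmetric_weights(6)])
```

For 6 options the exact values are multiples of 1/3, so rounding gives `0.67` where the text above writes the truncated `0.66`.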

Then I compute the `Score` using the following formula:

\begin{align*}
\textrm{Score} = \textrm{Weight}_{\textrm{response}} \times \frac{\textrm{Percent}_{\textrm{weighted}} }{100}
\end{align*}

With this formula the `Score` also lies in the range `[-1, 1]`, with `-1` being negative and `1` being positive. The final `Total Block Score` for a country is simply the average of all its scores.

\begin{align*}
\textrm{Total Block Score} = \textrm{Average(Score)}
\end{align*}

| Country | Question | Answer | Percent (Weighted) | Response Weight | Score |
|:-:|:-:|:-:|:-:|:-:|:-:|
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Very widespread | 25 | -1 | -0.25 | 
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Fairly widespread | 15 | -0.5 | -0.075 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Don't know | 10 | np.NaN | np.NaN |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Fairly rare | 30 | 0.5 | 0.15 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Very rare | 20 | 1 | 0.2 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| Yes | 30 | -1 | -0.3 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| Don't know | 20 | np.NaN | np.NaN |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay? | No | 50 | 1 | 0.5 |


<br>
So the `Total Block Score` for this block is going to be $\frac{-0.25 - 0.075 + 0.15 + 0.2 - 0.3 + 0.5}{6} = 0.0375$ (the two `Don't know` rows are `np.NaN` and are left out of the average). After computing the scores for the 4 blocks, the `Total Score` is going to be the average of the four `Total Block Scores`.
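To check the arithmetic, the imaginary Italy example can be recomputed from its `Percent (Weighted)` and `Response Weight` columns. This is a minimal sketch with my own column names; it relies on the fact that pandas `mean()` skips `NaN` by default, which is how the `Don't know` rows drop out of the average.

```python
import numpy as np
import pandas as pd

# The eight rows of the imaginary Italy example.
example = pd.DataFrame({
    "answer": [
        "Very widespread", "Fairly widespread", "Don't know",
        "Fairly rare", "Very rare", "Yes", "Don't know", "No",
    ],
    "percent_weighted": [25, 15, 10, 30, 20, 30, 20, 50],
    "weight": [-1, -0.5, np.nan, 0.5, 1, -1, np.nan, 1],
})

# Score = Response Weight * Percent (Weighted) / 100
example["score"] = example["weight"] * example["percent_weighted"] / 100

# mean() ignores NaN, so only the six scored rows are averaged.
total_block_score = example["score"].mean()

print(example)
print(round(total_block_score, 4))
```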

## <a id='dl'>3.2. Daily Life</a>

Let's start with the `Daily Life` question block, where respondents answered questions about day-to-day life as a lesbian, gay, bisexual or transgender person.

In [10]:
def set_WidespreadRare_weight(df, questions_list, rare_negative=False):
    if rare_negative:
        weight = -1
    else:
        weight = 1
    for quesID in questions_list:
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Very widespread'), 'weight'] = -weight
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Fairly widespread'), 'weight'] = -weight/2
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Fairly rare'), 'weight'] = weight/2
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Very rare'), 'weight'] = weight
        

def set_YesNo_weight(df, questions_list, yes_negative=False):
    if yes_negative:
        weight = -1
    else:
        weight = 1
    for quesID in questions_list:
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Yes'), 'weight'] = weight
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'No'), 'weight'] = -weight
        
        
def set_AlwaysNever_weight(df, questions_list, alsways_negative=False):
    if alsways_negative:
        weight = -1
    else:
        weight = 1
    for quesID in questions_list:
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Always'), 'weight'] = weight
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Often'), 'weight'] = weight/2
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Rarely'), 'weight'] = -weight/2
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Never'), 'weight'] = -weight

In [11]:
daily_life_df['weight'] = np.nan
daily_life_df.loc[daily_life_df['answer'] == 'Don`t know', 'weight'] = np.nan

set_WidespreadRare_weight(
    df=daily_life_df,
    questions_list=[
        'b1_a', 'b1_b', 'b1_c', 'b1_d', 'c1a_a', 'c1a_b', 'c1a_c', 'c1a_d'
    ],
    rare_negative=False
)
set_WidespreadRare_weight(
    df=daily_life_df,
    questions_list=[
        'b1_e', 'b1_g', 'b1_h', 'b1_i'
    ],
    rare_negative=True
)

daily_life_df.loc[(daily_life_df['question_code'] == 'g4_a') & (daily_life_df['answer'] == 'Never happened in the last sixth months'), 'weight'] = 1
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_a') & (daily_life_df['answer'] == 'Happened only once in the last six months'), 'weight'] = 0.5
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_a') & (daily_life_df['answer'] == '2-5 times in the last six months'), 'weight'] = -0.5
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_a') & (daily_life_df['answer'] == '6 times or more in the last six months'), 'weight'] = -1

daily_life_df.loc[(daily_life_df['question_code'] == 'g4_b') & (daily_life_df['answer'] == 'Never happened in the last sixth months'), 'weight'] = 1
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_b') & (daily_life_df['answer'] == 'Happened only once in the last six months'), 'weight'] = 0.5
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_b') & (daily_life_df['answer'] == '2-5 times in the last six months'), 'weight'] = -0.5
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_b') & (daily_life_df['answer'] == '6 times or more in the last six months'), 'weight'] = -1

daily_life_df.loc[(daily_life_df['question_code'] == 'g4_c') & (daily_life_df['answer'] == 'Never happened in the last sixth months'), 'weight'] = 1
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_c') & (daily_life_df['answer'] == 'Happened only once in the last six months'), 'weight'] = 0.5
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_c') & (daily_life_df['answer'] == '2-5 times in the last six months'), 'weight'] = -0.5
daily_life_df.loc[(daily_life_df['question_code'] == 'g4_c') & (daily_life_df['answer'] == '6 times or more in the last six months'), 'weight'] = -1

daily_life_df.loc[(daily_life_df['question_code'] == 'h15') & (daily_life_df['answer'] == 'Yes'), 'weight'] = -1
daily_life_df.loc[(daily_life_df['question_code'] == 'h15') & (daily_life_df['answer'] == 'No'), 'weight'] = 1
daily_life_df.loc[(daily_life_df['question_code'] == 'h15') & (daily_life_df['answer'] == 'I did not need or use any benefits or services'), 'weight'] = np.nan

In [12]:
daily_life_df.head()

Unnamed: 0,CountryName,subset,question_code,question_label,answer,percentage,notes,CountryID,CountryFlag,weight
0,Austria,Lesbian,b1_a,"In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?",Very widespread,8.0,,AT,AT🇦🇹,-1.0
1,Austria,Lesbian,b1_a,"In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?",Fairly widespread,34.0,,AT,AT🇦🇹,-0.5
2,Austria,Lesbian,b1_a,"In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?",Fairly rare,45.0,,AT,AT🇦🇹,0.5
3,Austria,Lesbian,b1_a,"In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?",Very rare,9.0,,AT,AT🇦🇹,1.0
4,Austria,Lesbian,b1_a,"In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live?",Don`t know,4.0,[0],AT,AT🇦🇹,


In [13]:
fig, dl_scores = calculate_and_plot_total_score(daily_life_df, title='Daily Life')
# fig.show()
fig.write_html("plotly-output/2.html")

* First place goes to the **Netherlands**🇳🇱 (which means that the responses about daily life for this country were more positive compared with the other countries).
* Last place goes to **Cyprus**🇨🇾.

## <a id='ra'>3.3. Rights Awareness</a>

In [14]:
rights_awareness_df['weight'] = np.nan
rights_awareness_df.loc[rights_awareness_df['answer'] == 'Don`t know', 'weight'] = np.nan
rights_awareness_df.loc[rights_awareness_df['answer'] == 'No', 'weight'] = -1
rights_awareness_df.loc[rights_awareness_df['answer'] == 'Yes', 'weight'] = 1

In [15]:
fig, ra_scores = calculate_and_plot_total_score(rights_awareness_df, title='Rights Awareness')
# fig.show()
fig.write_html("plotly-output/3.html")

* First place goes to **Sweden**🇸🇪 (which means that people from the LGBT community are much more aware of their rights in that country compared with the other countries).
* Last place goes to **Greece**🇬🇷.

## <a id='disc'>3.4. Discrimination</a>

In [16]:
discrimination_df['weight'] = np.nan

discrimination_df.loc[discrimination_df['answer'] == 'Don`t know', 'weight'] = np.nan
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'None of the above'), 'weight'] = 0
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'I have never accessed healthcare services'), 'weight'] = 0
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'Difficulty in gaining access to healthcare'), 'weight'] = -1
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'Having to change general practitioners or other specialists due to their negative reaction'), 'weight'] = -1
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'Receiving unequal treatment when dealing with medical staff'), 'weight'] = -1
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'Foregoing treatment for fear of discrimination or intolerant reactions'), 'weight'] = -1
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'Specific needs ignored (not taken into account)'), 'weight'] = -1
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'Inappropriate curiosity'), 'weight'] = -1
discrimination_df.loc[(discrimination_df['question_code'] == 'c10') & (discrimination_df['answer'] == 'Pressure or being forced to undergo any medical or psychological test'), 'weight'] = -1

set_YesNo_weight(
    df=discrimination_df,
    questions_list=[
        'c2a_a', 'c2a_b', 'c2a_c', 'c2a_d', 'c2_b', 'c2_c', 'c4_a', 'c4_b', 
        'c4_c', 'c4_d', 'c4_e', 'c4_f', 'c4_g', 'c4_h', 'c4_i', 'c4_j', 'c4_k', 'discrim1yr'
    ],
    yes_negative=True
)

set_AlwaysNever_weight(
    df=discrimination_df,
    questions_list=[
        'c8a_b', 'c8a_c', 'c8a_d', 'c8a_e', 'c8a_f', 'c9_b', 'c9_c', 'c9_d', 'c9_e'
    ],
    alsways_negative=True
)

set_AlwaysNever_weight(
    df=discrimination_df,
    questions_list=[
        'c8a_a', 'c9_a'
    ],
    alsways_negative=False
)

In [17]:
fig, discr_scores = calculate_and_plot_total_score(discrimination_df, title='Discrimination')
# fig.show()
fig.write_html("plotly-output/4.html")

* First place goes to **Malta**🇲🇹 (which means that people from the LGBT community feel less discriminated against in that country compared with the other countries).
* Last place goes to **Romania**🇷🇴.

## <a id='vah'>3.5. Violence and Harassment</a>

In [18]:
violence_harassment_df['weight'] = np.nan

violence_harassment_df.loc[violence_harassment_df['answer'] == 'Don`t know', 'weight'] = np.nan
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'e1') & (violence_harassment_df['answer'] == 'I do not have a same-sex partner'), 'weight'] = np.nan
set_YesNo_weight(
    df=violence_harassment_df,
    questions_list=[
        'e1', 'e2', 'f1_a', 'f1_b', 'fa1_5', 'fa2_5', 'fb1_5', 'fb2_5'
    ],
    yes_negative=True
)
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fa1_3') & (violence_harassment_df['answer'] == 'More than ten times'), 'weight'] = -1
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fa1_3') & (violence_harassment_df['answer'] == 'Six to ten times'), 'weight'] = -0.86
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fa1_3') & (violence_harassment_df['answer'] == 'Five times'), 'weight'] = -0.71
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fa1_3') & (violence_harassment_df['answer'] == 'Four times'), 'weight'] = -0.57
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fa1_3') & (violence_harassment_df['answer'] == 'Three times'), 'weight'] = -0.43
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fa1_3') & (violence_harassment_df['answer'] == 'Twice'), 'weight'] = -0.29
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fa1_3') & (violence_harassment_df['answer'] == 'Once'), 'weight'] = -0.14

violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fb1_3') & (violence_harassment_df['answer'] == 'More than ten times'), 'weight'] = -1
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fb1_3') & (violence_harassment_df['answer'] == 'Six to ten times'), 'weight'] = -0.86
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fb1_3') & (violence_harassment_df['answer'] == 'Five times'), 'weight'] = -0.71
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fb1_3') & (violence_harassment_df['answer'] == 'Four times'), 'weight'] = -0.57
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fb1_3') & (violence_harassment_df['answer'] == 'Three times'), 'weight'] = -0.43
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fb1_3') & (violence_harassment_df['answer'] == 'Twice'), 'weight'] = -0.29
violence_harassment_df.loc[(violence_harassment_df['question_code'] == 'fb1_3') & (violence_harassment_df['answer'] == 'Once'), 'weight'] = -0.14

In [19]:
fig, violhar_scores = calculate_and_plot_total_score(violence_harassment_df, title='Violence and Harassment')
# fig.show()
fig.write_html("plotly-output/5.html")

* First place goes to **Finland**🇫🇮 (which means that people from the LGBT community are subjected to harassment or violence less often in that country compared with the other countries).
* Last place goes to **Estonia**🇪🇪.

## <a id='overall'>3.6. Overall rank</a>

By averaging the four block ranks and then ranking those averages, we get the final `Total Rank`.

In [20]:
overall_rang = pd.merge(
    dl_scores[['CountryFlag', 'CountryName', 'Daily Life Rank']],
    violhar_scores[['CountryFlag', 'CountryName', 'Violence and Harassment Rank']]).merge(
    discr_scores[['CountryFlag', 'CountryName', 'Discrimination Rank']]).merge(
    ra_scores[['CountryFlag', 'CountryName', 'Rights Awareness Rank']])

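# numeric_only=True skips the text columns (CountryFlag, CountryName), which pandas 2.x no longer drops silently
overall_rang['Average Rank'] = overall_rang.mean(axis=1, numeric_only=True)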
overall_rang = overall_rang.sort_values(['Average Rank'], ascending=True).reset_index(drop=True)

overall_rang['Total Rank'] = overall_rang['Average Rank'].rank(method='dense', ascending=True).astype('int')

fig = plot_table(overall_rang)
# fig.show()
fig.write_html("plotly-output/6.html")

So!

* The absolute winners are **Denmark**🇩🇰, **Netherlands**🇳🇱, **Sweden**🇸🇪.
* The absolute losers are **Romania**🇷🇴, **Bulgaria**🇧🇬, **Cyprus**🇨🇾.

Here is something to think about when you are considering a destination for travelling/relocation.

# <a id='lbgt'>4. What the LGBT community says</a>

Now that I have the `Total Rank` for each country, I want to look at some particular responses to find out how the LGBT community feels about living in EU countries.

## <a id='satisfied'>4.1.  Do people feel satisfied in EU countries?</a>

The Daily Life block included the question "**All things considered, how satisfied would you say you are with your life these days?**", answered on a scale from 0 to 10 (10 being the most satisfied). Using the same methodology, I am going to compute a score for this single question and compare it with the `Total Rank` from the previous section.
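
To make the scoring of a 0 to 10 question concrete, here is a minimal sketch on made-up numbers. It mirrors the steps of the cell that follows (scale the answer to 0..1, turn the percentage into a share, multiply, then average per country). The countries, percentages and the `subset_weight` of 1.0 are assumptions chosen only for illustration.

```python
import pandas as pd

# Made-up responses for two hypothetical countries (not survey data).
toy = pd.DataFrame({
    "CountryName": ["A", "A", "B", "B"],
    "answer": ["8", "4", "6", "2"],           # 0..10 scale, stored as text
    "percentage": [75.0, 25.0, 50.0, 50.0],   # share of respondents, in %
    "subset_weight": [1.0, 1.0, 1.0, 1.0],    # assumed equal weights
})

toy["answer_scaled"] = toy["answer"].astype(int) / 10             # 0..1
toy["share"] = toy["percentage"] * toy["subset_weight"] / 100     # 0..1
toy["contribution"] = toy["answer_scaled"] * toy["share"]

# Per-country score: mean contribution over that country's rows.
score = toy.groupby("CountryName")["contribution"].mean()
print(score.round(4))
```

Country `A`, where most respondents chose a high value, ends up with the higher score.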

In [21]:
subset_df = daily_life_df[daily_life_df['question_label'] == 'All things considered, how satisfied would you say you are with your life these days? *'].copy()
subset_df = subset_df.merge(subset_size_df, how='left', on=['CountryName', 'CountryFlag', 'subset'])
subset_df['answer'] = subset_df['answer'].astype('int') / 10
subset_df['percentage'] = subset_df['percentage'] * subset_df['subset_weight'] / 100
subset_df['Daily Life Score'] = subset_df['answer'] * subset_df['percentage'] 
subset_df = subset_df.groupby(['CountryName'], as_index=False)['Daily Life Score'].mean()
subset_df.columns = ['CountryName', 'Score']
subset_df = subset_df.merge(dl_scores[['CountryName', 'CountryFlag']])
subset_df['Score'] = subset_df['Score'].apply(lambda x: round(x,4))
subset_df = subset_df.merge(overall_rang[['CountryName', 'Total Rank']])
subset_df = subset_df[['CountryFlag', 'CountryName', 'Score', 'Total Rank']].sort_values(['Score'], ascending=False)

trace0 = go.Choropleth(
    colorscale = 'Greens', #'YlOrRd',
    autocolorscale = False,
    reversescale=True,
    locations = subset_df['CountryName'],
    text=subset_df['CountryFlag'],
    z = subset_df['Score'],
    locationmode = 'country names',
    colorbar = go.choropleth.ColorBar(
        title = '<b>Score</b>')
)

layout = go.Layout(
    title='<b>How satisfied would you say you are with your life these days?</b>',
    geo = go.layout.Geo(
        scope='europe',
        showlakes=False),
)

fig = go.Figure(data=[trace0], layout=layout)
# fig.show()
fig.write_html("plotly-output/7.html")  # 5.html already holds the Violence and Harassment plot

In [22]:
subset_df['Satisfaction Rank'] = subset_df['Score'].rank(method='dense', ascending=False).astype('int')
subset_df['Rank Diff'] = subset_df['Satisfaction Rank'] - subset_df['Total Rank'] 
fig = plot_table(subset_df)
# fig.show()
fig.write_html("plotly-output/8.html")

\begin{align*}
\textrm{Rank Diff} = \textrm{Satisfaction Rank} - \textrm{Total Rank}
\end{align*}

Thus, a negative value in the `Rank Diff` column means that the LGBT community feels more satisfied in that country than I would guess from its `Total Rank`; a positive value means the opposite.
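
The sign convention can be checked on a small made-up table. This is only a sketch: the three countries and their values are hypothetical, and the column names simply follow the ones used in the cells above.

```python
import pandas as pd

# Made-up satisfaction scores and total ranks (not survey data).
toy = pd.DataFrame({
    "CountryName": ["A", "B", "C"],
    "Score": [0.30, 0.45, 0.40],
    "Total Rank": [1, 3, 2],
})

# A higher score means a better (numerically smaller) satisfaction rank.
toy["Satisfaction Rank"] = toy["Score"].rank(method="dense", ascending=False).astype(int)
toy["Rank Diff"] = toy["Satisfaction Rank"] - toy["Total Rank"]

print(toy)
```

Country `B` has the best satisfaction score but only the third `Total Rank`, so its `Rank Diff` is negative (-2): people there report being more satisfied than the overall rank suggests. Country `A` shows the opposite case (+2).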

## <a id='open'>4.2. Are people being open about their orientation?</a>

The next question, "**4 levels of being open about LGBT background**", from the Daily Life block shows how open the LGBT community is in the country they live in. The possible answers are Never Open, Rarely Open, Fairly Open and Very Open.

In [23]:
openess_df = daily_life_df[daily_life_df['question_label'] == '4 levels of being open about LGBT background *']
openess_df = openess_df.merge(dl_scores[['CountryName', 'CountryFlag']])
openess_df = openess_df.merge(overall_rang[['CountryName', 'Total Rank']])
openess_df = openess_df.sort_values(['Total Rank'], ascending=False)


colors = cl.scales['5']['seq']['Greens']
colors = colors[1:]
answers = ['Never Open', 'Rarely Open', 'Fairly open', 'Very open']
data = []
buttons = []

for (i,subset) in enumerate(openess_df['subset'].unique()):
    if i == 0:
        visible = True
    else:
        visible = False
    subset_df = openess_df[openess_df['subset'] == subset]
    subset_df = subset_df.sort_values(['Total Rank'], ascending=False)
    sum_df = subset_df.groupby(['CountryName'], as_index=False)[['percentage']].sum()
    sum_df.rename(columns={'percentage': 'sum_perc'}, inplace=True)
    subset_df = subset_df.merge(sum_df, how='left')
    for (j,ans_opt) in enumerate(answers):
        trace = go.Bar(
            name=ans_opt, 
            y=subset_df['CountryName'][subset_df['answer'] == ans_opt] +' ('+ subset_df['CountryFlag'][subset_df['answer'] == ans_opt] +')', 
            x=subset_df['percentage'][subset_df['answer'] == ans_opt]/subset_df['sum_perc'][subset_df['answer'] == ans_opt],
            orientation='h',
            legendgroup=ans_opt,
            marker=dict(color=colors[j]),
            visible=visible
        )
        data.append(trace)
        
    visible_list = [False] * (len(subset_df['answer'].unique()) * len(openess_df['subset'].unique()) + 4)
    visible_list[i*4:i*4+4] = [True] * 4
    buttons_temp = dict(
        label = subset,
        method = 'update',
        args = [
            {'visible': visible_list}
        ]
    )
    buttons.append(buttons_temp)

temp_df = openess_df.groupby(['CountryName', 'CountryFlag', 'answer', 'Total Rank'], as_index=False)[['percentage']].mean()
temp_df = temp_df.sort_values(['Total Rank'], ascending=False)
sum_df = openess_df.groupby(['CountryName','answer'], as_index=False)[['percentage']].mean().groupby(['CountryName'], as_index=False)[['percentage']].sum()
sum_df.rename(columns={'percentage': 'sum_perc'}, inplace=True)
temp_df = temp_df.merge(sum_df, how='left')
for (j,ans_opt) in enumerate(answers):
    trace = go.Bar(
        name=ans_opt, 
        y=temp_df['CountryName'][temp_df['answer'] == ans_opt] +' ('+ temp_df['CountryFlag'][temp_df['answer'] == ans_opt]+')', 
        x=temp_df['percentage'][temp_df['answer'] == ans_opt]/temp_df['sum_perc'][temp_df['answer'] == ans_opt],
        orientation='h',
        legendgroup=ans_opt,
        marker=dict(color=colors[j]),
        visible=visible
    )
    data.append(trace)

visible_list = [False] * (len(subset_df['answer'].unique()) * len(openess_df['subset'].unique()) + 4)
visible_list[-4:] = [True] * 4
buttons_temp = dict(
    label = 'All',
    method = 'update',
    args = [
        {'visible': visible_list}
    ]
)
buttons.append(buttons_temp)

updatemenus = list([
    dict(type="buttons",
         active=0,
         buttons=buttons,
         direction = "left",
         x=0.1,
         xanchor="left",
         y=1.1,
         yanchor="top"
        )
])

layout = go.Layout(
    updatemenus=updatemenus,
    annotations=[     
        go.layout.Annotation(
            text="<b>Subset:</b>", 
            showarrow=False,
            x=0, y=1.08, 
            yref="paper", align="left"
        )
    ],
    margin=dict(l=200, t=200),
    height=700,
    barmode='stack',
    title='<b>4 levels of being open about LGBT background</b><br><i>Taken from EU LGBT survey results (2012)</i>',
    xaxis=dict(title='<b>Ratio</b>')
    )
  
# layout = json.dumps(layout, cls=PlotlyJSONEncoder)
fig = go.Figure(data=data, layout=layout)
# fig.show()
fig.write_html("plotly-output/9.html")

The countries in the plot are sorted by `Total Rank`, with the best-ranked countries at the top and the worst-ranked at the bottom. Notice how the 'openness ratio' is correlated with a country's `Total Rank`: the better the rank, the higher the ratio of 'open' people.
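
When reading such a correlation, keep in mind that rank 1 is the best rank, so "better rank, more open people" shows up as a negative Pearson coefficient. Here is a minimal sketch with made-up numbers (the five ranks and percentages are assumptions, not survey values):

```python
import numpy as np

# Made-up data: Total Rank (1 = best) and share of 'Very open' answers, in %.
total_rank = np.array([1, 2, 3, 4, 5])
very_open_pct = np.array([30.0, 26.0, 20.0, 12.0, 8.0])

# np.corrcoef returns a 2x2 correlation matrix; element [0, 1] is Pearson's r.
r = np.corrcoef(total_rank, very_open_pct)[0, 1]
print(round(r, 2))
```

The coefficient is strongly negative here because the percentage falls as the rank number grows. For an answer such as `Never Open` one would expect the sign to flip.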

In [24]:
data = []
buttons = []
for (j,ans_opt) in enumerate(answers):
    if j == 0:
        visible = True
    else:
        visible = False
    x=temp_df['Total Rank'][temp_df['answer'] == ans_opt]
    y=temp_df['percentage'][temp_df['answer'] == ans_opt]
    corrCoef = np.corrcoef(temp_df['Total Rank'][temp_df['answer'] == ans_opt], temp_df['percentage'][temp_df['answer'] == ans_opt])[0,1]
    trace0 = go.Scatter(
        name=ans_opt,
        x=x, 
        y=y,
        mode="markers",
        marker=dict(color=colors[j]),
        visible=visible
    )
    
    trace1 = go.Scatter(
        x=[x.max()*0.9],
        y=[y.max()*0.8],
        mode='text',
        text='Correlation: {}'.format(round(corrCoef,2)),
        textfont=dict(
          family='sans serif',
          size=16,
          color='#FF4136'
        ),
        name=ans_opt,
        visible=visible
  )
    data.append(trace0)
    data.append(trace1)
    
    visible_list = [False] * len(answers) * 2
    visible_list[j*2:j*2+2] = [True] * 2
    buttons_temp = dict(
        label = ans_opt,
        method = 'update',
        args = [
            {'visible': visible_list}
        ]
    )
    buttons.append(buttons_temp)

updatemenus = list([
    dict(
        type="buttons",
        active=0,
        buttons=buttons,
        direction = "left",
        x=0.3,
        xanchor="left",
        y=1.1,
        yanchor="top"
    )
])

layout = go.Layout(
    showlegend=False,
    updatemenus=updatemenus,
    annotations=[     
        go.layout.Annotation(
            text="<b>Answer Option:</b>", 
            showarrow=False,
            x=1, y=1.08, 
            yref="paper", align="left"
        )
    ],
    title='<b>Correlation between "Openess" Ratio and Total Rank</b>',
    xaxis=dict(title='<b>Country Total Rank</b>'),
    yaxis=dict(title='<b>Percent of people answered</b>')
    )
  
fig = go.Figure(data=data, layout=layout)
# layout = json.dumps(layout, cls=PlotlyJSONEncoder)
# fig.show()
fig.write_html("plotly-output/10.html")

In [25]:
total_open = openess_df.groupby(['answer'], as_index=False)[['percentage']].mean()
total_open['subset'] = 'Total'
subset_open = openess_df.groupby(['subset', 'answer'], as_index=False)[['percentage']].mean()
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
subset_open = pd.concat([subset_open, total_open], sort=True)
subset_open['percentage'] = subset_open['percentage'].apply(lambda x: round(x,1))

data = []
for (i,ans_opt) in enumerate(subset_open['answer'].unique()):
    trace = go.Bar(
        name=ans_opt, 
        y=subset_open['subset'][subset_open['answer'] == ans_opt], 
        x=subset_open['percentage'][subset_open['answer'] == ans_opt]/100,
        orientation='h',
        legendgroup=ans_opt,
        marker=dict(color=colors[i])
    )
    data.append(trace)

layout = go.Layout(
    margin=dict(l=100),
    height=400,
    barmode='stack',
    title='<b>4 levels of being open about LGBT background</b><br><i>Total for all countries</i>',
#     xaxis=dict(title='Ratio'),
    legend_orientation="h"
)

fig = go.Figure(data=data, layout=layout)
# layout = json.dumps(layout, cls=PlotlyJSONEncoder)
# fig.show()
fig.write_html("plotly-output/11.html")

* Gay men have the highest `Very Open` rate (23%), while bisexual men have the highest `Never Open` rate (75%).
* In total, about 27% of people from the LGBT community are open about their orientation (`Very Open` + `Fairly Open`), especially in the Netherlands🇳🇱 (15%).

## <a id='comf'>4.3. What would allow people to live more comfortably?</a>

There was a series of questions, "**What would allow you to be more comfortable living as a LGB person?**", with 8 different options, which lets us explore what the LGBT community feels is missing in their country's current situation.

In [26]:
question_codes = ['b2_a', 'b2_f', 'b2_b', 'b2_d', 'b2_c', 'b2_h', 'b2_g', 'b2_e']
comfort_df = daily_life_df[daily_life_df['question_code'].isin(question_codes)].reset_index(drop=True)
comfort_df['question_label'] = comfort_df['question_label'].apply(lambda x: x.split('? ')[1])
comfort_df = comfort_df.groupby(['CountryName', 'CountryFlag', 'question_code', 'question_label', 'answer'], as_index=False)[['percentage']].mean()
comfort_df['question_label'] = comfort_df['question_label'].apply(lambda x: x.replace('lesbian, gay and bisexual', 'LGB'))
comfort_df['answer'] = comfort_df['answer'].apply(lambda x: x.replace('lesbian, gay and bisexual', 'LGB'))

In [27]:
comfort_df = comfort_df.merge(overall_rang[['CountryName', 'Total Rank']])
comfort_df = comfort_df.sort_values(['Total Rank'], ascending=False)

answers = ['Don`t know', 'Strongly disagree', 'Disagree', 'Current situation is fine', 'Agree', 'Strongly agree']
colors = cl.scales[str(len(answers))]['seq']['Greens']
data = []
buttons = []

for (i,question) in enumerate(comfort_df['question_label'].unique()):
    if i == 0:
        visible = True
    else:
        visible = False
    subset_df = comfort_df[comfort_df['question_label'] == question]
    subset_df = subset_df.sort_values(['Total Rank'], ascending=False)
    sum_df = subset_df.groupby(['CountryName'], as_index=False)[['percentage']].sum()
    sum_df.rename(columns={'percentage': 'sum_perc'}, inplace=True)
    subset_df = subset_df.merge(sum_df, how='left')
    for (j,ans_opt) in enumerate(answers):
        trace = go.Bar(
            name=ans_opt, 
            y=subset_df['CountryName'][subset_df['answer'] == ans_opt] +' ('+ subset_df['CountryFlag'][subset_df['answer'] == ans_opt] +')', 
            x=subset_df['percentage'][subset_df['answer'] == ans_opt]/subset_df['sum_perc'][subset_df['answer'] == ans_opt],
            orientation='h',
            legendgroup=ans_opt,
            marker=dict(color=colors[j]),
            visible=visible
        )
        data.append(trace)
        
    visible_list = [False] * (len(answers) * len(comfort_df['question_label'].unique()))
    visible_list[i*6:i*6+6] = [True] * 6
    buttons_temp = dict(
        label = question,
        method = 'update',
        args = [
            {'visible': visible_list}
        ]
    )
    buttons.append(buttons_temp)

updatemenus = list([
    dict(type="dropdown",
         active=0,
         showactive=True,
         buttons=buttons,
         direction = "down",
         x=0.0,
         xanchor="left",
         y=1.1,
         yanchor="top"
        )
])

layout = go.Layout(
    updatemenus=updatemenus,
    annotations=[     
        go.layout.Annotation(
            text="Question:", 
            showarrow=False,
            x=0.01, y=1.08, 
            yref="paper", align="left"
        )
    ],
    margin=dict(l=200, t=120),
    height=700,
    barmode='stack',
    title='<b>What would allow you to be more comfortable living as a LGB person?</b>',
    xaxis=dict(title='<b>Ratio</b>')
    )
  
fig = go.Figure(data=data, layout=layout)
# layout = json.dumps(layout, cls=PlotlyJSONEncoder)
# fig.show()
fig.write_html("plotly-output/12.html")

In [28]:
comfort_total = comfort_df.groupby(['question_label', 'answer', 'question_code'], as_index=False)[['percentage']].mean()
comfort_total = comfort_total.sort_values(['question_code'], ascending=False)
comfort_total['percentage'] = comfort_total['percentage'].apply(lambda x: round(x,1))

data = []
for (i,ans_opt) in enumerate(answers):
    trace = go.Bar(
        name=ans_opt, 
        y=comfort_total['question_code'][comfort_total['answer'] == ans_opt], 
        x=comfort_total['percentage'][comfort_total['answer'] == ans_opt]/100,
        orientation='h',
        legendgroup=ans_opt,
        marker=dict(color=colors[i])
    )
    data.append(trace)

layout = go.Layout(
    height=400,
#     margin=dict(l=100),
    barmode='stack',
    title='<b>What would allow you to be more comfortable living as a LGB?</b><br><i>Total for all countries</i>',
#     xaxis=dict(title='Ratio'),
    yaxis=dict(
        title='<b>Question Code</b>',
        automargin=True),
    legend_orientation="h"
)

fig = go.Figure(data=data, layout=layout)
# layout = json.dumps(layout, cls=PlotlyJSONEncoder)
# fig.show()
fig.write_html("plotly-output/13.html")

fig = plot_table(comfort_total[['question_code', 'question_label']].drop_duplicates().sort_values(['question_code']))
# fig.show()
fig.write_html("plotly-output/14.html")

* A high share of people (88%) agreed that **Measures implemented at school to respect LGB people** would improve the situation (especially in Italy🇮🇹 with 78%).
* 16% of people feel satisfied with **The possibility to marry and/or register a partnership** (the top countries are the Netherlands🇳🇱, Belgium🇧🇪 and Portugal🇵🇹).
* 9% of people don't think that **The possibility to foster / adopt children** would change a lot.

# <a id='end'>5. Conclusions</a>

To sum up, I ranked the countries by how well they suit the LGBT community, showed in which countries people are more open about their orientation, and looked at what people think would make their lives better. This is only a small share of the insights that could be extracted from this survey, and many more questions can be answered. You can also check the [official report](https://fra.europa.eu/en/publication/2014/eu-lgbt-survey-european-union-lesbian-gay-bisexual-and-transgender-survey-main) with the survey analysis results.