**<center><font size=5>LGBT Survey Analysis</font></center>**

<center><img src="https://i.ibb.co/qFF3K6f/equal-2495950-1920.jpg" alt="equal-2495950-1920" border="0" width="700"></center>

*<center>Image by <a href="https://pixabay.com/users/Wokandapix-614097/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=2495950">Wokandapix</a> from <a href="https://pixabay.com/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=2495950">Pixabay</a></center>*

***
**author**: Ruslan Klymentiev

**date**: 20th July, 2019

**Table of Contents**
- <a href='#intro'>1. Project overview and objectives</a> 
    - <a href='#survey'>1.1. The aim of the survey</a>
    - <a href='#data'>1.2. Data set overview</a>
- <a href='#bi'>2. Choropleth map visualization of responses</a>
- <a href='#score'>3. Country 'suitability' scores</a>
    - <a href='#method'>3.1. Scoring methodology</a>
    - <a href='#dl'>3.2. Daily Life</a>
    - <a href='#ra'>3.3. Rights Awareness</a>
    - <a href='#disc'>3.4. Discrimination</a>
    - <a href='#vah'>3.5. Violence and Harassment</a>
    - <a href='#overall'>3.6. Overall rank</a>
- <a href='#lbgt'>4. What the LGBT community says</a>
    - <a href='#satisfied'>4.1. Do people feel satisfied in EU countries?</a>
    - <a href='#open'>4.2. Are people being open about their orientation?</a>
    - <a href='#comf'>4.3. What would allow them to live more comfortably?</a>
- <a href='#end'>5. Conclusions</a>

In [1]:
!pip install emoji-country-flag

import numpy as np 
import pandas as pd 
import os
import flag
import pycountry
import json

from plotly.utils import PlotlyJSONEncoder
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
from plotly import tools
import colorlover as cl

from IPython.display import clear_output, display
clear_output()

pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_colwidth', 1000)
init_notebook_mode(connected=True)
RANDOM_SEED = 123

In [2]:
def plot_table(df):
    values = []
    for col_name in df.columns:
        values.append(df[col_name])
    trace0 = go.Table(
      header = dict(
        values = ['<b>'+x.upper()+'</b>' for x in df.columns],
        line = dict(color = 'black'),
        fill = dict(color = 'yellow'),
        align = ['center'],
        font = dict(color = 'black', size = 9)
      ),
      cells = dict(
        values = values,
        align = 'center',
        font = dict(color = 'black', size = 11)
        ))

    data = [trace0]
    iplot(data)
    
    
def calculate_and_plot_total_score(df, title):
    df = df[df['CountryName'] != 'Average']
    df = df.merge(SubsetWeightsDF, how='left', on=['CountryName', 'CountryFlag', 'subset'])
    df['perc_ratio'] = df['percentage']*df['SubsetWeight'] / 100
    # calculate the score
    df[title+' Score'] = df['weight'] * df['perc_ratio']
    # get the average
    ScoreDF = df.groupby(['CountryName', 'CountryFlag'], as_index=False)[[title+' Score']].mean()
    ScoreDF = ScoreDF[['CountryFlag', 'CountryName', title+' Score']].sort_values([title+' Score'], ascending=False).reset_index(drop=True)
    ScoreDF[title+' Rank'] = ScoreDF[title+' Score'].rank(method='dense', ascending=False).astype('int')
    ScoreDF[title+' Score'] = ScoreDF[title+' Score'].apply(lambda x: round(x,4))
    
#     # plot the map
#     trace0 = go.Choropleth(
#         colorscale = 'Greens', #'YlOrRd',
#         autocolorscale = False,
#         reversescale=True,
#         locations = ScoreDF['CountryName'],
#         text=ScoreDF['CountryFlag'],
#         z = ScoreDF[title+' Score'],
#         locationmode = 'country names',
#         colorbar = go.choropleth.ColorBar(
#             title = 'Score')
#     )

#     layout = go.Layout(
#         title='Total Score of {} by Country'.format(title),
#         geo = go.layout.Geo(
#             scope='europe',
#             showlakes=False),
#     )

#     fig = go.Figure(data=[trace0], layout=layout)
#     iplot(fig)    
    
    # plot the table
    plot_table(ScoreDF[[title+' Rank', 'CountryFlag', 'CountryName', title+' Score']])
    return ScoreDF


def show_InEx_questions(df, included=True):
    temp_df = df.groupby(['question_label', 'question_code'], as_index=False)[['weight']].max()
    if included:
        display(pd.DataFrame(
            temp_df[['question_label', 'question_code']][~temp_df['weight'].isnull()].drop_duplicates(), 
            columns=['question_label', 'question_code']).rename(columns={'question_label': 'Included Questions'})
               )
    else:
        display(pd.DataFrame(
            temp_df[['question_label', 'question_code']][temp_df['weight'].isnull()].drop_duplicates(), 
            columns=['question_label', 'question_code']).rename(columns={'question_label': 'Excluded Questions'})
               )

# <a id='intro'>1. Project overview and objectives</a>

The main purpose of this project is to visualize the results of a survey conducted in 2012 among more than 93,000 LGBT people in the EU countries and Croatia. I estimate an overall "suitability" score for each country (in other words, how good is this country for the LGBT community?) by assigning weights to answers and averaging the scores within each question block. Then I look at some particular questions to explore how satisfied the LGBT community is and what respondents think would improve their lives in the countries they live in.

## <a id='survey'>1.1. The aim of the survey</a>

> *The aim of the EU LGBT survey was to obtain robust and comparable data that would allow a better understanding of how lesbian, gay, bisexual and transgender (LGBT) people living in the European Union (EU) and Croatia experience the enjoyment of fundamental rights. The survey collected data from 93,079 people across the EU and Croatia through an anonymous online questionnaire, collecting the views, perceptions, opinions and experiences of persons aged 18 years or over, who self-identify as lesbian, gay, bisexual or transgender. The topics related to various fundamental rights issues with an emphasis on experienced discrimination, violence and harassment. The survey and all related activities covered the 27 current EU Member States as well as Croatia. FRA designed the questionnaire and finalised it in consultation with its Scientific Committee, relevant stakeholders and civil society organisations, as well as independent academics and national experts with expertise in the area of discrimination on grounds of sexual orientation and
gender identity.*
>
> *The survey asked a range of questions about LGBT people’s experiences including:*
> * *public perceptions and responses to homophobia and/or transphobia;*
> * *discrimination;*
> * *rights awareness;*
> * *safe environment;*
> * *violence and harassment;*
> * *the social context of being an LGBT person;*
> * *personal characteristics, including age and income group.*

*Taken from [EU LGBT survey technical report. Methodology, online survey, questionnaire and sample](https://fra.europa.eu/sites/default/files/eu-lgbt-survey-technical-report_en.pdf)*

## <a id='data'>1.2. Data set overview</a>

The data set consists of 5 .csv files, one for each block of questions, plus a `SubsetSize` file with the number of respondents in each subset for every country.

All five question tables share the same schema:

| Variable | Note/Example |
|:-:|:-:|
| `CountryCode` | name of the country (despite the column name; renamed to `CountryName` during cleaning) |
| `subset` | `Lesbian`, `Gay`, `Bisexual women`, `Bisexual men` or `Transgender` |
| `question_code` | unique code ID for the question |
| `question_label` | full question text |
| `answer` | answer given |
| `percentage` | % of respondents in the country and subset who gave this answer |
| `notes` | `[0]`: small sample size; `[1]`: NA due to small sample size; `[2]`: missing value |
<br>
* 28 countries participated in the survey (the 27 EU Member States at the time, plus Croatia)
* Answer formats differ between questions: they can be binary (`Yes-No`), numerical (`1-10`) or a scale (`Always-Often-Never`)

In [3]:
# get the list of countries ID (like Germany - DE)
countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_2

# data import
DailyLifeDF = pd.read_csv('../input/LGBT_Survey_DailyLife.csv')
RightsAwarenessDF = pd.read_csv('../input/LGBT_Survey_RightsAwareness.csv')
ViolenceAndHarassmentDF = pd.read_csv('../input/LGBT_Survey_ViolenceAndHarassment.csv')
TransgenderSpecificQuestionsDF = pd.read_csv('../input/LGBT_Survey_TransgenderSpecificQuestions.csv')
DiscriminationDF = pd.read_csv('../input/LGBT_Survey_Discrimination.csv')
SubsetSizeDF = pd.read_csv('../input/LGBT_Survey_SubsetSize.csv')

In [4]:
# data cleaning
def clean_data(df):
    df.rename(columns={'CountryCode': 'CountryName'}, inplace=True)
    codes = [countries.get(country, 'Unknown code') for country in df['CountryName']]
    df['CountryID'] = codes
    df.loc[df['CountryName'] == 'Czech Republic', 'CountryID'] = 'CZ'
    df['CountryFlag'] = df['CountryID'].apply(lambda x: x+flag.flagize(':'+x+':'))
    df.loc[df['notes'] == ' [1] ', 'notes'] = '[1]'
    df.loc[df['notes'] == '[1]', 'percentage'] = np.nan
    df['percentage'] = df['percentage'].astype('float')
    return df


DailyLifeDF = clean_data(DailyLifeDF)
RightsAwarenessDF = clean_data(RightsAwarenessDF)
ViolenceAndHarassmentDF = clean_data(ViolenceAndHarassmentDF)
TransgenderSpecificQuestionsDF = clean_data(TransgenderSpecificQuestionsDF)
DiscriminationDF = clean_data(DiscriminationDF)

In [5]:
overview_df = pd.DataFrame({
    'Data set': [], 
    'Total number of questions': [], 
    'Number of records with "[0]" note (small sample size)': [],
    '% of total records(0)': [],
    'Number of records with "[1]" note (missing value due to the small sample size)': [],
    '% of total records(1)': [],
    'Number of records with "[2]" note (missing value)': [],
    '% of total records(2)': []
})

def data_overview(df, df_name=''):
    global overview_df
    temp = [[
        df_name, 
        df['question_label'].nunique(), 
        np.sum(df['notes'] == '[0]'),
        round(np.sum(df['notes'] == '[0]') * 100/ len(df), 1),
        np.sum(df['notes'] == '[1]'),
        round(np.sum(df['notes'] == '[1]') * 100/ len(df), 1),
        np.sum(df['notes'] == '[2]'),
        round(np.sum(df['notes'] == '[2]') * 100/ len(df), 1),
    ]]
    temp_df = pd.DataFrame(temp, columns=overview_df.columns)
    overview_df = pd.concat([overview_df, temp_df])  # DataFrame.append was removed in pandas 2.0
    
    
data_overview(DailyLifeDF, df_name='Daily Life')
data_overview(RightsAwarenessDF, df_name='Rights Awareness')
data_overview(ViolenceAndHarassmentDF, df_name='Violence and Harassment')
data_overview(TransgenderSpecificQuestionsDF, df_name='Transgender Specific Questions')
data_overview(DiscriminationDF, df_name='Discrimination')
display(overview_df.set_index(['Data set']))

| Data set | Total number of questions | Records with `[0]` note (small sample size) | % of total records | Records with `[1]` note (NA due to small sample size) | % of total records | Records with `[2]` note (missing value) | % of total records |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Daily Life | 50 | 13447 | 39.5 | 1849 | 5.4 | 0 | 0.0 |
| Rights Awareness | 10 | 785 | 20.8 | 130 | 3.4 | 0 | 0.0 |
| Violence and Harassment | 47 | 25072 | 55.3 | 6897 | 15.2 | 0 | 0.0 |
| Transgender Specific Questions | 23 | 1649 | 48.2 | 370 | 10.8 | 0 | 0.0 |
| Discrimination | 32 | 5782 | 36.7 | 1198 | 7.6 | 0 | 0.0 |
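
The percentages in the overview can also be read off `value_counts(normalize=True)`. Below is a minimal sketch of that alternative, assuming only a `notes` column like the survey tables have; the helper name `notes_share` and the toy rows are hypothetical. It strips whitespace first, so a padded flag such as `' [1] '` counts as `'[1]'` (which `clean_data` handles with an explicit replacement), and it keeps rows without a note in the denominator, matching the `len(df)` used in `data_overview`.

```python
import pandas as pd

def notes_share(df, notes=("[0]", "[1]", "[2]")):
    """Hypothetical helper: percentage of records carrying each notes flag."""
    # Strip padding so that ' [1] ' and '[1]' count as the same flag
    cleaned = df["notes"].str.strip()
    # dropna=False keeps rows without a note in the denominator (like len(df))
    shares = cleaned.value_counts(normalize=True, dropna=False) * 100
    return {note: round(float(shares.get(note, 0.0)), 1) for note in notes}

# Toy rows, not survey data
toy = pd.DataFrame({"notes": ["[0]", " [1] ", "[0]", None]})
print(notes_share(toy))  # {'[0]': 50.0, '[1]': 25.0, '[2]': 0.0}
```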


# <a id='bi'>2. Choropleth map visualization of responses</a>

This visualization lets you explore the responses to a single question by country. The dashboard was built with **Microsoft Power BI**. The original map visualization can be found [here](https://fra.europa.eu/en/publications-and-resources/data-and-maps/survey-fundamental-rights-lesbian-gay-bisexual-and).

<iframe width="700" height="800" src="https://app.powerbi.com/view?r=eyJrIjoiMzI4MzMzN2QtYTA5NC00MTZkLTllYTAtMWMzOWQxNjlmZjI5IiwidCI6ImMzNWFiZTIwLTI1N2QtNDcxZi04ZDI3LWU3MTI5ZjA5MjJmNSIsImMiOjl9" frameborder="0" allowFullScreen="true"></iframe>

# <a id='score'>3. Country 'suitability' scores</a>

In this section I score each country based on the survey answers to find out which country is the "most suitable" for the LGBT community. Each country gets a score in 4 blocks, **Daily Life**, **Discrimination**, **Violence and Harassment** and **Rights Awareness**, plus a **final score**. (I didn't include **Transgender Specific Questions** here, since that block was answered only by transgender respondents, a different population from the other 4 blocks.)

## <a id='method'>3.1. Scoring methodology</a>

Here I describe the algorithm I came up with for scoring countries based on the survey answers. First, I load the number of Lesbian/Gay/Bisexual women/Bisexual men/Transgender respondents from the `SubsetSize` table and compute each subset's ratio of responses. To 'normalize' the response ratios for each country, I assume that the overall ratios (the `EU Total` row) are the *true* ratios for every country. I then add a `Subset Weight` variable that shows how far a country's ratio of responses for a subset is from the *true* ratio, calculated as:

\begin{align*}
\textrm{Weight}_{\textrm{subset}} = \frac{\textrm{Subset Ratio}}{\textrm{True Ratio}}
\end{align*}
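
As a minimal sketch of this formula, assume a count table indexed by country with one column per subset and an `EU Total` row. The column names and numbers below are hypothetical, and the ratio is taken over the row sum, whereas the notebook divides by the `N` column, which I assume equals that sum.

```python
import pandas as pd

def subset_weights(counts, total_row="EU Total"):
    """Subset Weight = country's subset ratio / EU-total subset ratio."""
    # Share of each subset among a country's respondents (each row sums to 1)
    ratios = counts.div(counts.sum(axis=1), axis=0)
    # Divide every row by the EU-total row, treated as the "true" ratio
    return ratios.div(ratios.loc[total_row], axis=1)

# Hypothetical respondent counts, not taken from the survey
counts = pd.DataFrame(
    {"Lesbian": [25, 10], "Gay": [50, 30], "Transgender": [25, 10]},
    index=["EU Total", "Italy"],
)
weights = subset_weights(counts)
print(weights.round(3))
```

With these toy counts, gay respondents make up 60% of Italy's sample versus 50% EU-wide, so their `Subset Weight` is `1.2`; the `EU Total` row always gets weight `1`.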

In [6]:
SubsetSizeDF = SubsetSizeDF.merge(DailyLifeDF[['CountryID', 'CountryFlag', 'CountryName']], how='left').drop_duplicates().reset_index(drop=True)
SubsetSizeDF.rename(columns={'Lesbian women': 'Lesbian', 'Gay men': 'Gay'}, inplace=True)

subset_options = ['Lesbian', 'Gay', 'Bisexual women', 'Bisexual men', 'Transgender']
for subset in subset_options:
    SubsetSizeDF[subset+' ratio'] = SubsetSizeDF[subset] / SubsetSizeDF['N']
    SubsetSizeDF[subset+' ratio'] = SubsetSizeDF[subset+' ratio'].apply(lambda x: round(x,3))
    
# SubsetSizeDF = SubsetSizeDF.reindex(sorted(SubsetSizeDF.columns), axis=1)
SubsetSizeDF.loc[SubsetSizeDF['CountryID'] == 'EU Total', 'CountryName'] = 'EU Total'
SubsetSizeDF.loc[SubsetSizeDF['CountryID'] == 'EU Total', 'CountryFlag'] = 'EU Total'
SubsetSizeDF = SubsetSizeDF[['CountryName', 'CountryFlag', 'N'] + subset_options + [x + ' ratio' for x in subset_options]]
# plot_table(SubsetSizeDF)

Final `Subset Weight` values look like this:

In [7]:
for subset in subset_options:
    SubsetSizeDF[subset+' weight'] = SubsetSizeDF[subset+' ratio'] / SubsetSizeDF.loc[SubsetSizeDF['CountryName'] == 'EU Total', subset+' ratio'].values[0]
    SubsetSizeDF[subset+' weight'] = SubsetSizeDF[subset+' weight'].apply(lambda x: round(x,3))
    
SubsetWeightsDF = SubsetSizeDF[['CountryName', 'CountryFlag']+[x + ' weight' for x in subset_options]]
plot_table(SubsetWeightsDF)

SubsetWeightsDF = pd.melt(
    SubsetWeightsDF, 
    id_vars=['CountryName', 'CountryFlag'], 
    value_vars=list(SubsetWeightsDF.columns[2:]),
    var_name='subset', 
    value_name='SubsetWeight'
).sort_values(['CountryName'])
# strip the trailing ' weight' suffix (7 characters) to recover the subset name
SubsetWeightsDF['subset'] = SubsetWeightsDF['subset'].apply(lambda x: x[:-7])

After calculating the `Subset Weight` values, I compute a weighted `Percent` of responses for each subset by multiplying the original `Percent` value by the `Subset Weight`.

\begin{align*}
\textrm{Percent}_{\textrm{weighted}} = \textrm{Percent} \times \textrm{Weight}_{\textrm{subset}}
\end{align*}
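
A minimal sketch of this step, assuming the long format the notebook builds with `pd.melt` (one `SubsetWeight` per country and subset). The rows and weights are made up; the merge keys mirror the ones used in `calculate_and_plot_total_score`, which also folds the division by 100 into the same step.

```python
import pandas as pd

# Hypothetical rows in the long format used by the survey tables
responses = pd.DataFrame({
    "CountryName": ["Italy", "Italy"],
    "subset": ["Lesbian", "Gay"],
    "answer": ["Yes", "Yes"],
    "percentage": [40.0, 30.0],
})
# Hypothetical Subset Weights, one row per country and subset
subset_weight = pd.DataFrame({
    "CountryName": ["Italy", "Italy"],
    "subset": ["Lesbian", "Gay"],
    "SubsetWeight": [0.8, 1.2],
})
# Attach each row's Subset Weight, then rescale the percentage
weighted = responses.merge(subset_weight, how="left", on=["CountryName", "subset"])
weighted["percent_weighted"] = weighted["percentage"] * weighted["SubsetWeight"]
print(weighted[["subset", "percentage", "percent_weighted"]])
```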

Next, I add a `Response Weight` that shows how 'good' an answer is. Let's take a look at an imaginary example with two questions for `Italy`:

| Country | Question | Answer | Percent (Weighted) |
|:-:|:-:|:-:|:-:|
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Very widespread | 25 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Fairly widespread | 15 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Don't know | 10 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Fairly rare | 30 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Very rare | 20 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| Yes | 30 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| Don't know | 20 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| No | 50 |
<br>
The first step is to assign each answer a weight in the range `[-1, 1]`, with `-1` being the most negative and `1` the most positive. For the first question, `In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender?`, the answer `Very rare` is the best possible scenario among the answer options, while `Very widespread` is the worst. So I assign weight `-1` to `Very widespread` and `1` to `Very rare`. The remaining answers get evenly spaced weights (`-0.5` for `Fairly widespread` and `0.5` for `Fairly rare`). For example, if there are 6 answer options, the weights are `[-1, -0.67, -0.33, 0.33, 0.67, 1]`. The answer option `Don't know` gets `np.nan`.

*Note: at first I thought the weight for `Don't know` should be `0`, but I changed it to `np.nan` so that it doesn't affect the total score (a weight of `0` would still count toward the average and pull the score toward zero), since that answer is not really informative. If you think it should be `0`, I would love to hear your reasons.*
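
The evenly spaced weights described above can be generated for any even number of ordered answers. This is a minimal sketch under that assumption: the notebook hard-codes the values per answer type instead, an odd number of options (which would call for a neutral middle answer) is not covered, and `response_weights` is a hypothetical name.

```python
import numpy as np

def response_weights(n_options):
    """Evenly spaced weights in [-1, 1] for an even number of ordered answers
    (worst to best), skipping 0: -1, ..., -1/k, 1/k, ..., 1 with k = n/2."""
    if n_options % 2:
        raise ValueError("this sketch only handles an even number of options")
    k = n_options // 2
    negative = [-i / k for i in range(k, 0, -1)]  # -1 ... -1/k
    positive = [i / k for i in range(1, k + 1)]   # 1/k ... 1
    return negative + positive

answers = ["Very widespread", "Fairly widespread", "Fairly rare", "Very rare"]
weights = dict(zip(answers, response_weights(len(answers))))
weights["Don't know"] = np.nan  # left out of the average later on
print(weights)
print([round(w, 2) for w in response_weights(6)])  # [-1.0, -0.67, -0.33, 0.33, 0.67, 1.0]
```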

Then I compute the `Score` by following formula:

\begin{align*}
\textrm{Score} = \textrm{Weight}_{\textrm{response}} \times \frac{\textrm{Percent}_{\textrm{weighted}} }{100}
\end{align*}

In this case `Score` also lies in the range `[-1, 1]` as long as `Percent (Weighted)` does not exceed 100, with `-1` being negative and `1` being positive. The final `Total Block Score` for the country is simply the average of all the non-NaN scores.

\begin{align*}
\textrm{Total Block Score} = \textrm{Average(Score)}
\end{align*}

| Country | Question | Answer | Percent (Weighted) | Response Weight | Score |
|:-:|:-:|:-:|:-:|:-:|:-:|
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Very widespread | 25 | -1 | -0.25 | 
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Fairly widespread | 15 | -0.5 | -0.075 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Don't know | 10 | np.NaN | np.NaN |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Fairly rare | 30 | 0.5 | 0.15 |
| Italy | In your opinion, in the country where you live, how widespread is discrimination because a person is Transgender? | Very rare | 20 | 1 | 0.2 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| Yes | 30 | -1 | -0.3 |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay?	| Don't know | 20 | np.NaN | np.NaN |
| Italy | Have you personally felt discriminated against or harassed because of being perceived as Gay? | No | 50 | 1 | 0.5 |
<br>
So the `Total Block Score` for this block is the average of the six non-NaN scores, `(-0.25 - 0.075 + 0.15 + 0.2 - 0.3 + 0.5) / 6 = 0.0375`. After computing the scores for the 4 blocks, the `Total Score` is the average of the 4 `Total Block Scores`.
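
To double-check the arithmetic, here is the imaginary Italy example computed with pandas. The `No` row gives `1 × 50 / 100 = 0.5`, and `Series.mean()` skips NaN by default, so the two `Don't know` rows drop out and the block score is the mean of the six remaining scores. The short question labels are hypothetical stand-ins for the full question texts.

```python
import numpy as np
import pandas as pd

# The imaginary Italy example (weighted percentages and response weights)
example = pd.DataFrame({
    "question": ["discrimination_trans"] * 5 + ["felt_discriminated_gay"] * 3,
    "answer": ["Very widespread", "Fairly widespread", "Don't know", "Fairly rare",
               "Very rare", "Yes", "Don't know", "No"],
    "percent_weighted": [25, 15, 10, 30, 20, 30, 20, 50],
    "response_weight": [-1, -0.5, np.nan, 0.5, 1, -1, np.nan, 1],
})
example["score"] = example["response_weight"] * example["percent_weighted"] / 100
# mean() ignores the NaN scores of the "Don't know" rows
block_score = example["score"].mean()
print(round(block_score, 4))  # 0.0375
```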

## <a id='dl'>3.2. Daily Life</a>

Let's start with the `Daily Life` block, where respondents answered questions about day-to-day life as a lesbian, gay, bisexual or transgender person.

In [8]:
def set_WidespreadRare_weight(df, questions_list, rare_negative=False):
    if rare_negative:
        weight = -1
    else:
        weight = 1
    for quesID in questions_list:
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Very widespread'), 'weight'] = -weight
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Fairly widespread'), 'weight'] = -weight/2
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Fairly rare'), 'weight'] = weight/2
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Very rare'), 'weight'] = weight
        

def set_YesNo_weight(df, questions_list, yes_negative=False):
    if yes_negative:
        weight = -1
    else:
        weight = 1
    for quesID in questions_list:
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Yes'), 'weight'] = weight
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'No'), 'weight'] = -weight
        
        
def set_AlwaysNever_weight(df, questions_list, alsways_negative=False):
    if alsways_negative:
        weight = -1
    else:
        weight = 1
    for quesID in questions_list:
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Always'), 'weight'] = weight
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Often'), 'weight'] = weight/2
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Rarely'), 'weight'] = -weight/2
        df.loc[(df['question_code'] == quesID) & (df['answer'] == 'Never'), 'weight'] = -weight

In [9]:
DailyLifeDF['weight'] = np.nan
DailyLifeDF.loc[DailyLifeDF['answer'] == 'Don`t know', 'weight'] = np.nan

set_WidespreadRare_weight(
    df=DailyLifeDF,
    questions_list=[
        'b1_a', 'b1_b', 'b1_c', 'b1_d', 'c1a_a', 'c1a_b', 'c1a_c', 'c1a_d'
    ],
    rare_negative=False
)
set_WidespreadRare_weight(
    df=DailyLifeDF,
    questions_list=[
        'b1_e', 'b1_g', 'b1_h', 'b1_i'
    ],
    rare_negative=True
)

DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_a') & (DailyLifeDF['answer'] == 'Never happened in the last sixth months'), 'weight'] = 1
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_a') & (DailyLifeDF['answer'] == 'Happened only once in the last six months'), 'weight'] = 0.5
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_a') & (DailyLifeDF['answer'] == '2-5 times in the last six months'), 'weight'] = -0.5
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_a') & (DailyLifeDF['answer'] == '6 times or more in the last six months'), 'weight'] = -1

DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_b') & (DailyLifeDF['answer'] == 'Never happened in the last sixth months'), 'weight'] = 1
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_b') & (DailyLifeDF['answer'] == 'Happened only once in the last six months'), 'weight'] = 0.5
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_b') & (DailyLifeDF['answer'] == '2-5 times in the last six months'), 'weight'] = -0.5
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_b') & (DailyLifeDF['answer'] == '6 times or more in the last six months'), 'weight'] = -1

DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_c') & (DailyLifeDF['answer'] == 'Never happened in the last sixth months'), 'weight'] = 1
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_c') & (DailyLifeDF['answer'] == 'Happened only once in the last six months'), 'weight'] = 0.5
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_c') & (DailyLifeDF['answer'] == '2-5 times in the last six months'), 'weight'] = -0.5
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'g4_c') & (DailyLifeDF['answer'] == '6 times or more in the last six months'), 'weight'] = -1

DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'h15') & (DailyLifeDF['answer'] == 'Yes'), 'weight'] = -1
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'h15') & (DailyLifeDF['answer'] == 'No'), 'weight'] = 1
DailyLifeDF.loc[(DailyLifeDF['question_code'] == 'h15') & (DailyLifeDF['answer'] == 'I did not need or use any benefits or services'), 'weight'] = np.nan

In [10]:
DL_ScoreDF = calculate_and_plot_total_score(DailyLifeDF, title='Daily Life')

* The first place goes to **Denmark**🇩🇰 (which means that the responses about daily life in this country were more positive than in other countries).
* The last place goes to **Croatia**🇭🇷.

To make it clear, here are the lists of questions that I included in the score calculation and those I left out, since the latter don't reflect how suitable the country is for LGBT people but are rather questions about the respondents themselves. Once again, if you feel that some of the questions should be included, please let me know!
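
The split into included and excluded questions follows from the weights: `show_InEx_questions` treats a question as excluded when its maximum weight is NaN, i.e. none of its answers received a weight. A minimal sketch of the same rule, keyed on `question_code` only (the helper name `split_questions` and the toy rows are hypothetical):

```python
import numpy as np
import pandas as pd

def split_questions(df):
    """Hypothetical helper returning (included, excluded) question codes: a
    question is included if at least one of its answers has a non-NaN weight."""
    has_weight = df["weight"].notna().groupby(df["question_code"]).any()
    return sorted(has_weight[has_weight].index), sorted(has_weight[~has_weight].index)

toy = pd.DataFrame({
    "question_code": ["b1_a", "b1_a", "g5", "g5"],
    "weight": [1.0, np.nan, np.nan, np.nan],
})
included, excluded = split_questions(toy)
print(included, excluded)  # ['b1_a'] ['g5']
```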

In [11]:
show_InEx_questions(DailyLifeDF, included=True)

| Included Questions | question_code |
|:-:|:-:|
| In the country where you have moved to (taken up residence), have you or your partner been denied or restricted access to any benefits or services that would have been available for a different-sex spouse or partner because of you having a same-sex partner or spouse? | h15 |
| In your opinion, how widespread are assaults and harassment against lesbian, gay, bisexual and/or transgender people in the country where you live? | b1_d |
| In your opinion, how widespread are casual jokes in everyday life about lesbian, gay, bisexual and/or transgender people in the country you live? | b1_b |
| In your opinion, how widespread are expressions of hatred and aversion towards lesbian, gay, bisexual and/or transgender in public in the country where you live? | b1_c |
| In your opinion, how widespread are positive measures to promote respect for the human rights of lesbian, gay or bisexual people in the country where you live? \* | b1_h |
| In your opinion, how widespread are positive measures to promote respect for the human rights of transgender people in the country where you live? \* | b1_i |
| In your opinion, how widespread is offensive language about lesbian, gay, bisexual and/or transgender people by politicians in the country where you live? | b1_a |
| In your opinion, how widespread is public figures in politics, business, sports, etc being open about themselves being lesbian, gay, bisexual and/or transgender in the country where you live? | b1_g |
| In your opinion, how widespread is same-sex partners holding hands in public in the country where you live? | b1_e |
| In your opinion, in the country where you live, how widespread is discrimination because a person is Bisexual? | c1a_c |


In [12]:
show_InEx_questions(DailyLifeDF, included=False)

| Excluded Questions | question_code |
|:-:|:-:|
| 4 levels of being open about LGBT background \* | openness_cat4 |
| All things considered, how satisfied would you say you are with your life these days? \* | g5 |
| Are you a parent or legal guardian of a child (or children)? | h9_1 |
| Do any children (under the age of 18) live in your household? | h9 |
| Does your current partner know that you are L, G, B or T? | g1_a |
| For each of the following types of discrimination, could you please specify whether, in your opinion, it is very rare, fairly rare, fairly widespread or very widespread in the country where you live? | c1_b |
| For each of the following types of discrimination, could you please specify whether, in your opinion, it is very rare, fairly rare, fairly widespread or very widespread in the country where you live? | c1_c |
| Have you been open about you being L, G, B or T? \* | open_at_school |
| Have you been open about you being L, G, B or T? \* | open_at_work |
| Have you ever moved to an EU country (and also taken up local residence) together with your same-sex partner, since you married or registered your partnership? | h14 |
9,"Have you ever moved to an EU country (and also taken up local residence) together with your same-sex partner, since you married or registered your partnership?",h14


## <a id='ra'>3.3. Rights Awareness</a>

In [13]:
RightsAwarenessDF['weight'] = np.nan
RightsAwarenessDF.loc[RightsAwarenessDF['answer'] == 'Don`t know', 'weight'] = np.nan
RightsAwarenessDF.loc[RightsAwarenessDF['answer'] == 'No', 'weight'] = -1
RightsAwarenessDF.loc[RightsAwarenessDF['answer'] == 'Yes', 'weight'] = 1

In [14]:
RA_ScoreDF = calculate_and_plot_total_score(RightsAwarenessDF, title='Rights Awareness')

* The first place goes to **Finland🇫🇮** (which means that LGBT people in that country are much more aware of their rights than in other countries).
* The last place goes to **Greece**🇬🇷.

*Note how large the gap is between the `Rights Awareness Score` of Finland (`0.5`) and Greece (`-0.03`).*

List of included/excluded questions:

In [15]:
show_InEx_questions(RightsAwarenessDF, included=True)

| Included Questions | question_code |
|:-:|:-:|
| As far as you know, can same-sex couples legally marry and/or enter registered partnerships in the country where you live? | d5 |
| Do you know of any organisation in the country where you live that can offer support or advice to people who have been discriminated against because they are Bisexual? | d3_c |
| Do you know of any organisation in the country where you live that can offer support or advice to people who have been discriminated against because they are Gay? | d3_b |
| Do you know of any organisation in the country where you live that can offer support or advice to people who have been discriminated against because they are Lesbian? | d3_a |
| Do you know of any organisation in the country where you live that can offer support or advice to people who have been discriminated against because they are Transgender? | d3_d |
| In the country where you live, have you ever seen any programme or awareness campaign by either the government or a non-governmental organisation addressing - Discrimination against gay, lesbian and bisexual people? | d4_c |
| In the country where you live, have you ever seen any programme or awareness campaign by either the government or a non-governmental organisation addressing - Discrimination against transgender people? | d4_d |
| In the country where you live, have you ever seen any programme or awareness campaign by either the government or a non-governmental organisation addressing - Discrimination on the basis of gender? | d4_g |
| In the country where you live, is there a law that forbids discrimination against persons because of their gender identity when applying for a job? | d2 |
| In the country where you live, is there a law that forbids discrimination against persons because of their sexual orientation when applying for a job? | d1 |


In [16]:
show_InEx_questions(RightsAwarenessDF, included=False)

| Excluded Questions | question_code |
|:-:|:-:|
| *(none: every Rights Awareness question received a weight)* | |


## <a id='disc'>3.4. Discrimination</a>

In [17]:
DiscriminationDF['weight'] = np.nan

DiscriminationDF.loc[DiscriminationDF['answer'] == 'Don`t know', 'weight'] = np.nan
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'None of the above'), 'weight'] = 0
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'I have never accessed healthcare services'), 'weight'] = 0
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'Difficulty in gaining access to healthcare'), 'weight'] = -1
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'Having to change general practitioners or other specialists due to their negative reaction'), 'weight'] = -1
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'Receiving unequal treatment when dealing with medical staff'), 'weight'] = -1
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'Foregoing treatment for fear of discrimination or intolerant reactions'), 'weight'] = -1
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'Specific needs ignored (not taken into account)'), 'weight'] = -1
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'Inappropriate curiosity'), 'weight'] = -1
DiscriminationDF.loc[(DiscriminationDF['question_code'] == 'c10') & (DiscriminationDF['answer'] == 'Pressure or being forced to undergo any medical or psychological test'), 'weight'] = -1

set_YesNo_weight(
    df=DiscriminationDF,
    questions_list=[
        'c2a_a', 'c2a_b', 'c2a_c', 'c2a_d', 'c2_b', 'c2_c', 'c4_a', 'c4_b', 
        'c4_c', 'c4_d', 'c4_e', 'c4_f', 'c4_g', 'c4_h', 'c4_i', 'c4_j', 'c4_k', 'discrim1yr'
    ],
    yes_negative=True
)

set_AlwaysNever_weight(
    df=DiscriminationDF,
    questions_list=[
        'c8a_b', 'c8a_c', 'c8a_d', 'c8a_e', 'c8a_f', 'c9_b', 'c9_c', 'c9_d', 'c9_e'
    ],
    alsways_negative=True
)

set_AlwaysNever_weight(
    df=DiscriminationDF,
    questions_list=[
        'c8a_a', 'c9_a'
    ],
    alsways_negative=False
)

In [18]:
Disc_ScoreDF = calculate_and_plot_total_score(DiscriminationDF, title='Discrimination')

* The first place goes to **Finland**🇫🇮, which means that LGBT people feel less discriminated against there than in the other surveyed countries.
* The last place goes to **Cyprus**🇨🇾. In my view this is no surprise given the large Russian and Ukrainian population living there, although the survey itself does not test that explanation.

List of included/excluded questions:

In [19]:
show_InEx_questions(DiscriminationDF, included=True)

| # | Included Questions | question_code |
|---|---|---|
| 0 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T at a cafe, restaurant, bar or nightclub? | c4_g |
| 1 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T at a shop? | c4_h |
| 2 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T at a sport or fitness club? | c4_j |
| 3 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T at work? | c4_b |
| 4 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T by healthcare personnel (eg a receptionist, nurse or doctor)? | c4_d |
| 5 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T by school / university personnel? | c4_f |
| 6 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T by social service personnel? | c4_e |
| 7 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T in a bank or insurance company (by bank or company personnel)? | c4_i |
| 8 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T in any of the following situations? * | discrim1yr |
| 9 | During the last 12 months, have you personally felt discriminated against because of being L, G, B or T when looking for a house or apartment to rent or buy? | c4_c |


In [20]:
show_InEx_questions(DiscriminationDF, included=False)

| # | Excluded Questions | question_code |
|---|---|---|
| 30 | Thinking about the most recent incident of discrimination, did you or anyone else report it anywhere? | c6 |
| 31 | Why the most recent incident of discrimination was not reported? | c7 |


## <a id='vah'>3.5. Violence and Harassment</a>

In [21]:
ViolenceAndHarassmentDF['weight'] = np.nan  # np.NaN no longer exists in NumPy 2.0

# 'Don`t know' and 'I do not have a same-sex partner' (e1) stay unweighted
ViolenceAndHarassmentDF.loc[ViolenceAndHarassmentDF['answer'] == 'Don`t know', 'weight'] = np.nan
ViolenceAndHarassmentDF.loc[(ViolenceAndHarassmentDF['question_code'] == 'e1') & (ViolenceAndHarassmentDF['answer'] == 'I do not have a same-sex partner'), 'weight'] = np.nan
set_YesNo_weight(
    df=ViolenceAndHarassmentDF,
    questions_list=[
        'e1', 'e2', 'f1_a', 'f1_b', 'fa1_5', 'fa2_5', 'fb1_5', 'fb2_5'
    ],
    yes_negative=True
)
# fa1_3 / fb1_3 (number of incidents): the more incidents, the more negative the weight
frequency_weights = {
    'Once': -0.14,
    'Twice': -0.29,
    'Three times': -0.43,
    'Four times': -0.57,
    'Five times': -0.71,
    'Six to ten times': -0.86,
    'More than ten times': -1,
}
is_frequency = (ViolenceAndHarassmentDF['question_code'].isin(['fa1_3', 'fb1_3'])
                & ViolenceAndHarassmentDF['answer'].isin(list(frequency_weights)))
ViolenceAndHarassmentDF.loc[is_frequency, 'weight'] = ViolenceAndHarassmentDF.loc[is_frequency, 'answer'].map(frequency_weights)
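The hard-coded frequency weights above appear to be seven evenly spaced steps. Under that reading, answer level k (from "Once" = 1 to "More than ten times" = 7) gets the weight -k/7, rounded to two decimals. The notebook only hard-codes the numbers, so this rule is an assumption on my part. Here is a minimal sketch that regenerates them:

```python
# Seven answer levels of fa1_3 / fb1_3, ordered from least to most frequent
levels = ['Once', 'Twice', 'Three times', 'Four times',
          'Five times', 'Six to ten times', 'More than ten times']

# Assumed rule: level k (1..7) gets weight -k/7, rounded to two decimals
frequency_weights = {answer: round(-k / len(levels), 2)
                     for k, answer in enumerate(levels, start=1)}

for answer, weight in frequency_weights.items():
    print(f'{answer:>20}: {weight}')
```

The scale treats "Six to ten times" as a single step, so it is ordinal rather than proportional to the actual number of incidents.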

In [22]:
VaH_ScoreDF = calculate_and_plot_total_score(ViolenceAndHarassmentDF, title='Violence and Harassment')

* The first place goes to **Luxembourg**🇱🇺, which means that LGBT people are harassed or attacked less often there than in the other surveyed countries.
* The last place goes to **Croatia**🇭🇷.

List of included/excluded questions:

In [23]:
show_InEx_questions(ViolenceAndHarassmentDF, included=True)

| # | Included Questions | question_code |
|---|---|---|
| 2 | Do you avoid certain places or locations for fear of being assaulted, threatened or harassed because you are L, G, B or T? | e2 |
| 3 | Do you avoid holding hands in public with a same-sex partner for fear of being assaulted, threatened of harassed? | e1 |
| 4 | Do you think the LAST incident of harassment in the past 12 months happened partly or completely because you were perceived to be L, G, B or T? | fb1_5 |
| 5 | Do you think the LAST incident of physical / sexual attack or threat of violence in the past 12 months happened partly or completely because you were perceived to be L, G, B or T? | fa1_5 |
| 6 | Do you think the MOST SERIOUS incident of harassment happened partly or completely because you were perceived to be L, G, B or T? | fb2_5 |
| 7 | Do you think this physical / sexual attack or threat happened partly or completely because you were perceived to be L, G, B or T? | fa2_5 |
| 8 | How many times did somebody harass you in the last 12 months? | fb1_3 |
| 9 | How many times did somebody physically/sexually attack or threaten you with violence in the last 12 months in the European Union / in this country? | fa1_3 |
| 10 | In the last 5 years, have you been: personally harassed by someone or a group for any reason in a way that really annoyed, offended or upset you - either at work, home, on the street, on public transport, in a shop, in an office or on the internet ? | f1_b |
| 11 | In the last 5 years, have you been: physically/sexually attacked or threatened with violence at home or elsewhere (street, on public transport, at your workplace, etc) for any reason? | f1_a |


In [24]:
show_InEx_questions(ViolenceAndHarassmentDF, included=False)

| # | Excluded Questions | question_code |
|---|---|---|
| 0 | Did you or anyone else report the last incident of physical / sexual attack or threat of violence to any of the following organisations / institutions? * | fa1_13 |
| 1 | Did you or anyone else report the last incident of physical / sexual attack or threat of violence to the police? | fa1_11 |
| 12 | LAST incident of harassment in the past 12 months - Did you or anyone else report it to any of the following organisations / institutions? * | fb1_13 |
| 13 | LAST incident of harassment in the past 12 months - Did you or anyone else report it to the police? | fb1_11 |
| 14 | LAST incident of harassment in the past 12 months - Do you think the perpetrator(s) was ...? (sexual orientation) | fb1_9 |
| 15 | LAST incident of harassment in the past 12 months - Was the perpetrator alone, or was there more than one perpetrator? | fb1_6 |
| 16 | LAST incident of harassment in the past 12 months - What was the gender of the perpetrator(s)? | fb1_8 |
| 17 | LAST incident of harassment in the past 12 months - Where did it happen? | fb1_10 |
| 18 | LAST incident of harassment in the past 12 months - Why did you not report it to the police? | fb1_12 |
| 19 | LAST incident of harassment in the past 12 months - who was the perpetrator? | fb1_7 |


## <a id='overall'>3.6. Overall rank</a>

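Averaging the four category ranks gives an `Average Rank` for each country. Ranking those averages again (dense ranking, so tied countries share a rank) gives the final `Total Rank`, where 1 is the best.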
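The toy illustration below applies this averaging step to made-up ranks for three hypothetical countries `A`, `B` and `C`. Like the notebook cell that follows, it uses `rank(method='dense')`, which gives tied averages the same integer rank without leaving a gap after them:

```python
import pandas as pd

# Made-up category ranks (1 = best) for three hypothetical countries
ranks = pd.DataFrame({
    'CountryName': ['A', 'B', 'C'],
    'Daily Life Rank': [1, 2, 3],
    'Discrimination Rank': [2, 1, 3],
    'Violence and Harassment Rank': [1, 2, 3],
    'Rights Awareness Rank': [2, 1, 3],
})
rank_cols = [c for c in ranks.columns if c.endswith('Rank')]

# Average only the numeric rank columns, then turn the averages back into integer ranks
ranks['Average Rank'] = ranks[rank_cols].mean(axis=1)
ranks['Total Rank'] = ranks['Average Rank'].rank(method='dense').astype(int)
print(ranks[['CountryName', 'Average Rank', 'Total Rank']])
```

Here `A` and `B` both average 1.5 and share `Total Rank` 1, and `C` gets rank 2 rather than 3.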

In [25]:
OverallRang_df = pd.merge(
    DL_ScoreDF[['CountryFlag', 'CountryName', 'Daily Life Rank']],
    VaH_ScoreDF[['CountryFlag', 'CountryName', 'Violence and Harassment Rank']]).merge(
    Disc_ScoreDF[['CountryFlag', 'CountryName', 'Discrimination Rank']]).merge(
    RA_ScoreDF[['CountryFlag', 'CountryName', 'Rights Awareness Rank']])

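# Average only the four rank columns (pandas >= 2.0 raises a TypeError on the text columns otherwise)
rank_columns = ['Daily Life Rank', 'Violence and Harassment Rank', 'Discrimination Rank', 'Rights Awareness Rank']
OverallRang_df['Average Rank'] = OverallRang_df[rank_columns].mean(axis=1)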
OverallRang_df = OverallRang_df.sort_values(['Average Rank'], ascending=True).reset_index(drop=True)

OverallRang_df['Total Rank'] = OverallRang_df['Average Rank'].rank(method='dense', ascending=True).astype('int')
plot_table(OverallRang_df)

So!

* The absolute winners are **Finland**🇫🇮, **Denmark**🇩🇰, **Sweden**🇸🇪, **Netherlands**🇳🇱.
* The absolute losers are **Lithuania**🇱🇹, **Romania**🇷🇴, **Cyprus**🇨🇾.

Something to keep in mind when choosing a destination for travel or relocation.

# <a id='lbgt'>4. What the LGBT community says</a>

With a `Total Rank` for each country in hand, I now want to look at some specific responses to see how the LGBT community itself feels about living in EU countries.

## <a id='satisfied'>4.1. Do people feel satisfied in EU countries?</a>

The Daily Life block contains the question "**All things considered, how satisfied would you say you are with your life these days?**", where respondents picked a value from 0 to 10 (10 being the most satisfied). Using the same methodology as before, I compute a score for this single question and compare it to the `Total Rank` from the previous section. Each answer is rescaled to [0, 1] and multiplied by its subset-weighted share of respondents, and these products are then averaged per country.
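Here is a minimal sketch of that score on made-up rows for one hypothetical country `X`. The `SubsetWeight` values are placeholders; the real ones come from `SubsetWeightsDF`, which is built earlier in the notebook. The steps mirror the cell below:

```python
import pandas as pd

# Made-up rows: answer on the 0-10 scale (stored as text), share of the subset (%)
# giving that answer, and a placeholder subset weight
df = pd.DataFrame({
    'CountryName': ['X'] * 4,
    'subset': ['Gay', 'Gay', 'Lesbian', 'Lesbian'],
    'answer': ['10', '5', '8', '2'],
    'percentage': [60.0, 40.0, 50.0, 50.0],
    'SubsetWeight': [0.5, 0.5, 0.5, 0.5],
})

# Rescale the answer to [0, 1] and weight the share of respondents
df['answer'] = df['answer'].astype(int) / 10
df['percentage'] = df['percentage'] * df['SubsetWeight'] / 100
df['Score'] = df['answer'] * df['percentage']

# Average the weighted products per country
score = df.groupby('CountryName', as_index=False)['Score'].mean()
print(score)
```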

In [26]:
subset_df = DailyLifeDF[DailyLifeDF['question_label'] == 'All things considered, how satisfied would you say you are with your life these days? *'].copy()
subset_df = subset_df.merge(SubsetWeightsDF, how='left', on=['CountryName', 'CountryFlag', 'subset'])
subset_df['answer'] = subset_df['answer'].astype('int') / 10
subset_df['percentage'] = subset_df['percentage'] * subset_df['SubsetWeight'] / 100
subset_df['Daily Life Score'] = subset_df['answer'] * subset_df['percentage'] 
subset_df = subset_df.groupby(['CountryName'], as_index=False)['Daily Life Score'].mean()
subset_df.columns = ['CountryName', 'Score']
subset_df = subset_df.merge(DL_ScoreDF[['CountryName', 'CountryFlag']])
subset_df['Score'] = subset_df['Score'].apply(lambda x: round(x,4))
subset_df = subset_df.merge(OverallRang_df[['CountryName', 'Total Rank']])
subset_df = subset_df[['CountryFlag', 'CountryName', 'Score', 'Total Rank']].sort_values(['Score'], ascending=False)

trace0 = go.Choropleth(
    colorscale = 'Greens', #'YlOrRd',
    autocolorscale = False,
    reversescale=True,
    locations = subset_df['CountryName'],
    text=subset_df['CountryFlag'],
    z = subset_df['Score'],
    locationmode = 'country names',
    colorbar = go.choropleth.ColorBar(
        title = 'Score')
)

layout = go.Layout(
    title='How satisfied would you say you are with your life these days?',
    geo = go.layout.Geo(
        scope='europe',
        showlakes=False),
)

fig = go.Figure(data=[trace0], layout=layout)
iplot(fig)    

In [27]:
subset_df['Satisfaction Rank'] = subset_df['Score'].rank(method='dense', ascending=False).astype('int')
subset_df['Rank Diff'] = subset_df['Satisfaction Rank'] - subset_df['Total Rank'] 
plot_table(subset_df)

\begin{align*}
\textrm{Rank Diff} = \textrm{Satisfaction Rank} - \textrm{Total Rank}
\end{align*}

A negative `Rank Diff` therefore means the LGBT community in that country is more satisfied than its `Total Rank` would suggest. A positive value means the opposite.
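Here is a small worked example of the `Rank Diff` formula, using made-up scores and ranks for hypothetical countries `A`, `B` and `C`. The two rank lines match the cell above:

```python
import pandas as pd

# Made-up satisfaction scores and overall ranks (1 = best)
df = pd.DataFrame({
    'CountryName': ['A', 'B', 'C'],
    'Score': [0.30, 0.25, 0.31],
    'Total Rank': [1, 2, 3],
})

# Higher satisfaction score -> better (smaller) Satisfaction Rank
df['Satisfaction Rank'] = df['Score'].rank(method='dense', ascending=False).astype(int)
df['Rank Diff'] = df['Satisfaction Rank'] - df['Total Rank']
print(df)
```

Country `C` is last by `Total Rank` but first by satisfaction, so its `Rank Diff` is 1 - 3 = -2: more satisfied than its overall rank suggests.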

## <a id='open'>4.2. Are people being open about their orientation?</a>

The next question from the Daily Life block, "**4 levels of being open about LGBT background**", shows how open the LGBT community is in the country they live in. The possible answers are Never Open, Rarely Open, Fairly Open and Very Open.
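The plotting cell below divides each country's four answer shares by their total (`percentage / sum_perc`), so every stacked bar has length 1. Countries stay comparable even if the stored percentages do not add up to exactly 100; rounding is one possible cause, but that is my assumption. The same normalisation can be written more compactly with `groupby(...).transform('sum')`. This sketch uses made-up numbers:

```python
import pandas as pd

# Made-up answer shares (%) for two hypothetical countries; A's add up to 98, not 100
df = pd.DataFrame({
    'CountryName': ['A'] * 4 + ['B'] * 4,
    'answer': ['Never Open', 'Rarely Open', 'Fairly open', 'Very open'] * 2,
    'percentage': [30.0, 40.0, 20.0, 8.0, 20.0, 30.0, 30.0, 20.0],
})

# Divide each share by its country's total so the four ratios sum to 1 per country
df['ratio'] = df['percentage'] / df.groupby('CountryName')['percentage'].transform('sum')
print(df)
```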

In [28]:
OpenessDF = DailyLifeDF[DailyLifeDF['question_label'] == '4 levels of being open about LGBT background *']
OpenessDF = OpenessDF.merge(DL_ScoreDF[['CountryName', 'CountryFlag']])
OpenessDF = OpenessDF.merge(OverallRang_df[['CountryName', 'Total Rank']])
OpenessDF = OpenessDF.sort_values(['Total Rank'], ascending=False)



colors = cl.scales['5']['seq']['Greens']
colors = colors[1:]
answers = ['Never Open', 'Rarely Open', 'Fairly open', 'Very open']
data = []
buttons = []

for (i,subset) in enumerate(OpenessDF['subset'].unique()):
    if i == 0:
        visible = True
    else:
        visible = False
    subset_df = OpenessDF[OpenessDF['subset'] == subset]
    subset_df = subset_df.sort_values(['Total Rank'], ascending=False)
    sum_df = subset_df.groupby(['CountryName'], as_index=False)[['percentage']].sum()
    sum_df.rename(columns={'percentage': 'sum_perc'}, inplace=True)
    subset_df = subset_df.merge(sum_df, how='left')
    for (j,ans_opt) in enumerate(answers):
        trace = go.Bar(
            name=ans_opt, 
            y=subset_df['CountryName'][subset_df['answer'] == ans_opt] +' ('+ subset_df['CountryFlag'][subset_df['answer'] == ans_opt] +')', 
            x=subset_df['percentage'][subset_df['answer'] == ans_opt]/subset_df['sum_perc'][subset_df['answer'] == ans_opt],
            orientation='h',
            legendgroup=ans_opt,
            marker=dict(color=colors[j]),
            visible=visible
        )
        data.append(trace)
        
    visible_list = [False] * (len(subset_df['answer'].unique()) * len(OpenessDF['subset'].unique()) + 4)
    visible_list[i*4:i*4+4] = [True] * 4
    buttons_temp = dict(
        label = subset,
        method = 'update',
        args = [
            {'visible': visible_list}
        ]
    )
    buttons.append(buttons_temp)

temp_df = OpenessDF.groupby(['CountryName', 'CountryFlag', 'answer', 'Total Rank'], as_index=False)[['percentage']].mean()
temp_df = temp_df.sort_values(['Total Rank'], ascending=False)
sum_df = OpenessDF.groupby(['CountryName','answer'], as_index=False)[['percentage']].mean().groupby(['CountryName'], as_index=False)[['percentage']].sum()
sum_df.rename(columns={'percentage': 'sum_perc'}, inplace=True)
temp_df = temp_df.merge(sum_df, how='left')
for (j,ans_opt) in enumerate(answers):
    trace = go.Bar(
        name=ans_opt, 
        y=temp_df['CountryName'][temp_df['answer'] == ans_opt] +' ('+ temp_df['CountryFlag'][temp_df['answer'] == ans_opt]+')', 
        x=temp_df['percentage'][temp_df['answer'] == ans_opt]/temp_df['sum_perc'][temp_df['answer'] == ans_opt],
        orientation='h',
        legendgroup=ans_opt,
        marker=dict(color=colors[j]),
        visible=visible
    )
    data.append(trace)

visible_list = [False] * (len(subset_df['answer'].unique()) * len(OpenessDF['subset'].unique()) + 4)
visible_list[-4:] = [True] * 4
buttons_temp = dict(
    label = 'All',
    method = 'update',
    args = [
        {'visible': visible_list}
    ]
)
buttons.append(buttons_temp)

updatemenus = list([
    dict(type="buttons",
         active=0,
         buttons=buttons,
         direction = "left",
         x=0.1,
         xanchor="left",
         y=1.1,
         yanchor="top"
        )
])

layout = go.Layout(
    updatemenus=updatemenus,
    annotations=[     
        go.layout.Annotation(
            text="Subset:", 
            showarrow=False,
            x=0, y=1.08, 
            yref="paper", align="left"
        )
    ],
    margin=dict(l=200, t=120),
    height=700,
    barmode='stack',
    title='4 levels of being open about LGBT background',
    xaxis=dict(title='Ratio')
    )
  
fig = go.Figure(data=data, layout=layout)
iplot(fig)  # the layout is already part of fig; iplot's second positional argument is show_link

The countries in the plot are sorted by `Total Rank`. The best-ranked countries (`Total Rank` = 1) are at the top and the worst-ranked at the bottom. The 'openness ratio' follows the country's `Total Rank`: the better the rank, the higher the share of 'open' respondents.
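Because a *smaller* `Total Rank` number means a *better* country, this relationship shows up as a **negative** coefficient for the open answers in the next cell, which uses Pearson's `np.corrcoef`. `Total Rank` is ordinal, so a rank-based Spearman coefficient is a reasonable alternative to check. The notebook does not use it; it is my suggestion. The sketch below computes both on made-up numbers, building Spearman as Pearson on ranks so that no SciPy dependency is needed:

```python
import numpy as np
import pandas as pd

# Made-up data: Total Rank (1 = best) and share of "Very open" answers per country
total_rank = np.array([1, 2, 3, 4, 5])
very_open_pct = np.array([30.0, 25.0, 22.0, 15.0, 10.0])

# Pearson coefficient, as computed with np.corrcoef in the notebook
pearson = np.corrcoef(total_rank, very_open_pct)[0, 1]

# Spearman coefficient: Pearson computed on the ranks of both variables
spearman = np.corrcoef(pd.Series(total_rank).rank(), pd.Series(very_open_pct).rank())[0, 1]

print(f'Pearson:  {pearson:.3f}')
print(f'Spearman: {spearman:.3f}')
```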

In [29]:
data = []
buttons = []
for (j,ans_opt) in enumerate(answers):
    if j == 0:
        visible = True
    else:
        visible = False
    x=temp_df['Total Rank'][temp_df['answer'] == ans_opt]
    y=temp_df['percentage'][temp_df['answer'] == ans_opt]
    corrCoef = np.corrcoef(x, y)[0, 1]
    trace0 = go.Scatter(
        name=ans_opt,
        x=x, 
        y=y,
        mode="markers",
        marker=dict(color=colors[j]),
        visible=visible
    )
    
    trace1 = go.Scatter(
        x=[x.max()*0.9],
        y=[y.max()*0.8],
        mode='text',
        text='Correlation: {}'.format(round(corrCoef,2)),
        textfont=dict(
          family='sans serif',
          size=16,
          color='#FF4136'
        ),
        name=ans_opt,
        visible=visible
  )
    data.append(trace0)
    data.append(trace1)
    
    visible_list = [False] * len(answers) * 2
    visible_list[j*2:j*2+2] = [True] * 2
    buttons_temp = dict(
        label = ans_opt,
        method = 'update',
        args = [
            {'visible': visible_list}
        ]
    )
    buttons.append(buttons_temp)

updatemenus = list([
    dict(
        type="buttons",
        active=0,
        buttons=buttons,
        direction = "left",
        x=0.3,
        xanchor="left",
        y=1.1,
        yanchor="top"
    )
])

layout = go.Layout(
    showlegend=False,
    updatemenus=updatemenus,
    annotations=[     
        go.layout.Annotation(
            text="Answer Option:", 
            showarrow=False,
            x=1, y=1.08, 
            yref="paper", align="left"
        )
    ],
    title='Correlation between "openness" ratio and Total Rank',
    xaxis=dict(title='Country Total Rank'),
    yaxis=dict(title='Percent of people answered')
    )
  
fig = go.Figure(data=data, layout=layout)
iplot(fig)  # the layout is already part of fig; iplot's second positional argument is show_link

In [30]:
TotalOpen_df = OpenessDF.groupby(['answer'], as_index=False)[['percentage']].mean()
TotalOpen_df['subset'] = 'Total'
SubsetOpen_df = OpenessDF.groupby(['subset', 'answer'], as_index=False)[['percentage']].mean()
SubsetOpen_df = pd.concat([SubsetOpen_df, TotalOpen_df], sort=True)  # DataFrame.append was removed in pandas 2.0
SubsetOpen_df['percentage'] = SubsetOpen_df['percentage'].apply(lambda x: round(x,1))

data = []
for (i,ans_opt) in enumerate(SubsetOpen_df['answer'].unique()):
    trace = go.Bar(
        name=ans_opt, 
        y=SubsetOpen_df['subset'][SubsetOpen_df['answer'] == ans_opt], 
        x=SubsetOpen_df['percentage'][SubsetOpen_df['answer'] == ans_opt]/100,
        orientation='h',
        legendgroup=ans_opt,
        marker=dict(color=colors[i])
    )
    data.append(trace)

layout = go.Layout(
    margin=dict(l=100),
    height=400,
    barmode='stack',
    title='4 levels of being open about LGBT background<br>(Total for all countries)',
#     xaxis=dict(title='Ratio'),
    legend_orientation="h"
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)  # the layout is already part of fig; iplot's second positional argument is show_link

* Gay Men have the highest `Very open` rate (23%), while Bisexual Men have the highest `Never Open` rate (75%).
* In total, about 27% of LGBT respondents are open about their orientation (`Very open` + `Fairly open`), especially in the Netherlands🇳🇱 (15%).

## <a id='comf'>4.3. What would allow to live more comfortable?</a>

A series of 8 questions asked "**What would allow you to be more comfortable living as a LGB person?**", each about a different measure. Together they show what LGBT respondents feel is missing in their country today.
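The cell below extracts each option from the full question label with `split('? ')[1]` and shortens "lesbian, gay and bisexual" to "LGB". Here is a sketch on a single label. The raw label is hypothetical: I assume a "question? option" format, which is what that split implies. Passing `maxsplit=1` is a small safety tweak of mine, not in the original, so that an option containing a later "? " is not cut off:

```python
# Hypothetical raw label in the "question? option" format the notebook's split assumes
raw_label = ('What would allow you to be more comfortable living as a LGB person? '
             'Measures implemented at school to respect lesbian, gay and bisexual people')

# Keep the text after the first '? ' only
option = raw_label.split('? ', 1)[1]

# Shorten the long group name the same way the notebook does
option = option.replace('lesbian, gay and bisexual', 'LGB')
print(option)
```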

In [31]:
question_codes = ['b2_a', 'b2_f', 'b2_b', 'b2_d', 'b2_c', 'b2_h', 'b2_g', 'b2_e']
ComfortDF = DailyLifeDF[DailyLifeDF['question_code'].isin(question_codes)].reset_index(drop=True)
# Keep only the answer option that follows the question stem
ComfortDF['question_label'] = ComfortDF['question_label'].apply(lambda x: x.split('? ')[1])
ComfortDF = ComfortDF.groupby(['CountryName', 'CountryFlag', 'question_code', 'question_label', 'answer'], as_index=False)[['percentage']].mean()
# Shorten "lesbian, gay and bisexual" to "LGB" in labels and answers
ComfortDF['question_label'] = ComfortDF['question_label'].str.replace('lesbian, gay and bisexual', 'LGB', regex=False)
ComfortDF['answer'] = ComfortDF['answer'].str.replace('lesbian, gay and bisexual', 'LGB', regex=False)

In [32]:
ComfortDF = ComfortDF.merge(OverallRang_df[['CountryName', 'Total Rank']])
ComfortDF = ComfortDF.sort_values(['Total Rank'], ascending=False)

answers = ['Don`t know', 'Strongly disagree', 'Disagree', 'Current situation is fine', 'Agree', 'Strongly agree']
colors = cl.scales[str(len(answers))]['seq']['Greens']
data = []
buttons = []

for (i,question) in enumerate(ComfortDF['question_label'].unique()):
    if i == 0:
        visible = True
    else:
        visible = False
    subset_df = ComfortDF[ComfortDF['question_label'] == question]
    subset_df = subset_df.sort_values(['Total Rank'], ascending=False)
    sum_df = subset_df.groupby(['CountryName'], as_index=False)[['percentage']].sum()
    sum_df.rename(columns={'percentage': 'sum_perc'}, inplace=True)
    subset_df = subset_df.merge(sum_df, how='left')
    for (j,ans_opt) in enumerate(answers):
        trace = go.Bar(
            name=ans_opt, 
            y=subset_df['CountryName'][subset_df['answer'] == ans_opt] +' ('+ subset_df['CountryFlag'][subset_df['answer'] == ans_opt] +')', 
            x=subset_df['percentage'][subset_df['answer'] == ans_opt]/subset_df['sum_perc'][subset_df['answer'] == ans_opt],
            orientation='h',
            legendgroup=ans_opt,
            marker=dict(color=colors[j]),
            visible=visible
        )
        data.append(trace)
        
    visible_list = [False] * (len(answers) * len(ComfortDF['question_label'].unique()))
    visible_list[i*6:i*6+6] = [True] * 6
    buttons_temp = dict(
        label = question,
        method = 'update',
        args = [
            {'visible': visible_list}
        ]
    )
    buttons.append(buttons_temp)

updatemenus = list([
    dict(type="dropdown",
         active=0,
         showactive=True,
         buttons=buttons,
         direction = "down",
         x=0.0,
         xanchor="left",
         y=1.1,
         yanchor="top"
        )
])

layout = go.Layout(
    updatemenus=updatemenus,
    annotations=[     
        go.layout.Annotation(
            text="Question:", 
            showarrow=False,
            x=0.01, y=1.08, 
            yref="paper", align="left"
        )
    ],
    margin=dict(l=200, t=120),
    height=700,
    barmode='stack',
    title='What would allow you to be more comfortable living as a LGB person?',
    xaxis=dict(title='Ratio')
    )
  
fig = go.Figure(data=data, layout=layout)
iplot(fig)  # the layout is already part of fig; iplot's second positional argument is show_link

In [33]:
ComfortDFTotal_df = ComfortDF.groupby(['question_label', 'answer', 'question_code'], as_index=False)[['percentage']].mean()
ComfortDFTotal_df = ComfortDFTotal_df.sort_values(['question_code'], ascending=False)
ComfortDFTotal_df['percentage'] = ComfortDFTotal_df['percentage'].apply(lambda x: round(x,1))

data = []
for (i,ans_opt) in enumerate(answers):
    trace = go.Bar(
        name=ans_opt, 
        y=ComfortDFTotal_df['question_code'][ComfortDFTotal_df['answer'] == ans_opt], 
        x=ComfortDFTotal_df['percentage'][ComfortDFTotal_df['answer'] == ans_opt]/100,
        orientation='h',
        legendgroup=ans_opt,
        marker=dict(color=colors[i])
    )
    data.append(trace)

layout = go.Layout(
    height=400,
#     margin=dict(l=100),
    barmode='stack',
    title='What would allow you to be more comfortable living as a LGB person?<br>(Total for all countries)',
#     xaxis=dict(title='Ratio'),
    yaxis=dict(automargin=True),
    legend_orientation="h"
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)  # the layout is already part of fig; iplot's second positional argument is show_link

plot_table(ComfortDFTotal_df[['question_code', 'question_label']].drop_duplicates().sort_values(['question_code']))

* A large majority of respondents (88%) agree that **measures implemented at school to respect LGB people** would improve their situation (especially in Italy🇮🇹, with 95%).
* 16% of respondents say the current situation regarding **the possibility to marry and/or register a partnership** is fine (top countries: Netherlands🇳🇱, Belgium🇧🇪 and Portugal🇵🇹).
* 9% of respondents do not think that **the possibility to foster / adopt children** would make much difference.

# <a id='end'>5. Conclusions</a>

To sum up, I ranked EU countries by how favourable they are for the LGBT community, showed in which countries people are more open about their orientation, and looked at what people think would make their lives better. This is only a small piece of the insights that could be extracted from this survey, and many more questions remain to be answered. You can also check the [official report](https://fra.europa.eu/en/publication/2014/eu-lgbt-survey-european-union-lesbian-gay-bisexual-and-transgender-survey-main) with the survey analysis results.