### Project: The Legalisation of Same-Sex Marriage

#### <b> Description </b>
Creating a time slider visual using Python to show the legal status of same-sex marriage in each country over time.

#### <b> Part 2.2 </b>
Revising the existing data to build a full country-year dataframe so that, unlike in Part 2, no country is left non-hoverable.

##### by Sneha Verma

-------------------

To ensure that every country is visible/hover-able for every year, we need a final data source that has all countries for all years that are in the source dataset. If data does not exist on the legalisation of same-sex marriage for a specific country and year, the value will be "Data does not exist".

First, create two dataframes: one of all distinct countries and one of all distinct years.

In [1]:
# Import packages
import pandas as pd
import numpy as np

In [2]:
# Load country_legalisation_full csv file.

country_legalisation = pd.read_csv("country_legalisation_full.csv")
print(country_legalisation.head())

        Entity Code  Year Same-sex marriage      Country   Latitude  Longitude
0  Afghanistan  AFG  1971            Banned  Afghanistan  33.768006  66.238514
1  Afghanistan  AFG  1972            Banned  Afghanistan  33.768006  66.238514
2  Afghanistan  AFG  1973            Banned  Afghanistan  33.768006  66.238514
3  Afghanistan  AFG  1974            Banned  Afghanistan  33.768006  66.238514
4  Afghanistan  AFG  1975            Banned  Afghanistan  33.768006  66.238514


In [5]:
# Dataframe of all distinct countries (name and code).
countries = country_legalisation[["Entity", "Code"]].drop_duplicates().reset_index(drop=True)

print(countries)

          Entity Code
0    Afghanistan  AFG
1        Albania  ALB
2        Algeria  DZA
3        Andorra  AND
4         Angola  AGO
..           ...  ...
190    Venezuela  VEN
191      Vietnam  VNM
192        Yemen  YEM
193       Zambia  ZMB
194     Zimbabwe  ZWE

[195 rows x 2 columns]


In [17]:
# Dataframe of all distinct year values, sorted in ascending order.

years = pd.DataFrame(
    {"Year": np.sort(country_legalisation["Year"].unique())}
)
print(years)

    Year
0   1950
1   1951
2   1952
3   1953
4   1954
..   ...
71  2021
72  2022
73  2023
74  2024
75  2025

[76 rows x 1 columns]
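
Because the years are taken from the values that appear in the data, a year missing for every country would also be missing from the slider. Here the range 1950 to 2025 spans 76 years, which matches the 76 rows above, so there are no gaps. A minimal sketch of how such a check could be done, using made-up year values rather than the project data:

```python
import numpy as np
import pandas as pd

# Toy year values with a deliberate gap (illustrative only)
observed_years = pd.Series([2000, 2001, 2003])

# Every year between the earliest and latest observed year
full_range = np.arange(observed_years.min(), observed_years.max() + 1)

# Years in the full range that never appear in the data
missing_years = np.setdiff1d(full_range, observed_years.unique())
print(missing_years)
```

An empty result would mean the observed years already form a continuous range.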


Next, take the Cartesian product of these two dataframes to get a new dataframe in which every country is paired with every year.

In [21]:
# Cross merge the two dataframes.

country_year = countries.merge(years, how="cross")
print(country_year)

            Entity Code  Year
0      Afghanistan  AFG  1950
1      Afghanistan  AFG  1951
2      Afghanistan  AFG  1952
3      Afghanistan  AFG  1953
4      Afghanistan  AFG  1954
...            ...  ...   ...
14815     Zimbabwe  ZWE  2021
14816     Zimbabwe  ZWE  2022
14817     Zimbabwe  ZWE  2023
14818     Zimbabwe  ZWE  2024
14819     Zimbabwe  ZWE  2025

[14820 rows x 3 columns]
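
As a quick sanity check, a cross merge should return exactly one row per country-year pair, so its length should equal the product of the two input lengths (here 195 × 76 = 14,820). The sketch below runs the same check on a small made-up example; the toy country names, codes and years are illustrative assumptions, and `how="cross"` assumes pandas 1.2 or later.

```python
import pandas as pd

# Toy inputs (illustrative only, not the project data)
toy_countries = pd.DataFrame(
    {"Entity": ["Country A", "Country B"], "Code": ["AAA", "BBB"]}
)
toy_years = pd.DataFrame({"Year": [2000, 2001, 2002]})

# Cartesian product: every country paired with every year
toy_country_year = toy_countries.merge(toy_years, how="cross")

# One row per (country, year) pair, with no duplicates
expected_rows = len(toy_countries) * len(toy_years)
assert len(toy_country_year) == expected_rows
assert not toy_country_year.duplicated(subset=["Code", "Year"]).any()
print(toy_country_year)
```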


Merge with the legalisation data and replace missing values.

In [27]:
# Merge country_legalisation with country_year to get legal status

country_year_legalisation = country_year.merge(
    country_legalisation,
    on=["Entity", "Code", "Year"],
    how="left"
)
    
print(country_year_legalisation)

            Entity Code  Year Same-sex marriage   Country   Latitude  \
0      Afghanistan  AFG  1950               NaN       NaN        NaN   
1      Afghanistan  AFG  1951               NaN       NaN        NaN   
2      Afghanistan  AFG  1952               NaN       NaN        NaN   
3      Afghanistan  AFG  1953               NaN       NaN        NaN   
4      Afghanistan  AFG  1954               NaN       NaN        NaN   
...            ...  ...   ...               ...       ...        ...   
14815     Zimbabwe  ZWE  2021            Banned  Zimbabwe -18.455496   
14816     Zimbabwe  ZWE  2022            Banned  Zimbabwe -18.455496   
14817     Zimbabwe  ZWE  2023            Banned  Zimbabwe -18.455496   
14818     Zimbabwe  ZWE  2024            Banned  Zimbabwe -18.455496   
14819     Zimbabwe  ZWE  2025            Banned  Zimbabwe -18.455496   

       Longitude  
0            NaN  
1            NaN  
2            NaN  
3            NaN  
4            NaN  
...          ...  
14

There are NaN values in the "Same-sex marriage" column for the country-years that have no data in the source. We will replace them with "Data does not exist". The Latitude and Longitude columns also have missing values, but we do not need to replace them since Plotly uses the Code to create the map.
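
To see how many country-years were left unmatched before filling them, one option is to count the missing values in the status column. This is a minimal sketch on made-up data; the toy codes and the "Legal" status label are assumptions for illustration, not values taken from the project data.

```python
import pandas as pd

# Toy full grid and toy source data (illustrative only)
grid = pd.DataFrame({
    "Code": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2000, 2001, 2000, 2001],
})
source = pd.DataFrame({
    "Code": ["AAA", "BBB", "BBB"],
    "Year": [2001, 2000, 2001],
    "Same-sex marriage": ["Banned", "Legal", "Legal"],
})

# Left merge keeps every grid row; unmatched rows get NaN
merged = grid.merge(source, on=["Code", "Year"], how="left")

# Number of country-years with no status in the source
n_missing = merged["Same-sex marriage"].isna().sum()
print(n_missing)

# Fill only the status column, leaving other columns untouched
merged["Same-sex marriage"] = merged["Same-sex marriage"].fillna(
    "Data does not exist"
)
```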

In [28]:
# Replace missing values in same-sex marriage column.

country_year_legalisation["Same-sex marriage"] = country_year_legalisation["Same-sex marriage"].fillna(
    value="Data does not exist"
)

print(country_year_legalisation)

            Entity Code  Year    Same-sex marriage   Country   Latitude  \
0      Afghanistan  AFG  1950  Data does not exist       NaN        NaN   
1      Afghanistan  AFG  1951  Data does not exist       NaN        NaN   
2      Afghanistan  AFG  1952  Data does not exist       NaN        NaN   
3      Afghanistan  AFG  1953  Data does not exist       NaN        NaN   
4      Afghanistan  AFG  1954  Data does not exist       NaN        NaN   
...            ...  ...   ...                  ...       ...        ...   
14815     Zimbabwe  ZWE  2021               Banned  Zimbabwe -18.455496   
14816     Zimbabwe  ZWE  2022               Banned  Zimbabwe -18.455496   
14817     Zimbabwe  ZWE  2023               Banned  Zimbabwe -18.455496   
14818     Zimbabwe  ZWE  2024               Banned  Zimbabwe -18.455496   
14819     Zimbabwe  ZWE  2025               Banned  Zimbabwe -18.455496   

       Longitude  
0            NaN  
1            NaN  
2            NaN  
3            NaN  
4   

In [29]:
# Save the dataframe to a csv file

country_year_legalisation.to_csv("country_year_legalisation.csv", index=False)
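
Before using the file for the map, it may be worth confirming that every country has a row for every year and that no status value is still missing. The sketch below shows one possible check on a toy dataframe; the column names follow the ones above, but the data itself is made up.

```python
import pandas as pd

# Toy stand-in for the final dataframe (illustrative only)
final = pd.DataFrame({
    "Code": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2000, 2001, 2000, 2001],
    "Same-sex marriage": ["Data does not exist", "Banned", "Legal", "Legal"],
})

# Each country should appear once for every distinct year
years_per_country = final.groupby("Code")["Year"].nunique()
is_complete = bool((years_per_country == final["Year"].nunique()).all())

# No status value should be left missing after the fill
has_no_gaps = bool(final["Same-sex marriage"].notna().all())
print(is_complete, has_no_gaps)
```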