### Project: The Legalisation of Same-Sex Marriage

#### <b> Description </b>
Creating a time slider visual using Python to show the legal status of same-sex marriage in each country over time.

#### <b> Part 2.2 </b>
Unlike Part 2, this part revises the existing data into a full country-year dataframe so that every country stays hoverable.

##### by Sneha Verma

-------------------

To make every country visible and hoverable in every year, the final data source needs a row for every country in every year that appears in the source dataset. Where no data exists on the legalisation of same-sex marriage for a given country and year, the value will be "Data does not exist".
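The same "every country for every year" grid can also be built with `pd.MultiIndex.from_product` and `Series.reindex`. This is an alternative technique, not the one this notebook uses. The minimal sketch below uses hypothetical toy codes (`AAA`, `BBB`). It only carries the status column, so country names and coordinates would still have to be merged back in afterwards. That extra step is one reason the cross-merge approach used here is arguably simpler when several descriptive columns are involved.

```python
import pandas as pd

# Hypothetical toy data: status is only recorded for some country-years.
legal = pd.DataFrame({
    "Code": ["AAA", "AAA", "BBB"],
    "Year": [2001, 2002, 2002],
    "Same-sex marriage": ["Banned", "Legal", "Banned"],
})

# Every (Code, Year) combination of the distinct codes and distinct years.
full_index = pd.MultiIndex.from_product(
    [legal["Code"].unique(), sorted(legal["Year"].unique())],
    names=["Code", "Year"],
)

# Reindex the status onto the full grid; missing combinations get the placeholder.
status = (
    legal.set_index(["Code", "Year"])["Same-sex marriage"]
    .reindex(full_index, fill_value="Data does not exist")
    .reset_index()
)
print(status)
```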

First, create two dataframes: one of all distinct countries (with their codes and coordinates) and one of all distinct years.

In [40]:
# Import packages
import pandas as pd
import numpy as np

In [41]:
# Load country_legalisation_full csv file.

country_legalisation = pd.read_csv("country_legalisation_full.csv")
print(country_legalisation.head())

  Code  Year Same-sex marriage      Country   Latitude  Longitude
0  AFG  1971            Banned  Afghanistan  33.768006  66.238514
1  AFG  1972            Banned  Afghanistan  33.768006  66.238514
2  AFG  1973            Banned  Afghanistan  33.768006  66.238514
3  AFG  1974            Banned  Afghanistan  33.768006  66.238514
4  AFG  1975            Banned  Afghanistan  33.768006  66.238514


In [42]:
# One row per distinct country, with its code and coordinates.
countries = country_legalisation[["Country", "Code", "Latitude", "Longitude"]].drop_duplicates().reset_index(drop=True)

print(countries)

         Country Code   Latitude   Longitude
0    Afghanistan  AFG  33.768006   66.238514
1        Albania  ALB  41.000028   19.999962
2        Algeria  DZA  28.000027    2.999983
3        Andorra  AND  42.540717    1.573203
4         Angola  AGO -11.877577   17.569124
..           ...  ...        ...         ...
190    Venezuela  VEN   8.001871  -66.110932
191      Vietnam  VNM  15.926666  107.965086
192        Yemen  YEM  16.347124   47.891527
193       Zambia  ZMB -14.518912   27.558988
194     Zimbabwe  ZWE -18.455496   29.746841

[195 rows x 4 columns]
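`drop_duplicates` is applied to four columns, so a country whose coordinates differ between rows would appear more than once, and the cross merge would then repeat it for every year. A quick check is whether each `Code` is unique. The sketch below uses hypothetical toy rows in which one country has an inconsistent latitude. On the real dataframe, `countries["Code"].is_unique` returning `True` would confirm the 195 rows are one per country.

```python
import pandas as pd

# Hypothetical toy rows: "Beta" has a different latitude in one of its rows.
rows = pd.DataFrame({
    "Country": ["Alpha", "Alpha", "Beta", "Beta"],
    "Code": ["AAA", "AAA", "BBB", "BBB"],
    "Latitude": [10.0, 10.0, 20.0, 20.5],
    "Longitude": [5.0, 5.0, 6.0, 6.0],
})

countries = rows[["Country", "Code", "Latitude", "Longitude"]].drop_duplicates().reset_index(drop=True)

# Codes that survived drop_duplicates more than once (inconsistent attributes).
duplicated_codes = countries.loc[countries["Code"].duplicated(keep=False), "Code"].unique()
print(countries["Code"].is_unique, list(duplicated_codes))
```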


In [43]:
# One row per distinct year, sorted in ascending order.

years = pd.DataFrame({"Year": np.sort(country_legalisation["Year"].unique())})
print(years)

    Year
0   1950
1   1951
2   1952
3   1953
4   1954
..   ...
71  2021
72  2022
73  2023
74  2024
75  2025

[76 rows x 1 columns]


Next, take the Cartesian product of these two dataframes to get a new dataframe that pairs every country with every year.

In [44]:
# Cross merge the two dataframes.

country_year = countries.merge(years, how="cross")
print(country_year)

           Country Code   Latitude  Longitude  Year
0      Afghanistan  AFG  33.768006  66.238514  1950
1      Afghanistan  AFG  33.768006  66.238514  1951
2      Afghanistan  AFG  33.768006  66.238514  1952
3      Afghanistan  AFG  33.768006  66.238514  1953
4      Afghanistan  AFG  33.768006  66.238514  1954
...            ...  ...        ...        ...   ...
14815     Zimbabwe  ZWE -18.455496  29.746841  2021
14816     Zimbabwe  ZWE -18.455496  29.746841  2022
14817     Zimbabwe  ZWE -18.455496  29.746841  2023
14818     Zimbabwe  ZWE -18.455496  29.746841  2024
14819     Zimbabwe  ZWE -18.455496  29.746841  2025

[14820 rows x 5 columns]
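To illustrate what `how="cross"` does, here is a small sketch with hypothetical toy inputs (two made-up countries and three years). Each left row is paired with every right row, so the result always has `len(left) * len(right)` rows. For the real data that is 195 × 76 = 14,820, matching the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy inputs: two countries and three years.
countries = pd.DataFrame({"Country": ["Alpha", "Beta"], "Code": ["AAA", "BBB"]})
years = pd.DataFrame({"Year": np.arange(2000, 2003)})

# how="cross" pairs every row of the left frame with every row of the right frame.
grid = countries.merge(years, how="cross")
print(grid)

# The Cartesian product has len(left) * len(right) rows.
assert len(grid) == len(countries) * len(years)
```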


Merge with the legalisation data and replace missing values.

In [47]:
# Left-merge country_legalisation onto country_year to get the legal status.

country_year_legalisation = country_year.merge(
    country_legalisation,
    on=["Country", "Code", "Year", "Latitude", "Longitude"],
    how="left"
)

print(country_year_legalisation)

           Country Code   Latitude  Longitude  Year Same-sex marriage
0      Afghanistan  AFG  33.768006  66.238514  1950               NaN
1      Afghanistan  AFG  33.768006  66.238514  1951               NaN
2      Afghanistan  AFG  33.768006  66.238514  1952               NaN
3      Afghanistan  AFG  33.768006  66.238514  1953               NaN
4      Afghanistan  AFG  33.768006  66.238514  1954               NaN
...            ...  ...        ...        ...   ...               ...
14815     Zimbabwe  ZWE -18.455496  29.746841  2021            Banned
14816     Zimbabwe  ZWE -18.455496  29.746841  2022            Banned
14817     Zimbabwe  ZWE -18.455496  29.746841  2023            Banned
14818     Zimbabwe  ZWE -18.455496  29.746841  2024            Banned
14819     Zimbabwe  ZWE -18.455496  29.746841  2025            Banned

[14820 rows x 6 columns]
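The row count is still 14,820 after the merge. This holds only because each key appears at most once in `country_legalisation`. `DataFrame.merge` can enforce that with `validate` and can report which rows matched with `indicator`. The sketch below uses hypothetical toy frames and, for brevity, joins on `Code` and `Year` only. That narrower key is an assumption of the sketch, not the notebook's five-column key.

```python
import pandas as pd

# Hypothetical toy grid and sparse status table.
grid = pd.DataFrame({"Code": ["AAA", "AAA", "BBB", "BBB"], "Year": [2000, 2001, 2000, 2001]})
legal = pd.DataFrame({"Code": ["AAA", "BBB"], "Year": [2001, 2001], "Same-sex marriage": ["Legal", "Banned"]})

merged = grid.merge(
    legal,
    on=["Code", "Year"],
    how="left",
    validate="one_to_one",  # raises MergeError if either side has duplicate keys
    indicator=True,         # adds a "_merge" column: "both" or "left_only" here
)
print(merged["_merge"].value_counts())

# A left merge with unique keys on both sides keeps exactly the grid's row count.
assert len(merged) == len(grid)
```

The `left_only` rows are exactly the country-years that end up as NaN and need a placeholder.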


The `Same-sex marriage` column contains NaN values for the country-years with no data. Replace them with "Data does not exist".

In [48]:
# Replace missing values in the Same-sex marriage column.

country_year_legalisation["Same-sex marriage"] = country_year_legalisation["Same-sex marriage"].fillna(
    value="Data does not exist"
)

print(country_year_legalisation)

           Country Code   Latitude  Longitude  Year    Same-sex marriage
0      Afghanistan  AFG  33.768006  66.238514  1950  Data does not exist
1      Afghanistan  AFG  33.768006  66.238514  1951  Data does not exist
2      Afghanistan  AFG  33.768006  66.238514  1952  Data does not exist
3      Afghanistan  AFG  33.768006  66.238514  1953  Data does not exist
4      Afghanistan  AFG  33.768006  66.238514  1954  Data does not exist
...            ...  ...        ...        ...   ...                  ...
14815     Zimbabwe  ZWE -18.455496  29.746841  2021               Banned
14816     Zimbabwe  ZWE -18.455496  29.746841  2022               Banned
14817     Zimbabwe  ZWE -18.455496  29.746841  2023               Banned
14818     Zimbabwe  ZWE -18.455496  29.746841  2024               Banned
14819     Zimbabwe  ZWE -18.455496  29.746841  2025               Banned

[14820 rows x 6 columns]
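A simple check after `fillna` is that no NaN remains and that the number of placeholders equals the number of NaNs before filling, so existing statuses were left alone. The sketch below runs this on a hypothetical toy frame. The same two assertions could be applied to `country_year_legalisation`.

```python
import numpy as np
import pandas as pd

# Hypothetical merged result with two missing statuses.
merged = pd.DataFrame({
    "Code": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2000, 2001, 2000, 2001],
    "Same-sex marriage": [np.nan, "Legal", np.nan, "Banned"],
})

n_missing = merged["Same-sex marriage"].isna().sum()
merged["Same-sex marriage"] = merged["Same-sex marriage"].fillna("Data does not exist")

# Every former NaN is now the placeholder; existing statuses are untouched.
assert merged["Same-sex marriage"].isna().sum() == 0
assert (merged["Same-sex marriage"] == "Data does not exist").sum() == n_missing
print(merged["Same-sex marriage"].value_counts())
```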


In [49]:
# Save the dataframe to a CSV file.

country_year_legalisation.to_csv("country_year_legalisation.csv", index=False)
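The next step will presumably read this CSV back in. The placeholder survives a round trip as a string because "Data does not exist" is not one of `read_csv`'s default NA markers. Markers such as "NA", "N/A" or an empty field would come back as NaN. The sketch below writes a hypothetical toy frame to an in-memory buffer (`io.StringIO`) instead of a file and compares the reloaded result.

```python
import io

import pandas as pd

# Hypothetical final frame (stand-in for country_year_legalisation).
final = pd.DataFrame({
    "Country": ["Alpha", "Alpha"],
    "Code": ["AAA", "AAA"],
    "Latitude": [10.5, 10.5],
    "Longitude": [-5.25, -5.25],
    "Year": [2000, 2001],
    "Same-sex marriage": ["Data does not exist", "Legal"],
})

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
final.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# The placeholder is not a default NA marker, so it is read back as a string.
assert reloaded["Same-sex marriage"].isna().sum() == 0
pd.testing.assert_frame_equal(reloaded, final)
```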