# MIP Merge

This script merges the different data sets (the MUP ownership data and the MUP MIP panels on the owners and on the owned companies), turns them into panels, and cleans the individual columns.

In [1]:
import pandas as pd
import numpy as np

Load the data into DataFrames

In [None]:
df_ownership = pd.read_csv(r"C:\Users\lucas\OneDrive\BA\Data\Ownership_Change\MUPOwn.csv", encoding="ISO-8859-1")
df_companies = pd.read_csv(r"C:\Users\lucas\OneDrive\BA\Data\Ownership_Change\MUPMIP_panel_owned.csv", encoding="ISO-8859-1")
df_owners = pd.read_csv(r"C:\Users\lucas\OneDrive\BA\Data\Ownership_Change\MUPMIP_panel_owner.csv", encoding="ISO-8859-1")

The flag `b_is_main_owner` separates majority from minority shareholders. A majority shareholder is an owner of at least 50% of the equity. Where there is no information on the percentage owned, only owners with one of the following "characteristics" (German: Eigenschaft) are considered majority owners: "Owner" (Inhaber), "Shareholder" (Gesellschafter), "Limited Partner" (Kommanditist), "General Partner" (Komplementär), and "Majority Shareholder" (Hauptaktionär).

In [3]:
# Majority owner: share >= 50%, or share missing (assumed to be NaN) and a majority-type role.
main_roles = "Inhaber|Gesellschafter|Kommanditist|Komplementär|Hauptaktionär"
has_main_role = df_ownership["b_eigenschaft"].str.contains(main_roles, regex=True, na=False)
df_ownership["b_is_main_owner"] = (df_ownership["b_anteil"] >= 50) | (df_ownership["b_anteil"].isna() & has_main_role)
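To make the rule concrete, here is a minimal sketch on invented toy rows. It assumes that a missing share is stored as `NaN` in `b_anteil`, which the real data may not satisfy. Note that each comparison is wrapped in parentheses, because `|` and `&` bind more tightly than `>=` in Python.

```python
import numpy as np
import pandas as pd

# Toy rows; the values are invented for illustration only.
toy = pd.DataFrame({
    "b_anteil": [75.0, 20.0, np.nan, np.nan],
    "b_eigenschaft": ["Gesellschafter", "Gesellschafter", "Inhaber", "Prokurist"],
})

roles = "Inhaber|Gesellschafter|Kommanditist|Komplementär|Hauptaktionär"
# na=False treats a missing role as "no majority-type role".
has_role = toy["b_eigenschaft"].str.contains(roles, regex=True, na=False)

# Known share: decide by the 50% threshold. Missing share: decide by the role.
toy["b_is_main_owner"] = (toy["b_anteil"] >= 50) | (toy["b_anteil"].isna() & has_role)
flags = toy["b_is_main_owner"].tolist()
print(flags)
```

The second row shows that a known share below 50% gives a minority flag even when the role is "Gesellschafter"; the role only decides when the share is missing.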

Specify the start and end year of each participation as a first step towards turning the ownership data into a panel. Entries with neither a start nor an end date are assigned all years with observations in the MIP data set (first year 1993, last year 2021): a missing start date is set to 1993, and the end date is set to 2023 for all participations that did not end within the observation period or for which there is no information, so that 2021 falls within the start-to-end range.

In [46]:
df_ownership["b_start_year"] = df_ownership["b_beginn"].astype(str).str[:4]
df_ownership["b_end_year"] = df_ownership["b_ende"].astype(str).str[:4]
df_ownership["b_start_year"] = np.where(df_ownership["b_start_year"] == "0.0", 1993, df_ownership["b_start_year"])
df_ownership["b_start_year"] = np.where(df_ownership["b_start_year"] == "nan", 1993, df_ownership["b_start_year"])
df_ownership["b_end_year"] = np.where(df_ownership["b_end_year"] == "0.0", 2023, df_ownership["b_end_year"])
df_ownership["b_end_year"] = np.where(df_ownership["b_end_year"] == "nan", 2023, df_ownership["b_end_year"])

Parse `b_start_year` and `b_end_year` to integers

In [47]:
df_ownership["b_start_year"] = pd.to_numeric(df_ownership["b_start_year"], downcast="integer")
df_ownership["b_end_year"] = pd.to_numeric(df_ownership["b_end_year"], downcast="integer")
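The two steps above can be sketched on toy data as follows. The sketch assumes that the dates are stored as floats in `YYYYMMDD` form and that a missing date appears as either `0.0` or `NaN`; the helper `to_year` is hypothetical and not part of the original script.

```python
import numpy as np
import pandas as pd

# Toy dates; the YYYYMMDD float format is an assumption about the real columns.
toy = pd.DataFrame({
    "b_beginn": [20050101.0, 0.0, np.nan],
    "b_ende": [20101231.0, np.nan, 0.0],
})

def to_year(col, fallback):
    """Take the first four characters as the year; use `fallback` where the date is missing."""
    missing = col.isna() | (col == 0)
    year = col.astype(str).str[:4]
    year = year.where(~missing, str(fallback))
    return pd.to_numeric(year)

start = to_year(toy["b_beginn"], 1993)  # missing start: first MIP year
end = to_year(toy["b_ende"], 2023)      # missing end: a year after the last MIP year
print(start.tolist(), end.tolist())
```

Detecting missing values on the numeric column, rather than by comparing the strings `"0.0"` and `"nan"`, keeps the sketch independent of how a given pandas version renders `NaN` as text.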

The array `survey_years` contains all years with sample data for the companies in the MIP panel. For each survey year, a dummy variable marks whether the participation was active in that year, so that the ownership data frame can later be transformed into a panel.

In [49]:
survey_years = np.unique(df_companies["smpljahr"])
for year in survey_years:
    # True if the participation covers this survey year (the end year itself is excluded)
    df_ownership[str(year)] = (df_ownership["b_start_year"] <= year) & (df_ownership["b_end_year"] > year)
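As an illustration of where the dummies lead, the sketch below builds them for two toy participations and then reshapes them into one row per owner and active year. The column `owner_id`, the shortened list of years, and the `melt` step are assumptions made for this example; the reshaping used later in the script may differ.

```python
import numpy as np
import pandas as pd

# Two toy participations; owner_id is a hypothetical identifier.
toy = pd.DataFrame({
    "owner_id": [1, 2],
    "b_start_year": [1993, 2005],
    "b_end_year": [2000, 2023],
})
survey_years = np.array([1993, 1999, 2000, 2021])

# One boolean column per survey year; the end year itself does not count as active.
for year in survey_years:
    toy[str(year)] = (toy["b_start_year"] <= year) & (toy["b_end_year"] > year)

# Wide to long: keep only the (owner, year) pairs where the participation was active.
year_cols = [str(y) for y in survey_years]
long = toy.melt(id_vars=["owner_id"], value_vars=year_cols,
                var_name="smpljahr", value_name="active")
panel = long[long["active"]].drop(columns="active")
panel["smpljahr"] = panel["smpljahr"].astype(int)
panel = panel.sort_values(["owner_id", "smpljahr"]).reset_index(drop=True)

pairs = list(zip(panel["owner_id"].tolist(), panel["smpljahr"].tolist()))
print(pairs)
```

Owner 1 is not active in 2000 because the comparison on `b_end_year` is strict, which is also why the placeholder end year has to lie after 2021.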

In [50]:
df_ownership.to_csv(r"C:\Users\lucas\OneDrive\BA\Data\test_ownership.csv")