### Description
Author: T. Majidzadeh

Date Created: April 4, 2025

Date Updated: April 4, 2025

Purpose: Produce visualizations and explanations for the ARIMA forecasting and synthetic control analysis of the RealPage-LRO merger.

In [2]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import altair as alt
alt.data_transformers.disable_max_rows()
from datetime import datetime
import json
import os
import re

#### Setup

In [4]:
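# Build paths with os.path.join so they resolve on any OS, not only Windows.
# The trailing "" keeps the final separator, so file names can still be appended directly.
paths = {
    "data": os.path.join("..", "data", ""),
    "zillow_raw": os.path.join("..", "data", "zillow_data_raw", ""),
    "zillow_reg": os.path.join("..", "data", "zillow_reg_data", ""),
    "reg_results": os.path.join("..", "data", "reg_results", ""),
    "vis_output": os.path.join("..", "visualization", "")
}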

In [5]:
# Assumes "affected" metros are those with at least 35% post-merger penetration rate and at least 10% share gain from the merger.

msas = {
    "Atlanta, GA" : 1,
    "Dallas, TX" : 1,
    "Phoenix, AZ" : 1,
    "Denver, CO" : 1,
    "Tampa, FL" : 1,
    "Washington, DC" : 1,
    
    "Houston, TX" : 0,
    "Riverside, CA" : 0,
    "Las Vegas, NV" : 0,
    "Seattle, WA" : 0,
    "Philadelphia, PA" : 0,
    "Boston, MA" : 0,
    "Minneapolis, MN" : 0,
    "San Diego, CA" : 0,
    "Miami, FL" : 0,
    "San Francisco, CA" : 0,
    "Chicago, IL" : 0,
    "Detroit, MI" : 0,
    "Los Angeles, CA" : 0,
    "New York, NY" : 0
}

affected_msas = [
    "Atlanta, GA",
    "Dallas, TX",
    "Phoenix, AZ",
    "Denver, CO",
    "Tampa, FL",
    "Washington, DC"
]

In [6]:
zillow_reg = pd.read_pickle(paths['zillow_reg']+'zillow_data_reg_counterfactual_20250404.pkl')
zillow_reg['Year-Month'] = pd.to_datetime(zillow_reg['Year-Month'])
shares_analysis = pd.read_excel(paths['data']+'multifamily_shares_calculations.xlsx', sheet_name='rp_shares_all_units')
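# Load the estimated treatment effects (ATT) for each affected city and stack them.
att_dfs, att_dfs_list = {}, []
for city in affected_msas:
    att_dfs[city] = pd.read_csv(paths['reg_results']+f'syn_alt_con_2015-2022_att_{city}.csv')
    att_dfs[city].rename(columns={'Time':'TimeTrend', 'Estimate':'syn_att'}, inplace=True)
    att_dfs[city]['RegionName'] = city
    att_dfs_list.append(att_dfs[city])
syn_alt_con_atts = pd.concat(att_dfs_list)
# Left join keeps every city-month; syn_att is NaN where no ATT estimate matches.
zillow_reg = pd.merge(
    zillow_reg,
    syn_alt_con_atts,
    on = ['RegionName', 'TimeTrend'],
    how = 'left'
)
# Recover the synthetic control level by subtracting the estimated effect from the adjusted index.
zillow_reg['Synthetic Control'] = zillow_reg['ZORI'] - zillow_reg['syn_att']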
zillow_reg.rename(columns={
    'RegionName': 'City Name', 
    'Year-Month': 'Date',
    'ZORIOrig': 'Original Zillow Rent Index', 
    'ZORI': 'COVID-Adjusted Rent Index'
}, inplace=True)

In [7]:
zillow_reg_melt = pd.melt(zillow_reg, id_vars = ['City Name', 'Date'], value_vars = ['Original Zillow Rent Index', 'COVID-Adjusted Rent Index', 'Synthetic Control'], var_name='Index Type', value_name='Index Value')

#### It is Unclear Whether this Rent Impact is Due to Price Coordination
Our predictive models suggest that RealPage usage can predict higher rents. But can we test whether this correlation arises *because of* effects similar to price-fixing, rather than other factors such as efficiency?

In December 2017, RealPage completed an acquisition of Lease Rent Options (LRO), its primary competitor in the market for rental pricing algorithms. Did this acquisition lead to higher rents in the housing market? If so, price-coordinating effects are a plausible explanation, because no landlords actually merged; the only merger was between providers of pricing algorithms. Market power would only change if landlords using one of RealPage's products (post-merger) were coordinating via the algorithms.

However, we do not see a consistent effect. Our data therefore cannot establish whether price coordination is the root cause of RealPage's rent impact.

#### Step 1: Demonstrate ARIMA Modeling
The COVID pandemic of 2020 had a significant effect on the housing market that varied across cities and over time. To test the effect of a 2017 merger, we start by adjusting for the effect of COVID in 2020 and beyond. We train an ARIMA forecasting model from December 2017 to December 2019, then use it to forecast three years of housing market data without the effect of COVID.
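
To make the idea of a COVID-free forecast concrete, here is a minimal sketch on synthetic data. It is not the model used to produce the adjusted index in this notebook: it assumes an ARIMA(1,1,0) with drift fitted by conditional least squares in NumPy, and the function names `fit_arima_110` and `forecast_arima_110`, the window lengths, and the simulated rent series are all hypothetical. A fuller treatment would likely use the `ARIMA` class in `statsmodels.tsa.arima.model`.

```python
import numpy as np

def fit_arima_110(series):
    """Fit ARIMA(1,1,0) with drift by conditional least squares.

    On first differences d_t = y_t - y_{t-1}, the model is
        d_t = c + phi * d_{t-1} + e_t
    """
    d = np.diff(np.asarray(series, dtype=float))
    # Regress each difference on a constant and the previous difference.
    X = np.column_stack([np.ones(len(d) - 1), d[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, d[1:], rcond=None)
    return c, phi

def forecast_arima_110(series, c, phi, steps):
    """Iterate the fitted difference equation forward to get level forecasts."""
    series = np.asarray(series, dtype=float)
    level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predicted month-over-month change
        level = level + last_diff         # accumulate back to the rent level
        forecasts.append(level)
    return np.array(forecasts)

# Synthetic pre-COVID rent index (illustrative only, not Zillow data):
# 60 months starting at 1500 and rising about 5 per month with noise.
rng = np.random.default_rng(0)
months_pre = 60
monthly_changes = 5.0 + rng.normal(0.0, 1.0, size=months_pre - 1)
pre_covid = 1500.0 + np.concatenate([[0.0], np.cumsum(monthly_changes)])

# Train on pre-COVID months only, then forecast 36 months ahead.
c, phi = fit_arima_110(pre_covid)
counterfactual = forecast_arima_110(pre_covid, c, phi, steps=36)
```

In this sketch, `counterfactual` plays the role of the adjusted index after the training window: it extends the pre-2020 trend and deliberately ignores anything observed afterwards.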

In [10]:
arima = alt.Chart(zillow_reg_melt).mark_line().encode(
    x = 'Date',
    y = alt.Y('Index Value:Q',scale=alt.Scale(domainMin=750, domainMax=3000)),
    color = alt.Color(
        'Index Type:N',
        scale = alt.Scale(
            domain = ['Original Zillow Rent Index', 'COVID-Adjusted Rent Index'],
            range = ['#5eb4ec', '#e1b0c0']
        )
    )
)

# A dropdown filter
city_dropdown = alt.binding_select(options=list(msas.keys()), name="City Name")
city_select = alt.selection_point(fields=['City Name'], bind=city_dropdown, value="Washington, DC")

filter_cities = arima.add_params(
    city_select
).transform_filter(
    city_select
).properties(title={'text':"Zillow Index and COVID Adjustment", 'subtitle':"2015 - 2022", 'anchor':'start'})

rules = alt.Chart(pd.DataFrame({
  'Date': ['2020-01-01'],
  'Event': ['2020 COVID Pandemic']
})).mark_rule().encode(
    x='Date:T',
    opacity='Event:N'
)

chart=alt.layer(filter_cities, rules)
chart.save(paths['vis_output']+'covid_adjustment.html')
chart


#### Step 2: Explain Treatment and Control Selection
Next, we need to select a treatment group and a control group. If this merger has an effect, it should be largest in cities with both a high RealPage penetration rate and a high share gain from the merger (and smallest in cities with low penetration rates and share gains).

We estimate RealPage's share of multifamily rental units based on their own disclosed share of *all* rental units for 20 select cities in May 2023, the count of total units according to the 2023 American Community Survey, and the share of units that are multifamily from the same survey. We select the cities with the highest share (Atlanta, Dallas, Phoenix, Denver, Tampa, and Washington, DC) as the treatment group, and the cities with the lowest share (Minneapolis, San Diego, Miami, San Francisco, Chicago, Detroit, Los Angeles, and New York) as the control group.
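
The thresholds noted in the setup cell (at least 35% post-merger penetration and at least 10% share gain from the merger) can be applied mechanically. The sketch below is only an illustration under stated assumptions: the cities and numbers are made up, post-merger penetration is assumed to be the sum of the AIRM/YieldStar and LRO shares, and the share gain is assumed to equal the LRO share. Only the column names `metro_area`, `airm_share_multifamily`, and `lro_share_multifamily` come from the spreadsheet used in this notebook.

```python
import pandas as pd

# Made-up shares for four hypothetical cities (not the notebook's data).
example_shares = pd.DataFrame({
    "metro_area": ["City A", "City B", "City C", "City D"],
    "airm_share_multifamily": [0.30, 0.40, 0.10, 0.28],
    "lro_share_multifamily": [0.15, 0.05, 0.02, 0.06],
})

MIN_PENETRATION = 0.35  # minimum combined post-merger penetration rate
MIN_SHARE_GAIN = 0.10   # minimum share gained through the merger

# Assumption: combined penetration is the sum of the two platforms' shares.
example_shares["post_merger_penetration"] = (
    example_shares["airm_share_multifamily"]
    + example_shares["lro_share_multifamily"]
)

# Assumption: the share gained in the merger is the LRO share.
example_shares["affected"] = (
    (example_shares["post_merger_penetration"] >= MIN_PENETRATION)
    & (example_shares["lro_share_multifamily"] >= MIN_SHARE_GAIN)
).astype(int)

# Cities flagged 1 would form the treatment group.
example_affected = example_shares.loc[
    example_shares["affected"] == 1, "metro_area"
].tolist()
```

Requiring both conditions matters: in this toy table, City B has high penetration but gained little from the merger, so it is not flagged.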

In [12]:
shares_analysis.rename(columns={
    "metro_area":"City Name",
    "airm_share_multifamily":"AIRM/YieldStar (RealPage) Share",
    "lro_share_multifamily":"Lease Rent Options (LRO) Share"
}, inplace=True)
shares_analysis = pd.melt(shares_analysis, id_vars=["City Name"], value_vars=["AIRM/YieldStar (RealPage) Share", "Lease Rent Options (LRO) Share"], var_name="Platform", value_name="Penetration Rate")

In [13]:
shares = alt.Chart(shares_analysis).mark_bar().encode(
    x=alt.X("City Name:N", sort={'encoding':'y', 'order':'descending'}),
    y=alt.Y("Penetration Rate:Q").axis(format="%"),
    color=alt.Color(
        "Platform",
        scale=alt.Scale(
            domain = ["AIRM/YieldStar (RealPage) Share", "Lease Rent Options (LRO) Share"],
            range = ['#5eb4ec', '#e1b0c0']   
        )
    ),
    order=alt.Order(
            'Platform:N',
            sort='ascending'
    )
).properties(title={'text': "RealPage's Multifamily Penetration Rate by City", 'subtitle':"May 2023", 'anchor': 'start'})
# Save the chart, then display it once.
shares.save(paths['vis_output']+'realpage_lro_shares_2023-05.html')
shares


#### Step 3: Compare Treatment to Control
For each of the six __treatment__ cities, we use a machine learning approach called the "Synthetic Control" method to predict the treatment city's pre-merger rent level using only the __control cities__' rent levels. We make this prediction model as accurate as possible using only __pre-merger__ data, then use that model to predict the __post-merger__ rent levels. 

In essence, the Synthetic Control is a prediction of what a city's rent *would* look like if the combined market share were much lower there. If a city's rent is much higher than its Synthetic Control predicted level, this suggests that the 2017 RealPage-LRO merger may be causing higher rent. Conversely, if the city's rent is much lower than its Synthetic Control predicted level, this suggests that the merger may be causing lower rent.

However, the differences are small and generally inconsistent across cities. In Phoenix, this method predicts higher rents, but in Atlanta and Dallas, it predicts lower rents. In the remaining three cities, there is no obvious difference. Furthermore, the divergence for some cities begins in 2020, which suggests it may stem from imperfections in the COVID adjustment rather than from the merger. Because of this inconsistency, we cannot conclusively say whether RealPage's rent impact is a result of price coordination.
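
The mechanics of the comparison can be sketched on synthetic data. This is not the estimator that produced the ATT files loaded earlier: it assumes the classic formulation in which the control weights are non-negative and sum to one, fitted here by projected gradient descent in NumPy. The helper names, window lengths, and simulated rents are all hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    cumulative = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (cumulative - 1.0) / k > 0)[0][-1]
    theta = (cumulative[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_synthetic_weights(controls_pre, treated_pre, n_iter=5000):
    """Minimize ||controls_pre @ w - treated_pre||^2 over the simplex."""
    n_controls = controls_pre.shape[1]
    w = np.full(n_controls, 1.0 / n_controls)  # start from equal weights
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(controls_pre, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * controls_pre.T @ (controls_pre @ w - treated_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Simulated monthly rents (illustrative only): 8 control cities and
# 1 treated city, with 36 pre-merger and 48 post-merger months.
rng = np.random.default_rng(1)
n_pre, n_post, n_controls = 36, 48, 8
trend = np.linspace(1200.0, 1800.0, n_pre + n_post)
controls = (
    trend[:, None] * rng.uniform(0.8, 1.2, n_controls)
    + rng.normal(0.0, 10.0, (n_pre + n_post, n_controls))
)
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])
treated = controls @ true_w + rng.normal(0.0, 5.0, n_pre + n_post)

# Fit weights on pre-merger months only, then predict every month.
weights = fit_synthetic_weights(controls[:n_pre], treated[:n_pre])
synthetic = controls @ weights

# Estimated effect: actual rent minus synthetic control, post-merger.
att = treated[n_pre:] - synthetic[n_pre:]
```

The simulated treated city has no merger effect built in, so `att` here reflects only noise and fitting error. The sign convention matches the setup cell, where the synthetic control level is the adjusted index minus the estimated effect.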

In [15]:
syn = alt.Chart(zillow_reg_melt).mark_line().encode(
    x = 'Date',
    y = alt.Y('Index Value:Q',scale=alt.Scale(domainMin=750, domainMax=3000)),
    color = alt.Color(
        'Index Type:N',
        scale = alt.Scale(
            domain = ['COVID-Adjusted Rent Index', 'Synthetic Control'],
            range = ['#5eb4ec', '#e1b0c0']
        )
    )
)

# A dropdown filter
city_dropdown = alt.binding_select(options=list(affected_msas), name="City Name")
city_select = alt.selection_point(fields=['City Name'], bind=city_dropdown, value="Washington, DC")

filter_cities = syn.add_params(
    city_select
).transform_filter(
    city_select
).properties(title={'text':"High-Share Cities and their Synthetic Controls", 'subtitle':"2015 - 2022", 'anchor':'start'})

rules = alt.Chart(pd.DataFrame({
  'Date': ['2017-12-31', '2020-01-01'],
  'Event': ['RealPage-LRO Merger', '2020 COVID Pandemic']
})).mark_rule().encode(
    x='Date:T',
    opacity='Event:N'
)

chart=alt.layer(filter_cities, rules)
chart.save(paths['vis_output']+'synthetic_control.html')
chart

