---
title: |
  | Earnings Management and Investor Protection:
  | Accounting Reading Group - Assignment III\vspace{1cm}
author:
  - name: Melisa Mazaeva
    email: melisa.mazaeva@student.hu-berlin.de
    affiliations:
      - Humboldt-Universität zu Berlin  
date: today
date-format: MMM D, YYYY [\vspace{1cm}]
abstract: |
  | This project uses the TRR 266 Template for Reproducible Empirical Accounting Research (TREAT) to provide an infrastructure for open science-oriented empirical projects. Using external Worldscope financial data, the repository showcases a reproducible workflow that integrates Python scripts for data analysis. The project replicates and extends the findings of the seminal paper by Leuz, Nanda, and Wysocki (2003), in particular by providing descriptive statistics for the four individual earnings management measures and for the aggregate earnings management score across countries. In doing so, it documents and discusses the research design choices made and the differences between the original and the reproduced results. This code base, adapted from TREAT, illustrates how the template can be used for this specific project and how a reproducible empirical project can be structured.
  | \vspace{6cm}
bibliography: references.bib
biblio-style: apsr
format:
  pdf:
    documentclass: article
    number-sections: true
    toc: false
fig_caption: yes
fontsize: 11pt
indent: true
always_allow_html: yes
number-sections: true 
header-includes:
  - \usepackage[nolists]{endfloat}    
  - \usepackage{setspace}\doublespacing
  - \setlength{\parindent}{4em}
  - \setlength{\parskip}{0em}
  - \usepackage[hang,flushmargin]{footmisc}
  - \usepackage{caption} 
  - \captionsetup[table]{skip=24pt,font=bf}
  - \usepackage{array}
  - \usepackage{threeparttable}
  - \usepackage{adjustbox}
  - \usepackage{graphicx}
  - \usepackage{csquotes}
  - \usepackage{indentfirst}  # Added this line to ensure the first paragraph is indented for better readability
  - \usepackage[margin=1in]{geometry}
---


\pagebreak


During data preparation, 8,265 firms and 20,521 firm-year observations were dropped, all of them in the second filtering step (the requirement of at least three consecutive years of income statement and balance sheet information). The final dataset contains 18,040 firms and 123,469 firm-year observations. The differences between these figures and those reported in the paper (70,955 firm-year observations and 8,616 non-financial firms) may stem from the assumptions listed in @sec-research_design_assumptions, such as differences in the initial datasets, subsequent data updates, and filtering criteria. In addition, the original study may have applied data-cleaning steps that it does not describe explicitly, such as outlier treatment, specific industry exclusions, or other criteria, which would also affect the final counts.
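
The two filtering steps can be illustrated with the following minimal sketch. It is not the project's actual `filter_countries()` and `filter_firms()`: the `_sketch` function names, the `key_vars` argument, and the rule that a firm-year counts towards the 300-observation threshold only when all key variables are non-missing are assumptions. The column names `item6026` (country), `item6105` (firm identifier) and `year_` are taken from the preparation code below. The firm filter only checks that the observed fiscal years are consecutive; it does not verify that income statement and balance sheet items are present in each of those years.

```python
import pandas as pd


def filter_countries_sketch(df: pd.DataFrame, key_vars: list,
                            country_col: str = "item6026",
                            min_obs: int = 300) -> tuple:
    """Keep countries with at least `min_obs` firm-years in which all `key_vars` are non-missing.

    Hypothetical helper. Returns the filtered data and a sorted list of eliminated countries.
    """
    complete = df[key_vars].notna().all(axis=1)
    # Number of complete firm-year observations per country
    counts = complete.groupby(df[country_col]).sum()
    kept = counts[counts >= min_obs].index
    eliminated = sorted(set(counts.index) - set(kept))
    return df[df[country_col].isin(kept)], eliminated


def has_min_consecutive_years(years: pd.Series, min_run: int = 3) -> bool:
    """Return True if the observed fiscal years contain `min_run` consecutive years."""
    longest = current = 0
    previous = None
    for year in sorted({int(y) for y in years.dropna()}):
        # Extend the run if this year directly follows the previous one, else restart it
        current = current + 1 if previous is not None and year == previous + 1 else 1
        longest = max(longest, current)
        previous = year
    return longest >= min_run


def filter_firms_sketch(df: pd.DataFrame, firm_col: str = "item6105",
                        year_col: str = "year_", min_run: int = 3) -> pd.DataFrame:
    """Keep only firms whose observed years include a run of `min_run` consecutive years (hypothetical helper)."""
    qualifying = [firm for firm, years in df.groupby(firm_col)[year_col]
                  if has_min_consecutive_years(years, min_run)]
    return df[df[firm_col].isin(qualifying)]
```

Applying the country filter first and the firm filter second mirrors the order used in the preparation code, so observations dropped by the firm filter are counted against the country-filtered sample.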

Because the number of observations in this project is substantially higher than in @Leuz_2003, Table 1 of @Leuz_2003 was partially replicated (only the columns on countries and firm-year observations) to identify discrepancies that may arise for specific countries. The following table presents the partially replicated Table 1.


```{python}
#| label: table1
#| echo: false
#| output: true

import pandas as pd
from utils import read_config, setup_logging

# filter_countries() and filter_firms() are not defined in this cell; they
# must already be in scope (e.g., from the project's data-preparation code).

log = setup_logging()


def main():
    log.info("Preparing data for analysis ...")
    cfg = read_config('config/prepare_data_cfg.yaml')

    # Load the pulled Worldscope data
    wrds_data = pd.read_csv(cfg['worldscope_sample_save_path'])
    initial_obs_count_pulled = len(wrds_data)
    initial_firm_count_pulled = len(wrds_data['item6105'].unique())
    log.info(f"Initial number of observations after pulling data: {initial_obs_count_pulled}")
    log.info(f"Initial number of firms after pulling data: {initial_firm_count_pulled}")

    # Check for duplicate firm-year observations and keep the first occurrence
    dup_obs = wrds_data[wrds_data.duplicated(subset=['item6105', 'year_'], keep=False)]
    if not dup_obs.empty:
        log.warning(f"Found {dup_obs.shape[0]} duplicate firm-year observations. Removing duplicates.")
        wrds_data = wrds_data.drop_duplicates(subset=['item6105', 'year_'], keep='first')

    # Keep countries with at least 300 firm-year observations for key accounting variables, as in the paper
    filtered_countries_data, eliminated_countries = filter_countries(wrds_data)

    # Log eliminated countries
    if eliminated_countries:
        log.info(f"Countries eliminated after filtration: {', '.join(eliminated_countries)}")
    else:
        log.info("No countries were eliminated after filtration.")

    # Keep firms with at least three consecutive years of income statement
    # and balance sheet information, as in the paper
    initial_firm_count = len(filtered_countries_data['item6105'].unique())
    initial_obs_count = len(filtered_countries_data)

    filtered_firms_data = filter_firms(filtered_countries_data)

    final_firm_count = len(filtered_firms_data['item6105'].unique())
    final_obs_count = len(filtered_firms_data)

    firms_dropped = initial_firm_count - final_firm_count
    obs_dropped = initial_obs_count - final_obs_count

    log.info(f"Firms dropped after filtration: {firms_dropped}")
    log.info(f"Firm-year observations dropped after filtration: {obs_dropped}")

    log.info(f"Number of observations after preparation: {final_obs_count}")
    log.info(f"Number of firms after preparation: {final_firm_count}")

    # Save the filtered dataset
    filtered_firms_data.to_csv(cfg['prepared_data_save_path'], index=False)

    log.info("Preparing data for analysis ... Done!")

    # Summary table: number of firm-year observations per country (item6026)
    summary_table = filtered_firms_data.groupby('item6026').size().reset_index(name='# Firm-years')
    summary_table.columns = ['Country', '# Firm-years']

    # Mean, median, min, and max of firm-years per country
    summary_stats = summary_table['# Firm-years'].describe()[['mean', '50%', 'min', 'max']]
    summary_stats.index = ['Mean', 'Median', 'Min', 'Max']

    # Display the summary table and statistics
    print(summary_table)
    print(summary_stats)


# Run the preparation so that the cell actually produces the table
main()
```
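
To locate country-specific discrepancies, the replicated counts can be placed next to those in Table 1 of @Leuz_2003. The sketch below is hypothetical: `compare_country_counts()` is not part of the project code, the paper's counts would have to be transcribed by hand into a DataFrame with the same `Country` and `# Firm-years` columns as `summary_table`, and country names would first have to be harmonized, since Worldscope's spelling may differ from the paper's.

```python
import pandas as pd


def compare_country_counts(replicated: pd.DataFrame, paper: pd.DataFrame) -> pd.DataFrame:
    """Align replicated and published firm-year counts by country (hypothetical helper).

    Both inputs are expected to have the columns 'Country' and '# Firm-years'.
    """
    merged = replicated.merge(paper, on="Country", how="outer",
                              suffixes=(" (replication)", " (paper)"), indicator=True)
    # Positive differences mean the replication has more firm-years than the paper
    merged["Difference"] = merged["# Firm-years (replication)"] - merged["# Firm-years (paper)"]
    merged["Ratio"] = merged["# Firm-years (replication)"] / merged["# Firm-years (paper)"]
    # Largest excess first; countries present in only one sample (NaN difference) last
    return merged.sort_values("Difference", ascending=False, na_position="last")
```

Countries flagged `left_only` or `right_only` in the `_merge` column appear in only one of the two samples, which separates coverage differences from differences in the number of observations per country.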

## Step 3: Analysis Implementation and Table Reproduction
This replication step implements the calculations of the earnings management (EM) measures as described in the methodology of @Leuz_2003, with the aim of replicating all statistical methods and groupings accurately.
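
As an illustration of the first measure, EM1 (smoothing of reported operating earnings) can be computed, as we read @Leuz_2003, as the country median of the firm-level ratio of the standard deviation of operating income to the standard deviation of cash flow from operations, both scaled by lagged total assets; lower values indicate more smoothing. This is a minimal sketch under stated assumptions: `em1_smoothing()` and the column names `country`, `firm`, `op_income`, `cfo` and `lag_total_assets` are hypothetical and would still need to be mapped to the corresponding Worldscope items, and any further sample restrictions the paper imposes are omitted.

```python
import pandas as pd


def em1_smoothing(df: pd.DataFrame, country_col: str = "country", firm_col: str = "firm",
                  oi_col: str = "op_income", cfo_col: str = "cfo",
                  lag_assets_col: str = "lag_total_assets") -> pd.Series:
    """Country median of firm-level std(operating income) / std(cash flow from operations).

    Hypothetical helper. Both flows are scaled by lagged total assets before the
    (sample, ddof=1) standard deviations are taken.
    """
    scaled = df.assign(
        oi_scaled=df[oi_col] / df[lag_assets_col],
        cfo_scaled=df[cfo_col] / df[lag_assets_col],
    )
    # Firm-level standard deviations; firms with a single observation yield NaN
    firm_sd = scaled.groupby([country_col, firm_col])[["oi_scaled", "cfo_scaled"]].std()
    firm_ratio = firm_sd["oi_scaled"] / firm_sd["cfo_scaled"]
    # Median across firms within each country (NaN ratios are skipped)
    return firm_ratio.groupby(level=country_col).median()
```

Taking the median rather than the mean limits the influence of firms with very small cash flow volatility, whose ratios can become extreme.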



\pagebreak

\setcounter{table}{0}
\renewcommand{\thetable}{\arabic{table}}

# References {-}
\setlength{\parindent}{-0.2in}
\setlength{\leftskip}{0.2in}
\setlength{\parskip}{8pt}
\noindent