# ratios

> Retrieve and process data from WRDS Financial Ratios Suite.

### From the manual

*Data Source:*

All accounting related data are obtained from Compustat Quarterly and Annual file. Pricing
related data, such as Market Capitalization and Price, are obtained from both CRSP and
Compustat, and we rely on CRSP as the primary data source for pricing data. Earnings
related data are from IBES database.

*Data Frequency:*

The final outputs for both individual firm and industry-level aggregated value are at
monthly frequency. In order to populate the data to monthly frequency, we carry forward 
the most recent quarterly or annual data item, whichever is most recently available at a
given time stamp, to the subsequent months before the next filing data becomes available.
In addition, in order to make sure that all data is publicly available at the monthly time
stamp, we lag all observations by two months to avoid any look-ahead bias.
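
The carry-forward and two-month lag described above can be illustrated on a toy series. This is only a sketch: the numbers are made up, and it assumes the lag amounts to a plain two-period shift of the monthly series, which may differ from the exact WRDS procedure.

```python
import pandas as pd

# Hypothetical quarterly ratio for one firm, indexed by fiscal quarter end (toy numbers)
quarterly = pd.Series(
    [0.50, 0.55, 0.60],
    index=pd.PeriodIndex(['2020-03', '2020-06', '2020-09'], freq='M'),
    name='bm',
)

# Monthly grid covering the quarters plus two extra months for the lag
months = pd.period_range('2020-03', '2020-11', freq='M')

# Carry the most recent value forward to the months between filings
monthly = quarterly.reindex(months).ffill()

# Shift by two months so each value is dated when it is assumed to be public
lagged = monthly.shift(2)
print(lagged)
```

In this toy case the value for the quarter ending 2020-03 first appears in 2020-05, and the first two months are missing because nothing is assumed public yet.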

*Outlier Control:*

As ratio metrics often produce unintended extreme outliers, we impose two layers of
outliers control before aggregating at the industry level. First, for all the monthly frequency
firm level individual ratio results, we impose a winsorization at 1% level for extreme values,
and truncate the outliers in the top and bottom percentile to be missing. Secondly, to arrive
at the final ratio output, we enforce a 12 month moving average on the monthly frequency
financial ratios. The second step serves two purposes: to further smooth the final output, and
to fill in the truncated extreme months (from step 1) with firm-specific moving average.
Note that the outlier controls are only applied to the ratios fed to the industry-level
aggregation. Outputs for firm-level financial ratios are raw ratios without any truncation or
smoothing. Hence researchers are advised to censor/smooth the raw ratios to get rid of the
extreme outliers before conducting further analysis.
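
Since the firm-level ratios come without outlier control, one possible way to censor and smooth them is sketched below. The function `truncate_and_smooth` is hypothetical (it is not part of `finsets` or WRDS), and it assumes pooled quantile cutoffs across all firm-months and a trailing moving average; the manual does not pin down these details, so treat both as illustrative choices.

```python
import pandas as pd

def truncate_and_smooth(ratio: pd.Series, lower: float = 0.01, upper: float = 0.99,
                        window: int = 12) -> pd.Series:
    """Set values outside the [lower, upper] quantiles to missing, then apply a
    trailing moving average within each firm.

    Assumes a two-level index (firm, date), sorted by date within firm.
    """
    lo, hi = ratio.quantile([lower, upper])          # pooled cutoffs (an assumption)
    truncated = ratio.where(ratio.between(lo, hi))   # outliers become NaN
    smoothed = (truncated.groupby(level=0)
                         .rolling(window, min_periods=1).mean()
                         .droplevel(0))              # drop the group key added by groupby
    return smoothed

# Toy data: two firms, six months each, with one extreme value per firm
idx = pd.MultiIndex.from_product([['A', 'B'], range(1, 7)], names=['firm', 'month'])
toy = pd.Series([1, 2, 3, 4, 5, 1000, 2, 3, 4, 5, 6, -500], index=idx, dtype=float)
smoothed = truncate_and_smooth(toy, window=3)
print(smoothed)
```

With `min_periods=1` the moving average skips the truncated months, so they are filled with the firm's own recent average rather than left missing, which mirrors the second purpose mentioned in the manual.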

***

`public_date`: date on which the information was available to the public

`adate`: fiscal year to which the information pertains

`qdate`: fiscal quarter to which the information pertains

***

### NOTE:
- This dataset has three variables that are in levels (apart from the ID variables): `be`, `mktcap`, and `price` (i.e. book equity, market cap, and stock price).

- Excludes financials.

- ID variables are: `permno`, `gvkey`, `ticker`, `cusip`, `public_date`, `adate`, `qdate`, `gsector`, `gicdesc`, and all variables starting with `ffi`.

***

In [None]:
#| default_exp wrds.ratios

In [None]:
#| exports
from __future__ import annotations
from typing import List

import pandas as pd

import pandasmore as pdm
from finsets.wrds import wrds_api

In [None]:
#| exports
PROVIDER = 'Wharton Research Data Services (WRDS)'
URL = 'https://wrds-www.wharton.upenn.edu/pages/get-data/financial-ratios-suite-wrds/financial-ratios-with-ibes-subscription/financial-ratios-firm-level-ibes/'
LIBRARY = 'wrdsapps_finratio_ibes'
TABLE = 'firm_ratio_ibes'
FREQ = 'M'
MIN_YEAR = 1970
MAX_YEAR = None
ENTITY_ID_IN_RAW_DSET = 'permno'
ENTITY_ID_IN_CLEAN_DSET = 'permno'
TIME_VAR_IN_RAW_DSET = 'public_date'
TIME_VAR_IN_CLEAN_DSET = f'{FREQ}date'

In [None]:
#| export
def list_all_vars() -> pd.DataFrame:
    "Collects names of all available variables from WRDS `{LIBRARY}.{TABLE}`"

    # Connect before `try` so that `db` is always defined when `finally` runs
    db = wrds_api.Connection()
    try:
        out = db.describe_table(LIBRARY,TABLE).assign(wrds_library=LIBRARY, wrds_table=TABLE)
    finally:
        db.close()

    return out[['name','type','wrds_library','wrds_table']]

In [None]:
#| eval: false
all_vars = list_all_vars()

Loading library list...
Done
Approximately 2835219 rows in wrdsapps_finratio_ibes.firm_ratio_ibes.


In [None]:
#| eval: false
all_vars.name.count()

np.int64(100)

In [None]:
#| export
def get_raw_data(vars: List[str]=None, # If None or '*', downloads all variables
             nrows: int=None, # Number of rows to download. If None, full dataset will be downloaded
             start_date: str=None, # Start date in MM/DD/YYYY format
             end_date: str=None # End date in MM/DD/YYYY format
             ) -> pd.DataFrame:
    """Downloads `vars` from `start_date` to `end_date` from the WRDS `{LIBRARY}.{TABLE}` table"""

    wrds_api.validate_dates([start_date, end_date])
    if vars is None or vars=='*': vars = '*'
    else: vars = ','.join(['public_date','permno'] + [f'{x}' for x in vars if x not in ['public_date', 'permno']])

    sql_string=f"""SELECT {vars} FROM {LIBRARY}.{TABLE} WHERE 1 = 1 """
    if start_date is not None: sql_string += r" AND public_date >= %(start_date)s"
    if end_date is not None: sql_string += r" AND public_date <= %(end_date)s"
    if nrows is not None: sql_string += r" LIMIT %(nrows)s"

    return wrds_api.download(sql_string,
                            params={'start_date':start_date, 'end_date':end_date, 'nrows':nrows})

In [None]:
#| eval: false
raw = get_raw_data(start_date='01/01/2021', nrows=1000)

Loading library list...
Done
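
To see what query `get_raw_data` ends up sending, the string-building logic can be reproduced without a WRDS connection. The helper `build_query` below is hypothetical and only mirrors the steps in the function above: the ID columns are always placed first, and dates and the row limit stay as named placeholders that the database driver is expected to fill in from `params`.

```python
from typing import List, Optional

def build_query(library: str, table: str, vars: Optional[List[str]] = None,
                start_date: Optional[str] = None, end_date: Optional[str] = None,
                nrows: Optional[int] = None) -> str:
    """Assemble the SELECT statement; values are left as %(name)s placeholders."""
    if vars is None or vars == '*':
        cols = '*'
    else:
        ids = ['public_date', 'permno']               # always requested first
        cols = ','.join(ids + [v for v in vars if v not in ids])

    sql = f"SELECT {cols} FROM {library}.{table} WHERE 1 = 1"
    if start_date is not None: sql += " AND public_date >= %(start_date)s"
    if end_date is not None: sql += " AND public_date <= %(end_date)s"
    if nrows is not None: sql += " LIMIT %(nrows)s"
    return sql

query = build_query('wrdsapps_finratio_ibes', 'firm_ratio_ibes',
                    vars=['bm', 'permno'], start_date='01/01/2021', nrows=1000)
print(query)
```

The `WHERE 1 = 1` clause lets each optional filter be appended with `AND` regardless of which ones are present. Note that column names are interpolated directly into the string, so they should come from a trusted source such as `list_all_vars`.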


In [None]:
#| eval: false
raw.head(0)

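| | gvkey | permno | adate | qdate | public_date | capei | be | bm | evm | pe_op_basic | ... | ffi30_desc | ffi30 | ffi38_desc | ffi38 | ffi48_desc | ffi48 | ffi49_desc | ffi49 | ticker | cusip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|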


In [None]:
#| export
def process_raw_data(
        df: pd.DataFrame=None,  # Must contain `permno` and `public_date` columns
        clean_kwargs: dict={},  # Params to pass to `pdm.setup_panel` other than `panel_ids`, `time_var`, and `freq`
) -> pd.DataFrame:
    """Converts some variables to categorical and applies `pandasmore.setup_panel` to `df`"""

    # Convert some columns to categorical
    for col in ['gvkey','ticker','cusip','gsector','gicdesc']:
        if col in df.columns: df[col] = df[col].astype('category')

    for col in df.columns:
        if col.startswith('ffi'):
            if col.endswith('desc'): df[col] = df[col].astype('category')
            else: df[col] = df[col].astype('Int64').astype('category')

    # Set panel structure     
    df = pdm.setup_panel(df, panel_ids=ENTITY_ID_IN_RAW_DSET, time_var=TIME_VAR_IN_RAW_DSET, freq=FREQ, panel_ids_toint=False, **clean_kwargs)
    return df 

In [None]:
#| eval: false
df_clean = process_raw_data(raw)

In [None]:
#| eval: false
df_clean.head(1)

| permno | Mdate | public_date | dtdate | gvkey | adate | qdate | capei | be | bm | evm | pe_op_basic | ... | ffi30_desc | ffi30 | ffi38_desc | ffi38 | ffi48_desc | ffi48 | ffi49_desc | ffi49 | ticker | cusip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10145 | 2021-01 | 2021-01-31 | 2021-01-31 | 1300 | 2019-12-31 | 2020-09-30 | 28.329023 | 19548.0 | 0.16924 | 17.302839 | 24.574843 | ... | | | GOVT | 37 | | | | | HON | 43851610 |
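
The two-step cast applied to the `ffi` code columns (`Int64`, then `category`) can be seen on a toy frame. This is a sketch with made-up values; it assumes the codes arrive as floats because some of them are missing, which is the situation where the nullable `Int64` step matters.

```python
import pandas as pd

# Toy frame: one industry code is missing, so pandas stores the codes as floats
toy = pd.DataFrame({
    'ffi38': [37.0, None, 12.0],
    'ffi38_desc': ['GOVT', None, 'TOYLABEL'],   # 'TOYLABEL' is a placeholder, not a real label
})

# Nullable integers keep the missing value while dropping the trailing ".0"
toy['ffi38'] = toy['ffi38'].astype('Int64').astype('category')
toy['ffi38_desc'] = toy['ffi38_desc'].astype('category')

print(toy.dtypes)
print(list(toy['ffi38'].cat.categories))
```

Casting straight from float to `category` would yield categories such as `37.0`, so going through `Int64` first keeps the codes as integers without having to drop or fill the missing rows.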


In [None]:
#| export
def keep_only_ratios(
        df: pd.DataFrame # Output of `process_raw_data`
) -> pd.DataFrame:
    """Drops ID variables, variables in levels, and all `ffi*` columns from `df`, keeping only the ratios"""

    not_ratios = ['be', 'mktcap', 'price', 'dtdate', 'permno', 'gvkey', 'ticker',
                  'cusip', 'public_date', 'adate', 'qdate', 'gsector', 'gicdesc']

    # Keep every column that is neither an ID/level variable nor an `ffi*` column
    ratio_cols = [col for col in df.columns
                  if col not in not_ratios and not col.startswith('ffi')]

    return df[ratio_cols].copy()

In [None]:
#| eval: false
keep_only_ratios(df_clean).head(1)

| permno | Mdate | capei | bm | evm | pe_op_basic | pe_op_dil | pe_exi | pe_inc | ps | pcf | dpr | ... | rd_sale | adv_sale | staff_sale | accrual | ret_crsp | ptb | peg_trailing | divyield | peg_1yrforward | peg_ltgforward |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10145 | 2021-01 | 28.329023 | 0.16924 | 17.302839 | 24.574843 | 24.856234 | 28.030129 | 28.030129 | 4.125068 | 22.696753 | 0.514853 | ... | 0.046821 | 0.0 | 0.0 | -0.017443 | -0.081476 | 7.012911 | 0.958785 | 0.019041 | -2.024123 | 10.657844 |


In [None]:
#| hide
import nbdev; nbdev.nbdev_export()