# wrds_ratios

> Retrieve and process data from WRDS Financial Ratios Suite.

### From the manual

*Data Source:*

All accounting-related data are obtained from the Compustat Quarterly and Annual files. Pricing-related
data, such as Market Capitalization and Price, are obtained from both CRSP and
Compustat, with CRSP as the primary data source for pricing data. Earnings-related
data are from the IBES database.

*Data Frequency:*

The final outputs for both individual firms and industry-level aggregated values are at
monthly frequency. To populate the data at monthly frequency, we carry forward
the most recent quarterly or annual data item, whichever is most recently available at a
given time stamp, to the subsequent months until the next filing becomes available.
In addition, to make sure that all data are publicly available at the monthly time
stamp, we lag all observations by two months to avoid any look-ahead bias.
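
A minimal pandas sketch of the carry-forward and two-month lag described above, for one hypothetical firm with two made-up quarterly values of a ratio we call `roe`. This is not WRDS's code. Treating the lag as a plain two-month shift of the monthly series is our assumption, although it is consistent with the sample download further below, where a new `qdate` (e.g. 2021-02-28) first appears at a `public_date` two months later (2021-04-30).

```python
import pandas as pd

# Hypothetical quarterly observations for one firm (period end -> value)
quarterly = pd.Series(
    [0.10, 0.12],
    index=pd.PeriodIndex(["2020-03", "2020-06"], freq="M"),
    name="roe",
)

# Monthly grid covering the sample period
months = pd.period_range("2020-03", "2020-10", freq="M")

# Carry the most recent available value forward to the following months
monthly = quarterly.reindex(months).ffill()

# Lag by two months so each value is only used once it is assumed to be public
lagged = monthly.shift(2)

print(pd.DataFrame({"as_of_period_end": monthly, "lagged_2m": lagged}))
```

The first two lagged months are missing because nothing is assumed public yet; from 2020-08 onward the June value replaces the March one.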

*Outlier Control:*

As ratio metrics often produce unintended extreme outliers, we impose two layers of
outlier control before aggregating at the industry level. First, for all the monthly frequency
firm-level individual ratio results, we impose a winsorization at the 1% level for extreme values,
and truncate the outliers in the top and bottom percentile to be missing. Second, to arrive
at the final ratio output, we enforce a 12-month moving average on the monthly frequency
financial ratios. The second step serves two purposes: to further smooth the final output, and
to fill in the truncated extreme months (from step 1) with the firm-specific moving average.
Note that the outlier controls are only applied to the ratios fed to the industry-level
aggregation. Outputs for firm-level financial ratios are raw ratios without any truncation or
smoothing. Hence researchers are advised to censor or smooth the raw ratios to remove the
extreme outliers before conducting further analysis.
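
Since the firm-level outputs are raw, here is a sketch of the two steps above that you could apply yourself, on a long panel with `permno` and `date` columns. Several choices are our assumptions, not documented WRDS settings: the manual mentions both winsorizing and setting top/bottom percentile values to missing, and this sketch only does the latter; percentiles are computed within each month's cross-section; and the 12-month average is a row-based rolling mean with `min_periods=1`, which presumes one row per firm-month with no gaps.

```python
import pandas as pd


def truncate_then_smooth(panel: pd.DataFrame, col: str, lower: float = 0.01,
                         upper: float = 0.99, window: int = 12) -> pd.Series:
    """Sketch of the two outlier-control steps described in the manual.

    Assumes `panel` has `permno` and `date` columns, one row per firm-month,
    and no gaps in each firm's monthly history.
    """
    panel = panel.sort_values(["permno", "date"])
    by_month = panel.groupby("date")[col]
    lo = by_month.transform(lambda s: s.quantile(lower))
    hi = by_month.transform(lambda s: s.quantile(upper))
    # Step 1: values outside the month's [lower, upper] quantiles become NaN
    truncated = panel[col].where(panel[col].between(lo, hi))
    # Step 2: rolling mean over each firm's last `window` rows; NaNs are
    # skipped, so truncated months get the firm's average over the window
    smoothed = (
        truncated.groupby(panel["permno"])
        .rolling(window, min_periods=1)
        .mean()
        .reset_index(level=0, drop=True)
    )
    return smoothed.reindex(panel.index)
```

The result is indexed like the input rows, so `panel["bm_smooth"] = truncate_then_smooth(panel, "bm")` aligns by index even though the function sorts internally.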

### NOTE:
- This dataset has three variables that are in levels (apart from the ID variables): `be`, `mktcap`, and `price` (i.e. book equity, market cap, and stock price).

- Excludes financials.

- ID variables are: `permno`, `gvkey`, `ticker`, `cusip`, `public_date`, `adate`, `qdate`, `gsector`, `gicdesc`, and all variables starting with `ffi`.
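
The rules in these notes can be applied to any column list. Below is a small helper, `split_columns`, which is hypothetical and not part of this module, that separates ID columns, the three level variables, and the remaining (ratio) columns while preserving their order:

```python
# Hypothetical constants taken from the notes above
ID_VARS = {"permno", "gvkey", "ticker", "cusip", "public_date",
           "adate", "qdate", "gsector", "gicdesc"}
LEVEL_VARS = {"be", "mktcap", "price"}


def split_columns(columns):
    """Return (id_cols, level_cols, ratio_cols), preserving input order."""
    ids = [c for c in columns if c in ID_VARS or c.startswith("ffi")]
    levels = [c for c in columns if c in LEVEL_VARS]
    ratios = [c for c in columns if c not in ids and c not in levels]
    return ids, levels, ratios


ids, levels, ratios = split_columns(
    ["gvkey", "permno", "public_date", "capei", "be", "bm", "ffi49_desc", "ffi49", "ticker"]
)
print(ids)     # ['gvkey', 'permno', 'public_date', 'ffi49_desc', 'ffi49', 'ticker']
print(levels)  # ['be']
print(ratios)  # ['capei', 'bm']
```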

In [None]:
#| default_exp wrds.ratios

In [None]:
#|exports
from __future__ import annotations
from pathlib import Path
from typing import List
import os

import pandas as pd
import numpy as np

import pandasmore as pdm
from finsets.wrds import wrds_api
from finsets import RESOURCES

In [None]:
#| export 
def raw_metadata(rawfile: str|Path=RESOURCES/'finratio_firm_ibes_variable_descriptions.csv', # location of the raw variable labels file
             ) -> pd.DataFrame:
    "Loads raw variable labels file, cleans it and returns it as a pd.DataFrame"

    df = pd.read_csv(rawfile)
    df['output_of'] = 'wrds.ratios.clean'

    # Drop the "<name> -- " prefix and the "(<name>)" suffix from each description to get a clean label
    df['Variable Label'] = df.apply(lambda row: row['Description'].replace(row['Variable Name'].strip()+' -- ', ''), axis=1)
    df['Variable Label'] = df.apply(lambda row: row['Variable Label'].replace( '(' + row['Variable Name'].strip() + ')', ''), axis=1)
    # Normalize variable names: no surrounding whitespace, lowercase
    df['Variable Name'] = df['Variable Name'].str.strip().str.lower()
    # Keep and rename the columns used downstream
    df = df[['Variable Name', 'Variable Label','output_of', 'Type', 'Group']].copy()
    df.columns = ['name','label','output_of','type', 'group']
    return df

In [None]:
#| eval: false
raw_metadata()

|     | name | label | output_of | type | group |
|-----|------|-------|-----------|------|-------|
| 0 | permno | PERMNO | wrds.ratios.clean | double | ID |
| 1 | gvkey | Global Company Key | wrds.ratios.clean | string | ID |
| 2 | cusip | CUSIP IDENTIFIER - HISTORICAL | wrds.ratios.clean | string | ID |
| 3 | ticker | EXCHANGE TICKER SYMBOL - HISTORICAL | wrds.ratios.clean | string | ID |
| 4 | peg_1yrforward | Forward P/E to 1-year Growth (PEG) ratio | wrds.ratios.clean | double | Valuation |
| ... | ... | ... | ... | ... | ... |
| 70 | sale_nwc | Sales/Working Capital | wrds.ratios.clean | double | Efficiency |
| 71 | accrual | Accruals/Average Assets | wrds.ratios.clean | double | Other |
| 72 | rd_sale | Research and Development/Sales | wrds.ratios.clean | double | Other |
| 73 | adv_sale | Avertising Expenses/Sales | wrds.ratios.clean | double | Other |
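
The metadata frame works well as a lookup table. The sketch below rebuilds a few rows copied from the output above (instead of reading the CSV, so it runs without `finsets`) and shows two common queries: a name-to-label dictionary and the variables belonging to one group.

```python
import pandas as pd

# A few rows copied from the `raw_metadata()` output above
meta = pd.DataFrame(
    {
        "name": ["permno", "gvkey", "peg_1yrforward", "sale_nwc", "rd_sale"],
        "label": ["PERMNO", "Global Company Key",
                  "Forward P/E to 1-year Growth (PEG) ratio",
                  "Sales/Working Capital", "Research and Development/Sales"],
        "group": ["ID", "ID", "Valuation", "Efficiency", "Other"],
    }
)

# Map variable names to human-readable labels
labels = dict(zip(meta["name"], meta["label"]))

# Variables in a given group (only one Valuation row in this subset)
valuation_vars = meta.loc[meta["group"] == "Valuation", "name"].tolist()

print(labels["sale_nwc"])  # Sales/Working Capital
print(valuation_vars)      # ['peg_1yrforward']
```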


The following function gives more detailed metadata but requires connecting to WRDS. If all you want is variable names and labels, then `raw_metadata` is sufficient.

In [None]:
#| export
def raw_metadata_extra(wrds_username: str=None # If None, looks for WRDS_USERNAME with `os.getenv`, then prompts you if needed
             ) -> pd.DataFrame:
    "Collects metadata from WRDS `wrdsapps_finratio_ibes.firm_ratio_ibes` and merges it with the labels from `raw_metadata`."

    if wrds_username is None:
        wrds_username = os.getenv('WRDS_USERNAME')
        if wrds_username is None: wrds_username = input("Enter your WRDS username: ")

    db = None
    try:
        db = wrds_api.Connection(wrds_username = wrds_username)
        finr = db.describe_table('wrdsapps_finratio_ibes','firm_ratio_ibes')
        finr_rows = db.get_row_count('wrdsapps_finratio_ibes','firm_ratio_ibes')
    finally:
        # Only close the connection if it was actually opened
        if db is not None: db.close()

    finr_meta = finr[['name','type']].copy()
    finr_meta['nr_rows'] = finr_rows
    finr_meta['wrds_library'] = 'wrdsapps_finratio_ibes'
    finr_meta['wrds_table'] = 'firm_ratio_ibes'

    df = finr_meta.merge(raw_metadata()[['name','label']], how='left', on='name')
    
    df['output_of'] = 'wrds.ratios.download()'
    df = pdm.order_columns(df,these_first=['name','label','output_of'])
    for v in list(df.columns):
        df[v] = df[v].astype('string')
    
    return df

In [None]:
#| eval: false
mta = raw_metadata_extra()

Loading library list...
Done
Approximately 2750800 rows in wrdsapps_finratio_ibes.firm_ratio_ibes.


In [None]:
#| eval: false
mta

|     | name | label | output_of | type | nr_rows | wrds_library | wrds_table |
|-----|------|-------|-----------|------|---------|--------------|------------|
| 0 | gvkey | Global Company Key | wrds.ratios.download() | VARCHAR(6) | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
| 1 | permno | PERMNO | wrds.ratios.download() | DOUBLE_PRECISION | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
| 2 | adate | | wrds.ratios.download() | DATE | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
| 3 | qdate | | wrds.ratios.download() | DATE | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
| 4 | public_date | | wrds.ratios.download() | DATE | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | ffi48 | | wrds.ratios.download() | DOUBLE_PRECISION | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
| 96 | ffi49_desc | | wrds.ratios.download() | VARCHAR(5) | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
| 97 | ffi49 | | wrds.ratios.download() | DOUBLE_PRECISION | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
| 98 | ticker | EXCHANGE TICKER SYMBOL - HISTORICAL | wrds.ratios.download() | VARCHAR(8) | 2750800 | wrdsapps_finratio_ibes | firm_ratio_ibes |
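
Because `raw_metadata_extra` uses a left merge, columns that exist in the WRDS table but not in the labels file (such as `adate`, `qdate`, `public_date`, and the `ffi*` columns above) end up with an empty label. A sketch, on toy rows copied from the two outputs above, of how pandas' `indicator=True` option flags those columns:

```python
import pandas as pd

# Column names as reported by WRDS (subset of the output above)
wrds_cols = pd.DataFrame({"name": ["gvkey", "permno", "adate", "public_date", "ffi49", "ticker"]})

# Labels file (subset of the `raw_metadata()` output)
labels = pd.DataFrame(
    {"name": ["permno", "gvkey", "ticker"],
     "label": ["PERMNO", "Global Company Key", "EXCHANGE TICKER SYMBOL - HISTORICAL"]}
)

merged = wrds_cols.merge(labels, how="left", on="name", indicator=True)

# Rows found only on the WRDS side have no label
unlabeled = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unlabeled)  # ['adate', 'public_date', 'ffi49']
```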


In [None]:
#| export
def download(vars: List[str]=None, # If None, downloads all variables
             obs_limit: int=None, # Number of rows to download. If None, full dataset will be downloaded
             wrds_username: str=None, # If None, looks for WRDS_USERNAME with `os.getenv`, then prompts you if needed
             start_date: str="01/01/1900", # Start date in MM/DD/YYYY format
             end_date: str=None # End date in MM/DD/YYYY format; if None, defaults to current date
             ) -> pd.DataFrame:
    """Downloads `vars` from `start_date` to `end_date` from the WRDS `wrdsapps_finratio_ibes.firm_ratio_ibes` table"""

    if vars is None:
        vars = '*'
    else:
        # Always select the panel identifiers first, without duplicating them
        vars = ','.join(['public_date','permno'] + [x for x in vars if x not in ['public_date', 'permno']])

    limit_clause = f"LIMIT {int(obs_limit)}" if obs_limit is not None else ""
    # Both dates are passed as query parameters instead of being formatted into the SQL string
    sql_string=f"""SELECT  {vars}
                    FROM wrdsapps_finratio_ibes.firm_ratio_ibes
                    WHERE public_date BETWEEN %(start)s AND COALESCE(%(end)s, CURRENT_DATE)
                    {limit_clause}
                """
    return wrds_api.download(sql_string, wrds_username=wrds_username, params={'start':start_date, 'end':end_date})
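
The column selection in `download` always puts `public_date` and `permno` first and drops any repeat of them from `vars`. Below is a standalone sketch of that logic and of the query text it yields, using a hypothetical `build_query` helper. It does not contact WRDS, and both dates are left as named placeholders (`%(start)s`, `%(end)s`) to be supplied as query parameters.

```python
def build_query(vars=None, obs_limit=None):
    """Hypothetical helper: build the SELECT statement for
    wrdsapps_finratio_ibes.firm_ratio_ibes without running it."""
    if vars is None:
        cols = "*"
    else:
        # ID columns first, then the requested variables without repeating them
        cols = ",".join(["public_date", "permno"]
                        + [v for v in vars if v not in ("public_date", "permno")])
    limit_clause = f"LIMIT {int(obs_limit)}" if obs_limit is not None else ""
    return (f"SELECT {cols} FROM wrdsapps_finratio_ibes.firm_ratio_ibes "
            "WHERE public_date BETWEEN %(start)s AND COALESCE(%(end)s, CURRENT_DATE) "
            f"{limit_clause}").strip()


print(build_query(["permno", "be", "bm"], obs_limit=100))
```

This mirrors the `download(vars=['permno', 'be', 'bm'], ...)` call below, whose output has exactly the columns `public_date, permno, be, bm`.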

In [None]:
#| eval: false
raw = download(start_date='01/01/2021', obs_limit=100)

Loading library list...
Done


In [None]:
#| eval: false
raw

|     | gvkey | permno | adate | qdate | public_date | capei | be | bm | evm | pe_op_basic | ... | ffi30_desc | ffi30 | ffi38_desc | ffi38 | ffi48_desc | ffi48 | ffi49_desc | ffi49 | ticker | cusip |
|-----|-------|--------|-------|-------|-------------|-------|----|----|-----|-------------|-----|------------|-------|------------|-------|------------|-------|------------|-------|--------|-------|
| 0 | 001004 | 54594.0 | 2020-05-31 | 2020-11-30 | 2021-01-31 | 23.990408 | 900.700 | 0.899590 | 11.385353 | 33.550000 | ... | WHLSL | 26.0 | WHLSL | 33.0 | WHLSL | 41.0 | WHLSL | 42.0 | AIR | 00036110 |
| 1 | 001004 | 54594.0 | 2020-05-31 | 2020-11-30 | 2021-02-28 | 28.445258 | 900.700 | 0.899590 | 11.385353 | 39.780000 | ... | WHLSL | 26.0 | WHLSL | 33.0 | WHLSL | 41.0 | WHLSL | 42.0 | AIR | 00036110 |
| 2 | 001004 | 54594.0 | 2020-05-31 | 2020-11-30 | 2021-03-31 | 29.805215 | 900.700 | 0.899590 | 11.385353 | 41.650000 | ... | WHLSL | 26.0 | WHLSL | 33.0 | WHLSL | 41.0 | WHLSL | 42.0 | AIR | 00036110 |
| 3 | 001004 | 54594.0 | 2020-05-31 | 2021-02-28 | 2021-04-30 | 26.833506 | 932.400 | 0.663635 | 13.604304 | 59.176471 | ... | WHLSL | 26.0 | WHLSL | 33.0 | WHLSL | 41.0 | WHLSL | 42.0 | AIR | 00036110 |
| 4 | 001004 | 54594.0 | 2020-05-31 | 2021-02-28 | 2021-05-31 | 27.840428 | 932.400 | 0.663635 | 13.604304 | 61.397059 | ... | WHLSL | 26.0 | WHLSL | 33.0 | WHLSL | 41.0 | WHLSL | 42.0 | AIR | 00036110 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 001075 | 27991.0 | 2021-12-31 | 2022-09-30 | 2022-12-31 | 15.853501 | 8842.895 | 1.212296 | 11.168843 | | ... | UTIL | 20.0 | UTILS | 29.0 | UTIL | 31.0 | UTIL | 31.0 | PNW | 72348410 |
| 96 | 001076 | 10517.0 | 2019-12-31 | 2020-09-30 | 2021-01-31 | 19.355285 | 1869.833 | 0.488576 | 1.925582 | 13.033149 | ... | SERVS | 22.0 | SRVC | 36.0 | BUSSV | 34.0 | BUSSV | 34.0 | AAN | 00253530 |
| 97 | 001076 | 10517.0 | 2020-12-31 | 2020-12-31 | 2021-02-28 | 18.910223 | 1113.074 | 0.305099 | 1.788821 | 15.290520 | ... | FIN | 29.0 | MONEY | 35.0 | BANKS | 44.0 | BANKS | 45.0 | PRG | 74319R10 |
| 98 | 001076 | 10517.0 | 2020-12-31 | 2020-12-31 | 2021-03-31 | 16.413671 | 1113.074 | 0.305099 | 1.788821 | 13.238532 | ... | FIN | 29.0 | MONEY | 35.0 | BANKS | 44.0 | BANKS | 45.0 | PRG | 74319R10 |


In [None]:
#| eval: false
download(vars = ['permno', 'be', 'bm'], start_date='01/01/2021', obs_limit=100)

Loading library list...
Done


|     | public_date | permno | be | bm |
|-----|-------------|--------|----|----|
| 0 | 2021-01-31 | 54594.0 | 900.700 | 0.899590 |
| 1 | 2021-02-28 | 54594.0 | 900.700 | 0.899590 |
| 2 | 2021-03-31 | 54594.0 | 900.700 | 0.899590 |
| 3 | 2021-04-30 | 54594.0 | 932.400 | 0.663635 |
| 4 | 2021-05-31 | 54594.0 | 932.400 | 0.663635 |
| ... | ... | ... | ... | ... |
| 95 | 2022-12-31 | 27991.0 | 8842.895 | 1.212296 |
| 96 | 2021-01-31 | 10517.0 | 1869.833 | 0.488576 |
| 97 | 2021-02-28 | 10517.0 | 1113.074 | 0.305099 |
| 98 | 2021-03-31 | 10517.0 | 1113.074 | 0.305099 |


In [None]:
#| export
def clean(df: pd.DataFrame=None,        # If None, downloads `vars` using `download` function; else, must contain `permno` and `public_date` columns
          vars: List[str]=None,         # If None, downloads all variables
          obs_limit: int=None,          # Number of rows to download. If None, full dataset will be downloaded
          wrds_username: str=None,      # If None, looks for WRDS_USERNAME with `os.getenv`, then prompts you if needed
          start_date: str="01/01/1900", # Start date in MM/DD/YYYY format
          end_date: str=None,           # End date in MM/DD/YYYY format; if None, defaults to current date
          clean_kwargs: dict=None,      # Params to pass to `pdm.setup_panel` other than `panel_ids`, `time_var`, and `freq`
          ) -> pd.DataFrame:
    """Applies `pandasmore.setup_panel` to `df`. If `df` is None, downloads `vars` using `download` function."""

    # Avoid a shared mutable default argument
    if clean_kwargs is None: clean_kwargs = {}
    if df is None: df = download(vars=vars, obs_limit=obs_limit, wrds_username=wrds_username, start_date=start_date, end_date=end_date)
    df = pdm.setup_panel(df, panel_ids='permno', time_var='public_date', freq='M', **clean_kwargs)
    return df
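
The exact behavior of `pdm.setup_panel` is defined by `pandasmore`. As a rough, hypothetical approximation of the shape of its result shown further down (a `(permno, Mdate)` index with a monthly period level, plus a `dtdate` column), the same structure can be built with plain pandas. The duplicate-handling rule and the integer cast of `permno` are our assumptions, not `pandasmore`'s documented behavior.

```python
import pandas as pd


def to_monthly_panel(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for pdm.setup_panel(df, panel_ids='permno',
    time_var='public_date', freq='M'); not pandasmore's implementation."""
    out = df.copy()
    # WRDS delivers permno as a double; store it as an integer identifier
    out["permno"] = out["permno"].astype("int64")
    # Keep the original date as a datetime and add a monthly period key
    out["dtdate"] = pd.to_datetime(out["public_date"])
    out["Mdate"] = out["dtdate"].dt.to_period("M")
    # One row per firm-month; here the last duplicate wins (an assumption)
    out = out.drop_duplicates(subset=["permno", "Mdate"], keep="last")
    return out.set_index(["permno", "Mdate"]).sort_index()
```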

In [None]:
#| eval: false
df = clean(raw)

In [None]:
#| eval: false
df.describe().T

|     | count | mean | std | min | 25% | 50% | 75% | max |
|-----|-------|------|-----|-----|-----|-----|-----|-----|
| capei | 100.0 | 10.273517 | 74.083071 | -204.585308 | -11.220671 | 15.820401 | 25.268330 | 389.948673 |
| be | 76.0 | 3074.718224 | 3604.536116 | 207.883000 | 215.500000 | 1007.000000 | 8007.829000 | 8842.895000 |
| bm | 76.0 | 0.842768 | 0.189622 | 0.305099 | 0.702359 | 0.869289 | 0.975803 | 1.212296 |
| evm | 100.0 | 9.766625 | 20.381260 | -39.741648 | 10.470793 | 11.232325 | 12.890555 | 102.959892 |
| pe_op_basic | 98.0 | 21.659021 | 22.680352 | -6.328947 | 13.043730 | 17.713708 | 29.929144 | 110.714286 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ffi17 | 100.0 | 12.410000 | 1.504841 | 11.000000 | 11.000000 | 13.000000 | 14.000000 | 17.000000 |
| ffi30 | 100.0 | 21.250000 | 5.251984 | 13.000000 | 20.000000 | 25.000000 | 26.000000 | 29.000000 |
| ffi38 | 100.0 | 27.570000 | 4.593045 | 21.000000 | 26.000000 | 29.000000 | 33.000000 | 36.000000 |
| ffi48 | 100.0 | 33.580000 | 8.161476 | 21.000000 | 31.000000 | 40.000000 | 41.000000 | 44.000000 |


We can download a small sample of the dataset and clean it in one step:

In [None]:
#| eval: false
df = clean(obs_limit=100, vars=['capei','bm'], start_date='01/01/2020', end_date='12/31/2020')

Loading library list...
Done


In [None]:
#| eval: false
df

| permno | Mdate | public_date | dtdate | capei | bm |
|--------|-------|-------------|--------|-------|----|
| 10517 | 2020-01 | 2020-01-31 | 2020-01-31 | 22.649201 | 0.501677 |
| 10517 | 2020-02 | 2020-02-29 | 2020-02-29 | 16.505583 | 0.537405 |
| 10517 | 2020-03 | 2020-03-31 | 2020-03-31 | 9.560061 | 0.537405 |
| 10517 | 2020-04 | 2020-04-30 | 2020-04-30 | 13.556363 | 0.537405 |
| 10517 | 2020-05 | 2020-05-31 | 2020-05-31 | 14.597763 | 1.089099 |
| ... | ... | ... | ... | ... | ... |
| 60038 | 2020-12 | 2020-12-31 | 2020-12-31 | 19.514119 | 0.788078 |
| 61487 | 2020-01 | 2020-01-31 | 2020-01-31 | 110.238011 | 1.141916 |
| 61487 | 2020-02 | 2020-02-29 | 2020-02-29 | 53.947232 | 0.979527 |
| 81912 | 2020-01 | 2020-01-31 | 2020-01-31 | 23.590900 | 0.947718 |
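
With `(permno, Mdate)` as the index, per-firm time-series operations are a `groupby(level='permno')` away, and a single month's cross-section is an `xs` call. A sketch on the first five rows for PERMNO 10517 shown above (values copied from that output; the `bm_chg` column is illustrative):

```python
import pandas as pd

# First rows of the cleaned output above for PERMNO 10517
idx = pd.MultiIndex.from_arrays(
    [[10517] * 5, pd.period_range("2020-01", periods=5, freq="M")],
    names=["permno", "Mdate"],
)
df = pd.DataFrame(
    {"capei": [22.649201, 16.505583, 9.560061, 13.556363, 14.597763],
     "bm": [0.501677, 0.537405, 0.537405, 0.537405, 1.089099]},
    index=idx,
)

# Month-over-month change in book-to-market, computed within each firm
df["bm_chg"] = df.groupby(level="permno")["bm"].diff()

# Cross-section of all firms in a single month
print(df.xs(pd.Period("2020-03", freq="M"), level="Mdate"))
```

Note that `bm_chg` is zero in March and April: as the manual explains, the underlying accounting data are carried forward between filings.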


In [None]:
#| hide
import nbdev; nbdev.nbdev_export()