# FRED-MD Dataset

**Paper:** https://doi.org/10.1080/07350015.2015.1086655      
**Homepage:** https://research.stlouisfed.org/econ/mccracken/fred-databases/    

**Note:** The code below only works with *real-time vintages*, i.e. with the datasets released from 01-2015 onward. The earlier *historical vintages* can be downloaded from [this link](https://s3.amazonaws.com/files.research.stlouisfed.org/fred-md/Historical_FRED-MD.zip).

Import the dependencies.

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")

Define a function for transforming the time series.

In [2]:
def transform_series(x, tcode):
    '''
    Transform the time series.

    Parameters:
    ______________________________
    x: pandas.Series
        Time series.

    tcode: int
        Transformation code (1 to 7), as given in the second row of the FRED-MD file.
    '''

    if tcode == 1:
        return x
    elif tcode == 2:
        return x.diff()
    elif tcode == 3:
        return x.diff().diff()
    elif tcode == 4:
        return np.log(x)
    elif tcode == 5:
        return np.log(x).diff()
    elif tcode == 6:
        return np.log(x).diff().diff()
    elif tcode == 7:
        # first difference of the percentage change, i.e. Δ(x_t / x_{t-1} - 1)
        return x.pct_change().diff()
    else:
        raise ValueError(f"unknown `tcode` {tcode}")
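
To make the transformation codes more concrete, here is a minimal sketch on a made-up series (the values and the names `x`, `log_diff` and `log_diff2` are illustrative, not FRED-MD data). It applies the same expressions used above for `tcode` 5 and `tcode` 6 and shows that each round of differencing leaves one more missing value at the start of the series.

```python
import numpy as np
import pandas as pd

# hypothetical toy series growing by 10% per period (not FRED-MD data)
x = pd.Series([100.0, 110.0, 121.0, 133.1])

# tcode 5: first difference of logs, close to the growth rate for small changes
log_diff = np.log(x).diff()

# tcode 6: second difference of logs, i.e. the change in that growth rate
log_diff2 = np.log(x).diff().diff()

print(log_diff.round(4).tolist())
print(log_diff2.round(4).tolist())
```

With constant 10% growth the log differences all equal `log(1.1)` and the second differences are zero up to floating-point error, which is why the first rows of a transformed dataset are empty.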

Define a function for downloading and transforming the time series.

In [3]:
def get_data(year, month, transform=True):
    '''
    Download and (optionally) transform the time series.

    Parameters:
    ______________________________
    year: int
        The year of the dataset vintage.

    month: int.
        The month of the dataset vintage.

    transform: bool.
        Whether the time series should be transformed or not.
    '''

    # get the dataset URL
    file = f"https://files.stlouisfed.org/files/htdocs/fred-md/monthly/{year}-{format(month, '02d')}.csv"

    # get the time series
    data = pd.read_csv(file, skiprows=[1])

    # process the dates
    data["sasdate"] = pd.to_datetime(data["sasdate"], format="%m/%d/%Y")

    # add back any missing dates
    data = data.set_index("sasdate").resample("MS").first()

    if transform:

        # get the transformation codes
        tcodes = pd.read_csv(file, nrows=1, index_col=0)

        # transform the time series
        data = data.apply(lambda x: transform_series(x, tcodes[x.name].item()))

    return data
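
The parsing steps in `get_data` can be tried offline on a small in-memory file. The sketch below assumes the layout the function relies on (a header row, then one row of transformation codes, then the observations); the series names `A` and `B`, the `Transform:` row label and all values are made up for illustration.

```python
import io
import pandas as pd

# hypothetical two-series file in the assumed layout (values are made up)
raw = """sasdate,A,B
Transform:,1,2
1/1/2020,1.0,10.0
2/1/2020,2.0,12.0
4/1/2020,4.0,15.0
"""

# line 1 (counting the header as line 0) holds the codes, so skip it for the data
data = pd.read_csv(io.StringIO(raw), skiprows=[1])
data["sasdate"] = pd.to_datetime(data["sasdate"], format="%m/%d/%Y")

# resampling to month start adds the missing 2020-03-01 row, filled with NaN
data = data.set_index("sasdate").resample("MS").first()

# the codes are the first row after the header, keyed by series name
tcodes = pd.read_csv(io.StringIO(raw), nrows=1, index_col=0)

print(data)
print(tcodes)
```

Reading the same source twice keeps the two pieces separate: the observations are parsed as floats, while the codes are parsed as integers that can be looked up by column name.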

Define a function for identifying the time series included in all dataset vintages between two dates.

In [4]:
def get_common_series(start_month, start_year, end_month, end_year):
    '''
    Get the list of time series included in
    all dataset vintages between two dates.

    Parameters:
    ______________________________
    start_month: int.
        The month of the start date.

    start_year: int.
        The year of the start date.

    end_month: int.
        The month of the end date.

    end_year: int.
        The year of the end date.
    '''

    # define the date range
    dates = pd.date_range(
        start=f"{start_year}-{start_month}-01",
        end=f"{end_year}-{end_month}-01",
        freq="MS"
    )

    # get the list of time series included
    # in the dataset on each date
    series = [
        pd.read_csv(
            f"https://files.stlouisfed.org/files/htdocs/fred-md/monthly/{date.year}-{format(date.month, '02d')}.csv",
            nrows=0,
            index_col=0
        ).columns.tolist() for date in dates
    ]

    # return the list of time series included
    # in the dataset on all dates
    return list(set.intersection(*map(set, series)))
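
The intersection step at the end of `get_common_series` can be illustrated without any downloads. In this sketch the three column lists are hypothetical (the series names are real FRED-MD mnemonics, but their membership in each list is made up), and the result is sorted because a set has no defined order.

```python
# hypothetical column lists for three vintages (membership is made up)
vintages = [
    ["RPI", "INDPRO", "VIXCLSx"],
    ["RPI", "INDPRO", "UMCSENTx"],
    ["INDPRO", "RPI"],
]

# a series is kept only if it appears in every vintage
common = set.intersection(*map(set, vintages))

# sets are unordered, so sort for a reproducible result
common = sorted(common)
print(common)  # ['INDPRO', 'RPI']
```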

Load the dataset vintage for 12-2023.

In [5]:
dataset = get_data(year=2023, month=12, transform=True)

In [6]:
dataset.shape

(779, 127)

In [7]:
dataset.head()

sasdate,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,IPCONGD,IPDCONGD,...,DNDGRG3M086SBEA,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,UMCSENTx,DTCOLNVHFNM,DTCTHFNM,INVEST,VIXCLSx
1959-01-01,,,,,,,,,,,...,,,,,,,,,,
1959-02-01,0.003877,0.003621,0.010349,0.007336,0.00731,0.019391,0.013407,0.008625,0.00731,0.005235,...,,,,,,,,,,
1959-03-01,0.006457,0.007325,0.009404,-0.003374,0.008321,0.014306,0.006035,0.004894,0.0,0.019393,...,-0.001148,0.000292,-2.2e-05,-0.008147,0.004819,,0.004929,0.004138,-0.014792,
1959-04-01,0.00651,0.007029,-0.003622,0.019915,0.000616,0.021075,0.014338,0.014545,0.01565,0.006383,...,0.001312,0.00176,-2.2e-05,0.012203,-0.00489,,0.012134,0.006734,0.024929,
1959-05-01,0.005796,0.006618,0.012043,0.006839,0.007803,0.014955,0.00827,0.009582,0.00477,0.020149,...,-0.001695,-0.001867,-2.1e-05,-0.00409,-0.004819,,0.002828,0.00202,-0.015342,


In [8]:
dataset.tail()

sasdate,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,IPCONGD,IPDCONGD,...,DNDGRG3M086SBEA,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,UMCSENTx,DTCOLNVHFNM,DTCTHFNM,INVEST,VIXCLSx
2023-07-01,0.000959,0.002689,0.004674,0.007571,0.005628,0.008918,0.00886,0.011675,0.012764,0.03287,...,-0.000887,0.000308,-0.002738,0.003215,-0.006112,7.3,-0.009362,-0.01105,-0.000322,13.8333
2023-08-01,0.000749,0.001759,-0.000551,-0.001668,0.007353,-8e-06,0.001259,0.001396,0.001537,-0.016683,...,0.014038,-0.001768,-1.4e-05,0.001425,-0.003026,-2.1,0.001532,0.000716,-0.006009,15.7822
2023-09-01,-0.000509,0.000725,0.003635,0.006531,0.008206,0.001177,-0.002274,-0.004014,-0.002411,0.008238,...,-0.010725,0.003456,0.00166,-0.00409,0.004892,-1.5,-0.003609,-0.001637,0.003319,15.0424
2023-10-01,0.002211,0.0031,0.000754,-0.000689,-0.002209,-0.008928,-0.006722,-0.007656,-0.009127,-0.05552,...,-0.006273,-0.002894,-0.003027,0.001724,-0.004144,-4.1,-0.004214,-0.000859,-0.004454,19.0462
2023-11-01,0.004228,0.005755,0.0032,,0.002759,0.002401,0.001836,0.002633,0.000753,0.034805,...,-0.005675,0.000486,0.005307,0.00342,0.006346,-2.5,,,0.009452,13.8563


Get the mapping table with the time series description and groups.

In [9]:
!curl -O https://files.stlouisfed.org/files/htdocs/uploads/FRED-MD%20Appendix.zip


In [10]:
!unzip -o FRED-MD%20Appendix.zip

Archive:  FRED-MD%20Appendix.zip
   creating: FRED-MD Appendix/
  inflating: FRED-MD Appendix/FRED-MD_historic_appendix.csv  
  inflating: FRED-MD Appendix/FRED-MD_historic_appendix.pdf  
  inflating: FRED-MD Appendix/FRED-MD_updated_appendix.csv  
  inflating: FRED-MD Appendix/FRED-MD_updated_appendix.pdf  
  inflating: FRED-MD Appendix/README.txt  


In [11]:
mapping = pd.read_csv("FRED-MD Appendix/FRED-MD_updated_appendix.csv", encoding="ISO 8859-1", usecols=["fred", "description", "group"])

In [12]:
mapping.head()

Unnamed: 0,fred,description,group
0,RPI,Real Personal Income,1
1,W875RX1,Real personal income ex transfer receipts,1
2,DPCERA3M086SBEA,Real personal consumption expenditures,4
3,CMRMTSPLx,Real Manu. and Trade Industries Sales,4
4,RETAILx,Retail and Food Services Sales,4


In [13]:
mapping.tail()

Unnamed: 0,fred,description,group
122,UMCSENTx,Consumer Sentiment Index,4
123,DTCOLNVHFNM,Consumer Motor Vehicle Loans Outstanding,5
124,DTCTHFNM,Total Consumer Loans and Leases Outstanding,5
125,INVEST,Securities in Bank Credit at All Commercial Banks,5
126,VIXCLSx,VIX,8


Replace the group numbers with the group names.

In [14]:
groups = {
    1: "Output and Income",
    2: "Labor Market",
    3: "Housing",
    4: "Consumption, Orders and Inventories",
    5: "Money and Credit",
    6: "Interest Rates and Exchange Rates",
    7: "Prices",
    8: "Stock Market"
}

In [15]:
mapping["group"] = mapping["group"].apply(lambda x: groups[x])

In [16]:
mapping.head()

Unnamed: 0,fred,description,group
0,RPI,Real Personal Income,Output and Income
1,W875RX1,Real personal income ex transfer receipts,Output and Income
2,DPCERA3M086SBEA,Real personal consumption expenditures,"Consumption, Orders and Inventories"
3,CMRMTSPLx,Real Manu. and Trade Industries Sales,"Consumption, Orders and Inventories"
4,RETAILx,Retail and Food Services Sales,"Consumption, Orders and Inventories"


In [17]:
mapping.tail()

Unnamed: 0,fred,description,group
122,UMCSENTx,Consumer Sentiment Index,"Consumption, Orders and Inventories"
123,DTCOLNVHFNM,Consumer Motor Vehicle Loans Outstanding,Money and Credit
124,DTCTHFNM,Total Consumer Loans and Leases Outstanding,Money and Credit
125,INVEST,Securities in Bank Credit at All Commercial Banks,Money and Credit
126,VIXCLSx,VIX,Stock Market
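
Once the groups are named, the mapping table can be used to count the series in each group or to pick out the columns of one group. The following is a minimal sketch on small stand-ins for `mapping` and `dataset`: the three series and their groups are taken from the tables above, while the numeric values are made up.

```python
import pandas as pd

# small hypothetical stand-ins for `mapping` and `dataset` (values are made up)
mapping = pd.DataFrame({
    "fred": ["RPI", "W875RX1", "VIXCLSx"],
    "group": ["Output and Income", "Output and Income", "Stock Market"],
})
dataset = pd.DataFrame({
    "RPI": [0.1, 0.2],
    "W875RX1": [0.3, 0.4],
    "VIXCLSx": [13.8, 15.7],
})

# number of series in each group
counts = mapping["group"].value_counts()

# names of the series that belong to one group
names = mapping.loc[mapping["group"] == "Output and Income", "fred"]

# matching columns of the dataset, in the dataset's own column order
subset = dataset[dataset.columns.intersection(names)]

print(counts)
print(subset.columns.tolist())
```

Using `columns.intersection` rather than indexing with `names` directly avoids a `KeyError` when a series listed in the mapping is absent from a given vintage.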


Get the list of time series included in all dataset vintages from 01-2015 to 12-2023.

In [18]:
series = get_common_series(start_month=1, start_year=2015, end_month=12, end_year=2023)

In [19]:
len(series)

118
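
A natural next step is to restrict a vintage to the common series and to see which of its columns were left out. The sketch below shows one way to do this, using hypothetical stand-ins for `dataset` and `series` (the column names `A`, `B`, `C` and the values are made up).

```python
import pandas as pd

# hypothetical stand-ins: a vintage with three columns and a common-series list
dataset = pd.DataFrame({"A": [1.0], "B": [2.0], "C": [3.0]})
series = ["C", "A"]

# keep only the common series, in the vintage's original column order
common_only = dataset.loc[:, dataset.columns.isin(series)]

# series present in this vintage but not in every vintage
dropped = dataset.columns.difference(series).tolist()

print(common_only.columns.tolist())  # ['A', 'C']
print(dropped)                       # ['B']
```

Filtering with a boolean mask keeps the columns in the order of the vintage, which matters because the list returned by `get_common_series` is built from a set and has no stable order.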

Clean up.

In [20]:
!rm FRED-MD%20Appendix.zip

In [21]:
!rm -r FRED-MD\ Appendix