#### Using LLMs To Analyse the Effect of U.S. Macroeconomic News: An Application to the Global Financial Cycle
<small>*10 October 2025*</small>

In 2013, Helene Rey [identified the Global Financial Cycle](https://www.nber.org/papers/w21162) ('GFCy') to describe the observed co-movement of global capital flows, risky asset prices, and credit growth.

'Risk on' periods are associated with increases in the above variables, and subsequent 'risk off' periods are associated with their retrenchment, often resulting in an economic crisis in the afflicted country. Importantly, this co-movement is observed across a spectrum of relevant variables across the globe.

What drives this cycle? Rey subsequently [argued that the source of the GFCy](https://academic.oup.com/restud/article/87/6/2754/5834728) is U.S. monetary policy shocks: changes in U.S. monetary policy move the 'single global factor' underlying the cycle, and so drive the cycle itself.

The implication is that the familiar 'trilemma', in which countries must choose two of monetary policy independence, an open capital account, and a fixed exchange rate, [is really a 'dilemma'](https://www.nber.org/papers/w21162): in the absence of capital controls, a country cannot have monetary sovereignty, regardless of the exchange rate regime.

Economists Boehm and Kroner have [added further nuance to this story](https://blogs.lse.ac.uk/usappblog/2025/06/25/how-news-about-the-us-economy-drives-global-financial-conditions/) by showing that it is not only U.S. monetary policy that drives the GFCy: *news* about U.S. macroeconomic conditions can by itself explain the co-movement.

Negative surprises regarding the U.S. economy increase investor risk-aversion, and the GFCy arises because this increase occurs globally. The logic here is similar to the ['gamma' model of Gabaix & Maggiori](https://academic.oup.com/qje/article-abstract/130/3/1369/1933306), which treats investor risk-aversion as the central force driving global capital flows.

#### Using LLMs to Assess Macroeconomic News Coverage

Boehm and Kroner use surprises in U.S. statistical releases (e.g. nonfarm payrolls) as the explanatory variable in their estimates of the effect of news on the GFCy. Their focus on 'news' creates an opportunity to experiment with bringing textual data into this domain of macroeconomic analysis, an approach already being [explored in political science](https://kenbenoit.net/pdfs/Benoit_etal_2025_AJPS.pdf).

In particular, I'm interested in applying LLM assessments of macroeconomic news coverage to analyse the GFCy. If we 'prompt' an LLM to put itself in the position of an international investor reading the latest macroeconomic news releases, then the assessment the LLM makes might give us a good proxy for investor risk-aversion. Importantly, this would happen in a replicable and scalable way.

My goal is to estimate the following equation:
$$
\Delta q_{i,t} = \alpha_{i} + \beta_{i} \cdot \gamma_{i,t} + \epsilon_{i,t}
$$

Here $\Delta q_{i,t}$ represents the change in asset prices in country $i$ at period $t$, and $\gamma_{i,t}$ is the LLM rating of the news story paired with country $i$ at time $t$. $\beta_{i}$ can therefore be interpreted as the sensitivity of asset price changes to the LLM rating, and a positive and statistically significant $\beta_{i}$ can be interpreted as evidence for investor risk-aversion affecting global asset prices (and hence the GFCy).
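
To make the roles of $\alpha_{i}$ and $\beta_{i}$ concrete, here is a minimal sketch of fitting a separate intercept and slope for each country by ordinary least squares. The observations are invented purely for illustration, and `ols_by_country` is a hypothetical helper rather than part of the analysis code further down.

```python
from collections import defaultdict


def ols_by_country(rows):
    """Fit dq = alpha_i + beta_i * rating separately for each country.

    rows: iterable of (country, rating, price_change) tuples.
    Returns {country: (alpha_i, beta_i)}.
    """
    grouped = defaultdict(list)
    for country, rating, dq in rows:
        grouped[country].append((float(rating), float(dq)))

    estimates = {}
    for country, obs in grouped.items():
        n = len(obs)
        mean_x = sum(x for x, _ in obs) / n
        mean_y = sum(y for _, y in obs) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in obs)
        if sxx == 0:
            continue  # no variation in ratings, slope is not identified
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in obs)
        beta = sxy / sxx
        alpha = mean_y - beta * mean_x
        estimates[country] = (alpha, beta)
    return estimates


# Invented observations: (country, LLM rating, asset price change)
toy_rows = [
    ("A", 2, -3.0), ("A", 4, 1.0), ("A", 6, 5.0),
    ("B", 1, -1.0), ("B", 4, 0.5), ("B", 7, 2.0),
]
estimates = ols_by_country(toy_rows)
```

In this toy data, country A's prices respond more strongly to the rating than country B's, which is the kind of cross-country difference the $i$ subscript on $\beta$ allows for.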


#### Code: Retrieving Input Data and Setting The AI Prompts

As a baseline analysis, I retrieve 'important' news stories involving the U.S. Federal Reserve from the [Refinitiv News API](https://developers.lseg.com/content/dam/devportal/api-families/refinitiv-data-platform/refinitiv-data-platform-apis/documentation/rdp_news_user_guide.pdf), including the body of each news article. The 'sentiment' of each story will then be assessed by the LLM.

In [None]:
from lseg.data.content import news, historical_pricing
import pandas as pd

start_date = "SET-ME"
end_date = "SET-ME"
articles_count = "SET-ME"

response = news.headlines.Definition(
    query="Fed",
    date_from=start_date, 
    date_to=end_date, 
    count=articles_count
).get_data()
fed = response.data.df

topnews = news.headlines.Definition(
    query="TOPNWS",
    date_from=start_date, 
    date_to=end_date, 
    count=articles_count
).get_data().data.df

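# Keep Fed stories that also appear under the TOPNWS query
df_stories = pd.merge(fed, topnews, on="storyId")
# Keep only headlines written entirely in upper case; copy so new columns can be added safely
df_stories = df_stories[df_stories['headline_x'].str.isupper()].copy()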

timestamps = []
bodies = []
for story_id in df_stories['storyId']:
    response = news.story.Definition(story_id=story_id).get_data()
    timestamps.append(response.data.raw['newsItem']['itemMeta']['versionCreated']['$'])
    bodies.append(response.data.raw['newsItem']['contentSet']['inlineData'][0]['$'])
df_stories['timestamp'] = timestamps
df_stories['body'] = bodies

Using the historical pricing data API I retrieve data on a variety of global asset prices.

| Country           | Index   |
|-------------------|-----------|
| Argentina         | .MERV     |
| Austria           | .ATX      |
| Belgium           | .BFX      |
| Brazil            | .BVSP     |
| Canada            | .GSPTSE   |
| Switzerland       | .SSMI     |
| Chile             | .SPIPSA   |
| Czech Republic    | .PX       |
| Germany           | .GDAXI    |
| Denmark           | .OMXC20   |
| Spain             | .IBEX     |
| Finland           | .OMXHPI   |
| France            | .FCHI     |
| United Kingdom    | .FTSE     |
| United States     | .NDX      |
| Japan             | .TOPX     |
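
The price-retrieval code in this post iterates over a `df_indices` table with `Country` and `Indices` columns, which is not defined in the snippets shown here. One way it could be built from the table above is sketched below; the layout is an assumption inferred from how the loop reads `row['Country']` and `row['Indices']`.

```python
import pandas as pd

# Tickers copied from the table above. The column names are an assumption,
# chosen to match row['Country'] and row['Indices'] in the retrieval loop.
index_rics = {
    "Argentina": ".MERV",
    "Austria": ".ATX",
    "Belgium": ".BFX",
    "Brazil": ".BVSP",
    "Canada": ".GSPTSE",
    "Switzerland": ".SSMI",
    "Chile": ".SPIPSA",
    "Czech Republic": ".PX",
    "Germany": ".GDAXI",
    "Denmark": ".OMXC20",
    "Spain": ".IBEX",
    "Finland": ".OMXHPI",
    "France": ".FCHI",
    "United Kingdom": ".FTSE",
    "United States": ".NDX",
    "Japan": ".TOPX",
}

df_indices = pd.DataFrame(
    {"Country": list(index_rics.keys()), "Indices": list(index_rics.values())}
)
```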

Because the GFCy is characterised by fast-moving financial variables, resulting in difficulties with identification, I will employ a high-frequency event study. Specifically, I examine the response of asset prices to (the LLM assessment of) news stories about the Federal Reserve, within a configurable window, starting with 30 minutes.
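
News timestamps rarely fall exactly on a 5-minute bar, so an event study needs some convention for lining a story up with the price series. The sketch below shows one such convention: snap forward to the first bar at or after the story, then measure the price change over a configurable window. This is an assumption made for illustration and not necessarily how the analysis code in this post aligns timestamps; `window_change` is a hypothetical helper and the bars are toy data.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def window_change(bars, event_time, window=timedelta(minutes=30)):
    """Price change over `window`, starting from the first bar at or after the event.

    bars: list of (timestamp, price) tuples sorted by timestamp.
    Returns None when either end of the window has no matching bar.
    """
    times = [t for t, _ in bars]
    start = bisect_left(times, event_time)  # first bar at or after the story
    if start == len(bars):
        return None
    target = times[start] + window
    end = bisect_left(times, target)
    if end == len(bars) or times[end] != target:
        return None  # no bar exactly `window` after the starting bar
    return bars[end][1] - bars[start][1]


# Toy 5-minute bars from 14:00 to 15:00, with the price rising by 1 per bar
t0 = datetime(2025, 1, 1, 14, 0)
bars = [(t0 + timedelta(minutes=5 * k), 100.0 + k) for k in range(13)]

change = window_change(bars, datetime(2025, 1, 1, 14, 2))
too_late = window_change(bars, datetime(2025, 1, 1, 14, 50))
```

Widening or narrowing the event window is then a matter of passing a different `window` value.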

In [None]:
from datetime import timedelta

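# Build one row per (story, index) pair. df_indices is expected to hold the
# table above as a DataFrame with 'Country' and 'Indices' columns.
start_times = []
for timestamp in df_stories['timestamp']:
    timestamp = pd.to_datetime(timestamp)
    for index, row in df_indices.iterrows():
        start_times.append({'Country': row['Country'], 'Index': row['Indices'], 'start': timestamp, 'Data': None})
df_prices = pd.DataFrame(start_times)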

# Request six hours of 5-minute bars after each story, for every index
data = []
for index, row in df_prices.iterrows():
    timestamp = pd.to_datetime(row['start'])
    try:
        definition = historical_pricing.summaries.Definition(
            row['Index'],
            interval=historical_pricing.Intervals.FIVE_MINUTES,
            start=timestamp,
            end=timestamp + timedelta(hours=6))
        response = definition.get_data()
        data.append(response.data.df[['TRDPRC_1']])
    except Exception:
        # Keep a placeholder so `data` stays aligned with the rows of df_prices
        data.append(None)

df_prices['Data'] = data

#### LLM Choice and Software Environment

For the initial analysis I use [OpenAI's GPT-4o model](https://openai.com/index/hello-gpt-4o/), although this can be swapped for other proprietary models to check that results are robust across different models. Because the emphasis here is on powerful, scalable text analysis, I favour proprietary models, although for scientific reproducibility an 'open-weight' model such as [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) would allow for greater transparency and control.

To proxy for global investor risk-aversion, I ask the LLM to put itself in the shoes of an economist/investment professional reading a news story related to the U.S. macroeconomy, and to rate the story's perceived implications on a scale of 1 ("Extremely negative for global asset prices") to 7 ("Extremely positive for global asset prices").

In [None]:
from openai import OpenAI

client = OpenAI(
    api_key="API_SECRET_KEY"
)

model = "gpt-4o"
with open("prompt.txt", "r") as file:
    instructions = file.read()

def generate_ratings(df: pd.DataFrame) -> pd.DataFrame:
    ratings = []  # Temporary list to store ratings
    for _, row in df.iterrows():
        response = client.responses.create(
            model=model,
            instructions=instructions,
            input=row['body']
        )
        ratings.append(response.output_text)  # Append to the temporary list
    # Assign the ratings list to the DataFrame after the loop
    df['rating'] = ratings
    return df

ratings = generate_ratings(df_stories)
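
The model's replies come back as free text in `response.output_text`, and the regression step later coerces them with `pd.to_numeric(..., errors='coerce')`. A stricter check can be sketched as follows, assuming the prompt asks for a bare number on the 1 to 7 scale. `parse_rating` is a hypothetical helper, not part of the OpenAI SDK.

```python
import re

# A reply counts as valid only if it is a single integer, optionally followed
# by a full stop and surrounded by whitespace (an assumption about the prompt).
_RATING = re.compile(r"\s*(\d+)\s*\.?\s*")


def parse_rating(reply, low=1, high=7):
    """Return the rating as an int, or None if the reply is not a valid score."""
    match = _RATING.fullmatch(str(reply))
    if match is None:
        return None
    value = int(match.group(1))
    return value if low <= value <= high else None


# Example replies: two usable scores, one chatty answer, one out-of-range value
parsed = [parse_rating(r) for r in ["4", "6.", "I would say 3", "9"]]
```

Replies that fail the check come back as `None`, which makes it easy to count how often the model strays from the requested format before any rows are silently dropped.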

#### Analysis and Results

... in the figure below. The coefficient on LLM assessment of news story is statistically significant at the 10% level in each sample, suggesting that ... .

| Sample     | Coefficient | Standard Error | p-value |
|------------|-------------|----------------|---------|
| Full       | 382.3       | 195.3          | 0.051   |
| Developed  | XXX         | XXX            | XXX     |
| Developing | XXX         | XXX            | XXX     |
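
As a quick sanity check on the full-sample row, the test statistic is simply the coefficient divided by its standard error. The sketch below computes it along with a two-sided p-value under a standard normal approximation. The normal approximation is an assumption made here for simplicity; a t-distribution with finite degrees of freedom, as regression software typically reports, gives a somewhat larger p-value.

```python
import math

coefficient = 382.3     # full-sample estimate from the table
standard_error = 195.3  # full-sample standard error from the table

t_stat = coefficient / standard_error

# Two-sided p-value under a standard normal approximation:
# 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
p_normal = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.3f}, normal-approximation p = {p_normal:.3f}")
```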

In [None]:
import statsmodels.api as sm

expanded_rows = []

for _, row in df_prices.iterrows():
    if isinstance(row['Data'], pd.DataFrame):
        # Work on a copy so the stored price frames are not modified in place
        data_df = row['Data'].copy()
        data_df['Timestamp'] = data_df.index
        data_df['start'] = row['start']
        data_df['Index'] = row['Index']
        expanded_rows.append(data_df)

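df_prices = pd.concat(expanded_rows, ignore_index=True)
# Six 5-minute bars = 30 minutes. Note the sign: price six bars earlier minus the current price.
df_prices['Change_TRDPRC_1'] = df_prices.groupby(['Index', 'start'])['TRDPRC_1'].shift(6) - df_prices['TRDPRC_1']
# Move each value back six bars, so a row holds its own price minus the price six bars later
df_prices['Shifted_Change_TRDPRC_1'] = df_prices.groupby(['Index', 'start'])['Change_TRDPRC_1'].shift(-6)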

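# The story timestamps were stored as strings; convert them so both merge keys are datetimes
ratings['timestamp'] = pd.to_datetime(ratings['timestamp'])
data_for_regression = pd.merge(df_prices, ratings, left_on='start', right_on='timestamp', how='inner')

# Bar timestamps are assumed to be timezone-naive UTC
data_for_regression['Timestamp'] = data_for_regression['Timestamp'].dt.tz_localize('UTC')
# Keep only the bar stamped exactly 30 minutes after each story
data_for_regression = data_for_regression[data_for_regression['timestamp'] + pd.Timedelta(minutes=30) == data_for_regression['Timestamp']].dropna().reset_index(drop=True)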

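# Non-numeric ratings become NaN and are dropped by the regression
X = pd.to_numeric(data_for_regression['rating'], errors='coerce')
y = pd.to_numeric(data_for_regression['Shifted_Change_TRDPRC_1'], errors='coerce')
X = sm.add_constant(X)

# Named ols_results to avoid overwriting the `model` string used for the LLM calls
ols_results = sm.OLS(y, X, missing='drop').fit()
print(ols_results.summary())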

These initial results are promising, but I'd like to return to this analysis to ensure robustness across prompts and models. It's not particularly efficient to feed an LLM an entire news story, so one might add an 'intermediate' text-summarisation stage before the actual categorisation is done, especially since LLMs have been shown to be [vulnerable to attention drift](https://arxiv.org/abs/2307.03172).
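
A two-stage pipeline of this kind might look roughly like the sketch below. It is only an outline under stated assumptions: the instruction strings are hypothetical wording, and `complete` stands in for any function that sends instructions plus text to an LLM and returns its reply (for example, a thin wrapper around the `client.responses.create` call used earlier). A stub is used here so the sketch runs without an API key.

```python
from typing import Callable, Tuple

# Hypothetical instruction wording, for illustration only
SUMMARY_INSTRUCTIONS = (
    "Summarise the macroeconomic content of this news story in three sentences."
)
RATING_INSTRUCTIONS = (
    "Rate the implications of this summary for global asset prices "
    "from 1 (extremely negative) to 7 (extremely positive). Reply with the number only."
)


def rate_via_summary(body: str, complete: Callable[[str, str], str]) -> Tuple[str, str]:
    """Summarise a story first, then rate the summary instead of the full body."""
    summary = complete(SUMMARY_INSTRUCTIONS, body)
    rating = complete(RATING_INSTRUCTIONS, summary)
    return summary, rating


def fake_complete(instructions: str, text: str) -> str:
    """Stub standing in for an LLM call: truncates for summaries, returns '4' for ratings."""
    if instructions == RATING_INSTRUCTIONS:
        return "4"
    return text[:40]


story = "FED HOLDS RATES STEADY, SIGNALS PATIENCE ON FUTURE MOVES " * 3
summary, rating = rate_via_summary(story, fake_complete)
```

Keeping the LLM call behind a plain function argument also makes it straightforward to swap models when checking robustness.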

Given the computational requirements of extending this initial analysis, costs would soon add up. I'm therefore interested in experimenting with [the Nebius platform](https://studio.nebius.com) for processing cost-effective batch jobs across a range of LLMs, including open-weight models.

Until next time,

Matt