## DS 3000 Project Phase 2


### IMPACT OF MACROECONOMIC FACTORS ON NETFLIX'S FINANCIAL PERFORMANCE AND STOCK PRICE

By: Darsheen Chona, Chen Yu Hsia, Hayli Wynn and Ethan Pon

### Central Motivation: 

The primary objective of this project is to analyze how macroeconomic factors influence Netflix's financial performance and stock price. By examining key economic indicators such as GDP, unemployment rate, inflation (measured by CPI), retail sales, and industrial production, we aim to identify patterns and correlations that could explain variations in Netflix’s profitability and stock movements. Ultimately, this analysis can lead to predictive models that help investors understand the impact of broader economic changes on Netflix's business performance.

#### Key Questions:

1. How do macroeconomic factors such as GDP, the unemployment rate, CPI, retail sales, and industrial output correlate with Netflix's financial performance?
2. How do macroeconomic factors such as inflation affect Netflix's stock price?

### Data Processing:

#### Macroeconomic Data Retrieval and Cleaning:

- We collect macroeconomic data from the Federal Reserve Economic Data (FRED) API using a set of specific series IDs (e.g., GDP, UNRATE, CPIAUCSL, RSAFS, INDPRO).
- Each series is retrieved with an API request, converted to a pandas DataFrame, and cleaned so that data types are consistent (dates converted to datetime format and values to numeric types).
- We then merge these series into a single DataFrame, aligning data points by date.
- The merged DataFrame is processed further to add categorical features, such as the classification of inflation levels (low, medium, high) and GDP growth stages (recession, stagnation, growth).
- The merged macroeconomic data is saved to a CSV file (macro_data.csv) for easy access and further analysis.

#### Netflix Stock Data Retrieval and Cleaning:

- Using the Alpha Vantage API, we pull historical daily stock data for Netflix (NFLX). This includes the daily open, high, low, and close prices and the trading volume.
- The time series data is converted to a pandas DataFrame, and column names are simplified for ease of use.
- Dates are converted to datetime format, and all values are converted to numeric data types to facilitate analysis.


In [17]:
import requests
import pandas as pd

In [18]:
api_key = '7a802e204a08789569034837ff203fb7'
url_base = 'https://api.stlouisfed.org/fred/series/observations'

def fetch_and_clean_data(series_ids, start_date='2000-01-01', end_date='2024-12-31'):
    """Pull macroeconomic data for the given series IDs between the start and
       end dates, then clean and merge it.
    
    Args:
        series_ids (list): list of macroeconomic series IDs to request from the API
        start_date (str): start date for collecting data in YYYY-MM-DD format
        end_date (str): end date for collecting data in YYYY-MM-DD format
    Returns:
        merged_df (DataFrame): a merged dataframe with numerical and categorical data from
                                the series requested. Each column corresponds to a series ID 
                                with its values
    """
    # Creating an empty list to store all values
    all_data = []
    
    # Iterating over each series ID and requesting its observations from the API
    for series_id in series_ids:
        params = {
            'series_id': series_id,
            'api_key': api_key,
            'file_type': 'json',
            'observation_start': start_date,
            'observation_end': end_date
        }
        
        # Sending a request to the API and stopping early on an HTTP error
        response = requests.get(url_base, params=params)
        response.raise_for_status()
        data = response.json()['observations']
        
        # Creating the dataframe
        df = pd.DataFrame(data)
        df = df[['date', 'value']]
        
        # Converting the values to numeric and date in datetime format
        df['value'] = pd.to_numeric(df['value'], errors='coerce')
        df['date'] = pd.to_datetime(df['date'])
        
        # Renaming the value column to series ID
        df.rename(columns={'value': series_id}, inplace=True)
        
        # Append DataFrame to the list
        all_data.append(df)
    
    # Merging all data on the 'date' column
    merged_df = all_data[0]
    for df in all_data[1:]:
        merged_df = pd.merge(merged_df, df, on='date', how='outer')

    # Adding categorical feature: Inflation Level
    if 'CPIAUCSL' in series_ids:
        merged_df['Inflation_Level'] = pd.cut(merged_df['CPIAUCSL'], 
                                              bins=[-float('inf'), 2, 3, float('inf')], 
                                              labels=['Low', 'Medium', 'High'])
    
    # Calculating GDP growth rate and categorizing it as recession, stagnation, or growth
    if 'GDP' in series_ids:
        merged_df['GDP_Growth_Rate'] = merged_df['GDP'].pct_change() * 100
        merged_df['GDP_Growth_Stage'] = pd.cut(merged_df['GDP_Growth_Rate'], 
                                               bins=[-float('inf'), 0, 2, float('inf')], 
                                               labels=['Recession', 'Stagnation', 'Growth'])

    # Saving the merged DataFrame to a CSV file
    merged_df.to_csv('macro_data.csv', index=False)

    return merged_df



In [20]:
# list of series ID
series_ids = ['GDP', 'UNRATE', 'CPIAUCSL', 'RSAFS', 'INDPRO']
merged_df = fetch_and_clean_data(series_ids)



In [22]:
# Printing merged_df
merged_df

|  | date | GDP | UNRATE | CPIAUCSL | RSAFS | INDPRO | Inflation_Level | GDP_Growth_Rate | GDP_Growth_Stage |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-01 | 10002.179 | 4.0 | 169.300 | 268044.0 | 91.4092 | High | NaN | NaN |
| 1 | 2000-04-01 | 10247.720 | 3.8 | 170.900 | 271046.0 | 92.6659 | High | 2.454875 | Growth |
| 2 | 2000-07-01 | 10318.165 | 4.0 | 172.700 | 272630.0 | 92.8373 | High | 0.687421 | Stagnation |
| 3 | 2000-10-01 | 10435.744 | 3.9 | 173.900 | 276927.0 | 92.6400 | High | 1.139534 | Stagnation |
| 4 | 2001-01-01 | 10470.231 | 4.2 | 175.600 | 278834.0 | 91.8908 | High | 0.330470 | Stagnation |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 292 | 2024-05-01 | NaN | 4.0 | 313.225 | 704309.0 | 103.0711 | High | 0.000000 | Recession |
| 293 | 2024-06-01 | NaN | 4.1 | 313.049 | 702350.0 | 103.2258 | High | 0.000000 | Recession |
| 294 | 2024-07-01 | NaN | 4.3 | 313.534 | 710851.0 | 102.5863 | High | 0.000000 | Recession |
| 295 | 2024-08-01 | NaN | 4.2 | 314.121 | 711291.0 | 102.9329 | High | 0.000000 | Recession |
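
In the output above, Inflation_Level is High in every row shown, and the 2024 monthly rows report a GDP growth rate of 0.000000 labeled Recession even though GDP is missing there. This appears to happen because the bins 2 and 3 are applied to the CPI index level (around 169 to 314) rather than to a rate of change, and because the growth rate is computed across rows where the quarterly GDP series has no value. The sketch below shows one possible adjustment. The helper name `add_growth_categories`, the `Inflation_Rate` column, and the choice of a 12-month (year-over-year) change are our assumptions, and the bin edges are simply carried over from the function above.

```python
import pandas as pd

def add_growth_categories(df):
    """Hypothetical helper: recompute the two categorical columns on a date-sorted copy."""
    out = df.sort_values('date').reset_index(drop=True)

    # Year-over-year inflation in percent from the CPI index.
    # Assumes one row per month with no gaps, so 12 rows back is 12 months back.
    out['Inflation_Rate'] = out['CPIAUCSL'].pct_change(periods=12, fill_method=None) * 100
    out['Inflation_Level'] = pd.cut(out['Inflation_Rate'],
                                    bins=[-float('inf'), 2, 3, float('inf')],
                                    labels=['Low', 'Medium', 'High'])

    # GDP is quarterly, so compute growth only between rows that have a GDP value.
    # Assigning the result back aligns on the index; months without GDP stay NaN.
    gdp = out['GDP'].dropna()
    out['GDP_Growth_Rate'] = gdp.pct_change(fill_method=None) * 100
    out['GDP_Growth_Stage'] = pd.cut(out['GDP_Growth_Rate'],
                                     bins=[-float('inf'), 0, 2, float('inf')],
                                     labels=['Recession', 'Stagnation', 'Growth'])
    return out
```

With this version, months that have no GDP observation keep a missing growth rate and stage instead of being labeled Recession, and the first 12 months have no inflation level because there is no earlier year to compare against.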


#### Data Usage and Remaining Issues:

The aim of this analysis is to explore how various macroeconomic factors influence Netflix's financial performance and stock price. By examining data on GDP, unemployment rate, inflation (CPI), retail sales, and industrial output, the goal is to uncover correlations between these economic indicators and Netflix's profitability as well as stock movements. Two primary questions guide this study: (1) What relationships exist between macroeconomic factors like GDP, CPI, unemployment rate, and Netflix’s financial performance? and (2) How does inflation, in particular, impact Netflix’s stock price?

For the analysis, two key data sources will be used: macroeconomic series from the FRED API, and Netflix's financial and stock performance data (so far, daily prices from the Alpha Vantage API). Eventually, once machine learning techniques are covered, regression analysis could be applied to predict how changes in these economic indicators may affect Netflix’s revenue or stock price. Additionally, classification algorithms could help categorize periods of economic change (such as recessions or periods of growth) and analyze their influence on Netflix’s overall performance. For example, we plan to use an ML model to estimate how the level of inflation in the economy affects Netflix's stock price.
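
As a rough illustration of the regression step mentioned above, the sketch below fits an ordinary least-squares line with NumPy. It is not the model this project will necessarily use: the helper name `fit_linear` is hypothetical, and the function expects whatever indicator columns and target are eventually chosen.

```python
import numpy as np

def fit_linear(X, y):
    """Hypothetical helper: ordinary least squares with an intercept.

    X has one row per period and one column per indicator; y is the target
    (for example, a stock price). Returns [intercept, coef_1, coef_2, ...].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef
```

Each fitted coefficient estimates how much the target changes when its indicator rises by one unit with the other indicators held fixed, which is the kind of relationship the second key question asks about.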

One significant remaining issue is additional data cleaning, particularly the handling of missing values (NaNs), which may affect the reliability of the analysis. We are still determining the best way to handle them so that the regression analysis can run on complete data.
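
One option we are considering, sketched here under assumptions rather than as a settled choice, is to carry each quarterly GDP value forward across the two months that follow it, so that monthly rows have a GDP value to pair with the other indicators. The helper name `fill_quarterly_gaps` and the two-month limit are our assumptions.

```python
import pandas as pd

def fill_quarterly_gaps(df, quarterly_cols=('GDP',)):
    """Hypothetical helper: forward-fill quarterly columns into the following months."""
    out = df.sort_values('date').reset_index(drop=True)
    for col in quarterly_cols:
        # limit=2 fills only the two months after each quarterly observation,
        # so a longer gap (such as not-yet-published data) stays NaN
        out[col] = out[col].ffill(limit=2)
    return out
```

Rows that are still missing a value after this step could then be dropped with `dropna` before fitting a model. Forward-filling treats GDP as constant within a quarter, which is a simplification that should be kept in mind when interpreting results.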

Additionally, we are still looking for a reliable third-party source for Netflix's complete stock history, which is crucial for accurate analysis. Once this is resolved, combining the data on Netflix’s stock performance with the macroeconomic factors will make predictive modeling possible.

In [24]:
import requests
import pandas as pd

# API key and symbol
api_key = '7a802e204a08789569034837ff203fb7'
symbol = 'NFLX'

# Fetch stock market data using Alpha Vantage API
url = f'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol={symbol}&apikey={api_key}&outputsize=full'
response = requests.get(url)

# Extract the JSON data
data = response.json()

# Check if the response contains time series data
if "Time Series (Daily)" in data:
    # Convert the time series data into a DataFrame
    time_series = data["Time Series (Daily)"]
    
    # Create a DataFrame from the time series data
    stock_data = pd.DataFrame.from_dict(time_series, orient='index')
    
    # Rename the columns for easier access
    stock_data.columns = ['open', 'high', 'low', 'close', 'volume']
    
    # Convert the index to datetime
    stock_data.index = pd.to_datetime(stock_data.index)
    
    # Convert numeric columns to float
    stock_data = stock_data.astype(float)
    
    # Display the first few rows of the stock data
    print("Stock Data:")
    print(stock_data.head())
else:
    print("No time series data found.")

Stock Data:
              open      high     low   close      volume
2024-10-22  765.27  769.7000  761.12  764.24   2987252.0
2024-10-21  765.76  773.0000  756.60  772.07   6057093.0
2024-10-18  737.64  766.2810  736.23  763.89  15974119.0
2024-10-17  704.35  704.4124  677.88  687.65   8926672.0
2024-10-16  703.43  705.5900  697.82  702.00   2494276.0
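
The stock data is daily while the FRED series are dated on the first day of each month, so the two tables need a common frequency before they can be combined. The sketch below shows one way this might be done: average the daily close within each month, label each month by its first day, and merge on date. The helper names `monthly_close` and `combine_macro_and_stock`, the `NFLX_close` column name, and the choice of a monthly mean (rather than, say, the last close of the month) are our assumptions.

```python
import pandas as pd

def monthly_close(stock_df):
    """Hypothetical helper: average daily close per month, labeled by month start."""
    # 'MS' labels each bin with the first calendar day of the month,
    # which matches the monthly dates in the macroeconomic table
    monthly = stock_df.sort_index()['close'].resample('MS').mean()
    return monthly.rename('NFLX_close').rename_axis('date').reset_index()

def combine_macro_and_stock(macro_df, stock_df):
    """Hypothetical helper: keep only the months present in both tables."""
    return pd.merge(macro_df, monthly_close(stock_df), on='date', how='inner')
```

For example, `combine_macro_and_stock(merged_df, stock_data)` would return one row per month that appears in both tables, with the macroeconomic columns alongside the averaged closing price.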
