# Applying Machine Learning and Search Methods for S&P500 Stock Portfolio Forecasting and Optimization

 Project developed by: **Eduardo Passos** [202205630](https://sigarra.up.pt/fcup/pt/fest_geral.cursos_list?pv_num_unico=202205630), **Pedro Fernandes** [202208347](https://sigarra.up.pt/fcup/pt/fest_geral.cursos_list?pv_num_unico=202208347) and **Rafael Pacheco** [202206258](https://sigarra.up.pt/fcup/pt/fest_geral.cursos_list?pv_num_unico=202206258)

### Index {#index}
1. [Project Introduction](#intro)
2. [Introduction to Stock Concepts](#intro2)
3. [Data Extraction and Collection](#data)
4. [Exploratory Data Analysis](#eda)
5. 

? [Conclusion](#conclusion)
? [References](#ref)

# Project Introduction and Motivation {#intro}

This project develops a well-suited investment strategy based on the S&P500 stock dataset.
The highlights of the group's development process are detailed throughout this report. All of the files used during the project's development can be found in the submitted folder.

In order to predict stock behaviour, we employed:

 - `Deep Learning`: Long Short-Term Memory (LSTM)
 - ...

To optimize portfolio selection, we implemented:

 - `Search Methods`: Monte Carlo Tree Search (MCTS)
 - ...

The stock market is highly volatile and hard to predict, so stock price prediction can seem close to guesswork.

To create strategies that allow investors to efficiently obtain risk-adjusted returns, we can use **S&P500 data** to get a better understanding of how the stock market may behave, based on previously collected data and statistics.

It is important to note that the market does not follow a guaranteed, predictable mathematical pattern. It is influenced by many real-world factors that are independent of a company's growth and significance.

# Introduction to Stock Concepts {#intro2}

In case the reader is unfamiliar with stocks and investing, we decided to briefly explain key concepts used throughout this report.

The S&P500 is a stock market index that tracks the performance of 500 of the largest publicly traded companies in the United States. As requested in the project statement, we used this dataset's information, from 2010 to 2023, to predict the stock behaviour of those companies during January 2024.

### **What are stocks, and why are they an investment?**
Stocks (or shares) represent ownership in a company. Investors buy stocks to gain a portion of a company's profits, or to benefit from an increase in the stock's market value.

### **What are tickers?**
A ticker is a unique symbol assigned to a company's stock, essentially an identifier for each company, in order to facilitate stock tracking:

 - `AAPL`: Apple Inc.
 - `GOOG`: Alphabet Inc. (or, simply put, Google)
 
### **What are opening and closing prices?**
The opening price is the price at which a stock begins trading when the market opens for the day.
The closing price is the price of the last transaction for that stock on that day.

We will be using daily windows in order to predict these prices.

### **What are windows, and how are they helpful during prediction?**
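In time series analysis, a "window" refers to a segment of consecutive data points used for analysis or prediction.
For prediction, a model typically receives the values inside a window as input and estimates the value that follows it.
By using time series analysis, we aim to identify patterns, trends, and seasonal effects in the data.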
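As an illustration of the windowing idea, the sketch below splits a short price series into fixed-size input windows, each paired with the value that follows it. The function name `make_windows`, the window size of 3 and the sample prices are assumptions made for this example; it is not necessarily the exact procedure used later in the project.

```python
import numpy as np

def make_windows(prices, window_size):
    """Split a 1-D price series into (input window, next value) pairs."""
    prices = np.asarray(prices, dtype=float)
    if window_size < 1 or window_size >= len(prices):
        raise ValueError("window_size must be between 1 and len(prices) - 1")
    # Each row of X holds `window_size` consecutive prices;
    # the matching entry of y is the price on the following day.
    X = np.array([prices[i:i + window_size]
                  for i in range(len(prices) - window_size)])
    y = prices[window_size:]
    return X, y

# Hypothetical closing prices for six consecutive trading days
closes = [100.0, 101.5, 99.8, 102.3, 103.1, 104.0]
X, y = make_windows(closes, window_size=3)
print(X.shape, y.shape)  # (3, 3) (3,)
```

With six prices and a window of three days, this yields three training pairs: the first window covers days 1 to 3 and its target is the price on day 4.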

### **Most importantly, how can I gain or lose money by investing?**

A positive return indicates profit, while a negative return signifies a loss. 
These are typically expressed as a percentage of the original investment. 

Imagine the investor purchases stock at 100:

 - Stock price increases from 100 to 110 -> the return is 10% -> <span style="color:green">Profit!</span>

 - Stock price decreases from 100 to 90 -> the return is -10% -> <span style="color:red">Loss!</span>

Market fluctuations dictate stock prices, which in turn determine profit or loss for investors.
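The percentage return from the example above can be computed as the price change divided by the purchase price. The helper below is a minimal sketch; the name `simple_return` is chosen for illustration only, and the calculation ignores fees, dividends and taxes.

```python
def simple_return(buy_price, sell_price):
    """Return the percentage gain or loss relative to the purchase price."""
    if buy_price <= 0:
        raise ValueError("buy_price must be positive")
    return (sell_price - buy_price) * 100 / buy_price

gain = simple_return(100, 110)  # price rises from 100 to 110
loss = simple_return(100, 90)   # price falls from 100 to 90
print(f"{gain:.1f}%  {loss:.1f}%")  # 10.0%  -10.0%
```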

# Data Extraction and Collection {#data}

To extract the 2010-2023 section of the dataset, we used the `yfinance` module.

We also access [Wikipedia](https://en.wikipedia.org/wiki/List_of_S%26P_500_companies) to download a table containing the list of S&P500 tickers.

The functions below document the extraction and collection process.

In [1]:
import pandas as pd
import yfinance as yf

In [None]:
# Step 1: Get the list of S&P 500 companies
def get_sp500_tickers():
    """Return the current S&P 500 ticker symbols in Yahoo Finance format."""
    # Download the constituents table from Wikipedia (the first table on the page)
    url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
    table = pd.read_html(url)[0]
    tickers = table['Symbol'].tolist()

    # Yahoo Finance uses '-' instead of '.' in share-class tickers (e.g. BRK.B -> BRK-B)
    tickers = [ticker.replace('.', '-') for ticker in tickers]
    return tickers
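The symbol clean-up step can be checked in isolation, without any network access. The snippet below is a small sketch: the helper name `to_yahoo_symbol` and the sample symbols are chosen for illustration and are not part of the project code.

```python
def to_yahoo_symbol(ticker):
    """Convert a dotted share-class ticker to Yahoo Finance's hyphenated form."""
    return ticker.replace(".", "-")

# Sample symbols as they might appear in the 'Symbol' column
symbols = ["AAPL", "BRK.B", "BF.B"]
converted = [to_yahoo_symbol(s) for s in symbols]
print(converted)  # ['AAPL', 'BRK-B', 'BF-B']
```

Tickers without a dot, such as `AAPL`, pass through unchanged.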

In [None]:
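# Step 2: Download data for each stock
def download_sp500_data(tickers, start_date="2010-01-01", end_date="2024-12-31"):
    """Download daily price history per ticker; returns a dict of DataFrames keyed by ticker."""
    data = {}
    for ticker in tickers:
        print(f"Downloading data for {ticker}...")
        try:
            # yfinance treats `end` as exclusive, so the last day fetched is the one before end_date
            data[ticker] = yf.download(ticker, start=start_date, end=end_date)
        except Exception as e:
            # Keep going so that one failing ticker does not abort the whole run
            print(f"Error downloading {ticker}: {e}")
    return data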

In [None]:
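# Step 3: Save each ticker's data to its own CSV file
def save_data_to_csv(data):
    """Write one `<ticker>.csv` file per non-empty DataFrame in the current directory."""
    for ticker, df in data.items():
        if not df.empty:
            df.to_csv(f"{ticker}.csv")
        else:
            print(f"No data for {ticker}.")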

The calls below are kept in a `markdown` cell to avoid unnecessary execution, as all of the files are already available in the `all_sp500_csvs_2010-24` folder.

```py
tickers = get_sp500_tickers()
sp500_data = download_sp500_data(tickers)
save_data_to_csv(sp500_data)
```

The data has been downloaded through December 2024.