# A Deep Dive into the S&P 500: Predicting Stock Prices
Kanishk Chinnapapannagari, Aarav Naveen, Avyay Potarlanka, and Melvin Rajendran

## Introduction

In today’s evolving financial landscape, both investors and traders are constantly seeking an edge to make informed decisions. The stock market, which contains an intricate web of variables and is influenced by numerous factors, has proven to be a difficult environment to navigate.

In the past, investment-related decisions were often made based on analysis of historical trends. However, the advancement of data science and machine learning techniques has introduced a new opportunity to potentially predict future stock prices with reasonable accuracy and thus gain valuable insights.

This data science project delves into the prediction of stock prices within the Standard & Poor’s 500 index, otherwise known as the S&P 500. This index tracks 500 of the largest publicly traded companies in the United States, and it represents approximately 80% of the U.S. stock market’s total value. Hence, it serves as a strong indicator of overall market movement. To learn more about the S&P 500 and other popular indices in the U.S., read this article: https://www.investopedia.com/insights/introduction-to-stock-market-indices/.

Throughout this project, we will follow a comprehensive data science approach that includes the following steps:

1. Data collection
2. Data processing
3. Exploratory data analysis and data visualization
4. Data analysis, hypothesis testing, and machine learning (ML)
5. Insight formation

Our project aims to leverage predictive modeling techniques to provide insights to investors. The analysis herein aims to identify stocks that appear undervalued and may therefore increase in price in the near future, meaning investors could consider buying or holding shares. Likewise, it aims to identify stocks that appear overvalued and may soon decrease in price, suggesting that investors could consider selling their positions.

In [1]:
# Import necessary libraries
import numpy as np
import os
import pandas as pd

## Data Collection

In [2]:
# Initialize a list to collect one data frame per ticker
frames = []

# Initialize the path to the folder containing the data
folder_path = 'sp500-data'

# Iterate across each file in the folder by name
for file_name in os.listdir(folder_path):
    # Check if the current file is a CSV file
    if file_name.endswith('.csv'):
        # Read the current file into a temporary data frame
        temp = pd.read_csv(os.path.join(folder_path, file_name))

        # Extract the ticker from the current file's name by dropping the extension
        ticker = os.path.splitext(file_name)[0]

        # Store the ticker in a new column in the temporary data frame
        temp['Ticker'] = ticker

        # Add the temporary data frame to the list
        frames.append(temp)

# Concatenate all collected data frames once, which avoids repeatedly copying
# the accumulated data; fall back to an empty data frame if no files were found
data = pd.concat(frames, ignore_index = True) if frames else pd.DataFrame()

# Reindex the data frame's columns
data = data.reindex(columns = ['Ticker', 'Date', 'Open', 'High', 'Low', 'Close', 'Adjusted Close', 'Volume'])

# Print the first five rows of the data frame
data.head()

| | Ticker | Date | Open | High | Low | Close | Adjusted Close | Volume |
|---|---|---|---|---|---|---|---|---|
| 0 | A | 18-11-1999 | 32.546494 | 35.765381 | 28.612303 | 31.473534 | 26.92976 | 62546380.0 |
| 1 | A | 19-11-1999 | 30.713518 | 30.758226 | 28.478184 | 28.880545 | 24.711119 | 15234146.0 |
| 2 | A | 22-11-1999 | 29.551144 | 31.473534 | 28.657009 | 31.473534 | 26.92976 | 6577870.0 |
| 3 | A | 23-11-1999 | 30.400572 | 31.205294 | 28.612303 | 28.612303 | 24.481602 | 5975611.0 |
| 4 | A | 24-11-1999 | 28.701717 | 29.998213 | 28.612303 | 29.372318 | 25.131901 | 4843231.0 |
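
The dates in the output above are strings in day-month-year order (for example, 18-11-1999). Before any time-based analysis, they would likely need to be converted to proper datetime values. The following is a minimal sketch of one way to do that, not necessarily the approach taken later in this project. The `sample` data frame is a hypothetical two-row stand-in for `data`, built only so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for `data`, mirroring the columns and date format shown above
sample = pd.DataFrame({
    'Ticker': ['A', 'A'],
    'Date': ['19-11-1999', '18-11-1999'],
    'Adjusted Close': [24.711119, 26.92976],
})

# Parse the day-month-year strings with an explicit format so that an
# ambiguous date such as 03-04-2000 is read as 3 April rather than 4 March
sample['Date'] = pd.to_datetime(sample['Date'], format = '%d-%m-%Y')

# Sort the rows chronologically within each ticker
sample = sample.sort_values(['Ticker', 'Date']).reset_index(drop = True)

# Print the resulting column types and rows
print(sample.dtypes)
print(sample)
```

Passing an explicit `format` is a deliberate choice here: without it, pandas has to infer whether the day or the month comes first, which can silently misread dates where both values are 12 or less.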


## Data Processing

## Exploratory Data Analysis and Data Visualization

## Data Analysis, Hypothesis Testing, and Machine Learning

## Insights