# Stock Analysis

## Getting and Cleaning Data
I want to analyze stocks that have a ~15% day-over-day price increase and see if we can predict which stocks will pop.
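
As a rough illustration of what a "pop" means here, the sketch below flags trading days whose close is at least 15% above the previous close. The threshold constant, the `flag_pops` helper, and the sample prices are assumptions made for illustration; they are not part of the analysis that follows.

```python
import pandas as pd

POP_THRESHOLD = 0.15  # assumed cutoff for a "pop" (15% day-over-day)


def flag_pops(close: pd.Series, threshold: float = POP_THRESHOLD) -> pd.Series:
    """Return a boolean Series that is True where the day-over-day return >= threshold."""
    daily_return = close.pct_change()  # (today - yesterday) / yesterday; first day is NaN
    return daily_return >= threshold   # NaN compares as False


# Hypothetical closing prices for four business days
close = pd.Series(
    [10.0, 10.2, 12.0, 11.9],
    index=pd.date_range("2023-01-02", periods=4, freq="B"),
    name="Close",
)
pops = flag_pops(close)
print(pops)
```

Only the third day is flagged, since 12.0 is roughly 17.6% above 10.2.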

In [1]:
import pandas as pd
import numpy as np
import yfinance as yf
import requests
import os
from datetime import date


## Getting Data
We want to analyze all stocks on the NYSE. We will use historical daily trading data for the last 20 years. Stocks that have not been publicly traded in the last 10 years will be filtered out.

The first step is to download the list of publicly traded NYSE stocks from https://www.nasdaq.com/market-activity/stocks/screener and place it in your data folder. I could not find a direct link to the file on the website, so this has to be done manually. I renamed the CSV file to "nyse_stock_tickers.csv".

Once this is done we can load in the ticker data.

In [21]:
url = 'https://raw.githubusercontent.com/rreichel3/US-Stock-Symbols/main/nyse/nyse_tickers.txt'
r = requests.get(url, allow_redirects=True)


In [22]:
all_nyse_stock_tickers = r.text.split()
print(all_nyse_stock_tickers)

['A', 'AA', 'AAC', 'AACT', 'AAIC', 'AAIC', 'B', 'AAIC', 'C', 'AAIN', 'AAM', 'A', 'AAM', 'B', 'AAN', 'AAP', 'AAT', 'AB', 'ABBV', 'ABC', 'ABEV', 'ABG', 'ABM', 'ABR', 'ABR', 'D', 'ABR', 'E', 'ABR', 'F', 'ABT', 'AC', 'ACA', 'ACCO', 'ACEL', 'ACHR', 'ACI', 'ACM', 'ACN', 'ACP', 'ACP', 'A', 'ACR', 'ACR', 'C', 'ACR', 'D', 'ACRE', 'ACRO', 'ACV', 'ADC', 'ADC', 'A', 'ADCT', 'ADM', 'ADNT', 'ADT', 'ADX', 'AEE', 'AEFC', 'AEG', 'AEL', 'AEL', 'A', 'AEL', 'B', 'AEM', 'AENZ', 'AEO', 'AER', 'AES', 'AESC', 'AESI', 'AEVA', 'AFB', 'AFG', 'AFGB', 'AFGC', 'AFGD', 'AFGE', 'AFL', 'AFT', 'AFTR', 'AG', 'AGAC', 'AGCO', 'AGD', 'AGI', 'AGL', 'AGM', 'AGM', 'C', 'AGM', 'D', 'AGM', 'E', 'AGM', 'F', 'AGM', 'G', 'AGO', 'AGR', 'AGRO', 'AGS', 'AGTI', 'AGX', 'AHH', 'AHH', 'A', 'AHL', 'C', 'AHL', 'D', 'AHL', 'E', 'AHT', 'AHT', 'D', 'AHT', 'F', 'AHT', 'G', 'AHT', 'H', 'AHT', 'I', 'AI', 'AIC', 'AIF', 'AIG', 'AIG', 'A', 'AIN', 'AIO', 'AIR', 'AIRC', 'AIT', 'AIU', 'AIV', 'AIZ', 'AIZN', 'AJG', 'AJRD', 'AJX', 'AJXA', 'AKA', 'AKO', '
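
The output above contains stray single-letter entries such as `'B'` and `'C'` right after repeated symbols, which suggests that `r.text.split()` is breaking multi-part symbols (share classes or preferred series) on internal whitespace. A minimal sketch of the difference, assuming the file stores one symbol per line; the `raw_text` sample is hypothetical and only mimics the pattern seen in the output:

```python
# Hypothetical excerpt of the ticker file: one symbol per line,
# with a whitespace-separated series suffix on some lines (assumption).
raw_text = "A\nAA\nAAIC\nAAIC B\nAAIC C\n"

# Splitting on any whitespace tears "AAIC B" into two entries.
whitespace_split = raw_text.split()

# Splitting on line boundaries keeps each symbol intact.
line_split = [line.strip() for line in raw_text.splitlines() if line.strip()]

print(whitespace_split)
print(line_split)
```

If the real file follows this layout, swapping `split()` for `splitlines()` would also remove the apparent duplicates from the list.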

In [23]:
# Fields to pull from the Yahoo Finance company profile
company_details_to_keep = ['country', 'industry', 'industryDisp', 'sector', 'longBusinessSummary',
                           'marketCap', 'sharesOutstanding', 'exchange']

def get_company_information(ticker):
    """Return the selected profile fields for `ticker`, in the order listed above."""
    yahoo_finance_company_object = yf.Ticker(ticker)
    # Read .info once instead of on every loop iteration
    company_info = yahoo_finance_company_object.info
    return [company_info.get(field) for field in company_details_to_keep]

In [24]:
results = get_company_information(all_nyse_stock_tickers[0])

In [18]:
company_details

['United States',
 'Software—Infrastructure',
 'Software—Infrastructure',
 'Technology',
 'Microsoft Corporation develops, licenses, and supports software, services, devices, and solutions worldwide. The company operates in three segments: Productivity and Business Processes, Intelligent Cloud, and More Personal Computing. The Productivity and Business Processes segment offers Office, Exchange, SharePoint, Microsoft Teams, Office 365 Security and Compliance, Microsoft Viva, and Skype for Business; Skype, Outlook.com, OneDrive, and LinkedIn; and Dynamics 365, a set of cloud-based and on-premises business solutions for organizations and enterprise divisions. The Intelligent Cloud segment licenses SQL, Windows Servers, Visual Studio, System Center, and related Client Access Licenses; GitHub that provides a collaboration platform and code hosting service for developers; Nuance provides healthcare and enterprise AI solutions; and Azure, a cloud platform. It also offers enterprise support, M

There are 7632 rows and 11 columns, and every row has a unique index value.  

`IPO Year` is missing for a large share of the companies. One company is missing its ticker. Roughly 500 companies are missing market cap and country values, and around 600 are missing their sector and industry as well.

`Last Sale` and `% Change` are stored as strings and need to be converted to floats.
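
One way to do that conversion is sketched below, assuming the values look like `$1.90` and `5.556%` as in the screener export. The sample frame is hypothetical, and the thousands-separator handling is an assumption about how larger prices might be formatted.

```python
import pandas as pd

# Hypothetical rows using the same column names as the screener export
sample = pd.DataFrame(
    {
        "Last Sale": ["$1.90", "$1,234.50"],
        "% Change": ["5.556%", "-0.25%"],
    }
)

# Strip the currency symbol and any thousands separators, then cast to float
sample["Last Sale"] = (
    sample["Last Sale"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

# Strip the trailing percent sign; values stay in percent units (5.556 means 5.556%)
sample["% Change"] = sample["% Change"].str.rstrip("%").astype(float)

print(sample.dtypes)
```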

I really only care about the tickers right now, since we'll use them to gather historical stock data, so I'm only going to focus on ensuring that all symbols are accounted for. There also may be issues with this dat

Let's check if there are any duplicate rows.

In [28]:
print(f"There are {tickers.duplicated().sum()} duplicated rows")

There are 0 duplicated rows


In [29]:
# check company w/missing ticker
missing_company_ticker = tickers[tickers['Symbol'].isna()]
missing_company_ticker.Name

4682    Nano Labs Ltd American Depositary Shares
Name: Name, dtype: object

In [30]:
# Nano Labs' ticker is the literal string "NA"; set it explicitly
tickers.loc[missing_company_ticker.index, 'Symbol'] = 'NA'

# sanity check
tickers.loc[missing_company_ticker.index]

Unnamed: 0,Symbol,Name,Last Sale,Net Change,% Change,Market Cap,Country,IPO Year,Volume,Sector,Industry
4682,,Nano Labs Ltd American Depositary Shares,$1.90,0.1,5.556%,52960923.0,China,2022.0,27710,Technology,Semiconductors
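
The missing ticker is most likely a parsing artifact rather than a gap in the data: by default `pd.read_csv` treats the string `NA` as a missing value. The sketch below shows the default behaviour and one way to keep the literal string, assuming the tickers were loaded with `pd.read_csv` (the loading cell is not shown in this notebook). The two-row CSV is a made-up sample.

```python
import io

import pandas as pd

# Made-up two-row sample in the same shape as the screener export
csv_text = (
    "Symbol,Name\n"
    "NA,Nano Labs Ltd American Depositary Shares\n"
    "A,Agilent Technologies Inc.\n"
)

# Default parsing: the string "NA" becomes NaN
default_parse = pd.read_csv(io.StringIO(csv_text))

# Disable the default NA strings and treat only empty fields as missing
literal_parse = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=[""],
)

print(default_parse["Symbol"].isna().sum())
print(literal_parse["Symbol"].tolist())
```

Note that `keep_default_na=False` applies to every column, so other placeholder strings such as `N/A` would also be kept as text unless they are listed in `na_values`.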


In [31]:
tickers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7632 entries, 0 to 7631
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Symbol      7632 non-null   object 
 1   Name        7632 non-null   object 
 2   Last Sale   7632 non-null   object 
 3   Net Change  7632 non-null   float64
 4   % Change    7632 non-null   object 
 5   Market Cap  7193 non-null   float64
 6   Country     7117 non-null   object 
 7   IPO Year    4454 non-null   float64
 8   Volume      7632 non-null   int64  
 9   Sector      7012 non-null   object 
 10  Industry    7012 non-null   object 
dtypes: float64(3), int64(1), object(7)
memory usage: 656.0+ KB


Let's now get the historical stock data for the last 10 years.

In [27]:
only_tickers = tickers.Symbol

NameError: name 'tickers' is not defined

In [28]:
current_company = yf.Ticker("AAPL")

In [31]:

msft = yf.Ticker("MSFT")

# get all stock info
msft.info

# get historical market data
hist = msft.history(period="1mo")


HTTPError: 401 Client Error: Unauthorized for url: https://query1.finance.yahoo.com/v7/finance/quote?formatted=true&lang=en-US&symbols=MSFT
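
The 401 above appears to come from the `msft.info` access, which makes its own request to Yahoo's quote endpoint, so the `history` call is never reached. Upgrading `yfinance` may resolve it, since Yahoo's endpoints change from time to time. Independently of that, a defensive wrapper keeps one failed lookup from stopping the whole run. The sketch below is a minimal version under these assumptions: `safe_company_info` is a hypothetical helper, and the two stand-in classes only imitate an object with an `info` attribute so the example runs without network access.

```python
def safe_company_info(ticker_obj, fields):
    """Return {field: value} from ticker_obj.info, with None values if the lookup fails."""
    try:
        info = ticker_obj.info  # in yfinance this attribute access triggers a request
    except Exception as exc:  # broad on purpose: network and HTTP errors vary by version
        print(f"info lookup failed: {exc}")
        info = {}
    return {field: info.get(field) for field in fields}


# Stand-ins for yf.Ticker objects (illustration only)
class _FailingTicker:
    @property
    def info(self):
        raise RuntimeError("401 Client Error: Unauthorized")


class _WorkingTicker:
    info = {"sector": "Technology"}


failed_lookup = safe_company_info(_FailingTicker(), ["sector"])
working_lookup = safe_company_info(_WorkingTicker(), ["sector", "country"])
print(failed_lookup, working_lookup)
```

With real data the call would look like `safe_company_info(yf.Ticker("MSFT"), company_details_to_keep)`.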

In [77]:
all_stock_data = []
today_date = str(date.today())
start_date = "2010-01-01"
for ticker in only_tickers:
    # Download daily history for one ticker at a time
    stock_data = yf.download(ticker, start=start_date, end=today_date)
    all_stock_data.append(stock_data)

print(len(all_stock_data))

[*********************100%***********************]  1 of 1 completed
[...]

Failed downloads (output truncated in places):
- AACIW: Period 'max' is invalid, must be one of ['1d', '5d']
- AAIC^B: No timezone found, symbol may be delisted
- AAIC^C: No timezone found, symbol may be delisted
- AL^A: No timezone found, symbol may be delisted
- APXIW: Period 'max' is invalid, must be one of ['1d', '5d']
- AQUNR: Period 'max' is invalid, must be one of ['1d', '5d']
- AURCW: Period 'max' is invalid, must be one of ['1d', '5d']
- AUROW: Period 'max' is invalid, must be one of ['1d', '5d']
- AUUDW: Period 'max' is invalid, must be one of ['1d', '5d']
- BFIIW: Period 'max' is invalid, must be one of ['1d', '5d']
- BFRGW: Period 'max' is invalid, must be one of ['1d', '5d']
- CBRGW: Period 'max' is invalid, must be one of ['1d', '5d']
- CITEW: Period 'max' is invalid, must be one of ['1d', '5d']
- DBGIW: Period 'max' is invalid, must be one of ['1d', '5d']
- DBRG^H: No timezone found, symbol may be delisted
- DBRG^I: No timezone found, symbol may be delisted
- DBRG^J: No timezone found, symbol may be delisted
- DLR^J: No timezone found, symbol may be delisted
- DLR^K: No timezone found, symbol may be delisted
- DLR^L: No timezone found, symbol may be delisted
- EFTRW: Period 'max' is invalid, must be one of ['1d', '5d']
- EP^C: No timezone found, symbol may be delisted
- FTIIW: Period 'max' is invalid, must be one of ['1d', '5d']
- GLLIR: Period 'max' is invalid, must be one of ['1d', '5d']
- GLLIW: Period 'max' is invalid, must be one of ['1d', '5d']
- HAIAW: Period 'max' is invalid, must be one of ['1d', '5d']
- HTZWW: Period 'max' is invalid, must be one of ['1d', '5d']
- HUBCW: Period 'max' is invalid, must be one of ['1d', '5d']
- INTEW: Period 'max' is invalid, must be one of ['1d', '5d']
- IVDAW: Period 'max' is invalid, must be one of ['1d', '5d']
- IVR^B: No timezone found, symbol may be delisted
- IVR^C: No timezone found, symbol may be delisted

KeyboardInterrupt: 
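
The failures logged above fall into two groups: symbols containing `^` (which look like preferred share series) and symbols such as warrants or rights for which no long history is available. A possible way to keep the loop going while recording what failed is sketched below. The `^` filter is a heuristic based only on this log, `collect_history` and its `download` argument are hypothetical, and the fake downloader stands in for `yf.download` so the example runs offline. It also assumes a failed download comes back as an empty frame.

```python
def is_probably_downloadable(symbol):
    """Heuristic from the log above: every symbol containing '^' failed to download."""
    return "^" not in symbol


def collect_history(symbols, download):
    """Download each symbol via `download(symbol)`; return (histories, failed_symbols).

    `download` must return a DataFrame-like object exposing `.empty`.
    """
    histories, failed = {}, []
    for symbol in symbols:
        if not is_probably_downloadable(symbol):
            failed.append(symbol)  # skipped without making a request
            continue
        try:
            frame = download(symbol)
        except Exception:
            failed.append(symbol)
            continue
        if frame.empty:  # assumed signal for a failed download
            failed.append(symbol)
        else:
            histories[symbol] = frame
    return histories, failed


# Offline stand-ins for a DataFrame and for yf.download (illustration only)
class _Frame:
    def __init__(self, empty):
        self.empty = empty


def _fake_download(symbol):
    if symbol == "BOOM":
        raise RuntimeError("simulated network error")
    return _Frame(empty=(symbol == "AACIW"))


histories, failed = collect_history(["A", "AAIC^B", "AACIW", "BOOM"], _fake_download)
print(list(histories), failed)
```

With real data the call could look like `collect_history(only_tickers, lambda s: yf.download(s, start=start_date, end=today_date))`.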

[*********************100%***********************]  2 of 2 completed


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 81 entries, 2017-01-03 to 2017-04-28
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   (Adj Close, AAPL)  81 non-null     float64
 1   (Adj Close, SPY)   81 non-null     float64
 2   (Close, AAPL)      81 non-null     float64
 3   (Close, SPY)       81 non-null     float64
 4   (High, AAPL)       81 non-null     float64
 5   (High, SPY)        81 non-null     float64
 6   (Low, AAPL)        81 non-null     float64
 7   (Low, SPY)         81 non-null     float64
 8   (Open, AAPL)       81 non-null     float64
 9   (Open, SPY)        81 non-null     float64
 10  (Volume, AAPL)     81 non-null     int64  
 11  (Volume, SPY)      81 non-null     int64  
dtypes: float64(10), int64(2)
memory usage: 8.2 KB
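
When several tickers are downloaded together, the columns form a two-level index of (field, ticker), as in the summary above. A short sketch of how such a frame can be sliced, using a small hand-built frame with made-up prices in place of a real download:

```python
import pandas as pd

# Hand-built frame with the same (field, ticker) column layout; values are made up
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL", "SPY"]])
prices = pd.DataFrame(
    [[100.0, 200.0, 10, 20], [110.0, 202.0, 11, 21]],
    index=pd.to_datetime(["2017-01-03", "2017-01-04"]),
    columns=columns,
)

# Select one field for all tickers: columns become the ticker symbols
close = prices["Close"]

# Select all fields for one ticker: columns become the field names
aapl = prices.xs("AAPL", axis=1, level=1)

# Day-over-day returns per ticker, computed column by column
daily_returns = close.pct_change()

print(close)
print(aapl)
print(daily_returns)
```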
