# Company financials analysis
This notebook focuses on strategies to analyse a stock symbol's potential or performance based on its company's basic information and financial data.

Because most APIs impose rate limits, we restrict ourselves to a limited dataset covering only the companies in the S&P 500 index.

### Packages
Please ensure these packages are installed in your local environment via `pip install -r requirements.txt` or the corresponding package manager on your OS.

In [5]:
import csv
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import yfinance as yf
from sklearn.preprocessing import StandardScaler


%matplotlib inline

### 1. Gathering basic financial data on S&P 500 stock symbols
First, we load the basic financial data for every stock symbol in the S&P 500 index. The data can be downloaded from [here](https://datahub.io/core/s-and-p-500-companies-financials).

In [2]:
sq500_financials = pd.read_csv('s&q500_financials.csv')
sq500_financials.head()

| | Symbol | Name | Sector | Price | Price/Earnings | Dividend Yield | Earnings/Share | 52 Week Low | 52 Week High | Market Cap | EBITDA | Price/Sales | Price/Book |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MMM | 3M Company | Industrials | 222.89 | 24.31 | 2.332862 | 7.92 | 259.77 | 175.49 | 138721000000.0 | 9048000000.0 | 4.390271 | 11.34 |
| 1 | AOS | A.O. Smith Corp | Industrials | 60.24 | 27.76 | 1.147959 | 1.7 | 68.39 | 48.925 | 10783420000.0 | 601000000.0 | 3.575483 | 6.35 |
| 2 | ABT | Abbott Laboratories | Health Care | 56.27 | 22.51 | 1.908982 | 0.26 | 64.6 | 42.28 | 102121000000.0 | 5744000000.0 | 3.74048 | 3.19 |
| 3 | ABBV | AbbVie Inc. | Health Care | 108.48 | 19.41 | 2.49956 | 3.29 | 125.86 | 60.05 | 181386000000.0 | 10310000000.0 | 6.291571 | 26.14 |
| 4 | ACN | Accenture plc | Information Technology | 150.51 | 25.47 | 1.71447 | 5.44 | 162.6 | 114.82 | 98765860000.0 | 5643228000.0 | 2.604117 | 10.62 |


In [13]:
sq500_data = sq500_financials.drop(['Symbol', 'Name', 'Sector', '52 Week Low', '52 Week High'], axis=1)
sq500_data

| | Price | Price/Earnings | Dividend Yield | Earnings/Share | Market Cap | EBITDA | Price/Sales | Price/Book |
|---|---|---|---|---|---|---|---|---|
| 0 | 222.89 | 24.31 | 2.332862 | 7.92 | 1.387210e+11 | 9.048000e+09 | 4.390271 | 11.34 |
| 1 | 60.24 | 27.76 | 1.147959 | 1.70 | 1.078342e+10 | 6.010000e+08 | 3.575483 | 6.35 |
| 2 | 56.27 | 22.51 | 1.908982 | 0.26 | 1.021210e+11 | 5.744000e+09 | 3.740480 | 3.19 |
| 3 | 108.48 | 19.41 | 2.499560 | 3.29 | 1.813860e+11 | 1.031000e+10 | 6.291571 | 26.14 |
| 4 | 150.51 | 25.47 | 1.714470 | 5.44 | 9.876586e+10 | 5.643228e+09 | 2.604117 | 10.62 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 500 | 70.24 | 30.94 | 1.170079 | 1.83 | 1.291502e+10 | 7.220000e+08 | 2.726209 | 5.31 |
| 501 | 76.30 | 27.25 | 1.797080 | 4.07 | 2.700330e+10 | 2.289000e+09 | 6.313636 | 212.08 |
| 502 | 115.53 | 14.32 | 0.794834 | 9.01 | 2.445470e+10 | 2.007400e+09 | 3.164895 | 2.39 |
| 503 | 50.71 | 17.73 | 1.480933 | 2.60 | 1.067068e+10 | 0.000000e+00 | 3.794579 | 1.42 |


The features span very different orders of magnitude (e.g. price against market cap), so we standardise the data to stop the largest-valued features from dominating any later analysis.

In [11]:
scaler = StandardScaler()
sq500_data_scaled = scaler.fit_transform(sq500_data)
sq500_data_scaled.shape

(505, 10)

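We now have a NumPy `ndarray` in which each feature is standardised to zero mean and unit variance, ready for machine learning techniques. The recorded shape (505, 10) appears to come from an earlier run of the notebook (cell `In [11]` was executed before `In [13]`); with the eight feature columns shown above, re-running the cell should give (505, 8).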
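To make the effect of the scaling step concrete, here is a minimal sketch of the transformation that `StandardScaler` applies with its default settings: each column has its mean subtracted and is divided by its population standard deviation. The toy values and the `standardise` helper are hypothetical and only serve as illustration; they are not part of the notebook's dataset or of scikit-learn.

```python
import numpy as np

# Hypothetical toy data (not from the dataset): column 0 plays the role of a
# share price, column 1 of a market capitalisation, so the two columns differ
# in scale by many orders of magnitude.
toy = np.array([
    [10.0, 1.0e9],
    [20.0, 3.0e9],
    [30.0, 5.0e9],
])


def standardise(values: np.ndarray) -> np.ndarray:
    """Rescale each column to zero mean and unit variance (population std, ddof=0)."""
    return (values - values.mean(axis=0)) / values.std(axis=0)


toy_scaled = standardise(toy)
print(toy_scaled)
```

After scaling, both toy columns contain the same values, because they differ only by a shift and a positive scale factor. This is the sense in which standardisation puts price and market cap on an equal footing.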

### 2. Finding relationships between features

We can first get an idea of how different features are related to each other using a correlation matrix.
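
For reference, pandas' `DataFrame.corr()` computes the Pearson correlation coefficient by default. For two features $x$ and $y$ observed over $n$ companies, with means $\bar{x}$ and $\bar{y}$, it is defined as:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The value lies between -1 and 1, and it measures only linear association, so a value near 0 does not rule out a non-linear relationship.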

In [14]:
sq500_data.corr()

| | Price | Price/Earnings | Dividend Yield | Earnings/Share | Market Cap | EBITDA | Price/Sales | Price/Book |
|---|---|---|---|---|---|---|---|---|
| Price | 1.0 | 0.194761 | -0.24463 | 0.591061 | 0.406474 | 0.180321 | 0.181625 | 0.023637 |
| Price/Earnings | 0.194761 | 1.0 | -0.18133 | 0.00457 | 0.131381 | 0.00295 | 0.188558 | 0.000366 |
| Dividend Yield | -0.24463 | -0.18133 | 1.0 | -0.077235 | -0.021794 | 0.126133 | -0.077724 | 0.089369 |
| Earnings/Share | 0.591061 | 0.00457 | -0.077235 | 1.0 | 0.194063 | 0.180583 | -0.036178 | 0.032254 |
| Market Cap | 0.406474 | 0.131381 | -0.021794 | 0.194063 | 1.0 | 0.771344 | 0.095249 | 0.034411 |
| EBITDA | 0.180321 | 0.00295 | 0.126133 | 0.180583 | 0.771344 | 1.0 | -0.037175 | 0.035547 |
| Price/Sales | 0.181625 | 0.188558 | -0.077724 | -0.036178 | 0.095249 | -0.037175 | 1.0 | 0.012337 |
| Price/Book | 0.023637 | 0.000366 | 0.089369 | 0.032254 | 0.034411 | 0.035547 | 0.012337 | 1.0 |


We note that some features are barely correlated with any other feature. For example, the strongest correlation for the P/B ratio (`Price/Book`) is only 0.089, with dividend yield.
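
Reading the strongest correlation for each feature off the matrix by eye is error-prone, so here is a rough sketch of how it could be automated. The helper `strongest_correlations` and the toy columns `a`, `b` and `c` are hypothetical names introduced for illustration; the sketch assumes a DataFrame of numeric columns such as `sq500_data`.

```python
import numpy as np
import pandas as pd


def strongest_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """For each column, find the other column it is most strongly correlated with."""
    corr = df.corr()
    # Blank out the diagonal so a feature's correlation with itself is ignored.
    off_diag = corr.mask(np.eye(len(corr), dtype=bool))
    # Compare by absolute value so strong negative correlations count too.
    partner = off_diag.abs().idxmax(axis=1)
    value = pd.Series(
        [off_diag.loc[feature, other] for feature, other in partner.items()],
        index=partner.index,
    )
    return pd.DataFrame({"partner": partner, "correlation": value})


# Hypothetical toy data, not taken from the dataset.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.9],  # nearly proportional to "a"
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],  # only loosely related to the others
})
result = strongest_correlations(toy)
print(result)
```

Calling `strongest_correlations(sq500_data)` in the notebook should give one row per feature, which makes weakly related features such as `Price/Book` easy to spot.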