<hr/>

# EDA - Warren Buffett US Stock Companies
### **[by Tomas Mantero](https://www.kaggle.com/tomasmantero)**
<hr/>

### Table of Contents
* **1. [Introduction](#ch1)**
* **2. [Data preparation](#ch2)**
    * 2.1 Load Data
    * 2.2 Check Features
    * 2.3 Clean Dataframes
    * 2.4 Concatenate The Stock Dataframes
* **3. [Exploratory Data Analysis](#ch3)**
    * 3.1 Berkshire Hathaway Portfolio
    * 3.2 New Return Dataframes
    * 3.3 Stock Standard Deviation
    * 3.4 Stocks Pearson Correlation Matrix
    * 3.5 Stocks Cluster Map
* **4. [Web Scraping Yahoo Finance](#ch4)**
    * 4.1 Sector, Industry and Number of Employees
    * 4.2 Load New Data
    * 4.3 Sunburst Charts Sectors and Industries
* **5. [Financial Charts](#ch5)**
    * 5.1 Line Charts
    * 5.2 Histograms Charts
    * 5.3 Moving Averages Charts
    * 5.4 Candlestick Charts
    * 5.5 Bollinger Band Charts
    * 5.6 OHLC Charts
* **6. [Predictions and Conclusion](#ch6)**
    * 6.1 Investment Recommendations
* **7. [References](#ch7)**

<a id="ch1"></a>
# 1. Introduction
---
In this notebook we will be analyzing the companies that make up Warren Buffett's portfolio. We will analyze their behaviors over the past few years, as well as their behaviors during the COVID-19 Pandemic.

We will try to determine new trends that can help investors make more informed decisions.

This notebook has five main parts:

* The data preparation
* The exploratory data analysis
* The web scraping
* The financial charts
* The predictions and conclusions

<img src="https://wallpapercave.com/wp/wp3746394.jpg" width="500" height="500"/>
<br>

*Note: The analysis of Warren Buffett's portfolio was conducted on October 16, 2020. Some of the data may be outdated by the time you read this notebook, although I will try to keep it updated.*

<a id="ch2"></a>
# 2. Data preparation
---

### Imports

In [None]:
# data analysis and wrangling
import pandas as pd
from pandas_datareader import data
import numpy as np
import random as rnd
import datetime

# visualization
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('whitegrid')
%matplotlib inline

# plotly
import plotly
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import cufflinks as cf
cf.go_offline()

# web scraping
import requests
import bs4
import csv
import json 
import re
from io import StringIO

## 2.1 Load Data
There are two ways to load the data. The first is to simply read the csv files. The second is to use pandas-datareader, which lets you read stock information directly from the internet. In this case we will use Yahoo Finance.

**Documentation: [Remote Data Access](https://pandas-datareader.readthedocs.io/en/latest/remote_data.html)**

In [None]:
# (The data ranges from 10/22/2010 - 10/16/2020)

# Company List and SEC Form 
company_list = pd.read_csv('../input/warren-buffett-us-stock-companies/Company List.csv',sep=';')
sec_form = pd.read_csv('../input/warren-buffett-us-stock-companies/SEC Form 13F.csv',sep=';')

# Remote Data Access
# set the start date and end date
start = datetime.datetime(2010,10,22)
end = datetime.datetime(2020,10,16)

# set each stock to be a separate dataframe
# Company Stocks List
GSPC = data.DataReader("^GSPC", 'yahoo', start, end)
AMZN = data.DataReader("AMZN", 'yahoo', datetime.datetime(2007,10,22), end)
AXP = data.DataReader("AXP", 'yahoo', start, end)
AAPL = data.DataReader("AAPL", 'yahoo', datetime.datetime(2007,10,22), end)
AXTA = data.DataReader("AXTA", 'yahoo', start, end)
BAC = data.DataReader("BAC", 'yahoo', datetime.datetime(2007,10,22), end)
BK = data.DataReader("BK", 'yahoo', start, end)
GOLD = data.DataReader("GOLD", 'yahoo', start, end)
BIIB = data.DataReader("BIIB", 'yahoo', start, end)
CHTR = data.DataReader("CHTR", 'yahoo', start, end)
KO = data.DataReader("KO", 'yahoo', datetime.datetime(2007,10,22), end)
COST = data.DataReader("COST", 'yahoo', start, end)
DVA = data.DataReader("DVA", 'yahoo', start, end)
GM = data.DataReader("GM", 'yahoo', start, end)
GL = data.DataReader("GL", 'yahoo', start, end)
JNJ = data.DataReader("JNJ", 'yahoo', start, end)
JPM = data.DataReader("JPM", 'yahoo', start, end)
KHC = data.DataReader("KHC", 'yahoo', datetime.datetime(2007,10,22), end)
KR = data.DataReader("KR", 'yahoo', start, end)
LBTYA = data.DataReader("LBTYA", 'yahoo', start, end)
LBTYK = data.DataReader("LBTYK", 'yahoo', start, end)
LILA = data.DataReader("LILA", 'yahoo', start, end)
LILAK = data.DataReader("LILAK", 'yahoo', start, end)
LSXMA = data.DataReader("LSXMA", 'yahoo', start, end)
LSXMK = data.DataReader("LSXMK", 'yahoo', start, end)
MTB = data.DataReader("MTB", 'yahoo', start, end)
MA = data.DataReader("MA", 'yahoo', start, end)
MDLZ = data.DataReader("MDLZ", 'yahoo', start, end)
MCO = data.DataReader("MCO", 'yahoo', start, end)
PNC = data.DataReader("PNC", 'yahoo', start, end)
PG = data.DataReader("PG", 'yahoo', start, end)
RH = data.DataReader("RH", 'yahoo', start, end)
SIRI = data.DataReader("SIRI", 'yahoo', start, end)
SNOW = data.DataReader("SNOW", 'yahoo', start, end)
SPY = data.DataReader("SPY", 'yahoo', start, end)
STNE = data.DataReader("STNE", 'yahoo', start, end)
STOR = data.DataReader("STOR", 'yahoo', start, end)
SU = data.DataReader("SU", 'yahoo', start, end)
SYF = data.DataReader("SYF", 'yahoo', start, end)
TEVA = data.DataReader("TEVA", 'yahoo', start, end)
USB = data.DataReader("USB", 'yahoo', start, end)
UPS = data.DataReader("UPS", 'yahoo', start, end)
VOO = data.DataReader("VOO", 'yahoo', start, end)
VRSN = data.DataReader("VRSN", 'yahoo', start, end)
V = data.DataReader("V", 'yahoo', start, end)
WFC = data.DataReader("WFC", 'yahoo', start, end)

# Berkshire Hathaway Stocks
BRKA = data.DataReader("BRK-A", 'yahoo', start, end)
BRKB = data.DataReader("BRK-B", 'yahoo', start, end)

## 2.2 Check Features 
First, check the features of each dataframe to see whether there was any problem loading the data.

***Company List***
-	**Name:** Name of the company. 
-	**Symbol:** Ticker symbol of the company. 	
-	**Holdings:** Number of shares.
-	**Market Price:** Current price at which a stock can be purchased or sold. (10/18/20)
-	**Value:** (Holdings * Market Price).  
-	**Stake:** The percentage of a company's shares that the investor owns.

In [None]:
company_list

In [None]:
company_list.info()

***SEC Form 13F***

**Name of Issuer, Title of Class, CUSIP Number, Market Value, Amount and Type of Security, Investment Discretion (Sole, Shared-Defined, Shared-Other), Other Managers, Voting Authority.**

You can find detailed information on each column on page 5 of the SEC [General Instructions Form 13F](https://www.sec.gov/pdf/form13f.pdf).

In [None]:
sec_form.head()

Every company file has the same structure with the same columns: 
-	**Date:** It is the date on which the prices were recorded.
-	**Close/Last:** Is the last price at which a stock trades during a regular trading session.
-	**Volume:** Is the number of shares that changed hands during a given day.
-	**Open:** Is the price at which a stock started trading when the opening bell rang.
-	**High:** Is the highest price at which a stock traded during the course of the trading day.
-	**Low:** Is the lowest price at which a stock traded during the course of the trading day.
- **Adj Close:** The adjusted closing price factors in corporate actions, such as stock splits, dividends, and rights offerings.

<img src="https://analyzingalpha.com/assets/images/posts/2020-04-17-bar-chart-ohlc.png" alt="Bar Chart OHLC" width="300" height="300"/>

In [None]:
AMZN.head()

## 2.3 Clean Dataframes

Remove spaces, commas, and the `%` and `$` signs from the `company_list` dataframe.

In [None]:
# Remove $ and % signs, commas and spaces
cols = ['Holdings', 'Market Price', 'Value', 'Stake']

# Company List (Table)
# The raw string keeps '\$' from being read as an invalid escape sequence
company_list[cols] = company_list[cols].replace({r'\$': '', ' ': '', '%': '', ',': ''}, regex=True)

Remove the last row of the `company_list` dataframe. 

In [None]:
company_list.drop(company_list.tail(1).index,inplace=True)

Finally, convert the numbers, which are stored as strings, to floats.

In [None]:
# company_list dataframe
for x in cols:
    company_list[x] = company_list[x].astype(float)
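
To see the whole cleaning step in one place, here is a minimal sketch on a made-up table. The `toy` dataframe, its symbols and its values are hypothetical stand-ins for `company_list`, and the cast to float is chained onto the replacement instead of being done in a loop.

```python
import pandas as pd

# Toy stand-in for `company_list`; symbols and values are made up.
toy = pd.DataFrame({
    'Symbol': ['AAA', 'BBB'],
    'Market Price': ['$1,234.50', '$ 99.10'],
    'Stake': ['5.7%', '12.0 %'],
})
num_cols = ['Market Price', 'Stake']

# Strip $, %, commas and spaces, then cast the cleaned strings to float.
cleaned = toy[num_cols].replace({r'\$': '', ' ': '', '%': '', ',': ''}, regex=True)
toy[num_cols] = cleaned.astype(float)

print(toy)
print(toy.dtypes)
```

If a column still contains a stray character, `astype(float)` raises an error, which is a useful signal that the replacement dictionary is missing a pattern.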

## 2.4 Concatenate The Stock Dataframes

We are going to concatenate the stock dataframes into a single dataframe called `company_stocks`. This gives us better control over the data and lets us analyze all the stocks together.

We will also create an additional dataframe called `berkshire_hathaway` with the Class A stock (BRK-A) and the Class B stock (BRK-B).

Let's create a list of the ticker symbols (as strings), ordered alphabetically by company name.

In [None]:
company_list_stocks = [AMZN,AXP,AAPL,AXTA,BAC,BK,GOLD,BIIB,CHTR,KO,COST,DVA,GM,GL,JNJ,JPM,KHC,KR,LBTYA,LBTYK,LILA,LILAK,LSXMA,
                       LSXMK,MTB,MA,MDLZ,MCO,PNC,PG,RH,SIRI,SNOW,SPY,STNE,STOR,SU,SYF,TEVA,USB,UPS,VOO,VRSN,V,WFC]

tickers = ['AMZN','AXP','AAPL','AXTA','BAC','BK','GOLD','BIIB','CHTR','KO','COST','DVA','GM','GL','JNJ','JPM','KHC','KR',
           'LBTYA','LBTYK','LILA','LILAK','LSXMA','LSXMK','MTB','MA','MDLZ','MCO','PNC','PG','RH','SIRI','SNOW','SPY','STNE',
           'STOR','SU','SYF','TEVA','USB','UPS','VOO','VRSN','V','WFC']

tickers2 = ['BRKA','BRKB']

company_stocks = pd.concat(company_list_stocks, axis=1, keys=tickers)

berkshire_hathaway = pd.concat([BRKA, BRKB], axis=1, keys=tickers2)

# Set the column name levels:
company_stocks.columns.names = ['Stock Ticker','Stock Info']
company_stocks.head()

In [None]:
berkshire_hathaway.columns.names = ['Stock Ticker','Stock Info']
berkshire_hathaway.head()

<a id="ch3"></a>
# 3. Exploratory Data Analysis
---

[Exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis) is an approach to analyze data sets to summarize their main characteristics, often with visual methods. In this case we are going to visualize and analyze the historical data of these stocks and try to find relevant information.

We will also analyze the stock portfolio to see which are the most relevant companies and which are the most profitable. 

I encourage you to check out the documentation on [Multi-Level Indexing](http://pandas.pydata.org/pandas-docs/stable/advanced.html) and [Using .xs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.xs.html), since we will be using them a lot.
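
As a quick illustration of how multi-level columns and `.xs` fit together, here is a small sketch with two made-up tickers. `AAA`, `BBB` and their prices are hypothetical; only the column level names match the ones used for `company_stocks`.

```python
import pandas as pd

dates = pd.date_range('2020-01-01', periods=3)

# Two toy price tables standing in for the per-ticker dataframes.
aaa = pd.DataFrame({'Close': [10.0, 11.0, 12.0], 'Volume': [100, 110, 120]}, index=dates)
bbb = pd.DataFrame({'Close': [20.0, 19.0, 21.0], 'Volume': [200, 190, 210]}, index=dates)

# `keys` becomes the outer column level, one entry per ticker.
toy_stocks = pd.concat([aaa, bbb], axis=1, keys=['AAA', 'BBB'])
toy_stocks.columns.names = ['Stock Ticker', 'Stock Info']

# Cross-section: pull the 'Close' column of every ticker at once.
closes = toy_stocks.xs(key='Close', axis=1, level='Stock Info')
print(closes)
```

Selecting `toy_stocks['AAA']` goes the other way and returns every column for a single ticker.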

## 3.1 Berkshire Hathaway Portfolio

### Pie Chart - Holdings per Company of Warren Buffett Portfolio
* The top 5 holdings in Berkshire Hathaway portfolio are:
    - Bank of America Corp
    - Apple Inc
    - The Coca-Cola Company
    - Kraft Heinz Co
    - American Express Company

In [None]:
fig = go.Figure()
fig.add_trace(go.Pie(values=company_list['Holdings'],labels=company_list['Symbol'], hole=.3, pull=[0, 0, 0, 0, 0.1]))
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(height=700)
fig.show()

### Bar Chart - Value per Company of Warren Buffett Portfolio
* Warren Buffett's largest position is Apple Inc.
* Apple makes up 49.29% of the Berkshire Hathaway portfolio, which holds 119 billion dollars' worth of Apple stock.
* The second-biggest stock in the Berkshire Hathaway portfolio is Bank of America Corp, one of the largest banks in the United States.
* Bank of America makes up 10.33% of the Berkshire Hathaway portfolio, and the total value of this position is 25 billion dollars.
* Warren Buffett has been buying BAC for a long time. He started back in 2011, when the bank was struggling after the financial crisis.
* The third-biggest stock is The Coca-Cola Company, which makes up 8.26% of Warren Buffett's portfolio. He started investing in Coca-Cola in 1989. The value of this position is 20 billion dollars.

In [None]:
order_value = company_list[['Symbol', 'Value']].sort_values('Value', ascending=False)

figure = px.bar(order_value, y=order_value['Symbol'], x=order_value['Value'], color='Symbol', 
                title='Bar Chart - Value per Company of Warren Buffett Portfolio')
figure.update_layout(showlegend=False)
figure.show()

In [None]:
order_value['Percentage'] = round((order_value['Value'] / order_value['Value'].sum()) * 100,2)
order_value[['Symbol','Percentage']].head(7).style.background_gradient(cmap='Blues').hide_index()

### Bar Chart - Top 5 Stakes per Company of Warren Buffett Portfolio
* Stake is the percentage of a company's shares that an investor owns.
* The biggest stake in Warren Buffett's portfolio is 28.6% in DaVita.
* This means that Berkshire Hathaway owns 28.6% of DVA.
* Kraft Heinz is in second place with 26.6% and American Express in third with 18.8%.
* Liberty Sirius XM Group Series C and Liberty Sirius XM Group Series A come in fourth and fifth place respectively.

In [None]:
company_list[['Symbol', 'Stake']].sort_values('Stake', ascending=False).head(5).style.background_gradient(cmap='Greens').hide_index()

In [None]:
top5_stake = company_list[['Symbol', 'Stake']].sort_values('Stake', ascending=False).head(5)

figure = px.bar(top5_stake, y=top5_stake['Symbol'], x=top5_stake['Stake'], color='Symbol', 
                title='Bar Chart - Top 5 Stakes per Company of Warren Buffett Portfolio')
figure.update_layout(showlegend=False)
figure.show()


## 3.2 New Return Dataframes

### Daily Returns DataFrame
Now we are going to create a new empty DataFrame called returns. This dataframe will contain the returns for each stock. 

Returns are typically defined by:

$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$

We can use the pandas `pct_change()` method on the Close column to compute this return value. A for loop then creates a returns column for each stock ticker and sets it as a column in the `returns` DataFrame.

The first value is NaN because there is no earlier price to compare the very first day against.
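
To make the formula concrete, the sketch below applies `pct_change()` to three made-up prices and compares it with the manual calculation $p_t / p_{t-1} - 1$. The `prices` series is hypothetical.

```python
import pandas as pd

# Made-up closing prices for three consecutive days.
prices = pd.Series([100.0, 110.0, 99.0], index=pd.date_range('2020-01-01', periods=3))

# pct_change() applies r_t = p_t / p_{t-1} - 1 to each row.
daily = prices.pct_change()

# The same calculation written out by hand with shift().
manual = prices / prices.shift(1) - 1

print(pd.DataFrame({'pct_change': daily, 'manual': manual}))
```

The first row is NaN in both columns, the second day is a 10% gain and the third a 10% loss.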

In [None]:
returns = pd.DataFrame()

for tick in tickers:
    returns[tick + ' Return'] = company_stocks[tick]['Close'].pct_change()
    
returns['BRKA Return'] = BRKA['Close'].pct_change()
returns['BRKB Return'] = BRKB['Close'].pct_change()
returns['GSPC Return'] = GSPC['Close'].pct_change()

### Monthly Returns DataFrame

In [None]:
monthly_returns = pd.DataFrame()

for tick in tickers:
    monthly_returns[tick + ' Return'] = company_stocks[tick]['Close'].resample('M').ffill().pct_change()
    
monthly_returns['BRKA Return'] = BRKA['Close'].resample('M').ffill().pct_change()
monthly_returns['BRKB Return'] = BRKB['Close'].resample('M').ffill().pct_change()
monthly_returns['GSPC Return'] = GSPC['Close'].resample('M').ffill().pct_change()

### Yearly Returns DataFrame

In [None]:
yearly_returns = pd.DataFrame()

for tick in tickers:
    yearly_returns[tick + ' Return'] = company_stocks[tick]['Close'].resample('Y').ffill().pct_change()
    
yearly_returns['BRKA Return'] = BRKA['Close'].resample('Y').ffill().pct_change()
yearly_returns['BRKB Return'] = BRKB['Close'].resample('Y').ffill().pct_change()
yearly_returns['GSPC Return'] = GSPC['Close'].resample('Y').ffill().pct_change()

### Cumulative Returns
To calculate the cumulative returns we will use the `cumprod()` function.

Documentation: [pandas.DataFrame.cumprod()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumprod.html)
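
As a sanity check on the idea, compounding the daily returns with `cumprod()` should reproduce the growth of one unit invested at the first price. The sketch below uses made-up prices.

```python
import pandas as pd

# Made-up prices; the first one is the purchase price.
prices = pd.Series([50.0, 55.0, 44.0, 66.0])
rets = prices.pct_change()

# Growth of 1 unit invested at the first price: product of (1 + r_t).
growth = (rets + 1).cumprod()

# Direct calculation for comparison: current price over first price.
direct = prices / prices.iloc[0]

print(pd.DataFrame({'growth': growth, 'direct': direct}))
```

The first entry of `growth` stays NaN because the first return is undefined; from the second row on, both columns agree.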

In [None]:
cumulative_returns = pd.DataFrame()

for tick in tickers:
    cumulative_returns[tick + ' Return'] = (returns[tick + ' Return'] + 1).cumprod()
    
cumulative_returns['BRKA Return'] = (returns['BRKA Return'] + 1).cumprod()
cumulative_returns['BRKB Return'] = (returns['BRKB Return'] + 1).cumprod()
cumulative_returns['GSPC Return'] = (returns['GSPC Return'] + 1).cumprod()

### Yearly Cumulative Returns

In [None]:
yearly_cumulative_returns = pd.DataFrame()

for tick in tickers:
    yearly_cumulative_returns[tick + ' Return'] = (yearly_returns[tick + ' Return'] + 1).cumprod()
    
yearly_cumulative_returns['BRKA Return'] = (yearly_returns['BRKA Return'] + 1).cumprod()
yearly_cumulative_returns['BRKB Return'] = (yearly_returns['BRKB Return'] + 1).cumprod()
yearly_cumulative_returns['GSPC Return'] = (yearly_returns['GSPC Return'] + 1).cumprod()

### Bar Chart - Best Yearly Cumulative Returns in 2020

In [None]:
best_yearly_cum = yearly_cumulative_returns.loc['2020-12-31'].sort_values(ascending=False).head(10)
figure = px.bar(best_yearly_cum, title='Best Yearly Cumulative Returns in 2020', 
                labels={'value':'Yearly Cumulative Returns', 'index':'Stocks'})
figure.update_layout(showlegend=False)
figure.show()

## 3.3 Stock Standard Deviation
Let's take a look at the standard deviation of the returns.

**[Standard Deviation](https://en.wikipedia.org/wiki/Standard_deviation):** Is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

Standard deviation is the statistical measure of market volatility, measuring how widely prices are dispersed from the average price. If prices trade in a narrow trading range, the standard deviation will return a low value that indicates low volatility. Conversely, if prices swing wildly up and down, then standard deviation returns a high value that indicates high volatility.

**Basically, standard deviation rises as prices become more volatile. As price action calms, standard deviation heads lower.**
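
A tiny example shows the effect. Both price series below are made up: one moves by about 1% a day, the other swings by 20% or more.

```python
import pandas as pd

# Two made-up price paths: a calm one and a volatile one.
calm = pd.Series([100.0, 101.0, 100.0, 101.0, 100.0]).pct_change()
wild = pd.Series([100.0, 120.0, 90.0, 125.0, 95.0]).pct_change()

# .std() skips the leading NaN and uses the sample formula (ddof=1).
print('calm:', calm.std())
print('wild:', wild.std())
```

The volatile series has a much larger standard deviation of returns, which is exactly what the bar charts below rank.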

### Bar Chart - Daily Stocks Standard Deviation 

***Which stock would you classify as the riskiest over the entire time period?***
* Looking at the daily returns, the riskiest stocks are STNE, SNOW, LILAK and RH.

In [None]:
returns.std().sort_values(ascending=False).plot(kind='bar', color='green', figsize=(16,6))
plt.ylabel('Standard Deviation', fontsize=15)
plt.title('Daily Stocks Standard Deviation', fontsize=18)
sns.despine()

### Bar Chart - Yearly Stocks Standard Deviation 
If we plot the same data but with the yearly returns, the riskiest stocks are RH, GOLD, STNE and BAC. 

In [None]:
yearly_returns.std().sort_values(ascending=False).plot(kind='bar', color='green', figsize=(16,6))
plt.ylabel('Standard Deviation', fontsize=15)
plt.title('Yearly Stocks Standard Deviation', fontsize=18)
sns.despine()

### Bar Chart - Most Volatile Stocks 2020

In [None]:
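# Sort first, then keep the 10 largest values (the reverse order would only rank the first 10 columns)
returns_2020 = returns.loc['2020-01-01':'2020-12-31'].std().sort_values(ascending=False).head(10)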
figure = px.bar(returns_2020, title='Most Volatile Stocks 2020', 
                labels={'value':'Standard Deviation', 'index':'Stocks'})
figure.update_layout(showlegend=False)
figure.show()

### Dist Plot - 2020 AXP, BAC, AXTA and GOLD Daily Returns

In [None]:
fig, axes = plt.subplots(2, 2,figsize=(15,10))
sns.distplot(returns.loc['2020-01-01':'2020-12-31']['AXP Return'], color='green', bins=50, ax=axes[0,0])
sns.distplot(returns.loc['2020-01-01':'2020-12-31']['BAC Return'], color='blue', bins=50, ax=axes[0,1])
sns.distplot(returns.loc['2020-01-01':'2020-12-31']['AXTA Return'], color='orange', bins=50, ax=axes[1,0])
sns.distplot(returns.loc['2020-01-01':'2020-12-31']['GOLD Return'], color='purple', bins=50, ax=axes[1,1])
sns.despine()
# The x-axis holds the daily return values and the y-axis their estimated density
axes[0,0].set(xlabel='Daily Return', ylabel='Density', title='American Express')
axes[0,1].set(xlabel='Daily Return', ylabel='Density', title='Bank of America')
axes[1,0].set(xlabel='Daily Return', ylabel='Density', title='Axalta Coating Systems')
axes[1,1].set(xlabel='Daily Return', ylabel='Density', title='Barrick Gold')

## 3.4 Stocks Pearson Correlation Matrix
We use the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#:~:text=In%20statistics%2C%20the%20Pearson%20correlation,between%20%2B1%20and%20%E2%88%921.) to examine the strength and direction of the linear relationship between two continuous variables.

The correlation coefficient can range in value from −1 to +1. The larger the absolute value of the coefficient, the stronger the relationship between the variables. For the Pearson correlation, an absolute value of 1 indicates a perfect linear relationship. A correlation close to 0 indicates no linear relationship between the variables. 

The sign of the coefficient indicates the direction of the relationship. If both variables tend to increase or decrease together, the coefficient is positive, and the line that represents the correlation slopes upward. If one variable tends to increase as the other decreases, the coefficient is negative, and the line that represents the correlation slopes downward.
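
To connect the description with the numbers, here is a short sketch that computes the coefficient by hand and compares it with pandas' `Series.corr()`. The three series and the `pearson` helper are made up for illustration.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.8])   # tends to rise with x
z = pd.Series([10.0, 8.0, 6.5, 4.0, 2.0])  # tends to fall as x rises

def pearson(a, b):
    """Covariance of a and b divided by the product of their spreads."""
    da, db = a - a.mean(), b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

print(pearson(x, y), x.corr(y))  # close to +1
print(pearson(x, z), x.corr(z))  # close to -1
```

`DataFrame.corr()` applies the same calculation to every pair of columns, which is what the heatmap displays.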

Let's create a heatmap of the correlations between the stocks' Adj Close prices.

In [None]:
sns.set(style="whitegrid", font_scale=1)
plt.figure(figsize=(15,15))
plt.title('Pearson Correlation Matrix',fontsize=25)
sns.heatmap(company_stocks.xs(key='Adj Close', axis=1, level='Stock Info').corr(),linewidths=0.1,
            square=True,cmap="GnBu",linecolor='w', annot=False, cbar_kws={"shrink": .7})

## 3.5 Stocks Cluster Map 
Now we can use the same correlation matrix as above to plot a [clustermap](https://seaborn.pydata.org/generated/seaborn.clustermap.html).

In [None]:
sns.clustermap(company_stocks.xs(key='Adj Close', axis=1, level='Stock Info').corr(), cmap='coolwarm', figsize=(15,15),
              linewidths=0.1, linecolor='w')

<a id="ch4"></a>
# 4. Web Scraping Yahoo Finance - Stocks Profile
---
## 4.1 Sector, Industry and Number of Employees
We are going to build a new dataframe with all the companies from Warren Buffett's portfolio, listing each company's Sector, Industry and Number of Employees.

To gather this information quickly, we are going to scrape it from Yahoo Finance.

First, take the URL of a stock's profile page. Then replace the stock symbol with curly brackets. This will allow us to insert whatever symbol we want with string formatting.

In [None]:
"""
url_profile = 'https://finance.yahoo.com/quote/{}/profile?p={}'

stock = ['AMZN','AXP','AAPL','AXTA','BAC','BK','GOLD','BIIB','CHTR','KO','COST','DVA','GM','GL','JNJ','JPM','KHC','KR',
         'LBTYA','LBTYK','LILA','LILAK','LSXMA','LSXMK','MTB','MA','MDLZ','MCO','PNC','PG','RH','SIRI','SNOW','SPY','STNE',
         'STOR','SU','SYF','TEVA','USB','UPS','VOO','VRSN','V','WFC']

# Create the csv file
f = open('stock_profile.csv', 'w', encoding='utf-8')

# First line of the csv file.
headers = 'Name;Symbol;Sector;Industry;Num_Employees\n'
f.write(headers)  

count = 0
for i in range(0,45):
    
    response = requests.get(url_profile.format(stock[i], stock[i]))
    soup = bs4.BeautifulSoup(response.text, 'html.parser')

    pattern = re.compile(r'\s--\sData\s--\s')
    script_data = soup.find('script', text=pattern).contents[0]

    # Named json_start so it does not overwrite the `start` date defined in 2.1
    json_start = script_data.find('context')-2 # Beginning
    json_data = json.loads(script_data[json_start:-12]) # End
    
    longName = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['quoteType']['longName']
    symbol = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['quoteType']['symbol']
    
    # Since SPY and VOO are not stocks they have a different profile page.
    if count == 33 or count == 41:
        sector = 'NaN'
        industry = 'NaN'
        fullTimeEmployees = 'NaN'
    else:    
        sector = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['assetProfile']['sector']
        industry = json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['assetProfile']['industry']

        if 'fullTimeEmployees' in json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['assetProfile'].keys():  
            fullTimeEmployees = str(json_data['context']['dispatcher']['stores']['QuoteSummaryStore']['assetProfile']['fullTimeEmployees'])
        else:
            fullTimeEmployees = 'NaN'
    
    # Let's write the data into the file.
    f.write(longName.replace(',',' ') + ';' + symbol + ';' + sector + ';' + industry + ';' + fullTimeEmployees + '\n')
    
    count += 1
    print(count)
    
print('Finish')
f.close()
"""

## 4.2 Load New Data

In [None]:
# Load the new file
stock_profile = pd.read_csv('../input/warren-buffett-us-stock-companies/stock_profile.csv',sep=';')

# Drop the two ETFs (SPY and VOO), which sit at rows 33 and 41
stock_profile.drop(index=[33, 41], inplace=True)

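Let's fill in the missing values. Looking up the affected companies on Wikipedia gives the employee counts below; for simplicity, every missing value is then filled with their mean.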

* Barrick Gold Corporation (GOLD): 18421
* The Liberty SiriusXM Group (LSXMA): 6667
* The Liberty SiriusXM Group (LSXMK): 6667
* StoneCo Ltd. (STNE): 3000
* Visa Inc. (V): 19500
* Mean: 10851

In [None]:
stock_profile['Num_Employees'] = stock_profile['Num_Employees'].fillna(10851)
stock_profile
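
The cell above fills every gap with the same mean. As an alternative sketch, the counts listed earlier could be mapped to each symbol so that every company gets its own figure. The `known` dictionary and the `toy` dataframe are hypothetical stand-ins for `stock_profile`, and the KO value is made up.

```python
import numpy as np
import pandas as pd

# Employee counts listed above, keyed by ticker symbol.
known = {'GOLD': 18421, 'LSXMA': 6667, 'LSXMK': 6667, 'STNE': 3000, 'V': 19500}

# Mean of the five figures, i.e. the single value used in the cell above.
print(round(np.mean(list(known.values()))))

# Toy stand-in for `stock_profile`; the KO figure is made up.
toy = pd.DataFrame({'Symbol': ['GOLD', 'KO', 'V'],
                    'Num_Employees': [np.nan, 80000.0, np.nan]})

# Fill each gap with that company's own figure instead of the shared mean.
toy['Num_Employees'] = toy['Num_Employees'].fillna(toy['Symbol'].map(known))
print(toy)
```

Rows that already have a value are left untouched, because `fillna` only writes where the original entry is missing.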

## 4.3 Sunburst Charts - Sectors and Industries

### Sunburst Chart - Sectors and Industries of Warren Buffett Portfolio 
The sunburst chart is ideal for displaying hierarchical data. Sunburst plots visualize hierarchical data spanning outwards radially from root to leaves. The root starts from the center and children are added to the outer rings.

The chart describes the relationship between 'Sector' and 'Industry'. You can click on a 'Sector' in the chart below to expand or contract it.

* Financial Services is the biggest sector in the Berkshire Hathaway portfolio by number of companies. It includes 13 companies.
* The biggest industries are also in the Financial Services sector: Credit Services, Banks—Diversified and Banks—Regional.
* The second-largest sector is Communication Services.

[Sunburst Charts in Python](https://plotly.com/python/sunburst-charts/)


In [None]:
fig = px.sunburst(stock_profile, path=['Sector', 'Industry','Symbol'], height=800)
fig.update_layout(title={
    'text': "Sectors and Industries of Warren Buffett Portfolio",
    'y':0.97,
    'x':0.5,
    'xanchor': 'center',
    'yanchor': 'top'},
    showlegend=False)
fig.show()

In [None]:
# Keep Symbol and Value; drop SPY and VOO by symbol rather than by row position.
# .copy() avoids modifying a slice of company_list.
company_list2 = company_list.loc[~company_list['Symbol'].isin(['SPY', 'VOO']), ['Symbol', 'Value']].copy()

company_list3 = pd.merge(company_list2,stock_profile,how='inner',on='Symbol')

company_list3.drop(columns='Name', inplace=True)

### Sunburst Chart - Sectors and Industries by Value of Warren Buffett Portfolio
Now we can see the sectors and industries in relation to the value of the companies. We can notice a big change in the importance of the sectors. 

The previous graph gave the impression that financial services was the largest sector, however, the technology sector is the largest due to the large investment that Berkshire Hathaway has in Apple.

In any case, the financial services sector continues to be quite important in the portfolio, occupying second place.

In [None]:
fig = px.sunburst(company_list3, path=['Sector', 'Industry','Symbol'], values='Value', height=800)
fig.update_layout(title={
    'text': "Sectors and Industries by Value of Warren Buffett Portfolio",
    'y':0.97,
    'x':0.5,
    'xanchor': 'center',
    'yanchor': 'top'},
    showlegend=False)
fig.show()

<a id="ch5"></a>
# 5. Financial Charts
---
## 5.1 Line Charts

### Line and Scatter Plot Chart - Annual Total Returns (BRKA and GSPC)
In the line charts we can see the yearly cumulative returns from Berkshire Hathaway and the S&P500. The scatterplot shows the yearly return.

In [None]:
sns.set(style="whitegrid", font_scale=1)
fig, axes = plt.subplots(2, 2,figsize=(15,10))

sns.lineplot(data=yearly_cumulative_returns.loc['2011-12-31':'2020-12-31']['BRKA Return'], color='blue', lw=2, ax=axes[0,0])
sns.scatterplot(data=yearly_returns.loc['2011-12-31':'2020-12-31']['BRKA Return'], color='blue', ax=axes[0,1])

sns.lineplot(data=yearly_cumulative_returns.loc['2011-12-31':'2020-12-31']['GSPC Return'], color='orange', lw=2, ax=axes[1,0])
sns.scatterplot(data=yearly_returns.loc['2011-12-31':'2020-12-31']['GSPC Return'], color='orange', ax=axes[1,1])

sns.despine()

axes[0,0].set(xlabel='Years', ylabel='Cumulative Returns', title='Berkshire Hathaway Class A')
axes[0,1].set(xlabel='Years', ylabel='Returns', title='Berkshire Hathaway Class A')
axes[1,0].set(xlabel='Years', ylabel='Cumulative Returns', title='S&P 500')
axes[1,1].set(xlabel='Years', ylabel='Returns', title='S&P 500')

### Line Chart - Berkshire Hathaway vs S&P500
These charts show the Adj Close price over time of Berkshire Hathaway Class A and the S&P500. 

You can interact with the charts by clicking and dragging.

In [None]:
# iplot draws a plotly figure, so the matplotlib `ax` argument does not apply here
BRKA['Adj Close'].loc['2010-11-01':'2020-09-30'].iplot(fill=True,colors=['green'])
GSPC['Adj Close'].loc['2010-11-01':'2020-09-30'].iplot(fill=True,colors=['blue'])

## 5.2 Histograms Charts

### Histograms - Stocks Returns
We can also use plotly to show histograms. In this case we are showing the return of the stocks with the highest value in the portfolio from 2019 to 2020. You can interact with the charts by clicking and dragging.

* The distribution of returns is quite similar across the 6 companies.
* Most returns fall between -0.1 and 0.1.
* However, there are exceptions such as KHC and AXP, which reach 0.2.

Documentation: [Histograms in Python](https://plotly.com/python/histograms/)

In [None]:
fig = make_subplots(rows=3, cols=2)

trace0 = go.Histogram(x=returns.loc['2019-01-01':'2020-12-31']['AAPL Return'], nbinsx=50, name="AAPL")
trace1 = go.Histogram(x=returns.loc['2019-01-01':'2020-12-31']['KO Return'], nbinsx=50, name="KO")
trace2 = go.Histogram(x=returns.loc['2019-01-01':'2020-12-31']['BAC Return'], nbinsx=50, name="BAC")
trace3 = go.Histogram(x=returns.loc['2019-01-01':'2020-12-31']['AXP Return'], nbinsx=50, name="AXP")
trace4 = go.Histogram(x=returns.loc['2019-01-01':'2020-12-31']['KHC Return'], nbinsx=50, name="KHC")
trace5 = go.Histogram(x=returns.loc['2019-01-01':'2020-12-31']['MCO Return'], nbinsx=50, name="MCO")

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 2)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 2, 2)
fig.append_trace(trace4, 3, 1)
fig.append_trace(trace5, 3, 2)

fig.update_layout(title_text='Stocks Returns (2019 - 2020)')

fig.show()

## 5.3 Moving Averages Charts

### Simple Moving Averages - Amazon Stock Price
A [Simple Moving Average (SMA)](https://www.investopedia.com/terms/s/sma.asp) calculates the average of a selected range of prices, usually closing prices, by dividing their sum by the number of periods in that range. It is a technical indicator that can help determine whether an asset price will continue or reverse a bull or bear trend.

Use `.ta_plot(study='sma')` to create a Simple Moving Averages plot of Amazon. You can interact with the charts by clicking and dragging.
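
Before plotting, it may help to see what the indicator computes. The sketch below uses plain pandas `rolling()` on made-up prices with a hypothetical 3-period window; it is not necessarily how cufflinks computes the study internally.

```python
import pandas as pd

# Made-up closing prices.
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# 3-period SMA: mean of the current price and the two before it.
sma3 = prices.rolling(window=3).mean()

print(pd.DataFrame({'price': prices, 'sma3': sma3}))
```

The first two rows are NaN because a full window of three prices is not yet available; longer windows such as 13, 21 or 55 periods simply smooth over more history.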

* Berkshire started buying stock in the cloud-computing and e-commerce giant in the first quarter of 2019.
* The company bought 483,300 shares of Amazon at the time. But Berkshire owned 537,300 shares as of the end of its second quarter. 
* This was a very good investment, since at the end of 2019 the price of AMZN was approximately `$1,500` and now it is at `$3,300`.
* This means a return of 120% in almost one year (the price is 2.2 times its earlier level), a gain of approximately `$967,140,000` on 537,300 shares.
* Amazon’s revenues increased from `$136` billion in 2016 to `$281` billion in 2019, mainly driven by the contribution of Retail revenue from the North America segment.
* An improvement in net income margin from 1.7% in 2016 to 4.1% in 2019 helped net income swell 137% over the period.
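
The return and dollar gain in the bullets above can be checked with a few lines of arithmetic. The sketch uses the rounded prices and the share count quoted in the text; the variable names are made up.

```python
shares = 537_300        # shares held at the end of the second quarter
price_then = 1_500.0    # approximate earlier price quoted above
price_now = 3_300.0     # approximate price at the time of writing

price_ratio = price_now / price_then             # how many times the price grew
simple_return = price_ratio - 1                  # fractional gain
dollar_gain = shares * (price_now - price_then)  # gain on the whole position

print(f'price ratio: {price_ratio:.1f}x')
print(f'simple return: {simple_return:.0%}')
print(f'dollar gain: ${dollar_gain:,.0f}')
```

With these rounded inputs the price ratio is 2.2, which corresponds to a simple return of 120%.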

In [None]:
AMZN.loc['2010-01-01':'2020-12-31']['Adj Close'].ta_plot(study='sma', periods=[13,21,55])

## 5.4 Candlestick Charts

### Candlestick Chart - Bank of America Stock Price

Documentation: [Candlestick Charts in Python](https://plotly.com/python/candlestick-charts/)

The candlestick chart is a style of financial chart describing open, high, low and close for a given x coordinate (most likely time). The boxes represent the spread between the open and close values and the lines represent the spread between the low and high values. Sample points where the close value is higher (lower) than the open value are called increasing (decreasing). By default, increasing candles are drawn in green whereas decreasing candles are drawn in red.
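
The quantities behind each candle are easy to compute directly. Here is a small sketch on made-up OHLC rows; the `Body`, `Range` and `Direction` column names are hypothetical, and treating a close equal to the open as decreasing is a choice made for this sketch only.

```python
import pandas as pd

# Three made-up trading days.
ohlc = pd.DataFrame({
    'Open':  [10.0, 12.0, 11.0],
    'High':  [12.5, 12.2, 11.8],
    'Low':   [9.8, 10.9, 10.7],
    'Close': [12.0, 11.0, 11.5],
}, index=pd.date_range('2020-01-01', periods=3))

# Box height is |Close - Open|; the line spans from Low to High.
ohlc['Body'] = (ohlc['Close'] - ohlc['Open']).abs()
ohlc['Range'] = ohlc['High'] - ohlc['Low']

# A candle is increasing when it closes above its open.
ohlc['Direction'] = (ohlc['Close'] > ohlc['Open']).map({True: 'increasing', False: 'decreasing'})

print(ohlc)
```

The range is never smaller than the body, since the high and low bound both the open and the close.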

You can learn more about Candlestick Charts [here.](https://www.investopedia.com/trading/candlestick-charting-what-is-it/#:~:text=in%20candlestick%20charts.-,Candlestick%20Components,close%20of%20that%20day's%20trading.)

* President Obama took office on Jan. 20, 2009. [(More information here)](https://www.investopedia.com/ask/answers/101314/where-was-dow-jones-when-obama-took-office.asp)
* The [subprime mortgage crisis](https://en.wikipedia.org/wiki/Subprime_mortgage_crisis) played a major part in the decline of prices.
* Markets had little confidence in the economy and the future was uncertain.
* The banking sector in general declined by 30%.
* Bank of America Corporation (BAC) dropped 29%.
* The S&P 500 and the Nasdaq took similar hits on inauguration day, dropping 5.3% and 5.8%, respectively.
* Warren Buffett bought a large number of shares of BAC in 2011. 

In [None]:
# Bank of America Candlestick Chart
fig = go.Figure(data=[go.Candlestick(x=BAC.index,
                open=BAC['Open'],
                high=BAC['High'],
                low=BAC['Low'], 
                close=BAC['Close'])
                ])

fig.update_layout(
    title='Bank of America Stock Price',
    yaxis_title='BAC Stock',
    shapes = [dict(
        x0='2009-01-20', x1='2009-01-20', y0=0, y1=1, xref='x', yref='paper', line_width=2),
             dict(
        x0='2007-12-01', x1='2007-12-01', y0=0, y1=1, xref='x', yref='paper', line_width=2)],
    annotations=[dict(
        x='2009-01-20', y=0.95, xref='x', yref='paper',
        showarrow=False, xanchor='left', text='President Obama Took Office'), 
                 dict(
        x='2007-12-01', y=0.1, xref='x', yref='paper',
        showarrow=False, xanchor='right', text='Subprime Mortgage Crisis')]
)

fig.show()

## 5.5 Bollinger Band Charts

### Bollinger Band Chart - Apple Stock Price
A [Bollinger Band](https://www.fidelity.com/learning-center/trading-investing/technical-analysis/technical-indicator-guide/bollinger-bands#:~:text=Bollinger%20Bands%20are%20envelopes%20plotted,Period%20and%20Standard%20Deviations%2C%20StdDev.) is a technical analysis tool defined by a set of trendlines plotted two standard deviations (positively and negatively) away from a simple moving average (SMA) of a security's price, but which can be adjusted to user preferences.

* When the bands tighten during a period of low volatility, it raises the likelihood of a sharp price move in either direction.
* When the bands separate by an unusually large amount, volatility increases and any existing trend may be ending.
* Prices have a tendency to bounce within the bands' envelope, touching one band then moving to the other band. You can use these swings to help identify potential profit targets.
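
The bands themselves are straightforward to compute by hand. The sketch below uses made-up closing prices and a 4-period window so the output is easy to read. It assumes the population standard deviation (`ddof=0`), which is a common convention; the plotting library used in this notebook may compute it differently.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only
close = pd.Series([20.0, 21.0, 19.5, 22.0, 23.0, 22.5, 24.0, 23.5])

window = 4    # look-back period (illustrative; 14 or 20 are common in practice)
num_std = 2   # number of standard deviations for the bands

middle = close.rolling(window).mean()        # simple moving average
std = close.rolling(window).std(ddof=0)      # rolling standard deviation
upper = middle + num_std * std               # upper band
lower = middle - num_std * std               # lower band

bands = pd.DataFrame(
    {"close": close, "lower": lower, "middle": middle, "upper": upper}
)
print(bands.dropna())
```

The distance between `upper` and `lower` is four rolling standard deviations, so the bands tighten when recent prices are calm and widen when they are volatile.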

Use `.ta_plot(study='boll')` to create a Bollinger Band Plot for Apple.

* Berkshire bought its first 10 million Apple shares in May 2016.

In [None]:
AAPL.loc['2010-01-01':'2020-01-01']['Adj Close'].ta_plot(study='boll',periods=14, title='Bollinger Bands')

## 5.6 OHLC Charts

### OHLC Charts - Coca-Cola Stock Price
Documentation: [OHLC Charts in Python](https://plotly.com/python/ohlc-charts/)

The OHLC chart (for open, high, low and close) is a style of financial chart describing open, high, low and close values for a given x coordinate (most likely time). The tips of the lines represent the low and high values and the horizontal segments represent the open and close values. Sample points where the close value is higher (lower) than the open value are called increasing (decreasing). By default, increasing items are drawn in green whereas decreasing items are drawn in red.
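
To illustrate how the four values summarize a period, the following sketch aggregates a made-up series of daily prices into weekly OHLC bars with pandas `resample().ohlc()`. The prices and dates are hypothetical and are not taken from this notebook.

```python
import pandas as pd

# Hypothetical daily prices for two business weeks (Monday to Friday)
dates = pd.bdate_range("2020-01-06", periods=10)
price = pd.Series(
    [10.0, 10.5, 9.8, 10.2, 10.9, 11.0, 10.7, 11.4, 11.1, 11.3],
    index=dates,
)

# One bar per week: first price, highest, lowest and last price
weekly = price.resample("W").ohlc()

# A bar is increasing when its close is above its open
weekly["increasing"] = weekly["close"] > weekly["open"]

print(weekly)
```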

You can find more information [here.](https://www.investopedia.com/terms/o/ohlcchart.asp) 

<img src="http://www.saturn.network/blog/content/images/2019/01/ohcl.png" width="300" height="300"/>

* Warren Buffett bought more than `$1` billion of Coca-Cola (KO) shares in 1988, an amount equivalent to 6.2% of the company, making it the largest position in his portfolio at the time.
* It remains one of Berkshire Hathaway's biggest holdings today, as of October 2019, holding the number three spot.
* The stock market crash of 1987 had created attractive valuations, as all types of stocks were sold off with little regard to fundamentals. 
* After the stock market crash, Coca-Cola stock had been hit hard along with so many other companies.
* Buffett & Co. determined it was a good company, had great value, could withstand competition, and was poised to recover.

In [None]:
# Coca-Cola OHLC Chart
fig = go.Figure(data=go.Ohlc(x=KO.index,
                    open=KO['Open'],
                    high=KO['High'],
                    low=KO['Low'],
                    close=KO['Close']))

fig.update_layout(
    title='Coca-Cola Stock Price',
    yaxis_title='KO Stock',
    shapes = [dict(
        x0='2009-01-20', x1='2009-01-20', y0=0, y1=1, xref='x', yref='paper', line_width=2),
             dict(
        x0='2007-12-01', x1='2007-12-01', y0=0, y1=1, xref='x', yref='paper', line_width=2)],
    annotations=[dict(
        x='2009-01-20', y=0.95, xref='x', yref='paper',
        showarrow=False, xanchor='left', text='President Obama Took Office'), 
                 dict(
        x='2007-12-01', y=0.1, xref='x', yref='paper',
        showarrow=False, xanchor='right', text='Subprime Mortgage Crisis')]
)

fig.show()

<a id="ch6"></a>
# 6. Predictions and Conclusion
---
## 6.1 Investment Recommendations

<a id="ch7"></a>
# 7. References
---

* [Yahoo Finance](https://finance.yahoo.com/) | Profile Stock Data
* [SEC EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch.html) | Company Filings
* [NASDAQ](https://www.nasdaq.com/) | Historical Quotes
* [Financial Terms Dictionary](https://www.investopedia.com/financial-term-dictionary-4769738) | Comprehensive financial terms dictionary with over 13,000 finance and investment definitions.
* [Fidelity](https://fundresearch.fidelity.com/mutual-funds/summary/316390681) | Mutual Funds Examples

## Feedback
* **Your feedback is much appreciated**
* **<b><font color='green'>Please UPVOTE if you LIKE this notebook</font></b>**
* **Comment if you have any questions or find any errors in the notebook**