## Centrality measures

Centrality measures can be used to predict (positive or negative) outcomes for a node.
Your task in this week’s assignment is to identify an interesting set of network data that is available on the web
(either through web scraping or web APIs) that could be used for analyzing and comparing centrality measures across nodes.

As an additional constraint, there should be at least one categorical variable available for each node
(such as “Male” or “Female”; “Republican”, “Democrat,” or “Undecided”, etc.)

In addition to identifying your data source, you should create a high-level plan that describes how you would load the data for analysis,
and describe a hypothetical outcome that could be predicted from comparing degree centrality across categorical groups.
For this week’s assignment, you are not required to actually load or analyze the data. Please see also Project 1 below.
You may work in a small group on the assignment. You should post your document to GitHub by end of day on Sunday.

In [3]:
import yfinance as yf
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

## Dow Jones Industrial Average Equity Index

We begin by retrieving data for the Dow Jones companies through the Yahoo Finance API.
We create a ticker object, which gives access to the data for a single company.
To get a sense of the data, we start with Apple only.

In [7]:
var1 = yf.Ticker('AAPL')

After creating the first variable, which is a ticker object, we can use it to access data on AAPL stock, such as its institutional holders.

In [8]:
apple = var1.institutional_holders
apple

Unnamed: 0,Holder,Shares,Date Reported,% Out,Value
0,"Vanguard Group, Inc. (The)",1261261357,2021-12-30,0.0773,223962179162
1,Blackrock Inc.,1019810291,2021-12-30,0.0625,181087713372
2,"Berkshire Hathaway, Inc",887135554,2021-12-30,0.0544,157528660323
3,State Street Corporation,633115246,2021-12-30,0.0388,112422274232
4,"FMR, LLC",352204129,2021-12-30,0.0216,62540887186
5,"Geode Capital Management, LLC",264351901,2021-12-30,0.0162,46940967060
6,Price (T.Rowe) Associates Inc,223148792,2021-12-30,0.0137,39624530995
7,Northern Trust Corporation,190876014,2021-12-30,0.0117,33893853805
8,Norges Bank Investment Management,167580974,2020-12-30,0.0103,22236319440
9,Bank Of New York Mellon Corporation,144695935,2021-12-30,0.0089,25693657177


We want to add a column to this data frame that contains Apple's ticker symbol. This column maps each holder row back to Apple and, later, to the other
companies we will analyze. It also supplies the categorical variable that the prompt asks for.

In [9]:
apple['comp'] = var1.ticker

In [10]:
apple

Unnamed: 0,Holder,Shares,Date Reported,% Out,Value,comp
0,"Vanguard Group, Inc. (The)",1261261357,2021-12-30,0.0773,223962179162,AAPL
1,Blackrock Inc.,1019810291,2021-12-30,0.0625,181087713372,AAPL
2,"Berkshire Hathaway, Inc",887135554,2021-12-30,0.0544,157528660323,AAPL
3,State Street Corporation,633115246,2021-12-30,0.0388,112422274232,AAPL
4,"FMR, LLC",352204129,2021-12-30,0.0216,62540887186,AAPL
5,"Geode Capital Management, LLC",264351901,2021-12-30,0.0162,46940967060,AAPL
6,Price (T.Rowe) Associates Inc,223148792,2021-12-30,0.0137,39624530995,AAPL
7,Northern Trust Corporation,190876014,2021-12-30,0.0117,33893853805,AAPL
8,Norges Bank Investment Management,167580974,2020-12-30,0.0103,22236319440,AAPL
9,Bank Of New York Mellon Corporation,144695935,2021-12-30,0.0089,25693657177,AAPL
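
One way the Holder and comp columns could feed a centrality comparison is sketched below. This is only an illustration under stated assumptions: it treats each row as an edge between a holder and a company, uses three AAPL rows from the table above, and adds one row for a made-up ticker, XYZ, so that a holder can appear twice. The holder/company label (the name `kind` is ours) stands in for whatever categorical variable is finally chosen.

```python
import networkx as nx
import pandas as pd

# Three AAPL rows copied from the table above, plus one row for a
# made-up ticker "XYZ" (hypothetical, for illustration only).
edges = pd.DataFrame(
    {
        "Holder": [
            "Vanguard Group, Inc. (The)",
            "Blackrock Inc.",
            "Berkshire Hathaway, Inc",
            "Vanguard Group, Inc. (The)",
        ],
        "comp": ["AAPL", "AAPL", "AAPL", "XYZ"],
    }
)

# Each row becomes an undirected edge between a holder and a company.
G = nx.from_pandas_edgelist(edges, source="Holder", target="comp")

# Degree centrality = number of neighbours / (number of nodes - 1).
centrality = nx.degree_centrality(G)

# Label each node as a holder or a company so the groups can be compared.
companies = set(edges["comp"])
summary = pd.DataFrame(
    {
        "centrality": centrality,
        "kind": {n: "company" if n in companies else "holder" for n in G},
    }
)
group_means = summary.groupby("kind")["centrality"].mean()
print(group_means)
```

With real data, the same grouping step could use any per-node category instead of the holder/company split.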


## All ticker symbols - Dow Jones

![wiki_table](wiki_table.png)


In [4]:
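# read_html returns every table on the page; index 1 held the components table
# when this was run, and the position may shift if the page layout changes.
tickers = pd.read_html('https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average')[1]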

For this particular analysis we are only interested in the Symbol column.

In [5]:
tickers

Unnamed: 0,Company,Exchange,Symbol,Industry,Date added,Notes,Index weighting
0,3M,NYSE,MMM,Conglomerate,1976-08-09,As Minnesota Mining and Manufacturing,3.02%
1,American Express,NYSE,AXP,Financial services,1982-08-30,,3.60%
2,Amgen,NASDAQ,AMGN,Biopharmaceutical,2020-08-31,,4.48%
3,Apple,NASDAQ,AAPL,Information technology,2015-03-19,,3.25%
4,Boeing,NYSE,BA,Aerospace and defense,1987-03-12,,3.96%
5,Caterpillar,NYSE,CAT,Construction and Mining,1991-05-06,,3.74%
6,Chevron,NYSE,CVX,Petroleum industry,2008-02-19,Also 1930-07-18 to 1999-11-01,2.53%
7,Cisco,NASDAQ,CSCO,Information technology,2009-06-08,,1.03%
8,Coca-Cola,NYSE,KO,Soft Drink,1987-03-12,Also 1932-05-26 to 1935-11-20,1.15%
9,Disney,NYSE,DIS,Broadcasting and entertainment,1991-05-06,,2.65%
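
The loading plan for the remaining companies could follow the same steps used for Apple: fetch the institutional holders for each symbol, tag the rows with the symbol, and stack the results. The sketch below is one possible shape for that loop, not a tested pipeline. The helper names `fetch_from_yahoo` and `collect_holders` are hypothetical, and the check for a missing result is an assumption about how the API behaves for some symbols.

```python
import pandas as pd


def fetch_from_yahoo(symbol):
    """Return the institutional-holders table for one symbol via yfinance."""
    import yfinance as yf  # imported here so collect_holders works without it

    return yf.Ticker(symbol).institutional_holders


def collect_holders(symbols, fetch=fetch_from_yahoo):
    """Stack the holders table of every symbol, tagging rows with 'comp'."""
    frames = []
    for symbol in symbols:
        holders = fetch(symbol)
        if holders is None or holders.empty:
            continue  # assumption: skip symbols for which nothing came back
        holders = holders.copy()  # avoid modifying the fetched frame
        holders["comp"] = symbol
        frames.append(holders)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling `collect_holders(tickers['Symbol'])` would then give one long data frame of holder and company pairs. The `fetch` argument is only there so the loop can be tried with a stand-in function before making any network calls.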
