# Data Visualization Project

Open:FactSet Insights and Analysis team.  

The team mission is to educate, inspire, and empower FactSetters and clients to utilize Open:FactSet Data and Solutions.  

As part of the team, you will be tasked with creating and communicating compelling stories for Marketplace content.  Secondarily, the team will need to develop best practices and methods for disseminating knowledge both internally and externally.  For the interview, please prepare a 20-30 minute presentation covering:

1. For an audience of CTS Sales, create a presentation showcasing a FactSet content set using Python, R, or SQL.  The goal is to educate on potential applications, how a given technology is applied and how to pitch this to their client base. 
    - Python, Jupyter Notebook, Ondemand?
    - Unique Datasets - RBICs + SCG Data
    - Exploratory Data Analysis
    - Hans Rosling Chart


        
2. Present an idea of how the Insights and Analysis team can approach educating, inspiring, or empowering FactSetters and clients in FY19.  
    - Code-along programs
    - Package together internet resources for people interested in learning
    - Newsletters + sharing of code, methodology, and data.  Make sure the documentation is error-free
    - Github for sharing and distributing across the firm
    - Bring all teams working on Machine Learning into an initiative to teach Data Science and Analysis techniques across the firm
    - Phase 1 team - Jupyter, Publish in Marketplace, Story-telling with EDA
    - Phase 2 team - d3.js - more interactive data visualization tools
    - Conferences: Jupyter Conference, Tableau Conference, etc.
    - FactSet Surveys to collect unique content


Create a Hans Rosling Chart.

Hypothesis:
- Analyze the impact of ESG Data, Market Cap, and Revenue on S&P 500 stocks against other companies in the same Revere classification
- Do the rich get richer while the poor get poorer?
- Hypothesize that the growth of FAANG has meant areas of growth for other companies in their Revere Sectors


Extract and Perform EDA
- Scatter Plot data points = Company
- X = **Revenue**, Growth Rate
- Y = # of Employees, Revenue, **GDPR**

- Time Series - plotted individually
- Size = Market Capitalization
- Color = Revere Classification
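
The encoding listed above (one point per company, size for market capitalization, color for classification) could be sketched roughly as follows. This is a minimal illustration, not the notebook's final chart: the four rows are made up, the tickers and economy labels are placeholders, and the two MSCI ESG score columns stand in for the X and Y measures because revenue and employee counts are not among the columns of the CSV used in this notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration; only the column names follow the notebook's CSV.
companies = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD"],  # placeholder tickers
    "mkt_val": [8000.0, 2500.0, 40000.0, 15000.0],
    "rbics_econ": ["Economy A", "Economy B", "Economy A", "Economy C"],
    "msci_esg_env": [3.1, 7.5, 4.3, 2.5],
    "msci_esg_social": [3.4, 7.3, 4.3, 3.1],
})

# Size: scale market cap so the largest bubble has an area of 1000 points^2.
sizes = companies["mkt_val"] / companies["mkt_val"].max() * 1000

# Color: one color per classification label, taken from a qualitative colormap.
codes, labels = pd.factorize(companies["rbics_econ"])
cmap = plt.get_cmap("tab10")
colors = [cmap(int(code)) for code in codes]

fig, ax = plt.subplots()
ax.scatter(companies["msci_esg_env"], companies["msci_esg_social"],
           s=sizes, c=colors, alpha=0.8)
ax.set_xlabel("MSCI ESG environment score")
ax.set_ylabel("MSCI ESG social score")
ax.set_title("Bubble size = market cap, color = classification")
```

In a notebook, finishing the cell with `plt.show()` renders the figure. Swapping in revenue or growth rate on an axis only requires changing the two column names passed to `ax.scatter`, assuming those columns exist in the data.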


In [3]:
import matplotlib.pyplot as plt
import numpy as np

In [2]:
# Sample code from the DataCamp program.
# gdp_cap, life_exp, pop, and col are lists assumed to be preloaded in the DataCamp workspace.

# Scatter plot
plt.scatter(x=gdp_cap, y=life_exp, s=np.array(pop) * 2, c=col, alpha=0.8)

# Previous customizations
plt.xscale('log') 
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')

# Add grid() call
plt.grid(True)

# Show the plot
plt.show()


**Size**
- Right now, the scatter plot is just a cloud of blue dots, indistinguishable from each other. Let's change this. Wouldn't it be nice if the size of the dots corresponds to the population?

- To accomplish this, a list `pop` is loaded in your workspace. It contains the population of each country, expressed in millions. The list is passed to the scatter method as the argument `s`, for size.
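
A small sketch of why the sample code wraps `pop` in `np.array` before doubling it. The population values here are hypothetical, not Gapminder figures. Multiplying a plain Python list by 2 repeats the list, while multiplying a NumPy array by 2 scales each element, which is what the `s` argument needs. Matplotlib interprets `s` as the marker area in points squared.

```python
import numpy as np

# Hypothetical populations in millions, for illustration only.
pop = [10.0, 50.0, 300.0]

# List multiplication repeats the list: six elements, none of them scaled.
repeated = pop * 2

# Array multiplication is element-wise: three elements, each doubled.
sizes = np.array(pop) * 2

print(repeated)
print(sizes)
```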

**Color**
- The next step is making the plot more colorful! To do this, a list `col` has been created for you. It holds one color per country, chosen by the continent the country is part of.

- How was the list `col` made? The Gapminder data contains a list `continent` with the continent each country belongs to. A dictionary is constructed that maps continents onto colors:

In [None]:

# Named continent_colors rather than dict to avoid shadowing the built-in type
continent_colors = {
    'Asia':'red',
    'Europe':'green',
    'Africa':'blue',
    'Americas':'yellow',
    'Oceania':'black'
}
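
One way the color list could then be derived from the mapping is a list comprehension over the continent list. This is a sketch under assumptions: the `continent` values below are a hypothetical sample rather than the Gapminder list, and the mapping is named `continent_colors` here so that it does not shadow the built-in `dict`.

```python
# Mapping from continent to color, as described in the text.
continent_colors = {
    'Asia': 'red',
    'Europe': 'green',
    'Africa': 'blue',
    'Americas': 'yellow',
    'Oceania': 'black',
}

# Hypothetical sample; the real list has one entry per country.
continent = ['Asia', 'Europe', 'Africa', 'Americas', 'Oceania', 'Asia']

# Look up the color for each country's continent, preserving order.
col = [continent_colors[c] for c in continent]
print(col)
```

Because `col` keeps the same order as `continent`, it lines up element by element with the x, y, and size lists passed to `plt.scatter`.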

In [1]:
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

In [14]:

data = Path('.', 'assets', 'rbics_esg_data.csv')
hans_rosling = pd.read_csv(data, na_filter=True)

In [26]:
ati = hans_rosling.loc[:, 'Symbol'] == 'ATI'
hans_rosling.loc[ati, :]

Unnamed: 0,Symbol,Name,Date,mkt_val,rbics_econn,rbics_econ,rbics_sectn,rbics_sect,rbics_subsectn,rbics_subsect,rbics_indgrpn,rbics_indgrp,rbics_indn,rbics_ind,rbics_subindn,rbics_subind,msci_esg_env,msci_esg_gov,msci_esg_social
15,ATI,Allegheny Technologies Incorporated,12/31/2007,8777.059258,45,Non-Energy Materials,4515,Mining and Mineral Products,451510,Metal Products,45151020,Primary Metals Products,4515102035,Non-Ferrous Metal Products Manufacturing,451510000000.0,Mining and Mineral Products,3.109999895,3.200000048,3.430000067
515,ATI,Allegheny Technologies Incorporated,12/31/2008,2484.859639,45,Non-Energy Materials,4515,Mining and Mineral Products,451510,Metal Products,45151020,Primary Metals Products,4515102035,Non-Ferrous Metal Products Manufacturing,451510000000.0,Mining and Mineral Products,3.109999895,3.200000048,3.430000067
1016,ATI,Allegheny Technologies Incorporated,12/31/2009,4390.615121,45,Non-Energy Materials,4515,Mining and Mineral Products,451510,Metal Products,45151020,Primary Metals Products,4515102035,Non-Ferrous Metal Products Manufacturing,451510000000.0,Mining and Mineral Products,3.079999924,3.200000048,3.430000067
1516,ATI,Allegheny Technologies Incorporated,12/31/2010,5437.563617,45,Non-Energy Materials,4515,Mining and Mineral Products,451510,Metal Products,45151020,Primary Metals Products,4515102035,Non-Ferrous Metal Products Manufacturing,451510000000.0,Mining and Mineral Products,3.579999924,5.190000057,5.53000021
2016,ATI,Allegheny Technologies Incorporated,12/30/2011,5083.750454,45,Non-Energy Materials,4515,Mining and Mineral Products,451510,Metal Products,45151020,Primary Metals Products,4515102035,Non-Ferrous Metal Products Manufacturing,451510000000.0,Mining and Mineral Products,2.650000095,5.269999981,6.400000095
2518,ATI,Allegheny Technologies Incorporated,12/31/2012,3260.632517,45,Non-Energy Materials,4515,Mining and Mineral Products,451510,Metal Products,45151020,Primary Metals Products,4515102035,Non-Ferrous Metal Products Manufacturing,451510000000.0,Mining and Mineral Products,1.299999952,3.0,6.800000191
3018,ATI,Allegheny Technologies Incorporated,12/31/2013,3847.447117,45,Non-Energy Materials,4515,Mining and Mineral Products,451510,Metal Products,45151020,Primary Metals Products,4515102035,Non-Ferrous Metal Products Manufacturing,451510000000.0,Mining and Mineral Products,1.299999952,3.0,4.400000095
3519,ATI,Allegheny Technologies Incorporated,12/31/2014,3779.87848,45,Non-Energy Materials,4515,Mining and Mineral Products,451510,Metal Products,45151020,Primary Metals Products,4515102035,Non-Ferrous Metal Products Manufacturing,451510000000.0,Mining and Mineral Products,0.600000024,4.0,4.599999905


In [23]:
hans_rosling.columns

Index(['Symbol', 'Name', 'Date', 'mkt_val', 'rbics_econn', 'rbics_econ',
       'rbics_sectn', 'rbics_sect', 'rbics_subsectn', 'rbics_subsect',
       'rbics_indgrpn', 'rbics_indgrp', 'rbics_indn', 'rbics_ind',
       'rbics_subindn', 'rbics_subind', 'msci_esg_env', 'msci_esg_gov',
       'msci_esg_social'],
      dtype='object')

In [25]:
# For ATI: fill all N/A classification columns with the last known values.
# `ati` is the boolean row mask built in the ATI lookup cell.
ati_values = {
    'rbics_econn': 45,
    'rbics_econ': 'Non-Energy Materials',
    'rbics_sectn': 4515,
    'rbics_sect': 'Mining and Mineral Products',
    'rbics_subsectn': 451510,
    'rbics_subsect': 'Metal Products',
    'rbics_indgrpn': 45151020,
    'rbics_indgrp': 'Primary Metals Products',
    'rbics_indn': 4515102035,
    'rbics_ind': 'Non-Ferrous Metal Products Manufacturing',
    # NOTE: 4.5151E+11 looks like a rounded scientific-notation value;
    # verify the exact sub-industry code against the source data.
    'rbics_subindn': 4.5151E+11,
    'rbics_subind': 'Mining and Mineral Products',
}
for column, value in ati_values.items():
    hans_rosling.loc[ati, column] = value
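
The ATI fix above is written by hand for one symbol. A possible generalization, sketched here under assumptions, is to carry the known classification forward and backward within each symbol. The miniature frame is made up (placeholder tickers `AAA` and `BBB`), and it assumes missing values arrive as the string `'-'`. This is not the notebook's method, only one way the same idea might be automated.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the notebook's frame; tickers are placeholders.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["12/31/2007", "12/31/2008", "12/31/2009",
             "12/31/2007", "12/31/2008"],
    "rbics_econn": ["-", "45", "45", "-", "-"],
    "rbics_econ": ["-", "Non-Energy Materials", "Non-Energy Materials",
                   "-", "-"],
})
class_cols = ["rbics_econn", "rbics_econ"]

# Order rows chronologically within each symbol so fills follow time.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.sort_values(["Symbol", "Date"]).reset_index(drop=True)

# Turn the '-' placeholder into real missing values.
df[class_cols] = df[class_cols].replace("-", np.nan)

# Fill forward, then backward, never crossing from one symbol to another.
df[class_cols] = df.groupby("Symbol")[class_cols].ffill()
df[class_cols] = df.groupby("Symbol")[class_cols].bfill()

print(df)
```

Symbols with no known classification in any year (`BBB` in this sketch) stay missing after the fill, so they would still need a manual lookup or to be dropped.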


In [40]:
# Search for blank values, which appear as '-' in the data
blank_values = hans_rosling.loc[:,'rbics_econn'] == '-'
hans_rosling.loc[blank_values,:].sort_values('Symbol')

Unnamed: 0,Symbol,Name,Date,mkt_val,rbics_econn,rbics_econ,rbics_sectn,rbics_sect,rbics_subsectn,rbics_subsect,rbics_indgrpn,rbics_indgrp,rbics_indn,rbics_ind,rbics_subindn,rbics_subind,msci_esg_env,msci_esg_gov,msci_esg_social
40,ABI,Applera Corp-Applied Biosystems,12/31/2007,5677.428688,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
22,ABKFQ,Ambac Financial Group Inc.,12/31/2007,2616.944093,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
3019,ALLE,Allegion PLC,12/31/2013,4243.502251,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
2019,ANRZQ,"Alpha Natural Resources, Inc.",12/30/2011,4490.514,-,-,-,-,-,-,-,-,-,-,-,-,4.289999962,5.860000134,4.269999981
4077,AVGO,Broadcom Inc.,12/31/2015,40099.01049,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
553,BAX,Baxter International Inc.,12/31/2008,33011.06251,-,-,-,-,-,-,-,-,-,-,-,-,7.510000229,7.840000153,7.340000153
58,BAX,Baxter International Inc.,12/31/2007,36782.67911,-,-,-,-,-,-,-,-,-,-,-,-,7.550000191,8.010000229,7.019999981
689,BEN,"Franklin Resources, Inc.",12/31/2008,14855.52426,-,-,-,-,-,-,-,-,-,-,-,-,2.480000019,1.809999943,3.089999914
1188,BEN,"Franklin Resources, Inc.",12/31/2009,24041.43727,-,-,-,-,-,-,-,-,-,-,-,-,3.430000067,3.069999933,3.140000105
194,BEN,"Franklin Resources, Inc.",12/31/2007,27432.19496,-,-,-,-,-,-,-,-,-,-,-,-,3.930000067,4.170000076,4.889999866
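
The rows above show that blanks arrive as the string `'-'`. An alternative to comparing against `'-'` after loading is to ask `read_csv` to treat it as missing at load time via `na_values`. The sketch below assumes a cut-down CSV held in memory: the three rows and five columns are copied from the outputs shown in this notebook, not read from the real file.

```python
import io

import pandas as pd

# Cut-down CSV built from rows shown in the notebook outputs.
csv_text = """Symbol,Date,mkt_val,rbics_econ,msci_esg_env
ATI,12/31/2007,8777.059258,Non-Energy Materials,3.109999895
ABI,12/31/2007,5677.428688,-,-
BAX,12/31/2008,33011.06251,-,7.510000229
"""

# na_values="-" adds '-' to the strings pandas already parses as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values="-")

# Count missing values per column.
missing_per_column = df.isna().sum()

# List the symbols that lack a classification.
missing_class = df.loc[df["rbics_econ"].isna(), "Symbol"].tolist()

print(missing_per_column)
print(missing_class)
```

A side effect of loading this way is that numeric columns such as `msci_esg_env` are parsed as floats instead of strings, since the `'-'` entries no longer force the column to text.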
