In [3]:
from IPython.core.interactiveshell import InteractiveShell
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
%matplotlib inline 
import matplotlib.pyplot as plt
import pandas as pd


# Data Analysis With Python for Excel Users II - Performing Excel Functions in Python

<!-- PELICAN_BEGIN_SUMMARY -->

Computing the mean, minimum and maximum of a dataset in Python is a quick way to check its distribution.
This is especially useful when datasets for financial analysis need to be verified quickly.

<!-- PELICAN_END_SUMMARY -->

**Apply Python Code to Verify the Accuracy of the Datasets**
- Sum up columns or rows in Python and export the file to Excel
- Transpose columns
- Get the maximum, minimum, and mean values in the datasets
- Filter data to isolate invalid data

**Example - Dividends of the 30 Dow Jones Index Stocks**

In [15]:
df = pd.read_excel('data/Dow.xlsx')
df.head(3)

Unnamed: 0,Stock Symbol,Company Name,Dividend Yield,Closing Price,Annualized Dividend,Ex-Div Date,Pay Date,50-day moving average,200-day moving average
0,WMT,Wal-Mart Stores,0.0242,85.91,2.08,2018-12-06,2019-01-02,82.35,78.68
1,V,Visa,0.007,119.78,0.84,2018-02-15,2018-03-06,106.44,98.9627
2,VZ,Verizon,0.0496,47.58,2.36,2018-04-09,2018-05-01,48.73,47.03
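
The summary mentions checking the mean, minimum and maximum. A minimal sketch of that check follows, using only the three rows displayed above rather than the full `data/Dow.xlsx` file. The `sample` and `summary` names are illustrative and not part of the original notebook.

```python
import pandas as pd

# The three rows shown by df.head(3) above, retyped by hand for illustration.
sample = pd.DataFrame({
    "Stock Symbol": ["WMT", "V", "VZ"],
    "Dividend Yield": [0.0242, 0.007, 0.0496],
    "Closing Price": [85.91, 119.78, 47.58],
})

# One call returns the minimum, mean and maximum of each selected column.
summary = sample[["Dividend Yield", "Closing Price"]].agg(["min", "mean", "max"])
print(summary)
```

On the full dataset the same `agg` call would run on `df` directly; values far outside the expected range are a hint that a row needs a closer look.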


**Which Stock Has the Maximum Dividend Yield?**

In [16]:
# idxmax() returns the index label of the highest yield; .loc looks the row up by label
max_div_stock = df.loc[df["Dividend Yield"].idxmax()]
print("The stock with the max dividend yield is %s with yield %s" % (max_div_stock['Company Name'], max_div_stock['Dividend Yield']))

The stock with the max dividend yield is Verizon with yield 0.0496
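
One detail worth knowing: `idxmax()` returns an index label, not a row position. With the default 0, 1, 2, ... index the two coincide, but with a labelled index only a label-based lookup such as `.loc` finds the row. The sketch below assumes, purely for illustration, that the stock symbol is used as the index; the original file does not do this.

```python
import pandas as pd

# Rows from the table above, with the stock symbol as the index.
# (This index is an assumption for the sketch, not how Dow.xlsx is laid out.)
sample = pd.DataFrame(
    {
        "Company Name": ["Wal-Mart Stores", "Visa", "Verizon"],
        "Dividend Yield": [0.0242, 0.007, 0.0496],
    },
    index=["WMT", "V", "VZ"],
)

# idxmax() returns the index label of the largest value.
top_label = sample["Dividend Yield"].idxmax()

# .loc selects by label, so it works for any index type.
top_row = sample.loc[top_label]
print(top_label, top_row["Company Name"], top_row["Dividend Yield"])
```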


**Which Stock Has the Minimum Dividend Yield?**

In [17]:
# idxmin() returns the index label of the lowest yield; .loc looks the row up by label
min_div_stock = df.loc[df["Dividend Yield"].idxmin()]
print("The stock with the minimum dividend yield is %s with yield %s" % (min_div_stock['Company Name'], min_div_stock['Dividend Yield']))

The stock with the minimum dividend yield is Visa with yield 0.007


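**Which Stocks Are Trading At or Above Their 50-day Moving Average?**
- Calculate the difference between the 50-day moving average and the closing price
- Use min() to find the most negative difference, which belongs to the stock trading furthest above its 50-day average
- Filter for differences at or below zero to list the stocks whose closing price is at or above the 50-day average; a positive difference means the price is below the average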

In [18]:
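# Positive difference: price is below its 50-day average; negative: price is above it
df["Dif 50-day"] = df["50-day moving average"] - df["Closing Price"]
df["Dif 50-day"].min()  # most negative difference (not displayed: only the last expression in a cell is shown)
df[df["Dif 50-day"] <= 0].head(3)  # closing price at or above the 50-day average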

Unnamed: 0,Stock Symbol,Company Name,Dividend Yield,Closing Price,Annualized Dividend,Ex-Div Date,Pay Date,50-day moving average,200-day moving average,Dif 50-day
0,WMT,Wal-Mart Stores,0.0242,85.91,2.08,2018-12-06,2019-01-02,82.35,78.68,-3.56
1,V,Visa,0.007,119.78,0.84,2018-02-15,2018-03-06,106.44,98.9627,-13.34
3,UNH,UnitedHealth Group,0.0135,221.9,3.0,2018-03-08,2018-03-20,198.85,187.48,-23.05
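
Because the difference is computed as the moving average minus the closing price, the rows above are stocks trading at or above their 50-day average. To list the stocks trading below it, the filter has to keep the positive differences. A small sketch using four rows that appear in this post (the `sample` and `below` names are illustrative):

```python
import pandas as pd

# Four rows copied from the tables in this post.
sample = pd.DataFrame({
    "Stock Symbol": ["WMT", "V", "VZ", "MMM"],
    "Closing Price": [85.91, 119.78, 47.58, 214.33],
    "50-day moving average": [82.35, 106.44, 48.73, 216.33],
})

# Same calculation as in the notebook: moving average minus closing price.
sample["Dif 50-day"] = sample["50-day moving average"] - sample["Closing Price"]

# A positive difference means the closing price is below the 50-day average.
below = sample[sample["Dif 50-day"] > 0]
print(below["Stock Symbol"].tolist())  # ['VZ', 'MMM']
```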


**Insert Column and Assign Value in Python Instead of in Excel**
- Insert a new column after column 2 with a 5% commission rate


In [19]:
df.insert(2,"Com Rate %", 5)
df.head(3)

Unnamed: 0,Stock Symbol,Company Name,Com Rate %,Dividend Yield,Closing Price,Annualized Dividend,Ex-Div Date,Pay Date,50-day moving average,200-day moving average,Dif 50-day
0,WMT,Wal-Mart Stores,5,0.0242,85.91,2.08,2018-12-06,2019-01-02,82.35,78.68,-3.56
1,V,Visa,5,0.007,119.78,0.84,2018-02-15,2018-03-06,106.44,98.9627,-13.34
2,VZ,Verizon,5,0.0496,47.58,2.36,2018-04-09,2018-05-01,48.73,47.03,1.15
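
Note that `df.insert` raises a `ValueError` if the column already exists, which happens when the cell is run a second time. A hedged sketch of one way to make the step safe to re-run; the helper name `add_commission_rate` is hypothetical and not part of the original notebook:

```python
import pandas as pd

# Three rows from the table above, reduced to a few columns for brevity.
sample = pd.DataFrame({
    "Stock Symbol": ["WMT", "V", "VZ"],
    "Company Name": ["Wal-Mart Stores", "Visa", "Verizon"],
    "Dividend Yield": [0.0242, 0.007, 0.0496],
})

def add_commission_rate(frame, rate=5):
    """Insert 'Com Rate %' at position 2 unless it is already present."""
    if "Com Rate %" not in frame.columns:
        frame.insert(2, "Com Rate %", rate)
    return frame

add_commission_rate(sample)
add_commission_rate(sample)  # second call does nothing instead of raising ValueError
print(sample.columns.tolist())
```

The check on `frame.columns` is a design choice for notebooks, where cells are often executed more than once.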


**Save the Calculation and the New Column to Excel**

In [23]:
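# Overwrites the source file; index=False keeps the row labels out of it,
# so reading it again does not add an extra "Unnamed: 0" column
df.to_excel('data/Dow.xlsx', index=False)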
df.tail(3)   

Unnamed: 0,Stock Symbol,Company Name,Com Rate %,Dividend Yield,Closing Price,Annualized Dividend,Ex-Div Date,Pay Date,50-day moving average,200-day moving average,Dif 50-day
27,AAPL,Apple Inc.,5,0.0146,172.44,2.52,2018-02-09,2018-02-15,156.641,152.61,-15.799
28,AXP,American Express,5,0.0153,91.6,1.4,2018-04-05,2018-05-10,90.34,84.1304,-1.26
29,MMM,3M,5,0.0254,214.33,5.44,2018-02-15,2018-03-12,216.33,206.69,2.0


*The two new columns created in Python are saved in the Excel file.*

**Count Rows and Columns When Dealing With a Huge Dataset**

In [11]:
row_count, col_count = df.shape  # shape is a (rows, columns) tuple
print("There are %s rows in this file" % row_count)
print("There are %s columns in this file" % col_count)

There are 30 rows in this file
There are 11 columns in this file


**Sum Up Totals for Specific Columns**

In [12]:
sum_row=df[["Dividend Yield","Closing Price"]].sum()
sum_row

Dividend Yield       0.7648
Closing Price     3513.1300
dtype: float64

**Transpose Columns - Helpful for Viewing the Totals as a Single Row**

In [13]:
df_sum=pd.DataFrame(data=sum_row).T
df_sum 

Unnamed: 0,Dividend Yield,Closing Price
0,0.7648,3513.13
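
The opening list also mentions filtering data to isolate invalid rows. A minimal sketch of such a check follows, under the assumption that the dividend yield should equal the annualized dividend divided by the closing price (this holds for the rows shown in this post, but it is not stated by the data source). The `XXX` row is a made-up bad record added only to show the filter catching something.

```python
import numpy as np
import pandas as pd

# Three rows from this post plus one hypothetical, deliberately wrong row ("XXX").
sample = pd.DataFrame({
    "Stock Symbol": ["WMT", "V", "VZ", "XXX"],
    "Dividend Yield": [0.0242, 0.007, 0.0496, 0.5],
    "Closing Price": [85.91, 119.78, 47.58, 100.0],
    "Annualized Dividend": [2.08, 0.84, 2.36, 2.0],
})

# Assumed relationship: yield = annualized dividend / closing price.
expected = sample["Annualized Dividend"] / sample["Closing Price"]

# Allow a small tolerance because the published yields are rounded.
is_valid = np.isclose(sample["Dividend Yield"], expected, atol=0.0005)

# Keep only the rows that fail the check.
invalid = sample[~is_valid]
print(invalid["Stock Symbol"].tolist())  # ['XXX']
```

The tolerance of 0.0005 is an illustrative choice; a real check would pick it from the rounding used in the source data.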
