Business-Oriented Descriptive Statistics
===

Foreword:
---

To be written :3

Imports:
---

Below are the basic imports we are going to use. Note that for the charts to show up in the notebook, the very first line of the cell must be **%matplotlib inline**.

In [1]:
%matplotlib inline
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from functools import reduce
from io import StringIO

Load Data:
---

We will use the **requests** library to fetch the data from Yahoo Finance. All we need is the URL that returns the data we want, in this case the daily historical data for GE for the period between 01/04/2016 and 06/22/2016. Once we have the response, we take its text property and wrap it in a **StringIO** object, which behaves like an open csv file. That file-like object is what we pass to the **read_csv** function from the pandas library.

In [2]:
ex1_raw_data = requests.get('http://chart.finance.yahoo.com/table.csv?s=GE&a=0&b=4&c=2016&d=6&e=22&f=2016&g=d&ignore=.csv')
ex1_data = pd.read_csv(StringIO(ex1_raw_data.text),
                      sep=',',
                      encoding='latin1',
                      parse_dates=['Date'],
                      dayfirst=True,
                      index_col='Date').sort_index()
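
Since the download depends on a remote service being reachable, here is a minimal offline sketch of the same StringIO trick. The rows in `csv_text` are made up for illustration (they are not real GE quotes), and the names `csv_text` and `sample` are hypothetical.

```python
from io import StringIO

import pandas as pd

# Made-up rows for illustration only; these are not real GE quotes.
csv_text = """Date,Open,Close,Adj Close
2016-01-06,10.0,10.5,10.4
2016-01-04,9.0,9.5,9.4
2016-01-05,9.5,10.0,9.9
"""

# StringIO wraps the string in a file-like object that read_csv can consume.
sample = pd.read_csv(StringIO(csv_text),
                     parse_dates=['Date'],
                     index_col='Date').sort_index()

print(sample['Adj Close'])
```

After `sort_index()` the rows are in chronological order, which is the same reason the cell above calls it on the downloaded data.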

Average:
---

The **average** or **sample mean** (when the data comes from a sample) is the sum of the array's values divided by the number of observations (the sample size **n**).

$$\bar{X} = \frac{\sum_{i=1}^n X_i}n$$

First we need the value of **n**. We will use the built-in function **len()**, passing it the DataFrame's index attribute. For the sum of the values we can use the built-in **sum()** function, or we could go all-out fancy mode and pass a lambda function to the **reduce** function, which we **need to import from the functools package**. The cell computes the total both ways, so the second assignment simply overwrites the first.

In [3]:
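n = len(ex1_data.index)  # number of observations
# Option 1: built-in sum()
acum = sum(ex1_data['Adj Close'])
# Option 2: reduce() with a lambda; this overwrites the value from option 1
acum = reduce(lambda x, y: x + y, ex1_data['Adj Close'])

avg = acum/n
print('Mean without Numpy:', avg)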

Mean without Numpy: 29.5182273143
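
To make the formula easy to check by hand, here is a small sketch on a made-up list instead of the GE data. The values in `prices` are hypothetical, and `statistics.mean` from the standard library is used only as an independent cross-check.

```python
from functools import reduce
import statistics

# Made-up values for illustration; not taken from the GE data set.
prices = [2.0, 4.0, 6.0, 8.0]

n = len(prices)                                     # number of observations
total_sum = sum(prices)                             # built-in sum()
total_reduce = reduce(lambda x, y: x + y, prices)   # running accumulation

mean_by_hand = total_sum / n
print(mean_by_hand, total_reduce / n, statistics.mean(prices))
```

All three printed values are 5.0, since the total is 20.0 and there are 4 observations.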


In [4]:
print('Mean with Numpy:', np.mean(ex1_data['Adj Close']))


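# Median by hand: x and y bracket the middle of the sorted values
x, y = int(n/2 + (n % 2 - 1)), int(n/2 - (n % 2 - 1))

if x == y:
    # odd n: a single middle value
    print(np.mean(sorted(ex1_data['Adj Close'])[x]))
else:
    # even n: average the two middle values
    print(np.mean(sorted(ex1_data['Adj Close'])[x:y]))

# Same result with Numpy
print(np.median(ex1_data['Adj Close']))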

Mean with Numpy: 29.5182273143
29.5431
29.5431
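
The last two printed values are the median, computed first by hand and then with Numpy. The index arithmetic in the cell is compact, so the following sketch spells out the same idea on made-up lists. It uses a slightly different but equivalent formulation, and the names `middle_median`, `lo`, and `hi` are hypothetical.

```python
import statistics


def middle_median(values):
    """Median as the average of the middle one or two sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    lo = (n - 1) // 2   # lower middle index
    hi = n // 2         # upper middle index (equal to lo when n is odd)
    return (ordered[lo] + ordered[hi]) / 2


# Made-up values for illustration; not taken from the GE data set.
odd_sample = [5, 1, 3]        # sorted: 1, 3, 5
even_sample = [4, 1, 3, 2]    # sorted: 1, 2, 3, 4

print(middle_median(odd_sample), statistics.median(odd_sample))
print(middle_median(even_sample), statistics.median(even_sample))
```

With an odd number of observations the two indices coincide and the middle value is returned unchanged. With an even number the two central values are averaged.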
