# Descriptive Statistics

Descriptive statistics summarize the main features of a dataset, giving a compact description of the data sample and its measurements.

## yfinance Basic Usage

In [1]:
import yfinance as yf

In [11]:
# Create a Ticker object for Apple (AAPL)
ticker_aapl = yf.Ticker("AAPL")
# Get general information about Apple
# print(ticker_aapl.info)
# ticker_aapl.info['address1'] # print addressline 1
ticker_aapl.info['longBusinessSummary'] # print longBusinessSummary

'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts. In addition, the company offers various services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV+, which offers exclusive original content; Apple Card, a co-branded credit card; and Apple P

In [12]:
# Download historical market data for the past year
data_aapl_1y = ticker_aapl.history(period="1y")
print(data_aapl_1y.head())

                                 Open        High         Low       Close  \
Date                                                                        
2023-09-12 00:00:00-04:00  178.576213  179.212954  173.929990  175.402451   
2023-09-13 00:00:00-04:00  175.611368  176.397355  173.094250  173.323090   
2023-09-14 00:00:00-04:00  173.114159  175.203474  172.696299  174.845306   
2023-09-15 00:00:00-04:00  175.581521  175.601423  172.935074  174.119003   
2023-09-18 00:00:00-04:00  175.581530  178.466775  175.273110  177.063950   

                              Volume  Dividends  Stock Splits  
Date                                                           
2023-09-12 00:00:00-04:00   90370200        0.0           0.0  
2023-09-13 00:00:00-04:00   84267900        0.0           0.0  
2023-09-14 00:00:00-04:00   60895800        0.0           0.0  
2023-09-15 00:00:00-04:00  109205100        0.0           0.0  
2023-09-18 00:00:00-04:00   67257600        0.0           0.0  


## Google Stock Price Example

In [14]:
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

In [16]:
# Let's download some data (full price history for GOOG)
df = yf.download("GOOG")
print(df.head()) # let's see what we have in here

[*********************100%***********************]  1 of 1 completed

                Open      High       Low     Close  Adj Close     Volume
Date                                                                    
2004-08-19  2.490664  2.591785  2.390042  2.499133   2.493011  897427216
2004-08-20  2.515820  2.716817  2.503118  2.697639   2.691030  458857488
2004-08-23  2.758411  2.826406  2.716070  2.724787   2.718112  366857939
2004-08-24  2.770615  2.779581  2.579581  2.611960   2.605561  306396159
2004-08-25  2.614201  2.689918  2.587302  2.640104   2.633636  184645512





In [19]:
# let's say we only want the Adjusted Close
# The adjusted closing price amends a stock's closing price to reflect that stock's value after accounting for any corporate actions
# The closing price is the raw price, which is just the cash value of the last transacted price before the market closes (investopedia.com)
df_close = yf.download("GOOG")["Adj Close"]
df_close.head()

[*********************100%***********************]  1 of 1 completed


Date
2004-08-19    2.493011
2004-08-20    2.691030
2004-08-23    2.718112
2004-08-24    2.605561
2004-08-25    2.633636
Name: Adj Close, dtype: float64

In [23]:
# Same download, but keeping only the daily returns
# pct_change(1) computes the percentage change from one row to the next
# dropna() removes the first row, which has no previous price to compare against
df_close_pct = yf.download("GOOG")["Adj Close"].pct_change(1).dropna() # returns put different assets on the same scale
df_close_pct

[*********************100%***********************]  1 of 1 completed


Date
2004-08-20    0.079430
2004-08-23    0.010064
2004-08-24   -0.041408
2004-08-25    0.010775
2004-08-26    0.018019
                ...   
2024-09-05    0.005006
2024-09-06   -0.040794
2024-09-09   -0.015731
2024-09-10    0.003143
2024-09-11    0.014266
Name: Adj Close, Length: 5049, dtype: float64

In [22]:
df_close_pct.head()

Date
2004-08-20    0.079430
2004-08-23    0.010064
2004-08-24   -0.041408
2004-08-25    0.010775
2004-08-26    0.018019
Name: Adj Close, dtype: float64
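
To see what `pct_change(1)` and `dropna()` do without downloading anything, here is a minimal sketch on a handful of made-up prices (the values are hypothetical, not GOOG data). Each return is the current price divided by the previous one, minus 1, and the first row is missing because it has no predecessor.

```python
import pandas as pd

# Hypothetical prices, chosen only to make the arithmetic easy to check
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# Simple return between consecutive rows: p_t / p_{t-1} - 1
returns = prices.pct_change(1)

# The same computation written out explicitly with shift()
manual = prices / prices.shift(1) - 1

# The first row has no previous price, so it is NaN and gets dropped
clean = returns.dropna()
print(clean)
```

The `manual` series is only there to make the formula visible; in practice `pct_change` is the shorter way to write it.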

## Central Tendency Measure

### Mean

The mean (average) of a data set is found by adding all numbers in the data set and then dividing by the number of values in the set.
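
As a small illustration of this definition, the sketch below computes the mean of a few hypothetical daily returns by hand and checks it against `np.mean`. The numbers are invented for the example and are not taken from the GOOG data.

```python
import numpy as np

# Hypothetical daily returns, for illustration only
returns = [0.02, -0.01, 0.03, 0.00]

# Definition: sum of the values divided by the number of values
manual_mean = sum(returns) / len(returns)

# NumPy computes the same quantity
numpy_mean = np.mean(returns)

print(f"manual: {manual_mean:.4f}, numpy: {numpy_mean:.4f}")
```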

In [32]:
# Mean with NumPy, using df_close_pct from above
# axis selects the direction of the reduction; it only matters when there is more than one column
# axis=0 -> average over the rows (down each column)
mean = np.mean(df_close_pct, axis=0) * 100 # multiply by 100 to express the value as a percentage
print(f"Daily mean: {mean:.2f} %")

Daily mean: 0.10 %


In [36]:
# Annualize the mean daily return (simple arithmetic scaling)
annual_mean = mean * 252 # 252 is the approximate number of trading days in a year
print(f"Annual mean: {annual_mean:.2f} %")

Annual mean: 25.19 %


In [35]:
# Daily mean return -> monthly mean return
monthly_mean = mean * 21 # roughly 21 (sometimes 20) trading days per month
print(f"Monthly mean: {monthly_mean:.2f} %")

Monthly mean: 2.10 %
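
Multiplying the daily mean by 252 or 21 is a simple arithmetic scaling and ignores compounding. As a rough sketch, assuming a constant daily return equal to the rounded 0.10 % printed above, the compounded equivalents could be computed as follows. The variable names are illustrative and not part of the original notebook.

```python
# Assumption: a constant daily return equal to the rounded daily mean (0.10 %)
daily_mean = 0.0010

# Arithmetic scaling, as in the cells above
annual_arithmetic = daily_mean * 252

# Compounded scaling: grow by (1 + r) on each trading day
annual_compounded = (1 + daily_mean) ** 252 - 1
monthly_compounded = (1 + daily_mean) ** 21 - 1

print(f"Arithmetic annual:  {annual_arithmetic * 100:.2f} %")
print(f"Compounded annual:  {annual_compounded * 100:.2f} %")
print(f"Compounded monthly: {monthly_compounded * 100:.2f} %")
```

For a positive daily return the compounded figure is larger than the arithmetic one, and the gap widens as the horizon grows.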
