# Basic Statistics
## Practical Example: Auto.csv

### 1. Calculating a mean with NumPy
Import pandas and NumPy:

In [2]:
import pandas as pd
import numpy as np

Load the data with pandas, reading `?` entries as missing values, and replace the numeric `origin` codes with region names:

In [3]:
df_auto = pd.read_csv("Auto.csv", na_values='?')
df_auto.replace({"origin": {1:"American", 2:"European", 3:"Japanese"}}, inplace=True)

Calculate the mean weight of all cars in the dataset:

In [4]:
np.mean(df_auto["weight"])

np.float64(2970.2619647355164)

Calculate the mean weight of all European cars:

In [5]:
np.mean(df_auto[df_auto["origin"]=="European"]["weight"])

np.float64(2423.3)
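
Filtering one origin at a time works, but the same per-group mean can be computed for every origin in a single pass with `groupby`. The sketch below uses a small made-up table in place of `Auto.csv` (the `toy` frame and its weights are illustrative assumptions, not values from the dataset):

```python
import pandas as pd

# Toy stand-in for df_auto; the weights are invented for illustration only.
toy = pd.DataFrame({
    "origin": ["American", "American", "European", "European", "Japanese"],
    "weight": [3500, 3300, 2400, 2600, 2100],
})

# Mean weight for each origin group, returned as a Series indexed by origin.
mean_by_origin = toy.groupby("origin")["weight"].mean()
print(mean_by_origin)
```

With the real data, the equivalent call would be `df_auto.groupby("origin")["weight"].mean()`.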

### 2. Calculating a median with NumPy
Calculate the median weight of all cars in the dataset:

In [6]:
np.median(df_auto["weight"])

np.float64(2800.0)

Calculate the median weights of Japanese and American cars:

In [7]:
median_weight_jap = np.median(df_auto[df_auto["origin"]=="Japanese"]["weight"])
median_weight_ame = np.median(df_auto[df_auto["origin"]=="American"]["weight"])

Compare the two medians and display the result:

In [21]:
if median_weight_jap > median_weight_ame:
    comparison = "larger than"
elif median_weight_jap < median_weight_ame:
    comparison = "smaller than"
else:
    comparison = "equal to"
# An f-string is the recommended (modern) way to build the message
print(f"The median weight of Japanese cars ({median_weight_jap}) is {comparison} the median weight of American cars ({median_weight_ame})")

The median weight of Japanese cars (2155.0) is smaller than the median weight of American cars (3372.5)
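
The median calls above return a number because `weight` has no missing values. Because the file was loaded with `na_values='?'`, other columns may contain `NaN`, and `np.median` propagates it. A minimal sketch on a made-up array (the values are assumptions for illustration) shows the difference between `np.median` and `np.nanmedian`:

```python
import numpy as np

# Made-up values with one missing entry, standing in for a column with NaN.
values = np.array([1.0, np.nan, 3.0])

median_plain = np.median(values)       # NaN propagates into the result
median_skipna = np.nanmedian(values)   # NaN is ignored before taking the median

print(median_plain, median_skipna)
```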


### 3. Calculating quartiles with NumPy
First, read the quartiles from the output of the pandas `describe()` method:

In [9]:
df_auto["weight"].describe().loc[["25%", "50%", "75%"]]

25%    2223.0
50%    2800.0
75%    3609.0
Name: weight, dtype: float64

Calculate the same quartiles with NumPy and store each in its own variable:

In [10]:
weight_q_25 = np.quantile(df_auto["weight"], q=0.25)
weight_q_50 = np.quantile(df_auto["weight"], q=0.50)
weight_q_75 = np.quantile(df_auto["weight"], q=0.75)

Display a statement with the median and the interquartile range:

In [11]:
"The median weight of a car (N = {}) is {} lb. The middle 50% of cars fall between {} lb and {} lb.".format(
    len(df_auto), weight_q_50, weight_q_25, weight_q_75)

'The median weight of a car (N = 397) is 2800.0 lb. The middle 50% of cars fall between 2223.0 lb and 3609.0 lb.'
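
`np.quantile` also accepts a list of probabilities, so all three quartiles can be obtained in one call, and the interquartile range is the difference between the third and first quartile. The sketch below uses a small made-up array (an assumption for illustration, not data from `Auto.csv`) and NumPy's default linear interpolation:

```python
import numpy as np

# Made-up weights, already sorted so the interpolation is easy to follow.
weights = np.array([1800, 2100, 2300, 2800, 3200, 3600, 4300])

# One call returns the 25th, 50th and 75th percentiles together.
q25, q50, q75 = np.quantile(weights, [0.25, 0.50, 0.75])

# Interquartile range: the width of the middle 50% of the data.
iqr = q75 - q25
print(q25, q50, q75, iqr)
```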

### 4. Calculating variance and standard deviation with NumPy
Calculate the variance and standard deviation of the weight of all cars (`np.std` and `np.var` use the population formulas by default):

In [12]:
weight_std = np.std(df_auto["weight"])
weight_var = np.var(df_auto["weight"])
"The standard deviation and variance of the weight for all cars are {:.2f} and {:.2f}, respectively.".format(
    weight_std, weight_var)

'The standard deviation and variance of the weight for all cars are 846.84 and 717130.46, respectively.'

Calculate the variance and standard deviation of the weight of Japanese cars:

In [13]:
weight_jap_std = np.std(df_auto[df_auto["origin"]=="Japanese"]["weight"])
weight_jap_var = np.var(df_auto[df_auto["origin"]=="Japanese"]["weight"])
"The standard deviation and variance of the weight for Japanese cars are {:.2f} and {:.2f}, respectively.".format(
    weight_jap_std, weight_jap_var)

'The standard deviation and variance of the weight for Japanese cars are 318.46 and 101418.25, respectively.'

Calculate the variance and standard deviation of the weight of European cars:

In [14]:
weight_eur_std = np.std(df_auto[df_auto["origin"]=="European"]["weight"])
weight_eur_var = np.var(df_auto[df_auto["origin"]=="European"]["weight"])
"The standard deviation and variance of the weight for European cars are {:.2f} and {:.2f}, respectively.".format(
    weight_eur_std, weight_eur_var)

'The standard deviation and variance of the weight for European cars are 486.53 and 236711.72, respectively.'
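
The figures above divide by N, because `np.var` and `np.std` default to `ddof=0`. The sample estimators divide by N - 1 and are requested with `ddof=1`; pandas' own `Series.var()` and `Series.std()` use `ddof=1` by default, so they return slightly larger values on the same data. A minimal sketch on a made-up array (the values are an assumption chosen so the arithmetic is easy to check):

```python
import numpy as np

# Made-up sample: mean is 5 and the squared deviations sum to 32.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(sample)            # 32 / 8, divides by N
samp_var = np.var(sample, ddof=1)   # 32 / 7, divides by N - 1

pop_std = np.std(sample)            # square root of pop_var
samp_std = np.std(sample, ddof=1)   # square root of samp_var

print(pop_var, samp_var, pop_std, samp_std)
```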

### 5. Calculating standard errors with NumPy
Create a function that calculates the standard error of the mean, i.e. the square root of the variance divided by the number of observations:

In [15]:
def stderr(x):
    return np.sqrt(np.var(x) / x.shape[0])

Calculate the standard error of the weight with the new function:

In [16]:
stderr(df_auto["weight"])

np.float64(42.5014582752255)
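
The function above relies on the default population variance of `np.var`. A common alternative is to use the sample standard deviation (`ddof=1`) in the numerator. The helper below, `stderr_sample`, is a hypothetical name introduced for this sketch, and the array is made up for illustration:

```python
import numpy as np

def stderr_sample(x, ddof=1):
    """Standard error of the mean: std(x, ddof) / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    return np.std(x, ddof=ddof) / np.sqrt(x.size)

# Made-up sample: population std is 2, sample variance is 32 / 7.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

se_sample = stderr_sample(sample)               # uses N - 1 in the variance
se_population = stderr_sample(sample, ddof=0)   # uses N, like the function above
print(se_sample, se_population)
```

For a dataset with several hundred rows the two versions differ only marginally, but the gap grows as the sample gets smaller.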

Display the 95% confidence interval of the mean, using the normal approximation (mean ± 1.96 standard errors):

In [17]:
"Confidence interval for mean weight of cars: 95% CI [{:.2f} lb, {:.2f} lb]".format(
    np.mean(df_auto["weight"])-stderr(df_auto["weight"])*1.96,
    np.mean(df_auto["weight"])+stderr(df_auto["weight"])*1.96)

'Confidence interval for mean weight of cars: 95% CI [2886.96 lb, 3053.56 lb]'
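
The multiplier 1.96 is the 97.5th percentile of the standard normal distribution, which leaves 2.5% in each tail. The sketch below derives it with the standard library's `statistics.NormalDist` and wraps the interval in a helper. The name `mean_ci`, the use of the sample standard deviation (`ddof=1`), and the example array are assumptions made for illustration rather than part of the analysis above:

```python
from statistics import NormalDist

import numpy as np

def mean_ci(x, level=0.95):
    """Normal-approximation confidence interval for the mean of x."""
    x = np.asarray(x, dtype=float)
    # Critical value: about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard error based on the sample standard deviation (assumption: ddof=1).
    se = np.std(x, ddof=1) / np.sqrt(x.size)
    center = x.mean()
    return center - z * se, center + z * se

# Made-up sample with mean 5.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
lower, upper = mean_ci(sample)
print(f"95% CI [{lower:.2f}, {upper:.2f}]")
```

Passing a different `level`, for example `0.99`, widens the interval without changing any other code. For small samples a t-distribution critical value would be more appropriate than the normal one.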