<a href="https://colab.research.google.com/github/stevenkhwun/P4DS/blob/main/Chp02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Built-In Data Structures, Functions, and Files

## Data Structures and Sequences

### Tuple

A tuple is a fixed-length, immutable sequence of Python objects which, once assigned, cannot be changed.

In [28]:
# Create a tuple (with parentheses)
tup1 = (4, 5, 6)
tup1

(4, 5, 6)

In [29]:
# Create a tuple (without parentheses)
tup2 = 7, 8, 9
tup2

(7, 8, 9)

#### Converting any sequence or iterator to a tuple by invoking `tuple`

In [30]:
# Convert a list into a tuple
tuple([4, 0, 2])

(4, 0, 2)

In [31]:
# Convert a string into a tuple
tup3 = tuple('string')
tup3

('s', 't', 'r', 'i', 'n', 'g')

#### Accessing elements by `[]`

In [32]:
# Accessing elements of tuple
tup3[0]

's'

#### Nested tuples

In [33]:
# Create a nested tuple by enclosing each inner group of values in parentheses
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

In [34]:
# Accessing an inner tuple of a nested tuple
nested_tup[0]

(4, 5, 6)

#### Mutable elements in a tuple

While the objects stored in a tuple may be mutable themselves, once the tuple is created it's not possible to modify which object is stored in each slot:

In [35]:
# Creating a tuple with different types of objects
tup4 = ('foo'), [1, 2], (True)
tup4

('foo', [1, 2], True)

In [36]:
# Another way to create the same tuple
tup5 =  ('foo', [1, 2], True)
tup5

('foo', [1, 2], True)

In [37]:
# Checking equivalence of the tuples
tup4 == tup5

True

In [38]:
# A slot in a tuple cannot be reassigned
tup4[2] = False

TypeError: 'tuple' object does not support item assignment

In [39]:
# Modifying a mutable object in a tuple (in place)
tup4[1].append(3)
tup4

('foo', [1, 2, 3], True)

#### Concatenating tuples using the `+` operator

In [40]:
# Concatenating tuples
tup6 = (4, None, 'foo') + (6, 0)
tup6

(4, None, 'foo', 6, 0)

In [41]:
# Creating a tuple with a single element
# Note that the trailing comma is required for any one-element tuple, not only for strings
k = ('bar',)
print(k)
type(k)

('bar',)


tuple
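It is the trailing comma, not the parentheses, that makes a one-element tuple. The following is a minimal sketch of the difference; the variable names `not_a_tuple`, `one_tuple` and `also_tuple` are illustrative and not part of the original notebook.

```python
# Parentheses alone only group an expression; they do not create a tuple
not_a_tuple = ('bar')
# The trailing comma creates a one-element tuple
one_tuple = ('bar',)
# The parentheses are optional; the comma does the work
also_tuple = 'bar',

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(one_tuple == also_tuple)     # True
```

This is also why `('foo'), [1, 2], (True)` and `('foo', [1, 2], True)` above build the same tuple: the inner parentheses around `'foo'` and `True` have no effect.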

In [42]:
# Concatenating tuples
tup6 + k

(4, None, 'foo', 6, 0, 'bar')

#### Multiplying a tuple by an integer

In [43]:
# Multiplying a tuple
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')
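Multiplying a tuple repeats the references to its elements; the objects themselves are not copied. With immutable elements such as strings this makes no practical difference, but with a mutable element it does. A small sketch, using the illustrative names `inner` and `repeated` (not from the original notebook):

```python
# Illustrative names; not part of the original notebook
inner = [1, 2]
repeated = (inner,) * 3

# All three slots refer to the same list object
print(repeated[0] is repeated[2])  # True

# Mutating the shared list shows up in every slot
inner.append(3)
print(repeated)  # ([1, 2, 3], [1, 2, 3], [1, 2, 3])
```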

# Basic descriptive statistics

The following demonstrates how to calculate some common descriptive statistics: the mean, trimmed mean, median, weighted mean, weighted median, sample standard deviation, interquartile range (IQR) and median absolute deviation from the median (MAD).

We load the data as a pandas DataFrame because its methods (for example `.mean()`, `.median()`, `.std()` and `.quantile()`) directly provide the mean, median, sample standard deviation and quantiles.

For the trimmed mean, we use the `trim_mean` function in `scipy.stats`. For the weighted mean, we use the `average` function in NumPy. For the weighted median, we use the specialized package `wquantiles`. For the MAD, we use the `robust` module in the `statsmodels` package.

First, we install the `wquantiles` package because it is not included in the base Colab environment.

In [44]:
# Install the package "wquantiles"
!pip install wquantiles

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


We now import the necessary packages:

In [45]:
# Import necessary packages
import pandas as pd
import numpy as np
import scipy.stats
import wquantiles
from statsmodels import robust

We now load the data as a pandas dataframe:

In [46]:
# Load the dataset as pandas dataframe
link = "https://raw.githubusercontent.com/stevenkhwun/P4DS/main/Data/state.csv"
state = pd.read_csv(link)
state.head()

| Unnamed: 0 | State | Population | Murder.Rate | Abbreviation |
|---|---|---|---|---|
| 0 | Alabama | 4779736 | 5.7 | AL |
| 1 | Alaska | 710231 | 5.6 | AK |
| 2 | Arizona | 6392017 | 4.7 | AZ |
| 3 | Arkansas | 2915918 | 5.6 | AR |
| 4 | California | 37253956 | 4.4 | CA |


## Mean

In [47]:
# Mean by pandas dataframe method
state['Population'].mean()

6162876.3

## Trimmed mean

In [48]:
# Trimmed mean using scipy.stats (0.1 cuts 10% of the observations from each end)
scipy.stats.trim_mean(state['Population'], 0.1)

4783697.125
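To make the trimming step concrete, here is a minimal sketch that computes a trimmed mean by hand with NumPy. The sample `values` is hypothetical (it is not the state dataset), and the slicing is only meant to mirror how `trim_mean` is understood to behave: sort, drop `int(proportion * n)` observations from each end, then average what remains.

```python
import numpy as np

# Small illustrative sample (hypothetical data, not the state dataset)
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
proportion = 0.1

# Sort, then drop int(proportion * n) observations from each end
sorted_values = np.sort(values)
n_cut = int(proportion * len(sorted_values))
trimmed = sorted_values[n_cut:len(sorted_values) - n_cut]

print(values.mean())   # 14.5
print(trimmed.mean())  # 5.5
```

The single extreme value (100) pulls the ordinary mean well above the typical observation, while the trimmed mean ignores it.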

## Median

In [49]:
# Median by pandas dataframe method
state['Population'].median()

4436369.5

## Weighted mean

In [50]:
# Weighted mean by average function in NumPy
np.average(state['Murder.Rate'], weights=state['Population'])

4.445833981123393
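The weighted mean is the sum of each value multiplied by its weight, divided by the sum of the weights. The sketch below checks this against `np.average` on a tiny hypothetical example; `rates` and `weights` are made-up numbers, not the state data.

```python
import numpy as np

# Hypothetical rates and weights for three groups (not the state data)
rates = np.array([2.0, 4.0, 10.0])
weights = np.array([1.0, 1.0, 8.0])

# Weighted mean: sum of weight * value, divided by the sum of the weights
manual = (rates * weights).sum() / weights.sum()

print(manual)                              # 8.6
print(np.average(rates, weights=weights))  # 8.6
print(rates.mean())                        # unweighted mean, for comparison
```

Because the third group carries most of the weight, the weighted mean sits much closer to its rate than the unweighted mean does.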

## Weighted median

In [51]:
# Weighted median by median function in wquantiles
wquantiles.median(state['Murder.Rate'], weights=state['Population'])

4.4

## Standard deviation

Note that pandas returns the sample standard deviation by default, dividing by n - 1 rather than n.

In [52]:
# Sample standard deviation by pandas dataframe method
state['Population'].std()

6848235.347401142

In [53]:
# Check the sample standard deviation on a small data set
data = [2, 9, 12, 19, 86]
datadf = pd.DataFrame(data)
datadf.std()

0    34.311806
dtype: float64
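The difference between the sample and population versions comes down to the `ddof` argument. A short sketch on the same five values, with the illustrative names `sample_sd`, `population_sd` and `sample_sd_np`:

```python
import numpy as np
import pandas as pd

data = [2, 9, 12, 19, 86]

# pandas divides by n - 1 by default (ddof=1): sample standard deviation
sample_sd = pd.Series(data).std()
# NumPy divides by n by default (ddof=0): population standard deviation
population_sd = np.std(data)
# Passing ddof=1 to NumPy reproduces the pandas result
sample_sd_np = np.std(data, ddof=1)

print(round(sample_sd, 6))      # 34.311806
print(round(population_sd, 6))  # 30.689412
```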

## Interquartile range (IQR)

In [54]:
# Interquartile range (IQR) by pandas dataframe method
state['Population'].quantile(0.75) - state['Population'].quantile(0.25)

4847308.0
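As a small worked check, the sketch below computes the IQR of a five-value sample in two ways and compares it with the range. The series `sample` is hypothetical; both pandas and NumPy use linear interpolation between order statistics by default.

```python
import numpy as np
import pandas as pd

sample = pd.Series([2, 9, 12, 19, 86])

# Quartiles via pandas (linear interpolation by default)
q1, q3 = sample.quantile(0.25), sample.quantile(0.75)
iqr_pandas = q3 - q1

# Same quantity via NumPy percentiles
iqr_numpy = np.percentile(sample, 75) - np.percentile(sample, 25)

print(iqr_pandas)                   # 10.0
print(sample.max() - sample.min())  # 84
```

The range is dominated by the single large value, whereas the IQR only reflects the spread of the middle half of the data.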

## Median absolute deviation from the median (MAD)

In [55]:
# MAD by the mad function in the robust module of statsmodels
robust.mad(state['Population'])

3849876.1459979336
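The value returned by `robust.mad` is not the raw median of the absolute deviations. By default, as far as its documented `c` parameter indicates, statsmodels divides the raw MAD by a normalization constant of about 0.6745 (the 0.75 quantile of the standard normal distribution) so that the result is comparable to a standard deviation for normally distributed data. A minimal sketch with NumPy only, on a hypothetical sample, with the constant hard-coded as an assumption:

```python
import numpy as np

# Hypothetical sample (not the state dataset)
sample = np.array([2, 9, 12, 19, 86])

# Raw MAD: median of the absolute deviations from the median
center = np.median(sample)
raw_mad = np.median(np.abs(sample - center))

# Assumed default normalization constant of statsmodels' robust.mad
# (the 0.75 quantile of the standard normal distribution)
C = 0.6744897501960817
scaled_mad = raw_mad / C

print(center)      # 12.0
print(raw_mad)     # 7.0
print(scaled_mad)  # roughly 10.38
```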