# Generic Functions

In [None]:
from pandas import Series
from pandas import DataFrame
import pandas as pd
from numpy.random import randint
import numpy as np

## `.head()` and `.tail()`

In [None]:
s1 = Series( randint(1, 11, 25))
df1 = DataFrame(randint(1, 11, (25, 5)), 
                columns = list('ABCDE'),
                index = list('abcdefghijklmnopqrstuvwxy'))

In [None]:
s1.head()

In [None]:
df1.head()

In [None]:
s1.tail()


In [None]:
df1.tail()

## Series Functions

### Attributes

• `shape`

In [None]:
s1.shape

• Number of Dimensions

In [None]:
s1.ndim

• `index`

In [None]:
s1.index

In [None]:
s1.index.values

- `np.asarray(s1.index)` returns the same array as `s1.index.values`

• `values`

In [None]:
s1.values

### Arithmetic Operators

In [None]:
s2 = s1 + 10
s2.head()

- **NOTICE**: The 10 is added to each element

• Matching Indexes

In [None]:
s3 = Series([1, 2, 3, 4, 5], index = list('abcde'))
s4 = s1.head()
print(s3)
print(s4)

In [None]:
s3 + s4

- Addition is done on matching index labels; a label found in only one Series produces `NaN`
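
As a minimal sketch of label alignment, the example below adds two small Series whose indexes only partly overlap. The names `left`, `right`, and `total` are made up for this illustration and are not part of the notebook's variables.

```python
import pandas as pd

# Two Series that share only the labels 'b' and 'c'
left = pd.Series([1, 2, 3], index=list("abc"))
right = pd.Series([10, 20, 30], index=list("bcd"))

# The result carries the union of both indexes; labels present
# on only one side ('a' and 'd') come out as NaN
total = left + right
print(total)
```

Only `b` and `c` receive a numeric sum, because those are the only labels both Series have in common.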

In [None]:
s5 = Series([1, 2, 3, 4, 5], index = [1, 3, 5, 9, 12])

In [None]:
s4 + s5

• Matching Indexes with `fill_value`

In [None]:
s4.add(s5, fill_value = 100)

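- `s4.add()` accepts a `fill_value` that stands in for a value missing from one of the two Series; where both values are missing, the result is still `NaN`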

- Subtraction (`sub`), multiplication (`mul`), and division (`div`) all work the same

In [None]:
s4.sub(s5, fill_value = 100)
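
The following sketch shows how `fill_value` behaves when a value is missing on one side versus both sides. The Series `left` and `right` are hypothetical and chosen so that label `b` is `NaN` in both.

```python
import numpy as np
import pandas as pd

left = pd.Series([1.0, np.nan, 3.0], index=list("abc"))
right = pd.Series([np.nan, 20.0, 30.0], index=list("bcd"))

# fill_value=0 replaces a value missing on ONE side only:
#   'a' -> 1 + 0,  'c' -> 3 + 20,  'd' -> 0 + 30
# 'b' is NaN on both sides, so it stays NaN
filled = left.add(right, fill_value=0)
print(filled)
```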

### Boolean Operations

In [None]:
print(s4)
print(s5)

- **Comparison operators can only compare identically-labeled Series objects**
- Run the next cell to see the resulting `ValueError`


In [None]:
s4 > s5

In [None]:
s4.gt(s5, fill_value=100)
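
A small sketch of the difference between the operator and the method form, using made-up Series `x` and `y` with different index labels. The choice of `fill_value=0` is only an assumption for the illustration.

```python
import pandas as pd

x = pd.Series([1, 5, 9], index=[0, 1, 2])
y = pd.Series([4, 4, 4], index=[1, 2, 3])

# The bare operator refuses to compare differently-labeled Series
try:
    x > y
    raised = False
except ValueError:
    raised = True

# The method form aligns the labels first and fills the gaps with 0
flexible = x.gt(y, fill_value=0)
print(raised)
print(flexible)
```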

### Boolean Reduction

In [None]:
s4 > 5

In [None]:
(s4 > 5).all()

In [None]:
(s4 > 5).any()

In [None]:
s4.empty

• Truth Value of a `Series` is Ambiguous

- Executing `if s4:` in the cell after next raises a `ValueError`, because a Series with several elements has no single truth value; use `.any()`, `.all()`, or `.empty` instead

In [None]:
s4

In [None]:
# is s4 True or False?
if s4:
    pass
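
The sketch below captures the error instead of letting it stop the notebook, then shows the explicit reductions that can replace a bare truth test. The Series `s` is a stand-in for `s4`.

```python
import pandas as pd

s = pd.Series([3, 7, 9])

# Asking for the truth value of a whole Series is ambiguous
try:
    bool(s)
    ambiguous = False
except ValueError:
    ambiguous = True

# Explicit alternatives that each reduce to one boolean
any_big = (s > 5).any()   # is at least one element greater than 5?
all_big = (s > 5).all()   # is every element greater than 5?
is_empty = s.empty        # does the Series have no elements?
print(ambiguous, any_big, all_big, is_empty)
```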

• The Problem of Equality

In [None]:
s6 = Series([1, 2, np.nan])
print(s6)

In [None]:
((s6 + s6) == (s6 * 2)).all()

In [None]:
(s6 + s6) == (s6 * 2)

- **NOTE:** `np.nan == np.nan` compares as `False`
- **NOTE:** Every comparison operator involving `np.nan` returns `False`, except `!=`, which returns `True`

In [None]:
(s6 + s6).equals(s6 * 2)

- `.equals()` compares `np.nan == np.nan` as `True` if they are in corresponding locations
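
To make the behaviour of `.equals()` concrete, here is a rough hand-built equivalent for two float Series: treat a position as equal if the values match or if both are `NaN`. This is only a sketch of the idea, not how pandas implements `.equals()`, and the names `a`, `b`, and `nan_aware` are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, 2.0, np.nan])
b = a.copy()

elementwise = a == b                 # False at the NaN position
both_nan = a.isna() & b.isna()       # True only where both are NaN
nan_aware = (elementwise | both_nan).all()

print(elementwise.all())   # element-wise comparison fails on NaN
print(nan_aware)           # NaN-aware comparison succeeds
print(a.equals(b))         # the built-in method agrees
```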

### Computational Tools

• `pct_change`
- Computes the fractional change from the immediately previous row by default (multiply by 100 to get a percentage)

In [None]:
s7 = Series(np.random.randn(10))
print(s7)

In [None]:
print(s7.pct_change())
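
Because `s7` is random, the arithmetic is easier to follow on fixed numbers. The sketch below uses a hypothetical `prices` Series and compares `pct_change()` with the same calculation written out using `shift`.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])

change = prices.pct_change()

# The same calculation by hand: current / previous - 1
manual = prices / prices.shift(1) - 1

print(change)   # first row is NaN (no previous row), then about 0.1 and -0.1
print(manual)
```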

### Descriptive Statistics

In [None]:
print(s5)
print(s5.mean())
print(s5.sum())
print(s5.cumsum())

### Other Common Series Functions

![series_functions](img/series_functions.png)

## DataFrame Functions

In [None]:
df1 = DataFrame({'A':[1, 2, 3,], 'B':[4, 5, 6], 'C':[7, 8, 9], 'D':[10, 11, 12]},
                index = list('abc'))
print(df1)

In [None]:
df2 = DataFrame({'A':[1, 2, 3,], 'B':[4, np.nan, 6], 'C':[7, 8, 9], 
                 'D':[10, 11, 12]}, index = list('abc'))
print(df2)

### Attributes

• `shape`

In [None]:
df1.shape

• Number of Dimensions

In [None]:
df1.ndim

• `index`

In [None]:
df1.index

In [None]:
df1.index.values

• `columns`

In [None]:
df1.columns

In [None]:
df1.columns.values

• `values`

In [None]:
df1.values

In [None]:
np.asarray(df1)

### Arithmetic Operators

• Addition of DataFrames with the Same Shape

In [None]:
df3 = df1 + df2
print(df1)
print(df2)
print(df3)

• Addition with `fill_value`

In [None]:
df4 = df1.add(df2, fill_value = 100)
print(df4)

• Addition of a Constant

In [None]:
df5 = df3 + 10
print(df5)

**What Is Happening?**

- The 10 is "broadcast" across the columns to make a row the same size 
  as a row of the DataFrame
- The broadcast row is broadcast down the columns to make a 2D array with the
  same shape as the array the 10 is to be added to
- Element by element addition is done
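
The steps above can be imitated with NumPy by expanding the scalar to the full shape before adding. This is a conceptual sketch only; pandas does not necessarily materialize such an array internally, and `frame`, `expanded`, and `explicit` are illustrative names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Expand the scalar 10 to a 2D array with the DataFrame's shape
expanded = np.broadcast_to(10, frame.shape)

# Element-by-element addition with the expanded array
explicit = frame + expanded

print(expanded)
print(explicit)
print((explicit == frame + 10).all().all())
```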

• Addition of a Series I

In [None]:
print(df1)
s1 = Series([1, 2, 3, 4], index = list('ABCD'))
print(s1)

In [None]:
df6 = df1 + s1
print(df6)

**What Is Happening?**

- The Series index is matched against the DataFrame's column labels, so the Series acts as a single row with a shape of (1,4)
- The row of the DataFrame is broadcast down until the number of rows 
  is the same as the df1 being added to (3,4)
- Addition takes place between matching elements
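
As a small check of this behaviour, the sketch below compares the `+` operator with the method form `add(..., axis="columns")`, which states the alignment axis explicitly. The names `frame` and `row` are made up for the example.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=list("abc"))
row = pd.Series([10, 20], index=list("AB"))

# The operator matches the Series index to the column labels ...
by_operator = frame + row

# ... which is the same as naming the axis explicitly
by_method = frame.add(row, axis="columns")

print(by_operator)
print(by_operator.equals(by_method))
```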

• Addition of a Series II

In [None]:
print(df1)
s2 = Series([1, 2, 3], index = list('ABC'))
print(s2)

In [None]:
df7 = df1 + s2
print(df7)

**What Is Happening?**

- The Series is converted into a DataFrame with a shape of (1,4): the 
  missing column `D` is added and its value is set to `np.nan`
- The row of the DataFrame is broadcast down until the number of rows is 
  the same as the df1 being added to (3,4)
- Addition takes place between matching elements

• Addition of a Series with `fill_value` is not supported!
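
One possible workaround, sketched here under the assumption that a missing column should count as 0, is to reindex the Series to the DataFrame's columns before adding. The names `frame`, `partial`, and `padded` are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
partial = pd.Series([10, 20], index=list("AB"))

# Plain addition leaves column C as NaN because the Series has no 'C'
plain = frame + partial

# Give the Series an explicit 0 for every column it lacks, then add
padded = partial.reindex(frame.columns, fill_value=0)
filled = frame + padded

print(plain)
print(filled)
```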

• Addition of a Series III

In [None]:
print(df1)
s3 = Series([1, 2, 3], index = list('abc'))
print(s3)

In [None]:
df9 = df1 + s3
print(df9)

**What Is Happening?**

- The Series index (`a`, `b`, `c`) is matched against the column labels (`A` to `D`), and 
  nothing matches. The Series row, with a shape of (1,3), is extended with the four missing 
  columns, each set to `np.nan`. The Series now looks like:

       array([NaN, NaN, NaN, NaN, 1, 2, 3])
       
- The row of the DataFrame is broadcast down until the number of rows 
  is the same as the `df1` being added to
- The `df1` DataFrame is extended so it looks like this:

       array([[1, 4, 7, 10, NaN, NaN, NaN], 
              [2, 5, 8, 11, NaN, NaN, NaN],
              [3, 6, 9, 12, NaN, NaN, NaN]])
       
- Addition takes place between matching elements; every pair contains an `np.nan`, so the whole result is `np.nan`
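
The effect can be reproduced on a smaller frame. In this sketch the Series `col` is labeled like the rows, so the default column-wise alignment finds no common labels; all names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=list("abc"))
col = pd.Series([10, 20, 30], index=list("abc"))

# The Series index is compared with the COLUMN labels, so the result
# has the union of {'A', 'B'} and {'a', 'b', 'c'} as columns, all NaN
result = frame + col

print(result)
```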

• Addition of a Series III Along a Different Axis

In [None]:
print(df1)
s3 = Series([1, 2, 3], index = list('abc'))
print(s3)

In [None]:
df10 = df1.add(s3, axis = 0)
print(df10)

**What Is Happening?**

- The change is that the Series index is now matched against the row labels (axis=0), and every label matches 
- The Series, acting as a column, is broadcast across the columns until the shape is correct
- Addition takes place between matching elements

- `sub`, `mul`, and `div` all work the same way
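
A compact sketch of row-wise alignment on a two-column frame, with illustrative names `frame` and `col`. Each Series value is added to every column of the row that carries the same label.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=list("abc"))
col = pd.Series([10, 20, 30], index=list("abc"))

# axis=0 matches the Series index against the row labels
result = frame.add(col, axis=0)

print(result)
```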

### Boolean Operators

• DataFrames of the Same Shape

In [None]:
df12 = DataFrame(randint(1, 11, (3, 4)))
print(df12,'\n')

In [None]:
df13 = DataFrame(randint(1, 11, (3, 4)))
print(df13,"\n")
print(df12 > df13)

In [None]:
print((df12 > df13).all())

• Series Against DataFrame

In [None]:
s4 = Series([1, 2, 3])
print(s4)
print(df13)
print(s4 > df13)

In [None]:
s5 = Series([1, 2, 3], index = list('abc'))
print(s5)
print(df13)
print(s5 > df13)
print(df13.le(s5, axis = 0))

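- `.gt()`, `.le()`, and the other Boolean methods align labels and broadcast the same way the arithmetic methods do. Depending on the pandas version, a bare operator such as `>` may instead raise a `ValueError` when the Series labels do not match the DataFrame's columns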
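
As a sketch of comparing a DataFrame with a Series along either axis, the example below uses labels that match exactly, so no extension with `NaN` takes place. The names `frame`, `col_limits`, and `row_limits` are assumptions made for the illustration.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 5], "B": [7, 2]}, index=list("xy"))

col_limits = pd.Series({"A": 3, "B": 3})   # one limit per column
row_limits = pd.Series({"x": 0, "y": 4})   # one limit per row

# Match the Series index against the column labels
above_col = frame.gt(col_limits, axis="columns")

# Match the Series index against the row labels
above_row = frame.gt(row_limits, axis="index")

print(above_col)
print(above_row)
```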

### Descriptive Statistics

• `mean`

In [None]:
print(df1)
print(df1.mean(axis = 0))
print(df1.mean(axis = 1))
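
The `axis` argument is easy to misread, so here is a small sketch on a fixed frame: `axis=0` collapses the rows and yields one value per column, while `axis=1` collapses the columns and yields one value per row. The names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=list("abc"))

col_means = frame.mean(axis=0)   # indexed by column label: A, B
row_means = frame.mean(axis=1)   # indexed by row label: a, b, c

print(col_means)
print(row_means)
```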

• `sum`

In [None]:
df14 = DataFrame(randint(1, 11, (4, 5)), 
                 columns = list('ABCDE'),
                 index = list('abcd'))
print(df14)
print(df14.sum(axis = 0))
print(df14.sum(axis = 1))

### Other Common `DataFrame` Functions

![series_functions](img/series_functions.png)

# End of Notebook