# <span style="color:#130654; font-family: Helvetica; font-size: 200%; font-weight:700"> Pandas | <span style="font-size: 50%; font-weight:300">Functionality</span></span>

To use pandas in Python, import it first:

In [1]:
# import pandas
import pandas as pd

# import other libraries here
import numpy as np

<br>

## <span style="color:#130654">Basic Functions</span>

These are the basic attributes and methods in `pandas` for working with `Series` and `DataFrame` objects:

| Functionality | Series                          | DataFrame         |
| :-----------: | --------------------------------| ------------------|
| **T**         | <span style="color:red">Returns the Series itself; transposing has no effect.</span> | Transposes rows and columns. |
| **axes**      | Returns a list containing the row axis labels. | Returns a list containing the row axis labels and the column axis labels. |
| **dtypes**    | Returns the dtype of the Series. | Returns the dtype of each column. |
| **empty**     | Returns True if the Series has no elements. | Returns True if the DataFrame has no items, i.e. if any of its axes has length 0. |
| **ndim**      | Returns the number of dimensions of the underlying data, by definition 1. | Returns the number of axes / array dimensions, 2 for a DataFrame. |
| **shape**     | <span style="color:red">Returns a one-element tuple holding the number of rows, e.g. `(5,)`.</span> | Returns a tuple `(rows, columns)` representing the dimensionality of the DataFrame. |
| **size**      | <span style="color:red">Returns the number of elements, which equals the number of rows.</span> | Returns the number of elements (rows × columns). |
| **values**    | Returns the Series as an ndarray. | Returns the NumPy representation of the DataFrame. |
| **head()**    | Returns the first n rows. | Returns the first n rows. |
| **tail()**    | Returns the last n rows. | Returns the last n rows. |

*Examples:*

Let's start with creating a `series` and a `dataframe`:

In [2]:
# creating a series
series = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print("Created series is:\n", series, "\n")

# creating a dataframe
data = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
        'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
        'three' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])}

# columns 'one' and 'two' get NaN for the missing labels, so they become float64
df = pd.DataFrame(data)
print("Created dataframe is:\n", df)

Created series is:
 a    1
b    2
c    3
d    4
e    5
dtype: int64 

Created dataframe is:
    one  two  three
a  1.0  1.0      1
b  2.0  2.0      2
c  3.0  3.0      3
d  NaN  4.0      4
e  NaN  NaN      5


Attributes and methods common to both Series and DataFrame:

In [3]:
# 1. axes

s = series.axes
print("Series axes:\n", s, "\n")

d = df.axes
print("DataFrame axes:\n", d)

Series axes:
 [Index(['a', 'b', 'c', 'd', 'e'], dtype='object')] 

DataFrame axes:
 [Index(['a', 'b', 'c', 'd', 'e'], dtype='object'), Index(['one', 'two', 'three'], dtype='object')]


In [4]:
# 2. dtypes
# for a series both `dtype` and `dtypes` work; for a dataframe only `dtypes` works

s = series.dtypes
print("Series dtype:\n", s, "\n")

d = df.dtypes
print("DataFrame dtype:\n", d)

Series dtype:
 int64 

DataFrame dtype:
 one      float64
two      float64
three      int64
dtype: object


##### Common dtypes in Pandas

|**Pandas dtype**|Python type|NumPy type|Usage|
|:-------------------:|-------------------|-------------------|-------------------|
|**object**|str or mixed|string_, unicode_, mixed types|Text or mixed numeric and non-numeric values|
|**int64**|int|int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64|Integer numbers|
|**float64**|float|float_, float16, float32, float64|Floating point numbers|
|**bool**|bool|bool_|True/False values|
|**datetime64**|<span style="color:red">NA</span>|datetime64[ns]|Date and time values|
|**timedelta64[ns]**|<span style="color:red">NA</span>|<span style="color:red">NA</span>|Differences between two datetimes|
|**category**|<span style="color:red">NA</span>|<span style="color:red">NA</span>|Finite list of text values|
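
The table above can be checked against a small frame. This is a minimal sketch: the `dtype_demo` frame and its column names are made up for illustration, and the exact dtype names printed (for example for text columns or datetime resolution) can differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample data: one column per dtype family from the table
dtype_demo = pd.DataFrame({
    "text": ["x", "y", "z"],           # text values
    "whole": [1, 2, 3],                # integer numbers
    "real": [1.5, 2.5, 3.5],           # floating point numbers
    "flag": [True, False, True],       # True/False values
    "when": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
})

# Subtracting two datetime columns/values yields a timedelta column
dtype_demo["elapsed"] = dtype_demo["when"] - dtype_demo["when"].iloc[0]

# astype("category") stores a finite set of labels
dtype_demo["label"] = dtype_demo["text"].astype("category")

print(dtype_demo.dtypes)
```
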

In [5]:
# 3. empty

s = series.empty
print("Series empty:\n", s, "\n")

d = df.empty
print("DataFrame empty:\n", d)

Series empty:
 False 

DataFrame empty:
 False
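
One point worth checking for `empty`: it looks only at axis lengths, not at the values. The sketch below uses two hypothetical frames (`nan_only` and `no_rows`) to show that a frame holding nothing but `NaN` is still not empty.

```python
import numpy as np
import pandas as pd

# Hypothetical frame that contains only missing values
nan_only = pd.DataFrame({"one": [np.nan, np.nan]})

# Dropping the NaN rows leaves an axis of length 0
no_rows = nan_only.dropna()

print(nan_only.empty)  # False: two rows exist, even though both are NaN
print(no_rows.empty)   # True: the row axis has length 0
```
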


In [6]:
# 4. ndim

s = series.ndim
print("Series ndim:\n", s, "\n")

d = df.ndim
print("DataFrame ndim:\n", d)

Series ndim:
 1 

DataFrame ndim:
 2


In [7]:
# 5. shape
# shape is represented as (# of rows, # of columns)
# series doesn't have columns

s = series.shape
print("Series shape:\n", s, "\n")

d = df.shape
print("DataFrame shape:\n", d)

Series shape:
 (5,) 

DataFrame shape:
 (5, 3)


In [8]:
# 6. size
# size is represented as (# of rows x # of columns)
# series doesn't have columns


s = series.size
print("Series size:\n", s, "\n")

d = df.size
print("DataFrame size:\n", d)

Series size:
 5 

DataFrame size:
 15


In [9]:
# 7. values

s = series.values
print("Series values:\n", s, "\n")

d = df.values
print("DataFrame values:\n", d)

Series values:
 [1 2 3 4 5] 

DataFrame values:
 [[ 1.  1.  1.]
 [ 2.  2.  2.]
 [ 3.  3.  3.]
 [nan  4.  4.]
 [nan nan  5.]]
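
In the output above, column `three` is shown as floats because a single ndarray holds one dtype for all columns. The sketch below illustrates this with a hypothetical `mixed` frame, and uses `to_numpy()`, which the pandas documentation recommends over `values`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one float column
mixed = pd.DataFrame({"ints": [1, 2], "floats": [0.5, 1.5]})

# The whole frame is converted to one common dtype
arr = mixed.to_numpy()
print(arr.dtype)        # float64

# A single column keeps its own dtype
ints_only = mixed["ints"].to_numpy()
print(ints_only.dtype)
```
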


In [10]:
# 8. head()

s = series.head(2)
print("Series head:\n", s, "\n")

d = df.head(2)
print("DataFrame head:\n", d)

Series head:
 a    1
b    2
dtype: int64 

DataFrame head:
    one  two  three
a  1.0  1.0      1
b  2.0  2.0      2


In [11]:
# 9. tail()

s = series.tail(2)
print("Series tail:\n", s, "\n")

d = df.tail(2)
print("DataFrame tail:\n", d)

Series tail:
 d    4
e    5
dtype: int64 

DataFrame tail:
    one  two  three
d  NaN  4.0      4
e  NaN  NaN      5


In [12]:
# 10. T or Transpose

s = series.T
print("Series Transpose:\n", s, "\n")

d = df.T
print("DataFrame Transpose:\n", d)

Series Transpose:
 a    1
b    2
c    3
d    4
e    5
dtype: int64 

DataFrame Transpose:
          a    b    c    d    e
one    1.0  2.0  3.0  NaN  NaN
two    1.0  2.0  3.0  4.0  NaN
three  1.0  2.0  3.0  4.0  5.0


<br>

## <span style="color:#130654">Function Application</span>

- Function application is used to <u>apply your own or another library’s functions to Pandas objects</u>.
- There are three kinds of function application, depending on what the function operates on: the whole dataframe, a row or column, or individual elements.

| Function | Usage|
|:--------:|------|
|**pipe()**|Table wise Function Application.|
|**apply()**|Row or Column Wise Function Application.|
|**applymap()**|Element wise Function Application.|

### Tablewise Function Application

Custom operations can be performed by passing the function and the appropriate number of parameters as pipe arguments. 
*Syntax:*
```python
DataFrame.pipe(self, func, *args, **kwargs)
```

| Name   | Description                                                  | Type | Required |
| :-----: | :----------------------------------------------------------- | :----------------- | :------------------ |
| **func**   | function to apply to the Series/DataFrame. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame. | function           | Required            |
| **args**   | positional arguments passed into func.                       | iterable           | Optional            |
| **kwargs** | a dictionary of keyword arguments passed into func.          | mapping            | Optional            |

*Example:*

In [13]:
# Let's create a custom function that doubles whatever argument it receives
def double(x):
    return x * 2

# Now pass double() to pipe() on the dataframe df
df_pipe = df.pipe(double)

print("Normal DataFrame:\n", df)
print("\n")
print("DataFrame with Pipe:\n", df_pipe)

Normal DataFrame:
    one  two  three
a  1.0  1.0      1
b  2.0  2.0      2
c  3.0  3.0      3
d  NaN  4.0      4
e  NaN  NaN      5


DataFrame with Pipe:
    one  two  three
a  2.0  2.0      2
b  4.0  4.0      4
c  6.0  6.0      6
d  NaN  8.0      8
e  NaN  NaN     10


Notice how the values of the original dataframe `df` are doubled and returned as a second dataframe, `df_pipe`. `pipe()` calls `double()` once with the whole dataframe as its argument, so `df.pipe(double)` is equivalent to `double(df)`; the multiplication itself is then applied to every cell.
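
The parameter table also mentions extra arguments and the `(callable, data_keyword)` tuple form. The following is a minimal sketch of both; the functions `scale` and `add_offset` and the `prices` frame are hypothetical names introduced only for this illustration.

```python
import pandas as pd

def scale(frame, factor):
    # the DataFrame arrives as the first positional argument
    return frame * factor

def add_offset(offset, data):
    # here the DataFrame is expected under the keyword `data`
    return data + offset

prices = pd.DataFrame({"one": [1.0, 2.0], "two": [3.0, 4.0]})

# Extra arguments are forwarded: same as scale(prices, 3)
scaled = prices.pipe(scale, 3)

# Tuple form: same as add_offset(offset=10, data=prices)
shifted = prices.pipe((add_offset, "data"), offset=10)

# pipe() calls can be chained and read left to right
chained = prices.pipe(scale, factor=2).pipe((add_offset, "data"), offset=1)

print(scaled, shifted, chained, sep="\n\n")
```

Chaining is the main reason to prefer `pipe()` over nested calls such as `add_offset(1, scale(prices, 2))`.
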

### Row or Column Wise Function Application

- The apply() function is used to apply a function along an axis of the DataFrame.
- Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1).
- By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

<div style="text-align:center">
    <img src="./img/pandas-dataframe-apply.png" width=300 height=200 />
    <img src="./img/pandas-dataframe-apply-1.png" width=300 height=200 />
</div>

*Syntax:*
```python
DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)
```

|      Name       | Description                                                  | Type                                  | Required |
| :-------------: | :----------------------------------------------------------- | :------------------------------------ | :------- |
|    **func**     | Function to apply to each column or row.                     | function                              | Required |
|    **axis**     | Axis along which the function is applied: 0 or ‘index’ applies it to each column, 1 or ‘columns’ applies it to each row. | {0 or ‘index’, 1 or ‘columns’}        | Optional (default 0) |
|     **raw**     | False : passes each row or column as a Series to the function.<br />True : the passed function will receive ndarray objects instead. | bool                                  | Optional (default False) |
| **result_type** | These only act when axis=1 (columns):<br />1. ‘expand’ : list-like results will be turned into columns.<br />2. 'reduce' : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.<br />3. 'broadcast' : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.<br /><br />The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns. | 'expand', 'reduce', 'broadcast', None | Optional (default None) |
|    **args**     | Positional arguments to pass to func in addition to the array/series. | tuple                                 | Optional (default `()`) |
|  **\*\*kwds**   | Additional keyword arguments to pass as keywords arguments to func. | ----                                  | Optional |

*Example:*

In [15]:
df

   one  two  three
a  1.0  1.0      1
b  2.0  2.0      2
c  3.0  3.0      3
d  NaN  4.0      4
e  NaN  NaN      5


In [20]:
# Let's apply sum to each column
df.apply(np.sum, axis=0)

one       6.0
two      10.0
three    15.0
dtype: float64

*Cells containing `NaN` are ignored:*
- for column `one` = index a + index b + index c = 1 + 2 + 3 = 6
- for column `two` = index a + index b + index c + index d = 1 + 2 + 3 + 4 = 10
- for column `three` = index a + index b + index c + index d + index e = 1 + 2 + 3 + 4 + 5 = 15

In [21]:
# Let's apply sum to each row
df.apply(np.sum, axis=1)

a    3.0
b    6.0
c    9.0
d    8.0
e    5.0
dtype: float64

*Cells containing `NaN` are ignored:*
- for index `a` = col one + col two + col three = 1 + 1 + 1 = 3
- for index `b` = col one + col two + col three = 2 + 2 + 2 = 6
- for index `c` = col one + col two + col three = 3 + 3 + 3 = 9
- for index `d` = col two + col three = 4 + 4 = 8
- for index `e` = col three = 5
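
The remaining `apply()` parameters (`args`, `result_type` and `raw`) can be sketched on a small frame. The `frame` data and the helper functions `value_range` and `min_max` are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"one": [1.0, 2.0, np.nan], "two": [1.0, 2.0, 3.0]},
    index=["a", "b", "c"],
)

def value_range(col, pad):
    # Series.max()/min() skip NaN by default
    return col.max() - col.min() + pad

def min_max(row):
    # returns a list-like result for each row
    return [row.min(), row.max()]

# args: extra positional arguments forwarded to the function
ranges = frame.apply(value_range, args=(0.5,))

# default result_type: list-like results become a Series of lists
as_series = frame.apply(min_max, axis=1)

# result_type="expand": list-like results are turned into columns
expanded = frame.apply(min_max, axis=1, result_type="expand")

# raw=True: the function receives ndarrays, so np.sum no longer skips NaN
raw_sum = frame.apply(np.sum, axis=0, raw=True)

print(ranges, as_series, expanded, raw_sum, sep="\n\n")
```

The `raw=True` case shows why the sums above ignore `NaN`: with the default `raw=False`, each column or row is a Series, and summing a Series skips missing values.
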

### Element Wise Function Application

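- The `applymap()` function applies a function to a DataFrame element by element.
- The function passed in must accept a single scalar and return a single scalar; it is called once for every element of the DataFrame.
- In pandas 2.1 and later, `applymap()` is deprecated in favour of `DataFrame.map()`, which behaves the same way.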

*Syntax:*
```python
DataFrame.applymap(self, func)
```

|   Name   | Description                                                  | Type     | Required |
| :------: | :----------------------------------------------------------- | :------- | :------- |
| **func** | Python function, returns a single value from a single value. | callable | Required |

*Example:*

In [22]:
df.applymap(lambda x: x*2)

   one  two  three
a  2.0  2.0      2
b  4.0  4.0      4
c  6.0  6.0      6
d  NaN  8.0      8
e  NaN  NaN     10


In [23]:
df.applymap(lambda x: len(str(x)))

   one  two  three
a    3    3      1
b    3    3      1
c    3    3      1
d    3    3      1
e    3    3      1
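
Because `DataFrame.map()` replaces `applymap()` from pandas 2.1 onward, code that has to run on several pandas versions can pick whichever method exists. The sketch below assumes a hypothetical helper named `elementwise`; it is not part of pandas.

```python
import pandas as pd

def elementwise(frame, func):
    """Apply func to every element, using whichever method this pandas offers."""
    # Check the class rather than the instance, so a column named "map"
    # cannot be mistaken for the method.
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)

sample = pd.DataFrame({"one": [1.0, 2.0], "three": [1, 2]}, index=["a", "b"])

doubled = elementwise(sample, lambda x: x * 2)
lengths = elementwise(sample, lambda x: len(str(x)))

print(doubled)
print(lengths)
```
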
