# <span style="color:#130654; font-family: Helvetica; font-size: 200%; font-weight:700"> Pandas | <span style="font-size: 50%; font-weight:300">Column & Row Operations</span></span>

To use pandas in Python, import it first:

In [50]:
# import pandas
import pandas as pd

# import copy
import copy

In [84]:
# Let's create a dataframe for this notebook
data = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
        'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
        'three' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])}

df = pd.DataFrame(data, dtype="int64")

In [85]:
# Let's create a series for this notebook
sri = pd.Series([12,14,16,18,20], index=[0,1,2,3,4])

## <span style="color:#130654">**Column Operations**</span>

### <span style="color:#130654">1. Column selection</span>

- A column can be selected with either of these two methods (omit the `<>` placeholders):
    1. `dataframe.<column_name>`
    2. `dataframe['<column_name>']`
- Use `dataframe.columns` to get column names.

*Example:*

In [4]:
# copy dataframe `df` into a new variable before the column operations
# so that the original dataframe doesn't get altered
df_col = df.copy()

In [5]:
# get column names as index
df_col.columns

Index(['one', 'two', 'three'], dtype='object')

<br>

*Method 1: `dataframe.<column_name>`*

The general syntax for this method is: `dataframe.<column_name>[<row / index>]`

<span style="color:green">Note: This method is not useful for selecting multiple columns.</span>
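
Attribute access has a couple of other limits worth knowing. The sketch below uses a small hypothetical frame (`df_attr` and its column names are made up for illustration) to show that a column whose name clashes with a DataFrame method, or is not a valid Python identifier, can only be reached with brackets.

```python
import pandas as pd

# Hypothetical frame: one name contains a space, one shadows DataFrame.count
df_attr = pd.DataFrame({'unit price': [1.5, 2.0], 'count': [3, 4], 'one': [1, 2]})

# A plain identifier that clashes with nothing works with attribute access
same = df_attr.one.equals(df_attr['one'])

# `df_attr.count` resolves to the DataFrame.count method, not the column
is_method = callable(df_attr.count)

# Bracket access always returns the column, whatever its name
prices = df_attr['unit price']
counts = df_attr['count']
```

For this reason bracket access is usually the safer default when column names come from external data.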

In [6]:
# To get the values on column `one`
df_col.one

a    1.0
b    2.0
c    3.0
d    NaN
e    NaN
Name: one, dtype: float64

In [7]:
# To get first value in column `one`

# based on index name
df_col.one['a']

# based on index position (plain `[0]` on a label-indexed series is deprecated in recent pandas)
df_col.one.iloc[0]

1.0

<br>

*Method 2: `dataframe['<column_name>']`*

In [8]:
# To get the first column `one`
df_col['one']

a    1.0
b    2.0
c    3.0
d    NaN
e    NaN
Name: one, dtype: float64

In [9]:
# To get first value in column `one`

# based on index name
df_col['one']['a']

# based on index position (plain `[0]` on a label-indexed series is deprecated in recent pandas)
df_col['one'].iloc[0]

1.0

In [10]:
# get multiple columns
df_col[['one', 'three']]

Unnamed: 0,one,three
a,1.0,1
b,2.0,2
c,3.0,3
d,,4
e,,5


### <span style="color:#130654">2. Column addition</span>

A new column can be created in a dataframe by adding `new data` or by applying operations on `existing columns`.

*Example:*

In [11]:
# creating new column by adding new data into dataframe
df_col['four'] = pd.Series([1, 2, 3, 4, 5, 6, 7], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df_col

Unnamed: 0,one,two,three,four
a,1.0,1.0,1,1
b,2.0,2.0,2,2
c,3.0,3.0,3,3
d,,4.0,4,4
e,,,5,5


<span style="color:green; font-family: Helvetica;">
    <strong>Note</strong>: 
    When the added column has index labels that the dataframe does not have, those extra labels are ignored and are not appended to the dataframe.
</span>
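
The alignment rule in the note can be sketched on a smaller frame. The names `base`, `extra`, `flag` and `total` below are hypothetical, chosen only for this illustration; `assign()` is shown as one way to add a column without modifying the original.

```python
import pandas as pd

base = pd.DataFrame({'three': [1, 2, 3]}, index=['a', 'b', 'c'])

# The series is aligned on index labels: 'd' is not in `base`, so it is dropped,
# and 'a' has no value in the series, so it becomes NaN
base['extra'] = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
# 'a' -> NaN, 'b' -> 10.0, 'c' -> 20.0

# A plain list has no index, so it is assigned by position and must match the length
base['flag'] = [True, False, True]

# assign() returns a new dataframe and leaves `base` untouched
with_total = base.assign(total=base['three'] + base['extra'])
```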

In [12]:
# creating new column by applying operations on existing columns
df_col['five'] = df_col['three']+df_col['four']
df_col

Unnamed: 0,one,two,three,four,five
a,1.0,1.0,1,1,2
b,2.0,2.0,2,2,4
c,3.0,3.0,3,3,6
d,,4.0,4,4,8
e,,,5,5,10


#### Question: Why do some columns show float values while others show integers?
### *Hint*: <span style="color: Red; font-family: Helvetica; font-size: 125%; font-weight:700"> `NaN` is a float! </span> 
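
The hint can be checked directly. In this minimal sketch (variable names are illustrative), reindexing an integer series introduces a missing label, and the default integer dtype has no way to store `NaN`, so the data is upcast to float. The nullable `Int64` extension dtype is shown as one possible way to keep integers alongside missing values.

```python
import pandas as pd

ints = pd.Series([1, 2, 3], index=['a', 'b', 'c'])          # dtype: int64

# Label 'd' has no value, so it is filled with NaN and the dtype becomes float64
stretched = ints.reindex(['a', 'b', 'c', 'd'])

# The nullable Int64 dtype stores the gap as pd.NA and keeps the integers
nullable = ints.astype('Int64').reindex(['a', 'b', 'c', 'd'])
```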

### <span style="color:#130654">3. Column Deletion</span>

Columns can be removed from a dataframe using the following:

|Method|Syntax|Inplace|
|:----:|------|-------|
|**del**| `del dataframe["<column_name>"]`|Yes|
|**pop**| `dataframe.pop("<column_name>")`|Yes (returns the removed column)|
|**drop**| `dataframe.drop(columns=["<column_names>"])`|No (unless `inplace=True`)|


<span style="color:#130654; font-family: Helvetica;"><strong>Difference:</strong></span> 
- `pop` returns the column it removes, while `del` returns nothing. 
- So `pop` can be used to take a column out of one dataframe and push it into another.
- `drop` is a more general pandas method that removes rows or columns, either by specifying labels together with the corresponding axis, or by passing index or column names directly.
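
A minimal sketch of these differences, using two hypothetical frames `src` and `dst`:

```python
import pandas as pd

src = pd.DataFrame({'two': [1, 2], 'three': [3, 4], 'four': [5, 6]}, index=['a', 'b'])
dst = pd.DataFrame(index=['a', 'b'])

# pop removes the column from `src` and returns it, so it can be moved to `dst`
dst['four'] = src.pop('four')

# drop returns a new dataframe; `src` still has the column 'three' afterwards
trimmed = src.drop(columns=['three'])

# del removes the column from `src` and returns nothing
del src['two']
```

After these steps `src` holds only `three`, `dst` holds `four`, and `trimmed` holds `two`.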

*Syntax of drop():*
```python
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
```

|Param|Details|
|:----:|------|
|**labels**| Index or column labels to drop.|
|**axis**| Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).|
|**index**| Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).|
|**columns**| Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).|
|**level**| For MultiIndex, level from which the labels will be removed.|
|**inplace**| If False, return a copy. Otherwise, do operation inplace and return None.|
|**errors**| If ‘ignore’, suppress error and only existing labels are dropped.|
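
The `errors` and `inplace` parameters from the table can be sketched as follows; `frame` and the label `'missing'` are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({'two': [1, 2], 'three': [3, 4]}, index=['a', 'b'])

# With the default errors='raise', an unknown label raises KeyError
try:
    frame.drop(columns=['missing'])
    raised = False
except KeyError:
    raised = True

# With errors='ignore', unknown labels are skipped and existing ones are dropped
only_two = frame.drop(columns=['missing', 'three'], errors='ignore')

# With inplace=True the dataframe itself is modified and the call returns None
result = frame.drop(index=['a'], inplace=True)
```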


*Example:*

In [13]:
# deleting first column in dataframe using del
del df_col['one']
df_col

Unnamed: 0,two,three,four,five
a,1.0,1,1,2
b,2.0,2,2,4
c,3.0,3,3,6
d,4.0,4,4,8
e,,5,5,10


In [14]:
# removing last column using pop method
df_col.pop('five')
df_col

Unnamed: 0,two,three,four
a,1.0,1,1
b,2.0,2,2
c,3.0,3,3
d,4.0,4,4
e,,5,5


In [15]:
# removing column using drop method
df_col.drop(labels=['three'], axis=1)

#or

df_col.drop(columns=['three'])

# both will give same result

Unnamed: 0,two,four
a,1.0,1
b,2.0,2
c,3.0,3
d,4.0,4
e,,5


-------------

## <span style="color:#130654">Row / Index Operations</span>

In [16]:
# copy dataframe `df` into a new variable before the row operations
# so that the original dataframe doesn't get altered
df_row = df.copy()

In [17]:
# get index names as index
df_row.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

### <span style="color:#130654">1. Row selection</span>

Rows in pandas can be selected in two ways:
1. by label using the `loc` indexer
2. by integer location using the `iloc` indexer

*Example:*

In [18]:
# selecting row using label
df_row.loc['a']

one      1.0
two      1.0
three    1.0
Name: a, dtype: float64

In [19]:
# selecting row using integer location
df_row.iloc[0]

one      1.0
two      1.0
three    1.0
Name: a, dtype: float64
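
Both indexers also accept a list of rows and a second key for the column. The sketch below rebuilds the notebook's dataframe without the `dtype` argument (an assumption made here so the block stands alone) and selects the same rows and the same cell both ways.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
    'three': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
})

# Several rows at once: by label and by integer position
rows_by_label = df_demo.loc[['a', 'c']]
rows_by_position = df_demo.iloc[[0, 2]]

# A single cell: row 'b' / column 'two', then the same cell by position
cell_by_label = df_demo.loc['b', 'two']
cell_by_position = df_demo.iloc[1, 1]
```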

### <span style="color:#130654">2. Slice Rows</span>

- Rows in pandas can be sliced using the `:` operator
- Rows can be sliced by `index position` or `index label`
- Slicing by index position works like normal Python slicing (the end position is excluded), while slicing by index label includes both the start and the end label

*Example:*

In [20]:
# slicing using index position
df_row[1:4]

Unnamed: 0,one,two,three
b,2.0,2.0,2
c,3.0,3.0,3
d,,4.0,4


In [21]:
# slicing using index label
df_row['b':'d']

Unnamed: 0,one,two,three
b,2.0,2.0,2
c,3.0,3.0,3
d,,4.0,4
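
The same two kinds of slice can be written with the explicit `iloc` and `loc` indexers, which makes the end-point behaviour easier to see. This is a small sketch on a hypothetical frame `letters`.

```python
import pandas as pd

letters = pd.DataFrame({'value': [10, 20, 30, 40, 50]},
                       index=['a', 'b', 'c', 'd', 'e'])

# Position slice: the end position 3 is excluded, so rows 'b' and 'c' are returned
by_position = letters.iloc[1:3]

# Label slice: the end label 'c' is included, so the result is also 'b' and 'c'
by_label = letters.loc['b':'c']

# To reach 'd' by position the end has to be 4; by label it is simply 'd'
wide_position = letters.iloc[1:4]
wide_label = letters.loc['b':'d']
```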


### <span style="color:#130654">3. Adding rows</span>

- In pandas versions before 2.0, rows can be added to a dataframe using the `append()` method; it was deprecated in pandas 1.4 and removed in 2.0 in favour of `pandas.concat()`
- A single row can be appended
- Rows from another dataframe with the same fields/columns can be appended
- `append()` does not modify the dataframe in place, so the result has to be assigned back to keep it

*Syntax:*
```python
DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)
```

|Param|Details|
|:----:|------|
|**other**| The data to append.|
|**ignore_index**| If True, the resulting axis will be labeled 0, 1, …, n - 1.|
|**verify_integrity**|If True, raise ValueError on creating index with duplicates.|
|**sort**|Sort columns if the columns of self and other are not aligned.|

*Example:* Method 1

In [22]:
# Directly appending dictionary to dataframe
df_row.append({'two':5,'three':6}, ignore_index=True)

Unnamed: 0,one,two,three
0,1.0,1.0,1.0
1,2.0,2.0,2.0
2,3.0,3.0,3.0
3,,4.0,4.0
4,,,5.0
5,,5.0,6.0


*Example:* Method 2

In [23]:
# Creating dataframe and then appending

# creating another dataframe
data_row2 = {'two':[6],'three':[7],'four':[7]}
df_row2 = pd.DataFrame(data_row2)
df_row2

Unnamed: 0,two,three,four
0,6,7,7


In [24]:
# appending new dataframe with old dataframe
df_row.append(df_row2, ignore_index=True)

Unnamed: 0,one,two,three,four
0,1.0,1.0,1,
1,2.0,2.0,2,
2,3.0,3.0,3,
3,,4.0,4,
4,,,5,
5,,6.0,7,7.0
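
Since `DataFrame.append()` is not available in pandas 2.0 and later, the two examples above can be approximated with `pd.concat()`. This is a sketch on a shortened hypothetical frame `df_old`, not a drop-in replacement: dtypes in the result may differ from what `append()` produced.

```python
import pandas as pd

df_old = pd.DataFrame({'two': [1.0, 2.0], 'three': [1, 2]}, index=['a', 'b'])

# Method 1: wrap the dictionary in a one-row dataframe, then concatenate
new_row = pd.DataFrame([{'two': 5, 'three': 6}])
combined = pd.concat([df_old, new_row], ignore_index=True)

# Method 2: concatenate another dataframe; columns missing on one side become NaN
other = pd.DataFrame({'two': [6], 'three': [7], 'four': [7]})
widened = pd.concat([df_old, other], ignore_index=True)
```

Like `append()`, `pd.concat()` returns a new dataframe and leaves its inputs unchanged.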


### <span style="color:#130654">4. Deleting rows</span>

Rows in a pandas dataframe can be deleted using the `drop()` method.

*Example:*

In [25]:
# Removing rows using drop method
df_row.drop(labels=['a', 'c'], axis=0)

# OR

df_row.drop(index=['a', 'c'])

# OR

df_row.drop(index=df_row.iloc[[0, 2]].index, axis=0)

# each of these methods will give same result

Unnamed: 0,one,two,three
b,2.0,2.0,2
d,,4.0,4
e,,,5


In the last method, `df_row.iloc[[0, 2]]` selects the rows at positions 0 and 2 (labels 'a' and 'c'), and the `.index` attribute then extracts their index labels, which are passed to the `index` parameter. It is better to store these labels in a variable instead of passing the expression directly.
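
A minimal sketch of that suggestion, on a hypothetical frame `df_demo`; the boolean-mask line at the end is an alternative technique added here for comparison, not part of the original example.

```python
import pandas as pd

df_demo = pd.DataFrame({'two': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# Look up the labels at positions 0 and 2 once and keep them in a variable
labels_to_drop = df_demo.index[[0, 2]]

# Pass the stored labels to drop()
remaining = df_demo.drop(index=labels_to_drop)

# Alternative: keep rows with a boolean condition instead of dropping by label
kept = df_demo[df_demo['two'] % 2 == 0]
```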

-------------

## <span style="color:#130654">Renaming Index and Column names</span>

DataFrame index and column names can be changed or altered using the following methods:
1. `rename()`
2. `add_prefix()` and `add_suffix()`
3. `dataframe.index` and `dataframe.columns` attributes

In [87]:
# Let's create a copy of the original dataframe as df_rename 
df_rename = copy.deepcopy(df)

In [88]:
# Let's create a copy of the original series as sri_rename
sri_rename = copy.deepcopy(sri)

<br>

### <span style="color:#130654">1. rename()</span>

- `rename()` function is used to alter axes labels.
- Function / dict values must be unique (1-to-1).
- Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.
- The signature below is for `DataFrame.rename()`; `Series.rename()` also exists, with a smaller set of parameters.

*Syntax:*
```python
DataFrame.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')
```

|    Param     | Description                                                  | Type                  | Required |
| :---------: | :----------------------------------------------------------- | :-------------------- | :------- |
| **mapper**  | Dict-like or function transformations to apply to that axis’ values. | dict-like or function | Optional |
|  **index**  | Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper). | dict-like or function | Optional |
| **columns** | Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper). | dict-like or function | Optional |
|  **axis**   | Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). | int or str            | Optional |
|  **copy**   | Also copy underlying data.                                   | bool                  | Optional |
| **inplace** | If True, modify the DataFrame in place and return None. If False, return a new DataFrame. | bool                  | Optional |
|  **level**  | In case of a MultiIndex, only rename labels in the specified level. | int or level name     | Optional |
| **errors**  | If ‘raise’, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. <br />If ‘ignore’, existing keys will be renamed and extra keys will be ignored. | {‘ignore’, ‘raise’}   | Optional |
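
The `errors`, `axis` and `inplace` parameters can be sketched as follows; `frame` and the label `'nine'` are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({'one': [1], 'two': [2]}, index=['a'])

# Default errors='ignore': the unknown key 'nine' is silently skipped
renamed = frame.rename(columns={'one': 'One', 'nine': 'Nine'})

# errors='raise': the unknown key raises KeyError
try:
    frame.rename(columns={'nine': 'Nine'}, errors='raise')
    raised = False
except KeyError:
    raised = True

# mapper + axis is equivalent to passing columns=...
by_axis = frame.rename(str.upper, axis='columns')

# inplace=True modifies `frame` itself and returns None
result = frame.rename(index={'a': 'A'}, inplace=True)
```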

<br>

**Example: For DataFrame**

In [28]:
# Check dataframe `df_rename`
df_rename

Unnamed: 0,one,two,three
a,1.0,1.0,1
b,2.0,2.0,2
c,3.0,3.0,3
d,,4.0,4
e,,,5


Let's capitalize the first letter of the column names and index names

In [29]:
df_rename.rename(columns={'one':'One', 'two':'Two','three':'Three'},
         index = {'a':'A','b':'B','c':'C', 'd':'D', 'e':'E'})

Unnamed: 0,One,Two,Three
A,1.0,1.0,1
B,2.0,2.0,2
C,3.0,3.0,3
D,,4.0,4
E,,,5


Using functions to rename:
- Uppercase all column names using `str.upper`
- Lowercase all index names using `str.lower`

In [30]:
df_rename.rename(columns=str.upper, index=str.lower)

Unnamed: 0,ONE,TWO,THREE
a,1.0,1.0,1
b,2.0,2.0,2
c,3.0,3.0,3
d,,4.0,4
e,,,5


Using a `lambda` function:
- Add the string `col ` before each column name
- Repeat each index label twice to form the new index name

In [31]:
df_rename.rename(columns = lambda c : 'col ' + c, 
                 index = lambda i : i * 2)

Unnamed: 0,col one,col two,col three
aa,1.0,1.0,1
bb,2.0,2.0,2
cc,3.0,3.0,3
dd,,4.0,4
ee,,,5


### <span style="color:#130654">2. add_prefix() & add_suffix()</span>

`add_prefix()` and `add_suffix()` behave differently for `DataFrame` and `Series`:

|Data Structure|Process     |
|:------------:|------------|
|**DataFrame** |Column Only |
|**Series**    |Index Only  |

<br>

*Example:* DataFrame

In [41]:
df_rename.add_prefix("1_")

Unnamed: 0,1_one,1_two,1_three
a,1.0,1.0,1
b,2.0,2.0,2
c,3.0,3.0,3
d,,4.0,4
e,,,5


In [42]:
df_rename.add_suffix("_2")

Unnamed: 0,one_2,two_2,three_2
a,1.0,1.0,1
b,2.0,2.0,2
c,3.0,3.0,3
d,,4.0,4
e,,,5


<br>

*Example:* Series

In [57]:
sri_rename.add_prefix("a_")

a_0    12
a_1    14
a_2    16
a_3    18
a_4    20
dtype: int64

In [58]:
sri_rename.add_suffix("_b")

0_b    12
1_b    14
2_b    16
3_b    18
4_b    20
dtype: int64
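
Because `add_prefix()` on a dataframe touches only the column labels, prefixing the row index needs another route. One option, sketched here on a hypothetical frame, is `rename()` with a function; the names `frame` and `row_` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2]}, index=['a', 'b'])

# add_prefix changes the column labels and leaves the index alone
prefixed_columns = frame.add_prefix('col_')

# rename with a function can prefix the index labels instead
prefixed_rows = frame.rename(index=lambda label: f'row_{label}')

# On a series the suffix goes on the index; integer labels become strings
series = pd.Series([12, 14], index=[0, 1])
suffixed = series.add_suffix('_b')
```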

<br>

### <span style="color:#130654">3. dataframe.index and dataframe.columns attributes</span>

- `.index` provides the row index labels; assigning a list replaces the original index with the assigned list.
- `.columns` provides the column labels; assigning a list replaces the original labels with the assigned list.
- The length of the assigned list must match the length of the axis (index or columns) exactly.
- A `Series` only supports the `.index` attribute, as it is a 1D data structure and doesn't contain columns.
- Replacing indexes/columns through these attributes happens in place.

Creating list of index and columns:

In [97]:
# define list variable of index
index = ['aa', 'bb', 'cc', 'dd', 'ee']

# define list variable of columns label
columns = ['ace', 'king', 'queen']

<br>

*Example:* DataFrame

In [91]:
df_rename.index = index
df_rename

Unnamed: 0,one,two,three
aa,1.0,1.0,1
bb,2.0,2.0,2
cc,3.0,3.0,3
dd,,4.0,4
ee,,,5


In [92]:
df_rename.columns = columns
df_rename

Unnamed: 0,ace,king,queen
aa,1.0,1.0,1
bb,2.0,2.0,2
cc,3.0,3.0,3
dd,,4.0,4
ee,,,5


<br>

*Example:* Series

In [98]:
sri_rename.index = index
sri_rename

aa    12
bb    14
cc    16
dd    18
ee    20
dtype: int64
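
The length requirement can be checked with a short sketch on a hypothetical two-column frame: a list of the wrong length is rejected with a `ValueError`, and the labels stay as they were.

```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

# One label for two columns: pandas raises ValueError (length mismatch)
try:
    frame.columns = ['ace']
    raised = False
except ValueError:
    raised = True

# The failed assignment left the original labels untouched
labels_after_failure = list(frame.columns)

# Lists of the right length replace the labels in place
frame.columns = ['ace', 'king']
frame.index = ['aa', 'bb', 'cc']
```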

<br>