## Data Frames - Basic Operations

Here are some of the basic operations we typically perform on a Pandas Data Frame.
* Getting the number of records and columns.
* Getting the data types of the columns.
* Replacing `NaN` with some standard values.
* Dropping a column from the Data Frame.
* Getting or updating column names.
* Sorting by index or values.

In [1]:
import pandas as pd

```{note}
Creating a Pandas Data Frame using a list of dicts.
```

In [2]:
sals_ld = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0, 'comm': 10.0},
    {'id': 3, 'sal': 2200.0, 'active': False}
]

```{note}
Column names are inherited automatically from the keys of the dicts. A key that is missing from a dict shows up as `NaN` in that row.
```

In [3]:
sals_df = pd.DataFrame(sals_ld)

In [4]:
sals_df

   id     sal  comm active
0   1  1500.0   NaN    NaN
1   2  2000.0  10.0    NaN
2   3  2200.0   NaN  False
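The output above reflects how a list of dicts is turned into columns. As a minimal sketch, assuming a small hypothetical `records` list, the columns are the union of all keys and any key missing from a record becomes `NaN`:

```python
import pandas as pd

# Hypothetical sample records; the second dict has an extra key
records = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0, 'comm': 10.0},
]

df = pd.DataFrame(records)

# Columns are the union of the keys, in order of first appearance
print(list(df.columns))

# The first record has no 'comm' key, so that cell is NaN
print(df['comm'].isna().tolist())
```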


In [5]:
sals_df['id']

0    1
1    2
2    3
Name: id, dtype: int64

In [6]:
sals_df[['id', 'sal']]

   id     sal
0   1  1500.0
1   2  2000.0
2   3  2200.0


In [7]:
sals_df.shape

(3, 4)

In [8]:
sals_df.shape[0]

3

In [9]:
sals_df.count()

id        3
sal       3
comm      1
active    1
dtype: int64

In [10]:
sals_df.count()[:2]

id     3
sal    3
dtype: int64

In [11]:
sals_df.count()['id']

3
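The outputs above suggest that `shape` and `count` answer different questions: `shape` reports the total number of rows and columns, while `count` reports the number of non-`NaN` values per column. A minimal sketch, using a hypothetical two-row `df`:

```python
import pandas as pd

# Hypothetical sample data; 'comm' is missing from the first record
df = pd.DataFrame([
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0, 'comm': 10.0},
])

n_rows, n_cols = df.shape   # total rows and columns, NaN cells included
n_records = len(df)         # same value as df.shape[0]
non_null = df.count()       # non-NaN values per column, returned as a Series

print(n_rows, n_cols, n_records)
print(non_null['comm'])
```

If the goal is the number of records, `len(df)` or `df.shape[0]` is usually the safer choice, since `count` can differ from column to column.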

In [12]:
sals_df

   id     sal  comm active
0   1  1500.0   NaN    NaN
1   2  2000.0  10.0    NaN
2   3  2200.0   NaN  False


In [13]:
sals_df.dtypes

id          int64
sal       float64
comm      float64
active     object
dtype: object
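The `active` column is reported as `object` rather than `bool` because it mixes `NaN` with a boolean value. A minimal sketch of one way to get a true boolean column, assuming (purely for illustration) that a missing value should be treated as `True`:

```python
import pandas as pd

# Hypothetical sample data; 'active' is missing from the first record
df = pd.DataFrame([
    {'id': 1},
    {'id': 2, 'active': False},
])

# NaN mixed with a bool forces the generic object dtype
print(df['active'].dtype)

# Fill the gap first, then cast explicitly to bool
fixed = df.fillna({'active': True}).astype({'active': bool})
print(fixed['active'].dtype)
```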

In [14]:
sals_df.fillna?


In [None]:
sals_df.fillna(0.0)

In [None]:
sals_df.fillna({'comm': 0.0})

In [None]:
sals_df.fillna({'comm': 0.0, 'active': True})

```{note}
The original Data Frame is left untouched; a new Data Frame is created instead. The original Data Frame still contains `NaN`. We typically assign the output of most Data Frame functions to another variable, or back to the same variable.
```
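The note above can be checked directly. A minimal sketch, using a hypothetical `df` and a hypothetical result name `filled`:

```python
import pandas as pd

# Hypothetical sample data; 'comm' is missing from the first record
df = pd.DataFrame([
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0, 'comm': 10.0},
])

# fillna returns a new Data Frame by default
filled = df.fillna({'comm': 0.0})

# The original still has one NaN in 'comm'; the returned frame has none
print(df['comm'].isna().sum())
print(filled['comm'].isna().sum())
```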

In [None]:
sals_df

In [None]:
sals_df = sals_df.fillna({'comm': 0.0, 'active': True})
sals_df

In [None]:
sals_df.drop?

In [None]:
sals_df.drop(columns='comm')

```{note}
We can also drop multiple columns by passing the column names as a list.
```

In [None]:
sals_df.drop(columns=['comm', 'active'])

In [None]:
sals_df.drop(['comm', 'active'], axis=1)
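The two calls above are equivalent ways of dropping columns. A minimal sketch on a hypothetical `df` that compares them and also shows the `errors='ignore'` option; the column name `bonus` is made up for illustration and does not exist in the data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'id': [1, 2], 'sal': [1500.0, 2000.0], 'comm': [0.0, 10.0]})

by_keyword = df.drop(columns=['comm'])
by_axis = df.drop(['comm'], axis=1)

# Both calls return the same new Data Frame; df itself keeps 'comm'
print(by_keyword.equals(by_axis))
print(list(df.columns))

# errors='ignore' skips labels that do not exist instead of raising KeyError
safe = df.drop(columns=['comm', 'bonus'], errors='ignore')
print(list(safe.columns))
```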

In [None]:
sals_df = sals_df.drop(columns='comm')

In [None]:
sals_df.columns

In [None]:
# After dropping comm, the remaining columns are id, sal and active (in that order)
sals_df.columns = ['employee_id', 'salary', 'active']

In [None]:
sals_df
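Assigning to `columns` requires one name per column, in positional order. A minimal sketch of an alternative, `DataFrame.rename` with a mapping, which renames by old name and leaves unlisted columns alone; the `df` here is hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'id': [1, 2], 'sal': [1500.0, 2000.0], 'active': [True, False]})

# Map old names to new ones; columns not listed keep their names
renamed = df.rename(columns={'id': 'employee_id', 'sal': 'salary'})

print(list(renamed.columns))
print(list(df.columns))  # the original is unchanged
```

Because the mapping is keyed by the old names, this form does not depend on the order of the columns.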

In [None]:
sals_df.sort_index?

In [None]:
sals_df.sort_index(ascending=False)

In [None]:
sals_df.sort_values?

In [None]:
sals_df.sort_values(by='employee_id', ascending=False)

In [None]:
sals_df.sort_values(by='salary')

In [None]:
sals_df.sort_values(by='salary', ascending=False)

In [None]:
sals_ld = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0, 'comm': 10.0},
    {'id': 3, 'sal': 2200.0, 'active': False},
    {'id': 4, 'sal': 2000.0}
]

In [None]:
sals_df = pd.DataFrame(sals_ld)

In [None]:
sals_df.sort_values(by=['sal', 'id'])

In [None]:
sals_df.sort_values(by=['sal', 'id'], ascending=[False, True])
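When `by` is a list, rows are ordered by the first key and ties are broken by the next one; `ascending` can be a list with one flag per key. A minimal sketch on a hypothetical `df` that holds the same `id` and `sal` values as above:

```python
import pandas as pd

# Hypothetical sample data; ids 2 and 4 share the same salary
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'sal': [1500.0, 2000.0, 2200.0, 2000.0],
})

# Highest salary first; among equal salaries, lowest id first
ordered = df.sort_values(by=['sal', 'id'], ascending=[False, True])

print(ordered['id'].tolist())
```

The sorted Data Frame keeps the original index labels; `reset_index(drop=True)` can be chained if a fresh 0-based index is wanted.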