# [30 Examples to Master Pandas](https://towardsdatascience.com/30-examples-to-master-pandas-f8a2da751fa4)

In [1]:
import numpy as np
import pandas as pd

### 1. Reading the csv file

In [4]:
df=pd.read_csv("data/Churn_Modelling.csv")

In [5]:
df.shape

(10000, 14)

In [6]:
df.columns

Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object')

### 2. Dropping columns

In [7]:
df.drop(labels=['RowNumber', 'CustomerId', 'Surname', 'CreditScore'], 
       axis=1, inplace=True)
df.shape

(10000, 10)

```python
df.drop(labels=['RowNumber', 'CustomerId', 'Surname', 'CreditScore'],
        axis=1, inplace=True)
```
is equivalent to
```python
df.drop(columns=['RowNumber', 'CustomerId', 'Surname', 'CreditScore'], inplace=True)
```

### 3. Select particular columns while reading

We can read only some of the columns from the csv file by passing a list of column names to the `usecols` parameter of `read_csv`. If you know the column names beforehand, this is preferable to dropping columns later, because the unwanted columns are never loaded.

In [9]:
df_part=pd.read_csv("data/Churn_Modelling.csv", 
                    usecols=['Gender', 'Age', 'Tenure', 'Balance'])
df_part.head()

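|   | Gender | Age | Tenure | Balance   |
|---|--------|-----|--------|-----------|
| 0 | Female | 42  | 2      | 0.0       |
| 1 | Female | 41  | 1      | 83807.86  |
| 2 | Female | 42  | 8      | 159660.8  |
| 3 | Female | 39  | 1      | 0.0       |
| 4 | Female | 43  | 2      | 125510.82 |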


### 4. Reading a part of the DataFrame
The `read_csv` function can also read only some of the rows of a file. There are two options. The first is to read the first n rows.

In [11]:
df_partial=pd.read_csv("data/Churn_Modelling.csv", nrows=1000)
df_partial.shape

(1000, 14)

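Using the `nrows` parameter, we created a dataframe that contains the first 1000 rows of the csv file.
The second option is to select rows from the end of the file with the `skiprows` parameter. `skiprows=1000` means that the first 1000 lines of the file are skipped while reading. Note that an integer counts lines from the very top, so the header line is skipped as well; passing a list or range of line numbers such as `range(1, 1001)` keeps the header and skips only data rows.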
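
The difference between an integer and a range for `skiprows` is easy to miss, so here is a minimal sketch. It assumes a small made-up CSV held in memory (`csv_text` and its `id`/`value` columns are hypothetical, not part of the churn file) so that it runs without the dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for the csv file: one header line plus 10 data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# An integer skips that many lines from the top, header included,
# so the first remaining line ("3,30") is consumed as the header.
df_int = pd.read_csv(io.StringIO(csv_text), skiprows=4)

# A range starting at 1 keeps line 0 (the header) and skips the first 4 data rows.
df_tail = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 5))

print(list(df_int.columns))
print(list(df_tail.columns), df_tail.shape)
```

With the range form, the column names survive and the frame holds only the remaining rows.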

### 5. Sample

After creating a dataframe, we may want to draw a small sample to work with. We can use either the `n` parameter or the `frac` parameter to set the sample size.
* **n**: The number of rows in the sample
* **frac**: The ratio of the sample size to the whole dataframe size

In [12]:
df.shape

(10000, 10)

In [13]:
df_sample=df.sample(n=2000)
df_sample.shape

(2000, 10)
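
The `frac` parameter is not shown above, so here is a short sketch of it. The frame `demo` is a hypothetical stand-in for the churn data, and `random_state` is added only to make the draw repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical 1,000-row frame standing in for the churn data.
demo = pd.DataFrame({"Age": np.arange(1000) % 60 + 18,
                     "Balance": np.linspace(0, 250000, 1000)})

# frac=0.1 draws 10% of the rows; a fixed random_state repeats the same draw.
sample_frac = demo.sample(frac=0.1, random_state=42)
sample_again = demo.sample(frac=0.1, random_state=42)

print(sample_frac.shape)                  # (100, 2)
print(sample_frac.equals(sample_again))   # True
```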

### 6. Checking the missing values

In [14]:
df.isna().sum()

Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

### 7. Adding missing values using loc and iloc

In [15]:
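# Draw 20 random row labels in [0, 10000). randint samples with replacement,
# so duplicates are possible and fewer than 20 distinct rows may be affected.
missing_index=np.random.randint(10000, size=20)
# loc selects by label: rows by index label, columns by name
df.loc[missing_index, ['Balance','Geography']]=np.nan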

In [16]:
df.isna().sum()

Geography          20
Gender              0
Age                 0
Tenure              0
Balance            20
NumOfProducts       0
HasCrCard           0
IsActiveMember      0
EstimatedSalary     0
Exited              0
dtype: int64

In [17]:
df.iloc[missing_index, -1]=np.nan
df.isna().sum()

Geography          20
Gender              0
Age                 0
Tenure              0
Balance            20
NumOfProducts       0
HasCrCard           0
IsActiveMember      0
EstimatedSalary     0
Exited             20
dtype: int64
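
The cells above rely on the default index, where labels and positions coincide. The sketch below uses a small hypothetical frame (`demo`, with made-up index labels) to show the distinction: `iloc` addresses rows and columns by integer position, while `loc` addresses them by label. It also draws positions without replacement, which is one way to guarantee an exact number of affected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame; the index labels deliberately differ from positions.
demo = pd.DataFrame({"Balance": [10.0, 20.0, 30.0, 40.0, 50.0],
                     "Exited": [0.0, 1.0, 0.0, 1.0, 0.0]},
                    index=[100, 101, 102, 103, 104])

rng = np.random.default_rng(0)
# replace=False guarantees distinct positions, so exactly 2 rows are affected.
positions = rng.choice(len(demo), size=2, replace=False)

# iloc takes integer positions for rows and columns (-1 is the last column).
demo.iloc[positions, -1] = np.nan
# loc takes labels, so the positions are translated to index labels first.
demo.loc[demo.index[positions], "Balance"] = np.nan

print(demo.isna().sum())
```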

### 8. Filling missing values

The `fillna` function is used to fill the missing values. It provides many options. We can use a specific value, an aggregate function (e.g. `mean`), or the previous or next value.
For the geography column, I will use the most common value.

In [20]:
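# value_counts() sorts by frequency, so the first index label is the most
# common value; indexing with [0] would return its count instead.
mode=df.Geography.value_counts().index[0]
mode

'France'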

In [21]:
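# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which is deprecated in recent pandas versions.
df['Geography']=df['Geography'].fillna(value=mode)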

In [22]:
df.isna().sum()

Geography           0
Gender              0
Age                 0
Tenure              0
Balance            20
NumOfProducts       0
HasCrCard           0
IsActiveMember      0
EstimatedSalary     0
Exited             20
dtype: int64

Similarly, for the balance column, I will use the mean of the column to replace missing values.

In [23]:
avg = df['Balance'].mean()
df['Balance']=df['Balance'].fillna(value=avg)

In [24]:
df.isna().sum()

Geography           0
Gender              0
Age                 0
Tenure              0
Balance             0
NumOfProducts       0
HasCrCard           0
IsActiveMember      0
EstimatedSalary     0
Exited             20
dtype: int64
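
Filling with the previous or next value is mentioned above but not shown. A minimal sketch, assuming a hypothetical series `s` with gaps; it uses the `ffill` and `bfill` methods, which recent pandas versions prefer over `fillna(method=...)`.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps, standing in for a column such as Balance.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

filled_forward = s.ffill()    # carry the previous valid value forward
filled_backward = s.bfill()   # pull the next valid value backward

print(filled_forward.tolist())    # [1.0, 1.0, 1.0, 4.0, 4.0]
print(filled_backward.tolist())   # the last entry stays NaN: nothing follows it
```

Forward fill leaves leading gaps untouched and backward fill leaves trailing gaps untouched, so a column can still contain missing values after either call.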

### 9. Dropping missing values
Another way to handle missing values is to drop them. The `Exited` column still has missing values. The following code drops every row that contains at least one missing value.

In [25]:
df.shape

(10000, 10)

In [26]:
df.dropna(axis=0, how="any", inplace=True)

In [27]:
df.shape

(9980, 10)
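
Dropping on `how="any"` is the strictest choice. The sketch below contrasts it with two alternatives that `dropna` also supports, `how="all"` and `subset`, on a hypothetical four-row frame (`demo` and its values are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 is entirely missing,
# rows 2 and 3 each miss one value.
demo = pd.DataFrame({"Balance": [100.0, np.nan, np.nan, 250.0],
                     "Exited": [0.0, np.nan, 1.0, np.nan]})

any_dropped = demo.dropna(how="any")             # keep rows with no missing value
all_dropped = demo.dropna(how="all")             # drop rows where every value is missing
subset_dropped = demo.dropna(subset=["Exited"])  # only look at the Exited column

print(len(any_dropped), len(all_dropped), len(subset_dropped))   # 1 3 2
```

Using `subset` is often the gentler option when only one column, such as the target, must be complete.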