
#### **DataFrame.isin([np.inf, -np.inf])**

#### **np.isfinite(dataframe_name)**

-----------------

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [8]:
import numpy as np
import pandas as pd

### **Check if dataframe contains infinity**

- A data frame can contain **infinity** as a value, for example after a division by zero or a floating-point overflow.

- **np.inf** for **positive infinity**

- **-np.inf** for **negative infinity**
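
Infinity usually enters a data frame through arithmetic rather than by being typed in. The following is a minimal sketch of one such case, using hypothetical toy Series (`numerators`, `denominators`) that are not part of the original notebook: dividing floats by zero in pandas yields `inf` or `-inf`, while `0/0` yields `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every denominator is zero
numerators = pd.Series([5.0, -3.0, 0.0])
denominators = pd.Series([0.0, 0.0, 0.0])

# Float division by zero follows IEEE 754:
# positive/0 -> inf, negative/0 -> -inf, 0/0 -> NaN
ratios = numerators / denominators
print(ratios)

# Count the infinite and the missing results separately
n_inf = int(np.isinf(ratios).sum())
n_nan = int(ratios.isna().sum())
print(n_inf, n_nan)
```

Note that the `0/0` case produces `NaN`, not infinity, so a check for infinity alone would miss it.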

#### **Method1**
- Use **DataFrame.isin([np.inf, -np.inf])** or **np.isinf(dataframe_name)** to check whether the dataframe contains **infinity**. Both return a **boolean DataFrame** of the same shape.

- Each cell is **True** where the value is **infinite** and **False** otherwise.

In [10]:
# Create dataframe using dictionary
data = {'Student ID': [10, 11, 12, 13, 14], 
        'Age': [23, 22, 24, 22, 25],
        'Weight': [66, 72, np.inf, 68, -np.inf]}
  
df = pd.DataFrame(data)
df

| | Student ID | Age | Weight |
|---|---|---|---|
| 0 | 10 | 23 | 66.0 |
| 1 | 11 | 22 | 72.0 |
| 2 | 12 | 24 | inf |
| 3 | 13 | 22 | 68.0 |
| 4 | 14 | 25 | -inf |


In [11]:
# checking for infinity
ds = df.isin([np.inf, -np.inf])
print(ds)

   Student ID    Age  Weight
0       False  False   False
1       False  False   False
2       False  False    True
3       False  False   False
4       False  False    True


In [12]:
# printing the count of infinity values
count = np.isinf(df).values.sum()
print("It contains " + str(count) + " infinite values")

It contains 2 infinite values


In [13]:
# counting infinity in a particular column name
c = np.isinf(df['Weight']).values.sum()
print("It contains " + str(c) + " infinite values")

It contains 2 infinite values


In [14]:
# printing column name where infinity is present
col_name = df.columns.to_series()[np.isinf(df).any()]
print(col_name)

Weight    Weight
dtype: object


In [15]:
# printing row index with infinity
r = df.index[np.isinf(df).any(axis=1)]
print(r)

Int64Index([2, 4], dtype='int64')
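
To inspect the affected rows themselves rather than only their index labels, the same boolean mask can be used to filter the data frame. A minimal sketch, re-creating the example data under the hypothetical name `df_demo` so the block runs on its own:

```python
import numpy as np
import pandas as pd

# Same example data as above, under a hypothetical name
df_demo = pd.DataFrame({'Student ID': [10, 11, 12, 13, 14],
                        'Age': [23, 22, 24, 22, 25],
                        'Weight': [66, 72, np.inf, 68, -np.inf]})

# True for every row that holds at least one infinite value
row_mask = np.isinf(df_demo).any(axis=1)

# Boolean indexing keeps only the flagged rows
rows_with_inf = df_demo[row_mask]
print(rows_with_inf)
```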


--------------------------

#### **Method2**
- Use **np.isfinite(dataframe_name)** to check for the presence of infinite value(s). It returns a boolean DataFrame: **False** for **infinite values** and **True** for **finite values.** Note that **NaN** also counts as non-finite, so it is reported as **False** too.

In [18]:
# Create dataframe using dictionary
data1 = {'Student ID': [10, 11, 12, 13, 14],
         'Age': [23, 22, 24, 22, 25],
         'Weight': [66, 72, np.inf, 68, -np.inf]}
  
df1 = pd.DataFrame(data1)
  
d = np.isfinite(df1)
  
display(d)

| | Student ID | Age | Weight |
|---|---|---|---|
| 0 | True | True | True |
| 1 | True | True | True |
| 2 | True | True | False |
| 3 | True | True | True |
| 4 | True | True | False |
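
Because `np.isfinite` reports `False` for `NaN` as well as for infinity, it cannot tell the two apart on its own. The sketch below, using a hypothetical Series `weights`, compares the three masks side by side:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one finite value, one NaN, and both infinities
weights = pd.Series([66.0, np.nan, np.inf, -np.inf])

finite_mask = np.isfinite(weights)  # False for NaN, inf and -inf
inf_mask = np.isinf(weights)        # True only for inf and -inf
nan_mask = weights.isna()           # True only for NaN

print(pd.DataFrame({'value': weights,
                    'isfinite': finite_mask,
                    'isinf': inf_mask,
                    'isna': nan_mask}))
```

If only infinite values matter, `np.isinf` is the more precise check; `~np.isfinite(...)` is convenient when NaN and infinity should be treated alike.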


### **Remove infinite values**

- https://www.geeksforgeeks.org/remove-infinite-values-from-a-given-pandas-dataframe/

- You can **remove inf values** by **first converting them to NaN and then dropping the NaNs.**

       data = data.replace([np.inf, -np.inf], np.nan).dropna()
       
       df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all")

- Either single line handles both the **conversion and the dropping of the unusable rows.** Do not chain `.dropna()` after `replace(..., inplace=True)`: with `inplace=True`, `replace` returns `None`, so the chained call fails.

In [19]:
# Create a dictionary for the dataframe
# (not named `dict`, which would shadow the Python built-in)
student_data = {'Name': ['Sumit Tyagi', 'Sukritin', 'Akriti Goel',
                         'Sanskriti', 'Abhishek Jain'],
                'Age': [22, 20, np.inf, -np.inf, 22], 
                'Marks': [90, 84, 33, 87, 82]}
  
# Converting Dictionary to Pandas Dataframe
df2 = pd.DataFrame(student_data)
  
# Print Dataframe
df2

| | Name | Age | Marks |
|---|---|---|---|
| 0 | Sumit Tyagi | 22.0 | 90 |
| 1 | Sukritin | 20.0 | 84 |
| 2 | Akriti Goel | inf | 33 |
| 3 | Sanskriti | -inf | 87 |
| 4 | Abhishek Jain | 22.0 | 82 |


In [20]:
# Replacing infinite with nan
df2.replace([np.inf, -np.inf], np.nan, inplace=True)
  
# Dropping all the rows with nan values
df2.dropna(inplace=True)
  
# Printing df2
df2

| | Name | Age | Marks |
|---|---|---|---|
| 0 | Sumit Tyagi | 22.0 | 90 |
| 1 | Sukritin | 20.0 | 84 |
| 4 | Abhishek Jain | 22.0 | 82 |
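
The `dropna(subset=..., how="all")` variant mentioned earlier is not demonstrated in this notebook. Below is a minimal sketch of how it differs from `how="any"`, assuming a hypothetical frame `scores` with columns `col1`, `col2` and `col3`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: inf/NaN scattered over two numeric columns
scores = pd.DataFrame({'col1': [1.0, np.inf, -np.inf, 4.0],
                       'col2': [10.0, 20.0, np.inf, np.nan],
                       'col3': ['a', 'b', 'c', 'd']})

# Step 1: turn every infinite value into NaN (returns a new frame)
cleaned = scores.replace([np.inf, -np.inf], np.nan)

# Step 2a: drop a row only if col1 AND col2 are both NaN
kept_all = cleaned.dropna(subset=['col1', 'col2'], how='all')

# Step 2b: drop a row if col1 OR col2 is NaN
kept_any = cleaned.dropna(subset=['col1', 'col2'], how='any')

print(kept_all.index.tolist())
print(kept_any.index.tolist())
```

With `how="all"`, rows that still carry usable data in one of the listed columns are kept; `how="any"` is stricter and keeps only fully populated rows.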


--------------------------

#### **Medical Appointment No Shows**

In [4]:
train = pd.read_csv("/content/drive/MyDrive/Datasets/Medical_Appointment_No_Shows.zip")
train.head()

| | PatientId | AppointmentID | Gender | ScheduledDay | AppointmentDay | Age | Neighbourhood | Scholarship | Hipertension | Diabetes | Alcoholism | Handcap | SMS_received | No-show |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29872500000000.0 | 5642903 | F | 2016-04-29T18:38:08Z | 2016-04-29T00:00:00Z | 62 | JARDIM DA PENHA | 0 | 1 | 0 | 0 | 0 | 0 | No |
| 1 | 558997800000000.0 | 5642503 | M | 2016-04-29T16:08:27Z | 2016-04-29T00:00:00Z | 56 | JARDIM DA PENHA | 0 | 0 | 0 | 0 | 0 | 0 | No |
| 2 | 4262962000000.0 | 5642549 | F | 2016-04-29T16:19:04Z | 2016-04-29T00:00:00Z | 62 | MATA DA PRAIA | 0 | 0 | 0 | 0 | 0 | 0 | No |
| 3 | 867951200000.0 | 5642828 | F | 2016-04-29T17:29:31Z | 2016-04-29T00:00:00Z | 8 | PONTAL DE CAMBURI | 0 | 0 | 0 | 0 | 0 | 0 | No |
| 4 | 8841186000000.0 | 5642494 | F | 2016-04-29T16:07:23Z | 2016-04-29T00:00:00Z | 56 | JARDIM DA PENHA | 0 | 1 | 1 | 0 | 0 | 0 | No |


##### **Looking for NaN/empty values**

- With a dataset this large, it is impossible to verify the data visually. **NaN/NULL/inf** values are a particular nuisance because they are hard to spot. When I first coded a linear regression predictor, my loss kept rising to infinity without any explanation. It took me **two days of searching to realize that my dataset had infinity values.**

- So whenever you get a dataset, the **first step** should always be to **identify NaN/NULL/inf values**. The example below shows how that can be done.

In [6]:
train[train.isna().any(axis=1)]

| | PatientId | AppointmentID | Gender | ScheduledDay | AppointmentDay | Age | Neighbourhood | Scholarship | Hipertension | Diabetes | Alcoholism | Handcap | SMS_received | No-show |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
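
An empty result like the one above means no row contains a NaN. When a dataset does have missing values, a per-column count is often easier to read than the filtered rows. A minimal sketch on a hypothetical frame `visits` (not the real appointment data):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in two of the columns
visits = pd.DataFrame({'Age': [62, np.nan, 8],
                       'Gender': ['F', 'M', None],
                       'Scholarship': [0, 0, 1]})

# Number of missing values in each column
nan_per_column = visits.isna().sum()
print(nan_per_column)

# Single yes/no answer for the whole frame
has_any_nan = bool(visits.isna().any().any())
print(has_any_nan)
```

`isna()` treats both `np.nan` and `None` as missing, which is why the `Gender` column is counted as well.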


In [16]:
# checking for infinity
ds = train.isin([np.inf, -np.inf])
print(ds)

        PatientId  AppointmentID  Gender  ...  Handcap  SMS_received  No-show
0           False          False   False  ...    False         False    False
1           False          False   False  ...    False         False    False
2           False          False   False  ...    False         False    False
3           False          False   False  ...    False         False    False
4           False          False   False  ...    False         False    False
...           ...            ...     ...  ...      ...           ...      ...
110522      False          False   False  ...    False         False    False
110523      False          False   False  ...    False         False    False
110524      False          False   False  ...    False         False    False
110525      False          False   False  ...    False         False    False
110526      False          False   False  ...    False         False    False

[110527 rows x 14 columns]


In [17]:
# printing the count of infinity values
# ds is already a boolean mask from isin(), so its True cells are summed directly
count = ds.values.sum()
print("It contains " + str(count) + " infinite values")

It contains 0 infinite values
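
On a frame that mixes numeric and text columns, `DataFrame.isin` works on every column, whereas `np.isinf` is defined only for numeric data and is expected to raise a `TypeError` on object (string) columns. One way around this, sketched below with a hypothetical helper `count_inf` and a hypothetical frame `mixed`, is to select the numeric columns first:

```python
import numpy as np
import pandas as pd

def count_inf(frame):
    """Count +/-inf values per numeric column (hypothetical helper).

    Non-numeric columns are skipped because they cannot hold
    a floating-point infinity.
    """
    numeric = frame.select_dtypes(include=[np.number])
    return np.isinf(numeric).sum()

# Hypothetical mixed-dtype data
mixed = pd.DataFrame({'Gender': ['F', 'M', 'F'],
                      'Age': [62, 56, 8],
                      'Score': [1.5, np.inf, -np.inf]})

inf_counts = count_inf(mixed)
print(inf_counts)
```

The result is a Series indexed by column name, so a column with a non-zero count points directly at where the infinite values are.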
