# Handling Missing Values

### Python offers two sentinel values for representing missing data:
    1. NaN - from NumPy, a special IEEE 754 floating-point value
    2. None - the Python singleton object of type NoneType

In [8]:
import numpy as np

print("type of nan is : ",type(np.nan))
print("print type of None is : ",type(None))

type of nan is :  <class 'float'>
print type of None is :  <class 'NoneType'>
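The two sentinels also have to be detected differently. The following is a minimal sketch (the variable names are chosen for illustration only) showing that NaN never compares equal to itself, so it needs a dedicated check, while None is tested by identity.

```python
import math
import numpy as np

# NaN is a float that never compares equal to anything, including itself.
nan_equals_itself = np.nan == np.nan

# So NaN has to be detected with a dedicated function, not with ==.
detected_by_numpy = bool(np.isnan(np.nan))
detected_by_math = math.isnan(np.nan)

# None is a singleton, so an identity check is the idiomatic test.
value = None
none_detected = value is None

print(nan_equals_itself, detected_by_numpy, detected_by_math, none_detected)
```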


### Differences:
    With respect to dtype:
    1. A NumPy array containing NaN gets a floating-point dtype (float64 here)
    2. A NumPy array containing None gets dtype object

In [9]:
arr_with_nan = np.array([1,2,np.nan])  # np.nan is the canonical spelling; the np.NaN alias was removed in NumPy 2.0
arr_with_none = np.array([1,2,None])

print("numpy array having NaN:",arr_with_nan ,"\n type of array : ",type(arr_with_nan),"\n type of elements in array:",arr_with_nan.dtype)
print()
print("numpy array having None:",arr_with_none ,"\n type of array : ",type(arr_with_none),"\n type of elements in array:",arr_with_none.dtype)


numpy array having NaN: [ 1.  2. nan] 
 type of array :  <class 'numpy.ndarray'> 
 type of elements in array: float64

numpy array having None: [1 2 None] 
 type of array :  <class 'numpy.ndarray'> 
 type of elements in array: object
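If the data arrives with None as the gap marker, requesting a float dtype explicitly is one way to avoid the object dtype. This is a small sketch under the assumption that every non-missing element is numeric; the names `raw`, `as_object` and `as_float` are illustrative.

```python
import numpy as np

# Assumption for illustration: the data arrives as a list that uses None for gaps.
raw = [1, 2, None]

as_object = np.array(raw)               # dtype is inferred as object
as_float = np.array(raw, dtype=float)   # None is converted to nan

print(as_object.dtype, as_float.dtype)
print(as_float)
```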


### Difference in computation speed between float arrays and object arrays:
    Aggregations on a float array take far less time than on an object array, because every element of an object array is a full Python object that has to be handled one at a time.
    So, whenever possible, representing missing values with NaN rather than None gives faster computation, especially when aggregations on arrays are involved.

In [10]:
arr_of_float = np.arange(1E6,dtype=float)
arr_of_obj = np.arange(1E6,dtype=object)

print("array of elements with type float : ",arr_of_float,"\n type of elements:",arr_of_float.dtype)
print()
print("array of elements with type object: ",arr_of_obj,"\n type of elements:",arr_of_obj.dtype)


array of elements with type float :  [0.00000e+00 1.00000e+00 2.00000e+00 ... 9.99997e+05 9.99998e+05
 9.99999e+05] 
 type of elements: float64

array of elements with type object:  [0 1 2 ... 999997 999998 999999] 
 type of elements: object


In [11]:
%timeit arr_of_float.sum()
%timeit arr_of_obj.sum()

243 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
22.3 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
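`%timeit` is an IPython magic and is not available in a plain Python script. A rough equivalent with the standard `timeit` module could look like the sketch below; the array size and the number of repetitions are arbitrary choices, and the absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000  # smaller than the 1E6 used above so the sketch runs quickly
arr_float = np.arange(n, dtype=float)
arr_obj = np.arange(n, dtype=object)

# Time 10 calls of sum() on each array.
float_time = timeit.timeit(arr_float.sum, number=10)
obj_time = timeit.timeit(arr_obj.sum, number=10)

print(f"float: {float_time:.5f} s, object: {obj_time:.5f} s for 10 runs each")
```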


In [12]:
print(arr_with_nan.sum(),arr_with_nan.max())
print(1+np.NaN,0*np.NaN)

nan nan
nan nan


Aggregations and arithmetic operations that involve NaN return NaN. When an aggregated value is needed despite the NaN entries, NumPy provides NaN-aware variants (np.nansum, np.nanmean, np.nanmax) that ignore them.

In [13]:
print("sum of array having nan with another method : ",np.nansum(arr_with_nan))
print("mean of array having nan with another method : ",np.nanmean(arr_with_nan))
print("max value of array having nan with another method : ",np.nanmax(arr_with_nan))

sum of array having nan with another method :  3.0
mean of array having nan with another method :  1.5
max value of array having nan with another method :  2.0
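The same results can be obtained by filtering the NaN entries out with a boolean mask before aggregating. This is only an illustrative alternative to the `np.nan*` functions; the variable names are made up for the example.

```python
import numpy as np

arr = np.array([1, 2, np.nan])

# Boolean mask that is True where the value is not NaN.
valid = ~np.isnan(arr)

masked_sum = arr[valid].sum()
masked_mean = arr[valid].mean()
masked_max = arr[valid].max()

print(masked_sum, masked_mean, masked_max)
```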


The same arithmetic and aggregation operations on None raise a TypeError, because they are not defined for NoneType operands.

In [14]:
# each of the following lines raises a TypeError; execution stops at the first one
print(1+None, 0*None)
print(arr_with_none.sum())

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
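Because the first failing statement stops the cell, only one error message is visible above. A sketch that wraps each operation in `try`/`except` shows that all three fail in the same way; the `errors` list is just a helper for this example.

```python
import numpy as np

arr_with_none = np.array([1, 2, None])

errors = []
for label, operation in [
    ("1 + None", lambda: 1 + None),
    ("0 * None", lambda: 0 * None),
    ("arr_with_none.sum()", lambda: arr_with_none.sum()),
]:
    try:
        operation()
    except TypeError as exc:
        # Record which operation failed and show the message.
        errors.append(label)
        print(f"{label} -> TypeError: {exc}")
```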

## NaN and None in pandas:
    Both NaN and None can be used in pandas. pandas treats them as interchangeable markers for missing data and converts between them where appropriate.

In [15]:
import pandas as pd

print("pandas series with only NaN :\n",pd.Series([1,2,np.NaN]))
print()
print("pandas series with only None:\n",pd.Series([1,2,None]))
print()
print("pandas series having both NaN and None:\n",pd.Series([1,2,np.NaN,None]))

pandas series with only NaN :
 0    1.0
1    2.0
2    NaN
dtype: float64

pandas series with only None:
 0    1.0
1    2.0
2    NaN
dtype: float64

pandas series having both NaN and None:
 0    1.0
1    2.0
2    NaN
3    NaN
dtype: float64


None was converted to NaN, and all elements of the series became floats.

In [16]:
# creating a sample pandas series with int elements
pd_series = pd.Series([1,2],dtype=int)
print("a sample pandas series : \n", pd_series)

a sample pandas series : 
 0    1
1    2
dtype: int64


In [17]:
#assigning a NaN value to one of the elements in series
pd_series[0] = np.NaN
print("sample pandas series after assigning NaN value:\n", pd_series)

sample pandas series after assigning NaN value:
 0    NaN
1    2.0
dtype: float64


The dtype of the pandas series changed from int to float because a NaN value was assigned.

In [18]:
#assigning a NoneType to pandas series with int elements
pd_series = pd.Series([1,2],dtype=int)
pd_series[0] = None
print("sample pandas series after assigning NoneType object:\n", pd_series)

sample pandas series after assigning NoneType object:
 0    NaN
1    2.0
dtype: float64


Assigning None also changed the dtype of the series from int to float, not to object as it would in a NumPy array. pandas cast the integer values to floating point and converted None to NaN.
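When the integers need to stay integers, pandas also offers a nullable integer extension dtype, requested with the string "Int64" (capital I). The sketch below is an aside to the default behaviour described above and assumes a pandas version that ships this dtype.

```python
import pandas as pd

# "Int64" (capital I) is the nullable integer extension dtype.
nullable_ints = pd.Series([1, 2, None], dtype="Int64")

print(nullable_ints)
print(nullable_ints.isna().tolist())
print(nullable_ints.sum())  # missing entries are skipped by default
```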

In [19]:
# similarly for a pandas series with boolean values with NaN assignment
pd_series_with_boolean_val = pd.Series([True,False,True,False])
print("pandas series with boolean values : \n",pd_series_with_boolean_val)
pd_series_with_boolean_val[0] = np.NaN
print("pandas series after assigning NaN to an element : \n",pd_series_with_boolean_val)

pandas series with boolean values : 
 0     True
1    False
2     True
3    False
dtype: bool
pandas series after assigning NaN to an element : 
 0    NaN
1    0.0
2    1.0
3    0.0
dtype: float64


The boolean series was converted to a float series by the NaN assignment: True became 1.0, False became 0.0, and the element assigned NaN stayed NaN.

In [20]:
# similarly for a pandas series with boolean values with None assignment
pd_series_with_boolean_val = pd.Series([True,False,True,False])
print("pandas series with boolean values : \n",pd_series_with_boolean_val)
pd_series_with_boolean_val[0] = None
print("pandas series after assigning None to an element : \n",pd_series_with_boolean_val)

pandas series with boolean values : 
 0     True
1    False
2     True
3    False
dtype: bool
pandas series after assigning None to an element : 
 0    False
1    False
2     True
3    False
dtype: bool


In the pandas version used here, the boolean series kept its bool dtype after the None assignment, and the element assigned None was converted to False.
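Silently turning a missing value into False can hide gaps in the data. As a hedged aside, the nullable "boolean" extension dtype keeps missing entries distinct from True and False; the sketch assumes a pandas version that provides this dtype and `pd.NA`.

```python
import pandas as pd

# "boolean" is the nullable boolean extension dtype.
nullable_bools = pd.Series([True, False, None], dtype="boolean")
print(nullable_bools)

# Assigning a missing value keeps the dtype and marks the element as missing.
nullable_bools.iloc[1] = pd.NA
print(nullable_bools.isna().tolist())
```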

In [21]:
# similarly for a pandas series of mixed values (dtype object) with NaN assignment
pd_series_with_strings = pd.Series([1,2,True,"test1","test2"])
print("pandas series having a string element : \n",pd_series_with_strings)
pd_series_with_strings[0] = np.NaN
print("pandas series after assigning NaN to an element : \n",pd_series_with_strings)

pandas series having a string element : 
 0        1
1        2
2     True
3    test1
4    test2
dtype: object
pandas series after assigning NaN to an element : 
 0      NaN
1        2
2     True
3    test1
4    test2
dtype: object


In [22]:
pd_series_with_strings = pd.Series([1,2,True,"test1","test2"])
print("pandas series having a string element : \n",pd_series_with_strings)
pd_series_with_strings[0] = None
print("pandas series after assigning None to an element : \n",pd_series_with_strings)

pandas series having a string element : 
 0        1
1        2
2     True
3    test1
4    test2
dtype: object
pandas series after assigning None to an element : 
 0     None
1        2
2     True
3    test1
4    test2
dtype: object


## Operating on Null/Missing values

## Detecting Null Values
    The isnull() method detects missing values, and the notnull() method detects non-null values.

In [23]:
pd_series_with_missing_values = pd.Series([1,2,3,np.NaN,4,np.NaN,None,"test1"])
print("pandas series is : \n",pd_series_with_missing_values)
print()
print("output of isnull method :\n ",pd_series_with_missing_values.isnull())

pandas series is : 
 0        1
1        2
2        3
3      NaN
4        4
5      NaN
6     None
7    test1
dtype: object

output of isnull method :
  0    False
1    False
2    False
3     True
4    False
5     True
6     True
7    False
dtype: bool


pandas isnull() returns a boolean mask indicating missing values

In [24]:
print("output of notnull method : \n",pd_series_with_missing_values.notnull())

output of notnull method : 
 0     True
1     True
2     True
3    False
4     True
5    False
6    False
7     True
dtype: bool


pandas notnull() returns a boolean mask indicating non-missing values, the opposite of isnull().

The advantage of a boolean mask is that it can be used to index a pandas series or dataframe.

In [25]:
print(pd_series_with_missing_values[pd_series_with_missing_values.notnull()])

0        1
1        2
2        3
4        4
7    test1
dtype: object


This gives the subset of the series whose values are not null, selected with the boolean mask obtained from notnull().

In [26]:
print(pd_series_with_missing_values[pd_series_with_missing_values.isnull()])

3     NaN
5     NaN
6    None
dtype: object


This gives the subset of the series whose values are null, selected with the boolean mask obtained from isnull().
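Since True counts as 1, summing a boolean mask gives the number of missing or present values. A short sketch using `isna()` and `notna()`, which are aliases of `isnull()` and `notnull()`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 4, np.nan, None, "test1"])

missing_count = int(s.isna().sum())     # isna() is an alias of isnull()
present_count = int(s.notna().sum())    # notna() is an alias of notnull()
missing_positions = s.index[s.isna()].tolist()

print(missing_count, present_count, missing_positions)
```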

## Dropping Null Values

In [27]:
print("pandas series after dropping the null values : \n",pd_series_with_missing_values.dropna())

pandas series after dropping the null values : 
 0        1
1        2
2        3
4        4
7    test1
dtype: object


#### dropna() offers more options on a pandas dataframe:
    1. we can drop rows that contain at least one null value, or only rows that are entirely null.
    2. we can drop columns that contain at least one null value, or only columns that are entirely null.

In [28]:
#let's create a sample pandas dataframe
pd_df_with_missing_values = pd.DataFrame([[1,2,np.NaN],[4,np.NaN,5],[4,6,8],[1,5,np.NaN],[np.NaN,np.NaN]])
print("pandas dataframe with missing values:\n",pd_df_with_missing_values)

pandas dataframe with missing values:
      0    1    2
0  1.0  2.0  NaN
1  4.0  NaN  5.0
2  4.0  6.0  8.0
3  1.0  5.0  NaN
4  NaN  NaN  NaN


In [29]:
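# DataFrame.append was removed in pandas 2.0, so pd.concat is used to add a row.
# the result is not assigned, so pd_df_with_missing_values itself is unchanged
pd.concat([pd_df_with_missing_values, pd.DataFrame([[1,2,3]])])

     0    1    2
0  1.0  2.0  NaN
1  4.0  NaN  5.0
2  4.0  6.0  8.0
3  1.0  5.0  NaN
4  NaN  NaN  NaN
0  1.0  2.0  3.0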


In [30]:
# dropping rows having at least one null value

# print the sample pandas dataframe created above
print("pandas dataframe with missing values:\n",pd_df_with_missing_values)
print()

print("pandas dataframe after dropping rows having null values:")
print(pd_df_with_missing_values.dropna())

print()

print("pandas dataframe after dropping rows having null values using axis parameter set to 0:")
print(pd_df_with_missing_values.dropna(axis=0))


print()

print("pandas dataframe after dropping rows having null values using axis parameter set to rows:")
print(pd_df_with_missing_values.dropna(axis='rows'))



pandas dataframe with missing values:
      0    1    2
0  1.0  2.0  NaN
1  4.0  NaN  5.0
2  4.0  6.0  8.0
3  1.0  5.0  NaN
4  NaN  NaN  NaN

pandas dataframe after dropping rows having null values:
     0    1    2
2  4.0  6.0  8.0

pandas dataframe after dropping rows having null values using axis parameter set to 0:
     0    1    2
2  4.0  6.0  8.0

pandas dataframe after dropping rows having null values using axis parameter set to rows:
     0    1    2
2  4.0  6.0  8.0


In [31]:
# dropping columns having at least one null value


print("pandas dataframe with missing values:\n",pd_df_with_missing_values)
print()

print("pandas dataframe after dropping columns having null values:")
print(pd_df_with_missing_values.dropna(axis=1))


print()

print("pandas dataframe after dropping columns having null values:")
print(pd_df_with_missing_values.dropna(axis='columns'))



pandas dataframe with missing values:
      0    1    2
0  1.0  2.0  NaN
1  4.0  NaN  5.0
2  4.0  6.0  8.0
3  1.0  5.0  NaN
4  NaN  NaN  NaN

pandas dataframe after dropping columns having null values:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]

pandas dataframe after dropping columns having null values:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]


In [32]:
# since every column has at least one null value, let's add a column without nulls to test


pd_df_with_missing_values = pd.concat([pd_df_with_missing_values,pd.DataFrame([1,2,3,4,5])],axis = 1,ignore_index=True)
print("pandas dataframe with missing values : \n",pd_df_with_missing_values)
print()

print("pandas dataframe after dropping columns having null values:")
print(pd_df_with_missing_values.dropna(axis=1))


print()

print("pandas dataframe after dropping columns having null values:")
print(pd_df_with_missing_values.dropna(axis='columns'))


pandas dataframe with missing values : 
      0    1    2  3
0  1.0  2.0  NaN  1
1  4.0  NaN  5.0  2
2  4.0  6.0  8.0  3
3  1.0  5.0  NaN  4
4  NaN  NaN  NaN  5

pandas dataframe after dropping columns having null values:
   3
0  1
1  2
2  3
3  4
4  5

pandas dataframe after dropping columns having null values:
   3
0  1
1  2
2  3
3  4
4  5


In [33]:
## using how parameter to drop missing data
print("pandas dataframe with missing values:\n" ,pd_df_with_missing_values)
print()
print("after dropping rows which contains a null value:\n",pd_df_with_missing_values.dropna(axis=0,how='any'))
print()
print("after dropping rows which contains only null values:\n",pd_df_with_missing_values.dropna(axis=0,how='all'))
print()

pandas dataframe with missing values:
      0    1    2  3
0  1.0  2.0  NaN  1
1  4.0  NaN  5.0  2
2  4.0  6.0  8.0  3
3  1.0  5.0  NaN  4
4  NaN  NaN  NaN  5

after dropping rows which contains a null value:
      0    1    2  3
2  4.0  6.0  8.0  3

after dropping rows which contains only null values:
      0    1    2  3
0  1.0  2.0  NaN  1
1  4.0  NaN  5.0  2
2  4.0  6.0  8.0  3
3  1.0  5.0  NaN  4
4  NaN  NaN  NaN  5



In [34]:
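# adding a row with all NaN values (pd.concat is used because DataFrame.append was removed in pandas 2.0)
# note: the new row has 5 values, so an extra column 4 appears, and reset_index() keeps the old index
# as a regular column named 'index', so the new row is not entirely null and how='all' does not drop it
pd_df_with_missing_values = pd.concat([pd_df_with_missing_values, pd.DataFrame([[np.nan]*5])]).reset_index()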

print("pandas dataframe with missing values:\n" ,pd_df_with_missing_values)
print()
print("after dropping rows which contains a null value:\n",pd_df_with_missing_values.dropna(axis=0,how='any'))
print()
print("after dropping rows which contains only null values:\n",pd_df_with_missing_values.dropna(axis=0,how='all'))
print()

pandas dataframe with missing values:
    index    0    1    2    3   4
0      0  1.0  2.0  NaN  1.0 NaN
1      1  4.0  NaN  5.0  2.0 NaN
2      2  4.0  6.0  8.0  3.0 NaN
3      3  1.0  5.0  NaN  4.0 NaN
4      4  NaN  NaN  NaN  5.0 NaN
5      0  NaN  NaN  NaN  NaN NaN

after dropping rows which contains a null value:
 Empty DataFrame
Columns: [index, 0, 1, 2, 3, 4]
Index: []

after dropping rows which contains only null values:
    index    0    1    2    3   4
0      0  1.0  2.0  NaN  1.0 NaN
1      1  4.0  NaN  5.0  2.0 NaN
2      2  4.0  6.0  8.0  3.0 NaN
3      3  1.0  5.0  NaN  4.0 NaN
4      4  NaN  NaN  NaN  5.0 NaN
5      0  NaN  NaN  NaN  NaN NaN



Similarly, the same operations can be applied to columns by setting the axis parameter to 1 or 'columns'.
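To see `how='all'` actually remove something, the frame needs a row or column that is null in every position. The sketch below uses a small made-up dataframe (the column names `a`, `b`, `c` are assumptions for this example) and applies the parameter along both axes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [np.nan, np.nan, np.nan],
})

rows_all = df.dropna(axis=0, how="all")          # drops only row 2, which is entirely null
cols_all = df.dropna(axis="columns", how="all")  # drops only column "c"
cols_any = df.dropna(axis="columns", how="any")  # every column has a null, so none survive

print(rows_all)
print(cols_all)
print(cols_any)
```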

In [35]:
# using the thresh parameter to specify the minimum number of non-null values a row needs in order to be kept

print("pandas dataframe with missing values:\n" ,pd_df_with_missing_values)
print()

print("after dropping rows not having atleast no.of non null values specified by thresh param: \n",pd_df_with_missing_values.dropna(thresh=3))

pandas dataframe with missing values:
    index    0    1    2    3   4
0      0  1.0  2.0  NaN  1.0 NaN
1      1  4.0  NaN  5.0  2.0 NaN
2      2  4.0  6.0  8.0  3.0 NaN
3      3  1.0  5.0  NaN  4.0 NaN
4      4  NaN  NaN  NaN  5.0 NaN
5      0  NaN  NaN  NaN  NaN NaN

after dropping rows not having atleast no.of non null values specified by thresh param: 
    index    0    1    2    3   4
0      0  1.0  2.0  NaN  1.0 NaN
1      1  4.0  NaN  5.0  2.0 NaN
2      2  4.0  6.0  8.0  3.0 NaN
3      3  1.0  5.0  NaN  4.0 NaN
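One way to reason about `thresh` is to count the non-null values per row and keep the rows that reach the threshold. The following sketch, on a made-up dataframe, checks that this manual filter agrees with `dropna(thresh=3)`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1, 2, np.nan, 1],
    [np.nan, np.nan, np.nan, 5],
    [4, 6, 8, 3],
])

# Number of non-null values in each row.
non_null_per_row = df.notna().sum(axis=1)

kept_by_mask = df[non_null_per_row >= 3]
kept_by_thresh = df.dropna(thresh=3)

print(non_null_per_row.tolist())
print(kept_by_thresh)
```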


In [36]:
# using the info method on a pandas dataframe to check the number of non-null values in each column
pd_df_with_missing_values.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   index   6 non-null      int64  
 1   0       4 non-null      float64
 2   1       3 non-null      float64
 3   2       2 non-null      float64
 4   3       5 non-null      float64
 5   4       0 non-null      float64
dtypes: float64(5), int64(1)
memory usage: 416.0 bytes
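`info()` reports non-null counts. When the number or share of missing values per column is needed directly, summing or averaging the `isna()` mask is a common approach. A sketch on a fresh copy of the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1, 2, np.nan],
    [4, np.nan, 5],
    [4, 6, 8],
    [1, 5, np.nan],
    [np.nan, np.nan, np.nan],
])

missing_per_column = df.isna().sum()    # count of nulls in each column
missing_fraction = df.isna().mean()     # share of nulls in each column

print(missing_per_column)
print(missing_fraction)
```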


## Filling missing/null values

In [39]:
pd_series = pd.Series([1,2,3,4,np.NaN,None])
print("pandas series with missing values is : \n",pd_series)

pandas series with missing values is : 
 0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
5    NaN
dtype: float64


In [40]:
#lets fill the missing values with 0.
pd_series.fillna(0)

0    1.0
1    2.0
2    3.0
3    4.0
4    0.0
5    0.0
dtype: float64

In [78]:
# let's fill the missing values with some other value.
pd_series.fillna(value=3)

0    1.0
1    2.0
2    3.0
3    4.0
4    3.0
5    3.0
dtype: float64

In [49]:
# filling using forward fill. This is useful when handling time series data
pd_series.fillna(method="ffill")

0    1.0
1    2.0
2    3.0
3    4.0
4    4.0
5    4.0
dtype: float64

In [51]:
# filling using backward fill. As there are no elements after the NaN values in the series, they won't get filled.
pd_series.fillna(method="bfill")

0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
5    NaN
dtype: float64
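Newer pandas releases deprecate the `method` argument of `fillna` in favour of the dedicated `ffill()` and `bfill()` methods. The sketch below repeats the two fills with those methods and also shows the `limit` parameter, which caps how many consecutive gaps are filled.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, np.nan, None])

forward = s.ffill()          # carry the last valid value forward
backward = s.bfill()         # nothing follows the gaps, so they stay NaN
limited = s.ffill(limit=1)   # fill at most one consecutive gap

print(forward.tolist())
print(backward.tolist())
print(limited.tolist())
```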

In [81]:
# trying to fill values using both value and method will raise a ValueError
pd_series.fillna(value=3,method='bfill')

ValueError: Cannot specify both 'value' and 'method'.

In [82]:
# let's add another element to the series (pd.concat is used because Series.append was removed in pandas 2.0)
pd_series = pd.concat([pd_series, pd.Series([5])], ignore_index=True)

In [83]:
# backward fill once again. This time the missing values get filled.
pd_series.fillna(method="bfill")

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    5.0
6    5.0
dtype: float64

In [84]:
df = pd.DataFrame([[1,2,np.NaN],[2,3,np.NaN],[5,np.NaN,3]])
print('pandas dataframe is:\n',df)

pandas dataframe is:
    0    1    2
0  1  2.0  NaN
1  2  3.0  NaN
2  5  NaN  3.0


In [85]:
#fill all missing values with 0
df.fillna(0)

   0    1    2
0  1  2.0  0.0
1  2  3.0  0.0
2  5  0.0  3.0


In [86]:
#fill all nulls with some value
df.fillna(value=3)

   0    1    2
0  1  2.0  3.0
1  2  3.0  3.0
2  5  3.0  3.0


In [88]:
# fill missing values using forward fill
df.fillna(method='ffill')

   0    1    2
0  1  2.0  NaN
1  2  3.0  NaN
2  5  3.0  3.0


In [90]:
#fill missing values using backward fill
df.fillna(method='bfill')

   0    1    2
0  1  2.0  3.0
1  2  3.0  3.0
2  5  NaN  3.0


In [92]:
#fill missing values using both value and method parameters. It will throw a ValueError
df.fillna(value=2,method='ffill')

ValueError: Cannot specify both 'value' and 'method'.

In [100]:
# fill all NA values with a specific value with axis=0. This works the same as filling with a value without specifying the axis param.
df.fillna(value=2,axis=0)

   0    1    2
0  1  2.0  2.0
1  2  3.0  2.0
2  5  2.0  3.0


In [99]:
# fill value with axis=1 (along columns). With a scalar value the result does not depend on the axis
df.fillna(value=3,axis=1)

   0    1    2
0  1  2.0  3.0
1  2  3.0  3.0
2  5  3.0  3.0


In [103]:
# fill using method and axis. forward filling with axis=0 carries the last valid value down each column
df.fillna(method='ffill',axis=0)

   0    1    2
0  1  2.0  NaN
1  2  3.0  NaN
2  5  3.0  3.0


In [104]:
# fill using method and axis. forward filling with axis=1 carries the last valid value across each row
df.fillna(method='ffill',axis=1)

     0    1    2
0  1.0  2.0  2.0
1  2.0  3.0  3.0
2  5.0  5.0  3.0


In [105]:
# fill using method and axis. backward filling with axis=0 pulls the next valid value up each column
df.fillna(method='bfill',axis=0)

   0    1    2
0  1  2.0  3.0
1  2  3.0  3.0
2  5  NaN  3.0


In [106]:
# fill using method and axis. backward filling with axis=1 pulls the next valid value back across each row
df.fillna(method='bfill',axis=1)

     0    1    2
0  1.0  2.0  NaN
1  2.0  3.0  NaN
2  5.0  3.0  3.0
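The four directional fills can also be written with the dedicated `ffill()` and `bfill()` methods, which take the same `axis` argument. The sketch below uses an all-float copy of the sample dataframe so that the results are easy to compare; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, np.nan], [2.0, 3.0, np.nan], [5.0, np.nan, 3.0]])

down = df.ffill(axis=0)         # last valid value carried down each column
across = df.ffill(axis=1)       # last valid value carried across each row
up = df.bfill(axis=0)           # next valid value pulled up each column
back_across = df.bfill(axis=1)  # next valid value pulled back across each row

print(down, across, up, back_across, sep="\n\n")
```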


In [152]:
# let's try to fill with the mean. First let's calculate the means
print('pandas dataframe is:\n',df)
print()
print('mean along rows:\n',df.mean())
print()
print('mean along rows:\n',df.mean(axis='rows')) # the mean is taken down the rows, giving one value per column, not one per row.
print()
print('mean along columns:\n',df.mean(axis='columns')) # the mean is taken across the columns, giving one value per row, not one per column.


pandas dataframe is:
    0    1    2
0  1  2.0  NaN
1  2  3.0  NaN
2  5  NaN  3.0

mean along rows:
 0    2.666667
1    2.500000
2    3.000000
dtype: float64

mean along rows:
 0    2.666667
1    2.500000
2    3.000000
dtype: float64

mean along columns:
 0    1.5
1    2.5
2    4.0
dtype: float64


In [153]:
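# a dict maps column labels to fill values: nulls in column 0 get 2, in column 1 get 3, in column 2 get 4
# (key 3 matches no column, so it has no effect)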

print('pandas dataframe is:\n',df)
print()
df.fillna(value = {0:2,1:3,2:4,3:5})

pandas dataframe is:
    0    1    2
0  1  2.0  NaN
1  2  3.0  NaN
2  5  NaN  3.0



   0    1    2
0  1  2.0  4.0
1  2  3.0  4.0
2  5  3.0  3.0


In [154]:
#lets check what type the mean method returns
type(df.mean())

pandas.core.series.Series

In [155]:
# let's fill null values with a series. pandas accepts a dictionary or a pandas series for the value parameter.
# it always fills column by column, matching the index of the series to the column labels

print('pandas dataframe is:\n',df)
print()
df.fillna(value = pd.Series([1,2,3]))

pandas dataframe is:
    0    1    2
0  1  2.0  NaN
1  2  3.0  NaN
2  5  NaN  3.0



   0    1    2
0  1  2.0  3.0
1  2  3.0  3.0
2  5  2.0  3.0


In [156]:
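# df.mean(axis='columns') holds the row means, but fillna matches the index of the series to the column labels,
# so nulls in column 1 get 2.5 and nulls in column 2 get 4.0. This does not fill each row with its own mean.
df.fillna(value = df.mean(axis='columns'),axis=0)

   0    1    2
0  1  2.0  4.0
1  2  3.0  4.0
2  5  2.5  3.0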


In [157]:
# trying to fill along axis 1 with a series raises a NotImplementedError, depending on the version of pandas
df.fillna(value = df.mean(axis='columns'),axis=1)

NotImplementedError: Currently only can fill with dict/Series column by column

In [177]:
# so, what to do if you need to fill missing values row by row with a series?
# Do it with the apply method.
print('pandas dataframe is :\n ',df)
print()
df.apply(lambda x: print(x),axis=1) # axis=1 passes each row to the function; axis=0 would pass each column

pandas dataframe is :
     0    1    2
0  1  2.0  3.0
1  2  3.0  3.0
2  5  2.5  3.0

0    1.0
1    2.0
2    3.0
Name: 0, dtype: float64
0    2.0
1    3.0
2    3.0
Name: 1, dtype: float64
0    5.0
1    2.5
2    3.0
Name: 2, dtype: float64


0    None
1    None
2    None
dtype: object

In [176]:
# so if we want to fill the null values in each row with the respective row mean, we can use apply with a lambda and the axis param
df.apply(lambda x: x.fillna(x.mean()),axis=1)

     0    1    2
0  1.0  2.0  3.0
1  2.0  3.0  3.0
2  5.0  2.5  3.0
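To see the row-wise fill take effect, the dataframe must still contain nulls when `apply` runs. The sketch below starts from a fresh all-float copy of the sample data and compares the `apply` approach with a transpose-based alternative, which relies on `fillna` filling column by column; both are illustrations rather than the only way to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, np.nan], [2.0, 3.0, np.nan], [5.0, np.nan, 3.0]])

# Row by row: each row is passed to the lambda and filled with its own mean.
via_apply = df.apply(lambda row: row.fillna(row.mean()), axis=1)

# Alternative: transpose so rows become columns, fill column by column, transpose back.
row_means = df.mean(axis=1)
via_transpose = df.T.fillna(row_means).T

print(via_apply)
```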


In [181]:
# filling in a specific column using its mean
df = pd.DataFrame([[1,2,np.NaN],[2,3,np.NaN],[5,np.NaN,3]])
print('pandas dataframe is : \n',df)

df[1] = df[1].fillna(value=df[1].mean())
print("pandas df after filling column 1 with its mean:\n",df)

pandas dataframe is : 
    0    1    2
0  1  2.0  NaN
1  2  3.0  NaN
2  5  NaN  3.0
pandas df after filling column 1 with its mean:
    0    1    2
0  1  2.0  NaN
1  2  3.0  NaN
2  5  2.5  3.0
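Filling every column with its own mean does not need a loop over the columns: since `df.mean()` returns a series indexed by column label, it can be passed straight to `fillna`. A minimal sketch on the same sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [2, 3, np.nan], [5, np.nan, 3]])

# df.mean() is indexed by column label, so each column is filled with its own mean.
filled = df.fillna(df.mean())

print(filled)
```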


In [199]:
# filling in a specific row using its mean
df = pd.DataFrame([[1,2,np.nan],[2,3,np.nan],[5,np.nan,3]])
df.iloc[1] = df.iloc[1].fillna(value=df.iloc[1].mean())
print("pandas df after filling row 1 with its mean:\n",df)

pandas df after filling row 1 with its mean:
      0    1    2
0  1.0  2.0  NaN
1  2.0  3.0  2.5
2  5.0  NaN  3.0


In [207]:
# filling a specific element in a pandas dataframe is done by simply assigning to it
print("pandas dataframe is : \n",df)
print('\n')
df.iloc[2,1] = 2

print("pandas dataframe after filling a specific element:\n",df)

pandas dataframe is : 
      0    1    2
0  1.0  2.0  NaN
1  2.0  3.0  2.5
2  5.0  NaN  3.0


pandas dataframe after filling a specific element:
      0    1    2
0  1.0  2.0  NaN
1  2.0  3.0  2.5
2  5.0  2.0  3.0
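Besides positional assignment with `iloc`, a single element can be set by label with `.at`, and several elements can be set at once by combining `.loc` with a boolean mask. The sketch below uses a fresh copy of the dataframe; the fill values 2.0 and 0.0 are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, np.nan], [2.0, 3.0, 2.5], [5.0, np.nan, 3.0]])

# Label-based scalar assignment: row label 2, column label 1.
df.at[2, 1] = 2.0

# Boolean mask: fill only the rows where column 2 is null.
df.loc[df[2].isna(), 2] = 0.0

print(df)
```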
