Null values

A null value occurs when no data is provided for an item. Any column may contain missing entries, which Pandas usually represents as NaN. Pandas provides several functions for detecting, removing, and replacing null values in a DataFrame:

- isnull(): Returns a Boolean object of the same shape as the input, with True wherever a value is null.

- notnull(): The opposite of isnull(); it returns True wherever a value is not null.

- dropna(): Drops the rows or columns that contain null values.

- fillna(): Replaces the NaN values with a value you specify.

- replace(): Replaces given values with other values; the values to replace can be specified as a string, regex, list, dictionary, Series, etc.

- interpolate(): Fills null values in a DataFrame or Series with interpolated values (linear interpolation by default).
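
The functions above can be sketched on a small frame. The data below is hypothetical, and the "?" placeholder is an assumption made only to show replace():

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a few missing entries.
df = pd.DataFrame({
    "age": [29.0, np.nan, 2.0, 30.0],
    "fare": [211.0, 151.0, np.nan, 151.0],
    "cabin": ["B5", None, "C22", "?"],
})

missing_mask = df.isnull()              # True where a value is missing
present_mask = df.notnull()             # element-wise opposite of isnull()
missing_per_column = df.isnull().sum()  # number of missing values per column

complete_rows = df.dropna()             # keep only rows with no missing values
filled = df.fillna({"age": 0.0, "fare": 0.0, "cabin": "unknown"})
cleaned = df.replace("?", "unknown")    # swap a placeholder for another value
interpolated_age = df["age"].interpolate()  # linear interpolation by default
```

Note that isnull() and notnull() return masks rather than a single True/False, so they are usually combined with sum(), any(), or Boolean indexing.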

String operations

Pandas provides a set of string functions that operate on string data and skip missing/NaN values. They are accessed through the .str accessor of a Series or Index:

- lower(): It converts any strings of the series or index into lowercase letters.

- upper(): It converts any string of the series or index into uppercase letters.

- strip(): Strips leading and trailing whitespace, including newlines, from each string in the Series/Index.

- split(' '): Splits each string on the given pattern.

- cat(sep=' '): Concatenates the Series/Index elements with the given separator.

- contains(pattern): Returns True for each element that contains the pattern, else False.

- replace(a,b): Replaces occurrences of the pattern a with b in each element.

- repeat(value): Repeats each element the specified number of times.

- count(pattern): Returns the number of occurrences of the pattern in each element.

- startswith(pattern): Returns True for each element that starts with the pattern, else False.

- endswith(pattern): Returns True for each element that ends with the pattern, else False.

- find(pattern): Returns the lowest index at which the pattern occurs in each element, or -1 if it does not occur.

- findall(pattern): Returns a list of all occurrences of the pattern in each element.

- swapcase(): Swaps the case of each character (lowercase to uppercase and vice versa).

- islower(): Returns True for each string in the Series/Index whose characters are all lowercase. Otherwise, it returns False.

- isupper(): Returns True for each string in the Series/Index whose characters are all uppercase. Otherwise, it returns False.

- isnumeric(): Returns True for each string in the Series/Index whose characters are all numeric. Otherwise, it returns False.
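
A minimal sketch of a few of these functions, using hypothetical names. Every call works element by element, and the NaN entry passes through as missing:

```python
import numpy as np
import pandas as pd

# Hypothetical names; the NaN entry shows that missing values pass through.
s = pd.Series(["  Allen ", "allison", np.nan, "ANDREWS 12"])

lowered = s.str.lower()
stripped = s.str.strip()
pieces = s.str.strip().str.split(" ")
has_all = s.str.contains("all", case=False)  # element-wise; NaN stays missing
starts_a = s.str.strip().str.startswith("A")
l_count = s.str.count("l")
first_l = s.str.find("l")                    # -1 where the substring is absent
digits = s.str.findall(r"\d")
joined = s.str.strip().str.cat(sep=",")      # missing entries are skipped
```

Because each result is itself a Series, calls can be chained, as in s.str.strip().str.split(" ").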

Count Values

This operation counts how many times each unique value occurs, using the value_counts() method.
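
As a small illustration, assuming a hypothetical Series of passenger classes:

```python
import pandas as pd

# Hypothetical passenger classes.
pclass = pd.Series([1, 3, 3, 2, 3, 1])

counts = pclass.value_counts()                # sorted by frequency, descending
shares = pclass.value_counts(normalize=True)  # relative frequencies instead of counts
```

Here counts is indexed by the unique values, so counts.loc[3] gives the number of times 3 occurs.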

Plots

Pandas draws plots using the matplotlib library. The .plot() method plots the data in a Series or DataFrame.

- By default, .plot() draws each numeric column against the index.

You can also pass arguments to plot() to draw a specific column.
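
A minimal sketch of both cases, on hypothetical numeric data. The "Agg" backend is an assumption that lets the example run without a display; in a notebook it can be left out:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, without a display
import numpy as np
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    columns=["num1", "num2", "num3"],
)

ax_all = df.plot()                      # one line per column, index on the x-axis
ax_one = df.plot(y="num2", kind="bar")  # a single column drawn as bars
```

Both calls return a matplotlib Axes object, which can be customized further with the usual matplotlib methods.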

In [1]:
import numpy as np
import pandas as pd

In [3]:
info = pd.DataFrame(np.arange(12).reshape(4, 3), index=[['a', 'a', 'b', 'b'], ['one', 'two', 'three', 'four']],
                    columns=[['num1', 'num2', 'num3'], ['x', 'y', 'x']])
info

,,num1,num2,num3
,,x,y,x
a,one,0,1,2
a,two,3,4,5
b,three,6,7,8
b,four,9,10,11


In [4]:
info.columns

MultiIndex([('num1', 'x'),
            ('num2', 'y'),
            ('num3', 'x')],
           )

In [10]:
info.swaplevel()


,,num1,num2,num3
,,x,y,x
one,a,0,1,2
two,a,3,4,5
three,b,6,7,8
four,b,9,10,11
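
The swap shown above can be reproduced in a self-contained sketch. By default swaplevel() exchanges the two innermost levels of the row index; passing axis=1 applies the same swap to the column levels. The row lookup at the end is an added illustration, not part of the original cells:

```python
import numpy as np
import pandas as pd

info = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], ["one", "two", "three", "four"]],
    columns=[["num1", "num2", "num3"], ["x", "y", "x"]],
)

swapped_rows = info.swaplevel()        # swaps the two row-index levels
swapped_cols = info.swaplevel(axis=1)  # same idea, applied to the columns
row = info.loc[("a", "two"), :]        # select one row by its full two-level key
```

Only the labels are reordered; the underlying values stay in place.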


In [38]:
df = pd.read_csv("titanic.csv")

In [39]:
df.head()

,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1.0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2.0,,"St Louis, MO"
1,1.0,1.0,"Allison, Master. Hudson Trevor",male,0.9167,1.0,2.0,113781,151.55,C22 C26,S,11.0,,"Montreal, PQ / Chesterville, ON"
2,1.0,0.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1.0,0.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.55,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1.0,0.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"


In [40]:
# numeric_only=True skips the string columns; pandas 2.0+ raises an error without it
df.corr(numeric_only=True)

,pclass,survived,age,sibsp,parch,fare,body
pclass,1.0,-0.312469,-0.408106,0.060832,0.018322,-0.558629,-0.034642
survived,-0.312469,1.0,-0.055513,-0.027825,0.08266,0.244265,
age,-0.408106,-0.055513,1.0,-0.243699,-0.150917,0.178739,0.058809
sibsp,0.060832,-0.027825,-0.243699,1.0,0.373587,0.160238,-0.099961
parch,0.018322,0.08266,-0.150917,0.373587,1.0,0.221539,0.051099
fare,-0.558629,0.244265,0.178739,0.160238,0.221539,1.0,-0.04311
body,-0.034642,,0.058809,-0.099961,0.051099,-0.04311,1.0


In [41]:
df

,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1.0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0000,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO"
1,1.0,1.0,"Allison, Master. Hudson Trevor",male,0.9167,1.0,2.0,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,1.0,0.0,"Allison, Miss. Helen Loraine",female,2.0000,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1.0,0.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0000,1.0,2.0,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1.0,0.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0000,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1305,3.0,0.0,"Zabour, Miss. Thamine",female,,1.0,0.0,2665,14.4542,,C,,,
1306,3.0,0.0,"Zakarian, Mr. Mapriededer",male,26.5000,0.0,0.0,2656,7.2250,,C,,304.0,
1307,3.0,0.0,"Zakarian, Mr. Ortin",male,27.0000,0.0,0.0,2670,7.2250,,C,,,
1308,3.0,0.0,"Zimmerman, Mr. Leo",male,29.0000,0.0,0.0,315082,7.8750,,S,,,


In [52]:
df.iloc[1:10]

,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
1,1.0,1.0,"Allison, Master. Hudson Trevor",male,0.9167,1.0,2.0,113781,151.55,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,1.0,0.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1.0,0.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.55,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1.0,0.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
5,1.0,1.0,"Anderson, Mr. Harry",male,48.0,0.0,0.0,19952,26.55,E12,S,3,,"New York, NY"
6,1.0,1.0,"Andrews, Miss. Kornelia Theodosia",female,63.0,1.0,0.0,13502,77.9583,D7,S,10,,"Hudson, NY"
7,1.0,0.0,"Andrews, Mr. Thomas Jr",male,39.0,0.0,0.0,112050,0.0,A36,S,,,"Belfast, NI"
8,1.0,1.0,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53.0,2.0,0.0,11769,51.4792,C101,S,D,,"Bayside, Queens, NY"
9,1.0,0.0,"Artagaveytia, Mr. Ramon",male,71.0,0.0,0.0,PC 17609,49.5042,,C,,22.0,"Montevideo, Uruguay"


In [58]:
df.loc[df['pclass'] == 2]

,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
323,2.0,0.0,"Abelson, Mr. Samuel",male,30.0,1.0,0.0,P/PP 3381,24.000,,C,,,"Russia New York, NY"
324,2.0,1.0,"Abelson, Mrs. Samuel (Hannah Wizosky)",female,28.0,1.0,0.0,P/PP 3381,24.000,,C,10,,"Russia New York, NY"
325,2.0,0.0,"Aldworth, Mr. Charles Augustus",male,30.0,0.0,0.0,248744,13.000,,S,,,"Bryn Mawr, PA, USA"
326,2.0,0.0,"Andrew, Mr. Edgardo Samuel",male,18.0,0.0,0.0,231945,11.500,,S,,,"Buenos Aires, Argentina / New Jersey, NJ"
327,2.0,0.0,"Andrew, Mr. Frank Thomas",male,25.0,0.0,0.0,C.A. 34050,10.500,,S,,,"Cornwall, England Houghton, MI"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
595,2.0,0.0,"Wheeler, Mr. Edwin ""Frederick""",male,,0.0,0.0,SC/PARIS 2159,12.875,,S,,,
596,2.0,1.0,"Wilhelms, Mr. Charles",male,31.0,0.0,0.0,244270,13.000,,S,9,,"London, England"
597,2.0,1.0,"Williams, Mr. Charles Eugene",male,,0.0,0.0,244373,13.000,,S,14,,"Harrow, England"
598,2.0,1.0,"Wright, Miss. Marion",female,26.0,0.0,0.0,220844,13.500,,S,9,,"Yoevil, England / Cottage Grove, OR"


In [57]:
df.loc[(df['pclass'] == 2) & (df['sex'] == 'male') & (df['age'] < 7)]

,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
339,2.0,1.0,"Becker, Master. Richard F",male,1.0,2.0,1.0,230136,39.0,F4,S,11,,"Guntur, India / Benton Harbour, MI"
359,2.0,1.0,"Caldwell, Master. Alden Gates",male,0.8333,0.0,2.0,248738,29.0,,S,13,,"Bangkok, Thailand / Roseville, IL"
427,2.0,1.0,"Hamalainen, Master. Viljo",male,0.6667,1.0,1.0,250649,14.5,,S,4,,"Detroit, MI"
492,2.0,1.0,"Mallet, Master. Andre",male,1.0,0.0,2.0,S.C./PARIS 2079,37.0042,,C,10,,"Paris / Montreal, PQ"
514,2.0,1.0,"Navratil, Master. Edmond Roger",male,2.0,1.0,1.0,230080,26.0,F2,S,D,,"Nice, France"
515,2.0,1.0,"Navratil, Master. Michel M",male,3.0,1.0,1.0,230080,26.0,F2,S,D,,"Nice, France"
548,2.0,1.0,"Richards, Master. George Sibley",male,0.8333,1.0,1.0,29106,18.75,,S,4,,"Cornwall / Akron, OH"
549,2.0,1.0,"Richards, Master. William Rowe",male,3.0,1.0,1.0,29106,18.75,,S,4,,"Cornwall / Akron, OH"
587,2.0,1.0,"Wells, Master. Ralph Lester",male,2.0,1.0,1.0,29103,23.0,,S,14,,"Cornwall / Akron, OH"


In [73]:
df.loc[:, ['sex', 'pclass']]

,sex,pclass
0,female,1.0
1,male,1.0
2,female,1.0
3,male,1.0
4,female,1.0
...,...,...
1305,female,3.0
1306,male,3.0
1307,male,3.0
1308,male,3.0
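
The selections in the cells above can be tried without titanic.csv. The sketch below uses a few hypothetical rows that mimic three of the Titanic columns; the values are invented for illustration:

```python
import pandas as pd

# Hypothetical rows that mimic a few of the Titanic columns.
df = pd.DataFrame({
    "pclass": [1, 2, 2, 3, 2],
    "sex": ["female", "male", "male", "male", "female"],
    "age": [29.0, 1.0, 30.0, 27.0, 3.0],
})

by_position = df.iloc[1:3]                # rows at positions 1 and 2; the stop is excluded
second_class = df.loc[df["pclass"] == 2]  # Boolean mask on one column
young_boys = df.loc[
    (df["pclass"] == 2) & (df["sex"] == "male") & (df["age"] < 7)
]
two_columns = df.loc[:, ["sex", "pclass"]]  # all rows, selected columns
```

Each condition needs its own parentheses when combined with &, because & binds more tightly than the comparison operators.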
