<h2 align="center">Titanic Dataset</h2>

**Queries**

1. How do we find the total number of records and columns?
2. How do we count the missing values in every column?
3. How do we show all records with a missing Age?
4. How do we show all records with a missing Embarked value?
5. How many passengers survived, and how many did not?
6. How many male and female passengers survived?
7. How many passengers paid a Fare of 0?
8. Which passengers who paid a Fare of 0 survived, and how many are there?
9. Which 3 passengers paid the highest fares?
10. Show only 3 columns for the 3 passengers with the highest fares.
11. Drop the missing Embarked values.
12. Replace all missing (NaN) Age values with -1.
13. Show all records with a Fare from 0 up to, but not including, 100.
14. Among records with a Fare below 100, show those with the minimum Fare (0) or the maximum Fare.

**Dataset:** The dataset used here is a CSV file titled "Titanic".

<img src="Dataset/Titanic data_dic.png">

In [1]:
# Basic EDA Tools:
import numpy as np
import pandas as pd

In [2]:
# Load the CSV file into a DataFrame
data = pd.read_csv("Dataset/titanic.csv")

In [3]:
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [5]:
# 1. Total number of records (rows) and columns
data.shape

(891, 12)

In [6]:
# 2. Count of missing values in every column
data.isna().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
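
The raw counts above are easier to compare when expressed as a share of all rows. The sketch below uses a small hypothetical frame named `toy` (not the real Titanic file) to show one way of doing that: the mean of a boolean mask is the fraction of `True` values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data (not the real file).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", "Q"],
})

# isna() yields booleans; their column-wise mean is the fraction of missing entries.
missing_pct = toy.isna().mean() * 100
print(missing_pct)
```

Applied to the notebook's `data`, the same expression would be `data.isna().mean() * 100`.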

In [7]:
# 3. Show all records with a missing Age
data[data["Age"].isna()]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
17,18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13.0000,,S
19,20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.2250,,C
26,27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.2250,,C
28,29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q
...,...,...,...,...,...,...,...,...,...,...,...,...
859,860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C
863,864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.5500,,S
868,869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5000,,S
878,879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S


In [8]:
# 4. Show all records with a missing Embarked value
data[data.Embarked.isna()]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
61,62,1,1,"Icard, Miss. Amelie",female,38.0,0,0,113572,80.0,B28,
829,830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62.0,0,0,113572,80.0,B28,


In [9]:
# 5. Count of passengers who survived and who did not
# 0: did not survive, 1: survived
data.Survived.value_counts()

0    549
1    342
Name: Survived, dtype: int64
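
If proportions are more useful than counts, `value_counts` accepts `normalize=True`. This is a minimal sketch on a hypothetical `toy` frame, not the real dataset.

```python
import pandas as pd

# Hypothetical toy data: 3 passengers who did not survive, 2 who did.
toy = pd.DataFrame({"Survived": [0, 0, 0, 1, 1]})

# normalize=True returns each value's share of the total instead of its count.
rates = toy["Survived"].value_counts(normalize=True)
print(rates)
```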

In [10]:
# 6. Count of male and female passengers who survived
data[data.Survived==1].Sex.value_counts()

female    233
male      109
Name: Sex, dtype: int64
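
To see survivors and non-survivors for each sex in one table, `pd.crosstab` is one option. The frame `toy` below is hypothetical and only illustrates the shape of the result.

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic file.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Survived": [0, 1, 1, 1, 0],
})

# Rows are the values of Sex, columns are the values of Survived (0 and 1).
table = pd.crosstab(toy["Sex"], toy["Survived"])
print(table)
```

On the notebook's `data`, the column labelled 1 of `pd.crosstab(data.Sex, data.Survived)` should match the counts shown above.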

In [11]:
# 7. Count of passengers who paid a Fare of 0
data[data.Fare==0].PassengerId.size

15

In [12]:
# 8. Passengers who paid a Fare of 0 and survived (the number of rows shown is the count)
data[(data["Survived"]==1)&(data["Fare"]==0)]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
271,272,1,3,"Tornquist, Mr. William Henry",male,25.0,0,0,LINE,0.0,,S
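
The question asks for a count, while the cell above displays the matching rows. One way to get the number directly is to sum the boolean mask, sketched here on a hypothetical `toy` frame.

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic file.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "Fare": [0.0, 0.0, 7.25, 0.0],
})

# Each True in the mask counts as 1, so the sum is the number of matching rows.
mask = (toy["Survived"] == 1) & (toy["Fare"] == 0)
n_free_survivors = int(mask.sum())
print(n_free_survivors)
```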


In [13]:
# 9. Show the 3 passengers who paid the highest fares
data.sort_values(by="Fare",ascending=False)[:3]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
258,259,1,1,"Ward, Miss. Anna",female,35.0,0,0,PC 17755,512.3292,,C
737,738,1,1,"Lesurer, Mr. Gustave J",male,35.0,0,0,PC 17755,512.3292,B101,C
679,680,1,1,"Cardeza, Mr. Thomas Drake Martinez",male,36.0,0,1,PC 17755,512.3292,B51 B53 B55,C
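
An alternative to sorting and slicing is `DataFrame.nlargest`. The sketch below runs on a hypothetical `toy` frame. Note that with tied fares, as in the output above, `nlargest` keeps tied rows in their original order by default (`keep="first"`), so the row order may differ from what `sort_values` returns.

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic file.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Fare": [7.25, 512.33, 80.0, 263.0, 0.0],
})

# Return the 3 rows with the largest Fare, highest first.
top3 = toy.nlargest(3, "Fare")
print(top3)
```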


In [14]:
# 10. Show only 3 columns for the 3 passengers with the highest fares
data.loc[data.sort_values("Fare",ascending=False).index[:3],["Name","Age","Sex"]]
# or data.iloc[data.sort_values("Fare",ascending=False).index[:3],[3,5,4]], which gives the same output
# here only because the default RangeIndex makes row labels equal to row positions

Unnamed: 0,Name,Age,Sex
258,"Ward, Miss. Anna",35.0,female
737,"Lesurer, Mr. Gustave J",35.0,male
679,"Cardeza, Mr. Thomas Drake Martinez",36.0,male
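
The `loc` and `iloc` forms agree here because the row labels happen to equal the row positions. The following sketch, on a hypothetical `toy` frame with a non-default index, shows how labels could be converted to positions before using `iloc`.

```python
import pandas as pd

# Hypothetical toy data whose index labels (100..102) differ from positions (0..2).
toy = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Fare": [10.0, 30.0, 20.0]},
    index=[100, 101, 102],
)

# Labels of the 2 rows with the highest Fare: 101, then 102.
top_labels = toy.sort_values("Fare", ascending=False).index[:2]

# loc selects by label, so the labels can be used directly.
by_label = toy.loc[top_labels, ["Name"]]

# iloc selects by position; passing labels 101 and 102 would raise IndexError here,
# so translate the labels to positions first.
positions = toy.index.get_indexer(top_labels)
by_position = toy.iloc[positions, [0]]
print(by_position)
```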


In [15]:
# 11. Drop the missing Embarked values
# Note: this returns only the Embarked column without its 2 NaN entries; data itself is unchanged
data.Embarked.dropna()

0      S
1      C
2      S
3      S
4      S
      ..
886    S
887    S
888    S
889    C
890    Q
Name: Embarked, Length: 889, dtype: object
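
The cell above returns only the Embarked column. If the goal is to drop the whole rows where Embarked is missing, `DataFrame.dropna` with `subset` is one way to do it. A minimal sketch on a hypothetical `toy` frame:

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic file.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Embarked": ["S", None, "C"],
})

# Drop rows whose Embarked is missing; other columns are not checked.
# dropna returns a new frame, so toy itself keeps all 3 rows.
cleaned = toy.dropna(subset=["Embarked"])
print(cleaned)
```

On the notebook's `data`, `data.dropna(subset=["Embarked"])` would be expected to return 889 rows with all 12 columns.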

In [16]:
# 12. Replace all missing (NaN) Age values with -1
# Assign the result back; an in-place fillna on a selected column is unreliable in recent pandas
data["Age"] = data["Age"].fillna(-1)
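
A sentinel such as -1 removes the NaN values but also takes part in later statistics. The sketch below, on a hypothetical `toy` frame, shows the effect on the mean, which is worth keeping in mind before summarising Age after this step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the real Titanic file.
toy = pd.DataFrame({"Age": [20.0, np.nan, 40.0]})

# mean() skips NaN by default, so only 20 and 40 are averaged.
mean_before = toy["Age"].mean()

# fillna returns a new Series; assign it back to replace the column.
toy["Age"] = toy["Age"].fillna(-1)

# The sentinel -1 is now treated as a real age and pulls the mean down.
mean_after = toy["Age"].mean()
print(mean_before, mean_after)
```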

In [17]:
# 13. Show all records with 0 <= Fare < 100
data[(data.Fare>=0)&(data.Fare<100)]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,-1.0,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C
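
The same half-open range can be written with `Series.between`. This sketch assumes pandas 1.3 or later, where `inclusive` accepts the strings `"both"`, `"neither"`, `"left"` and `"right"`; the frame `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic file.
toy = pd.DataFrame({"Fare": [0.0, 7.25, 99.99, 100.0, 512.33]})

# inclusive="left" keeps the lower bound (0) and excludes the upper bound (100).
in_range = toy[toy["Fare"].between(0, 100, inclusive="left")]
print(in_range)
```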


In [18]:
# 14. Among records with Fare < 100, show those with the minimum Fare (0) or the maximum Fare
f = data[data.Fare < 100]
max_ = f.Fare.max()
min_ = f.Fare.min()
h = f[(f.Fare == max_) | (f.Fare == min_)]
h

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
179,180,0,3,"Leonard, Mr. Lionel",male,36.0,0,0,LINE,0.0,,S
263,264,0,1,"Harrison, Mr. William",male,40.0,0,0,112059,0.0,B94,S
271,272,1,3,"Tornquist, Mr. William Henry",male,25.0,0,0,LINE,0.0,,S
277,278,0,2,"Parkes, Mr. Francis ""Frank""",male,-1.0,0,0,239853,0.0,,S
302,303,0,3,"Johnson, Mr. William Cahoone Jr",male,19.0,0,0,LINE,0.0,,S
413,414,0,2,"Cunningham, Mr. Alfred Fleming",male,-1.0,0,0,239853,0.0,,S
466,467,0,2,"Campbell, Mr. William",male,-1.0,0,0,239853,0.0,,S
481,482,0,2,"Frost, Mr. Anthony Wood ""Archie""",male,-1.0,0,0,239854,0.0,,S
520,521,1,1,"Perreault, Miss. Anne",female,30.0,0,0,12749,93.5,B73,S
597,598,0,3,"Johnson, Mr. Alfred",male,49.0,0,0,LINE,0.0,,S
