<a href="https://colab.research.google.com/github/irfankhan745/Daily-data-analytics-practice/blob/main/String_methods_in_pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Project Overview

This notebook demonstrates a data cleaning and feature engineering process on the Titanic dataset. The primary goal was to extract and structure name-related information from the 'Name' column to create a more useful 'full_name' feature.

### Key Steps:

1.  **Data Loading**: The `titanic.csv` dataset was loaded into a pandas DataFrame.
2.  **Name Feature Extraction**: Various components of the 'Name' column were extracted:
    *   `last_name`: The surname of the passenger.
    *   `no_ser_name`: The part of the name after the comma.
    *   `mr_miss` and `firs_name`: The title (e.g., Mr, Miss) and the first name(s) were separated from `no_ser_name`.
3.  **Title Standardization**: Common titles like 'Mrs', 'Lady', and 'Ms' were standardized to 'Miss' for consistency.
4.  **Full Name Construction**: A new `full_name` column was created by concatenating the standardized title, first name(s), and last name.
5.  **Column Cleanup**: The original 'Name' column and all intermediate columns used for name extraction (`last_name`, `no_ser_name`, `mr_miss`, `firs_name`) were dropped to streamline the DataFrame.
6.  **Column Reordering**: The new `full_name` column was moved to position 2 (the third column, directly after `Survived`) using `pop` and `insert`, so it sits where `Name` originally appeared relative to the ID columns.
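
The steps listed above can be sketched end to end on a few rows. This is a minimal illustration rather than the notebook's exact cells: the three sample rows are copied from the `head()` output, the helper name `build_full_name` is hypothetical, and whitespace is stripped before the title replacement on the assumption that the comma split leaves a leading space.

```python
import pandas as pd

# Three sample rows copied from the notebook's head() output; not the full dataset.
sample = pd.DataFrame({
    "PassengerId": [1, 3, 4],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    ],
})


def build_full_name(names: pd.Series) -> pd.Series:
    """Hypothetical helper: turn 'Surname, Title. First' into 'Title First Surname'."""
    # Split once on the comma: surname on the left, title and first names on the right.
    parts = names.str.split(",", n=1, expand=True)
    last_name = parts[0].str.strip()
    # Split once on the first period to separate the title from the first name(s).
    title_first = parts[1].str.split(".", n=1, expand=True)
    # Strip before replacing so that 'Mrs' matches exactly.
    title = title_first[0].str.strip().replace(["Mrs", "Lady", "Ms"], "Miss")
    first_name = title_first[1].str.strip()
    return title + " " + first_name + " " + last_name


sample["full_name"] = build_full_name(sample["Name"])
sample = sample.drop(columns=["Name"])
print(sample["full_name"].tolist())
```

Keeping the intermediate pieces as local variables inside one function avoids creating and later dropping helper columns on the DataFrame.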

In [10]:
import pandas as pd

In [11]:
titenic = pd.read_csv('/content/titanic.csv')

In [12]:
titenic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [13]:
titenic.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [14]:
# Take everything before the comma so multi-word surnames are kept whole
titenic['last_name'] = titenic['Name'].str.split(',').str[0].str.strip()

In [15]:
titenic['no_ser_name'] = titenic['Name'].str.split(',').str[1]

In [16]:
titenic[['mr_miss','firs_name']] = titenic['no_ser_name'].str.split('.', n=1, expand=True)
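
The chained splits above could also be written as a single regular-expression pass with `Series.str.extract`. The sketch below is an alternative, not the notebook's method; the pattern assumes every name follows the `Surname, Title. First names` layout seen in the sample rows, and it reuses the notebook's column names as group names.

```python
import pandas as pd

# Sample names copied from the head() output above.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Allen, Mr. William Henry",
])

# Named groups become column names in the result.
# Assumed layout: surname, comma, title, period, first name(s).
pattern = r"^(?P<last_name>[^,]+),\s*(?P<mr_miss>[^.]+)\.\s*(?P<firs_name>.*)$"
parts = names.str.extract(pattern)
print(parts)
```

Because the `\s*` tokens consume the spaces around the comma and period, the extracted pieces need no separate `strip` step. Rows that do not match the pattern come back as missing values, which makes irregular names easy to spot.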

In [17]:
titenic['mr_miss'].unique()

array([' Mr', ' Mrs', ' Miss', ' Master', ' Don', ' Rev', ' Dr', ' Mme',
       ' Ms', ' Major', ' Lady', ' Sir', ' Mlle', ' Col', ' Capt',
       ' the Countess', ' Jonkheer'], dtype=object)
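
Every value in the array above starts with a space, left over from splitting on the comma. That matters for exact-match replacement, as the small sketch below suggests; the `titles` Series is a hand-copied subset of the values printed above.

```python
import pandas as pd

# A few of the title values exactly as printed above, leading space included.
titles = pd.Series([" Mr", " Mrs", " Miss", " Ms", " Lady"])

# Exact-match replace finds nothing here, because ' Mrs' != 'Mrs'.
unchanged = titles.replace(["Mrs", "Lady", "Ms"], "Miss")

# Stripping whitespace first lets the same call match.
cleaned = titles.str.strip().replace(["Mrs", "Lady", "Ms"], "Miss")

print(unchanged.tolist())
print(cleaned.tolist())
```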

In [18]:
# Strip the leading space left by the comma split so the titles match exactly
titenic['mr_miss'] = titenic['mr_miss'].str.strip().replace(['Mrs','Lady','Ms'],'Miss')

In [19]:
# Join the parts with single spaces; strip the space left in front of the first name(s)
titenic['full_name'] = titenic['mr_miss'] + ' ' + titenic['firs_name'].str.strip() + ' ' + titenic['last_name']

In [20]:
titenic.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'last_name',
       'no_ser_name', 'mr_miss', 'firs_name', 'full_name'],
      dtype='object')

In [21]:
# Drop the original column and the intermediate helper columns
titenic = titenic.drop(columns=['firs_name','mr_miss','no_ser_name','last_name','Name'])

In [22]:
# Remove full_name from the end of the frame and keep its values
full_name = titenic.pop('full_name')

In [23]:
# Re-insert it as the third column, right after 'Survived'
titenic.insert(2,'full_name',full_name)

In [24]:
titenic.head()

Unnamed: 0,PassengerId,Survived,full_name,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,Mr Owen Harris Braund,3,male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,Miss John Bradley (Florence Briggs Thayer) Cumings,1,female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,Miss Laina Heikkinen,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,Miss Jacques Heath (Lily May Peel) Futrelle,1,female,35.0,1,0,113803,53.1,C123,S
4,5,0,Mr William Henry Allen,3,male,35.0,0,0,373450,8.05,,S
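
The `pop` and `insert` pair used to reposition `full_name` can be wrapped in a small reusable function. The sketch below is illustrative only: `move_column` is a hypothetical helper name, and the toy frame uses placeholder values rather than Titanic data.

```python
import pandas as pd


def move_column(df: pd.DataFrame, name: str, position: int) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `df` with column `name` at `position`."""
    out = df.copy()
    column = out.pop(name)              # remove the column and keep its values
    out.insert(position, name, column)  # re-insert it at the requested index
    return out


# Toy frame with placeholder values, only to show the column order.
toy = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "full_name": ["A", "B"],
})
reordered = move_column(toy, "full_name", 2)
print(list(reordered.columns))
```

Working on a copy leaves the input frame untouched, which is convenient when re-running notebook cells out of order.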
