## Pandas
Pandas is an open-source data analysis and manipulation library for Python. It provides data structures like Series and DataFrame, which allow for efficient handling and analysis of large datasets. Pandas is built on top of NumPy and is widely used in data science, machine learning, and scientific research.

Key Features of Pandas

* Data Structures: Provides two primary data structures: Series (one-dimensional) and DataFrame (two-dimensional).
* Data Manipulation: Supports operations like merging, reshaping, selecting, and data cleaning.
* Data Analysis: Facilitates statistical analysis and data visualization.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Read a CSV file from a URL (the raw version of the GitHub page linked below)
# https://github.com/datasciencedojo/datasets/blob/master/titanic.csv
data = pd.read_csv(r"https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")

In [3]:
data.head()  # fetches first 5 rows

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [4]:
import os
os.getcwd() # gives the current working directory

'd:\\Data Science\\github\\Data-Science\\data science\\Datascience_libraries'

In [5]:
# reading a CSV file from local device
data1 = pd.read_csv('d:\\Data Science\\github\\Data-Science\\data science\\Datascience_libraries\\files\\titanic.csv')
data1.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [6]:
# Read JSON data from a URL with read_json (the file is named .csv, but it is parsed here as JSON)
data2 = pd.read_json(r"https://raw.githubusercontent.com/mayank953/mlbootcamp-24/main/Data%20Science%20Libraries/class%202%20-%20Pandas%20/titanicdata.csv")
data2

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [7]:
# Read train.csv with read_table, passing a comma as the separator (read_table defaults to tab)
pd.read_table('d:\\Data Science\\github\\Data-Science\\data science\\Datascience_libraries\\files\\train.csv', sep=',')

Unnamed: 0,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C
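
The relationship between `read_csv` and `read_table` can be checked without any file on disk. The sketch below is only an illustration: the two-row `csv_text` sample is made up to resemble the Titanic file, and `io.StringIO` stands in for a real path or URL.

```python
import io

import pandas as pd

# Hypothetical two-row sample shaped like the Titanic file (not the real data).
csv_text = '''PassengerId,Survived,Name
1,0,"Braund, Mr. Owen Harris"
3,1,"Heikkinen, Miss. Laina"
'''

# read_csv assumes a comma separator; read_table assumes a tab unless sep is given.
frame_csv = pd.read_csv(io.StringIO(csv_text))
frame_table = pd.read_table(io.StringIO(csv_text), sep=",")

print(frame_csv.shape)                # (2, 3)
print(frame_csv.equals(frame_table))  # True
```

Quoted fields such as the passenger names keep their embedded commas in both readers.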


## Data Structures: Series and DataFrame
Pandas has two primary data structures:

* Series: A one-dimensional labeled array capable of holding any data type.
* DataFrame: A two-dimensional labeled data structure with columns of potentially different types.
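
As a minimal sketch of the two structures (the names `ages` and `people` and the row labels `p1` to `p3` are invented for illustration), a Series holds one labeled column of values, and selecting a single column from a DataFrame gives back a Series:

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
ages = pd.Series([22.0, 38.0, 26.0], index=["p1", "p2", "p3"], name="Age")

# A DataFrame: a table whose columns can hold different types.
people = pd.DataFrame(
    {"Name": ["Owen", "Florence", "Laina"], "Age": [22.0, 38.0, 26.0]},
    index=["p1", "p2", "p3"],
)

# Selecting one column of a DataFrame returns a Series.
age_column = people["Age"]
print(type(age_column).__name__)  # Series
print(age_column.equals(ages))    # True
```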

In [8]:
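# Sample JSON-style data held in a Python dict
# (the variable name `json` would shadow the standard-library json module if that were imported)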
json = {
    "page": 2,
    "per_page": 6,
    "total": 12,
    "total_pages": 2,
    "data": [
        {
            "id": 7,
            "email": "michael.lawson@reqres.in",
            "first_name": "Michael",
            "last_name": "Lawson",
            "avatar": "https://reqres.in/img/faces/7-image.jpg"
        },
        {
            "id": 11,
            "email": "george.edwards@reqres.in",
            "first_name": "George",
            "last_name": "Edwards",
            "avatar": "https://reqres.in/img/faces/11-image.jpg"
        },
        {
            "id": 12,
            "email": "rachel.howell@reqres.in",
            "first_name": "Rachel",
            "last_name": "Howell",
            "avatar": "https://reqres.in/img/faces/12-image.jpg"
        }
    ],
    "support": {
        "url": "https://reqres.in/#support-heading",
        "text": "To keep ReqRes free, contributions towards server costs are appreciated!"
    }
}

In [9]:
json['data']

[{'id': 7,
  'email': 'michael.lawson@reqres.in',
  'first_name': 'Michael',
  'last_name': 'Lawson',
  'avatar': 'https://reqres.in/img/faces/7-image.jpg'},
 {'id': 11,
  'email': 'george.edwards@reqres.in',
  'first_name': 'George',
  'last_name': 'Edwards',
  'avatar': 'https://reqres.in/img/faces/11-image.jpg'},
 {'id': 12,
  'email': 'rachel.howell@reqres.in',
  'first_name': 'Rachel',
  'last_name': 'Howell',
  'avatar': 'https://reqres.in/img/faces/12-image.jpg'}]

In [10]:
df = pd.DataFrame(json['data'])  # build a DataFrame from the list of records (one dict per row)
df

Unnamed: 0,id,email,first_name,last_name,avatar
0,7,michael.lawson@reqres.in,Michael,Lawson,https://reqres.in/img/faces/7-image.jpg
1,11,george.edwards@reqres.in,George,Edwards,https://reqres.in/img/faces/11-image.jpg
2,12,rachel.howell@reqres.in,Rachel,Howell,https://reqres.in/img/faces/12-image.jpg


In [11]:
df1 = pd.DataFrame([1,2,3,4])

In [12]:
df1  # a flat list becomes a single column, labelled 0 by default

Unnamed: 0,0
0,1
1,2
2,3
3,4


In [13]:
df1 = pd.DataFrame([1,2,3,4], columns=["values"])    # providing the column name
df1

Unnamed: 0,values
0,1
1,2
2,3
3,4


In [14]:
df1 = pd.DataFrame([1,2,3,4], columns=["values"], index=["s1", "s2", "s3", "s4"])    # providing the row labels (index)
df1

Unnamed: 0,values
s1,1
s2,2
s3,3
s4,4


In [15]:
# np.array() - This function is used to create a numpy array.
# The first argument is a list of tuples, where each tuple represents a row of data.
# dtype=[('a', 'i'), ('b', 'i'), ('c', 'i')] - This specifies the data type for each field (column) in the array.
# 'a', 'b', 'c' are the field names (column names).
# 'i' specifies that the data type for each field is integer.
dat = np.array([(1,2,3),(4,5,6),(7,8,9)], dtype=[('a','i'), ('b','i'), ('c','i')])
dat

array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4')])

In [16]:
df2 = pd.DataFrame(dat, columns=['a','c'])  # from the dat array. fetching only a & c columns
df2

Unnamed: 0,a,c
0,1,3
1,4,6
2,7,9


In [18]:
### Dictionary to DataFrame
# Each key becomes a column name; scalar values need an explicit index to form a row
df3 = pd.DataFrame({'col-1':1,'col-2:':2}, index=['a'])
df3

Unnamed: 0,col-1,col-2:
a,1,2


In [19]:
df4 = pd.DataFrame({'col-1':[1,2],'col-2:':[3,4]}, index=['a','b']) # taking values as list
df4

Unnamed: 0,col-1,col-2:
a,1,3
b,2,4


In [20]:
# Nested dictionary: outer keys become columns, inner keys become the row labels,
# so the index ('row-1', 'row-2') does not need to be passed explicitly
df5 = pd.DataFrame({'col-1': {'row-1':10, 'row-2':20}, 'col-2': {'row-1':30, 'row-2': 40}})
df5

Unnamed: 0,col-1,col-2
row-1,10,30
row-2,20,40
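
The constructor accepts the same table in either orientation. The sketch below (with made-up `records` and `columns` values) shows a list of row dictionaries and a dictionary of column lists producing equal frames, and what happens when one record lacks a key:

```python
import pandas as pd

# Row-oriented: one dict per row.
records = [
    {"id": 7, "first_name": "Michael"},
    {"id": 11, "first_name": "George"},
]
# Column-oriented: one list per column.
columns = {"id": [7, 11], "first_name": ["Michael", "George"]}

from_records = pd.DataFrame(records)
from_columns = pd.DataFrame(columns)
print(from_records.equals(from_columns))  # True

# A key missing from one record is filled with a missing value.
ragged = pd.DataFrame([{"id": 7, "email": "a@example.com"}, {"id": 11}])
print(ragged["email"].isna().tolist())    # [False, True]
```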


## Operations on DataFrame

In [21]:
titanic = data1
titanic

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [22]:
titanic.name        # attribute access is case sensitive: the column is 'Name', so this fails

AttributeError: 'DataFrame' object has no attribute 'name'

In [23]:
titanic.Name       # fetching the column Name

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object

In [24]:
titanic["Name"] # bracket access is the more robust way to select a column

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object

In [26]:
titanic['Age'].head()

0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
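
One reason bracket access is usually the safer habit: attribute access only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute. A small sketch, using an invented frame whose `count` and `Ticket No` columns are chosen to trigger both cases:

```python
import pandas as pd

frame = pd.DataFrame(
    {"Name": ["Owen", "Laina"], "count": [1, 2], "Ticket No": ["A/5", "STON"]}
)

# Attribute access works for a simple, non-conflicting name.
print(frame.Name.tolist())          # ['Owen', 'Laina']

# 'count' collides with the DataFrame.count method, so the attribute is the method.
print(callable(frame.count))        # True
print(frame["count"].tolist())      # [1, 2]

# Names containing spaces can only be reached with brackets.
print(frame["Ticket No"].tolist())  # ['A/5', 'STON']
```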

## Adding a New Column

In [27]:
# Create a new column 'Age_greater_25' in the 'titanic' DataFrame
# This column will contain boolean values: True if 'Age' is greater than 25, False otherwise
titanic["Age_greater_25"] = titanic['Age'] > 25  # the new column is appended as the last column
titanic

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,True
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,False
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S,False
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,True


In [28]:
titanic.shape   # returns a (number of rows, number of columns) tuple

(891, 13)

In [29]:
# Create a new column 'Survived_true' in the 'titanic' DataFrame
# This column will contain boolean values: True if 'Survived' is equal to 1 (indicating the passenger survived), False otherwise
titanic['Survived_true'] = titanic['Survived'] == 1
titanic

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25,Survived_true
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,False,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,True,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,True,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,True,False
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,False,True
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S,False,False
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,True,True


In [30]:
titanic.shape  

(891, 14)

In [31]:
titanic.to_csv('d:\\Data Science\\github\\Data-Science\\data science\\Datascience_libraries\\files\\modified_titanic.csv')
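
By default `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The following sketch uses an in-memory buffer and a made-up two-row frame (both assumptions for illustration) to show the effect of `index=False`:

```python
import io

import pandas as pd

frame = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
columns_with_index = pd.read_csv(with_index).columns.tolist()
print(columns_with_index)     # ['Unnamed: 0', 'PassengerId', 'Survived']

# index=False leaves the index out of the file.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = pd.read_csv(without_index).columns.tolist()
print(columns_without_index)  # ['PassengerId', 'Survived']
```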

## Delete Column

In [33]:
del titanic['Survived_true']  # deleting the column

In [34]:
titanic.shape

(891, 13)

In [35]:
titanic['Survived_true']    # raises KeyError, confirming that the column was deleted

KeyError: 'Survived_true'
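
`del` changes the DataFrame in place. As an alternative sketch on an invented three-column frame, `drop(columns=...)` returns a new frame and leaves the original untouched:

```python
import pandas as pd

frame = pd.DataFrame(
    {"Age": [22.0, 38.0], "Survived": [0, 1], "Survived_true": [False, True]}
)

# drop returns a new DataFrame; the original keeps the column.
trimmed = frame.drop(columns=["Survived_true"])
print("Survived_true" in frame.columns)    # True
print("Survived_true" in trimmed.columns)  # False

# errors="ignore" skips names that are not present instead of raising KeyError.
same = trimmed.drop(columns=["Survived_true"], errors="ignore")
print(same.columns.tolist())               # ['Age', 'Survived']
```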

## Rename Column

In [36]:
# Rename the column 'Name' to 'Name_Of_Passenger' in the 'titanic' DataFrame
# The rename() method is used to rename one or more columns
titanic.rename(columns={"Name":"Name_Of_Passenger"})  # returns a new DataFrame in which Name is renamed to Name_Of_Passenger

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,True
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,False
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S,False
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,True


In [37]:
titanic.head() # the original DataFrame still has the old column name, because rename returned a copy

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [38]:
# To change the name in the DataFrame itself, pass inplace=True (or assign the result back to titanic)
titanic.rename(columns={"Name":"Name_Of_Passenger"}, inplace=True)

In [39]:
titanic.head()  # name changed

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True
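
Assigning the result of `rename` back to a variable is an alternative to `inplace=True`. The sketch below uses a made-up one-row frame; it also shows that a key that matches no column (for example because of case) is silently ignored by default:

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["Owen"], "Age": [22.0]})

# rename returns a new frame; the original is unchanged.
renamed = frame.rename(columns={"Name": "Name_Of_Passenger"})
print(frame.columns.tolist())      # ['Name', 'Age']
print(renamed.columns.tolist())    # ['Name_Of_Passenger', 'Age']

# Column names are case sensitive: 'name' matches nothing, and no error is raised.
unchanged = frame.rename(columns={"name": "Name_Of_Passenger"})
print(unchanged.columns.tolist())  # ['Name', 'Age']
```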


## Slicing & Indexing

In [40]:
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [41]:
titanic[0:5]    # the same result as head(), using a row slice

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [42]:
# Slice the 'titanic' DataFrame to select every second row from index 1 to 9
titanic[1:10:2]

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,False
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S,False
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C,False


In [43]:
titanic.tail()  # last 5 rows

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S,True
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S,False
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S,False
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C,True
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q,True


In [44]:
# Fetch the last 5 rows with a slice; the negative step returns them in reverse order
titanic[-1:-6:-1]

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q,True
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C,True
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S,False
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S,False
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S,True


In [45]:
# Select every second row in reverse order, starting at the third-from-last row (the stop position -13 is excluded)
titanic[-3:-13: -2]

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S,False
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S,True
884,885,0,3,"Sutehall, Mr. Henry Jr",male,25.0,0,0,SOTON/OQ 392076,7.05,,S,False
882,883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22.0,0,0,7552,10.5167,,S,False
880,881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25.0,0,1,230433,26.0,,S,False
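
Integer slices written directly in `[]` select rows by position, whatever the index labels are. A short sketch on an invented five-row frame with string labels:

```python
import pandas as pd

frame = pd.DataFrame(
    {"Age": [22.0, 38.0, 26.0, 35.0, 35.0]},
    index=["a", "b", "c", "d", "e"],
)

first_two = frame[0:2]                 # positions 0 and 1
last_three_reversed = frame[-1:-4:-1]  # positions 4, 3, 2
last_three = frame[-3:]                # last three rows in original order

print(first_two.index.tolist())            # ['a', 'b']
print(last_three_reversed.index.tolist())  # ['e', 'd', 'c']
print(last_three.equals(frame.tail(3)))    # True
```

A slice such as `[-3:]` keeps the original order and matches `tail(3)`, whereas a negative step reverses the rows.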


## iloc - index location
The iloc property in pandas is used to access a group of rows and columns by their integer position (i.e., position-based indexing).

In [46]:
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [47]:
# Use iloc to slice the 'titanic' DataFrame to select from index 0 to 9
titanic.iloc[0:10]

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,False
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,True
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S,False
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S,True
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C,False


In [48]:
# Use iloc to slice the 'titanic' DataFrame to select every second row from index 0 to 9
titanic.iloc[0:10:2]

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,True
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S,True


In [49]:
# Columns can be sliced in the same way as rows
# Use iloc to slice both rows and columns of the 'titanic' DataFrame
# The syntax for iloc slicing is [rows, columns]
# Here, [0:10:2, 0:5] takes every second row from positions 0 to 9 (stop 10 is excluded)
# and the columns at positions 0 to 4 (stop 5 is excluded).
titanic.iloc[0:10:2,0:5]

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex
0,1,0,3,"Braund, Mr. Owen Harris",male
2,3,1,3,"Heikkinen, Miss. Laina",female
4,5,0,3,"Allen, Mr. William Henry",male
6,7,0,1,"McCarthy, Mr. Timothy J",male
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female


In [50]:
# Select all rows and all columns
titanic.iloc[:,:]  # an empty slice on each axis, as in ordinary Python slicing

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,True
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,False
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S,False
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,True


In [51]:
# if i want the columns to be reversed
titanic.iloc[:,::-1] # all columns reversed

Unnamed: 0,Age_greater_25,Embarked,Cabin,Fare,Ticket,Parch,SibSp,Age,Sex,Name_Of_Passenger,Pclass,Survived,PassengerId
0,False,S,,7.2500,A/5 21171,0,1,22.0,male,"Braund, Mr. Owen Harris",3,0,1
1,True,C,C85,71.2833,PC 17599,0,1,38.0,female,"Cumings, Mrs. John Bradley (Florence Briggs Th...",1,1,2
2,True,S,,7.9250,STON/O2. 3101282,0,0,26.0,female,"Heikkinen, Miss. Laina",3,1,3
3,True,S,C123,53.1000,113803,0,1,35.0,female,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",1,1,4
4,True,S,,8.0500,373450,0,0,35.0,male,"Allen, Mr. William Henry",3,0,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,True,S,,13.0000,211536,0,0,27.0,male,"Montvila, Rev. Juozas",2,0,887
887,False,S,B42,30.0000,112053,0,0,19.0,female,"Graham, Miss. Margaret Edith",1,1,888
888,False,S,,23.4500,W./C. 6607,2,1,,female,"Johnston, Miss. Catherine Helen ""Carrie""",3,0,889
889,True,C,C148,30.0000,111369,0,0,26.0,male,"Behr, Mr. Karl Howell",1,1,890


In [52]:
# Select the last 5 rows and the last 3 columns (both come back in reverse order because of the -1 step)
titanic.iloc[-1:-6:-1, -1:-4:-1]

Unnamed: 0,Age_greater_25,Embarked,Cabin
890,True,Q,
889,True,C,C148
888,False,S,
887,False,S,B42
886,True,S,


In [53]:
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [54]:
titanic.shape

(891, 13)

In [55]:
# Use iloc to slice both rows and columns of the 'titanic' DataFrame
# The syntax for iloc slicing is [rows, columns]
# Here, [4:13, 3:13:2] selects the rows at positions 4 to 12 (stop 13 is excluded)
# and the columns at positions 3, 5, 7, 9 and 11 (start 3, stop 13 excluded, step 2).
titanic.iloc[4:13, 3:13:2]  # position 3 is the 4th column, because counting starts at 0

Unnamed: 0,Name_Of_Passenger,Age,Parch,Fare,Embarked
4,"Allen, Mr. William Henry",35.0,0,8.05,S
5,"Moran, Mr. James",,0,8.4583,Q
6,"McCarthy, Mr. Timothy J",54.0,0,51.8625,S
7,"Palsson, Master. Gosta Leonard",2.0,1,21.075,S
8,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",27.0,2,11.1333,S
9,"Nasser, Mrs. Nicholas (Adele Achem)",14.0,0,30.0708,C
10,"Sandstrom, Miss. Marguerite Rut",4.0,1,16.7,S
11,"Bonnell, Miss. Elizabeth",58.0,0,26.55,S
12,"Saundercock, Mr. William Henry",20.0,0,8.05,S


In [56]:
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [57]:
# iloc accepts only integer positions; passing a column name raises an error
titanic.iloc[1:5, 'Age']  

ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
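
When only the column name is known but positional row slicing is still wanted, one option is to look up the column's position first. This is a sketch on a made-up three-row frame; `Index.get_loc` is used to translate the name into an integer position:

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "Name": ["Owen", "Florence", "Laina"],
        "Age": [22.0, 38.0, 26.0],
        "Fare": [7.25, 71.2833, 7.925],
    }
)

# Translate the column name into its integer position.
age_position = frame.columns.get_loc("Age")
print(age_position)                  # 1

by_position = frame.iloc[1:3, age_position]  # rows at positions 1 and 2
by_label = frame.loc[1:2, "Age"]             # rows labelled 1 to 2, both included

print(by_position.tolist())          # [38.0, 26.0]
print(by_position.equals(by_label))  # True
```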

In [58]:
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


## loc
The loc indexer is used to access a group of rows and columns by label(s) or a boolean array in a DataFrame. It allows for both label-based indexing and boolean indexing along both the row and column axes.
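
One difference from `iloc` is easy to miss: a label slice in `loc` includes its end label, while a position slice in `iloc` excludes its stop. A minimal sketch, assuming an invented frame with string row labels:

```python
import pandas as pd

frame = pd.DataFrame(
    {"Age": [22.0, 38.0, 26.0, 35.0]},
    index=["row0", "row1", "row2", "row3"],
)

# loc: label-based, the end label 'row2' is included.
by_label = frame.loc["row0":"row2", "Age"]
print(by_label.tolist())     # [22.0, 38.0, 26.0]

# iloc: position-based, the stop position 2 is excluded.
by_position = frame.iloc[0:2, 0]
print(by_position.tolist())  # [22.0, 38.0]
```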

In [59]:
# Access the 'Age' column by name; with loc the row slice 0:4 includes label 4
titanic.loc[0:4, 'Age']

0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64

In [60]:
# Use loc to select specific rows and columns by label
# Here, we're selecting rows with labels from 0 to 3 (inclusive) and columns with labels 'Name_Of_Passenger', 'Age', and 'Cabin'
titanic.loc[0:3, ['Name_Of_Passenger', 'Age', 'Cabin']]

Unnamed: 0,Name_Of_Passenger,Age,Cabin
0,"Braund, Mr. Owen Harris",22.0,
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0,C85
2,"Heikkinen, Miss. Laina",26.0,
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0,C123


In [62]:
# Use loc to select specific rows and columns based on a condition
# Here, we're selecting rows where the 'Age' column is greater than 30, and columns 'Name_Of_Passenger' and 'Age'
titanic.loc[titanic['Age']>30, ['Name_Of_Passenger', 'Age']]

Unnamed: 0,Name_Of_Passenger,Age
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0
4,"Allen, Mr. William Henry",35.0
6,"McCarthy, Mr. Timothy J",54.0
11,"Bonnell, Miss. Elizabeth",58.0
...,...,...
873,"Vander Cruyssen, Mr. Victor",47.0
879,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",56.0
881,"Markun, Mr. Johann",33.0
885,"Rice, Mrs. William (Margaret Norton)",39.0
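
Several conditions can be combined in one boolean mask. The sketch below uses a small invented frame; each condition is wrapped in parentheses because `&`, `|` and `~` bind more tightly than the comparison operators:

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "Name": ["Owen", "Florence", "Laina", "Lily"],
        "Sex": ["male", "female", "female", "female"],
        "Age": [22.0, 38.0, 26.0, 35.0],
    }
)

# AND: both conditions must hold.
older_women = frame.loc[(frame["Age"] > 30) & (frame["Sex"] == "female"), ["Name", "Age"]]
print(older_women["Name"].tolist())  # ['Florence', 'Lily']

# OR: either condition may hold.
young_or_male = frame.loc[(frame["Age"] < 25) | (frame["Sex"] == "male"), "Name"]
print(young_or_male.tolist())        # ['Owen']

# NOT: invert a mask with ~.
not_female = frame.loc[~(frame["Sex"] == "female"), "Name"]
print(not_female.tolist())           # ['Owen']
```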


In [63]:
# accessing based on age which is null
titanic.loc[titanic['Age'].isnull()]

# isna() is an alias of isnull() and gives the same result (only the last expression in a cell is displayed)
titanic.loc[titanic['Age'].isna()]

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,False
17,18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13.0000,,S,False
19,20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.2250,,C,False
26,27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.2250,,C,False
28,29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
859,860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C,False
863,864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.5500,,S,False
868,869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5000,,S,False
878,879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S,False


In [64]:
# accessing based on age which is not null
titanic.loc[titanic['Age'].notnull()]

# notna() is an alias of notnull() and gives the same result

titanic.loc[titanic['Age'].notna()]

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,True
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,True
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
885,886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39.0,0,5,382652,29.1250,,Q,True
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,True
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,False
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,True
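
The same masks can be summed to count missing values per column. The sketch below runs on a made-up four-row frame; it also shows that a comparison against a missing value evaluates to False, which is why rows with no Age end up with False in a column such as `Age_greater_25`:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"Age": [22.0, np.nan, 26.0, np.nan], "Cabin": [None, "C85", None, "C123"]}
)

# True counts as 1, so summing the mask counts the missing entries per column.
missing_per_column = frame.isna().sum()
print(int(missing_per_column["Age"]))    # 2
print(int(missing_per_column["Cabin"]))  # 2

# isna and notna are exact complements.
complement_ok = bool((frame["Age"].isna() == ~frame["Age"].notna()).all())
print(complement_ok)                     # True

# Comparing NaN with a number yields False rather than a missing value.
older_than_25 = (frame["Age"] > 25).tolist()
print(older_than_25)                     # [False, False, True, False]
```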


In [65]:
# titanic.info() summarises the dataset: index range, column names, non-null counts, dtypes and memory usage
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   PassengerId        891 non-null    int64  
 1   Survived           891 non-null    int64  
 2   Pclass             891 non-null    int64  
 3   Name_Of_Passenger  891 non-null    object 
 4   Sex                891 non-null    object 
 5   Age                714 non-null    float64
 6   SibSp              891 non-null    int64  
 7   Parch              891 non-null    int64  
 8   Ticket             891 non-null    object 
 9   Fare               891 non-null    float64
 10  Cabin              204 non-null    object 
 11  Embarked           889 non-null    object 
 12  Age_greater_25     891 non-null    bool   
dtypes: bool(1), float64(2), int64(5), object(5)
memory usage: 84.5+ KB


In [66]:
# describe() gives summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
titanic.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


In [67]:
# the indexing by default starts with zero
titanic.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True


In [68]:
# the row index defaults to 0, 1, 2, ...; assign to .index to relabel the rows
titanic.index = ["row" + str(i) for i in range(0, len(titanic))]
titanic.head(2)  # changed to row0, row1...row890

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
row1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,True


In [69]:
titanic.tail(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C,True
row890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q,True


In [71]:
# Access the value of 'Name_Of_Passenger' column for 'row0'
titanic['Name_Of_Passenger']['row0']

'Braund, Mr. Owen Harris'
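
The cell above reads a single value with chained indexing (column first, then row label). A minimal sketch of the same lookup with `.loc` and `.at` follows; the two-row frame `toy` is a hypothetical stand-in for `titanic`.

```python
import pandas as pd

# Hypothetical two-row stand-in for the relabelled titanic frame
toy = pd.DataFrame(
    {
        "Name_Of_Passenger": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
        "Fare": [7.25, 7.925],
    },
    index=["row0", "row1"],
)

chained = toy["Name_Of_Passenger"]["row0"]      # two separate lookups
via_loc = toy.loc["row0", "Name_Of_Passenger"]  # one label-based lookup (row, column)
via_at = toy.at["row0", "Name_Of_Passenger"]    # scalar accessor for a single cell

print(chained == via_loc == via_at)
```

For reading, all three return the same value. For assignment, `.loc` or `.at` is generally the safer choice, because chained indexing may write to a temporary copy instead of the original frame.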

In [72]:
# to fetch the columns
titanic.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name_Of_Passenger', 'Sex', 'Age',
       'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked',
       'Age_greater_25'],
      dtype='object')

In [73]:
# to fetch the row labels (the index)
titanic.index

Index(['row0', 'row1', 'row2', 'row3', 'row4', 'row5', 'row6', 'row7', 'row8',
       'row9',
       ...
       'row881', 'row882', 'row883', 'row884', 'row885', 'row886', 'row887',
       'row888', 'row889', 'row890'],
      dtype='object', length=891)

In [74]:
# the shape of the dataset as (rows, columns)
titanic.shape

(891, 13)

In [75]:
# to add a prefix to every name, use apply(),
# which calls the given function on each value of the column
titanic['Name_Of_Passenger'] = titanic['Name_Of_Passenger'].apply(lambda name : 'Mr/Mrs ' + name)
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
row2,3,1,3,"Mr/Mrs Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
row3,4,1,1,"Mr/Mrs Futrelle, Mrs. Jacques Heath (Lily May ...",female,35.0,1,0,113803,53.1,C123,S,True
row4,5,0,3,"Mr/Mrs Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [76]:
# Use the apply function to apply a lambda function to each value in the 'Fare' column
# The lambda function adds 100 to each value
titanic['Fare'].apply(lambda fare: fare + 100).head()

row0    107.2500
row1    171.2833
row2    107.9250
row3    153.1000
row4    108.0500
Name: Fare, dtype: float64
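
For simple arithmetic or string concatenation like the two `apply` cells above, pandas can also operate on the whole column at once. The sketch below uses hypothetical toy Series (`fare`, `names`) to show that the vectorized form gives the same values.

```python
import pandas as pd

# Hypothetical toy columns standing in for titanic['Fare'] and titanic['Name_Of_Passenger']
fare = pd.Series([7.25, 71.2833, 7.925], name="Fare")
names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"])

with_apply = fare.apply(lambda f: f + 100)  # calls the lambda once per value
vectorized = fare + 100                     # one operation on the whole column

prefixed = "Mr/Mrs " + names                # string concatenation is vectorized too

print(with_apply.equals(vectorized))
print(prefixed.tolist())
```

`apply` remains useful when the per-value logic cannot be written as a column-wise expression.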

## Count()
Returns the number of non-null values in each column.

In [77]:
titanic.count()

PassengerId          891
Survived             891
Pclass               891
Name_Of_Passenger    891
Sex                  891
Age                  714
SibSp                891
Parch                891
Ticket               891
Fare                 891
Cabin                204
Embarked             889
Age_greater_25       891
dtype: int64
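
Since `count()` reports non-null values, the number of missing values per column is its complement. A minimal sketch on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with some missing values
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Cabin": [None, "C85", None],
})

non_null = toy.count()       # non-null values per column
missing = toy.isna().sum()   # missing values per column

print(missing)
print((non_null + missing == len(toy)).all())
```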

In [78]:
# taking first row
first_row = titanic.iloc[0, :]
first_row

PassengerId                                       1
Survived                                          0
Pclass                                            3
Name_Of_Passenger    Mr/Mrs Braund, Mr. Owen Harris
Sex                                            male
Age                                            22.0
SibSp                                             1
Parch                                             0
Ticket                                    A/5 21171
Fare                                           7.25
Cabin                                           NaN
Embarked                                          S
Age_greater_25                                False
Name: row0, dtype: object

In [79]:
# pd.DataFrame(first_row) is a single column; .T transposes it (rows become columns) into a one-row DataFrame
pd.DataFrame(first_row).T

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False


In [80]:
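# append this row to titanic (the result is a new DataFrame; titanic itself is unchanged)
# note: DataFrame.append was removed in pandas 2.0 and _append is a private method,
# so pd.concat is the supported way to do this
titanic._append(first_row).tail()    # the row is added at the end of the dataset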

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row887,888,1,1,"Mr/Mrs Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S,False
row888,889,0,3,"Mr/Mrs Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S,False
row889,890,1,1,"Mr/Mrs Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C,True
row890,891,0,3,"Mr/Mrs Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q,True
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False


In [81]:
# achieve the same append with the public API, pd.concat
pd.concat([titanic, pd.DataFrame(first_row).T])

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
row2,3,1,3,"Mr/Mrs Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
row3,4,1,1,"Mr/Mrs Futrelle, Mrs. Jacques Heath (Lily May ...",female,35.0,1,0,113803,53.1,C123,S,True
row4,5,0,3,"Mr/Mrs Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
row887,888,1,1,"Mr/Mrs Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S,False
row888,889,0,3,"Mr/Mrs Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S,False
row889,890,1,1,"Mr/Mrs Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C,True
row890,891,0,3,"Mr/Mrs Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q,True
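
A slightly different way to append a row with `pd.concat` is to select it as a one-row DataFrame with `iloc[[0]]`, which keeps the column dtypes intact. The sketch below uses a hypothetical two-row frame; `ignore_index=True` is shown as an option for renumbering the result.

```python
import pandas as pd

# Hypothetical two-row stand-in for titanic
toy = pd.DataFrame(
    {"PassengerId": [1, 2], "Fare": [7.25, 71.2833]},
    index=["row0", "row1"],
)

first_as_frame = toy.iloc[[0]]                     # list selector -> one-row DataFrame
appended = pd.concat([toy, first_as_frame])        # keeps labels, so 'row0' appears twice
renumbered = pd.concat([toy, first_as_frame], ignore_index=True)  # fresh 0..n-1 index

print(appended.index.tolist())
print(renumbered.index.tolist())
```

Keeping the original labels produces a duplicate `row0`, which can make later label-based lookups return more than one row.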


In [82]:
titanic.dtypes  # get columns with its types

PassengerId            int64
Survived               int64
Pclass                 int64
Name_Of_Passenger     object
Sex                   object
Age                  float64
SibSp                  int64
Parch                  int64
Ticket                object
Fare                 float64
Cabin                 object
Embarked              object
Age_greater_25          bool
dtype: object

In [83]:
titanic.columns # get columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name_Of_Passenger', 'Sex', 'Age',
       'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked',
       'Age_greater_25'],
      dtype='object')

In [84]:
# loop over the columns and print the mean of every int64/float64 column
for i in titanic.columns:
    if titanic[i].dtype == 'int64' or titanic[i].dtype == 'float64':
        print(i, titanic[i].mean())


PassengerId 446.0
Survived 0.3838383838383838
Pclass 2.308641975308642
Age 29.69911764705882
SibSp 0.5230078563411896
Parch 0.38159371492704824
Fare 32.204207968574636


In [85]:
# achieve the same without an explicit loop
titanic.select_dtypes(include=['int64','float64']).mean()

PassengerId    446.000000
Survived         0.383838
Pclass           2.308642
Age             29.699118
SibSp            0.523008
Parch            0.381594
Fare            32.204208
dtype: float64
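
As a variation on the cell above, `select_dtypes` also accepts the generic selector `"number"`, which covers all integer and float widths rather than only `int64` and `float64`. A sketch on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mixing numeric and text columns
toy = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Fare": [7.25, 71.2833, 7.925],
    "Sex": ["male", "female", "female"],
})

numeric_means = toy.select_dtypes(include="number").mean()  # text column is skipped
same_means = toy.mean(numeric_only=True)                    # shortcut for this frame

print(numeric_means)
```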

In [86]:
# creating a single-column DataFrame from a flat list
data = pd.DataFrame([1,2,3,4], columns=['col1'])
data

Unnamed: 0,col1
0,1
1,2
2,3
3,4


In [87]:
# creating a DataFrame from a list of rows
data = pd.DataFrame([[1,2,3],[4,5,6],[1,2,3]], columns=['col1', 'col2', 'col3'])
data

Unnamed: 0,col1,col2,col3
0,1,2,3
1,4,5,6
2,1,2,3


In [88]:
# changing the row labels
data.index = ["row" + str(i) for i in range(0,len(data))]
data

Unnamed: 0,col1,col2,col3
row0,1,2,3
row1,4,5,6
row2,1,2,3


In [89]:
# axis=0 -> aggregate down the rows, giving one result per column
# axis=1 -> aggregate across the columns, giving one result per row
data.sum(axis=0)    # column sums

col1     6
col2     9
col3    12
dtype: int64

In [90]:
# axis=0 -> aggregate down the rows, giving one result per column
# axis=1 -> aggregate across the columns, giving one result per row
data.sum(axis=1)    # row sums

row0     6
row1    15
row2     6
dtype: int64

In [91]:
# mean
data.mean(axis=0)

col1    2.0
col2    3.0
col3    4.0
dtype: float64

In [92]:
data.mean(axis=1)

row0    2.0
row1    5.0
row2    2.0
dtype: float64
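
The `axis` argument also accepts the names `"index"` and `"columns"`, which some readers find easier to remember than 0 and 1. The sketch below rebuilds the same small frame used in the cells above.

```python
import pandas as pd

data = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [1, 2, 3]],
    columns=["col1", "col2", "col3"],
    index=["row0", "row1", "row2"],
)

per_column = data.sum(axis="index")   # same as axis=0: collapse the rows
per_row = data.sum(axis="columns")    # same as axis=1: collapse the columns

print(per_column.tolist())  # [6, 9, 12]
print(per_row.tolist())     # [6, 15, 6]
```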

In [93]:
# delete duplicate rows (the first occurrence is kept)
data.drop_duplicates()  # row2 duplicates row0, so row2 is dropped

Unnamed: 0,col1,col2,col3
row0,1,2,3
row1,4,5,6


In [94]:
# sorting the dataset by its index labels
# the labels are strings, so the order is lexicographic: 'row10' and 'row100' sort before 'row2'
titanic.sort_index()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,False
row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
row10,11,1,3,"Mr/Mrs Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S,False
row100,101,0,3,"Mr/Mrs Petranec, Miss. Matilda",female,28.0,0,0,349245,7.8958,,S,True
row101,102,0,3,"Mr/Mrs Petroff, Mr. Pastcho (""Pentcho"")",male,,0,0,349215,7.8958,,S,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
row95,96,0,3,"Mr/Mrs Shorney, Mr. Charles Joseph",male,,0,0,374910,8.0500,,S,False
row96,97,0,1,"Mr/Mrs Goldschmidt, Mr. George B",male,71.0,0,0,PC 17754,34.6542,A5,C,True
row97,98,1,1,"Mr/Mrs Greenfield, Mr. William Bertram",male,23.0,0,1,PC 17759,63.3583,D10 D12,C,False
row98,99,1,2,"Mr/Mrs Doling, Mrs. John T (Ada Julia Bone)",female,34.0,0,1,231919,23.0000,,S,True
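
If the string labels should be sorted by their numeric part instead, one option is the `key` parameter of `sort_index` (available in pandas 1.1 and later). The sketch below assumes every label has the form `row<number>`; the frame `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical frame whose labels follow the 'row<number>' pattern
toy = pd.DataFrame({"value": [10, 2, 1]}, index=["row10", "row2", "row1"])

lexicographic = toy.sort_index()  # plain string comparison

# Strip the 3-character 'row' prefix and compare the remainder as integers
numeric = toy.sort_index(key=lambda idx: idx.str[3:].astype(int))

print(lexicographic.index.tolist())  # ['row1', 'row10', 'row2']
print(numeric.index.tolist())        # ['row1', 'row2', 'row10']
```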


In [95]:
# reset_index() restores the default 0..n-1 index and moves the custom labels into a new 'index' column
titanic.reset_index().head()

Unnamed: 0,index,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
0,row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
1,row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
2,row2,3,1,3,"Mr/Mrs Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
3,row3,4,1,1,"Mr/Mrs Futrelle, Mrs. Jacques Heath (Lily May ...",female,35.0,1,0,113803,53.1,C123,S,True
4,row4,5,0,3,"Mr/Mrs Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True
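
When the old labels are not needed, `reset_index(drop=True)` discards them instead of adding an `index` column. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with custom row labels
toy = pd.DataFrame({"Fare": [7.25, 71.2833]}, index=["row0", "row1"])

kept = toy.reset_index()              # old labels become an 'index' column
dropped = toy.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())     # ['index', 'Fare']
print(dropped.columns.tolist())  # ['Fare']
```

Like most pandas methods shown here, `reset_index` returns a new DataFrame and leaves the original untouched unless the result is assigned back.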


In [96]:
# drop
# axis -> 0 for row
# axis -> 1 for column
titanic.drop('Age', axis=1).head() # dropped the age column

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,1,0,A/5 21171,7.25,,S,False
row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,1,0,PC 17599,71.2833,C85,C,True
row2,3,1,3,"Mr/Mrs Heikkinen, Miss. Laina",female,0,0,STON/O2. 3101282,7.925,,S,True
row3,4,1,1,"Mr/Mrs Futrelle, Mrs. Jacques Heath (Lily May ...",female,1,0,113803,53.1,C123,S,True
row4,5,0,3,"Mr/Mrs Allen, Mr. William Henry",male,0,0,373450,8.05,,S,True


In [97]:
# drop
# axis -> 0 for row (the default)
# axis -> 1 for column
titanic.drop('row3').head() # dropped the row 'row3'

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
row2,3,1,3,"Mr/Mrs Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
row4,5,0,3,"Mr/Mrs Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True
row5,6,0,3,"Mr/Mrs Moran, Mr. James",male,,0,0,330877,8.4583,,Q,False


In [98]:
# Dropping columns that have any missing values and displaying the first 5 rows
titanic.dropna(axis='columns').head()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,SibSp,Parch,Ticket,Fare,Age_greater_25
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,1,0,A/5 21171,7.25,False
row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,1,0,PC 17599,71.2833,True
row2,3,1,3,"Mr/Mrs Heikkinen, Miss. Laina",female,0,0,STON/O2. 3101282,7.925,True
row3,4,1,1,"Mr/Mrs Futrelle, Mrs. Jacques Heath (Lily May ...",female,1,0,113803,53.1,True
row4,5,0,3,"Mr/Mrs Allen, Mr. William Henry",male,0,0,373450,8.05,True


In [99]:
# The dropna() method is used to remove missing data (NaN values)
# By default, dropna() drops any row with at least one NaN value
titanic.dropna()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
row3,4,1,1,"Mr/Mrs Futrelle, Mrs. Jacques Heath (Lily May ...",female,35.0,1,0,113803,53.1000,C123,S,True
row6,7,0,1,"Mr/Mrs McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,True
row10,11,1,3,"Mr/Mrs Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S,False
row11,12,1,1,"Mr/Mrs Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
row871,872,1,1,"Mr/Mrs Beckwith, Mrs. Richard Leonard (Sallie ...",female,47.0,1,1,11751,52.5542,D35,S,True
row872,873,0,1,"Mr/Mrs Carlsson, Mr. Frans Olof",male,33.0,0,0,695,5.0000,B51 B53 B55,S,True
row879,880,1,1,"Mr/Mrs Potter, Mrs. Thomas Jr (Lily Alexenia W...",female,56.0,0,1,11767,83.1583,C50,C,True
row887,888,1,1,"Mr/Mrs Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,False
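
`dropna()` has a few parameters that narrow what gets removed. The sketch below shows `subset`, `how`, and `thresh` on a hypothetical four-row frame; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing values in two columns
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Cabin": [None, "C85", "C123", None],
    "Fare": [7.25, 71.28, 53.1, 8.05],
})

any_missing = toy.dropna()                                  # drop rows with any NaN
age_only = toy.dropna(subset=["Age"])                       # only look at Age
both_missing = toy.dropna(how="all", subset=["Age", "Cabin"])  # drop if both are NaN
at_least_two = toy.dropna(thresh=2)                         # keep rows with >= 2 non-null values

print(any_missing.index.tolist())   # [2]
print(age_only.index.tolist())      # [0, 2]
print(both_missing.index.tolist())  # [0, 1, 2]
print(at_least_two.index.tolist())  # [0, 1, 2]
```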


In [100]:
titanic.count() # Age, Cabin, and Embarked have null values

PassengerId          891
Survived             891
Pclass               891
Name_Of_Passenger    891
Sex                  891
Age                  714
SibSp                891
Parch                891
Ticket               891
Fare                 891
Cabin                204
Embarked             889
Age_greater_25       891
dtype: int64

In [101]:
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name_Of_Passenger,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_greater_25
row0,1,0,3,"Mr/Mrs Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,False
row1,2,1,1,"Mr/Mrs Cumings, Mrs. John Bradley (Florence Br...",female,38.0,1,0,PC 17599,71.2833,C85,C,True
row2,3,1,3,"Mr/Mrs Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,True
row3,4,1,1,"Mr/Mrs Futrelle, Mrs. Jacques Heath (Lily May ...",female,35.0,1,0,113803,53.1,C123,S,True
row4,5,0,3,"Mr/Mrs Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,True


In [102]:
# to replace the null values with something else, use fillna()
titanic.fillna(0).count() # fills every null value with zero, so all counts become 891

PassengerId          891
Survived             891
Pclass               891
Name_Of_Passenger    891
Sex                  891
Age                  891
SibSp                891
Parch                891
Ticket               891
Fare                 891
Cabin                891
Embarked             891
Age_greater_25       891
dtype: int64
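
Filling every column with 0 also puts a number into text columns such as `Cabin`. As an alternative sketch, `fillna` accepts a dict mapping column names to fill values; the choices below (median age, the string `"Unknown"`) are illustrative assumptions, not recommendations from the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one numeric and one text column
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Cabin": [None, "C85", None],
})

# Per-column fill values: median for Age, a placeholder label for Cabin
filled = toy.fillna({"Age": toy["Age"].median(), "Cabin": "Unknown"})

print(filled["Age"].tolist())    # [22.0, 24.0, 26.0]
print(filled["Cabin"].tolist())  # ['Unknown', 'C85', 'Unknown']
```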

## Pandas Series

In [1]:
import pandas as pd

In [3]:
pd.Series([1, 2, 3], index=['row1', 'row2', 'row3'], name='numbers')

row1    1
row2    2
row3    3
Name: numbers, dtype: int64
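
A Series supports the same label-based and position-based access as a DataFrame, along with element-wise arithmetic. A short sketch using the Series defined above:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["row1", "row2", "row3"], name="numbers")

by_label = s.loc["row2"]   # label-based access
by_position = s.iloc[0]    # position-based access
scaled = s * 10            # element-wise arithmetic keeps the index
as_frame = s.to_frame()    # one-column DataFrame named after the Series

print(by_label, by_position)     # 2 1
print(scaled.tolist())           # [10, 20, 30]
print(as_frame.columns.tolist()) # ['numbers']
```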