## **PANDAS**

Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame.

A Series is a one-dimensional, array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labelled axes (rows and columns).

In [1]:
import pandas as pd
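
The relationship between the two structures can be sketched as follows. This is a minimal illustration, not part of the original notebook; the names `frame` and `ages` and the sample values are assumptions.

```python
import pandas as pd

# Illustrative data; names and values are assumptions for this sketch.
frame = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 42]})

# Selecting a single column of a DataFrame returns a Series.
ages = frame["age"]
print(type(ages).__name__)   # Series
print(frame.shape)           # (2, 2): two rows, two columns
print(ages.shape)            # (2,): one dimension

# Each column keeps its own dtype, which is what "heterogeneous" refers to.
print(frame.dtypes)
```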

### **Series**

In [2]:
## Series: a pandas Series is a one-dimensional, array-like object that can hold any data type.
## It is similar to a column in a table.

data = [1,2,3,4,5]
series = pd.Series(data)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [4]:
## Creating a Series from a dictionary
data = {'A': 1, 'B' : 2, 'C' : 3}
series = pd.Series(data)
series
## The keys of the dictionary become the index

A    1
B    2
C    3
dtype: int64
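
Because the dictionary keys act as index labels, passing `index=` together with a dictionary selects and orders entries by key. The sketch below assumes a hypothetical key `'D'` that is absent from the dictionary, to show that a missing key yields `NaN` rather than an error.

```python
import pandas as pd

data = {"A": 1, "B": 2, "C": 3}

# index= picks entries by key, in the order given.
# "D" is a hypothetical key that is not in the dict, so its value is NaN.
subset = pd.Series(data, index=["C", "A", "D"])

print(subset.index.tolist())    # ['C', 'A', 'D']
print(subset.isna().tolist())   # [False, False, True]
```

The presence of `NaN` also changes the dtype from integer to float.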

In [11]:
data = [10,20,30]
indexes = ['A', 'B', 'C']

series = pd.Series(data, index= indexes)
print(type(series))
series

<class 'pandas.core.series.Series'>


A    10
B    20
C    30
dtype: int64
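
With a custom index in place, values can be read either by label or by position. The following is a small sketch of both, plus element-wise arithmetic; the variable names are illustrative.

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=["A", "B", "C"])

by_label = series.loc["B"]     # lookup by index label
by_position = series.iloc[1]   # lookup by integer position
print(by_label, by_position)   # 20 20

# Arithmetic is applied to every element and the index is preserved.
doubled = series * 2
print(doubled.index.tolist())  # ['A', 'B', 'C']
print(doubled.tolist())        # [20, 40, 60]
```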

### **DataFrame**

In [15]:
# Creating a DataFrame from a dictionary of lists.

data = {
    'Name': ['Krish','John','Jack'],
    'Age' : [25,30,45],
    'City': ['Bangalore', 'New York', 'Texas']
}
df = pd.DataFrame(data)
df

,Name,Age,City
0,Krish,25,Bangalore
1,John,30,New York
2,Jack,45,Texas


In [10]:
## Creating a DataFrame from a list of dictionaries
data = [
    {'Name' : 'Krish', 'Age' : 32, 'City' : 'Bangalore'}, 
    {'Name': 'John', 'Age' : 29, 'City' : 'Florida'}, 
    {'Name': 'Jack', 'Age' : 27, 'City' : 'New York'}, 
    {'Name': 'Bappy', 'Age' : 28, 'City' : 'Swedan'} 
]
df = pd.DataFrame(data)
print(type(df))
df

<class 'pandas.core.frame.DataFrame'>


,Name,Age,City
0,Krish,32,Bangalore
1,John,29,Florida
2,Jack,27,New York
3,Bappy,28,Swedan
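
One practical difference of the list-of-dictionaries form is that the records do not need identical keys. In the sketch below, the second record is a hypothetical one with no `'City'` key; pandas builds the columns from the union of keys and fills the gap with `NaN`.

```python
import pandas as pd

# Hypothetical records; the second one deliberately has no 'City' key.
records = [
    {"Name": "Krish", "Age": 32, "City": "Bangalore"},
    {"Name": "John", "Age": 29},
]
people = pd.DataFrame(records)

print(people.columns.tolist())         # ['Name', 'Age', 'City']
print(people["City"].isna().tolist())  # [False, True]
```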


In [12]:
## Accessing a specific element with at[row_label, column_label]
df.at[2, 'Age']

np.int64(27)

In [13]:
df.at[2, "Name"]

'Jack'

In [None]:
## Accessing a specific element with iat[row_position, column_position]
df.iat[2,2]

'New York'
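
`at` and `iat` each return a single value. For whole rows or rectangular blocks, the related `loc` (label-based) and `iloc` (position-based) accessors can be used. A short sketch, rebuilding three rows of the DataFrame above so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "Age": [32, 29, 27],
    "City": ["Bangalore", "Florida", "New York"],
})

# at uses labels, iat uses integer positions; both return one scalar.
print(df.at[2, "Age"], df.iat[2, 1])   # 27 27

# loc and iloc also accept slices and lists.
row = df.loc[2]               # the row labelled 2, as a Series
block = df.iloc[0:2, 0:2]     # first two rows and first two columns
print(row["City"])            # New York
print(block.shape)            # (2, 2)
```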

### **Data Manipulation**

In [17]:
## Data Manipulation with DataFrames
df

,Name,Age,City
0,Krish,25,Bangalore
1,John,30,New York
2,Jack,45,Texas


In [22]:
## Adding a New Column
df['Salary'] = [50000, 60000, 70000]
df

,Name,Age,City,Salary
0,Krish,25,Bangalore,50000
1,John,30,New York,60000
2,Jack,45,Texas,70000
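
When a column is added from a list, the list needs one value per row. The sketch below shows what happens otherwise, and that a single scalar is broadcast to every row. The `'Country'` column and its value are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Krish", "John", "Jack"], "Age": [25, 30, 45]})

# A scalar is repeated for every row (hypothetical column for illustration).
df["Country"] = "Unknown"

# A list whose length differs from the number of rows is rejected.
try:
    df["Salary"] = [50000, 60000]   # two values for three rows
except ValueError as err:
    print("rejected:", err)

print(df.columns.tolist())   # ['Name', 'Age', 'Country']
```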


In [23]:
## Removing a Column
df.drop('Salary', axis=1, inplace=True)   ## axis=1 targets columns; the default, axis=0, targets rows.
df

,Name,Age,City
0,Krish,25,Bangalore
1,John,30,New York
2,Jack,45,Texas
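
The `inplace=True` argument modifies `df` directly. Without it, `drop` returns a new DataFrame and leaves the original unchanged, as this small sketch with illustrative data shows:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Krish", "John"], "Salary": [50000, 60000]})

# columns="Salary" has the same effect as drop("Salary", axis=1).
trimmed = df.drop(columns="Salary")

print(df.columns.tolist())        # ['Name', 'Salary']  (original untouched)
print(trimmed.columns.tolist())   # ['Name']
```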


In [24]:
## Increase every value in the Age column by 1
df['Age'] = df['Age'] + 1
df

,Name,Age,City
0,Krish,26,Bangalore
1,John,31,New York
2,Jack,46,Texas


In [25]:
## Removing a row
df.drop(0, inplace=True)
df

,Name,Age,City
1,John,31,New York
2,Jack,46,Texas
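
Note that the remaining rows keep their original labels (1 and 2) after the drop. If a fresh 0-based index is wanted, `reset_index(drop=True)` is one way to get it. A sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Krish", "John", "Jack"], "Age": [26, 31, 46]})

remaining = df.drop(0)            # remove the row labelled 0
print(remaining.index.tolist())   # [1, 2]

# Renumber the rows from 0 and discard the old labels.
renumbered = remaining.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
print(renumbered.at[0, "Name"])   # John
```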


In [28]:
## Importing data from a CSV file
df = pd.read_csv(r"C:\Users\singh\OneDrive\Desktop\Python\Data\titanic - titanic.csv")
df

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
413,1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
414,1306,1,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9000,C105,C
415,1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
416,1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S
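
The path above is specific to one machine, so the following sketch reads from an in-memory string instead. `read_csv` accepts any file-like object; here three rows copied from the output above, trimmed to five columns, stand in for the file. It also shows that an empty field is read as `NaN`.

```python
import io

import pandas as pd

# Three rows taken from the output above, reduced to five columns.
csv_text = '''PassengerId,Survived,Pclass,Name,Age
892,0,3,"Kelly, Mr. James",34.5
893,1,3,"Wilkes, Mrs. James (Ellen Needs)",47.0
1305,0,3,"Spector, Mr. Woolf",
'''

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                 # (3, 5)
print(sample["Age"].isna().sum())   # 1
print(sample.at[0, "Name"])         # Kelly, Mr. James
```

The quoted `Name` values contain commas, which the parser treats as part of the field rather than as separators.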


In [None]:
df.head()   ## Shows the first five rows

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,0,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,0,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,0,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,1,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S


In [30]:
df.tail()   ## Shows the last five rows

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
413,1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.05,,S
414,1306,1,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9,C105,C
415,1307,0,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.25,,S
416,1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.05,,S
417,1309,0,3,"Peter, Master. Michael J",male,,1,1,2668,22.3583,,C
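
Both `head` and `tail` take an optional row count; five is only the default. A quick sketch on a small illustrative DataFrame:

```python
import pandas as pd

numbers = pd.DataFrame({"n": range(10)})

print(numbers.head(3)["n"].tolist())   # [0, 1, 2]
print(numbers.tail(2)["n"].tolist())   # [8, 9]
print(len(numbers.head()))             # 5
```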


In [None]:
## Display the data type of each column
print("Data Types:\n", df.dtypes)

## Describe the DataFrame
print("Statistical Summary:\n", df.describe())

## Group by a column and perform an aggregation
# grouped = df.groupby('Age')
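
The commented-out `groupby` line above stops before any aggregation is applied. One way such a step might look is sketched below, on a few hypothetical rows that reuse some of the Titanic column names. The grouping column (`Sex`) and the aggregations are assumptions for illustration, not the notebook's own choice.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the Titanic data.
passengers = pd.DataFrame({
    "Sex": ["male", "female", "male", "female"],
    "Survived": [0, 1, 0, 1],
    "Fare": [8.0, 7.0, 10.0, 13.0],
})

# Split the rows by 'Sex', then average 'Fare' within each group.
mean_fare = passengers.groupby("Sex")["Fare"].mean()
print(mean_fare["female"], mean_fare["male"])   # 10.0 9.0

# Several aggregations at once, one named output column per aggregation.
summary = passengers.groupby("Sex").agg(
    survivors=("Survived", "sum"),
    max_fare=("Fare", "max"),
)
print(summary.loc["female", "survivors"])   # 2
```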

In [32]:
df.describe()

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,418.0,418.0,418.0,332.0,418.0,418.0,417.0
mean,1100.5,0.363636,2.26555,30.27259,0.447368,0.392344,35.627188
std,120.810458,0.481622,0.841838,14.181209,0.89676,0.981429,55.907576
min,892.0,0.0,1.0,0.17,0.0,0.0,0.0
25%,996.25,0.0,1.0,21.0,0.0,0.0,7.8958
50%,1100.5,0.0,3.0,27.0,0.0,0.0,14.4542
75%,1204.75,1.0,3.0,39.0,1.0,0.0,31.5
max,1309.0,1.0,3.0,76.0,8.0,9.0,512.3292
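
Two things stand out in this summary: only numeric columns appear, and `count` is lower for `Age` (332) and `Fare` (417) than for the other columns (418), which points to missing values. The sketch below reproduces both effects on a tiny hypothetical DataFrame and counts the gaps directly with `isna().sum()`.

```python
import pandas as pd

# Hypothetical frame with one missing Age, mirroring the gap in the counts above.
small = pd.DataFrame({
    "Age": [34.5, 47.0, None],
    "Sex": ["male", "female", "male"],
})

stats = small.describe()            # numeric columns only by default
print(stats.columns.tolist())       # ['Age']
print(stats.loc["count", "Age"])    # 2.0

# Number of missing values per column.
missing = small.isna().sum()
print(missing["Age"], missing["Sex"])   # 1 0
```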
