<a href="https://neuronize.dev/pandas-101-learn-pandas-in-10-minutes">Learn Pandas in 10 minutes</a>

pandas is a Python library for data analysis and manipulation.

In [9]:
import pandas as pd
import numpy as np
import os

Data can be one-dimensional or multi-dimensional. First, I will give you an overview of one-dimensional data (the pandas Series), then we’ll dive deep into multi-dimensional data (the pandas DataFrame).

Series
A pandas Series is a one-dimensional, labeled array that can store values of any type, such as numbers, strings, or booleans. You can create one with pd.Series() from a list, a dictionary, or a NumPy array.

For example, you can create a series like this:

In [10]:
# creating a Python list
list1 = [1, 2, 3, 4, 5] 
list1

[1, 2, 3, 4, 5]

In [11]:
# creating a Python list
temps_list = [1, 2, 3, 4, 5] 

# creating a dictionary
temps_dict = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50} 

# creating a numpy array
temps_numpy = np.array([100, 200, 300, 400, 500]) 

# creating a series from list
list_series = pd.Series(temps_list) 

# creating a series from dictionary
dict_series = pd.Series(temps_dict)

# creating a series from numpy array
numpy_series = pd.Series(temps_numpy)

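Where the index labels come from depends on the source: a list or NumPy array gets a default integer index starting at 0, while a dictionary's keys become the labels. The sketch below rebuilds the three Series from the cell above to check this; the `labeled` Series and its `"a"`, `"b"`, `"c"` labels are hypothetical, added only to show the `index=` parameter of `pd.Series`.

```python
import numpy as np
import pandas as pd

# Same kinds of inputs as in the cell above: a list, a dict and a NumPy array
list_series = pd.Series([1, 2, 3, 4, 5])
dict_series = pd.Series({1: 10, 2: 20, 3: 30, 4: 40, 5: 50})
numpy_series = pd.Series(np.array([100, 200, 300, 400, 500]))

# Lists and arrays get a default integer index 0..n-1
print(list_series.index.tolist())    # [0, 1, 2, 3, 4]
print(numpy_series.index.tolist())   # [0, 1, 2, 3, 4]

# Dictionary keys become the index labels
print(dict_series.index.tolist())    # [1, 2, 3, 4, 5]

# Any source can be given custom labels through index= (hypothetical labels)
labeled = pd.Series([1, 2, 3], index=["a", "b", "c"])
print(labeled["b"])                  # 2
```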

In [12]:
# creating a dictionary
dict1 = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50} 
dict1

{1: 10, 2: 20, 3: 30, 4: 40, 5: 50}

## pd.Series() method

In [13]:
dict2 = {1: "apple", 2: "banana", 3: "coward", 4: "darling"}
# creating a series from the dictionary; the keys become the index labels
dict2_series = pd.Series(dict2)
dict2_series

1      apple
2     banana
3     coward
4    darling
dtype: object
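Because the dictionary keys start at 1, the index labels of this Series no longer match the positions of its values. A small sketch of the difference, using `.loc` (lookup by label) and `.iloc` (lookup by position); the variable name `words` is just for illustration:

```python
import pandas as pd

words = pd.Series({1: "apple", 2: "banana", 3: "coward", 4: "darling"})

# .loc looks up by index label: label 1 is the first entry here
print(words.loc[1])    # apple

# .iloc looks up by position (0-based): position 1 is the second entry
print(words.iloc[1])   # banana
```

Using `.loc` or `.iloc` explicitly avoids guessing whether a plain `words[1]` means a label or a position.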

## numpy array method

In [14]:
# creating a numpy array
numpy1 = np.array([100, 200, 300, 400, 500]) 
numpy1

array([100, 200, 300, 400, 500])

In [15]:
# creating a series from dictionary
dict_series = pd.Series(temps_dict)
dict_series

1    10
2    20
3    30
4    40
5    50
dtype: int64

In [16]:
# creating a series from numpy array
numpy_series = pd.Series(temps_numpy)
numpy_series

0    100
1    200
2    300
3    400
4    500
dtype: int32

As you can see, each of these Series is one-dimensional: a single column of values paired with an index.
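The one-dimensionality can also be checked through the `ndim` and `shape` attributes. The `int32` dtype in the output above most likely comes from the platform's default NumPy integer (for example, NumPy 1.x on Windows uses 32-bit integers); passing `dtype=` is one way to pin it down, sketched here:

```python
import numpy as np
import pandas as pd

numpy_series = pd.Series(np.array([100, 200, 300, 400, 500]))

print(numpy_series.ndim)    # 1 -> one dimension
print(numpy_series.shape)   # (5,) -> five values along that dimension

# Request a fixed integer width instead of relying on the platform default
fixed = pd.Series(np.array([100, 200, 300, 400, 500]), dtype="int64")
print(fixed.dtype)          # int64
```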

Now, let’s learn about multi-dimensional data.

DataFrame
A pandas dataframe is a data structure that allows you to store and manipulate tabular data in Python. It is similar to a spreadsheet or a database table, but with more features and flexibility.

You can create a dataframe from various sources, such as lists, dictionaries, files, or web pages. A dataframe has rows and columns, each with a label. You can access and modify the data in a dataframe using various methods and attributes.

For example, you can create a dataframe of names and ages of people like this:

In [17]:
# creating a dictionary having names and ages
names_ages_dict = {
                    "Name": ["Alice", "Bob", "John", "Doe"],
                    "Age": [18, 24, 35, 11]
                  }
# creating dataframe from that dictionary
dict_dataframe = pd.DataFrame(names_ages_dict)
dict_dataframe

    Name  Age
0  Alice   18
1    Bob   24
2   John   35
3    Doe   11
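A few attributes describe the DataFrame's rows and columns, and selecting a single column returns a Series. The sketch below rebuilds the same names-and-ages frame; the `Age_next_year` column is a hypothetical addition to show how a column can be modified or created.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "John", "Doe"], "Age": [18, 24, 35, 11]})

print(df.shape)            # (4, 2) -> 4 rows, 2 columns
print(list(df.columns))    # ['Name', 'Age']

# A single column of a DataFrame is a pandas Series
print(isinstance(df["Age"], pd.Series))   # True

# Add a new column computed from an existing one (hypothetical column name)
df["Age_next_year"] = df["Age"] + 1
print(df["Age_next_year"].tolist())       # [19, 25, 36, 12]
```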


In [18]:
# read_csv(file_path) loads a CSV file into a DataFrame
data = pd.read_csv("Titanic-Dataset.csv")
data

,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1.0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0000,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO"
1,1.0,1.0,"Allison, Master. Hudson Trevor",male,0.9167,1.0,2.0,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,1.0,0.0,"Allison, Miss. Helen Loraine",female,2.0000,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1.0,0.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0000,1.0,2.0,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1.0,0.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0000,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1305,3.0,0.0,"Zabour, Miss. Thamine",female,,1.0,0.0,2665,14.4542,,C,,,
1306,3.0,0.0,"Zakarian, Mr. Mapriededer",male,26.5000,0.0,0.0,2656,7.2250,,C,,304.0,
1307,3.0,0.0,"Zakarian, Mr. Ortin",male,27.0000,0.0,0.0,2670,7.2250,,C,,,
1308,3.0,0.0,"Zimmerman, Mr. Leo",male,29.0000,0.0,0.0,315082,7.8750,,S,,,
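Empty fields in the CSV, such as the missing ages and cabins above, are read in as `NaN`. Since the Titanic file itself is not included here, this sketch feeds `read_csv` a tiny in-memory CSV built from two rows of the output above (only the `name`, `age` and `fare` columns); `read_csv` accepts any file-like object such as `io.StringIO`, not just a path.

```python
import io

import pandas as pd

# Two rows copied from the output above, reduced to three columns
csv_text = """name,age,fare
"Allen, Miss. Elisabeth Walton",29.0,211.3375
"Zabour, Miss. Thamine",,14.4542
"""

sample = pd.read_csv(io.StringIO(csv_text))

# The empty age field for Zabour became NaN
print(sample["age"].isna().sum())   # 1
print(sample.shape)                 # (2, 3)
```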


In [19]:
# this will output the top 5 rows, since n defaults to 5
data.head()


,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1.0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2.0,,"St Louis, MO"
1,1.0,1.0,"Allison, Master. Hudson Trevor",male,0.9167,1.0,2.0,113781,151.55,C22 C26,S,11.0,,"Montreal, PQ / Chesterville, ON"
2,1.0,0.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1.0,0.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.55,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1.0,0.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"


In [20]:
# this will output the top 2 rows, since I've set n = 2
data.head(n=2)


,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1.0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO"
1,1.0,1.0,"Allison, Master. Hudson Trevor",male,0.9167,1.0,2.0,113781,151.55,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"


In [21]:
# this will output the bottom 5 rows, since n defaults to 5
data.tail()


,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
1305,3.0,0.0,"Zabour, Miss. Thamine",female,,1.0,0.0,2665.0,14.4542,,C,,,
1306,3.0,0.0,"Zakarian, Mr. Mapriededer",male,26.5,0.0,0.0,2656.0,7.225,,C,,304.0,
1307,3.0,0.0,"Zakarian, Mr. Ortin",male,27.0,0.0,0.0,2670.0,7.225,,C,,,
1308,3.0,0.0,"Zimmerman, Mr. Leo",male,29.0,0.0,0.0,315082.0,7.875,,S,,,
1309,,,,,,,,,,,,,,
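The last row returned by `tail()` (index 1309) has no values at all, possibly a row of empty fields at the end of the CSV file. One way to drop such rows is `dropna(how="all")`, which removes only rows where every value is missing. The small frame below is a hypothetical stand-in for that situation; the block also shows that `head()` accepts a negative `n`, meaning "all rows except the last |n|".

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose last row is entirely empty, like row 1309 above
df = pd.DataFrame({"name": ["Zimmerman, Mr. Leo", np.nan], "age": [29.0, np.nan]})

# how="all" drops a row only if every one of its values is missing
cleaned = df.dropna(how="all")
print(len(df), len(cleaned))   # 2 1

# A negative n in head() returns all rows except the last |n| rows
print(len(df.head(-1)))        # 1
```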


## data.select_dtypes(include='number').columns

In [22]:
number_columns = data.select_dtypes(include='number').columns
data[number_columns].min()

pclass      1.0000
survived    0.0000
age         0.1667
sibsp       0.0000
parch       0.0000
fare        0.0000
body        1.0000
dtype: float64

In [23]:
number_columns

Index(['pclass', 'survived', 'age', 'sibsp', 'parch', 'fare', 'body'], dtype='object')
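`select_dtypes(include='number')` keeps only the integer and float columns, which is why text columns such as `name`, `sex` and `ticket` are missing from the index above. A minimal sketch on a hypothetical three-column frame, also showing the opposite filter with `exclude=`:

```python
import pandas as pd

# Hypothetical mini-frame with one text column and two numeric columns
df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [18.0, 24.5],
    "pclass": [1, 3],
})

# include="number" keeps int and float columns; the string column is left out
print(df.select_dtypes(include="number").columns.tolist())   # ['age', 'pclass']

# exclude="number" keeps everything that is not numeric
print(df.select_dtypes(exclude="number").columns.tolist())   # ['name']
```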

In [24]:
data[number_columns].max()

pclass        3.0000
survived      1.0000
age          80.0000
sibsp         8.0000
parch         9.0000
fare        512.3292
body        328.0000
dtype: float64

To get the mean value of every numeric column:

In [25]:
data[number_columns].mean()

pclass        2.294882
survived      0.381971
age          29.881135
sibsp         0.498854
parch         0.385027
fare         33.295479
body        160.809917
dtype: float64
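In pandas 2.x, calling `mean()` directly on a frame that still contains text columns such as `name` raises a `TypeError`, which is presumably why the numeric columns are selected first. The `numeric_only=True` argument is an alternative that skips non-numeric columns; this sketch, on a hypothetical two-row frame, checks that both approaches agree.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric ones
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [18.0, 24.0], "fare": [7.25, 10.75]})

# Approach used above: select numeric columns, then take the mean
number_columns = df.select_dtypes(include="number").columns
by_selection = df[number_columns].mean()

# Alternative: let mean() skip non-numeric columns itself
by_flag = df.mean(numeric_only=True)

print(by_selection.equals(by_flag))   # True
print(by_flag["age"])                 # 21.0
```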

In [26]:
data[number_columns].median()

pclass        3.0000
survived      0.0000
age          28.0000
sibsp         0.0000
parch         0.0000
fare         14.4542
body        155.0000
dtype: float64
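`describe()` collects several of these statistics (count, mean, standard deviation, min, quartiles, max) in one table; its `50%` row is the median. A short sketch using the ages from the names-and-ages example earlier:

```python
import pandas as pd

df = pd.DataFrame({"age": [18, 24, 35, 11]})
summary = df.describe()

# Row labels of the summary table
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# The 50% row matches median()
print(summary.loc["50%", "age"], df["age"].median())   # 21.0 21.0
```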

To get the standard deviation of every numeric column:

In [None]:
data[number_columns].std()
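Note that pandas' `std()` computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`), so the two can give different numbers for the same data. A small sketch with the ages from the earlier example:

```python
import numpy as np
import pandas as pd

ages = pd.Series([18, 24, 35, 11])

# pandas: sample standard deviation (divides by n - 1)
print(ages.std())

# NumPy: population standard deviation (divides by n)
print(np.std(ages.to_numpy()))

# Passing ddof=0 makes pandas match NumPy's default
print(np.isclose(ages.std(ddof=0), np.std(ages.to_numpy())))   # True
```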


In [None]:
# loc[row_label, column_label]: returns the value in the "name" column of the row labeled 0
data.loc[0, "name"]
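`loc` works with row and column labels; with the Titanic data the row label 0 happens to also be the first position. To make the distinction visible, this sketch gives a small hypothetical frame the index labels 10 and 11 and compares `loc` with `iloc` (integer positions) and `at` (a fast accessor for one labeled value).

```python
import pandas as pd

# Hypothetical frame with non-default index labels 10 and 11
df = pd.DataFrame(
    {
        "name": ["Allen, Miss. Elisabeth Walton", "Allison, Master. Hudson Trevor"],
        "age": [29.0, 0.9167],
    },
    index=[10, 11],
)

# loc uses labels: row labeled 10, column "name"
print(df.loc[10, "name"])   # Allen, Miss. Elisabeth Walton

# iloc uses positions: first row, first column
print(df.iloc[0, 0])        # Allen, Miss. Elisabeth Walton

# at reads a single value by label
print(df.at[11, "age"])     # 0.9167
```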


In [None]:
data.head()

In [None]:
# select all rows (:) and only the "name" and "age" columns
data.loc[:, ["name", "age"]]
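The row part of `loc` does not have to be `:`; a boolean condition can be used to keep only matching rows while selecting the same columns. A sketch on a small stand-in frame with lowercase `name` and `age` columns, assuming the goal is to keep people aged 18 or older:

```python
import pandas as pd

# Small stand-in frame with the same column names as the Titanic data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "John", "Doe"],
    "age": [18, 24, 35, 11],
})

# A boolean mask as the row selector keeps rows where the condition is True
adults = df.loc[df["age"] >= 18, ["name", "age"]]
print(adults["name"].tolist())   # ['Alice', 'Bob', 'John']
```

On the Titanic data, the same pattern would be `data.loc[data["age"] >= 18, ["name", "age"]]`; rows with a missing age compare as `False` and are left out.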
