# Pandas Series

A Pandas Series is a one-dimensional labeled array-like object that can hold data of any type.

A Pandas Series can be thought of as a column in a spreadsheet or a single column of a DataFrame. It consists of two main components: the labels and the data.

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Create a Pandas Series from a list
data = [10, 20, 30, 40, 50]

In [3]:
my_series = pd.Series(data)
my_series

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [4]:
# By default, the labels of a Series are integer positions starting from 0, as in a DataFrame or an array
my_series[1]

20

In [5]:
a = [1, 3, 5]

In [6]:
# Create a series and specify labels
my_series_1 = pd.Series(a, index = ["x", "y", "z"])
my_series_1

x    1
y    3
z    5
dtype: int64

In [8]:
my_series_1['z']

5
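The two components of a Series, the labels and the data, can be inspected separately. The sketch below reuses the values from the example above; the variable names are only illustrative. It also shows that .loc looks up by label while .iloc looks up by position.

```python
import pandas as pd

# Same data and labels as the example above
s = pd.Series([1, 3, 5], index=["x", "y", "z"])

labels = list(s.index)     # the labels of the Series
values = s.to_numpy()      # the data as a NumPy array

by_label = s.loc["z"]      # label-based lookup
by_position = s.iloc[2]    # position-based lookup (third element)

print(labels)
print(values)
print(by_label, by_position)
```

Using .loc and .iloc explicitly avoids ambiguity when the labels themselves are integers.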

In [9]:
# Create Series From a Python Dictionary
grades = {
    'Semester 1': 3.25,
    'Semester 2': 3.28,
    'Semester 3': 3.75,
}

In [10]:
my_series = pd.Series(grades)
my_series

Semester 1    3.25
Semester 2    3.28
Semester 3    3.75
dtype: float64

In [11]:
# The keys of the dictionary have become the labels
my_series['Semester 2']

3.28

In [12]:
# Select specific items from the dictionary by passing their keys to the index argument
my_series_1 = pd.Series(grades, index = ["Semester 1", "Semester 2"])
my_series_1

Semester 1    3.25
Semester 2    3.28
dtype: float64
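When the index argument contains a label that is not a key of the dictionary, Pandas keeps the label and fills its value with NaN. A minimal sketch, where "Semester 4" is a hypothetical label chosen only to show this behavior:

```python
import pandas as pd

grades = {"Semester 1": 3.25, "Semester 2": 3.28, "Semester 3": 3.75}

# "Semester 4" is not a key in the dictionary, so its value is missing
partial = pd.Series(grades, index=["Semester 1", "Semester 4"])

print(partial)
print(partial.isna())
```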

# Pandas DataFrame

A DataFrame is a two-dimensional, table-like data structure in which the data is organized in rows and columns. Unlike a two-dimensional array, each column of a DataFrame can hold a different data type.

## Create a Pandas DataFrame

We can create a Pandas DataFrame in the following ways:

* Using a Python dictionary
* Using a Python list
* From a file
* Creating an empty DataFrame

In [13]:
# Using Python Dictionary
data = {
    'Name':['John', 'Alice', 'Bob'],
    'Age':[25, 30, 35],
    'City':['New York', 'London', 'Paris']
}
df = pd.DataFrame(data)
df

    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris


In [14]:
# Using Python List
data = [
    ['John', 25, 'New York'],
    ['Alice', 30, 'London'],
    ['Bob', 35, 'Paris']
]
df = pd.DataFrame(data)
df

       0   1         2
0   John  25  New York
1  Alice  30    London
2    Bob  35     Paris
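A DataFrame built from a list of rows gets the column labels 0, 1, 2 by default. Passing the columns argument names them instead. A short sketch using the same rows (the variable names are illustrative):

```python
import pandas as pd

rows = [
    ["John", 25, "New York"],
    ["Alice", 30, "London"],
    ["Bob", 35, "Paris"],
]

# Name the columns explicitly instead of relying on the default 0, 1, 2
people = pd.DataFrame(rows, columns=["Name", "Age", "City"])
print(people)
```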


In [None]:
# from a file
df = pd.read_csv('data.csv')
df

In [15]:
# Other readers follow the same pattern:
# pd.read_json()
# pd.read_excel()
# pd.read_sql()
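Since 'data.csv' is not included here, the sketch below reads the same kind of content from an in-memory string instead. The CSV text is made up for illustration; read_csv accepts a file path or any file-like object.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as 'data.csv'
csv_text = "Name,Age,City\nJohn,25,New York\nAlice,30,London\n"

# The first line is used as the header row by default
from_file = pd.read_csv(io.StringIO(csv_text))
print(from_file)
```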

In [17]:
# Create an Empty Data Frame
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


# Pandas Index

In Pandas, an index refers to the labeled array that identifies rows or columns in a DataFrame or a Series.

## Create Indexes in Pandas

Pandas offers several ways to create indexes. Some common methods are as follows:

* Default Index
* Setting Index
* Creating a Range Index

In [18]:
# Default index
data = {
    'Name':['John', 'Alice', 'Bob'],
    'Age':[25, 28, 32],
    'City':['New York', 'London', 'Paris'],
}

In [20]:
df = pd.DataFrame(data)
print(df) # Here we have default index - 0,1,2

    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris


In [28]:
# Setting Index
data = {
    'Name':['John', 'Alice', 'Bob'],
    'Age':[25, 28, 32],
    'City':['New York', 'London', 'Paris'],
}

In [29]:
df = pd.DataFrame(data)
df

    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris


In [30]:
# Set the Name column as Index
df.set_index('Name', inplace=True)
df

       Age      City
Name
John    25  New York
Alice   28    London
Bob     32     Paris


In [33]:
df['Age']

Name
John     25
Alice    28
Bob      32
Name: Age, dtype: int64

In [34]:
# Slice rows by label; both end labels are included
df.loc['John':'Alice']

       Age      City
Name
John    25  New York
Alice   28    London
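set_index can also be used without inplace=True, in which case it returns a new DataFrame and leaves the original unchanged. The sketch below assumes the same data as above and then looks up a row by its label with .loc; the variable names are illustrative.

```python
import pandas as pd

data = {
    "Name": ["John", "Alice", "Bob"],
    "Age": [25, 28, 32],
    "City": ["New York", "London", "Paris"],
}
df = pd.DataFrame(data)

# Returns a new DataFrame; df still has Name as an ordinary column
indexed = df.set_index("Name")

alice = indexed.loc["Alice"]             # the whole row as a Series
alice_age = indexed.loc["Alice", "Age"]  # a single value

print(alice)
print(alice_age)
```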


In [36]:
# Creating a Range Index
data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age':[25, 28, 32],
    'City':['New York', 'London', 'Paris'],
}

In [37]:
# create a range index
df = pd.DataFrame(data, index = pd.RangeIndex(5, 8, name = 'Index'))
print(df)

        Name  Age      City
Index                      
5       John   25  New York
6      Alice   28    London
7        Bob   32     Paris
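RangeIndex takes start, stop and step in the same way as Python's range, so the stop value is excluded. A brief sketch; the second index uses values chosen only for illustration:

```python
import pandas as pd

# Same index as in the example above: labels 5, 6, 7
idx = pd.RangeIndex(5, 8, name="Index")
labels = list(idx)

# A step other than 1 is also allowed
even = pd.RangeIndex(0, 10, 2)
even_labels = list(even)

print(labels)
print(even_labels)
```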


## Modifying Indexes in Pandas

Pandas allows us to make changes to indexes easily. Some common modification operations are:

* Renaming Index
* Resetting Index

In [38]:
# Renaming Index
data = {
    'Name':['John', 'Alice', 'Bob'],
    'Age':[25, 28, 32],
    'City':['New York', 'London', 'Paris'],
}

In [39]:
df = pd.DataFrame(data)
df

    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris


In [41]:
# rename index
df.rename(index = {0:'A', 1:'B', 2:'C'}, inplace = True)
df

    Name  Age      City
A   John   25  New York
B  Alice   28    London
C    Bob   32     Paris


In [42]:
# reset index
df.reset_index(inplace=True)
df

   index   Name  Age      City
0      A   John   25  New York
1      B  Alice   28    London
2      C    Bob   32     Paris
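By default, reset_index moves the old labels into a new column named index. If the old labels are not needed, drop=True discards them. A minimal sketch with the same data (variable names are illustrative):

```python
import pandas as pd

data = {
    "Name": ["John", "Alice", "Bob"],
    "Age": [25, 28, 32],
    "City": ["New York", "London", "Paris"],
}
df = pd.DataFrame(data).rename(index={0: "A", 1: "B", 2: "C"})

kept = df.reset_index()              # old labels become an 'index' column
dropped = df.reset_index(drop=True)  # old labels are discarded

print(kept)
print(dropped)
```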


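## Access Rows by Index

We can access rows of a DataFrame by integer position using the .iloc property. Passing both a row position and a column position, as in the example below, returns a single value.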

In [43]:
data = {
    'Name':['John', 'Alice', 'Bob'],
    'Age':[25, 28, 32],
    'City':['New York', 'London', 'Paris'],
}

In [44]:
df = pd.DataFrame(data)
df

    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris


In [47]:
df.iloc[1, 1]

28
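The example above returns one cell. The sketch below, assuming the same data, shows how .iloc can also return a whole row or a range of rows, and how .loc reaches the same cell by label instead of position.

```python
import pandas as pd

data = {
    "Name": ["John", "Alice", "Bob"],
    "Age": [25, 28, 32],
    "City": ["New York", "London", "Paris"],
}
df = pd.DataFrame(data)

second_row = df.iloc[1]       # the row at position 1, returned as a Series
first_two = df.iloc[0:2]      # rows at positions 0 and 1 (end position excluded)
cell = df.iloc[1, 1]          # row position 1, column position 1
same_cell = df.loc[1, "Age"]  # the same cell, addressed by row and column label

print(second_row)
print(first_two)
print(cell, same_cell)
```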

## Get DataFrame Index

We can access the DataFrame Index using the index attribute. For example,

In [48]:
data = {
    'Name':['John', 'Alice', 'Bob'],
    'Age':[25, 28, 32],
    'City':['New York', 'London', 'Paris'],
}

In [49]:
df = pd.DataFrame(data)
df

    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris


In [50]:
# return index object
print(df.index)

RangeIndex(start=0, stop=3, step=1)


In [51]:
# return index values
print(df.index.values)

[0 1 2]
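Besides the values attribute, the index can be converted with tolist() or to_numpy(). A short sketch, using a reduced version of the data above:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Alice", "Bob"], "Age": [25, 28, 32]})

as_list = df.index.tolist()     # plain Python list of the labels
as_array = df.index.to_numpy()  # NumPy array of the labels

print(as_list)
print(as_array)
```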


# Pandas Array

An array allows us to store a collection of multiple values in a single data structure.

A Pandas array, created with pd.array(), is a one-dimensional array that stores values of a single data type. When no dtype is given, Pandas chooses a nullable type such as Int64 for integers, which can represent missing values, something a plain NumPy integer array cannot do.

In [52]:
data = [1, 3, 5, 7]

In [53]:
array = pd.array(data)
array

<IntegerArray>
[1, 3, 5, 7]
Length: 4, dtype: Int64

In [54]:
array1 = pd.array([1, 2, 3, 4, 5, 6])
array1

<IntegerArray>
[1, 2, 3, 4, 5, 6]
Length: 6, dtype: Int64
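The capital-I Int64 dtype in the outputs above is a nullable integer type. The sketch below passes a list with a missing entry (values made up for illustration) to show that the array stays an integer array and that the missing value is carried through arithmetic.

```python
import pandas as pd

# None marks a missing entry; the dtype is still inferred as Int64
with_missing = pd.array([1, None, 3])

missing_mask = with_missing.isna()  # True where the value is missing
plus_one = with_missing + 1         # the missing entry stays missing
total = with_missing.sum()          # missing values are skipped by default

print(with_missing)
print(plus_one)
print(total)
```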

In [55]:
# Explicitly Specify Array Elements Data Type
int_array = pd.array([1,2,3,4,5], dtype='int')
int_array

<PandasArray>
[1, 2, 3, 4, 5]
Length: 5, dtype: int64

In [56]:
float_array = pd.array([1.1, 2.1, 3.1, 4.1, 5.1], dtype = 'float')
float_array

<PandasArray>
[1.1, 2.1, 3.1, 4.1, 5.1]
Length: 5, dtype: float64

In [57]:
string_array = pd.array(['rajesh', 'kumar', 'tom', 'jerry'], dtype = 'string')
string_array

<StringArray>
['rajesh', 'kumar', 'tom', 'jerry']
Length: 4, dtype: string

In [58]:
bool_array = pd.array([True, True, False, True, False], dtype = 'bool')
bool_array

<PandasArray>
[True, True, False, True, False]
Length: 5, dtype: bool

In [59]:
# Create Series from pandas array
array = pd.array([1,2,3,4,5,6])

series = pd.Series(array)
series

0    1
1    2
2    3
3    4
4    5
5    6
dtype: Int64
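A Series built from a Pandas array keeps the array's dtype. The sketch below compares this with a Series built directly from a list that contains a missing value; the values are made up for illustration.

```python
import pandas as pd

# Built from a Pandas array: stays Int64, missing value shown as <NA>
nullable = pd.Series(pd.array([1, 2, None]))

# Built directly from a list: upcast to float64, missing value shown as NaN
default = pd.Series([1, 2, None])

print(nullable)
print(default)
```

This is one reason to go through pd.array() when integer data may contain gaps.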