# Pandas 

Pandas is a Python library for data manipulation and analysis. Its two main data structures, Series and DataFrame, provide a convenient way to clean and analyze data.

### What is Pandas Used for?

Pandas is a powerful library generally used for:

- Data Cleaning
- Data Transformation
- Data Analysis
- Machine Learning
- Data Visualization


### Install Pandas

Install Pandas by entering the following command in the terminal:

pip install pandas

### Import Pandas in Python 

import pandas as pd

### Pandas Series

A Pandas Series is a one-dimensional labeled array capable of holding any data type.

### Create a Pandas Series


In [2]:
import pandas as pd

# create a list
data = [10, 20, 30, 40, 50]
print(data)
# create a series from the list
my_series = pd.Series(data)

print(my_series)

[10, 20, 30, 40, 50]
0    10
1    20
2    30
3    40
4    50
dtype: int64


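### Labels

Each value in a Series has a label, stored in the index. By default the labels are the integers 0, 1, 2, and so on; custom labels can be supplied through the index argument.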

In [4]:
import pandas as pd

# create a list
data = [10, 20, 30, 40, 50]

# create a series from the list
my_series = pd.Series(data)
print(my_series)

# display the third value in the series (label 2)
print(my_series[2])

0    10
1    20
2    30
3    40
4    50
dtype: int64
30


In [8]:
import pandas as pd 

# create a list
a = [1, 3, 5]

# create a series and specify labels
my_series = pd.Series(a, index = ["x", "y", "z"])

print(my_series)
print(my_series['x'])

x    1
y    3
z    5
dtype: int64
1


In [5]:
import pandas as pd 

# create a list
a = [1, 3, 5]

# create a series and specify labels
my_series = pd.Series(a, index = ["x", "y", "z"])

# display the value with label y
print(my_series["y"])

3
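
The cells above access values through square brackets. As a minimal sketch, the same lookup can be written with the explicit `.loc` (label-based) and `.iloc` (position-based) accessors, which makes it clear whether a label or a position is meant. The variable names `by_label` and `by_position` are chosen here for illustration only.

```python
import pandas as pd

# same series as in the cells above
my_series = pd.Series([1, 3, 5], index=["x", "y", "z"])

# label-based access: the value whose label is "y"
by_label = my_series.loc["y"]

# position-based access: the second value (positions start at 0)
by_position = my_series.iloc[1]

print(by_label, by_position)
```

Both lookups refer to the same element here, because the label "y" sits at position 1.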


### Create Series From a Python Dictionary

In [10]:
import pandas as pd

# create a dictionary
grades = {"Semester1": 3.25, "Semester2": 3.28, "Semester3": 3.75}

# create a series from the dictionary
my_series = pd.Series(grades)

# display the series
print(my_series)
print(my_series["Semester1"])

Semester1    3.25
Semester2    3.28
Semester3    3.75
dtype: float64
3.25


In [11]:
import pandas as pd

# create a dictionary
grades = {"Semester1": 3.25, "Semester2": 3.28, "Semester3": 3.75}

# select specific dictionary items using index argument
my_series = pd.Series(grades, index = ["Semester1", "Semester2"])

# display the series
print(my_series)

Semester1    3.25
Semester2    3.28
dtype: float64
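
The index argument selects dictionary items by key. A short sketch of what happens when a requested label is not a key of the dictionary: Pandas keeps the label and fills its value with NaN. The label "Semester4" is a hypothetical one added for this illustration.

```python
import pandas as pd

grades = {"Semester1": 3.25, "Semester2": 3.28, "Semester3": 3.75}

# "Semester4" is a hypothetical label that is not a key in grades
partial = pd.Series(grades, index=["Semester1", "Semester4"])

# the missing label is kept, with NaN as its value
print(partial)

# isna() marks which entries are missing
print(partial.isna())
```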


### Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure made up of rows and columns.


### Pandas DataFrame Using Python Dictionary

In [12]:
import pandas as pd

# create a dictionary
data = {'Name': ['John', 'Alice', 'Bob'],
       'Age': [25, 30, 35],
       'City': ['New York', 'London', 'Paris']}

# create a dataframe from the dictionary
df = pd.DataFrame(data)

print(df)

    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris


### Pandas DataFrame Using Python List


In [13]:
import pandas as pd

# create a two-dimensional list
data = [['John', 25, 'New York'],
       ['Alice', 30, 'London'],
       ['Bob', 35, 'Paris']]

# create a DataFrame from the list
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])

print(df)

    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris


### Pandas DataFrame From a File


In [15]:
import pandas as pd

# load data from a CSV file
df = pd.read_csv('data.csv')

print(df)

       Name   Age         City
0     John       25   New York
1     Alice      30     London
2     Bob        35      Paris
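
The cell above needs a `data.csv` file in the working directory. As a self-contained sketch that runs without any file, the CSV text can be passed to `read_csv` through `io.StringIO`. The contents of `csv_text` are an assumption made for this example; the actual `data.csv` is not shown in this document.

```python
import io

import pandas as pd

# hypothetical file contents; the real data.csv is not shown here
csv_text = "Name,Age,City\nJohn,25,New York\nAlice,30,London\nBob,35,Paris\n"

# io.StringIO lets read_csv treat a string as if it were an open file
df = pd.read_csv(io.StringIO(csv_text))

print(df)

# number of rows and columns that were loaded
print(df.shape)
```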


### Create an Empty DataFrame


Empty DataFrame
Columns: []
Index: []
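
The output above has no accompanying code cell. A minimal sketch that should print output of this form, assuming the DataFrame was constructed with no arguments:

```python
import pandas as pd

# create a DataFrame with no rows and no columns
df = pd.DataFrame()

print(df)

# the empty attribute reports whether the DataFrame holds any data
print(df.empty)
```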


### Pandas Index


### Create Indexes in Pandas 

- Default Index
- Setting Index
- Creating a Range Index

### Default Index
 

In [20]:
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)
print(df)

    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris


### Setting Index


In [21]:
import pandas as pd

# create dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

# set the 'Name' column as index
df.set_index('Name', inplace=True)

print(df)

       Age      City
Name                
John    25  New York
Alice   28    London
Bob     32     Paris
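
The cell above modifies `df` in place. As a sketch of an alternative, `set_index` can be called without `inplace=True`, in which case it returns a new DataFrame and leaves the original unchanged. Once names are the index, a row can be looked up by name with `.loc`. The variable names `indexed` and `alice` are illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

# without inplace=True, set_index returns a new DataFrame
indexed = df.set_index('Name')

# df still has 'Name' as an ordinary column
print(df.columns.tolist())

# look up a row by its index label
alice = indexed.loc['Alice']
print(alice)
```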


### Creating a Range Index


In [20]:
import pandas as pd

# create dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

# create the dataframe with a range index running from 5 to 7
df = pd.DataFrame(data, index=pd.RangeIndex(5, 8, name='Index'))

print(df)

        Name  Age      City
Index                      
5       John   25  New York
6      Alice   28    London
7        Bob   32     Paris


### Modifying Indexes in Pandas
 

### Renaming Index

We can rename index labels using the rename() method, passing a dictionary that maps old labels to new labels as the index argument.

In [21]:
import pandas as pd

# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

# display original dataframe
print('Original DataFrame:')
print(df)
print()

# rename index
df.rename(index={0: 'A', 1: 'B', 2: 'C'}, inplace=True)

# display dataframe after index is renamed
print('Modified DataFrame')
print(df)

Original DataFrame:
    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris

Modified DataFrame
    Name  Age      City
A   John   25  New York
B  Alice   28    London
C    Bob   32     Paris


### Resetting Index


In [22]:
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

# create a dataframe
df = pd.DataFrame(data)

# rename index
df.rename(index={0: 'A', 1: 'B', 2: 'C'}, inplace=True)

# display dataframe
print('Original DataFrame:')
print(df)
print('\n')

# reset index
df.reset_index(inplace=True)

# display dataframe after index is reset
print('Modified DataFrame:')
print(df)

Original DataFrame:
    Name  Age      City
A   John   25  New York
B  Alice   28    London
C    Bob   32     Paris


Modified DataFrame:
  index   Name  Age      City
0     A   John   25  New York
1     B  Alice   28    London
2     C    Bob   32     Paris
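
In the output above, the old labels are kept as a new column named `index`. If the old labels are not needed, a sketch of the alternative is to pass `drop=True`, which discards them instead. The variable name `reset` is illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

# dataframe with the renamed index A, B, C
df = pd.DataFrame(data).rename(index={0: 'A', 1: 'B', 2: 'C'})

# drop=True discards the old labels instead of adding an 'index' column
reset = df.reset_index(drop=True)

print(reset)
```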


### Access Rows by Index

In [24]:
import pandas as pd

# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

# iloc is zero-based, so position 2 selects the third row
third_row = df.iloc[2]

print(third_row)

Name      Bob
Age        32
City    Paris
Name: 2, dtype: object
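
With the default index, positions and labels coincide, so the difference between `.iloc` and `.loc` is easy to miss. The sketch below uses the custom labels 'A', 'B', 'C' (an assumption for this example) to show that `.iloc` selects by position and `.loc` selects by label.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

# custom labels, so that position and label differ
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# select the second row by position (positions start at 0)
by_position = df.iloc[1]

# select the same row by its label
by_label = df.loc['B']

print(by_position)
print(by_label)
```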


### Get DataFrame Index

In [27]:
import pandas as pd

# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

# return index object
print(df.index)

# return index values
print(df.index.values)

RangeIndex(start=0, stop=3, step=1)
[0 1 2]
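
`df.index.values` returns a NumPy array. When a plain Python list of labels is more convenient, a small sketch using the index's `tolist()` method:

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

# convert the index labels to a regular Python list
labels = df.index.tolist()

print(labels)
```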


### Pandas Array 

### Create Array Using Python List

In [28]:
import pandas as pd

# create a list named data
data = [2, 4, 6, 8]

# create Pandas array using data
array1 = pd.array(data)

print(array1)

<IntegerArray>
[2, 4, 6, 8]
Length: 4, dtype: Int64


In [28]:
import pandas as pd

# create Pandas array by passing list directly
array1 = pd.array([2, 4, 6, 8])

print(array1)

<IntegerArray>
[2, 4, 6, 8]
Length: 4, dtype: Int64


### Explicitly Specify the Data Type of Array Elements

In [29]:
import pandas as pd

# creating a pandas.array of integers
int_array = pd.array([1, 2, 3, 4, 5], dtype='int')
print(int_array)
print()

# creating a pandas.array of floating-point numbers
float_array = pd.array([1.1, 2.2, 3.3, 4.4, 5.5], dtype='float')
print(float_array)
print()

# creating a pandas.array of strings
string_array = pd.array(['apple', 'banana', 'cherry', 'date'], dtype='str')
print(string_array)
print()

# creating a pandas.array of boolean values
bool_array = pd.array([True, False, True, False], dtype='bool')
print(bool_array)
print()

<NumpyExtensionArray>
[1, 2, 3, 4, 5]
Length: 5, dtype: int32

<NumpyExtensionArray>
[1.1, 2.2, 3.3, 4.4, 5.5]
Length: 5, dtype: float64

<NumpyExtensionArray>
['apple', 'banana', 'cherry', 'date']
Length: 4, dtype: str192

<NumpyExtensionArray>
[True, False, True, False]
Length: 4, dtype: bool
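
The integer width reported for `dtype='int'` (here `int32`) may differ between platforms and NumPy versions. As a hedged sketch, the width can be fixed by naming it explicitly, and the capitalized `'Int64'` and lowercase `'boolean'` names request Pandas' nullable types, which can hold missing values. The variable names below are illustrative.

```python
import pandas as pd

# fixed-width NumPy integer type, independent of the platform default
fixed_array = pd.array([1, 2, 3], dtype='int64')
print(fixed_array.dtype)

# nullable integer type: None is stored as a missing value
nullable_int = pd.array([1, 2, None], dtype='Int64')
print(nullable_int)

# nullable boolean type
nullable_bool = pd.array([True, False, None], dtype='boolean')
print(nullable_bool)
```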



### Create Series From Pandas Array
 

In [31]:
import pandas as pd


# create a Pandas array
arr = pd.array([12, 13, 14, 15, 16, 17, 18])

# create a series from the array
arr_series = pd.Series(arr)
print(arr_series)

0    12
1    13
2    14
3    15
4    16
5    17
6    18
dtype: Int64
