# Introduction to Pandas

Pandas is a powerful open-source Python library for data manipulation and analysis. It is one of the most popular libraries used in data science and machine learning. Pandas provides two main data structures:

- Series: A one-dimensional array-like object.
- DataFrame: A two-dimensional, tabular data structure with labeled axes (rows and columns).

Pandas makes it easy to:

- Load and explore data.
- Clean and transform data.
- Perform statistical analysis.
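As a quick preview of those three steps, here is a minimal sketch. It uses a tiny inline CSV (hypothetical data) wrapped in `io.StringIO` in place of a real file, so it runs without anything on disk.

```python
import io

import pandas as pd

# A tiny inline CSV stands in for a real file (hypothetical data).
csv_text = """name,score
Alice,85
Bob,
Charlie,92
"""

# Load: read_csv accepts any file-like object, including io.StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# Explore: shape is a (rows, columns) tuple.
print(df.shape)  # (3, 2)

# Clean: drop rows whose score is missing (Bob's score is empty).
clean = df.dropna(subset=["score"])

# Analyze: mean of the remaining scores.
print(clean["score"].mean())  # 88.5
```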

## Installing and Importing Pandas

In [2]:
# !pip install pandas  # Install pandas (if not already installed)
import pandas as pd

## Pandas Data Structures

### Pandas Series
A Series is like a column in an Excel spreadsheet or a one-dimensional array. It can hold data of any type.

- Designed for data manipulation and analysis.
- Especially useful for labeled data or data that needs a meaningful index (e.g., time series data).
- Well suited to filtering, aggregation, and statistical operations.
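To make those points concrete, the sketch below builds a short time series (the temperatures and dates are hypothetical) and applies a filter, two aggregations, and a label-based slice on the date index.

```python
import pandas as pd

# Hypothetical daily temperatures indexed by date (a small time series).
temps = pd.Series(
    [21.5, 23.0, 19.8, 25.1, 24.3],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Filtering: keep only the days warmer than 22 degrees.
warm = temps[temps > 22]
print(warm)

# Aggregation / statistics over the whole Series.
print(temps.max(), temps.mean())

# Label-based slicing on the date index (both ends are included).
print(temps["2024-01-02":"2024-01-04"])
```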

In [7]:
# Creating a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data, dtype='int')

print("Pandas Series:")
print(series)


Pandas Series:
0    10
1    20
2    30
3    40
4    50
dtype: int32


The left-hand side represents the index (starting from 0).
The right-hand side represents the values in the Series.
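The index does not have to be the default 0, 1, 2, ... positions. As a small sketch (the item names and prices are made up), building a Series from a dictionary turns the keys into index labels:

```python
import pandas as pd

# When a Series is built from a dict, the keys become the index labels.
prices = pd.Series({"tea": 40, "coffee": 120, "juice": 90})

print(prices.index.tolist())  # ['tea', 'coffee', 'juice']
print(prices["coffee"])       # 120
```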

In [4]:
s1 = pd.Series(['Muhammad', 'Nabeel', 'Ibrahim','Khan'])
s1

0    Muhammad
1      Nabeel
2     Ibrahim
3        Khan
dtype: object

In [8]:
# Accessing an element of the Series by its index label
s1[2]

'Ibrahim'

In [9]:
print(type(s1))

<class 'pandas.core.series.Series'>


In [10]:
s2 = pd.Series([12,24,36,12,6,12,24], index=['apples','oranges','bananas', 'samosas','rolls', 'chickens', 'coldrinks'])
# Custom string labels as the index (instead of the default 0, 1, 2, ...)

In [11]:
s2.index

Index(['apples', 'oranges', 'bananas', 'samosas', 'rolls', 'chickens',
       'coldrinks'],
      dtype='object')

In [12]:
s2.values

array([12, 24, 36, 12,  6, 12, 24], dtype=int64)

In [13]:
s2['apples']

12

In [17]:
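# Positional access: use .iloc. Plain s2[0] on a string-labeled Series falls
# back to position 0, but this is deprecated in recent pandas versions and
# emits a FutureWarning.
s2.iloc[0]
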
12
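The difference between label-based and position-based access is easy to miss, so here is a short sketch on a reduced, three-item version of `s2`. Note that `.loc` slices include the end label while `.iloc` slices exclude the end position, which is why both slices below return two items.

```python
import pandas as pd

s2 = pd.Series([12, 24, 36], index=["apples", "oranges", "bananas"])

# .loc selects by index label, .iloc by integer position.
print(s2.loc["oranges"])  # 24
print(s2.iloc[1])         # 24

# Slicing: .loc includes the end label, .iloc excludes the end position.
print(s2.loc["apples":"oranges"].tolist())  # [12, 24]
print(s2.iloc[0:2].tolist())                # [12, 24]
```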

In [14]:
s2*2

apples       24
oranges      48
bananas      72
samosas      24
rolls        12
chickens     24
coldrinks    48
dtype: int64
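`s2*2` applies the operation to every element at once. When two Series are combined, pandas also lines values up by index label rather than by position. A minimal sketch with two made-up Series: labels that appear in only one of them produce `NaN`.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Arithmetic between two Series aligns on index labels, not positions.
# Labels present in only one Series ("x" and "w") produce NaN.
total = a + b
print(total)
```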

### Pandas DataFrame

A DataFrame is a two-dimensional data structure, similar to an Excel spreadsheet or SQL table. It consists of rows and columns, where each column can be a different data type.
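A quick way to see that each column keeps its own type is the `dtypes` attribute. In this small sketch the columns and values are illustrative only; the exact dtype name shown for the text column can differ between pandas versions.

```python
import pandas as pd

# Each column of a DataFrame has its own dtype.
df = pd.DataFrame({
    "Name": ["Alice", "Bob"],    # text
    "Age": [25, 30],             # integers
    "Height": [1.65, 1.80],      # floats
    "Member": [True, False],     # booleans
})
print(df.dtypes)
```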

In [42]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)

print("Pandas DataFrame:")
df


Pandas DataFrame:


      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles


Each key in the dictionary represents a column name.
Each value is a list that forms the data in that column.
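A dictionary of columns is not the only input format. As a sketch, a list of dictionaries (one per row, often called "records") also works: the keys become column names, and a key missing from a row becomes `NaN`.

```python
import pandas as pd

# One dict per row; Alice has no "City" key.
records = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30, "City": "San Francisco"},
]
df_records = pd.DataFrame(records)
print(df_records)
```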

In [43]:
# Creating a DataFrame of student scores
result = {
    'name': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo', 'Foxtrot', 'Golf', 'Hotel', 'India', 'Juliet'],
    'roll': [11, 22, 33, 44, 55, 66, 77, 88, 99, 111],
    'python': [78, 67, 89, 90, 91, 72, 76, 89, 67, 85],
    'excel': [89, 87, 67, 74, 78, 90, 76, 78, 90, 54],
    'power_bi': [45, 67, 89, 87, 67, 90, 65, 67, 56, 90],
    'pandas': [81, 85, 89, 87, 41, 45, 96, 56, 93, 78],
    'machine_learning': [96, 95, 84, 85, 82, 81, 74, 75, 96, 90],
    'statistics': [67, 48, 45, 98, 75, 71, 70, 60, 90, 93],
}

df1 = pd.DataFrame(result)
df1


      name  roll  python  excel  power_bi  pandas  machine_learning  statistics
0    Alpha    11      78     89        45      81                96          67
1    Bravo    22      67     87        67      85                95          48
2  Charlie    33      89     67        89      89                84          45
3    Delta    44      90     74        87      87                85          98
4     Echo    55      91     78        67      41                82          75
5  Foxtrot    66      72     90        90      45                81          71
6     Golf    77      76     76        65      96                74          70
7    Hotel    88      89     78        67      56                75          60
8    India    99      67     90        56      93                96          90
9   Juliet   111      85     54        90      78                90          93


In [44]:
type(df)

pandas.core.frame.DataFrame

#### Basic DataFrame Operations

In [45]:
# Basic operations
print("Shape of DataFrame:", df.shape)
print("\nSummary statistics:")
print(df.describe())


Shape of DataFrame: (3, 3)

Summary statistics:
        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0


In [48]:
# Basic operations
# df.head(): returns the first rows (5 by default).
# df.tail(): returns the last rows (5 by default).
# df.shape: a (rows, columns) tuple.
# df.columns: the column labels.
# df.describe(): summary statistics for the numeric columns.

print("Shape of DataFrame:", df1.shape)
print("\nSummary statistics:")
print(df1.describe())


Shape of DataFrame: (10, 8)

Summary statistics:
             roll    python      excel   power_bi     pandas  \
count   10.000000  10.00000  10.000000  10.000000  10.000000   
mean    60.600000  80.40000  78.300000  72.300000  75.100000   
std     33.470385   9.59398  11.576317  15.881855  20.184978   
min     11.000000  67.00000  54.000000  45.000000  41.000000   
25%     35.750000  73.00000  74.500000  65.500000  61.500000   
50%     60.500000  81.50000  78.000000  67.000000  83.000000   
75%     85.250000  89.00000  88.500000  88.500000  88.500000   
max    111.000000  91.00000  90.000000  90.000000  96.000000   

       machine_learning  statistics  
count         10.000000   10.000000  
mean          85.800000   71.700000  
std            8.216515   18.037307  
min           74.000000   45.000000  
25%           81.250000   61.750000  
50%           84.500000   70.500000  
75%           93.750000   86.250000  
max           96.000000   98.000000  
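The cell above lists `head()`, `tail()` and `columns` in its comments but only runs `shape` and `describe()`. A small sketch of the other three, on a stand-in frame that keeps just two of `df1`'s columns for its first six students:

```python
import pandas as pd

# A small stand-in for df1 (only two of its columns, first six rows).
scores = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "python": [78, 67, 89, 90, 91, 72],
})

print(scores.head())            # first 5 rows by default
print(scores.tail(2))           # last 2 rows
print(scores.columns.tolist())  # ['name', 'python']
```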


In [49]:
df1.sample()  # one random row; the result changes between runs

    name  roll  python  excel  power_bi  pandas  machine_learning  statistics
3  Delta    44      90     74        87      87                85          98


In [50]:
df1.columns

Index(['name', 'roll', 'python', 'excel', 'power_bi', 'pandas',
       'machine_learning', 'statistics'],
      dtype='object')

In [51]:
df1.columns = ['Name','Roll','Python','Excel','Power_bi','Pandas','Machine_learning','Statistics']
df1.columns

Index(['Name', 'Roll', 'Python', 'Excel', 'Power_bi', 'Pandas',
       'Machine_learning', 'Statistics'],
      dtype='object')
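Assigning to `.columns` replaces every name at once, so the new list must have exactly one entry per column. A minimal sketch on a two-column frame shows what happens when it does not:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A list with the wrong length is rejected with a ValueError.
raised = False
try:
    df.columns = ["A"]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# One name per column works.
df.columns = ["A", "B"]
print(df.columns.tolist())  # ['A', 'B']
```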

In [52]:
df1

      Name  Roll  Python  Excel  Power_bi  Pandas  Machine_learning  Statistics
0    Alpha    11      78     89        45      81                96          67
1    Bravo    22      67     87        67      85                95          48
2  Charlie    33      89     67        89      89                84          45
3    Delta    44      90     74        87      87                85          98
4     Echo    55      91     78        67      41                82          75
5  Foxtrot    66      72     90        90      45                81          71
6     Golf    77      76     76        65      96                74          70
7    Hotel    88      89     78        67      56                75          60
8    India    99      67     90        56      93                96          90
9   Juliet   111      85     54        90      78                90          93


In [53]:
df1.rename(columns={'Roll': 'Id', 'Statistics': 'Stats'})  # returns a renamed copy; df1 itself is unchanged


      Name   Id  Python  Excel  Power_bi  Pandas  Machine_learning  Stats
0    Alpha   11      78     89        45      81                96     67
1    Bravo   22      67     87        67      85                95     48
2  Charlie   33      89     67        89      89                84     45
3    Delta   44      90     74        87      87                85     98
4     Echo   55      91     78        67      41                82     75
5  Foxtrot   66      72     90        90      45                81     71
6     Golf   77      76     76        65      96                74     70
7    Hotel   88      89     78        67      56                75     60
8    India   99      67     90        56      93                96     90
9   Juliet  111      85     54        90      78                90     93


In [54]:
df1

      Name  Roll  Python  Excel  Power_bi  Pandas  Machine_learning  Statistics
0    Alpha    11      78     89        45      81                96          67
1    Bravo    22      67     87        67      85                95          48
2  Charlie    33      89     67        89      89                84          45
3    Delta    44      90     74        87      87                85          98
4     Echo    55      91     78        67      41                82          75
5  Foxtrot    66      72     90        90      45                81          71
6     Golf    77      76     76        65      96                74          70
7    Hotel    88      89     78        67      56                75          60
8    India    99      67     90        56      93                96          90
9   Juliet   111      85     54        90      78                90          93


In [55]:
df1.rename(columns={'Roll': 'Id', 'Statistics': 'Stats'}, inplace=True)  # modifies df1 directly and returns None

In [56]:
df1

      Name   Id  Python  Excel  Power_bi  Pandas  Machine_learning  Stats
0    Alpha   11      78     89        45      81                96     67
1    Bravo   22      67     87        67      85                95     48
2  Charlie   33      89     67        89      89                84     45
3    Delta   44      90     74        87      87                85     98
4     Echo   55      91     78        67      41                82     75
5  Foxtrot   66      72     90        90      45                81     71
6     Golf   77      76     76        65      96                74     70
7    Hotel   88      89     78        67      56                75     60
8    India   99      67     90        56      93                96     90
9   Juliet  111      85     54        90      78                90     93
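Besides a dictionary, `rename()` also accepts a function that is applied to every column label. A minimal sketch with two of the column names used above, which also confirms that, without `inplace=True`, the original frame keeps its names:

```python
import pandas as pd

df = pd.DataFrame({"Roll": [11, 22], "Statistics": [67, 48]})

# Apply str.lower to every column label; returns a new DataFrame.
lower = df.rename(columns=str.lower)
print(lower.columns.tolist())  # ['roll', 'statistics']

# The original is left unchanged.
print(df.columns.tolist())     # ['Roll', 'Statistics']
```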


#### Selecting Data in a DataFrame

In [57]:
# Selecting a specific column
ages = df['Age']
print("Ages of individuals:")
print(ages)


Ages of individuals:
0    25
1    30
2    35
Name: Age, dtype: int64
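Selecting one column with single brackets returns a Series, as the output above shows. Passing a list of column names (double brackets) returns a DataFrame instead. A short sketch, rebuilding the same three-person `df`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

# Single brackets with one label return a Series...
ages = df["Age"]
# ...double brackets (a list of labels) return a DataFrame.
subset = df[["Name", "City"]]

print(type(ages).__name__)    # Series
print(type(subset).__name__)  # DataFrame
```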


In [58]:
# Selecting rows where Age is greater than 30
older_people = df[df['Age'] > 30]
print("Individuals with Age greater than 30:")
print(older_people)


Individuals with Age greater than 30:
      Name  Age         City
2  Charlie   35  Los Angeles
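Several conditions can be combined with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses because `&` binds more tightly than comparison operators. The sketch below also writes the same filter with `DataFrame.query()`; the condition itself is just an example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

# Age at least 30 AND not living in Los Angeles.
mask = (df["Age"] >= 30) & (df["City"] != "Los Angeles")
print(df[mask])  # only Bob's row

# The same filter written as a query string.
print(df.query("Age >= 30 and City != 'Los Angeles'"))
```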


#### Adding, Modifying, and Removing Data

In [63]:
# Adding a new column
df['Salary'] = [50000, 60000, 70000]
print("DataFrame with Salary column added:")
print(df)


DataFrame with Salary column added:
      Name  Age           City  Salary
0    Alice   25       New York   50000
1      Bob   30  San Francisco   60000
2  Charlie   35    Los Angeles   70000


In [64]:
# Modifying the values of the 'Age' column (adds 1 to every row)
df['Age'] = df['Age'] + 1
print("Updated Ages:")
print(df)


Updated Ages:
      Name  Age           City  Salary
0    Alice   26       New York   50000
1      Bob   31  San Francisco   60000
2  Charlie   36    Los Angeles   70000


In [65]:
# Removing a column
# The axis parameter specifies whether to remove rows or columns:
#   axis=0 refers to rows (the default for drop()).
#   axis=1 refers to columns.
# Without axis=1, drop() would look for a row labeled 'Salary' and raise a KeyError.
df = df.drop('Salary', axis=1)
print("DataFrame after removing Salary column:")
print(df)


DataFrame after removing Salary column:
      Name  Age           City
0    Alice   26       New York
1      Bob   31  San Francisco
2  Charlie   36    Los Angeles


In [66]:
# Dropping the row whose index label is 1
df = df.drop(1, axis=0)

print(df)


      Name  Age         City
0    Alice   26     New York
2  Charlie   36  Los Angeles
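Two related tools are handy when removing data. `drop()` also takes `columns=` and `index=` keywords, which make the intent explicit without an `axis` argument, and `errors="ignore"` skips labels that do not exist. Dropping a row leaves a gap in the index, which `reset_index(drop=True)` renumbers. A minimal sketch on a fresh copy of the example data (the `"Bonus"` column is hypothetical and deliberately absent):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Keyword form of drop(): same as df.drop('Salary', axis=1).
# errors="ignore" skips "Bonus", which does not exist, instead of raising KeyError.
df = df.drop(columns=["Salary", "Bonus"], errors="ignore")

# Dropping a row leaves a gap in the index...
df = df.drop(index=1)
print(df.index.tolist())  # [0, 2]

# ...reset_index(drop=True) renumbers rows 0..n-1 and discards the old labels.
df = df.reset_index(drop=True)
print(df.index.tolist())  # [0, 1]
```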
