What is Pandas?

Pandas is a Python library used for working with data sets. It has functions for analyzing, cleaning, exploring, and manipulating data.

Pandas Series

One-dimensional: A Series is like a single column in a spreadsheet or a single list with an associated index.   

Single Data Type: Typically holds data of a single type (e.g., all integers, all strings).

Index: Has an index that labels each value. This index can be numeric (default) or custom labels.

Pandas DataFrame

Two-dimensional: A DataFrame is a table-like structure with rows and columns.  

Multiple Data Types: Can hold multiple columns with different data types. 
  
Row and Column Indexes: Has both a row index and a column index.   

Collection of Series: You can think of a DataFrame as a collection of Series, where each column is a Series.   

Here's an analogy:

Series: Imagine a single column in an Excel sheet (e.g., a list of names).

DataFrame: Imagine the entire Excel sheet with multiple columns (e.g., names, ages, cities).

In essence:

A Series is a building block for a DataFrame.   
A DataFrame provides a more structured and versatile way to work with tabular data.
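
The relationship between the two structures can be checked directly. The sketch below uses made-up sample data (the names `names`, `ages`, and `people` are illustrative only) to show that a DataFrame can be assembled from Series and that selecting one column gives a Series back.

```python
import pandas as pd

# Two Series sharing the same default integer index (illustrative data)
names = pd.Series(['Alice', 'Bob', 'Charlie'])
ages = pd.Series([25, 30, 22])

# Each Series becomes one column of the DataFrame
people = pd.DataFrame({'Name': names, 'Age': ages})

# Selecting a single column returns a Series again
age_column = people['Age']
print(type(age_column).__name__)   # Series
print(people.shape)                # (3, 2)
```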

In [8]:
import pandas as pd
import numpy as np
print(pd.__version__)

2.2.3


In [10]:
series_index=pd.Series(np.arange(1,6),index=list('UVWXY'))
series_index

U    1
V    2
W    3
X    4
Y    5
dtype: int64

In [2]:
somnath=[1,7,9]
somnathPandas=pd.Series(somnath)
print(somnathPandas)
print(somnathPandas[1])

0    1
1    7
2    9
dtype: int64
7


In [3]:


# Create a Pandas Series from Dictionary
data = {'a': 10, 'b': 20, 'c': 30}
series = pd.Series(data) 
print("Series:\n", series) 

# Create a Pandas DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 
        'Age': [25, 30, 22], 
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)

Series:
 a    10
b    20
c    30
dtype: int64

DataFrame:
       Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris




Explanation:

Series:

pd.Series(data) creates a Series object from the dictionary data.
The keys of the dictionary become the index of the Series.
The values of the dictionary become the values of the Series.

DataFrame:

pd.DataFrame(data) creates a DataFrame from the dictionary data.
Each key in the dictionary becomes a column in the DataFrame.
The values associated with each key become the values in the respective columns.
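
As a quick check of the mapping described above, the following sketch inspects the index and columns produced from each dictionary. It reuses the sample data from the earlier cell; the variable names `series_data` and `frame_data` are only for illustration.

```python
import pandas as pd

series_data = {'a': 10, 'b': 20, 'c': 30}
frame_data = {'Name': ['Alice', 'Bob', 'Charlie'],
              'Age': [25, 30, 22],
              'City': ['New York', 'London', 'Paris']}

series = pd.Series(series_data)
df = pd.DataFrame(frame_data)

# Dictionary keys become the Series index
print(series.index.tolist())   # ['a', 'b', 'c']

# Dictionary keys become the DataFrame column names
print(df.columns.tolist())     # ['Name', 'Age', 'City']

# The row index defaults to 0, 1, 2, ...
print(df.index.tolist())       # [0, 1, 2]
```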

In [6]:
import pandas as pd

# 1. Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 
        'Age': [25, 30, 22], 
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print(df) 

# 2. Selecting Columns
print("\nSelecting 'Age' column:", df['Age']) 
print("\nSelecting multiple columns:", df[['Name', 'City']]) 

# 3. Selecting Rows
print("\nSelecting first two rows:", df.head(2)) 
print("\nSelecting last row:", df.tail(1)) 

# 4. Filtering Data
print("\nSelecting rows where Age > 25:", df[df['Age'] > 25])

# 5. Basic Operations
print("\n🚀Mean age:", df['Age'].mean()) 

print("\nSort by Age:", df.sort_values(by='Age')) 

# 6. Reading Data from a File
# df = pd.read_csv('your_data.csv') # Replace 'your_data.csv' with the actual file path

# 7. Writing Data to a File
df.to_csv('output.csv', index=False)
# Key Points:

# By default, to_csv() includes the DataFrame's index as a column in the output CSV. Using index=False removes this extra column.
# You can modify the file name and path as needed.
# The to_csv() method has other optional parameters for customizing the output, such as:
# sep: Specify the delimiter (e.g., ';', '|') instead of the default comma.
# header: Control whether to include column headers.
# encoding: Specify the encoding of the output file (e.g., 'utf-8').

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris

Selecting 'Age' column: 0    25
1    30
2    22
Name: Age, dtype: int64

Selecting multiple columns:       Name      City
0    Alice  New York
1      Bob    London
2  Charlie     Paris

Selecting first two rows:     Name  Age      City
0  Alice   25  New York
1    Bob   30    London

Selecting last row:       Name  Age   City
2  Charlie   22  Paris

Selecting rows where Age > 25:   Name  Age    City
1  Bob   30  London

🚀Mean age: 25.666666666666668

Sort by Age:       Name  Age      City
2  Charlie   22     Paris
0    Alice   25  New York
1      Bob   30    London
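
The `to_csv()` options mentioned in the code comments above can be tried without touching the disk. This is a minimal sketch, assuming a small two-row DataFrame invented for the example: when no path is given, `to_csv()` returns the CSV text as a string, which makes the effect of `index` and `sep` easy to see.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Default behaviour: the row index is written as an unnamed first column
with_index = df.to_csv()

# index=False drops that column; sep changes the delimiter
without_index = df.to_csv(index=False, sep=';')

print(with_index.splitlines()[0])      # ,Name,Age
print(without_index.splitlines()[0])   # Name;Age

# Reading the text back requires the same delimiter
restored = pd.read_csv(io.StringIO(without_index), sep=';')
print(restored.columns.tolist())       # ['Name', 'Age']
```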


In [17]:
import pandas as pd

# Sample Series
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
series = pd.Series(data)

# Slicing Series by index
print("First three elements:", series[:3])  # Output: a    10
                                         #          b    20
                                         #          c    30
                                         #          dtype: int64

print("Elements from index 'b' to 'd':", series['b':'d'])  # Output: b    20
                                                       #          c    30
                                                       #          d    40
                                                       #          dtype: int64

print( 'Sample DataFrame')
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 
        'Age': [25, 30, 22, 40], 
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)
print(df)

# Slicing DataFrame by row
print("First two rows:", df[:2]) 

# Slicing DataFrame by column
print("First two columns:", df[['Name', 'Age']]) 

# Slicing DataFrame by row and column
print("First two rows and first two columns:", df[:2][['Name', 'Age']]) 

# Slicing DataFrame with step size
print("Every other row:", df[::2])

First three elements: a    10
b    20
c    30
dtype: int64
Elements from index 'b' to 'd': b    20
c    30
d    40
dtype: int64
Sample DataFrame
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   40     Tokyo
First two rows:     Name  Age      City
0  Alice   25  New York
1    Bob   30    London
First two columns:       Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22
3    David   40
First two rows and first two columns:     Name  Age
0  Alice   25
1    Bob   30
Every other row:       Name  Age      City
0    Alice   25  New York
2  Charlie   22     Paris


# DataFrame:

df[:2] selects the first two rows of the DataFrame.

df[['Name', 'Age']] selects the 'Name' and 'Age' columns.

df[:2][['Name', 'Age']] selects the first two rows and then the 'Name' and 'Age' columns from the selected rows.

df[::2] selects every other row of the DataFrame.

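# Pandas Series Slicing:

Positional slicing with [:3] returns the elements at positions 0, 1, and 2 and excludes position 3, exactly as in NumPy. Label-based slicing is the exception: series['b':'d'] includes the end label 'd'.

NumPy Array Slicing: In NumPy arrays, slicing with [:3] likewise includes elements up to index 2 and excludes the element at index 3.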

In [18]:
import pandas as pd
import numpy as np

# Pandas Series
series = pd.Series([1, 2, 3, 4, 5])
print("Pandas Series:", series[:3])  # Output: 0    1
                                   #          1    2
                                   #          2    3
                                   #          dtype: int64 

# NumPy Array
array = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", array[:3])  # Output: [1 2 3]

Pandas Series: 0    1
1    2
2    3
dtype: int64
NumPy Array: [1 2 3]


Key Takeaway:

When working with Pandas Series, remember that positional slicing with [:n] stops at the element at position n-1, which is the same behavior as NumPy array slicing. Only label-based slices (for example series['b':'d'] or .loc) include the end label.
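
A minimal sketch of how positional and label-based slices treat the end point, using a small Series with string labels invented for the example. Using `.iloc` and `.loc` explicitly makes clear which kind of slice is intended.

```python
import numpy as np
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))
array = np.array([10, 20, 30, 40, 50])

# Positional slices stop before the end position, in both libraries
print(series.iloc[:3].tolist())      # [10, 20, 30]
print(array[:3].tolist())            # [10, 20, 30]

# Label slices include the end label
print(series.loc['a':'c'].tolist())  # [10, 20, 30]
print(series.loc[:'d'].tolist())     # [10, 20, 30, 40]
```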

General Syntax for iloc and loc in Pandas

iloc (Integer-location based indexing)

df.iloc[row_selector, column_selector]

row_selector:

Integer or list of integers representing the row positions (0-based).

Slicing syntax (start:stop:step) can be used; the stop position is excluded.

Can use a single integer for a single row, a list of integers for multiple rows, or a slice for a range of rows.

column_selector:

Integer or list of integers representing the column positions (0-based).

Slicing syntax can be used; the stop position is excluded.

Can use a single integer for a single column, a list of integers for multiple columns, or a slice for a range of columns.

loc (Label-based indexing)

df.loc[row_selector, column_selector]

row_selector:

Label or list of labels representing the row index labels.

Slicing syntax can be used with labels; both the start and stop labels are included.

Can use a single label, a list of labels, a slice of labels, or a boolean array.

column_selector:


Label or list of labels representing the column names.
Slicing syntax can be used with labels.
Can use a single label, a list of labels, a slice of labels.
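
The two-selector form described above can be sketched as follows. The DataFrame and its row labels (`r1`, `r2`, `r3`) are invented for the example; the point is that the same cells can be addressed either by position with `iloc` or by label with `loc`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': ['a', 'b', 'c']},
                  index=['r1', 'r2', 'r3'])

# iloc: integer positions for rows and columns
print(df.iloc[0, 1])                       # 10  (first row, second column)
by_position = df.iloc[0:2, [0, 2]]         # rows 0-1, columns 0 and 2

# loc: labels for rows and columns
print(df.loc['r1', 'B'])                   # 10  (same cell, addressed by label)
by_label = df.loc['r1':'r2', ['A', 'C']]   # rows r1 through r2, columns A and C

# Both selections pick the same cells
print(by_position.equals(by_label))        # True

# loc also accepts a boolean array as the row selector
print(df.loc[df['A'] > 1, 'B'].tolist())   # [20, 30]
```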

In [27]:
import pandas as pd

# Sample DataFrame with alphanumeric index
data = {'A': [1, 2, 3, 4, 5], 
        'B': [10, 20, 30, 40, 50], 
        'C': ['a', 'b', 'c', 'd', 'e']}
index_labels = ['A1', 'B2', 'C3', 'D4', 'E5'] 
df = pd.DataFrame(data, index=index_labels)

print(df) 

# iloc: Integer-location based indexing
print("\n--- iloc slicing ---")
print("First two rows:", df.iloc[:2]) 
print("Rows 1 to 3 (inclusive):\n", df.iloc[1:4]) 
print("Every other row:", df.iloc[::2]) 

# loc: Label-based indexing
print("\n--- loc slicing ---")
print("Rows from 'A1' to 'C3' (inclusive):\n", df.loc['A1':'C3']) 
print("Rows 'A1', 'C3', and 'E5':", df.loc[['A1', 'C3', 'E5']]) 

# Combining iloc and loc
print("\n--- Combining iloc and loc ---")
print("First two rows, then select 'B' and 'C' columns:", df.iloc[:2].loc[:, ['B', 'C']])

    A   B  C
A1  1  10  a
B2  2  20  b
C3  3  30  c
D4  4  40  d
E5  5  50  e

--- iloc slicing ---
First two rows:     A   B  C
A1  1  10  a
B2  2  20  b
Rows 1 to 3 (inclusive):
     A   B  C
B2  2  20  b
C3  3  30  c
D4  4  40  d
Every other row:     A   B  C
A1  1  10  a
C3  3  30  c
E5  5  50  e

--- loc slicing ---
Rows from 'A1' to 'C3' (inclusive):
     A   B  C
A1  1  10  a
B2  2  20  b
C3  3  30  c
Rows 'A1', 'C3', and 'E5':     A   B  C
A1  1  10  a
C3  3  30  c
E5  5  50  e

--- Combining iloc and loc ---
First two rows, then select 'B' and 'C' columns:      B  C
A1  10  a
B2  20  b


# Explanation:

# iloc:

Integer-Location based indexing: Uses integer positions to select rows and columns.
df.iloc[:2]: Selects the first two rows (rows at indices 0 and 1).
df.iloc[1:4]: Selects the rows at positions 1 through 3; the stop position 4 is excluded.
df.iloc[::2]: Selects every other row.

# loc:

Label-based indexing: Uses the index labels to select rows and columns.
df.loc['A1':'C3']: Selects rows with labels from 'A1' to 'C3' (inclusive).
df.loc[['A1', 'C3', 'E5']]: Selects rows with specific labels.

# Combining iloc and loc:

You can chain iloc and loc to perform more complex selections.
df.iloc[:2].loc[:, ['B', 'C']] (equivalent to df.iloc[:2][['B', 'C']]):
First, df.iloc[:2] selects the first two rows.
Then, loc[:, ['B', 'C']] selects the 'B' and 'C' columns from the selected rows.
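
Chaining is fine for reading data, but the same selection can also be written as a single indexing call, which is generally the safer form when values are later assigned, because a chained selection may operate on a temporary copy. The sketch below rebuilds the sample DataFrame and compares the variants; the names `chained`, `single_loc`, and `single_iloc` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['a', 'b', 'c', 'd', 'e']},
                  index=['A1', 'B2', 'C3', 'D4', 'E5'])

# Chained form: positional rows first, then labelled columns
chained = df.iloc[:2].loc[:, ['B', 'C']]

# Single loc call: translate the row positions into labels
single_loc = df.loc[df.index[:2], ['B', 'C']]

# Single iloc call: translate the column labels into positions
single_iloc = df.iloc[:2, df.columns.get_indexer(['B', 'C'])]

print(chained.equals(single_loc))    # True
print(chained.equals(single_iloc))   # True
```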

In [28]:
print("First and third columns (positions 0 and 2):\n", df.iloc[:, [0, 2]])

First and third columns (positions 0 and 2):
     A  C
A1  1  a
B2  2  b
C3  3  c
D4  4  d
E5  5  e


In [30]:
df.sort_values('A',ascending=False)

    A   B  C
E5  5  50  e
D4  4  40  d
C3  3  30  c
B2  2  20  b
A1  1  10  a
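
Note that `sort_values()` returns a new, sorted DataFrame and leaves `df` itself untouched. A minimal sketch, using a shortened version of the sample data, shows this and how `sort_index()` brings the rows back into label order.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]},
                  index=['A1', 'B2', 'C3'])

# sort_values returns a new DataFrame; df keeps its original order
descending = df.sort_values('A', ascending=False)
print(descending.index.tolist())               # ['C3', 'B2', 'A1']
print(df.index.tolist())                       # ['A1', 'B2', 'C3']

# sort_index orders the rows by their labels again
print(descending.sort_index().index.tolist())  # ['A1', 'B2', 'C3']
```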
