In [12]:
import pandas as pd
import numpy as np

## Introduction | Creating Objects | Viewing Data

1. https://www.geeksforgeeks.org/pandas/introduction-to-pandas-in-python/

2. https://www.geeksforgeeks.org/python/how-to-install-python-pandas-on-windows-and-linux/

3. https://www.geeksforgeeks.org/machine-learning/how-to-use-jupyter-notebook-an-ultimate-guide/


1. https://www.geeksforgeeks.org/pandas/creating-a-pandas-dataframe/

2. https://www.geeksforgeeks.org/python/python-pandas-series/

3. https://www.geeksforgeeks.org/python/creating-a-pandas-series/

1. https://www.geeksforgeeks.org/python/python-pandas-dataframe-series-head-method/

2. https://www.geeksforgeeks.org/python/python-pandas-dataframe-series-tail-method/

3. https://www.geeksforgeeks.org/pandas/python-pandas-dataframe-describe-method/


In [13]:
# pandas_basics_combined.py
# Pandas basics: installation notes, Series, DataFrame creation, head/tail/describe examples.
# Top-level script with detailed inline comments explaining what each function does and parameters.

# ---------- INSTALL / START NOTES (no code) ----------
# To install pandas:
#   pip install pandas
#
# Recommended environment:
#   - Use a virtualenv or conda environment (conda create -n pd python=3.10)
#   - Use Jupyter Notebook or JupyterLab for interactive exploration:
#       jupyter notebook
#   - See GfG guides for step-by-step install and using Jupyter.
#
# Note: pandas depends on numpy. If you have Anaconda, pandas is included.

# ------------------------------
# Imports
# ------------------------------
import pandas as pd
import numpy as np

print("\n=== PANDAS BASICS: SERIES & DATAFRAME CREATION ===\n")


# ======================================================
# 1) PANDAS SERIES
# ======================================================
# A Series is a 1D labeled array. It holds values + an index (labels).
# Common constructor:
#   pd.Series(data, index=None, dtype=None, name=None)
# Parameters:
#   data  : list/ndarray/dict/scalar
#   index : list-like labels; if omitted pandas uses 0..n-1
#   dtype : force data type (e.g., 'float64', 'int32', 'object')
#   name  : optional name of the Series (shows in prints)
#
# Series behaves like a single column (vector). See GfG Series guide.

# Create from a Python list (default index = 0..n-1)
s1 = pd.Series([10, 20, 30, 40])
print("Series from list:\n", s1, "\n")

# Create from a list with a custom index
s2 = pd.Series([100, 200, 300], index=['a', 'b', 'c'], dtype='int64', name='scores')
print("Series with custom index & dtype:\n", s2, "\n")

# Create from a dict (keys -> index, values -> data)
d = {'apple': 5, 'banana': 3, 'cherry': 7}
s3 = pd.Series(d, name='fruits')
print("Series from dict (keys become index):\n", s3, "\n")


# Accessing Series:
print("s2['b'] ->", s2['b'])           # by label
print("s1[0]    ->", s1[0])            # by label 0 (default labels match positions; use s1.iloc[0] for strict position)
print("s2.index ->", s2.index)         # index labels
print("s2.values ->", s2.values)       # ndarray of values
print("\n")


# ======================================================
# 2) PANDAS DATAFRAME (many ways to create)
# ======================================================
# A DataFrame is a 2D labeled tabular structure (rows + columns).
# Constructor signatures:
#   pd.DataFrame(data=None, index=None, columns=None, dtype=None)
# where 'data' can be:
#   - dict of lists/ndarrays: {colname: column_values}
#   - list of dicts (records)
#   - 2D ndarray + columns list
#   - Series dict
#
# GfG covers many ways to create DataFrame.

# 2A: From dict of lists (common)
data_dict = {
    'Name' : ['Alice', 'Bob', 'Charlie'],
    'Age'  : [25, 30, 22],
    'City' : ['Delhi', 'Mumbai', 'Bangalore']
}
df1 = pd.DataFrame(data_dict, columns=['Name', 'Age', 'City'])  # columns order optional
print("DataFrame from dict of lists:\n", df1, "\n")

# 2B: From list of dicts (each dict is a row / "record")
rows = [
    {'Name':'Dan', 'Age': 28},                     # no 'City' key here -> NaN in that cell
    {'Name':'Eve', 'Age': 35, 'City': 'Chennai'},
]
df2 = pd.DataFrame(rows)
print("DataFrame from list of dicts (records):\n", df2, "\n")

# 2C: From NumPy 2D array + column names
arr = np.array([[1,2,3],[4,5,6]])
df3 = pd.DataFrame(arr, columns=['A','B','C'])
print("DataFrame from 2D ndarray:\n", df3, "\n")

# 2D: From a Series mapping (each Series is a column)
col1 = pd.Series([10,20,30], index=['x','y','z'])
col2 = pd.Series([0.1, 0.2, 0.3], index=['x','y','z'])
df4 = pd.DataFrame({'col1': col1, 'col2': col2})
print("DataFrame from Series objects (index aligned):\n", df4, "\n")


# ======================================================
# 3) Basic DataFrame inspection / attributes
# ======================================================
# Useful attributes & methods:
#   df.shape      -> (n_rows, n_cols)
#   df.columns    -> column Index
#   df.index      -> row Index
#   df.dtypes     -> dtype per column
#   df.info()     -> concise summary (non-null counts + dtypes)
#   df.head(n=5)  -> first n rows (default n=5)
#   df.tail(n=5)  -> last n rows (default n=5)
#   df.describe() -> summary statistics for numeric columns (count, mean, std, min, 25%, 50%, 75%, max)
#
# head/tail/describe are covered on GfG.

print("df1.shape:", df1.shape)
print("df1.columns:", df1.columns)
print("df1.dtypes:\n", df1.dtypes)
print("\nConcise info() output:")
df1.info()   # prints info (non-null counts, memory usage)

print("\n-- head() examples --")
print("df1.head()  -> default first 5 rows (here all rows):\n", df1.head(), "\n")
print("df1.head(2) -> first 2 rows:\n", df1.head(2), "\n")

print("-- tail() examples --")
print("df1.tail()  -> last 5 rows (here all):\n", df1.tail(), "\n")
print("df1.tail(1) -> last 1 row:\n", df1.tail(1), "\n")

print("-- describe() example --")
# describe() returns descriptive stats for numeric columns by default
print(df1.describe(), "\n")   # count, mean, std, min, quartiles, max


# ======================================================
# 4) Small manipulation examples (selection / slicing)
# ======================================================
# Selecting columns: df['col'] returns a Series; df[['col1','col2']] returns DataFrame
print("Select single column (Series):\n", df1['Age'], "\n")
print("Select multiple columns (DataFrame):\n", df1[['Name','City']], "\n")

# Row selection by position: iloc, by label: loc
print("Row 0 by position (iloc):\n", df1.iloc[0], "\n")
# If index is labeled (not 0..n-1) use df.loc[label]
print("Select rows where Age > 24:\n", df1[df1['Age'] > 24], "\n")

# Adding a new column (vectorized)
df1['Age_plus_5'] = df1['Age'] + 5
print("After adding Age_plus_5 column:\n", df1, "\n")

# Dropping a column (returns new DataFrame unless inplace=True)
df_copy = df1.drop(columns=['Age_plus_5'])
print("After drop (copy):\n", df_copy, "\n")


# ======================================================
# 5) IO quick notes (read/write)
# ======================================================
# Read CSV:
#   pd.read_csv(filepath, sep=',', header='infer', index_col=None, usecols=None, dtype=None, parse_dates=False)
# Important params:
#   filepath   : path to CSV
#   sep        : delimiter (default ',')
#   header     : row number to use as column names (default 'infer', which acts like 0 when no names are passed)
#   index_col  : column to use as row labels
#   parse_dates: try to parse date columns
#
# Write CSV:
#   df.to_csv(path, index=True/False)
#
# Example (commented out since no file in this run):
#   df = pd.read_csv('data.csv')
#   df.to_csv('out.csv', index=False)
#
# See GfG install/read guides for more.


# ======================================================
# 6) Short cookbook: useful one-liners
# ======================================================
print("=== COOKBOOK ===")
print("Value counts of a column (frequency): df['City'].value_counts() ->")
print(df1['City'].value_counts(), "\n")

print("Sort by column: df.sort_values('Age') ->")
print(df1.sort_values('Age'), "\n")

print("Reset index: df.reset_index(drop=True) ->")
print(df1.reset_index(drop=True), "\n")

print("Rename columns: df.rename(columns={'Age':'age_years'}) ->")
print(df1.rename(columns={'Age':'age_years'}), "\n")


# ======================================================
# 7) Closing notes & pointers
# ======================================================
# - Pandas Series and DataFrame are built on top of NumPy arrays — operations are vectorized.
# - Use head()/tail()/describe() for quick data exploration. They are your first commands after loading data.
# - For learning path: start with Series -> DataFrame -> IO -> indexing -> groupby -> merge/join -> time-series.

print("\n=== Done: pandas basics example script ===\n")



=== PANDAS BASICS: SERIES & DATAFRAME CREATION ===

Series from list:
 0    10
1    20
2    30
3    40
dtype: int64 

Series with custom index & dtype:
 a    100
b    200
c    300
Name: scores, dtype: int64 

Series from dict (keys become index):
 apple     5
banana    3
cherry    7
Name: fruits, dtype: int64 

s2['b'] -> 200
s1[0]    -> 10
s2.index -> Index(['a', 'b', 'c'], dtype='object')
s2.values -> [100 200 300]


DataFrame from dict of lists:
       Name  Age       City
0    Alice   25      Delhi
1      Bob   30     Mumbai
2  Charlie   22  Bangalore 

DataFrame from list of dicts (records):
   Name  Age     City
0  Dan   28      NaN
1  Eve   35  Chennai 

DataFrame from 2D ndarray:
    A  B  C
0  1  2  3
1  4  5  6 

DataFrame from Series objects (index aligned):
    col1  col2
x    10   0.1
y    20   0.2
z    30   0.3 

df1.shape: (3, 3)
df1.columns: Index(['Name', 'Age', 'City'], dtype='object')
df1.dtypes:
 Name    object
Age      int64
City    object
dtype: object

Concise i

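The `s1[0]` lookup in the cell above succeeds because a default index uses the labels 0..n-1, so label and position happen to coincide. A minimal sketch of how the two diverge once the integer labels are reordered (the Series `s` and its index are made up for illustration):

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the row positions.
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s.loc[0]      # row whose index label is 0
by_position = s.iloc[0]  # first row, regardless of its label

print("s.loc[0]  ->", by_label)
print("s.iloc[0] ->", by_position)
```

Spelling out `.loc` or `.iloc` keeps the intent explicit even when the index is later changed.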
### Common Binary Operations

sub()	Subtracts another Series, a list-like object of the same length, or a scalar from the caller Series, element by element

mul()	Multiplies the caller Series by another Series, a list-like object of the same length, or a scalar, element by element

div()	Divides the caller Series by another Series, a list-like object of the same length, or a scalar, element by element

sum()	Returns the sum of the values over the requested axis

prod()	Returns the product of the values over the requested axis

mean()	Returns the mean of the values over the requested axis

pow()	Raises each element of the caller Series to the power given by the matching element of the passed Series (or by a scalar) and returns the result

abs()	Returns the absolute value of each element in a Series/DataFrame

cov()	Computes the covariance between the caller Series and another Series
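The binary methods above align the two operands on index labels before computing. The sketch below, with made-up Series `a` and `b`, shows what happens when a label is missing on one side and how `fill_value` changes the result:

```python
import pandas as pd

# Hypothetical Series with partly overlapping labels.
a = pd.Series([10, 20, 30], index=["x", "y", "z"])
b = pd.Series([1, 2], index=["x", "y"])

# Without fill_value, a label missing on either side produces NaN.
diff = a.sub(b)

# fill_value stands in for the missing side before the operation runs.
diff_filled = a.sub(b, fill_value=0)
ratio = a.div(b, fill_value=1)     # caller divided by other
powered = a.pow(b, fill_value=1)   # caller raised to the power of other

print(diff, diff_filled, ratio, powered, sep="\n\n")
```

The method forms (`sub`, `div`, `pow`) are mainly useful for this `fill_value` argument; for fully aligned data the plain operators `-`, `/` and `**` give the same results.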




1. https://www.geeksforgeeks.org/python/python-pandas-series-mul/

2. https://www.geeksforgeeks.org/python/python-pandas-series-div/

3. https://www.geeksforgeeks.org/python/python-pandas-series-sum/

4. https://www.geeksforgeeks.org/machine-learning/python-pandas-series-prod/

5. https://www.geeksforgeeks.org/pandas/python-pandas-series-mean/

6. https://www.geeksforgeeks.org/python/python-pandas-series-pow/

7. https://www.geeksforgeeks.org/python/python-pandas-series-abs/

8. https://www.geeksforgeeks.org/python/python-pandas-series-cov-to-find-covariance/
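The `sum()`, `prod()` and `mean()` entries above are reductions, and two arguments shape their result: `skipna` and, on a DataFrame, `axis`. A small sketch with illustrative data (the values are assumptions chosen to keep the arithmetic easy to check):

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 4.0])

total = s.sum()                      # NaN is skipped by default
total_strict = s.sum(skipna=False)   # NaN propagates into the result
product = s.prod()
average = s.mean()                   # mean of the two non-missing values

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
col_sums = df.sum(axis=0)   # one value per column
row_sums = df.sum(axis=1)   # one value per row

print(total, total_strict, product, average)
print(col_sums)
print(row_sums)
```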

In [14]:
# pandas_series_operations.py
# Covers:
# 1) Series.mul()
# 2) Series.div()
# 3) Series.sum()
# 4) Series.prod()
# 5) Series.mean()
# 6) Series.pow()
# 7) Series.abs()
# 8) Series.cov()

import pandas as pd
import numpy as np


print("\n==================== PANDAS SERIES OPERATIONS ====================\n")

# Sample Series for demonstrations
s1 = pd.Series([10, 20, 30, 40])
s2 = pd.Series([1, 2, 3, 4])


# =====================================================================
# 1) Series.mul() → elementwise multiplication
# =====================================================================
# Syntax:
#   Series.mul(other, fill_value=None)
# Parameters:
#   other       : number or another Series
#   fill_value  : value used to fill missing indexes before operation
# Meaning:
#   Performs s1 * other elementwise.
print("\n1) Series.mul() - elementwise multiplication")
print("s1:\n", s1)
print("\nMultiplying s1 * 2 → s1.mul(2)")
print(s1.mul(2))

print("\nMultiplying two series s1 * s2")
print(s1.mul(s2))



# =====================================================================
# 2) Series.div() → elementwise division
# =====================================================================
# Syntax:
#   Series.div(other, fill_value=None)
# Meaning:
#   s1 / other elementwise.
print("\n\n2) Series.div() - elementwise division")
print("s1 / 10 → s1.div(10)")
print(s1.div(10))

print("\nDividing s1 / s2 → s1.div(s2)")
print(s1.div(s2))



# =====================================================================
# 3) Series.sum() → sum of all elements
# =====================================================================
# Syntax:
#   Series.sum(skipna=True)
# Parameters:
#   skipna : ignore NaN values (default True)
# Meaning:
#   Returns scalar sum.
print("\n\n3) Series.sum() - sum of elements")
print("Sum of s1:", s1.sum())



# =====================================================================
# 4) Series.prod() → product of all elements
# =====================================================================
# Syntax:
#   Series.prod(skipna=True)
# Meaning:
#   Multiply all elements together → returns scalar.
print("\n\n4) Series.prod() - product of all elements")
print("Product of s1:", s1.prod())



# =====================================================================
# 5) Series.mean() → mean (average)
# =====================================================================
# Syntax:
#   Series.mean(skipna=True)
# Meaning:
#   Returns arithmetic mean of the series.
print("\n\n5) Series.mean() - average value")
print("Mean of s1:", s1.mean())



# =====================================================================
# 6) Series.pow() → elementwise exponentiation
# =====================================================================
# Syntax:
#   Series.pow(other, fill_value=None)
# Meaning:
#   s1 ** other (each element raised to power).
print("\n\n6) Series.pow() - exponentiation")
print("s2.pow(2)  # each element squared")
print(s2.pow(2))



# =====================================================================
# 7) Series.abs() → absolute values
# =====================================================================
# Syntax:
#   Series.abs()
# Meaning:
#   Returns absolute value of each element.
s3 = pd.Series([-5, -10, 15, -2])
print("\n\n7) Series.abs() - absolute values")
print("Original:\n", s3)
print("Absolute:\n", s3.abs())



# =====================================================================
# 8) Series.cov() → covariance between two Series
# =====================================================================
# Syntax:
#   Series.cov(other)
# Meaning:
#   Calculates covariance between this Series and another.
#
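# Important:
#   - The two Series are aligned on index labels and pairs with a missing value are dropped,
#     so the lengths do not have to match
#   - Returns scalar covariance value
#
# Covariance meaning:
#   +ve → variables tend to increase together
#   -ve → one tends to increase while the other decreases
#    0  → no linear relationship (not necessarily independence)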
#
x = pd.Series([10, 20, 30, 40, 50])
y = pd.Series([5, 15, 25, 35, 45])

print("\n\n8) Series.cov() - covariance between Series")
print("x:\n", x)
print("y:\n", y)

print("Covariance x.cov(y):", x.cov(y))



print("\n==================== DONE ====================\n")





1) Series.mul() - elementwise multiplication
s1:
 0    10
1    20
2    30
3    40
dtype: int64

Multiplying s1 * 2 → s1.mul(2)
0    20
1    40
2    60
3    80
dtype: int64

Multiplying two series s1 * s2
0     10
1     40
2     90
3    160
dtype: int64


2) Series.div() - elementwise division
s1 / 10 → s1.div(10)
0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64

Dividing s1 / s2 → s1.div(s2)
0    10.0
1    10.0
2    10.0
3    10.0
dtype: float64


3) Series.sum() - sum of elements
Sum of s1: 100


4) Series.prod() - product of all elements
Product of s1: 240000


5) Series.mean() - average value
Mean of s1: 25.0


6) Series.pow() - exponentiation
s2.pow(2)  # each element squared
0     1
1     4
2     9
3    16
dtype: int64


7) Series.abs() - absolute values
Original:
 0    -5
1   -10
2    15
3    -2
dtype: int64
Absolute:
 0     5
1    10
2    15
3     2
dtype: int64


8) Series.cov() - covariance between Series
x:
 0    10
1    20
2    30
3    40
4    50
dtype: int64
y:
 0     

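To make the covariance step in the cell above concrete, here is a hedged sketch that recomputes it by hand for the same `x` and `y` values, using the sample (n - 1) denominator that `Series.cov` applies by default. The `shorter` Series is an illustrative assumption, added to show index alignment:

```python
import pandas as pd

x = pd.Series([10, 20, 30, 40, 50])
y = pd.Series([5, 15, 25, 35, 45])

# Sample covariance: sum of products of deviations, divided by n - 1.
manual = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
builtin = x.cov(y)

# Series of different lengths are aligned on index labels first;
# only labels present in both (0, 1, 2 here) contribute.
shorter = pd.Series([1, 2, 4])
partial = x.cov(shorter)

print("manual :", manual)
print("builtin:", builtin)
print("partial:", partial)
```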
## Selection | Slicing | Other Operations

1. https://www.geeksforgeeks.org/pandas/dealing-with-rows-and-columns-in-pandas-dataframe/

2. https://www.geeksforgeeks.org/python/python-pandas-extracting-rows-using-loc/

3. https://www.geeksforgeeks.org/python/python-extracting-rows-using-pandas-iloc/

4. https://www.geeksforgeeks.org/pandas/indexing-and-selecting-data-with-pandas/

5. https://www.geeksforgeeks.org/pandas/boolean-indexing-in-pandas/

6. https://www.geeksforgeeks.org/python/python-pandas-dataframe-ix/

7. https://www.geeksforgeeks.org/python/python-pandas-series-str-slice/

8. https://www.geeksforgeeks.org/python/how-to-take-column-slices-of-dataframe-in-pandas/
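One detail from the `.loc` and `.iloc` material linked above is easy to miss: a label slice includes its end label, while a positional slice excludes its stop position. A minimal sketch on a made-up DataFrame with string labels:

```python
import pandas as pd

# Hypothetical data with a non-numeric index so labels and positions cannot be confused.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"], "Score": [85, 92, 78, 88]},
    index=["a", "b", "c", "d"],
)

by_label = df.loc["b":"c"]    # includes both 'b' and 'c'
by_position = df.iloc[1:2]    # position 1 only; the stop position is excluded

print(by_label)
print(by_position)
```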


### Other Operations

1. https://www.geeksforgeeks.org/python/python-pandas-apply/

2. https://www.geeksforgeeks.org/python/apply-function-to-every-row-in-a-pandas-dataframe/

3. https://www.geeksforgeeks.org/pandas/python-pandas-series-apply/

4. https://www.geeksforgeeks.org/python/python-pandas-dataframe-aggregate/

5. https://www.geeksforgeeks.org/python/python-pandas-dataframe-mean/

6. https://www.geeksforgeeks.org/pandas/python-pandas-series-mean/

7. https://www.geeksforgeeks.org/python/python-pandas-dataframe-mad/

8. https://www.geeksforgeeks.org/python/python-pandas-series-mad-to-calculate-mean-absolute-deviation-of-a-series/

9. https://www.geeksforgeeks.org/python/python-pandas-dataframe-sem/

10. https://www.geeksforgeeks.org/python/python-pandas-series-value_counts/

11. https://www.geeksforgeeks.org/python/applying-lambda-functions-to-pandas-dataframe/
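Two of the links above cover `mad()`. That method was deprecated in pandas 1.5 and removed in pandas 2.0, so on current versions the mean absolute deviation has to be computed by hand. A sketch, where `mean_abs_dev` is a hypothetical helper name and the data is illustrative:

```python
import pandas as pd

def mean_abs_dev(series: pd.Series) -> float:
    """Mean absolute deviation around the mean (hypothetical helper, not a pandas API)."""
    return (series - series.mean()).abs().mean()

s = pd.Series([2.0, 4.0, 4.0, 6.0, 8.0])
mad_value = mean_abs_dev(s)

# DataFrame.apply passes each column to the helper as a Series.
df = pd.DataFrame({"A": [10, 20, 30, 40], "B": [1.0, 2.0, 3.0, 6.0]})
per_column = df.apply(mean_abs_dev)

print("MAD of s:", mad_value)
print(per_column)
```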


In [15]:
# pandas_indexing_examples.py
# Demonstrates rows/columns operations, loc/iloc, boolean indexing, .ix (deprecated note),
# Series.str.slice, and column slicing. All examples are top-level with comments.

import pandas as pd
import numpy as np

print("\n=== SETUP: sample DataFrame ===\n")

# Build a sample DataFrame used across examples
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age' : [25, 30, 35, 40, 22],
    'City': ['Delhi', 'Mumbai', 'Bengaluru', 'Kolkata', 'Chennai'],
    'Score':[85, 92, 78, 88, 91]
}
df = pd.DataFrame(data)
# Show index and columns (default index 0..n-1)
print("Initial DataFrame:\n", df, "\n")
print("shape:", df.shape)
print("columns:", df.columns)
print("index:", df.index, "\n")


# ============================================================
# 1) ROWS & COLUMNS: add, drop, rename, set/reset index
# ============================================================
print("\n=== 1) Rows & Columns operations ===\n")

# Add a new column (vectorized assignment)
# df['NewCol'] = <series-like> ; new column aligned by index
df['Passed'] = df['Score'] >= 80   # boolean column
print("Added 'Passed' column:\n", df, "\n")

# Drop a column:
# df.drop(columns=['colname'], inplace=False) returns new DF by default
df_dropped = df.drop(columns=['Passed'])
print("After df.drop(columns=['Passed']) (copy):\n", df_dropped, "\n")

# Drop a row by label (index value):
# df.drop(index=label) ; axis=0 by default
df_droprow = df.drop(index=2)   # removes row with index 2 ('Charlie')
print("After df.drop(index=2):\n", df_droprow, "\n")

# Rename columns:
# df.rename(columns={'old':'new'}, inplace=False)
print("Rename 'City' -> 'Location':\n", df.rename(columns={'City':'Location'}), "\n")

# Set an existing column as index:
# df.set_index('Name', inplace=False)
df_indexed = df.set_index('Name')
print("Set 'Name' as index (new DataFrame):\n", df_indexed, "\n")

# Reset index back to default:
print("Reset index (back to numeric):\n", df_indexed.reset_index(), "\n")


# ============================================================
# 2) loc — label-based indexing
# ============================================================
print("\n=== 2) .loc (label-based) ===\n")

# Basic: df.loc[row_label, col_label]
# When index is default integers, row_label is integer index value
print("Row with label/index 1 (as Series):\n", df.loc[1], "\n")

# Select multiple rows and columns by labels:
# df.loc[[row_labels], [col_labels]]
print("Rows 1 & 3, columns 'Name' and 'Score':\n", df.loc[[1,3], ['Name','Score']], "\n")

# Slicing with labels (inclusive of the end label for loc)
# df.loc[start_label : end_label, start_col : end_col]
print("Rows 1 to 3 (inclusive), columns 'Name' to 'City' (inclusive):\n",
      df.loc[1:3, 'Name':'City'], "\n")

# Selecting all rows but specific columns:
print("All rows, columns 'Name' and 'Age':\n", df.loc[:, ['Name','Age']], "\n")

# Boolean condition with loc:
# df.loc[df['Age'] > 30, ['Name','Score']]
print("Select rows where Age > 30 (loc + boolean mask):\n",
      df.loc[df['Age'] > 30, ['Name','Score']], "\n")

# Assigning using loc (in-place)
# df.loc[mask, 'col'] = value  — modifies DataFrame in-place
df.loc[df['Name'] == 'Alice', 'Score'] = 87
print("After updating Alice's Score with loc assignment:\n", df, "\n")


# ============================================================
# 3) iloc — integer position based indexing
# ============================================================
print("\n=== 3) .iloc (position-based) ===\n")

# iloc uses integer positions [row_pos, col_pos], zero-based and end-exclusive for slices
print("First row by position (iloc[0]):\n", df.iloc[0], "\n")

# Select rows 1..3 by position (end-exclusive), and columns 0..2
print("df.iloc[1:4, 0:3] -> rows pos 1..3, cols pos 0..2:\n", df.iloc[1:4, 0:3], "\n")

# Fancy indexing by positions: pass lists of integer positions
print("Rows at positions [0,2], columns [1,3] ->\n", df.iloc[[0,2],[1,3]], "\n")

# Negative indices allowed (like Python lists)
print("Last row with iloc[-1]:\n", df.iloc[-1], "\n")


# ============================================================
# 4) Combined examples: loc vs iloc differences
# ============================================================
print("\n=== 4) loc vs iloc differences ===\n")

# If index labels are integers and non-default, loc treats them as labels (not positions)
df2 = df.set_index('Age')   # index now values [25,30,35,40,22]
print("df2 (Age as index):\n", df2, "\n")
# df2.loc[30] -> uses label 30 (row where Age==30)
print("df2.loc[30] -> row with index label 30:\n", df2.loc[30], "\n")
# df2.iloc[1] -> second row by position
print("df2.iloc[1] -> second row by position:\n", df2.iloc[1], "\n")


# ============================================================
# 5) Boolean indexing (filtering)
# ============================================================
print("\n=== 5) Boolean indexing ===\n")

# Boolean mask example:
mask = (df['Score'] >= 90) & (df['Age'] < 35)
print("Mask (Score>=90 & Age<35):", mask.tolist())
print("Filtered rows with mask:\n", df[mask], "\n")

# Use .query() as a string-based filter alternative (useful for readability)
print("Using df.query('Score>=90 and Age < 35'):\n", df.query('Score >= 90 and Age < 35'), "\n")


# ============================================================
# 6) .ix (DEPRECATED) — explanation and safe alternative
# ============================================================
print("\n=== 6) .ix: removed (do NOT use) ===\n")
print("Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0. It mixed label-based and position-based lookup, which made it ambiguous.")
print("Use .loc for label-based selection and .iloc for position-based selection.\n")

# For historical demonstration only: show recommended replacements
# Example intention: "select row with label 1 and column 'Name'"
print("Use .loc[1,'Name'] if 1 is a label; use .iloc[1,0] if 1 is a position.\n")


# ============================================================
# 7) Series string slicing (.str.slice and other .str helpers)
# ============================================================
print("\n=== 7) Series string operations (.str.slice) ===\n")

# Suppose we want first 3 characters of each Name
names = df['Name']
print("Original names:\n", names.tolist())
# Series.str.slice(start, stop, step) -> works on string Series
# start inclusive, stop exclusive (like Python slicing)
print("First 3 chars (names.str.slice(0,3)):\n", names.str.slice(0, 3).tolist(), "\n")

# Other useful .str methods: .lower(), .upper(), .contains(), .split(), .replace()
print("names.str.upper():", names.str.upper().tolist())
print("names.str.contains('a') -> boolean Series indicating whether 'a' appears:\n", names.str.contains('a'))


# ============================================================
# 8) Column slices: label-range and positional slicing
# ============================================================
print("\n=== 8) Column slicing ===\n")

# 8A: Label-range slicing (inclusive) with loc:
# df.loc[:, 'Name':'City'] -> selects all rows and columns from 'Name' through 'City' (inclusive)
print("Columns 'Name' through 'City' (label-range with loc):\n", df.loc[:, 'Name':'City'], "\n")

# 8B: Column list to select specific columns:
print("Select columns by list ['City','Score']:\n", df[['City','Score']], "\n")

# 8C: Positional column slice using iloc:
# df.iloc[:, 1:3] -> selects columns at positions 1 and 2 (end-exclusive)
print("Columns by positional slice iloc[:, 1:3] ->\n", df.iloc[:, 1:3], "\n")

# 8D: Using filter to choose columns by regex or like
print("Columns with name starting with 'S' using filter(regex):\n", df.filter(regex='^S').columns.tolist(), "\n")


# ============================================================
# 9) Accessing multiple rows in different ways (examples)
# ============================================================
print("\n=== 9) Accessing different rows ===\n")

# contiguous rows by slice
print("df[1:4] -> rows positions 1..3 (slice by position on default index):\n", df[1:4], "\n")

# rows by explicit list of labels (fancy indexing)
print("df.loc[[0,2,4]] -> rows with labels 0,2,4:\n", df.loc[[0,2,4]], "\n")

# selecting by boolean masks built from multiple conditions:
print("Rows where City == 'Mumbai' or Score>90:\n", df[(df['City']=='Mumbai') | (df['Score']>90)], "\n")


# ============================================================
# 10) Good practices & tips
# ============================================================
print("\n=== 10) Tips & best practices ===\n")
print("- Prefer .loc and .iloc explicitly; they are unambiguous.")
print("- Use boolean masks for filtering; combine masks with & and | and wrap conditions with parentheses.")
print("- Use df.at[row_label, col_label] or df.iat[row_pos, col_pos] for fast scalar access/assignment.")
print("- Avoid chained indexing like df[df['A']>0]['B'] = val (may cause SettingWithCopyWarning). Use loc instead.")
print("- When index labels are integers, be careful: loc uses labels, iloc uses positions.\n")

print("=== DONE ===\n")



=== SETUP: sample DataFrame ===

Initial DataFrame:
       Name  Age       City  Score
0    Alice   25      Delhi     85
1      Bob   30     Mumbai     92
2  Charlie   35  Bengaluru     78
3    David   40    Kolkata     88
4      Eve   22    Chennai     91 

shape: (5, 4)
columns: Index(['Name', 'Age', 'City', 'Score'], dtype='object')
index: RangeIndex(start=0, stop=5, step=1) 


=== 1) Rows & Columns operations ===

Added 'Passed' column:
       Name  Age       City  Score  Passed
0    Alice   25      Delhi     85    True
1      Bob   30     Mumbai     92    True
2  Charlie   35  Bengaluru     78   False
3    David   40    Kolkata     88    True
4      Eve   22    Chennai     91    True 

After df.drop(columns=['Passed']) (copy):
       Name  Age       City  Score
0    Alice   25      Delhi     85
1      Bob   30     Mumbai     92
2  Charlie   35  Bengaluru     78
3    David   40    Kolkata     88
4      Eve   22    Chennai     91 

After df.drop(index=2):
     Name  Age     City  S

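The filtering and tips sections of the cell above mention boolean masks, `query()`, `.at`/`.iat` and assignment through `.loc`. The sketch below puts them side by side on a small made-up frame; it illustrates those calls and is not a reproduction of the cell's output:

```python
import pandas as pd

# Illustrative data, not the notebook's full DataFrame.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Eve"],
    "Age": [25, 30, 22],
    "Score": [85, 92, 91],
})

# A boolean mask and the equivalent query string select the same rows.
mask = (df["Score"] >= 90) & (df["Age"] < 35)
filtered_mask = df[mask]
filtered_query = df.query("Score >= 90 and Age < 35")

# Scalar access: .at takes labels, .iat takes integer positions.
bob_score = df.at[1, "Score"]
eve_age = df.iat[2, 1]

# Assigning through .loc writes to df itself, avoiding chained indexing.
df.loc[df["Name"] == "Alice", "Score"] = 87

print(filtered_mask)
print(bob_score, eve_age)
print(df)
```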
In [16]:
# pandas_apply_agg_examples.py
# Demonstrates: DataFrame.apply, Series.apply, DataFrame.agg, mean, mad (manual), sem, value_counts,
# apply with lambda (row-wise), applymap (elementwise), and usage notes.
#
# Run: python pandas_apply_agg_examples.py

import pandas as pd
import numpy as np

print("\n=== SETUP SAMPLE DATA ===\n")

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [1.5, 2.5, 3.5, np.nan],
    'C': ['x', 'y', 'x', 'z']
})

print("Sample DataFrame:\n", df, "\n")


# ============================================================
# 1) DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
# - Applies func along an axis: axis=0 passes each column to func, axis=1 passes each row.
# - If func returns a Series for each input, results can be combined into a DataFrame.
# - result_type controls how list-like results are combined (None, 'expand', 'reduce', 'broadcast').
# ============================================================

print("\n=== 1) DataFrame.apply examples ===\n")

# Example B: apply function to each ROW (axis=1)
# Here the function receives a Series for the row (index = column names)
def row_sum(row):
    # sum columns A and B for this row (Series.sum skips NaN by default)
    return row[['A', 'B']].sum()

print("Apply row_sum to each row (axis=1):")
print(df.apply(row_sum, axis=1))
print()

# Example C: result_type='expand' when func returns a sequence for each row
# result_type='expand' will expand sequences into columns (DataFrame)
def row_stats(row):
    a = row['A']
    b = row['B']
    return (a + b, a - b)   # 2-tuple → will expand

print("Apply row_stats with result_type='expand' to convert tuples to columns:")
print(df.apply(row_stats, axis=1, result_type='expand'))
print()

# Example D: passing additional args via args=()
def add_scalar(col, scalar):
    """Add scalar to column (works when apply passes Series col)."""
    return col + scalar

print("Apply add_scalar to each column with args=(5,):")
print(df[['A','B']].apply(add_scalar, args=(5,)))   # only numeric columns shown for clarity
print()


# ============================================================
# 2) Series.apply(func, convert_dtype=True, args=(), **kwargs)
# - Applies function elementwise to Series values (or a ufunc to whole Series).
# - convert_dtype: try to find a better dtype for the results (default True; deprecated since pandas 2.1).
# ============================================================

print("\n=== 2) Series.apply examples ===\n")

s = pd.Series([1, 4, 9, 16])

# Apply sqrt elementwise (function receives scalar)
print("Series.apply with sqrt (elementwise):")
print(s.apply(np.sqrt))   # equivalent to s.map(np.sqrt) for elementwise
print()

# Using a Python function with args
def power(x, p=2):
    return x ** p

print("Series.apply with custom function + args (p=3):")
print(s.apply(power, p=3))   # passing keyword arg works too (via kwargs)
print()

# Note: for vectorized NumPy ufuncs prefer calling ufunc on Series directly (faster):
print("Direct NumPy ufunc (faster) - np.sqrt(s):")
print(np.sqrt(s))
print()


# ============================================================
# 3) DataFrame.agg (alias: DataFrame.aggregate): flexible aggregations
# - Accepts function name string, function, list of functions, or dict mapping column->functions
# - Aggregation is performed over an axis (default axis=0 meaning aggregate each column)
# ============================================================

print("\n=== 3) DataFrame.agg examples ===\n")

print("Column-wise mean and sum using a list of functions:")
print(df[['A','B']].agg(['mean', 'sum']))    # returns DataFrame with rows = agg names, cols = original columns
print()

print("Different aggregations per column using dict:")
print(df.agg({'A': ['mean', 'min'], 'B': ['mean', 'std']}))
print()

# Named aggregation (the same keyword syntax works with groupby), here on the full DataFrame
print("Named aggregation (one row per output name, NaN where a column is not used):")
print(df.agg(A_mean=('A', 'mean'), B_sum=('B', 'sum')))
print()
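
# A minimal sketch of the same named-aggregation syntax used with groupby.
# Assumption: `agg_demo` and `grouped` are illustrative names; the data mirrors the sample frame printed in the setup step.
import numpy as np
import pandas as pd

agg_demo = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [1.5, 2.5, 3.5, np.nan],
    'C': ['x', 'y', 'x', 'z'],
})
# Each keyword becomes an output column: name=(source column, aggregation)
grouped = agg_demo.groupby('C').agg(A_mean=('A', 'mean'), B_sum=('B', 'sum'))
print("groupby('C') with named aggregation:")
print(grouped)   # note: the sum of an all-NaN group is 0.0 by default
print()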


# ============================================================
# 4) mean() — Series.mean() and DataFrame.mean()
# - Parameters: axis, skipna (default True), numeric_only (the older `level` parameter was removed in pandas 2.0)
# ============================================================

print("\n=== 4) mean() examples ===\n")

print("Mean of column A:", df['A'].mean())
print("Mean across columns for each row (numeric only):")
print(df[['A','B']].mean(axis=1))   # axis=1 computes per-row mean (over columns)
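
# A minimal sketch of how skipna changes the result when NaN is present.
# Assumption: `mean_demo`, `mean_skip` and `mean_noskip` are illustrative names, not part of the original script.
import numpy as np
import pandas as pd

mean_demo = pd.Series([1.5, 2.5, 3.5, np.nan])
mean_skip = mean_demo.mean()                 # NaN ignored: (1.5 + 2.5 + 3.5) / 3
mean_noskip = mean_demo.mean(skipna=False)   # NaN propagates into the result
print("mean with skipna=True :", mean_skip)
print("mean with skipna=False:", mean_noskip)
print()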


# ============================================================
# 5) mad() — mean absolute deviation
# - NOTE: DataFrame.mad and Series.mad were deprecated in pandas 1.5 and removed in pandas 2.0.
# - Manual equivalent: (s - s.mean()).abs().mean()
# ============================================================

print("\n=== 5) MAD (mean absolute deviation) examples ===\n")

s2 = pd.Series([2.0, 4.0, 4.0, 6.0, 8.0])

# Manual MAD (preferred since mad() deprecation)
mad_manual = (s2 - s2.mean()).abs().mean()
print("Manual MAD for s2:", mad_manual)

# s2.mad() only exists in pandas < 2.0, so the manual form is the portable choice.
print("Equivalent expression: (s - s.mean()).abs().mean()")
print()
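
# A minimal sketch extending the manual MAD expression to every column of a DataFrame.
# Assumption: `mad_demo` and `mad_cols` are illustrative names; NaN values are skipped by mean() as usual.
import numpy as np
import pandas as pd

mad_demo = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [1.5, 2.5, 3.5, np.nan]})
# Subtracting the column means broadcasts along the rows; abs().mean() then averages per column
mad_cols = (mad_demo - mad_demo.mean()).abs().mean()
print("Column-wise manual MAD:")
print(mad_cols)
print()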


# ============================================================
# 6) sem() — standard error of the mean
# - DataFrame.sem() / Series.sem()
# - Parameters include: axis, skipna=True, ddof=1 (delta degrees of freedom), numeric_only
# - sem = std(ddof=ddof) / sqrt(N), where N is the number of non-NaN values
# ============================================================
print("\n=== 6) sem() examples ===\n")

print("Standard error of column A (Series.sem):", df['A'].sem())   # ddof default 1
print("Standard error across rows (DataFrame.sem axis=1):\n", df[['A','B']].sem(axis=1))
print()
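
# A minimal sketch checking sem() against the std / sqrt(N) formula.
# Assumption: `sem_demo`, `manual_sem` and `manual_sem0` are illustrative names, not part of the original script.
import numpy as np
import pandas as pd

sem_demo = pd.Series([10, 20, 30, 40])
manual_sem = sem_demo.std(ddof=1) / np.sqrt(sem_demo.count())    # default ddof=1
manual_sem0 = sem_demo.std(ddof=0) / np.sqrt(sem_demo.count())   # population form, ddof=0
print("sem() vs manual (ddof=1):", sem_demo.sem(), manual_sem)
print("sem(ddof=0) vs manual   :", sem_demo.sem(ddof=0), manual_sem0)
print()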


# ============================================================
# 7) Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
# - Returns counts of unique values as a Series (descending by count by default)
# ============================================================
print("\n=== 7) value_counts() examples ===\n")

print("Value counts for column C:")
print(df['C'].value_counts())   # counts of 'x','y','z'
print("Relative frequencies (normalize=True):")
print(df['C'].value_counts(normalize=True))
print()
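
# A minimal sketch of the dropna and bins parameters listed in the signature above.
# Assumption: `vc_demo`, `counts_with_nan`, `counts_default` and `binned` are illustrative names.
import pandas as pd

vc_demo = pd.Series(['x', 'y', 'x', None])
counts_default = vc_demo.value_counts()               # missing values are dropped by default
counts_with_nan = vc_demo.value_counts(dropna=False)  # missing values get their own row
print("value_counts(dropna=False):")
print(counts_with_nan)

# bins groups numeric values into equal-width intervals before counting
binned = pd.Series([10, 20, 30, 40]).value_counts(bins=2)
print("value_counts(bins=2):")
print(binned)
print()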


# ============================================================
# 8) apply with lambda across rows/columns + elementwise applymap / DataFrame.map
# - Use df.apply(lambda row: ..., axis=1) for functions that work on one row at a time
# - Use df.applymap(func) to apply func to each entry of a DataFrame (renamed to DataFrame.map in pandas 2.1; applymap is deprecated there)
# ============================================================
print("\n=== 8) apply with lambda and applymap ===\n")

# Row-wise: create summary column by combining A and B
df['A_plus_B'] = df.apply(lambda r: (r['A'] + (r['B'] if pd.notna(r['B']) else 0)), axis=1)
print("After row-wise lambda (A + B with NaN safe):\n", df[['A','B','A_plus_B']], "\n")

# Elementwise via applymap: convert numeric cells to strings with 'v=' prefix
def prefix_v(x):
    # apply only to numeric-ish values; leave strings as-is
    if isinstance(x, (int, float, np.integer, np.floating)):
        return f"v={x}"
    return x

print("Elementwise DataFrame.map example (numeric -> 'v=...'):")
print(df[['A','B']].map(prefix_v))   # DataFrame.map needs pandas >= 2.1; on older versions call applymap instead
print()

# NOTE: where possible, prefer vectorized operations (e.g. df['A'] + df['B'].fillna(0)) over row-wise apply for speed.
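
# A minimal sketch comparing the row-wise lambda with two vectorized equivalents.
# Assumption: `vec_demo`, `row_wise`, `vectorized` and `with_fill` are illustrative names; the data mirrors columns A and B.
import numpy as np
import pandas as pd

vec_demo = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [1.5, 2.5, 3.5, np.nan]})
row_wise = vec_demo.apply(lambda r: r['A'] + (r['B'] if pd.notna(r['B']) else 0), axis=1)
vectorized = vec_demo['A'] + vec_demo['B'].fillna(0)        # replace NaN with 0, then add whole columns
with_fill = vec_demo['A'].add(vec_demo['B'], fill_value=0)  # same idea using Series.add
print("Row-wise and vectorized results match:", row_wise.equals(vectorized))
print()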


# ============================================================
# 9) Performance notes and tips (short)
# - Series.apply and DataFrame.apply can call Python code per-element/row/col -> slower than vectorized ops
# - Use NumPy ufuncs directly on Series when possible (np.sqrt(series)), or vectorized pandas ops (df['A'] + 5)
# - Use apply/agg for flexible transformations/aggregations when vectorized ops are not available
# ============================================================
print("\n=== 9) Performance tips ===\n")
print("- Prefer vectorized pandas/NumPy operations for speed.")
print("- Use apply/agg when you need custom Python-level logic.")
print("- For elementwise scalar transformations use Series.map/.apply for clarity (but still Python-level).")
print()

print("=== DONE: pandas apply/agg examples ===\n")



=== SETUP SAMPLE DATA ===

Sample DataFrame:
     A    B  C
0  10  1.5  x
1  20  2.5  y
2  30  3.5  x
3  40  NaN  z 


=== 1) DataFrame.apply examples ===

Apply row_sum to each row (axis=1):
0    11.5
1    22.5
2    33.5
3    40.0
dtype: float64

Apply row_stats with result_type='expand' to convert tuples to columns:
      0     1
0  11.5   8.5
1  22.5  17.5
2  33.5  26.5
3   NaN   NaN

Apply add_scalar to each column with args=(5,):
    A    B
0  15  6.5
1  25  7.5
2  35  8.5
3  45  NaN


=== 2) Series.apply examples ===

Series.apply with sqrt (elementwise):
0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64

Series.apply with custom function + args (p=3):
0       1
1      64
2     729
3    4096
dtype: int64

Direct NumPy ufunc (faster) - np.sqrt(s):
0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64


=== 3) DataFrame.agg examples ===

Column-wise mean and sum using a list of functions:
          A    B
mean   25.0  2.5
sum   100.0  7.5

Different aggregations per column using dict:



| Function | Description |
|---|---|
| DataFrame.iat[] | Access a single value for a row/column pair by integer position. |
| DataFrame.pop() | Return item and drop from DataFrame. |
| DataFrame.xs() | Return a cross-section (row(s) or column(s)) from the DataFrame. |
| DataFrame.get() | Get item from object for given key (e.g. a DataFrame column). |
| DataFrame.isin() | Return a boolean DataFrame showing whether each element is contained in values. |
| DataFrame.where() | Return an object of the same shape with entries from self where cond is True, otherwise from other. |
| DataFrame.mask() | Return an object of the same shape with entries from self where cond is False, otherwise from other. |
| DataFrame.insert() | Insert a column into DataFrame at a specified location. |
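
The functions listed above can be tried on a small frame. The following is a minimal sketch rather than a definitive reference; the frame `demo` and all variable names are assumptions made for illustration, and the data mirrors the sample DataFrame used earlier.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [1.5, 2.5, 3.5, np.nan],
    'C': ['x', 'y', 'x', 'z'],
})

first_b = demo.iat[0, 1]      # single value at row position 0, column position 1
row_two = demo.xs(2)          # cross-section: the row labelled 2, returned as a Series
col_a = demo.get('A')         # column 'A', like demo['A']
missing = demo.get('Z')       # missing key: returns None instead of raising KeyError

in_xy = demo[['C']].isin(['x', 'y'])               # boolean DataFrame, True where the value is 'x' or 'y'
kept = demo[['A']].where(demo[['A']] > 15, 0)      # keep values where the condition is True, else 0
hidden = demo[['A']].mask(demo[['A']] > 15, 0)     # replace values where the condition is True with 0

demo.insert(1, 'A_sq', demo['A'] ** 2)   # new column at position 1 (modifies demo in place)
popped = demo.pop('C')                   # returns column 'C' and removes it from demo

print(first_b, missing)
print(row_two)
print(in_xy, kept, hidden, sep="\n")
print(demo.columns.tolist(), popped.tolist())
```

Note that `insert()` and `pop()` change the frame in place, while the other calls return new objects and leave `demo` untouched.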

## Data Manipulation and Grouping