In [6]:
import pandas as pd
import numpy as np

In [50]:
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]}, index=['a', 'b', 'c', 'd'])
df

Unnamed: 0,A,B,C
a,1,5,9
b,2,6,10
c,3,7,11
d,4,8,12


In [51]:
# loc accessor: label-based indexing.
# The row labels here are 'a'..'d', so integer keys such as [0, 1] raise a KeyError.
df.loc[['a', 'b'], ['A', 'B']]

Unnamed: 0,A,B
a,1,5
b,2,6

In [None]:
# iloc Accessor: The iloc accessor is used for integer position-based indexing.
df.iloc[[0, 1], [0, 2]]

In [None]:
# Boolean Indexing
df[df['A'] > 2]  # keep only the rows where column A is greater than 2
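
Boolean masks can also be combined or passed to `loc`. The following is a minimal sketch on the same four-row frame; the variable names `mask`, `both`, and `subset` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]},
    index=['a', 'b', 'c', 'd'],
)

# A comparison yields a boolean Series aligned with the row index
mask = df['A'] > 2

# Combine conditions with & (and), | (or), ~ (not).
# Each condition needs its own parentheses because of operator precedence.
both = df[(df['A'] > 1) & (df['B'] < 8)]

# loc accepts a boolean mask for the rows plus a column selection
subset = df.loc[mask, ['A', 'C']]

print(both)
print(subset)
```

Python's `and` / `or` keywords do not work element-wise on Series, which is why the bitwise operators are used here.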

In [None]:
# Indexing with MultiIndex

# Creating a MultiIndex for rows
index = pd.MultiIndex.from_tuples(
    [('USA', 2020), ('USA', 2021), ('Canada', 2020), ('Canada', 2021)],
    names=['Country', 'Year']
)

# Creating a DataFrame using the MultiIndex
df = pd.DataFrame({
    'Population': [331, 332, 38, 39],
    'GDP': [21.43, 22.68, 1.64, 1.70]
}, index=index)

print(df)

In [None]:
df.loc[('USA', 2020), 'Population']
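
Beyond a full `(Country, Year)` key, a MultiIndex can be queried one level at a time. This sketch rebuilds the same frame and sorts the index first, which is an added step (not in the cell above) that avoids performance warnings on unsorted MultiIndexes. The variable names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('USA', 2020), ('USA', 2021), ('Canada', 2020), ('Canada', 2021)],
    names=['Country', 'Year'],
)
df = pd.DataFrame(
    {'Population': [331, 332, 38, 39], 'GDP': [21.43, 22.68, 1.64, 1.70]},
    index=index,
).sort_index()  # assumption: sorting is optional here, but generally recommended

# Full key on both levels plus a column label returns a scalar
usa_2020_pop = df.loc[('USA', 2020), 'Population']

# Outer level only returns a DataFrame indexed by the remaining level (Year)
usa = df.loc['USA']

# xs() takes a cross-section on any level, here the inner one
year_2021 = df.xs(2021, level='Year')

print(usa_2020_pop)
print(usa)
print(year_2021)
```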

In [None]:
# Indexing with DatetimeIndex

date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')

df = pd.DataFrame({
    'value': np.random.randn(len(date_range))
}, index=date_range)

print(df.head())

In [None]:
# Selecting data for January 2023.
# Run this after the DatetimeIndex cell above, so that df is the date-indexed frame;
# on the earlier A/B/C frame the same slice returns an empty DataFrame.
jan_2023_data = df.loc['2023-01-01':'2023-01-31']
print(jan_2023_data)
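
Date-based selection only works when the frame really has a DatetimeIndex. The sketch below is self-contained and uses a separate, hypothetical frame named `ts` with deterministic values (instead of random ones) so the results are easy to check.

```python
import numpy as np
import pandas as pd

date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
ts = pd.DataFrame({'value': np.arange(len(date_range))}, index=date_range)

# Partial string indexing: a year-month string selects the whole month
jan = ts.loc['2023-01']

# Explicit label slice: both endpoints are included
first_week = ts.loc['2023-01-01':'2023-01-07']

# A boolean mask built from the index attributes is another option
feb = ts[ts.index.month == 2]

print(len(jan), len(first_week), len(feb))
```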


In [54]:
series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series

a    1
b    2
c    3
d    4
dtype: int64

In [55]:
series['a']

1

In [56]:
series.iloc[0]  # positional access; series[0] on a label index is deprecated and emits a FutureWarning

1

In [57]:
series['b':'d']

b    2
c    3
d    4
dtype: int64

In [58]:
series[1:3]

b    2
c    3
dtype: int64

In [59]:
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]}, index=['a', 'b', 'c', 'd'])
df

Unnamed: 0,A,B,C
a,1,5,9
b,2,6,10
c,3,7,11
d,4,8,12


In [60]:
df['A']

a    1
b    2
c    3
d    4
Name: A, dtype: int64

In [61]:
df.A

a    1
b    2
c    3
d    4
Name: A, dtype: int64

Note that while dot notation is concise, it has limitations. It works only when the column name (or Series index label) is a valid Python identifier, so `s.1` is not allowed, and only when the name does not collide with an existing attribute or method such as `min`. In those cases, use bracket notation instead, e.g. `df['min']`.
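
The limits of attribute access are easiest to see on a frame whose column names are awkward. This is a small sketch; the column names `min` and `unit price` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'min': [3, 4], 'unit price': [5.0, 6.0]})

# df.A works because 'A' is a valid identifier and not an existing attribute
col_a = df.A

# df.min is the DataFrame.min method, not the 'min' column
is_method = callable(df.min)

# Bracket notation works for every column name
min_col = df['min']
price = df['unit price']  # df.unit price would be a syntax error

print(is_method)
print(min_col.tolist(), price.tolist())
```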

In [62]:
df[['A', 'C']]

Unnamed: 0,A,C
a,1,9
b,2,10
c,3,11
d,4,12


In [63]:
df.loc['a']

A    1
B    5
C    9
Name: a, dtype: int64

In [64]:
df.iloc[0]

A    1
B    5
C    9
Name: a, dtype: int64

In [65]:
df.loc[['a', 'c']]

Unnamed: 0,A,B,C
a,1,5,9
c,3,7,11


In [66]:
df.iloc[[0,2]]

Unnamed: 0,A,B,C
a,1,5,9
c,3,7,11


In [67]:
df.loc['a':'c'] # Output: DataFrame with rows from index label 'a' to 'c' (inclusive)

Unnamed: 0,A,B,C
a,1,5,9
b,2,6,10
c,3,7,11


In [68]:
df.iloc[0:3] # Output: DataFrame with rows from integer position 0 to 3 (exclusive)

Unnamed: 0,A,B,C
a,1,5,9
b,2,6,10
c,3,7,11


In [69]:
df.loc[:, 'A':'C'] # Output: DataFrame with columns from 'A' to 'C' (inclusive)

Unnamed: 0,A,B,C
a,1,5,9
b,2,6,10
c,3,7,11
d,4,8,12


In [70]:
df.iloc[:, 0:2] # Output: DataFrame with columns from integer position 0 to 2 (exclusive)

Unnamed: 0,A,B
a,1,5
b,2,6
c,3,7
d,4,8


In [71]:
# Integer-position slices (iloc): the end position is exclusive
# Label slices (loc): the end label is inclusive
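
The inclusive/exclusive rule matters most when the labels are themselves integers, because the same slice then means different things to `loc` and `iloc`. A minimal sketch, using a hypothetical Series `s` whose labels start at 1:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])

# loc treats 1 and 3 as labels and includes both endpoints
by_label = s.loc[1:3]

# iloc treats 1 and 3 as positions and excludes the end position
by_position = s.iloc[1:3]

print(by_label.tolist())
print(by_position.tolist())
```

Being explicit about `loc` or `iloc` avoids having to remember how plain `[]` slicing resolves integer keys.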

## Random Sampling in Pandas

In data analysis and machine learning, it is often useful to select random samples from your dataset. Pandas provides a convenient way to generate random samples from Series and DataFrames using the `sample()` method. This allows you to create subsets of your data for various purposes, such as testing, validation, or exploratory analysis.

`sample()` returns a new Series or DataFrame containing the randomly selected items and leaves the original object unchanged. Several parameters control the sampling process:

### Parameters of `sample()`:

- **`n`**: The number of items to return. Defaults to 1 when neither `n` nor `frac` is given. It cannot be combined with `frac`.
- **`frac`**: The fraction of items to return, used instead of `n`. It must be between 0 and 1 unless `replace=True`, in which case values above 1 oversample the data.
- **`replace`**: Whether to sample with replacement. If `True`, the same item can be selected more than once. Defaults to `False`.
- **`weights`**: Relative sampling probabilities for each item. Weights that do not sum to 1 are normalized. If not specified, all items are equally likely.
- **`random_state`**: The seed for the random number generator. Passing a fixed value makes the sample reproducible.
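
The parameters above can be combined in a few common patterns. This is a sketch on the same four-row frame; the 75/25 split ratio and the seed value `0` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]},
    index=['a', 'b', 'c', 'd'],
)

# The same seed always selects the same rows
first = df.sample(n=2, random_state=0)
second = df.sample(n=2, random_state=0)

# A simple train/test split: sample a fraction, then drop those rows
train = df.sample(frac=0.75, random_state=0)
test = df.drop(train.index)

# Oversampling: frac above 1 requires replace=True
boot = df.sample(frac=2, replace=True, random_state=0)

print(first.equals(second))
print(len(train), len(test), len(boot))
```

The split via `drop(train.index)` relies on the index labels being unique, as they are here.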

In [72]:
df.sample(frac=0.5)

Unnamed: 0,A,B,C
b,2,6,10
c,3,7,11


In [73]:
df.sample(n=3, replace=True)

Unnamed: 0,A,B,C
a,1,5,9
a,1,5,9
c,3,7,11


In [75]:
df

Unnamed: 0,A,B,C
a,1,5,9
b,2,6,10
c,3,7,11
d,4,8,12


In [77]:
# Specifying a sampling probability for each row
# (pandas normalizes weights that do not sum to 1)
weights = [0.1, 0.2, 0.3, 0.4]
df.sample(n=3, weights=weights)

Unnamed: 0,A,B,C
c,3,7,11
d,4,8,12
b,2,6,10


In [81]:
df.sample(n=3, random_state=42)

Unnamed: 0,A,B,C
b,2,6,10
d,4,8,12
a,1,5,9


Pandas offers two accessors optimized for reading and writing a single scalar value in a DataFrame:

- **`at`**: Label-based scalar access and assignment. It takes a row label and a column label.
- **`iat`**: Integer-position-based scalar access and assignment. It takes a row position and a column position.

In [82]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df

Unnamed: 0,A,B
a,1,4
b,2,5
c,3,6


In [83]:
df.at['b', 'A']

2

In [84]:
df.at['b', 'A'] = 10
df

Unnamed: 0,A,B
a,1,4
b,10,5
c,3,6


In [85]:
df.iat[1, 0]

10

In [86]:
df.iat[1, 0] = 100
df

Unnamed: 0,A,B
a,1,4
b,100,5
c,3,6


Using `at` and `iat` for scalar access and modification is faster than using `loc`, `iloc`, or standard indexing with square brackets `[]`. They are optimized for single values and bypass some of the overhead that the more flexible indexers incur.

The gain applies only to single scalar values, since `at` and `iat` accept only scalar keys. To access or modify multiple values or slices of data, use `loc` or `iloc`, which still perform well for those cases.
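
One way to check the difference on your own machine is a rough timing with the standard-library `timeit` module. This is only a sketch: the repetition count is arbitrary, and absolute numbers vary with hardware and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# All four accessors return the same scalar for the same cell
same_value = (
    df.at['b', 'A'] == df.loc['b', 'A'] == df.iat[1, 0] == df.iloc[1, 0]
)

# Time 10,000 single-cell reads with each label-based accessor
t_at = timeit.timeit(lambda: df.at['b', 'A'], number=10_000)
t_loc = timeit.timeit(lambda: df.loc['b', 'A'], number=10_000)

print(f"at:  {t_at:.4f} s")
print(f"loc: {t_loc:.4f} s")
```

On a three-row frame the absolute times are tiny; the difference becomes relevant mainly in loops that read or write many individual cells.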