## LOC

`loc` is the label-based selector in pandas. The cell below uses it to select a single value, a whole row, and a whole column of a DataFrame by label.

In [2]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])

# Use loc to select rows and columns by label
print(df.loc['row1', 'A'])  # Output: 1
print(df.loc['row1'])       # Output: A    1
                            #         B    4
                            # Name: row1, dtype: int64
print(df.loc[:, 'A'])       # Output: row1    1
                            #         row2    2
                            #         row3    3
                            # Name: A, dtype: int64


1
A    1
B    4
Name: row1, dtype: int64
row1    1
row2    2
row3    3
Name: A, dtype: int64
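
Beyond single labels, `loc` also accepts label slices and lists of labels. The sketch below rebuilds the same small DataFrame so it runs on its own; the names `subset` and `picked` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])

# Label slices include BOTH endpoints, unlike ordinary Python slices
subset = df.loc['row1':'row2', 'A':'B']
print(subset)

# A list of labels selects exactly those rows, in the order given
picked = df.loc[['row3', 'row1'], ['B']]
print(picked)
```

Passing a list for the columns (`['B']`) keeps the result a DataFrame, whereas a bare label (`'B'`) would return a Series.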


In [3]:

(
    df
    .groupby("A")
    .sum()
    .reset_index()
    .loc[:, "B"]
)

0    4
1    5
2    6
Name: B, dtype: int64

In [4]:
df

      A  B
row1  1  4
row2  2  5
row3  3  6


## ILOC

`iloc` selects by integer position rather than by label. The cell below picks the same value, row, and column positions from the DataFrame, counting from 0.

In [5]:
# Use iloc to select rows and columns by integer location
print(df.iloc[0, 0])  # Output: 1
print(df.iloc[0])     # Output: A    1
                      #         B    4
                      # Name: row1, dtype: int64
print(df.iloc[:, 1])  # Output: row1    4
                      #         row2    5
                      #         row3    6
                      # Name: B, dtype: int64


1
A    1
B    4
Name: row1, dtype: int64
row1    4
row2    5
row3    6
Name: B, dtype: int64
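
Positional slices with `iloc` follow ordinary Python rules: the end position is excluded and negative positions count from the end. A minimal sketch on the same data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])

# Positions 0 and 1 only; the end of a positional slice is excluded
first_two = df.iloc[0:2]
print(first_two)

# Negative positions count from the end, so -1 is the last row
last_row = df.iloc[-1]
print(last_row)

# Lists of positions select rows 0 and 2 and the column at position 1
corner = df.iloc[[0, 2], [1]]
print(corner)
```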


`loc` also accepts a boolean mask as the row selector. Here it is used to fill in a new column based on a condition on another column of the DataFrame.

In [6]:
import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Math': [95, 89, 92, 88, 91],
    'English': [85, 95, 78, 89, 92]
}
df = pd.DataFrame(data)

# Create a new column "Math Pass/Fail" initialized with "Fail"
df['Math Pass/Fail'] = 'Fail'

# Update the "Math Pass/Fail" status of students who scored above 90 in Math to "Pass"
df.loc[df['Math'] > 90, 'Math Pass/Fail'] = 'Pass'

print(df)


      Name  Math  English Math Pass/Fail
0    Alice    95       85           Pass
1      Bob    89       95           Fail
2  Charlie    92       78           Pass
3    David    88       89           Fail
4   Edward    91       92           Pass
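
The same column can be built in one step with `numpy.where`, which chooses between two values element by element. This is an alternative to the initialize-then-update pattern above, not a replacement for it; the sketch keeps only the columns it needs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Math': [95, 89, 92, 88, 91],
})

# 'Pass' where the condition holds, 'Fail' everywhere else
df['Math Pass/Fail'] = np.where(df['Math'] > 90, 'Pass', 'Fail')

print(df)
```

The `loc` form stays the better fit when only some rows should change and the rest must keep their existing values.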


In [7]:
import pandas as pd

# Create a DataFrame
data = {
    'Region': ['North', 'North', 'South', 'South', 'East', 'East'],
    'Sales': [200, 220, 150, 180, 300, 320]
}
df = pd.DataFrame(data)

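# Keep only the rows whose region has total sales above 400.
# transform('sum') returns the group total aligned to every original row,
# so the comparison yields a boolean mask that loc can use directly.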
result = df.loc[df.groupby('Region')['Sales'].transform('sum') > 400]

print(result)


  Region  Sales
0  North    200
1  North    220
4   East    300
5   East    320
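
For comparison, `DataFrameGroupBy.filter` expresses the same selection by testing each group as a whole. The sketch below assumes the same Region/Sales data; `by_filter` and `by_mask` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'East', 'East'],
    'Sales': [200, 220, 150, 180, 300, 320],
})

# filter() keeps every row of a group when the function returns True for it
by_filter = df.groupby('Region').filter(lambda g: g['Sales'].sum() > 400)

# The transform-based mask from the cell above, for comparison
by_mask = df.loc[df.groupby('Region')['Sales'].transform('sum') > 400]

print(by_filter)
print(by_filter.equals(by_mask))
```

The `transform` version tends to scale better with many groups because it avoids calling a Python function once per group.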


## Concatenation

Here, two DataFrames are concatenated along the rows.

In [8]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Concatenate DataFrames
concatenated_df = pd.concat([df1, df2], ignore_index=True)
print(concatenated_df)


    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
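
To see what `ignore_index=True` changes, the sketch below concatenates the same two frames without it, and then with `keys` to record which frame each row came from. The key names `'first'` and `'second'` are arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Without ignore_index, each frame keeps its own labels, so the index repeats
raw = pd.concat([df1, df2])
print(raw.index.tolist())

# keys adds an outer index level that identifies the source frame
labeled = pd.concat([df1, df2], keys=['first', 'second'])
print(labeled)

# Rows from the second frame can then be recovered by key
print(labeled.loc['second'])
```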


## Melt, Stack, Unstack

This block demonstrates how to restructure a DataFrame by melting, stacking, and unstacking it.

In [9]:
df

  Region  Sales
0  North    200
1  North    220
2  South    150
3  South    180
4   East    300
5   East    320


In [10]:
import pandas as pd

# Creating DataFrame
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})

# Melting DataFrame
melted_df = df.melt(id_vars=['A'], value_vars=['B', 'C'])

print(melted_df)

   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6
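
The default column names `variable` and `value` can be overridden, and a melted frame can be turned back into wide form with `pivot`. A short sketch on the same data; the names `'column'` and `'score'` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': [1, 3, 5],
                   'C': [2, 4, 6]})

# var_name / value_name replace the default 'variable' / 'value' headers
melted = df.melt(id_vars='A', var_name='column', value_name='score')
print(melted)

# pivot reverses the melt: one row per 'A', one column per former column name
wide = melted.pivot(index='A', columns='column', values='score')
print(wide)
```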


In [11]:
df

   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6


In [12]:
# Stacking DataFrame
stacked_df = df.stack()

stacked_df

0  A    a
   B    1
   C    2
1  A    b
   B    3
   C    4
2  A    c
   B    5
   C    6
dtype: object

In [13]:
# Unstacking DataFrame
unstacked_df = stacked_df.unstack()

print(unstacked_df)

   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
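
A stacked result is a Series with a two-level index (row label, column label), which is why `unstack` can rebuild the table. The sketch below uses a small numeric frame, chosen here for illustration, to show selection by index tuple and unstacking on the other level.

```python
import pandas as pd

df = pd.DataFrame({'B': [1, 3, 5], 'C': [2, 4, 6]}, index=['a', 'b', 'c'])

stacked = df.stack()

# Each value is addressed by a (row label, column label) pair
print(stacked.loc[('b', 'C')])

# unstack() moves the inner level back to columns by default;
# level=0 moves the outer level instead, which transposes the table
swapped = stacked.unstack(level=0)
print(swapped)
```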


## Apply

The `apply` method applies a function along an axis of a DataFrame: to each column by default (`axis=0`), or to each row with `axis=1`. The cells below first square every column, then use `apply` to add new columns.

In [14]:
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Define a function to apply
def square(x):
    return x**2

# Use apply to apply the function to each column
squared_df = df.apply(square)
print(squared_df)

   A   B
0  1  16
1  4  25
2  9  36


In [15]:
df

   A  B
0  1  4
1  2  5
2  3  6


In [16]:
df.loc[:, "square_B"] = df["B"].apply(square)

In [17]:
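def mult(t):
    # t is one row as a Series. Look values up by label: positional access
    # such as t[0] on a labeled Series raises a FutureWarning in recent pandas.
    return t["A"] * t["B"]

In [18]:
df.loc[:, "mult_AB"] = df[["A", "B"]].apply(mult, axis=1)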


In [19]:
df

   A  B  square_B  mult_AB
0  1  4        16        4
1  2  5        25       10
2  3  6        36       18
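
For simple arithmetic like this, the same columns can be produced without `apply`, using operators that work on whole columns at once. A minimal sketch that rebuilds the frame so it runs standalone:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Arithmetic operators work element-wise on entire columns
df['square_B'] = df['B'] ** 2
df['mult_AB'] = df['A'] * df['B']

print(df)
```

`apply` remains useful when the per-row or per-column logic cannot be written with built-in column operations.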


## Lambda Functions with Apply

Lambda functions in Python are small anonymous functions defined inline, without a name. They are handy for a quick operation that does not justify a formal `def`. Combined with `apply`, they give a concise way to transform DataFrame values without writing explicit loops.

In [20]:
# Use apply with a lambda to take the square root of every value, one column at a time
sqrt_df = df.apply(lambda x: x**0.5)
print(sqrt_df)

          A         B  square_B   mult_AB
0  1.000000  2.000000       4.0  2.000000
1  1.414214  2.236068       5.0  3.162278
2  1.732051  2.449490       6.0  4.242641
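
Lambdas work with `axis=1` as well, where each call receives one row, and with `Series.apply`, where each call receives one value. The sketch below uses a fresh two-column frame; the column names `max_AB` and `parity` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=1: the lambda receives each row as a Series, indexed by column label
df['max_AB'] = df.apply(lambda row: max(row['A'], row['B']), axis=1)

# Series.apply: the lambda receives one scalar value at a time
df['parity'] = df['A'].apply(lambda a: 'odd' if a % 2 else 'even')

print(df)
```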
