# Advanced operations with NumPy and pandas

## 1. Advanced operations with NumPy

### 1.1 Advanced indexing and conditional selection

**Boolean indexing**

In [1]:
import numpy as np

# Create a NumPy array
data = np.array([10, 20, 30, 40, 50, 60])

# Create a boolean mask for values greater than 30
mask = data > 30
print("Boolean Mask (data > 30):", mask)

# Use the mask to filter the array
filtered_data = data[mask]
print("Filtered Data (values > 30):", filtered_data)


Boolean Mask (data > 30): [False False False  True  True  True]
Filtered Data (values > 30): [40 50 60]
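
Masks can also be combined. The sketch below reuses the same `data` array and assumes the goal is to filter on more than one condition; the variable names `in_range`, `outside`, and `not_large` are illustrative only. Each comparison is wrapped in parentheses because `&` and `|` bind more tightly than the comparison operators.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60])

# Combine masks element-wise with & (and), | (or), and ~ (not).
# Python's `and` / `or` keywords do not work on arrays.
in_range = data[(data > 20) & (data < 60)]
outside = data[(data <= 20) | (data >= 60)]
not_large = data[~(data > 30)]

print("Between 20 and 60 (exclusive):", in_range)
print("At most 20 or at least 60:", outside)
print("Not greater than 30:", not_large)
```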


**Fancy indexing**

In [2]:
# Create a 2D NumPy array
data_2d = np.array([[10, 20, 30], 
                    [40, 50, 60], 
                    [70, 80, 90]])

# Use fancy indexing to select specific elements
# Selecting elements at (0, 1) and (2, 0)
fancy_indexed = data_2d[[0, 2], [1, 0]]  # 20 and 70
print("Fancy Indexed Data:", fancy_indexed)


Fancy Indexed Data: [20 70]
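
Passing two index lists pairs them up element by element, which is why the cell above returns two scalars rather than a 2x2 block. As a small sketch of the alternatives, a single index list selects whole rows, and `np.ix_` selects every combination of the given rows and columns. The names `rows` and `block` are illustrative.

```python
import numpy as np

data_2d = np.array([[10, 20, 30],
                    [40, 50, 60],
                    [70, 80, 90]])

# A single index list selects entire rows (here rows 0 and 2)
rows = data_2d[[0, 2]]
print("Rows 0 and 2:\n", rows)

# np.ix_ builds an open mesh: rows 0 and 2 crossed with columns 1 and 0
block = data_2d[np.ix_([0, 2], [1, 0])]
print("Rows [0, 2] x columns [1, 0]:\n", block)
```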


### 1.2 Using `np.where`

**Basic usage**

In [3]:
# Create a NumPy array
data = np.array([10, 20, 30, 40, 50, 60])

# With only a condition, np.where returns a tuple of index arrays, one per dimension
indices = np.where(data > 30)
print("Indices of elements > 30:", indices[0])  # data is 1-D, so the tuple holds a single array


Indices of elements > 30: [3 4 5]
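
For a 2D array the returned tuple has one index array per axis. The sketch below, using an illustrative 3x3 array, unpacks row and column indices and also shows `np.flatnonzero` as a shortcut for the 1D case.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60])
data_2d = np.array([[10, 20, 30],
                    [40, 50, 60],
                    [70, 80, 90]])

# One index array per axis: rows first, then columns
row_idx, col_idx = np.where(data_2d > 50)
print("Row indices:", row_idx)
print("Column indices:", col_idx)

# The index arrays can be fed straight back in to retrieve the values
print("Matching values:", data_2d[row_idx, col_idx])

# For 1D data, np.flatnonzero returns the index array directly
flat_idx = np.flatnonzero(data > 30)
print("Flat indices:", flat_idx)
```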


**Creating new arrays with conditions**

In [4]:
# Create a NumPy array
data = np.array([10, 20, 30, 40, 50, 60])

# Use np.where to create a new array with conditions
new_data = np.where(data > 30, 'High', 'Low')
print("New Data based on conditions:", new_data)


New Data based on conditions: ['Low' 'Low' 'Low' 'High' 'High' 'High']
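
`np.where` handles a single either/or split. When more than two labels are needed, one option is `np.select`, which checks a list of conditions in order and uses the first one that matches. The thresholds and the `'Medium'` label below are assumptions added for illustration.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60])

# Conditions are evaluated in order; the first match wins
conditions = [data < 30, data < 50]
choices = ['Low', 'Medium']

# Elements matching no condition receive the default
labels = np.select(conditions, choices, default='High')
print("Labels:", labels)
```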


## 2. Advanced operations in pandas

### 2.1 Applying functions

**Using `apply()`**

In [5]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Apply a function to each column
result_apply = df.apply(lambda x: x.sum(), axis=0)  # Sum of each column
print("Using apply() to sum each column:\n", result_apply)


Using apply() to sum each column:
 A     6
B    15
dtype: int64
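
With `axis=1` the function receives one row at a time instead of one column. A minimal sketch on the same DataFrame follows; the names `row_sums` and `vectorized` are illustrative. For simple arithmetic like this, the plain column expression gives the same result and is usually faster than `apply()`.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# axis=1 passes each row to the function as a Series
row_sums = df.apply(lambda row: row['A'] + row['B'], axis=1)
print("Row-wise sums with apply():\n", row_sums)

# Equivalent column arithmetic without apply()
vectorized = df['A'] + df['B']
print("Row-wise sums with column arithmetic:\n", vectorized)
```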


**Using `map()`**

In [6]:
# Create a Series
s = pd.Series([1, 2, 3, 4])

# Use map to square each element
result_map = s.map(lambda x: x ** 2)
print("Using map() to square each element:\n", result_map)


Using map() to square each element:
 0     1
1     4
2     9
3    16
dtype: int64
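
`map()` also accepts a dictionary, which is handy for recoding values. In this sketch the lookup table is made up for illustration; any value missing from the dictionary becomes `NaN`.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Hypothetical lookup table; 4 is deliberately left out
names = {1: 'one', 2: 'two', 3: 'three'}

labels = s.map(names)
print("Using map() with a dictionary:\n", labels)
```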


### 2.2 Combining DataFrames

**Using `merge()`**


In [8]:
# Create two DataFrames
df1 = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'value1': [1, 2, 3]
})

df2 = pd.DataFrame({
    'key': ['B', 'C', 'D'],
    'value2': [4, 5, 6]
})

# Merge DataFrames on the 'key' column
result_merge = pd.merge(df1, df2, on='key', how='inner')  # Inner join
print("Using merge() to combine DataFrames:\n", result_merge)


Using merge() to combine DataFrames:
   key  value1  value2
0   B       2       4
1   C       3       5
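
The inner join drops keys `A` and `D` because each appears in only one DataFrame. As a sketch of the alternative, `how='outer'` keeps every key and fills the gaps with `NaN`, and `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'value1': [1, 2, 3]
})

df2 = pd.DataFrame({
    'key': ['B', 'C', 'D'],
    'value2': [4, 5, 6]
})

# Outer join keeps all keys; _merge is 'left_only', 'right_only', or 'both'
result_outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print("Outer merge with indicator:\n", result_outer)
```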


**Using `join()`**

In [9]:
# Create two DataFrames with indices
df3 = pd.DataFrame({
    'value1': [1, 2, 3]
}, index=['A', 'B', 'C'])

df4 = pd.DataFrame({
    'value2': [4, 5]
}, index=['B', 'C'])

# Join DataFrames on index
result_join = df3.join(df4, how='inner')
print("Using join() to combine DataFrames:\n", result_join)


Using join() to combine DataFrames:
    value1  value2
B       2       4
C       3       5
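
`join()` defaults to `how='left'`, which keeps every row of the calling DataFrame. The sketch below repeats the join with that setting; index `A` has no match in `df4`, so its `value2` is `NaN` and the column becomes floating point.

```python
import pandas as pd

df3 = pd.DataFrame({
    'value1': [1, 2, 3]
}, index=['A', 'B', 'C'])

df4 = pd.DataFrame({
    'value2': [4, 5]
}, index=['B', 'C'])

# Left join keeps all rows of df3; unmatched rows get NaN
result_left = df3.join(df4, how='left')
print("Left join on index:\n", result_left)
```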


**Using `concat()`**

In [10]:
# Create two DataFrames
df5 = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

df6 = pd.DataFrame({
    'A': [5, 6],
    'B': [7, 8]
})

# Concatenate DataFrames vertically
result_concat = pd.concat([df5, df6], axis=0)
print("Using concat() to concatenate DataFrames:\n", result_concat)


Using concat() to concatenate DataFrames:
    A  B
0  1  3
1  2  4
0  5  7
1  6  8
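
The output above repeats the index labels 0 and 1, because `concat()` keeps each DataFrame's original index. A minimal sketch of two common variations: `ignore_index=True` renumbers the rows, and `axis=1` places the DataFrames side by side. The `_2` suffix is an arbitrary choice to keep column names unique.

```python
import pandas as pd

df5 = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

df6 = pd.DataFrame({
    'A': [5, 6],
    'B': [7, 8]
})

# Renumber the rows 0..n-1 instead of keeping the original labels
result_reset = pd.concat([df5, df6], axis=0, ignore_index=True)
print("Vertical concat with a fresh index:\n", result_reset)

# Concatenate horizontally, aligning rows on the index
side_by_side = pd.concat([df5, df6.add_suffix('_2')], axis=1)
print("Horizontal concat:\n", side_by_side)
```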
