# 1) Pandas filtering

- filtering by:
  - column names (labels)
  - actual data (values)


## 1.1) Filter data by labels

- **filter()**


In [1]:
import pandas as pd

# create a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "Marketing", "Marketing", "IT"],
    "Salary": [50000, 60000, 55000, 70000],
}

df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# use the filter() method to select columns by their labels
filtered_df = df.filter(items=["Name", "Salary"])

# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)

Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000


Filtered DataFrame:
      Name  Salary
0    Alice   50000
1      Bob   60000
2  Charlie   55000
3    David   70000
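
Besides `items`, `filter()` also accepts `like` (substring match on labels), `regex` (regular-expression match on labels), and `axis` (which set of labels to filter). A minimal sketch on the same sample data; the variable names `by_substring`, `by_regex`, and `by_rows` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Department": ["HR", "Marketing", "Marketing", "IT"],
        "Salary": [50000, 60000, 55000, 70000],
    }
)

# like=: keep columns whose label contains the substring "Sal"
by_substring = df.filter(like="Sal")

# regex=: keep columns whose label matches the regular expression
by_regex = df.filter(regex="^(Name|Dep)")

# axis=0: filter on row labels (the index) instead of column labels
by_rows = df.filter(items=[0, 2], axis=0)

print(by_substring.columns.tolist())
print(by_regex.columns.tolist())
print(by_rows)
```

In all three cases the filter looks only at labels, never at the values stored in the cells.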


## 1.2) Filter data by values

- using:
  - logical operators
  - isin() method
  - str accessor
  - query() method


### 1.2.1) Logical operators


In [2]:
import pandas as pd

# create a sample DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "Marketing", "Marketing", "IT"],
    "Salary": [50000, 60000, 55000, 70000],
}

df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# build a boolean mask with a comparison operator and use it to filter rows
filtered_df = df[df.Salary > 55000]

# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)

Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000


Filtered DataFrame:
    Name Department  Salary
1    Bob  Marketing   60000
3  David         IT   70000
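
Several conditions can be combined with the element-wise operators `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. A short sketch on the same sample data; the variable names `both`, `either`, and `negated` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Department": ["HR", "Marketing", "Marketing", "IT"],
        "Salary": [50000, 60000, 55000, 70000],
    }
)

# AND: both conditions must hold
both = df[(df["Salary"] > 55000) & (df["Department"] == "Marketing")]

# OR: at least one condition must hold
either = df[(df["Salary"] < 55000) | (df["Department"] == "IT")]

# NOT: invert the boolean mask
negated = df[~(df["Salary"] > 55000)]

print(both)
print(either)
print(negated)
```

Python's own `and`, `or`, and `not` keywords do not work on a boolean Series; they raise a `ValueError` because the truth value of a whole Series is ambiguous.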


### 1.2.2) isin() method


In [3]:
import pandas as pd

# create a sample DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "Marketing", "Marketing", "IT"],
    "Salary": [50000, 60000, 55000, 70000],
}

df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# use the isin() method to keep rows whose Department is in the list
departments = ["HR", "IT"]
filtered_df = df[df.Department.isin(departments)]

# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)

Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000


Filtered DataFrame:
    Name Department  Salary
0  Alice         HR   50000
3  David         IT   70000
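
To exclude the listed values instead of keeping them, the mask returned by `isin()` can be inverted with `~`. A minimal sketch on the same sample data; `excluded` and `not_in` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Department": ["HR", "Marketing", "Marketing", "IT"],
        "Salary": [50000, 60000, 55000, 70000],
    }
)

# departments we do NOT want in the result
excluded = ["HR", "IT"]

# ~ flips True/False, so only rows outside the list remain
not_in = df[~df["Department"].isin(excluded)]

print(not_in)
```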


### 1.2.3) str accessor

- filtering based on a string: given a word or part of a word, it finds and keeps the rows that match


In [4]:
import pandas as pd

# create a sample DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "Marketing", "Marketing", "IT"],
    "Salary": [50000, 60000, 55000, 70000],
}

df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# use the str accessor to keep rows whose Department contains "Market"
filtered_df = df[df.Department.str.contains("Market")]

# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)

Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000


Filtered DataFrame:
      Name Department  Salary
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
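
The `str` accessor offers other matching options as well. The sketch below shows a case-insensitive `contains()` and a prefix match with `startswith()` on the same sample data; the variable names are illustrative, and `na=False` is included only as a guard in case the column held missing values (this sample has none):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Department": ["HR", "Marketing", "Marketing", "IT"],
        "Salary": [50000, 60000, 55000, 70000],
    }
)

# case=False ignores letter case; na=False treats missing values as "no match"
case_insensitive = df[df["Department"].str.contains("market", case=False, na=False)]

# startswith() matches only at the beginning of the string
starts_with_a = df[df["Name"].str.startswith("A")]

print(case_insensitive)
print(starts_with_a)
```

By default `contains()` interprets the pattern as a regular expression; passing `regex=False` makes it a plain substring search.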


### 1.2.4) query() method

- one of the most commonly used filtering methods
- the query containing the filter conditions is passed to the query() method as a string argument


In [5]:
import pandas as pd

# create a sample DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "Marketing", "Marketing", "IT"],
    "Salary": [50000, 60000, 55000, 70000],
}

df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")

# use the query() method with the filter conditions written as a string
filtered_df = df.query('Salary > 55000 and Department == "Marketing"')

# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)

Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000


Filtered DataFrame:
  Name Department  Salary
1  Bob  Marketing   60000
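
Because the conditions are a string, Python variables have to be referenced explicitly. Inside the query string, a name prefixed with `@` refers to a variable from the surrounding scope. A minimal sketch on the same sample data; `min_salary`, `departments`, and `result` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Department": ["HR", "Marketing", "Marketing", "IT"],
        "Salary": [50000, 60000, 55000, 70000],
    }
)

# ordinary Python variables holding the filter parameters
min_salary = 55000
departments = ["Marketing", "IT"]

# @name pulls the variable into the query; `in` tests list membership
result = df.query("Salary > @min_salary and Department in @departments")

# the same filter written as a boolean mask, for comparison
mask_result = df[(df["Salary"] > min_salary) & (df["Department"].isin(departments))]

print(result)
```

Both forms select the same rows, so the choice between them is mostly a matter of readability.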
