### Pandas - Data Analysis & Manipulation tool

*Reference: Python Pandas 7.pdf (P1)*
#### Pandas is built on top of NumPy
- Pandas uses NumPy arrays internally (for most data types) to store data and perform fast calculations.
#### It adds high-level data structures and tools
- Pandas provides easy-to-use, ready-made structures such as:
    - Series (a single column)
    - DataFrame (a table, like an Excel sheet)
- It also provides tools for sorting, filtering, grouping and cleaning data.
#### These make it easier to work with tabular, labeled, or heterogeneous datasets
1. `Tabular Data`: Data arranged in rows and columns.
2. `Labeled Data`: Data with row labels (index) and column names.
3. `Heterogeneous Data`: Data with different data types in one table.
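
The points above can be checked with a small sketch. The `students` table and its values are made up for illustration; the idea is only that a numeric column can be handed back as a NumPy array while each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row table, used only for this illustration
students = pd.DataFrame({"Name": ["Rohit", "Namo"], "CGPA": [9.5, 9.7]})

# A numeric column can be returned as a NumPy array
cgpa_values = students["CGPA"].to_numpy()
print(type(cgpa_values))
print(cgpa_values.mean())

# Each column has its own dtype: this is what "heterogeneous" means here
print(students.dtypes)
```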

In [2]:
# Usage of pandas
import pandas as pd
df = pd.DataFrame({
    "Name":["Rohit","Namo"],
    "CGPA":[9.5,9.7]
})
print(df)

    Name  CGPA
0  Rohit   9.5
1   Namo   9.7


In [3]:
# 1. Series: 
# A Series is a one-dimensional data structure in Pandas that stores a sequence of values along with labels (the index).
# It can hold data of any type: integers, floats, strings, Python objects.
# Values in a Series can be accessed by position or by label.

s = pd.Series(["Rohit","Namo","Manya"])
# s = pd.Series([99,95,92])
# s = pd.Series([99.2,95.5,92.1])
print(s)

# tells the data structure (here Series)
print(type(s))

# tells the data type of the values stored in it.
print(s.dtype)

# Accessing values using Indexing
print(s[1])
print(s[0])


0    Rohit
1     Namo
2    Manya
dtype: object
<class 'pandas.core.series.Series'>
object
Namo
Rohit


In [4]:
# Series with custom indexing
s2 = pd.Series([21,20,25,26],index = ["Rohit","Namo","Charlie","Bob"])
print(s2)

# Accessing values using labels (custom indexing)
print(s2["Rohit"])
print(s2["Namo"])

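# Accessing values by position (use iloc when the index holds labels;
# plain s2[0] is deprecated for positional access on a labelled Series)
print(s2.iloc[0])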

print(s2.index)

Rohit      21
Namo       20
Charlie    25
Bob        26
dtype: int64
21
20
21
Index(['Rohit', 'Namo', 'Charlie', 'Bob'], dtype='object')
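
As a minimal sketch of the two access styles on a labelled Series, the block below uses `.loc` for labels and `.iloc` for positions. The name `ages` is an assumption chosen for this example; the values mirror the cell above.

```python
import pandas as pd

ages = pd.Series([21, 20, 25, 26], index=["Rohit", "Namo", "Charlie", "Bob"])

print(ages.loc["Charlie"])          # label-based access
print(ages.iloc[0])                 # position-based access: first element
print(ages.iloc[-1])                # position-based access: last element
print(ages.loc[["Rohit", "Bob"]])   # several labels at once returns a Series
```

Keeping `.loc` for labels and `.iloc` for positions avoids the ambiguity of plain `[]` when the index is not the default 0, 1, 2, ...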




In [5]:
# Characteristics of a Series:
# 1. They are homogeneous: a Series has a single dtype for all of its values.

# 2. They support Vectorized operations. 
sv = pd.Series([1,2,3])
sv2 = pd.Series([4,5,6])
print(sv + sv2)

# 3. They can handle missing values with NaN. (We will see this later.)

# 4. Their values are mutable: existing entries can be modified in place.
# Methods that remove entries, such as drop(), do not change the original Series;
# they return a new Series instead.

s = pd.Series([1,2,3,4,5])
s[0] = 100
print(s)

change_s = s.drop(0)

print(s)
print(change_s)


0    5
1    7
2    9
dtype: int64
0    100
1      2
2      3
3      4
4      5
dtype: int64
0    100
1      2
2      3
3      4
4      5
dtype: int64
1    2
2    3
3    4
4    5
dtype: int64
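
A short sketch of two of the characteristics above: vectorized arithmetic matches values by index label (labels present on only one side give NaN), and adding entries with `pd.concat` builds a new Series. The Series `a`, `b` and their labels are hypothetical.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition aligns on labels; "x" and "w" exist on one side only, so they become NaN
total = a + b
print(total)
print(total.isna().sum())

# Adding an entry with concat returns a new Series; `a` keeps its length
extended = pd.concat([a, pd.Series([4], index=["w"])])
print(len(a), len(extended))
```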


In [6]:
# 2. DataFrame:
# A DataFrame is a 2-dimensional, tabular data structure.
# Contains: rows, columns, row labels & column labels
# Each column in a DataFrame is a "Series".
# Three common ways to create a DataFrame:
# 1. Using a dictionary
# 2. Using lists
# 3. Using a NumPy array

info = {
    "Name" : ["Adam", "Eve", "Bob"],  
    "Marks" : [78, 99, 85],  
    "Grade" : ['B', 'O', 'A']  
}

df = pd.DataFrame(info)

print(df)
print(type(df))

print(df.index)
print(df.columns)

# To see the DataFrame rendered as a formatted table, put just `df` on the last line of a cell (see the next cell)
# df

   Name  Marks Grade
0  Adam     78     B
1   Eve     99     O
2   Bob     85     A
<class 'pandas.core.frame.DataFrame'>
RangeIndex(start=0, stop=3, step=1)
Index(['Name', 'Marks', 'Grade'], dtype='object')
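
The statement that each column is a Series can be verified directly. This is a small sketch on a table like the one above; the variable names `info_df` and `marks` are assumptions for the example.

```python
import pandas as pd

info_df = pd.DataFrame({"Name": ["Adam", "Eve", "Bob"], "Marks": [78, 99, 85]})

# Selecting one column returns a Series that shares the DataFrame's row index
marks = info_df["Marks"]
print(type(marks))
print(marks.index.equals(info_df.index))

# Series methods are therefore available on any single column
print(marks.max())
```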


In [7]:
df

   Name  Marks Grade
0  Adam     78     B
1   Eve     99     O
2   Bob     85     A


In [9]:
# Creating a DataFrame using lists
l1 = [["Rohit",96],["Namo",96],["Manya",97]]
df3 = pd.DataFrame(l1,columns=["Name","Marks"])

In [10]:
df3

    Name  Marks
0  Rohit     96
1   Namo     96
2  Manya     97


In [11]:
# Creating a DataFrame using a NumPy array
import numpy as np
np_arr = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
]) 
df4 = pd.DataFrame(np_arr,columns=["Col1","Col2","Col3"])
print(df4)

   Col1  Col2  Col3
0     1     2     3
1     4     5     6
2     7     8     9


#### Using Pandas to read CSV & JSON files


In [12]:
# CSV Data
df = pd.read_csv("employee_data.csv")
print(df,"\n",type(df))

   ID     Name  Age Department  Salary
0   1    Alice   25         HR   55000
1   2      Bob   32         IT   72000
2   3  Charlie   28    Finance   48000
3   4    David   45  Marketing   91000
4   5      Eva   38         IT   65000
5   6    Frank   29    Finance   50000
6   7    Grace   41         HR   82000
7   8   Hannah   26  Marketing   47000
8   9      Ian   35         IT   75000
9  10    Julia   30    Finance   60000 
 <class 'pandas.core.frame.DataFrame'>


In [13]:
# JSON Data 
df = pd.read_json("employee_data.json")
print(df)

   ID     Name  Age Department  Salary
0   1    Alice   25         HR   55000
1   2      Bob   32         IT   72000
2   3  Charlie   28    Finance   48000
3   4    David   45  Marketing   91000
4   5      Eva   38         IT   65000
5   6    Frank   29    Finance   50000
6   7    Grace   41         HR   82000
7   8   Hannah   26  Marketing   47000
8   9      Ian   35         IT   75000
9  10    Julia   30    Finance   60000


In [14]:
# Exporting Data using Pandas
# df.to_csv("temp.csv")
# df.to_json("temp2.json")
df.to_csv("output.csv", index=False)  # index=False: do not write the DataFrame index as a column in the CSV file
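
To see what `index=False` changes without writing any files, here is a sketch that exports to a string instead (`to_csv()` returns the CSV text when no path is given). The table `df_small` is hypothetical.

```python
import io
import pandas as pd

df_small = pd.DataFrame({"ID": [1, 2], "Name": ["Alice", "Bob"]})

with_index = df_small.to_csv()                # index written as an unnamed first column
without_index = df_small.to_csv(index=False)  # only the data columns

print(with_index)
print(without_index)

# Reading the first version back shows the saved index as an extra column
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())  # ['Unnamed: 0', 'ID', 'Name']
```

This is why a file saved with the default settings gains an `Unnamed: 0` column when it is loaded again.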

#### DataFrame Methods

In [15]:
data = {  
'Name': ['Aarav', 'Isha', 'Rohan', 'Sneha', 'Vikram'],  
'Age': [25, 30, 35, 40, 45],  
'City': ['Delhi', 'Mumbai', 'Bangalore', 'Kolkata', 'Chennai']  
}  
df = pd.DataFrame(data)

# List of Dataframe methods
print(df.head()) #Shows the first n rows (default = 5) 
print()
print(df.tail(2)) #Shows the last n rows (default = 5) 
print()
print(df.sample()) #Shows random n rows (default = 1)
print()
print(df.info()) # Displays column names, non-null counts, data types, memory usage; info() prints its own report and returns None, so print() adds a trailing "None"
print()
print(df.describe()) # Shows descriptive statistics for numeric columns.
print()
print(df.nunique()) # It counts how many different (distinct) values are present.
print()

# List of DataFrame attributes:
# What are attributes & methods?
# Here df is a DataFrame object, and an object has: information about itself → attributes, and actions it can do → methods
print(df.shape) #  Returns (rows, columns). 
print(df.columns) # List of column names
print(df.dtypes) # Datatype of each column

     Name  Age       City
0   Aarav   25      Delhi
1    Isha   30     Mumbai
2   Rohan   35  Bangalore
3   Sneha   40    Kolkata
4  Vikram   45    Chennai

     Name  Age     City
3   Sneha   40  Kolkata
4  Vikram   45  Chennai

    Name  Age       City
2  Rohan   35  Bangalore

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 252.0+ bytes
None

             Age
count   5.000000
mean   35.000000
std     7.905694
min    25.000000
25%    30.000000
50%    35.000000
75%    40.000000
max    45.000000

Name    5
Age     5
City    5
dtype: int64

(5, 3)
Index(['Name', 'Age', 'City'], dtype='object')
Name    object
Age      int64
City    object
dtype: object
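
The difference between attributes and methods can be sketched on a tiny table. `people` is a made-up example; note how `info()` prints its own report and hands back `None`.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Aarav", "Isha", "Rohan"], "Age": [25, 30, 35]})

# Attribute: stored information, accessed without parentheses
print(people.shape)

# Method: an action, called with parentheses (and optional arguments)
print(people.head(2))

# info() writes its report to the screen itself and returns None
returned = people.info()
print(returned is None)
```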


### Working with real-world datasets

In [17]:
# For real-world data we use a dataset from kaggle.com:
# "Global Air Quality Data(15 Days Hourly, 50 Cities)"
# Download the dataset as a zip file and extract globalAirQuality.csv next to this notebook

aq = pd.read_csv("globalAirQuality.csv")
print(aq)


                        timestamp country      city  latitude  longitude  \
0      2025-11-04 18:25:17.554219      US  New York    40.713    -74.006   
1      2025-11-04 19:25:17.554219      US  New York    40.713    -74.006   
2      2025-11-04 20:25:17.554219      US  New York    40.713    -74.006   
3      2025-11-04 21:25:17.554219      US  New York    40.713    -74.006   
4      2025-11-04 22:25:17.554219      US  New York    40.713    -74.006   
...                           ...     ...       ...       ...        ...   
17995  2025-11-19 13:25:17.554219      CH    Zurich    47.377      8.542   
17996  2025-11-19 14:25:17.554219      CH    Zurich    47.377      8.542   
17997  2025-11-19 15:25:17.554219      CH    Zurich    47.377      8.542   
17998  2025-11-19 16:25:17.554219      CH    Zurich    47.377      8.542   
17999  2025-11-19 17:25:17.554219      CH    Zurich    47.377      8.542   

         pm25     pm10     no2    so2      o3     co  aqi  temperature  \
0      50.295

In [19]:
aq.head()

Unnamed: 0,timestamp,country,city,latitude,longitude,pm25,pm10,no2,so2,o3,co,aqi,temperature,humidity,wind_speed
0,2025-11-04 18:25:17.554219,US,New York,40.713,-74.006,50.295,108.938,27.998,6.539,52.568,1.096,108,18.504,70.168,3.725
1,2025-11-04 19:25:17.554219,US,New York,40.713,-74.006,32.083,63.043,36.12,4.021,43.536,1.075,90,5.838,80.088,8.969
2,2025-11-04 20:25:17.554219,US,New York,40.713,-74.006,42.25,82.553,26.935,9.538,23.32,0.977,84,31.833,62.783,9.65
3,2025-11-04 21:25:17.554219,US,New York,40.713,-74.006,30.403,79.951,63.536,7.609,31.369,0.23,158,23.14,89.153,8.956
4,2025-11-04 22:25:17.554219,US,New York,40.713,-74.006,21.083,66.423,38.997,6.919,45.615,1.085,97,13.632,76.499,4.017


In [20]:
aq.tail()

Unnamed: 0,timestamp,country,city,latitude,longitude,pm25,pm10,no2,so2,o3,co,aqi,temperature,humidity,wind_speed
17995,2025-11-19 13:25:17.554219,CH,Zurich,47.377,8.542,27.899,74.179,41.474,6.677,50.869,1.028,103,7.079,52.443,7.452
17996,2025-11-19 14:25:17.554219,CH,Zurich,47.377,8.542,2.95,47.988,42.235,2.821,35.551,0.644,105,28.734,85.678,4.496
17997,2025-11-19 15:25:17.554219,CH,Zurich,47.377,8.542,61.347,72.908,46.976,5.763,66.492,0.947,122,21.951,72.311,9.66
17998,2025-11-19 16:25:17.554219,CH,Zurich,47.377,8.542,40.722,95.152,32.957,5.524,53.193,0.868,95,24.042,31.88,2.642
17999,2025-11-19 17:25:17.554219,CH,Zurich,47.377,8.542,25.83,30.411,35.317,4.336,66.246,0.848,88,8.529,59.104,4.403


In [21]:
aq.describe()

Unnamed: 0,latitude,longitude,pm25,pm10,no2,so2,o3,co,aqi,temperature,humidity,wind_speed
count,18000.0,18000.0,18000.0,18000.0,18000.0,18000.0,18000.0,18000.0,18000.0,18000.0,18000.0,18000.0
mean,23.06598,37.65556,40.369131,70.152228,32.055176,6.035508,48.0651,0.800595,104.645556,21.510251,57.714351,5.28391
std,26.156536,78.600701,17.64745,24.99944,13.82068,2.45479,14.950849,0.250254,25.61607,9.509444,18.844908,2.741712
min,-37.814,-123.121,0.025,0.061,0.013,0.003,0.114,0.0,16.0,5.0,25.002,0.5
25%,12.972,2.352,27.9045,53.1255,22.3625,4.36075,38.0285,0.633,87.0,13.35775,41.32,2.937
50%,29.232,42.146,40.2865,69.961,32.0195,6.026,48.142,0.8005,103.0,21.4555,57.847,5.297
75%,41.008,103.82,52.43625,87.2565,41.36425,7.71525,58.2585,0.969,121.0,29.68825,74.23475,7.662
max,60.17,174.763,115.683,161.81,90.019,16.559,103.016,1.832,231.0,37.998,89.997,9.999


In [None]:
# Accessing values from DataFrame:

# 1. Column-wise
aq["country"] # single column (returns a Series)
aq[["city","aqi"]] # multiple columns (returns a DataFrame); only this last expression is displayed

           city  aqi
0      New York  108
1      New York   90
2      New York   84
3      New York  158
4      New York   97
...         ...  ...
17995    Zurich  103
17996    Zurich  105
17997    Zurich  122
17998    Zurich   95


In [37]:
# 2. Row-wise (label- and position-based) - loc & iloc
# Rows are accessed through "indexers"

# a) loc - 
# stands for location (label-based)

print(aq.loc[0]) # Accessing the first row (the row labelled 0)
print(aq.loc[0:2]) # Accessing multiple rows by slicing; with loc the end label 2 is included

# b) iloc - Integer Location
# it is used for position-based indexing
aq.iloc[0:2] # multiple rows; with iloc the end position 2 is excluded (rows 0 and 1)


timestamp      2025-11-04 18:25:17.554219
country                                US
city                             New York
latitude                           40.713
longitude                         -74.006
pm25                               50.295
pm10                              108.938
no2                                27.998
so2                                 6.539
o3                                 52.568
co                                  1.096
aqi                                   108
temperature                        18.504
humidity                           70.168
wind_speed                          3.725
Name: 0, dtype: object
                    timestamp country      city  latitude  longitude    pm25  \
0  2025-11-04 18:25:17.554219      US  New York    40.713    -74.006  50.295   
1  2025-11-04 19:25:17.554219      US  New York    40.713    -74.006  32.083   
2  2025-11-04 20:25:17.554219      US  New York    40.713    -74.006  42.250   

      pm10     no2    so2 

Unnamed: 0,timestamp,country,city,latitude,longitude,pm25,pm10,no2,so2,o3,co,aqi,temperature,humidity,wind_speed
0,2025-11-04 18:25:17.554219,US,New York,40.713,-74.006,50.295,108.938,27.998,6.539,52.568,1.096,108,18.504,70.168,3.725
1,2025-11-04 19:25:17.554219,US,New York,40.713,-74.006,32.083,63.043,36.12,4.021,43.536,1.075,90,5.838,80.088,8.969
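
The difference between `loc` and `iloc` is easier to see when the row labels are not 0, 1, 2, ... The following sketch uses a hypothetical `readings` table with labels 10 to 40.

```python
import pandas as pd

readings = pd.DataFrame(
    {"city": ["A", "B", "C", "D"], "aqi": [108, 90, 84, 158]},
    index=[10, 20, 30, 40],  # custom row labels
)

# loc slices by label and includes the end label
by_label = readings.loc[10:30]
print(by_label)

# iloc slices by position and excludes the end position
by_position = readings.iloc[0:2]
print(by_position)
```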


In [None]:
# 3. Accessing cells (rows, columns)
# a) label-based
print(aq.loc[0,"aqi"]) # row, column
print(aq.loc[0,["aqi","city"]]) 
print(aq.loc[0:2,["timestamp","country","city"]])

# b) position-based
print(aq.iloc[0:2,0:2]) 

108
aqi          108
city    New York
Name: 0, dtype: object
                    timestamp country      city
0  2025-11-04 18:25:17.554219      US  New York
1  2025-11-04 19:25:17.554219      US  New York
2  2025-11-04 20:25:17.554219      US  New York
                    timestamp country
0  2025-11-04 18:25:17.554219      US
1  2025-11-04 19:25:17.554219      US


In [None]:
# 4. Accessing a single scalar value (single cell) - at & iat
# a) label-based
aq.at[0,"city"]


'New York'

In [45]:
# b) position-based
aq.iat[0,2]

'New York'
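
As a small sketch, `at` and `iat` can both read and write a single cell. The `cells` table below is made up for the example.

```python
import pandas as pd

cells = pd.DataFrame({"city": ["New York", "Zurich"], "aqi": [108, 103]})

print(cells.at[1, "city"])  # label-based: row label 1, column "city"
print(cells.iat[1, 0])      # position-based: second row, first column

# Both accessors can also assign a single value
cells.at[0, "aqi"] = 110
print(cells)
```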

In [51]:
# Selecting part of a DataFrame may give back either a view or a copy; pandas does not guarantee which.
# Writing through such a selection is therefore unreliable and can raise SettingWithCopyWarning.
# (With Copy-on-Write enabled, the default in pandas 3.0, the original is never modified this way.)

new_data = aq["city"] # may share data with aq
new_data[0] = "New York" # ambiguous write: this line triggers the warning shown in the output
print(aq)

# If we want to test or modify data without affecting the original DataFrame,
# we should explicitly create a copy using the .copy() method.
# This ensures that changes are made only to the copied data.

# Creating a copy to avoid modifying the original DataFrame
new_data = aq["city"].copy()
new_data[0] = "Mumbai"
print(aq) # aq is unchanged



                        timestamp country      city  latitude  longitude  \
0      2025-11-04 18:25:17.554219      US  New York    40.713    -74.006   
1      2025-11-04 19:25:17.554219      US  New York    40.713    -74.006   
2      2025-11-04 20:25:17.554219      US  New York    40.713    -74.006   
3      2025-11-04 21:25:17.554219      US  New York    40.713    -74.006   
4      2025-11-04 22:25:17.554219      US  New York    40.713    -74.006   
...                           ...     ...       ...       ...        ...   
17995  2025-11-19 13:25:17.554219      CH    Zurich    47.377      8.542   
17996  2025-11-19 14:25:17.554219      CH    Zurich    47.377      8.542   
17997  2025-11-19 15:25:17.554219      CH    Zurich    47.377      8.542   
17998  2025-11-19 16:25:17.554219      CH    Zurich    47.377      8.542   
17999  2025-11-19 17:25:17.554219      CH    Zurich    47.377      8.542   

         pm25     pm10     no2    so2      o3     co  aqi  temperature  \
0      50.295

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_data[0] = "New York" # New Data will have original reference to the data
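
A minimal sketch of the safe pattern, on a hypothetical two-row table: work on an explicit copy when the original must stay intact, and write through `.loc` on the DataFrame itself when the original should change.

```python
import pandas as pd

original = pd.DataFrame({"city": ["New York", "Zurich"], "aqi": [108, 103]})

# An explicit copy is independent of the original
city_copy = original["city"].copy()
city_copy.iloc[0] = "Mumbai"
unchanged = original.loc[0, "city"]
print(unchanged)

# To change the original on purpose, assign through .loc on the DataFrame itself
original.loc[0, "city"] = "Delhi"
print(original)
```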


### Filtering Data

In [None]:
# Filtering Data
# a) Boolean Filtering
 
# 1. Rows where AQI is greater than 100.
aq[aq["aqi"]>100]

Unnamed: 0,timestamp,country,city,latitude,longitude,pm25,pm10,no2,so2,o3,co,aqi,temperature,humidity,wind_speed
0,2025-11-04 18:25:17.554219,US,New York,40.713,-74.006,50.295,108.938,27.998,6.539,52.568,1.096,108,18.504,70.168,3.725
3,2025-11-04 21:25:17.554219,US,New York,40.713,-74.006,30.403,79.951,63.536,7.609,31.369,0.230,158,23.140,89.153,8.956
6,2025-11-05 00:25:17.554219,US,New York,40.713,-74.006,77.690,65.198,20.302,7.641,62.687,0.734,155,36.729,47.651,4.542
7,2025-11-05 01:25:17.554219,US,New York,40.713,-74.006,57.816,111.709,34.533,6.945,41.304,0.771,115,37.891,53.314,7.605
8,2025-11-05 02:25:17.554219,US,New York,40.713,-74.006,60.914,44.775,40.936,6.779,36.751,1.014,121,24.092,48.148,8.686
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17991,2025-11-19 09:25:17.554219,CH,Zurich,47.377,8.542,52.120,71.160,35.405,7.752,39.811,0.878,104,32.556,31.372,2.787
17993,2025-11-19 11:25:17.554219,CH,Zurich,47.377,8.542,76.613,37.495,29.662,5.419,57.042,1.103,153,20.437,79.602,2.663
17995,2025-11-19 13:25:17.554219,CH,Zurich,47.377,8.542,27.899,74.179,41.474,6.677,50.869,1.028,103,7.079,52.443,7.452
17996,2025-11-19 14:25:17.554219,CH,Zurich,47.377,8.542,2.950,47.988,42.235,2.821,35.551,0.644,105,28.734,85.678,4.496


In [None]:
# 2. Rows where AQI is greater than 100 and temperature is more than 30

# For combining conditions in a DataFrame filter:
# use & for AND, | for OR, ~ for NOT, and wrap each condition in parentheses ().
aq[(aq["aqi"]>100) & (aq["temperature"]>30)]

Unnamed: 0,timestamp,country,city,latitude,longitude,pm25,pm10,no2,so2,o3,co,aqi,temperature,humidity,wind_speed
6,2025-11-05 00:25:17.554219,US,New York,40.713,-74.006,77.690,65.198,20.302,7.641,62.687,0.734,155,36.729,47.651,4.542
7,2025-11-05 01:25:17.554219,US,New York,40.713,-74.006,57.816,111.709,34.533,6.945,41.304,0.771,115,37.891,53.314,7.605
14,2025-11-05 08:25:17.554219,US,New York,40.713,-74.006,38.815,121.394,25.969,4.112,44.730,1.124,121,34.481,81.790,3.576
17,2025-11-05 11:25:17.554219,US,New York,40.713,-74.006,63.552,57.838,13.754,4.427,42.251,0.997,127,36.193,62.906,7.097
25,2025-11-05 19:25:17.554219,US,New York,40.713,-74.006,56.954,75.045,29.837,9.541,54.198,1.170,113,31.289,43.279,5.264
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17947,2025-11-17 13:25:17.554219,CH,Zurich,47.377,8.542,53.938,75.183,16.255,7.263,67.244,0.457,107,32.584,81.544,7.101
17950,2025-11-17 16:25:17.554219,CH,Zurich,47.377,8.542,70.255,60.620,32.081,2.065,47.803,0.852,140,36.885,34.173,1.748
17975,2025-11-18 17:25:17.554219,CH,Zurich,47.377,8.542,33.406,98.160,44.654,0.807,45.160,0.737,111,30.761,76.073,3.570
17976,2025-11-18 18:25:17.554219,CH,Zurich,47.377,8.542,73.493,89.558,9.313,4.768,51.505,0.745,146,30.909,81.573,1.024
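
The three operators can be tried on a tiny table where the result is easy to check by eye. The `air` table and its values are assumptions made for this sketch.

```python
import pandas as pd

air = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "aqi": [108, 90, 155, 84],
    "temperature": [18.5, 5.8, 36.7, 31.8],
})

# Each comparison produces a boolean Series (a mask)
high_aqi = air["aqi"] > 100
hot = air["temperature"] > 30

both = air[high_aqi & hot]           # AND: both conditions hold
either = air[high_aqi | hot]         # OR: at least one condition holds
neither = air[~(high_aqi | hot)]     # NOT: negate the combined mask

print(both)
print(either)
print(neither)
```

Storing the masks in named variables keeps long filters readable and avoids forgetting the parentheses around each comparison.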


In [54]:
# 3. Rows where AQI is less than 100 and temperature is less than 30, showing only the city & country columns
aq[(aq["aqi"]<100) & (aq["temperature"]<30)][["city","country"]]

           city country
1      New York      US
4      New York      US
5      New York      US
9      New York      US
10     New York      US
...         ...     ...
17986    Zurich      CH
17989    Zurich      CH
17992    Zurich      CH
17998    Zurich      CH


In [None]:
aqi_data = aq[(aq["aqi"]<100) & (aq["temperature"]<30)][["city","country"]]
# Pandas preserves column names and index labels even when assigned to a new variable.
print(aqi_data) 

# iloc[] is used for position-based indexing (0 means first row in the filtered result)
print(aqi_data.iloc[0])

# loc[] is used for label-based indexing.
# The label 1 comes from the original DataFrame's index; it is not the row position.
# aqi_data.loc[0]  # raises KeyError: label 0 is not in the filtered result
print(aqi_data.loc[1]) 


           city country
1      New York      US
4      New York      US
5      New York      US
9      New York      US
10     New York      US
...         ...     ...
17986    Zurich      CH
17989    Zurich      CH
17992    Zurich      CH
17998    Zurich      CH
17999    Zurich      CH

[6106 rows x 2 columns]
city       New York
country          US
Name: 1, dtype: object
city       New York
country          US
Name: 1, dtype: object
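
If the filtered result should be numbered from 0 again, one option is `reset_index(drop=True)`. The sketch below uses a hypothetical four-row table to show the index before and after.

```python
import pandas as pd

air = pd.DataFrame({
    "city": ["New York", "New York", "Zurich", "Zurich"],
    "aqi": [108, 90, 84, 158],
})

# Filtering keeps the original row labels
low = air[air["aqi"] < 100]
print(low.index.tolist())
print(0 in low.index)

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
renumbered = low.reset_index(drop=True)
print(renumbered.loc[0])
```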


In [None]:
# b) Filtering data using query()
# The query() method provides SQL-like filtering: the condition is passed as a string.
# query() returns a new DataFrame (a copy), not a view.

# Query string rules:
# 1. Write the condition as a string.
# 2. Operators such as and, or, not, ==, !=, >, <, >=, <= can be used.
# 3. Use backticks for column names that contain spaces or symbols.
# 4. Use @ to refer to Python variables.

# Rule of thumb:
# Direct filtering with df[ ... ]
# - Use & for AND
# - Use | for OR
# - Wrap each condition in parentheses ()

# Filtering with query()
# - Write the conditions as a string
# - Use 'and' for AND
# - Use 'or' for OR
aq.query("aqi>100")

Unnamed: 0,timestamp,country,city,latitude,longitude,pm25,pm10,no2,so2,o3,co,aqi,temperature,humidity,wind_speed
0,2025-11-04 18:25:17.554219,US,New York,40.713,-74.006,50.295,108.938,27.998,6.539,52.568,1.096,108,18.504,70.168,3.725
3,2025-11-04 21:25:17.554219,US,New York,40.713,-74.006,30.403,79.951,63.536,7.609,31.369,0.230,158,23.140,89.153,8.956
6,2025-11-05 00:25:17.554219,US,New York,40.713,-74.006,77.690,65.198,20.302,7.641,62.687,0.734,155,36.729,47.651,4.542
7,2025-11-05 01:25:17.554219,US,New York,40.713,-74.006,57.816,111.709,34.533,6.945,41.304,0.771,115,37.891,53.314,7.605
8,2025-11-05 02:25:17.554219,US,New York,40.713,-74.006,60.914,44.775,40.936,6.779,36.751,1.014,121,24.092,48.148,8.686
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17991,2025-11-19 09:25:17.554219,CH,Zurich,47.377,8.542,52.120,71.160,35.405,7.752,39.811,0.878,104,32.556,31.372,2.787
17993,2025-11-19 11:25:17.554219,CH,Zurich,47.377,8.542,76.613,37.495,29.662,5.419,57.042,1.103,153,20.437,79.602,2.663
17995,2025-11-19 13:25:17.554219,CH,Zurich,47.377,8.542,27.899,74.179,41.474,6.677,50.869,1.028,103,7.079,52.443,7.452
17996,2025-11-19 14:25:17.554219,CH,Zurich,47.377,8.542,2.950,47.988,42.235,2.821,35.551,0.644,105,28.734,85.678,4.496


In [67]:
aq.query("aqi > 100 and temperature > 30")["city"]

6        New York
7        New York
14       New York
17       New York
25       New York
           ...   
17947      Zurich
17950      Zurich
17975      Zurich
17976      Zurich
17991      Zurich
Name: city, Length: 2384, dtype: object

In [68]:
aqi_var = 100
aq.query("aqi > @aqi_var and temperature > 30")[["city","country"]]

           city country
6      New York      US
7      New York      US
14     New York      US
17     New York      US
25     New York      US
...         ...     ...
17947    Zurich      CH
17950    Zurich      CH
17975    Zurich      CH
17976    Zurich      CH
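
The query string rules on backticks and `@` can be sketched together. The column name `wind speed` (with a space) and the variable `limit` are hypothetical, chosen only to exercise both rules; the last lines compare the result with the equivalent boolean filter.

```python
import pandas as pd

air = pd.DataFrame({
    "city": ["A", "B", "C"],
    "aqi": [108, 90, 155],
    "wind speed": [3.7, 9.0, 4.5],  # column name containing a space
})

limit = 100

# Backticks quote the column name with a space; @ reads the Python variable
result = air.query("aqi > @limit and `wind speed` < 5")
print(result)

# The same filter written with boolean masks
mask_result = air[(air["aqi"] > limit) & (air["wind speed"] < 5)]
print(result.equals(mask_result))
```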
