<a href="https://colab.research.google.com/github/monikanaumovskaa/Introduction-to-Data-Science/blob/master/Laboratory_exercise_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Laboratory exercise 4

## Warm-Up Mode (2 points)

**Task Description**  
Using the given dataset, develop and implement **3** different neural networks to predict the **air quality level**. Each network should differ in the following ways:  

- **layer configurations** - use different numbers and types of layers;
- **activation functions** - try different activation functions;
- **neurons per layer** - experiment with different numbers of neurons in each layer; and
- **number of layers** - build networks with varying depths.

After developing the models, evaluate and compare the performance of all **3** approaches.

**About the Dataset**  
This dataset focuses on air quality assessment across various regions. The dataset contains 5,000 samples and captures critical environmental and demographic factors that influence pollution levels.

**Features**:  
- **Temperature (°C)**: Average temperature of the region.  
- **Humidity (%)**: Relative humidity recorded in the region.  
- **PM2.5 Concentration (µg/m³)**: Levels of fine particulate matter.  
- **PM10 Concentration (µg/m³)**: Levels of coarse particulate matter.  
- **NO2 Concentration (ppb)**: Nitrogen dioxide levels.  
- **SO2 Concentration (ppb)**: Sulfur dioxide levels.  
- **CO Concentration (ppm)**: Carbon monoxide levels.  
- **Proximity to Industrial Areas (km)**: Distance to the nearest industrial zone.  
- **Population Density (people/km²)**: Number of people per square kilometer in the region.  

**Target Variable**: **Air Quality**  
- **Good**: Clean air with low pollution levels.  
- **Moderate**: Acceptable air quality but with some pollutants present.  
- **Poor**: Noticeable pollution that may cause health issues for sensitive groups.  
- **Hazardous**: Highly polluted air posing serious health risks to the population.  

In [2]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

In [4]:
data = pd.read_csv("pollution_dataset.csv")

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 10 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Temperature                    5000 non-null   float64
 1   Humidity                       5000 non-null   float64
 2   PM2.5                          5000 non-null   float64
 3   PM10                           5000 non-null   float64
 4   NO2                            5000 non-null   float64
 5   SO2                            5000 non-null   float64
 6   CO                             5000 non-null   float64
 7   Proximity_to_Industrial_Areas  5000 non-null   float64
 8   Population_Density             5000 non-null   int64  
 9   Air Quality                    5000 non-null   object 
dtypes: float64(8), int64(1), object(1)
memory usage: 390.8+ KB


In [22]:
encoder = LabelEncoder()
data['Air Quality'] = encoder.fit_transform(data['Air Quality'])
# The target variable (Air Quality) is encoded with LabelEncoder, which maps each air quality category to an integer.
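
`LabelEncoder` numbers the distinct class names in sorted order, which is what makes the integer labels in the later classification reports readable. The sketch below reproduces that mapping in plain Python for the four class names listed in the task description. The names `class_to_code` and `code_to_class` are illustrative and do not appear in the notebook.

```python
# Class names as listed in the task description (order here is arbitrary).
labels = ["Good", "Moderate", "Poor", "Hazardous"]

# LabelEncoder sorts the distinct labels and numbers them from 0,
# so the same mapping can be reproduced with sorted().
classes = sorted(set(labels))
class_to_code = {name: code for code, name in enumerate(classes)}
code_to_class = {code: name for name, code in class_to_code.items()}

print(class_to_code)  # {'Good': 0, 'Hazardous': 1, 'Moderate': 2, 'Poor': 3}
```

In the notebook itself, `encoder.classes_` holds the same sorted class names, and `encoder.inverse_transform` converts integer codes back to names.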

In [8]:
X = data.drop(columns=['Air Quality'])
y = data['Air Quality']

In [9]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
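
The cells above fit `StandardScaler` on the full dataset before splitting, so the test rows contribute to the mean and standard deviation used for scaling. One possible alternative is to split first and compute the statistics from the training rows only. The sketch below shows the idea with NumPy on synthetic data. The array shape and the names `train_idx`, `test_idx`, `mu`, and `sigma` are assumptions for illustration, and the results reported in this notebook were not produced this way.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for the 9 feature columns (not the real dataset).
X = rng.normal(loc=50.0, scale=10.0, size=(100, 9))

# Shuffle row indices and hold out 20% as the test set.
indices = rng.permutation(len(X))
n_test = int(0.2 * len(X))
test_idx, train_idx = indices[:n_test], indices[n_test:]

# Compute scaling statistics from the training rows only.
mu = X[train_idx].mean(axis=0)
sigma = X[train_idx].std(axis=0)

# Apply the same training statistics to both splits.
X_train_scaled = (X[train_idx] - mu) / sigma
X_test_scaled = (X[test_idx] - mu) / sigma

print(X_train_scaled.shape, X_test_scaled.shape)
```

With scikit-learn, the equivalent is to call `scaler.fit_transform` on the training split and `scaler.transform` on the test split.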

In [11]:
from keras.models import Sequential
from keras.layers import Dense

MODEL 1:    
A shallow network with a single hidden layer, a small number of neurons (16), and the ReLU activation function. This network is simple and lightweight, but it may not capture the full complexity of the data.

In [12]:
model1 = Sequential()
model1.add(Dense(16, input_dim=X_train.shape[1], activation='relu'))  # Single hidden layer
model1.add(Dense(4, activation='softmax'))  # Output layer (one unit per air quality class)


In [14]:
model1.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history1 = model1.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.4786 - loss: 1.1708 - val_accuracy: 0.6470 - val_loss: 0.7643
Epoch 2/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7053 - loss: 0.7233 - val_accuracy: 0.7630 - val_loss: 0.5866
Epoch 3/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7979 - loss: 0.5573 - val_accuracy: 0.8250 - val_loss: 0.4833
Epoch 4/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8350 - loss: 0.4804 - val_accuracy: 0.8620 - val_loss: 0.4128
Epoch 5/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8574 - loss: 0.4086 - val_accuracy: 0.8770 - val_loss: 0.3630
Epoch 6/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8783 - loss: 0.3662 - val_accuracy: 0.8790 - val_loss: 0.3269
Epoch 7/50
125/125

In [15]:
score1 = model1.evaluate(X_test, y_test, verbose=0)
print(f"Model 1 - Test accuracy: {score1[1]:.4f}")
y_pred1 = model1.predict(X_test)
print("Model 1 - Classification Report:")
print(classification_report(y_test, y_pred1.argmax(axis=1)))

Model 1 - Test accuracy: 0.9470
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Model 1 - Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       409
           1       0.92      0.81      0.86       111
           2       0.96      0.96      0.96       294
           3       0.83      0.90      0.86       186

    accuracy                           0.95      1000
   macro avg       0.93      0.92      0.92      1000
weighted avg       0.95      0.95      0.95      1000
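
`model1.evaluate` returns the loss first and the accuracy second, which is why the cell above prints `score1[1]`. The loss, `sparse_categorical_crossentropy`, takes the integer labels produced by `LabelEncoder` rather than one-hot vectors and averages the negative log of the probability assigned to the true class. The sketch below computes that quantity with NumPy on made-up probabilities. The helper name `sparse_cce` is hypothetical, and this is an illustration of the formula, not the Keras implementation.

```python
import numpy as np

def sparse_cce(probs, labels):
    """Mean negative log-probability of the true class, for integer labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Pick, for each row, the probability predicted for its true class.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs).mean())

# Made-up softmax outputs for three samples over four classes.
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.10, 0.10, 0.70],
]
labels = [0, 2, 3]

print(round(sparse_cce(probs, labels), 4))
```

A confident correct prediction contributes a small term, while a uniform guess over four classes contributes `log(4)`, roughly 1.386.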



MODEL 2:    A deeper network with two hidden layers (64 neurons per layer) and the ReLU activation function. This model has more layers and can learn more complex patterns.

In [16]:
model2 = Sequential()
model2.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))  # First hidden layer
model2.add(Dense(64, activation='relu'))  # Second hidden layer
model2.add(Dense(4, activation='softmax'))  # Output layer (classification)


In [17]:
model2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history2 = model2.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6551 - loss: 0.9075 - val_accuracy: 0.8990 - val_loss: 0.3176
Epoch 2/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8976 - loss: 0.2970 - val_accuracy: 0.9170 - val_loss: 0.2253
Epoch 3/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9192 - loss: 0.2216 - val_accuracy: 0.9370 - val_loss: 0.1901
Epoch 4/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9277 - loss: 0.1909 - val_accuracy: 0.9350 - val_loss: 0.1734
Epoch 5/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9334 - loss: 0.1753 - val_accuracy: 0.9340 - val_loss: 0.1729
Epoch 6/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9337 - loss: 0.1761 - val_accuracy: 0.9310 - val_loss: 0.1745
Epoch 7/50
125/125

In [18]:
# Evaluate the model
score2 = model2.evaluate(X_test, y_test, verbose=0)
print(f"Model 2 - Test accuracy: {score2[1]:.4f}")
y_pred2 = model2.predict(X_test)
print("Model 2 - Classification Report:")
print(classification_report(y_test, y_pred2.argmax(axis=1)))

Model 2 - Test accuracy: 0.9460
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Model 2 - Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       409
           1       0.83      0.90      0.87       111
           2       0.97      0.96      0.96       294
           3       0.87      0.84      0.85       186

    accuracy                           0.95      1000
   macro avg       0.92      0.92      0.92      1000
weighted avg       0.95      0.95      0.95      1000
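
The classification reports summarize precision and recall per class, but they do not show which classes get mistaken for which. `confusion_matrix` is imported at the top of the notebook and never called. Something like `confusion_matrix(y_test, y_pred2.argmax(axis=1))` would give that breakdown. The sketch below builds the same kind of table with NumPy from made-up probabilities, to show how `argmax` turns softmax outputs into class labels and how recall follows from the rows. The helper `confusion_counts` and all numbers are illustrative.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1
    return matrix

# Made-up softmax outputs for five samples over four classes.
probs = np.array([
    [0.9, 0.0, 0.1, 0.0],
    [0.1, 0.6, 0.1, 0.2],
    [0.1, 0.3, 0.1, 0.5],
    [0.0, 0.1, 0.8, 0.1],
    [0.1, 0.2, 0.1, 0.6],
])
y_true = np.array([0, 1, 1, 2, 3])

y_pred = probs.argmax(axis=1)  # most probable class per row
matrix = confusion_counts(y_true, y_pred, n_classes=4)
print(matrix)

# Recall per class: correct predictions divided by the number of true samples.
recall = matrix.diagonal() / matrix.sum(axis=1)
print(recall)
```

In this toy example one class-1 sample is predicted as class 3, so the recall for class 1 drops to 0.5 while the other classes stay at 1.0.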



MODEL 3:    A wide network with two hidden layers, more neurons (128 in each layer), and different activation functions: tanh for the first hidden layer and sigmoid for the second. The larger number of neurons and the mix of activation functions give this network more flexibility.
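
The three activation functions used across the models differ mainly in their output range: ReLU is zero for negative inputs and unbounded above, tanh maps to (-1, 1), and sigmoid maps to (0, 1). The short sketch below evaluates them at a few points using only the standard library. The helper functions are written here for illustration and are not the Keras implementations.

```python
import math

def relu(x):
    """Zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  tanh={math.tanh(x):.3f}  sigmoid={sigmoid(x):.3f}")
```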

In [19]:
model3 = Sequential()
model3.add(Dense(128, input_dim=X_train.shape[1], activation='tanh'))  # First hidden layer with tanh
model3.add(Dense(128, activation='sigmoid'))  # Second hidden layer with sigmoid
model3.add(Dense(4, activation='softmax'))  # Output layer (classification)
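
To make "shallow", "deeper", and "wide" concrete, the number of trainable parameters in each network can be computed by hand: a `Dense` layer with `n` inputs and `m` units has `n * m` weights plus `m` biases. The sketch below applies that to the three architectures in this notebook, assuming 9 input features (the 10 columns from `data.info()` minus the target). The helper `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    previous = n_inputs
    for units in layer_sizes:
        total += previous * units + units  # weight matrix plus one bias per unit
        previous = units
    return total

n_features = 9  # feature columns in the dataset (10 columns minus the target)
architectures = {
    "Model 1": [16, 4],
    "Model 2": [64, 64, 4],
    "Model 3": [128, 128, 4],
}
for name, layers in architectures.items():
    print(name, dense_param_count(n_features, layers))
# Model 1 228
# Model 2 5060
# Model 3 18308
```

Calling `model3.summary()` in Keras prints a per-layer version of the same count.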


In [20]:
model3.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history3 = model3.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))


Epoch 1/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.5532 - loss: 1.0102 - val_accuracy: 0.8290 - val_loss: 0.4059
Epoch 2/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8669 - loss: 0.3628 - val_accuracy: 0.9130 - val_loss: 0.2949
Epoch 3/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9122 - loss: 0.2813 - val_accuracy: 0.9230 - val_loss: 0.2390
Epoch 4/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9339 - loss: 0.2128 - val_accuracy: 0.9320 - val_loss: 0.1974
Epoch 5/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9353 - loss: 0.1913 - val_accuracy: 0.9390 - val_loss: 0.1756
Epoch 6/50
125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9362 - loss: 0.1759 - val_accuracy: 0.9320 - val_loss: 0.1825
Epoch 7/50
125/125

In [21]:
score3 = model3.evaluate(X_test, y_test, verbose=0)
print(f"Model 3 - Test accuracy: {score3[1]:.4f}")
y_pred3 = model3.predict(X_test)
print("Model 3 - Classification Report:")
print(classification_report(y_test, y_pred3.argmax(axis=1)))

Model 3 - Test accuracy: 0.9550
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
Model 3 - Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       409
           1       0.92      0.86      0.89       111
           2       0.96      0.97      0.96       294
           3       0.87      0.90      0.88       186

    accuracy                           0.95      1000
   macro avg       0.94      0.93      0.93      1000
weighted avg       0.96      0.95      0.96      1000
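
The task asks for a comparison of the three networks. A simple way to line them up is to rank the test accuracies printed by the evaluation cells above. The values below are copied from those outputs, and the short architecture labels in the dictionary keys are shorthand added here.

```python
# Test accuracies copied from the evaluation outputs in this notebook.
test_accuracy = {
    "Model 1 (1 x 16, ReLU)": 0.9470,
    "Model 2 (2 x 64, ReLU)": 0.9460,
    "Model 3 (2 x 128, tanh + sigmoid)": 0.9550,
}

# Sort from best to worst accuracy.
ranking = sorted(test_accuracy.items(), key=lambda item: item[1], reverse=True)
for name, accuracy in ranking:
    print(f"{name:<36} {accuracy:.4f}")

best_name, best_accuracy = ranking[0]
spread = best_accuracy - ranking[-1][1]
print(f"Best: {best_name}; spread between best and worst: {spread:.4f}")
```

Model 3 ranks first, but the spread is under one percentage point on a 1,000-sample test set, so the ordering may not hold across different random seeds or splits. In all three reports class 0 is classified perfectly, while classes 1 and 3 have the lowest F1 scores (0.85 to 0.89).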

