This project analyzes daily weather data to classify high-humidity days using a decision tree classifier implemented with scikit-learn.
It uses a decision tree to predict whether the relative humidity at 3 pm will be high, based on weather sensor readings taken at 9 am. The data comes from a weather station and includes measurements such as air pressure, temperature, wind speed, and rain accumulation.
The data for this project is stored in a CSV file named `data/data_weather.csv`. This file contains weather data collected over a three-year period.
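A minimal sketch of loading and inspecting this file with pandas, assuming the code runs from the project root so the relative path resolves. The function name `load_weather_data` is hypothetical and not taken from the notebook.

```python
import pandas as pd


def load_weather_data(path="data/data_weather.csv"):
    """Load the weather CSV and print a quick summary of its structure."""
    df = pd.read_csv(path)
    print(df.shape)           # (number of rows, number of columns)
    print(df.dtypes)          # column names and inferred types
    print(df.isnull().sum())  # count of missing values in each column
    return df
```

Calling `df = load_weather_data()` returns the DataFrame after printing its shape, column types, and per-column missing-value counts.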
- Import Libraries: Necessary libraries like pandas and scikit-learn are imported for data manipulation and machine learning tasks.
- Load Data: The `data_weather.csv` file is loaded into a pandas DataFrame.
- Data Exploration: The DataFrame is inspected to understand its structure and columns and to check for missing values.
- Data Cleaning:
  - The `number` column, which is not needed for the analysis, is removed.
  - Rows with missing values are dropped using the pandas `dropna` method.
- Classification Task:
- The relative humidity at 3 pm is converted into a binary classification label (high humidity or not) based on a threshold.
- Feature Selection: Morning sensor readings (9 am) are chosen as features to predict afternoon (3 pm) humidity.
- Train-Test Split: The data is split into training and testing sets using `train_test_split` from scikit-learn.
- Model Training: A decision tree classifier is trained on the training data using `DecisionTreeClassifier` from scikit-learn.
- Prediction: The trained model predicts humidity labels for the testing data.
- Evaluation: The model's accuracy on the testing data is measured with `accuracy_score` from scikit-learn.
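The cleaning and labeling steps above could look roughly like this. It is a sketch, not the notebook's code: the column name `relative_humidity_3pm`, the label name `high_humidity_label`, and the strict `>` comparison are assumptions, and the threshold is left as a required parameter because its value is not stated here.

```python
import pandas as pd


def clean_and_label(df, threshold, target_col="relative_humidity_3pm"):
    """Drop the unused 'number' column and any rows with missing values,
    then add a binary label: 1 if 3 pm humidity exceeds `threshold`, else 0.

    `target_col` is an assumed column name, and the strict '>' is an assumption.
    """
    cleaned = df.drop(columns=["number"]).dropna().copy()
    cleaned["high_humidity_label"] = (cleaned[target_col] > threshold).astype(int)
    return cleaned
```

For example, `cleaned = clean_and_label(df, threshold=...)`, with the notebook's threshold in place of `...`.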
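Feature selection and the train-test split might be sketched as follows. The 9 am column names in `MORNING_FEATURES` are hypothetical placeholders for the air pressure, temperature, wind speed, and rain accumulation readings, and the `test_size` and `random_state` values are illustrative rather than the notebook's settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 9 am feature columns; adjust to match the real CSV header.
MORNING_FEATURES = [
    "air_pressure_9am",
    "air_temp_9am",
    "avg_wind_speed_9am",
    "rain_accumulation_9am",
]


def split_features(df, label_col="high_humidity_label", test_size=0.25, random_state=42):
    """Use the morning readings as X and the binary label as y, then split both."""
    X = df[MORNING_FEATURES].copy()
    y = df[label_col].copy()
    # Returns X_train, X_test, y_train, y_test in that order.
    return train_test_split(X, y, test_size=test_size, random_state=random_state)
```

Fixing `random_state` makes the split reproducible across runs, so accuracy numbers can be compared between model variants.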
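Training, prediction, and evaluation map onto three scikit-learn calls: `fit`, `predict`, and `accuracy_score`. The helper name `fit_and_score` is hypothetical, and the tree uses default hyperparameters apart from a fixed `random_state`, which may differ from the notebook.

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def fit_and_score(X_train, X_test, y_train, y_test, random_state=0):
    """Fit a decision tree on the training split, predict the test split,
    and return (model, predictions, accuracy)."""
    model = DecisionTreeClassifier(random_state=random_state)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)  # fraction of correct labels
    return model, predictions, accuracy
```

Given the four outputs of a train-test split, `model, preds, acc = fit_and_score(X_train, X_test, y_train, y_test)` yields the fitted tree and its test accuracy.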
- Ensure you have Python, the required libraries (pandas, scikit-learn), and Jupyter installed.
- Clone or download this repository.
- Navigate to the project directory in your terminal.
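- Launch Jupyter (for example, `jupyter notebook`) and run all cells of `DAILY WEATHER DATA ANALYSIS USING DECISION TREE CLASSIFICATION.ipynb`. Because this file is a notebook rather than a Python script, `python` cannot run it directly; to execute it non-interactively, use `jupyter nbconvert --to notebook --execute "DAILY WEATHER DATA ANALYSIS USING DECISION TREE CLASSIFICATION.ipynb"` (quoted because the filename contains spaces).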
- This is a basic example of using a decision tree for weather classification.
- More advanced techniques and feature engineering could be explored for better performance.
- Implement hyperparameter tuning to optimize the decision tree model.
- Explore other classification algorithms for comparison.
- Visualize the decision tree to understand its decision-making process.
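One way to approach hyperparameter tuning is scikit-learn's `GridSearchCV`, which scores every combination in a parameter grid with k-fold cross-validation. The grid values below are illustrative guesses, not tuned recommendations, and `tune_tree` is a hypothetical helper.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def tune_tree(X_train, y_train, cv=5):
    """Grid-search a few DecisionTreeClassifier hyperparameters with k-fold CV."""
    param_grid = {
        "max_depth": [3, 5, 10, None],      # illustrative candidate values
        "min_samples_leaf": [1, 5, 10],
        "criterion": ["gini", "entropy"],
    }
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid,
        cv=cv,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_, search.best_score_
```

Running the search on the training split only keeps the test set untouched for the final accuracy check.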
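To inspect the tree's decision-making, scikit-learn's `export_text` renders the fitted split rules as indented text; `sklearn.tree.plot_tree` draws a graphical version if matplotlib is installed. The wrapper `describe_tree` is a hypothetical helper.

```python
from sklearn.tree import DecisionTreeClassifier, export_text


def describe_tree(model, feature_names):
    """Return a plain-text rendering of a fitted tree's split rules."""
    # list() accepts a pandas Index (e.g. X_train.columns) as well as a plain list.
    return export_text(model, feature_names=list(feature_names))
```

For example, `print(describe_tree(model, X_train.columns))` prints one line per split, ending in the predicted class at each leaf.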
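A comparison with other algorithms could use `cross_val_score` to get a mean accuracy per model on the same data. The two alternatives here, a random forest and a standardized logistic regression, are examples chosen for illustration; the README does not name specific algorithms.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, cv=5):
    """Return the mean cross-validated accuracy of a few candidate classifiers."""
    candidates = {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        # Logistic regression is sensitive to feature scale, so standardize first.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in candidates.items()
    }
```

Scoring every model with the same folds keeps the comparison fair; a single train-test split can favor one model by chance.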