# Climate Data Analysis for Research Center

## Assignment Overview
In this assignment, we analyze simulated climate data for 500 locations over one year (365 days). The data consists of daily **temperature** and **humidity** values, generated randomly in Task 1. As data scientists, our goal is to perform several analyses that uncover climate trends, seasonal patterns, and other metrics that can support climate research efforts.

### Goals of Each Task
1. **Initialize Temperature and Humidity Data**: Create arrays to store temperature (in Celsius) and humidity values for each location across 365 days.
2. **Simulate and Check Missing Data**: Introduce missing values randomly and calculate the total number of missing entries.
3. **Convert Temperature and Calculate Discomfort Index**: Convert temperatures to Fahrenheit and compute a "feels like" index to represent perceived discomfort.
4. **Analyze January Temperatures**: Extract and analyze data specific to January, calculating the average temperature.
5. **Identify Extreme Temperatures**: Mark temperatures above 35°C as potentially erroneous by replacing them with `NaN`, then count the `NaN` values per location.
6. **Calculate Quarterly Temperature Averages**: Calculate the average temperature per season by dividing the year into four quarters.
7. **Classify Humidity Levels**: Count, per location, the days that are "Dry" (humidity below 30%) or "Humid" (humidity above 70%).
8. **Apply Daily Pressure Trend**: Simulate atmospheric pressure changes and apply a trend adjustment to temperature data.

### Instructions
Each task is implemented in its own cell. Comments and explanatory text throughout the notebook document the approach used.

---
**Note**: Vectorized NumPy operations are used wherever possible in place of explicit Python loops, which improves computational efficiency.
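
As a rough illustration of this note, the sketch below converts a small array from Celsius to Fahrenheit twice: once with an explicit loop and once with a single array expression. The array `sample_celsius` and its values are made up for the example and are not part of the assignment data.

```python
import numpy as np

sample_celsius = np.array([-10.0, 0.0, 25.0, 40.0])  # hypothetical values

# Loop version: one Python-level operation per element
looped = np.empty_like(sample_celsius)
for i, value in enumerate(sample_celsius):
    looped[i] = value * 9 / 5 + 32

# Vectorized version: one expression applied to the whole array
vectorized = sample_celsius * 9 / 5 + 32

print(np.allclose(looped, vectorized))
```

Both versions give the same result; the vectorized form moves the per-element work into NumPy's compiled code.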


### Task 1: Initialize Temperature and Humidity Data
We begin by creating two arrays to represent daily climate data for 500 locations across 365 days. The data consists of:
- **temperature_data**: Random temperature values (in Celsius) within the range -10°C to 40°C.
- **humidity_data**: Random humidity values in percentage, ranging from 0% to 100%.

These arrays will allow us to perform analysis and simulations of climate trends across various locations over a year.


```python
import numpy as np

# Initialize random seed for reproducibility
np.random.seed(0)

# Generate random temperature data in Celsius (-10°C to 40°C)
temperature_data = np.random.uniform(-10, 40, (500, 365))

# Generate random humidity data in percentage (0% to 100%)
humidity_data = np.random.uniform(0, 100, (500, 365))

# Display the shapes to verify
temperature_data.shape, humidity_data.shape
```

```
((500, 365), (500, 365))
```
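
NumPy's documentation recommends the `Generator` API (`np.random.default_rng`) over the legacy global seed for new code. The sketch below shows an alternative initialization under that API. It is not the method used in this notebook, and it produces different random values from `np.random.seed(0)`, so the outputs shown in later cells would not match. The names `rng`, `demo_temperature`, and `demo_humidity` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed; any integer works

# Uniform draws fall in the half-open interval [low, high)
demo_temperature = rng.uniform(-10, 40, (500, 365))
demo_humidity = rng.uniform(0, 100, (500, 365))

print(demo_temperature.shape, demo_humidity.shape)
```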

### Task 2: Simulate and Check Missing Data
To simulate real-world scenarios where data may be incomplete, we randomly set 5% of the entries in each array to `NaN`, choosing the positions independently for temperature and humidity. We then count the missing entries in each array to confirm that exactly 5% of the values (9,125 of 182,500) were replaced.


```python
# Simulate missing data by setting 5% of the values to NaN in both arrays
num_missing = int(0.05 * temperature_data.size)
temperature_data.flat[np.random.choice(temperature_data.size, num_missing, replace=False)] = np.nan
humidity_data.flat[np.random.choice(humidity_data.size, num_missing, replace=False)] = np.nan

# Count missing values in each array
missing_temperature = np.isnan(temperature_data).sum()
missing_humidity = np.isnan(humidity_data).sum()

# Output the missing counts
missing_temperature, missing_humidity
```

```
(9125, 9125)
```
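
The `replace=False` argument is what makes the missing counts come out exact. The following sketch, on a small made-up array, contrasts sampling positions without and with replacement. The array size, the seed, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for illustration only
size, num_missing = 1000, 50

# Without replacement: exactly num_missing distinct positions
unique_idx = rng.choice(size, num_missing, replace=False)

# With replacement: positions can repeat, so fewer cells may become NaN
repeated_idx = rng.choice(size, num_missing, replace=True)

demo = np.zeros(size)
demo[unique_idx] = np.nan
count_without_replacement = int(np.isnan(demo).sum())
distinct_with_replacement = np.unique(repeated_idx).size

print(count_without_replacement, distinct_with_replacement)
```

Sampling without replacement always marks exactly `num_missing` cells, while sampling with replacement can mark fewer whenever an index is drawn more than once.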

### Task 3: Convert Temperature and Calculate Discomfort Index
1. **Convert Celsius to Fahrenheit**: To facilitate data sharing with international teams, we convert the temperature data from Celsius to Fahrenheit.
2. **Calculate "Feels Like" Discomfort Index**: We calculate a simplified discomfort index, defined here as the Fahrenheit temperature plus half the humidity percentage, to estimate perceived discomfort. Entries where either input is missing remain `NaN`.
3. **Cap the Discomfort Index**: Any index value above 80 is capped at 80 to avoid extreme values.


```python
# Convert temperature data from Celsius to Fahrenheit
temperature_data_fahrenheit = temperature_data * 9/5 + 32

# Calculate the discomfort index (basic formula for demonstration purposes)
discomfort_index = temperature_data_fahrenheit + 0.5 * humidity_data

# Cap discomfort index at 80
discomfort_index = np.where(discomfort_index > 80, 80, discomfort_index)

# Display the first few values for verification
discomfort_index[:5, :5]
```

```
array([[80.        , 80.        , 75.11449784,         nan, 79.96740392],
       [80.        , 80.        , 80.        , 80.        ,         nan],
       [74.08437348, 80.        , 80.        , 80.        , 80.        ],
       [80.        , 18.52749376, 80.        , 80.        , 69.22447565],
       [80.        ,         nan, 80.        , 80.        , 80.        ]])
```
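
To make the arithmetic concrete, the sketch below works through the conversion, index, and cap on four hand-picked values, including missing entries. It also shows `np.minimum` as an equivalent way to apply the cap. The input values and variable names are made up for the example.

```python
import numpy as np

celsius = np.array([25.0, 0.0, np.nan, -10.0])    # hypothetical readings
humidity = np.array([40.0, 50.0, 60.0, np.nan])   # hypothetical readings

fahrenheit = celsius * 9 / 5 + 32       # 25 °C -> 77 °F, 0 °C -> 32 °F
index = fahrenheit + 0.5 * humidity     # 97.0, 57.0, nan, nan

# NaN > 80 evaluates to False, so missing entries pass through unchanged
capped_where = np.where(index > 80, 80, index)

# np.minimum also propagates NaN and gives the same result here
capped_minimum = np.minimum(index, 80)

print(capped_where)
```

Only the first value exceeds 80 and is capped. The two missing entries stay `NaN` under both approaches.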

### Task 4: Analyze January Temperatures
We extract the first 31 days of the temperature data, which represent January, and calculate a single average over all locations and days. Missing values are ignored by using `np.nanmean`. This gives a summary of the overall temperature level in January.


```python
# Extract January data (first 31 days)
january_temperatures = temperature_data[:, :31]

# Calculate the average January temperature across all locations
average_january_temperature = np.nanmean(january_temperatures)

# Output the average January temperature
average_january_temperature
```

```
15.047620733901034
```
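
The choice of `np.nanmean` matters because the array contains missing values. The sketch below uses a tiny made-up array (two locations, three days) to compare `np.mean` and `np.nanmean`, and adds a per-location variant with `axis=1` that the assignment does not require.

```python
import numpy as np

demo = np.array([[10.0, np.nan, 20.0],
                 [6.0, 15.0, 27.0]])  # hypothetical values

plain_mean = np.mean(demo)                 # nan: one missing value poisons the result
overall = np.nanmean(demo)                 # mean of the 5 valid values: 78 / 5 = 15.6
per_location = np.nanmean(demo, axis=1)    # one mean per row: [15., 16.]

print(plain_mean, overall, per_location)
```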

### Task 5: Identify Extreme Temperatures
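Any temperature values above 35°C are marked as potential errors by setting them to `NaN`. We then count the number of `NaN` values for each location. Because the array already contains the missing values introduced in Task 2, these counts include both the earlier missing entries and the newly flagged extremes. The replacement modifies `temperature_data` in place, so all later tasks operate on the filtered data.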


```python
# Replace temperatures above 35°C with NaN to mark as potential error
temperature_data[temperature_data > 35] = np.nan

# Count the number of NaN values per location (each row)
null_counts_per_location = np.isnan(temperature_data).sum(axis=1)

# Display the null counts for the first 10 locations for verification
null_counts_per_location[:10]
```

```
array([57, 61, 57, 63, 56, 55, 59, 56, 58, 53])
```
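
If the goal is to count only the extreme readings, one option is to build the mask before overwriting anything. The sketch below does this on a small made-up array that already contains missing values. It is a possible variant under that assumption, not the notebook's method, and all names are illustrative.

```python
import numpy as np

temps = np.array([[36.0, np.nan, 20.0, 40.0],
                  [np.nan, np.nan, 35.0, 10.0]])  # hypothetical values

# NaN > 35 is False, so existing missing values are not flagged as extreme
extreme_mask = temps > 35
extreme_counts = extreme_mask.sum(axis=1)         # extremes only: [2, 0]

# Build a flagged copy rather than modifying temps in place
flagged = np.where(extreme_mask, np.nan, temps)
total_nan_counts = np.isnan(flagged).sum(axis=1)  # missing + extremes: [3, 2]

print(extreme_counts, total_nan_counts)
```

The second location has no extreme readings but still shows two `NaN` values, which is why the combined count alone cannot separate the two causes.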

### Task 6: Calculate Quarterly Temperature Averages
To examine seasonal trends, we divide the year into four approximately equal quarters (91 days each for Q1 to Q3, 92 days for Q4):
- Q1: Days 1 to 91
- Q2: Days 92 to 182
- Q3: Days 183 to 273
- Q4: Days 274 to 365

We then calculate the average temperature for each location within each quarter, ignoring missing values.


```python
# Split data into four approximate quarters
Q1 = temperature_data[:, :91]     # First 91 days
Q2 = temperature_data[:, 91:182]  # Next 91 days
Q3 = temperature_data[:, 182:273] # Next 91 days
Q4 = temperature_data[:, 273:]    # Remaining days

# Calculate the average temperature for each quarter
Q1_avg = np.nanmean(Q1, axis=1)
Q2_avg = np.nanmean(Q2, axis=1)
Q3_avg = np.nanmean(Q3, axis=1)
Q4_avg = np.nanmean(Q4, axis=1)

# Combine the quarterly averages into a single array for easier analysis
quarterly_averages = np.vstack((Q1_avg, Q2_avg, Q3_avg, Q4_avg)).T

# Display the first few averages for verification
quarterly_averages[:10]
```

```
array([[10.76313659, 14.09292356,  9.86979986, 10.56241871],
       [10.50229151, 12.4561454 , 12.34310543, 11.84997214],
       [12.05066887, 12.69032976, 12.82280756, 10.90078526],
       [11.8511407 , 14.08559833, 13.43938432, 11.76290684],
       [13.66431607, 13.21215313, 12.12261189, 13.8072658 ],
       [12.73705857, 11.12872106, 11.78197749, 13.12431132],
       [12.04175222, 12.11512203, 11.44652519, 14.77271966],
       [14.46117394, 13.0812715 , 10.20498392, 11.66366001],
       [12.09085034, 13.45992445, 12.51104235, 11.54377769],
       [11.49703219, 13.78533493, 12.1069364 , 11.45966829]])
```
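
The same quarter boundaries can also be expressed with `np.split`, which cuts an array at explicit column indices. The following is a minimal sketch on a small made-up array (`demo_temps`, three locations), offered as an alternative formulation rather than a replacement for the cell above. The short comprehension iterates over the four quarters only, not over the data.

```python
import numpy as np

rng = np.random.default_rng(0)               # assumed seed, for illustration
demo_temps = rng.uniform(-10, 40, (3, 365))  # hypothetical data, 3 locations

# Cut the columns at day indices 91, 182, and 273
quarters = np.split(demo_temps, [91, 182, 273], axis=1)
lengths = [q.shape[1] for q in quarters]     # days per quarter

# One column per quarter, one row per location
demo_quarterly = np.column_stack([np.nanmean(q, axis=1) for q in quarters])

print(lengths, demo_quarterly.shape)
```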

### Task 7: Classify Humidity Levels
We categorize each day's humidity level as "Dry" if it is below 30% and "Humid" if it is above 70%. Days with humidity between 30% and 70%, and days with missing humidity values, fall into neither category. We then count the total number of "Dry" and "Humid" days for each location, providing insight into the distribution of humidity levels.


```python
# Classify days based on humidity levels
dry_days = (humidity_data < 30).sum(axis=1)
humid_days = (humidity_data > 70).sum(axis=1)

# Display counts for the first 10 locations for verification
dry_days[:10], humid_days[:10]
```

```
(array([ 96, 112, 110, 108,  95, 121,  98, 101,  96, 104]),
 array([113, 114, 117, 105, 115,  92, 112, 102, 102, 111]))
```
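
The sketch below illustrates how the two comparisons treat boundary values and missing data, using a small made-up array. The `neither` count is an addition for illustration and is not part of the assignment.

```python
import numpy as np

humidity = np.array([[10.0, 50.0, 80.0, np.nan, 30.0],
                     [75.0, 71.0, 29.9, 70.0, np.nan]])  # hypothetical values

dry = (humidity < 30).sum(axis=1)          # strictly below 30: [1, 1]
humid = (humidity > 70).sum(axis=1)        # strictly above 70: [1, 2]
missing = np.isnan(humidity).sum(axis=1)   # NaN matches neither test: [1, 1]

# Remaining days lie in the closed range 30 to 70
neither = humidity.shape[1] - dry - humid - missing  # [2, 1]

print(dry, humid, missing, neither)
```

Values of exactly 30 or 70 count as neither dry nor humid because both comparisons are strict.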

### Task 8: Apply Daily Pressure Trend to Temperature Data
In this task, we simulate a daily atmospheric pressure trend as a sine wave with an amplitude of 5 that completes one cycle over the 365 days. We add this trend to each location's daily temperatures, relying on NumPy broadcasting to apply the same 365-value trend to every row. Missing temperature values remain `NaN` after the adjustment.


```python
# Generate a pressure trend across 365 days (sine wave for demonstration)
pressure_trend = 5 * np.sin(np.linspace(0, 2 * np.pi, 365))

# Adjust temperatures by adding the pressure trend to each day's temperature
adjusted_temperature_data = temperature_data + pressure_trend

# Display the adjusted temperatures for the first few locations for verification
adjusted_temperature_data[:5, :5]
```

```
array([[17.4406752 , 25.84577152, 20.3107495 ,         nan, 11.52769569],
       [12.93019809, 26.29468504, 10.12384678,         nan,         nan],
       [15.05315864,  8.90576096,  8.4181725 ,  3.30403173, 15.14347049],
       [27.94922774, -9.06067248,         nan, 21.0117078 , 17.96690867],
       [        nan,         nan, 13.10872513, 31.13146995, 28.79730544]])
```
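
The addition works because a one-dimensional array of length 365 broadcasts against the last axis of a two-dimensional array with 365 columns. The sketch below checks this on a small made-up array of zeros, so the adjusted values equal the trend itself. The names `demo_trend` and `demo_temps` are illustrative.

```python
import numpy as np

n_days = 365
demo_trend = 5 * np.sin(np.linspace(0, 2 * np.pi, n_days))  # shape (365,)
demo_temps = np.zeros((4, n_days))                           # hypothetical, 4 locations

# Shapes (4, 365) and (365,) broadcast to (4, 365): every row gets the same trend
demo_adjusted = demo_temps + demo_trend

print(demo_adjusted.shape, demo_trend[:3])
```

Because `np.linspace` includes both endpoints, the trend is approximately zero on the first and last day and never exceeds 5 in magnitude.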

## Conclusion

This notebook walks through a basic analysis of simulated climate data: temperature and humidity readings for 500 locations over a full year. The tasks covered:
- Initializing random climate data,
- Simulating missing values,
- Converting temperatures and calculating a "feels like" index,
- Extracting monthly and quarterly trends,
- Identifying extreme values,
- Classifying humidity levels, and
- Applying a daily pressure trend.

These analyses showcase basic data manipulation, summary statistics, and vectorized operations with NumPy. Using array operations instead of explicit loops keeps each calculation short and efficient.

## Repository Information

This project is intended as a beginner-friendly example of climate data analysis and is shared on GitHub for review. The repository is available at:

[GitHub Repository - Climate Data Analysis](https://github.com/shoaib1522/Data-Science-In-Python)

For questions or contributions, feel free to contact me via email at sa1670001@gmail.com.

---

**Note**: This notebook is designed for Jupyter Notebook. Run the code cells in order, because later cells depend on variables defined in earlier ones and on the in-place changes made to `temperature_data`.
