# Project 3: Temperatures Dashboard

In this project, we will analyze a dataset of temperatures from 10 cities around the world, extract some interesting insights, and build two types of charts (line plots and histograms) from them. We will use Pandas and Matplotlib once more, but this time we will convert a column to a date type so that we can do some time series analysis and plots.

Data extracted from: https://www.kaggle.com/datasets/sudalairajkumar/daily-temperature-of-major-cities (with some cleaning and modifications).


### Project Tasks:

- `3.1.` Load the dataset from the defined `data_path` and display the first 5 rows.

- `3.2.` Create a new column called `AvgTemperatureCelsius` that contains the temperature in Celsius degrees.

- `3.3.` How many different cities are there? Provide a list of them.

- `3.4.` What are the minimum and maximum timestamps?

- `3.5.` What are the global minimum and maximum temperatures? Find the city and the date of each of them.

- `3.6.` For a given city and a range of dates (start and end):
  - Make a line plot of the temperature readings for that city during the selected time period. The x-axis must be the timestamp column.
  - Make a histogram of the temperature readings for that city during the selected time period.
  - Make sure that all plots include a title, axis labels and a legend.

- `3.7.` Now repeat the previous question but for a list of cities instead of a single one:
  - Make a line plot of the temperature readings for the cities in the list during the selected time period. Each city must be a separate line with its own color, and the x-axis must be the timestamp column.
  - Make a histogram of the temperature readings for the cities in the list during the selected time period. Each city must be its own distribution with its own color.
  - Make sure that all plots include a title, axis labels and a legend.
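
Tasks 3.6 and 3.7 share the same filtering step: select rows by city and by date range. A minimal sketch of that step is shown here, assuming the `City`, `Date` and `AvgTemperatureCelsius` column names used later in this notebook. The helper name `filter_city_period` and the toy data are hypothetical and only serve as an illustration.

```python
import datetime as dt

import pandas as pd


def filter_city_period(df, cities, start, end):
    """Return the rows for the given cities with start <= Date <= end.

    Hypothetical helper, not part of the project code.
    """
    mask = df["City"].isin(cities) & (df["Date"] >= start) & (df["Date"] <= end)
    return df[mask]


# Toy data standing in for the real dataset (values are made up).
toy_df = pd.DataFrame(
    {
        "City": ["Munich", "Munich", "Tokyo", "Tokyo"],
        "Date": [
            dt.date(2008, 1, 1),
            dt.date(2011, 1, 1),
            dt.date(2008, 6, 1),
            dt.date(2009, 6, 1),
        ],
        "AvgTemperatureCelsius": [1.5, -2.0, 22.0, 23.5],
    }
)

# Keep only the Munich readings between 2008 and 2010 (inclusive).
subset = filter_city_period(toy_df, ["Munich"], dt.date(2008, 1, 1), dt.date(2010, 12, 31))
print(subset)
```

Passing a list of cities lets the same helper serve both the single-city and the multi-city tasks.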


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Ex 3.1: Load the dataset from the defined data_path and display the first 5 rows.

data_path = "../data/cities_temperatures.csv"

temps_df = pd.read_csv(data_path)

temps_df.head()

In [None]:
# Convert the Date column to Python date objects so that the time series can be compared, filtered, and plotted by date
temps_df["Date"] = pd.to_datetime(temps_df["Date"]).dt.date
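
The conversion above stores plain Python `date` objects in the column. As a rough illustration of what that means, the sketch below compares it with keeping the result of `pd.to_datetime` as is. The sample strings are made up, and their ISO format is an assumption about how dates might look in the CSV.

```python
import datetime as dt

import pandas as pd

# Made-up date strings; the real file's date format may differ.
raw = pd.Series(["2008-01-01", "2008-01-02"])

as_timestamps = pd.to_datetime(raw)      # pandas datetime64 values
as_dates = pd.to_datetime(raw).dt.date   # Python datetime.date objects

print(as_timestamps.dtype)
print(type(as_dates.iloc[0]))

# The datetime64 version keeps the .dt accessor for further time series work.
print(as_timestamps.dt.year.tolist())
```

Both forms support the `>=` and `<=` comparisons used in this notebook, as long as the values they are compared against are of the matching type.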

In [None]:
# Ex 3.2: Create a new column called `AvgTemperatureCelsius` that contains the temperature in Celsius degrees.

# Convert from Fahrenheit to Celsius: C = (F - 32) * 5/9
temps_df["AvgTemperatureCelsius"] = (temps_df["AvgTemperatureFahrenheit"] - 32) * 5/9

temps_df
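
A quick way to gain confidence in the conversion formula is to apply it to values whose Celsius equivalents are well known. The sketch below uses a small made-up frame, `check_df`, rather than the real dataset.

```python
import pandas as pd

# Reference points: 32 °F is 0 °C, 212 °F is 100 °C, and -40 °F is -40 °C.
check_df = pd.DataFrame({"AvgTemperatureFahrenheit": [32.0, 212.0, -40.0]})

# Same formula as above: C = (F - 32) * 5/9
check_df["AvgTemperatureCelsius"] = (check_df["AvgTemperatureFahrenheit"] - 32) * 5 / 9

print(check_df)
```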

In [None]:
# Ex 3.3: How many different cities are there? Provide a list of them.

unique_cities_list = temps_df["City"].unique().tolist()

print(f"Number of unique cities: {len(unique_cities_list)}")
print(f"List of cities: {unique_cities_list}")

In [None]:
# Ex 3.4: What are the minimum and maximum dates?

min_date = temps_df["Date"].min()
max_date = temps_df["Date"].max()

print(f"Minimum date: {min_date}")
print(f"Maximum date: {max_date}")

In [None]:
# Ex 3.5: What are the global minimum and maximum temperatures? Find the city and the date of each of them.

# Find minimum temperature
min_temp = temps_df["AvgTemperatureCelsius"].min()
min_temp_row = temps_df[temps_df["AvgTemperatureCelsius"] == min_temp].iloc[0]
min_temp_city = min_temp_row["City"]
min_temp_date = min_temp_row["Date"]

# Find maximum temperature
max_temp = temps_df["AvgTemperatureCelsius"].max()
max_temp_row = temps_df[temps_df["AvgTemperatureCelsius"] == max_temp].iloc[0]
max_temp_city = max_temp_row["City"]
max_temp_date = max_temp_row["Date"]

print(f"Minimum temperature: {min_temp:.2f}°C in {min_temp_city} on {min_temp_date}")
print(f"Maximum temperature: {max_temp:.2f}°C in {max_temp_city} on {max_temp_date}")
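
An alternative to filtering on the extreme value is to ask pandas for the index label of the extreme row with `idxmin` and `idxmax`, then fetch that row with `.loc`. The sketch below uses made-up toy data, not the real dataset. Like the `.iloc[0]` approach above, it returns the first row when several rows tie.

```python
import datetime as dt

import pandas as pd

# Toy data standing in for the real dataset (values are made up).
toy_df = pd.DataFrame(
    {
        "City": ["Munich", "Tokyo", "Buenos Aires"],
        "Date": [dt.date(2009, 1, 10), dt.date(2009, 7, 20), dt.date(2010, 1, 5)],
        "AvgTemperatureCelsius": [-12.0, 28.5, 31.0],
    }
)

# idxmin/idxmax return the index label of the first extreme value.
coldest = toy_df.loc[toy_df["AvgTemperatureCelsius"].idxmin()]
hottest = toy_df.loc[toy_df["AvgTemperatureCelsius"].idxmax()]

print(f"Coldest: {coldest['AvgTemperatureCelsius']:.2f}°C in {coldest['City']} on {coldest['Date']}")
print(f"Hottest: {hottest['AvgTemperatureCelsius']:.2f}°C in {hottest['City']} on {hottest['Date']}")
```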

In [None]:
# Ex 3.6: For a given city and a range of dates (start and end):

city = "Munich"
start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()

# Filter by city
city_df = temps_df[temps_df["City"] == city]

# Filter by date range
city_df_period = city_df[(city_df["Date"] >= start_date) & (city_df["Date"] <= end_date)]

plt.figure(figsize=(10, 5))

# Line plot
plt.plot(city_df_period["Date"], city_df_period["AvgTemperatureCelsius"], label=city)
plt.title(f"Temperature in {city} ({start_date} to {end_date})")
plt.xlabel("Date")
plt.ylabel("Temperature (°C)")
plt.legend()

plt.show()

In [None]:
# Histogram plot for the same city and period

plt.figure(figsize=(10, 5))

plt.hist(city_df_period["AvgTemperatureCelsius"], bins=20, label=city)
plt.title(f"Temperature Distribution in {city} ({start_date} to {end_date})")
plt.xlabel("Temperature (°C)")
plt.ylabel("Frequency")
plt.legend()

plt.show()

In [None]:
# Ex 3.7: Line plot for multiple cities

selected_cities = ["Munich", "Buenos Aires", "Tokyo"]
start_date = pd.to_datetime("2008-01-01").date()
end_date = pd.to_datetime("2010-12-31").date()

plt.figure(figsize=(15, 5))

# Plot each city
for city in selected_cities:
    city_df = temps_df[temps_df["City"] == city]
    city_df_period = city_df[(city_df["Date"] >= start_date) & (city_df["Date"] <= end_date)]
    plt.plot(city_df_period["Date"], city_df_period["AvgTemperatureCelsius"], label=city)

plt.title(f"Temperature Comparison ({start_date} to {end_date})")
plt.xlabel("Date")
plt.ylabel("Temperature (°C)")

plt.legend()

plt.show()

In [None]:
# Histogram plot for multiple cities

plt.figure(figsize=(15, 5))

# Plot histogram for each city
for city in selected_cities:
    city_df = temps_df[temps_df["City"] == city]
    city_df_period = city_df[(city_df["Date"] >= start_date) & (city_df["Date"] <= end_date)]
    plt.hist(city_df_period["AvgTemperatureCelsius"], bins=20, alpha=0.7, label=city)

plt.title(f"Temperature Distribution Comparison ({start_date} to {end_date})")
plt.xlabel("Temperature (°C)")
plt.ylabel("Frequency")

plt.legend()

plt.show()
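
The loops above filter the frame once per city. Another possible approach, sketched here on made-up toy data, is to reshape the long table into a wide one with one column per city. It assumes there is at most one reading per city and date, since `pivot` raises an error on duplicate pairs.

```python
import datetime as dt

import pandas as pd

# Toy data standing in for the real dataset (values are made up).
toy_df = pd.DataFrame(
    {
        "City": ["Munich", "Tokyo", "Munich", "Tokyo"],
        "Date": [
            dt.date(2008, 1, 1),
            dt.date(2008, 1, 1),
            dt.date(2008, 1, 2),
            dt.date(2008, 1, 2),
        ],
        "AvgTemperatureCelsius": [1.0, 6.0, 2.0, 7.0],
    }
)

# One row per date, one column per city.
wide_df = toy_df.pivot(index="Date", columns="City", values="AvgTemperatureCelsius")
print(wide_df)
```

With the data in this shape, `wide_df.plot()` would draw one line per city column, and the legend entries come from the column names.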