In [1]:
import plotly.io as pio

pio.renderers.default = "vscode+jupyterlab+notebook_connected"

# **Project 1: Global Warming Analysis Using Temperature Data**

## **Project Overview**
In this project, I will analyze a dataset on global temperature changes from 1961 to 2019, covering multiple countries and territories worldwide. By calculating key statistics and visualizing trends, I aim to uncover insights into global warming patterns for each region. In this analysis, a "region" means an individual country or territory.

I will use the dataset available at the following site: [Kaggle - Temperature Change Dataset](https://www.kaggle.com/datasets/sevgisarac/temperature-change/data).

## **Step 1: Loading and Filtering the Dataset**

This step involves loading a dataset of global temperature data, verifying that it meets the required row count (between 1,000 and 1 million), and filtering it to retain only country- and territory-level data. Filtering out larger regions, such as continents or groups, will allow for a more specific analysis by individual locations.

**1.1 Load the Dataset**

In this part, I load the dataset and check the row count to confirm it falls within the specified range.

In [2]:
import pandas as pd

file_path = "C:/Users/natsu/OneDrive/Python/Project1/Environment_Temperature_change_E_All_Data_NOFLAG.csv"
data = pd.read_csv(file_path)

original_row_count = len(data)
if 1000 <= original_row_count <= 1000000:
    print(f"Original row count: {original_row_count} (within the specified range).")
else:
    print(f"Original row count: {original_row_count} (outside the specified range).")

Original row count: 9656 (within the specified range).


The dataset contains 9656 rows, which meets the project’s row count requirement.

**1.2 Filter for Country- and Territory-Level Data**

To focus the analysis on individual countries and territories, I will filter out entries representing larger regions or classifications, such as continents or economic groups.

In [3]:
areas_to_exclude = [
    'World', 'Africa', 'Eastern Africa', 'Middle Africa', 'Northern Africa', 'Southern Africa', 'Western Africa',
    'Americas', 'Northern America', 'Central America', 'Caribbean', 'South America', 'Asia', 'Central Asia',
    'Eastern Asia', 'Southern Asia', 'South-Eastern Asia', 'Western Asia', 'Europe', 'Eastern Europe',
    'Northern Europe', 'Southern Europe', 'Western Europe', 'Oceania', 'Australia and New Zealand',
    'Melanesia', 'Micronesia', 'Polynesia', 'European Union', 'Least Developed Countries',
    'Land Locked Developing Countries', 'Small Island Developing States', 'Low Income Food Deficit Countries',
    'Net Food Importing Developing Countries', 'Annex I countries', 'Non-Annex I countries', 'OECD'
]

data_filtered = data[~data['Area'].isin(areas_to_exclude)].copy()

To confirm that the filtered dataset includes only individual countries and territories, I will display the unique values in the "Area" column.

In [4]:
unique_areas = data_filtered['Area'].unique()
print(unique_areas)

['Afghanistan' 'Albania' 'Algeria' 'American Samoa' 'Andorra' 'Angola'
 'Anguilla' 'Antarctica' 'Antigua and Barbuda' 'Argentina' 'Armenia'
 'Aruba' 'Australia' 'Austria' 'Azerbaijan' 'Bahamas' 'Bahrain'
 'Bangladesh' 'Barbados' 'Belarus' 'Belgium' 'Belgium-Luxembourg' 'Belize'
 'Benin' 'Bhutan' 'Bolivia (Plurinational State of)'
 'Bosnia and Herzegovina' 'Botswana' 'Brazil' 'British Virgin Islands'
 'Brunei Darussalam' 'Bulgaria' 'Burkina Faso' 'Burundi' 'Cabo Verde'
 'Cambodia' 'Cameroon' 'Canada' 'Cayman Islands'
 'Central African Republic' 'Chad' 'Channel Islands' 'Chile' 'China'
 'China, Hong Kong SAR' 'China, Macao SAR' 'China, mainland'
 'China, Taiwan Province of' 'Christmas Island' 'Cocos (Keeling) Islands'
 'Colombia' 'Comoros' 'Congo' 'Cook Islands' 'Costa Rica' "Côte d'Ivoire"
 'Croatia' 'Cuba' 'Cyprus' 'Czechia' 'Czechoslovakia'
 "Democratic People's Republic of Korea"
 'Democratic Republic of the Congo' 'Denmark' 'Djibouti' 'Dominica'
 'Dominican Republic' 'Ecuador' 'Egyp

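The unique values check shows that no continents, sub-regions, or economic groupings remain, confirming that the filtering step was successful. The list does still contain historical or combined entities, such as 'Belgium-Luxembourg' and 'Czechoslovakia', which are treated as separate areas in this analysis. The dataset is now ready for further analysis focused on specific locations.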
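As a complement to reading the printed list by eye, the same check can be sketched with a set intersection. The two lists below are shortened, illustrative stand-ins (an assumption for the example), not the full exclusion list or the full set of areas.

```python
# Shortened stand-in for the exclusion list defined in In [3] (assumption: subset only).
areas_to_exclude = ['World', 'Africa', 'Europe', 'OECD']

# Stand-in for data_filtered['Area'].unique(); in the notebook this comes from the DataFrame.
remaining_areas = ['Afghanistan', 'Albania', 'Algeria', 'Andorra']

# Any overlap would mean an aggregate region survived the filter.
leftover = set(remaining_areas) & set(areas_to_exclude)
print(f"Aggregate regions still present: {sorted(leftover)}")
```

An empty list in the output indicates that none of the excluded names remain.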

## **Step 2: Computing the Mean, Median, and Mode with pandas**
In this step, I’ll calculate the mean, median, and mode of temperature changes for each country and territory over the years 1961 to 2019 using pandas. Focusing on the "Meteorological year" entries and the "Temperature change" element, these calculations will help identify central trends in annual temperature shifts, providing insights into long-term global warming patterns.

Note: Although the instructions suggest picking a single numeric column, for this analysis, I’ve chosen to include all yearly columns from 1961 to 2019 for each country and territory. Focusing on the entire row of yearly temperature changes for each location enables a more comprehensive view of long-term climate patterns and trends. Picking only one year would overlook valuable year-over-year variation, limiting insights into the progression of climate change over time.

**2.1 Filter Data for Relevant Entries**

First, I filter the data to include only rows where "Element" is "Temperature change" and "Months" is "Meteorological year," ensuring that the analysis focuses on annual temperature changes for each country and territory.

In [5]:
temperature_data_meteorological = data_filtered[
    (data_filtered['Element'] == 'Temperature change') & (data_filtered['Months'] == 'Meteorological year')
].copy()

**2.2 Calculate Mean, Median, and Mode**

For each region, I calculate the mean, median, and mode across the years 1961 to 2019. Calculating these values will provide a summary of the central tendency of temperature changes over the recorded period.

- The mean and median give an overall sense of typical temperature changes.
- The mode identifies the most frequently occurring value. If several values tie for the highest frequency, I take the first value that pandas returns, which is the smallest of the tied values. If a row contains no values at all, "No unique mode" is assigned.

In [6]:
mean_by_region = temperature_data_meteorological.loc[:, 'Y1961':'Y2019'].mean(axis=1)

median_by_region = temperature_data_meteorological.loc[:, 'Y1961':'Y2019'].median(axis=1)

mode_by_region = []
for _, row in temperature_data_meteorological.loc[:, 'Y1961':'Y2019'].iterrows():
    mode_series = row.mode()
    if not mode_series.empty:
        mode_by_region.append(mode_series.iloc[0])  
    else:
        mode_by_region.append("No unique mode") 
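
The tie-breaking behaviour of `mode()` matters for the loop above. The following is a minimal sketch with made-up values (not taken from the dataset) showing what `Series.mode()` returns when a value repeats and when no value repeats.

```python
import pandas as pd

# Hypothetical yearly values, for illustration only.
repeated = pd.Series([0.5, -0.2, 0.5, 0.1])     # 0.5 occurs twice
all_unique = pd.Series([0.3, -0.4, 0.9, 0.1])   # every value occurs once

# With a repeated value, mode() returns just that value.
print(repeated.mode().tolist())

# With no repeats, every value ties, and mode() returns all of them in ascending order,
# so .iloc[0] is simply the smallest value in the row.
print(all_unique.mode().tolist())
print(all_unique.mode().iloc[0])
```

In the second case the "mode" is the row minimum, which is worth keeping in mind when interpreting the Mode column.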

**2.3 Add Calculated Values to the DataFrame**

The computed mean, median, and mode values are added as new columns in the DataFrame, allowing for a comprehensive view of each country or territory's temperature trends over the selected period.

In [7]:
temperature_data_meteorological['Mean'] = mean_by_region
temperature_data_meteorological['Median'] = median_by_region
temperature_data_meteorological['Mode'] = mode_by_region

**2.4 Display the DataFrame with Computed Statistics**

To display the DataFrame with the newly computed statistics for each country and territory, I adjust the display settings to show all rows.

In [8]:
import pandas as pd
pd.set_option('display.max_rows', None)
temperature_data_meteorological[['Area', 'Mean', 'Median', 'Mode']]

| | Area | Mean | Median | Mode |
|---|---|---|---|---|
| 32 | Afghanistan | 0.432322 | 0.423 | 1.317 |
| 66 | Albania | 0.485492 | 0.282 | 1.399 |
| 100 | Algeria | 0.711153 | 0.649 | 0.045 |
| 134 | American Samoa | 0.465 | 0.337 | 0.328 |
| 168 | Andorra | 0.691475 | 0.749 | -0.5 |
| 202 | Angola | 0.412932 | 0.325 | -0.333 |
| 236 | Anguilla | 0.256203 | 0.283 | 0.916 |
| 270 | Antarctica | 0.157508 | 0.132 | -0.234 |
| 304 | Antigua and Barbuda | 0.261071 | 0.2875 | 0.308 |
| 338 | Argentina | 0.264915 | 0.291 | 0.395 |


## **Insights from Step 2: Analysis of Mean, Median, and Mode of Temperature Changes**
**Mean (Average) Temperature Change**
For most countries and territories, the mean temperature change over 1961 to 2019 is positive, which points to a widespread warming pattern. Because the mean is a single average for the whole period, it shows how much warmer the period was overall rather than how quickly temperatures rose. Countries and territories with sustained increases in average temperature are likely to face more severe climate impacts, such as heatwaves, droughts, and shifts in local ecosystems.

**Median Temperature Change**
The median temperature change generally aligns closely with the mean across countries and territories, reinforcing a steady warming trend. This alignment suggests that the warming pattern is consistent over time and is not disproportionately influenced by a few extreme years.
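
To make "aligns closely" concrete, the gap between mean and median can be computed directly. This sketch uses a few (Mean, Median) pairs copied from the Step 2 output table; it is an illustration on a small sample, not a result for the full dataset.

```python
# (Mean, Median) pairs copied from the Step 2 output table.
stats = {
    'Afghanistan': (0.432322, 0.423),
    'Albania': (0.485492, 0.282),
    'Algeria': (0.711153, 0.649),
    'Andorra': (0.691475, 0.749),
    'Antarctica': (0.157508, 0.132),
}

# A large positive gap means a few unusually warm years pull the mean above the median.
for area, (mean_value, median_value) in stats.items():
    gap = mean_value - median_value
    print(f"{area:<12} mean - median = {gap:+.3f}")
```

In this sample, Afghanistan's gap is under 0.01°C, while Albania's is about 0.20°C, so the alignment holds for most rows but not all of them.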

**Mode (Most Frequent) Temperature Change**
In many countries and territories, the mode is negative or well below the mean. This result should be read with caution. The yearly values are recorded to three decimal places, so most values occur only once per row, and the mode is decided by a handful of coincidental repeats. When no value repeats at all, every value ties, and the first value returned by pandas' `mode()` is the smallest value in the row, which is often a negative value from a cooler year. The mode therefore says little about the warming trend in this dataset; the mean and median are the more informative measures.

## **Step 3: Computing the Mean, Median, and Mode Using Only the Python Standard Library**

Following the instructions, I’ll repeat the previous step, but this time using only the Python standard library.

**3.1 Loading and Filtering Data**

In this part, I load the data manually using Python’s csv module and filter it to include only rows where "Element" is "Temperature change" and "Months" is "Meteorological year." This ensures that the analysis remains focused on annual temperature changes for each country and territory. Each country or territory's yearly temperature values from 1961 to 2019 are stored in a dictionary called temperature_data, where the keys are the names of countries and territories, and the values are lists of temperature change values. Additionally, I check that each yearly value is present in the data before adding it to the list, which prevents issues with missing data.

In [9]:
import csv

file_path = "C:/Users/natsu/OneDrive/Python/Project1/Environment_Temperature_change_E_All_Data_NOFLAG.csv"
temperature_data = {}

with open(file_path, mode='r', encoding='utf-8') as file:
    reader = csv.DictReader(file)

    for row in reader:
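        # Keep annual temperature-change rows and skip aggregate regions (list defined in In [3])
        if (row['Element'] == 'Temperature change'
                and row['Months'] == 'Meteorological year'
                and row['Area'] not in areas_to_exclude):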
            country_or_territory = row['Area']
            
            if country_or_territory not in temperature_data:
                temperature_data[country_or_territory] = []
            
            for year in range(1961, 2020):
                year_column = f"Y{year}"
                if row[year_column]:
                    temperature_data[country_or_territory].append(float(row[year_column]))
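
The missing-value check works because `csv.DictReader` returns an empty string for an empty cell. The following sketch shows this on a small in-memory file; the area name "Exampleland" and its values are hypothetical, and only three year columns are included.

```python
import csv
import io

# Hypothetical three-year excerpt; the real file has columns Y1961 through Y2019.
sample_csv = io.StringIO(
    "Area,Months,Element,Y1961,Y1962,Y1963\n"
    "Exampleland,Meteorological year,Temperature change,0.10,,0.30\n"
    "Exampleland,Meteorological year,Standard Deviation,0.50,0.50,0.50\n"
)

values = []
for row in csv.DictReader(sample_csv):
    if row['Element'] == 'Temperature change' and row['Months'] == 'Meteorological year':
        for year in range(1961, 1964):
            cell = row[f"Y{year}"]
            if cell:  # csv returns '' for an empty cell, which is falsy
                values.append(float(cell))

print(values)  # [0.1, 0.3]
```

The empty Y1962 cell is skipped, and the "Standard Deviation" row is ignored by the filter.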

**3.2 Calculating Mean, Median, and Mode**

Using only Python’s standard library, I calculate the **mean**, **median**, and **mode** manually for each country or territory.

**Mean Calculation**  
- Sum all values and divide by the number of values.

**Median Calculation**  
- Sort the list of values.  
- If the list length is odd, the median is the middle value.  
- If the list length is even, the median is the average of the two middle values.

**Mode Calculation**  
- Use a dictionary to track the frequency of each value.  
- Identify the value(s) with the highest frequency:  
  - *If there is a single mode*: Return the most frequent value.  
  - *If there are multiple modes*: Return the first one based on the original order of values.  
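  - *If all values have the same frequency*: Every value ties, so the first recorded value is returned.  
  - *If the list is empty*: The country or territory is skipped, so no statistics are stored for it.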

In [10]:
country_territory_stats = {}

for country_or_territory, temps in temperature_data.items():
    if temps:
        # Mean calculation
        total = sum(temps)
        count = len(temps)
        mean_value = round(total / count, 6)

        # Median calculation
        sorted_temps = sorted(temps)
        mid = count // 2
        if count % 2 == 0:
            median_value = round((sorted_temps[mid - 1] + sorted_temps[mid]) / 2, 4)
        else:
            median_value = round(sorted_temps[mid], 4)

        # Mode calculation
        frequency = {}
        for temp in temps:
            frequency[temp] = frequency.get(temp, 0) + 1

        max_freq = max(frequency.values())
        modes = [key for key, value in frequency.items() if value == max_freq]

        # Select the first mode in order of appearance (temps is non-empty here, so a match always exists)
        first_mode = None
        for temp in temps:
            if temp in modes:
                first_mode = temp
                break

        mode_value = round(first_mode, 4) if first_mode is not None else "No unique mode"

        country_territory_stats[country_or_territory] = {
            'Mean': mean_value,
            'Median': median_value,
            'Mode': mode_value
        }
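
As a cross-check on the manual calculations, the standard library's `statistics` module offers equivalent functions. This is a sketch on a hypothetical list of values, not a replacement for the manual code above.

```python
import statistics

# Hypothetical list of yearly values, not taken from the dataset.
temps = [0.3, -0.1, 0.3, 0.5, -0.1, 0.2]

print(statistics.mean(temps))       # arithmetic mean
print(statistics.median(temps))     # average of the two middle values for an even count
print(statistics.multimode(temps))  # all most-frequent values, in order of first appearance
print(statistics.mode(temps))       # the first of those values (Python 3.8+)
```

Here 0.3 and -0.1 both occur twice. `statistics.mode` returns 0.3 because it appears first, which matches the order-of-appearance rule used in the manual mode calculation.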

**3.3 Displaying Results**

To display the calculated statistics, I iterate through the first five entries in the country_territory_stats dictionary. For each country or territory, the Mean, Median, and Mode values are printed in a formatted string. The check for "No unique mode" is a safeguard that keeps the numeric format specifier from being applied to a string, although the code in 3.2 always stores a numeric mode for a non-empty list.

In [11]:
for country_or_territory, stats in list(country_territory_stats.items())[:5]:
    if stats['Mode'] != "No unique mode":
        print(f"{country_or_territory}: Mean={stats['Mean']:.6f}, Median={stats['Median']:.4f}, Mode={stats['Mode']:.3f}")
    else:
        print(f"{country_or_territory}: Mean={stats['Mean']:.6f}, Median={stats['Median']:.4f}, Mode=No unique mode")

Afghanistan: Mean=0.432322, Median=0.4230, Mode=1.317
Albania: Mean=0.485492, Median=0.2820, Mode=1.399
Algeria: Mean=0.711153, Median=0.6490, Mode=0.045
American Samoa: Mean=0.465000, Median=0.3370, Mode=0.328
Andorra: Mean=0.691475, Median=0.7490, Mode=-0.500


## **Step 4: Visualizing Temperature Changes Using the Python Standard Library**

For visualization, I’ll create a text-based bar chart using print() to represent the mean temperature change for each country and territory. The chart itself is drawn with only built-in string operations, although the mean values are read from the pandas DataFrame built in Step 2. The plain-text format also displays neatly on a narrow screen. Each bar length is scaled relative to the maximum mean temperature change.

**4.1 Determining the Scaling Factor**

To ensure that the bar lengths are scaled proportionally, I first find the maximum mean temperature change in the dataset. This maximum value allows me to normalize each bar’s length relative to the highest mean temperature change, setting a maximum length of 50 characters for the longest bar.

In [12]:
max_temp_change = temperature_data_meteorological['Mean'].max()

**4.2 Generating the Text-Based Bar Chart**

Before iterating through the data, I display a header to describe the visualization as "Temperature Change Visualization (Mean Temperature Change)". This header is printed just before the chart output for clarity.

Then, I iterate through each country and territory, calculating the appropriate length for each bar based on its mean temperature change. Each bar is created by repeating the | character, scaled to a maximum length of 50 characters. A negative mean produces an empty bar. The result includes the country or territory name (truncated to 15 characters for consistent formatting) and the mean temperature change.

In [13]:
print("\nTemperature Change Visualization (Mean Temperature Change)\n")

for index, row in temperature_data_meteorological.iterrows():
    country_or_territory = row['Area']
    temp_change = row['Mean']
    bar_length = int((temp_change / max_temp_change) * 50)  # Normalize to max length of 50 characters
    print(f"{country_or_territory[:15]:<15}: {'|' * bar_length} ({temp_change:.2f}°C)")
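
The scaling step can also be written as a small helper that guards against edge cases. This is a sketch: `bar_for` is a hypothetical name, and the guard for missing values is an addition, since `int()` raises a `ValueError` when given NaN.

```python
import math

def bar_for(value, max_value, width=50):
    """Return a bar of '|' characters scaled so that max_value maps to `width`."""
    # Missing (NaN) or non-positive values get an empty bar instead of raising an error.
    if math.isnan(value) or value <= 0 or max_value <= 0:
        return ''
    return '|' * int(value / max_value * width)

# Values taken from the chart and insights in this notebook; 1.51 is the largest mean reported.
print(len(bar_for(0.43, 1.51)))           # 14
print(len(bar_for(1.51, 1.51)))           # 50
print(repr(bar_for(-0.10, 1.51)))         # ''
print(repr(bar_for(float('nan'), 1.51)))  # ''
```

A mean of 0.43°C maps to 14 characters, the same length as the Afghanistan bar in the chart output.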


Temperature Change Visualization (Mean Temperature Change)

Afghanistan    : |||||||||||||| (0.43°C)
Albania        : |||||||||||||||| (0.49°C)
Algeria        : ||||||||||||||||||||||| (0.71°C)
American Samoa : ||||||||||||||| (0.46°C)
Andorra        : |||||||||||||||||||||| (0.69°C)
Angola         : ||||||||||||| (0.41°C)
Anguilla       : |||||||| (0.26°C)
Antarctica     : ||||| (0.16°C)
Antigua and Bar: |||||||| (0.26°C)
Argentina      : |||||||| (0.26°C)
Armenia        : ||||||||||||||||||||||||||||||||||| (1.07°C)
Aruba          : |||||||||||||| (0.43°C)
Australia      : |||||||||||||| (0.43°C)
Austria        : |||||||||||||||||||||||| (0.75°C)
Azerbaijan     : ||||||||||||||||||||||||||||||||||| (1.08°C)
Bahamas        : |||||||||||||| (0.42°C)
Bahrain        : ||||||||||||||||||||||| (0.70°C)
Bangladesh     : ||||||| (0.23°C)
Barbados       : ||||||||||| (0.35°C)
Belarus        : |||||||||||||||||||||||||||||||||||||||||||||| (1.40°C)
Belgium        : |||||||||||||||||||||||||||

## **Insights from Step 4: Examining the Top 3 and Bottom 3 Regions in Temperature Change**
Step 4 shows both the highest and lowest mean annual temperature changes among countries and territories from 1961 to 2019. The top three regions, Serbia (1.51°C), Luxembourg (1.49°C), and Montenegro (1.48°C), warmed substantially. The bottom three, Nauru (-0.10°C), Pitcairn Island (-0.02°C), and Midway Island (0.02°C), showed minimal or slightly negative changes. The top three are all in Europe and the bottom three are all small Pacific islands, which suggests that geographic location, in particular a continental versus an oceanic setting, plays a role. This analysis does not test specific causes, so factors such as urbanization or changes in forest cover remain possible contributors rather than findings. Monitoring these patterns is essential for informing climate policy and for shaping adaptation strategies that limit the impacts of further warming.
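
The ranking itself can be sketched with the standard library's `heapq` module. The dictionary below is a partial, illustrative sample built from means quoted in this notebook (rounded to two decimals), not the full set of results.

```python
import heapq

# Means quoted in this notebook; a partial sample for illustration.
mean_by_area = {
    'Serbia': 1.51, 'Luxembourg': 1.49, 'Montenegro': 1.48, 'Belarus': 1.40,
    'Afghanistan': 0.43, 'Midway Island': 0.02, 'Pitcairn Island': -0.02, 'Nauru': -0.10,
}

# Rank (area, mean) pairs by the mean value.
top3 = heapq.nlargest(3, mean_by_area.items(), key=lambda item: item[1])
bottom3 = heapq.nsmallest(3, mean_by_area.items(), key=lambda item: item[1])

print("Top 3:", top3)
print("Bottom 3:", bottom3)
```

With the full set of means in place of the sample dictionary, the same two calls would give the highest and lowest three regions.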