# Global Temperatures by Country
## by Junyoung Seo

Dataset from [Kaggle](https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data/data)

Inspired by the BU IGS Visualizing Energy team's "The history of global coal production," this project visualizes global temperatures by country as a tool for illustrating the realities of global warming. Global warming is one of the major challenges facing humanity, and depicting how temperatures have risen across different countries can raise awareness of the threat it poses.

Let's look at the dataset first.

```python
import pandas as pd

df = pd.read_csv('GlobalLandTemperaturesByCountry.csv')
df
```

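|        | dt         | AverageTemperature | AverageTemperatureUncertainty | Country  |
|--------|------------|--------------------|-------------------------------|----------|
| 0      | 1743-11-01 | 4.384              | 2.294                         | Åland    |
| 1      | 1743-12-01 | NaN                | NaN                           | Åland    |
| 2      | 1744-01-01 | NaN                | NaN                           | Åland    |
| 3      | 1744-02-01 | NaN                | NaN                           | Åland    |
| 4      | 1744-03-01 | NaN                | NaN                           | Åland    |
| ...    | ...        | ...                | ...                           | ...      |
| 577457 | 2013-05-01 | 19.059             | 1.022                         | Zimbabwe |
| 577458 | 2013-06-01 | 17.613             | 0.473                         | Zimbabwe |
| 577459 | 2013-07-01 | 17.000             | 0.453                         | Zimbabwe |
| 577460 | 2013-08-01 | 19.759             | 0.717                         | Zimbabwe |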


The dataset has the following columns:
- **dt:** Date of the monthly record (first day of the month)
- **AverageTemperature:** Average land temperature of the country for that month, in degrees Celsius
- **AverageTemperatureUncertainty:** The 95% confidence interval around the average
- **Country:** Country name

From here on, we only need the dt, AverageTemperature, and Country columns.

Because dt is stored as text, we first convert it with pd.to_datetime and then extract the year with .dt.year. Each country has one record per month, so we then average the monthly values to get one average temperature per country per year.

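```python
# Parse the date strings and extract the year
df['dt'] = pd.to_datetime(df['dt'])
df['year'] = df['dt'].dt.year

# New DataFrame with each country's average temperature by year
# (select the temperature column so dt and the uncertainty are not averaged too)
df_avg = df.groupby(['Country', 'year'])['AverageTemperature'].mean().reset_index()
```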
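To see what this aggregation produces, here is a minimal sketch on a tiny made-up DataFrame that uses the same column names (the values are hypothetical, not taken from the Berkeley Earth file). It also shows that mean() skips missing (NaN) months, so a partially recorded year is averaged over the months that exist.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly records with the columns this step uses
toy = pd.DataFrame({
    'dt': ['2000-01-01', '2000-02-01', '2000-03-01',
           '2001-01-01', '2001-02-01', '2000-01-01'],
    'AverageTemperature': [1.0, 3.0, np.nan, 4.0, 6.0, 10.0],
    'Country': ['A', 'A', 'A', 'A', 'A', 'B'],
})

toy['dt'] = pd.to_datetime(toy['dt'])
toy['year'] = toy['dt'].dt.year

# One row per (Country, year); the NaN month in A/2000 is ignored by mean()
toy_avg = toy.groupby(['Country', 'year'])['AverageTemperature'].mean().reset_index()
print(toy_avg)  # A/2000 -> 2.0, A/2001 -> 5.0, B/2000 -> 10.0
```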

Now, let's create a visualization using Dash. I'd like to visualize the historical average temperature by country, with a slider allowing users to select a specific year to view the visualization. This will require implementing a callback in Dash to update the visualization based on the selected year.

```python
import pandas as pd
import plotly.express as px
from dash import Dash, html, dcc, Input, Output

# Create the Dash app
app = Dash('average_temp_graph')

# Layout: the graph with a year slider underneath
app.layout = html.Div([
    dcc.Graph(id='average_temp_graph'),

    # Slider with one mark per year in the data
    dcc.Slider(
        id='slider',
        min=df_avg['year'].min(),
        max=df_avg['year'].max(),
        value=df_avg['year'].min(),
        marks={str(year): str(year) for year in df_avg['year'].unique()},  # one label per year
        step=None  # only allow values that have a mark
    )
])

# Callback: redraw the graph whenever the slider value changes
@app.callback(
    Output('average_temp_graph', 'figure'),  # output is the graph's figure
    Input('slider', 'value')                 # input is the year selected on the slider
)
def update_graph(slider_year):
    # Average temperature of each country in the selected year
    slider_df = df_avg[df_avg['year'] == slider_year]

    # Bar chart colored by temperature
    fig = px.bar(slider_df, x='Country', y='AverageTemperature',
                 color='AverageTemperature',
                 labels={'AverageTemperature': 'Average Temperature (°C)'},
                 title=f'Global Average Temperatures in {slider_year}')
    return fig

# Run the app inline in the notebook (jupyter_mode requires Dash >= 2.11)
app.run(jupyter_mode='inline', port=3756)
```
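A Dash callback is an ordinary Python function that Dash calls with the slider's current value, so its data step can be pulled out and checked without starting a server. A minimal sketch, assuming a hypothetical helper named rows_for_year and a small made-up table shaped like df_avg:

```python
import pandas as pd

def rows_for_year(df_avg: pd.DataFrame, year: int) -> pd.DataFrame:
    """Return the per-country averages for one year (hypothetical helper)."""
    return df_avg[df_avg['year'] == year]

# Hypothetical yearly averages with the same columns as df_avg
df_avg_demo = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'year': [1950, 1951, 1950, 1951],
    'AverageTemperature': [10.0, 10.5, 20.0, 20.4],
})

# The rows update_graph(1950) would draw as bars
print(rows_for_year(df_avg_demo, 1950))
```

A year with no rows yields an empty DataFrame, which Plotly renders as an empty chart rather than raising an error.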

**Weakness:** The current visualization presents too much information at once, with one bar per country and one slider mark per year in the dataset. This makes it hard for users to focus on a specific year or on individual countries.

**Solution:** To address this, we can create a horizontal bar chart that shows only the 10 countries with the highest average temperatures in the selected year. Since there is relatively little data before 1900, we also restrict the chart to 1900 onward. Finally, we label the slider only every 20 years so the year labels stay readable.
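The claim that pre-1900 coverage is thin can be checked by counting total and non-missing AverageTemperature records on each side of the cutoff. A minimal sketch, shown on made-up rows; with the real data you would pass the df read from GlobalLandTemperaturesByCountry.csv after adding the year column. The helper name coverage_by_period is hypothetical.

```python
import numpy as np
import pandas as pd

def coverage_by_period(df: pd.DataFrame, cutoff: int = 1900) -> pd.DataFrame:
    """Count monthly records before and from `cutoff` (hypothetical helper)."""
    # Label each row by which side of the cutoff its year falls on
    period = np.where(df['year'] < cutoff, f'before {cutoff}', f'{cutoff} onward')
    grouped = df.groupby(period)['AverageTemperature']
    # size() counts all rows, count() counts only non-NaN temperatures
    cov = pd.DataFrame({'records': grouped.size(), 'non_missing': grouped.count()})
    cov['share'] = cov['non_missing'] / cov['records']
    return cov

# Hypothetical rows: sparse before the cutoff, complete after it
demo = pd.DataFrame({
    'year': [1850, 1850, 1850, 1950, 1950, 1950],
    'AverageTemperature': [np.nan, np.nan, 5.0, 6.0, 7.0, 8.0],
})
print(coverage_by_period(demo))
```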

```python
import pandas as pd
import plotly.express as px
from dash import Dash, html, dcc, Input, Output

# Read the CSV file and extract the year
df = pd.read_csv('GlobalLandTemperaturesByCountry.csv')
df['dt'] = pd.to_datetime(df['dt'])
df['year'] = df['dt'].dt.year

# Keep only records from 1900 onward
df = df[df['year'] >= 1900]

# New DataFrame with each country's average temperature by year
df_avg = df.groupby(['Country', 'year'])['AverageTemperature'].mean().reset_index()

# Create the app
app = Dash('average_temp_graph_after_1900')

# Slider marks for every year, with a visible label only every 20 years
slider_marks = {}
for year in range(1900, df_avg['year'].max() + 1):
    if year % 20 == 0:  # label every 20th year
        slider_marks[year] = {'label': str(year)}
    else:
        slider_marks[year] = {'label': ''}

# Layout
app.layout = html.Div([
    dcc.Graph(id='average_temp_graph_after_1900'),

    # Year slider, starting at the most recent year
    dcc.Slider(
        id='slider',
        min=1900,  # the first year kept
        max=df_avg['year'].max(),
        value=df_avg['year'].max(),
        marks=slider_marks,
        step=1
    )
])

# Callback: redraw the chart when the slider moves
@app.callback(
    Output('average_temp_graph_after_1900', 'figure'),
    Input('slider', 'value')
)
def update_graph(slider_year):
    # Average temperature of each country in the selected year
    slider_df = df_avg[df_avg['year'] == slider_year]

    # Keep the 10 countries with the highest average temperature
    top_10_countries = slider_df.nlargest(10, 'AverageTemperature')

    # Horizontal bar chart colored by temperature
    fig = px.bar(top_10_countries, y='Country', x='AverageTemperature',
                 color='AverageTemperature',
                 labels={'AverageTemperature': 'Average Temperature (°C)'},
                 title=f'Global Average Temperatures in {slider_year}',
                 orientation='h')

    # Reverse the y-axis so the highest value appears at the top
    fig.update_yaxes(autorange="reversed")

    return fig


# Run the app inline in the notebook (jupyter_mode requires Dash >= 2.11)
app.run(jupyter_mode='inline', port=3210)
```
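The marks loop and the top-10 step can also be written as two small standalone functions, which makes them easy to check outside Dash. A minimal sketch; make_slider_marks and top_n_for_year are hypothetical names, and the dict comprehension builds the same {year: {'label': ...}} structure as the loop in the app. Rows with a missing temperature are dropped explicitly before ranking.

```python
import pandas as pd

def make_slider_marks(start: int, end: int, every: int = 20) -> dict:
    """Mark every year from start to end; only multiples of `every` get a visible label."""
    return {year: {'label': str(year) if year % every == 0 else ''}
            for year in range(start, end + 1)}

def top_n_for_year(df_avg: pd.DataFrame, year: int, n: int = 10) -> pd.DataFrame:
    """Rows for `year` with the n highest AverageTemperature values, highest first."""
    rows = df_avg[df_avg['year'] == year].dropna(subset=['AverageTemperature'])
    return rows.nlargest(n, 'AverageTemperature')

marks = make_slider_marks(1900, 2013)
print(marks[1920], marks[1921])  # {'label': '1920'} {'label': ''}

# Hypothetical data: 12 countries in one year with temperatures 0..11
demo = pd.DataFrame({
    'Country': [f'C{i}' for i in range(12)],
    'year': [2000] * 12,
    'AverageTemperature': [float(i) for i in range(12)],
})
print(top_n_for_year(demo, 2000, n=3)['Country'].tolist())  # ['C11', 'C10', 'C9']
```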