# Global Temperature Anomalies

GitHub repository: https://github.com/alod83/dsw-2024


## Load the dataset

Load the dataset and perform some preliminary cleaning. Then show a raw chart.

In [37]:
import polars as pl
import altair as alt
import pandas as pd

In [2]:
df_pl = pl.read_csv('https://raw.githubusercontent.com/alod83/dsw-2024/refs/heads/main/source/1850-2024.csv')
df_pl.head()

Date,Anomaly
i64,f64
185001,-0.46
185002,-0.21
185003,-0.22
185004,-0.35
185005,-0.29


In [11]:
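df_cleaned = (
    df_pl
    # Split the YYYYMM value into a four-digit year and a two-digit month
    .with_columns(pl.col('Date').cast(str).str.extract(r"(\d{4})").alias("year"),
                  pl.col('Date').cast(str).str.extract(r"(\d{4})(\d{2})", group_index=2).alias("month"),
                  )
    # Rebuild Date as the first day of each month
    .with_columns(pl.datetime(pl.col("year"), pl.col("month"), 1).alias("Date"))
    # Keep only the two columns used by the charts
    [['Date', 'Anomaly']]
)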

df_cleaned.head()

Date,Anomaly
datetime[μs],f64
1850-01-01 00:00:00,-0.46
1850-02-01 00:00:00,-0.21
1850-03-01 00:00:00,-0.22
1850-04-01 00:00:00,-0.35
1850-05-01 00:00:00,-0.29
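
The cleaning step above splits each `YYYYMM` value into a year and a month. The same split can be sketched with integer arithmetic in plain Python. This is only an illustration of the logic, not the notebook's method: the helper name `parse_yyyymm` is hypothetical, and the sample values are taken from the first rows shown above plus one made-up value.

```python
from datetime import date


def parse_yyyymm(value: int) -> date:
    """Turn an integer such as 185001 into date(1850, 1, 1)."""
    year, month = divmod(value, 100)  # 185001 -> (1850, 1)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {value!r}")
    return date(year, month, 1)


# 185001 and 185002 appear in the dataset preview; 202412 is an assumed example
sample = [185001, 185002, 202412]
parsed = [parse_yyyymm(v) for v in sample]
print([d.isoformat() for d in parsed])
```

The explicit month check makes malformed values fail loudly instead of producing a wrong date.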


Compute the decade of each date by rounding the year down to the nearest multiple of 10.

In [27]:
df_cleaned['Date'].dt.year() - df_cleaned['Date'].dt.year() % 10

Date
i32
1850
1850
1850
1850
1850
…
2020
2020
2020
2020
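
The expression in the cell above subtracts the remainder of the year divided by 10, which rounds the year down to the start of its decade. A minimal plain-Python sketch of the same arithmetic follows; the function name `decade_of` is hypothetical and used only for illustration.

```python
def decade_of(year: int) -> int:
    """Round a year down to the first year of its decade."""
    return year - year % 10


for y in (1850, 1977, 2024):
    print(y, "->", decade_of(y))
```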


In [28]:
chart = df_cleaned.plot.line(
    x='Date',
    y='Anomaly'
).properties(
    width=800
)

chart

## Build a story for an audience of professionals

Apply the DIKW (Data, Information, Knowledge, Wisdom) pyramid.

### From Data to Information

In [13]:
chart = df_cleaned.plot.bar(
    x='Date',
    y='Anomaly'
).properties(
    width=900
)

chart

Change the bar colors using a [color scheme](https://vega.github.io/vega/docs/schemes/#reference).

In [18]:
chart = alt.Chart(df_cleaned).mark_bar().encode(
    x='Date',
    y='Anomaly',
    color=alt.Color('Anomaly', scale=alt.Scale(scheme='redblue', reverse=True))
).properties(
    width=800
).interactive()

chart

Group the data by decade and remove the last decade (the 2020s), which is incomplete.

In [30]:
chart = alt.Chart(df_cleaned).mark_bar().encode(
    x='Decade:N',
    y='Anomaly',
    color=alt.Color('Anomaly', scale=alt.Scale(scheme='redblue', reverse=True))
).properties(
    width=800
).transform_filter(
    "year(datum.Date) < 2020"
).transform_calculate(
     Decade = "(year(datum.Date) - year(datum.Date) % 10)"  # Calculate the decade
).transform_aggregate(
    Anomaly='mean(Anomaly)',
    groupby=['Decade']
)

chart
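
To make the chain of transforms easier to follow, here is a rough plain-Python sketch of what filter, calculate and aggregate compute together: drop rows from 2020 onward, assign each row to its decade, then average the anomalies per decade. The function name `decade_means` and the sample rows are made up for illustration; they are not values from the dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def decade_means(rows, last_year_exclusive=2020):
    """Average anomalies per decade, skipping years >= last_year_exclusive."""
    groups = defaultdict(list)
    for d, anomaly in rows:
        if d.year >= last_year_exclusive:  # mirrors transform_filter
            continue
        decade = d.year - d.year % 10      # mirrors transform_calculate
        groups[decade].append(anomaly)
    # mirrors transform_aggregate with groupby=['Decade']
    return {decade: mean(values) for decade, values in sorted(groups.items())}


# Hypothetical sample rows (date, anomaly)
rows = [
    (date(1850, 1, 1), -0.5),
    (date(1850, 2, 1), -0.25),
    (date(1861, 1, 1), 0.25),
    (date(2021, 1, 1), 1.0),  # filtered out: belongs to the incomplete 2020s
]
result = decade_means(rows)
print(result)
```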

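Adjust the axes: remove the x-axis title, keep the decade labels horizontal, and add a descriptive y-axis title.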

In [31]:
chart = chart.encode(
    x=alt.X('Decade:O', axis=alt.Axis(
        title='',
        labelAngle=0,
        labelExpr="datum.value + 's'",  # Add 's' to the end of each decade label
        )
    ),
    y=alt.Y('Anomaly', title='Global Surface Temperature Anomalies (°C)'),
    color=alt.Color('Anomaly', scale=alt.Scale(scheme='redblue', reverse=True))
)

chart

### From Information to Knowledge

Add the mean anomaly value as a label on each bar.

In [33]:
text = chart.mark_text(
    align='center',
    baseline='top',
    dy = alt.expr(alt.expr.if_(alt.datum.Anomaly > 0, -15, 5))
).encode(
    text=alt.Text('mean(Anomaly):Q', format='.2f'),  # Format the anomaly value with 2 decimal places
    
)

# text

chart + text

Adjust the y-axis range.

In [34]:
chart = chart.encode(
   y=alt.Y('Anomaly', 
           title='Global Surface Temperature Anomalies (°C)',
           scale=alt.Scale(domain=[-0.4, 1.5]))
)

chart + text

Add context. What context do the professionals need?

* The gap between the 1850s and the 2010s
* When did temperatures begin to increase?

In [35]:
chart = chart.properties(
    title=alt.TitleParams(
        text='Global Surface Temperature Anomalies',
        subtitle='Between the 1850s and the 2010s, surface temperatures increased by 0.94°C.',    
    )
)

chart + text
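
The subtitle above hard-codes the gap between the two decades. One option is to derive the sentence from the decade means instead, so it stays in sync with the data. The sketch below assumes a dictionary that maps each decade to its mean anomaly; the helper name `gap_subtitle` and the example values are hypothetical and are not the notebook's actual results.

```python
def gap_subtitle(means, start=1850, end=2010):
    """Build the subtitle sentence from per-decade mean anomalies."""
    gap = means[end] - means[start]
    return (
        f"Between the {start}s and the {end}s, "
        f"surface temperatures increased by {gap:.2f}°C."
    )


# Made-up decade means, for illustration only
example_means = {1850: -0.25, 2010: 0.5}
subtitle = gap_subtitle(example_means)
print(subtitle)
```

The wording assumes a positive gap; a negative one would need a different verb.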

Add a reference line at the 1970s bar, with an annotation noting that temperatures slowly started to increase from 1977.

In [38]:
# reference line

rl_df = pd.DataFrame({
    'x'     : [1970],
    'text'  : [['Since 1977 temperatures', 'slowly started to increase.']]
})

rl_df

Unnamed: 0,x,text
0,1970,"[Since 1977 temperatures, slowly started to in..."


In [40]:

rl = alt.Chart(rl_df).mark_rule(
    color='red',
).encode(
    x='x:N'
)

rl

In [45]:

text_rl = rl.mark_text(
    color = 'red',
    baseline='top',
    align='left',
    y=10,
    dx=10
).encode(
    text='text'
)

text_rl

In [46]:

chart + text + rl + text_rl

### From Knowledge to Wisdom

What do we want our audience of professionals to do?

* Trend Analysis and Anomaly Detection
* Correlation with other factors
* Prediction of future trends
* Discussion

In [47]:
pred_df = pd.DataFrame({
    'x'     : ['2050'],
    'y'     : [1.2],
    'text'  :  '?'
})

pred_df

Unnamed: 0,x,y,text
0,2050,1.2,?


In [48]:

pred =  alt.Chart(pred_df
).mark_bar(
    color = 'black'
).encode(
    x = 'x:N',
    y = 'y'
)

pred_text = pred.mark_text(
    color = 'black',
    dy=-15
).encode(
    text = 'text'
)

chart = chart.properties(
    title=alt.TitleParams(
        text='How big will the temperature anomaly be in 2050?',
        subtitle='Between the 1850s and the 2010s, surface temperatures increased by 0.94°C.'
    )
)

final = (chart + text + rl + text_rl + pred + pred_text)
final 

Refine the title by increasing the title and subtitle font sizes.

In [49]:
final.configure_title(
    fontSize = 30,
    subtitleFontSize= 20
)

## Audience of decision-makers

### From Data to Information

In [51]:
chart = alt.Chart(df_cleaned
).mark_line(
    point=True,
    color='black'
).encode(
    x=alt.X('Decade:O', axis=alt.Axis(
        title='',
        labelAngle=0,
        labelExpr="datum.value + 's'",  # Add 's' to the end of each decade label
        )
    ),
    y=alt.Y('Anomaly', title='Global Surface Temperature Anomalies (°C)'),
    
).properties(
    width=700
).transform_filter(
    "year(datum.Date) < 1860 || (year(datum.Date) > 2009 && year(datum.Date) < 2020)"
).transform_calculate(
     Decade = "(year(datum.Date) - year(datum.Date) % 10)"  # Calculate the decade
).transform_aggregate(
    Anomaly='mean(Anomaly)',
    groupby=['Decade']
)

chart
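
The filter expression above keeps only the first and the last complete decade. Its boolean logic can be checked in isolation with a small plain-Python predicate; the name `keep_year` is hypothetical and exists only for this illustration.

```python
def keep_year(year: int) -> bool:
    """True for the 1850s (year < 1860) and for the 2010s (2010..2019)."""
    return year < 1860 or 2009 < year < 2020


for y in (1850, 1859, 1860, 2009, 2010, 2019, 2020):
    print(y, keep_year(y))
```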

### From Information to Knowledge 

In [52]:
text = chart.mark_text(
    align='center',
    baseline='top',
    dy = alt.expr(alt.expr.if_(alt.datum.Anomaly > 0, -15, 5))
).encode(
    text=alt.Text('mean(Anomaly):Q', format='.2f'),  # Format the anomaly value with 2 decimal places
    
)

chart + text

In [53]:
rl_df = pd.DataFrame({
    'x'     : [2010, 2010],
    'y'     : [ -0.11, 0.81]    
})

rl = alt.Chart(rl_df).mark_line(
    color='red',
    strokeDash=[2,2]
    
).encode(
    x='x:N',
    y='y'
)



chart + text + rl 

In [54]:
ban_df = pd.DataFrame(
    {
        'text' : [0.94],
        'x' : [2010],
        'y' : [0.4]
    }
)
ban_text = alt.Chart(ban_df
).mark_text(
    color = 'red',
    baseline='top',
    align='left',
    dx = 10,
    size = 30
).encode(
    text='text',
    x = 'x:N',
    y = 'y'
)

chart + text + rl + ban_text

In [55]:
chart = chart.properties(
    title=alt.TitleParams(
        text='What can we do to reduce the temperature gap?',
        subtitle=['The term temperature anomaly means a departure from a reference value or long-term average.', 
                  'A positive anomaly indicates that the observed temperature was warmer than the reference value,' ,
                  'while a negative anomaly indicates that the observed temperature was cooler than the reference value.']
    )
)

chart = chart + text + rl + ban_text

In [56]:
hrl_df = pd.DataFrame({
    'y'     : [0],
})

hrl = alt.Chart(hrl_df).mark_rule(
    color='grey',
).encode(
    y='y'
)

# chart already contains text, rl and ban_text from the previous cell, so add only the zero line
chart = chart + hrl
chart

### From Knowledge to Wisdom

What do we want our audience of decision-makers to do?

1. Develop and Implement Environmental Policies
2. Plan and Finance Mitigation Initiatives
3. Promote International Collaboration
4. Implement Education and Awareness Programs
5. Integrate Sustainability into Business Decisions
6. Monitor and Evaluate the Effectiveness of Actions
7. Adaptation and Future Planning
8. Encourage Community Participation

In [57]:
# Next Steps

width = 10
space = 5
N = 3

x = [i*(width+space) for i in range(N)]
y = [0 for i in range(N)]
x2 = [(i+1)*width+i*space for i in range(N)]
y2 = [10 for i in range(N)]
text_ns = ['Online Campaign', 'Influencers Engagement', 'Social Media Promotion']

df_rect = pd.DataFrame(
    {   'x': x,
        'y': y,
        'x2': x2,
        'y2': y2,
        'text' : text_ns
    }
)

rect = alt.Chart(df_rect).mark_rect(
    color='lightgrey',
    opacity=0.2
).encode(
    x=alt.X('x:Q', axis=None),
    y=alt.Y('y:Q', axis=None),
    x2='x2:Q',
    y2='y2:Q'
).properties(
    width=700,
    height=100,
    title=alt.TitleParams(
        text=['What can we do next?'],
        fontSize=20,
        offset=10
    )
)

ns_text = alt.Chart(df_rect).mark_text(
    fontSize=14,
    align='left',
    dx=10,
).encode(
    text='text:N',
    x=alt.X('x:Q', axis=None),
    y=alt.Y('y_half:Q', axis=None),
).transform_calculate(
    y_half='datum.y2/2'
)

# add lines connecting the rectangles
#x = [10,25]
x = [width*i+space*(i-1) for i in range(1,N)]
y = [5 for i in range(N-1)]
y2 = [5 for i in range(N-1)]
#x2 = [15,30]
x2 = [(width+space)*i for i in range(1,N)]

df_line = pd.DataFrame(
    {   'x': x,
        'y': y,
        'x2': x2,
        'y2': y2
    }
)

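# mark_rule draws one segment per row from (x, y) to (x2, y2);
# mark_line ignores x2/y2 and would join the rows into a single line
line = alt.Chart(df_line).mark_rule(
    strokeWidth=2
).encode(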
    x=alt.X('x:Q', axis=None),
    y=alt.Y('y:Q', axis=None),
    x2='x2:Q',
    y2='y2:Q'
)

ns = rect + line + ns_text
ns
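
The coordinate arithmetic in the cell above can be summarized as: each box starts one step (`width + space`) after the previous one, and each connector spans the gap between two neighboring boxes. A compact plain-Python sketch of that layout follows, assuming the same `width`, `space` and `N` values; the function name `layout` is hypothetical.

```python
def layout(n, width, space):
    """Return (boxes, connectors) as lists of (x_start, x_end) pairs."""
    step = width + space
    # Box i spans from i*step to i*step + width
    boxes = [(i * step, i * step + width) for i in range(n)]
    # Connector i runs from the right edge of box i to the left edge of box i+1
    connectors = [(boxes[i][1], boxes[i + 1][0]) for i in range(n - 1)]
    return boxes, connectors


boxes, connectors = layout(3, 10, 5)
print(boxes)
print(connectors)
```

For `N = 3`, `width = 10` and `space = 5`, the connectors match the values in the commented-out lines of the cell (`x = [10, 25]`, `x2 = [15, 30]`).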

In [58]:
final = alt.vconcat(chart, ns)
final

In [59]:
final.configure_title(
    fontSize = 30,
    subtitleFontSize= 20
).configure_axis(
    grid = False
)