### Import Dataset

In [16]:
import pandas as pd
import numpy as np
import plotly.express as px
predict_outcome = pd.read_csv('../data/predict_outcome.csv')
nearterm_cleaned = pd.read_csv('../data/nearterm_cleaned.csv')


In [17]:
predict_outcome.head()

Unnamed: 0,long,lat,year,TimePeriod,RCP,scenario,DrySoilDays_Summer_whole,Evap_Summer,ExtremeShortTermDryStress_Summer_whole,FrostDays_Winter,...,PPT_Winter,PPT_Summer,T_Winter,T_Summer,Tmax_Summer,Tmin_Winter,VWC_Winter_whole,VWC_Spring_whole,VWC_Summer_whole,VWC_Fall_whole
0,-110.047,37.604,2021,NT,4.5,sc22,0.0,2.14072,35.247,69.0,...,2.67,4.48,2.131,24.55,35.95,-12.15,0.04704,0.0439,0.04214,0.09343
1,-110.047,37.604,2021,NT,4.5,sc23,0.0,2.14072,35.247,69.0,...,2.67,4.48,2.131,24.55,35.95,-12.15,0.04704,0.0439,0.04214,0.09343
2,-110.047,37.604,2021,NT,4.5,sc24,0.0,2.14072,35.247,69.0,...,2.67,4.48,2.131,24.55,35.95,-12.15,0.04704,0.0439,0.04214,0.09343
3,-110.047,37.604,2021,NT,4.5,sc25,0.0,2.14072,35.247,69.0,...,2.67,4.48,2.131,24.55,35.95,-12.15,0.04704,0.0439,0.04214,0.09343
4,-110.047,37.604,2021,NT,4.5,sc26,0.0,2.14072,35.247,69.0,...,2.67,4.48,2.131,24.55,35.95,-12.15,0.04704,0.0439,0.04214,0.09343


### Q1: Which Scenario Best Fits the Prediction?

In [18]:
common_columns = ['long', 'lat', 'year', 'TimePeriod', 'RCP', 'scenario']
merged_data = pd.merge(predict_outcome, nearterm_cleaned, on=common_columns, suffixes=('_pred', '_actual'))

#cleaned_merged_data = merged_data.dropna()

prediction_columns = ['DrySoilDays_Summer_whole', 'Evap_Summer', 'ExtremeShortTermDryStress_Summer_whole', 
                      'FrostDays_Winter', 'PPT_Winter', 'PPT_Summer', 'T_Winter', 'T_Summer', 
                      'Tmax_Summer', 'Tmin_Winter', 'VWC_Winter_whole', 'VWC_Spring_whole', 
                      'VWC_Summer_whole', 'VWC_Fall_whole']

errors_by_scenario = {}
for column in prediction_columns:
    pred_column = column + '_pred'
    actual_column = column + '_actual'
    merged_data['error'] = (merged_data[actual_column] - merged_data[pred_column]) ** 2
    rmse_by_scenario = merged_data.groupby('scenario')['error'].mean().apply(np.sqrt)
    errors_by_scenario[column] = rmse_by_scenario
combined_errors = pd.DataFrame(errors_by_scenario)

combined_errors['mean_error'] = combined_errors.mean(axis=1)
lowest_error_scenario = combined_errors['mean_error'].idxmin()


In [19]:
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'iframe_connected'

fig = px.bar(combined_errors.reset_index(), x='scenario', y='mean_error', title='Mean RMSE by Scenario',
             labels={'mean_error': 'Mean RMSE', 'scenario': 'Scenario'}, text='mean_error')

fig.add_vline(x=combined_errors.index.get_loc(lowest_error_scenario), line_width=3, line_dash="dash", line_color="green")

fig.show()


Comparing the mean RMSE across scenarios shows that sc22 (Scenario 22) has the lowest value, that is, the best predictive accuracy among the scenarios analysed. This suggests that sc22 data will align most closely with the true values when we use them with our Random Forest model in future work.
Scenario sc22 corresponds to Representative Concentration Pathway 4.5 (RCP45) and uses the Beijing Climate Center Climate System Model version 1.1 (bcc-csm1-1), which represents a moderate climate-change trajectory.<br>

For any further forecasting experiments, we will therefore use the sc22 forecasts, since they gave the lowest error in this comparison.<br>
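
The columns averaged into `mean_error` sit on very different scales (in the preview above, `FrostDays_Winter` is 69.0 while `VWC_Summer_whole` is about 0.04), so an unweighted mean of RMSE values may be dominated by the large-scale columns. The sketch below shows one possible alternative, not the method used in this notebook: each column's RMSE is divided by the standard deviation of its actual values before averaging. The helper `normalized_rmse_by_scenario` and the toy data are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd


def normalized_rmse_by_scenario(merged, columns, group_col="scenario"):
    """Per-scenario RMSE for each column, divided by the standard deviation
    of that column's actual values (hypothetical helper)."""
    result = {}
    for column in columns:
        squared_error = (merged[f"{column}_actual"] - merged[f"{column}_pred"]) ** 2
        rmse = np.sqrt(squared_error.groupby(merged[group_col]).mean())
        scale = merged[f"{column}_actual"].std()
        result[column] = rmse / scale
    return pd.DataFrame(result)


# Tiny synthetic stand-in for `merged_data`; all values are made up.
toy = pd.DataFrame({
    "scenario": ["sc22", "sc22", "sc23", "sc23"],
    "FrostDays_Winter_actual": [60.0, 80.0, 60.0, 80.0],
    "FrostDays_Winter_pred": [62.0, 78.0, 70.0, 70.0],
    "VWC_Summer_whole_actual": [0.04, 0.06, 0.04, 0.06],
    "VWC_Summer_whole_pred": [0.05, 0.05, 0.041, 0.059],
})

nrmse = normalized_rmse_by_scenario(toy, ["FrostDays_Winter", "VWC_Summer_whole"])
nrmse["mean_nrmse"] = nrmse.mean(axis=1)
best_scenario = nrmse["mean_nrmse"].idxmin()
print(nrmse)
print("Lowest normalized error:", best_scenario)
```

In this toy data the raw mean RMSE would favour sc22, because the frost-day error dominates, while the normalized mean favours sc23. Whether the ranking changes on the real data would have to be checked.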

The next step in our study is to gauge how temperature will change over the next 50 years, using Scenario 22 to forecast trends in Tmax_Summer and Tmin_Winter. Both are useful for building future climate models and for informing strategic decisions.

### Q2: How will temperature change over the next 50 years based on Scenario 22?

In [20]:
historic_data = pd.read_csv('../data/historic_cleaned.csv')
nearterm_data = nearterm_cleaned[nearterm_cleaned['scenario'] == 'sc22']

combined_data = pd.concat([historic_data, nearterm_data], ignore_index=True)

In [21]:
from sklearn.linear_model import LinearRegression
historical_data = combined_data[combined_data['scenario'] == 'sc1']

models = {}
predictions = []

locations = historical_data[['long', 'lat']].drop_duplicates()

for _, location in locations.iterrows():
    long = location['long']
    lat = location['lat']
    
    location_data = historical_data[(historical_data['long'] == long) & (historical_data['lat'] == lat)]
    
    X = location_data[['year']].values
    y_winter = location_data['T_Winter'].values
    y_summer = location_data['T_Summer'].values
    
    model_winter = LinearRegression().fit(X, y_winter)
    model_summer = LinearRegression().fit(X, y_summer)
    
    models[(long, lat)] = {
        'winter': model_winter,
        'summer': model_summer
    }
    future_years = np.arange(2025, 2025 + 50).reshape(-1, 1)  # 2025 through 2074 inclusive
    predicted_winter = model_winter.predict(future_years)
    predicted_summer = model_summer.predict(future_years)
    
    for year, winter_temp, summer_temp in zip(future_years.flatten(), predicted_winter, predicted_summer):
        predictions.append({
            'long': long,
            'lat': lat,
            'year': year,
            'T_Winter': winter_temp,
            'T_Summer': summer_temp
        })

predictions_df = pd.DataFrame(predictions)


In [22]:
pio.renderers.default = 'iframe_connected'
fig = px.scatter_geo(
    predictions_df,
    lon='long',
    lat='lat',
    color='T_Summer',
    animation_frame='year',
    projection="natural earth",
    title='Yearly Predicted Summer Temperatures (2025-2074)',
    range_color=[22,24],
    color_continuous_scale=px.colors.diverging.RdBu_r
)

fig.update_geos(
    showcountries=True,
    showcoastlines=True,
    showland=True,
    landcolor="lightgray",
    fitbounds="locations",
    lonaxis=dict(
        showgrid=True,
        gridwidth=1,
        range=[-180, 180],
        dtick=10
    ),
    lataxis=dict(
        showgrid=True,
        gridwidth=1,
        range=[-90, 90],
        dtick=10
    )
)
fig.update_layout(template='simple_white',geo=dict(landcolor='white'))
fig.show()


In the predicted data, temperature in this region rises year by year from 2025 to 2074, and the west is warmer than the east. The next step is to investigate why temperature differs so much across the area. Candidate factors include variation in plant cover, urbanisation, local topography, and regional climatic influences. Identifying the main factors behind the variation would help guide actions to minimise potential climate impacts.
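
The year-by-year rise can also be checked numerically instead of being read off the animation. The sketch below fits a least-squares line per location with `np.polyfit` and reports the slope per decade. The helper `trend_per_decade` and the synthetic data are hypothetical; on the real data it could be applied to the historical rows or to `predictions_df`, assuming the same `long`, `lat`, and `year` columns.

```python
import numpy as np
import pandas as pd


def trend_per_decade(df, value_col, year_col="year"):
    """Least-squares slope of value_col against year for each (long, lat),
    expressed per decade (hypothetical helper)."""
    rows = []
    for (lon, lat), group in df.groupby(["long", "lat"]):
        slope = np.polyfit(group[year_col].to_numpy(),
                           group[value_col].to_numpy(), deg=1)[0]
        rows.append({"long": lon, "lat": lat, "trend_per_decade": slope * 10})
    return pd.DataFrame(rows)


# Synthetic data: two locations with exactly linear warming (made-up values).
years = np.arange(1980, 2020)
west = pd.DataFrame({"long": -110.5, "lat": 37.6, "year": years,
                     "T_Summer": 24.0 + 0.03 * (years - 1980)})
east = pd.DataFrame({"long": -109.5, "lat": 37.6, "year": years,
                     "T_Summer": 22.0 + 0.02 * (years - 1980)})
toy = pd.concat([west, east], ignore_index=True)

trends = trend_per_decade(toy, "T_Summer")
print(trends)
print("All locations warming:", bool((trends["trend_per_decade"] > 0).all()))
```

A positive slope at every location would support the statement that temperature is rising across the region, and comparing slopes between locations shows where the fitted warming is fastest.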

### Q3: Why does the eastern part have a lower temperature than the western part?

In [23]:
pio.renderers.default = 'iframe_connected'
historic_data.head()
scatter_hist = px.scatter(
    historic_data,
    x='lat',
    y='Bare_percent',
    hover_name='lat',
    marginal_x='histogram',
    marginal_y='histogram',
    title='Scatter Plot with Marginal Histograms: Bare Ground Coverage Percent with Latitude',
    labels={'lat': 'Latitude', 'Bare_percent': 'Bare Ground Coverage Percent (%)'},
    trendline="ols",
    template='simple_white'
)
scatter_hist.show()

The scatter plot suggests that the eastern part of the region has less bare ground than the western part, so greater plant cover appears to be associated with lower temperature. This underlines the potential importance of vegetation in limiting temperature increase. A natural follow-up is to find out which types of plant cover are most strongly related to temperature. Analysing how different vegetation types relate to temperature variation makes it possible to judge which ones best regulate the local climate and to design specific strategies for environmental management and conservation.
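
An east-west contrast is a statement about longitude, whereas the plot above puts latitude on the x-axis. As a complementary check, the sketch below splits locations at the median longitude and compares mean bare-ground cover and summer temperature in the two halves. The helper `compare_west_east` and the toy values are hypothetical; the column names `long`, `Bare_percent`, and `T_Summer` are taken from the cells above.

```python
import numpy as np
import pandas as pd


def compare_west_east(df, columns, lon_col="long"):
    """Mean of each column west and east of the median longitude
    (hypothetical helper; a more negative longitude is further west)."""
    median_lon = df[lon_col].median()
    half = np.where(df[lon_col] < median_lon, "west", "east")
    return df.groupby(half)[columns].mean()


# Made-up values standing in for `historic_data`.
toy = pd.DataFrame({
    "long": [-110.4, -110.2, -109.8, -109.6],
    "Bare_percent": [40.0, 36.0, 24.0, 20.0],
    "T_Summer": [24.5, 24.1, 23.0, 22.6],
})

summary = compare_west_east(toy, ["Bare_percent", "T_Summer"])
print(summary)
```

If the western half shows both more bare ground and higher summer temperature on the real data, that would be consistent with the interpretation given here, although it would not by itself establish a causal link.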

### Q4: What kind of plant helps prevent global warming most efficiently?

In [24]:
import plotly.figure_factory as ff
pio.renderers.default = 'iframe_connected'
correlation_data = historical_data[['T_Summer', 'T_Winter', 'treecanopy_percent', 'Ann_Herb_percent', 
                                    'Herb_percent', 'Litter_percent', 'Shrub_percent']]

correlation_matrix = correlation_data.corr()

fig = ff.create_annotated_heatmap(
    z=correlation_matrix.values,
    x=list(correlation_matrix.columns),
    y=list(correlation_matrix.index),
    annotation_text=correlation_matrix.round(2).values,
    colorscale='RdBu',  
    showscale=True
)

fig.update_layout(
    title='Correlation Matrix of Climate Factors and Vegetation Types',
    xaxis_title='Variables',
    yaxis_title='Variables',
    xaxis=dict(tickmode='array', tickvals=list(range(len(correlation_matrix.columns))), ticktext=list(correlation_matrix.columns)),
    yaxis=dict(tickmode='array', tickvals=list(range(len(correlation_matrix.index))), ticktext=list(correlation_matrix.index))
)

Plant litter has a strong negative correlation with both T_Summer and T_Winter. In other words, more litter is associated with lower temperature, which suggests that plant litter plays a central role in temperature regulation. There is also a strong positive correlation between herb coverage and litter coverage, meaning that areas with more herb cover tend to have more litter cover as well.

Plant litter therefore seems to be the main variable associated with lower temperature, while herb coverage appears to influence temperature indirectly through its strong association with litter coverage. Maintaining and managing land to enhance litter and herb coverage could thus help counter rising temperatures while supporting environmental stability.
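
The idea that herb cover relates to temperature only through litter can be probed with a partial correlation: the correlation between herb cover and temperature after the linear effect of litter has been removed from both. The sketch below is a minimal illustration on synthetic data in which herb cover has no effect of its own. The helper `partial_corr` is hypothetical, and a partial correlation still describes association, not causation.

```python
import numpy as np
import pandas as pd


def partial_corr(df, x, y, control):
    """Correlation between x and y after regressing the control column
    out of both with ordinary least squares (hypothetical helper)."""
    design = np.column_stack([np.ones(len(df)), df[control].to_numpy()])

    def residual(column):
        values = df[column].to_numpy()
        coef, *_ = np.linalg.lstsq(design, values, rcond=None)
        return values - design @ coef

    return float(np.corrcoef(residual(x), residual(y))[0, 1])


# Synthetic data: temperature depends on litter only; herb cover tracks litter.
rng = np.random.default_rng(0)
litter = rng.uniform(5, 40, size=500)
toy = pd.DataFrame({
    "Litter_percent": litter,
    "Herb_percent": 0.8 * litter + rng.normal(0, 2, size=500),
    "T_Summer": 26 - 0.1 * litter + rng.normal(0, 0.3, size=500),
})

raw = toy["Herb_percent"].corr(toy["T_Summer"])
partial = partial_corr(toy, "Herb_percent", "T_Summer", "Litter_percent")
print(f"raw correlation:     {raw:.2f}")
print(f"partial correlation: {partial:.2f}")
```

On data built this way the raw correlation is strongly negative while the partial correlation is close to zero. A similar pattern on `correlation_data` would support the indirect interpretation; a partial correlation that stays clearly negative would suggest herb cover is related to temperature beyond its link with litter.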

### Conclusion

Using data from Scenario 22 (sc22) to examine temperature trends from 2025 to 2074 gives useful information about the factors behind temperature change. We first saw a steady rise in temperature across the area, with the western part consistently warmer than the eastern part. We hypothesised that these differences might be caused by differences in the amount of plant cover.

Further analysis indicated that there is less bare ground in the eastern part of the area than in the western part. This highlights how important vegetation may be for moderating temperature. We therefore set out to find the specific vegetation types that have the strongest relationship with temperature.

The correlation analysis of vegetation types and temperature found that plant litter has the strongest association with lower temperature. Litter cover is negatively correlated with both T_Summer and T_Winter: more litter goes with cooler temperatures. This suggests that plant litter is important for keeping temperatures in check.

The correlation matrix also showed a strong positive link between herb cover and litter cover. Areas with more herbaceous plants tend to have more fallen leaves and other organic matter, which gives herb cover a secondary effect on temperature. The link between herb and litter coverage shows how different vegetation types can work together to keep temperatures stable and cool.

To sum up, the study indicates that plant litter is strongly associated with lower temperatures, and that herb cover helps as well because it is closely related to litter. These results suggest that protecting and increasing plant litter and herb coverage are promising ways to slow rising temperatures and keep the environment stable. Maintaining diverse vegetation helps the environment stay healthy and resilient in the face of climate change.

The mean RMSE values for the different scenarios show that Scenario 22 (sc22) has the lowest RMSE, which means it is more accurate than any other scenario analysed. This indicates that the data from Scenario 22 are the most accurate when used with our Random Forest model. Scenario 22 uses the Beijing Climate Center Climate System Model version 1.1 (bcc-csm1-1) and is linked to Representative Concentration Pathway 4.5 (RCP45), a moderate climate-change path. Using Scenario 22's predictions in further research should therefore give the most accurate projections available from this comparison. The next important step is to predict how temperatures will change over the next 50 years, using Scenario 22 to project Tmax_Summer and Tmin_Winter. This will give useful detail about how the climate may change, which will help with planning and decision-making.