<a href="https://www.kaggle.com/code/mikedelong/let-s-fit-polynomials?scriptVersionId=152289822" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
import pandas as pd
df = pd.read_csv(filepath_or_buffer='/kaggle/input/average-temperature-from-1900-to-2023/Average Temperature 1900-2023.csv')
df.head()

Unnamed: 0  Year  Average_Fahrenheit_Temperature
0           1900  53.9
1           1901  53.5
2           1902  52.1
3           1903  50.6
4           1904  51.8
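The `Unnamed: 0` column above looks like a row index that was saved into the CSV along with the data. This is an assumption based on the header: pandas gives a column that name when its header cell is empty. A minimal sketch, using the five rows shown above as an inline stand-in for the file, of how `index_col=0` folds that column into the DataFrame index:

```python
import io

import pandas as pd

# Inline stand-in for the Kaggle CSV: the five rows shown above, with an empty
# first header cell (assumed to be how the real file is laid out).
csv_text = """,Year,Average_Fahrenheit_Temperature
0,1900,53.9
1,1901,53.5
2,1902,52.1
3,1903,50.6
4,1904,51.8
"""

# Without index_col, the saved index comes back as a column named "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# With index_col=0, the first column becomes the index and only data columns remain.
df_clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_clean.columns))
```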


In [2]:
from plotly.express import line
line(data_frame=df, x='Year', y='Average_Fahrenheit_Temperature')

If we squint, we can sort of see two regimes: one (1900-1960) that is flat or perhaps slightly declining, and another (1960-present) that is rising.
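One way to put numbers on the two regimes is a continuous piecewise-linear ("hinge") regression with a fixed breakpoint. This is a swapped-in technique, not something the notebook does; the 1960 breakpoint is simply the eyeballed value from the sentence above, and `hinge_fit` is a hypothetical helper. The sketch uses only NumPy least squares on the features `1`, `year - breakpoint`, and `max(0, year - breakpoint)`, so the slope before the break is the second coefficient and the slope after it is the sum of the second and third.

```python
import numpy as np


def hinge_fit(years, temps, breakpoint):
    """Fit temp = a + b*t + c*max(0, t), with t = year - breakpoint.

    Hypothetical helper for illustration.
    Returns (slope_before, slope_after, residual_sum_of_squares).
    """
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps, dtype=float)
    # Centering at the breakpoint keeps the design matrix well conditioned.
    t = years - breakpoint
    X = np.column_stack([np.ones_like(t), t, np.maximum(t, 0.0)])
    coef, *_ = np.linalg.lstsq(X, temps, rcond=None)
    resid = temps - X @ coef
    slope_before = coef[1]
    slope_after = coef[1] + coef[2]
    return slope_before, slope_after, float(resid @ resid)


# Synthetic usage (NOT the Kaggle data): flat until 1960, then +0.03 degrees per year.
years = np.arange(1900, 2024)
temps = 52.0 + 0.03 * np.maximum(years - 1960, 0)
before, after, rss = hinge_fit(years, temps, breakpoint=1960)
print(f"slope before: {before:.4f}, slope after: {after:.4f}")
```

On the notebook's data the call would be `hinge_fit(df['Year'], df['Average_Fahrenheit_Temperature'], breakpoint=1960)`. Moving the breakpoint changes both slopes, so the numbers are only as meaningful as the chosen year.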

In [3]:
from plotly.express import scatter
scatter(data_frame=df, x='Year', y='Average_Fahrenheit_Temperature', trendline='ols')

Of course, a single OLS line can't capture two regimes: it will understate the slope in the rising regime and overstate it in the flat regime.
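The claim that one straight line splits the difference between the regimes is easy to check on synthetic data. The sketch below (made-up numbers, not the Kaggle temperatures) builds a noiseless series that is flat until 1960 and then rises at 0.03 degrees per year, fits a single line with `np.polyfit`, and compares its slope with the two regime slopes.

```python
import numpy as np

# Synthetic two-regime series: slope 0 before 1960, slope 0.03 afterwards.
years = np.arange(1900, 2024)
flat_slope, rising_slope = 0.0, 0.03
temps = 52.0 + rising_slope * np.maximum(years - 1960, 0)

# Fit one straight line to the whole series (degree-1 least squares).
ols_slope, ols_intercept = np.polyfit(years, temps, deg=1)

# The single slope lands between the regime slopes: too steep for the flat
# regime and too shallow for the rising one.
print(f"regime slopes: {flat_slope}, {rising_slope}; single OLS slope: {ols_slope:.4f}")
```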

In [4]:
scatter(data_frame=df, x='Year', y='Average_Fahrenheit_Temperature', trendline='lowess')

Our LOWESS trendline matches what our eyes told us above, but it puts the turning point (where the curve bottoms out and starts to rise) at about 1952 rather than around 1960.
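LOWESS produces a smooth curve rather than an explicit breakpoint. As a complementary and different estimate, one could scan candidate breakpoints and keep the one whose continuous piecewise-linear fit has the smallest residual sum of squares. This is a swapped-in technique, not what plotly's `trendline='lowess'` computes, and `best_breakpoint` is a hypothetical helper. On real, noisy data the error curve can be fairly flat, so the winning year is only a rough estimate.

```python
import numpy as np


def best_breakpoint(years, temps, candidates):
    """Return (year, rss) for the candidate breakpoint with the smallest hinge-fit RSS.

    Hypothetical helper: for each candidate c, fit
    temp = a + b*(year - c) + d*max(0, year - c) by least squares.
    """
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps, dtype=float)
    best_c, best_rss = None, np.inf
    for c in candidates:
        t = years - c
        X = np.column_stack([np.ones_like(t), t, np.maximum(t, 0.0)])
        coef, *_ = np.linalg.lstsq(X, temps, rcond=None)
        resid = temps - X @ coef
        rss = float(resid @ resid)
        if rss < best_rss:
            best_c, best_rss = c, rss
    return best_c, best_rss


# Synthetic check (not the Kaggle data): slight decline until 1952, then a rise.
years = np.arange(1900, 2024)
temps = 52.0 + np.where(years < 1952, -0.005 * (years - 1952), 0.03 * (years - 1952))
best_year, best_rss = best_breakpoint(years, temps, range(1920, 2001))
print(best_year, best_rss)
```

On the notebook's data the call would be `best_breakpoint(df['Year'], df['Average_Fahrenheit_Temperature'], range(1920, 2001))`. There is no guarantee its answer agrees with the LOWESS estimate, since the two methods define the turning point differently.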

In [5]:
# adapted from 
# https://www.kaggle.com/code/mattop/visualizing-polynomial-features-w-plotly
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from plotly.graph_objects import Scatter
from sklearn.metrics import r2_score


# evenly spaced grid of years, used only for drawing the fitted curves
x_range = np.linspace(df['Year'].min(), df['Year'].max(), len(df)).reshape(-1, 1)
X_years = df['Year'].to_numpy().reshape(-1, 1)  # observed years as a column vector
y = df['Average_Fahrenheit_Temperature'].to_numpy()

fig = scatter(df, x='Year', y='Average_Fahrenheit_Temperature')

for degree in [1, 2, 3]:
    # include_bias=False: LinearRegression already fits its own intercept
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_poly = poly.fit_transform(X_years)
    x_range_poly = poly.transform(x_range)

    model = LinearRegression()
    model.fit(X_poly, y)
    # score the fit at the observed years, not at the plotting grid
    r2 = r2_score(y_true=y, y_pred=model.predict(X_poly))
    print('degree: {} r2 score: {:.2f}'.format(degree, r2))

    # draw the fitted curve over the plotting grid
    y_poly = model.predict(x_range_poly)
    fig.add_traces(Scatter(x=x_range.squeeze(), y=y_poly, name='deg {}'.format(degree)))


fig.update_coloraxes(showscale = False)
fig.update_layout(template = 'plotly', font = dict(family = 'PT Sans', size = 12))
fig.update_traces(marker = dict(size = 8, line = dict(width = 0.75, color = '#FFFFFF')))
fig.show()

degree: 1 r2 score: 0.21
degree: 2 r2 score: 0.33
degree: 3 r2 score: 0.35
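Raising raw calendar years to the third power produces feature values around 6.9 × 10^9, which can make the least-squares problem poorly conditioned. A minimal alternative sketch uses NumPy's `np.polynomial.Polynomial.fit`, which rescales x onto [-1, 1] before fitting, and scores each degree at the observed years. The function name `r2_by_degree` is hypothetical, and the demo data are synthetic; with the notebook's DataFrame one would pass `df['Year']` and `df['Average_Fahrenheit_Temperature']`. Because each degree's model contains the previous one, in-sample R² can only stay the same or rise as the degree increases, so the step from 0.33 to 0.35 says little on its own.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r2_by_degree(x, y, degrees=(1, 2, 3)):
    """Return {degree: in-sample R^2} for least-squares polynomial fits.

    Polynomial.fit maps x onto [-1, 1] internally, which avoids the huge raw
    powers of calendar years (1900**3 is about 6.9e9).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)
    scores = {}
    for deg in degrees:
        p = Polynomial.fit(x, y, deg)
        # Evaluate at the observed x values, not at a separate plotting grid.
        ss_res = np.sum((y - p(x)) ** 2)
        scores[deg] = 1.0 - ss_res / ss_tot
    return scores


# Synthetic usage (not the Kaggle data): a noisy upward-curving series.
rng = np.random.default_rng(0)
years = np.arange(1900, 2024)
temps = 52.0 + 0.0003 * (years - 1950) ** 2 + rng.normal(0.0, 1.0, years.size)
for deg, score in r2_by_degree(years, temps).items():
    print(f"degree: {deg} r2 score: {score:.2f}")
```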


Our R² values are never high, but a single smooth curve fit to the whole series was never going to be very accurate: there is simply too much random variation from one year to the next.
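The point about year-to-year variation can be made quantitative under a simple assumed model: if temperature = trend + independent noise, then even the true trend would be expected to reach an R² of only about var(trend) / (var(trend) + var(noise)). The sketch below checks that ceiling on synthetic data; the linear trend, the noise level, and the helper name `r2` are illustrative assumptions, not estimates from this dataset.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(42)
n = 200_000
x = np.linspace(0.0, 1.0, n)
trend = 2.0 * x   # assumed smooth signal
noise_sd = 1.0    # assumed year-to-year noise level
y = trend + rng.normal(0.0, noise_sd, n)

# Even the true trend cannot explain the noise share of the variance.
ceiling = trend.var() / (trend.var() + noise_sd ** 2)
print(f"R^2 of the true trend: {r2(y, trend):.3f}, expected ceiling: {ceiling:.3f}")
```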