## OOP & simple models for ice break-up
An effective way to build and use a Python model is to follow an object-oriented programming (OOP) approach. In our case, we will use the class `IceModel` and construct different models by creating instances (objects) of that class.

#### Preprocessing
In part 1 of this interactive textbook you familiarized yourself with the **Nenana Ice Classic**, were introduced to a `DataFrame` containing environmental variables that may help to predict the break-up, and learned some basic preprocessing techniques.

For the moment, let us consider only the data on past break-up dates, which is stored in `'../data/BreakUpTimes.csv'`.

In [33]:
import pprint
import pandas as pd
import funciones
# load the historical break-up times
ice_data = pd.read_csv('../../data/BreakUpTimes.csv')

#### Basic Usage
The first step in using our simple OOP ice break-up model is to create an instance of the class.

In [34]:
Model_1=IceModel(ice_data)

The class has simple methods that can be used to predict the ice break-up. The most important are:
- `.polyfit()`: fits a polynomial equation to the data
- `.dist_fit()`: fits a distribution to the data
- `.predict()`: uses the fits to predict a variable
- `.get_prediction()`: gets the predicted date and time

#### Prediction using `.polyfit()`
The class is designed so that the date and the time of break-up are predicted independently of each other.

Let's start by using `.polyfit()` to predict the **day** of break-up for the year 2025 using a simple linear regression.


In [35]:
date_fit=Model_1.polyfit('year','day_of_year')               

 
-0.07984 x + 281.3


The result of the linear regression is
$$\text{dayofyear} = -0.07984 \times \text{year} + 281.3$$

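This means that the break-up date appears to be getting earlier by about 0.08 days per year, or roughly 8 days per century. What could be the cause of this? Keep in mind that a straight line through the break-up dates is a very simple and reductionist description of the process.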


The output of `.polyfit()` is a dictionary that contains multiple keys with details about the *fit*.

In [36]:
pprint.pprint(date_fit)

{'(x,y)=': ['year', 'day_of_year'],
 'Poly fit coefficients': poly1d([-7.98409060e-02,  2.81267893e+02]),
 'gofs metrics': {'2th norm': 61.703,
                  'R2': 0.1378,
                  'RMSE': 5.9651,
                  'normalized RMSE': 0.1612,
                  'r2': 0.146}}
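
The exact definitions behind these keys are not shown in this section. As a rough guide, the sketch below computes the usual versions of the residual 2-norm, RMSE, range-normalized RMSE, and $R^2$. The data are made up and the helper name `gof_metrics` is hypothetical; it is not part of `IceModel`.

```python
import math

def gof_metrics(y_obs, y_fit):
    """Common goodness-of-fit metrics (assumed definitions, not IceModel's)."""
    residuals = [o - f for o, f in zip(y_obs, y_fit)]
    ss_res = sum(r * r for r in residuals)            # sum of squared residuals
    mean_obs = sum(y_obs) / len(y_obs)
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)  # total sum of squares
    rmse = math.sqrt(ss_res / len(y_obs))
    return {
        "l2_norm": math.sqrt(ss_res),
        "RMSE": rmse,
        "normalized RMSE": rmse / (max(y_obs) - min(y_obs)),
        "R2": 1.0 - ss_res / ss_tot,
    }

# Made-up observations and fitted values, for illustration only
y_obs = [120.0, 125.0, 118.0, 130.0]
y_fit = [121.0, 123.0, 120.0, 128.0]
metrics = gof_metrics(y_obs, y_fit)
print(metrics)
```

With these definitions the 2-norm equals RMSE × √n. The printed values (61.703 and 5.9651) satisfy that relation for about 107 observations, which suggests, but does not prove, that the class uses similar definitions.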


The details of the *fit* are stored in the attribute `.fit_day_of_year`, which is then used to predict the break-up **date**.

In [37]:
fitted_date=Model_1.predict('day_of_year', 2025)                            
print(fitted_date)

{'(x,y)': ['year', 'day_of_year'], 'x_hat': 2025, 'y_hat': 119.5901}
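
As a cross-check, the prediction can be reproduced directly from the coefficients printed earlier. This is a minimal sketch using plain NumPy, assuming `.predict()` simply evaluates the stored `poly1d` at the requested year; the class internals are not shown here.

```python
import numpy as np

# Coefficients copied from the 'Poly fit coefficients' entry printed above
day_fit = np.poly1d([-7.98409060e-02, 2.81267893e+02])

y_hat = day_fit(2025)           # evaluate the fitted line at year 2025
print(round(float(y_hat), 4))   # 119.5901, matching 'y_hat' above

# The slope expressed as a trend per decade (in days)
trend_per_decade = day_fit.coeffs[0] * 10
print(round(float(trend_per_decade), 2))  # -0.8
```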


Similarly, for the break-up **time**, we need to create a fit associated with time and then use this fit to predict the **time**.

In [38]:
time_fit=Model_1.polyfit('year','time')                                            
fitted_time=Model_1.predict('decimal_time',2025)                           # predicting using the fit
print(fitted_time)

 
0.01105 x - 7.39


You may have noticed that, for the predicted time, the method uses 'decimal time', that is, the time of day expressed in hours as a decimal number. The conversion is handled by the function `decimal_time()`. For example:

$$ 18:30 \equiv 18.5$$
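
The implementation of `decimal_time()` is not shown here. A minimal sketch of the conversion in both directions, using the hypothetical helper names `to_decimal_time` and `from_decimal_time`, could look like this:

```python
def to_decimal_time(hhmm: str) -> float:
    """Convert a clock time 'HH:MM' to hours as a decimal number."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60

def from_decimal_time(decimal_hours: float) -> str:
    """Convert decimal hours back to 'HH:MM', dropping fractional minutes."""
    total_minutes = int(decimal_hours * 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"

print(to_decimal_time("18:30"))   # 18.5
print(from_decimal_time(18.5))    # 18:30
```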

Naturally, the predicted time of break-up is also in 'decimal time'. The method `.get_prediction()` calls `.predict()` again for both the date and the time (that is why its argument is a list of two x values) and combines the results into a single formatted **datetime** prediction (YYYY-MM-DD HH:MM:SS).


In [39]:
Model_1.get_prediction([2025,2025])


2025-04-29 14:59:00
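
How `.get_prediction()` builds this timestamp is not shown in this section. The sketch below shows one way to combine a fractional day of year and a decimal time into a datetime. It assumes that both fractions are truncated rather than rounded, which reproduces the output above; the function name `combine_prediction` is hypothetical.

```python
from datetime import datetime, timedelta

def combine_prediction(year: int, day_of_year: float, decimal_time: float) -> datetime:
    """Build a datetime from a day of year (1 = January 1) and decimal hours."""
    whole_day = int(day_of_year)             # truncate the fractional day (assumption)
    whole_minutes = int(decimal_time * 60)   # truncate fractional minutes (assumption)
    return datetime(year, 1, 1) + timedelta(days=whole_day - 1, minutes=whole_minutes)

day_hat = 119.5901                 # 'y_hat' printed by .predict('day_of_year', 2025)
time_hat = 0.01105 * 2025 - 7.39   # rounded coefficients printed by .polyfit('year', 'time')
print(combine_prediction(2025, day_hat, time_hat))  # 2025-04-29 14:59:00
```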


We can easily predict a different year by passing another year:

In [40]:
Model_1.get_prediction([2030,2030])

2030-04-29 15:02:00


These predictions use the same *fits*, since we have not changed them. We can replace a fit by calling `.polyfit()` again.

For example, we could predict the **date** for a specific year and then predict the **time** using a fit of the relationship between the **day of year** of the break-up and its **time**. In that case, the second element passed to `.get_prediction()` is a day of year rather than a year.


In [41]:
time_fit_2=Model_1.polyfit('day_of_year','time') 
day_of_year_2025=pd.to_datetime('2025-04-29').dayofyear
Model_1.get_prediction([2030,day_of_year_2025])

 
-0.07006 x + 23.06
2030-04-29 14:43:00


The polynomial fit suggests that the later the break-up date, the earlier the break-up time. However, the goodness-of-fit metrics are not **very good**.

#### Prediction using `.dist_fit()`

The method `.dist_fit()` is similar to `.polyfit()`, but instead of fitting a polynomial to the data, it fits a probability distribution. The fitted distribution can then be passed on to `.predict()` to predict the **datetime** of break-up.

In [42]:
date_fit=Model_1.dist_fit('year','day_of_year',distribution='norm')         

Distribution: norm
Parameters: (123.98130841121495, 6.454703868217181)
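
For a normal distribution, fitting by maximum likelihood reduces to taking the sample mean and the population standard deviation. The sketch below illustrates this with the standard library on made-up day-of-year values; whether `.dist_fit()` uses maximum likelihood internally is an assumption, since its implementation is not shown.

```python
import statistics

# Made-up break-up days of year, for illustration only
days = [118, 121, 124, 126, 131]

mu = statistics.fmean(days)       # maximum-likelihood estimate of the mean
sigma = statistics.pstdev(days)   # maximum-likelihood estimate of the standard deviation

print(f"Parameters: ({mu}, {sigma:.4f})")
```

`scipy.stats.norm.fit` returns the same two estimates (location and scale) for a normal distribution.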


The details of the fitted distribution are stored in the same way:

In [43]:
pprint.pprint(date_fit)

{'(x,y)=': ['year', 'day_of_year'],
 'Fitted Distribution': 'norm',
 'Goodness-of-Fit Metrics': {'KS Statistic': 0.0704, 'KS p-value': 0.6372},
 'Parameters': array([123.9813,   6.4547]),
 'confidence interval': 1.96}


The method `.predict()`, and consequently `.get_prediction()`, works a little differently when it is given a fitted distribution instead of a fitted equation: it uses the expected value of the distribution as the prediction.

In [44]:
fitted_date=Model_1.predict('day_of_year', 2025)                            
print(fitted_date)

{'(x,y)': ['year', 'day_of_year'], 'x_hat': 2025, 'y_hat': 123.9813, 'confidence_interval': (111.3301, 136.6325)}
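
The printed interval is consistent with the mean plus or minus 1.96 standard deviations, computed from the rounded parameters shown earlier. A minimal sketch of that calculation, assuming this is how the interval is formed (the class internals are not shown):

```python
from statistics import NormalDist

# Parameters copied from the 'Parameters' entry printed above
day_dist = NormalDist(mu=123.9813, sigma=6.4547)
z = 1.96  # the 'confidence interval' value stored in the fit dictionary

y_hat = day_dist.mean  # the expected value of a normal distribution is its mean
interval = (round(y_hat - z * day_dist.stdev, 4), round(y_hat + z * day_dist.stdev, 4))

print(y_hat, interval)  # 123.9813 (111.3301, 136.6325)
```

Since 1.96 is approximately the 97.5th percentile of the standard normal distribution, this range covers about 95% of the fitted distribution. It describes the spread of historical break-up days rather than uncertainty specific to 2025, because the fitted distribution does not depend on the year.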


In [45]:
time_fit=Model_1.dist_fit('year','time')                                            
fitted_time=Model_1.predict('decimal_time',2025)                           
print(fitted_time)

Distribution: norm
Parameters: (14.378504672897197, 4.834644794331982)
{'(x,y)': ['year', 'decimal_time'], 'x_hat': 2025, 'y_hat': 14.3785, 'confidence_interval': (4.9027, 23.8543)}


In [46]:
Model_1.get_prediction([2025,2025])


2025-05-03 14:22:00
