![Astrofisica Computacional](../../../new_logo.png)

## Dr. rer. nat. Jose Ivan Campos Rozo<sup>1,2</sup>

1. Astronomical Institute of the Czech Academy of Sciences\
   Department of Solar Physics\
   Ondřejov, Czech Republic

2. Observatorio Astronómico Nacional\
   Facultad de Ciencias\
   Universidad Nacional de Colombia

e-mail: jicamposr@unal.edu.co & rozo@asu.cas.cz

---

Advanced NumPy Exercises + Functions/Classes + Files

**Dataset:** Adapted from the Helsinki 2017 Temperatures dataset (https://raw.githubusercontent.com/csmastersUH/data_analysis_with_python_2020/master/kumpula-weather-2017.csv)

Includes:
- NumPy: slicing, masks, polyfit, FFT.

- Functions/Classes: inheritance, decorators, vectorized methods.

- Files: loadtxt, savetxt of processed data.

In [None]:
import numpy as np

## Exercise 1: Data Loading + NumPy Preprocessing

Load the temperatures from the file "kumpula-weather-2017.csv". Write a function `monthday_to_days` that converts mm-dd dates to days since the start of the year, and store its result in `days`.

- Filter out NaNs and infs using masks.

- Calculate anomalies (temperature - 30-day moving average).

- Detect outliers (IQR method: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR).

**Expected output:** `temps_clean` (N x 2), `days` (N,).
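The steps above can be sketched on synthetic data, as a rough guide rather than a reference solution. The month-length table, the helper names `moving_average` and `iqr_outlier_mask`, and the synthetic temperatures are all assumptions made for illustration; the real file's column layout is not assumed here.

```python
import numpy as np

# Month lengths for 2017, a non-leap year
DAYS_PER_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

def monthday_to_days(month, day):
    """Days elapsed since 1 January (01-01 maps to 0)."""
    offsets = np.concatenate(([0], np.cumsum(DAYS_PER_MONTH)[:-1]))
    return offsets[np.asarray(month) - 1] + np.asarray(day) - 1

def moving_average(x, window=30):
    """Moving average via convolution; edge values are biased by zero padding."""
    return np.convolve(x, np.ones(window) / window, mode="same")

def iqr_outlier_mask(x, k=1.5):
    """True where x lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

# Synthetic stand-in for the real file: a yearly cycle plus noise
rng = np.random.default_rng(0)
days = np.arange(365)
temps = 5.0 - 12.0 * np.cos(2 * np.pi * days / 365) + rng.normal(0, 2, days.size)
temps[100] = 60.0     # injected spike
temps[200] = np.nan   # injected missing value

# Mask out non-finite values before any statistics
finite = np.isfinite(temps)
days_ok, temps_ok = days[finite], temps[finite]

anomalies = temps_ok - moving_average(temps_ok, window=30)
outliers = iqr_outlier_mask(anomalies)
print(monthday_to_days(3, 1), outliers.sum())
```

Because `mode="same"` pads with zeros, the first and last ~15 anomalies are distorted; trimming the edges or using `mode="valid"` are possible ways to handle that.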

In [1]:
#TODO

## Exercise 2: Advanced Class with Inheritance/Decorators 

Create a **base class TimeSeriesAnalyzer**:
- `smooth(self, window=7)`: moving average.

- Decorator `@vectorize` to apply functions to columns.

**Child class WeatherAnalyzer** inherits and adds:
- `seasonal_decompose(self)`: trend (polyfit deg=2), seasonal (FFT top 4 freq), residual.

- `forecast(self, days_ahead=30)`: simple polynomial extrapolation (polyfit deg=3 on the last 30 days) + Gaussian noise.

- **Use:** `analyzer = WeatherAnalyzer(days, temps_clean[:,1])`
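As a minimal illustration of the trend/seasonal/residual idea, the sketch below decomposes a synthetic series outside any class. The signal, its two periodic components (5 and 12 cycles over the record) and the choice `n_freqs = 2` are assumptions picked so the result is easy to check.

```python
import numpy as np

n = 365
t = np.arange(n)
# Synthetic series: linear trend plus two periodic components
signal = (2.0 + 0.01 * t
          + 3.0 * np.sin(2 * np.pi * 5 * t / n)
          + 1.5 * np.cos(2 * np.pi * 12 * t / n))

# Trend: degree-2 polynomial fit evaluated on the same time axis
trend = np.polynomial.Polynomial.fit(t, signal, 2)(t)
detrended = signal - trend

# Seasonal: keep only the strongest bins of the real FFT, zero the rest
n_freqs = 2
spectrum = np.fft.rfft(detrended)
top_bins = np.argsort(np.abs(spectrum))[-n_freqs:]
filtered = np.zeros_like(spectrum)
filtered[top_bins] = spectrum[top_bins]
seasonal = np.fft.irfft(filtered, n=n)

residual = detrended - seasonal
# Convert the selected bins to cycles over the whole record
cycles = np.fft.rfftfreq(n, d=1.0)[top_bins] * n
print(np.sort(np.round(cycles)), residual.std())
```

Using `rfft`/`irfft` keeps the reconstruction real-valued without having to select matching negative-frequency bins by hand.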

In [None]:
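# Decorator to vectorize a method over the columns of self.data
def vectorize(method):
    def wrapper(self, *args, **kwargs):
        # Treat 1-D data as a single column so the loop always iterates over columns
        columns = np.atleast_2d(self.data.T)
        results = np.array([method(self, col, *args, **kwargs) for col in columns]).T
        # Return 1-D output for 1-D input, (N, n_columns) otherwise
        return results[:, 0] if self.data.ndim == 1 else results
    return wrapper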

class TimeSeriesAnalyzer:
    """Base class for time-series analysis with NumPy."""
    def __init__(self, t: np.ndarray, data: np.ndarray):
        self.t = np.asarray(t)
        self.data = np.asarray(data)
        if self.t.shape[0] != self.data.shape[0]:
            raise ValueError("t and data must have the same length")
    
    @vectorize
    def smooth(self, col: np.ndarray, window: int = 7, mode: str = 'same') -> np.ndarray:
        """Moving average using np.convolve."""
        kernel = np.ones(window) / window
        return np.convolve(col, kernel, mode=mode)
    
    def stats(self) -> dict:
        """Basic statistics per column."""
        return {
            'mean': np.mean(self.data, axis=0),
            'std': np.std(self.data, axis=0),
            'min': np.min(self.data, axis=0),
            'max': np.max(self.data, axis=0)
        }

class WeatherAnalyzer(TimeSeriesAnalyzer):
    """Child class specialized for weather data."""
    def __init__(self, t: np.ndarray, data: np.ndarray):
        super().__init__(t, data)
    
    def seasonal_decompose(self, degree_trend: int = 2, n_freqs: int = 4) -> dict:
        """Decomposition: trend (polyfit), seasonal (top FFT frequencies), residual."""
        # Work on an (N, n_columns) float array so 1-D and 2-D data share one code path
        data = self.data.reshape(len(self.t), -1).astype(float)

        # Trend: global polynomial fit, one column at a time (Polynomial.fit needs 1-D y)
        trend = np.column_stack([
            np.polynomial.Polynomial.fit(self.t, col, degree_trend)(self.t)
            for col in data.T
        ])

        # Seasonal: keep only the n_freqs strongest FFT bins of each detrended column
        spectrum = np.fft.rfft(data - trend, axis=0)
        top_idx = np.argsort(np.abs(spectrum), axis=0)[-n_freqs:]
        filtered = np.zeros_like(spectrum)
        np.put_along_axis(filtered, top_idx, np.take_along_axis(spectrum, top_idx, axis=0), axis=0)
        seasonal = np.fft.irfft(filtered, n=len(self.t), axis=0)

        residual = data - trend - seasonal
        shape = self.data.shape
        return {'trend': trend.reshape(shape), 'seasonal': seasonal.reshape(shape),
                'residual': residual.reshape(shape)}
    
    def forecast(self, days_ahead: int = 30, degree: int = 3, noise_std: float = 1.0) -> tuple:
        """Simple forecast: polynomial fit to the most recent data plus Gaussian noise."""
        n_last = min(degree * 10, len(self.t) // 2)
        t_last = self.t[-n_last:]
        data_last = self.data[-n_last:]

        # Future times start one step after the last observation
        t_future = self.t[-1] + np.arange(1, days_ahead + 1, dtype=float)

        # np.polyfit fits every column of a 2-D y independently
        p = np.polyfit(t_last, data_last, degree)
        # Evaluate through the Vandermonde matrix so 1-D and 2-D data both work
        forecast = np.vander(t_future, degree + 1) @ p
        forecast = forecast + np.random.normal(0, noise_std, forecast.shape)

        return t_future, forecast

# # USAGE DEMO (run after Exercise 1!)
# analyzer = WeatherAnalyzer(days, temps_clean)
# smoothed = analyzer.smooth(window=15)
# decomp = analyzer.seasonal_decompose()
# t_fc, fc = analyzer.forecast(30)
# print(analyzer.stats())

## Exercise 3: Robust I/O + Processed Data

- Save `temps_clean` + `anomalies` to 'processed_weather.npz' (np.savez).

- Write subset (first 100 days, columns: days, temp_smooth, anomaly) to 'subset.csv' (savetxt, fmt='%.2f', header).

- Function `load_and_validate(filename)`: loads npz/csv, checks shape/inf/NaNs, returns a dictionary.
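The save/load round trip described above can be sketched as follows. The arrays are random placeholders standing in for the results of Exercises 1 and 2, and the files are written to a temporary directory; this shows the NumPy calls involved and is not a full `load_and_validate` implementation.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(1)
# Placeholder arrays (hypothetical values, not the weather data)
days = np.arange(120)
temp_smooth = rng.normal(5.0, 3.0, days.size)
anomaly = rng.normal(0.0, 1.0, days.size)

with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "processed_weather.npz")
    csv_path = os.path.join(tmp, "subset.csv")

    # Binary archive: each array is stored under its keyword name
    np.savez(npz_path, temp_smooth=temp_smooth, anomaly=anomaly)
    with np.load(npz_path) as archive:
        restored = {name: archive[name] for name in archive.files}

    # Text file: savetxt prefixes the header with "# ", which loadtxt skips as a comment
    subset = np.column_stack([days[:100], temp_smooth[:100], anomaly[:100]])
    np.savetxt(csv_path, subset, delimiter=",",
               header="day,temp_smooth,anomaly", fmt="%.2f")
    reloaded = np.loadtxt(csv_path, delimiter=",")

# Basic validation: expected shape and no NaN/inf values
is_valid = reloaded.shape == (100, 3) and bool(np.isfinite(reloaded).all())
print(sorted(restored), reloaded.shape, is_valid)
```

Note that `fmt="%.2f"` rounds the text output to two decimals, so the CSV round trip is only accurate to about 0.005, while the `.npz` archive restores the arrays exactly.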

In [None]:
# np.savez('processed_weather.npz', temps=temps_clean, anomalies=anomalies)
# smoothed has shape (N, 2) when the analyzer is built from temps_clean; column 1 is the temperature
# subset = np.column_stack([days[:100], smoothed[:100, 1], anomalies[:100]])
# np.savetxt('subset.csv', subset, delimiter=',', header='day,temp_smooth,anomaly', fmt='%.2f')

def load_and_validate(filename):
    # Handle .npz (np.load) and .csv; check np.isfinite(...).all() and shape == (?, 3)
    pass

# Some Questions:

- For outliers (IQR method): what fraction of the data is removed? Is this conservative or aggressive? Propose an alternative (e.g., a 3-sigma cut).
- Trace the @vectorize execution: for data.shape=(365,2), how many calls to smooth(col) are made? Why results.T? What other vectorization approach would you propose?
- In seasonal_decompose(): why subtract the trend before the FFT? What do the top-4 frequencies represent (daily/weekly/monthly/yearly)?
- Run analyzer.stats() on raw vs smoothed data: what is the % std reduction per column? Why does polyfit deg=2 capture the trend well?
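
As a starting point for the first question, the sketch below compares the two rules on a purely Gaussian synthetic sample, which is an assumption and not the weather data. For a normal distribution the IQR fences sit at roughly ±2.7 standard deviations, so the IQR rule flags about 0.7% of points versus about 0.3% for a 3-sigma cut; real anomalies may behave differently.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 100_000)  # synthetic Gaussian sample

# IQR rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_mask = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

# 3-sigma rule: more than three standard deviations from the mean
sigma_mask = np.abs(x - x.mean()) > 3.0 * x.std()

frac_iqr = iqr_mask.mean()
frac_sigma = sigma_mask.mean()
print(f"IQR rule: {frac_iqr:.4%}, 3-sigma rule: {frac_sigma:.4%}")
```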