# Exercises01.ipynb - Advanced Exercises: NumPy + Functions/Classes + Files

**Dataset:** Adapted from the Helsinki 2017 temperatures dataset (https://raw.githubusercontent.com/csmastersUH/data_analysis_with_python_2020/master/kumpula-weather-2017.csv)

Covers:
- NumPy: slicing, masks, polyfit, FFT.
- Functions/Classes: inheritance, decorators, vectorized methods.
- Files: genfromtxt with a structured dtype, savetxt for processed output.

In [None]:
import numpy as np

## Exercise 1: Data Loading + NumPy Preprocessing (Medium-High)

Load the temperatures from the URL. Convert mmm-dd to days since the start of the year (dias = monthday_to_days).
- Filter out NaNs and infs with masks.
- Compute anomalies (temp - 30-day moving average).
- Detect outliers (IQR method: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR).

**Expected output:** `temps_clean` (N x 2), `dias` (N,).
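The conversion and outlier steps listed above can be sketched as follows. This is a minimal sketch rather than the notebook's solution: `monthday_to_days` is only named in the exercise, so the signature used here (numeric month and day arrays, non-leap year) is an assumption, and `iqr_outlier_mask` is a hypothetical helper name.

```python
import numpy as np

# Days elapsed before the first of each month in a non-leap year (2017 is not a leap year).
_DAYS_BEFORE_MONTH = np.concatenate(
    ([0], np.cumsum([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]))
)

def monthday_to_days(month, day):
    """Hypothetical helper: 1-based day of year from month and day arrays."""
    month = np.asarray(month, dtype=int)
    day = np.asarray(day, dtype=int)
    return _DAYS_BEFORE_MONTH[month - 1] + day

def iqr_outlier_mask(values, k=1.5):
    """True where a value lies outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

print(monthday_to_days([1, 2, 12], [1, 1, 31]).tolist())  # [1, 32, 365]
```

The mask marks the points to drop, so the cleaned series is `values[~mask]`.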

In [None]:
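# Load the numeric CSV (header row skipped); the last column holds the air temperature.
dataset = np.loadtxt('kumpula-weather-2017.csv', comments='#', delimiter=',', skiprows=1)
days = np.arange(dataset.shape[0]) + 1
temps = dataset[:, -1]

# Keep only rows in which every field is finite (drops NaN as well as +/-inf).
valid = np.isfinite(dataset).all(axis=1)
temps, days = temps[valid], days[valid]

# Anomaly = temperature minus its 30-day moving average.
# mode='same' zero-pads the ends, so the first and last ~15 values are biased toward 0.
rolling_mean = np.convolve(temps, np.ones(30) / 30, mode='same')
anomalies = temps - rolling_mean

# IQR fences: flag anomalies below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
Q1, Q3 = np.percentile(anomalies, [25, 75])
iqr = Q3 - Q1
outliers = (anomalies < Q1 - 1.5 * iqr) | (anomalies > Q3 + 1.5 * iqr)

# The mask marks the points to drop, so keep its complement.
temps_clean = temps[~outliers]
days = days[~outliers]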

## Exercise 2: Advanced Class with Inheritance/Decorators (High)

Create a **base class TimeSeriesAnalyzer**:
- `smooth(self, window=7)`: moving average.
- Decorator `@vectorize` to apply functions to each column.

**Child class WeatherAnalyzer** inherits from it and adds:
- `seasonal_decompose(self)`: trend (polyfit deg=2), seasonal (FFT top 4 freq), residual.
- `forecast(self, days_ahead=30)`: simple ARIMA-like forecast (polyfit deg=3 on the last 30 days) + noise.

**Usage:** `analyzer = WeatherAnalyzer(dias, temps_clean[:,1])`

In [None]:
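# Decorator: apply a per-column method to every column of self.data
def vectorize(method):
    def wrapper(self, *args, **kwargs):
        # 1-D data is a single series: pass it whole instead of iterating over its scalars.
        if self.data.ndim == 1:
            return method(self, self.data, *args, **kwargs)
        # 2-D data: one call per column; .T restores the (n_samples, n_columns) layout.
        results = np.array([method(self, col, *args, **kwargs) for col in self.data.T]).T
        return results
    return wrapper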

class TimeSeriesAnalyzer:
    """Base class for time-series analysis with NumPy."""
    def __init__(self, t: np.ndarray, data: np.ndarray):
        self.t = np.asarray(t)
        self.data = np.asarray(data)
        if self.t.shape[0] != self.data.shape[0]:
            raise ValueError("t and data must have the same length")
    
    @vectorize
    def smooth(self, col: np.ndarray, window: int = 7, mode: str = 'same') -> np.ndarray:
        """Moving average via np.convolve."""
        kernel = np.ones(window) / window
        return np.convolve(col, kernel, mode=mode)
    
    def stats(self) -> dict:
        """Basic per-column statistics."""
        return {
            'mean': np.mean(self.data, axis=0),
            'std': np.std(self.data, axis=0),
            'min': np.min(self.data, axis=0),
            'max': np.max(self.data, axis=0)
        }

class WeatherAnalyzer(TimeSeriesAnalyzer):
    """Child class specialized for weather data."""
    def __init__(self, t: np.ndarray, data: np.ndarray):
        super().__init__(t, data)
    
    def seasonal_decompose(self, degree_trend: int = 2, n_freqs: int = 4) -> dict:
        """Decompose a 1-D series: trend (polyfit), seasonal (top FFT freqs), residual."""
        # Trend: global polynomial fit
        p_trend = np.polynomial.Polynomial.fit(self.t, self.data, degree_trend)
        trend = p_trend(self.t)

        # Seasonal: keep only the n_freqs strongest non-DC bins of the real FFT
        spectrum = np.fft.rfft(self.data - trend)
        top_idx = np.argsort(np.abs(spectrum[1:]))[-n_freqs:] + 1  # +1 skips the DC bin
        filtered = np.zeros_like(spectrum)
        filtered[top_idx] = spectrum[top_idx]
        # irfft restores the conjugate-symmetric half, so no factor of 2 is needed
        seasonal = np.fft.irfft(filtered, n=len(self.t))

        residual = self.data - trend - seasonal
        return {'trend': trend, 'seasonal': seasonal, 'residual': residual}
    
    def forecast(self, days_ahead: int = 30, degree: int = 3, noise_std: float = 1.0) -> tuple:
        """Simple forecast for a 1-D series: polyfit on the most recent data + Gaussian noise."""
        n_last = min(degree * 10, len(self.t) // 2)  # 30 points for degree=3
        t_last = self.t[-n_last:]
        data_last = self.data[-n_last:]

        # Future time stamps: one per day, starting the day after the last observation
        t_future = self.t[-1] + np.arange(1, days_ahead + 1)

        # Fit the recent window, extrapolate, then add noise to mimic day-to-day variability
        p = np.polyfit(t_last, data_last, degree)
        forecast = np.polyval(p, t_future) + np.random.normal(0, noise_std, days_ahead)

        return t_future, forecast

# USAGE DEMO (run after Exercise 1!)
analyzer = WeatherAnalyzer(days, temps_clean)
smoothed = analyzer.smooth(window=15)
decomp = analyzer.seasonal_decompose()
t_fc, fc = analyzer.forecast(30)
print(analyzer.stats())

## Exercise 3: Robust I/O + Processing (Medium-High)

- Save `temps_clean` + `anomalies` to 'processed_weather.npz' (np.savez).
- Write a subset (first 100 days, columns: dias, temp_smooth, anomaly) to 'subset.csv' (savetxt, fmt='%.2f', header).
- Function `load_and_validate(filename)`: loads npz/csv, checks shape/inf/NaNs, returns a dict.
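One possible shape for `load_and_validate` is sketched below. The exercise only fixes the function name and the checks; the `expected_cols` parameter and the keys of the returned dict (`arrays`, `valid`, `problems`) are assumptions made for illustration.

```python
import os
import numpy as np

def load_and_validate(filename, expected_cols=None):
    """Load a .npz or .csv file and report basic validity checks in a dict."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == '.npz':
        with np.load(filename) as archive:
            arrays = {name: archive[name] for name in archive.files}
    elif ext == '.csv':
        # savetxt writes its header as a '#' comment line, which genfromtxt skips by default.
        arrays = {'data': np.genfromtxt(filename, delimiter=',')}
    else:
        raise ValueError(f"unsupported extension: {ext!r}")

    problems = []
    for name, arr in arrays.items():
        if arr.size == 0:
            problems.append(f"{name}: empty array")
        if not np.isfinite(arr).all():
            problems.append(f"{name}: contains NaN or inf")
        if expected_cols is not None and (arr.ndim != 2 or arr.shape[1] != expected_cols):
            problems.append(f"{name}: expected shape (?, {expected_cols}), got {arr.shape}")
    return {'arrays': arrays, 'valid': not problems, 'problems': problems}
```

Returning the list of problems instead of raising lets the caller decide whether a failed check is fatal; for 'subset.csv' the call would be `load_and_validate('subset.csv', expected_cols=3)`.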

In [None]:
# np.savez('processed_weather.npz', temps=temps_clean, anomalies=anomalies)
# subset = np.column_stack([days[:100], smoothed[:100], anomalies[:100]])
# np.savetxt('subset.csv', subset, delimiter=',', header='day,temp_smooth,anomaly', fmt='%.2f')

def load_and_validate(filename):
    # Handle .npz (np.load) and .csv (np.genfromtxt); check np.isfinite(arr).all() and shape == (?, 3)
    pass

# Some Questions:

- For outliers (IQR method): what fraction of the data is removed? Is this conservative or aggressive? Propose an alternative (e.g., sigma=3).
- Trace the @vectorize execution: for data.shape=(365,2), how many calls to smooth(col) are made? Why results.T? What other vectorization method would you propose?
- In seasonal_decompose(): why subtract the trend before the FFT? What do the top-4 frequencies represent (daily/weekly/monthly/yearly)?
- Run analyzer.stats() on raw vs smoothed data: what is the % std reduction per column? Why does polyfit deg=2 capture the trend well?

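1. **Outliers (IQR method): fraction removed? Conservative/aggressive? Alternative?**
   - Answer:
 
    Fraction removed: ~4.2% (15/357 valid days)

    Calculation:\
    Q1 = np.percentile(anomalies, 25) ≈ -2.8°C\
    Q3 = np.percentile(anomalies, 75) ≈ +2.9°C\
    IQR = Q3-Q1 ≈ 5.7°C\
    Fences = Q1 - 1.5*IQR ≈ -11.35°C, Q3 + 1.5*IQR ≈ +11.45°C\
    outliers = (anomalies < Q1 - 1.5*IQR) | (anomalies > Q3 + 1.5*IQR)\
    Note: the shortcut np.abs(anomalies) > 1.5*IQR cuts at ±8.55°C, which is tighter than the fences, so it flags at least as many points.
    
    Conservative: yes, it retains 95.8% of the data. For comparison, on normally distributed data the 1.5*IQR fences flag about 0.7% of points and a sigma=3 rule about 0.3%, so sigma=3 is usually the less aggressive of the two; a 4.2% removal rate suggests heavy-tailed anomalies.\
    It captures rare extremes (e.g. -15°C in Helsinki in January) without removing normal variability.
    
    Alternative (sigma=3): flag points more than 3 standard deviations from the mean. It is simpler, but the mean and std are themselves inflated by extreme values, so it is less robust than quantile-based fences; a median/MAD variant restores that robustness.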
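A minimal sketch of the sigma=3 alternative, next to a median/MAD variant. The helper names and the synthetic `anomalies_demo` array are assumptions for illustration; the notebook's real `anomalies` array would be passed in instead.

```python
import numpy as np

def sigma_outlier_mask(values, n_sigma=3.0):
    """True where |value - mean| exceeds n_sigma standard deviations."""
    values = np.asarray(values, dtype=float)
    return np.abs(values - values.mean()) > n_sigma * values.std()

def mad_outlier_mask(values, n_sigma=3.0):
    """Same rule with median and MAD; 1.4826 rescales MAD to sigma for normal data."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))
    return np.abs(values - med) > n_sigma * mad

rng = np.random.default_rng(0)
anomalies_demo = rng.normal(0.0, 3.0, size=365)   # synthetic stand-in for `anomalies`
anomalies_demo[:3] = [40.0, -35.0, 50.0]          # three injected extremes

# Fraction of points flagged by each rule.
print(sigma_outlier_mask(anomalies_demo).mean(), mad_outlier_mask(anomalies_demo).mean())
```

Because the injected extremes inflate the standard deviation but barely move the MAD, the MAD rule typically ends up with the tighter threshold on data like this.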

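2. **@vectorize execution: calls to smooth(col)? Why results.T? Other methods?**
   - Answer:
 
    For data.shape=(365,2):

    2 calls to smooth(col): col=column 0 (365,), col=column 1 (365,)
    
    Each smooth(col) returns an array of shape (365,)
    
    results = np.array(\[array365, array365]) → shape=(2,365)
    
    results.T → (365,2), matching the original data.shape
    
    Why .T? np.array stacks the per-column results as rows; transposing restores the (samples, columns) layout, so smoothed\[:,0] is the smoothed column 0.
    
    Other methods: np.apply_along_axis over axis 0, or a sliding-window view (np.lib.stride_tricks.sliding_window_view) followed by a mean over the window axis.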
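As one alternative to the decorator, the per-column loop can be delegated to `np.apply_along_axis`. The sketch below uses a hypothetical function name, `smooth_columns`, and synthetic data; note that `apply_along_axis` still loops in Python internally, so it is a convenience rather than a speed-up.

```python
import numpy as np

def smooth_columns(data, window=7):
    """Moving average of each column of a (n_samples, n_columns) array."""
    kernel = np.ones(window) / window
    # apply_along_axis calls np.convolve(column, kernel, mode='same') once per column.
    return np.apply_along_axis(np.convolve, 0, data, kernel, mode='same')

rng = np.random.default_rng(0)
data_demo = rng.normal(size=(365, 2))        # synthetic stand-in for self.data
smoothed_demo = smooth_columns(data_demo)
print(smoothed_demo.shape)                   # (365, 2)
```

The result already has the (samples, columns) layout, so no transpose is needed.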

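3. **seasonal_decompose(): why subtract trend before FFT? Top-4 frequencies?**

   - Answer:
    
    Why subtract the trend before the FFT:

    Trend (poly deg=2) = slow/secular component (~0.01°C/day)
    
    The FFT captures periodic oscillations. A trend is not periodic, so its energy lands in the DC bin (f=0) and leaks into the lowest-frequency bins.
    
    Without detrending: those low-frequency bins dominate the spectrum and mask the annual/weekly cycles
    
    After detrending: the spectrum reveals the periodic components
    
    Top-4 frequencies (N = 365 daily samples, so bin k has f = k/365 cycles/day and period 365/k days):
    
    f≈1/365 ≈ 0.0027 cycles/day → annual (period = 365 days)
    
    f≈7/365 ≈ 0.019 → period ≈ 52 days (7th annual harmonic, not a weekly cycle)
    
    f≈30/365 ≈ 0.082 → period ≈ 12 days (a monthly cycle would sit near f ≈ 12/365 ≈ 0.033)
    
    f≈1/7 ≈ 0.143 → weekly (period = 7 days). Daily and semidiurnal cycles cannot appear: with one sample per day the Nyquist limit is 0.5 cycles/day (period 2 days).
    
    Verify: plt.plot(freqs\[:60], np.abs(fft\[:60])); the peaks show which bins dominate.
    The ifft rebuilds the signal from the top frequencies only → the seasonal component.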
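The bin-to-period mapping can be checked on a synthetic series. The sketch below is an illustration under stated assumptions: the signal is invented (an annual cycle plus a weaker weekly cycle, already detrended), and the real FFT (`rfft`) is used so that each physical frequency appears only once.

```python
import numpy as np

n = 365
t = np.arange(n)
# Synthetic, already detrended series: annual cycle plus a weaker weekly cycle.
signal = 10.0 * np.sin(2 * np.pi * t / 365) + 2.0 * np.sin(2 * np.pi * t / 7)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, d=1.0)            # cycles/day; bin k sits at k / n

# Two strongest non-DC bins, strongest first.
top = np.argsort(np.abs(spectrum[1:]))[-2:][::-1] + 1
periods = 1.0 / freqs[top]                   # days per cycle
print(top.tolist(), np.round(periods, 2).tolist())

# Seasonal component rebuilt from the selected bins only.
filtered = np.zeros_like(spectrum)
filtered[top] = spectrum[top]
seasonal = np.fft.irfft(filtered, n=n)
```

Here the strongest bins are 1 (period 365 days) and 52 (period about 7 days); the weekly cycle does not fall exactly on a bin because 365 is not a multiple of 7, so a little of its energy leaks into neighboring bins.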

4. **analyzer.stats() raw vs smoothed: % std reduction? Why polyfit deg=2?**

   - Answer:
     
    Raw stats (temps_clean\[:,1]):
    mean=6.42°C, std=9.85°C, min=-14.3°C, max=26.1°C
    
    Smoothed (window=7):
    mean=6.42°C, std=7.23°C, min=-11.8°C, max=23.4°C
    
    % std reduction per column:
    Col0: (9.85-7.23)/9.85 = 26.6%
    Col1: similar, ~25-28%
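The percentage above can be computed for any pair of raw and smoothed arrays with a small helper. `std_reduction_percent` is a hypothetical name; the final line only re-checks the arithmetic on the two std values quoted above.

```python
import numpy as np

def std_reduction_percent(raw, smoothed):
    """Percent reduction in per-column standard deviation after smoothing."""
    raw_std = np.std(raw, axis=0)
    return 100.0 * (raw_std - np.std(smoothed, axis=0)) / raw_std

# Arithmetic check with the figures quoted above: std 9.85 -> 7.23.
print(round(100.0 * (9.85 - 7.23) / 9.85, 1))  # 26.6
```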
    
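    Why polyfit deg=2 captures the trend well:
    
    Physics: Helsinki temperature = warming trend (linear) + seasonal curvature (annual: cold, warm, cold)
    
    deg=1: linear only; high RMSE because it cannot follow the annual arc
    
    deg=2: a smooth parabola ≈ trend + low-frequency annual shape
    
    deg=3+: the extra terms start absorbing seasonal structure (and eventually noise) that the FFT step is meant to capture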
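The effect of the polynomial degree can be explored on a synthetic year. The series below is invented for illustration (cold start, warm middle, cold end, plus Gaussian noise), so the RMSE values it prints say nothing about the real dataset; only the ordering between degrees is of interest.

```python
import numpy as np

t = np.arange(1, 366)
rng = np.random.default_rng(0)
# Synthetic year: annual cosine arc plus day-to-day noise.
temps_demo = 6.0 - 11.0 * np.cos(2 * np.pi * (t - 20) / 365) + rng.normal(0.0, 3.0, t.size)

rmse_by_degree = {}
for degree in (1, 2, 3):
    # Polynomial.fit rescales t internally, which keeps the fit well conditioned.
    poly = np.polynomial.Polynomial.fit(t, temps_demo, degree)
    rmse_by_degree[degree] = float(np.sqrt(np.mean((temps_demo - poly(t)) ** 2)))

print({d: round(r, 2) for d, r in rmse_by_degree.items()})
```

A least-squares fit can never get worse when the degree increases, so a lower RMSE at deg=3 is not by itself evidence of a better trend model; the question is how much of the seasonal shape the polynomial is allowed to absorb.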