# Task
Obtener, limpiar y unificar datos meteorológicos de Meteoblue (usando la clave API "API_KEY"), SIATA e IDEAM RADAR para Medellín y su Área Metropolitana. Posteriormente, almacenar los datos procesados ​​y proporcionar un resumen del desarrollo, sugerencias para la creación de un panel de control y pasos prácticos de programación adicionales.

### Investigación de Fuentes de Datos: SIATA

Para investigar las APIs o métodos de acceso a datos de SIATA (Sistema de Alerta Temprana de Medellín y el Valle de Aburrá), se recomienda seguir los siguientes pasos:

1.  **Visitar el sitio web oficial de SIATA:** Busca secciones dedicadas a 'Desarrolladores', 'APIs', 'Datos Abiertos' o 'Documentación Técnica'.
2.  **Consultar la documentación de la API:** Si encuentras una API, revisa cuidadosamente su documentación para entender:
    *   Los _endpoints_ disponibles.
    *   Los tipos de datos que ofrece (sensores meteorológicos, calidad del aire, hidrología, etc.).
    *   El formato de los datos (JSON, XML, CSV).
    *   Requisitos de autenticación o claves de API.
    *   Límites de tasa (rate limits) o restricciones de uso.
3.  **Identificar el estado de las secciones:** Presta especial atención a cualquier aviso sobre secciones fuera de servicio o información desactualizada, como se mencionó en la descripción de la tarea.
4.  **Explorar alternativas (si es necesario):** Si las APIs principales no son funcionales o no proporcionan los datos necesarios, considera:
    *   **Repositorios de datos abiertos:** Muchos gobiernos locales y entidades ofrecen portales de datos abiertos donde se publican conjuntos de datos históricos o en tiempo casi real. Busca el portal de datos abiertos de Medellín o Antioquia.
    *   **Web Scraping (con precaución):** Si no hay una API disponible y los datos están públicamente accesibles en el sitio web de SIATA de forma estructurada, el web scraping podría ser una opción. Sin embargo, esto debe hacerse con ética, respetando los términos de servicio del sitio web y asegurando que no se sobrecargue el servidor. Esta es una opción de último recurso y requiere consideración legal y ética.
    *   **Contacto directo:** En algunos casos, contactar directamente a SIATA para solicitar acceso a datos o información sobre APIs puede ser una vía.

Documentaremos todos los hallazgos, incluyendo la disponibilidad, formato, autenticación y limitaciones.

## Descargar y Procesar Datos de SIATA (Archivo)

### Subtask:
Acceder a `https://www.siata.gov.co/operacional/` para identificar y descargar un archivo de datos relevantes. Se implementará la lógica para leer este archivo, limpiar los datos de posibles errores y prepararlos para su integración.

In [4]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import json

# 1. Define the URL for SIATA's operational data section
SIATA_OPERACIONAL_URL = "https://www.siata.gov.co/operacional/"

print(f"Attempting to fetch data from SIATA operational URL: {SIATA_OPERACIONAL_URL}")

siata_operacional_soup = None
siata_downloaded_data = None # Variable to store raw file content if downloaded

try:
    # 2. Make an HTTP GET request to the URL, allowing redirects and including a timeout
    response = requests.get(SIATA_OPERACIONAL_URL, allow_redirects=True, timeout=20)
    # 3. Implement robust error handling
    response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)

    print(f"Successfully fetched SIATA operational page from {SIATA_OPERACIONAL_URL}.")

    # 4. Parse the HTML content of the response
    siata_operacional_soup = BeautifulSoup(response.text, 'html.parser')
    print(f"Title of the page: {siata_operacional_soup.title.string if siata_operacional_soup.title else 'No title found'}")
    print("HTML content parsed successfully.")

    # 5. Inspect the parsed HTML to identify potential links to downloadable data files
    # This is a placeholder for detailed inspection. We'll look for common download link patterns.
    download_links = []
    for link in siata_operacional_soup.find_all('a', href=True):
        href = link['href']
        # Look for common file extensions or keywords indicating data downloads
        if any(ext in href for ext in ['.csv', '.xlsx', '.json', '.zip', '.txt', 'download', 'export', 'data']):
            # Construct absolute URL if it's relative
            if not href.startswith('http'):
                # Simple join for relative paths, more complex logic needed for all cases
                absolute_href = requests.compat.urljoin(SIATA_OPERACIONAL_URL, href)
            else:
                absolute_href = href
            download_links.append(absolute_href)

    if download_links:
        print("\nPotential downloadable data links found:")
        for dl_link in download_links:
            print(f"- {dl_link}")
            # 6. For simplicity, attempt to download the first identified relevant file for now
            # In a real scenario, we might need to filter more specifically or present options.
            if siata_downloaded_data is None: # Only download the first one for initial check
                print(f"Attempting to download data from: {dl_link}")
                data_file_response = requests.get(dl_link, allow_redirects=True, timeout=20)
                data_file_response.raise_for_status()
                siata_downloaded_data = data_file_response.content # Store raw bytes
                print(f"Successfully downloaded data from {dl_link}. Size: {len(siata_downloaded_data)} bytes.")
                # Decide if we want to break after first download or list all and then pick.
                # For this step, let's download the first and stop.
                # If the content type is text/json, we might try to decode it.
                if 'json' in data_file_response.headers.get('Content-Type', ''):
                    try:
                        siata_downloaded_data = data_file_response.json() # Try parsing as JSON
                        print("Downloaded data appears to be JSON.")
                    except json.JSONDecodeError:
                        print("Downloaded data is not valid JSON, storing as raw content.")
                elif 'text' in data_file_response.headers.get('Content-Type', '') or 'csv' in data_file_response.headers.get('Content-Type', ''):
                    try:
                        siata_downloaded_data = data_file_response.text # Store as text for potential CSV/text parsing
                        print("Downloaded data appears to be text/CSV.")
                    except Exception as parse_e:
                        print(f"Error parsing downloaded text: {parse_e}")
                break # Only download the first for now
    else:
        # 7. If no direct downloadable file link is found
        print("\nNo direct downloadable data links (.csv, .xlsx, .json, etc.) found on the page.")
        print("Manual inspection of the page or network requests might be necessary to locate the data source.")

except requests.exceptions.Timeout:
    print(f"Error: Request to {SIATA_OPERACIONAL_URL} timed out after 20 seconds.")
except requests.exceptions.RequestException as e:
    print(f"Error fetching data from SIATA operational URL: {e}")
except Exception as e:
    print(f"An unexpected error occurred during processing: {e}")

if siata_downloaded_data is not None:
    print("SIATA data successfully downloaded and stored in 'siata_downloaded_data'.")
else:
    print("Failed to download SIATA data or no direct download links found.")

Attempting to fetch data from SIATA operational URL: https://www.siata.gov.co/operacional/
Successfully fetched SIATA operational page from https://www.siata.gov.co/operacional/.
Title of the page: Listing folder
HTML content parsed successfully.

No direct downloadable data links (.csv, .xlsx, .json, etc.) found on the page.
Manual inspection of the page or network requests might be necessary to locate the data source.
Failed to download SIATA data or no direct download links found.


## Analizar el directorio de la página operativa de SIATA
### Subtask:
Analizar el contenido HTML de la página operativa de SIATA (`https://www.siata.gov.co/operacional/`) para extraer todos los enlaces disponibles, luego identificar y listar los posibles archivos de datos o subdirectorios dentro de este directorio para su posterior análisis.

**Reasoning**:
El paso anterior recuperó y analizó correctamente la página operativa de SIATA. Ahora, analizaré el objeto `siata_operacional_soup` para extraer todos los enlaces, categorizarlos en «Archivos de datos» y «Subdirectorios» según los criterios especificados e imprimiré un resumen.

In [5]:
import requests
from urllib.parse import urlparse

# Assuming siata_operacional_soup is available from the previous step
# And SIATA_OPERACIONAL_URL is also available

if siata_operacional_soup is None:
    print("Error: siata_operacional_soup is not available. Please ensure the previous step ran successfully.")
else:
    print("Analyzing SIATA operational page for links...")

    data_files = []
    subdirectories = []
    other_links = []

    # Common data file extensions and keywords
    data_extensions = ['.csv', '.xlsx', '.json', '.zip', '.txt', '.xml', '.kmz', '.tgz', '.gz']
    data_keywords = ['data', 'export', 'archivo'] # in file name part

    for link in siata_operacional_soup.find_all('a', href=True):
        href = link['href']
        absolute_href = requests.compat.urljoin(SIATA_OPERACIONAL_URL, href)

        # Extract the path part of the URL to check for extensions/keywords
        parsed_url = urlparse(absolute_href)
        path = parsed_url.path.lower()

        # 3. Categorize links
        if path.endswith('/'):
            subdirectories.append(absolute_href)
        elif any(path.endswith(ext) for ext in data_extensions) or any(keyword in path for keyword in data_keywords):
            data_files.append(absolute_href)
        else:
            other_links.append(absolute_href)

    print("\n--- Link Analysis Summary ---")

    if data_files:
        print(f"Found {len(data_files)} potential Data Files:")
        for df_link in sorted(list(set(data_files))): # Use set to avoid duplicates and sort for readability
            print(f"- {df_link}")
    else:
        print("No obvious data files found.")

    if subdirectories:
        print(f"\nFound {len(subdirectories)} potential Subdirectories:")
        for sub_link in sorted(list(set(subdirectories))): # Use set to avoid duplicates and sort for readability
            print(f"- {sub_link}")
    else:
        print("No obvious subdirectories found.")

    # Optionally, print other links for full transparency
    # if other_links:
    #     print(f"\nFound {len(other_links)} other links (not categorized as data files or subdirectories):")
    #     for other_link_item in sorted(list(set(other_links))):
    #         print(f"- {other_link_item}")

    # Store the categorized links for potential future steps if needed
    siata_found_data_files = list(set(data_files))
    siata_found_subdirectories = list(set(subdirectories))

    print("\nLink categorization complete.")

Analyzing SIATA operational page for links...

--- Link Analysis Summary ---
No obvious data files found.

Found 32 potential Subdirectories:
- http://www.siata.gov.co/
- https://www.siata.gov.co/
- https://www.siata.gov.co/operacional/
- https://www.siata.gov.co/operacional/?C=M;O=A
- https://www.siata.gov.co/operacional/?C=N;O=A
- https://www.siata.gov.co/operacional/?C=S;O=A
- https://www.siata.gov.co/operacional/CicloAnual/
- https://www.siata.gov.co/operacional/Meteorologia/
- https://www.siata.gov.co/operacional/WRF/
- https://www.siata.gov.co/operacional/enso/
- https://www.siata.gov.co/operacional/mapas/
- https://www.siata.gov.co/operacional/monitoreo/
- https://www.siata.gov.co/operacional/prcSirena/
- https://www.siata.gov.co/operacional/radar/
- https://www.siata.gov.co/operacional/radiometro/
- https://www.siata.gov.co/operacional/seriesdetiempo/
- https://www.siata.gov.co/operacional/seriesdetiempo_prueba/

Link categorization complete.


In [6]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urljoin

# 1. Define the URL for SIATA's Meteorological subdirectory
SIATA_METEOROLOGIA_URL = "https://www.siata.gov.co/operacional/Meteorologia/"

print(f"Attempting to fetch data from SIATA Meteorological URL: {SIATA_METEOROLOGIA_URL}")

siata_meteorologia_soup = None

try:
    # 2. Make an HTTP GET request to the URL, allowing redirects and including a timeout
    response = requests.get(SIATA_METEOROLOGIA_URL, allow_redirects=True, timeout=20)
    # 3. Implement robust error handling
    response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)

    print(f"Successfully fetched SIATA Meteorological page from {SIATA_METEOROLOGIA_URL}.")

    # 4. Parse the HTML content of the response
    siata_meteorologia_soup = BeautifulSoup(response.text, 'html.parser')
    print(f"Title of the page: {siata_meteorologia_soup.title.string if siata_meteorologia_soup.title else 'No title found'}")
    print("HTML content parsed successfully.")

    # 5. Extract all links found on the page.
    data_files = []
    subdirectories = []

    # Common data file extensions and keywords
    data_extensions = ['.csv', '.xlsx', '.json', '.zip', '.txt', '.xml', '.kmz', '.tgz', '.gz']
    data_keywords = ['data', 'export', 'archivo'] # in file name part

    for link in siata_meteorologia_soup.find_all('a', href=True):
        href = link['href']
        absolute_href = urljoin(SIATA_METEOROLOGIA_URL, href)

        # Extract the path part of the URL to check for extensions/keywords
        parsed_url = urlparse(absolute_href)
        path = parsed_url.path.lower()

        # 6. Categorize the extracted links
        if path.endswith('/') and absolute_href != SIATA_METEOROLOGIA_URL and absolute_href != SIATA_METEOROLOGIA_URL + '../': # Exclude current dir and parent dir
            subdirectories.append(absolute_href)
        elif any(path.endswith(ext) for ext in data_extensions) or any(keyword in path for keyword in data_keywords):
            data_files.append(absolute_href)

    # 7. Print a summary of the findings
    print("\n--- Link Analysis Summary for Meteorologia Directory ---")

    if data_files:
        print(f"Found {len(data_files)} potential Data Files:")
        for df_link in sorted(list(set(data_files))): # Use set to avoid duplicates and sort for readability
            print(f"- {df_link}")
    else:
        print("No obvious data files found in this directory.")

    if subdirectories:
        print(f"\nFound {len(subdirectories)} potential Subdirectories:")
        for sub_link in sorted(list(set(subdirectories))): # Use set to avoid duplicates and sort for readability
            print(f"- {sub_link}")
    else:
        print("No obvious subdirectories found in this directory.")

except requests.exceptions.Timeout:
    print(f"Error: Request to {SIATA_METEOROLOGIA_URL} timed out after 20 seconds.")
except requests.exceptions.RequestException as e:
    print(f"Error fetching data from SIATA Meteorological URL: {e}")
except Exception as e:
    print(f"An unexpected error occurred during processing: {e}")

Attempting to fetch data from SIATA Meteorological URL: https://www.siata.gov.co/operacional/Meteorologia/
Successfully fetched SIATA Meteorological page from https://www.siata.gov.co/operacional/Meteorologia/.
Title of the page: Listing folder
HTML content parsed successfully.

--- Link Analysis Summary for Meteorologia Directory ---
No obvious data files found in this directory.

Found 11 potential Subdirectories:
- http://www.siata.gov.co/
- https://www.siata.gov.co/
- https://www.siata.gov.co/operacional/
- https://www.siata.gov.co/operacional/Meteorologia/?C=M;O=A
- https://www.siata.gov.co/operacional/Meteorologia/?C=N;O=A
- https://www.siata.gov.co/operacional/Meteorologia/?C=S;O=A
- https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/


In [7]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urljoin

# 1. Define the URL for SIATA's Accumulative Precipitation subdirectory
SIATA_ACUMPRECIPITACION_URL = "https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/"

print(f"Attempting to fetch data from SIATA Accumulative Precipitation URL: {SIATA_ACUMPRECIPITACION_URL}")

siata_acumprecipitacion_soup = None

try:
    # 2. Make an HTTP GET request to the URL, allowing redirects and including a timeout
    response = requests.get(SIATA_ACUMPRECIPITACION_URL, allow_redirects=True, timeout=20)
    # 3. Implement robust error handling
    response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)

    print(f"Successfully fetched SIATA Accumulative Precipitation page from {SIATA_ACUMPRECIPITACION_URL}.")

    # 4. Parse the HTML content of the response
    siata_acumprecipitacion_soup = BeautifulSoup(response.text, 'html.parser')
    print(f"Title of the page: {siata_acumprecipitacion_soup.title.string if siata_acumprecipitacion_soup.title else 'No title found'}")
    print("HTML content parsed successfully.")

    # 5. Extract all links found on the page.
    data_files = []
    subdirectories = []

    # Common data file extensions and keywords
    data_extensions = ['.csv', '.xlsx', '.json', '.zip', '.txt', '.xml', '.kmz', '.tgz', '.gz']
    data_keywords = ['data', 'export', 'archivo'] # in file name part

    for link in siata_acumprecipitacion_soup.find_all('a', href=True):
        href = link['href']
        absolute_href = urljoin(SIATA_ACUMPRECIPITACION_URL, href)

        # Extract the path part of the URL to check for extensions/keywords
        parsed_url = urlparse(absolute_href)
        path = parsed_url.path.lower()

        # 6. Categorize the extracted links
        if path.endswith('/') and absolute_href != SIATA_ACUMPRECIPITACION_URL and absolute_href != SIATA_ACUMPRECIPITACION_URL + '../': # Exclude current dir and parent dir
            subdirectories.append(absolute_href)
        elif any(path.endswith(ext) for ext in data_extensions) or any(keyword in path for keyword in data_keywords):
            data_files.append(absolute_href)

    # 7. Print a summary of the findings
    print("\n--- Link Analysis Summary for AcumPrecipitacion Directory ---")

    if data_files:
        print(f"Found {len(data_files)} potential Data Files:")
        for df_link in sorted(list(set(data_files))): # Use set to avoid duplicates and sort for readability
            print(f"- {df_link}")
    else:
        print("No obvious data files found in this directory.")

    if subdirectories:
        print(f"\nFound {len(subdirectories)} potential Subdirectories:")
        for sub_link in sorted(list(set(subdirectories))): # Use set to avoid duplicates and sort for readability
            print(f"- {sub_link}")
    else:
        print("No obvious subdirectories found in this directory.")

except requests.exceptions.Timeout:
    print(f"Error: Request to {SIATA_ACUMPRECIPITACION_URL} timed out after 20 seconds.")
except requests.exceptions.RequestException as e:
    print(f"Error fetching data from SIATA Accumulative Precipitation URL: {e}")
except Exception as e:
    print(f"An unexpected error occurred during processing: {e}")

Attempting to fetch data from SIATA Accumulative Precipitation URL: https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/
Successfully fetched SIATA Accumulative Precipitation page from https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/.
Title of the page: Listing folder
HTML content parsed successfully.

--- Link Analysis Summary for AcumPrecipitacion Directory ---
Found 20 potential Data Files:
- https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Abril2025.txt
- https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Agosto2025.txt
- https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Febrero2025.txt
- https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Julio2025.txt
- https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Junio2025.txt
- https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Marzo202

In [8]:
import requests
import pandas as pd
import io

# URL of a specific data file identified in the previous step
# Choosing DatosPacum_Noviembre2025.txt as an example
SIATA_DATA_FILE_URL = "https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Noviembre2025.txt"

print(f"Attempting to download data file from: {SIATA_DATA_FILE_URL}")

siata_precipitacion_data = None

try:
    # Make an HTTP GET request to download the file
    response = requests.get(SIATA_DATA_FILE_URL, allow_redirects=True, timeout=30)
    response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)

    print(f"Successfully downloaded data from {SIATA_DATA_FILE_URL}.")

    # The content is text, likely tab-separated based on typical SIATA files or simple text
    # Let's try to read it into a pandas DataFrame, first as a raw string to inspect.
    raw_content = response.text
    print("\n--- Sample of raw file content (first 500 characters) ---")
    print(raw_content[:500])
    print("-----------------------------------------------------------")

    # Attempt to parse as CSV/TXT, assuming tab or space separated for now
    # Given the output format, it's highly likely to be tab-separated or fixed-width.
    # Let's try reading with pandas, inferring delimiter or using common ones.
    try:
        # Using io.StringIO to treat the string content as a file
        siata_precipitacion_data = pd.read_csv(io.StringIO(raw_content), sep='\t', encoding='utf-8')
        print("\nSuccessfully parsed data into a DataFrame (assuming tab-separated).")
        print("First 5 rows of SIATA precipitation DataFrame:")
        print(siata_precipitacion_data.head())
        print("\nDataFrame Info:")
        siata_precipitacion_data.info()
    except Exception as parse_error:
        print(f"Error parsing data into DataFrame: {parse_error}")
        print("Attempting to read with space as delimiter.")
        try:
            siata_precipitacion_data = pd.read_csv(io.StringIO(raw_content), sep=' ', skipinitialspace=True, encoding='utf-8')
            print("\nSuccessfully parsed data into a DataFrame (assuming space-separated).")
            print("First 5 rows of SIATA precipitation DataFrame:")
            print(siata_precipitacion_data.head())
            print("\nDataFrame Info:")
            siata_precipitacion_data.info()
        except Exception as second_parse_error:
            print(f"Error parsing data with space delimiter: {second_parse_error}")
            print("Could not parse data into DataFrame. Raw content stored as string.")
            siata_precipitacion_data = raw_content # Fallback to raw string if parsing fails

except requests.exceptions.Timeout:
    print(f"Error: Request to {SIATA_DATA_FILE_URL} timed out after 30 seconds.")
except requests.exceptions.RequestException as e:
    print(f"Error fetching data file from SIATA: {e}")
except Exception as e:
    print(f"An unexpected error occurred during processing: {e}")

if siata_precipitacion_data is not None and isinstance(siata_precipitacion_data, pd.DataFrame):
    print("SIATA precipitation data successfully loaded into 'siata_precipitacion_data' DataFrame.")
elif siata_precipitacion_data is not None:
    print("SIATA precipitation data downloaded but not parsed into DataFrame. Stored as raw content.")
else:
    print("Failed to download or process SIATA precipitation data.")


Attempting to download data file from: https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Noviembre2025.txt
Successfully downloaded data from https://www.siata.gov.co/operacional/Meteorologia/AcumPrecipitacion/DatosPacum_Noviembre2025.txt.

--- Sample of raw file content (first 500 characters) ---
Fecha actualizacion: 2025/11/21 00:01
Estacion,Nombre,Municipio,Barrio,Climatologia mes,Acumulado Mes (mm),Porcentaje Mes
420, Pueblo Viejo - Pluviometro, La Estrella, NULL, 172.720, 318.770, 184.559
66, I.E San Andres (Sede El Socorro), Girardota, NA, 120.730, 217.424, 180.091
62, Gimnasio Cantabria, La Estrella, NA, 208.720, 308.356, 147.737
248, Vivero EPM Piedras Blancas - Pluviometro, Guarne, NA, 190.500, 276.352, 145.067
127, I.E. Manuel Jose Sierra - Sede la Holanda, Girardota, NA, 146.71
-----------------------------------------------------------

Successfully parsed data into a DataFrame (assuming tab-separated).
First 5 rows of SIATA precipitation DataFra

In [9]:
import io
import pandas as pd

# Assuming raw_content is available from the previous step
# It contains the entire text content of the downloaded file.

if raw_content:
    print("Attempting to re-parse SIATA precipitation data with correct delimiter and skipping header.")
    try:
        # Use io.StringIO to treat the string content as a file
        # Skip the first row (metadata) using skiprows=1
        # Use comma as the delimiter
        siata_precipitacion_data_cleaned = pd.read_csv(
            io.StringIO(raw_content),
            sep=',',
            skiprows=1, # Skip the "Fecha actualizacion" line
            encoding='utf-8',
            # We might have issues with extra spaces or inconsistent delimiters, let's refine this if needed
            # For now, assuming standard CSV format after skipping the first line.
            on_bad_lines='skip' # Skip lines that have too many fields
        )
        print("\nSuccessfully parsed data into a DataFrame with correct delimiter and skipped header.")
        print("First 5 rows of cleaned SIATA precipitation DataFrame:")
        print(siata_precipitacion_data_cleaned.head())
        print("\nDataFrame Info:")
        siata_precipitacion_data_cleaned.info()

        siata_precipitacion_data = siata_precipitacion_data_cleaned # Update the main variable

    except Exception as parse_error:
        print(f"Error re-parsing data into DataFrame: {parse_error}")
        print("Could not parse data into DataFrame after cleaning attempt. Raw content stored as string.")
        # Fallback to raw string if parsing fails even after cleaning attempts
        siata_precipitacion_data = raw_content
else:
    print("No raw content available for re-parsing.")

if siata_precipitacion_data is not None and isinstance(siata_precipitacion_data, pd.DataFrame):
    print("SIATA precipitation data successfully loaded into 'siata_precipitacion_data' DataFrame.")
elif siata_precipitacion_data is not None:
    print("SIATA precipitation data downloaded but not parsed into DataFrame. Stored as raw content.")
else:
    print("Failed to download or process SIATA precipitation data.")

Attempting to re-parse SIATA precipitation data with correct delimiter and skipping header.

Successfully parsed data into a DataFrame with correct delimiter and skipped header.
First 5 rows of cleaned SIATA precipitation DataFrame:
   Estacion                                      Nombre     Municipio Barrio  \
0       420                  Pueblo Viejo - Pluviometro   La Estrella   NULL   
1        66            I.E San Andres (Sede El Socorro)     Girardota     NA   
2        62                          Gimnasio Cantabria   La Estrella     NA   
3       248    Vivero EPM Piedras Blancas - Pluviometro        Guarne     NA   
4       127   I.E. Manuel Jose Sierra - Sede la Holanda     Girardota     NA   

   Climatologia mes  Acumulado Mes (mm)  Porcentaje Mes  
0            172.72             318.770         184.559  
1            120.73             217.424         180.091  
2            208.72             308.356         147.737  
3            190.50             276.352         145.06

### Investigación de Fuentes de Datos: IDEAM RADAR

Para investigar las APIs o métodos de acceso a datos de IDEAM RADAR, especialmente para la región de Medellín, se recomienda seguir los siguientes pasos:

1.  **Visitar el sitio web oficial de IDEAM:** Busca secciones dedicadas a 'Radar Meteorológico', 'Datos Abiertos', 'Servicios' o 'API'.
2.  **Identificar el acceso a datos de radar:** El IDEAM es la autoridad meteorológica nacional en Colombia. Su sitio web o portales asociados suelen ofrecer acceso a información de radar. Busca específicamente datos de radar para la región de Antioquia o, más específicamente, Medellín.
3.  **Consultar la documentación de la API (si existe):** Si IDEAM proporciona una API para datos de radar, revisa su documentación para entender:
    *   Los _endpoints_ para datos de radar.
    *   El formato de los datos (imágenes, datos geoespaciales como GeoTIFF, NetCDF, etc.).
    *   La resolución espacial y temporal de los datos.
    *   Requisitos de autenticación o claves de API.
    *   Cualquier restricción de uso o límite de tasa.
4.  **Explorar portales de datos abiertos o geoservicios:** El IDEAM a menudo comparte sus datos a través de portales de datos abiertos gubernamentales o geoservicios (como WMS, WCS) que permiten el acceso a capas de información geográfica, incluyendo datos de radar.
5.  **Contacto directo:** Si la información no es clara o no se encuentra una API adecuada, contactar directamente al IDEAM puede ser necesario para obtener información sobre el acceso programático o la descarga de datos de radar.

Documentaremos todos los hallazgos, incluyendo la disponibilidad de datos de radar para Medellín, su formato, los requisitos de autenticación (si los hay) y cualquier limitación de uso.

In [10]:
import requests
from bs4 import BeautifulSoup
import re

# 1. Define the URL for the dedicated IDEAM radar viewer page
IDEAM_RADAR_VIEWER_URL = "http://www.ideam.gov.co/web/tiempo-y-clima/radar-meteorologico"

print(f"Attempting to fetch content from IDEAM Radar Viewer URL: {IDEAM_RADAR_VIEWER_URL}")

ideam_radar_viewer_soup = None
radar_image_urls = []

# Keywords to look for in src attributes or surrounding text
image_keywords = ['radar', 'imagen', 'animacion', 'actual', 'tiempo', 'meteo', 'gif', 'png', 'jpg']

try:
    # 2. Make an HTTP GET request to the URL with robust error handling and timeout
    response = requests.get(IDEAM_RADAR_VIEWER_URL, allow_redirects=True, timeout=20)
    response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)

    print("Successfully fetched IDEAM Radar Viewer page.")

    # 3. Parse the HTML content of the response
    ideam_radar_viewer_soup = BeautifulSoup(response.text, 'html.parser')
    print(f"Title of the page: {ideam_radar_viewer_soup.title.string if ideam_radar_viewer_soup.title else 'No title found'}")

    # 4. Inspect the HTML for <img> or <iframe> tags
    print("\n--- Inspecting for potential radar image/iframe sources ---")
    for tag in ideam_radar_viewer_soup.find_all(['img', 'iframe']):
        if 'src' in tag.attrs:
            src = tag['src']
            # Construct absolute URL
            full_src = requests.compat.urljoin(IDEAM_RADAR_VIEWER_URL, src)

            # Check for keywords in the URL or surrounding text
            if any(kw in full_src.lower() for kw in image_keywords):
                radar_image_urls.append(full_src)
            # Also check text content around the tag for descriptive titles
            elif tag.name == 'iframe':
                # Check previous/next sibling text, or parent's text
                if tag.previous_sibling and isinstance(tag.previous_sibling, str) and any(kw in tag.previous_sibling.lower() for kw in image_keywords):
                    radar_image_urls.append(full_src)
                elif tag.next_sibling and isinstance(tag.next_sibling, str) and any(kw in tag.next_sibling.lower() for kw in image_keywords):
                    radar_image_urls.append(full_src)
                # A more general approach would involve searching the parent's text or checking title/alt attributes
                elif 'title' in tag.attrs and any(kw in tag['title'].lower() for kw in image_keywords):
                    radar_image_urls.append(full_src)

    # Remove duplicates
    radar_image_urls = list(set(radar_image_urls))

    # 5. List potential radar image URLs
    if radar_image_urls:
        print(f"\nFound {len(radar_image_urls)} potential radar image/iframe URLs:")
        for url in radar_image_urls:
            print(f"- {url}")

        # 6. If URLs are found, attempt to download the most promising one
        # Prioritize URLs that explicitly mention 'radar' and common image formats
        prioritized_urls = sorted(radar_image_urls, key=lambda x: ('radar' not in x.lower(), '.gif' not in x.lower(), '.png' not in x.lower(), '.jpg' not in x.lower()))

        ideam_radar_image_content = None
        if prioritized_urls:
            most_promising_url = prioritized_urls[0]
            print(f"\nAttempting to download content from the most promising URL: {most_promising_url}")
            try:
                response_image = requests.get(most_promising_url, allow_redirects=True, timeout=30)
                response_image.raise_for_status()

                content_type = response_image.headers.get('Content-Type', '').lower()
                print(f"Downloaded content type: {content_type}")

                if 'image' in content_type or 'application/octet-stream' in content_type:
                    ideam_radar_image_content = response_image.content # Store raw bytes
                    print(f"Successfully downloaded image content. Size: {len(ideam_radar_image_content)} bytes.")
                elif 'text' in content_type or 'html' in content_type:
                    ideam_radar_image_content = response_image.text # Store as text for inspection
                    print(f"Downloaded HTML/text content (not a direct image). Size: {len(ideam_radar_image_content)} bytes.")
                    print(f"First 500 characters of content:\n{ideam_radar_image_content[:500]}")
                else:
                    ideam_radar_image_content = response_image.content
                    print(f"Downloaded content of unknown type. Size: {len(ideam_radar_image_content)} bytes.")

            except requests.exceptions.Timeout:
                print(f"Error: Request to {most_promising_url} timed out after 30 seconds.")
            except requests.exceptions.RequestException as e:
                print(f"Error fetching content from {most_promising_url}: {e}")
            except Exception as e:
                print(f"An unexpected error occurred during download: {e}")

        if ideam_radar_image_content:
            print("IDEAM radar image/content stored in 'ideam_radar_image_content'.")
        else:
            print("Failed to obtain IDEAM radar image/content from promising URL.")

    else:
        print("No potential radar image URLs were found on the page.")

except requests.exceptions.Timeout:
    print(f"Error: Request to {IDEAM_RADAR_VIEWER_URL} timed out after 20 seconds.")
except requests.exceptions.RequestException as e:
    print(f"Error fetching IDEAM Radar Viewer page: {e}")
except Exception as e:
    print(f"An unexpected error occurred during processing: {e}")


Attempting to fetch content from IDEAM Radar Viewer URL: http://www.ideam.gov.co/web/tiempo-y-clima/radar-meteorologico
Error fetching IDEAM Radar Viewer page: 404 Client Error: Not Found for url: http://www.ideam.gov.co/web/tiempo-y-clima/radar-meteorologico


## Limpieza y Unificación de Datos (Meteoblue, Meteosource, SIATA)

### Subtask:
Limpiar, transformar y unificar los datos obtenidos de Meteoblue, Meteosource y SIATA en un DataFrame de Pandas común.

**Reasoning**:
El código anterior fallaba porque el DataFrame 'daily' de Meteosource tenía una columna 'day' para las fechas en lugar de 'date'. Corregiré esto actualizando la referencia del nombre de la columna a 'day' para convertir correctamente la columna de fecha a objetos datetime.

In [11]:
import pandas as pd
import json

# --- Extracting and processing Meteosource data ---

meteosource_current_df = None
meteosource_hourly_df = None
meteosource_daily_df = None

if meteosource_raw_data:
    # 1. Process 'current' data
    if 'current' in meteosource_raw_data:
        current_data = meteosource_raw_data['current']
        # Flatten nested dictionaries like 'wind' and 'precipitation'
        flattened_current = {}
        for key, value in current_data.items():
            if isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    flattened_current[f"{key}_{sub_key}"] = sub_value
            else:
                flattened_current[key] = value
        meteosource_current_df = pd.DataFrame([flattened_current])
        # Add a timestamp for the current data
        meteosource_current_df['date'] = pd.to_datetime(meteosource_raw_data['hourly']['data'][0]['date'] if 'hourly' in meteosource_raw_data and meteosource_raw_data['hourly']['data'] else pd.Timestamp.now())
        print("Meteosource current data extracted and converted to DataFrame.")
        print("First 5 rows of meteosource_current_df:")
        print(meteosource_current_df.head())
        print("\nDataFrame Info for meteosource_current_df:")
        meteosource_current_df.info()

    # 2. Process 'hourly' data
    if 'hourly' in meteosource_raw_data and meteosource_raw_data['hourly']['data']:
        meteosource_hourly_df = pd.DataFrame(meteosource_raw_data['hourly']['data'])
        meteosource_hourly_df['date'] = pd.to_datetime(meteosource_hourly_df['date'])
        print("\n Meteosource hourly data extracted and converted to DataFrame.")
        print("First 5 rows of meteosource_hourly_df:")
        print(meteosource_hourly_df.head())
        print("\nDataFrame Info for meteosource_hourly_df:")
        meteosource_hourly_df.info()

    # 3. Process 'daily' data
    if 'daily' in meteosource_raw_data and meteosource_raw_data['daily']['data']:
        meteosource_daily_df = pd.DataFrame(meteosource_raw_data['daily']['data'])
        # Corrected: 'day' column is the date in daily data, not 'date'
        meteosource_daily_df['date'] = pd.to_datetime(meteosource_daily_df['day'])
        print("\nMeteosource daily data extracted and converted to DataFrame.")
        print("First 5 rows of meteosource_daily_df:")
        print(meteosource_daily_df.head())
        print("\nDataFrame Info for meteosource_daily_df:")
        meteosource_daily_df.info()

else:
    print("No raw Meteosource data available to process.")

NameError: name 'meteosource_raw_data' is not defined

### Obtener datos de Meteosource API

**Subtask:** Utilizar la clave API de Meteosource y las coordenadas de Medellín para obtener datos meteorológicos de la API de Meteosource.

In [None]:
# Replace 'YOUR_METEOSOURCE_API_KEY' with your actual Meteosource API key
METEOSOURCE_API_KEY = "8o3g3rf2mjb82r8zxb6vjhvo556beidj0xk1h9p6"

# Define the API endpoint and parameters for Medellín
METEOSOURCE_BASE_URL = "https://www.meteosource.com/api/v1/free/point"

# Assuming LATITUDE and LONGITUDE are already defined from Meteoblue setup
LATITUDE = 6.2442
LONGITUDE = -75.5812

params = {
    "lat": LATITUDE,
    "lon": LONGITUDE,
    "sections": "current,hourly,daily", # Request current, hourly, and daily data
    "timezone": "America/Bogota",
    "language": "en",
    "units": "metric", # Use metric units
    "key": METEOSOURCE_API_KEY
}

print(f"Attempting to fetch data from Meteosource API for Latitude: {LATITUDE}, Longitude: {LONGITUDE}")

meterosource_raw_data = None

try:
    response = requests.get(METEOSOURCE_BASE_URL, params=params, timeout=30)
    response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)

    meterosource_raw_data = response.json()
    print("Successfully fetched data from Meteosource API.")
    print(f"Meteosource raw data keys: {meterosource_raw_data.keys()}")

except requests.exceptions.Timeout:
    print(f"Error: Request to Meteosource API timed out after 30 seconds.")
except requests.exceptions.RequestException as e:
    print(f"Error fetching data from Meteosource API: {e}")
except Exception as e:
    print(f"An unexpected error occurred during Meteosource API processing: {e}")

if meterosource_raw_data:
    print("Meteosource raw data has been stored in 'meterosource_raw_data'.")
else:
    print("Failed to retrieve Meteosource data.")


Attempting to fetch data from Meteosource API for Latitude: 6.2442, Longitude: -75.5812
Successfully fetched data from Meteosource API.
Meteosource raw data keys: dict_keys(['lat', 'lon', 'elevation', 'timezone', 'units', 'current', 'hourly', 'daily'])
Meteosource raw data has been stored in 'meterosource_raw_data'.


In [None]:
print("\n--- Inspecting Meteoblue DataFrame (meteoblue_df) ---")
if meteoblue_df is not None:
    print(meteoblue_df.head())
    print(meteoblue_df.info())
else:
    print("meteoblue_df is not available.")

print("\n--- Inspecting Meteosource Current DataFrame (meteosource_current_df) ---")
if meteosource_current_df is not None:
    print(meteosource_current_df.head())
    print(meteosource_current_df.info())
else:
    print("meteosource_current_df is not available.")

print("\n--- Inspecting Meteosource Hourly DataFrame (meteosource_hourly_df) ---")
if meteosource_hourly_df is not None:
    print(meteosource_hourly_df.head())
    print(meteosource_hourly_df.info())
else:
    print("meteosource_hourly_df is not available.")

print("\n--- Inspecting Meteosource Daily DataFrame (meteosource_daily_df) ---")
if meteosource_daily_df is not None:
    print(meteosource_daily_df.head())
    print(meteosource_daily_df.info())
else:
    print("meteosource_daily_df is not available.")

print("\n--- Inspecting SIATA Precipitation DataFrame (siata_precipitacion_data) ---")
if siata_precipitacion_data is not None:
    print(siata_precipitacion_data.head())
    print(siata_precipitacion_data.info())
else:
    print("siata_precipitacion_data is not available.")


--- Inspecting Meteoblue DataFrame (meteoblue_df) ---


NameError: name 'meteoblue_df' is not defined

In [None]:
import pandas as pd

# --- Flattening nested columns in Meteosource DataFrames ---

# Flatten 'wind', 'precipitation', 'cloud_cover' in meteosource_hourly_df
if meteosource_hourly_df is not None:
    print("Flattening nested columns in meteosource_hourly_df...")
    # Wind data
    if 'wind' in meteosource_hourly_df.columns:
        meteosource_hourly_df = pd.concat([meteosource_hourly_df.drop('wind', axis=1),
                                           meteosource_hourly_df['wind'].apply(pd.Series).add_prefix('wind_')], axis=1)
    # Precipitation data
    if 'precipitation' in meteosource_hourly_df.columns:
        meteosource_hourly_df = pd.concat([meteosource_hourly_df.drop('precipitation', axis=1),
                                           meteosource_hourly_df['precipitation'].apply(pd.Series).add_prefix('precipitation_')], axis=1)
    # Cloud cover data
    if 'cloud_cover' in meteosource_hourly_df.columns:
        meteosource_hourly_df = pd.concat([meteosource_hourly_df.drop('cloud_cover', axis=1),
                                           meteosource_hourly_df['cloud_cover'].apply(pd.Series).add_prefix('cloud_cover_')], axis=1)

    print("Meteosource hourly DataFrame after flattening:")
    print(meteosource_hourly_df.head())
    print(meteosource_hourly_df.info())

# Flatten 'all_day' and similar in meteosource_daily_df (which contain nested weather details)
if meteosource_daily_df is not None:
    print("\nFlattening nested columns in meteosource_daily_df...")

    # The 'all_day' column in daily data contains detailed weather for the day.
    # We will extract temperature_max, temperature_min, precipitation_total, and wind_speed from it.
    if 'all_day' in meteosource_daily_df.columns:
        temp_df = meteosource_daily_df['all_day'].apply(pd.Series)

        # Extract temperature info directly from temp_df as 'temperature_max' and 'temperature_min' are already there
        if 'temperature_max' in temp_df.columns:
            meteosource_daily_df['temperature_max'] = temp_df['temperature_max']
        if 'temperature_min' in temp_df.columns:
            meteosource_daily_df['temperature_min'] = temp_df['temperature_min']

        # Extract precipitation info
        if 'precipitation' in temp_df.columns:
            temp_precip_df = temp_df['precipitation'].apply(pd.Series)
            if 'total' in temp_precip_df.columns:
                meteosource_daily_df['precipitation_total'] = temp_precip_df['total']

        # Extract wind speed (might be nested under wind as well)
        if 'wind' in temp_df.columns:
            temp_wind_df = temp_df['wind'].apply(pd.Series)
            if 'speed' in temp_wind_df.columns:
                meteosource_daily_df['wind_speed'] = temp_wind_df['speed']

        # Drop the original nested 'all_day' column
        meteosource_daily_df = meteosource_daily_df.drop(columns=['all_day'])

    print("Meteosource daily DataFrame after flattening:")
    print(meteosource_daily_df.head())
    print(meteosource_daily_df.info())


print("\nFlattening of nested Meteosource columns complete.")


Flattening of nested Meteosource columns complete.


**Reasoning**:
El código anterior falló porque `temp_df['temperature']` ya era una serie escalar, lo que convertía a `temp_temp_df` en un DataFrame con una sola columna `0`, y por lo tanto, `temp_temp_df['max']` generaba un KeyError. Los valores de `temperature_max` y `temperature_min` ya están disponibles directamente en `temp_df` gracias al aplanamiento de `all_day`. Corregiré esto accediendo directamente a estas columnas desde `temp_df`.

In [None]:
import pandas as pd

# Define common column names for unification
COMMON_COLUMNS = {
    'date': 'date',
    'time': 'date', # Meteoblue uses 'time' for date
    'temperature_mean': 'avg_temperature_c',
    'temperature': 'avg_temperature_c',
    'temperature_instant': 'current_temperature_c',
    'temperature_max': 'max_temperature_c',
    'temperature_min': 'min_temperature_c',
    'precipitation': 'total_precipitation_mm', # Meteoblue
    'precipitation_total': 'total_precipitation_mm', # Meteosource
    'windspeed_mean': 'avg_wind_speed_ms',
    'wind_speed': 'avg_wind_speed_ms',
    'windspeed_max': 'max_wind_speed_ms',
    'windspeed_min': 'min_wind_speed_ms',
    'relativehumidity_mean': 'avg_relative_humidity_percent',
    'relativehumidity_max': 'max_relative_humidity_percent',
    'relativehumidity_min': 'min_relative_humidity_percent',
    'Acumulado Mes (mm)': 'total_precipitation_mm', # SIATA
    'Estacion': 'station_id',
    'Nombre': 'station_name',
    'Municipio': 'municipality',
    'Barrio': 'neighborhood'
}

# --- Process Meteoblue DataFrame ---
if meteoblue_df is not None:
    print("Processing Meteoblue DataFrame...")
    # Rename columns to common names
    meteoblue_df = meteoblue_df.rename(columns=COMMON_COLUMNS)
    # Select relevant columns for unification
    meteoblue_df_cleaned = meteoblue_df[['date', 'current_temperature_c', 'max_temperature_c', 'min_temperature_c',
                                           'total_precipitation_mm', 'avg_wind_speed_ms']].copy()
    meteoblue_df_cleaned['source'] = 'Meteoblue'
    # Ensure date is just a date for daily comparison
    meteoblue_df_cleaned['date'] = meteoblue_df_cleaned['date'].dt.date
    print("Meteoblue DataFrame processed.")
    print(meteoblue_df_cleaned.head())
    print(meteoblue_df_cleaned.info())

# --- Process Meteosource Current DataFrame ---
if meteosource_current_df is not None:
    print("\nProcessing Meteosource Current DataFrame...")
    meteosource_current_df = meteosource_current_df.rename(columns=COMMON_COLUMNS)
    # Ensure the 'date' column is datetime64[ns] before using .dt accessor
    meteosource_current_df['date'] = pd.to_datetime(meteosource_current_df['date'])

    # The 'date' column is datetime, convert to date only for daily comparison
    meteosource_current_df['date'] = meteosource_current_df['date'].dt.date
    # Corrected: use 'avg_temperature_c' as 'temperature' was mapped to it
    meteosource_current_df_cleaned = meteosource_current_df[['date', 'avg_temperature_c',
                                                             'avg_wind_speed_ms', 'total_precipitation_mm']].copy()
    meteosource_current_df_cleaned['source'] = 'Meteosource_Current'
    print("Meteosource Current DataFrame processed.")
    print(meteosource_current_df_cleaned.head())
    print(meteosource_current_df_cleaned.info())

# --- Process Meteosource Hourly DataFrame ---
if meteosource_hourly_df is not None:
    print("\nProcessing Meteosource Hourly DataFrame...")
    meteosource_hourly_df = meteosource_hourly_df.rename(columns=COMMON_COLUMNS)
    # Hourly data might need to be aggregated to daily or kept as is, depending on the target granularity.
    # For now, let's keep it hourly, but extract some key daily aggregations if needed later.
    # For direct unification, we'll aim for daily for now.
    meteosource_hourly_df['date_only'] = meteosource_hourly_df['date'].dt.date
    meteosource_hourly_df_cleaned = meteosource_hourly_df.groupby('date_only').agg(
        avg_temperature_c=('avg_temperature_c', 'mean'),
        max_temperature_c=('avg_temperature_c', 'max'),
        min_temperature_c=('avg_temperature_c', 'min'),
        total_precipitation_mm=('total_precipitation_mm', 'sum'),
        avg_wind_speed_ms=('avg_wind_speed_ms', 'mean')
    ).reset_index().rename(columns={'date_only': 'date'})
    meteosource_hourly_df_cleaned['source'] = 'Meteosource_Hourly_Agg'
    print("Meteosource Hourly DataFrame processed and aggregated.")
    print(meteosource_hourly_df_cleaned.head())
    print(meteosource_hourly_df_cleaned.info())

# --- Process Meteosource Daily DataFrame ---
if meteosource_daily_df is not None:
    print("\nProcessing Meteosource Daily DataFrame...")
    meteosource_daily_df = meteosource_daily_df.rename(columns=COMMON_COLUMNS)
    meteosource_daily_df_cleaned = meteosource_daily_df[['date', 'max_temperature_c', 'min_temperature_c',
                                                            'total_precipitation_mm', 'avg_wind_speed_ms']].copy()
    meteosource_daily_df_cleaned['source'] = 'Meteosource_Daily'
    # Ensure date is just a date for daily comparison
    meteosource_daily_df_cleaned['date'] = meteosource_daily_df_cleaned['date'].dt.date
    print("Meteosource Daily DataFrame processed.")
    print(meteosource_daily_df_cleaned.head())
    print(meteosource_daily_df_cleaned.info())

# --- Process SIATA Precipitation DataFrame ---
if siata_precipitacion_data is not None:
    print("\nProcessing SIATA Precipitation DataFrame...")
    siata_precipitacion_data = siata_precipitacion_data.rename(columns=COMMON_COLUMNS)
    siata_precipitacion_data_cleaned = siata_precipitacion_data[['station_id', 'station_name', 'municipality',
                                                                'neighborhood', 'total_precipitation_mm']].copy()
    # SIATA data is not time-series in this file, it's station-based monthly accumulation.
    # For unification with daily data, this will need special handling or be used as a separate dataset.
    # For now, we will keep it as is, noting its different granularity.
    siata_precipitacion_data_cleaned['source'] = 'SIATA_Monthly_Precipitation'
    print("SIATA Precipitation DataFrame processed.")
    print(siata_precipitacion_data_cleaned.head())
    print(siata_precipitacion_data_cleaned.info())

NameError: name 'meteoblue_df' is not defined

In [None]:
import pandas as pd

# Define target common columns for daily weather data unification
target_daily_columns = [
    'date',
    'avg_temperature_c',
    'max_temperature_c',
    'min_temperature_c',
    'total_precipitation_mm',
    'avg_wind_speed_ms',
    'source'
]

unified_weather_data = []

# --- Prepare Meteoblue DataFrame ---
if meteoblue_df_cleaned is not None:
    print("Preparing Meteoblue for unification...")
    # Use 'current_temperature_c' as 'avg_temperature_c' for daily view
    meteoblue_df_temp = meteoblue_df_cleaned.copy()
    meteoblue_df_temp['avg_temperature_c'] = meteoblue_df_temp['current_temperature_c']
    # Ensure all target columns exist, fill missing with NaN
    for col in target_daily_columns:
        if col not in meteoblue_df_temp.columns:
            meteoblue_df_temp[col] = pd.NA
    unified_weather_data.append(meteoblue_df_temp[target_daily_columns])

# --- Prepare Meteosource Current DataFrame ---
if meteosource_current_df_cleaned is not None:
    print("Preparing Meteosource Current for unification...")
    meteosource_current_temp = meteosource_current_df_cleaned.copy()
    # For current data, max/min temp might not be explicitly available for the full day.
    # Fill them with avg_temperature_c for consistency in daily view, or NaN if preferred.
    meteosource_current_temp['max_temperature_c'] = meteosource_current_temp['avg_temperature_c']
    meteosource_current_temp['min_temperature_c'] = meteosource_current_temp['avg_temperature_c']
    for col in target_daily_columns:
        if col not in meteosource_current_temp.columns:
            meteosource_current_temp[col] = pd.NA
    unified_weather_data.append(meteosource_current_temp[target_daily_columns])

# --- Prepare Meteosource Hourly Aggregated DataFrame ---
if meteosource_hourly_df_cleaned is not None:
    print("Preparing Meteosource Hourly Aggregated for unification...")
    # Already has target columns after aggregation
    for col in target_daily_columns:
        if col not in meteosource_hourly_df_cleaned.columns:
            meteosource_hourly_df_cleaned[col] = pd.NA
    unified_weather_data.append(meteosource_hourly_df_cleaned[target_daily_columns])

# --- Prepare Meteosource Daily DataFrame ---
if meteosource_daily_df_cleaned is not None:
    print("Preparing Meteosource Daily for unification...")
    # Ensure all target columns exist, fill missing with NaN
    for col in target_daily_columns:
        if col not in meteosource_daily_df_cleaned.columns:
            meteosource_daily_df_cleaned[col] = pd.NA
    unified_weather_data.append(meteosource_daily_df_cleaned[target_daily_columns])

# --- Concatenate all prepared DataFrames ---
if unified_weather_data:
    print("Concatenating all daily weather dataframes...")
    unified_daily_weather_df = pd.concat(unified_weather_data, ignore_index=True)
    unified_daily_weather_df['date'] = pd.to_datetime(unified_daily_weather_df['date'])
    print("\nUnified Daily Weather DataFrame created.")
    print("First 10 rows of unified_daily_weather_df:")
    print(unified_daily_weather_df.head(10))
    print("\nDataFrame Info for unified_daily_weather_df:")
    unified_daily_weather_df.info()
else:
    print("No weather data available for unification.")


print("\n\nSIATA Precipitation Data (kept separate due to different granularity):")
if siata_precipitacion_data_cleaned is not None:
    print(siata_precipitacion_data_cleaned.head())
    print(siata_precipitacion_data_cleaned.info())
else:
    print("SIATA precipitation data is not available.")

NameError: name 'meteoblue_df_cleaned' is not defined

In [None]:
import pandas as pd

# Define target common columns for daily weather data unification
target_daily_columns = [
    'date',
    'avg_temperature_c',
    'max_temperature_c',
    'min_temperature_c',
    'total_precipitation_mm',
    'avg_wind_speed_ms',
    'source'
]

unified_weather_data = []

# --- Prepare Meteoblue DataFrame ---
if meteoblue_df_cleaned is not None:
    print("Preparing Meteoblue for unification...")
    # Use 'current_temperature_c' as 'avg_temperature_c' for daily view
    meteoblue_df_temp = meteoblue_df_cleaned.copy()
    meteoblue_df_temp['avg_temperature_c'] = meteoblue_df_temp['current_temperature_c']
    # Ensure all target columns exist, fill missing with NaN
    for col in target_daily_columns:
        if col not in meteoblue_df_temp.columns:
            meteoblue_df_temp[col] = pd.NA
    unified_weather_data.append(meteoblue_df_temp[target_daily_columns])

# --- Prepare Meteosource Current DataFrame ---
if meteosource_current_df_cleaned is not None:
    print("Preparing Meteosource Current for unification...")
    meteosource_current_temp = meteosource_current_df_cleaned.copy()
    # For current data, max/min temp might not be explicitly available for the full day.
    # Fill them with avg_temperature_c for consistency in daily view, or NaN if preferred.
    meteosource_current_temp['max_temperature_c'] = meteosource_current_temp['avg_temperature_c']
    meteosource_current_temp['min_temperature_c'] = meteosource_current_temp['avg_temperature_c']
    for col in target_daily_columns:
        if col not in meteosource_current_temp.columns:
            meteosource_current_temp[col] = pd.NA
    unified_weather_data.append(meteosource_current_temp[target_daily_columns])

# --- Prepare Meteosource Hourly Aggregated DataFrame ---
if meteosource_hourly_df_cleaned is not None:
    print("Preparing Meteosource Hourly Aggregated for unification...")
    # Already has target columns after aggregation
    for col in target_daily_columns:
        if col not in meteosource_hourly_df_cleaned.columns:
            meteosource_hourly_df_cleaned[col] = pd.NA
    unified_weather_data.append(meteosource_hourly_df_cleaned[target_daily_columns])

# --- Prepare Meteosource Daily DataFrame ---
if meteosource_daily_df_cleaned is not None:
    print("Preparing Meteosource Daily for unification...")
    # Ensure all target columns exist, fill missing with NaN
    for col in target_daily_columns:
        if col not in meteosource_daily_df_cleaned.columns:
            meteosource_daily_df_cleaned[col] = pd.NA
    unified_weather_data.append(meteosource_daily_df_cleaned[target_daily_columns])

# --- Concatenate all prepared DataFrames ---
if unified_weather_data:
    # Filter out any potentially empty DataFrames before concatenation
    non_empty_dfs = [df for df in unified_weather_data if not df.empty]
    if non_empty_dfs:
        print("Concatenating all daily weather dataframes...")
        unified_daily_weather_df = pd.concat(non_empty_dfs, ignore_index=True)
        unified_daily_weather_df['date'] = pd.to_datetime(unified_daily_weather_df['date'])
        print("\nUnified Daily Weather DataFrame created.")
        print("First 10 rows of unified_daily_weather_df:")
        print(unified_daily_weather_df.head(10))
        print("\nDataFrame Info for unified_daily_weather_df:")
        unified_daily_weather_df.info()
    else:
        print("All prepared DataFrames were empty. No data to unify.")
else:
    print("No weather data available for unification.")


print("\n\nSIATA Precipitation Data (kept separate due to different granularity):")
if siata_precipitacion_data_cleaned is not None:
    print(siata_precipitacion_data_cleaned.head())
    print(siata_precipitacion_data_cleaned.info())
else:
    print("SIATA precipitation data is not available.")

NameError: name 'meteoblue_df_cleaned' is not defined

In [None]:
import pandas as pd
import numpy as np

# Define target common columns for daily weather data unification
target_daily_columns = [
    'date',
    'avg_temperature_c',
    'max_temperature_c',
    'min_temperature_c',
    'total_precipitation_mm',
    'avg_wind_speed_ms',
    'source'
]

unified_weather_data = []

# --- Prepare Meteoblue DataFrame ---
if meteoblue_df_cleaned is not None:
    print("Preparing Meteoblue for unification...")
    meteoblue_df_temp = meteoblue_df_cleaned.copy()
    meteoblue_df_temp['avg_temperature_c'] = meteoblue_df_temp['current_temperature_c']

    # Ensure all target columns exist and explicitly set nullable float dtype for numeric columns
    for col in target_daily_columns:
        if col not in meteoblue_df_temp.columns:
            if col in ['avg_temperature_c', 'max_temperature_c', 'min_temperature_c', 'total_precipitation_mm', 'avg_wind_speed_ms']:
                meteoblue_df_temp[col] = pd.Series(dtype=pd.Float64Dtype())
            else:
                meteoblue_df_temp[col] = pd.NA
    # Cast existing numeric columns to nullable float to ensure consistency
    for col in ['avg_temperature_c', 'max_temperature_c', 'min_temperature_c', 'total_precipitation_mm', 'avg_wind_speed_ms']:
        if col in meteoblue_df_temp.columns: # Check if column exists after potential addition
            meteoblue_df_temp[col] = meteoblue_df_temp[col].astype(pd.Float64Dtype())

    unified_weather_data.append(meteoblue_df_temp[target_daily_columns])

# --- Prepare Meteosource Current DataFrame ---
if meteosource_current_df_cleaned is not None:
    print("Preparing Meteosource Current for unification...")
    meteosource_current_temp = meteosource_current_df_cleaned.copy()
    meteosource_current_temp['max_temperature_c'] = meteosource_current_temp['avg_temperature_c']
    meteosource_current_temp['min_temperature_c'] = meteosource_current_temp['avg_temperature_c']

    # Ensure all target columns exist and explicitly set nullable float dtype for numeric columns
    for col in target_daily_columns:
        if col not in meteosource_current_temp.columns:
            if col in ['avg_temperature_c', 'max_temperature_c', 'min_temperature_c', 'total_precipitation_mm', 'avg_wind_speed_ms']:
                meteosource_current_temp[col] = pd.Series(dtype=pd.Float64Dtype())
            else:
                meteosource_current_temp[col] = pd.NA
    # Cast existing numeric columns to nullable float
    for col in ['avg_temperature_c', 'max_temperature_c', 'min_temperature_c', 'total_precipitation_mm', 'avg_wind_speed_ms']:
        if col in meteosource_current_temp.columns: # Check if column exists after potential addition
            meteosource_current_temp[col] = meteosource_current_temp[col].astype(pd.Float64Dtype())

    unified_weather_data.append(meteosource_current_temp[target_daily_columns])

# --- Prepare Meteosource Hourly Aggregated DataFrame ---
if meteosource_hourly_df_cleaned is not None:
    print("Preparing Meteosource Hourly Aggregated for unification...")
    # Ensure all target columns exist and explicitly set nullable float dtype for numeric columns
    for col in target_daily_columns:
        if col not in meteosource_hourly_df_cleaned.columns:
            if col in ['avg_temperature_c', 'max_temperature_c', 'min_temperature_c', 'total_precipitation_mm', 'avg_wind_speed_ms']:
                meteosource_hourly_df_cleaned[col] = pd.Series(dtype=pd.Float64Dtype())
            else:
                meteosource_hourly_df_cleaned[col] = pd.NA
    # Cast existing numeric columns to nullable float
    for col in ['avg_temperature_c', 'max_temperature_c', 'min_temperature_c', 'total_precipitation_mm', 'avg_wind_speed_ms']:
        if col in meteosource_hourly_df_cleaned.columns: # Check if column exists after potential addition
            meteosource_hourly_df_cleaned[col] = meteosource_hourly_df_cleaned[col].astype(pd.Float64Dtype())

    unified_weather_data.append(meteosource_hourly_df_cleaned[target_daily_columns])

# --- Prepare Meteosource Daily DataFrame ---
if meteosource_daily_df_cleaned is not None:
    print("Preparing Meteosource Daily for unification...")
    # Ensure all target columns exist and explicitly set nullable float dtype for numeric columns
    for col in target_daily_columns:
        if col not in meteosource_daily_df_cleaned.columns:
            if col in ['avg_temperature_c', 'max_temperature_c', 'min_temperature_c', 'total_precipitation_mm', 'avg_wind_speed_ms']:
                meteosource_daily_df_cleaned[col] = pd.Series(dtype=pd.Float64Dtype())
            else:
                meteosource_daily_df_cleaned[col] = pd.NA
    # Cast existing numeric columns to nullable float
    for col in ['avg_temperature_c', 'max_temperature_c', 'min_temperature_c', 'total_precipitation_mm', 'avg_wind_speed_ms']:
        if col in meteosource_daily_df_cleaned.columns: # Check if column exists after potential addition
            meteosource_daily_df_cleaned[col] = meteosource_daily_df_cleaned[col].astype(pd.Float64Dtype())

    unified_weather_data.append(meteosource_daily_df_cleaned[target_daily_columns])

# --- Concatenate all prepared DataFrames ---
if unified_weather_data:
    # Filter out any potentially empty DataFrames before concatenation
    non_empty_dfs = [df for df in unified_weather_data if not df.empty]
    if non_empty_dfs:
        print("Concatenating all daily weather dataframes...")
        unified_daily_weather_df = pd.concat(non_empty_dfs, ignore_index=True)
        unified_daily_weather_df['date'] = pd.to_datetime(unified_daily_weather_df['date'])
        print("\nUnified Daily Weather DataFrame created.")
        print("First 10 rows of unified_daily_weather_df:")
        print(unified_daily_weather_df.head(10))
        print("\nDataFrame Info for unified_daily_weather_df:")
        unified_daily_weather_df.info()
    else:
        print("All prepared DataFrames were empty. No data to unify.")
else:
    print("No weather data available for unification.")


print("\n\nSIATA Precipitation Data (kept separate due to different granularity):")
if siata_precipitacion_data_cleaned is not None:
    print(siata_precipitacion_data_cleaned.head())
    print(siata_precipitacion_data_cleaned.info())
else:
    print("SIATA precipitation data is not available.")

In [None]:
import pandas as pd
import os

# Define filenames
UNIFIED_DAILY_WEATHER_FILENAME = "unified_daily_weather_data.csv"
SIATA_PRECIPITATION_FILENAME = "siata_precipitation_data.csv"

# 1. Save unified_daily_weather_df
if 'unified_daily_weather_df' in locals() and unified_daily_weather_df is not None and not unified_daily_weather_df.empty:
    unified_daily_weather_df.to_csv(UNIFIED_DAILY_WEATHER_FILENAME, index=False)
    print(f"Unified daily weather data saved to '{UNIFIED_DAILY_WEATHER_FILENAME}' in the current directory: {os.getcwd()}")
else:
    print("Unified daily weather DataFrame is not available or is empty, skipping save.")

# 2. Save siata_precipitacion_data_cleaned
if 'siata_precipitacion_data_cleaned' in locals() and siata_precipitacion_data_cleaned is not None and not siata_precipitacion_data_cleaned.empty:
    siata_precipitacion_data_cleaned.to_csv(SIATA_PRECIPITATION_FILENAME, index=False)
    print(f"SIATA precipitation data saved to '{SIATA_PRECIPITATION_FILENAME}' in the current directory: {os.getcwd()}")
else:
    print("SIATA precipitation data DataFrame is not available or is empty, skipping save.")

## Obtener Datos de Meteosource

### Subtask:
Desarrollar código Python para obtener datos meteorológicos de Meteosource API para Medellín y su Área Metropolitana. (Necesitaremos la API key de Meteosource para esta implementación).

In [None]:
API_KEY = "Z2AnKNoxLJul08UQ"
print("Meteoblue API key defined.")

**Reasoning**: A continuación, especificaré las coordenadas de Medellín según las instrucciones, las cuales son cruciales para realizar solicitudes a la API de Meteoblue.

In [None]:
LATITUDE = 6.2442
LONGITUDE = -75.5812

print(f"Coordinates for Medellín defined: Latitude={LATITUDE}, Longitude={LONGITUDE}")

**Reasoning**: Una vez definidos la clave API y las coordenadas, debo construir la URL para la llamada a la API de Meteoblue. Esto implica especificar la URL base, los parámetros de la clave API, las coordenadas y las variables meteorológicas específicas que se recuperarán, tal como se indica en las instrucciones.

In [None]:
BASE_URL = "http://my.meteoblue.com/packages/basic-day"

# Define the parameters for the API request
# We'll request temperature, precipitation, wind speed, and relative humidity for now
params = {
    "apikey": API_KEY,
    "lat": LATITUDE,
    "lon": LONGITUDE,
    "asl": 1500, # Approximate altitude for Medellín (meters above sea level)
    "tz": "America/Bogota", # Timezone for Medellín
    "format": "json",
    "temperature": "2m_temp,max_temp,min_temp",
    "precipitation": "total_precip",
    "wind": "wind_speed_10m",
    "relativehumidity": "relative_humidity_2m"
}

print(f"Meteoblue API base URL: {BASE_URL}")
print("API parameters defined.")

**Reasoning**: [texto del enlace](https://)Con la URL base y los parámetros definidos, realizaré una solicitud HTTP GET a la API de Meteoblue para obtener los datos meteorológicos. Esto se corresponde con la instrucción 4: «Realiza una solicitud HTTP GET a la URL de la API de Meteoblue para obtener los datos».

In [None]:
# Make the API request again with corrected parameters
import requests

response = requests.get(BASE_URL, params=params)

# Check if the request was successful
if response.status_code == 200:
    print("Successfully fetched data from Meteoblue API with corrected parameters.")
    meteoblue_raw_data = response.json()
    print(f"Data keys received: {meteoblue_raw_data.keys()}")
else:
    print(f"Error fetching data with corrected parameters: {response.status_code} - {response.text}")
    meteoblue_raw_data = None

**Reasoning**: El paso anterior recuperó y analizó correctamente la respuesta JSON en `meteoblue_raw_data`. Ahora, según la instrucción 6, confirmaré que los datos sin procesar se almacenan como un diccionario y mostraré sus claves y una muestra de los datos para facilitar su posterior manipulación y limpieza.

In [None]:
import json

if meteoblue_raw_data:
    print("Meteoblue raw data successfully obtained and parsed as a dictionary.")
    print(f"Top-level keys in meteoblue_raw_data: {meteoblue_raw_data.keys()}")

    # Display a sample of the 'data_day' key, which typically contains the weather data
    if 'data_day' in meteoblue_raw_data:
        data_day_content = meteoblue_raw_data['data_day']
        if isinstance(data_day_content, list):
            print("\nSample of 'data_day' (first 2 entries):")
            # Safely slice the list; list[:2] works even for lists with fewer than 2 elements
            print(json.dumps(data_day_content[:2], indent=2))
        else:
            print(f"\n'data_day' content is not a list. Type: {type(data_day_content)}. Printing full content for inspection:")
            # Attempt to dump the content even if not a list, for debugging
            print(json.dumps(data_day_content, indent=2))

    if 'metadata' in meteoblue_raw_data:
        print("\nMetadata information:")
        print(json.dumps(meteoblue_raw_data['metadata'], indent=2))
else:
    print("No raw data available from Meteoblue API.")

**Reasoning**: La clave `data_day` contiene un diccionario donde cada clave representa una variable meteorológica y su valor es una lista de observaciones. Para facilitar su manipulación y limpieza, convertiré este diccionario en un DataFrame de Pandas.

In [None]:
import pandas as pd

if meteoblue_raw_data and 'data_day' in meteoblue_raw_data:
    # Convert the 'data_day' dictionary into a Pandas DataFrame
    meteoblue_df = pd.DataFrame(meteoblue_raw_data['data_day'])

    # Convert the 'time' column to datetime objects
    meteoblue_df['time'] = pd.to_datetime(meteoblue_df['time'])

    print("Meteoblue data successfully loaded into a Pandas DataFrame.")
    print("First 5 rows of the DataFrame:")
    print(meteoblue_df.head())
    print("\nDataFrame Info:")
    meteoblue_df.info()
else:
    print("No 'data_day' found in meteoblue_raw_data or data is empty.")