# Data Extraction and Cleaning - Brazilian Cities Weather Data

This notebook imports, processes, and cleans the data that will be used in the next step. To do that, we first need to understand how the Brazilian government makes these data available.

---

The data are provided by the 'Instituto Nacional de Meteorologia' (INMET), which literally means National Institute of Meteorology. The data are open and can be found on the institute's website, with hourly records available from 2000 onward. They come as Excel sheets, which can be downloaded as one zip file per year.

Data can be retrieved at this link: [INMET](https://portal.inmet.gov.br/dadoshistoricos)

In [8]:
import os
import shutil
import urllib.request  # the request submodule must be imported explicitly
import zipfile

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns

# Setting Seaborn Theme
sns.set_theme(style="whitegrid", font_scale=1.4)

plt.rcParams["figure.figsize"] = (12,7)

In [2]:
# Snapshot of the files present before anything is downloaded (used for cleanup later)
initial_files = os.listdir()

For this project, we select the Brazilian city of Brasília, the capital of Brazil.

Initially, we use the years 2011 through 2020. This range is arbitrary.

In [3]:
# Setting chosen data
city = 'brasilia'
start_year = 2011
end_year = 2020

# Creating a list of years for iteration
years = np.arange(start_year,end_year+1,1)

# List with strings(path) for the downloaded files
files_path = []

Instead of downloading each zip file manually, we download them programmatically with Python, one file per year.

In [4]:
# Getting Year by Year zip files
for year in years:

    # Building the URL of this year's zip file with an f-string
    zip_url = f"https://portal.inmet.gov.br/uploads/dadoshistoricos/{year}.zip"

    # Getting original zip file name
    zip_file = zip_url.split('/')[-1]

    # Downloading File
    urllib.request.urlretrieve(zip_url, zip_file)

    # Reading Zip File and getting only the file of the selected city
    with zipfile.ZipFile(zip_file, "r") as f:
        for name in f.namelist():
            if city in name.lower():
                f.extract(name, path=None, pwd=None)
                files_path.append(name)
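
The selection step in the loop above can be tried offline. The sketch below is only an illustration: `extract_city_files` is a hypothetical helper, not part of the notebook, and the archive member names are invented placeholders rather than real INMET file names. It shows the same case-insensitive substring match on the member name, run against a tiny archive built on the spot.

```python
import os
import tempfile
import zipfile


def extract_city_files(zip_path, city, dest_dir):
    """Extract every member whose name contains `city` (case-insensitive).

    Hypothetical helper that mirrors the loop in the cell above.
    """
    extracted = []
    with zipfile.ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            # Skip pure directory entries, which end with a slash
            if name.endswith("/"):
                continue
            if city.lower() in name.lower():
                archive.extract(name, path=dest_dir)
                extracted.append(name)
    return extracted


# Build a small stand-in archive; the member names are invented for this demo
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("2011/station_BRASILIA.csv", "placeholder")
    archive.writestr("2011/station_MANAUS.csv", "placeholder")

demo_files = extract_city_files(demo_zip, "brasilia", demo_dir)
print(demo_files)  # ['2011/station_BRASILIA.csv']
```

Because the match is a plain substring test, it would also pick up any other member whose name happens to contain the city string, so it is worth checking `files_path` after the download.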

In [9]:
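# Removing every file and folder created since the initial snapshot
# (downloaded zip files and extracted folders)
files_list = os.listdir()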

for name in files_list:

    if name not in initial_files:

        if os.path.isdir(name):
            shutil.rmtree(name, ignore_errors=True)
        
        elif os.path.isfile(name):
            os.remove(name)
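
The cleanup logic above can be checked safely in a throwaway directory before running it on the working folder. This is a minimal sketch under stated assumptions: `remove_new_entries` is a hypothetical function, and the file names are placeholders. It deletes everything that was not in the initial snapshot and reports what it removed.

```python
import os
import shutil
import tempfile


def remove_new_entries(directory, initial_entries):
    """Delete entries of `directory` that are absent from `initial_entries`.

    Hypothetical helper; returns the sorted names that were removed.
    """
    removed = []
    for name in os.listdir(directory):
        if name in initial_entries:
            continue
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)
        elif os.path.isfile(path):
            os.remove(path)
        removed.append(name)
    return sorted(removed)


# Demo in a temporary directory so that no real file is touched
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "keep.txt"), "w") as handle:
    handle.write("present before the download")
snapshot = set(os.listdir(workdir))

# Simulate a downloaded archive and an extracted folder
with open(os.path.join(workdir, "2011.zip"), "w") as handle:
    handle.write("placeholder")
os.mkdir(os.path.join(workdir, "2011"))

removed = remove_new_entries(workdir, snapshot)
print(removed)              # ['2011', '2011.zip']
print(os.listdir(workdir))  # ['keep.txt']
```

Working with full paths built from `directory`, instead of relying on the current working directory, keeps the function usable from any location.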