# Personal project for storing useful EDA techniques

## Dataset Download

All data is published by ANAC (Brazil's National Civil Aviation Agency) and is available for public download on the Brazilian Transparency Portal.

### Dataset Lists

root: https://sistemas.anac.gov.br/dadosabertos/

- Registro Aeronáutico Brasileiro (RAB)

    Metadata: https://www.gov.br/anac/pt-br/acesso-a-informacao/dados-abertos/areas-de-atuacao/aeronaves-1/registro-aeronautico-brasileiro-historico/metadados-registro-aeronautico-brasileiro-historico
    
    Dataset: https://sistemas.anac.gov.br/dadosabertos/Aeronaves/RAB/Historico_RAB/

- Ocorrências Aeronáuticas
    
    Metadata: https://www.gov.br/anac/pt-br/acesso-a-informacao/dados-abertos/areas-de-atuacao/seguranca-operacional/ocorrencias-aeronauticas/metadados-do-conjunto-de-dados-ocorrencias-aeronauticas

    Dataset: https://sistemas.anac.gov.br/dadosabertos/Seguranca%20Operacional/Ocorrencia/V_OCORRENCIA_AMPLA.csv

- Percentuais de atrasos e cancelamentos

    Metadata: https://www.gov.br/anac/pt-br/acesso-a-informacao/dados-abertos/areas-de-atuacao/voos-e-operacoes-aereas/percentuais-de-atrasos-e-cancelamentos/50-percentuais-de-atrasos-e-cancelamentos

    Dataset: https://sistemas.anac.gov.br/dadosabertos/Voos%20e%20opera%C3%A7%C3%B5es%20a%C3%A9reas/Percentuais%20de%20atrasos%20e%20cancelamentos/

- Processos Administrativos Relacionados a Aeronaves

    Metadata: https://www.gov.br/anac/pt-br/acesso-a-informacao/dados-abertos/areas-de-atuacao/aeronaves-1/processos-administrativos-relacionados-a-aeronaves/metadados-aeronaves-processos-administrativos-relacionados-a-aeronaves

    Dataset: https://sistemas.anac.gov.br/dadosabertos/Aeronaves/ProcessosAdministrativosRelacionadosaAeronaves/

- Voo Regular Ativo (VRA)

    Metadata: https://www.gov.br/anac/pt-br/acesso-a-informacao/dados-abertos/areas-de-atuacao/voos-e-operacoes-aereas/voo-regular-ativo-vra/62-voo-regular-ativo-vra

    Dataset: https://siros.anac.gov.br/siros/registros/diversos/vra/

- Recomendações de Segurança Aeronáutica

    Metadata: https://www.gov.br/anac/pt-br/acesso-a-informacao/dados-abertos/areas-de-atuacao/seguranca-operacional/recomendacoes-de-seguranca-aeronautica/metadados-seguranca-operacional-recomendacoes-de-seguranca-aeronautica

    Dataset: https://sistemas.anac.gov.br/dadosabertos/Seguranca%20Operacional/Recomenda%C3%A7%C3%A3o%20de%20Seguran%C3%A7a/
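
Several of the dataset URLs above are percent-encoded (for example `%C3%A7` for `ç`). The following is a minimal sketch, using only the standard library, of how they could be decoded into readable folder names. The `DATASETS` dictionary and the `readable_path` helper are illustrative names and are not part of the project.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical mapping built from two of the dataset URLs listed above
DATASETS = {
    "Ocorrências Aeronáuticas": "https://sistemas.anac.gov.br/dadosabertos/Seguranca%20Operacional/Ocorrencia/V_OCORRENCIA_AMPLA.csv",
    "Recomendações de Segurança Aeronáutica": "https://sistemas.anac.gov.br/dadosabertos/Seguranca%20Operacional/Recomenda%C3%A7%C3%A3o%20de%20Seguran%C3%A7a/",
}


def readable_path(url):
    """Return the path component of the URL with percent-escapes decoded as UTF-8."""
    return unquote(urlsplit(url).path)


for name, url in DATASETS.items():
    print(name, "->", readable_path(url))
```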

## Boeing 737 MAX

### 2010s

October 29, 2018: Lion Air Flight 610, a 737 MAX 8 (registration PK-LQP), on a flight from Jakarta, Indonesia to Pangkal Pinang, Indonesia, crashed into the sea 13 minutes after takeoff with 189 people on board: 181 passengers (178 adults and three children), six cabin crew and two pilots. All on board died. This is the deadliest air accident involving any variant of the Boeing 737 and also the first accident involving the Boeing 737 MAX.[168][169][170]


March 10, 2019: Ethiopian Airlines Flight 302, a 737 MAX 8 (registration ET-AVJ), on a flight from Addis Ababa Bole International Airport, Ethiopia to Jomo Kenyatta International Airport in Nairobi, Kenya, crashed six minutes after takeoff; all 157 people aboard (149 passengers and 8 crew members) died. The plane was only four months old at the time of the accident.[171] In response, numerous aviation authorities around the world grounded the 737 MAX series, and many airlines followed suit on a voluntary basis. On March 13, 2019, the FAA became the last authority to ground the aircraft, reversing its previous stance that the MAX was safe to fly.[172]


### 2020s

December 4, 2023: Ryanair flight FR1269, a 737 MAX 200 landing at Stansted Airport, dropped more than 2,000 ft in only 17 seconds. The airline is reportedly cooperating with the AAIB. This serious incident is still under investigation.[173]


January 5, 2024: Alaska Airlines Flight 1282, a 737 MAX 9 (registration N704AL), on a flight from Portland, Oregon to Ontario, California, experienced an explosive decompression shortly after takeoff, following the loss of an incorrectly installed door plug. The aircraft returned to Portland and landed. Some on board sustained minor injuries, but there were no deaths. The type was subsequently grounded for nearly a month.[174]


May 25, 2024: Southwest Airlines Flight 746 experienced "an uncontrolled side to side yawing motion", known as a Dutch roll, at an altitude of 32,000 feet less than an hour after departing from Phoenix. The pilots were able to land the plane safely in Oakland. FAA investigations showed that a power control unit (PCU), which provides backup power for the tail rudder, was damaged.[175]


June 23, 2024: Korean Air Flight 189, a 737 MAX 8 departing from Incheon International Airport (RKSI) bound for Taiwan, experienced a fault with the cabin pressurization system roughly 30 minutes into the flight. At the time, the aircraft was flying over South Korea’s southern Jeju Island, per Yonhap. The pilots were forced to turn back, descending 25,000 ft in five minutes, and 15 passengers reported hyperventilation and eardrum pain. A Korean Air spokesperson added that the aircraft was just under 5 years old and was delivered to Korean Air in July 2022.[176] Investigations into this incident are ongoing.

In [1]:
from bs4 import BeautifulSoup
import requests
import time
from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool
import os
import re
from pathlib import Path

In [41]:
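def download_url(args):
    """Download one (url, filename) pair and return a 2-tuple describing the outcome."""
    t0 = time.time()
    url, fn = args[0], args[1]
    try:
        # Check for an existing file before touching the network
        if Path(fn).exists():
            return (f"File {fn} already exists", "skipping download")
        # Create the parent directory of the target file if it is missing
        Path(fn).parent.mkdir(parents=True, exist_ok=True)
        # The 60 s timeout is an arbitrary choice so a stalled request cannot hang a worker
        r = requests.get(url, timeout=60)
        # Treat HTTP errors (404, 500, ...) as failures instead of saving the error page
        r.raise_for_status()
        with open(fn, 'wb') as f:
            f.write(r.content)
        return (url, time.time() - t0)
    except Exception as e:
        return (f'Exception in download_url({url}): {e}', fn)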

In [42]:
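def download_parallel(args):
    # Leave one core free, but always use at least one worker thread
    workers = max(1, cpu_count() - 1)
    with ThreadPool(workers) as pool:
        for result in pool.imap_unordered(download_url, args):
            if result[1] == "skipping download":
                continue
            if isinstance(result[1], float):
                # Successful download: (url, elapsed seconds)
                print('url:', result[0], 'time (s):', result[1])
            else:
                # Failed download: (error message, filename)
                print(result[0])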
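
`imap_unordered` yields each result as soon as a worker finishes, so the printed order generally differs from the input order. The following minimal sketch uses a stand-in function instead of a real download; `fake_download` and the sample pairs are hypothetical.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_download(args):
    """Stand-in for download_url: sleeps briefly instead of fetching anything."""
    url, fn = args
    t0 = time.time()
    time.sleep(0.01)
    return (url, time.time() - t0)


# Made-up (url, filename) pairs; nothing is requested from the network
pairs = [(f"https://example.invalid/{i}.csv", f"{i}.csv") for i in range(5)]

with ThreadPool(2) as pool:
    results = list(pool.imap_unordered(fake_download, pairs))

# Every input is processed exactly once, but not necessarily in input order
print(sorted(url for url, _ in results))
```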

In [4]:
url_base = "https://sistemas.anac.gov.br/dadosabertos"

In [5]:
response  = requests.get(url_base)

In [6]:
local_fs_path = "E:\\projects_datasets\\ANAC\\"

In [7]:
try:
    os.makedirs(local_fs_path)
except FileExistsError:
    print("dir already exists")

dir already exists


In [8]:
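# Root: collect file links (hrefs) and sub-folder links (urls) from the listing page
hrefs, urls = [], []
if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    for link in soup.find_all('a'):
        href = link.get('href')
        if not href:
            continue  # anchor without an href attribute
        if href[-4:] in [".csv", "json", ".txt"]:
            # url_base has no trailing slash, so add one before the file name
            hrefs.append(url_base + "/" + href)
        elif len(href) > 1 and href.endswith("/") and not href.endswith("./"):
            # Folder link; "./" and "../" are skipped
            urls.append(url_base + "/" + href)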
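
The string concatenation above depends on whether each base ends with a slash. As an alternative sketch (not what the notebook does), `urllib.parse.urljoin` resolves relative links the way a browser would, provided the base URL ends with `/`. `classify_link` and `FILE_SUFFIXES` are hypothetical names, and the suffix test is slightly stricter than the one in the cell above.

```python
from urllib.parse import urljoin

FILE_SUFFIXES = (".csv", ".json", ".txt")


def classify_link(base, href):
    """Return ("file" | "dir" | "skip", absolute_url) for one href found at base."""
    if not href or href.startswith(("?", "#")) or href in ("/", "./", "../"):
        return ("skip", None)
    absolute = urljoin(base, href)
    if href.lower().endswith(FILE_SUFFIXES):
        return ("file", absolute)
    if href.endswith("/"):
        return ("dir", absolute)
    return ("skip", None)


base = "https://sistemas.anac.gov.br/dadosabertos/"  # note the trailing slash
print(classify_link(base, "Aerodromos/"))
print(classify_link(base, "../"))
# Without the trailing slash, urljoin replaces the last path segment:
print(urljoin("https://sistemas.anac.gov.br/dadosabertos", "Aerodromos/"))
```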

In [9]:
# Sub-folders: walk the listing breadth-first until no folder links remain
while len(urls) > 0:
    print(f"Number of links: {len(urls)}")
    url = urls.pop(0)
    print(f"Current link: {url}")
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        for link in soup.find_all('a'):
            href = link.get('href')
            if not href:
                continue  # anchor without an href attribute
            if href[-4:] in [".csv", "json", ".txt"]:
                hrefs.append(url + href)
            elif len(href) > 1 and href.endswith("/") and not href.endswith("./"):
                # url already ends with "/", so no extra separator is needed
                urls.append(url + href)


Number of links: 14
Current link: https://sistemas.anac.gov.br/dadosabertos/Aerodromos/
Number of links: 16
Current link: https://sistemas.anac.gov.br/dadosabertos/Aeronaves/
Number of links: 22
Current link: https://sistemas.anac.gov.br/dadosabertos/Certifica%C3%A7%C3%A3o%20e%20Outorga/
Number of links: 28
Current link: https://sistemas.anac.gov.br/dadosabertos/Gestao%20Interna/
Number of links: 38
Current link: https://sistemas.anac.gov.br/dadosabertos/Operador%20Aeroportu%C3%A1rio/
Number of links: 39
Current link: https://sistemas.anac.gov.br/dadosabertos/Operador%20A%C3%A9reo/
Number of links: 41
Current link: https://sistemas.anac.gov.br/dadosabertos/Organiza%C3%A7%C3%B5es%20de%20Forma%C3%A7%C3%A3o/
Number of links: 44
Current link: https://sistemas.anac.gov.br/dadosabertos/Organiza%C3%A7%C3%B5es%20de%20Manuten%C3%A7%C3%A3o/
Number of links: 44
Current link: https://sistemas.anac.gov.br/dadosabertos/Pessoal%20da%20Avia%C3%A7%C3%A3o%20Civil/
Number of links: 48
Current link: https://sistema

In [10]:
len(hrefs)

7828

In [11]:
# Keep only the CSV links and derive a local Windows path for each one
hfs, lfs = [], []
with open('./hrefs.txt', 'w') as f:
    for item in hrefs:
        if item[-4:] == ".csv":
            f.write(f"{item}\n")
            hfs.append(item)
            # Path relative to the root, with each percent-encoded byte replaced by "_"
            rel = re.sub(r"%[a-zA-Z0-9]{2}", "_", item.replace(url_base + "/", ""))
            lfs.append(local_fs_path + rel.replace("//", "\\").replace("/", "\\"))
with open('./fs.txt', 'w') as f:
    for item in lfs:
        f.write(f"{item}\n")
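
The cell above replaces each percent-encoded byte with `_`, so a two-byte sequence such as `%C3%A7` becomes `__`. An alternative sketch, assuming decoded folder names are acceptable on the target file system, builds the path with `urllib.parse.unquote` and `pathlib.PureWindowsPath`. `local_path_for` is a hypothetical helper, and it produces different names from the ones written to `fs.txt`.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote


def local_path_for(url, url_base, local_root):
    """Map a file URL under url_base to a Windows-style path under local_root."""
    if not url.startswith(url_base):
        raise ValueError(f"{url} is not under {url_base}")
    relative = url[len(url_base):]
    # Drop empty segments so an accidental "//" in the URL does not matter
    parts = [unquote(p) for p in relative.split("/") if p]
    return str(PureWindowsPath(local_root, *parts))


print(local_path_for(
    "https://sistemas.anac.gov.br/dadosabertos/Seguranca%20Operacional/Ocorrencia/V_OCORRENCIA_AMPLA.csv",
    "https://sistemas.anac.gov.br/dadosabertos",
    "E:\\projects_datasets\\ANAC\\",
))
```

`PureWindowsPath` only manipulates strings, so the sketch behaves the same on any operating system.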

In [44]:
# import from files if already run
if Path('./fs.txt').exists():
    lfs = []
    with open('./fs.txt', 'r') as f:
        for item in f:
            lfs.append(item.replace("\n",""))
if Path('./hrefs.txt').exists():
    hfs = []
    with open('./hrefs.txt', 'r') as f:
        for item in f:
            hfs.append(item.replace("\n","").replace("//","/").replace(":/","://"))

In [45]:
inputs = zip(hfs, lfs)
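
`zip` returns a one-shot iterator, so `inputs` is empty once `download_parallel` has consumed it, and re-running the download cell without re-running this one does nothing. The small sketch below, with made-up values, shows that behaviour next to a materialised list that can be reused.

```python
# Made-up stand-ins for hfs and lfs
hfs_demo = ["https://example.invalid/a.csv", "https://example.invalid/b.csv"]
lfs_demo = ["a.csv", "b.csv"]

lazy = zip(hfs_demo, lfs_demo)
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # nothing left to iterate over

reusable = list(zip(hfs_demo, lfs_demo))  # a list can be iterated many times

print(len(first_pass), len(second_pass), len(reusable))  # prints: 2 0 2
```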

In [46]:
len(lfs)

3780

In [None]:
download_parallel(inputs)

In [28]:
print('Finito')

Finito
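
After the run, it can be useful to check which target files are still missing, for example because a request failed. A minimal sketch, assuming the `(url, local path)` pairs are kept in a list; `missing_downloads` is a hypothetical helper that is not part of the notebook.

```python
from pathlib import Path


def missing_downloads(pairs):
    """Return the (url, path) pairs whose local file is absent or empty."""
    missing = []
    for url, path in pairs:
        p = Path(path)
        if not p.is_file() or p.stat().st_size == 0:
            missing.append((url, path))
    return missing
```

With the notebook's variables this could be called as `missing_downloads(list(zip(hfs, lfs)))`, and the result passed back to `download_parallel` for another attempt.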
