# Notebook for reading CTL files

## RegEx

Regular expressions (RegEx) are a method for identifying strings embedded in character text (THOMPSON, 1968). 
In this project, they were used to help locate the relevant information inside a CTL file.
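
As a minimal sketch of the two RegEx styles that are useful for this kind of file, the example below reads a number that follows a keyword. The sample line is made up for illustration. Python's `re` module only accepts fixed-width lookbehinds, which is why the whitespace count has to be spelled out in the first pattern; the capturing-group variant tolerates any amount of spacing.

```python
import re

# Made-up line in the style of a CTL header, used only for illustration.
line = "vars    30"

# Lookbehind: match the digits only when "vars" plus exactly four
# whitespace characters comes right before them.
lookbehind = re.search(r"(?<=vars\s{4})\d+", line)

# Capturing group: match the keyword and any amount of whitespace,
# then read the digits from group 1.
captured = re.search(r"vars\s+(\d+)", line)

print(lookbehind.group(), captured.group(1))
```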

## CTL

A CTL file can be treated essentially as a plain-text file that describes the metadata of the GRIB files.

In [1]:
import re

# Path to the CTL file
CTL_PATH = r'GPOSNMC20170906122017090612P.inz.TQ0666L064.ctl'
# CTL_PATH = r'GPOSNMC20170906122017090618P.fct.TQ0666L064.ctl'

# Open the file; the handle stays open because the functions below reuse it
ctl = open(CTL_PATH)
content = ctl.read()
print(content)

dset ^GPOSNMC20170906122017090612P.inz.TQ0666L064.grb
*
index ^GPOSNMC20170906122017090612P.inz.TQ0666L064.idx
*
undef -2.56E+33
*
title PRESSURE HISTORY    PTEC AGCM REVIS 1.0 2000  T066664   COLD
*
dtype grib   255
*
options yrev
*
xdef  2000 linear    0.000   0.1800000000
ydef  1000 linear  -89.910   0.1800000000
tdef     1 linear 12Z06SEP2017 6hr
*
zdef    33 levels  1020 1000  975  950  925  900  875  850  825  800
                  775  750  725  700  675  650  600  550  500  450
                  400  350  300  250  200  150  100   70   50   30
                   20   10    3
vars    30
topo  0 132,1,0 ** surface TOPOGRAPHY [m]
lsmk  0  81,1,0 ** surface LAND SEA MASK [0,1]
PSLC    0  135,    1,    0  ** sfc    SURFACE PRESSURE                        (HPA             )
UVES    0  192,    1,    0  ** sfc    SURFACE ZONAL WIND (U)                  (M/S             )
UVEL   33   33,  100,    0  **        ZONAL WIND (U)                          (M/S             )
VVES    0  194,    

## Extracting Information

The function below extracts the x, y and t values from the CTL.

1. x = number of longitude points
2. y = number of latitude points
3. t = number of time steps

In [12]:
def get_xyt(ctl_file):
    ctl_file.seek(0)
    content = ctl_file.read()
    
    # Pattern to get the xyt vars
    pattern = re.compile(r'\d+(?=\slinear)')
    matches = pattern.finditer(content)
    
    # List to store the values
    xyt = []
    
    for m in matches:
        xyt.append(int(m.group()))
    
    return (xyt)

x, y, t = get_xyt(ctl)
print(x, y, t)

(2000, 1000, 1)
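
The pattern above depends on the order in which the `linear` lines appear. As an alternative sketch, the values can be keyed by the `xdef`, `ydef` and `tdef` keywords themselves. The function name `get_dims_by_keyword` and the `SAMPLE_CTL` excerpt are assumptions made for this example; the function takes the CTL text rather than the open file.

```python
import re

# Short excerpt in the same layout as the CTL printed earlier.
SAMPLE_CTL = """\
xdef  2000 linear    0.000   0.1800000000
ydef  1000 linear  -89.910   0.1800000000
tdef     1 linear 12Z06SEP2017 6hr
"""

def get_dims_by_keyword(ctl_text):
    """Return a dict such as {'xdef': n, 'ydef': n, 'tdef': n}."""
    # With re.MULTILINE, ^ anchors each match to the start of a line,
    # so the keyword decides which value is which, not the match order.
    pattern = re.compile(r"^(xdef|ydef|tdef)\s+(\d+)",
                         re.MULTILINE | re.IGNORECASE)
    return {key.lower(): int(count)
            for key, count in pattern.findall(ctl_text)}

dims = get_dims_by_keyword(SAMPLE_CTL)
print(dims["xdef"], dims["ydef"], dims["tdef"])
```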


Return the number of pressure levels on which the meteorological variables are defined.

In [13]:
def get_total_levels(ctl_file):
    ctl_file.seek(0)
    ctl_content = ctl_file.read()
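    # Note: the lookbehind requires exactly four whitespace characters
    # after "zdef", so a CTL with different spacing will not match.
    
    # RegEx pattern
    pattern = re.compile(r"(?<=zdef\s{4})\d+")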
    matches = pattern.search(ctl_content)

    return matches.group()

tot_levels = get_total_levels(ctl)
tot_levels

'33'
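
The `zdef` entry also lists the level values themselves, spread over several lines. The sketch below reads both the declared count and the values, so the two can be compared. The name `get_levels` is hypothetical, and the pattern assumes that the level list contains only numbers and ends at the next line that starts with a keyword, as in the file shown earlier.

```python
import re

# The zdef block copied from the CTL output, followed by the next keyword.
SAMPLE_CTL = """\
zdef    33 levels  1020 1000  975  950  925  900  875  850  825  800
                  775  750  725  700  675  650  600  550  500  450
                  400  350  300  250  200  150  100   70   50   30
                   20   10    3
vars    30
"""

def get_levels(ctl_text):
    """Return (declared_count, list_of_level_values)."""
    # Group 1 is the declared count; group 2 greedily collects digits,
    # dots and whitespace (including newlines) until the next keyword.
    pattern = re.compile(r"^zdef\s+(\d+)\s+levels\s+([\d.\s]+)", re.MULTILINE)
    match = pattern.search(ctl_text)
    if match is None:
        raise ValueError("no 'zdef ... levels' entry found")
    values = [float(v) for v in match.group(2).split()]
    return int(match.group(1)), values

count, levels = get_levels(SAMPLE_CTL)
print(count, len(levels))
```

Comparing `count` with `len(levels)` is a cheap sanity check that the whole list was read.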

Get the total number of meteorological variables contained in the file.

In [14]:
# Get the total number of meteorological variables in the file
def get_total_vars(ctl_file):
    ctl_file.seek(0)
    ctl_content = ctl_file.read()
    
    # RegEx pattern to get the vars.
    pattern = re.compile(r"(?<=vars(\s{4}))\d+")
    match = pattern.search(ctl_content)
    return match.group()

tot = get_total_vars(ctl)
tot

'30'

Get the name of the GRIB file for which the CTL was generated.

In [15]:
# Get the name of the GRIB file that the CTL refers to.
def get_file_name(ctl_file):
    ctl_file.seek(0)
    ctl_content = ctl_file.read()
    pattern = re.compile(r'(?<=dset\s\^)[A-Za-z0-9]+\.?(fct|icn|inz)?\.?[a-zA-Z0-9]*\.(grb|grib2)')
    match = pattern.search(ctl_content)
    return match.group()

dset_grib = get_file_name(ctl)
dset_grib

'GPOSNMC20170906122017090612P.inz.TQ0666L064.grb'

Returns the starting latitude and longitude values, as well as the spacing in degrees between consecutive points.

In [17]:
# Function to get the starting point in lats and lons.
# It also returns the spacing between each point.
def get_space_latlons(ctl_file):
    ctl_file.seek(0)
    ctl_content = ctl_file.read()
    
    # Pattern to get the starting and spacing
    pattern = re.compile(r'[-+]?[0-9]+\.\d{3,10}')
    matches = pattern.finditer(ctl_content)

    keys = ['lon_start', 'lon_dist', 'lat_start', 'lat_dist']

    # The first four matches are the xdef and ydef start/spacing values
    values = [m.group() for m in matches]
    
    # Make a dict with the values
    info = dict(zip(keys, values))

    return info

info_about_coords = get_space_latlons(ctl)
info_about_coords

{'lat_dist': '0.1800000000',
 'lat_start': '-89.910',
 'lon_dist': '0.1800000000',
 'lon_start': '0.000'}
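
The start and spacing values are returned as strings. A possible next step, sketched below, is to convert them to floats and rebuild the full longitude and latitude axes. The names `get_linear_axis` and `build_axis` are hypothetical, and the sketch only handles `linear` definitions like the ones in this file.

```python
import re

# xdef/ydef lines copied from the CTL output.
SAMPLE_CTL = """\
xdef  2000 linear    0.000   0.1800000000
ydef  1000 linear  -89.910   0.1800000000
"""

NUMBER = r"([-+]?\d+(?:\.\d+)?)"

def get_linear_axis(ctl_text, keyword):
    """Return (count, start, step) for a 'linear' xdef or ydef line."""
    pattern = re.compile(
        r"^" + re.escape(keyword) + r"\s+(\d+)\s+linear\s+" + NUMBER + r"\s+" + NUMBER,
        re.MULTILINE,
    )
    match = pattern.search(ctl_text)
    if match is None:
        raise ValueError("no linear %s line found" % keyword)
    return int(match.group(1)), float(match.group(2)), float(match.group(3))

def build_axis(count, start, step):
    """Return the list of coordinates start, start + step, ..."""
    return [start + i * step for i in range(count)]

nx, lon0, dlon = get_linear_axis(SAMPLE_CTL, "xdef")
ny, lat0, dlat = get_linear_axis(SAMPLE_CTL, "ydef")
lons = build_axis(nx, lon0, dlon)
lats = build_axis(ny, lat0, dlat)
print(len(lons), lons[0], lons[-1])
print(len(lats), lats[0], lats[-1])
```

The header of this CTL also contains `options yrev`, which in GrADS indicates that the latitude rows in the data file are stored in reverse order, so the latitude list may need to be reversed before it is paired with raw GRIB values.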

Returns the units of each meteorological variable. It works, but it is not perfect: the regular expression returns trailing whitespace along with the matches.

In [22]:
def get_vars_units(ctl_file):
    ctl_file.seek(0)
    ctl_content = ctl_file.read()
    pattern = re.compile(r'(?<=(\(|\[))[A-Za-z*0-9,\%/-]*\s?[A-Za-z*0-9,\%/-]*(\s+|\])\W')
    matches = pattern.finditer(ctl_content)

    for m in matches:
        print(m.group()[:-2])

get_vars_units(ctl)

m
0,1
HPA            
M/S            
M/S            
M/S            
M/S            
PA/S           
1/S            
M2/S           
M2/S           
GPM            
HPA            
K              
K              
NO DIM         
NO DIM         
KG/KG          
KG/M2          
K              
K              
0-1            
0-1            
0-1            
K              
KG/KG          
g/m**3         
KG/KG          
KG/KG          
%              
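One way to avoid the padding, sketched here under the assumption that the unit is always the last bracketed group on a variable line, is to capture the text inside the brackets and strip it. The name `get_units` is hypothetical and the sample lines are copied from the CTL output.

```python
import re

# Variable lines copied from the CTL output.
SAMPLE_LINES = """\
topo  0 132,1,0 ** surface TOPOGRAPHY [m]
lsmk  0  81,1,0 ** surface LAND SEA MASK [0,1]
PSLC    0  135,    1,    0  ** sfc    SURFACE PRESSURE                        (HPA             )
UVES    0  192,    1,    0  ** sfc    SURFACE ZONAL WIND (U)                  (M/S             )
"""

def get_units(ctl_text):
    """Return the unit of each line that ends with (...) or [...]."""
    # The $ anchor keeps only the last bracketed group of the line, so the
    # "(U)" inside "SURFACE ZONAL WIND (U)" is not mistaken for a unit.
    pattern = re.compile(r"[(\[]([^()\[\]]*)[)\]]\s*$")
    units = []
    for line in ctl_text.splitlines():
        match = pattern.search(line)
        if match:
            units.append(match.group(1).strip())
    return units

units = get_units(SAMPLE_LINES)
print(units)
```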


To obtain the variable names, the fixed columns of the CTL were used. The file was observed to follow the same layout every time, so the required information can be obtained simply by indexing the columns, on the assumption that each field always sits at the same position.

Two variables appear to be standard at the start of every file. For testing purposes, they were treated as standard (the code below appends them to the end of the list).

In [37]:
def get_start_end_vars(ctl_file):
    ctl_file.seek(0)
    start, end = 0, 0 
    for no, line in enumerate(ctl_file):
        if line[:4] == 'vars':
            # Add 1 because it starts at 0
            start = no + 1
        if line[:7] == 'endvars':
            end = no + 1

    return start, end

def get_name_vars(ctl_file, start_line, end_line):
    ctl_file.seek(0)
    var_list = []
    for n, l in enumerate(ctl_file):
        if n > start_line + 1 and n < end_line:
            var = l[38:78]
            var_list.append(var)
    two_vars = get_two_vars(ctl_file)
    for t in two_vars:
        var_list.append(t)
#     print(two_vars)
    return var_list

# Get the first 2 vars in the ctl file
# They appear to be standard and are assumed never to change,
# so they are always included
def get_two_vars(ctl_file):
    ctl_file.seek(0)
    ctl_content = ctl_file.read()
    pattern = re.compile(r'surface\s[A-Z]*\s?[A-Z]*\s?[A-Z]*')
    matches = pattern.finditer(ctl_content)
    var = []
    for m in matches:
        var.append(m.group().strip())
    return var

start, end = get_start_end_vars(ctl)
grib_vars = get_name_vars(ctl, start, end)
for v in grib_vars:
    print(v)

SURFACE PRESSURE                        
SURFACE ZONAL WIND (U)                  
ZONAL WIND (U)                          
SURFACE MERIDIONAL WIND (V)             
MERIDIONAL WIND (V)                     
OMEGA                                   
VORTICITY                               
STREAM FUNCTION                         
VELOCITY POTENTIAL                      
GEOPOTENTIAL HEIGHT                     
SEA LEVEL PRESSURE                      
SURFACE ABSOLUTE TEMPERATURE            
ABSOLUTE TEMPERATURE                    
SURFACE RELATIVE HUMIDITY               
RELATIVE HUMIDITY                       
SPECIFIC HUMIDITY                       
INST. PRECIPITABLE WATER                
SURFACE TEMPERATURE                     
DEEP SOIL TEMPERATURE                   
SOIL WETNESS OF SURFACE                 
SOIL WETNESS OF ROOT ZONE               
SOIL WETNESS OF DRAINAGE ZONE           
TEMPERATURE AT 2-M FROM SURFACE         
SPECIFIC HUMIDITY AT 2-M FROM SURFACE   
PARTIAL OXYGEN D
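
Column indexing breaks as soon as a line uses a different layout, as the first two variables do. A possible alternative, sketched below, splits each variable line at the first `**` separator. The name `parse_var_lines` is hypothetical, and the handling of the lowercase `sfc`/`surface` tag is an assumption based only on the lines visible in this file.

```python
import re

# Excerpt in the layout of the CTL variable block.
SAMPLE_VARS = """\
vars    30
topo  0 132,1,0 ** surface TOPOGRAPHY [m]
lsmk  0  81,1,0 ** surface LAND SEA MASK [0,1]
PSLC    0  135,    1,    0  ** sfc    SURFACE PRESSURE                        (HPA             )
UVEL   33   33,  100,    0  **        ZONAL WIND (U)                          (M/S             )
endvars
"""

UNITS_AT_END = re.compile(r"\s*[(\[][^()\[\]]*[)\]]\s*$")
LEVEL_TAG = re.compile(r"^(?:sfc|surface)\s+")  # lowercase tags only

def parse_var_lines(ctl_text):
    """Return (abbreviation, levels, description) for each variable line."""
    records = []
    inside = False
    for line in ctl_text.splitlines():
        lowered = line.strip().lower()
        if not inside:
            # Skip everything up to and including the "vars N" line.
            inside = lowered.startswith("vars")
            continue
        if lowered.startswith("endvars"):
            break
        if "**" not in line:
            continue
        # partition splits at the FIRST "**", so units such as g/m**3
        # on the right-hand side are left untouched.
        left, _, right = line.partition("**")
        fields = left.split()
        description = UNITS_AT_END.sub("", right).strip()
        description = LEVEL_TAG.sub("", description)
        records.append((fields[0], int(fields[1]), description))
    return records

records = parse_var_lines(SAMPLE_VARS)
for record in records:
    print(record)
```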

## GRIB File

The tests were performed only with files in the GRIB1 format.

In [1]:
import pygrib
GRIB_PATH = r'GPOSNMC20170906122017090612P.inz.TQ0666L064.grb'
grib = pygrib.open(GRIB_PATH)

# Print the GRIB messages
for msg in grib:
    print(msg)

1:Topography:m (instant):regular_ll:surface:level 0:fcst time 0 6 hr periods:from 201709061200
2:Land sea mask:(0 - 1) (instant):regular_ll:surface:level 0:fcst time 0 6 hr periods:from 201709061200
3:Surface pressure:hPa (instant):regular_ll:surface:level 0:fcst time 0 6 hr periods:from 201709061200
4:Surface zonal wind (u):m s**-1 (instant):regular_ll:surface:level 0:fcst time 0 6 hr periods:from 201709061200
5:Zonal wind (u):m s**-1 (instant):regular_ll:isobaricInhPa:level 1020:fcst time 0 6 hr periods:from 201709061200
6:Zonal wind (u):m s**-1 (instant):regular_ll:isobaricInhPa:level 1000:fcst time 0 6 hr periods:from 201709061200
7:Zonal wind (u):m s**-1 (instant):regular_ll:isobaricInhPa:level 975:fcst time 0 6 hr periods:from 201709061200
8:Zonal wind (u):m s**-1 (instant):regular_ll:isobaricInhPa:level 950:fcst time 0 6 hr periods:from 201709061200
9:Zonal wind (u):m s**-1 (instant):regular_ll:isobaricInhPa:level 925:fcst time 0 6 hr periods:from 201709061200
10:Zonal wind (u):

74:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 975:fcst time 0 6 hr periods:from 201709061200
75:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 950:fcst time 0 6 hr periods:from 201709061200
76:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 925:fcst time 0 6 hr periods:from 201709061200
77:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 900:fcst time 0 6 hr periods:from 201709061200
78:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 875:fcst time 0 6 hr periods:from 201709061200
79:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 850:fcst time 0 6 hr periods:from 201709061200
80:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 825:fcst time 0 6 hr periods:from 201709061200
81:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 800:fcst time 0 6 hr periods:from 201709061200
82:Omega:Pa s**-1 (instant):regular_ll:isobaricInhPa:level 775:fcst time 0 6 hr periods:from 201709061200
83:Omega:Pa s**-1 (instant):regular_ll:isobari

153:Stream function:m**2 s**-1 (instant):regular_ll:isobaricInhPa:level 650:fcst time 0 6 hr periods:from 201709061200
154:Stream function:m**2 s**-1 (instant):regular_ll:isobaricInhPa:level 600:fcst time 0 6 hr periods:from 201709061200
155:Stream function:m**2 s**-1 (instant):regular_ll:isobaricInhPa:level 550:fcst time 0 6 hr periods:from 201709061200
156:Stream function:m**2 s**-1 (instant):regular_ll:isobaricInhPa:level 500:fcst time 0 6 hr periods:from 201709061200
157:Stream function:m**2 s**-1 (instant):regular_ll:isobaricInhPa:level 450:fcst time 0 6 hr periods:from 201709061200
158:Stream function:m**2 s**-1 (instant):regular_ll:isobaricInhPa:level 400:fcst time 0 6 hr periods:from 201709061200
159:Stream function:m**2 s**-1 (instant):regular_ll:isobaricInhPa:level 350:fcst time 0 6 hr periods:from 201709061200
160:Stream function:m**2 s**-1 (instant):regular_ll:isobaricInhPa:level 300:fcst time 0 6 hr periods:from 201709061200
161:Stream function:m**2 s**-1 (instant):regular

243:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 925:fcst time 0 6 hr periods:from 201709061200
244:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 900:fcst time 0 6 hr periods:from 201709061200
245:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 875:fcst time 0 6 hr periods:from 201709061200
246:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 850:fcst time 0 6 hr periods:from 201709061200
247:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 825:fcst time 0 6 hr periods:from 201709061200
248:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 800:fcst time 0 6 hr periods:from 201709061200
249:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 775:fcst time 0 6 hr periods:from 201709061200
250:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 750:fcst time 0 6 hr periods:from 201709061200
251:Absolute temperature:K (instant):regular_ll:isobaricInhPa:level 725:fcst tim

315:Specific humidity:kg kg**-1 (instant):regular_ll:isobaricInhPa:level 800:fcst time 0 6 hr periods:from 201709061200
316:Specific humidity:kg kg**-1 (instant):regular_ll:isobaricInhPa:level 775:fcst time 0 6 hr periods:from 201709061200
317:Specific humidity:kg kg**-1 (instant):regular_ll:isobaricInhPa:level 750:fcst time 0 6 hr periods:from 201709061200
318:Specific humidity:kg kg**-1 (instant):regular_ll:isobaricInhPa:level 725:fcst time 0 6 hr periods:from 201709061200
319:Specific humidity:kg kg**-1 (instant):regular_ll:isobaricInhPa:level 700:fcst time 0 6 hr periods:from 201709061200
320:Specific humidity:kg kg**-1 (instant):regular_ll:isobaricInhPa:level 675:fcst time 0 6 hr periods:from 201709061200
321:Specific humidity:kg kg**-1 (instant):regular_ll:isobaricInhPa:level 650:fcst time 0 6 hr periods:from 201709061200
322:Specific humidity:kg kg**-1 (instant):regular_ll:isobaricInhPa:level 600:fcst time 0 6 hr periods:from 201709061200
323:Specific humidity:kg kg**-1 (instant

420:Bare soil latent heat:Ws m**-2 (instant):regular_ll:isobaricInhPa:level 875:fcst time 0 6 hr periods:from 201709061200
421:Bare soil latent heat:Ws m**-2 (instant):regular_ll:isobaricInhPa:level 850:fcst time 0 6 hr periods:from 201709061200
422:Bare soil latent heat:Ws m**-2 (instant):regular_ll:isobaricInhPa:level 825:fcst time 0 6 hr periods:from 201709061200
423:Bare soil latent heat:Ws m**-2 (instant):regular_ll:isobaricInhPa:level 800:fcst time 0 6 hr periods:from 201709061200
424:Bare soil latent heat:Ws m**-2 (instant):regular_ll:isobaricInhPa:level 775:fcst time 0 6 hr periods:from 201709061200
425:Bare soil latent heat:Ws m**-2 (instant):regular_ll:isobaricInhPa:level 750:fcst time 0 6 hr periods:from 201709061200
426:Bare soil latent heat:Ws m**-2 (instant):regular_ll:isobaricInhPa:level 725:fcst time 0 6 hr periods:from 201709061200
427:Bare soil latent heat:Ws m**-2 (instant):regular_ll:isobaricInhPa:level 700:fcst time 0 6 hr periods:from 201709061200
428:Bare soil la
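
The message numbers in this listing can be related back to the CTL. The sketch below assumes that the GRIB messages follow the order of the CTL variable lines, that a variable with `0` levels occupies one message, and that a variable with N levels occupies N consecutive messages. The name `expected_message_ranges` is hypothetical. For the first five variables this gives messages 5 to 37 for `UVEL`, which agrees with the zonal wind messages starting at number 5 above.

```python
# First five variable lines copied from the CTL output.
SAMPLE_VARS = [
    "topo  0 132,1,0 ** surface TOPOGRAPHY [m]",
    "lsmk  0  81,1,0 ** surface LAND SEA MASK [0,1]",
    "PSLC    0  135,    1,    0  ** sfc    SURFACE PRESSURE                        (HPA             )",
    "UVES    0  192,    1,    0  ** sfc    SURFACE ZONAL WIND (U)                  (M/S             )",
    "UVEL   33   33,  100,    0  **        ZONAL WIND (U)                          (M/S             )",
]

def expected_message_ranges(var_lines):
    """Map each abbreviation to the (first, last) message number it would occupy."""
    ranges = {}
    next_message = 1  # the listing above numbers messages from 1
    for line in var_lines:
        fields = line.split()
        name, levels = fields[0], int(fields[1])
        # Assumption: 0 levels means a single (surface) message.
        count = max(levels, 1)
        ranges[name] = (next_message, next_message + count - 1)
        next_message += count
    return ranges

ranges = expected_message_ranges(SAMPLE_VARS)
for name, (first, last) in ranges.items():
    print(name, first, last)
```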

In [3]:
msg = grib[10]
for key in sorted(msg.keys()):
    print(key)

GRIBEX_boustrophedonic
GRIBEditionNumber
Ni
Nj
P1
P2
PLPresent
PVPresent
UseEcmfConventions
WMO
additionalFlagPresent
alternativeRowScanning
analDate
angularPrecision
average
binaryScaleFactor
bitMapIndicator
bitmapPresent
bitsPerValue
bitsPerValueAndRepack
boustrophedonic
centre
centreDescription
centuryOfReferenceTimeOfData
cfName
cfNameECMF
cfVarName
cfVarNameECMF
changeDecimalPrecision
complexPacking
constantFieldHalfByte
dataDate
dataFlag
dataLength
dataRepresentationType
dataTime
day
decimalPrecision
decimalScaleFactor
deleteLocalDefinition
deletePV
distinctLatitudes
distinctLongitudes
earthIsOblate
editionNumber
endStep
eps
generatingProcessIdentifier
getNumberOfValues
globalDomain
gridDefinition
gridDefinitionDescription
gridDefinitionTemplateNumber
gridDescriptionSectionPresent
gridType
halfByte
hideThis
hour
hundred
iDirectionIncrement
iDirectionIncrementInDegrees
iScansNegatively
iScansPositively
ifsParam
ijDirectionIncrementGiven
indicatorOfParameter
indicatorOfTypeOfLevel


### References

THOMPSON, Ken. Programming techniques: Regular expression search algorithm. Communications of the ACM, v. 11, n. 6, p. 419-422, 1968.