# Creation of Preterm Birth (PTB) Municipal Rate Dataset

Source: SINASC (Sistema de Informações sobre Nascidos Vivos), the Brazilian federal government's live birth information system

Author: `Márcio Lopes Jr` 

*Master's student of `Computer Engineering, Intelligent Information Processing` at UFRN-Natal*.

## Libraries

In [None]:
import pandas as pd
from glob import glob
pd.set_option('display.max_columns', None)

## Read and Join State Files

SINASC files are separated by state.

In [None]:
# Read each state file and collect the frames in a list
frames = []
for file in glob("data/2018*.csv"):
    print(file, end='\r')
    frames.append(pd.read_csv(file, encoding='latin', sep=';', low_memory=False))

# DataFrame.append was removed in pandas 2.0; concatenate once instead
df = pd.concat(frames, sort=False)

print(df.shape)

# Drop rows without gestational age (SEMAGESTAC, in weeks)
df = df[df.SEMAGESTAC.notnull()]
df.head()

(2944932, 62)
2944932


Unnamed: 0.1,Unnamed: 0,ORIGEM,CODESTAB,CODMUNNASC,LOCNASC,IDADEMAE,ESTCIVMAE,ESCMAE,CODOCUPMAE,QTDFILVIVO,QTDFILMORT,CODMUNRES,GESTACAO,GRAVIDEZ,PARTO,CONSULTAS,DTNASC,HORANASC,SEXO,APGAR1,APGAR5,RACACOR,PESO,IDANOMAL,DTCADASTRO,CODANOMAL,NUMEROLOTE,VERSAOSIST,DTRECEBIM,DIFDATA,DTRECORIGA,NATURALMAE,CODMUNNATU,CODUFNATU,ESCMAE2010,SERIESCMAE,DTNASCMAE,RACACORMAE,QTDGESTANT,QTDPARTNOR,QTDPARTCES,IDADEPAI,DTULTMENST,SEMAGESTAC,TPMETESTIM,CONSPRENAT,MESPRENAT,TPAPRESENT,STTRABPART,STCESPARTO,TPNASCASSI,TPFUNCRESP,TPDOCRESP,DTDECLARAC,ESCMAEAGR1,STDNEPIDEM,STDNNOVA,CODPAISRES,TPROBSON,PARIDADE,KOTELCHUCK,CONTADOR
0,1,1,2516381.0,110004,1,28.0,2.0,4.0,999994.0,1.0,0.0,120050,5.0,1.0,2.0,4.0,5032018,842.0,2,9.0,10.0,4.0,3050.0,2.0,16032018,,20180004.0,3.2.01,12042018.0,38,,811.0,110004.0,11,3.0,3.0,20041989.0,4.0,1.0,0.0,1.0,35.0,1062017.0,39.0,8.0,8.0,2.0,1.0,2.0,1.0,1.0,2.0,3.0,5032018.0,6.0,0.0,1,1.0,5,1,5,1623
1,2,1,3152928.0,110012,1,39.0,2.0,5.0,223232.0,1.0,0.0,120040,5.0,1.0,1.0,4.0,11012018,1657.0,1,9.0,10.0,2.0,3440.0,2.0,19022018,,20180005.0,3.2.01,5032018.0,53,,812.0,120040.0,12,5.0,,11011979.0,2.0,1.0,1.0,0.0,32.0,20042017.0,37.0,8.0,9.0,2.0,1.0,2.0,3.0,1.0,2.0,4.0,,8.0,0.0,1,1.0,3,1,5,3807
2,3,1,5618347.0,110020,1,33.0,2.0,4.0,521110.0,1.0,1.0,120001,5.0,1.0,2.0,4.0,8022018,1415.0,1,9.0,10.0,4.0,2920.0,2.0,16032018,,20180004.0,3.2.01,10042018.0,61,,811.0,110015.0,11,3.0,,26071984.0,4.0,2.0,0.0,1.0,,10052017.0,38.0,8.0,8.0,3.0,1.0,2.0,2.0,1.0,2.0,4.0,9022018.0,12.0,0.0,1,1.0,5,1,5,7192
3,4,1,5618347.0,110020,1,35.0,2.0,2.0,848305.0,2.0,0.0,120001,5.0,1.0,2.0,3.0,9022018,1850.0,1,9.0,9.0,4.0,3020.0,2.0,16032018,,20180004.0,3.2.01,10042018.0,60,,811.0,110008.0,11,1.0,,9101982.0,4.0,2.0,1.0,1.0,,5052017.0,39.0,8.0,6.0,4.0,1.0,2.0,2.0,1.0,2.0,4.0,9022018.0,10.0,0.0,1,1.0,5,1,2,7194
4,5,1,5618347.0,110020,1,21.0,2.0,3.0,,2.0,0.0,120001,5.0,1.0,1.0,3.0,2032018,336.0,2,8.0,9.0,1.0,3785.0,2.0,3042018,,20180004.0,3.2.01,10042018.0,39,,812.0,120001.0,12,2.0,,30051996.0,1.0,2.0,2.0,0.0,,5062017.0,38.0,8.0,6.0,4.0,2.0,2.0,3.0,1.0,2.0,3.0,3032018.0,11.0,0.0,1,1.0,7,1,2,7328
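Only two of these columns, `CODMUNRES` and `SEMAGESTAC`, are used in the rest of the notebook. As a minimal sketch, assuming memory is a concern, `read_csv` can be asked to load just those columns with `usecols`. The inline CSV text below is invented for illustration and stands in for one state file.

```python
import io
import pandas as pd

# Invented sample standing in for one SINASC state file (';' separated)
raw = "CODMUNRES;SEMAGESTAC;PESO\n120050;39;3050\n120040;;3440\n"

# Load only the columns needed downstream
cols = ['CODMUNRES', 'SEMAGESTAC']
small = pd.read_csv(io.StringIO(raw), sep=';', usecols=cols)

# Same filter as above: keep rows with a known gestational age
small = small[small['SEMAGESTAC'].notnull()]
print(small)
```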


## IBGE Data

SINASC and IBGE identify municipalities with different codes, so an intermediary dataset, `code_translator`, maps one code to the other and is used to join the two datasets.

In [None]:
pop = pd.read_csv("data/pop_ibge.csv", sep='\t', encoding='utf8', thousands=',', dtype={'COD. MUNIC':'str'})
code_translator = pd.read_csv("data/cid_ibge_cod.tsv", sep='\t', encoding='utf8')

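# Build the IBGE code from state code + municipal code (kept as a string, so any leading zeros survive)
pop['MUNCODDV'] = (pop['COD. UF'].astype('str') + pop['COD. MUNIC'].astype('str')).astype('int64')
# Attach the SINASC code and name; positions 4-7 are expected to hold population, MUNCODDV, MUNCOD, MUNNOMEX
pop = pop.merge(code_translator[['MUNCOD', 'MUNCODDV', 'MUNNOMEX']], on='MUNCODDV').iloc[:,4:8]
pop.columns = ['population', 'cd_ibge', 'CODMUNRES', 'name']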
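The code translation can be illustrated on toy data. This is a sketch under stated assumptions: the values are invented, `pop_demo` and `translator_demo` are hypothetical names, and the translator is assumed to map a seven-digit code (`MUNCODDV`) to the six-digit code that appears in `CODMUNRES`. A left merge with `validate='one_to_one'` makes unmatched or duplicated codes visible rather than silently dropping them.

```python
import pandas as pd

# Toy population table; column names mirror the ones used above, values are invented
pop_demo = pd.DataFrame({
    'COD. UF': [11, 12],
    'COD. MUNIC': ['00015', '00013'],  # strings, so leading zeros are kept
    'population': [1000, 2000],
})

# Toy translator (assumption: seven-digit MUNCODDV -> six-digit MUNCOD)
translator_demo = pd.DataFrame({
    'MUNCODDV': [1100015, 1200013],
    'MUNCOD': [110001, 120001],
})

# Concatenate state code and municipal code into one integer key
pop_demo['MUNCODDV'] = (pop_demo['COD. UF'].astype(str) + pop_demo['COD. MUNIC']).astype('int64')

# Left merge keeps every population row; validate raises if a key is duplicated
merged = pop_demo.merge(translator_demo, on='MUNCODDV', how='left', validate='one_to_one')
unmatched = merged[merged['MUNCOD'].isna()]
print(merged)
print("unmatched rows:", len(unmatched))
```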

## Calculate PTB Rate

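Flag each birth by gestational age at birth. The categories are cumulative, so an extremely preterm birth is also counted as early preterm and as preterm: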

| Type | Period |
|---|---|
| *Preterm*  | < 37 weeks |
| *Early Preterm* | < 33 weeks |
| *Extremely Preterm* | < 29 weeks |

In [None]:
# Vectorized comparisons; 1 if the birth falls in the category, 0 otherwise
df['extremely_preterm'] = (df.SEMAGESTAC < 29).astype('int64')
df['early_preterm']     = (df.SEMAGESTAC < 33).astype('int64')
df['preterm']           = (df.SEMAGESTAC < 37).astype('int64')

# Count births in each category per municipality of residence
df = df.groupby(['CODMUNRES'], as_index=False)[['preterm', 'early_preterm', 'extremely_preterm']].sum()
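A small worked example may help confirm that the thresholds are cumulative. The four births below are invented, and `births`, `thresholds` and `counts` are hypothetical names used only for this sketch.

```python
import pandas as pd

# Four invented births in two municipalities
births = pd.DataFrame({
    'CODMUNRES': [120001, 120001, 120001, 110004],
    'SEMAGESTAC': [39.0, 36.0, 28.0, 32.0],
})

# Same cut-offs as the table above, in weeks
thresholds = {'preterm': 37, 'early_preterm': 33, 'extremely_preterm': 29}
for name, weeks in thresholds.items():
    births[name] = (births['SEMAGESTAC'] < weeks).astype('int64')

# Sum the 0/1 flags per municipality to get counts
counts = births.groupby('CODMUNRES', as_index=False)[list(thresholds)].sum()
print(counts)
```

In this toy data the birth at 28 weeks is counted in all three columns, while the birth at 36 weeks is counted only as preterm.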

Join `df` and `pop`, then calculate the PTB Municipal Rate (PMR) as the number of preterm births divided by the municipal population.

In [None]:
df = df.merge(pop, on='CODMUNRES')

df['ptb_rate'] = df.preterm / df.population
df['early_ptb_rate'] = df.early_preterm / df.population
df['extr_ptb_rate'] = df.extremely_preterm / df.population
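Because `merge` defaults to an inner join, municipalities that have births but no matching population row are dropped without warning. The sketch below, on invented data with hypothetical names (`counts_demo`, `pop_demo`), shows one way to list them with `indicator=True` before computing the rate.

```python
import pandas as pd

# Invented counts and population; 999999 has no population row
counts_demo = pd.DataFrame({'CODMUNRES': [110004, 120001, 999999], 'preterm': [1, 2, 3]})
pop_demo = pd.DataFrame({'CODMUNRES': [110004, 120001], 'population': [1000, 4000]})

# A left merge with an indicator column records where each row came from
checked = counts_demo.merge(pop_demo, on='CODMUNRES', how='left', indicator=True)
dropped = checked.loc[checked['_merge'] == 'left_only', 'CODMUNRES'].tolist()
print("codes without population:", dropped)

# Keep matched rows only, then compute the rate as in the cell above
rates = checked[checked['_merge'] == 'both'].copy()
rates['ptb_rate'] = rates['preterm'] / rates['population']
print(rates[['CODMUNRES', 'ptb_rate']])
```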

Save the rates, keyed by IBGE municipal code

In [None]:
df[['cd_ibge', 'ptb_rate', 'early_ptb_rate', 'extr_ptb_rate']].to_csv("ptb_by_municipality.csv", index=False)