
# Tera Project - School Dropout

**Problem**

Predict school dropout and identify its main drivers, so that school administrators can act proactively on policies to combat it.


**Hypotheses**

1) The main dropout variables are related to family income.

2) The transport variable is directly related to dropout.

3) Teenage motherhood and fatherhood are among the main dropout variables.

4) A shortage of teachers is related to dropout.

## Loading libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Importing the datasets

In [2]:
# Forward slashes avoid accidental backslash escapes and also work outside Windows
ideb = pd.read_csv('./BDs/ideb_escola.csv')
inep = pd.read_csv('./BDs/inep.csv')

In [3]:
# Forward slashes avoid accidental backslash escapes and also work outside Windows
censo = pd.read_csv('./BDs/censo_escolar.csv')

# Analyzing the Ideb DataFrame

**Legend**

*ano* - Year

*sigla_uf* - State abbreviation (Unidade da Federação)

*id_municipio* - Municipality ID (IBGE, 7 digits)

*id_escola* - School ID

*rede* - School network

*ensino* - Level of education (`fundamental` or `medio`)

*anos_escolares* - School years: `iniciais (1-5)`, `finais (6-9)`, or `todos (1-4)`, the last one being high school (Ensino Médio)

*taxa_aprovacao* - Pass rate (0 to 100)

*indicador_rendimento* - Performance indicator (P)

*nota_saeb_matematica* - SAEB score, Mathematics

*nota_saeb_lingua_portuguesa* - SAEB score, Portuguese

*nota_saeb_media_padronizada* - SAEB standardized mean score (N)

*ideb* - IDEB (standardized mean (N) × performance indicator (P))

*projecao* - Projection (meaning unclear)
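
The legend defines `ideb` as the standardized SAEB mean (N) multiplied by the performance indicator (P). A minimal sketch of that relationship, using two rows copied from the `head()` outputs further down in this notebook (AM and SP, 2005). The `ideb_check` column name and the rounding to one decimal place are assumptions made for illustration:

```python
import pandas as pd

# Two rows copied from the df_AM.head() and df_SP.head() outputs in this notebook
sample = pd.DataFrame({
    "nota_saeb_media_padronizada": [4.16129, 4.711114],
    "indicador_rendimento": [0.675545, 0.901503],
    "ideb": [2.8, 4.2],
})

# Assumption: the published IDEB is N x P rounded to one decimal place
sample["ideb_check"] = (
    sample["nota_saeb_media_padronizada"] * sample["indicador_rendimento"]
).round(1)

print(sample[["ideb", "ideb_check"]])
```

If the two columns agree on the real data as well, `ideb` carries no information beyond N and P, which may matter when choosing model features.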

In [4]:
ideb.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
0,2005,AC,1200013,12008966,municipal,fundamental,finais (6-9),,,,,,,
1,2005,AC,1200013,12008966,municipal,fundamental,iniciais (1-5),,,,,,,
2,2005,AC,1200013,12009156,municipal,fundamental,finais (6-9),,,,,,,
3,2005,AC,1200013,12009156,municipal,fundamental,iniciais (1-5),,,,,,,
4,2005,AC,1200013,12009164,estadual,fundamental,finais (6-9),,,,,,,


In [5]:
ideb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1027767 entries, 0 to 1027766
Data columns (total 14 columns):
 #   Column                       Non-Null Count    Dtype  
---  ------                       --------------    -----  
 0   ano                          1027767 non-null  int64  
 1   sigla_uf                     1027767 non-null  object 
 2   id_municipio                 1027767 non-null  int64  
 3   id_escola                    1027767 non-null  int64  
 4   rede                         1027767 non-null  object 
 5   ensino                       1027767 non-null  object 
 6   anos_escolares               1027767 non-null  object 
 7   taxa_aprovacao               619628 non-null   float64
 8   indicador_rendimento         619580 non-null   float64
 9   nota_saeb_matematica         488139 non-null   float64
 10  nota_saeb_lingua_portuguesa  488139 non-null   float64
 11  nota_saeb_media_padronizada  556480 non-null   float64
 12  ideb                         556306 non-nu

The following columns contain NaN values:

- taxa_aprovacao
- indicador_rendimento
- nota_saeb_matematica
- nota_saeb_lingua_portuguesa
- nota_saeb_media_padronizada
- ideb 
- projecao
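
To see how much is missing in each column, one option is to compute the NaN count and share per column. A small sketch on made-up data (the `toy` frame is hypothetical; on the real data the same calls would run on `ideb`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `ideb` DataFrame
toy = pd.DataFrame({
    "ano": [2005, 2005, 2007, 2007],
    "taxa_aprovacao": [np.nan, 67.5, 90.5, np.nan],
    "ideb": [np.nan, 2.8, 4.2, 4.1],
})

# Number of NaN values per column, and the same as a percentage of all rows
nan_count = toy.isna().sum()
nan_pct = (toy.isna().mean() * 100).round(1)

print(pd.DataFrame({"nan_count": nan_count, "nan_pct": nan_pct}))
```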

In [6]:
ideb.describe(include = 'all').round(1)

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
count,1027767.0,1027767,1027767.0,1027767.0,1027767,1027767,1027767,619628.0,619580.0,488139.0,488139.0,556480.0,556306.0,705363.0
unique,,27,,,4,2,3,,,,,,,
top,,SP,,,municipal,fundamental,iniciais (1-5),,,,,,,
freq,,130434,,,645582,965727,559242,,,,,,,
mean,2013.4,,3088825.9,30875714.7,,,,86.7,0.9,222.2,211.5,5.0,4.4,4.7
std,5.2,,995739.8,9926413.0,,,,11.4,0.1,34.8,37.5,0.9,1.2,1.1
min,2005.0,,1100015.0,11000058.0,,,,0.0,0.0,99.9,103.7,1.4,0.1,0.6
25%,2009.0,,2311306.0,23197269.0,,,,80.5,0.8,196.1,181.7,4.4,3.6,3.9
50%,2013.0,,3114402.0,31044164.0,,,,89.5,0.9,223.9,212.6,4.9,4.3,4.7
75%,2017.0,,3550308.0,35436690.0,,,,95.7,1.0,246.6,239.0,5.6,5.2,5.4


In [7]:
ideb_anos = ideb['ano'].value_counts()
print(ideb_anos)

2017    127983
2019    127983
2021    127983
2005    107303
2007    107303
2009    107303
2011    107303
2013    107303
2015    107303
Name: ano, dtype: int64


We have school data for the following years:

- 2005
- 2007
- 2009
- 2011
- 2013
- 2015
- 2017
- 2019
- 2021

Every year from 2005 to 2015 has 107,303 observations, and every year from 2017 to 2021 has 127,983.
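
`value_counts()` orders by frequency, which makes the years harder to compare. A short sketch that rebuilds the printed counts as a Series and sorts them chronologically (the numbers are copied from the output above; nothing is read from disk):

```python
import pandas as pd

# Counts copied from the value_counts() output above
ideb_anos = pd.Series(
    {2017: 127983, 2019: 127983, 2021: 127983,
     2005: 107303, 2007: 107303, 2009: 107303,
     2011: 107303, 2013: 107303, 2015: 107303},
    name="ano",
)

# Sorting by the index (the year) puts the counts in chronological order
print(ideb_anos.sort_index())

# The total should match the number of rows reported by ideb.info()
print(ideb_anos.sum())
```

On the real data the equivalent call would be `ideb['ano'].value_counts().sort_index()`.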

In [8]:
ideb_ensino = ideb['ensino'].value_counts(normalize=True)*100
print(ideb_ensino)

fundamental    93.963612
medio           6.036388
Name: ensino, dtype: float64


In [9]:
ideb_anos_escolares = ideb['anos_escolares'].value_counts()
print(ideb_anos_escolares)

iniciais (1-5)    559242
finais (6-9)      406485
todos (1-4)        62040
Name: anos_escolares, dtype: int64


Most observations (about 94%) are from *Ensino Fundamental* (elementary and middle school).

In [10]:
ideb_em_bool = ideb['anos_escolares'] == 'todos (1-4)'
ideb_em = ideb[ideb_em_bool]
print(ideb_em)

          ano sigla_uf  id_municipio  id_escola      rede ensino  \
643828   2017       AC       1200013   12018422  estadual  medio   
643829   2017       AC       1200013   12021768  estadual  medio   
643835   2017       AC       1200013   12128236  estadual  medio   
643839   2017       AC       1200054   12015946  estadual  medio   
643845   2017       AC       1200104   12016284  estadual  medio   
...       ...      ...           ...        ...       ...    ...   
1027737  2021       TO       1722081   17009995  estadual  medio   
1027742  2021       TO       1722081   17010020  estadual  medio   
1027751  2021       TO       1722107   17010322  estadual  medio   
1027754  2021       TO       1722107   17010330  estadual  medio   
1027764  2021       TO       1722107   17010462  estadual  medio   

        anos_escolares  taxa_aprovacao  indicador_rendimento  \
643828     todos (1-4)            89.7              0.894401   
643829     todos (1-4)            74.6              0.7

This confirms that the value *todos (1-4)* in the *anos_escolares* column refers to **high school (ensino médio)**.
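
The counts above already point the same way: 62,040 rows of `todos (1-4)` out of 1,027,767 is about 6.04%, the same share as `medio`. A cross-tabulation makes the check explicit. The sketch below uses hypothetical toy rows; on the real data the two columns of `ideb` would be passed instead:

```python
import pandas as pd

# Hypothetical toy rows mimicking two columns of the real `ideb` DataFrame
toy = pd.DataFrame({
    "ensino": ["fundamental", "fundamental", "medio", "medio", "fundamental"],
    "anos_escolares": ["iniciais (1-5)", "finais (6-9)", "todos (1-4)",
                       "todos (1-4)", "iniciais (1-5)"],
})

# Rows are `ensino`, columns are `anos_escolares`, cells are row counts
tabela = pd.crosstab(toy["ensino"], toy["anos_escolares"])
print(tabela)
```

If `todos (1-4)` only ever appears alongside `medio`, the `fundamental` row of that column is zero.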

## Handling NaN values in each state

First, let's create separate DataFrames for each state.
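
One possible way to build all the per-state DataFrames at once is a dictionary keyed by state abbreviation. This is a sketch on hypothetical toy data, and the `por_uf` name is an assumption, not something the notebook defines:

```python
import pandas as pd

# Hypothetical toy rows; on the real data `ideb` would be used instead
toy = pd.DataFrame({
    "sigla_uf": ["AM", "SP", "AM", "MG"],
    "ideb": [2.8, 4.2, None, 4.0],
})

# One independent DataFrame per state, keyed by its abbreviation.
# .copy() makes each piece safe to modify without a SettingWithCopyWarning.
por_uf = {uf: grupo.copy() for uf, grupo in toy.groupby("sigla_uf")}

print(sorted(por_uf))
print(por_uf["AM"])
```

With the real data, `por_uf["AM"]` would play the role of `df_AM`.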

In [11]:
# Checking which state abbreviations are present
ideb['sigla_uf'].value_counts()

SP    130434
MG     99678
BA     91995
RS     70107
MA     62844
CE     61152
RJ     57183
PA     54213
PR     47964
PE     47484
SC     39036
GO     32241
PI     30117
PB     27693
AM     23148
RN     21585
MT     20739
AL     19974
ES     18351
SE     14583
MS     14148
TO     12486
RO     10698
DF      6105
AC      5202
AP      4869
RR      3738
Name: sigla_uf, dtype: int64

## Amazonas (AM)

In [12]:
# AMAZONAS
df_AM = ideb.loc[ideb['sigla_uf'] == "AM"]
df_AM[['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,12207.0,12204.0,9565.0,9565.0,10983.0,10983.0
mean,85.308438,0.850359,209.374421,201.557655,4.727575,4.07599
std,11.029058,0.114087,31.831154,36.294609,0.896721,1.124383
min,0.0,0.106764,100.16,109.0,1.826167,0.6
25%,79.0,0.785641,183.69,170.89,4.124565,3.3
50%,87.2,0.87107,211.02,200.71,4.626876,3.9
75%,94.0,0.93987,232.44,230.19,5.227979,4.8
max,100.0,1.0,396.39,367.69,9.374801,8.7


In [13]:
df_AM.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23148 entries, 2673 to 905685
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          23148 non-null  int64  
 1   sigla_uf                     23148 non-null  object 
 2   id_municipio                 23148 non-null  int64  
 3   id_escola                    23148 non-null  int64  
 4   rede                         23148 non-null  object 
 5   ensino                       23148 non-null  object 
 6   anos_escolares               23148 non-null  object 
 7   taxa_aprovacao               12207 non-null  float64
 8   indicador_rendimento         12204 non-null  float64
 9   nota_saeb_matematica         9565 non-null   float64
 10  nota_saeb_lingua_portuguesa  9565 non-null   float64
 11  nota_saeb_media_padronizada  10983 non-null  float64
 12  ideb                         10983 non-null  float64
 13  projecao    

In [14]:
df_AM.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 10941
indicador_rendimento           10944
nota_saeb_matematica           13583
nota_saeb_lingua_portuguesa    13583
nota_saeb_media_padronizada    12165
ideb                           12165
projecao                        9115
dtype: int64

In [15]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
# Fill the NaN values in each listed column with that column's mean for the state.
# fillna with a Series of means fills column by column; reassigning df_AM
# (instead of an in-place fillna on a slice) avoids the SettingWithCopyWarning.
df_AM = df_AM.fillna(df_AM[colunas].mean())


In [16]:
df_AM.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
2673,2005,AM,1300029,13012533,estadual,fundamental,finais (6-9),85.308438,0.850359,209.374421,201.557655,4.727575,4.07599,
2674,2005,AM,1300029,13012568,estadual,fundamental,finais (6-9),85.308438,0.850359,209.374421,201.557655,4.727575,4.07599,
2675,2005,AM,1300029,13012568,estadual,fundamental,iniciais (1-5),67.5,0.675545,168.33,164.16,4.16129,2.8,
2676,2005,AM,1300029,13012630,municipal,fundamental,iniciais (1-5),85.308438,0.850359,209.374421,201.557655,4.727575,4.07599,
2677,2005,AM,1300029,13012827,municipal,fundamental,finais (6-9),85.308438,0.850359,209.374421,201.557655,4.727575,4.07599,


In [17]:
df_AM.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23148 entries, 2673 to 905685
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          23148 non-null  int64  
 1   sigla_uf                     23148 non-null  object 
 2   id_municipio                 23148 non-null  int64  
 3   id_escola                    23148 non-null  int64  
 4   rede                         23148 non-null  object 
 5   ensino                       23148 non-null  object 
 6   anos_escolares               23148 non-null  object 
 7   taxa_aprovacao               23148 non-null  float64
 8   indicador_rendimento         23148 non-null  float64
 9   nota_saeb_matematica         23148 non-null  float64
 10  nota_saeb_lingua_portuguesa  23148 non-null  float64
 11  nota_saeb_media_padronizada  23148 non-null  float64
 12  ideb                         23148 non-null  float64
 13  projecao    
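
The same state-mean imputation is repeated for each state in the sections that follow. As an alternative sketch (not the approach used in this notebook), `groupby(...).transform('mean')` can fill every state in one pass. The toy data below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `ideb` DataFrame
toy = pd.DataFrame({
    "sigla_uf": ["AM", "AM", "AM", "SP", "SP"],
    "ideb": [2.0, np.nan, 4.0, 5.0, np.nan],
})
colunas = ["ideb"]

# For every row, the mean of that row's own state, aligned to the original index
media_uf = toy.groupby("sigla_uf")[colunas].transform("mean")

# Fill NaN values with the matching state mean
toy[colunas] = toy[colunas].fillna(media_uf)
print(toy)
```

Note that mean imputation gives every missing row in a state the same value, as the repeated numbers in the `df_AM.head()` output of this section show, so it reduces the variance of the filled columns.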

## São Paulo (SP)

In [18]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_SP = ideb.loc[ideb['sigla_uf'] == "SP"]
df_SP[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,88752.0,88750.0,69704.0,69704.0,80440.0,80432.0
mean,94.064977,0.939289,236.406227,225.164133,5.45157,5.159987
std,5.844558,0.060461,28.920474,33.584867,0.898935,1.020607
min,0.0,0.122145,131.82,122.2,2.386667,0.6
25%,91.5,0.913416,217.75,199.17,4.765667,4.4
50%,95.8,0.957283,237.69,226.44,5.326333,5.1
75%,98.4,0.983575,255.07,248.33,6.077823,5.9
max,100.0,1.0,432.67,377.85,9.102075,9.0


In [19]:
df_SP.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 130434 entries, 92937 to 1026180
Data columns (total 14 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   ano                          130434 non-null  int64  
 1   sigla_uf                     130434 non-null  object 
 2   id_municipio                 130434 non-null  int64  
 3   id_escola                    130434 non-null  int64  
 4   rede                         130434 non-null  object 
 5   ensino                       130434 non-null  object 
 6   anos_escolares               130434 non-null  object 
 7   taxa_aprovacao               88752 non-null   float64
 8   indicador_rendimento         88750 non-null   float64
 9   nota_saeb_matematica         69704 non-null   float64
 10  nota_saeb_lingua_portuguesa  69704 non-null   float64
 11  nota_saeb_media_padronizada  80440 non-null   float64
 12  ideb                         80432 non-null   float64

In [20]:
df_SP.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 41682
indicador_rendimento           41684
nota_saeb_matematica           60730
nota_saeb_lingua_portuguesa    60730
nota_saeb_media_padronizada    49994
ideb                           50002
projecao                       37102
dtype: int64

In [21]:
# Fill NaN values with the state-level mean of each column in `colunas`.
# Reassigning (instead of an in-place fillna on a slice) avoids the SettingWithCopyWarning.
df_SP = df_SP.fillna(df_SP[colunas].mean())


In [22]:
df_SP.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
92937,2005,SP,3500105,35030806,estadual,fundamental,finais (6-9),94.064977,0.939289,236.406227,225.164133,5.45157,5.159987,
92938,2005,SP,3500105,35031045,estadual,fundamental,finais (6-9),94.064977,0.939289,236.406227,225.164133,5.45157,5.159987,
92939,2005,SP,3500105,35031112,estadual,fundamental,finais (6-9),94.064977,0.939289,236.406227,225.164133,5.45157,5.159987,
92940,2005,SP,3500105,35079911,municipal,fundamental,iniciais (1-5),90.5,0.901503,184.16,177.78,4.711114,4.2,
92941,2005,SP,3500105,35079923,municipal,fundamental,iniciais (1-5),94.2,0.939028,171.97,168.77,4.314822,4.1,


In [23]:
df_SP.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 130434 entries, 92937 to 1026180
Data columns (total 14 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   ano                          130434 non-null  int64  
 1   sigla_uf                     130434 non-null  object 
 2   id_municipio                 130434 non-null  int64  
 3   id_escola                    130434 non-null  int64  
 4   rede                         130434 non-null  object 
 5   ensino                       130434 non-null  object 
 6   anos_escolares               130434 non-null  object 
 7   taxa_aprovacao               130434 non-null  float64
 8   indicador_rendimento         130434 non-null  float64
 9   nota_saeb_matematica         130434 non-null  float64
 10  nota_saeb_lingua_portuguesa  130434 non-null  float64
 11  nota_saeb_media_padronizada  130434 non-null  float64
 12  ideb                         130434 non-null  float64

## Minas Gerais (MG)

In [24]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_MG = ideb.loc[ideb['sigla_uf'] == "MG"]
df_MG[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,68163.0,68161.0,55052.0,55052.0,62348.0,62342.0
mean,89.584354,0.894207,236.471497,223.85205,5.431524,4.919077
std,10.3547,0.105969,31.483184,33.950082,0.883201,1.161131
min,0.0,0.031164,130.5,111.91,1.8395,0.2
25%,83.8,0.835396,215.0,198.27,4.798898,4.1
50%,92.5,0.924598,237.88,224.67,5.366045,4.9
75%,98.3,0.982492,258.16,249.27,6.021255,5.8
max,100.0,1.0,418.82,372.85,8.939802,8.6


In [25]:
df_MG.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 99678 entries, 34597 to 951952
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          99678 non-null  int64  
 1   sigla_uf                     99678 non-null  object 
 2   id_municipio                 99678 non-null  int64  
 3   id_escola                    99678 non-null  int64  
 4   rede                         99678 non-null  object 
 5   ensino                       99678 non-null  object 
 6   anos_escolares               99678 non-null  object 
 7   taxa_aprovacao               68163 non-null  float64
 8   indicador_rendimento         68161 non-null  float64
 9   nota_saeb_matematica         55052 non-null  float64
 10  nota_saeb_lingua_portuguesa  55052 non-null  float64
 11  nota_saeb_media_padronizada  62348 non-null  float64
 12  ideb                         62342 non-null  float64
 13  projecao   

In [26]:
df_MG.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 31515
indicador_rendimento           31517
nota_saeb_matematica           44626
nota_saeb_lingua_portuguesa    44626
nota_saeb_media_padronizada    37330
ideb                           37336
projecao                       25804
dtype: int64

In [27]:
# Fill NaN values with the state-level mean of each column in `colunas`.
# Reassigning (instead of an in-place fillna on a slice) avoids the SettingWithCopyWarning.
df_MG = df_MG.fillna(df_MG[colunas].mean())


In [28]:
df_MG.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
34597,2005,MG,3100104,31200271,estadual,fundamental,finais (6-9),75.8,0.757987,268.76,248.86,5.293833,4.0,
34598,2005,MG,3100104,31200794,municipal,fundamental,iniciais (1-5),89.584354,0.894207,236.471497,223.85205,5.431524,4.919077,
34599,2005,MG,3100104,31271870,municipal,fundamental,iniciais (1-5),97.3,0.972765,196.8,185.63,5.095029,5.0,
34600,2005,MG,3100203,31031771,estadual,fundamental,iniciais (1-5),98.5,0.981273,188.55,178.72,4.811881,4.7,
34601,2005,MG,3100203,31031836,estadual,fundamental,finais (6-9),54.9,0.562319,267.26,240.92,5.136448,2.9,


In [29]:
df_MG.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 99678 entries, 34597 to 951952
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          99678 non-null  int64  
 1   sigla_uf                     99678 non-null  object 
 2   id_municipio                 99678 non-null  int64  
 3   id_escola                    99678 non-null  int64  
 4   rede                         99678 non-null  object 
 5   ensino                       99678 non-null  object 
 6   anos_escolares               99678 non-null  object 
 7   taxa_aprovacao               99678 non-null  float64
 8   indicador_rendimento         99678 non-null  float64
 9   nota_saeb_matematica         99678 non-null  float64
 10  nota_saeb_lingua_portuguesa  99678 non-null  float64
 11  nota_saeb_media_padronizada  99678 non-null  float64
 12  ideb                         99678 non-null  float64
 13  projecao   

## Bahia (BA)

In [30]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_BA = ideb.loc[ideb['sigla_uf'] == "BA"]
df_BA[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,48222.0,48213.0,37575.0,37575.0,43023.0,43001.0
mean,78.565661,0.781301,203.434772,193.758423,4.491267,3.554269
std,12.865025,0.133984,30.596544,34.512496,0.731352,0.981075
min,0.0,0.025037,110.0,110.45,1.574833,0.1
25%,70.4,0.697475,178.97,164.88,3.984333,2.9
50%,80.2,0.798997,202.05,190.38,4.416394,3.5
75%,88.4,0.88287,226.36,220.17,4.921,4.2
max,100.0,1.0,403.04,357.89,8.136167,8.1


In [31]:
df_BA.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 91995 entries, 5605 to 917309
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          91995 non-null  int64  
 1   sigla_uf                     91995 non-null  object 
 2   id_municipio                 91995 non-null  int64  
 3   id_escola                    91995 non-null  int64  
 4   rede                         91995 non-null  object 
 5   ensino                       91995 non-null  object 
 6   anos_escolares               91995 non-null  object 
 7   taxa_aprovacao               48222 non-null  float64
 8   indicador_rendimento         48213 non-null  float64
 9   nota_saeb_matematica         37575 non-null  float64
 10  nota_saeb_lingua_portuguesa  37575 non-null  float64
 11  nota_saeb_media_padronizada  43023 non-null  float64
 12  ideb                         43001 non-null  float64
 13  projecao    

In [32]:
df_BA.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 43773
indicador_rendimento           43782
nota_saeb_matematica           54420
nota_saeb_lingua_portuguesa    54420
nota_saeb_media_padronizada    48972
ideb                           48994
projecao                       31620
dtype: int64

In [33]:
# Fill NaN values with the state-level mean of each column in `colunas`.
# Reassigning (instead of an in-place fillna on a slice) avoids the SettingWithCopyWarning.
df_BA = df_BA.fillna(df_BA[colunas].mean())


In [34]:
df_BA.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
5605,2005,BA,2900108,29211590,estadual,fundamental,finais (6-9),78.565661,0.781301,203.434772,193.758423,4.491267,3.554269,
5606,2005,BA,2900108,29211956,municipal,fundamental,iniciais (1-5),86.0,0.84829,180.91,165.94,4.433734,3.8,
5607,2005,BA,2900108,29211964,estadual,fundamental,finais (6-9),97.6,0.968795,258.97,223.72,4.711592,4.6,
5608,2005,BA,2900108,29211999,municipal,fundamental,finais (6-9),78.565661,0.781301,203.434772,193.758423,4.491267,3.554269,
5609,2005,BA,2900108,29211999,municipal,fundamental,iniciais (1-5),78.565661,0.781301,203.434772,193.758423,4.491267,3.554269,


In [35]:
df_BA.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 91995 entries, 5605 to 917309
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          91995 non-null  int64  
 1   sigla_uf                     91995 non-null  object 
 2   id_municipio                 91995 non-null  int64  
 3   id_escola                    91995 non-null  int64  
 4   rede                         91995 non-null  object 
 5   ensino                       91995 non-null  object 
 6   anos_escolares               91995 non-null  object 
 7   taxa_aprovacao               91995 non-null  float64
 8   indicador_rendimento         91995 non-null  float64
 9   nota_saeb_matematica         91995 non-null  float64
 10  nota_saeb_lingua_portuguesa  91995 non-null  float64
 11  nota_saeb_media_padronizada  91995 non-null  float64
 12  ideb                         91995 non-null  float64
 13  projecao    

## Rio Grande do Sul (RS)

In [36]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_RS = ideb.loc[ideb['sigla_uf'] == "RS"]
df_RS[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,45902.0,45902.0,33806.0,33806.0,38285.0,38283.0
mean,85.185656,0.849346,233.677299,221.739918,5.405161,4.66097
std,11.176428,0.11376,32.913391,35.301887,0.798609,1.095596
min,10.6,0.093682,132.72,119.87,2.15525,0.8
25%,78.8,0.784425,207.94,192.17,4.849667,3.9
50%,87.7,0.874794,234.25,222.67,5.341,4.6
75%,93.8,0.937416,257.73,248.72,5.912553,5.4
max,100.0,1.0,408.11,369.38,8.675709,8.7


In [37]:
df_RS.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 70107 entries, 79899 to 1002257
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          70107 non-null  int64  
 1   sigla_uf                     70107 non-null  object 
 2   id_municipio                 70107 non-null  int64  
 3   id_escola                    70107 non-null  int64  
 4   rede                         70107 non-null  object 
 5   ensino                       70107 non-null  object 
 6   anos_escolares               70107 non-null  object 
 7   taxa_aprovacao               45902 non-null  float64
 8   indicador_rendimento         45902 non-null  float64
 9   nota_saeb_matematica         33806 non-null  float64
 10  nota_saeb_lingua_portuguesa  33806 non-null  float64
 11  nota_saeb_media_padronizada  38285 non-null  float64
 12  ideb                         38283 non-null  float64
 13  projecao  

In [38]:
df_RS.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 24205
indicador_rendimento           24205
nota_saeb_matematica           36301
nota_saeb_lingua_portuguesa    36301
nota_saeb_media_padronizada    31822
ideb                           31824
projecao                       20274
dtype: int64

In [39]:
# Fill NaN values with the state-level mean of each column in `colunas`.
# Reassigning (instead of an in-place fillna on a slice) avoids the SettingWithCopyWarning.
df_RS = df_RS.fillna(df_RS[colunas].mean())


In [40]:
df_RS.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
79899,2005,RS,4300034,43019285,municipal,fundamental,finais (6-9),85.185656,0.849346,233.677299,221.739918,5.405161,4.66097,
79900,2005,RS,4300034,43019285,municipal,fundamental,iniciais (1-5),85.185656,0.849346,233.677299,221.739918,5.405161,4.66097,
79901,2005,RS,4300034,43019650,estadual,fundamental,finais (6-9),85.185656,0.849346,233.677299,221.739918,5.405161,4.66097,
79902,2005,RS,4300034,43019650,estadual,fundamental,iniciais (1-5),85.185656,0.849346,233.677299,221.739918,5.405161,4.66097,
79903,2005,RS,4300034,43200770,municipal,fundamental,finais (6-9),85.185656,0.849346,233.677299,221.739918,5.405161,4.66097,


In [41]:
df_RS.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 70107 entries, 79899 to 1002257
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          70107 non-null  int64  
 1   sigla_uf                     70107 non-null  object 
 2   id_municipio                 70107 non-null  int64  
 3   id_escola                    70107 non-null  int64  
 4   rede                         70107 non-null  object 
 5   ensino                       70107 non-null  object 
 6   anos_escolares               70107 non-null  object 
 7   taxa_aprovacao               70107 non-null  float64
 8   indicador_rendimento         70107 non-null  float64
 9   nota_saeb_matematica         70107 non-null  float64
 10  nota_saeb_lingua_portuguesa  70107 non-null  float64
 11  nota_saeb_media_padronizada  70107 non-null  float64
 12  ideb                         70107 non-null  float64
 13  projecao  

## Maranhão (MA)

In [42]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_MA = ideb.loc[ideb['sigla_uf'] == "MA"]
df_MA[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,30535.0,30535.0,23733.0,23733.0,26990.0,26990.0
mean,86.794724,0.86602,197.50722,190.368288,4.201946,3.662197
std,9.094457,0.094313,31.674636,36.705169,0.698141,0.825055
min,10.3,0.06893,99.89,104.18,1.350903,0.3
25%,81.8,0.814566,171.58,158.78,3.726521,3.1
50%,88.3,0.882673,196.63,187.69,4.146265,3.6
75%,93.6,0.936272,221.77,219.83,4.6035,4.1
max,100.0,1.0,366.6,340.7,8.418257,8.2


In [43]:
df_MA.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 62844 entries, 27894 to 939216
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          62844 non-null  int64  
 1   sigla_uf                     62844 non-null  object 
 2   id_municipio                 62844 non-null  int64  
 3   id_escola                    62844 non-null  int64  
 4   rede                         62844 non-null  object 
 5   ensino                       62844 non-null  object 
 6   anos_escolares               62844 non-null  object 
 7   taxa_aprovacao               30535 non-null  float64
 8   indicador_rendimento         30535 non-null  float64
 9   nota_saeb_matematica         23733 non-null  float64
 10  nota_saeb_lingua_portuguesa  23733 non-null  float64
 11  nota_saeb_media_padronizada  26990 non-null  float64
 12  ideb                         26990 non-null  float64
 13  projecao   

In [44]:
df_MA.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 32309
indicador_rendimento           32309
nota_saeb_matematica           39111
nota_saeb_lingua_portuguesa    39111
nota_saeb_media_padronizada    35854
ideb                           35854
projecao                       25048
dtype: int64
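
The counts above are easier to judge as shares of the 62,844 Maranhão rows: `taxa_aprovacao` is missing in about 51% of them and the two SAEB scores in about 62%. A minimal sketch of that summary follows. It uses a small made-up frame in place of `df_MA`; the helper name `resumo_ausentes` and the sample values are assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for df_MA; the real frame comes from ideb_escola.csv
exemplo = pd.DataFrame({
    "sigla_uf": ["MA", "MA", "MA", "MA"],
    "taxa_aprovacao": [77.1, 85.9, np.nan, np.nan],
    "ideb": [2.6, np.nan, np.nan, np.nan],
})


def resumo_ausentes(df):
    """Return missing-value counts and percentages per column, worst first."""
    resumo = pd.DataFrame({
        "ausentes": df.isnull().sum(),
        "pct_ausentes": (df.isnull().mean() * 100).round(1),
    })
    return resumo.sort_values("pct_ausentes", ascending=False)


print(resumo_ausentes(exemplo))
```

On the real data the call would be `resumo_ausentes(df_MA)`. Seeing the percentage next to the count makes it clearer how much of each column a mean fill would end up replacing.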

In [45]:
# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
df_MA = df_MA.copy()
# Fill missing values in each numeric column with that column's mean (single vectorized call)
df_MA[colunas] = df_MA[colunas].fillna(df_MA[colunas].mean())


In [46]:
df_MA.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
27894,2005,MA,2100055,21088179,estadual,fundamental,finais (6-9),77.1,0.64764,226.27,218.66,4.082389,2.6,
27895,2005,MA,2100055,21088276,estadual,fundamental,finais (6-9),85.9,0.849371,238.66,229.19,4.464179,3.8,
27896,2005,MA,2100055,21088276,estadual,fundamental,iniciais (1-5),86.794724,0.86602,197.50722,190.368288,4.201946,3.662197,
27897,2005,MA,2100055,21088357,municipal,fundamental,finais (6-9),86.794724,0.86602,197.50722,190.368288,4.201946,3.662197,
27898,2005,MA,2100055,21088357,municipal,fundamental,iniciais (1-5),81.9,0.822169,185.91,175.8,4.708828,3.9,


In [47]:
df_MA.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 62844 entries, 27894 to 939216
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          62844 non-null  int64  
 1   sigla_uf                     62844 non-null  object 
 2   id_municipio                 62844 non-null  int64  
 3   id_escola                    62844 non-null  int64  
 4   rede                         62844 non-null  object 
 5   ensino                       62844 non-null  object 
 6   anos_escolares               62844 non-null  object 
 7   taxa_aprovacao               62844 non-null  float64
 8   indicador_rendimento         62844 non-null  float64
 9   nota_saeb_matematica         62844 non-null  float64
 10  nota_saeb_lingua_portuguesa  62844 non-null  float64
 11  nota_saeb_media_padronizada  62844 non-null  float64
 12  ideb                         62844 non-null  float64
 13  projecao   
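
The filter, fill and inspect steps used for Maranhão are the same for every state, so they could be wrapped in one function and applied to a list of state codes. The sketch below is one possible shape for that. The function name `preparar_uf`, the shortened column list and the sample frame are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

COLUNAS = ["taxa_aprovacao", "ideb"]  # shortened list, for the example only


def preparar_uf(ideb, uf, colunas):
    """Filter one state and fill missing values with that state's column means."""
    # Explicit copy: later assignments act on df_uf, not on a slice of ideb
    df_uf = ideb.loc[ideb["sigla_uf"] == uf].copy()
    df_uf[colunas] = df_uf[colunas].fillna(df_uf[colunas].mean())
    return df_uf


# Hypothetical stand-in for the ideb frame loaded from ideb_escola.csv
ideb_exemplo = pd.DataFrame({
    "sigla_uf": ["MA", "MA", "MA", "CE", "CE"],
    "taxa_aprovacao": [80.0, 90.0, np.nan, 70.0, np.nan],
    "ideb": [3.0, np.nan, 5.0, np.nan, 4.0],
})

# One prepared frame per state, keyed by its code
por_uf = {uf: preparar_uf(ideb_exemplo, uf, COLUNAS) for uf in ["MA", "CE"]}
print(por_uf["MA"])
```

Keeping the frames in a dictionary means a later step can loop over `por_uf.items()` instead of referring to eight separately named variables.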

## Ceará (CE)

In [48]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_CE = ideb.loc[ideb['sigla_uf'] == "CE"]
df_CE[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,32008.0,32005.0,26886.0,26886.0,30381.0,30372.0
mean,89.88585,0.897711,222.388058,214.185066,5.02558,4.563588
std,9.904805,0.101174,37.905379,39.640414,1.072797,1.295345
min,0.0,0.229,118.16,107.98,1.726667,0.8
25%,85.0,0.84863,195.0225,182.66,4.244333,3.6
50%,92.7,0.92607,223.29,215.83,4.868667,4.4
75%,97.6,0.975637,246.16,242.75,5.640817,5.4
max,100.0,1.0,423.77,373.47,9.849818,9.8


In [49]:
df_CE.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 61152 entries, 15435 to 924575
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          61152 non-null  int64  
 1   sigla_uf                     61152 non-null  object 
 2   id_municipio                 61152 non-null  int64  
 3   id_escola                    61152 non-null  int64  
 4   rede                         61152 non-null  object 
 5   ensino                       61152 non-null  object 
 6   anos_escolares               61152 non-null  object 
 7   taxa_aprovacao               32008 non-null  float64
 8   indicador_rendimento         32005 non-null  float64
 9   nota_saeb_matematica         26886 non-null  float64
 10  nota_saeb_lingua_portuguesa  26886 non-null  float64
 11  nota_saeb_media_padronizada  30381 non-null  float64
 12  ideb                         30372 non-null  float64
 13  projecao   

In [50]:
df_CE.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 29144
indicador_rendimento           29147
nota_saeb_matematica           34266
nota_saeb_lingua_portuguesa    34266
nota_saeb_media_padronizada    30771
ideb                           30780
projecao                       20901
dtype: int64

In [51]:
# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
df_CE = df_CE.copy()
# Fill missing values in each numeric column with that column's mean (single vectorized call)
df_CE[colunas] = df_CE[colunas].fillna(df_CE[colunas].mean())


In [52]:
df_CE.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
15435,2005,CE,2300101,23168749,estadual,fundamental,finais (6-9),89.88585,0.897711,222.388058,214.185066,5.02558,4.563588,
15436,2005,CE,2300101,23168862,municipal,fundamental,finais (6-9),89.88585,0.897711,222.388058,214.185066,5.02558,4.563588,
15437,2005,CE,2300101,23168862,municipal,fundamental,iniciais (1-5),84.2,0.843504,163.94,145.94,3.74646,3.2,
15438,2005,CE,2300101,23168951,municipal,fundamental,finais (6-9),89.88585,0.897711,222.388058,214.185066,5.02558,4.563588,
15439,2005,CE,2300101,23168951,municipal,fundamental,iniciais (1-5),89.88585,0.897711,222.388058,214.185066,5.02558,4.563588,
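
Four of the five rows shown above carry the same filled values (89.88585, 0.897711, ...), and once the fill has run nothing distinguishes them from measured data. One option is to record a flag before filling. The sketch below assumes a hypothetical column name `ideb_imputado` and uses made-up values.

```python
import numpy as np
import pandas as pd

# Made-up sample; in the notebook this would be df_CE
exemplo = pd.DataFrame({
    "id_escola": [1, 2, 3, 4],
    "ideb": [np.nan, 3.0, np.nan, 5.0],
})

# Record which rows were missing before the fill overwrites that information
exemplo["ideb_imputado"] = exemplo["ideb"].isnull()
exemplo["ideb"] = exemplo["ideb"].fillna(exemplo["ideb"].mean())

print(exemplo)
```

With the flag in place, later analysis can be repeated on `exemplo[~exemplo["ideb_imputado"]]` to check whether a result depends on the filled rows.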


In [53]:
df_CE.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 61152 entries, 15435 to 924575
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          61152 non-null  int64  
 1   sigla_uf                     61152 non-null  object 
 2   id_municipio                 61152 non-null  int64  
 3   id_escola                    61152 non-null  int64  
 4   rede                         61152 non-null  object 
 5   ensino                       61152 non-null  object 
 6   anos_escolares               61152 non-null  object 
 7   taxa_aprovacao               61152 non-null  float64
 8   indicador_rendimento         61152 non-null  float64
 9   nota_saeb_matematica         61152 non-null  float64
 10  nota_saeb_lingua_portuguesa  61152 non-null  float64
 11  nota_saeb_media_padronizada  61152 non-null  float64
 12  ideb                         61152 non-null  float64
 13  projecao   

## Rio de Janeiro (RJ)

In [54]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_RJ = ideb.loc[ideb['sigla_uf'] == "RJ"]
df_RJ[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,36543.0,36539.0,29353.0,29353.0,33258.0,33240.0
mean,83.024089,0.827832,224.157966,214.065709,5.166093,4.334603
std,10.892947,0.111618,30.468504,33.703016,0.827342,1.064252
min,0.0,0.068,113.45,106.63,2.081699,0.7
25%,76.8,0.764947,201.53,187.22,4.574667,3.6
50%,84.8,0.845976,223.83,212.13,5.105542,4.3
75%,91.2,0.911075,244.36,238.63,5.70606,5.0
max,100.0,1.0,417.64,371.45,8.824924,8.7


In [55]:
df_RJ.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 57183 entries, 70162 to 989265
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          57183 non-null  int64  
 1   sigla_uf                     57183 non-null  object 
 2   id_municipio                 57183 non-null  int64  
 3   id_escola                    57183 non-null  int64  
 4   rede                         57183 non-null  object 
 5   ensino                       57183 non-null  object 
 6   anos_escolares               57183 non-null  object 
 7   taxa_aprovacao               36543 non-null  float64
 8   indicador_rendimento         36539 non-null  float64
 9   nota_saeb_matematica         29353 non-null  float64
 10  nota_saeb_lingua_portuguesa  29353 non-null  float64
 11  nota_saeb_media_padronizada  33258 non-null  float64
 12  ideb                         33240 non-null  float64
 13  projecao   

In [56]:
df_RJ.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 20640
indicador_rendimento           20644
nota_saeb_matematica           27830
nota_saeb_lingua_portuguesa    27830
nota_saeb_media_padronizada    23925
ideb                           23943
projecao                       14607
dtype: int64

In [57]:
# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
df_RJ = df_RJ.copy()
# Fill missing values in each numeric column with that column's mean (single vectorized call)
df_RJ[colunas] = df_RJ[colunas].fillna(df_RJ[colunas].mean())


In [58]:
df_RJ.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
70162,2005,RJ,3300100,33036594,estadual,fundamental,finais (6-9),61.4,0.615316,238.0,230.74,4.478998,2.8,
70163,2005,RJ,3300100,33036594,estadual,fundamental,iniciais (1-5),93.9,0.939,191.0,185.85,4.988094,4.7,
70164,2005,RJ,3300100,33036632,estadual,fundamental,finais (6-9),44.7,0.444022,227.11,227.99,4.251915,1.9,
70165,2005,RJ,3300100,33036632,estadual,fundamental,iniciais (1-5),63.2,0.583957,174.92,164.97,4.301806,2.5,
70166,2005,RJ,3300100,33036640,estadual,fundamental,finais (6-9),63.9,0.622762,240.27,225.44,4.428616,2.8,
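
In the rows above, the `finais (6-9)` entries have math scores of roughly 227 to 240, while the `iniciais (1-5)` entries sit between 175 and 191. A single state-wide mean therefore fills both stages with a value that fits neither well. A possible alternative, shown here only as a sketch on made-up data, is to fill with the mean of each `anos_escolares` group.

```python
import numpy as np
import pandas as pd

# Made-up sample with the two school stages on different score scales
exemplo = pd.DataFrame({
    "anos_escolares": ["finais (6-9)", "finais (6-9)", "finais (6-9)",
                       "iniciais (1-5)", "iniciais (1-5)", "iniciais (1-5)"],
    "nota_saeb_matematica": [240.0, 230.0, np.nan, 190.0, 170.0, np.nan],
})

# Mean of each school stage, broadcast back to the original row order
media_grupo = exemplo.groupby("anos_escolares")["nota_saeb_matematica"].transform("mean")
exemplo["nota_saeb_matematica"] = exemplo["nota_saeb_matematica"].fillna(media_grupo)

print(exemplo)
```

A group in which every value is missing has no mean, so its rows stay `NaN` and would still need a fallback such as the state mean.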


In [59]:
df_RJ.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 57183 entries, 70162 to 989265
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          57183 non-null  int64  
 1   sigla_uf                     57183 non-null  object 
 2   id_municipio                 57183 non-null  int64  
 3   id_escola                    57183 non-null  int64  
 4   rede                         57183 non-null  object 
 5   ensino                       57183 non-null  object 
 6   anos_escolares               57183 non-null  object 
 7   taxa_aprovacao               57183 non-null  float64
 8   indicador_rendimento         57183 non-null  float64
 9   nota_saeb_matematica         57183 non-null  float64
 10  nota_saeb_lingua_portuguesa  57183 non-null  float64
 11  nota_saeb_media_padronizada  57183 non-null  float64
 12  ideb                         57183 non-null  float64
 13  projecao   

## Pará (PA)

In [60]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_PA = ideb.loc[ideb['sigla_uf'] == "PA"]
df_PA[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,28209.0,28204.0,20992.0,20992.0,24485.0,24479.0
mean,81.447059,0.811964,199.002899,191.09514,4.433903,3.652
std,11.29626,0.116798,30.145767,34.622553,0.65989,0.883946
min,0.0,0.013085,104.16,109.08,1.935112,0.1
25%,74.5,0.741659,174.33,162.16,3.994833,3.0
50%,82.9,0.828096,194.825,183.55,4.3675,3.6
75%,90.1,0.900559,224.25,220.92,4.804347,4.2
max,100.0,1.0,415.73,360.7,7.928409,7.9


In [61]:
df_PA.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54213 entries, 48460 to 962802
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          54213 non-null  int64  
 1   sigla_uf                     54213 non-null  object 
 2   id_municipio                 54213 non-null  int64  
 3   id_escola                    54213 non-null  int64  
 4   rede                         54213 non-null  object 
 5   ensino                       54213 non-null  object 
 6   anos_escolares               54213 non-null  object 
 7   taxa_aprovacao               28209 non-null  float64
 8   indicador_rendimento         28204 non-null  float64
 9   nota_saeb_matematica         20992 non-null  float64
 10  nota_saeb_lingua_portuguesa  20992 non-null  float64
 11  nota_saeb_media_padronizada  24485 non-null  float64
 12  ideb                         24479 non-null  float64
 13  projecao   

In [62]:
df_PA.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 26004
indicador_rendimento           26009
nota_saeb_matematica           33221
nota_saeb_lingua_portuguesa    33221
nota_saeb_media_padronizada    29728
ideb                           29734
projecao                       22745
dtype: int64

In [63]:
# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
df_PA = df_PA.copy()
# Fill missing values in each numeric column with that column's mean (single vectorized call)
df_PA[colunas] = df_PA[colunas].fillna(df_PA[colunas].mean())


In [64]:
df_PA.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
48460,2005,PA,1500107,15064255,municipal,fundamental,iniciais (1-5),71.9,0.71318,157.83,153.27,3.763157,2.7,
48461,2005,PA,1500107,15064280,estadual,fundamental,finais (6-9),92.0,0.920342,250.47,238.96,4.823869,4.4,
48462,2005,PA,1500107,15064301,municipal,fundamental,iniciais (1-5),78.8,0.788683,172.82,165.16,4.264936,3.4,
48463,2005,PA,1500107,15064310,municipal,fundamental,iniciais (1-5),96.3,0.962741,172.44,181.22,4.550136,4.4,
48464,2005,PA,1500107,15064352,municipal,fundamental,iniciais (1-5),87.9,0.884813,174.19,171.38,4.404412,3.9,


In [65]:
df_PA.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54213 entries, 48460 to 962802
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          54213 non-null  int64  
 1   sigla_uf                     54213 non-null  object 
 2   id_municipio                 54213 non-null  int64  
 3   id_escola                    54213 non-null  int64  
 4   rede                         54213 non-null  object 
 5   ensino                       54213 non-null  object 
 6   anos_escolares               54213 non-null  object 
 7   taxa_aprovacao               54213 non-null  float64
 8   indicador_rendimento         54213 non-null  float64
 9   nota_saeb_matematica         54213 non-null  float64
 10  nota_saeb_lingua_portuguesa  54213 non-null  float64
 11  nota_saeb_media_padronizada  54213 non-null  float64
 12  ideb                         54213 non-null  float64
 13  projecao   

## Paraná (PR)

In [66]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_PR = ideb.loc[ideb['sigla_uf'] == "PR"]
df_PR[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,33386.0,33386.0,26970.0,26970.0,30535.0,30531.0
mean,89.45594,0.892652,236.673901,222.146571,5.521133,4.995975
std,8.587024,0.087657,29.424274,32.604383,0.908386,1.121322
min,2.9,0.022887,116.66,108.76,2.269996,0.1
25%,85.0,0.847415,215.78,195.475,4.842459,4.2
50%,91.6,0.914322,238.825,223.55,5.395,4.9
75%,96.0,0.95896,256.8575,245.19,6.121789,5.8
max,100.0,1.0,386.1,351.32,8.979574,8.9
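
The per-state `describe()` tables can also be lined up in a single frame, which makes it easier to compare states side by side. The sketch below groups a made-up stand-in for `ideb` by `sigla_uf`; the sample values and the variable name `comparativo` are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw ideb frame (before any fill)
ideb_exemplo = pd.DataFrame({
    "sigla_uf": ["MA", "MA", "PR", "PR", "PR"],
    "taxa_aprovacao": [80.0, 90.0, 88.0, 92.0, np.nan],
    "ideb": [3.0, 4.0, 5.0, np.nan, 5.0],
})
colunas = ["taxa_aprovacao", "ideb"]

# Non-null count and mean of every column, one row per state
comparativo = ideb_exemplo.groupby("sigla_uf")[colunas].agg(["count", "mean"])

print(comparativo)
```

Running this on the raw frame, before any fill, keeps the means based on measured values only.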


In [67]:
df_PR.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47964 entries, 65357 to 982098
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          47964 non-null  int64  
 1   sigla_uf                     47964 non-null  object 
 2   id_municipio                 47964 non-null  int64  
 3   id_escola                    47964 non-null  int64  
 4   rede                         47964 non-null  object 
 5   ensino                       47964 non-null  object 
 6   anos_escolares               47964 non-null  object 
 7   taxa_aprovacao               33386 non-null  float64
 8   indicador_rendimento         33386 non-null  float64
 9   nota_saeb_matematica         26970 non-null  float64
 10  nota_saeb_lingua_portuguesa  26970 non-null  float64
 11  nota_saeb_media_padronizada  30535 non-null  float64
 12  ideb                         30531 non-null  float64
 13  projecao   

In [68]:
df_PR.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 14578
indicador_rendimento           14578
nota_saeb_matematica           20994
nota_saeb_lingua_portuguesa    20994
nota_saeb_media_padronizada    17429
ideb                           17433
projecao                       12508
dtype: int64

In [69]:
# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
df_PR = df_PR.copy()
# Fill missing values in each numeric column with that column's mean (single vectorized call)
df_PR[colunas] = df_PR[colunas].fillna(df_PR[colunas].mean())


In [70]:
df_PR.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
65357,2005,PR,4100103,41042476,estadual,fundamental,finais (6-9),81.3,0.811214,232.86,215.8,4.144622,3.4,
65358,2005,PR,4100103,41042590,municipal,fundamental,iniciais (1-5),87.9,0.880622,180.35,171.88,4.530956,4.0,
65359,2005,PR,4100103,41042670,estadual,fundamental,finais (6-9),89.45594,0.892652,236.673901,222.146571,5.521133,4.995975,
65360,2005,PR,4100202,41120027,municipal,fundamental,iniciais (1-5),89.45594,0.892652,236.673901,222.146571,5.521133,4.995975,
65361,2005,PR,4100202,41120140,municipal,fundamental,iniciais (1-5),89.45594,0.892652,236.673901,222.146571,5.521133,4.995975,


In [71]:
df_PR.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47964 entries, 65357 to 982098
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          47964 non-null  int64  
 1   sigla_uf                     47964 non-null  object 
 2   id_municipio                 47964 non-null  int64  
 3   id_escola                    47964 non-null  int64  
 4   rede                         47964 non-null  object 
 5   ensino                       47964 non-null  object 
 6   anos_escolares               47964 non-null  object 
 7   taxa_aprovacao               47964 non-null  float64
 8   indicador_rendimento         47964 non-null  float64
 9   nota_saeb_matematica         47964 non-null  float64
 10  nota_saeb_lingua_portuguesa  47964 non-null  float64
 11  nota_saeb_media_padronizada  47964 non-null  float64
 12  ideb                         47964 non-null  float64
 13  projecao   

## Pernambuco (PE)

In [72]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_PE = ideb.loc[ideb['sigla_uf'] == "PE"]
df_PE[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,27047.0,27046.0,22429.0,22429.0,25422.0,25417.0
mean,83.700876,0.83509,209.480883,199.255089,4.57349,3.861435
std,11.743282,0.120176,35.02522,38.695747,0.776245,1.030917
min,0.0,0.066818,123.3,110.58,2.199333,0.3
25%,77.1,0.767743,180.94,165.92,4.021494,3.1
50%,86.1,0.860216,208.96,197.5,4.472293,3.8
75%,92.7,0.927549,232.02,226.16,5.02561,4.5
max,100.0,1.0,402.95,367.05,8.850679,8.8


In [73]:
df_PE.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47484 entries, 57189 to 972003
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          47484 non-null  int64  
 1   sigla_uf                     47484 non-null  object 
 2   id_municipio                 47484 non-null  int64  
 3   id_escola                    47484 non-null  int64  
 4   rede                         47484 non-null  object 
 5   ensino                       47484 non-null  object 
 6   anos_escolares               47484 non-null  object 
 7   taxa_aprovacao               27047 non-null  float64
 8   indicador_rendimento         27046 non-null  float64
 9   nota_saeb_matematica         22429 non-null  float64
 10  nota_saeb_lingua_portuguesa  22429 non-null  float64
 11  nota_saeb_media_padronizada  25422 non-null  float64
 12  ideb                         25417 non-null  float64
 13  projecao   

In [74]:
df_PE.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 20437
indicador_rendimento           20438
nota_saeb_matematica           25055
nota_saeb_lingua_portuguesa    25055
nota_saeb_media_padronizada    22062
ideb                           22067
projecao                       14335
dtype: int64

In [75]:
# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
df_PE = df_PE.copy()
# Fill missing values in each numeric column with that column's mean (single vectorized call)
df_PE[colunas] = df_PE[colunas].fillna(df_PE[colunas].mean())


In [76]:
df_PE.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
57189,2005,PE,2600054,26106450,municipal,fundamental,iniciais (1-5),70.7,0.710647,169.63,160.39,4.117825,2.9,
57190,2005,PE,2600054,26106477,municipal,fundamental,finais (6-9),83.700876,0.83509,209.480883,199.255089,4.57349,3.861435,
57191,2005,PE,2600054,26106477,municipal,fundamental,iniciais (1-5),78.3,0.776054,168.91,163.89,4.167718,3.2,
57192,2005,PE,2600054,26106582,estadual,fundamental,finais (6-9),44.7,0.458012,215.53,202.61,3.636003,1.7,
57193,2005,PE,2600054,26106612,estadual,fundamental,finais (6-9),78.8,0.791348,218.86,205.57,3.74049,3.0,


In [77]:
df_PE.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47484 entries, 57189 to 972003
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          47484 non-null  int64  
 1   sigla_uf                     47484 non-null  object 
 2   id_municipio                 47484 non-null  int64  
 3   id_escola                    47484 non-null  int64  
 4   rede                         47484 non-null  object 
 5   ensino                       47484 non-null  object 
 6   anos_escolares               47484 non-null  object 
 7   taxa_aprovacao               47484 non-null  float64
 8   indicador_rendimento         47484 non-null  float64
 9   nota_saeb_matematica         47484 non-null  float64
 10  nota_saeb_lingua_portuguesa  47484 non-null  float64
 11  nota_saeb_media_padronizada  47484 non-null  float64
 12  ideb                         47484 non-null  float64
 13  projecao   

## Santa Catarina (SC)

In [78]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_SC = ideb.loc[ideb['sigla_uf'] == "SC"]
df_SC[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,26084.0,26084.0,20391.0,20391.0,23221.0,23220.0
mean,91.443,0.91311,235.16914,222.398845,5.491073,5.070168
std,7.73565,0.078637,30.990134,33.189777,0.861633,1.030348
min,45.3,0.4106,138.55,119.34,2.310167,1.7
25%,87.7,0.874214,212.99,196.26,4.854579,4.3
50%,93.6,0.935471,237.52,224.66,5.404,5.0
75%,97.2,0.971988,257.17,246.22,6.08169,5.8
max,100.0,1.0,392.05,350.82,9.357981,9.3


In [79]:
df_SC.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 39036 entries, 87301 to 1007099
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          39036 non-null  int64  
 1   sigla_uf                     39036 non-null  object 
 2   id_municipio                 39036 non-null  int64  
 3   id_escola                    39036 non-null  int64  
 4   rede                         39036 non-null  object 
 5   ensino                       39036 non-null  object 
 6   anos_escolares               39036 non-null  object 
 7   taxa_aprovacao               26084 non-null  float64
 8   indicador_rendimento         26084 non-null  float64
 9   nota_saeb_matematica         20391 non-null  float64
 10  nota_saeb_lingua_portuguesa  20391 non-null  float64
 11  nota_saeb_media_padronizada  23221 non-null  float64
 12  ideb                         23220 non-null  float64
 13  projecao  

In [80]:
df_SC.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 12952
indicador_rendimento           12952
nota_saeb_matematica           18645
nota_saeb_lingua_portuguesa    18645
nota_saeb_media_padronizada    15815
ideb                           15816
projecao                       11132
dtype: int64

In [81]:
# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
df_SC = df_SC.copy()
# Fill missing values in each numeric column with that column's mean (single vectorized call)
df_SC[colunas] = df_SC[colunas].fillna(df_SC[colunas].mean())


In [82]:
df_SC.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
87301,2005,SC,4200051,42011272,municipal,fundamental,iniciais (1-5),91.443,0.91311,235.16914,222.398845,5.491073,5.070168,
87302,2005,SC,4200051,42044340,municipal,fundamental,iniciais (1-5),91.443,0.91311,235.16914,222.398845,5.491073,5.070168,
87303,2005,SC,4200051,42044553,estadual,fundamental,finais (6-9),83.9,0.836903,258.26,229.69,4.799228,4.0,
87304,2005,SC,4200051,42044553,estadual,fundamental,iniciais (1-5),91.443,0.91311,235.16914,222.398845,5.491073,5.070168,
87305,2005,SC,4200101,42086671,estadual,fundamental,finais (6-9),91.443,0.91311,235.16914,222.398845,5.491073,5.070168,


In [83]:
df_SC.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 39036 entries, 87301 to 1007099
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          39036 non-null  int64  
 1   sigla_uf                     39036 non-null  object 
 2   id_municipio                 39036 non-null  int64  
 3   id_escola                    39036 non-null  int64  
 4   rede                         39036 non-null  object 
 5   ensino                       39036 non-null  object 
 6   anos_escolares               39036 non-null  object 
 7   taxa_aprovacao               39036 non-null  float64
 8   indicador_rendimento         39036 non-null  float64
 9   nota_saeb_matematica         39036 non-null  float64
 10  nota_saeb_lingua_portuguesa  39036 non-null  float64
 11  nota_saeb_media_padronizada  39036 non-null  float64
 12  ideb                         39036 non-null  float64
 13  projecao  

## Goiás (GO)

In [84]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_GO = ideb.loc[ideb['sigla_uf'] == "GO"]
df_GO[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,20606.0,20606.0,16826.0,16826.0,19076.0,19074.0
mean,90.412938,0.901787,227.599801,218.658569,5.188734,4.715262
std,8.97585,0.094736,32.012301,35.533963,0.818021,1.059904
min,17.8,0.056648,124.54,118.52,2.307,0.2
25%,86.0,0.857172,203.38,189.77,4.577785,4.0
50%,92.8,0.927672,229.2,219.0,5.100996,4.7
75%,97.3,0.973322,249.33,243.955,5.705783,5.4
max,100.0,1.0,393.69,350.48,8.571165,8.6


In [85]:
df_GO.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 32241 entries, 24548 to 931674
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          32241 non-null  int64  
 1   sigla_uf                     32241 non-null  object 
 2   id_municipio                 32241 non-null  int64  
 3   id_escola                    32241 non-null  int64  
 4   rede                         32241 non-null  object 
 5   ensino                       32241 non-null  object 
 6   anos_escolares               32241 non-null  object 
 7   taxa_aprovacao               20606 non-null  float64
 8   indicador_rendimento         20606 non-null  float64
 9   nota_saeb_matematica         16826 non-null  float64
 10  nota_saeb_lingua_portuguesa  16826 non-null  float64
 11  nota_saeb_media_padronizada  19076 non-null  float64
 12  ideb                         19074 non-null  float64
 13  projecao   

In [86]:
df_GO.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 11635
indicador_rendimento           11635
nota_saeb_matematica           15415
nota_saeb_lingua_portuguesa    15415
nota_saeb_media_padronizada    13165
ideb                           13167
projecao                        7939
dtype: int64
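
The raw counts above are easier to compare across columns and states as a share of rows. A minimal sketch, using a small made-up frame in place of `df_GO` (the values are illustrative, not taken from the IDEB file):

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_GO; the values are invented for illustration only
toy = pd.DataFrame({
    "taxa_aprovacao": [80.2, np.nan, 57.6, np.nan],
    "ideb": [3.4, np.nan, np.nan, np.nan],
})

# isnull().mean() gives the fraction of missing rows per column
pct_missing = toy.isnull().mean().mul(100).round(1)
print(pct_missing)
```

For `df_GO` itself, 11635 missing `taxa_aprovacao` values out of 32241 rows is about 36%.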

In [87]:
# Work on an explicit copy so the assignment below does not target a slice of `ideb`
df_GO = df_GO.copy()
# Fill missing values in each indicator column with that column's mean for the state
df_GO[colunas] = df_GO[colunas].fillna(df_GO[colunas].mean())


In [88]:
df_GO.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
24548,2005,GO,5200050,52032094,municipal,fundamental,iniciais (1-5),90.412938,0.901787,227.599801,218.658569,5.188734,4.715262,
24549,2005,GO,5200050,52036499,municipal,fundamental,iniciais (1-5),90.412938,0.901787,227.599801,218.658569,5.188734,4.715262,
24550,2005,GO,5200050,52040127,estadual,fundamental,finais (6-9),80.2,0.796033,235.72,223.19,4.315203,3.4,
24551,2005,GO,5200050,52040127,estadual,fundamental,iniciais (1-5),78.8,0.788,172.86,161.91,4.206981,3.3,
24552,2005,GO,5200050,52074820,municipal,fundamental,finais (6-9),57.6,0.483697,248.86,232.69,4.69252,2.3,
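
The first two rows above were entirely missing and now carry the state means. One side effect is worth checking: the legend defines `ideb` as the standardized score (N) times the performance indicator (P), and filling each column with its own mean does not generally preserve that product. A small sketch using the values printed above:

```python
# Observed row (index 24550): N, P and IDEB as printed in df_GO.head()
n_obs, p_obs, ideb_obs = 4.315203, 0.796033, 3.4
# Imputed row (index 24548): every value is a column mean
n_imp, p_imp, ideb_imp = 5.188734, 0.901787, 4.715262

# For the observed row, N * P rounds to the published IDEB
print(round(n_obs * p_obs, 1), ideb_obs)
# For the imputed row, N * P differs from the filled-in IDEB
print(round(n_imp * p_imp, 3), round(ideb_imp, 3))
```

If the N x P relationship matters downstream, one option is to impute N and P and recompute `ideb` from them, rather than filling `ideb` independently.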


In [89]:
df_GO.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 32241 entries, 24548 to 931674
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          32241 non-null  int64  
 1   sigla_uf                     32241 non-null  object 
 2   id_municipio                 32241 non-null  int64  
 3   id_escola                    32241 non-null  int64  
 4   rede                         32241 non-null  object 
 5   ensino                       32241 non-null  object 
 6   anos_escolares               32241 non-null  object 
 7   taxa_aprovacao               32241 non-null  float64
 8   indicador_rendimento         32241 non-null  float64
 9   nota_saeb_matematica         32241 non-null  float64
 10  nota_saeb_lingua_portuguesa  32241 non-null  float64
 11  nota_saeb_media_padronizada  32241 non-null  float64
 12  ideb                         32241 non-null  float64
 13  projecao   
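
The filter-and-fill steps are repeated for each state in the sections that follow. A hypothetical helper (the name `imputar_media_uf` is not from the notebook) could wrap them once. The toy frame below stands in for `ideb`:

```python
import numpy as np
import pandas as pd

def imputar_media_uf(df, uf, colunas):
    """Return the rows of one state with NaNs in `colunas` replaced by that state's column means."""
    sub = df.loc[df["sigla_uf"] == uf].copy()  # explicit copy, so `df` is left untouched
    sub[colunas] = sub[colunas].fillna(sub[colunas].mean())
    return sub

# Toy data, invented for illustration
toy = pd.DataFrame({
    "sigla_uf": ["GO", "GO", "GO", "PI"],
    "ideb": [3.0, np.nan, 5.0, 2.0],
})
df_go = imputar_media_uf(toy, "GO", ["ideb"])
print(df_go)
```

With such a helper, each state section would reduce to a single call, for example `imputar_media_uf(ideb, "PI", colunas)`.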

## Piauí (PI)

In [90]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_PI = ideb.loc[ideb['sigla_uf'] == "PI"]
df_PI[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,14622.0,14622.0,11415.0,11415.0,12938.0,12936.0
mean,82.865873,0.823459,212.737974,202.835714,4.637174,3.878254
std,12.304731,0.129939,36.100597,38.687646,0.848664,1.124641
min,26.9,0.1538,115.48,110.01,2.149948,0.7
25%,75.0,0.742949,183.015,169.86,4.054657,3.1
50%,84.6,0.844801,212.91,201.77,4.528833,3.7
75%,92.6,0.927118,239.09,232.855,5.073932,4.5
max,100.0,1.0,429.88,372.26,8.68919,8.7


In [91]:
df_PI.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 30117 entries, 62196 to 975720
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          30117 non-null  int64  
 1   sigla_uf                     30117 non-null  object 
 2   id_municipio                 30117 non-null  int64  
 3   id_escola                    30117 non-null  int64  
 4   rede                         30117 non-null  object 
 5   ensino                       30117 non-null  object 
 6   anos_escolares               30117 non-null  object 
 7   taxa_aprovacao               14622 non-null  float64
 8   indicador_rendimento         14622 non-null  float64
 9   nota_saeb_matematica         11415 non-null  float64
 10  nota_saeb_lingua_portuguesa  11415 non-null  float64
 11  nota_saeb_media_padronizada  12938 non-null  float64
 12  ideb                         12936 non-null  float64
 13  projecao   

In [92]:
df_PI.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 15495
indicador_rendimento           15495
nota_saeb_matematica           18702
nota_saeb_lingua_portuguesa    18702
nota_saeb_media_padronizada    17179
ideb                           17181
projecao                       11993
dtype: int64

In [93]:
# Work on an explicit copy so the assignment below does not target a slice of `ideb`
df_PI = df_PI.copy()
# Fill missing values in each indicator column with that column's mean for the state
df_PI[colunas] = df_PI[colunas].fillna(df_PI[colunas].mean())


In [94]:
df_PI.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
62196,2005,PI,2200053,22095942,municipal,fundamental,finais (6-9),82.865873,0.823459,212.737974,202.835714,4.637174,3.878254,
62197,2005,PI,2200053,22095942,municipal,fundamental,iniciais (1-5),82.865873,0.823459,212.737974,202.835714,4.637174,3.878254,
62198,2005,PI,2200053,22096620,municipal,fundamental,iniciais (1-5),82.865873,0.823459,212.737974,202.835714,4.637174,3.878254,
62199,2005,PI,2200053,22097236,municipal,fundamental,iniciais (1-5),82.865873,0.823459,212.737974,202.835714,4.637174,3.878254,
62200,2005,PI,2200053,22098232,municipal,fundamental,finais (6-9),82.865873,0.823459,212.737974,202.835714,4.637174,3.878254,
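
All five rows shown above are fully imputed, so after the fill they cannot be told apart from schools that genuinely sit at the state mean. One option, not used in the notebook, is to record a flag before filling. The `_imputado` suffix and the toy values are assumptions for illustration:

```python
import numpy as np
import pandas as pd

cols = ["ideb", "taxa_aprovacao"]
toy = pd.DataFrame({
    "ideb": [3.1, np.nan, 4.5],
    "taxa_aprovacao": [np.nan, np.nan, 92.6],
})

# Record which cells were missing before they are overwritten
flags = toy[cols].isna().add_suffix("_imputado")
filled = toy[cols].fillna(toy[cols].mean())
toy_flagged = pd.concat([filled, flags], axis=1)
print(toy_flagged)
```

The flag columns let a later model or plot separate observed values from filled ones.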


In [95]:
df_PI.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 30117 entries, 62196 to 975720
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          30117 non-null  int64  
 1   sigla_uf                     30117 non-null  object 
 2   id_municipio                 30117 non-null  int64  
 3   id_escola                    30117 non-null  int64  
 4   rede                         30117 non-null  object 
 5   ensino                       30117 non-null  object 
 6   anos_escolares               30117 non-null  object 
 7   taxa_aprovacao               30117 non-null  float64
 8   indicador_rendimento         30117 non-null  float64
 9   nota_saeb_matematica         30117 non-null  float64
 10  nota_saeb_lingua_portuguesa  30117 non-null  float64
 11  nota_saeb_media_padronizada  30117 non-null  float64
 12  ideb                         30117 non-null  float64
 13  projecao   

## Paraíba (PB)

In [96]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_PB = ideb.loc[ideb['sigla_uf'] == "PB"]
df_PB[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,15509.0,15507.0,12021.0,12021.0,13599.0,13593.0
mean,79.580895,0.792569,207.100494,197.133796,4.540713,3.650578
std,12.496804,0.128873,31.602339,36.070468,0.724455,0.994294
min,0.0,0.077318,123.94,118.02,2.245,0.3
25%,71.6,0.711291,181.55,166.52,4.019728,2.9
50%,81.4,0.811823,207.02,194.55,4.460833,3.6
75%,89.1,0.890469,230.0,223.8,4.99063,4.3
max,100.0,1.0,379.07,347.22,7.710936,7.7


In [97]:
df_PB.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27693 entries, 54267 to 966189
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          27693 non-null  int64  
 1   sigla_uf                     27693 non-null  object 
 2   id_municipio                 27693 non-null  int64  
 3   id_escola                    27693 non-null  int64  
 4   rede                         27693 non-null  object 
 5   ensino                       27693 non-null  object 
 6   anos_escolares               27693 non-null  object 
 7   taxa_aprovacao               15509 non-null  float64
 8   indicador_rendimento         15507 non-null  float64
 9   nota_saeb_matematica         12021 non-null  float64
 10  nota_saeb_lingua_portuguesa  12021 non-null  float64
 11  nota_saeb_media_padronizada  13599 non-null  float64
 12  ideb                         13593 non-null  float64
 13  projecao   

In [98]:
df_PB.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                 12184
indicador_rendimento           12186
nota_saeb_matematica           15672
nota_saeb_lingua_portuguesa    15672
nota_saeb_media_padronizada    14094
ideb                           14100
projecao                        9315
dtype: int64

In [99]:
# Work on an explicit copy so the assignment below does not target a slice of `ideb`
df_PB = df_PB.copy()
# Fill missing values in each indicator column with that column's mean for the state
df_PB[colunas] = df_PB[colunas].fillna(df_PB[colunas].mean())


In [100]:
df_PB.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
54267,2005,PB,2500106,25033158,municipal,fundamental,iniciais (1-5),79.580895,0.792569,207.100494,197.133796,4.540713,3.650578,
54268,2005,PB,2500106,25033204,estadual,fundamental,finais (6-9),68.3,0.66942,223.02,214.64,3.961368,2.7,
54269,2005,PB,2500106,25033212,estadual,fundamental,finais (6-9),79.580895,0.792569,207.100494,197.133796,4.540713,3.650578,
54270,2005,PB,2500106,25033263,estadual,fundamental,finais (6-9),79.580895,0.792569,207.100494,197.133796,4.540713,3.650578,
54271,2005,PB,2500106,25033557,municipal,fundamental,finais (6-9),79.580895,0.792569,207.100494,197.133796,4.540713,3.650578,
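
Mean imputation leaves each column's mean unchanged but shrinks its standard deviation, because every filled value sits exactly at the centre. In PB about 44% of the `taxa_aprovacao` values (12184 of 27693) were filled, so the effect is not negligible. A toy check of the property:

```python
import numpy as np
import pandas as pd

# Toy series with half of the values missing (invented numbers)
s = pd.Series([60.0, 80.0, 100.0, np.nan, np.nan, np.nan])
filled = s.fillna(s.mean())

print(s.mean(), filled.mean())  # the mean is preserved
print(s.std(), filled.std())    # the spread is reduced
```

Re-running `describe()` after the fill and comparing `std` with the earlier table would show the same shrinkage on the real data.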


In [101]:
df_PB.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27693 entries, 54267 to 966189
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          27693 non-null  int64  
 1   sigla_uf                     27693 non-null  object 
 2   id_municipio                 27693 non-null  int64  
 3   id_escola                    27693 non-null  int64  
 4   rede                         27693 non-null  object 
 5   ensino                       27693 non-null  object 
 6   anos_escolares               27693 non-null  object 
 7   taxa_aprovacao               27693 non-null  float64
 8   indicador_rendimento         27693 non-null  float64
 9   nota_saeb_matematica         27693 non-null  float64
 10  nota_saeb_lingua_portuguesa  27693 non-null  float64
 11  nota_saeb_media_padronizada  27693 non-null  float64
 12  ideb                         27693 non-null  float64
 13  projecao   

## Rio Grande do Norte (RN)

In [102]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_RN = ideb.loc[ideb['sigla_uf'] == "RN"]
df_RN[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,12883.0,12883.0,9374.0,9374.0,10771.0,10760.0
mean,77.907001,0.775693,200.462087,190.250627,4.428807,3.500678
std,12.682639,0.132176,33.478743,37.138807,0.722175,0.981105
min,7.7,0.077,117.54,103.66,2.035167,0.5
25%,69.8,0.694052,172.645,158.3,3.908061,2.8
50%,79.6,0.793419,197.86,185.835,4.364167,3.4
75%,87.5,0.876149,226.69,220.1525,4.870448,4.1
max,100.0,1.0,412.04,365.66,8.115654,8.1


In [103]:
df_RN.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21585 entries, 76109 to 991894
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          21585 non-null  int64  
 1   sigla_uf                     21585 non-null  object 
 2   id_municipio                 21585 non-null  int64  
 3   id_escola                    21585 non-null  int64  
 4   rede                         21585 non-null  object 
 5   ensino                       21585 non-null  object 
 6   anos_escolares               21585 non-null  object 
 7   taxa_aprovacao               12883 non-null  float64
 8   indicador_rendimento         12883 non-null  float64
 9   nota_saeb_matematica         9374 non-null   float64
 10  nota_saeb_lingua_portuguesa  9374 non-null   float64
 11  nota_saeb_media_padronizada  10771 non-null  float64
 12  ideb                         10760 non-null  float64
 13  projecao   

In [104]:
df_RN.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                  8702
indicador_rendimento            8702
nota_saeb_matematica           12211
nota_saeb_lingua_portuguesa    12211
nota_saeb_media_padronizada    10814
ideb                           10825
projecao                        6913
dtype: int64

In [105]:
# Work on an explicit copy so the assignment below does not target a slice of `ideb`
df_RN = df_RN.copy()
# Fill missing values in each indicator column with that column's mean for the state
df_RN[colunas] = df_RN[colunas].fillna(df_RN[colunas].mean())


In [106]:
df_RN.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
76109,2005,RN,2400109,24034410,estadual,fundamental,finais (6-9),86.3,0.862523,239.17,223.52,4.378481,3.8,
76110,2005,RN,2400109,24034444,estadual,fundamental,finais (6-9),77.907001,0.775693,200.462087,190.250627,4.428807,3.500678,
76111,2005,RN,2400109,24034444,estadual,fundamental,iniciais (1-5),85.8,0.882302,182.19,161.44,4.376364,3.9,
76112,2005,RN,2400109,24034495,municipal,fundamental,iniciais (1-5),77.907001,0.775693,200.462087,190.250627,4.428807,3.500678,
76113,2005,RN,2400109,24034517,municipal,fundamental,iniciais (1-5),77.907001,0.775693,200.462087,190.250627,4.428807,3.500678,
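
Rows in which every indicator was missing, such as 76110 and 76112 above, become copies of the state mean and carry no school-specific information. A sketch of how such rows could be counted or dropped before imputing, on invented toy data:

```python
import numpy as np
import pandas as pd

cols = ["taxa_aprovacao", "ideb"]
toy = pd.DataFrame({
    "id_escola": [1, 2, 3],
    "taxa_aprovacao": [86.3, np.nan, np.nan],
    "ideb": [3.8, np.nan, 3.9],
})

# Rows where every indicator column is NaN
all_missing = toy[cols].isna().all(axis=1)
print(all_missing.sum())

# Keep only rows with at least one observed indicator
kept = toy.dropna(subset=cols, how="all")
print(kept)
```

Whether to drop or keep these rows is an analysis decision; the sketch only shows how to identify them.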


In [107]:
df_RN.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21585 entries, 76109 to 991894
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          21585 non-null  int64  
 1   sigla_uf                     21585 non-null  object 
 2   id_municipio                 21585 non-null  int64  
 3   id_escola                    21585 non-null  int64  
 4   rede                         21585 non-null  object 
 5   ensino                       21585 non-null  object 
 6   anos_escolares               21585 non-null  object 
 7   taxa_aprovacao               21585 non-null  float64
 8   indicador_rendimento         21585 non-null  float64
 9   nota_saeb_matematica         21585 non-null  float64
 10  nota_saeb_lingua_portuguesa  21585 non-null  float64
 11  nota_saeb_media_padronizada  21585 non-null  float64
 12  ideb                         21585 non-null  float64
 13  projecao   

## Mato Grosso (MT)

In [108]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_MT = ideb.loc[ideb['sigla_uf'] == "MT"]
df_MT[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,12661.0,12659.0,9587.0,9587.0,10887.0,10882.0
mean,91.787221,0.916747,217.815153,207.731237,4.895617,4.550983
std,10.481249,0.105517,29.567666,32.589463,0.733381,0.950302
min,0.0,0.146802,127.16,116.05,2.33602,0.6
25%,88.7,0.884544,194.745,180.245,4.377986,3.9
50%,95.5,0.954341,218.44,207.5,4.811,4.5
75%,99.2,0.991736,239.19,232.825,5.358165,5.2
max,100.0,1.0,368.54,342.39,8.087271,8.1
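
The per-state `describe()` tables can also be placed side by side with a single grouped aggregation. A sketch on a toy frame standing in for `ideb` (values invented):

```python
import pandas as pd

toy = pd.DataFrame({
    "sigla_uf": ["GO", "GO", "MT", "MT"],
    "ideb": [4.0, 5.0, 4.5, 5.5],
})

# One row per state, one column per statistic
resumo = toy.groupby("sigla_uf")["ideb"].agg(["count", "mean", "min", "max"])
print(resumo)
```

On the real data this would make differences such as the mean `ideb` of 4.72 in GO versus 4.55 in MT visible in one table.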


In [109]:
df_MT.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20739 entries, 46305 to 956345
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          20739 non-null  int64  
 1   sigla_uf                     20739 non-null  object 
 2   id_municipio                 20739 non-null  int64  
 3   id_escola                    20739 non-null  int64  
 4   rede                         20739 non-null  object 
 5   ensino                       20739 non-null  object 
 6   anos_escolares               20739 non-null  object 
 7   taxa_aprovacao               12661 non-null  float64
 8   indicador_rendimento         12659 non-null  float64
 9   nota_saeb_matematica         9587 non-null   float64
 10  nota_saeb_lingua_portuguesa  9587 non-null   float64
 11  nota_saeb_media_padronizada  10887 non-null  float64
 12  ideb                         10882 non-null  float64
 13  projecao   

In [110]:
df_MT.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                  8078
indicador_rendimento            8080
nota_saeb_matematica           11152
nota_saeb_lingua_portuguesa    11152
nota_saeb_media_padronizada     9852
ideb                            9857
projecao                        6261
dtype: int64

In [111]:
# Work on an explicit copy so the assignment below does not target a slice of `ideb`
df_MT = df_MT.copy()
# Fill missing values in each indicator column with that column's mean for the state
df_MT[colunas] = df_MT[colunas].fillna(df_MT[colunas].mean())


In [112]:
df_MT.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
46305,2005,MT,5100102,51034000,estadual,fundamental,iniciais (1-5),94.2,0.939721,191.47,186.94,5.017266,4.7,
46306,2005,MT,5100102,51034026,estadual,fundamental,finais (6-9),91.8,0.914931,223.07,203.05,3.768631,3.4,
46307,2005,MT,5100102,51034050,estadual,fundamental,finais (6-9),91.787221,0.916747,217.815153,207.731237,4.895617,4.550983,
46308,2005,MT,5100102,51034050,estadual,fundamental,iniciais (1-5),91.787221,0.916747,217.815153,207.731237,4.895617,4.550983,
46309,2005,MT,5100102,51034077,estadual,fundamental,finais (6-9),91.787221,0.916747,217.815153,207.731237,4.895617,4.550983,


In [113]:
df_MT.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20739 entries, 46305 to 956345
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          20739 non-null  int64  
 1   sigla_uf                     20739 non-null  object 
 2   id_municipio                 20739 non-null  int64  
 3   id_escola                    20739 non-null  int64  
 4   rede                         20739 non-null  object 
 5   ensino                       20739 non-null  object 
 6   anos_escolares               20739 non-null  object 
 7   taxa_aprovacao               20739 non-null  float64
 8   indicador_rendimento         20739 non-null  float64
 9   nota_saeb_matematica         20739 non-null  float64
 10  nota_saeb_lingua_portuguesa  20739 non-null  float64
 11  nota_saeb_media_padronizada  20739 non-null  float64
 12  ideb                         20739 non-null  float64
 13  projecao   
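
The same per-state mean fill can be applied to every state in one pass with `groupby(...).transform("mean")`, which returns each row's own state mean. A sketch on toy data (values invented):

```python
import numpy as np
import pandas as pd

cols = ["ideb"]
toy = pd.DataFrame({
    "sigla_uf": ["MT", "MT", "AL", "AL"],
    "ideb": [4.0, np.nan, 3.0, np.nan],
})

# For each row, the mean of that row's state (NaNs are skipped)
medias_uf = toy.groupby("sigla_uf")[cols].transform("mean")
# fillna aligns on index and columns, so each NaN gets its own state's mean
toy[cols] = toy[cols].fillna(medias_uf)
print(toy)
```

This should produce the same filled values as the per-state cells, while keeping all states in a single frame.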

## Alagoas (AL)

In [114]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_AL = ideb.loc[ideb['sigla_uf'] == "AL"]
df_AL[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,10982.0,10976.0,9068.0,9068.0,10319.0,10277.0
mean,80.952304,0.80766,200.293973,189.218411,4.371396,3.595962
std,13.889768,0.141123,34.919665,38.04017,0.833424,1.13971
min,0.0,0.173977,117.26,114.4,2.136833,0.9
25%,72.6,0.72296,171.235,156.19,3.809855,2.8
50%,83.3,0.832709,198.98,185.595,4.234008,3.5
75%,91.7,0.918379,225.03,217.225,4.780639,4.3
max,100.0,1.0,373.22,334.42,9.907636,9.9
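
Two values in this table stand out: a minimum `taxa_aprovacao` of 0.0 and a maximum `ideb` of 9.9, far above the 75th percentile of 4.3. Since extreme values shift the mean used for imputation, it may be worth inspecting them first. A sketch on toy data; the thresholds are hypothetical and chosen only for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "id_escola": [10, 11, 12],
    "taxa_aprovacao": [0.0, 83.3, 100.0],
    "ideb": [0.9, 3.5, 9.9],
})

# Flag rows with a zero approval rate or an unusually high IDEB
suspeitos = toy[(toy["taxa_aprovacao"] == 0) | (toy["ideb"] > 9)]
print(suspeitos)
```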


In [115]:
df_AL.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19974 entries, 543 to 902829
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          19974 non-null  int64  
 1   sigla_uf                     19974 non-null  object 
 2   id_municipio                 19974 non-null  int64  
 3   id_escola                    19974 non-null  int64  
 4   rede                         19974 non-null  object 
 5   ensino                       19974 non-null  object 
 6   anos_escolares               19974 non-null  object 
 7   taxa_aprovacao               10982 non-null  float64
 8   indicador_rendimento         10976 non-null  float64
 9   nota_saeb_matematica         9068 non-null   float64
 10  nota_saeb_lingua_portuguesa  9068 non-null   float64
 11  nota_saeb_media_padronizada  10319 non-null  float64
 12  ideb                         10277 non-null  float64
 13  projecao     

In [116]:
df_AL.isnull().sum()

ano                                0
sigla_uf                           0
id_municipio                       0
id_escola                          0
rede                               0
ensino                             0
anos_escolares                     0
taxa_aprovacao                  8992
indicador_rendimento            8998
nota_saeb_matematica           10906
nota_saeb_lingua_portuguesa    10906
nota_saeb_media_padronizada     9655
ideb                            9697
projecao                        6279
dtype: int64

In [117]:
# Work on an explicit copy so the assignment below does not target a slice of `ideb`
df_AL = df_AL.copy()
# Fill missing values in each indicator column with that column's mean for the state
df_AL[colunas] = df_AL[colunas].fillna(df_AL[colunas].mean())


In [118]:
df_AL.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
543,2005,AL,2700102,27000095,estadual,fundamental,finais (6-9),68.2,0.673814,220.53,202.46,3.716809,2.5,
544,2005,AL,2700102,27000281,municipal,fundamental,iniciais (1-5),55.5,0.578396,166.19,153.02,3.917922,2.3,
545,2005,AL,2700102,27000400,municipal,fundamental,iniciais (1-5),68.6,0.683689,168.21,146.77,3.842855,2.6,
546,2005,AL,2700102,27000435,municipal,fundamental,finais (6-9),80.952304,0.80766,200.293973,189.218411,4.371396,3.595962,
547,2005,AL,2700102,27000435,municipal,fundamental,iniciais (1-5),80.952304,0.80766,200.293973,189.218411,4.371396,3.595962,


In [119]:
df_AL.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19974 entries, 543 to 902829
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          19974 non-null  int64  
 1   sigla_uf                     19974 non-null  object 
 2   id_municipio                 19974 non-null  int64  
 3   id_escola                    19974 non-null  int64  
 4   rede                         19974 non-null  object 
 5   ensino                       19974 non-null  object 
 6   anos_escolares               19974 non-null  object 
 7   taxa_aprovacao               19974 non-null  float64
 8   indicador_rendimento         19974 non-null  float64
 9   nota_saeb_matematica         19974 non-null  float64
 10  nota_saeb_lingua_portuguesa  19974 non-null  float64
 11  nota_saeb_media_padronizada  19974 non-null  float64
 12  ideb                         19974 non-null  float64
 13  projecao     

## Espírito Santo (ES)

In [120]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_ES = ideb.loc[ideb['sigla_uf'] == "ES"]
df_ES[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,12097.0,12096.0,9706.0,9706.0,11097.0,11096.0
mean,86.750178,0.865518,232.233401,220.096083,5.312806,4.639492
std,9.715819,0.099228,33.266123,35.075394,0.795813,1.031529
min,0.0,0.109673,140.91,129.41,2.405,0.5
25%,81.4,0.810988,207.75,191.61,4.737207,3.9
50%,89.1,0.888858,231.825,219.38,5.257167,4.6
75%,94.1,0.94024,254.4175,244.97,5.843395,5.4
max,100.0,1.0,403.19,363.45,8.391974,8.4


In [121]:
df_ES.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18351 entries, 22627 to 927619
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          18351 non-null  int64  
 1   sigla_uf                     18351 non-null  object 
 2   id_municipio                 18351 non-null  int64  
 3   id_escola                    18351 non-null  int64  
 4   rede                         18351 non-null  object 
 5   ensino                       18351 non-null  object 
 6   anos_escolares               18351 non-null  object 
 7   taxa_aprovacao               12097 non-null  float64
 8   indicador_rendimento         12096 non-null  float64
 9   nota_saeb_matematica         9706 non-null   float64
 10  nota_saeb_lingua_portuguesa  9706 non-null   float64
 11  nota_saeb_media_padronizada  11097 non-null  float64
 12  ideb                         11096 non-null  float64
 13  projecao   

In [122]:
df_ES.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 6254
indicador_rendimento           6255
nota_saeb_matematica           8645
nota_saeb_lingua_portuguesa    8645
nota_saeb_media_padronizada    7254
ideb                           7255
projecao                       5209
dtype: int64

In [123]:
# Work on an explicit copy so the assignment below does not target a slice of `ideb`
df_ES = df_ES.copy()
# Fill missing values in each indicator column with that column's mean for the state
df_ES[colunas] = df_ES[colunas].fillna(df_ES[colunas].mean())


In [124]:
df_ES.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
22627,2005,ES,3200102,32025793,municipal,fundamental,finais (6-9),86.750178,0.865518,232.233401,220.096083,5.312806,4.639492,
22628,2005,ES,3200102,32025793,municipal,fundamental,iniciais (1-5),86.750178,0.865518,232.233401,220.096083,5.312806,4.639492,
22629,2005,ES,3200102,32025858,municipal,fundamental,finais (6-9),86.4,0.870698,277.57,254.16,5.528984,4.8,
22630,2005,ES,3200102,32025858,municipal,fundamental,iniciais (1-5),88.6,0.866511,213.94,207.49,5.819302,5.0,
22631,2005,ES,3200102,32025866,municipal,fundamental,iniciais (1-5),89.9,0.890985,202.07,197.99,5.420212,4.8,


In [125]:
df_ES.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18351 entries, 22627 to 927619
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          18351 non-null  int64  
 1   sigla_uf                     18351 non-null  object 
 2   id_municipio                 18351 non-null  int64  
 3   id_escola                    18351 non-null  int64  
 4   rede                         18351 non-null  object 
 5   ensino                       18351 non-null  object 
 6   anos_escolares               18351 non-null  object 
 7   taxa_aprovacao               18351 non-null  float64
 8   indicador_rendimento         18351 non-null  float64
 9   nota_saeb_matematica         18351 non-null  float64
 10  nota_saeb_lingua_portuguesa  18351 non-null  float64
 11  nota_saeb_media_padronizada  18351 non-null  float64
 12  ideb                         18351 non-null  float64
 13  projecao   

## Sergipe (SE)

In [126]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_SE = ideb.loc[ideb['sigla_uf'] == "SE"]
df_SE[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,8240.0,8237.0,6373.0,6373.0,7300.0,7297.0
mean,75.373507,0.750451,205.609454,194.214623,4.510797,3.427381
std,13.081098,0.133586,31.694792,35.982036,0.629691,0.895791
min,0.0,0.08676,124.61,116.61,2.177833,0.3
25%,67.1,0.665679,179.47,163.6,4.065154,2.8
50%,76.7,0.766028,201.41,187.19,4.457063,3.4
75%,85.2,0.850353,230.8,223.27,4.898941,4.0
max,100.0,1.0,365.54,339.32,8.032638,7.6


In [127]:
df_SE.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14583 entries, 91386 to 1008858
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          14583 non-null  int64  
 1   sigla_uf                     14583 non-null  object 
 2   id_municipio                 14583 non-null  int64  
 3   id_escola                    14583 non-null  int64  
 4   rede                         14583 non-null  object 
 5   ensino                       14583 non-null  object 
 6   anos_escolares               14583 non-null  object 
 7   taxa_aprovacao               8240 non-null   float64
 8   indicador_rendimento         8237 non-null   float64
 9   nota_saeb_matematica         6373 non-null   float64
 10  nota_saeb_lingua_portuguesa  6373 non-null   float64
 11  nota_saeb_media_padronizada  7300 non-null   float64
 12  ideb                         7297 non-null   float64
 13  projecao  

In [128]:
df_SE.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 6343
indicador_rendimento           6346
nota_saeb_matematica           8210
nota_saeb_lingua_portuguesa    8210
nota_saeb_media_padronizada    7283
ideb                           7286
projecao                       5111
dtype: int64
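
The raw counts above are easier to judge relative to the 14,583 rows in `df_SE`: for example, 6,343 missing values in `taxa_aprovacao` is roughly 43.5% of the rows, and 8,210 in `nota_saeb_matematica` is roughly 56.3%. A minimal sketch of how the same check could be expressed as percentages, using a small made-up frame (`exemplo` and its values are illustrative assumptions, not data from the notebook):

```python
import numpy as np
import pandas as pd

# Toy stand-in for one state's slice of the IDEB table (values are made up).
exemplo = pd.DataFrame({
    "taxa_aprovacao": [64.3, np.nan, 76.6, np.nan],
    "ideb": [2.5, np.nan, 3.4, 4.0],
})

# isnull() yields booleans; their mean is the fraction of missing rows per column.
pct_faltantes = (exemplo.isnull().mean() * 100).round(1)
print(pct_faltantes)
```

On the real data the same expression would be applied to `df_SE` instead of `exemplo`.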

In [129]:
# Fill missing values in each metric column with that column's mean for the state.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_SE = df_SE.copy()
df_SE[colunas] = df_SE[colunas].fillna(df_SE[colunas].mean())


In [130]:
df_SE.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
91386,2005,SE,2800100,28012674,municipal,fundamental,finais (6-9),75.373507,0.750451,205.609454,194.214623,4.510797,3.427381,
91387,2005,SE,2800100,28012674,municipal,fundamental,iniciais (1-5),64.3,0.645764,165.05,149.72,3.836221,2.5,
91388,2005,SE,2800100,28012712,municipal,fundamental,iniciais (1-5),75.373507,0.750451,205.609454,194.214623,4.510797,3.427381,
91389,2005,SE,2800209,28005163,estadual,fundamental,finais (6-9),75.373507,0.750451,205.609454,194.214623,4.510797,3.427381,
91390,2005,SE,2800209,28005198,estadual,fundamental,iniciais (1-5),76.6,0.77447,173.83,168.66,4.348212,3.4,


In [131]:
df_SE.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14583 entries, 91386 to 1008858
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          14583 non-null  int64  
 1   sigla_uf                     14583 non-null  object 
 2   id_municipio                 14583 non-null  int64  
 3   id_escola                    14583 non-null  int64  
 4   rede                         14583 non-null  object 
 5   ensino                       14583 non-null  object 
 6   anos_escolares               14583 non-null  object 
 7   taxa_aprovacao               14583 non-null  float64
 8   indicador_rendimento         14583 non-null  float64
 9   nota_saeb_matematica         14583 non-null  float64
 10  nota_saeb_lingua_portuguesa  14583 non-null  float64
 11  nota_saeb_media_padronizada  14583 non-null  float64
 12  ideb                         14583 non-null  float64
 13  projecao  
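
The filter, impute, and inspect steps used for Sergipe are repeated unchanged for every state that follows. One way to avoid the repetition would be a small helper; the sketch below assumes a hypothetical function name (`preencher_media_uf`) and a shortened column list, and runs on made-up data rather than the real `ideb` frame.

```python
import numpy as np
import pandas as pd

COLUNAS = ["taxa_aprovacao", "ideb"]  # shortened list, for illustration only


def preencher_media_uf(df, uf, colunas):
    """Return a copy of one state's rows with NaNs replaced by that state's column means."""
    # .copy() makes the result independent of `df`, so assigning to it is safe.
    df_uf = df.loc[df["sigla_uf"] == uf].copy()
    df_uf[colunas] = df_uf[colunas].fillna(df_uf[colunas].mean())
    return df_uf


# Made-up rows standing in for the IDEB table.
ideb_exemplo = pd.DataFrame({
    "sigla_uf": ["SE", "SE", "SE", "MS"],
    "taxa_aprovacao": [60.0, np.nan, 80.0, 90.0],
    "ideb": [2.0, 4.0, np.nan, 5.0],
})

df_SE_exemplo = preencher_media_uf(ideb_exemplo, "SE", COLUNAS)
print(df_SE_exemplo)
```

With the real data, a call such as `preencher_media_uf(ideb, "MS", colunas)` would stand in for the three cells that build and fill `df_MS`.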

## Mato Grosso do Sul (MS)

In [132]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_MS = ideb.loc[ideb['sigla_uf'] == "MS"]
df_MS[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,9777.0,9776.0,7503.0,7503.0,8535.0,8532.0
mean,83.216856,0.830068,227.024212,216.473826,5.271589,4.421249
std,10.796673,0.109808,32.460351,35.391403,0.748358,1.033515
min,0.0,0.340041,133.66,125.62,2.051667,1.5
25%,76.9,0.766818,201.985,187.13,4.746902,3.7
50%,85.3,0.84959,228.3,216.35,5.218685,4.4
75%,91.5,0.914628,251.22,244.605,5.760417,5.2
max,100.0,1.0,421.3,377.79,8.439445,8.3


In [133]:
df_MS.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14148 entries, 44842 to 953742
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          14148 non-null  int64  
 1   sigla_uf                     14148 non-null  object 
 2   id_municipio                 14148 non-null  int64  
 3   id_escola                    14148 non-null  int64  
 4   rede                         14148 non-null  object 
 5   ensino                       14148 non-null  object 
 6   anos_escolares               14148 non-null  object 
 7   taxa_aprovacao               9777 non-null   float64
 8   indicador_rendimento         9776 non-null   float64
 9   nota_saeb_matematica         7503 non-null   float64
 10  nota_saeb_lingua_portuguesa  7503 non-null   float64
 11  nota_saeb_media_padronizada  8535 non-null   float64
 12  ideb                         8532 non-null   float64
 13  projecao   

In [134]:
df_MS.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 4371
indicador_rendimento           4372
nota_saeb_matematica           6645
nota_saeb_lingua_portuguesa    6645
nota_saeb_media_padronizada    5613
ideb                           5616
projecao                       3800
dtype: int64

In [135]:
# Fill missing values in each metric column with that column's mean for the state.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_MS = df_MS.copy()
df_MS[colunas] = df_MS[colunas].fillna(df_MS[colunas].mean())


In [136]:
df_MS.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
44842,2005,MS,5000203,50011774,estadual,fundamental,finais (6-9),60.4,0.610465,230.72,213.08,4.063745,2.5,
44843,2005,MS,5000203,50011774,estadual,fundamental,iniciais (1-5),77.1,0.775434,166.57,160.13,4.054661,3.1,
44844,2005,MS,5000203,50011790,municipal,fundamental,finais (6-9),83.216856,0.830068,227.024212,216.473826,5.271589,4.421249,
44845,2005,MS,5000203,50011790,municipal,fundamental,iniciais (1-5),83.216856,0.830068,227.024212,216.473826,5.271589,4.421249,
44846,2005,MS,5000203,50011804,municipal,fundamental,finais (6-9),90.3,0.913042,246.03,233.38,4.657224,4.3,


In [137]:
df_MS.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14148 entries, 44842 to 953742
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          14148 non-null  int64  
 1   sigla_uf                     14148 non-null  object 
 2   id_municipio                 14148 non-null  int64  
 3   id_escola                    14148 non-null  int64  
 4   rede                         14148 non-null  object 
 5   ensino                       14148 non-null  object 
 6   anos_escolares               14148 non-null  object 
 7   taxa_aprovacao               14148 non-null  float64
 8   indicador_rendimento         14148 non-null  float64
 9   nota_saeb_matematica         14148 non-null  float64
 10  nota_saeb_lingua_portuguesa  14148 non-null  float64
 11  nota_saeb_media_padronizada  14148 non-null  float64
 12  ideb                         14148 non-null  float64
 13  projecao   

## Tocantins (TO)

In [138]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_TO = ideb.loc[ideb['sigla_uf'] == "TO"]
df_TO[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,7278.0,7277.0,5809.0,5809.0,6569.0,6566.0
mean,89.249217,0.891222,215.130924,205.408983,4.753888,4.26724
std,8.197802,0.083374,33.103762,35.98894,0.779547,0.925777
min,0.0,0.438919,134.47,124.25,1.557167,1.1
25%,84.7,0.844949,187.33,174.75,4.209922,3.6
50%,90.9,0.908393,217.08,205.11,4.63943,4.1
75%,95.5,0.955171,238.02,232.69,5.191,4.8
max,100.0,1.0,378.72,352.07,8.102734,8.1
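
Each state's `describe()` table is printed in its own cell, which makes side-by-side comparison awkward. A possible alternative is to summarise all states at once with `groupby`. The sketch below uses made-up rows and a shortened column list; the frame `ideb_exemplo` is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the IDEB table.
ideb_exemplo = pd.DataFrame({
    "sigla_uf": ["SE", "SE", "MS", "MS", "TO"],
    "taxa_aprovacao": [70.0, 80.0, 85.0, np.nan, 90.0],
    "ideb": [3.0, 4.0, 4.5, 4.3, np.nan],
})
colunas = ["taxa_aprovacao", "ideb"]

# One row per state; for each metric, the non-null count and the mean.
resumo = ideb_exemplo.groupby("sigla_uf")[colunas].agg(["count", "mean"])
print(resumo)
```

The result has one row per `sigla_uf` and a two-level column index (metric, statistic), so the state means can be read in a single table.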


In [139]:
df_TO.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 12486 entries, 106015 to 1027766
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          12486 non-null  int64  
 1   sigla_uf                     12486 non-null  object 
 2   id_municipio                 12486 non-null  int64  
 3   id_escola                    12486 non-null  int64  
 4   rede                         12486 non-null  object 
 5   ensino                       12486 non-null  object 
 6   anos_escolares               12486 non-null  object 
 7   taxa_aprovacao               7278 non-null   float64
 8   indicador_rendimento         7277 non-null   float64
 9   nota_saeb_matematica         5809 non-null   float64
 10  nota_saeb_lingua_portuguesa  5809 non-null   float64
 11  nota_saeb_media_padronizada  6569 non-null   float64
 12  ideb                         6566 non-null   float64
 13  projecao 

In [140]:
df_TO.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 5208
indicador_rendimento           5209
nota_saeb_matematica           6677
nota_saeb_lingua_portuguesa    6677
nota_saeb_media_padronizada    5917
ideb                           5920
projecao                       3993
dtype: int64

In [141]:
# Fill missing values in each metric column with that column's mean for the state.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_TO = df_TO.copy()
df_TO[colunas] = df_TO[colunas].fillna(df_TO[colunas].mean())


In [142]:
df_TO.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
106015,2005,TO,1700251,17010535,estadual,fundamental,finais (6-9),89.249217,0.891222,215.130924,205.408983,4.753888,4.26724,
106016,2005,TO,1700251,17010535,estadual,fundamental,iniciais (1-5),89.249217,0.891222,215.130924,205.408983,4.753888,4.26724,
106017,2005,TO,1700251,17038332,municipal,fundamental,iniciais (1-5),89.249217,0.891222,215.130924,205.408983,4.753888,4.26724,
106018,2005,TO,1700301,17004268,estadual,fundamental,finais (6-9),96.2,0.962917,229.86,226.5,4.272641,4.1,
106019,2005,TO,1700301,17004268,estadual,fundamental,iniciais (1-5),75.8,0.748422,183.88,167.5,4.519059,3.4,


In [143]:
df_TO.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 12486 entries, 106015 to 1027766
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          12486 non-null  int64  
 1   sigla_uf                     12486 non-null  object 
 2   id_municipio                 12486 non-null  int64  
 3   id_escola                    12486 non-null  int64  
 4   rede                         12486 non-null  object 
 5   ensino                       12486 non-null  object 
 6   anos_escolares               12486 non-null  object 
 7   taxa_aprovacao               12486 non-null  float64
 8   indicador_rendimento         12486 non-null  float64
 9   nota_saeb_matematica         12486 non-null  float64
 10  nota_saeb_lingua_portuguesa  12486 non-null  float64
 11  nota_saeb_media_padronizada  12486 non-null  float64
 12  ideb                         12486 non-null  float64
 13  projecao 

## Rondônia (RO)

In [144]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_RO = ideb.loc[ideb['sigla_uf'] == "RO"]
df_RO[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,6436.0,6435.0,5147.0,5147.0,5858.0,5856.0
mean,86.343816,0.862094,222.44641,210.827773,5.041949,4.388029
std,10.645021,0.107518,31.605073,35.298214,0.734302,1.00327
min,0.0,0.229768,144.05,125.8,2.531167,1.0
25%,81.0,0.807171,196.355,179.84,4.49975,3.7
50%,88.6,0.884658,224.44,211.27,4.950064,4.3
75%,94.4,0.943878,245.87,238.065,5.489318,5.1
max,100.0,1.0,360.42,344.58,8.242959,8.2


In [145]:
df_RO.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10698 entries, 78392 to 993214
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          10698 non-null  int64  
 1   sigla_uf                     10698 non-null  object 
 2   id_municipio                 10698 non-null  int64  
 3   id_escola                    10698 non-null  int64  
 4   rede                         10698 non-null  object 
 5   ensino                       10698 non-null  object 
 6   anos_escolares               10698 non-null  object 
 7   taxa_aprovacao               6436 non-null   float64
 8   indicador_rendimento         6435 non-null   float64
 9   nota_saeb_matematica         5147 non-null   float64
 10  nota_saeb_lingua_portuguesa  5147 non-null   float64
 11  nota_saeb_media_padronizada  5858 non-null   float64
 12  ideb                         5856 non-null   float64
 13  projecao   

In [146]:
df_RO.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 4262
indicador_rendimento           4263
nota_saeb_matematica           5551
nota_saeb_lingua_portuguesa    5551
nota_saeb_media_padronizada    4840
ideb                           4842
projecao                       3296
dtype: int64

In [147]:
# Fill missing values in each metric column with that column's mean for the state.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_RO = df_RO.copy()
df_RO[colunas] = df_RO[colunas].fillna(df_RO[colunas].mean())


In [148]:
df_RO.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
78392,2005,RO,1100015,11024666,municipal,fundamental,finais (6-9),86.343816,0.862094,222.44641,210.827773,5.041949,4.388029,
78393,2005,RO,1100015,11024666,municipal,fundamental,iniciais (1-5),86.343816,0.862094,222.44641,210.827773,5.041949,4.388029,
78394,2005,RO,1100015,11024682,estadual,fundamental,finais (6-9),81.0,0.798345,260.01,231.66,4.861337,3.9,
78395,2005,RO,1100015,11024682,estadual,fundamental,iniciais (1-5),93.1,0.943084,181.55,164.82,4.425578,4.2,
78396,2005,RO,1100015,11024828,municipal,fundamental,finais (6-9),86.343816,0.862094,222.44641,210.827773,5.041949,4.388029,
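
The same per-state mean imputation could also be done in one pass over the full table instead of one slice per state. This is a sketch of that alternative, not the approach the notebook takes; `ideb_exemplo` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the IDEB table.
ideb_exemplo = pd.DataFrame({
    "sigla_uf": ["RO", "RO", "RO", "DF", "DF"],
    "ideb": [3.9, 4.2, np.nan, 4.4, np.nan],
})
colunas = ["ideb"]

# transform("mean") returns, for every row, the mean of that row's state.
medias_uf = ideb_exemplo.groupby("sigla_uf")[colunas].transform("mean")

# Fill each gap with the matching state mean, leaving the original frame untouched.
ideb_preenchido = ideb_exemplo.copy()
ideb_preenchido[colunas] = ideb_exemplo[colunas].fillna(medias_uf)
print(ideb_preenchido)
```

Because `medias_uf` shares the index of the original frame, `fillna` aligns row by row and each missing value receives the mean of its own state.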


In [149]:
df_RO.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10698 entries, 78392 to 993214
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          10698 non-null  int64  
 1   sigla_uf                     10698 non-null  object 
 2   id_municipio                 10698 non-null  int64  
 3   id_escola                    10698 non-null  int64  
 4   rede                         10698 non-null  object 
 5   ensino                       10698 non-null  object 
 6   anos_escolares               10698 non-null  object 
 7   taxa_aprovacao               10698 non-null  float64
 8   indicador_rendimento         10698 non-null  float64
 9   nota_saeb_matematica         10698 non-null  float64
 10  nota_saeb_lingua_portuguesa  10698 non-null  float64
 11  nota_saeb_media_padronizada  10698 non-null  float64
 12  ideb                         10698 non-null  float64
 13  projecao   

## Distrito Federal (DF)

In [150]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_DF = ideb.loc[ideb['sigla_uf'] == "DF"]
df_DF[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,4153.0,4153.0,3271.0,3271.0,3774.0,3772.0
mean,87.057091,0.868731,232.102819,219.576169,5.633365,4.962301
std,9.427406,0.097942,24.488177,26.958402,0.75663,1.054787
min,21.6,0.183884,168.22,144.63,2.696667,1.6
25%,82.8,0.82622,215.04,198.775,5.073506,4.2
50%,89.8,0.896121,230.78,216.35,5.599142,5.0
75%,93.8,0.938642,246.85,236.525,6.158804,5.7
max,100.0,1.0,382.6,348.66,7.993078,7.9


In [151]:
df_DF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6105 entries, 21994 to 925344
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          6105 non-null   int64  
 1   sigla_uf                     6105 non-null   object 
 2   id_municipio                 6105 non-null   int64  
 3   id_escola                    6105 non-null   int64  
 4   rede                         6105 non-null   object 
 5   ensino                       6105 non-null   object 
 6   anos_escolares               6105 non-null   object 
 7   taxa_aprovacao               4153 non-null   float64
 8   indicador_rendimento         4153 non-null   float64
 9   nota_saeb_matematica         3271 non-null   float64
 10  nota_saeb_lingua_portuguesa  3271 non-null   float64
 11  nota_saeb_media_padronizada  3774 non-null   float64
 12  ideb                         3772 non-null   float64
 13  projecao    

In [152]:
df_DF.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 1952
indicador_rendimento           1952
nota_saeb_matematica           2834
nota_saeb_lingua_portuguesa    2834
nota_saeb_media_padronizada    2331
ideb                           2333
projecao                       1666
dtype: int64

In [153]:
# Fill missing values in each metric column with that column's mean for the state.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_DF = df_DF.copy()
df_DF[colunas] = df_DF[colunas].fillna(df_DF[colunas].mean())


In [154]:
df_DF.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
21994,2005,DF,5300108,53000846,estadual,fundamental,finais (6-9),80.3,0.796079,270.86,257.07,5.465747,4.4,
21995,2005,DF,5300108,53000854,estadual,fundamental,finais (6-9),56.0,0.570622,234.91,234.52,4.490839,2.6,
21996,2005,DF,5300108,53000854,estadual,fundamental,iniciais (1-5),73.2,0.736268,194.97,183.75,5.026065,3.7,
21997,2005,DF,5300108,53000862,estadual,fundamental,finais (6-9),78.9,0.78082,290.72,257.76,5.808288,4.5,
21998,2005,DF,5300108,53000870,estadual,fundamental,finais (6-9),87.057091,0.868731,232.102819,219.576169,5.633365,4.962301,


In [155]:
df_DF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6105 entries, 21994 to 925344
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          6105 non-null   int64  
 1   sigla_uf                     6105 non-null   object 
 2   id_municipio                 6105 non-null   int64  
 3   id_escola                    6105 non-null   int64  
 4   rede                         6105 non-null   object 
 5   ensino                       6105 non-null   object 
 6   anos_escolares               6105 non-null   object 
 7   taxa_aprovacao               6105 non-null   float64
 8   indicador_rendimento         6105 non-null   float64
 9   nota_saeb_matematica         6105 non-null   float64
 10  nota_saeb_lingua_portuguesa  6105 non-null   float64
 11  nota_saeb_media_padronizada  6105 non-null   float64
 12  ideb                         6105 non-null   float64
 13  projecao    

## Acre (AC)

In [156]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_AC = ideb.loc[ideb['sigla_uf'] == "AC"]
df_AC[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,2892.0,2892.0,2255.0,2255.0,2581.0,2575.0
mean,90.291563,0.901286,216.775889,208.683406,5.005993,4.552078
std,8.180328,0.084118,31.275684,33.974762,0.880799,1.049591
min,24.2,0.182239,122.94,118.33,2.462063,0.7
25%,86.5,0.86263,192.065,181.48,4.360368,3.8
50%,92.2,0.920868,219.58,208.91,4.883037,4.5
75%,96.3,0.962923,238.395,233.9,5.590331,5.2
max,100.0,1.0,332.17,327.66,8.428176,8.1


In [157]:
df_AC.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5202 entries, 0 to 900431
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          5202 non-null   int64  
 1   sigla_uf                     5202 non-null   object 
 2   id_municipio                 5202 non-null   int64  
 3   id_escola                    5202 non-null   int64  
 4   rede                         5202 non-null   object 
 5   ensino                       5202 non-null   object 
 6   anos_escolares               5202 non-null   object 
 7   taxa_aprovacao               2892 non-null   float64
 8   indicador_rendimento         2892 non-null   float64
 9   nota_saeb_matematica         2255 non-null   float64
 10  nota_saeb_lingua_portuguesa  2255 non-null   float64
 11  nota_saeb_media_padronizada  2581 non-null   float64
 12  ideb                         2575 non-null   float64
 13  projecao        

In [158]:
df_AC.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 2310
indicador_rendimento           2310
nota_saeb_matematica           2947
nota_saeb_lingua_portuguesa    2947
nota_saeb_media_padronizada    2621
ideb                           2627
projecao                       1941
dtype: int64

In [159]:
# Fill missing values in each metric column with that column's mean for the state.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_AC = df_AC.copy()
df_AC[colunas] = df_AC[colunas].fillna(df_AC[colunas].mean())


In [160]:
df_AC.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
0,2005,AC,1200013,12008966,municipal,fundamental,finais (6-9),90.291563,0.901286,216.775889,208.683406,5.005993,4.552078,
1,2005,AC,1200013,12008966,municipal,fundamental,iniciais (1-5),90.291563,0.901286,216.775889,208.683406,5.005993,4.552078,
2,2005,AC,1200013,12009156,municipal,fundamental,finais (6-9),90.291563,0.901286,216.775889,208.683406,5.005993,4.552078,
3,2005,AC,1200013,12009156,municipal,fundamental,iniciais (1-5),90.291563,0.901286,216.775889,208.683406,5.005993,4.552078,
4,2005,AC,1200013,12009164,estadual,fundamental,finais (6-9),90.291563,0.901286,216.775889,208.683406,5.005993,4.552078,
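
All five rows shown above are identical in the six metric columns, because each value was missing and received the state mean. After imputation, such rows can no longer be told apart from measured ones, and the column's spread shrinks. A minimal sketch of one way to keep track of them, assuming a hypothetical flag column named `ideb_imputado` and made-up values:

```python
import numpy as np
import pandas as pd

# Made-up IDEB values with two gaps.
exemplo = pd.DataFrame({"ideb": [2.5, np.nan, 3.1, np.nan, 4.3]})

# Record which rows were missing before filling them.
exemplo["ideb_imputado"] = exemplo["ideb"].isnull()
exemplo["ideb"] = exemplo["ideb"].fillna(exemplo["ideb"].mean())

# Compare the spread with and without the imputed rows.
std_todos = exemplo["ideb"].std()
std_observados = exemplo.loc[~exemplo["ideb_imputado"], "ideb"].std()
print(exemplo)
print(std_todos, std_observados)
```

The flag lets later steps filter out or down-weight imputed rows, and the two standard deviations show how much the fill compresses the distribution.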


In [161]:
df_AC.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5202 entries, 0 to 900431
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          5202 non-null   int64  
 1   sigla_uf                     5202 non-null   object 
 2   id_municipio                 5202 non-null   int64  
 3   id_escola                    5202 non-null   int64  
 4   rede                         5202 non-null   object 
 5   ensino                       5202 non-null   object 
 6   anos_escolares               5202 non-null   object 
 7   taxa_aprovacao               5202 non-null   float64
 8   indicador_rendimento         5202 non-null   float64
 9   nota_saeb_matematica         5202 non-null   float64
 10  nota_saeb_lingua_portuguesa  5202 non-null   float64
 11  nota_saeb_media_padronizada  5202 non-null   float64
 12  ideb                         5202 non-null   float64
 13  projecao        

## Amapá (AP)

In [162]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_AP = ideb.loc[ideb['sigla_uf'] == "AP"]
df_AP[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,2700.0,2699.0,2030.0,2030.0,2336.0,2331.0
mean,83.583815,0.832349,196.655143,189.644039,4.354447,3.678378
std,9.640816,0.098127,27.873689,33.299281,0.552516,0.730832
min,0.0,0.242294,123.91,114.01,2.364333,1.2
25%,77.8,0.771742,173.535,162.485,4.002448,3.2
50%,84.4,0.842836,190.38,180.31,4.306417,3.6
75%,90.7,0.905019,221.6625,219.7625,4.677545,4.1
max,100.0,1.0,342.29,329.64,6.647582,6.6


In [163]:
df_AP.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4869 entries, 5103 to 906304
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          4869 non-null   int64  
 1   sigla_uf                     4869 non-null   object 
 2   id_municipio                 4869 non-null   int64  
 3   id_escola                    4869 non-null   int64  
 4   rede                         4869 non-null   object 
 5   ensino                       4869 non-null   object 
 6   anos_escolares               4869 non-null   object 
 7   taxa_aprovacao               2700 non-null   float64
 8   indicador_rendimento         2699 non-null   float64
 9   nota_saeb_matematica         2030 non-null   float64
 10  nota_saeb_lingua_portuguesa  2030 non-null   float64
 11  nota_saeb_media_padronizada  2336 non-null   float64
 12  ideb                         2331 non-null   float64
 13  projecao     

In [164]:
df_AP.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 2169
indicador_rendimento           2170
nota_saeb_matematica           2839
nota_saeb_lingua_portuguesa    2839
nota_saeb_media_padronizada    2533
ideb                           2538
projecao                       1877
dtype: int64

In [165]:
# Fill missing values in each metric column with that column's mean for the state.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_AP = df_AP.copy()
df_AP[colunas] = df_AP[colunas].fillna(df_AP[colunas].mean())


In [166]:
df_AP.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
5103,2005,AP,1600055,16001192,estadual,fundamental,finais (6-9),88.6,0.877333,229.8,221.16,4.183065,3.7,
5104,2005,AP,1600055,16001192,estadual,fundamental,iniciais (1-5),83.583815,0.832349,196.655143,189.644039,4.354447,3.678378,
5105,2005,AP,1600055,16001206,estadual,fundamental,finais (6-9),83.583815,0.832349,196.655143,189.644039,4.354447,3.678378,
5106,2005,AP,1600055,16001206,estadual,fundamental,iniciais (1-5),83.583815,0.832349,196.655143,189.644039,4.354447,3.678378,
5107,2005,AP,1600055,16001222,estadual,fundamental,finais (6-9),83.583815,0.832349,196.655143,189.644039,4.354447,3.678378,


In [167]:
df_AP.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4869 entries, 5103 to 906304
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          4869 non-null   int64  
 1   sigla_uf                     4869 non-null   object 
 2   id_municipio                 4869 non-null   int64  
 3   id_escola                    4869 non-null   int64  
 4   rede                         4869 non-null   object 
 5   ensino                       4869 non-null   object 
 6   anos_escolares               4869 non-null   object 
 7   taxa_aprovacao               4869 non-null   float64
 8   indicador_rendimento         4869 non-null   float64
 9   nota_saeb_matematica         4869 non-null   float64
 10  nota_saeb_lingua_portuguesa  4869 non-null   float64
 11  nota_saeb_media_padronizada  4869 non-null   float64
 12  ideb                         4869 non-null   float64
 13  projecao     

## Roraima (RR)

In [168]:
colunas = ['taxa_aprovacao', 'indicador_rendimento', 'nota_saeb_matematica', 'nota_saeb_lingua_portuguesa', 'nota_saeb_media_padronizada', 'ideb']
df_RR = ideb.loc[ideb['sigla_uf'] == "RR"]
df_RR[colunas].describe()

Unnamed: 0,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
count,1734.0,1733.0,1298.0,1298.0,1469.0,1469.0
mean,87.200634,0.869584,207.671718,198.244245,4.561895,4.027774
std,9.77072,0.099035,31.69582,34.500374,0.811172,0.993318
min,0.0,0.141275,126.45,114.7,2.272333,0.6
25%,81.9,0.817622,181.76,169.4275,4.054509,3.3
50%,89.3,0.892168,209.9,198.175,4.423833,3.9
75%,94.5,0.944739,230.925,224.41,4.942148,4.6
max,100.0,1.0,361.6,348.76,7.768333,7.6


In [169]:
df_RR.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3738 entries, 79515 to 993692
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          3738 non-null   int64  
 1   sigla_uf                     3738 non-null   object 
 2   id_municipio                 3738 non-null   int64  
 3   id_escola                    3738 non-null   int64  
 4   rede                         3738 non-null   object 
 5   ensino                       3738 non-null   object 
 6   anos_escolares               3738 non-null   object 
 7   taxa_aprovacao               1734 non-null   float64
 8   indicador_rendimento         1733 non-null   float64
 9   nota_saeb_matematica         1298 non-null   float64
 10  nota_saeb_lingua_portuguesa  1298 non-null   float64
 11  nota_saeb_media_padronizada  1469 non-null   float64
 12  ideb                         1469 non-null   float64
 13  projecao    

In [170]:
df_RR.isnull().sum()

ano                               0
sigla_uf                          0
id_municipio                      0
id_escola                         0
rede                              0
ensino                            0
anos_escolares                    0
taxa_aprovacao                 2004
indicador_rendimento           2005
nota_saeb_matematica           2440
nota_saeb_lingua_portuguesa    2440
nota_saeb_media_padronizada    2269
ideb                           2269
projecao                       1620
dtype: int64
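
To read these counts against the 3738 rows of `df_RR`, it can help to express them as a share of rows. The following is a minimal sketch on made-up data (the `demo` frame and the name `pct_faltantes` are hypothetical); `isnull().mean()` gives the fraction of missing entries per column.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the values are made up
demo = pd.DataFrame({
    "ano": [2005, 2007, 2009, 2011],
    "taxa_aprovacao": [82.8, np.nan, np.nan, 75.8],
    "ideb": [3.5, np.nan, 3.1, 4.0],
})

# Percentage of missing values per column
pct_faltantes = demo.isnull().mean().mul(100).round(1)
print(pct_faltantes)
```

For `df_RR`, 2004 of 3738 rows (about 54%) lack `taxa_aprovacao`, so any imputation of that column replaces more than half of its values.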

In [171]:
# Fill the missing values of each column in `colunas` with that column's mean
for coluna in colunas:
    df_RR[coluna].fillna(df_RR[coluna].mean(), inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return self._update_inplace(result)
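
The warning above suggests that `df_RR` was created by filtering another DataFrame (presumably `ideb`; this is an assumption, since that step is outside this section). One pattern that usually avoids it is to take an explicit copy and assign the filled column back, instead of calling `fillna(..., inplace=True)` on a column selection. A minimal sketch on made-up data, where every name ending in `_demo` is hypothetical:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the full `ideb` DataFrame (values are made up)
ideb_demo = pd.DataFrame({
    "sigla_uf": ["RR", "RR", "RR", "SP"],
    "taxa_aprovacao": [82.8, np.nan, 75.8, 90.0],
    "ideb": [3.5, 3.1, np.nan, 5.0],
})
colunas_demo = ["taxa_aprovacao", "ideb"]

# .copy() makes df_rr_demo an independent DataFrame rather than a slice
df_rr_demo = ideb_demo[ideb_demo["sigla_uf"] == "RR"].copy()

# Assign the filled column back instead of mutating a column selection in place
for coluna in colunas_demo:
    df_rr_demo[coluna] = df_rr_demo[coluna].fillna(df_rr_demo[coluna].mean())

print(df_rr_demo)
```

Assigning the result back also keeps working in pandas versions where in-place changes on a selected column no longer propagate to the parent DataFrame.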


In [172]:
df_RR.head()

Unnamed: 0,ano,sigla_uf,id_municipio,id_escola,rede,ensino,anos_escolares,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb,projecao
79515,2005,RR,1400027,14001098,municipal,fundamental,finais (6-9),87.200634,0.869584,207.671718,198.244245,4.561895,4.027774,
79516,2005,RR,1400027,14001098,municipal,fundamental,iniciais (1-5),87.200634,0.869584,207.671718,198.244245,4.561895,4.027774,
79517,2005,RR,1400027,14001110,estadual,fundamental,finais (6-9),82.8,0.783695,240.22,227.19,4.45722,3.5,
79518,2005,RR,1400027,14001110,estadual,fundamental,iniciais (1-5),75.8,0.734983,168.25,164.11,4.158864,3.1,
79519,2005,RR,1400027,14001411,estadual,fundamental,finais (6-9),87.200634,0.869584,207.671718,198.244245,4.561895,4.027774,


In [173]:
df_RR.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3738 entries, 79515 to 993692
Data columns (total 14 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ano                          3738 non-null   int64  
 1   sigla_uf                     3738 non-null   object 
 2   id_municipio                 3738 non-null   int64  
 3   id_escola                    3738 non-null   int64  
 4   rede                         3738 non-null   object 
 5   ensino                       3738 non-null   object 
 6   anos_escolares               3738 non-null   object 
 7   taxa_aprovacao               3738 non-null   float64
 8   indicador_rendimento         3738 non-null   float64
 9   nota_saeb_matematica         3738 non-null   float64
 10  nota_saeb_lingua_portuguesa  3738 non-null   float64
 11  nota_saeb_media_padronizada  3738 non-null   float64
 12  ideb                         3738 non-null   float64
 13  projecao    

## Regrouping the DFs

In [174]:
frames = [df_SP, df_MG, df_BA, df_RS, df_MA, df_CE, df_RJ, df_PA, df_PR, df_PE, df_SC, df_GO, df_PI, df_PB, df_AM, df_RN, df_MT, df_AL, df_ES, df_SE, df_MS, df_TO, df_RO, df_DF, df_AC, df_AP, df_RR]
ideb_tratado = pd.concat(frames)
ideb_tratado = ideb_tratado.drop('projecao', axis=1)
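
The split-by-state, fill, and concatenate steps can also be sketched in a single pass with `groupby(...).transform("mean")`, which computes each state's column means aligned to the original rows. This is an alternative sketch, not the notebook's method; the toy data and the names ending in `_demo` are hypothetical, and `colunas_demo` stands in for the notebook's `colunas` list.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `ideb` DataFrame (values are made up for illustration)
ideb_demo = pd.DataFrame({
    "sigla_uf": ["RR", "RR", "RR", "SP", "SP", "SP"],
    "taxa_aprovacao": [80.0, np.nan, 90.0, 95.0, 97.0, np.nan],
    "ideb": [3.0, 4.0, np.nan, np.nan, 5.0, 6.0],
})
colunas_demo = ["taxa_aprovacao", "ideb"]

# Per-state column means, broadcast back to every row of that state
medias_por_uf = ideb_demo.groupby("sigla_uf")[colunas_demo].transform("mean")

# Fill each missing value with the mean of its own state
ideb_tratado_demo = ideb_demo.copy()
ideb_tratado_demo[colunas_demo] = ideb_demo[colunas_demo].fillna(medias_por_uf)

print(ideb_tratado_demo)
```

Unlike `pd.concat(frames)`, this keeps the rows in their original order and does not require one DataFrame per state.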

In [175]:
ideb_tratado.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1027767 entries, 92937 to 993692
Data columns (total 13 columns):
 #   Column                       Non-Null Count    Dtype  
---  ------                       --------------    -----  
 0   ano                          1027767 non-null  int64  
 1   sigla_uf                     1027767 non-null  object 
 2   id_municipio                 1027767 non-null  int64  
 3   id_escola                    1027767 non-null  int64  
 4   rede                         1027767 non-null  object 
 5   ensino                       1027767 non-null  object 
 6   anos_escolares               1027767 non-null  object 
 7   taxa_aprovacao               1027767 non-null  float64
 8   indicador_rendimento         1027767 non-null  float64
 9   nota_saeb_matematica         1027767 non-null  float64
 10  nota_saeb_lingua_portuguesa  1027767 non-null  float64
 11  nota_saeb_media_padronizada  1027767 non-null  float64
 12  ideb                         1027767 no

## Correlation matrix (with the treated data)

In [176]:
corr = ideb_tratado.corr()
corr.style.background_gradient(cmap='coolwarm')

Unnamed: 0,ano,id_municipio,id_escola,taxa_aprovacao,indicador_rendimento,nota_saeb_matematica,nota_saeb_lingua_portuguesa,nota_saeb_media_padronizada,ideb
ano,1.0,0.011018,0.010951,0.172311,0.174546,0.2109,0.239594,0.228496,0.246405
id_municipio,0.011018,1.0,0.99982,0.199361,0.19665,0.362174,0.305954,0.39262,0.361467
id_escola,0.010951,0.99982,1.0,0.197932,0.195244,0.360019,0.304125,0.390846,0.359568
taxa_aprovacao,0.172311,0.199361,0.197932,1.0,0.988052,0.228417,0.187547,0.485305,0.746434
indicador_rendimento,0.174546,0.19665,0.195244,0.988052,1.0,0.234392,0.194812,0.483197,0.749406
nota_saeb_matematica,0.2109,0.362174,0.360019,0.228417,0.234392,1.0,0.959769,0.587465,0.513861
nota_saeb_lingua_portuguesa,0.239594,0.305954,0.304125,0.187547,0.194812,0.959769,1.0,0.489761,0.425685
nota_saeb_media_padronizada,0.228496,0.39262,0.390846,0.485305,0.483197,0.587465,0.489761,1.0,0.92654
ideb,0.246405,0.361467,0.359568,0.746434,0.749406,0.513861,0.425685,0.92654,1.0
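
The near-perfect correlation between `id_municipio` and `id_escola` (0.99982) probably reflects that both identifiers begin with the same state code in the rows shown in this notebook, rather than any educational relationship. A minimal sketch, on made-up data, of computing the matrix only over numeric indicator columns; the choice of which ID columns to drop (`colunas_id`) is an assumption, and the school IDs below are invented.

```python
import pandas as pd

# Toy data for illustration only; school IDs and indicator values are made up
demo = pd.DataFrame({
    "ano": [2005, 2007, 2009, 2011],
    "sigla_uf": ["RR", "RR", "SP", "SP"],
    "id_municipio": [1400027, 1400027, 3550308, 3550308],
    "id_escola": [14001098, 14001110, 35000001, 35000002],
    "taxa_aprovacao": [82.8, 75.8, 91.0, 94.5],
    "ideb": [3.5, 3.1, 4.6, 5.0],
})

# Identifier columns carry no quantitative meaning, so leave them out
colunas_id = ["id_municipio", "id_escola"]
corr_demo = (
    demo.select_dtypes(include="number")  # keep only numeric columns
        .drop(columns=colunas_id)
        .corr()
)
print(corr_demo.round(3))
```

Selecting the numeric columns explicitly also avoids depending on how a given pandas version treats text columns in `corr()`.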


## INSE 2019 spreadsheet

In [177]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
inse_2019 = pd.read_excel(r'.\BDs\inse_2019.xlsx')

In [178]:
inse_2019.head(25)

Unnamed: 0,CO_ESCOLA,NOME_ESCOLA,CO_UF,NOME_UF,CO_MUNICIPIO,NOME_MUNICIPIO,ID_AREA,TP_DEPENDENCIA,TP_LOCALIZACAO,QTD_ALUNOS_INSE,INSE_VALOR_ABSOLUTO,INSE_CLASSIFICACAO,PC_NIVEL_1,PC_NIVEL_2,PC_NIVEL_3,PC_NIVEL_4,PC_NIVEL_5,PC_NIVEL_6,PC_NIVEL_7,PC_NIVEL_8
0,11024666,EMEIEF BOA ESPERANCA,11,Rondônia,1100015,Alta Floresta D'Oeste,2,3,2,30,4.54,Nível IV,0.0,16.67,33.33,33.33,13.61,3.06,0.0,0.0
1,11024682,EEEF EURIDICE LOPES PEDROSO,11,Rondônia,1100015,Alta Floresta D'Oeste,2,2,1,147,4.96,Nível IV,0.0,15.67,16.69,22.32,17.46,15.67,10.05,2.14
2,11024828,EMEIEF IZIDORO STEDILE,11,Rondônia,1100015,Alta Floresta D'Oeste,2,3,1,30,4.87,Nível IV,0.0,3.39,26.48,33.37,23.34,10.04,3.39,0.0
3,11024968,EEEMTI JUSCELINO KUBITSCHEK DE OLIVEIRA,11,Rondônia,1100015,Alta Floresta D'Oeste,2,2,1,66,4.78,Nível IV,0.0,10.6,27.12,27.13,18.16,10.58,6.42,0.0
4,11025077,EMEIEF MARIA DE SOUZA PEGO,11,Rondônia,1100015,Alta Floresta D'Oeste,2,3,1,21,4.6,Nível IV,0.0,23.66,32.28,15.04,15.04,4.31,9.67,0.0
5,11025280,EMEIEF PADRE FEIJO,11,Rondônia,1100015,Alta Floresta D'Oeste,2,3,1,19,4.65,Nível IV,0.0,10.0,42.22,16.11,21.67,10.0,0.0,0.0
6,11025310,EMEIEF PEDRO ALEIXO,11,Rondônia,1100015,Alta Floresta D'Oeste,2,3,1,35,4.42,Nível III,3.33,29.53,25.46,19.55,16.97,2.58,2.58,0.0
7,11025352,EMEIEF POTY,11,Rondônia,1100015,Alta Floresta D'Oeste,2,3,2,30,4.61,Nível IV,0.0,20.81,27.74,16.13,28.39,6.94,0.0,0.0
8,11025620,EEEF TANCREDO DE ALMEIDA NEVES,11,Rondônia,1100015,Alta Floresta D'Oeste,2,2,1,120,4.65,Nível IV,0.0,15.73,29.7,23.25,22.69,6.76,0.96,0.92
9,11025638,EEEFM PADRE EZEQUIEL RAMIN,11,Rondônia,1100015,Alta Floresta D'Oeste,2,2,1,335,4.76,Nível IV,0.0,15.26,25.93,22.89,20.29,8.46,7.16,0.0


# Analyzing the INEP df (Educational Indicators)

## CONTINUE FROM HERE

**Legend**

*ano* - Year

*id_municipio* - Municipality ID (7 digits)

*id_escola* - School ID

*localizacao* - Location (rural or urban)

*rede* - School network (state or municipal)

*atu_ei* - Average students per class - Early Childhood Education

*atu_ei_creche* - Average students per class - Early Childhood Education - Daycare

*atu_ei_pre_escola* - Average students per class - Early Childhood Education - Preschool

*atu_ef* - Average students per class - Elementary Education (unified classes are included in the indicator calculation from 2017 onward)

*atu_ef_anos_iniciais* - Average students per class - Elementary Education, Early Years

*atu_ef_anos_finais* - Average students per class - Elementary Education, Final Years

*atu_ef_1_ano* - Average students per class - Elementary Education, 1st year

*atu_ef_2_ano* - Average students per class - Elementary Education, 2nd year

*atu_ef_3_ano* - Average students per class - Elementary Education, 3rd year

*atu_ef_4_ano* - Average students per class - Elementary Education, 4th year

*atu_ef_5_ano* - Average students per class - Elementary Education, 5th year

*atu_ef_6_ano* - Average students per class - Elementary Education, 6th year


In [179]:
inep.head()

Unnamed: 0,ano,id_municipio,rede,id_escola,localizacao,atu_ei,atu_ei_creche,atu_ei_pre_escola,atu_ef,atu_ef_anos_iniciais,...,ied_ef_anos_finais_nivel_4,ied_ef_anos_finais_nivel_5,ied_ef_anos_finais_nivel_6,ied_em_nivel_1,ied_em_nivel_2,ied_em_nivel_3,ied_em_nivel_4,ied_em_nivel_5,ied_em_nivel_6,icg_nivel_complexidade_gestao_escola
0,2007,1100015,estadual,11022558,rural,,,,15.0,,...,,,,,,,,,,
1,2007,1100015,estadual,11037938,rural,,,,10.0,,...,,,,,,,,,,
2,2007,1100015,estadual,11037946,rural,,,,9.0,,...,,,,,,,,,,
3,2007,1100015,estadual,11037962,rural,,,,17.0,,...,,,,,,,,,,
4,2007,1100015,estadual,11038047,rural,,,,13.0,,...,,,,,,,,,,
