## Data analysis of first-year UEA students
> By: Juliany Raiol

This dataset contains information about students in their first year.

In [1]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

In [2]:
dataframe = pd.read_csv('data/ciclo_2013_2014_05_03.csv')
dataframe.head()

| Unnamed: 0 | ALUNO | LYCEUM.GET_MUNICIPIO_NASCIMENTO_UEA(P.PESSOA) | DT_NASC | CURSO | NOME | TIPO_INGRESSO | ANO_INGRESSO | UNIDADE_FISICA | SERIE | DISCIPLINA | NOME_DISCIPLINA | NOTA_FINAL | SITUACAO_HIST | PERC_PRESENCA | FALTAS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Manaus | 23/07/1995 | EST17KSN | Análise e Desenvolvimento de Sistemas | SAES | 2013 | EST | 1.0 | ESTTPD108 | Algoritmos e Programação | 5.17 | Rep Nota | 92 | 6.0 |
| 1 | 2 | Manaus | 23/07/1995 | EST17KSN | Análise e Desenvolvimento de Sistemas | SAES | 2013 | EST | 1.0 | ESTBAS006 | Comunicação e Expressão | 6.0 | Aprovado | 1 | |
| 2 | 3 | Manaus | 23/07/1995 | EST17KSN | Análise e Desenvolvimento de Sistemas | SAES | 2013 | EST | 1.0 | ESTBAS002 | Cálculo I | 2.9 | Rep Nota | 1 | 0.0 |
| 3 | 4 | Manaus | 23/07/1995 | EST17KSN | Análise e Desenvolvimento de Sistemas | SAES | 2013 | EST | 1.0 | ESTBAS035 | Empreendedorismo | 8.5 | Aprovado | 1 | |
| 4 | 5 | Manaus | 23/07/1995 | EST17KSN | Análise e Desenvolvimento de Sistemas | SAES | 2013 | EST | 1.0 | ESTTPD100 | Introdução à Computação | 7.7 | Aprovado | 8667 | 8.0 |


## Cleaning and formatting the dataframe

In [3]:
# rename columns: strip whitespace and lowercase every name, then shorten the birthplace column
dataframe.rename(columns=lambda x: x.strip().lower(), inplace=True)
dataframe.rename(columns = {'lyceum.get_municipio_nascimento_uea(p.pessoa)':'municipio_nascimento'}, inplace=True)

In [4]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57075 entries, 0 to 57074
Data columns (total 15 columns):
aluno                   57075 non-null int64
municipio_nascimento    57075 non-null object
dt_nasc                 57075 non-null object
curso                   57075 non-null object
nome                    57075 non-null object
tipo_ingresso           57075 non-null object
ano_ingresso            57075 non-null int64
unidade_fisica          57075 non-null object
serie                   56274 non-null float64
disciplina              57075 non-null object
nome_disciplina         57075 non-null object
nota_final              56976 non-null object
situacao_hist           57075 non-null object
perc_presenca           57054 non-null object
faltas                  38850 non-null object
dtypes: float64(1), int64(2), object(12)
memory usage: 6.5+ MB


In [14]:
dataframe['nota_final'].astype(float)

ValueError: could not convert string to float: 
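
The error message suggests that at least one entry in `nota_final` is a string that cannot be parsed as a number, such as a blank. As a minimal sketch (not part of the original notebook), `pd.to_numeric` with `errors='coerce'` turns such entries into `NaN` instead of raising. The `grades` series below is made-up sample data, not the real column.

```python
import pandas as pd

# Made-up sample standing in for dataframe['nota_final'] (an assumption for illustration)
grades = pd.Series(['5.17', '6.00', ' ', '-1', '10.00'])

# Unparsable entries (here the blank string) become NaN instead of raising ValueError
numeric_grades = pd.to_numeric(grades, errors='coerce')
print(numeric_grades)
```

Whether coercing to `NaN` is appropriate depends on how the missing grades should be treated later in the analysis.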

> Some disciplines have null values in fields such as serie, nota_final and faltas. In these cases, the professor of the discipline did not enter this information in Lyceum, so the field was left empty.
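
One way to check which columns are affected is to count the null entries per column. The sketch below uses a small made-up frame that reuses three of the column names; the values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up)
sample = pd.DataFrame({
    'serie': [1.0, np.nan, 1.0],
    'nota_final': ['5.17', None, ' '],
    'faltas': ['6.0', None, None],
})

# Count null entries per column
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

Note that a whitespace-only string such as `' '` is not counted as null, so a column can hold blanks even when its null count is zero.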

In [10]:
#fill null values
dataframe.fillna('-1', inplace=True)

> I replaced null values with -1 so that they are not confused with the values entered by the professor.
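
As a rough sketch of why the sentinel helps, the made-up frame below (not the real data) shows how rows whose grade was never entered can be told apart from rows with a genuine grade of `0.00` once the nulls are filled with `'-1'`.

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up)
sample = pd.DataFrame({
    'nota_final': ['5.17', None, '0.00'],
    'faltas': ['6.0', None, None],
})

# Same fill as in the notebook: nulls become the string '-1'
filled = sample.fillna('-1')

# True where the grade was never entered, False for real grades (including 0.00)
not_entered = filled['nota_final'] == '-1'
print(filled[not_entered])
```

Because the sentinel is the string `'-1'`, the comparison is done against a string rather than a number.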

In [10]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57075 entries, 0 to 57074
Data columns (total 15 columns):
aluno                   57075 non-null int64
municipio_nascimento    57075 non-null object
dt_nasc                 57075 non-null object
curso                   57075 non-null object
nome                    57075 non-null object
tipo_ingresso           57075 non-null object
ano_ingresso            57075 non-null int64
unidade_fisica          57075 non-null object
serie                   57075 non-null float64
disciplina              57075 non-null object
nome_disciplina         57075 non-null object
nota_final              57075 non-null object
situacao_hist           57075 non-null object
perc_presenca           57075 non-null object
faltas                  57075 non-null object
dtypes: float64(1), int64(2), object(12)
memory usage: 6.5+ MB


['5.17',
 '6.00',
 '2.9',
 '8.50',
 '7.70',
 '5.83',
 '8.25',
 '2.83',
 '8.80',
 '5.97',
 '7.33',
 '3.73',
 '1.83',
 '0.00',
 '2.50',
 '0.00',
 '0.00',
 '0.00',
 '0.00',
 '0.00',
 '3.00',
 '8.00',
 '4.7',
 '8.50',
 '7.00',
 '6.17',
 '8.25',
 '1.50',
 ' ',
 '1.8',
 '8.60',
 '6.77',
 '3.85',
 '7.33',
 '6.00',
 '0.67',
 ' ',
 '0.00',
 '4.00',
 ' ',
 '6.00',
 ' ',
 '2.25',
 '3.15',
 '1.55',
 ' ',
 '8.00',
 ' ',
 ' ',
 '3.00',
 '0.00',
 '0.00',
 '0.00',
 '1.00',
 '0.00',
 '9.00',
 '8.75',
 '8.2',
 '8.00',
 '8.65',
 '8.17',
 '4.67',
 '8.65',
 '9.00',
 '8.00',
 '8.75',
 '10.00',
 '7.00',
 '10.00',
 '6.00',
 '7.70',
 '8.50',
 '10.00',
 '7.43',
 '8.50',
 '6.83',
 '6.90',
 '9.00',
 '8.10',
 '9.05',
 '9.00',
 '4.50',
 '0.00',
 '10.00',
 '3.33',
 '6.23',
 '8.00',
 '7.80',
 '0.00',
 '0.00',
 '0.00',
 '0.00',
 '8.00',
 '6.67',
 '8.00',
 '9.50',
 '3.50',
 '8.50',
 '4.5',
 '8.00',
 '8.45',
 '3.00',
 '8.50',
 '4.6',
 '9.40',
 '5.67',
 '1.75',
 '5.43',
 '1.70',
 '3.45',
 '0.00',
 '0.00',
 '0.0',
 '0.00'