## Challenge 02
In this challenge we will normalize fields. First, we clean the dataset as we did in the previous challenge:

In [4]:
import pandas as pd
import numpy as np
import json

with open("meteoritos.json","rt", encoding='UTF-8') as archivo:
    json_meteoritos = json.load(archivo)
    
df_meteoritos = pd.DataFrame.from_dict(json_meteoritos)

# Alternatively:
# df_meteoritos = pd.read_json('meteoritos.json', encoding='UTF-8')

dataframe = df_meteoritos[['fall','id','mass','name','reclat','reclong']]

convert_dictionary= {
    'fall':str,
    'id':int,
    'mass':float,
    'name':str,
    'reclat':float,
    'reclong':float
}
df_converted = dataframe.astype(convert_dictionary)
df_converted.dtypes

df_converted = df_converted.dropna(how='any', axis=0)
df_converted.head(5)

Unnamed: 0,fall,id,mass,name,reclat,reclong
0,Fell,1,21.0,Aachen,50.775,6.08333
1,Fell,2,720.0,Aarhus,56.18333,10.23333
2,Fell,6,107000.0,Abee,54.21667,-113.0
3,Fell,10,1914.0,Acapulco,16.88333,-99.9
4,Fell,370,780.0,Achiras,-33.16667,-64.95


We reset the index so that its values are consecutive again (this is useful after removing rows with dropna). Note that reset_index() without drop=True keeps the old labels in a new column named index.

In [5]:
df_converted = df_converted.reset_index()
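
The effect of resetting the index can be seen on a small example. This is only a sketch: the frame `toy` and its values are made up for illustration and are not part of the meteorite dataset.

```python
import pandas as pd

# Toy frame with one missing value; names and values are illustrative only.
toy = pd.DataFrame({"name": ["a", "b", "c"], "mass": [1.0, None, 3.0]})
cleaned = toy.dropna(how="any", axis=0)  # surviving index labels are 0 and 2

# Default behaviour: the old labels are kept in a new 'index' column.
with_old_index = cleaned.reset_index()
# With drop=True the old labels are discarded instead.
without_old_index = cleaned.reset_index(drop=True)

print(list(with_old_index.columns))     # ['index', 'name', 'mass']
print(list(with_old_index["index"]))    # [0, 2]
print(list(without_old_index.columns))  # ['name', 'mass']
print(list(without_old_index.index))    # [0, 1]
```

Keeping the old labels makes it possible to trace a row back to the original file, while `drop=True` gives a leaner frame.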

We define the normalization function and apply it.

In [3]:
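def normalize(values, maxvalue, minvalue):
    # Min-max scaling: maps minvalue to 0 and maxvalue to 1.
    # Assumes maxvalue != minvalue; otherwise this divides by zero.
    norm_values = (values - minvalue) / (maxvalue - minvalue)
    return norm_values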
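
To see what the function does, it may help to run it on numbers small enough to check by hand. The sketch below repeats the function so that it runs on its own; the array `toy_values` is invented for illustration.

```python
import numpy as np

def normalize(values, maxvalue, minvalue):
    # Same min-max formula as the notebook's function, repeated for self-containment.
    return (values - minvalue) / (maxvalue - minvalue)

toy_values = np.array([10.0, 20.0, 50.0])  # made-up numbers
scaled = normalize(toy_values, toy_values.max(), toy_values.min())

# (10-10)/40 = 0.0, (20-10)/40 = 0.25, (50-10)/40 = 1.0
print(scaled.tolist())  # [0.0, 0.25, 1.0]
```

If every value were the same, the maximum and minimum would coincide and the denominator would be zero, so that case would need separate handling.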

In [8]:
mass_series = df_converted['mass']
max_mass = mass_series.max()
min_mass = mass_series.min()
numpy_mass = mass_series.to_numpy()

print("Maximum value: "+str(max_mass)+
      ", minimum value: "+str(min_mass)+
      ", number of values to normalize: "+str(len(numpy_mass)))

Maximum value: 23000000.0, minimum value: 0.15, number of values to normalize: 960


In [9]:
normalized_mass_np = normalize(numpy_mass, max_mass, min_mass)
print("Normalized!")
print("Values from "+str(np.min(normalized_mass_np))+" to "+str(np.max(normalized_mass_np)))

Normalized!
Values from 0.0 to 1.0
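
As a quick sanity check, the formula can be applied by hand to individual masses using the maximum and minimum printed earlier. The sketch below hard-codes those two numbers and the masses of Taonan (3850.0) and Tenham (160000.0) from the result table at the end of this section; the dictionary `checked` is only a helper for this illustration.

```python
max_mass = 23000000.0  # maximum printed by the notebook
min_mass = 0.15        # minimum printed by the notebook

checked = {}
for name, mass in (("Taonan", 3850.0), ("Tenham", 160000.0)):
    # Same min-max formula as normalize(), applied to a single value.
    checked[name] = (mass - min_mass) / (max_mass - min_mass)
    print(name, round(checked[name], 6))
# Taonan 0.000167
# Tenham 0.006957
```

Both results agree with the `Normalized Mass` column of the table. Because a single meteorite weighs 23,000,000, most normalized values end up very close to 0.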


The normalized values currently live in a NumPy array; we need to add them to our dataframe.
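
Adding a column with `join` matches rows by index label, which is why the index was reset earlier. The following sketch illustrates this on a toy frame; `gapped` and `scaled` are made-up names and values, not the notebook's data.

```python
import pandas as pd

# Toy frame whose index has a gap, as happens after dropna; illustrative only.
gapped = pd.DataFrame({"mass": [1.0, 3.0]}, index=[0, 2])
# A Series built from a plain array gets the default index 0, 1.
scaled = pd.Series([0.0, 1.0], name="Normalized Mass")

# join aligns on index labels: label 2 has no partner in 'scaled', so it gets NaN.
misaligned = gapped.join(scaled)
# After resetting the index, labels 0 and 1 match on both sides.
aligned = gapped.reset_index(drop=True).join(scaled)

print(misaligned["Normalized Mass"].isna().tolist())  # [False, True]
print(aligned["Normalized Mass"].tolist())            # [0.0, 1.0]
```
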

In [10]:
print(normalized_mass_np.shape)
normalized_df = pd.Series(normalized_mass_np,dtype = float, name="Normalized Mass")
df_normalized = df_converted.join(normalized_df)

# Alternatively:
# df_normalized = df_converted.copy()
# df_normalized['Normalized Mass'] = normalized_df
df_normalized.tail(20)

(960,)


Unnamed: 0,index,fall,id,mass,name,reclat,reclong,Normalized Mass
940,980,Fell,23873,3850.0,Taonan,45.4,122.9,0.000167
941,981,Fell,23884,12000.0,Tatahouine,32.95,10.41667,0.000522
942,982,Fell,23885,2500.0,Tathlith,19.38333,43.73333,0.000109
943,983,Fell,23887,6000.0,Tauk,35.13333,44.45,0.000261
944,984,Fell,23888,21000.0,Tauti,46.71667,23.5,0.000913
945,985,Fell,23897,160000.0,Tenham,-25.73333,142.95,0.006957
946,986,Fell,23898,28500.0,Tennasilm,58.03333,26.95,0.001239
947,987,Fell,23908,342.0,Thal,33.4,70.6,1.5e-05
948,988,Fell,54493,14200.0,Thika,-1.00278,37.15028,0.000617
949,989,Fell,23976,45300.0,Thuathe,-29.33333,27.58333,0.00197
