## Challenge 5: Filters

### 1. Objectives:
- Practice using filters to obtain subsets of data
    
---
    
### 2. Development:

#### a) Filtering by dates, booleans, and numeric values

We'll work with the same dataset you saved in the previous Challenge. This Challenge consists of the following:

Using filters, create 3 subsets of data:

1. A subset called `df_hazardous` that contains only the records for objects where `is_potentially_hazardous_asteroid` is `True` (or `1`).
2. A subset called `df_greater_than_1000` that contains only the records where `estimated_diameter.meters.estimated_diameter_max` is greater than 1000 meters.
3. A subset called `df_february` that contains only the records that fall exactly within February 1995. Remember that the data in the `epoch_date_close_approach` column are in milliseconds.
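
The reminder about milliseconds matters because a date filter needs real datetime values rather than raw integers. A minimal sketch of that conversion, assuming a hypothetical Series `epoch_ms` of Unix timestamps in milliseconds (illustrative values chosen to fall in January and February 1995, not read from the dataset):

```python
import pandas as pd

# Hypothetical epoch timestamps in milliseconds (illustrative values only)
epoch_ms = pd.Series([789467580000, 791888760000])

# unit="ms" tells pandas the integers count milliseconds since 1970-01-01
dates = pd.to_datetime(epoch_ms, unit="ms")

# Once converted, the .dt accessor exposes year, month, day, and so on
print(dates.dt.month.tolist())
```

If the column was already stored as text dates when the file was saved, `pd.to_datetime` without `unit` is the call to use instead.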


In [1]:
import pandas as pd

In [2]:
df_reto_5 = pd.read_csv("../Ejemplo-04/objetos_cercanos_4.csv", index_col=0)
df_reto_5.head(3)

| | id_name | is_potentially_hazardous_asteroid | estimated_diameter.meters.estimated_diameter_min | estimated_diameter.meters.estimated_diameter_max | close_approach_date | epoch_date_close_approach | orbiting_body | relative_velocity.kilometers_per_second | relative_velocity.kilometers_per_hour | orbit_class_description | id | name | relative_velocity.kilometers_per_minute | proportion_of_max_diameter_to_earth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2154652-154652 (2004 EP20) | 0 | 483.676488 | 1081.533507 | 1995-01-07 | 1995-01-07 08:33:00 | Earth | 16.142864 | 58114.308667 | Near Earth asteroid orbits similar to that of ... | 2154652 | 154652 (2004 EP20) | 968.571811 | 8.5e-05 |
| 1 | 3153509-(2003 HM) | 1 | 96.506147 | 215.794305 | 1995-01-07 | 1995-01-07 15:09:00 | Earth | 12.351044 | 44463.757734 | Near Earth asteroid orbits which cross the Ear... | 3153509 | (2003 HM) | 741.062629 | 1.7e-05 |
| 2 | 3837644-(2019 AY3) | 0 | 46.190746 | 103.285648 | 1995-01-07 | 1995-01-07 21:25:00 | Earth | 22.478615 | 80923.015021 | Near Earth asteroid orbits similar to that of ... | 3837644 | (2019 AY3) | 1348.716917 | 8e-06 |


In [3]:
df_reto_5["is_potentially_hazardous_asteroid"]

0      0
1      1
2      0
3      0
4      0
      ..
296    0
297    0
298    0
299    0
300    0
Name: is_potentially_hazardous_asteroid, Length: 301, dtype: int64

In [7]:
condicion = df_reto_5["is_potentially_hazardous_asteroid"] == 1
df_hazardous = df_reto_5[condicion]
# print the number of records (summing works here because every remaining value is 1)
df_hazardous["is_potentially_hazardous_asteroid"].sum()

58
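
Summing the flag column counts rows only because every value left after the filter is `1`. A minimal sketch of more general ways to count, assuming a hypothetical toy DataFrame `toy` that borrows the column name from the dataset (the rows are invented):

```python
import pandas as pd

# Hypothetical toy data; only the column name comes from the dataset
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "is_potentially_hazardous_asteroid": [0, 1, 1, 0],
})

# The comparison yields a boolean Series with one True/False per row
mask = toy["is_potentially_hazardous_asteroid"] == 1
subset = toy[mask]

# Three equivalent ways to count the rows kept by the filter
print(len(subset), subset.shape[0], int(mask.sum()))
```

`len(subset)` and `subset.shape[0]` do not depend on what the column contains, so they remain valid for any filter.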

In [8]:
df_reto_5["estimated_diameter.meters.estimated_diameter_max"]

0      1081.533507
1       215.794305
2       103.285648
3        49.435619
4       358.129403
          ...     
296    1081.533507
297     986.370281
298     986.370281
299     358.129403
300     941.976306
Name: estimated_diameter.meters.estimated_diameter_max, Length: 301, dtype: float64

In [11]:
condicion = df_reto_5["estimated_diameter.meters.estimated_diameter_max"] > 1000
df_greater_than_1000 = df_reto_5[condicion]
# print the number of records
df_greater_than_1000["estimated_diameter.meters.estimated_diameter_max"].count()

28
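
The numeric filter uses a strict comparison, so a diameter of exactly 1000 meters is left out. A minimal sketch on a hypothetical toy DataFrame (invented rows, with the column name borrowed from the dataset) that also uses `.loc` to keep only selected columns:

```python
import pandas as pd

col = "estimated_diameter.meters.estimated_diameter_max"

# Hypothetical toy data with one value exactly on the boundary
toy = pd.DataFrame({"name": ["a", "b", "c"], col: [1081.5, 215.8, 1000.0]})

# .loc takes the row mask first and the list of columns second
big = toy.loc[toy[col] > 1000, ["name", col]]

# Only "a" qualifies; "c" sits exactly at 1000 and is excluded by ">"
print(big["name"].tolist())
```

Switching to `>=` would include the boundary value, which may or may not be what a given task intends.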

In [12]:
df_reto_5["epoch_date_close_approach"]

0      1995-01-07 08:33:00
1      1995-01-07 15:09:00
2      1995-01-07 21:25:00
3      1995-01-07 02:45:00
4      1995-01-08 12:46:00
              ...         
296    1995-02-21 17:29:00
297    1995-02-21 04:17:00
298    1995-02-21 15:44:00
299    1995-02-21 12:08:00
300    1995-02-21 12:54:00
Name: epoch_date_close_approach, Length: 301, dtype: object

In [14]:
df_reto_5["epoch_date_close_approach"] = \
    pd.to_datetime(df_reto_5["epoch_date_close_approach"])
df_reto_5["epoch_date_close_approach"]

0     1995-01-07 08:33:00
1     1995-01-07 15:09:00
2     1995-01-07 21:25:00
3     1995-01-07 02:45:00
4     1995-01-08 12:46:00
              ...        
296   1995-02-21 17:29:00
297   1995-02-21 04:17:00
298   1995-02-21 15:44:00
299   1995-02-21 12:08:00
300   1995-02-21 12:54:00
Name: epoch_date_close_approach, Length: 301, dtype: datetime64[ns]
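
The conversion is needed because the `.dt` accessor is only available on datetime-like values, not on dates stored as text. A minimal sketch, assuming a hypothetical Series `raw` of date strings in the same format as the column:

```python
import pandas as pd

# Hypothetical date strings in the same format as the column above
raw = pd.Series(["1995-01-07 08:33:00", "1995-02-21 12:54:00"])

# On text data the .dt accessor is rejected with an AttributeError
try:
    raw.dt.month
    accessor_worked = True
except AttributeError:
    accessor_worked = False

# After parsing, the same accessor returns the month of each row
parsed = pd.to_datetime(raw)
months = parsed.dt.month.tolist()
print(accessor_worked, months)
```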

In [29]:
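# "exactly February 1995" means checking the year as well as the month
dates = df_reto_5["epoch_date_close_approach"]
condicion = (dates.dt.year == 1995) & (dates.dt.month == 2)
df_february = df_reto_5[condicion]
df_february["epoch_date_close_approach"]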

156   1995-02-04 09:06:00
157   1995-02-04 00:37:00
158   1995-02-05 10:31:00
159   1995-02-05 18:17:00
160   1995-02-05 12:05:00
              ...        
296   1995-02-21 17:29:00
297   1995-02-21 04:17:00
298   1995-02-21 15:44:00
299   1995-02-21 12:08:00
300   1995-02-21 12:54:00
Name: epoch_date_close_approach, Length: 134, dtype: datetime64[ns]

In [32]:
# print the number of records as (rows, columns)
df_february.shape

(134, 14)
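
Because the task asks for exactly February 1995, a filter on a dataset spanning several years has to pin down the year as well as the month. A minimal sketch on hypothetical toy dates (invented values that include a February from another year), showing two ways to combine the conditions:

```python
import pandas as pd

# Hypothetical toy data: boundary dates plus a February from another year
toy = pd.DataFrame({
    "epoch_date_close_approach": pd.to_datetime([
        "1995-01-31 23:59:00",
        "1995-02-01 00:00:00",
        "1995-02-28 23:59:00",
        "1996-02-10 12:00:00",
    ])
})
dates = toy["epoch_date_close_approach"]

# Each comparison needs parentheses because & binds tighter than ==
mask = (dates.dt.year == 1995) & (dates.dt.month == 2)
feb_1995 = toy[mask]

# Equivalent range form: from Feb 1 (inclusive) to Mar 1 (exclusive)
mask_range = (dates >= "1995-02-01") & (dates < "1995-03-01")

print(len(feb_1995), mask.equals(mask_range))
```

The range form extends naturally to arbitrary periods, while the `.dt.year` and `.dt.month` form reads closer to the wording of the task.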