# UFO Hunter

In which US state, and at what time of day, is it "easiest" to spot a UFO?

### Importing what's important

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

import warnings
warnings.simplefilter(action='ignore')

### Reading the file with all the sightings

In [3]:
df = pd.read_json('https://raw.githubusercontent.com/joaoariedi/presentations/master/Ovnis%20Hunter/ovnis_hunter/ovins_data.json')

### Viewing a sample of the data

In [4]:
df.head()

Unnamed: 0,date,city,state,country,shape,duration,summary,posted,images
0,2023-05-19 01:49:00,Harrisburg East of I5,OR,,Light,Approx. 3-4 min.,9 lights in straight line,5/19/23,
1,2023-05-18 19:40:00,Kippens,NF,,Circle,5 minutes,Looked out window and saw the sun catching on ...,5/19/23,Yes
2,2023-05-18 13:27:00,Hurricane,UT,,Cigar,10 minutes,Saw what looked like a long cigar shaped objec...,5/19/23,
3,2023-05-17 23:20:00,Sharpsville,IN,,Oval,1 hour,Oval like object over town no brightness,5/19/23,
4,2023-05-17 23:00:00,Tadepalligudem,Andhra Pradesh,,Cube,5 minutes,We are watching the night sky suddenly there i...,5/19/23,Yes


### Renaming the columns

In [6]:
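# Portuguese column names: data=date, cidade=city, estado=state, pais=country, forma=shape,
# duração=duration, relato=summary, postado=posted, imagens=images
df.columns = ['data', 'cidade', 'estado', 'pais', 'forma', 'duração', 'relato', 'postado', 'imagens']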
df.head()

Unnamed: 0,data,cidade,estado,pais,forma,duração,relato,postado,imagens
0,2023-05-19 01:49:00,Harrisburg East of I5,OR,,Light,Approx. 3-4 min.,9 lights in straight line,5/19/23,
1,2023-05-18 19:40:00,Kippens,NF,,Circle,5 minutes,Looked out window and saw the sun catching on ...,5/19/23,Yes
2,2023-05-18 13:27:00,Hurricane,UT,,Cigar,10 minutes,Saw what looked like a long cigar shaped objec...,5/19/23,
3,2023-05-17 23:20:00,Sharpsville,IN,,Oval,1 hour,Oval like object over town no brightness,5/19/23,
4,2023-05-17 23:00:00,Tadepalligudem,Andhra Pradesh,,Cube,5 minutes,We are watching the night sky suddenly there i...,5/19/23,Yes


In [7]:
df.tail()

Unnamed: 0,data,cidade,estado,pais,forma,duração,relato,postado,imagens
143280,2020-04-01 10:00:00,Johnscreek,GA,,Circle,10,"Glowing circular disc was spinning clock wise,...",5/15/20,
143281,2020-04-01 09:45:00,Woodbury,MN,,Sphere,1 minutes,I was looking out my window and saw a bright w...,6/25/20,
143282,2020-04-01 09:28:00,Nairobi (Kenya),,,Sphere,15-20 minutes,Shining sphere,6/25/20,
143283,2020-04-01 04:00:00,Anderson,SC,,Light,30+ minutes,2 lights appeared out of nowhere just down and...,6/25/20,
143284,2020-04-01 03:45:00,Petersham,MA,,Fireball,2+ minutes,"Ok, this is NOT an April fools joke. At 3:45 t...",4/9/20,


### Converting a date column from <code>string</code> to <code>datetime</code>

In [None]:
# Parse the renamed 'data' column into a new datetime column, 'ocorrido' ("occurred")
df['ocorrido'] = pd.to_datetime(df['data'])

In [None]:
df.head()
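
As a small self-contained illustration of this conversion, the sketch below parses a hypothetical three-row frame (not the real dataset). Passing `errors='coerce'` is an assumption made here so that an unparseable value becomes `NaT` instead of raising an error; the notebook itself does not use it.

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real dataset.
sample = pd.DataFrame({
    'data': ['2023-05-19 01:49:00', '2023-05-18 19:40:00', 'unknown'],
})

# errors='coerce' turns strings that cannot be parsed into NaT.
sample['ocorrido'] = pd.to_datetime(sample['data'], errors='coerce')

print(sample.dtypes)
print(sample['ocorrido'].isna().sum(), 'value(s) could not be parsed')
```

Rows holding `NaT` can then be removed together with the other missing data.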

### Cleaning up reports with missing data

In [None]:
selector = pd.isnull(df['forma'])
df[selector].head()

In [None]:
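# Assumption: only the columns used below must be present; a bare dropna() would also
# discard every row with an empty 'pais' or 'imagens' value
df_dropped = df.dropna(subset=['data', 'estado', 'forma'])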

In [None]:
df_dropped.head()
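
Called without arguments, `dropna` removes every row that has a missing value in any column. The sketch below, on a hypothetical frame where `pais` is always empty, shows how that differs from restricting the check to chosen columns with `subset`. Which columns to check is an assumption that depends on the analysis.

```python
import pandas as pd

# Hypothetical sample: 'pais' is missing everywhere, as in the preview above.
sample = pd.DataFrame({
    'estado': ['OR', 'UT', None],
    'pais': [None, None, None],
    'forma': ['Light', None, 'Oval'],
})

# Drops a row if ANY column is missing.
all_columns = sample.dropna()

# Drops a row only if 'estado' or 'forma' is missing.
needed_only = sample.dropna(subset=['estado', 'forma'])

print(len(all_columns), len(needed_only))
```

Comparing `len(df)` with `len(df_dropped)` is a quick way to see how many reports a cleaning step discards.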

### Grouping the results by state

In [None]:
ocorrencias_por_estado = df_dropped.groupby('estado')['ocorrido'].count()

In [None]:
ocorrencias_por_estado.sort_values(ascending=False, inplace=True)

### Plotting the 10 states with the most sightings

In [None]:
ocorrencias_por_estado.head(10).plot.bar(rot=0)
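
The group, count, and sort steps can be tried on a hypothetical six-row frame, as in the sketch below. It also compares the result with `value_counts`, which gives the same ranking when neither column has missing values.

```python
import pandas as pd

# Hypothetical sample, not taken from the real dataset.
sample = pd.DataFrame({
    'estado': ['CA', 'FL', 'CA', 'WA', 'CA', 'FL'],
    'ocorrido': pd.to_datetime([
        '2023-01-01 21:00', '2023-01-02 22:00', '2023-01-03 20:00',
        '2023-01-04 03:00', '2023-01-05 21:30', '2023-01-06 23:15',
    ]),
})

# Count the non-null 'ocorrido' values per state, largest first.
by_group = sample.groupby('estado')['ocorrido'].count().sort_values(ascending=False)

# value_counts counts the rows per state and sorts in descending order by default.
by_value_counts = sample['estado'].value_counts()

print(by_group.head(10))
```

The two differ only when `ocorrido` has missing values, because `count` ignores them while `value_counts` counts every row with a state.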

### Extracting a <code>subset</code> with only the California data

In [None]:
selector = df_dropped['estado'] == "CA"

In [None]:
# Copy the selection so that adding a column later does not act on a view of df_dropped
california = df_dropped[selector].copy()

In [None]:
california.head()

### Extracting the hour of each sighting and adding it as a new column

In [None]:
california['hora'] = california['ocorrido'].dt.hour
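
The sketch below repeats the filter and hour extraction on a hypothetical three-row frame. It takes an explicit `.copy()` of the filtered rows, one way to add a column to a subset without triggering pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Hypothetical sample, not taken from the real dataset.
sample = pd.DataFrame({
    'estado': ['CA', 'OR', 'CA'],
    'ocorrido': pd.to_datetime([
        '2023-05-19 01:49:00', '2023-05-18 19:40:00', '2023-05-17 23:20:00',
    ]),
})

# Boolean mask selects the California rows; .copy() makes them independent of 'sample'.
ca_only = sample[sample['estado'] == 'CA'].copy()

# The .dt accessor exposes datetime components; .hour is an integer from 0 to 23.
ca_only['hora'] = ca_only['ocorrido'].dt.hour

print(ca_only)
```

The original `sample` frame is left without the `hora` column.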

### Plotting a <code>histogram</code> to examine the distribution of sightings

In [None]:
hora_hist = california['hora'].plot.hist(bins=12, grid=True)
plt.xticks(range(california['hora'].min(), california['hora'].max()+1, 2))
plt.show()
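
If the hours span 0 to 23, `bins=12` produces bins that are 23/12 hours wide, so the bin edges do not fall on whole hours. As an alternative sketch, the counts can be computed per hour directly. The `horas` values below are hypothetical, not taken from the real dataset.

```python
import pandas as pd

# Hypothetical hours of the day for seven sightings.
horas = pd.Series([21, 21, 22, 20, 21, 3, 22], name='hora')

# Count sightings per hour and fill in hours with no sightings as 0.
por_hora = horas.value_counts().reindex(range(24), fill_value=0)

print(por_hora.idxmax())
```

Calling `por_hora.plot.bar(rot=0)` would then draw one bar per hour, and `idxmax` gives the hour with the most sightings.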

# Challenge

### What are the 5 most common UFO shapes in the reports?

In [None]:
# CODE

### At what time of day is it hardest to spot a cigar-shaped UFO?

In [None]:
# CODE