# Final project

### Ironhack - Data Analytics Bootcamp

### Fernanda Harumi Tanaka

***

## Table of contents:

[Introduction](#Introduction) <br>
[Methodology](#Methodology) <br>
> [1. Dataset analysis](#1.-Análise-do-dataset:) <br>
> [2. Birth chart](#2.-Mapa-astral:) <br>
> [3. Distance calculation](#3.-Cálculo-de-distâncias:) <br>
>>[3.1 By similarity](#3.1-Por-similaridade:) <br>
>>[3.2 By compatibility](#3.2-Por-compatibilidade:) <br>

> [4. Interaction with user data](#4.-Interação-com-os-dados-do-usuário:)


***

## Introduction:

The main goal of this project is to identify which celebrities are most similar to, and most compatible with, the user's personality, based on their birth charts. <br>
To that end, data was scraped from https://www.astro.com/astro-databank/Main_Page, a site that collects astrological data on notable events and media personalities.

***

## Methodology:

In [1]:
import pandas as pd
import flatlib
import geopy
import pytz
import re
import warnings

In [2]:
from flatlib.datetime import Datetime
from flatlib.geopos import GeoPos
from flatlib.chart import Chart
from geopy.geocoders import Nominatim
from timezonefinder import TimezoneFinder
from pytz import timezone
from datetime import datetime
from sklearn.neighbors import BallTree

In [3]:
warnings.filterwarnings('ignore')

In [4]:
class pessoa():
    """
    Computes a person's birth chart. Provide the name, birth date (YYYY/MM/DD), birth time (HH:MM),
    UTC offset (e.g. "-03:00"), and birthplace (e.g. "São Paulo, SP").
    """

    def __init__(self, nome, data, horario, fuso, local):
        self.nome = nome
        self.data = data
        self.horario = horario
        self.fuso = fuso
        self.local = local
        self.chart = None

    def mapa(self):
        # Geocode the birthplace to obtain its latitude and longitude
        geolocator = Nominatim(user_agent="my-application")
        location = geolocator.geocode(self.local)
        if location is None:
            raise ValueError("Could not geocode the birthplace: %s" % self.local)
        # Build the chart and store the sign of each chart object
        mapa_pessoa = Chart(Datetime(self.data, self.horario, self.fuso), GeoPos(location.latitude, location.longitude)).objects
        dicionario = {"nome":self.nome}
        for item in mapa_pessoa:
            dicionario[item.id] = item.sign
        self.chart = dicionario

In [5]:
def drop_word(df, word):
    """
    Drops the rows whose "bio" column contains the given word.
    Takes the dataframe (df) and the word to search for (word).
    Returns the dataframe without the rows that contain that word (df is modified in place).
    """
    df.drop(df.loc[df['bio'].str.contains(word)].index, axis=0, inplace=True)
    return df

In [6]:
tf = TimezoneFinder()
utc = pytz.utc

def offset(df):
    """
    Returns the UTC offset of a location, in hours.
    Takes a dataframe row (df) containing the columns lat_2 and long_2 (as floats).
    Adapted from the TimezoneFinder documentation: "https://github.com/MrMinimal64/timezonefinder"
    """

    today = datetime.now()
    try:
        tz_target = timezone(tf.certain_timezone_at(lat=df['lat_2'], lng=df['long_2']))

        today_target = tz_target.localize(today)
        today_utc = utc.localize(today)
        delta = today_utc - today_target

        # Remove the daylight saving component, if any
        delta -= tz_target.dst(today)
        return delta.total_seconds() / 3600

    except Exception:
        # No timezone could be determined for these coordinates
        return None
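
The `offset` function above returns the offset as a number of hours, while the interactive functions in this notebook ask the user for a string such as `-03:00`. As a minimal sketch of how the two representations relate, the helpers below convert between them. The names `hours_to_utc_string` and `utc_string_to_hours` are hypothetical and are not part of the original notebook.

```python
def hours_to_utc_string(hours):
    """Format an offset in hours (e.g. -3.0) as a '+HH:MM' / '-HH:MM' string."""
    sign = '-' if hours < 0 else '+'
    total_minutes = round(abs(hours) * 60)
    hh, mm = divmod(total_minutes, 60)
    return f"{sign}{hh:02d}:{mm:02d}"


def utc_string_to_hours(text):
    """Parse a '+HH:MM' / '-HH:MM' (or unsigned 'HH:MM') string into hours."""
    sign = -1 if text.startswith('-') else 1
    hh, mm = text.lstrip('+-').split(':')
    return sign * (int(hh) + int(mm) / 60)


print(hours_to_utc_string(-3.0))
print(utc_string_to_hours('+05:30'))
```

Half-hour offsets are handled by working in minutes rather than whole hours.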

In [7]:
def mapa(df):
    """
    Returns the birth chart of a dataframe row (df) as a dictionary mapping each chart object to its sign.
    """
    mapa_pessoa = Chart(Datetime(df['data'], df['horario'], df['fuso']), GeoPos(df['lat_2'], df['long_2'])).objects
    dicionario = {}
    for item in mapa_pessoa:
        dicionario[item.id] = item.sign
    return dicionario
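
The functions above read `lat_2` and `long_2` as floats, whereas the dataset preview in section 1 shows coordinates as strings such as `48n52` or `48n5102`. The sketch below shows one possible way to convert them, assuming the notation means degrees, a hemisphere letter, two digits of minutes, and optionally two digits of seconds. That reading of the format, and the helper name `to_decimal_degrees`, are assumptions and are not taken from the notebook.

```python
import re

# Assumed layout: degrees, hemisphere letter, 2-digit minutes, optional 2-digit seconds
_COORD = re.compile(r"(\d{1,3})([nsew])(\d{2})(\d{2})?", re.IGNORECASE)


def to_decimal_degrees(text):
    """Convert e.g. '48n52' or '48n5102' to signed decimal degrees (None if unparseable)."""
    match = _COORD.fullmatch(text.strip())
    if match is None:
        return None
    degrees, hemisphere, minutes, seconds = match.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds or 0) / 3600
    # South and west are negative by the usual convention
    return -value if hemisphere.lower() in "sw" else value


print(to_decimal_degrees("48n52"))
print(to_decimal_degrees("66w37"))
```

Returning `None` for unparseable input lets such rows be found and dropped later instead of stopping the whole conversion.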

In [8]:
def usuario():
    """
    Collects the data provided by the user and prints the 10 most similar and the 10 most compatible personalities.
    """
    # 1. Start from a clean state by working on copies of the original datasets
    df_astro_copia = df_astro.copy()
    df_astro_dummies_copia = df_astro_dummies.copy()
    df_astro_compat_dummies_copia = df_astro_compat_dummies.copy()
    df_astro_compat_copia = df_astro_compat.copy()

    # 2. Collect the data
    us_nome = input('What is your name?').lower()

    checker_data = False
    checker_hora = False
    checker_fuso = False
    while not checker_data:
        us_data = input('What is your date of birth? (YYYY/MM/DD)')
        # fullmatch rejects trailing garbage that an unanchored search would accept
        if re.fullmatch(r'\d{4}/(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])', us_data):
            checker_data = True
        else:
            print('Invalid value. Please try again.')

    while not checker_hora:
        us_hora = input('What time were you born? (HH:MM)')
        if re.fullmatch(r'(0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]', us_hora):
            checker_hora = True
        else:
            print('Invalid value. Please try again.')

    while not checker_fuso:
        us_fuso = input('What is the UTC offset of your birthplace? (-03:00 for São Paulo, Rio de Janeiro, and Bahia)')
        if re.fullmatch(r'[+-]?(0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]', us_fuso):
            checker_fuso = True
        else:
            print('Invalid value. Please try again.')

    us_cidade = input('In which city were you born? (City, State)')
    user = pessoa(us_nome, us_data, us_hora, us_fuso, us_cidade)
    user.mapa()
    dicio = user.chart

    # 3. Store the data in a dataframe
    df_user = pd.DataFrame([dicio])
    df_user = df_user[['nome', 'Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter', 'Saturn',
                       'North Node', 'South Node', 'Syzygy', 'Pars Fortuna']]
    df_user_dummies = pd.get_dummies(df_user, columns=['Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter',
                        'Saturn','North Node', 'South Node', 'Syzygy', 'Pars Fortuna'])

    # 4. Join the user's dataframe with the datasets (the user becomes row 0).
    #    pd.concat replaces DataFrame.append, which was removed in pandas 2.0;
    #    ignore_index=True rebuilds the index, as reset_index + drop("index") did.
    df_astro_user = pd.concat([df_user, df_astro_copia], sort=False, ignore_index=True)
    df_astro_user_dummies = pd.concat([df_user_dummies, df_astro_dummies_copia], sort=False, ignore_index=True)
    df_astro_user_dummies.fillna(0, inplace=True)
    df_astro_user_compat_dummies = pd.concat([df_user_dummies, df_astro_compat_dummies_copia], sort=False, ignore_index=True)
    df_astro_user_compat_dummies.fillna(0, inplace=True)
    df_astro_user_compat = pd.concat([df_user, df_astro_compat_copia], sort=False, ignore_index=True)

    # 5. Calculations: similarity
    features = df_astro_user_dummies.drop('nome', axis=1)
    user_idx = df_astro_user_dummies.loc[df_astro_user_dummies["nome"]==us_nome].index[0]
    tree = BallTree(features, leaf_size=2)
    # k=11: the user plus the 10 nearest neighbours
    dist, ind = tree.query(features.loc[[user_idx]], k=11)

    # Flatten the results and discard the user's own entry (distance 0 to itself)
    ind_list = list(ind[0])
    ind_list.remove(user_idx)
    dist_list = list(dist[0])
    dist_list.remove(0)

    # Split identical charts (distance 0) from merely similar ones
    df_igual = df_astro_user.loc[[i for i, d in zip(ind_list, dist_list) if d == 0]]
    df_semelhante = df_astro_user.loc[[i for i, d in zip(ind_list, dist_list) if d != 0]]

    # Percentage of the 11 chart points on which each similar person matches the user
    user_row = df_astro_user.loc[df_astro_user["nome"]==us_nome].iloc[0]
    porcentagem = []
    for index, row in df_semelhante.iterrows():
        comparacao = row == user_row
        porcentagem.append(round(comparacao.sum() / 11 * 100, 2))

            
    # 6. Calculations: compatibility
    features_compat = df_astro_user_compat_dummies.drop('nome', axis=1)
    user_idx2 = df_astro_user_compat_dummies.loc[df_astro_user_compat_dummies["nome"]==us_nome].index[0]
    tree_compat = BallTree(features_compat, leaf_size=2)
    dist2, ind2 = tree_compat.query(features_compat.loc[[user_idx2]], k=11)

    # Flatten the results and discard the user's own entry
    ind_list2 = list(ind2[0])
    ind_list2.remove(user_idx2)
    dist_list2 = list(dist2[0])
    dist_list2.remove(0)

    df_comp = df_astro_user_compat.iloc[ind_list2]

    user_row2 = df_astro_user.loc[df_astro_user["nome"]==us_nome].drop('nome', axis=1).iloc[0]
    porcentagem2 = []
    for index, row in (df_comp.drop('nome', axis=1)).iterrows():
        comparacao2 = row == user_row2
        # Fixed: the original reused `comparacao` from the similarity step here
        porcentagem2.append(round(comparacao2.sum() / 11 * 100, 2))

    # Sort names and percentages together so each percentage stays with its person
    df_comp = df_comp.assign(porcentagem=porcentagem2).sort_values('porcentagem', ascending=False, kind='stable')
    porcentagem2 = df_comp['porcentagem'].tolist()
    
    #7. Outputs
            
    print('-------------------------------------------------------------')
    print('Hello, %s!' %us_nome.capitalize())
    print('\n')
    if df_igual.empty:
        print('You are not identical to anyone!')
    else:
        print('You are identical to the following people:')
        print('>>','\n>> '.join([i for i in df_igual['nome']]))
    print('\n')
    print('You are similar to the following people:')
    print('>>','\n>> '.join([i+" - "+str(p)+"%" for i,p in zip(df_semelhante['nome'],porcentagem)]))
    print('-------------------------------------------------------------')
    print('\n')
    print('You are compatible with the following people:')
    print('>>','\n>> '.join([i+" - "+str(p)+"%" for i,p in zip(df_comp['nome'],porcentagem2)]))
    print('-------------------------------------------------------------')
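
The function above combines two measures: a Euclidean nearest-neighbour search over one-hot encoded signs (`BallTree` uses the Euclidean metric by default) and a percentage of matching chart points out of 11. As a hedged, plain-Python sketch of how the two relate, the example below uses made-up charts and hypothetical helper names. It does not reproduce the notebook's pandas and scikit-learn pipeline. When every chart has exactly one sign per point, each mismatching point contributes 2 to the squared distance, so ranking by distance and ranking by match percentage agree.

```python
import math

# The 11 chart points compared in the notebook
POINTS = ['Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter', 'Saturn',
          'North Node', 'South Node', 'Syzygy', 'Pars Fortuna']
SIGNS = ['Aries', 'Taurus', 'Gemini', 'Cancer', 'Leo', 'Virgo', 'Libra',
         'Scorpio', 'Sagittarius', 'Capricorn', 'Aquarius', 'Pisces']


def match_percentage(chart_a, chart_b):
    """Share of chart points (out of 11) with the same sign, as a percentage."""
    matches = sum(chart_a[p] == chart_b[p] for p in POINTS)
    return round(matches / len(POINTS) * 100, 2)


def one_hot(chart):
    """One 0/1 entry per (point, sign) pair, similar in spirit to pd.get_dummies."""
    return [1 if chart[p] == s else 0 for p in POINTS for s in SIGNS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Made-up charts for illustration only: they differ on 3 of the 11 points
user = {p: 'Aries' for p in POINTS}
other = dict(user, Sun='Leo', Moon='Virgo', Mars='Libra')

print(match_percentage(user, other))
print(euclidean(one_hot(user), one_hot(other)) ** 2)
```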

In [9]:
def sou_semelhante(personalidade):
    
    """
    Asks for the user's birth data and prints how similar the user is to the given personality.
    """
    # 1. Start from a clean state by working on a copy of the original dataset
    df_astro_copia = df_astro.copy()

    if df_astro_copia.loc[df_astro_copia["nome"].str.contains(personalidade)].empty:
        print('There is no data about this personality!')
    else:

        # 2. Collect the data
        us_nome = input('What is your name?').lower()

        checker_data = False
        checker_hora = False
        checker_fuso = False
        while not checker_data:
            us_data = input('What is your date of birth? (YYYY/MM/DD)')
            if re.fullmatch(r'\d{4}/(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])', us_data):
                checker_data = True
            else:
                print('Invalid value. Please try again.')

        while not checker_hora:
            us_hora = input('What time were you born? (HH:MM)')
            if re.fullmatch(r'(0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]', us_hora):
                checker_hora = True
            else:
                print('Invalid value. Please try again.')

        while not checker_fuso:
            us_fuso = input('What is the UTC offset of your birthplace? (-03:00 for São Paulo, Rio de Janeiro, and Bahia)')
            if re.fullmatch(r'[+-]?(0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]', us_fuso):
                checker_fuso = True
            else:
                print('Invalid value. Please try again.')

        us_cidade = input('In which city were you born? (City, State)')
        user = pessoa(us_nome, us_data, us_hora, us_fuso, us_cidade)
        user.mapa()
        dicio = user.chart

        # 3. Store the data in a dataframe
        df_user = pd.DataFrame([dicio])
        df_user = df_user[['nome', 'Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter', 'Saturn',
                           'North Node', 'South Node', 'Syzygy', 'Pars Fortuna']]

        # 4. Join the user's dataframe with the dataset (pd.concat replaces the removed DataFrame.append)
        df_astro_user = pd.concat([df_user, df_astro_copia], sort=False, ignore_index=True)

        # 5. Compare the user with the first personality whose name matches,
        #    so the percentage and the printed name refer to the same person
        famoso = df_astro_user.loc[df_astro_user["nome"].str.contains(personalidade)].iloc[0]
        row_user = df_astro_user.loc[df_astro_user["nome"]==us_nome].iloc[0]
        comparacao = famoso == row_user
        porcentagem_semelhanca = round(comparacao.sum() / 11 * 100, 2)

        print('-------------------------------------------------------------')
        print('Hello, %s!' %us_nome.capitalize())
        print('\n')
        print('You are %s %% similar to %s.' %(porcentagem_semelhanca, famoso['nome']))
        print('-------------------------------------------------------------')

In [10]:
def sou_compativel(personalidade):
    
    """
    Asks for the user's birth data and prints how compatible the user is with the given personality.
    """
    # 1. Start from a clean state by working on copies of the original datasets
    df_astro_copia = df_astro.copy()
    df_astro_compat_copia = df_astro_compat.copy()

    if df_astro_copia.loc[df_astro_copia["nome"].str.contains(personalidade)].empty:
        print('There is no data about this personality!')

    else:
        # 2. Collect the data
        us_nome = input('What is your name?').lower()

        checker_data = False
        checker_hora = False
        checker_fuso = False
        while not checker_data:
            us_data = input('What is your date of birth? (YYYY/MM/DD)')
            if re.fullmatch(r'\d{4}/(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])', us_data):
                checker_data = True
            else:
                print('Invalid value. Please try again.')

        while not checker_hora:
            us_hora = input('What time were you born? (HH:MM)')
            if re.fullmatch(r'(0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]', us_hora):
                checker_hora = True
            else:
                print('Invalid value. Please try again.')

        while not checker_fuso:
            us_fuso = input('What is the UTC offset of your birthplace? (-03:00 for São Paulo, Rio de Janeiro, and Bahia)')
            if re.fullmatch(r'[+-]?(0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]', us_fuso):
                checker_fuso = True
            else:
                print('Invalid value. Please try again.')

        us_cidade = input('In which city were you born? (City, State)')
        user = pessoa(us_nome, us_data, us_hora, us_fuso, us_cidade)
        user.mapa()
        dicio = user.chart

        # 3. Store the data in a dataframe
        df_user = pd.DataFrame([dicio])
        df_user = df_user[['nome', 'Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter', 'Saturn',
                           'North Node', 'South Node', 'Syzygy', 'Pars Fortuna']]

        # 4. Join the user's dataframe with the compatibility dataset
        #    (pd.concat replaces the removed DataFrame.append)
        df_astro_user_compat = pd.concat([df_user, df_astro_compat_copia], sort=False, ignore_index=True)

        # 5. Compare the user with the first personality whose name matches,
        #    so the percentage and the printed name refer to the same person
        famoso = df_astro_user_compat.loc[df_astro_user_compat["nome"].str.contains(personalidade)].iloc[0]
        row_user = df_astro_user_compat.loc[df_astro_user_compat["nome"]==us_nome].iloc[0]
        comparacao = famoso.drop('nome') == row_user.drop('nome')
        porcentagem_compat = round(comparacao.sum() / 11 * 100, 2)

        print('-------------------------------------------------------------')
        print('Hello, %s!' %us_nome.capitalize())
        print('\n')
        print('You are %s %% compatible with %s.' %(porcentagem_compat, famoso['nome']))
        print('-------------------------------------------------------------')
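
The same three validation loops (date, time, UTC offset) appear in `usuario`, `sou_semelhante`, and `sou_compativel`. As a minimal sketch, they could be factored into one helper. The names `ask`, `DATE_RE`, `TIME_RE`, and `OFFSET_RE` are hypothetical and not part of the notebook, and `re.fullmatch` is used here so that input with extra characters before or after the value is rejected.

```python
import re

DATE_RE = re.compile(r"\d{4}/(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])")
TIME_RE = re.compile(r"([01][0-9]|2[0-3]):[0-5][0-9]")
OFFSET_RE = re.compile(r"[+-]?([01][0-9]|2[0-3]):[0-5][0-9]")


def ask(prompt, pattern, read=input):
    """Keep asking until the whole answer matches `pattern`, then return it.

    `read` defaults to the built-in input and can be swapped out for testing.
    """
    while True:
        answer = read(prompt).strip()
        if pattern.fullmatch(answer):
            return answer
        print('Invalid value. Please try again.')
```

Inside the interactive functions this would be called as, for example, `us_data = ask('What is your date of birth? (YYYY/MM/DD)', DATE_RE)`.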

<a id="1.-Análise-do-dataset:"></a>

### 1. Dataset analysis:

In [11]:
# Importing the dataset:
astro = pd.read_csv("astro_databank.csv")
astro.columns = ['nome', 'data', 'horario', 'lat', 'long', 'bio', 'url']

In [12]:
astro.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65417 entries, 0 to 65416
Data columns (total 7 columns):
nome       65417 non-null object
data       65417 non-null object
horario    65417 non-null object
lat        65417 non-null object
long       65417 non-null object
bio        65417 non-null object
url        65417 non-null object
dtypes: object(7)
memory usage: 3.5+ MB


In [13]:
astro.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",19 March 1930,01:30,48n52,2e20,<p>French theorist of the theatre. \n</p>,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",31 May 1910,05:30,18n01,66w37,"<p>Puerto Rican educator, literary critic, and...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
3,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
4,"Baboul, Eudoxie",1 October 1901,03:00,5n23,52w57,"<p>French supercentenarian, for a while the ol...","https://www.astro.com/astro-databank/Baboul,_E..."


"Babonneau, René" appears twice. Duplicated rows like this one must be removed:

In [14]:
astro2 = astro.drop_duplicates()

In [15]:
astro2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 57706 entries, 0 to 58480
Data columns (total 7 columns):
nome       57706 non-null object
data       57706 non-null object
horario    57706 non-null object
lat        57706 non-null object
long       57706 non-null object
bio        57706 non-null object
url        57706 non-null object
dtypes: object(7)
memory usage: 3.5+ MB


In [16]:
astro2.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",19 March 1930,01:30,48n52,2e20,<p>French theorist of the theatre. \n</p>,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",31 May 1910,05:30,18n01,66w37,"<p>Puerto Rican educator, literary critic, and...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
3,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
4,"Baboul, Eudoxie",1 October 1901,03:00,5n23,52w57,"<p>French supercentenarian, for a while the ol...","https://www.astro.com/astro-databank/Baboul,_E..."


In [17]:
len(astro)-len(astro2)

7711

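7711 duplicated rows were removed, but "Babonneau, René" still appears twice in the dataset, so the two rows must differ in at least one column. Other cases like this one probably exist. Entries that share the same name were therefore removed as well, keeping only the first occurrence of each name: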

In [18]:
astro3 = astro2.drop_duplicates(subset = "nome")
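
The difference between the two `drop_duplicates` calls can be illustrated with a small plain-Python sketch. It is not the pandas implementation, and the records below are made up for illustration: the `url-a`, `url-b`, and `url-c` values are placeholders. Removing exact duplicates keeps rows that share a name but differ in another field, while deduplicating on the name alone keeps only the first row per name.

```python
# Made-up records: (nome, data, url)
records = [
    ("Babonneau, René", "18 June 1904", "url-a"),
    ("Babonneau, René", "18 June 1904", "url-b"),   # same name, different url
    ("Bablet, Denis", "19 March 1930", "url-c"),
    ("Bablet, Denis", "19 March 1930", "url-c"),    # exact duplicate
]


def drop_exact(rows):
    """Keep the first occurrence of each fully identical row."""
    seen, kept = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept


def drop_by_name(rows):
    """Keep the first occurrence of each name (first field of the row)."""
    seen, kept = set(), []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


print(len(drop_exact(records)))
print(len(drop_by_name(records)))
```

Note that deduplicating on the name also discards distinct people who happen to share a name, which is the trade-off accepted in this step.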

In [19]:
astro3.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",19 March 1930,01:30,48n52,2e20,<p>French theorist of the theatre. \n</p>,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",31 May 1910,05:30,18n01,66w37,"<p>Puerto Rican educator, literary critic, and...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
4,"Baboul, Eudoxie",1 October 1901,03:00,5n23,52w57,"<p>French supercentenarian, for a while the ol...","https://www.astro.com/astro-databank/Baboul,_E..."
5,"Babilée, Jean",3 February 1923,21:30,48n5102,2e1958,"<p>French dancer, choreographer, and actor, Ba...",https://www.astro.com/astro-databank/Babil%C3%...


In [20]:
# Rebuild the index without keeping the old one; reassigning avoids modifying a slice in place
astro3 = astro3.reset_index(drop=True)

In [21]:
astro3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52548 entries, 0 to 52547
Data columns (total 7 columns):
nome       52548 non-null object
data       52548 non-null object
horario    52548 non-null object
lat        52548 non-null object
long       52548 non-null object
bio        52548 non-null object
url        52548 non-null object
dtypes: object(7)
memory usage: 2.8+ MB


In [22]:
len(astro2)-len(astro3)

5158

With this step, a further 5158 entries with repeated names were removed. The dataset now contains 52548 entries, each name occurring only once, and no null values.

Before filtering the dataset, some interesting subsets were set aside for possible use in later stages of the project.

In [23]:
# Setting aside the accident records
mask_accident = astro3['nome'].str.contains('Accident')
accident = astro3[mask_accident]
accident.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
303,Accidents: West Fertilizer Company explosion (...,17 April 2013,07:50:38,31n33,97w09,"<p>On April 17, 2013, an ammonium nitrate expl...",https://www.astro.com/astro-databank/Accidents...
304,Accident: wreck Air France flight 447,1 June 2009,05:14,3s5025,32w2439,<p>This plane (Rio-Paris) sank into the Atlant...,https://www.astro.com/astro-databank/Accident:...
306,Accident: great fire of Valparaiso,12 April 2014,16:40,33s02,71w38,"<p>Great wildfire, it began on 12 April 2914 i...",https://www.astro.com/astro-databank/Accident:...
307,Accident: gas explosion Mulhouse,31 August 2014,07:00,47n45,7e20,<p><br>\n</p>,https://www.astro.com/astro-databank/Accident:...
311,Accident: Train Wreck,25 April 1946,13:15,41n47,88w09,"<p>American train accident; the train, called ...",https://www.astro.com/astro-databank/Accident:...


In [24]:
accident.shape

(94, 7)
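
The same substring-matching pattern is repeated for each category in the cells that follow. As a small plain-Python sketch (not the notebook's pandas code), the example below counts several keywords in one pass over a handful of names taken from the previews in this section. The helper name `count_by_keyword` is hypothetical. It also shows that a single entry can fall into more than one subset, for example a name containing both "Accident" and "Plane Crash".

```python
from collections import Counter

# A few names as they appear in the dataset previews
names = [
    "Accident: wreck Air France flight 447",
    "Accident: Plane Crash 14268",
    "Plane Crash: TWA Flight 800 (1964)",
    "UFO Sighting 14368",
    "Nature:Earthquake Bulgaria 1928",
    "Bablet, Denis",
]

KEYWORDS = ["Accident", "Plane Crash", "UFO Sighting", "Nature"]


def count_by_keyword(names, keywords):
    """Count how many names contain each keyword (a name may match several)."""
    counts = Counter()
    for name in names:
        for keyword in keywords:
            if keyword in name:
                counts[keyword] += 1
    return counts


print(count_by_keyword(names, KEYWORDS))
```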

In [25]:
# Setting aside the records related to Brazil:
br_mask = astro3['bio'].str.contains('Brazilian')
br = astro3[br_mask]
br.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
123,UFO Sighting 14368,10 February 1969,21:00,21s1218,41w5316,<p>Brazilian UFO sighting of a pear-shaped cra...,https://www.astro.com/astro-databank/UFO_Sight...
124,UFO Sighting 14367,8 February 1969,21:05,16s0456,48w3026,"<p>Brazilian UFO sighting of a ""wheel of fire""...",https://www.astro.com/astro-databank/UFO_Sight...
127,UFO Sighting 14366,6 February 1969,07:30,21s59,47w25,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...
128,UFO Sighting 14365,31 January 1969,22:00,16s0456,48w3026,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...
132,"Salomão, Waly",3 September 1943,05:00,13s5127,40w0501,"<p>Brazilian poet, songwriter and writer. His ...",https://www.astro.com/astro-databank/Salom%C3%...


In [26]:
br.shape

(627, 7)

In [27]:
# Setting aside the UFO sighting records:
ufo_mask = astro3['nome'].str.contains('UFO Sighting')
ufo = astro3[ufo_mask]
ufo.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
123,UFO Sighting 14368,10 February 1969,21:00,21s1218,41w5316,<p>Brazilian UFO sighting of a pear-shaped cra...,https://www.astro.com/astro-databank/UFO_Sight...
124,UFO Sighting 14367,8 February 1969,21:05,16s0456,48w3026,"<p>Brazilian UFO sighting of a ""wheel of fire""...",https://www.astro.com/astro-databank/UFO_Sight...
127,UFO Sighting 14366,6 February 1969,07:30,21s59,47w25,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...
128,UFO Sighting 14365,31 January 1969,22:00,16s0456,48w3026,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...
554,UFO Sighting 14363,26 January 1969,21:22,12s0808,38w2509,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...


In [28]:
ufo.shape

(482, 7)

In [29]:
entert = astro3[astro3['nome'].str.contains('Entertainment')]
entert.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
715,Entertainment: Tarzan,1 December 1911,20:00,41n51,87w39,<p>American fictional character created by Edg...,https://www.astro.com/astro-databank/Tarzan
16518,Entertainment: French and Saunders,9 March 1987,21:00,51n30,0w10,<p>British sketch comedy television series wri...,https://www.astro.com/astro-databank/Entertain...
16519,Entertainment: Eurovision Song Contest,24 May 1956,21:00,46n01,8e58,"<p>International TV song competition, the long...",https://www.astro.com/astro-databank/Entertain...
16521,Entertainment: Eight Is Enough,15 March 1977,20:30,40n43,74w00,<p>American television comedy-drama series tha...,https://www.astro.com/astro-databank/Entertain...
16524,Entertainment: Dilwale Dulhania Le Jayenge,20 October 1995,12:00,18n58,72e50,<p>Indian romance film written and directed by...,https://www.astro.com/astro-databank/Entertain...


In [30]:
entert.shape

(144, 7)

In [31]:
nature = astro3[astro3['nome'].str.contains('Nature')]
nature.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
105,Nature:Earthquake California 1812 (a) Wrightwood,8 December 1812,07:00,34n22,117w38,<p>Historical earthquake in California. Estima...,https://www.astro.com/astro-databank/Nature:Ea...
107,Nature:Earthquake California/Mexico: 1979,15 October 1979,23:16:57,32n59,115w32,<p>Californian-mexican earthquake of magnitude...,https://www.astro.com/astro-databank/Nature:Ea...
108,Nature:Earthquake Burma (Myanmar) 1908,12 December 1908,12:55,26n27,96e42,<p>Historical Burma (Myanmar) earthquake. Est...,https://www.astro.com/astro-databank/Nature:Ea...
110,Nature:Earthquake Bulgaria 1928,18 April 1928,19:22,42n09,24e45,<p>Historical Bulgaria earthquake. Estimated ...,https://www.astro.com/astro-databank/Nature:Ea...
111,Nature:Earthquake Bangladesh 1762 (Arakan),22 March 1762,11:00,22n20,91e50,<p>Historical earthquake in Bangladesh (then i...,https://www.astro.com/astro-databank/Nature:Ea...


In [32]:
nature.shape

(623, 7)

In [33]:
plane = astro3[astro3['nome'].str.contains('Plane Crash')]
plane.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
305,Plane Crash: AirAsia Flight 8501 (2014),28 December 2014,05:35,2s08,106e08,<p><br>\n</p>,https://www.astro.com/astro-databank/Accident:...
904,Plane Crash: TWA Flight 800 (1964),23 November 1964,14:15,41n54,12e29,"<p>On November 23, 1964 at 02:09 p.m. the capt...",https://www.astro.com/astro-databank/Accident:...
1611,Plane Crash: Frontier Airlines (1964),12 March 1964,20:50,46n25,105w50,"<p>On March 12, 1964, at 8:50 p.m. Flight 32 o...",https://www.astro.com/astro-databank/Accident:...
1612,Plane Crash: EAL Flight 304 (1964),25 February 1964,02:06,30n17,89w47,<p>Eastern Air Lines Flight 304 was a Douglas ...,https://www.astro.com/astro-databank/Accident:...
1615,Accident: Plane Crash 14268,21 December 1967,16:00,39n44,105w01,"<p>Zip Dobyns, news source unrecorded\n</p>",https://www.astro.com/astro-databank/Accident:...


In [34]:
plane.shape

(176, 7)

In [35]:
animal = astro3[astro3['nome'].str.contains('Animal')]
animal.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
14242,Animal: Ruffian,17 April 1972,21:50,38n13,84w15,"<p>American champion thoroughbred racehorse, w...",https://www.astro.com/astro-databank/Ruffian
46714,Animal: Orkid (killer whale),23 September 1988,16:30,32n43,117w09,"<p>American killer whale, later nicknamed 'The...",https://www.astro.com/astro-databank/Animal:_W...
46717,Animal: Secretariat,30 March 1970,00:10,37n52,77w28,<p>American race horse known as one of the bes...,https://www.astro.com/astro-databank/Animal:_S...
46719,Animal: Rockstardom,17 August 2007,03:16,32s30,150e55,"<p>Australia thoroughbred racehorse, the first...",https://www.astro.com/astro-databank/Animal:_R...
46720,Animal: Sea Pigeon,7 March 1970,04:00,38n03,84w30,"<p>American-bred, British-trained racehorse wh...",https://www.astro.com/astro-databank/Animal:_S...


In [36]:
animal.shape

(23, 7)

In [37]:
hist = astro3[astro3['nome'].str.contains('Historic')]
hist.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
3085,Historic: Internet,29 October 1969,22:30,34n03,118w15,<p><br>\n</p>,https://www.astro.com/astro-databank/Internet
13851,"Historic: Sarajevo, reunification",19 March 1996,06:00,43n52,18e25,<p>Reunification of Sarajevo after the NATO ha...,"https://www.astro.com/astro-databank/Sarajevo,..."
29727,Historic: World Trade Center,4 April 1973,15:00,40n46,73w59,<p>American trade buildings standing out in th...,https://www.astro.com/astro-databank/Historic:...
29729,Historic: World War II,1 September 1939,04:45,54n23,18e40,"<p>Global war that lasted from 1939 to 1945, a...",https://www.astro.com/astro-databank/Historic:...
29730,Historic: World's Fair,30 April 1939,15:12,40n43,74w00,<p>American New York World's Fair opened with ...,https://www.astro.com/astro-databank/Historic:...


In [38]:
hist.shape

(94, 7)

In [39]:
sport = astro3[astro3['nome'].str.contains('Sport')]
sport.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
7429,Sports: Tour de France,1 July 1903,15:16,48n32,2e40,<p>French annual men's multiple stage bicycle ...,https://www.astro.com/astro-databank/Sports:_T...
7430,Sports: Super Bowl 2004,1 February 2004,18:25,29n46,95w22,<p>American sports contest with New England P...,https://www.astro.com/astro-databank/Sports:_S...
7431,Sports: Super Bowl 2002,3 February 2002,17:40,29n57,90w05,<p>American sports contest won by the New Engl...,https://www.astro.com/astro-databank/Sports:_S...
7432,Sports: Super Bowl 2001,28 January 2001,18:25,27n57,82w28,"<p>American major sports event, the annual Sup...",https://www.astro.com/astro-databank/Sports:_S...
7433,Sports: St. Louis Rams,10 September 1937,20:30,41n30,81w42,<p>American sports team that began in Clevelan...,https://www.astro.com/astro-databank/Sports:_S...


In [40]:
sport.shape

(62, 7)

In [41]:
shoot = astro3[astro3['nome'].str.contains('Shooting')]
shoot.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
7370,Shooting Spree: 2015 Murders of Alison and Adam,26 August 2015,06:46,37n11,79w37,<p>Murders of news reporter Alison Parker and ...,https://www.astro.com/astro-databank/Shooting_...
12424,Shooting spree: Port Arthur Tasmania,28 April 1996,11:45,43s09,147e51,<p>The Port Arthur massacre of 28 April 1996 w...,https://www.astro.com/astro-databank/Shooting_...
12425,"Shooting of Michael Brown, Ferguson",9 August 2014,12:01,38n45,90w18,"<p>Black young American, Michael Brown (born 2...",https://www.astro.com/astro-databank/Shooting_...
12426,Shooting spree: Oakland,2 April 2012,10:30,37n5110,122w1209,<p>A gunman shot at people inside Oikos Univer...,https://www.astro.com/astro-databank/Shooting_...
12427,Shooting Spree: Utøya Island,22 July 2011,17:22,60n0125,10e1453,<p>Norwegian massacre and shooting spree. In t...,https://www.astro.com/astro-databank/Shooting_...


In [42]:
shoot.shape

(15, 7)

In [43]:
phenomena = astro3[astro3['nome'].str.contains('Phenomena')]
phenomena.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
16949,Phenomena:Joller and the Haunted House of Stans,8 September 1862,10:36,46n57,8e22,<p><br>\n</p>,"https://www.astro.com/astro-databank/Joller,_M..."
18380,Phenomena: Poltergeist : Seaford case,5 May 1961,18:21,40n40,73w29,<p>American poltergeist series of 67 phenomena...,https://www.astro.com/astro-databank/Poltergei...
19193,Phenomena: Fatima last,13 October 1917,13:30,39n37,8w39,<p>Portuguese phenomena known as the Miracle o...,https://www.astro.com/astro-databank/Phenomena...
19194,Phenomena: Fatima first,13 May 1917,12:15,39n37,8w39,"<p>Portuguese miraculous vision, seen by three...",https://www.astro.com/astro-databank/Phenomena...
19195,Phenomena:The Worksop Disturbances (1883),1 March 1883,23:30,53n18,1w07,<p>English spook in Worksop 1883 in the house ...,https://www.astro.com/astro-databank/Phenomena...


In [44]:
phenomena.shape

(35, 7)

In [45]:
business = astro3[astro3['nome'].str.contains('Business')]
business.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
32350,Business: Google Market Start,19 August 2004,09:30,40n43,74w00,<p>American company Google provides a highly-...,https://www.astro.com/astro-databank/Google_Fi...
50105,Business: eBay Website,3 August 1995,21:00,37n20,121w54,<p>American multinational corporation and e-co...,https://www.astro.com/astro-databank/Business:...
50251,Business: YouTube Website,14 February 2005,21:13:12,37n47,122w25,<p>YouTube is a video-sharing website headquar...,https://www.astro.com/astro-databank/Business:...
50252,Business: Yahoo,15 May 1997,09:30,40n43,74w00,<p>American online service.\n</p>,https://www.astro.com/astro-databank/Business:...
50253,Business: Windows '95,24 August 1995,00:01,47n40,122w07,<p>American Microsoft Windows product long awa...,https://www.astro.com/astro-databank/Business:...


In [46]:
business.shape

(32, 7)

In [47]:
cities = astro3[astro3['nome'].str.contains('Cities')]
cities.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
16236,Cities: Paterson NJ,14 April 1851,12:00,40n55,74w10,"<p>U.S. city. Population 148,645 (1999) -5.9%...",https://www.astro.com/astro-databank/Cities:_P...
40487,Cities: Virginia Beach VA,1 January 1963,00:00,36n51,75w59,<p>U.S. city. Virginia Beach was consolidated...,https://www.astro.com/astro-databank/Cities:_V...
40489,Cities: San Buenaventura CA,2 April 1866,00:00,34n17,119w18,"<p>U.S. city. Population 100,152 (1999) +7.0%...",https://www.astro.com/astro-databank/Cities:_S...
40490,Cities: Rio de Janeiro,1 March 1565 (greg.),06:00,22s5410,43w1227,"<p>Brazilain city, the founding of this metrop...",https://www.astro.com/astro-databank/Cities:_R...
40491,Cities: Newark NJ,11 April 1836,00:00,40n44,74w10,"<p>U.S. city. Population 263,087 (1999) -4.4%...",https://www.astro.com/astro-databank/Cities:_N...


In [48]:
cities.shape

(15, 7)

In [49]:
nation = astro3[astro3['nome'].str.contains('Nation')]
nation.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
2303,"Nation: West Germany (Federal Republic, 1949)",24 May 1949,00:00,50n44,7e05,<p>West Germany (German: Westdeutschland) is t...,https://www.astro.com/astro-databank/Nation:_W...
2305,Nation: Venezuela,5 July 1811,15:00,10n30,66w56,<p>A country on the northern coast of South Am...,https://www.astro.com/astro-databank/Nation:_V...
2306,Nation: USA No.8,1 March 1781,15:19,39n57,75w10,"<p>Date of the Articles of Confederation, cons...",https://www.astro.com/astro-databank/Nation:_U...
2307,Nation: USA No.9,4 July 1776,02:14,39n57,75w10,"<p>Date of the Signing of the Constitution, co...",https://www.astro.com/astro-databank/Nation:_U...
2309,Nation: Vatican City State,11 February 1929,11:05,41n54,12e29,<p><br>\n</p>,https://www.astro.com/astro-databank/Nation:_V...


In [50]:
nation.shape

(58, 7)

With these groups identified, the dataset was filtered with the aim of keeping personalities who are famous to a Brazilian audience.

First, the event entries already identified were removed, namely those whose names contain "Accident", "UFO Sighting", "Entertainment", "Nature", "Plane Crash", "Animal", "Historic", "Sport", "Shooting", "Phenomena", "Business", "Cities" or "Nation":

In [51]:
# Names containing any of these category keywords are events, not people
categorias = ['UFO Sighting', 'Accident', 'Entertainment', 'Nature', 'Plane Crash', 'Animal', 'Historic',
              'Sport', 'Shooting', 'Phenomena', 'Business', 'Cities', 'Nation']
astro4 = astro3[~astro3['nome'].str.contains('|'.join(categorias))]
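
A minimal sketch of the same exclusion expressed as a single pattern, written in plain Python so that it runs without the dataset. The sample names are taken from the tables above; `CATEGORIES`, `PATTERN` and `is_person` are illustrative names, not part of the notebook.

```python
import re

# Category keywords excluded in the notebook (case-sensitive substrings).
CATEGORIES = [
    "UFO Sighting", "Accident", "Entertainment", "Nature", "Plane Crash",
    "Animal", "Historic", "Sport", "Shooting", "Phenomena", "Business",
    "Cities", "Nation",
]

# re.escape guards against keywords that contain regex metacharacters.
PATTERN = re.compile("|".join(re.escape(c) for c in CATEGORIES))


def is_person(name):
    """Return True when the entry name contains none of the category keywords."""
    return PATTERN.search(name) is None


names = [
    "Bablet, Denis",
    "Business: Google Market Start",
    "Cities: Rio de Janeiro",
    "Shooting spree: Oakland",
    "Neymar",
]
kept = [n for n in names if is_person(n)]
print(kept)
```

Because this is a plain substring test, a person whose name happens to contain one of the keywords would be dropped as well. In pandas, the same pattern string can be passed to `Series.str.contains`.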

In [52]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",19 March 1930,01:30,48n52,2e20,<p>French theorist of the theatre. \n</p>,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",31 May 1910,05:30,18n01,66w37,"<p>Puerto Rican educator, literary critic, and...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
3,"Baboul, Eudoxie",1 October 1901,03:00,5n23,52w57,"<p>French supercentenarian, for a while the ol...","https://www.astro.com/astro-databank/Baboul,_E..."
4,"Babilée, Jean",3 February 1923,21:30,48n5102,2e1958,"<p>French dancer, choreographer, and actor, Ba...",https://www.astro.com/astro-databank/Babil%C3%...


In [53]:
astro4.reset_index(inplace=True)
astro4.drop("index", axis=1, inplace=True)

In [54]:
astro4.shape

(50698, 7)

Next, the 'bio' column was cleaned by removing the HTML paragraph tags and newline characters.

In [55]:
astro4['bio'] = astro4['bio'].str.replace('<p>','')
astro4['bio'] = astro4['bio'].str.replace('</p>','')
astro4['bio'] = astro4['bio'].str.replace('\n','')

The next step was to convert the "data" column to the 'YYYY/MM/DD' format:

In [56]:
astro4['data'] = pd.to_datetime(astro4['data'], errors='coerce')

In [57]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",1930-03-19,01:30,48n52,2e20,French theorist of the theatre.,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",1910-05-31,05:30,18n01,66w37,"Puerto Rican educator, literary critic, and es...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",1904-06-18,09:30,47n13,1w33,"French legionnaire, awarded the title of 'Comp...",https://www.astro.com/astro-databank/Babonneau...
3,"Baboul, Eudoxie",1901-10-01,03:00,5n23,52w57,"French supercentenarian, for a while the oldes...","https://www.astro.com/astro-databank/Baboul,_E..."
4,"Babilée, Jean",1923-02-03,21:30,48n5102,2e1958,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...


In [58]:
astro4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50698 entries, 0 to 50697
Data columns (total 7 columns):
nome       50698 non-null object
data       49718 non-null datetime64[ns]
horario    50698 non-null object
lat        50698 non-null object
long       50698 non-null object
bio        50698 non-null object
url        50698 non-null object
dtypes: datetime64[ns](1), object(6)
memory usage: 2.7+ MB


In [59]:
astro4 = astro4.dropna()

In [60]:
# Converting from datetime64 to string:
astro4['data'] = astro4['data'].astype(str) 

In [61]:
astro4.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 49718 entries, 0 to 50697
Data columns (total 7 columns):
nome       49718 non-null object
data       49718 non-null object
horario    49718 non-null object
lat        49718 non-null object
long       49718 non-null object
bio        49718 non-null object
url        49718 non-null object
dtypes: object(7)
memory usage: 3.0+ MB


In [62]:
# Now that it is a string, string methods can be applied to reach the desired format:
astro4['data'] = astro4['data'].str.replace('-','/')
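
The date conversion above can be sketched for a single value with the standard library. `to_slash_date` is a hypothetical helper, not part of the notebook. It returns `None` for strings it cannot parse, which plays the role of the missing values that `errors='coerce'` produces and `dropna()` later removes. It assumes English month names, as in the scraped data.

```python
from datetime import datetime


def to_slash_date(text):
    """Convert '19 March 1930' to '1930/03/19'; return None if parsing fails."""
    try:
        parsed = datetime.strptime(text.strip(), "%d %B %Y")
    except ValueError:
        return None
    # Build the string by hand to get zero-padded YYYY/MM/DD on every platform.
    return f"{parsed.year:04d}/{parsed.month:02d}/{parsed.day:02d}"


print(to_slash_date("19 March 1930"))
print(to_slash_date("1 March 1565 (greg.)"))
```

The second call returns `None` because of the "(greg.)" suffix, which is the kind of entry that ends up dropped from the dataset.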

In [63]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",1930/03/19,01:30,48n52,2e20,French theorist of the theatre.,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",1910/05/31,05:30,18n01,66w37,"Puerto Rican educator, literary critic, and es...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",1904/06/18,09:30,47n13,1w33,"French legionnaire, awarded the title of 'Comp...",https://www.astro.com/astro-databank/Babonneau...
3,"Baboul, Eudoxie",1901/10/01,03:00,5n23,52w57,"French supercentenarian, for a while the oldes...","https://www.astro.com/astro-databank/Baboul,_E..."
4,"Babilée, Jean",1923/02/03,21:30,48n5102,2e1958,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...


The latitude and longitude values need to be converted to float. <br>
Example: from '23s33' to -23.33 <br>
<br>
Latitude:

In [64]:
# First, the southern values, which must be negative:
astro4.loc[(astro4['lat'].str.contains("s")),'lat_2'] = astro4['lat'].str.replace('s','.')
astro4['lat_2'] = "-" + astro4['lat_2']

In [65]:
# Then, the northern values:
astro4.loc[(astro4['lat'].str.contains("n")),'lat_2'] = astro4['lat'].str.replace('n','.')

In [66]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url,lat_2
0,"Bablet, Denis",1930/03/19,01:30,48n52,2e20,French theorist of the theatre.,"https://www.astro.com/astro-databank/Bablet,_D...",48.52
1,"Babin Cortes, Maria Teresa",1910/05/31,05:30,18n01,66w37,"Puerto Rican educator, literary critic, and es...",https://www.astro.com/astro-databank/Babin_Cor...,18.01
2,"Babonneau, René",1904/06/18,09:30,47n13,1w33,"French legionnaire, awarded the title of 'Comp...",https://www.astro.com/astro-databank/Babonneau...,47.13
3,"Baboul, Eudoxie",1901/10/01,03:00,5n23,52w57,"French supercentenarian, for a while the oldes...","https://www.astro.com/astro-databank/Baboul,_E...",5.23
4,"Babilée, Jean",1923/02/03,21:30,48n5102,2e1958,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102


In [67]:
astro4['lat_2'] = astro4['lat_2'].astype(float)

In [68]:
astro4.dtypes

nome        object
data        object
horario     object
lat         object
long        object
bio         object
url         object
lat_2      float64
dtype: object

Longitude:

In [69]:
# First, the western values, which must be negative:
astro4.loc[(astro4['long'].str.contains("w")),'long_2'] = astro4['long'].str.replace('w','.')
astro4['long_2'] = "-" + astro4['long_2']

In [70]:
# Then, the eastern values:
astro4.loc[(astro4['long'].str.contains("e")),'long_2'] = astro4['long'].str.replace('e','.')

In [71]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url,lat_2,long_2
0,"Bablet, Denis",1930/03/19,01:30,48n52,2e20,French theorist of the theatre.,"https://www.astro.com/astro-databank/Bablet,_D...",48.52,2.2
1,"Babin Cortes, Maria Teresa",1910/05/31,05:30,18n01,66w37,"Puerto Rican educator, literary critic, and es...",https://www.astro.com/astro-databank/Babin_Cor...,18.01,-66.37
2,"Babonneau, René",1904/06/18,09:30,47n13,1w33,"French legionnaire, awarded the title of 'Comp...",https://www.astro.com/astro-databank/Babonneau...,47.13,-1.33
3,"Baboul, Eudoxie",1901/10/01,03:00,5n23,52w57,"French supercentenarian, for a while the oldes...","https://www.astro.com/astro-databank/Baboul,_E...",5.23,-52.57
4,"Babilée, Jean",1923/02/03,21:30,48n5102,2e1958,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102,2.1958


In [72]:
astro4['long_2'] = astro4['long_2'].astype(float)

In [73]:
astro4.dtypes

nome        object
data        object
horario     object
lat         object
long        object
bio         object
url         object
lat_2      float64
long_2     float64
dtype: object

In [74]:
# The original columns can now be dropped:
astro4.drop(['lat', 'long'], axis=1, inplace=True)
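
The replacement above turns '48n52' into 48.52, treating the digits after the hemisphere letter as a decimal fraction. If those digits are instead read as minutes of arc (and, in the longer form, seconds), the decimal-degree value differs slightly. That reading is an assumption about the Astro-Databank notation and is not stated in the notebook. A sketch of it, with `dms_to_decimal` as a hypothetical helper:

```python
import re

# Degrees, hemisphere letter, two-digit minutes, optional two-digit seconds.
COORD = re.compile(r"^(\d{1,3})([nsew])(\d{2})(\d{2})?$")


def dms_to_decimal(coord):
    """Convert e.g. '23s33' or '48n5102' to signed decimal degrees."""
    match = COORD.match(coord.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {coord!r}")
    degrees, hemisphere, minutes, seconds = match.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds or 0) / 3600
    # South latitudes and west longitudes are negative.
    return -value if hemisphere in "sw" else value


print(round(dms_to_decimal("23s33"), 4))
print(round(dms_to_decimal("48n5102"), 4))
```

Under this reading, '23s33' corresponds to about -23.55 degrees instead of -23.33.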

Next, the time zone was determined as a UTC offset:

In [75]:
astro4['fuso'] = astro4.apply(offset, axis=1)

In [76]:
astro4.head()

Unnamed: 0,nome,data,horario,bio,url,lat_2,long_2,fuso
0,"Bablet, Denis",1930/03/19,01:30,French theorist of the theatre.,"https://www.astro.com/astro-databank/Bablet,_D...",48.52,2.2,1.0
1,"Babin Cortes, Maria Teresa",1910/05/31,05:30,"Puerto Rican educator, literary critic, and es...",https://www.astro.com/astro-databank/Babin_Cor...,18.01,-66.37,-4.0
2,"Babonneau, René",1904/06/18,09:30,"French legionnaire, awarded the title of 'Comp...",https://www.astro.com/astro-databank/Babonneau...,47.13,-1.33,1.0
3,"Baboul, Eudoxie",1901/10/01,03:00,"French supercentenarian, for a while the oldes...","https://www.astro.com/astro-databank/Baboul,_E...",5.23,-52.57,-3.0
4,"Babilée, Jean",1923/02/03,21:30,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102,2.1958,1.0


In [77]:
astro4.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 49718 entries, 0 to 50697
Data columns (total 8 columns):
nome       49718 non-null object
data       49718 non-null object
horario    49718 non-null object
bio        49718 non-null object
url        49718 non-null object
lat_2      49718 non-null float64
long_2     49718 non-null float64
fuso       49011 non-null float64
dtypes: float64(3), object(5)
memory usage: 4.7+ MB


In [78]:
astro4 = astro4.dropna()

The dataset is now in the desired shape, so it was given a definitive name to make it easier to identify:

In [79]:
df_astro = astro4

In [80]:
df_astro.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 49011 entries, 0 to 50697
Data columns (total 8 columns):
nome       49011 non-null object
data       49011 non-null object
horario    49011 non-null object
bio        49011 non-null object
url        49011 non-null object
lat_2      49011 non-null float64
long_2     49011 non-null float64
fuso       49011 non-null float64
dtypes: float64(3), object(5)
memory usage: 3.4+ MB


<div class="alert alert-block alert-info">
<b>Note:</b> Lookup query below
</div>

In [81]:
df_astro[df_astro['nome'].str.contains('Neymar')]

Unnamed: 0,nome,data,horario,bio,url,lat_2,long_2,fuso
18941,Neymar,1992/02/05,02:15,Brazilian footballer who plays as a forward fo...,https://www.astro.com/astro-databank/Neymar,-23.31,-46.11,-3.0


***

### 2. Birth chart:

The birth chart was computed for every entry in the dataset:

In [82]:
df_astro['mapa'] = df_astro.apply(mapa, axis=1)

In [83]:
df_astro.head()

Unnamed: 0,nome,data,horario,bio,url,lat_2,long_2,fuso,mapa
0,"Bablet, Denis",1930/03/19,01:30,French theorist of the theatre.,"https://www.astro.com/astro-databank/Bablet,_D...",48.52,2.2,1.0,"{'Sun': 'Pisces', 'Moon': 'Scorpio', 'Mercury'..."
1,"Babin Cortes, Maria Teresa",1910/05/31,05:30,"Puerto Rican educator, literary critic, and es...",https://www.astro.com/astro-databank/Babin_Cor...,18.01,-66.37,-4.0,"{'Sun': 'Gemini', 'Moon': 'Pisces', 'Mercury':..."
2,"Babonneau, René",1904/06/18,09:30,"French legionnaire, awarded the title of 'Comp...",https://www.astro.com/astro-databank/Babonneau...,47.13,-1.33,1.0,"{'Sun': 'Gemini', 'Moon': 'Leo', 'Mercury': 'G..."
3,"Baboul, Eudoxie",1901/10/01,03:00,"French supercentenarian, for a while the oldes...","https://www.astro.com/astro-databank/Baboul,_E...",5.23,-52.57,-3.0,"{'Sun': 'Libra', 'Moon': 'Taurus', 'Mercury': ..."
4,"Babilée, Jean",1923/02/03,21:30,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102,2.1958,1.0,"{'Sun': 'Aquarius', 'Moon': 'Virgo', 'Mercury'..."


In [84]:
# Expand the dictionary keys into columns, each holding its respective value:
df_astro = pd.concat([df_astro.drop(['mapa'], axis=1), df_astro['mapa'].apply(pd.Series)], axis=1)

In [85]:
df_astro.head()

Unnamed: 0,nome,data,horario,bio,url,lat_2,long_2,fuso,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
0,"Bablet, Denis",1930/03/19,01:30,French theorist of the theatre.,"https://www.astro.com/astro-databank/Bablet,_D...",48.52,2.2,1.0,Pisces,Scorpio,Pisces,Aries,Pisces,Gemini,Capricorn,Taurus,Scorpio,Virgo,Aries
1,"Babin Cortes, Maria Teresa",1910/05/31,05:30,"Puerto Rican educator, literary critic, and es...",https://www.astro.com/astro-databank/Babin_Cor...,18.01,-66.37,-4.0,Gemini,Pisces,Gemini,Aries,Cancer,Libra,Taurus,Taurus,Scorpio,Sagittarius,Virgo
2,"Babonneau, René",1904/06/18,09:30,"French legionnaire, awarded the title of 'Comp...",https://www.astro.com/astro-databank/Babonneau...,47.13,-1.33,1.0,Gemini,Leo,Gemini,Gemini,Gemini,Aries,Aquarius,Virgo,Pisces,Gemini,Libra
3,"Baboul, Eudoxie",1901/10/01,03:00,"French supercentenarian, for a while the oldes...","https://www.astro.com/astro-databank/Baboul,_E...",5.23,-52.57,-3.0,Libra,Taurus,Scorpio,Scorpio,Scorpio,Capricorn,Capricorn,Scorpio,Taurus,Aries,Capricorn
4,"Babilée, Jean",1923/02/03,21:30,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102,2.1958,1.0,Aquarius,Virgo,Aquarius,Sagittarius,Aries,Scorpio,Libra,Virgo,Pisces,Leo,Aquarius


With the birth chart computed for every row of the dataset, the columns "data", "horario", "bio", "url", "lat_2", "long_2" and "fuso" were dropped.

In [86]:
df_astro.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 49011 entries, 0 to 50697
Data columns (total 19 columns):
nome            49011 non-null object
data            49011 non-null object
horario         49011 non-null object
bio             49011 non-null object
url             49011 non-null object
lat_2           49011 non-null float64
long_2          49011 non-null float64
fuso            49011 non-null float64
Sun             49011 non-null object
Moon            49011 non-null object
Mercury         49011 non-null object
Venus           49011 non-null object
Mars            49011 non-null object
Jupiter         49011 non-null object
Saturn          49011 non-null object
North Node      49011 non-null object
South Node      49011 non-null object
Syzygy          49011 non-null object
Pars Fortuna    49011 non-null object
dtypes: float64(3), object(16)
memory usage: 7.5+ MB


In [87]:
df_astro.drop(["data", "horario", "bio", "url", "lat_2", "long_2", "fuso"], axis=1, inplace=True)

In [88]:
df_astro.head()

Unnamed: 0,nome,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
0,"Bablet, Denis",Pisces,Scorpio,Pisces,Aries,Pisces,Gemini,Capricorn,Taurus,Scorpio,Virgo,Aries
1,"Babin Cortes, Maria Teresa",Gemini,Pisces,Gemini,Aries,Cancer,Libra,Taurus,Taurus,Scorpio,Sagittarius,Virgo
2,"Babonneau, René",Gemini,Leo,Gemini,Gemini,Gemini,Aries,Aquarius,Virgo,Pisces,Gemini,Libra
3,"Baboul, Eudoxie",Libra,Taurus,Scorpio,Scorpio,Scorpio,Capricorn,Capricorn,Scorpio,Taurus,Aries,Capricorn
4,"Babilée, Jean",Aquarius,Virgo,Aquarius,Sagittarius,Aries,Scorpio,Libra,Virgo,Pisces,Leo,Aquarius


In [89]:
df_astro.reset_index(inplace=True)
df_astro.drop("index", axis=1, inplace=True)

In [90]:
df_astro.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49011 entries, 0 to 49010
Data columns (total 12 columns):
nome            49011 non-null object
Sun             49011 non-null object
Moon            49011 non-null object
Mercury         49011 non-null object
Venus           49011 non-null object
Mars            49011 non-null object
Jupiter         49011 non-null object
Saturn          49011 non-null object
North Node      49011 non-null object
South Node      49011 non-null object
Syzygy          49011 non-null object
Pars Fortuna    49011 non-null object
dtypes: object(12)
memory usage: 4.5+ MB


<div class="alert alert-block alert-info">
<b>Note:</b> Two individuals with exactly the same birth chart
</div>

In [91]:
df_astro_semnome = df_astro.drop('nome', axis=1)

In [92]:
df_astro_semnome[(df_astro_semnome["Sun"]=="Cancer")&(df_astro_semnome["Moon"]=="Cancer")&(df_astro_semnome["Mercury"]=="Leo")][:2]

Unnamed: 0,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
35,Cancer,Cancer,Leo,Leo,Gemini,Gemini,Leo,Capricorn,Cancer,Capricorn,Taurus
67,Cancer,Cancer,Leo,Leo,Gemini,Gemini,Leo,Capricorn,Cancer,Capricorn,Taurus


***

### 3. Distance calculation:

#### 3.1 By similarity:

The distances between the individuals in the dataset were calculated. <br>
To do so, dummy variables were first created for the categorical columns.

In [93]:
df_astro_dummies = pd.get_dummies(df_astro, columns=['Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter',
                                                    'Saturn','North Node', 'South Node', 'Syzygy', 'Pars Fortuna'])

In [94]:
df_astro_dummies.head()

Unnamed: 0,nome,Sun_Aquarius,Sun_Aries,Sun_Cancer,Sun_Capricorn,Sun_Gemini,Sun_Leo,Sun_Libra,Sun_Pisces,Sun_Sagittarius,...,Pars Fortuna_Cancer,Pars Fortuna_Capricorn,Pars Fortuna_Gemini,Pars Fortuna_Leo,Pars Fortuna_Libra,Pars Fortuna_Pisces,Pars Fortuna_Sagittarius,Pars Fortuna_Scorpio,Pars Fortuna_Taurus,Pars Fortuna_Virgo
0,"Bablet, Denis",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,"Babin Cortes, Maria Teresa",0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,"Babonneau, René",0,0,0,0,1,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,"Baboul, Eudoxie",0,0,0,0,0,0,1,0,0,...,0,1,0,0,0,0,0,0,0,0
4,"Babilée, Jean",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Next, the BallTree algorithm was used to find the k nearest neighbours of a given point, together with their distances.

As an example, the 10 individuals whose birth charts most closely resemble Neymar's were retrieved.

In [95]:
def top10_semelhantes(nome_do_usuario):
    # One-hot features only; the name column takes no part in the distance
    features = df_astro_dummies.drop('nome', axis=1)
    tree = BallTree(features, leaf_size=2)

    # k=11 because the queried person is returned as one of their own neighbours
    idx_usuario = df_astro_dummies.index[df_astro_dummies["nome"] == nome_do_usuario][0]
    dist, ind = tree.query(features.loc[[idx_usuario]], k=11)

    # Flatten the (1, 11) results and leave the queried person out
    vizinhos = [(i, d) for i, d in zip(ind[0], dist[0]) if i != idx_usuario]

    # Distance 0 means an identical chart; anything else is only similar
    df_igual = df_astro.loc[[i for i, d in vizinhos if d == 0]]
    df_semelhante = df_astro.loc[[i for i, d in vizinhos if d != 0]]

    # Percentage of the 11 placements that match the queried person's chart
    colunas = df_astro.columns.drop('nome')
    mapa_usuario = df_astro.loc[idx_usuario, colunas]
    porcentagem = [round((row[colunas] == mapa_usuario).mean() * 100, 2)
                   for _, row in df_semelhante.iterrows()]

    print('-------------------------------------------------------------')
    print('Hello, %s!' % nome_do_usuario)
    print('\n')
    print('You are identical to the following people:')
    print('>>', '\n>> '.join([i for i in df_igual['nome']]))
    print('\n')
    print('You are similar to the following people:')
    print('>>', '\n>> '.join([i + " - " + str(p) + "%" for i, p in zip(df_semelhante['nome'], porcentagem)]))
    print('-------------------------------------------------------------')

In [96]:
top10_semelhantes("Neymar")

-------------------------------------------------------------
Hello, Neymar!


You are identical to the following people:
>> Benattia, Nabilla


You are similar to the following people:
>> Wagner, Bryan Matthew - 90.91%
>> Efron, Dylan - 90.91%
>> Jain (singer) - 90.91%
>> Santos, Carlos - 81.82%
>> Pino, Andrea - 81.82%
>> Ferrand-Prévot, Pauline - 81.82%
>> Lautner, Taylor - 81.82%
>> Sutton, Chloe - 81.82%
>> Stal, Gary - 81.82%
-------------------------------------------------------------


In [97]:
tree = BallTree(df_astro_dummies.drop('nome', axis=1), leaf_size=2)

In [98]:
dist, ind = tree.query(df_astro_dummies.loc[df_astro_dummies["nome"]=="Neymar"].drop('nome', axis=1), k=11) 

In [99]:
ind_list=[]
for ind1 in ind:
    for i in ind1:
        ind_list.append(i)

ind_list

[41983, 18313, 4219, 2189, 2489, 12646, 17276, 32079, 24444, 44948, 10155]

In [100]:
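# Remove Neymar's own row; .index[0] passes a scalar label to list.remove instead of an Index object
ind_list.remove(df_astro_dummies.loc[df_astro_dummies["nome"]=="Neymar"].index[0])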

In [101]:
ind_list

[41983, 4219, 2189, 2489, 12646, 17276, 32079, 24444, 44948, 10155]

In [102]:
df_astro_dummies.iloc[ind_list]

Unnamed: 0,nome,Sun_Aquarius,Sun_Aries,Sun_Cancer,Sun_Capricorn,Sun_Gemini,Sun_Leo,Sun_Libra,Sun_Pisces,Sun_Sagittarius,...,Pars Fortuna_Cancer,Pars Fortuna_Capricorn,Pars Fortuna_Gemini,Pars Fortuna_Leo,Pars Fortuna_Libra,Pars Fortuna_Pisces,Pars Fortuna_Sagittarius,Pars Fortuna_Scorpio,Pars Fortuna_Taurus,Pars Fortuna_Virgo
41983,"Benattia, Nabilla",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4219,"Wagner, Bryan Matthew",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2189,"Efron, Dylan",1,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2489,Jain (singer),1,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
12646,"Santos, Carlos",1,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
17276,"Pino, Andrea",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
32079,"Ferrand-Prévot, Pauline",1,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
24444,"Lautner, Taylor",1,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
44948,"Sutton, Chloe",1,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
10155,"Stal, Gary",1,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0


In [103]:
dist_list=[]
for dis in dist:
    for d in dis:
        dist_list.append(d)

dist_list

[0.0,
 0.0,
 1.4142135623730951,
 1.4142135623730951,
 1.4142135623730951,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0]

In [104]:
dist_list.remove(0)

In [105]:
dist_list

[0.0,
 1.4142135623730951,
 1.4142135623730951,
 1.4142135623730951,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0]
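
The distances above take only the values 0, √2 and 2. With one-hot columns, each placement that differs between two charts contributes exactly two mismatched coordinates, so the Euclidean distance is √(2m) for m differing placements. The sketch below checks this on two rows shown in this notebook. It uses plain Python in place of `pd.get_dummies` and `BallTree`, and the helper names are illustrative.

```python
import math

PLACEMENTS = ["Sun", "Moon", "Mercury", "Venus", "Mars", "Jupiter", "Saturn",
              "North Node", "South Node", "Syzygy", "Pars Fortuna"]
SIGNS = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra",
         "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]


def one_hot(chart):
    """Encode a chart (list of 11 signs) as an 11 x 12 = 132-element 0/1 vector."""
    return [1 if sign == s else 0 for sign in chart for s in SIGNS]


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Rows copied from the tables in this notebook.
neymar = ["Aquarius", "Pisces", "Aquarius", "Capricorn", "Capricorn", "Virgo",
          "Aquarius", "Capricorn", "Cancer", "Aquarius", "Sagittarius"]
efron = ["Aquarius", "Pisces", "Aquarius", "Capricorn", "Capricorn", "Virgo",
         "Aquarius", "Capricorn", "Cancer", "Aquarius", "Leo"]

differing = sum(a != b for a, b in zip(neymar, efron))
distance = euclidean(one_hot(neymar), one_hot(efron))
similarity = round((len(PLACEMENTS) - differing) / len(PLACEMENTS) * 100, 2)

print(differing, distance, similarity)
```

By the same relation, a distance of 2.0 corresponds to two differing placements, that is, 9 matches out of 11 (81.82%).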

In [106]:
# Distance 0 means an identical chart; anything else is only similar
# (label-based selection replaces DataFrame.append, which was removed in pandas 2.0)
df_neymar_igual = df_astro.loc[[i for i, d in zip(ind_list, dist_list) if d == 0]]
df_neymar_semelhante = df_astro.loc[[i for i, d in zip(ind_list, dist_list) if d != 0]]

In [107]:
df_neymar_igual

Unnamed: 0,nome,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
41983,"Benattia, Nabilla",Aquarius,Pisces,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Sagittarius


In [108]:
df_neymar_semelhante

Unnamed: 0,nome,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
4219,"Wagner, Bryan Matthew",Aquarius,Gemini,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Sagittarius
2189,"Efron, Dylan",Aquarius,Pisces,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Leo
2489,Jain (singer),Aquarius,Pisces,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Cancer
12646,"Santos, Carlos",Aquarius,Taurus,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Cancer
17276,"Pino, Andrea",Aquarius,Cancer,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Virgo
32079,"Ferrand-Prévot, Pauline",Aquarius,Taurus,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Leo
24444,"Lautner, Taylor",Aquarius,Taurus,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Leo
44948,"Sutton, Chloe",Aquarius,Aquarius,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Leo
10155,"Stal, Gary",Aquarius,Aries,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Leo


In [109]:
df_astro.loc[df_astro["nome"]=="Neymar"]

Unnamed: 0,nome,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
18313,Neymar,Aquarius,Pisces,Aquarius,Capricorn,Capricorn,Virgo,Aquarius,Capricorn,Cancer,Aquarius,Sagittarius


In [110]:
# Similarity percentage: matching placements out of 11 (the 'nome' column never matches)

for index, row in df_astro.loc[df_astro["nome"].str.contains("Bolsonaro")].iterrows():
    for index2, row2 in df_astro.loc[df_astro["nome"]=="Neymar"].iterrows():
        comparacao = row==row2
        porcentagem_semelhanca = round(comparacao.sum()/11*100, 2)

In [111]:
# Raw count of matching placements behind the similarity percentage

for index, row in df_astro.loc[df_astro["nome"].str.contains("Bolsonaro")].iterrows():
        for index2, row2 in df_astro.loc[df_astro["nome"]=="Neymar"].iterrows():
            comparacao = row==row2
            porcentagem_semelhanca=((comparacao==True).values.sum())

In [112]:
porcentagem_semelhanca

2

***

#### 3.2 By compatibility:

Compatibility was defined as complementarity. Accordingly, each sign in the original database was replaced by its opposite (180°) in the zodiac.
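
Because the twelve signs are equally spaced around the zodiac, the opposite of a sign is the one six positions further along. A sketch that derives the mapping from the sign order, with `SIGNS` and `OPPOSITE` as illustrative names, could look like this:

```python
# Signs in zodiac order, 30 degrees apart.
SIGNS = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra",
         "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]

# 180 degrees corresponds to 6 of the 12 positions; the modulo wraps around.
OPPOSITE = {sign: SIGNS[(i + 6) % 12] for i, sign in enumerate(SIGNS)}

print(OPPOSITE["Aries"], OPPOSITE["Pisces"])
```

Deriving the dictionary this way avoids typing the twelve pairs by hand, and applying it twice returns the original sign.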

In [113]:
df_astro.head()

Unnamed: 0,nome,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
0,"Bablet, Denis",Pisces,Scorpio,Pisces,Aries,Pisces,Gemini,Capricorn,Taurus,Scorpio,Virgo,Aries
1,"Babin Cortes, Maria Teresa",Gemini,Pisces,Gemini,Aries,Cancer,Libra,Taurus,Taurus,Scorpio,Sagittarius,Virgo
2,"Babonneau, René",Gemini,Leo,Gemini,Gemini,Gemini,Aries,Aquarius,Virgo,Pisces,Gemini,Libra
3,"Baboul, Eudoxie",Libra,Taurus,Scorpio,Scorpio,Scorpio,Capricorn,Capricorn,Scorpio,Taurus,Aries,Capricorn
4,"Babilée, Jean",Aquarius,Virgo,Aquarius,Sagittarius,Aries,Scorpio,Libra,Virgo,Pisces,Leo,Aquarius


In [114]:
df_astro_compat = df_astro.copy()

In [115]:
# Each sign is replaced by its opposite (180°) in the zodiac, in all 11 chart columns
opostos = {
    'Aries':'Libra','Taurus':'Scorpio','Gemini':'Sagittarius','Cancer':'Capricorn','Leo':'Aquarius','Virgo':'Pisces',
    'Libra':'Aries','Scorpio':'Taurus','Sagittarius':'Gemini','Capricorn':'Cancer','Aquarius':'Leo','Pisces':'Virgo'
    }

colunas_mapa = ['Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter', 'Saturn',
                'North Node', 'South Node', 'Syzygy', 'Pars Fortuna']

for coluna in colunas_mapa:
    df_astro_compat[coluna] = df_astro_compat[coluna].map(opostos)

In [116]:
df_astro_compat.head()

|  | nome | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bablet, Denis | Virgo | Taurus | Virgo | Libra | Virgo | Sagittarius | Cancer | Scorpio | Taurus | Pisces | Libra |
| 1 | Babin Cortes, Maria Teresa | Sagittarius | Virgo | Sagittarius | Libra | Capricorn | Aries | Scorpio | Scorpio | Taurus | Gemini | Pisces |
| 2 | Babonneau, René | Sagittarius | Aquarius | Sagittarius | Sagittarius | Sagittarius | Libra | Leo | Pisces | Virgo | Sagittarius | Aries |
| 3 | Baboul, Eudoxie | Aries | Scorpio | Taurus | Taurus | Taurus | Cancer | Cancer | Taurus | Scorpio | Libra | Cancer |
| 4 | Babilée, Jean | Leo | Pisces | Leo | Gemini | Libra | Taurus | Aries | Pisces | Virgo | Aquarius | Leo |


With the dataset prepared, dummy variables were generated.

In [117]:
df_astro_compat_dummies = pd.get_dummies(df_astro_compat, columns=['Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter',
                                                    'Saturn','North Node', 'South Node', 'Syzygy', 'Pars Fortuna'])
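To illustrate what the dummy encoding produces, here is a small sketch on toy data (two placements instead of eleven; the rows are invented for illustration). It also shows one property that makes the encoding convenient: with one-hot rows, the dot product of two people's vectors counts the placements they share.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({'nome': ['A', 'B', 'C'],
                    'Sun': ['Aries', 'Leo', 'Aries'],
                    'Moon': ['Pisces', 'Pisces', 'Virgo']})

# Each placement becomes one column per sign, named "<placement>_<sign>".
dummies = pd.get_dummies(toy, columns=['Sun', 'Moon'])
print(dummies.columns.tolist())

# Cast to int so the arithmetic does not depend on the dummy dtype.
features = dummies.drop('nome', axis=1).astype(int)

# The dot product of two one-hot rows counts the shared placements.
shared_a_b = int(features.iloc[0].dot(features.iloc[1]))  # A and B share only the Moon
shared_a_c = int(features.iloc[0].dot(features.iloc[2]))  # A and C share only the Sun
print(shared_a_b, shared_a_c)
```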

In [118]:
# compute the percentage of matching placements between two charts
# caveat: value_counts(sort=False).values[0] is the count of whichever value
# (True or False) is listed first, so it may count mismatches instead of matches
porcentagem_compat = []
for index, row in (df_astro.loc[df_astro["nome"].str.contains("Bolsonaro")].drop('nome', axis=1)).iterrows():
    for index2, row2 in (df_astro.loc[df_astro["nome"]=="Neymar"].drop('nome', axis=1)).iterrows():
        comparacao = row==row2
        porcentagem_compat.append((comparacao==True).value_counts(sort=False).values[0]/11*100)

In [119]:
df_astro.loc[df_astro["nome"].str.contains("Bolsonaro")]

|  | nome | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40604 | Bolsonaro, Jair | Aries | Aquarius | Pisces | Aquarius | Taurus | Cancer | Scorpio | Capricorn | Cancer | Virgo | Gemini |


In [120]:
df_astro.loc[df_astro["nome"]=="Neymar"]

|  | nome | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18313 | Neymar | Aquarius | Pisces | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Sagittarius |


In [121]:
porcentagem_compat

[81.81818181818183]
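The figure above can be cross-checked by counting the matching placements directly. In the two `df_astro` rows shown, only North Node and South Node coincide, which is 2 of 11 placements, whereas 81.82% corresponds to 9 of 11. The sketch below is an alternative formulation, not the notebook's code: it uses `.sum()` on the boolean comparison, so the result does not depend on the order in which `value_counts` lists `True` and `False`. The variable names are illustrative.

```python
import pandas as pd

placements = ['Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter',
              'Saturn', 'North Node', 'South Node', 'Syzygy', 'Pars Fortuna']

# Rows copied from the df_astro outputs shown above.
bolsonaro = pd.Series(['Aries', 'Aquarius', 'Pisces', 'Aquarius', 'Taurus', 'Cancer',
                       'Scorpio', 'Capricorn', 'Cancer', 'Virgo', 'Gemini'],
                      index=placements)
neymar = pd.Series(['Aquarius', 'Pisces', 'Aquarius', 'Capricorn', 'Capricorn', 'Virgo',
                    'Aquarius', 'Capricorn', 'Cancer', 'Aquarius', 'Sagittarius'],
                   index=placements)

# True where both charts have the same sign for a placement.
matches = bolsonaro == neymar

# Summing booleans counts the True values, i.e. the matching placements.
pct_match = matches.sum() / len(placements) * 100

print(matches[matches].index.tolist())
print(round(pct_match, 2))
```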

This compatibility dataset cannot be used to test a personality who is already in it (Neymar, for example), because the entire base has been transformed. Compatibility must therefore be computed between the user's data and the original base.
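One way to read this limitation: mapping both charts to their opposite signs preserves equality, so comparing two rows taken from the transformed base reproduces similarity rather than compatibility. Only one side of the comparison should be flipped. The sketch below illustrates this with the two `df_astro` rows shown earlier. It is an interpretation under that assumption, and `pct_equal` and `flip` are hypothetical helpers, not functions from the notebook.

```python
SIGNS = ['Aries', 'Taurus', 'Gemini', 'Cancer', 'Leo', 'Virgo',
         'Libra', 'Scorpio', 'Sagittarius', 'Capricorn', 'Aquarius', 'Pisces']
OPPOSITE = {sign: SIGNS[(i + 6) % 12] for i, sign in enumerate(SIGNS)}


def pct_equal(chart_a, chart_b):
    """Hypothetical helper: percentage of placements with the same sign."""
    shared = sum(chart_a[p] == chart_b[p] for p in chart_a)
    return shared / len(chart_a) * 100


def flip(chart):
    """Hypothetical helper: map every placement to its opposite sign."""
    return {p: OPPOSITE[s] for p, s in chart.items()}


# Rows copied from the df_astro outputs shown in this section.
bolsonaro = {'Sun': 'Aries', 'Moon': 'Aquarius', 'Mercury': 'Pisces', 'Venus': 'Aquarius',
             'Mars': 'Taurus', 'Jupiter': 'Cancer', 'Saturn': 'Scorpio',
             'North Node': 'Capricorn', 'South Node': 'Cancer', 'Syzygy': 'Virgo',
             'Pars Fortuna': 'Gemini'}
neymar = {'Sun': 'Aquarius', 'Moon': 'Pisces', 'Mercury': 'Aquarius', 'Venus': 'Capricorn',
          'Mars': 'Capricorn', 'Jupiter': 'Virgo', 'Saturn': 'Aquarius',
          'North Node': 'Capricorn', 'South Node': 'Cancer', 'Syzygy': 'Aquarius',
          'Pars Fortuna': 'Sagittarius'}

similarity = pct_equal(bolsonaro, neymar)
both_flipped = pct_equal(flip(bolsonaro), flip(neymar))  # equals similarity
compatibility = pct_equal(flip(bolsonaro), neymar)       # only one side flipped

print(round(similarity, 2), round(both_flipped, 2), round(compatibility, 2))
```

Under this reading, the only opposite pair between the two charts is Pars Fortuna (Gemini and Sagittarius).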

***

### 4. Interaction with the user's data:

In [129]:
usuario()

What is your name?Laís
On what date were you born? (YYYY/MM/DD)1992/06/20
At what time were you born? (HH:MM)01:45
In which time zone (UTC offset)? (-03:00 for people from São Paulo, Rio de Janeiro, and Bahia)-03:00
In which city were you born? (City, State)São Paulo, SP
-------------------------------------------------------------
Hello, Laís!


You are not identical to anyone!


You are similar to the following people:
>> Karem, Brennan - 81.82%
>> Williams, Lisa - 63.64%
>> Tetzner, Hans - 63.64%
>> Richards, David - 63.64%
>> Szeemann, Harald - 63.64%
>> Wright, Nick - 63.64%
>> Henry, Earl - 63.64%
>> Browne, Harry - 63.64%
>> Princess Nokia - 63.64%
>> Perotti Ronzoni, Jose Luis - 63.64%
-------------------------------------------------------------


You are compatible with the following people:
>> Dessange, Jacques - 63.64%
>> King, Jonathan - 63.64%
>> Keller, Philipp - 63.64%
>> Antoon, A.J. - 63.64%
>> Reitano, Mino - 63.64%
>> Bjørnson, Bjørnstjerne - 63.64%
>> McGarrigle, Anna - 63.64%
>> Gentil, Didier - 63.64%
>
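A ranking like the one above can be sketched with a nearest-neighbour query. The example below is an assumption about how such a list could be produced, not the notebook's actual implementation: it encodes each sign as an integer, builds a scikit-learn `BallTree` with the `hamming` metric (the fraction of placements that differ), and converts each distance into a match percentage. The people, the three placements and all variable names are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import BallTree

SIGNS = ['Aries', 'Taurus', 'Gemini', 'Cancer', 'Leo', 'Virgo',
         'Libra', 'Scorpio', 'Sagittarius', 'Capricorn', 'Aquarius', 'Pisces']
CODE = {sign: i for i, sign in enumerate(SIGNS)}

# Toy base: three hypothetical people, three placements each.
names = ['Person A', 'Person B', 'Person C']
base = [['Aries', 'Leo', 'Pisces'],
        ['Aries', 'Leo', 'Virgo'],
        ['Libra', 'Cancer', 'Gemini']]
user = ['Aries', 'Leo', 'Pisces']

# Integer-encode the signs; the codes only serve as category labels here.
X = np.array([[CODE[s] for s in row] for row in base], dtype=float)
q = np.array([[CODE[s] for s in user]], dtype=float)

# Hamming distance = fraction of placements whose signs differ.
tree = BallTree(X, metric='hamming')
dist, idx = tree.query(q, k=3)

# Convert distance to a match percentage, nearest neighbour first.
pct = (1 - dist[0]) * 100
for i, p in zip(idx[0], pct):
    print(f">> {names[i]} - {p:.2f}%")
```

A distance of zero would correspond to an identical chart, which matches the "identical" check reported in the output.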

In [123]:
sou_semelhante("Bolsonaro")

What is your name?fer
On what date were you born? (YYYY/MM/DD)1992/08/15
At what time were you born? (HH:MM)13:30
In which time zone (UTC offset)? (-03:00 for people from São Paulo, Rio de Janeiro, and Bahia)-03:00
In which city were you born? (City, State)São Paulo, SP
-------------------------------------------------------------
Hello, Fer!


You are 0.0 % similar to Bolsonaro, Jair.
-------------------------------------------------------------


In [134]:
sou_semelhante("Silva, Maisa")

What is your name?Laís
On what date were you born? (YYYY/MM/DD)1992/06/20
At what time were you born? (HH:MM)01:45
In which time zone (UTC offset)? (-03:00 for people from São Paulo, Rio de Janeiro, and Bahia)-03:00
In which city were you born? (City, State)São Paulo, SP
-------------------------------------------------------------
Hello, Laís!


You are 18.18 % similar to Silva, Maisa.
-------------------------------------------------------------


In [125]:
sou_compativel("Neymar")

What is your name?fer
On what date were you born? (YYYY/MM/DD)1992/08/15
At what time were you born? (HH:MM)13:30
In which time zone (UTC offset)? (-03:00 for people from São Paulo, Rio de Janeiro, and Bahia)-03:00
In which city were you born? (City, State)São Paulo, SP
-------------------------------------------------------------
Hello, Fer!


You are 18.18 % compatible with Neymar.
-------------------------------------------------------------


<div class="alert alert-block alert-info">
<b>Note:</b> Lookup queries below
</div>

In [138]:
df_astro[df_astro['nome'].str.contains('Neymar')]

|  | nome | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18313 | Neymar | Aquarius | Pisces | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Sagittarius |


In [135]:
astro4[astro4['bio'].str.contains('Brazil')]

|  | nome | data | horario | bio | url | lat_2 | long_2 | fuso | mapa |
|---|---|---|---|---|---|---|---|---|---|
| 111 | Salomão, Waly | 1943/09/03 | 05:00 | Brazilian poet, songwriter and writer. His fir... | https://www.astro.com/astro-databank/Salom%C3%... | -13.5127 | -40.0501 | -3.0 | {'Sun': 'Virgo', 'Moon': 'Libra', 'Mercury': '... |
| 169 | Taviani, Isabella | 1968/10/08 | 04:00 | Brazilian singer, guitarist and composer. | https://www.astro.com/astro-databank/Taviani,_... | -22.5300 | -43.0613 | -3.0 | {'Sun': 'Libra', 'Moon': 'Taurus', 'Mercury': ... |
| 373 | Januária, Princess of Brazil | 1822/03/11 | 15:00 | Brazilian princess and Portuguese infanta (pri... | https://www.astro.com/astro-databank/Janu%C3%A... | -22.5300 | -43.0613 | -3.0 | {'Sun': 'Pisces', 'Moon': 'Scorpio', 'Mercury'... |
| 437 | Oliveira, Régis | 1944/09/19 | 17:20 | Brazilian appeals court judge pensioner, vice-... | https://www.astro.com/astro-databank/Oliveira,... | -20.4621 | -49.4251 | -3.0 | {'Sun': 'Virgo', 'Moon': 'Libra', 'Mercury': '... |
| 443 | Oliveira, Flávia | 1969/08/02 | 23:50 | Brazilian journalist and commentator, currentl... | https://www.astro.com/astro-databank/Oliveira,... | -22.5300 | -43.0613 | -3.0 | {'Sun': 'Leo', 'Moon': 'Aries', 'Mercury': 'Le... |
| 475 | Salinas Fuller, Teofilo | 1920/05/15 | 04:00 | Peruvian sports official, one of the main lead... | https://www.astro.com/astro-databank/Salinas_F... | -12.0300 | -77.0300 | -5.0 | {'Sun': 'Taurus', 'Moon': 'Aries', 'Mercury': ... |
| 479 | Saliba, Adonis | 1952/09/01 | 11:44 | Brazilian university professor, metallurgical ... | https://www.astro.com/astro-databank/Saliba,_A... | -19.4901 | -43.5723 | -3.0 | {'Sun': 'Virgo', 'Moon': 'Aquarius', 'Mercury'... |
| 877 | Kataguiri, Kim | 1996/01/28 | 13:00 | Brazilian political activist. He is one of the... | https://www.astro.com/astro-databank/Kataguiri... | -23.1203 | -47.1713 | -3.0 | {'Sun': 'Aquarius', 'Moon': 'Taurus', 'Mercury... |
| 1476 | Farias, Rodrigo A. C. | 1945/02/23 | 21:00 | Brazilian engineer and astrologer. | https://www.astro.com/astro-databank/Farias,_R... | -9.3957 | -35.4407 | -3.0 | {'Sun': 'Pisces', 'Moon': 'Leo', 'Mercury': 'P... |
| 1477 | Farias, P.C. | 1945/09/20 | 05:00 | Brazilian economist, President Collor's adviso... | https://www.astro.com/astro-databank/Farias,_P.C. | -9.1418 | -35.2936 | -3.0 | {'Sun': 'Virgo', 'Moon': 'Pisces', 'Mercury': ... |
