# Final project

### Ironhack - Data Analytics Bootcamp

### Fernanda Harumi Tanaka

***

## Table of contents:

[Introduction](#Introdução) <br>
[Methodology](#Metodologia) <br>
> [1. Dataset analysis](#1.-Análise-do-dataset:) <br>
> [2. Birth chart](#2.-Mapa-astral:) <br>
> [3. Distance calculation](#3.-Cálculo-de-distâncias:) <br>
>>[3.1 By similarity](#3.1-Por-similaridade:) <br>

> [4. Interaction with the user's data](#4.-Interação-com-os-dados-do-usuário:)


***

<a id="Introdução"></a>
## Introduction:

The main goal of this project is to identify which celebrities most resemble, and are most compatible with, the user's personality according to their birth charts. <br>
To that end, the website _https://www.astro.com/astro-databank/Main_Page_ was web-scraped; it gathers astrological data on notable events and media personalities.

***

<a id="Metodologia"></a>
## Methodology:

In [1]:
import pandas as pd
import flatlib
import geopy
import pyaztro
import pytz

In [2]:
from flatlib.datetime import Datetime
from flatlib.geopos import GeoPos
from flatlib.chart import Chart
from geopy.geocoders import Nominatim
from timezonefinder import TimezoneFinder
from pytz import timezone
from datetime import datetime
from sklearn.neighbors import BallTree

In [3]:
class pessoa():
    """
    This class computes a person's birth chart. Inputs: name, birth date ('YYYY/MM/DD'), birth time ('HH:MM'),
    UTC offset and birthplace (e.g. "São Paulo, SP").
    """

    def __init__(self, nome, data, horario, fuso, local):
        self.nome = nome
        self.data = data
        self.horario = horario
        self.fuso = fuso
        self.local = local
        self.chart = None

    def mapa(self):
        # Geocode the birthplace to get its latitude and longitude
        geolocator = Nominatim(user_agent="my-application")
        location = geolocator.geocode(self.local)
        if location is None:
            raise ValueError(f"Birthplace not found: {self.local!r}")
        # flatlib chart for the birth moment and place; .objects holds the planets and points
        mapa_pessoa = Chart(Datetime(self.data, self.horario, self.fuso), GeoPos(location.latitude, location.longitude)).objects
        # Store the sign of each object, keyed by its id (e.g. 'Sun', 'Moon')
        dicionario = {"nome": self.nome}
        for item in mapa_pessoa:
            dicionario[item.id] = item.sign
        self.chart = dicionario

In [4]:
def drop_word (df, word):
    """
    Essa função elimina as linhas correspondentes à determinada palavra presente na coluna "bio".
    Insira o nome do dataframe (df) e a palavra que deseja buscar (word). 
    Retorna o dataset sem linhas que contenham essa palavra.
    """
    df.drop(df.loc[df['bio'].str.contains(word)].index, axis=0, inplace=True)
    return df

In [5]:
tf = TimezoneFinder()
utc = pytz.utc

def offset(df):
    """
    Essa função retorna o fuso horário de um local, de acordo com o deslocamento UTC, em horas.
    Insira um dataframe (df), que contenha as colunas lat_2 e long_2 (em float).
    Função retirada da documentação do módulo TimezoneFinder "https://github.com/MrMinimal64/timezonefinder"
    """
    
    today = datetime.now()
    try:
        tz_target = timezone(tf.certain_timezone_at(lat=df['lat_2'], lng=df['long_2']))
    
        today_target = tz_target.localize(today)
        today_utc = utc.localize(today)
        offset = today_utc - today_target
    
        offset -= tz_target.dst(today)
        return offset.total_seconds() / 3600

    except:
        pass

In [6]:
def mapa(df):
    """
    Essa função retorna o mapa astral em um dicionário 
    """
    mapa_pessoa = Chart(Datetime(df['data'], df['horario'], df['fuso']), GeoPos(df['lat_2'], df['long_2'])).objects
    dicionario = {}
    for item in mapa_pessoa:
        dicionario[item.id] = item.sign
    return dicionario

<a id="1.-Análise-do-dataset:"></a>
### 1. Dataset analysis:

In [7]:
# Loading the dataset:
astro = pd.read_csv("astro_databank.csv")
astro.columns = ['nome', 'data', 'horario', 'lat', 'long', 'bio', 'url']

In [8]:
astro.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65417 entries, 0 to 65416
Data columns (total 7 columns):
nome       65417 non-null object
data       65417 non-null object
horario    65417 non-null object
lat        65417 non-null object
long       65417 non-null object
bio        65417 non-null object
url        65417 non-null object
dtypes: object(7)
memory usage: 3.5+ MB


In [9]:
astro.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",19 March 1930,01:30,48n52,2e20,<p>French theorist of the theatre. \n</p>,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",31 May 1910,05:30,18n01,66w37,"<p>Puerto Rican educator, literary critic, and...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
3,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
4,"Baboul, Eudoxie",1 October 1901,03:00,5n23,52w57,"<p>French supercentenarian, for a while the ol...","https://www.astro.com/astro-databank/Baboul,_E..."


"Babonneau, René" é um dado duplicado. Deve-se eliminar esses dados:

In [10]:
astro2 = astro.drop_duplicates()

In [11]:
astro2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 57706 entries, 0 to 58480
Data columns (total 7 columns):
nome       57706 non-null object
data       57706 non-null object
horario    57706 non-null object
lat        57706 non-null object
long       57706 non-null object
bio        57706 non-null object
url        57706 non-null object
dtypes: object(7)
memory usage: 3.5+ MB


In [12]:
astro2.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",19 March 1930,01:30,48n52,2e20,<p>French theorist of the theatre. \n</p>,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",31 May 1910,05:30,18n01,66w37,"<p>Puerto Rican educator, literary critic, and...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
3,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
4,"Baboul, Eudoxie",1 October 1901,03:00,5n23,52w57,"<p>French supercentenarian, for a while the ol...","https://www.astro.com/astro-databank/Baboul,_E..."


In [13]:
len(astro)-len(astro2)

7711

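7711 duplicate rows were removed, but "Babonneau, René" still appears twice: `drop_duplicates()` only removes rows that are identical in every column, so these two rows must differ in at least one column. Other cases like this are likely, so rows sharing the same name (homonyms) were also removed, keeping only the first occurrence: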

In [14]:
astro3 = astro2.drop_duplicates(subset = "nome")

In [15]:
astro3.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",19 March 1930,01:30,48n52,2e20,<p>French theorist of the theatre. \n</p>,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",31 May 1910,05:30,18n01,66w37,"<p>Puerto Rican educator, literary critic, and...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
4,"Baboul, Eudoxie",1 October 1901,03:00,5n23,52w57,"<p>French supercentenarian, for a while the ol...","https://www.astro.com/astro-databank/Baboul,_E..."
5,"Babilée, Jean",3 February 1923,21:30,48n5102,2e1958,"<p>French dancer, choreographer, and actor, Ba...",https://www.astro.com/astro-databank/Babil%C3%...


In [16]:
astro3 = astro3.reset_index(drop=True)


In [17]:
astro3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52548 entries, 0 to 52547
Data columns (total 7 columns):
nome       52548 non-null object
data       52548 non-null object
horario    52548 non-null object
lat        52548 non-null object
long       52548 non-null object
bio        52548 non-null object
url        52548 non-null object
dtypes: object(7)
memory usage: 2.8+ MB


In [18]:
len(astro2)-len(astro3)

5158

This removed 5158 more rows that shared a name. The dataset now has 52548 rows, each name occurring only once, and no null values.
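A minimal sketch of the difference between the two deduplication passes, on hypothetical toy rows (the names come from the tables above, but the `url` values are invented and assume the two "Babonneau, René" rows differ only there). The full-row pass keeps both Babonneau rows, while `subset="nome"` keeps only the first. The `conflicts` filter is an illustrative addition, not used in the notebook: it lists the rows that only the name-based pass removes, which can help check whether they are true duplicates or distinct people with the same name.

```python
import pandas as pd

# Hypothetical toy rows: an exact duplicate (Bablet) and a near duplicate (Babonneau)
rows = pd.DataFrame({
    "nome": ["Bablet, Denis", "Babonneau, René", "Babonneau, René", "Bablet, Denis"],
    "data": ["19 March 1930", "18 June 1904", "18 June 1904", "19 March 1930"],
    "url":  ["u1", "u2", "u2-alt", "u1"],
})

# Full-row deduplication: only rows identical in every column are removed
full = rows.drop_duplicates()

# Name-based deduplication: keeps the first row for each 'nome'
by_name = rows.drop_duplicates(subset="nome", keep="first")

# Rows that share a name but are not exact copies of another row
conflicts = rows[rows.duplicated(subset="nome", keep=False) & ~rows.duplicated(keep=False)]
print(conflicts)
```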

Before filtering the dataset, a few interesting subsets were set aside for possible use in later stages of the project.

In [19]:
# Subset of accident records:
mask_accident = astro3['nome'].str.contains('Accident')
accident = astro3[mask_accident]
accident.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
303,Accidents: West Fertilizer Company explosion (...,17 April 2013,07:50:38,31n33,97w09,"<p>On April 17, 2013, an ammonium nitrate expl...",https://www.astro.com/astro-databank/Accidents...
304,Accident: wreck Air France flight 447,1 June 2009,05:14,3s5025,32w2439,<p>This plane (Rio-Paris) sank into the Atlant...,https://www.astro.com/astro-databank/Accident:...
306,Accident: great fire of Valparaiso,12 April 2014,16:40,33s02,71w38,"<p>Great wildfire, it began on 12 April 2914 i...",https://www.astro.com/astro-databank/Accident:...
307,Accident: gas explosion Mulhouse,31 August 2014,07:00,47n45,7e20,<p><br>\n</p>,https://www.astro.com/astro-databank/Accident:...
311,Accident: Train Wreck,25 April 1946,13:15,41n47,88w09,"<p>American train accident; the train, called ...",https://www.astro.com/astro-databank/Accident:...


In [20]:
accident.shape

(94, 7)

In [21]:
# Subset of records related to Brazil:
br_mask = astro3['bio'].str.contains('Brazilian')
br = astro3[br_mask]
br.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
123,UFO Sighting 14368,10 February 1969,21:00,21s1218,41w5316,<p>Brazilian UFO sighting of a pear-shaped cra...,https://www.astro.com/astro-databank/UFO_Sight...
124,UFO Sighting 14367,8 February 1969,21:05,16s0456,48w3026,"<p>Brazilian UFO sighting of a ""wheel of fire""...",https://www.astro.com/astro-databank/UFO_Sight...
127,UFO Sighting 14366,6 February 1969,07:30,21s59,47w25,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...
128,UFO Sighting 14365,31 January 1969,22:00,16s0456,48w3026,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...
132,"Salomão, Waly",3 September 1943,05:00,13s5127,40w0501,"<p>Brazilian poet, songwriter and writer. His ...",https://www.astro.com/astro-databank/Salom%C3%...


In [22]:
br.shape

(627, 7)

In [23]:
# Subset of UFO sighting records:
ufo_mask = astro3['nome'].str.contains('UFO Sighting')
ufo = astro3[ufo_mask]
ufo.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
123,UFO Sighting 14368,10 February 1969,21:00,21s1218,41w5316,<p>Brazilian UFO sighting of a pear-shaped cra...,https://www.astro.com/astro-databank/UFO_Sight...
124,UFO Sighting 14367,8 February 1969,21:05,16s0456,48w3026,"<p>Brazilian UFO sighting of a ""wheel of fire""...",https://www.astro.com/astro-databank/UFO_Sight...
127,UFO Sighting 14366,6 February 1969,07:30,21s59,47w25,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...
128,UFO Sighting 14365,31 January 1969,22:00,16s0456,48w3026,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...
554,UFO Sighting 14363,26 January 1969,21:22,12s0808,38w2509,"<p>Brazilian UFO sighting, an encounter of the...",https://www.astro.com/astro-databank/UFO_Sight...


In [24]:
ufo.shape

(482, 7)

In [25]:
entert = astro3[astro3['nome'].str.contains('Entertainment')]
entert.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
715,Entertainment: Tarzan,1 December 1911,20:00,41n51,87w39,<p>American fictional character created by Edg...,https://www.astro.com/astro-databank/Tarzan
16518,Entertainment: French and Saunders,9 March 1987,21:00,51n30,0w10,<p>British sketch comedy television series wri...,https://www.astro.com/astro-databank/Entertain...
16519,Entertainment: Eurovision Song Contest,24 May 1956,21:00,46n01,8e58,"<p>International TV song competition, the long...",https://www.astro.com/astro-databank/Entertain...
16521,Entertainment: Eight Is Enough,15 March 1977,20:30,40n43,74w00,<p>American television comedy-drama series tha...,https://www.astro.com/astro-databank/Entertain...
16524,Entertainment: Dilwale Dulhania Le Jayenge,20 October 1995,12:00,18n58,72e50,<p>Indian romance film written and directed by...,https://www.astro.com/astro-databank/Entertain...


In [26]:
entert.shape

(144, 7)

In [27]:
nature = astro3[astro3['nome'].str.contains('Nature')]
nature.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
105,Nature:Earthquake California 1812 (a) Wrightwood,8 December 1812,07:00,34n22,117w38,<p>Historical earthquake in California. Estima...,https://www.astro.com/astro-databank/Nature:Ea...
107,Nature:Earthquake California/Mexico: 1979,15 October 1979,23:16:57,32n59,115w32,<p>Californian-mexican earthquake of magnitude...,https://www.astro.com/astro-databank/Nature:Ea...
108,Nature:Earthquake Burma (Myanmar) 1908,12 December 1908,12:55,26n27,96e42,<p>Historical Burma (Myanmar) earthquake. Est...,https://www.astro.com/astro-databank/Nature:Ea...
110,Nature:Earthquake Bulgaria 1928,18 April 1928,19:22,42n09,24e45,<p>Historical Bulgaria earthquake. Estimated ...,https://www.astro.com/astro-databank/Nature:Ea...
111,Nature:Earthquake Bangladesh 1762 (Arakan),22 March 1762,11:00,22n20,91e50,<p>Historical earthquake in Bangladesh (then i...,https://www.astro.com/astro-databank/Nature:Ea...


In [28]:
nature.shape

(623, 7)

In [29]:
plane = astro3[astro3['nome'].str.contains('Plane Crash')]
plane.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
305,Plane Crash: AirAsia Flight 8501 (2014),28 December 2014,05:35,2s08,106e08,<p><br>\n</p>,https://www.astro.com/astro-databank/Accident:...
904,Plane Crash: TWA Flight 800 (1964),23 November 1964,14:15,41n54,12e29,"<p>On November 23, 1964 at 02:09 p.m. the capt...",https://www.astro.com/astro-databank/Accident:...
1611,Plane Crash: Frontier Airlines (1964),12 March 1964,20:50,46n25,105w50,"<p>On March 12, 1964, at 8:50 p.m. Flight 32 o...",https://www.astro.com/astro-databank/Accident:...
1612,Plane Crash: EAL Flight 304 (1964),25 February 1964,02:06,30n17,89w47,<p>Eastern Air Lines Flight 304 was a Douglas ...,https://www.astro.com/astro-databank/Accident:...
1615,Accident: Plane Crash 14268,21 December 1967,16:00,39n44,105w01,"<p>Zip Dobyns, news source unrecorded\n</p>",https://www.astro.com/astro-databank/Accident:...


In [30]:
plane.shape

(176, 7)

In [31]:
animal = astro3[astro3['nome'].str.contains('Animal')]
animal.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
14242,Animal: Ruffian,17 April 1972,21:50,38n13,84w15,"<p>American champion thoroughbred racehorse, w...",https://www.astro.com/astro-databank/Ruffian
46714,Animal: Orkid (killer whale),23 September 1988,16:30,32n43,117w09,"<p>American killer whale, later nicknamed 'The...",https://www.astro.com/astro-databank/Animal:_W...
46717,Animal: Secretariat,30 March 1970,00:10,37n52,77w28,<p>American race horse known as one of the bes...,https://www.astro.com/astro-databank/Animal:_S...
46719,Animal: Rockstardom,17 August 2007,03:16,32s30,150e55,"<p>Australia thoroughbred racehorse, the first...",https://www.astro.com/astro-databank/Animal:_R...
46720,Animal: Sea Pigeon,7 March 1970,04:00,38n03,84w30,"<p>American-bred, British-trained racehorse wh...",https://www.astro.com/astro-databank/Animal:_S...


In [32]:
animal.shape

(23, 7)
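The six name-based subsets above follow the same pattern, so they could also be built in one loop. A minimal sketch on a hypothetical sample of names taken from the tables above; `patterns`, `subsets` and `sizes` are illustrative names. `regex=False` makes each pattern a literal substring; the notebook's calls use the default regex matching, which gives the same result here because none of the patterns contain regex metacharacters.

```python
import pandas as pd

# Hypothetical sample of the 'nome' column
astro_sample = pd.DataFrame({"nome": [
    "Accident: Train Wreck", "UFO Sighting 14368", "Animal: Secretariat",
    "Plane Crash: EAL Flight 304 (1964)", "Babilée, Jean",
]})

# Substrings used in this notebook to build the subsets
patterns = {
    "accident": "Accident", "ufo": "UFO Sighting", "entert": "Entertainment",
    "nature": "Nature", "plane": "Plane Crash", "animal": "Animal",
}

# One filtered DataFrame per pattern, matched as a literal substring
subsets = {key: astro_sample[astro_sample["nome"].str.contains(pat, regex=False)]
           for key, pat in patterns.items()}
sizes = {key: len(df) for key, df in subsets.items()}
print(sizes)
```

A row can fall into more than one subset (for example "Accident: Plane Crash 14268" matches both "Accident" and "Plane Crash"), so the subset sizes need not add up to the number of rows removed in the next step.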

With these subsets set aside, the dataset was filtered to keep personalities that are well known to a Brazilian audience.

First, the non-person records identified above were removed, i.e. every row whose name contains "Accident", "UFO Sighting", "Entertainment", "Nature", "Plane Crash" or "Animal":

In [33]:
# A single mask avoids chaining boolean indexers, which triggers a reindexing warning
astro4 = astro3[~astro3['nome'].str.contains('UFO Sighting|Accident|Entertainment|Nature|Plane Crash|Animal')]


In [34]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Bablet, Denis",19 March 1930,01:30,48n52,2e20,<p>French theorist of the theatre. \n</p>,"https://www.astro.com/astro-databank/Bablet,_D..."
1,"Babin Cortes, Maria Teresa",31 May 1910,05:30,18n01,66w37,"<p>Puerto Rican educator, literary critic, and...",https://www.astro.com/astro-databank/Babin_Cor...
2,"Babonneau, René",18 June 1904,09:30,47n13,1w33,"<p>French legionnaire, awarded the title of 'C...",https://www.astro.com/astro-databank/Babonneau...
3,"Baboul, Eudoxie",1 October 1901,03:00,5n23,52w57,"<p>French supercentenarian, for a while the ol...","https://www.astro.com/astro-databank/Baboul,_E..."
4,"Babilée, Jean",3 February 1923,21:30,48n5102,2e1958,"<p>French dancer, choreographer, and actor, Ba...",https://www.astro.com/astro-databank/Babil%C3%...


In [35]:
astro4 = astro4.reset_index(drop=True)

In [36]:
astro4.shape

(51007, 7)

Some rows were also excluded, mainly those referring to professions and nationalities with few occurrences:

In [37]:
drop_word(astro4, "theorist")
drop_word(astro4, "sculptor")
drop_word(astro4, "lawyer")
drop_word(astro4, "cyclist")
drop_word(astro4, "critic")
drop_word(astro4, "banker")
drop_word(astro4, "hockey")
drop_word(astro4, "skater")
drop_word(astro4, "harpist")
drop_word(astro4, "jockey")
drop_word(astro4, "historian")
drop_word(astro4, "army")
drop_word(astro4, "diplomat")
drop_word(astro4, "supercentenarian")
drop_word(astro4, "legionnaire")
drop_word(astro4, "Turkish")
drop_word(astro4, "Finnish")
drop_word(astro4, "Sri Lankan")
drop_word(astro4, "Puerto")
drop_word(astro4, "Polish")
drop_word(astro4, "Belgian")
drop_word(astro4, "Nigerian")
drop_word(astro4, "French arc")

astro4.shape

(44593, 7)

In [38]:
astro4 = astro4.reset_index(drop=True)

This small cleanup removed 6414 more rows (from 51007 to 44593). The dataset was kept in this form from here on.
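`drop_word` modifies `astro4` in place and makes one pass per word. A minimal sketch of an alternative, using a hypothetical helper `drop_words` that builds a single pattern (escaping each word with `re.escape`) and returns a filtered copy instead. Like the original, it matches substrings, so "critic" also removes a bio containing "criticism"; wrapping each word in `\b` word boundaries would be stricter, at the cost of missing plurals such as "critics".

```python
import re
import pandas as pd

def drop_words(df: pd.DataFrame, words, column: str = "bio") -> pd.DataFrame:
    """Hypothetical helper: return a copy of df without rows whose `column` contains any of `words`."""
    # One alternation pattern; re.escape keeps each word literal
    pattern = "|".join(re.escape(w) for w in words)
    mask = df[column].str.contains(pattern, regex=True, na=False)
    return df.loc[~mask].reset_index(drop=True)

sample = pd.DataFrame({"bio": [
    "French theorist of the theatre.",
    "French songwriter, singer and guitarist.",
    "American artist, her style was abstract expressionism.",
]})
kept = drop_words(sample, ["theorist", "critic"])
print(kept)
```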

Next, the "bio" column was cleaned by removing the `<p>` and `</p>` tags and the line breaks.

In [39]:
astro4['bio'] = astro4['bio'].str.replace('<p>','')
astro4['bio'] = astro4['bio'].str.replace('</p>','')
astro4['bio'] = astro4['bio'].str.replace('\n','')
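The three replacements above remove only `<p>`, `</p>` and `\n`, so a bio such as the `<p><br>\n</p>` entry in the accidents table would keep its `<br>` tag. A minimal sketch, assuming a generic tag pattern is acceptable for these short bios, that strips any HTML tag with one regular expression and turns line breaks into spaces:

```python
import pandas as pd

# Two bios in the raw format shown in the tables above
bio = pd.Series(["<p>French theorist of the theatre. \n</p>", "<p><br>\n</p>"])

clean = (bio.str.replace(r"<[^>]+>", "", regex=True)  # any HTML tag, including <br>
            .str.replace("\n", " ", regex=False)      # line breaks become spaces
            .str.strip())                             # trim leftover whitespace
print(clean.tolist())
```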

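The next step converted the "data" (date) column to the 'YYYY/MM/DD' format, which is the format passed to flatlib's `Datetime` in the `mapa` function. With `errors='coerce'`, dates that pandas cannot parse, including those outside the range of `datetime64[ns]` (roughly the years 1677 to 2262), become `NaT`; those 893 rows are dropped below.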

In [40]:
astro4['data'] = pd.to_datetime(astro4['data'], errors='coerce')

In [41]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Babilée, Jean",1923-02-03,21:30,48n5102,2e1958,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...
1,"Baber, Alice",1928-08-22,03:40,39n30,88w11,"American artist, her style was abstract expres...","https://www.astro.com/astro-databank/Baber,_Alice"
2,"Babbitt, Manuel",1949-05-03,21:30,41n46,70w43,American homicide: a former Marine and Vietnam...,"https://www.astro.com/astro-databank/Babbitt,_..."
3,"Babbitt, Elwood",1921-11-26,10:20,42n35,72w19,American medium based in New England. His lif...,"https://www.astro.com/astro-databank/Babbitt,_..."
4,"A, Dominique",1968-10-06,17:35,48n33,3e18,"French songwriter, singer and guitarist.","https://www.astro.com/astro-databank/A,_Dominique"


In [42]:
astro4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44593 entries, 0 to 44592
Data columns (total 7 columns):
nome       44593 non-null object
data       43700 non-null datetime64[ns]
horario    44593 non-null object
lat        44593 non-null object
long       44593 non-null object
bio        44593 non-null object
url        44593 non-null object
dtypes: datetime64[ns](1), object(6)
memory usage: 2.4+ MB


In [43]:
astro4 = astro4.dropna()

In [44]:
# Converting the datetime64 values to strings:
astro4['data'] = astro4['data'].astype(str) 

In [45]:
astro4.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 43700 entries, 0 to 44592
Data columns (total 7 columns):
nome       43700 non-null object
data       43700 non-null object
horario    43700 non-null object
lat        43700 non-null object
long       43700 non-null object
bio        43700 non-null object
url        43700 non-null object
dtypes: object(7)
memory usage: 2.7+ MB


In [46]:
# Now that the values are strings, the separators can be replaced to obtain the desired format:
astro4['data'] = astro4['data'].str.replace('-','/')

In [47]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url
0,"Babilée, Jean",1923/02/03,21:30,48n5102,2e1958,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...
1,"Baber, Alice",1928/08/22,03:40,39n30,88w11,"American artist, her style was abstract expres...","https://www.astro.com/astro-databank/Baber,_Alice"
2,"Babbitt, Manuel",1949/05/03,21:30,41n46,70w43,American homicide: a former Marine and Vietnam...,"https://www.astro.com/astro-databank/Babbitt,_..."
3,"Babbitt, Elwood",1921/11/26,10:20,42n35,72w19,American medium based in New England. His lif...,"https://www.astro.com/astro-databank/Babbitt,_..."
4,"A, Dominique",1968/10/06,17:35,48n33,3e18,"French songwriter, singer and guitarist.","https://www.astro.com/astro-databank/A,_Dominique"


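Latitude and longitude must be converted to float. <br>
Example: from '23s33' to -23.33. Note that this conversion reads the minutes as a decimal fraction: '23s33' actually denotes 23°33'S, about -23.55 in decimal degrees, so positions are shifted by up to roughly 0.4°. <br>
<br>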
Latitude:

In [48]:
# First, southern latitudes, which must be negative:
astro4.loc[(astro4['lat'].str.contains("s")),'lat_2'] = astro4['lat'].str.replace('s','.')
astro4['lat_2'] = "-" + astro4['lat_2']

In [49]:
# Then, northern latitudes:
astro4.loc[(astro4['lat'].str.contains("n")),'lat_2'] = astro4['lat'].str.replace('n','.')

In [50]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url,lat_2
0,"Babilée, Jean",1923/02/03,21:30,48n5102,2e1958,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102
1,"Baber, Alice",1928/08/22,03:40,39n30,88w11,"American artist, her style was abstract expres...","https://www.astro.com/astro-databank/Baber,_Alice",39.3
2,"Babbitt, Manuel",1949/05/03,21:30,41n46,70w43,American homicide: a former Marine and Vietnam...,"https://www.astro.com/astro-databank/Babbitt,_...",41.46
3,"Babbitt, Elwood",1921/11/26,10:20,42n35,72w19,American medium based in New England. His lif...,"https://www.astro.com/astro-databank/Babbitt,_...",42.35
4,"A, Dominique",1968/10/06,17:35,48n33,3e18,"French songwriter, singer and guitarist.","https://www.astro.com/astro-databank/A,_Dominique",48.33


In [51]:
astro4['lat_2'] = astro4['lat_2'].astype(float)

In [52]:
astro4.dtypes

nome        object
data        object
horario     object
lat         object
long        object
bio         object
url         object
lat_2      float64
dtype: object

Longitude:

In [53]:
# First, western longitudes, which must be negative:
astro4.loc[(astro4['long'].str.contains("w")),'long_2'] = astro4['long'].str.replace('w','.')
astro4['long_2'] = "-" + astro4['long_2']

In [54]:
# Then, eastern longitudes:
astro4.loc[(astro4['long'].str.contains("e")),'long_2'] = astro4['long'].str.replace('e','.')

In [55]:
astro4.head()

Unnamed: 0,nome,data,horario,lat,long,bio,url,lat_2,long_2
0,"Babilée, Jean",1923/02/03,21:30,48n5102,2e1958,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102,2.1958
1,"Baber, Alice",1928/08/22,03:40,39n30,88w11,"American artist, her style was abstract expres...","https://www.astro.com/astro-databank/Baber,_Alice",39.3,-88.11
2,"Babbitt, Manuel",1949/05/03,21:30,41n46,70w43,American homicide: a former Marine and Vietnam...,"https://www.astro.com/astro-databank/Babbitt,_...",41.46,-70.43
3,"Babbitt, Elwood",1921/11/26,10:20,42n35,72w19,American medium based in New England. His lif...,"https://www.astro.com/astro-databank/Babbitt,_...",42.35,-72.19
4,"A, Dominique",1968/10/06,17:35,48n33,3e18,"French songwriter, singer and guitarist.","https://www.astro.com/astro-databank/A,_Dominique",48.33,3.18


In [56]:
astro4['long_2'] = astro4['long_2'].astype(float)

In [57]:
astro4.dtypes

nome        object
data        object
horario     object
lat         object
long        object
bio         object
url         object
lat_2      float64
long_2     float64
dtype: object

In [58]:
# The original columns can now be dropped:
astro4.drop(['lat', 'long'], axis=1, inplace=True)
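Replacing the hemisphere letter with a dot reads the minutes (and seconds) as decimal digits, so '23s33' becomes -23.33 instead of 23°33'S ≈ -23.55. A minimal sketch of a proper conversion from degrees, minutes and optional seconds to decimal degrees, assuming every value follows the `<degrees><n|s|e|w><MM>[SS]` pattern seen in the tables above; `dms_to_decimal` is a hypothetical helper.

```python
import re

# degrees (1-3 digits), hemisphere letter, minutes (2 digits), optional seconds (2 digits)
_COORD = re.compile(r"^(\d{1,3})([nsew])(\d{2})(\d{2})?$")

def dms_to_decimal(coord: str) -> float:
    """Convert an astro-databank style coordinate such as '48n5102' to decimal degrees."""
    m = _COORD.match(coord.strip().lower())
    if m is None:
        raise ValueError(f"unrecognized coordinate: {coord!r}")
    deg, hemi, minutes, seconds = m.groups()
    value = int(deg) + int(minutes) / 60 + int(seconds or 0) / 3600
    # South and West are negative
    return -value if hemi in "sw" else value

print(dms_to_decimal("23s33"), dms_to_decimal("48n5102"))
```

Applied as `astro4['lat_2'] = astro4['lat'].apply(dms_to_decimal)` (and likewise for `long`) before dropping the original columns, it would change the coordinates used by `offset` and `mapa`. The shift is at most about 0.4°, so it mainly matters near time-zone borders and for chart points that depend on the location, such as Pars Fortuna, which is derived from the Ascendant.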

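Next, the time zone was determined as a UTC offset in hours. Note that `offset` evaluates the zone at today's date and removes daylight saving time, so it returns the location's current standard offset; it does not account for DST or historical time-zone changes on the birth date.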

In [59]:
astro4['fuso'] = astro4.apply(offset, axis=1)

In [60]:
astro4.head()

Unnamed: 0,nome,data,horario,bio,url,lat_2,long_2,fuso
0,"Babilée, Jean",1923/02/03,21:30,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102,2.1958,1.0
1,"Baber, Alice",1928/08/22,03:40,"American artist, her style was abstract expres...","https://www.astro.com/astro-databank/Baber,_Alice",39.3,-88.11,-6.0
2,"Babbitt, Manuel",1949/05/03,21:30,American homicide: a former Marine and Vietnam...,"https://www.astro.com/astro-databank/Babbitt,_...",41.46,-70.43,-5.0
3,"Babbitt, Elwood",1921/11/26,10:20,American medium based in New England. His lif...,"https://www.astro.com/astro-databank/Babbitt,_...",42.35,-72.19,-5.0
4,"A, Dominique",1968/10/06,17:35,"French songwriter, singer and guitarist.","https://www.astro.com/astro-databank/A,_Dominique",48.33,3.18,1.0


In [61]:
astro4.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 43700 entries, 0 to 44592
Data columns (total 8 columns):
nome       43700 non-null object
data       43700 non-null object
horario    43700 non-null object
bio        43700 non-null object
url        43700 non-null object
lat_2      43700 non-null float64
long_2     43700 non-null float64
fuso       43064 non-null float64
dtypes: float64(3), object(5)
memory usage: 4.3+ MB


In [62]:
astro4 = astro4.dropna()
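A minimal sketch, not the notebook's method, of computing the offset actually in force at the birth moment, which is what converting a local birth time to UT requires. It assumes Python 3.9+ with the standard-library `zoneinfo` module and an available tz database; the IANA zone name would come from `tf.certain_timezone_at(...)` as in `offset`, and `offset_at` is a hypothetical helper. For dates before a zone adopted standard time, the tz database returns local mean time, so the result may not be a whole number of hours; ambiguous local times resolve to their first occurrence.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def offset_at(tz_name: str, date_str: str, time_str: str) -> float:
    """UTC offset (hours) in force in tz_name at a local 'YYYY/MM/DD' date and 'HH:MM[:SS]' time."""
    fmt = "%H:%M:%S" if time_str.count(":") == 2 else "%H:%M"
    local = datetime.strptime(date_str + " " + time_str, "%Y/%m/%d " + fmt)
    # Attach the zone; its rules for that date (including DST) determine the offset
    aware = local.replace(tzinfo=ZoneInfo(tz_name))
    return aware.utcoffset().total_seconds() / 3600

print(offset_at("America/New_York", "2020/07/01", "12:00"))
```

Applied row-wise with the zone name and the `data` and `horario` columns, this would replace the current standard offset with the historical one.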

The dataset is now in the desired shape, so it was given a definitive name to make it easier to identify:

In [63]:
df_astro = astro4

In [64]:
df_astro.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 43064 entries, 0 to 44592
Data columns (total 8 columns):
nome       43064 non-null object
data       43064 non-null object
horario    43064 non-null object
bio        43064 non-null object
url        43064 non-null object
lat_2      43064 non-null float64
long_2     43064 non-null float64
fuso       43064 non-null float64
dtypes: float64(3), object(5)
memory usage: 3.0+ MB


<div class="alert alert-block alert-info">
<b>Note:</b> lookup query below

In [65]:
df_astro[df_astro['nome'].str.contains('Neymar')]

Unnamed: 0,nome,data,horario,bio,url,lat_2,long_2,fuso
16680,Neymar,1992/02/05,02:15,Brazilian footballer who plays as a forward fo...,https://www.astro.com/astro-databank/Neymar,-23.31,-46.11,-3.0


***

<a id="2.-Mapa-astral:"></a>
### 2. Birth chart:

The birth chart was computed for every row of the dataset:

In [66]:
df_astro['mapa'] = df_astro.apply(mapa, axis=1)

In [67]:
df_astro.head()

Unnamed: 0,nome,data,horario,bio,url,lat_2,long_2,fuso,mapa
0,"Babilée, Jean",1923/02/03,21:30,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102,2.1958,1.0,"{'Sun': 'Aquarius', 'Moon': 'Virgo', 'Mercury'..."
1,"Baber, Alice",1928/08/22,03:40,"American artist, her style was abstract expres...","https://www.astro.com/astro-databank/Baber,_Alice",39.3,-88.11,-6.0,"{'Sun': 'Leo', 'Moon': 'Scorpio', 'Mercury': '..."
2,"Babbitt, Manuel",1949/05/03,21:30,American homicide: a former Marine and Vietnam...,"https://www.astro.com/astro-databank/Babbitt,_...",41.46,-70.43,-5.0,"{'Sun': 'Taurus', 'Moon': 'Cancer', 'Mercury':..."
3,"Babbitt, Elwood",1921/11/26,10:20,American medium based in New England. His lif...,"https://www.astro.com/astro-databank/Babbitt,_...",42.35,-72.19,-5.0,"{'Sun': 'Sagittarius', 'Moon': 'Libra', 'Mercu..."
4,"A, Dominique",1968/10/06,17:35,"French songwriter, singer and guitarist.","https://www.astro.com/astro-databank/A,_Dominique",48.33,3.18,1.0,"{'Sun': 'Libra', 'Moon': 'Aries', 'Mercury': '..."


In [68]:
# Expand the dictionary keys into columns, with their respective values:
df_astro = pd.concat([df_astro.drop(['mapa'], axis=1), df_astro['mapa'].apply(pd.Series)], axis=1)

In [69]:
df_astro.head()

Unnamed: 0,nome,data,horario,bio,url,lat_2,long_2,fuso,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
0,"Babilée, Jean",1923/02/03,21:30,"French dancer, choreographer, and actor, Babil...",https://www.astro.com/astro-databank/Babil%C3%...,48.5102,2.1958,1.0,Aquarius,Virgo,Aquarius,Sagittarius,Aries,Scorpio,Libra,Virgo,Pisces,Leo,Aquarius
1,"Baber, Alice",1928/08/22,03:40,"American artist, her style was abstract expres...","https://www.astro.com/astro-databank/Baber,_Alice",39.3,-88.11,-6.0,Leo,Scorpio,Virgo,Virgo,Gemini,Taurus,Sagittarius,Gemini,Sagittarius,Leo,Taurus
2,"Babbitt, Manuel",1949/05/03,21:30,American homicide: a former Marine and Vietnam...,"https://www.astro.com/astro-databank/Babbitt,_...",41.46,-70.43,-5.0,Taurus,Cancer,Gemini,Taurus,Taurus,Aquarius,Leo,Aries,Libra,Taurus,Libra
3,"Babbitt, Elwood",1921/11/26,10:20,American medium based in New England. His lif...,"https://www.astro.com/astro-databank/Babbitt,_...",42.35,-72.19,-5.0,Sagittarius,Libra,Scorpio,Scorpio,Libra,Libra,Libra,Libra,Aries,Taurus,Sagittarius
4,"A, Dominique",1968/10/06,17:35,"French songwriter, singer and guitarist.","https://www.astro.com/astro-databank/A,_Dominique",48.33,3.18,1.0,Libra,Aries,Scorpio,Scorpio,Virgo,Virgo,Aries,Aries,Libra,Aries,Virgo


With the birth chart computed for every row, the columns "data", "horario", "bio", "url", "lat_2", "long_2" and "fuso" were dropped.

In [70]:
df_astro.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 43064 entries, 0 to 44592
Data columns (total 19 columns):
nome            43064 non-null object
data            43064 non-null object
horario         43064 non-null object
bio             43064 non-null object
url             43064 non-null object
lat_2           43064 non-null float64
long_2          43064 non-null float64
fuso            43064 non-null float64
Sun             43064 non-null object
Moon            43064 non-null object
Mercury         43064 non-null object
Venus           43064 non-null object
Mars            43064 non-null object
Jupiter         43064 non-null object
Saturn          43064 non-null object
North Node      43064 non-null object
South Node      43064 non-null object
Syzygy          43064 non-null object
Pars Fortuna    43064 non-null object
dtypes: float64(3), object(16)
memory usage: 6.6+ MB


In [71]:
df_astro.drop(["data", "horario", "bio", "url", "lat_2", "long_2", "fuso"], axis=1, inplace=True)

In [72]:
df_astro.head()

Unnamed: 0,nome,Sun,Moon,Mercury,Venus,Mars,Jupiter,Saturn,North Node,South Node,Syzygy,Pars Fortuna
0,"Babilée, Jean",Aquarius,Virgo,Aquarius,Sagittarius,Aries,Scorpio,Libra,Virgo,Pisces,Leo,Aquarius
1,"Baber, Alice",Leo,Scorpio,Virgo,Virgo,Gemini,Taurus,Sagittarius,Gemini,Sagittarius,Leo,Taurus
2,"Babbitt, Manuel",Taurus,Cancer,Gemini,Taurus,Taurus,Aquarius,Leo,Aries,Libra,Taurus,Libra
3,"Babbitt, Elwood",Sagittarius,Libra,Scorpio,Scorpio,Libra,Libra,Libra,Libra,Aries,Taurus,Sagittarius
4,"A, Dominique",Libra,Aries,Scorpio,Scorpio,Virgo,Virgo,Aries,Aries,Libra,Aries,Virgo


In [73]:
df_astro = df_astro.reset_index(drop=True)

In [74]:
df_astro.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43064 entries, 0 to 43063
Data columns (total 12 columns):
nome            43064 non-null object
Sun             43064 non-null object
Moon            43064 non-null object
Mercury         43064 non-null object
Venus           43064 non-null object
Mars            43064 non-null object
Jupiter         43064 non-null object
Saturn          43064 non-null object
North Node      43064 non-null object
South Node      43064 non-null object
Syzygy          43064 non-null object
Pars Fortuna    43064 non-null object
dtypes: object(12)
memory usage: 3.9+ MB


<div class="alert alert-block alert-info">
<b>Note:</b> different individuals can have exactly the same birth chart, as the two rows returned by the cell below show.
</div>

In [75]:
df_astro_semnome = df_astro.drop('nome', axis=1)

In [76]:
df_astro_semnome[(df_astro_semnome["Sun"]=="Cancer")&(df_astro_semnome["Moon"]=="Cancer")&(df_astro_semnome["Mercury"]=="Leo")][:2]

| | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | Cancer | Cancer | Leo | Leo | Gemini | Gemini | Leo | Capricorn | Cancer | Capricorn | Taurus |
| 52 | Cancer | Cancer | Leo | Leo | Gemini | Gemini | Leo | Capricorn | Cancer | Capricorn | Taurus |
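
The note above can also be checked across the whole dataset instead of a single hand-picked filter. The sketch below is a minimal illustration, assuming a DataFrame laid out like `df_astro` (a `nome` column plus one column per chart position); the toy rows and the names `colunas_mapa`, `identicos` and `grupos` are hypothetical.

```python
import pandas as pd

# Hypothetical toy stand-in for df_astro (only 3 of the 11 chart columns)
df = pd.DataFrame({
    "nome": ["A", "B", "C", "D"],
    "Sun": ["Cancer", "Cancer", "Leo", "Leo"],
    "Moon": ["Cancer", "Cancer", "Aries", "Virgo"],
    "Mercury": ["Leo", "Leo", "Leo", "Leo"],
})

colunas_mapa = [c for c in df.columns if c != "nome"]

# keep=False flags every member of a group of identical charts, not only the repeats
mask = df.duplicated(subset=colunas_mapa, keep=False)
identicos = df[mask]

# Collect the names that share exactly the same chart, one list per chart
grupos = identicos.groupby(colunas_mapa)["nome"].agg(list)
print(grupos)
```

On the real data, replacing `df` with `df_astro` would list every group of people who share a complete chart.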


***

### 3. Cálculo de distâncias:

#### 3.1 Por similaridade:

The distances between the individuals in the dataset were then computed. <br>
To do so, the categorical columns were first converted into dummy variables (one-hot encoding).
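
Because each of the 11 chart positions becomes a block of dummy columns containing exactly one 1, the Euclidean distance used below has a simple reading. A short derivation (not part of the original analysis, but consistent with the distances 0, 1.414 and 2.0 that appear later for Neymar's neighbours): if two charts $\mathbf{x}$ and $\mathbf{y}$ differ in $m$ positions, each differing position contributes two coordinates with $|x_j - y_j| = 1$, so

$$
d(\mathbf{x}, \mathbf{y}) = \lVert \mathbf{x} - \mathbf{y} \rVert_2 = \sqrt{2m},
\qquad
\text{similarity} = \frac{11 - m}{11} \times 100\% = \left(1 - \frac{d^2}{22}\right) \times 100\%.
$$

Hence $d = \sqrt{2} \approx 1.414$ means one differing position (about 90.91% agreement) and $d = 2$ means two (about 81.82%).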

In [77]:
df_astro_dummies = pd.get_dummies(df_astro, columns=['Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter',
                                                    'Saturn','North Node', 'South Node', 'Syzygy', 'Pars Fortuna'])

In [78]:
df_astro_dummies.head()

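| | nome | Sun_Aquarius | Sun_Aries | Sun_Cancer | Sun_Capricorn | Sun_Gemini | Sun_Leo | Sun_Libra | Sun_Pisces | Sun_Sagittarius | ... | Pars Fortuna_Cancer | Pars Fortuna_Capricorn | Pars Fortuna_Gemini | Pars Fortuna_Leo | Pars Fortuna_Libra | Pars Fortuna_Pisces | Pars Fortuna_Sagittarius | Pars Fortuna_Scorpio | Pars Fortuna_Taurus | Pars Fortuna_Virgo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Babilée, Jean | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Baber, Alice | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | Babbitt, Manuel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | Babbitt, Elwood | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | A, Dominique | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |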


Next, the BallTree algorithm was used to find the k nearest neighbours of a given point, together with their distances.
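
To make the BallTree step concrete, here is a minimal sketch on hypothetical toy data (three chart positions instead of the eleven used in the project). It one-hot encodes the charts, builds a `sklearn.neighbors.BallTree` and queries the first row. The queried row comes back as its own nearest neighbour at distance 0, which is why the project queries `k=11` to obtain 10 other people.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Hypothetical toy charts: 3 positions instead of the 11 used in the project
df = pd.DataFrame({
    "Sun":   ["Aquarius", "Aquarius", "Aquarius", "Leo"],
    "Moon":  ["Pisces", "Pisces", "Gemini", "Gemini"],
    "Venus": ["Capricorn", "Capricorn", "Capricorn", "Virgo"],
})

# One-hot encode every column; cast to float so BallTree gets a numeric matrix
X = pd.get_dummies(df).astype(float)

tree = BallTree(X.values, leaf_size=2)
# Query row 0 for its 3 nearest rows; results come back sorted by distance
dist, ind = tree.query(X.values[[0]], k=3)
print(ind[0], np.round(dist[0], 3))
```

Rows 0 and 1 (identical charts) are returned at distance 0, in either order, and row 2, which differs only in the Moon, at √2 ≈ 1.414. Row 3 differs in all three positions (distance √6), so it is left out with `k=3`.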

As an example, we looked for the 10 individuals whose birth charts are most similar to Neymar's.

In [219]:
def top10_semelhantes(nome_do_usuario, k=10):
    # Feature matrix: every dummy column except the name
    X = df_astro_dummies.drop('nome', axis=1)
    tree = BallTree(X, leaf_size=2)

    # Row label of the user (RangeIndex, so label == position in the tree)
    idx_usuario = df_astro_dummies.index[df_astro_dummies['nome'] == nome_do_usuario][0]
    # k + 1 neighbours: the user is returned as its own neighbour at distance 0
    dist, ind = tree.query(X.loc[[idx_usuario]], k=k + 1)

    # Drop the user while keeping indices and distances aligned as pairs
    pares = [(i, d) for i, d in zip(ind[0], dist[0]) if i != idx_usuario][:k]

    # Distance 0 means an identical chart; anything else is only similar
    df_igual = df_astro.loc[[i for i, d in pares if d == 0]]
    df_semelhante = df_astro.loc[[i for i, d in pares if d != 0]]

    # Share of the 11 chart positions that match the user's chart
    colunas = df_astro.columns.drop('nome')
    linha_usuario = df_astro.loc[idx_usuario, colunas]
    porcentagem = [round((row[colunas] == linha_usuario).mean() * 100, 2)
                   for _, row in df_semelhante.iterrows()]

    print('-------------------------------------------------------------')
    print('Olá, %s!' % nome_do_usuario)
    print('\n')
    print('Você é idêntico às seguintes pessoas:')
    print('>>', '\n>> '.join(df_igual['nome']))
    print('\n')
    print('Você é similar às seguintes pessoas:')
    for nome, p in zip(df_semelhante['nome'], porcentagem):
        print('>> %s (%.2f%%)' % (nome, p))
    print('-------------------------------------------------------------')

In [220]:
top10_semelhantes("Neymar")

In [100]:
tree = BallTree(df_astro_dummies.drop('nome', axis=1), leaf_size=2)

In [114]:
dist, ind = tree.query(df_astro_dummies.loc[df_astro_dummies["nome"]=="Neymar"].drop('nome', axis=1), k=11) 

In [102]:
ind_list=[]
for ind1 in ind:
    for i in ind1:
        ind_list.append(i)

ind_list

[36920, 16112, 3686, 2157, 1882, 15174, 39548, 21487, 28267, 8852, 11145]

In [103]:
ind_list.remove(df_astro_dummies.index[df_astro_dummies["nome"]=="Neymar"][0])

In [104]:
ind_list

[36920, 3686, 2157, 1882, 15174, 39548, 21487, 28267, 8852, 11145]

In [115]:
dist_list=[]
for dis in dist:
    for d in dis:
        dist_list.append(d)

dist_list

[0.0,
 0.0,
 1.4142135623730951,
 1.4142135623730951,
 1.4142135623730951,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0]

In [116]:
dist_list.remove(0)

In [117]:
dist_list

[0.0,
 1.4142135623730951,
 1.4142135623730951,
 1.4142135623730951,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0,
 2.0]

In [126]:
df_astro_dummies.set_index("nome").loc[["Neymar"]]

| nome | Sun_Aquarius | Sun_Aries | Sun_Cancer | Sun_Capricorn | Sun_Gemini | Sun_Leo | Sun_Libra | Sun_Pisces | Sun_Sagittarius | Sun_Scorpio | ... | Pars Fortuna_Cancer | Pars Fortuna_Capricorn | Pars Fortuna_Gemini | Pars Fortuna_Leo | Pars Fortuna_Libra | Pars Fortuna_Pisces | Pars Fortuna_Sagittarius | Pars Fortuna_Scorpio | Pars Fortuna_Taurus | Pars Fortuna_Virgo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Neymar | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |


In [155]:
# Split the neighbours: distance 0 = identical chart, otherwise only similar
# (DataFrame.append was removed in pandas 2.0, so rows are selected with .loc)
ind_igual = [ind_list[n] for n, d in enumerate(dist_list) if d == 0]
ind_semelhante = [ind_list[n] for n, d in enumerate(dist_list) if d != 0]
df_neymar_igual = df_astro.loc[ind_igual]
df_neymar_semelhante = df_astro.loc[ind_semelhante]

In [156]:
df_neymar_igual

| | nome | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36920 | Benattia, Nabilla | Aquarius | Pisces | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Sagittarius |


In [160]:
df_neymar_semelhante

| | nome | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3686 | Wagner, Bryan Matthew | Aquarius | Gemini | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Sagittarius |
| 2157 | Jain (singer) | Aquarius | Pisces | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Cancer |
| 1882 | Efron, Dylan | Aquarius | Pisces | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Leo |
| 15174 | Pino, Andrea | Aquarius | Cancer | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Virgo |
| 39548 | Sutton, Chloe | Aquarius | Aquarius | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Leo |
| 21487 | Lautner, Taylor | Aquarius | Taurus | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Leo |
| 28267 | Ferrand-Prévot, Pauline | Aquarius | Taurus | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Leo |
| 8852 | Stal, Gary | Aquarius | Aries | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Leo |
| 11145 | Santos, Carlos | Aquarius | Taurus | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Cancer |


In [161]:
df_astro.loc[df_astro["nome"]=="Neymar"]

| | nome | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16112 | Neymar | Aquarius | Pisces | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Sagittarius |


In [212]:
porcentagem = []

In [213]:
colunas = df_astro.columns.drop("nome")
linha_neymar = df_astro.loc[df_astro["nome"]=="Neymar", colunas].iloc[0]
for index, row in df_neymar_semelhante.iterrows():
    # share of the 11 chart positions that coincide with Neymar's chart
    porcentagem.append(round((row[colunas] == linha_neymar).mean()*100, 2))

In [214]:
porcentagem

[90.91, 90.91, 90.91, 81.82, 81.82, 81.82, 81.82, 81.82, 81.82]
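
Since the dummies are one-hot, these percentages could also be read directly off the BallTree distances, without comparing rows one by one. A minimal sketch, assuming exactly 11 chart positions as in `df_astro`; `porcentagem_pela_distancia` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

N_POSICOES = 11  # chart positions in df_astro (Sun ... Pars Fortuna)

def porcentagem_pela_distancia(dist, n_posicoes=N_POSICOES):
    """Convert one-hot Euclidean distances into % of matching chart positions.

    Charts that differ in m positions lie at distance sqrt(2 * m),
    so m = d**2 / 2 (rounded to absorb floating-point noise).
    """
    m = np.round(np.asarray(dist) ** 2 / 2)  # number of differing positions
    return np.round((n_posicoes - m) / n_posicoes * 100, 2)

print(porcentagem_pela_distancia([np.sqrt(2), 2.0]))
```

Applied to the nonzero entries of `dist_list`, this maps √2 to 90.91 and 2.0 to 81.82, matching the list above.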

In [93]:
df_astro[df_astro['nome'].str.contains('Neymar')]

| | nome | Sun | Moon | Mercury | Venus | Mars | Jupiter | Saturn | North Node | South Node | Syzygy | Pars Fortuna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16112 | Neymar | Aquarius | Pisces | Aquarius | Capricorn | Capricorn | Virgo | Aquarius | Capricorn | Cancer | Aquarius | Sagittarius |


***

### 4. Interação com os dados do usuário:

The `pessoa` class was used to store the user's data and compute their birth chart:

In [None]:
fer = pessoa('Fernanda', '1992/08/15', '13:30', '-03:00', 'São Paulo, SP')

In [None]:
fer.mapa()

In [None]:
fer.chart

Next, a function was defined to store the user's data as a temporary row with the same columns as the original dataset.

In [None]:
def usuario():
    """
    Stores the data provided by the user and returns it as a one-row DataFrame
    with the same columns as df_astro.
    """
    us_nome = input('Qual o seu nome?')
    # TODO: while the regex check fails, keep asking for the input
    # once it passes, move on to the next question
    us_data = input('Em que dia você nasceu? (AAAA/MM/DD)')
    us_hora = input('Em qual horário você nasceu? (HH:MM)')
    us_fuso = input('Em qual fuso horário (UTC)? (-03:00 para paulistas)')
    us_cidade = input('Em qual cidade você nasceu? (Cidade, UF)')
    user = pessoa(us_nome, us_data, us_hora, us_fuso, us_cidade)
    user.mapa()
    dicio = user.chart
    user_df = pd.DataFrame([dicio])
    user_df = user_df[['nome', 'Sun', 'Moon', 'Mercury', 'Venus', 'Mars', 'Jupiter', 'Saturn',
       'North Node', 'South Node', 'Syzygy', 'Pars Fortuna']]
    return user_df
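
The comments in `usuario()` describe validating each answer with a regex and asking again until it matches. A minimal sketch of that loop follows; the helper `input_validado` and the three patterns are assumptions, and the `ler` parameter only exists so the function can be exercised without a keyboard.

```python
import re

# Hypothetical helper: repeat the question until the answer matches `padrao`.
# `ler` defaults to input(); a stub can be passed in when testing.
def input_validado(pergunta, padrao, ler=input):
    while True:
        resposta = ler(pergunta).strip()
        if re.fullmatch(padrao, resposta):
            return resposta
        print('Invalid format, please try again.')

# Assumed patterns for the formats requested in usuario()
PADRAO_DATA = r'\d{4}/(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])'  # AAAA/MM/DD
PADRAO_HORA = r'([01]\d|2[0-3]):[0-5]\d'                      # HH:MM, 00:00 to 23:59
PADRAO_FUSO = r'[+-](0\d|1[0-4]):[0-5]\d'                     # UTC offset, e.g. -03:00
```

Inside `usuario()`, the date question would become `us_data = input_validado('Em que dia você nasceu? (AAAA/MM/DD)', PADRAO_DATA)`. The patterns only check the format: a date such as 2021/02/31 still passes, so `datetime.strptime(us_data, '%Y/%m/%d')` would be a stricter check.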

In [None]:
usuario()
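
The row returned by `usuario()` holds sign names, while the BallTree was built on dummy columns. To compare the user with the dataset, the row has to be one-hot encoded with exactly the same columns, in the same order, as `df_astro_dummies`. A minimal sketch, assuming the user's values use the same sign names as the dataset; `vizinhos_do_usuario` is a hypothetical helper.

```python
import pandas as pd
from sklearn.neighbors import BallTree

def vizinhos_do_usuario(user_df, df_astro_dummies, k=10):
    """Hypothetical helper: the k dataset charts closest to a new user row."""
    colunas = df_astro_dummies.columns.drop('nome')
    # One-hot encode the user row, then align it with the dataset's dummy
    # columns: signs the user lacks become 0 and the column order matches.
    user_dummies = (pd.get_dummies(user_df.drop(columns='nome'))
                    .reindex(columns=colunas, fill_value=0)
                    .astype(float))
    tree = BallTree(df_astro_dummies[colunas].astype(float).values, leaf_size=2)
    dist, ind = tree.query(user_dummies.values, k=k)
    # Tree rows follow df_astro_dummies row order, so positions map back via iloc
    resultado = df_astro_dummies.iloc[ind[0]][['nome']].copy()
    resultado['distancia'] = dist[0]
    return resultado
```

For example, `vizinhos_do_usuario(usuario(), df_astro_dummies)` would list the 10 closest charts with their distances. Building the tree once outside the function would avoid rebuilding it on every call; it is rebuilt here only to keep the sketch self-contained.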