## Introduction

This notebook gathers three datasets in three different ways: the first is a CSV file provided by Udacity, the second is a file downloaded from Udacity's servers, and the third is extracted directly from Twitter through its API with the help of the tweepy library.

First, let's import the libraries the project needs:

In [466]:
# Importing the libraries used in the project
import pandas as pd
import requests
import json
import tweepy
import sys

pd.set_option('display.precision',50)

## Importing the datasets

Now it is time to import each of the data files. We start with the file provided by Udacity:

In [467]:
# Reading the original datasets, starting with the one provided by Udacity
weratedogs = pd.read_csv('twitter-archive-enhanced.csv')

# Checking the file
weratedogs.head(20)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
5,891087950875897856,,,2017-07-29 00:08:17 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a majestic great white breaching ...,,,,https://twitter.com/dog_rates/status/891087950...,13,10,,,,,
6,890971913173991426,,,2017-07-28 16:27:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Jax. He enjoys ice cream so much he gets ...,,,,"https://gofundme.com/ydvmve-surgery-for-jax,ht...",13,10,Jax,,,,
7,890729181411237888,,,2017-07-28 00:22:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",When you watch your owner call another dog a g...,,,,https://twitter.com/dog_rates/status/890729181...,13,10,,,,,
8,890609185150312448,,,2017-07-27 16:25:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Zoey. She doesn't want to be one of th...,,,,https://twitter.com/dog_rates/status/890609185...,13,10,Zoey,,,,
9,890240255349198849,,,2017-07-26 15:59:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Cassie. She is a college pup. Studying...,,,,https://twitter.com/dog_rates/status/890240255...,14,10,Cassie,doggo,,,


The second dataset is hosted by Udacity; let's download and read it as well:

In [468]:
# Downloading the dataset
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'

r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop here if the download failed

# Saving the response body; the with-block closes the file
with open('image-predictions.tsv', 'wb') as f:
    f.write(r.content)

# Reading the file; note that the separator is a tab
imagepredictions = pd.read_csv('image-predictions.tsv', sep='\t')

# Checking the second dataset
imagepredictions.head()

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.46507399999999998740918272233102470636367797...,True,collie,0.15666499999999999870325950723781716078519821...,True,Shetland_sheepdog,0.06142849999999999699440422773477621376514434...,True
1,666029285002620928,https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg,1,redbone,0.50682599999999999873523393034702166914939880...,True,miniature_pinscher,0.07419169999999998543760426628068671561777591...,True,Rhodesian_ridgeback,0.07201000000000000456079618516014306806027889...,True
2,666033412701032449,https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg,1,German_shepherd,0.59646100000000001895017476272187195718288421...,True,malinois,0.13858399999999998497557385235268156975507736...,True,bloodhound,0.11619699999999999473487832801765762269496917...,True
3,666044226329800704,https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg,1,Rhodesian_ridgeback,0.40814299999999997803357132397650275379419326...,True,redbone,0.36068699999999997984900801384355872869491577...,True,miniature_pinscher,0.22275200000000000555289147996518295258283615...,True
4,666049248165822465,https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg,1,miniature_pinscher,0.56031100000000000349587025993969291448593139...,True,Rottweiler,0.24368200000000000970956648416176903992891311...,True,Doberman,0.15462899999999998867927786250220378860831260...,True


One small difference between the first two datasets: the first is a .CSV file, separated by commas, while the second is a .TSV file, separated by tabs.
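
The difference between the two separators can be illustrated with a minimal sketch. The rows below are made up and kept in memory with `io.StringIO`, so nothing here touches the real files:

```python
import io

import pandas as pd

# Two tiny in-memory files holding the same made-up rows
csv_text = "tweet_id,name\n1,Phineas\n2,Tilly\n"
tsv_text = "tweet_id\tname\n1\tPhineas\n2\tTilly\n"

from_csv = pd.read_csv(io.StringIO(csv_text))            # comma is the default separator
from_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tabs must be requested explicitly

# Reading the TSV without sep="\t" leaves each line in a single column
wrong = pd.read_csv(io.StringIO(tsv_text))

print(from_csv.equals(from_tsv))
print(wrong.shape)
```

A frame with a single column after loading is usually a sign that the wrong separator was used.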

Finally, let's import the third dataset through the Twitter API:

In [469]:
# Configuring tweepy (the block below is wrapped in a string literal, so it does not run)
'''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)


# tweet ids
ids = weratedogs['tweet_id']

# opening the output file
file = open('tweet_json.txt','w') 


# iterating over the ids
for i in (ids):
    try:
        tweet = api.get_status(i)
        if(tweet):
            file.write(str(i) + '\t' + str(tweet.retweet_count) + '\t' + str(tweet.favorite_count) + '\t' + str(tweet.created_at) + '\n')
    except Exception:
        # e.g. a deleted tweet: report the exception type and keep going
        e = sys.exc_info()[0]
        print( "<p>Error: %s</p>" % e )
        
# closing the file        
file.close()
'''

# building the dataset
tweet_json = pd.read_csv('tweet_json.txt', sep="\t", header=None)
tweet_json.columns = ["tweet_id", "retweet_count", "favorite_count", "created_at"]

# checking the dataset built from the Twitter data
tweet_json.head(20)

Unnamed: 0,tweet_id,retweet_count,favorite_count,created_at
0,892420643555336193,8476,38486,2017-08-01 16:23:56
1,892177421306343426,6238,32985,2017-08-01 00:17:27
2,891815181378084864,4134,24833,2017-07-31 00:18:03
3,891689557279858688,8601,41863,2017-07-30 15:58:51
4,891327558926688256,9331,40022,2017-07-29 16:00:24
5,891087950875897856,3094,20074,2017-07-29 00:08:17
6,890971913173991426,2056,11750,2017-07-28 16:27:12
7,890729181411237888,18798,64980,2017-07-28 00:22:40
8,890609185150312448,4243,27593,2017-07-27 16:25:51
9,890240255349198849,7368,31675,2017-07-26 15:59:51
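
The file layout used above (one tab-separated line per tweet) can be tried out without calling the API. The sketch below is only an illustration: `FakeTweet` and its values are hypothetical stand-ins for the status objects returned by tweepy, and the standard `csv` module replaces the manual string concatenation.

```python
import csv
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for the status object returned by the API
FakeTweet = namedtuple("FakeTweet", "id retweet_count favorite_count created_at")

tweets = [
    FakeTweet(1, 10, 100, "2017-08-01 16:23:56"),
    FakeTweet(2, 20, 200, "2017-08-01 00:17:27"),
]

path = os.path.join(tempfile.mkdtemp(), "tweet_sample.txt")

# Write one tab-separated line per tweet; the with-block closes the file
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    for t in tweets:
        writer.writerow([t.id, t.retweet_count, t.favorite_count, t.created_at])

# Read the rows back to confirm the layout
with open(path, newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))

print(rows)
```

Opening the file in a `with` block means it is closed even if one of the writes raises an error.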


Now that we have our three data files, let's move on to assessing their problems.

## Assessing the data

Let's assess our datasets!




In [470]:
# Calling info() on the datasets
weratedogs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), ob

In [471]:
imagepredictions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB


In [472]:
tweet_json.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2342 entries, 0 to 2341
Data columns (total 4 columns):
tweet_id          2342 non-null int64
retweet_count     2342 non-null int64
favorite_count    2342 non-null int64
created_at        2342 non-null object
dtypes: int64(3), object(1)
memory usage: 73.3+ KB


In [473]:
# Checking for column names that are duplicated across the datasets
all_columns = pd.Series(list(weratedogs) + list(imagepredictions) + list(tweet_json))
all_columns[all_columns.duplicated()]

17    tweet_id
29    tweet_id
dtype: object

In [474]:
weratedogs.tweet_id.value_counts()

749075273010798592    1
741099773336379392    1
798644042770751489    1
825120256414846976    1
769212283578875904    1
700462010979500032    1
780858289093574656    1
699775878809702401    1
880095782870896641    1
760521673607086080    1
776477788987613185    1
691820333922455552    1
715696743237730304    1
714606013974974464    1
760539183865880579    1
813157409116065792    1
676430933382295552    1
743510151680958465    1
837012587749474308    1
833722901757046785    1
818259473185828864    1
670704688707301377    1
667160273090932737    1
674394782723014656    1
672082170312290304    1
670093938074779648    1
759923798737051648    1
809920764300447744    1
805487436403003392    1
838085839343206401    1
                     ..
763956972077010945    1
870308999962521604    1
720775346191278080    1
785927819176054784    1
783347506784731136    1
775733305207554048    1
834209720923721728    1
825026590719483904    1
758405701903519748    1
668986018524233728    1
6909388994772213
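
Reading through a long `value_counts()` listing is one way to confirm that no id repeats. A shorter check is sketched below on a made-up series (the repeated value 3 is deliberate); `is_unique` and `duplicated()` are standard pandas members:

```python
import pandas as pd

# Made-up ids: the value 3 appears twice on purpose
ids = pd.Series([1, 2, 3, 3], name="tweet_id")

print(ids.is_unique)            # False when any id repeats
print(ids.duplicated().sum())   # number of repeated entries
print(ids.drop_duplicates().is_unique)
```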

In [475]:
# Calling describe() on the datasets
weratedogs.describe()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,retweeted_status_id,retweeted_status_user_id,rating_numerator,rating_denominator
count,2356.0,78.0,78.0,181.0,181.0,2356.00000000000000000000000000000000000000000...,2356.00000000000000000000000000000000000000000...
mean,7.427715903217198e+17,7.455079178557505e+17,2.014170636087321e+16,7.72039961038007e+17,1.241698365301758e+16,13.1264855687606107892406726023182272911071777...,10.4554329371816638882819461286999285221099853...
std,6.856704744476103e+16,7.582492004419288e+16,1.2527966625523632e+17,6.23692781050556e+16,9.599253533151752e+16,45.8766476233301077058968076016753911972045898...,6.74523722694255578602451350889168679714202880...
min,6.660208880227901e+17,6.658146967007232e+17,11856342.0,6.661041332886651e+17,783214.0,0.00000000000000000000000000000000000000000000...,0.00000000000000000000000000000000000000000000...
25%,6.783989382144758e+17,6.757419119934648e+17,308637448.75,7.18631497683583e+17,4196983835.0,10.0000000000000000000000000000000000000000000...,10.0000000000000000000000000000000000000000000...
50%,7.196279347162358e+17,7.038708402265989e+17,4196983835.0,7.804657092979956e+17,4196983835.0,11.0000000000000000000000000000000000000000000...,10.0000000000000000000000000000000000000000000...
75%,7.993373049542522e+17,8.257803712865669e+17,4196983835.0,8.203146337770618e+17,4196983835.0,12.0000000000000000000000000000000000000000000...,10.0000000000000000000000000000000000000000000...
max,8.924206435553363e+17,8.862663570751283e+17,8.405478643549184e+17,8.874739571039519e+17,7.87461778435289e+17,1776.00000000000000000000000000000000000000000...,170.000000000000000000000000000000000000000000...
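
In the summary above, the ids are shown as floating-point numbers such as 8.924206435553363e+17. The following sketch, in plain Python and assuming the usual 64-bit float, illustrates why id columns stored as float64 (for example *in_reply_to_status_id*) may not preserve the exact id:

```python
tweet_id = 892420643555336193  # first id shown in the weratedogs table

as_float = float(tweet_id)   # what a float64 column would hold
round_trip = int(as_float)   # converting back to an integer

# The id does not survive the round trip through a float
print(round_trip == tweet_id)
```

Keeping ids as integers or strings avoids this loss; statistics such as the mean of an id column carry no meaning in any case.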


In [476]:
imagepredictions.describe()

Unnamed: 0,tweet_id,img_num,p1_conf,p2_conf,p3_conf
count,2075.0,2075.00000000000000000000000000000000000000000...,2075.00000000000000000000000000000000000000000...,2075.00000000000000000000000000000000000000000...,2075.00000000000000000000000000000000000000000...
mean,7.38451357852539e+17,1.20385542168674697371955062408233061432838439...,0.59454826361445822779927539158961735665798187...,0.13458860950039183701498757272929651662707328...,0.06032416861810645236641192923343623988330364...
std,6.7852033330235656e+16,0.56187502798363009315352201156201772391796112...,0.27117351686569007851446144741203170269727706...,0.10066573936432347824432298466490465216338634...,0.05090593131945617827449623860047722700983285...
min,6.660208880227901e+17,1.00000000000000000000000000000000000000000000...,0.04433340000000000197255545231200812850147485...,0.00000001011299999999999930202205611178001287...,0.00000000017401699999999998441190987632376388...
25%,6.76483507139541e+17,1.00000000000000000000000000000000000000000000...,0.36441200000000001368860580441833008080720901...,0.05388624999999999665156735773052787408232688...,0.01622240000000000140212286225960269803181290...
50%,7.119988098580439e+17,1.00000000000000000000000000000000000000000000...,0.58823000000000003062439191126031801104545593...,0.11818099999999999438760056591490865685045719...,0.04944380000000000302540215102453657891601324...
75%,7.932034485251789e+17,1.00000000000000000000000000000000000000000000...,0.84385500000000002174260771425906568765640258...,0.19556550000000000322231130667205434292554855...,0.09180755000000000165538693863709340803325176...
max,8.924206435553363e+17,4.00000000000000000000000000000000000000000000...,1.00000000000000000000000000000000000000000000...,0.48801400000000011436540603426692541688680648...,0.27341900000000002313171876267006155103445053...


In [477]:
tweet_json.describe()

Unnamed: 0,tweet_id,retweet_count,favorite_count
count,2342.0,2342.0,2342.0
mean,7.422646010892494e+17,2980.284372331341,8041.522203245089
std,6.83746565615913e+16,4990.576181813869,12366.505251144687
min,6.660208880227901e+17,0.0,0.0
25%,6.783508657476055e+17,599.25,1393.0
50%,7.186224017334907e+17,1395.0,3508.0
75%,7.987009739014185e+17,3477.75,9878.5
max,8.924206435553363e+17,84230.0,162513.0


## Detecting the problems

### Quality issues

#### weratedogs df

- Some tweets were deleted, which causes inconsistencies between the Udacity dataset and the data extracted through the API
- *rating_numerator* has values below 10 (all the ones checked on Twitter were 10)
- *rating_denominator* has values below 10 (all the ones checked on Twitter were 10+)
- *name* contains values such as None, a, an, the
- *expanded_urls* contains links that do not point to Twitter (https://www.gofundme.com/mingusneedsus)
- *expanded_urls* has 'wrong' links: they are incomplete and end with ...
- *timestamp* ends with an unnecessary +0000
- The dataset contains some retweets; check *retweeted_status_id*
- *source* contains an HTML tag and an r...
- *timestamp* is stored as a string
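
Some of the issues listed above can be checked programmatically. The sketch below uses a small made-up frame with the same column names, not the real archive, and shows one possible check per issue:

```python
import pandas as pd

# Made-up rows that mimic the columns discussed above
sample = pd.DataFrame({
    "name": ["Phineas", "None", "a", "the"],
    "timestamp": ["2017-08-01 16:23:56 +0000"] * 4,
    "rating_denominator": [10, 10, 7, 10],
})

# Names that are placeholders or English articles rather than real names
suspect_names = sample["name"].isin(["None", "a", "an", "the"])

# Parsing the string (including its +0000 offset) yields a timezone-aware datetime
parsed = pd.to_datetime(sample["timestamp"], utc=True)

# Denominators below 10
low_denominator = sample["rating_denominator"] < 10

print(suspect_names.sum(), low_denominator.sum())
print(parsed.dt.year.iloc[0])
```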

### Tidiness issues

- The *retweet_count* column of **tweet_json** should be in **weratedogs**
- The *favorite_count* column of **tweet_json** should be in **weratedogs**
- *tweet_id* in **imagepredictions** is sorted in the opposite order to the other datasets
- The dog statuses in **weratedogs** could be in a single column

## Creating the clean copies

In [478]:
weratedogs_clean = weratedogs.copy()
imagepredictions_clean = imagepredictions.copy()
tweet_json_clean = tweet_json.copy()

## Missing data

### 1 - Some tweets were deleted, which causes inconsistencies between the Udacity dataset and the data extracted through the API

A small problem: some tweets were deleted, so the API dataset has a few rows fewer. The two datasets need to hold the same tweets so that the cleaning and the later analysis are not affected.

#### What will be done:

We will compare the tweet ids; the ids in *weratedogs* that do not exist in the API dataset will be deleted.

### Code

In [479]:
# Finding the ids that appear in only one of the two datasets
difference = pd.concat([weratedogs_clean['tweet_id'],tweet_json_clean['tweet_id']]).drop_duplicates(keep=False)

# Removing those ids
weratedogs_clean = weratedogs_clean[~weratedogs_clean['tweet_id'].isin(difference)]

weratedogs_clean = weratedogs_clean.reset_index(drop=True)
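
The `drop_duplicates(keep=False)` approach relies on each id appearing only once within each dataset. An alternative that states the intent more directly is sketched below on made-up frames (`archive` and `api_data` are hypothetical names, not the notebook's variables):

```python
import pandas as pd

# Made-up frames: id 2 exists in the archive but not in the API data
archive = pd.DataFrame({"tweet_id": [3, 2, 1], "name": ["c", "b", "a"]})
api_data = pd.DataFrame({"tweet_id": [3, 1], "retweet_count": [30, 10]})

# Keep only the archive rows whose id is also present in the API data
kept = archive[archive["tweet_id"].isin(api_data["tweet_id"])].reset_index(drop=True)

print(kept["tweet_id"].tolist())
```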

### Test

In [480]:
len(weratedogs_clean)

2342

In [481]:
len(tweet_json_clean)

2342

## Tidiness issues

First, let's solve the tidiness issues identified earlier.

### 1 - The retweet_count column of tweet_json should be in weratedogs:

#### What will be done:

The retweet_count column will be moved from the **tweet_json** dataset to **weratedogs**. The rows are paired by index, which relies on both datasets listing the tweet ids in the same order.

### Code:

In [482]:
weratedogs_clean = pd.concat([weratedogs_clean, tweet_json_clean['retweet_count']], axis=1)

### Test:

In [483]:
weratedogs_clean.head()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,retweet_count
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,,8476
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,,6238
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,,4134
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,,8601
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,,9331
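
`pd.concat(..., axis=1)` pairs rows by index, not by tweet id, so it only gives the right result when both frames are in the same order. The sketch below, on made-up frames in deliberately different orders, contrasts it with a merge on `tweet_id`, which is one way to make the pairing independent of row order:

```python
import pandas as pd

# Made-up frames listing the same tweets in different orders
archive = pd.DataFrame({"tweet_id": [3, 2, 1], "name": ["c", "b", "a"]})
counts = pd.DataFrame({"tweet_id": [1, 2, 3], "retweet_count": [10, 20, 30]})

# concat(axis=1) pairs rows by index label, ignoring tweet_id
by_position = pd.concat([archive, counts["retweet_count"]], axis=1)

# merge pairs rows by the tweet_id value itself
by_key = archive.merge(counts, on="tweet_id", how="left")

print(by_position["retweet_count"].tolist())
print(by_key["retweet_count"].tolist())
```

In the first result the tweet with id 3 receives the count that belongs to id 1; in the second each tweet keeps its own count.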


### 2 - The favorite_count column of tweet_json should be in weratedogs:

#### What will be done:

The favorite_count column will be moved from the **tweet_json** dataset to **weratedogs**. As before, the rows are paired by index, which relies on both datasets listing the tweet ids in the same order.

### Code:

In [484]:
weratedogs_clean = pd.concat([weratedogs_clean, tweet_json_clean['favorite_count']], axis=1)

### Test

In [485]:
weratedogs_clean.head()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,retweet_count,favorite_count
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,,8476,38486
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,,6238,32985
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,,4134,24833
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,,8601,41863
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,,9331,40022


### 3 - tweet_id in imagepredictions is sorted in the opposite order to the other datasets:

#### What will be done:

We will reverse the order of the **imagepredictions** dataset by sorting it on tweet_id in descending order.

### Code:

In [486]:
imagepredictions_clean = imagepredictions_clean.sort_values('tweet_id', ascending=False)

### Test:

In [487]:
imagepredictions_clean.head()

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
2074,892420643555336193,https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg,1,orange,0.09704859999999999875530676263224449940025806...,False,bagel,0.08585110000000001351239120594982523471117019...,False,banana,0.07610999999999999710009745967909111641347408...,False
2073,892177421306343426,https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg,1,Chihuahua,0.32358100000000000751043671698425896465778350...,True,Pekinese,0.09064650000000000484856599314298364333808422...,True,papillon,0.06895690000000000152713397483239532448351383...,True
2072,891815181378084864,https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg,1,Chihuahua,0.71601199999999998180300053718383423984050750...,True,malamute,0.07825300000000000311128900420953868888318538...,True,kelpie,0.03137890000000000123581145317075424827635288...,True
2071,891689557279858688,https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg,1,paper_towel,0.17027799999999998492583586084947455674409866...,False,Labrador_retriever,0.16808600000000001317701503467105794697999954...,True,spatula,0.04083590000000000136415323481742234434932470...,False
2070,891327558926688256,https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg,2,basset,0.55571199999999998375699306052410975098609924...,True,English_springer,0.22576999999999999846345133391878334805369377...,True,German_short-haired_pointer,0.17521900000000001362820967187872156500816345...,True


In [488]:
weratedogs_clean.head()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,retweet_count,favorite_count
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,,8476,38486
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,,6238,32985
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,,4134,24833
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,,8601,41863
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,,9331,40022


In [489]:
weratedogs_clean

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,retweet_count,favorite_count
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,,8476,38486
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,,6238,32985
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,,4134,24833
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,,8601,41863
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,,9331,40022
5,891087950875897856,,,2017-07-29 00:08:17 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a majestic great white breaching ...,,,,https://twitter.com/dog_rates/status/891087950...,13,10,,,,,,3094,20074
6,890971913173991426,,,2017-07-28 16:27:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Jax. He enjoys ice cream so much he gets ...,,,,"https://gofundme.com/ydvmve-surgery-for-jax,ht...",13,10,Jax,,,,,2056,11750
7,890729181411237888,,,2017-07-28 00:22:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",When you watch your owner call another dog a g...,,,,https://twitter.com/dog_rates/status/890729181...,13,10,,,,,,18798,64980
8,890609185150312448,,,2017-07-27 16:25:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Zoey. She doesn't want to be one of th...,,,,https://twitter.com/dog_rates/status/890609185...,13,10,Zoey,,,,,4243,27593
9,890240255349198849,,,2017-07-26 15:59:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Cassie. She is a college pup. Studying...,,,,https://twitter.com/dog_rates/status/890240255...,14,10,Cassie,doggo,,,,7368,31675


### 4 - The dog statuses in weratedogs could be in a single column

#### What will be done:

We will create a column named dog_status in the **weratedogs** dataset and fill in the correct status for each row.

### Code:

In [490]:
doggo = weratedogs_clean['tweet_id'].loc[weratedogs_clean['doggo'] != 'None']
floofer = weratedogs_clean['tweet_id'].loc[weratedogs_clean['floofer'] != 'None']
pupper = weratedogs_clean['tweet_id'].loc[weratedogs_clean['pupper'] != 'None']
puppo = weratedogs_clean['tweet_id'].loc[weratedogs_clean['puppo'] != 'None']

weratedogs_clean

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,retweet_count,favorite_count
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,,8476,38486
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,,6238,32985
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,,4134,24833
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,,8601,41863
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,,9331,40022
5,891087950875897856,,,2017-07-29 00:08:17 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a majestic great white breaching ...,,,,https://twitter.com/dog_rates/status/891087950...,13,10,,,,,,3094,20074
6,890971913173991426,,,2017-07-28 16:27:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Jax. He enjoys ice cream so much he gets ...,,,,"https://gofundme.com/ydvmve-surgery-for-jax,ht...",13,10,Jax,,,,,2056,11750
7,890729181411237888,,,2017-07-28 00:22:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",When you watch your owner call another dog a g...,,,,https://twitter.com/dog_rates/status/890729181...,13,10,,,,,,18798,64980
8,890609185150312448,,,2017-07-27 16:25:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Zoey. She doesn't want to be one of th...,,,,https://twitter.com/dog_rates/status/890609185...,13,10,Zoey,,,,,4243,27593
9,890240255349198849,,,2017-07-26 15:59:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Cassie. She is a college pup. Studying...,,,,https://twitter.com/dog_rates/status/890240255...,14,10,Cassie,doggo,,,,7368,31675
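
The cell above only collects the ids for each status. One possible way to build the dog_status column itself is sketched below on made-up rows; the helper `combine` and the choice to join multiple statuses with a comma are assumptions for illustration, not the notebook's method:

```python
import pandas as pd

# Made-up rows using the same 'None' placeholder as the archive
dogs = pd.DataFrame({
    "doggo":   ["doggo", "None", "None", "doggo"],
    "floofer": ["None", "None", "None", "None"],
    "pupper":  ["None", "pupper", "None", "pupper"],
    "puppo":   ["None", "None", "None", "None"],
})

status_cols = ["doggo", "floofer", "pupper", "puppo"]


def combine(row):
    # Collect the statuses that are set; join with a comma if there are several
    found = [value for value in row if value != "None"]
    return ",".join(found) if found else None


dogs["dog_status"] = dogs[status_cols].apply(combine, axis=1)

print(dogs["dog_status"].tolist())
```

Rows with no status end up with a missing value, and rows with two statuses keep both, so no information from the four original columns is lost.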
