# Data collector for the *Running Shoes* on the RunRepeat website

---
### Rodrigo Fragoso 
- [**LinkedIn**](https://www.linkedin.com/in/rodrigo-a-fragoso/) <br/>
- **Email** : rodrigoandradefragoso@gmail.com <br/>

### Summary
- #### After extracting the link for each shoe, we visit its page and collect all the information available about it;
- #### The expected output is a range of information about each shoe, from prices to *reviews* written by the community.
---

<a id='top'></a>
## Table of contents

[1 - Importing the libraries](#t1)

[2 - Collecting data from each shoe's page](#t2)

[3 - Processing the raw data](#t3)

[4 - Checking the result](#t4)


##     

<a id='t1'></a>
## 1 - Importing the libraries
- [Table of contents](#top)   
    - [Next](#t2)

### To start the extraction we need a few specific libraries, imported in the cell below:
- ##### pandas: a fast, powerful tool for manipulating and analysing data through the *dataframe* format;
- ##### re: module for text-matching operations using regular expressions;
- ##### time: module used mainly to measure processing time and to create *delays*;
- ##### requests: HTTP library used to download each page's source code;
- ##### bs4 (Beautiful Soup 4): library for extracting data from HTML and XML files, used as the *parser* to navigate the pages downloaded with requests;
- ##### tqdm: used to track the progress of loops;
- ##### glob: module that finds files in a folder by name pattern;
- ##### json: module used to serialize each shoe's data as JSON.

In [15]:
import pandas as pd
import re
import time

import requests as rq
import bs4
import tqdm
import glob
import json

##     

<a id='t2'></a>
## 2 - Collecting data from each shoe's page
- [Table of contents](#top) 
    - [Previous](#t1)
    - [Next](#t3)

### In this step, we visit each shoe's page and extract as much information as possible that can describe the shoe analytically.
- #### From the file created in the first process, we recover the link and the name of each shoe;
- #### We inspect the head of the *dataframe* to make sure it was loaded correctly.

In [16]:
df = pd.read_json("./dados_json/parsed_running_shoes.json", lines=True)
df.head(7)

Unnamed: 0,link,name
0,/brooks-adrenaline-gts-19?selected_color=422261,Brooks Adrenaline GTS 19
1,/brooks-adrenaline-gts-19?selected_color=422261,Brooks Adrenaline GTS 19
2,/brooks-ghost-12?selected_color=799577,Brooks Ghost 12
3,/brooks-ghost-12?selected_color=799577,Brooks Ghost 12
4,/brooks-glycerin-17?selected_color=642780,Brooks Glycerin 17
5,/brooks-glycerin-17?selected_color=642780,Brooks Glycerin 17
6,/nike-air-zoom-pegasus-36?selected_color=712129,Nike Air Zoom Pegasus 36


### As we can see, there are duplicated links and names. To solve this we use pandas' unique() method, which returns each distinct value only once.
- #### After that, we are left with a list of 2277 shoes to explore.

In [17]:
lista_de_links = df['link'].unique()
len(lista_de_links)

2277
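
The deduplication step can be sketched on a toy frame. This is a minimal illustration, assuming a hypothetical `toy` dataframe that mirrors the duplicated rows shown above; `drop_duplicates` is shown only as an alternative and is not what the notebook itself uses.

```python
import pandas as pd

# Hypothetical toy data mirroring the duplicated rows in the head() output above
toy = pd.DataFrame({
    "link": [
        "/brooks-adrenaline-gts-19?selected_color=422261",
        "/brooks-adrenaline-gts-19?selected_color=422261",
        "/brooks-ghost-12?selected_color=799577",
    ],
    "name": ["Brooks Adrenaline GTS 19", "Brooks Adrenaline GTS 19", "Brooks Ghost 12"],
})

# unique() keeps one copy of each link, in order of first appearance
unique_links = toy["link"].unique()

# drop_duplicates() is an alternative that keeps the name alongside each link
unique_rows = toy.drop_duplicates(subset="link")
```

`unique()` returns a plain array of links, which is enough for the download loop; `drop_duplicates` would be the choice if the shoe name had to travel with the link.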

### With this list, we can download the pages we want.
- #### Each shoe's page is saved to its own file;
- #### When saving the file, a regex cleans up the link so that the file name contains only characters that Windows allows.


In [None]:
url = "https://runrepeat.com{link}"

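for link in lista_de_links:
    urll = url.format(link=link)
    print(urll)
    response = rq.get(urll)
    
    # Keep only the slug between the leading "/" and the "?" of the query string
    link_name = re.search(r"(?<=/)(.*)(\?)", link).group(1)

    # One HTML file per shoe
    with open("./dados_brutos/shoes_{}.html".format(link_name), 'w+',encoding="utf-8") as output:
        output.write(response.text)
    # Pause between requests so the server is not overloaded
    time.sleep(2)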
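
As a rough sketch of the file-naming idea, the helper below turns a relative link into a file name. `link_to_filename` is a hypothetical name, and the handling of links without a query string and of Windows-forbidden characters is an assumption added for illustration, not part of the original loop.

```python
import re

def link_to_filename(link):
    """Hypothetical helper: turn a relative shoe link into a Windows-safe file name."""
    # Drop the leading "/" and everything from the first "?" onward
    slug = link.lstrip("/").split("?", 1)[0]
    # Replace any character Windows forbids in file names with "_"
    slug = re.sub(r'[<>:"/\\|?*]', "_", slug)
    return "shoes_{}.html".format(slug)

name = link_to_filename("/brooks-ghost-12?selected_color=799577")
```

Unlike the `re.search(...).group(1)` call in the loop, this version does not fail when a link has no `?` in it.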

##     

<a id='t3'></a>
## 3 - Processing the raw data
- [Table of contents](#top) 
    - [Previous](#t2)
    - [Next](#t4)

### Browsing the pages shows a wide variety of information, as expected from RunRepeat. Reaching the desired *tags* therefore took many attempts and tests.
- #### The code may look complex, but it is only the logic needed to pull the required data out of the HTML. Depending on the page, working out this logic can be very quick or take hours or days;
- #### Regex was also used to help search for specific pieces of text;
- #### Finally, the raw data was saved as JSON.
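
The class-based lookup used in this step can be sketched on a small fragment. The HTML below is hypothetical and far simpler than the real RunRepeat markup, which may differ; only the `find_all` call with a compiled regex mirrors the notebook.

```python
import re
import bs4

# Hypothetical HTML fragment; the real RunRepeat markup may differ
html = """
<div>
  <span class="price-value">130</span>
  <span class="weight-value big">283</span>
  <span class="label">ignored</span>
</div>
"""

parsed = bs4.BeautifulSoup(html, "html.parser")

data = {}
# A compiled regex matches any element with at least one class containing "-value"
for e in parsed.find_all(attrs={"class": re.compile(r"-value")}):
    # e["class"] is a list of class names; join them to build a column name
    colname = "_".join(e["class"])
    data[colname] = e.text.strip()
```

Joining every class name into the column name keeps the keys unique when two elements share a value class but differ in a second class.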


In [9]:
with open("./dados_json/parsed_shoes_info.json", 'w+') as output:
    for shoes_file in tqdm.tqdm_notebook(sorted(glob.glob("./dados_brutos/shoes*"))):
        with open(shoes_file, 'r+',encoding="utf-8") as inp:
            page_html = inp.read()
            parsed = bs4.BeautifulSoup(page_html, 'html.parser')

            class_value = parsed.find_all(attrs={"class":re.compile(r"-value")})
            class_fact = parsed.find_all(attrs={"class":re.compile(r"-fact")})
            class_ranktext= parsed.find_all(attrs={"class":re.compile(r"rank-text")})
            class_expertreview = parsed.find_all(attrs={"class":re.compile(r"rr-reviews-score-average")})
            class_reasons=parsed.find_all(attrs={"class":re.compile(r"gb-w-title")})
            class_good=parsed.find_all(attrs={"id":"the_good"})
            class_bad=parsed.find_all(attrs={"id":"the_bad"})


            data = dict()

            for e in class_value:
                colname = "_".join(e['class'])
                data[colname] = e.text.strip()

            for e in class_fact:
                colname = e.text.strip()
                # The score is read from the second class of the bar <span>
                bar = e.find("span", {"class": re.compile(r"rating-fact-bar-value-\d+")})
                if bar is not None:
                    data[colname] = bar['class'][1]
                        
                        
            for e in class_fact:
                if e.find("span",{"class":"label-rating-fact"}) != None:
                    colname =  e.find("span",{"class":"label-rating-fact"}).text.strip()
                    if e.find("span",{"class":"rating-value"}) != None:
                        data[colname] = e.find("span",{"class":"rating-value"}).text.strip()
                        
            for e in class_ranktext:
                colname = e.text.replace('\n','').replace('           ',' ').strip()
                data[colname] = 1

            for e in class_expertreview:
                colname = "_".join(e['class'])
                data[colname] = e.text.strip()
            
            for e in class_reasons:
                if re.compile(r'(\d+)').match(e.text.replace('\n','')) != None:
                    colname = "_".join(re.compile('[a-z]+').findall(e.text.replace('\n','')))
                    data[colname] = re.compile(r'(\d+)').match(e.text.replace('\n','')).group()
               
            # Concatenate every <li> under the "the_good" block into one text field
            write = ""
            for e in class_good:
                for li in e.find_all("li"):
                    write = write + " " + li.text.strip()
                data["good_reasons_to_buy"] = write

            # Same for the "the_bad" block
            write = ""
            for e in class_bad:
                for li in e.find_all("li"):
                    write = write + " " + li.text.strip()
                data["bad_reasons_to_buy"] = write

            output.write("{}\n".format(json.dumps(data)))





#### In this step, the focus was on finding the *features* that best represent the product, along with the site's own analyses, which can be very rich.
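
The storage layout used here, one JSON object per line, can be sketched with the standard library alone. The two records are hypothetical, and an in-memory buffer stands in for the file on disk.

```python
import io
import json

# Hypothetical records; real pages yield many more keys
records = [
    {"price-value": "130", "good_reasons_to_buy": "Comfortable"},
    {"price-value": "120", "weight-value": "283"},
]

# Write one JSON object per line (the JSON Lines layout)
buffer = io.StringIO()
for record in records:
    buffer.write("{}\n".format(json.dumps(record)))

# Read it back line by line; keys missing from a record are simply absent
buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]
```

Because each shoe can expose a different set of keys, this layout lets every line carry its own fields, and `pd.read_json(..., lines=True)` fills the gaps with missing values when building the dataframe.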

##     

<a id='t4'></a>
## 4 - Checking the result
- [Table of contents](#top)   
    - [Previous](#t3)

In [10]:
df = pd.read_json("./dados_json/parsed_shoes_info.json", lines=True)
df.shape

(3, 67)

In [13]:
pd.set_option("display.max_columns", 676)
df

' Many people claimed that the in-shoe experience was comfortable right out of the box. The façade didn’t have a bulky look, and a lot of testers liked that aspect. The upper unit gained favor for being breathable and flexible. Some runners felt that the pronation support given by the Brooks Adrenaline GTS 19’s midsole was more substantial than the ones in the previous iterations. The general quality of the components wasn’t overshadowed by the price, some runners noted; they felt that this shoe had an affordable starting price. Purchasers liked the fact that, for a stability shoe, the Adrenaline GTS 19 didn’t have a heavy build. The upper offered smooth and non-irritating coverage, many runners stated.'

In [46]:
teste=df.columns.str.contains(r"A top \d+%")
df.columns[teste]



Index(['A top 1% best Road running shoe', 'A top 1% best Trail running shoe',
       'A top 10% best Road running shoe', 'A top 10% best Trail running shoe',
       'A top 2% best Road running shoe', 'A top 2% best Trail running shoe',
       'A top 3% best Road running shoe', 'A top 3% best Trail running shoe',
       'A top 4% best Road running shoe', 'A top 4% best Trail running shoe',
       'A top 5% best Road running shoe', 'A top 5% best Trail running shoe',
       'A top 6% best Road running shoe', 'A top 6% best Trail running shoe',
       'A top 7% best Road running shoe', 'A top 7% best Trail running shoe',
       'A top 8% best Road running shoe', 'A top 8% best Trail running shoe',
       'A top 9% best Road running shoe', 'A top 9% best Trail running shoe'],
      dtype='object')
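
Selecting columns by a name pattern and collapsing them into a single flag can be sketched as follows. The `toy` frame and the `has_top_badge` column are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical frame: each "A top N% ..." column is 1 where the badge appeared, NaN otherwise
toy = pd.DataFrame({
    "A top 1% best Road running shoe": [1.0, None, None],
    "A top 10% best Trail running shoe": [None, 1.0, None],
    "price-value": ["130", "120", "99"],
})

# The pattern has no capture group, so str.contains emits no match-group warning
mask = toy.columns.str.contains(r"A top \d+%")
badge_cols = toy.columns[mask]

# Collapse the sparse badge columns into a single 0/1 flag per shoe
toy["has_top_badge"] = toy[badge_cols].fillna(0).max(axis=1).astype(int)
```

Taking the row-wise maximum after filling missing values with 0 yields 1 whenever a shoe carries at least one badge.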

In [17]:
teste=df.columns.str.contains(r"Better rated than the previous version")
teste2=df.columns[teste]

In [36]:
teste2.shape
# fillna() returns a new frame, so the assignments below do not touch a slice of df
h=df[teste2].fillna(0)
h['mean']=h.mean(axis=1)
h2=h.copy()

In [29]:
h['is_a_upgraded_version']=h['mean'].map(lambda x: 1 if x>0 else 0)



### 

In [32]:
h.sort_values(by='mean',ascending=False).head()

Unnamed: 0,Better rated than the previous version Adidas Adistar Boost 2,Better rated than the previous version Adidas Adizero Boston Boost 5,Better rated than the previous version Adidas Adizero Tempo 8,Better rated than the previous version Adidas AlphaBounce RC,Better rated than the previous version Adidas Climachill Cosmic Boost,Better rated than the previous version Adidas Duramo 8,Better rated than the previous version Adidas Galaxy 3,Better rated than the previous version Adidas Kanadia 7,Better rated than the previous version Adidas Madoru,Better rated than the previous version Adidas Pureboost 2.0,Better rated than the previous version Adidas Supernova Sequence Boost 7,Better rated than the previous version Altra Lone Peak 2.0,Better rated than the previous version Altra Lone Peak 3.5,Better rated than the previous version Altra Olympus 2.5,Better rated than the previous version Altra Olympus 3.0,Better rated than the previous version Altra Paradigm 3.0,Better rated than the previous version Altra Paradigm 4.0,Better rated than the previous version Altra Superior 3.0,Better rated than the previous version Altra Superior 3.5,Better rated than the previous version Altra Timp,Better rated than the previous version Altra Torin 2.5,Better rated than the previous version Altra Torin 3.0,Better rated than the previous version Asics DynaFlyte 2,Better rated than the previous version Asics FuzeX Lyte,Better rated than the previous version Asics GT 2000 4,Better rated than the previous version Asics GT 2000 5 Lite-Show,Better rated than the previous version Asics GT 2000 6,Better rated than the previous version Asics GT 3000 3,Better rated than the previous version Asics Gel Contend 3,Better rated than the previous version Asics Gel Cumulus 17,Better rated than the previous version Asics Gel DS Trainer 23,Better rated than the previous version Asics Gel Exalt 3,Better rated than the previous version Asics Gel Excite 3,Better rated than the previous version Asics Gel 
Excite 4,Better rated than the previous version Asics Gel Flux 3,Better rated than the previous version Asics Gel Foundation 12,Better rated than the previous version Asics Gel FujiAttack 5,Better rated than the previous version Asics Gel FujiTrabuco 6,Better rated than the previous version Asics Gel Kahana 7,Better rated than the previous version Asics Gel Kayano 22,Better rated than the previous version Asics Gel Kayano 23,Better rated than the previous version Asics Gel Kayano 24,Better rated than the previous version Asics Gel Kayano 25,Better rated than the previous version Asics Gel Nimbus 18,Better rated than the previous version Asics Gel Nimbus 19,Better rated than the previous version Asics Gel Nimbus 20,Better rated than the previous version Asics Gel Noosa Tri 10,Better rated than the previous version Asics Gel Patriot 7,Better rated than the previous version Asics Gel Pursue 2,Better rated than the previous version Asics Gel Quantum 180 2,Better rated than the previous version Asics Gel Quantum 180 3,Better rated than the previous version Asics Gel Quantum 360,Better rated than the previous version Asics Gel Quantum 360 Knit,Better rated than the previous version Asics Gel Sonoma 3,Better rated than the previous version Asics Gel Surveyor 3,Better rated than the previous version Asics Gel Venture 4,Better rated than the previous version Asics Gel Zaraca 4,Better rated than the previous version Asics Noosa FF,Better rated than the previous version Asics Roadhawk FF,Better rated than the previous version Brooks Adrenaline ASR 12,Better rated than the previous version Brooks Adrenaline GTS 15,Better rated than the previous version Brooks Adrenaline GTS 16,Better rated than the previous version Brooks Adrenaline GTS 17,Better rated than the previous version Brooks Adrenaline GTS 18,Better rated than the previous version Brooks Aduro 3,Better rated than the previous version Brooks Beast 16,Better rated than the previous version Brooks Cascadia 10,Better 
rated than the previous version Brooks Cascadia 12 GTX,Better rated than the previous version Brooks Ghost 10 GTX,Better rated than the previous version Brooks Ghost 7,Better rated than the previous version Brooks Ghost 9 GTX,Better rated than the previous version Brooks Glycerin 14,Better rated than the previous version Brooks Glycerin 16,Better rated than the previous version Brooks Launch 2,Better rated than the previous version Brooks Launch 3,Better rated than the previous version Brooks Launch 4,Better rated than the previous version Brooks Launch 5,Better rated than the previous version Brooks Neuro 2,Better rated than the previous version Brooks PureCadence 4,Better rated than the previous version Brooks PureCadence 5,Better rated than the previous version Brooks PureFlow 4,Better rated than the previous version Brooks PureFlow 5,Better rated than the previous version Brooks PureGrit 5,Better rated than the previous version Brooks Racer ST 5,Better rated than the previous version Brooks Ravenna 7,Better rated than the previous version Brooks Ravenna 8,Better rated than the previous version Brooks Ravenna 9,Better rated than the previous version Brooks Revel,Better rated than the previous version Brooks Trance 13,Better rated than the previous version Brooks Transcend 2,Better rated than the previous version Brooks Transcend 5,Better rated than the previous version Hoka One One Arahi,Better rated than the previous version Hoka One One Challenger 4 ATR,Better rated than the previous version Hoka One One Clayton,Better rated than the previous version Hoka One One Clifton 2,Better rated than the previous version Hoka One One Clifton 4,Better rated than the previous version Hoka One One Hupana,Better rated than the previous version Hoka One One Speed Instinct,Better rated than the previous version Hoka One One Speedgoat,Better rated than the previous version Hoka One One Speedgoat 2,Better rated than the previous version Hoka One One Tracer,Better rated than the 
previous version La Sportiva Bushido,Better rated than the previous version Merrell All Out Crush Tough Mudder,Better rated than the previous version Merrell Trail Glove 2,Better rated than the previous version Merrell Vapor Glove 2,Better rated than the previous version Mizuno Wave Catalyst,Better rated than the previous version Mizuno Wave Creation 18,Better rated than the previous version Mizuno Wave Daichi 2,Better rated than the previous version Mizuno Wave Enigma 5,Better rated than the previous version Mizuno Wave Hitogami 3,Better rated than the previous version Mizuno Wave Horizon,Better rated than the previous version Mizuno Wave Inspire 12,Better rated than the previous version Mizuno Wave Inspire 13,Better rated than the previous version Mizuno Wave Inspire 14,Better rated than the previous version Mizuno Wave Mujin 2,Better rated than the previous version Mizuno Wave Paradox 2,Better rated than the previous version Mizuno Wave Prophecy 5,Better rated than the previous version Mizuno Wave Rider 18,Better rated than the previous version Mizuno Wave Rider 19,Better rated than the previous version Mizuno Wave Rider 20,Better rated than the previous version Mizuno Wave Sayonara 3,Better rated than the previous version Mizuno Wave Shadow,Better rated than the previous version Mizuno Wave Sky,Better rated than the previous version Mizuno Wave Ultima 6,Better rated than the previous version Mizuno Wave Ultima 8,Better rated than the previous version New Balance 1080 v5,Better rated than the previous version New Balance 1400 v5,Better rated than the previous version New Balance 1500 v2,Better rated than the previous version New Balance 1500 v3,Better rated than the previous version New Balance 680 v2,Better rated than the previous version New Balance 840 v3,Better rated than the previous version New Balance 860 v6,Better rated than the previous version New Balance 860 v7,Better rated than the previous version New Balance 880 v4,Better rated than the previous 
version New Balance 880 v5,Better rated than the previous version New Balance 880 v7,Better rated than the previous version New Balance 990 v3,Better rated than the previous version New Balance 990 v4,Better rated than the previous version New Balance Fresh Foam 1080 v7,Better rated than the previous version New Balance Fresh Foam 1080 v8,Better rated than the previous version New Balance Fresh Foam 980,Better rated than the previous version New Balance Fresh Foam Boracay,Better rated than the previous version New Balance Fresh Foam Cruz Sport,Better rated than the previous version New Balance Fresh Foam Gobi Trail,Better rated than the previous version New Balance Fresh Foam Hierro,Better rated than the previous version New Balance Fresh Foam Vongo,Better rated than the previous version New Balance Fresh Foam Zante v3,Better rated than the previous version New Balance FuelCore Sonic,Better rated than the previous version New Balance Vazee Pace,Better rated than the previous version New Balance Vazee Prism,Better rated than the previous version New Balance Vazee Rush,Better rated than the previous version Newton Distance IV,Better rated than the previous version Newton Gravity V,Better rated than the previous version Nike Air Relentless 4,Better rated than the previous version Nike Air Relentless 5,Better rated than the previous version Nike Air Zoom Pegasus 33,Better rated than the previous version Nike Air Zoom Pegasus 34,Better rated than the previous version Nike Air Zoom Structure 19,Better rated than the previous version Nike Air Zoom Structure 20,Better rated than the previous version Nike Air Zoom Structure 20 Shield,Better rated than the previous version Nike Air Zoom Vomero 11,Better rated than the previous version Nike Air Zoom Wildhorse 4,Better rated than the previous version Nike Air Zoom Winflo 2,Better rated than the previous version Nike Air Zoom Winflo 3,Better rated than the previous version Nike Downshifter 5,Better rated than the previous 
version Nike FS Lite Run 3,Better rated than the previous version Nike Flex Fury,Better rated than the previous version Nike Flex RN 2016,Better rated than the previous version Nike Flex Run 2014,Better rated than the previous version Nike Free 4.0,Better rated than the previous version Nike Free Flyknit 3.0,Better rated than the previous version Nike Free RN,Better rated than the previous version Nike Free RN 2017,Better rated than the previous version Nike Free RN 2017 Shield,Better rated than the previous version Nike Free RN Distance,Better rated than the previous version Nike Free RN Flyknit,Better rated than the previous version Nike Free RN Flyknit 2017,Better rated than the previous version Nike Free RN Motion Flyknit,Better rated than the previous version Nike LunarEpic Low Flyknit,Better rated than the previous version Nike LunarGlide 8,Better rated than the previous version Nike Revolution 3,Better rated than the previous version Nike Zoom Streak 6,Better rated than the previous version Nike Zoom Terra Kiger 2,Better rated than the previous version Puma Faas 300 v4,Better rated than the previous version Reebok All Terrain Super 2.0,Better rated than the previous version Reebok Harmony Road 2,Better rated than the previous version Reebok Print Run 2.0,Better rated than the previous version Reebok Speedlux 2.0,Better rated than the previous version Salomon S-Lab Sense Ultra 5,Better rated than the previous version Salomon S-Lab Sonic,Better rated than the previous version Salomon S-Lab Speed,Better rated than the previous version Salomon S-Lab XA Alpine,Better rated than the previous version Salomon Sense Mantra 3,Better rated than the previous version Salomon Sense Pro,Better rated than the previous version Salomon Sense Ride,Better rated than the previous version Salomon Speedcross 4,Better rated than the previous version Salomon Speedcross Pro,Better rated than the previous version Salomon Speedcross Vario,Better rated than the previous version Salomon 
Wings Flyte,Better rated than the previous version Saucony Cohesion 11,Better rated than the previous version Saucony Cohesion 9,Better rated than the previous version Saucony Excursion TR 10,Better rated than the previous version Saucony Excursion TR 11,Better rated than the previous version Saucony Freedom ISO,Better rated than the previous version Saucony Guide 9,Better rated than the previous version Saucony Guide ISO,Better rated than the previous version Saucony Hurricane ISO 3,Better rated than the previous version Saucony Kinvara 6,Better rated than the previous version Saucony Kinvara 7,Better rated than the previous version Saucony Kinvara 8,Better rated than the previous version Saucony Kinvara 9,Better rated than the previous version Saucony Omni 14,Better rated than the previous version Saucony Peregrine 7,Better rated than the previous version Saucony Peregrine 7 ICE+,Better rated than the previous version Saucony Ride 8,Better rated than the previous version Saucony Ride 9,Better rated than the previous version Saucony Ride ISO,Better rated than the previous version Saucony Triumph ISO 2,Better rated than the previous version Saucony Triumph ISO 3,Better rated than the previous version Saucony Triumph ISO 4,Better rated than the previous version Saucony Zealot ISO 2,Better rated than the previous version Skechers GOmeb Razor,Better rated than the previous version Skechers GOmeb Speed 3,Better rated than the previous version Skechers GOmeb Speed 4,Better rated than the previous version Skechers GOrun 4,Better rated than the previous version Skechers GOrun 600,Better rated than the previous version Skechers GOrun Ride 6,Better rated than the previous version Skechers GOtrail Ultra 3,Better rated than the previous version The North Face Ultra Cardiac,Better rated than the previous version Topo Athletic Hydroventure,Better rated than the previous version Topo Athletic Terraventure,Better rated than the previous version Under Armour Charged Bandit 
2,Better rated than the previous version Under Armour Charged Escape,Better rated than the previous version Under Armour Fat Tire 2,Better rated than the previous version Under Armour HOVR Sonic,Better rated than the previous version Under Armour Micro G Speed Swift,Better rated than the previous version Under Armour SpeedForm Gemini 2,mean,is_a_upgraded_version
663,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004219,1
330,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004219,1
1035,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004219,1
1942,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004219,1
2077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004219,1


In [33]:
h.sort_values(by='mean',ascending=False).tail()

Unnamed: 0,Better rated than the previous version Adidas Adistar Boost 2,Better rated than the previous version Adidas Adizero Boston Boost 5,Better rated than the previous version Adidas Adizero Tempo 8,Better rated than the previous version Adidas AlphaBounce RC,Better rated than the previous version Adidas Climachill Cosmic Boost,Better rated than the previous version Adidas Duramo 8,Better rated than the previous version Adidas Galaxy 3,Better rated than the previous version Adidas Kanadia 7,Better rated than the previous version Adidas Madoru,Better rated than the previous version Adidas Pureboost 2.0,Better rated than the previous version Adidas Supernova Sequence Boost 7,Better rated than the previous version Altra Lone Peak 2.0,Better rated than the previous version Altra Lone Peak 3.5,Better rated than the previous version Altra Olympus 2.5,Better rated than the previous version Altra Olympus 3.0,Better rated than the previous version Altra Paradigm 3.0,Better rated than the previous version Altra Paradigm 4.0,Better rated than the previous version Altra Superior 3.0,Better rated than the previous version Altra Superior 3.5,Better rated than the previous version Altra Timp,Better rated than the previous version Altra Torin 2.5,Better rated than the previous version Altra Torin 3.0,Better rated than the previous version Asics DynaFlyte 2,Better rated than the previous version Asics FuzeX Lyte,Better rated than the previous version Asics GT 2000 4,Better rated than the previous version Asics GT 2000 5 Lite-Show,Better rated than the previous version Asics GT 2000 6,Better rated than the previous version Asics GT 3000 3,Better rated than the previous version Asics Gel Contend 3,Better rated than the previous version Asics Gel Cumulus 17,Better rated than the previous version Asics Gel DS Trainer 23,Better rated than the previous version Asics Gel Exalt 3,Better rated than the previous version Asics Gel Excite 3,Better rated than the previous version Asics Gel 
Excite 4,Better rated than the previous version Asics Gel Flux 3,Better rated than the previous version Asics Gel Foundation 12,Better rated than the previous version Asics Gel FujiAttack 5,Better rated than the previous version Asics Gel FujiTrabuco 6,Better rated than the previous version Asics Gel Kahana 7,Better rated than the previous version Asics Gel Kayano 22,Better rated than the previous version Asics Gel Kayano 23,Better rated than the previous version Asics Gel Kayano 24,Better rated than the previous version Asics Gel Kayano 25,Better rated than the previous version Asics Gel Nimbus 18,Better rated than the previous version Asics Gel Nimbus 19,Better rated than the previous version Asics Gel Nimbus 20,Better rated than the previous version Asics Gel Noosa Tri 10,Better rated than the previous version Asics Gel Patriot 7,Better rated than the previous version Asics Gel Pursue 2,Better rated than the previous version Asics Gel Quantum 180 2,Better rated than the previous version Asics Gel Quantum 180 3,Better rated than the previous version Asics Gel Quantum 360,Better rated than the previous version Asics Gel Quantum 360 Knit,Better rated than the previous version Asics Gel Sonoma 3,Better rated than the previous version Asics Gel Surveyor 3,Better rated than the previous version Asics Gel Venture 4,Better rated than the previous version Asics Gel Zaraca 4,Better rated than the previous version Asics Noosa FF,Better rated than the previous version Asics Roadhawk FF,Better rated than the previous version Brooks Adrenaline ASR 12,Better rated than the previous version Brooks Adrenaline GTS 15,Better rated than the previous version Brooks Adrenaline GTS 16,Better rated than the previous version Brooks Adrenaline GTS 17,Better rated than the previous version Brooks Adrenaline GTS 18,Better rated than the previous version Brooks Aduro 3,Better rated than the previous version Brooks Beast 16,Better rated than the previous version Brooks Cascadia 10,Better 
rated than the previous version Brooks Cascadia 12 GTX,Better rated than the previous version Brooks Ghost 10 GTX,Better rated than the previous version Brooks Ghost 7,Better rated than the previous version Brooks Ghost 9 GTX,Better rated than the previous version Brooks Glycerin 14,Better rated than the previous version Brooks Glycerin 16,Better rated than the previous version Brooks Launch 2,Better rated than the previous version Brooks Launch 3,Better rated than the previous version Brooks Launch 4,Better rated than the previous version Brooks Launch 5,Better rated than the previous version Brooks Neuro 2,Better rated than the previous version Brooks PureCadence 4,Better rated than the previous version Brooks PureCadence 5,Better rated than the previous version Brooks PureFlow 4,Better rated than the previous version Brooks PureFlow 5,Better rated than the previous version Brooks PureGrit 5,Better rated than the previous version Brooks Racer ST 5,Better rated than the previous version Brooks Ravenna 7,Better rated than the previous version Brooks Ravenna 8,Better rated than the previous version Brooks Ravenna 9,Better rated than the previous version Brooks Revel,Better rated than the previous version Brooks Trance 13,Better rated than the previous version Brooks Transcend 2,Better rated than the previous version Brooks Transcend 5,Better rated than the previous version Hoka One One Arahi,Better rated than the previous version Hoka One One Challenger 4 ATR,Better rated than the previous version Hoka One One Clayton,Better rated than the previous version Hoka One One Clifton 2,Better rated than the previous version Hoka One One Clifton 4,Better rated than the previous version Hoka One One Hupana,Better rated than the previous version Hoka One One Speed Instinct,Better rated than the previous version Hoka One One Speedgoat,Better rated than the previous version Hoka One One Speedgoat 2,Better rated than the previous version Hoka One One Tracer,Better rated than the 
previous version La Sportiva Bushido,Better rated than the previous version Merrell All Out Crush Tough Mudder,Better rated than the previous version Merrell Trail Glove 2,Better rated than the previous version Merrell Vapor Glove 2,Better rated than the previous version Mizuno Wave Catalyst,Better rated than the previous version Mizuno Wave Creation 18,Better rated than the previous version Mizuno Wave Daichi 2,Better rated than the previous version Mizuno Wave Enigma 5,Better rated than the previous version Mizuno Wave Hitogami 3,Better rated than the previous version Mizuno Wave Horizon,Better rated than the previous version Mizuno Wave Inspire 12,Better rated than the previous version Mizuno Wave Inspire 13,Better rated than the previous version Mizuno Wave Inspire 14,Better rated than the previous version Mizuno Wave Mujin 2,Better rated than the previous version Mizuno Wave Paradox 2,Better rated than the previous version Mizuno Wave Prophecy 5,Better rated than the previous version Mizuno Wave Rider 18,Better rated than the previous version Mizuno Wave Rider 19,Better rated than the previous version Mizuno Wave Rider 20,Better rated than the previous version Mizuno Wave Sayonara 3,Better rated than the previous version Mizuno Wave Shadow,Better rated than the previous version Mizuno Wave Sky,Better rated than the previous version Mizuno Wave Ultima 6,Better rated than the previous version Mizuno Wave Ultima 8,Better rated than the previous version New Balance 1080 v5,Better rated than the previous version New Balance 1400 v5,Better rated than the previous version New Balance 1500 v2,Better rated than the previous version New Balance 1500 v3,Better rated than the previous version New Balance 680 v2,Better rated than the previous version New Balance 840 v3,Better rated than the previous version New Balance 860 v6,Better rated than the previous version New Balance 860 v7,Better rated than the previous version New Balance 880 v4,Better rated than the previous 
version New Balance 880 v5,Better rated than the previous version New Balance 880 v7,Better rated than the previous version New Balance 990 v3,Better rated than the previous version New Balance 990 v4,Better rated than the previous version New Balance Fresh Foam 1080 v7,Better rated than the previous version New Balance Fresh Foam 1080 v8,Better rated than the previous version New Balance Fresh Foam 980,Better rated than the previous version New Balance Fresh Foam Boracay,Better rated than the previous version New Balance Fresh Foam Cruz Sport,Better rated than the previous version New Balance Fresh Foam Gobi Trail,Better rated than the previous version New Balance Fresh Foam Hierro,Better rated than the previous version New Balance Fresh Foam Vongo,Better rated than the previous version New Balance Fresh Foam Zante v3,Better rated than the previous version New Balance FuelCore Sonic,Better rated than the previous version New Balance Vazee Pace,Better rated than the previous version New Balance Vazee Prism,Better rated than the previous version New Balance Vazee Rush,Better rated than the previous version Newton Distance IV,Better rated than the previous version Newton Gravity V,Better rated than the previous version Nike Air Relentless 4,Better rated than the previous version Nike Air Relentless 5,Better rated than the previous version Nike Air Zoom Pegasus 33,Better rated than the previous version Nike Air Zoom Pegasus 34,Better rated than the previous version Nike Air Zoom Structure 19,Better rated than the previous version Nike Air Zoom Structure 20,Better rated than the previous version Nike Air Zoom Structure 20 Shield,Better rated than the previous version Nike Air Zoom Vomero 11,Better rated than the previous version Nike Air Zoom Wildhorse 4,Better rated than the previous version Nike Air Zoom Winflo 2,Better rated than the previous version Nike Air Zoom Winflo 3,Better rated than the previous version Nike Downshifter 5,Better rated than the previous 
version Nike FS Lite Run 3,Better rated than the previous version Nike Flex Fury,Better rated than the previous version Nike Flex RN 2016,Better rated than the previous version Nike Flex Run 2014,Better rated than the previous version Nike Free 4.0,Better rated than the previous version Nike Free Flyknit 3.0,Better rated than the previous version Nike Free RN,Better rated than the previous version Nike Free RN 2017,Better rated than the previous version Nike Free RN 2017 Shield,Better rated than the previous version Nike Free RN Distance,Better rated than the previous version Nike Free RN Flyknit,Better rated than the previous version Nike Free RN Flyknit 2017,Better rated than the previous version Nike Free RN Motion Flyknit,Better rated than the previous version Nike LunarEpic Low Flyknit,Better rated than the previous version Nike LunarGlide 8,Better rated than the previous version Nike Revolution 3,Better rated than the previous version Nike Zoom Streak 6,Better rated than the previous version Nike Zoom Terra Kiger 2,Better rated than the previous version Puma Faas 300 v4,Better rated than the previous version Reebok All Terrain Super 2.0,Better rated than the previous version Reebok Harmony Road 2,Better rated than the previous version Reebok Print Run 2.0,Better rated than the previous version Reebok Speedlux 2.0,Better rated than the previous version Salomon S-Lab Sense Ultra 5,Better rated than the previous version Salomon S-Lab Sonic,Better rated than the previous version Salomon S-Lab Speed,Better rated than the previous version Salomon S-Lab XA Alpine,Better rated than the previous version Salomon Sense Mantra 3,Better rated than the previous version Salomon Sense Pro,Better rated than the previous version Salomon Sense Ride,Better rated than the previous version Salomon Speedcross 4,Better rated than the previous version Salomon Speedcross Pro,Better rated than the previous version Salomon Speedcross Vario,Better rated than the previous version Salomon 
Wings Flyte,Better rated than the previous version Saucony Cohesion 11,Better rated than the previous version Saucony Cohesion 9,Better rated than the previous version Saucony Excursion TR 10,Better rated than the previous version Saucony Excursion TR 11,Better rated than the previous version Saucony Freedom ISO,Better rated than the previous version Saucony Guide 9,Better rated than the previous version Saucony Guide ISO,Better rated than the previous version Saucony Hurricane ISO 3,Better rated than the previous version Saucony Kinvara 6,Better rated than the previous version Saucony Kinvara 7,Better rated than the previous version Saucony Kinvara 8,Better rated than the previous version Saucony Kinvara 9,Better rated than the previous version Saucony Omni 14,Better rated than the previous version Saucony Peregrine 7,Better rated than the previous version Saucony Peregrine 7 ICE+,Better rated than the previous version Saucony Ride 8,Better rated than the previous version Saucony Ride 9,Better rated than the previous version Saucony Ride ISO,Better rated than the previous version Saucony Triumph ISO 2,Better rated than the previous version Saucony Triumph ISO 3,Better rated than the previous version Saucony Triumph ISO 4,Better rated than the previous version Saucony Zealot ISO 2,Better rated than the previous version Skechers GOmeb Razor,Better rated than the previous version Skechers GOmeb Speed 3,Better rated than the previous version Skechers GOmeb Speed 4,Better rated than the previous version Skechers GOrun 4,Better rated than the previous version Skechers GOrun 600,Better rated than the previous version Skechers GOrun Ride 6,Better rated than the previous version Skechers GOtrail Ultra 3,Better rated than the previous version The North Face Ultra Cardiac,Better rated than the previous version Topo Athletic Hydroventure,Better rated than the previous version Topo Athletic Terraventure,Better rated than the previous version Under Armour Charged Bandit 
2,Better rated than the previous version Under Armour Charged Escape,Better rated than the previous version Under Armour Fat Tire 2,Better rated than the previous version Under Armour HOVR Sonic,Better rated than the previous version Under Armour Micro G Speed Swift,Better rated than the previous version Under Armour SpeedForm Gemini 2,mean,is_a_upgraded_version
796,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
795,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
794,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
793,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2247,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0


In [276]:
# Columns kept for the raw dataset.
colunas_selecionadas = ['watch-title', 'watch-view-count', 'watch-time-text', 'content_watch-info-tag-list', 'watch7-headline',
                        'watch7-user-header', 'watch8-sentiment-actions', 'og:image', 'og:image:width', 'og:image:height',
                        'og:description', 'og:video:width', 'og:video:height', 'og:video:tag', 'channel_link_0']

In [48]:
df[colunas_selecionadas].head()

Unnamed: 0,watch-title,watch-view-count,watch-time-text,content_watch-info-tag-list,watch7-headline,watch7-user-header,watch8-sentiment-actions,og:image,og:image:width,og:image:height,og:description,og:video:width,og:video:height,og:video:tag,channel_link_0
0,How to Become A Machine Learning Engineer | Ho...,28.028 visualizações,Publicado em 3 de set. de 2018,Educação,#MachineLearningAlgorithms #Datasciencecourse ...,Simplilearn\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarre...,28.028 visualizações\n\n\n\n\n\n\n\n601\n\nGos...,https://i.ytimg.com/vi/-5hEYRt8JE0/maxresdefau...,1280.0,720.0,"This video on ""How to become a Machine Learnin...",1280.0,720.0,simplilearn,/channel/UCsvqVGtbbyHaMoevxPAq9Fg
1,BLOOPERS - Behind The Scenes. | DATA SCIENCE x...,1.131 visualizações,Publicado em 16 de nov. de 2018,Pessoas e blogs,#FAIL #insidezalando\n\n\n\n BLOOPERS - Beh...,Inside Zalando\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarr...,1.131 visualizações\n\n\n\n\n\n\n\n20\n\nGosto...,https://i.ytimg.com/vi/-7GiiT0yEyk/maxresdefau...,1280.0,720.0,#FAIL :) Have fun - and join our teams: https:...,1280.0,720.0,employer branding,/channel/UCTPin8TK-KRSI9zo9FoxG0g
2,Michael I. Jordan: Machine Learning: Dynamical...,1.816 visualizações,Publicado em 2 de mai. de 2019,Licença de atribuição Creative Commons (reutil...,#purdue #michaelijordan #engineering\n\n\n\n ...,Purdue Engineering\n\n\n\n\n\n\n\n\n\n\n\n\n\n...,1.816 visualizações\n\n\n\n\n\n\n\n42\n\nGosto...,https://i.ytimg.com/vi/-8yYFdV5SOc/maxresdefau...,1280.0,720.0,2019 Purdue Engineering Distinguished Lecture ...,1280.0,720.0,electrical engineer,/channel/UC8FZ6dzFVkCACLH9YoMNFog
3,Best Deep Learning Tools - Welcome.AI,1.171 visualizações,Publicado em 13 de ago. de 2019,Ciência e tecnologia,Best Deep Learning Tools - Welcome.AI,Welcome.AI\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregan...,1.171 visualizações\n\n\n\n\n\n\n\n14\n\nGosto...,https://i.ytimg.com/vi/-9LLrwW1Vdo/maxresdefau...,1280.0,720.0,A collection of the 5 best deep learning tools...,1280.0,720.0,Watson,/channel/UC_215Y7rOAsqnFkO_hnpdIg
4,Kaggle Live-Coding: RNNs for Sarcasm Detection...,1.228 visualizações,Transmitido ao vivo em 30 de nov. de 2018,Ciência e tecnologia,Kaggle Live-Coding: RNNs for Sarcasm Detection...,Kaggle\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregando.....,1.228 visualizações\n\n\n\n\n\n\n\n28\n\nGosto...,https://i.ytimg.com/vi/-9U84J178OQ/maxresdefau...,1280.0,720.0,Join Kaggle data scientist Rachael live as she...,1280.0,720.0,CS,/channel/UCSNeZleDn9c74yQc-EKnVTA
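
Selecting with a list of names, as in `df[colunas_selecionadas]`, raises a `KeyError` in recent pandas versions if any listed column is absent. The sketch below shows one way to guard against that. The helper `select_existing` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def select_existing(df, wanted):
    """Return df restricted to the names in `wanted` that exist, keeping their order."""
    missing = [c for c in wanted if c not in df.columns]
    if missing:
        # Report instead of failing, so a partially scraped frame can still be inspected.
        print(f"Columns not found and skipped: {missing}")
    return df[[c for c in wanted if c in df.columns]]


# Toy frame standing in for the scraped data (values are illustrative only).
toy = pd.DataFrame({"watch-title": ["a"], "og:image": ["b"]})
subset = select_existing(toy, ["watch-title", "og:image", "channel_link_0"])
print(list(subset.columns))
```

Whether skipping missing columns is acceptable depends on the downstream steps. If every column is required, letting the `KeyError` surface may be the better choice.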


In [None]:
# Save the selected columns in Feather format (needs pyarrow and a default RangeIndex).
df[colunas_selecionadas].to_feather("raw_data.feather")

In [49]:
# Save the same selection as CSV; the index is written as an extra unnamed first column.
df[colunas_selecionadas].to_csv("raw_data_sem_labels.csv")
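
The `Unnamed: 0` column that appears in the outputs above is what pandas produces when a CSV saved with its index is read back. A minimal sketch of the effect, using an in-memory buffer and a toy frame (both assumptions made for illustration, not the notebook's data):

```python
import io

import pandas as pd

# Toy frame standing in for the selected columns (values are illustrative only).
toy = pd.DataFrame({"watch-title": ["a", "b"], "og:video:width": [1280.0, 1280.0]})

# Default behaviour: the index is written as a first column with an empty header.
buf_with_index = io.StringIO()
toy.to_csv(buf_with_index)
buf_with_index.seek(0)
reloaded_with_index = pd.read_csv(buf_with_index)

# With index=False only the data columns are written.
buf_no_index = io.StringIO()
toy.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
reloaded_no_index = pd.read_csv(buf_no_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_no_index.columns))
```

If the index carries no information, passing `index=False` to `to_csv` (or `index_col=0` to `read_csv`) avoids the extra column on reload.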