**NETFLIX ANALYSIS ALGORITHM**
1st Step: Data Acquisition
2nd Step: Preparing the Data

In [3]:
import pandas as pd

In [8]:
#1st Step - loading the data into memory
df_row = pd.read_csv('netflix_ds.csv')
df_row.head()

Unnamed: 0,Profile Name,Start Time,Duration,Attributes,Title,Supplemental Video Type,Device Type,Bookmark,Latest Bookmark,Country
0,Charlie,2013-03-20 5:17:53,0:00:05,,Star Trek: Deep Space Nine: Season 5: Empok No...,,Mac,0:00:05,Not latest view,US (United States)
1,Charlie,2013-03-20 4:27:45,0:44:31,,Star Trek: Deep Space Nine: Season 5: Blaze of...,,Mac,0:44:31,Not latest view,US (United States)
2,Charlie,2013-03-20 4:05:21,0:22:06,,Star Trek: Deep Space Nine: Season 5: Children...,,Mac,0:44:37,0:44:37,US (United States)
3,Charlie,2013-03-20 0:20:03,0:48:14,,The Invisible War,,Microsoft Xbox 360,0:53:18,0:53:18,US (United States)
4,Charlie,2013-03-20 0:10:31,0:04:51,,The Invisible War,,Mac,0:05:01,Not latest view,US (United States)


In [9]:
df_row.shape #shape - reports the DataFrame's dimensions (number_of_rows, number_of_columns)

(200, 10)

In [10]:
#2nd Step - Cleaning and preparing the data
#drop() - removes rows or columns from the DataFrame;
#the 'axis=1' parameter makes it remove the columns named in the list
df = df_row.drop(['Profile Name', 'Attributes', 'Supplemental Video Type', 'Device Type', 'Bookmark', 'Latest Bookmark', 'Country'], axis = 1)
df.head()

Unnamed: 0,Start Time,Duration,Title
0,2013-03-20 5:17:53,0:00:05,Star Trek: Deep Space Nine: Season 5: Empok No...
1,2013-03-20 4:27:45,0:44:31,Star Trek: Deep Space Nine: Season 5: Blaze of...
2,2013-03-20 4:05:21,0:22:06,Star Trek: Deep Space Nine: Season 5: Children...
3,2013-03-20 0:20:03,0:48:14,The Invisible War
4,2013-03-20 0:10:31,0:04:51,The Invisible War
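When only a few columns are needed, selecting them by name can be shorter than listing everything to drop. The sketch below compares both approaches on a small hand-made frame; the `raw` DataFrame and its two rows are assumptions for illustration, not the actual dataset.

```python
import pandas as pd

# Hypothetical two-row frame with a subset of the export's columns.
raw = pd.DataFrame({
    'Profile Name': ['Charlie', 'Charlie'],
    'Start Time': ['2013-03-20 5:17:53', '2013-03-20 4:27:45'],
    'Duration': ['0:00:05', '0:44:31'],
    'Title': ['The Invisible War', 'The Invisible War'],
    'Country': ['US (United States)', 'US (United States)'],
})

# Selecting the wanted columns gives the same result as dropping the rest.
selected = raw[['Start Time', 'Duration', 'Title']]
dropped = raw.drop(columns=['Profile Name', 'Country'])

print(selected.equals(dropped))
```

Selecting by name also keeps working if the export later gains extra columns, whereas a drop list would have to be updated.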


In [11]:
#reports the data type of each column in our DataFrame
df.dtypes

Start Time    object
Duration      object
Title         object
dtype: object

In [13]:
#convert the date and time fields to datetime and timedelta types so they can be handled properly.
df['Start Time'] = pd.to_datetime(df['Start Time'], utc=True)
df['Duration'] = pd.to_timedelta(df['Duration'])
df.dtypes

Start Time    datetime64[ns, UTC]
Duration          timedelta64[ns]
Title                      object
dtype: object
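To see what the two conversions do in isolation, here is a minimal sketch on two hand-written rows. The `sample` frame is an assumption that only mimics the string format of the export.

```python
import pandas as pd

# Hypothetical sample rows that mimic the export's string format.
sample = pd.DataFrame({
    'Start Time': ['2013-03-20 5:17:53', '2013-03-20 0:20:03'],
    'Duration': ['0:00:05', '0:48:14'],
})

# utc=True marks the parsed timestamps as timezone-aware UTC values.
sample['Start Time'] = pd.to_datetime(sample['Start Time'], utc=True)
# Strings in H:MM:SS form are parsed as elapsed time, not as a clock time.
sample['Duration'] = pd.to_timedelta(sample['Duration'])

# Once the column is a timedelta, arithmetic such as sum() works directly.
total = sample['Duration'].sum()
print(sample.dtypes)
print(total)
```

With the original `object` columns the same `sum()` would concatenate strings instead of adding durations, which is why the conversion comes before any analysis.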

**Some settings that may be useful**

- Changing the DataFrame index
- Setting the time zone by region (https://en.wikipedia.org/wiki/List_of_tz_database_time_zones)

- set_index(): makes the 'Start Time' column the DataFrame index
    df = df.set_index('Start Time')

- index.tz_convert(): converts the timestamps from UTC to the time zone named in its argument.
    df.index = df.index.tz_convert('America/Belem')

- reset_index(): restores the default integer index and turns 'Start Time' back into a regular column
    df = df.reset_index()

In [15]:
df = df.set_index('Start Time')
df.index = df.index.tz_convert('America/Belem')
df = df.reset_index()

In [16]:
df.head()

Unnamed: 0,Start Time,Duration,Title
0,2013-03-20 02:17:53-03:00,0 days 00:00:05,Star Trek: Deep Space Nine: Season 5: Empok No...
1,2013-03-20 01:27:45-03:00,0 days 00:44:31,Star Trek: Deep Space Nine: Season 5: Blaze of...
2,2013-03-20 01:05:21-03:00,0 days 00:22:06,Star Trek: Deep Space Nine: Season 5: Children...
3,2013-03-19 21:20:03-03:00,0 days 00:48:14,The Invisible War
4,2013-03-19 21:10:31-03:00,0 days 00:04:51,The Invisible War
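The same conversion can probably be done without touching the index, through the `.dt` accessor of a timezone-aware column. This is a sketch of that alternative, not the notebook's method; the one-row `demo` frame is an assumption that reuses the first timestamp shown above.

```python
import pandas as pd

# Hypothetical one-column frame; the value matches the first row shown above.
demo = pd.DataFrame({
    'Start Time': pd.to_datetime(['2013-03-20 5:17:53'], utc=True),
})

# .dt.tz_convert works directly on a timezone-aware column,
# so the index does not need to be changed and restored.
demo['Start Time'] = demo['Start Time'].dt.tz_convert('America/Belem')

local = demo['Start Time'].iloc[0]
print(local)
```

The printed value should agree with the first row of the converted table: the same instant, shown three hours behind UTC.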


In [21]:
# create a new DataFrame containing only the records whose title includes Star Trek
# regex=False - tells the function that the argument is plain text, not a regular expression.
stTrek = df[df['Title'].str.contains('Star Trek', regex = False)]
stTrek.head()

Unnamed: 0,Start Time,Duration,Title
0,2013-03-20 02:17:53-03:00,0 days 00:00:05,Star Trek: Deep Space Nine: Season 5: Empok No...
1,2013-03-20 01:27:45-03:00,0 days 00:44:31,Star Trek: Deep Space Nine: Season 5: Blaze of...
2,2013-03-20 01:05:21-03:00,0 days 00:22:06,Star Trek: Deep Space Nine: Season 5: Children...
5,2013-03-19 19:34:08-03:00,0 days 00:22:29,Star Trek: Deep Space Nine: Season 5: Children...
6,2013-03-19 02:07:46-03:00,0 days 00:33:10,Star Trek: Deep Space Nine: Season 5: Soldiers...
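For a search term like 'Star Trek' the `regex` flag makes no visible difference, but it matters as soon as the term contains characters with a special meaning in regular expressions. A small sketch, using made-up titles that are not from the dataset:

```python
import pandas as pd

# Hypothetical titles chosen to show the difference; they are not from the dataset.
titles = pd.Series(['Mr. Robot: Season 1', 'Mrs Robot', 'Star Trek: Voyager'])

# regex=False: the dot is matched as a literal character.
literal = titles.str.contains('Mr.', regex=False)
# regex=True (the default): the dot matches any single character.
pattern = titles.str.contains('Mr.', regex=True)

print(literal.tolist())
print(pattern.tolist())
```

Only the first title matches literally, while the regular expression also matches 'Mrs Robot'.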


In [18]:
stTrek.shape

(43, 3)

In [20]:
# removing viewing records that lasted one minute or less.
stTrek = stTrek[stTrek['Duration'] > pd.Timedelta(minutes=1)]
stTrek.shape

(38, 3)
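The effect of the duration filter is easier to check on a handful of rows. The sketch below uses an assumed `views` frame with invented titles and durations, and an explicit `pd.Timedelta` as the threshold.

```python
import pandas as pd

# Hypothetical durations: two short previews and two real viewing sessions.
views = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'Duration': pd.to_timedelta(['0:00:05', '0:01:00', '0:22:06', '0:44:31']),
})

min_duration = pd.Timedelta(minutes=1)
# A strict ">" also drops rows that last exactly one minute.
kept = views[views['Duration'] > min_duration]

print(kept['Title'].tolist())
```

Using `>=` instead would keep row 'B'; which comparison is appropriate depends on whether a one-minute view should count as watching.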

# Data Analysis
#### From here on comes the data analysis proper.

In [24]:
tmpStTrek = stTrek['Duration'].sum() #tmpStTrek is the total time spent watching Star Trek
print('The user spent', tmpStTrek, 'watching Star Trek!')

The user spent 0 days 20:02:03 watching Star Trek!
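A `Timedelta` prints in days and clock time, which can be awkward to read or plot. One option is to convert it to a plain number of hours; the sketch below rebuilds the total reported above by hand rather than recomputing it from the data.

```python
import pandas as pd

# The total reported above, rebuilt by hand for illustration.
total = pd.Timedelta(hours=20, minutes=2, seconds=3)

# total_seconds() gives a plain float, which is easier to format or plot.
hours = total.total_seconds() / 3600

print(f'{hours:.2f} hours')
```

In the notebook itself the same call would be applied to `tmpStTrek`.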
