<a href="https://colab.research.google.com/github/maxhuguenin/igti_python/blob/master/igti_python_modulo2_desafio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Problem Statement</h1>
Bicycles have gained considerable ground as a means of transportation in recent years, whether for environmental reasons, health reasons, or even traffic infrastructure. To encourage their use, cities around the world have implemented bike-sharing programs. In these systems, bicycles are picked up and returned at automated kiosks spread across various points of the city.

Bike-sharing platforms typically collect several kinds of data, such as trip duration and the start and end locations of each trip, among others. Combined with information about weather, traffic, and terrain, for example, these data enable a more robust analysis of bike sharing.

The collected data are described below:

<b>rec_id</b>: index of the rental record

<b>datetime</b>: date

<b>season</b>: season of the year (1: winter, 2: spring, 3: summer, 4: autumn), relative to the Northern Hemisphere

<b>year</b>: year (0: 2011, 1: 2012)

<b>month</b>: month (1 to 12)

<b>hour</b>: hour of the day (0 to 23)

<b>is_holiday</b>: boolean indicating a holiday

<b>weekday</b>: day of the week (0: Sunday, 1: Monday, ..., 6: Saturday)

<b>is_workingday</b>: boolean indicating a working day

<b>weather_condition</b>: weather condition (1: clear, 2: cloudy, 3: light rain, 4: heavy rain)

<b>temp</b>: temperature scaled between 0 and 1. Original value in degrees Celsius: -8 to 39

<b>atemp</b>: apparent ("feels-like") temperature scaled between 0 and 1. Original value in degrees Celsius: -16 to 50

<b>humidity</b>: relative humidity (0 to 1)

<b>windspeed</b>: wind speed scaled between 0 and 1 (original maximum: 67)

<b>casual</b>: number of rentals by casual users

<b>registered</b>: number of rentals by registered users

<b>total_count</b>: total count of rentals (casual + registered)
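
The scaled fields can be mapped back to physical units. The sketch below assumes that `temp` and `atemp` were min-max scaled over the ranges listed above and that `windspeed` was divided by its maximum; the descriptions state only the ranges, so the exact scaling is an assumption. The helper names `unscale_min_max` and `unscale_by_max` are hypothetical.

```python
# Original ranges as listed in the field descriptions.
TEMP_MIN_C, TEMP_MAX_C = -8.0, 39.0
ATEMP_MIN_C, ATEMP_MAX_C = -16.0, 50.0
WINDSPEED_MAX = 67.0


def unscale_min_max(scaled, lo, hi):
    """Invert min-max scaling: map a value in [0, 1] back to [lo, hi]."""
    return scaled * (hi - lo) + lo


def unscale_by_max(scaled, maximum):
    """Invert division by the maximum (assumed scaling for windspeed)."""
    return scaled * maximum


# Example values in the same range as the scaled columns.
temp_c = unscale_min_max(0.24, TEMP_MIN_C, TEMP_MAX_C)
atemp_c = unscale_min_max(0.2879, ATEMP_MIN_C, ATEMP_MAX_C)
wind = unscale_by_max(0.2239, WINDSPEED_MAX)

print(round(temp_c, 2), round(atemp_c, 2), round(wind, 2))
```

On a DataFrame, the same functions can be applied to whole columns, for example `unscale_min_max(df['temp'], TEMP_MIN_C, TEMP_MAX_C)`.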



In [1]:
import numpy as np
import pandas as pd

In [4]:
df = pd.read_csv("bike-sharing.csv")

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   datetime           17379 non-null  object 
 1   rec_id             17379 non-null  int64  
 2   season             17379 non-null  int64  
 3   year               17379 non-null  int64  
 4   month              17379 non-null  int64  
 5   hour               17379 non-null  int64  
 6   is_holiday         17379 non-null  int64  
 7   weekday            17379 non-null  int64  
 8   is_workingday      17379 non-null  int64  
 9   weather_condition  17379 non-null  int64  
 10  temp               17379 non-null  float64
 11  atemp              17379 non-null  float64
 12  humidity           17379 non-null  float64
 13  windspeed          17379 non-null  float64
 14  casual             17379 non-null  int64  
 15  registered         17379 non-null  int64  
 16  total_count        173
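
The summary above lists `datetime` with dtype `object`, meaning the dates were loaded as plain strings. A minimal sketch of parsing them while loading, using the `parse_dates` argument of `pd.read_csv`; the in-memory `csv_text` is a hypothetical three-column stand-in for `bike-sharing.csv`.

```python
import io

import pandas as pd

# Two rows in the layout of bike-sharing.csv, limited to three columns for brevity.
csv_text = """datetime,rec_id,total_count
2011-01-01,1,16
2012-12-31,17378,61
"""

# parse_dates converts the listed columns to a datetime dtype on load.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])
print(parsed.dtypes)

# With a datetime dtype, date parts are available through the .dt accessor.
print(parsed["datetime"].dt.year.tolist())
```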

In [6]:
df.describe()

Unnamed: 0,rec_id,season,year,month,hour,is_holiday,weekday,is_workingday,weather_condition,temp,atemp,humidity,windspeed,casual,registered,total_count
count,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0
mean,8690.0,2.50164,0.502561,6.537775,11.546752,0.02877,3.003683,0.682721,1.425283,0.496987,0.475775,0.627229,0.190098,35.676218,153.786869,189.463088
std,5017.0295,1.106918,0.500008,3.438776,6.914405,0.167165,2.005771,0.465431,0.639357,0.192556,0.17185,0.19293,0.12234,49.30503,151.357286,181.387599
min,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.02,0.0,0.0,0.0,0.0,0.0,1.0
25%,4345.5,2.0,0.0,4.0,6.0,0.0,1.0,0.0,1.0,0.34,0.3333,0.48,0.1045,4.0,34.0,40.0
50%,8690.0,3.0,1.0,7.0,12.0,0.0,3.0,1.0,1.0,0.5,0.4848,0.63,0.194,17.0,115.0,142.0
75%,13034.5,3.0,1.0,10.0,18.0,0.0,5.0,1.0,2.0,0.66,0.6212,0.78,0.2537,48.0,220.0,281.0
max,17379.0,4.0,1.0,12.0,23.0,1.0,6.0,1.0,4.0,1.0,1.0,1.0,0.8507,367.0,886.0,977.0


In [7]:
# How many rental records exist for the year 2011?
df[df['year'] == 0]

Unnamed: 0,datetime,rec_id,season,year,month,hour,is_holiday,weekday,is_workingday,weather_condition,temp,atemp,humidity,windspeed,casual,registered,total_count
0,2011-01-01,1,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0000,3,13,16
1,2011-01-01,2,1,0,1,1,0,6,0,1,0.22,0.2727,0.80,0.0000,8,32,40
2,2011-01-01,3,1,0,1,2,0,6,0,1,0.22,0.2727,0.80,0.0000,5,27,32
3,2011-01-01,4,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0000,3,10,13
4,2011-01-01,5,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0000,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8640,2011-12-31,8641,1,0,12,19,0,6,0,1,0.42,0.4242,0.54,0.2239,19,73,92
8641,2011-12-31,8642,1,0,12,20,0,6,0,1,0.42,0.4242,0.54,0.2239,8,63,71
8642,2011-12-31,8643,1,0,12,21,0,6,0,1,0.40,0.4091,0.58,0.1940,2,50,52
8643,2011-12-31,8644,1,0,12,22,0,6,0,1,0.38,0.3939,0.62,0.1343,2,36,38


In [8]:
# How many rental records exist for the year 2012?
df[df['year'] == 1]

Unnamed: 0,datetime,rec_id,season,year,month,hour,is_holiday,weekday,is_workingday,weather_condition,temp,atemp,humidity,windspeed,casual,registered,total_count
8645,2012-01-01,8646,1,1,1,0,0,0,0,1,0.36,0.3788,0.66,0.0000,5,43,48
8646,2012-01-01,8647,1,1,1,1,0,0,0,1,0.36,0.3485,0.66,0.1343,15,78,93
8647,2012-01-01,8648,1,1,1,2,0,0,0,1,0.32,0.3485,0.76,0.0000,16,59,75
8648,2012-01-01,8649,1,1,1,3,0,0,0,1,0.30,0.3333,0.81,0.0000,11,41,52
8649,2012-01-01,8650,1,1,1,4,0,0,0,1,0.28,0.3030,0.81,0.0896,0,8,8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17374,2012-12-31,17375,1,1,12,19,0,1,1,2,0.26,0.2576,0.60,0.1642,11,108,119
17375,2012-12-31,17376,1,1,12,20,0,1,1,2,0.26,0.2576,0.60,0.1642,8,81,89
17376,2012-12-31,17377,1,1,12,21,0,1,1,1,0.26,0.2576,0.60,0.1642,7,83,90
17377,2012-12-31,17378,1,1,12,22,0,1,1,1,0.26,0.2727,0.56,0.1343,13,48,61
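
Filtering by `year` displays the matching rows rather than a single number. A minimal sketch of counting the records directly is shown below; the `toy` frame and its values are illustrative only, not the real dataset.

```python
import pandas as pd

# Toy stand-in for the dataset (hypothetical values, same column names).
toy = pd.DataFrame({
    "year": [0, 0, 0, 1, 1],
    "total_count": [16, 40, 32, 48, 93],
})

# Number of records (rows) for each year code, ordered by year.
records_per_year = toy["year"].value_counts().sort_index()
print(records_per_year)

# The same count for a single year, as a plain integer.
records_2011 = int((toy["year"] == 0).sum())
print(records_2011)
```

On the full DataFrame, `len(df[df['year'] == 0])` gives the same kind of answer for 2011.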


In [9]:
# How many bike rentals were made in 2011?
df2011 = df[df['year'] == 0]
df2011.sum()

datetime             2011-01-012011-01-012011-01-012011-01-012011-0...
rec_id                                                        37372335
season                                                           21730
year                                                                 0
month                                                            56832
hour                                                            100054
is_holiday                                                         239
weekday                                                          26045
is_workingday                                                     5911
weather_condition                                                12428
temp                                                              4228
atemp                                                          4054.51
humidity                                                       5562.45
windspeed                                                      1652.68
casual

In [10]:
# How many bike rentals were made in 2012?
df2012 = df[df['year'] == 1]
df2012.sum()

datetime             2012-01-012012-01-012012-01-012012-01-012012-0...
rec_id                                                       113651175
season                                                           21746
year                                                              8734
month                                                            56788
hour                                                            100617
is_holiday                                                         261
weekday                                                          26156
is_workingday                                                     5954
weather_condition                                                12342
temp                                                           4409.14
atemp                                                          4213.99
humidity                                                       5338.16
windspeed                                                      1651.02
casual
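
Calling `.sum()` on the whole frame sums every column, including ones where a sum has no meaning (the date strings are simply concatenated). A sketch that sums only `total_count` for each year, again on a hypothetical `toy` frame rather than the real data:

```python
import pandas as pd

# Toy stand-in for the dataset (hypothetical values, same column names).
toy = pd.DataFrame({
    "year": [0, 0, 0, 1, 1],
    "casual": [3, 8, 5, 5, 15],
    "registered": [13, 32, 27, 43, 78],
    "total_count": [16, 40, 32, 48, 93],
})

# Total rentals per year: sum only the column that holds the rental count.
rentals_per_year = toy.groupby("year")["total_count"].sum()
print(rentals_per_year)

# Consistency check: total_count is described as casual + registered.
is_consistent = bool((toy["casual"] + toy["registered"] == toy["total_count"]).all())
print(is_consistent)
```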

In [11]:
# Which season has the highest average number of bike rentals?
inverno = df[df['season'] == 1]
print('winter:', inverno['total_count'].mean())

primavera = df[df['season'] == 2]
print('spring:', primavera['total_count'].mean())

verao = df[df['season'] == 3]
print('summer:', verao['total_count'].mean())

outono = df[df['season'] == 4]
print('autumn:', outono['total_count'].mean())

winter: 111.11456859971712
spring: 208.34406894987526
summer: 236.01623665480426
autumn: 198.86885633270322
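
The four per-season filters can also be written as a single `groupby`. The sketch below uses a hypothetical `toy` frame; the `SEASON_NAMES` mapping follows the season codes in the field descriptions.

```python
import pandas as pd

# Season codes as described in the data dictionary.
SEASON_NAMES = {1: "winter", 2: "spring", 3: "summer", 4: "autumn"}

# Toy stand-in for the dataset (hypothetical values, same column names).
toy = pd.DataFrame({
    "season": [1, 1, 2, 2, 3, 3, 4, 4],
    "total_count": [10, 20, 100, 120, 200, 220, 150, 170],
})

# Mean rentals per season, with the numeric codes replaced by readable labels.
mean_by_season = (
    toy.groupby("season")["total_count"].mean().rename(index=SEASON_NAMES)
)
print(mean_by_season)

# Label of the season with the highest mean.
busiest_season = mean_by_season.idxmax()
print(busiest_season)
```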


In [12]:
# Which hour of the day has the highest average number of bike rentals?
# Select total_count before averaging so the non-numeric datetime column is not aggregated.
maior_media_locacao = df.groupby('hour')['total_count'].mean()
maior_media_locacao.sort_values(ascending=False)

hour
17    461.452055
18    425.510989
8     359.011004
16    311.983562
19    311.523352
13    253.661180
12    253.315934
15    251.233196
14    240.949246
20    226.030220
9     219.309491
7     212.064649
11    208.143054
10    173.668501
21    172.314560
22    131.335165
23     87.831044
6      76.044138
0      53.898072
1      33.375691
2      22.869930
5      19.889819
3      11.727403
4       6.352941
Name: total_count, dtype: float64

In [13]:
# Which hour of the day has the lowest average number of bike rentals?
# Select total_count before averaging so the non-numeric datetime column is not aggregated.
menor_media_locacao = df.groupby('hour')['total_count'].mean()
menor_media_locacao.sort_values(ascending=True)

hour
4       6.352941
3      11.727403
5      19.889819
2      22.869930
1      33.375691
0      53.898072
6      76.044138
23     87.831044
22    131.335165
21    172.314560
10    173.668501
11    208.143054
7     212.064649
9     219.309491
20    226.030220
14    240.949246
15    251.233196
12    253.315934
13    253.661180
19    311.523352
16    311.983562
8     359.011004
18    425.510989
17    461.452055
Name: total_count, dtype: float64
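
Sorting the hourly means twice works, but when only the extremes are needed, `idxmax` and `idxmin` return the corresponding hours directly. A minimal sketch on a hypothetical `toy` frame:

```python
import pandas as pd

# Toy stand-in for the dataset (hypothetical values, same column names).
toy = pd.DataFrame({
    "hour": [4, 4, 8, 8, 17, 17],
    "total_count": [5, 7, 350, 370, 450, 470],
})

# Mean rentals for each hour of the day.
mean_by_hour = toy.groupby("hour")["total_count"].mean()

# Index labels (hours) of the largest and smallest means.
peak_hour = mean_by_hour.idxmax()
quietest_hour = mean_by_hour.idxmin()
print(peak_hour, quietest_hour)
```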

In [14]:
# Which day of the week has the highest average number of bike rentals?
# Select total_count before averaging so the non-numeric datetime column is not aggregated.
dia_da_semana_maior_locacao = df.groupby('weekday')['total_count'].mean()
dia_da_semana_maior_locacao.sort_values(ascending=False)

weekday
4    196.436665
5    196.135907
2    191.238891
3    191.130505
6    190.209793
1    183.744655
0    177.468825
Name: total_count, dtype: float64

In [None]:
# Which day of the week has the lowest average number of bike rentals?
# Select total_count before averaging so the non-numeric datetime column is not aggregated.
dia_da_semana_menor_locacao = df.groupby('weekday')['total_count'].mean()
dia_da_semana_menor_locacao.sort_values(ascending=True)

weekday
0    177.468825
1    183.744655
6    190.209793
3    191.130505
2    191.238891
5    196.135907
4    196.436665
Name: total_count, dtype: float64

In [None]:
# On Wednesdays (weekday = 3), which hour of the day has the highest average number of bike rentals?
df_quarta_feira = df[df['weekday'] == 3]
df_horario_quarta_feira = df_quarta_feira.groupby('hour')['total_count'].mean()
df_horario_quarta_feira.sort_values(ascending=False)

hour
17    513.144231
18    494.029126
8     488.326923
19    357.504854
7     303.980769
16    272.961538
20    256.660194
9     238.528846
21    194.669903
12    193.903846
13    185.826923
15    181.288462
14    170.548077
11    152.201923
22    143.689320
10    131.894231
6     107.807692
23     83.737864
0      34.557692
5      25.750000
1      15.336538
2       7.813725
4       4.968750
3       4.888889
Name: total_count, dtype: float64

In [None]:
# On Saturdays (weekday = 6), which hour of the day has the highest average number of bike rentals?
df_sabado = df[df['weekday'] == 6]
df_horario_sabado = df_sabado.groupby('hour')['total_count'].mean()
df_horario_sabado.sort_values(ascending=False)

hour
13    385.371429
15    382.428571
14    381.333333
12    375.380952
16    366.142857
17    334.409524
11    328.609524
18    292.048077
10    263.723810
19    239.932692
9     186.790476
20    180.865385
21    156.000000
22    139.663462
23    115.855769
8     114.476190
0      94.304762
1      67.780952
2      50.495238
7      45.961905
3      22.885714
6      21.000000
5       8.291262
4       7.657143
Name: total_count, dtype: float64
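
The per-weekday questions can be answered for all days at once by grouping on both `weekday` and `hour`. The following is a sketch on a hypothetical `toy` frame, not the real data; on the full DataFrame the same steps would use `df` in place of `toy`.

```python
import pandas as pd

# Toy stand-in for the dataset (hypothetical values, same column names).
toy = pd.DataFrame({
    "weekday": [3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6, 6],
    "hour": [8, 8, 13, 13, 17, 17, 8, 8, 13, 13, 17, 17],
    "total_count": [480, 500, 180, 190, 510, 520,
                    110, 120, 380, 390, 330, 340],
})

# Mean rentals for every (weekday, hour) pair.
mean_by_day_hour = toy.groupby(["weekday", "hour"])["total_count"].mean()

# Reshape to one row per weekday and one column per hour.
table = mean_by_day_hour.unstack("hour")

# For each weekday, the hour (column label) with the highest mean.
peak_hour_by_weekday = table.idxmax(axis=1)
print(peak_hour_by_weekday)
```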