## **DATA WRANGLING BASES**
This notebook loads, explores, and transforms the raw BiciMAD bases dataset from the source listed below, so that the available data can be analysed and used in later steps. It covers the following steps:

1. Reading the .csv file and transforming variables
2. Data exploration
3. Reshaping data
4. Filtering data

#### **IMPORT LIBRARIES**

In [2]:
import pandas as pd 
import plotly.express as px
import plotly.graph_objects as go

#### **1. READ DATA AND VARIABLE TRANSFORMATION**
**Dataset**: bases_bicimad.xls     

**Description**: Dataset of the existing bases of the BiciMAD service. 

**Dataframe size**: 269 base stations (including extensions) and 13 variables.

In [3]:
bases = pd.read_excel("../Data/bases_bicimad.xls")
bases.shape

(269, 13)

In [4]:
bases.head()

Unnamed: 0,Número,Gis_X,Gis_Y,Fecha de Alta,Distrito,Barrio,Calle,Nº Finca,Tipo de Reserva,Número de Plazas,Longitud,Latitud,Direccion
0,001 a,440443.61,4474290.65,43803,01 CENTRO,01-06 SOL,"ALCALA, CALLE, DE",2,BiciMAD,30,-3.701998,40.417111,"ALCALA, CALLE, DE, 2"
1,001 b,440480.56,4474301.74,43867,01 CENTRO,01-06 SOL,"ALCALA, CALLE, DE",6,BiciMAD,30,-3.701564,40.417213,"ALCALA, CALLE, DE, 6"
2,2,440134.83,4474678.23,41813,01 CENTRO,01-05 UNIVERSIDAD,"MIGUEL MOYA, CALLE, DE",1,BiciMAD,24,-3.705674,40.42058,"MIGUEL MOYA, CALLE, DE, 1"
3,3,440012.98,4475760.68,41813,07 CHAMBERÍ,07-02 ARAPILES,"CONDE DEL VALLE DE SUCHIL, PLAZA, DEL",2,BiciMAD,18,-3.707212,40.430322,"CONDE DEL VALLE DE SUCHIL, PLAZA, DEL, 2"
4,4,440396.4,4475565.36,41813,01 CENTRO,01-05 UNIVERSIDAD,"MANUELA MALASAÑA, CALLE, DE",3,BiciMAD,24,-3.702674,40.42859,"MANUELA MALASAÑA, CALLE, DE, 3"


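**Variable type check**: types are consistent with the data. Note that `Fecha de Alta` is stored as an integer (`int64`), not as a date.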

In [5]:
bases.dtypes

Número               object
Gis_X               float64
Gis_Y               float64
Fecha de Alta         int64
Distrito             object
Barrio               object
Calle                object
Nº Finca             object
Tipo de Reserva      object
Número de Plazas      int64
Longitud            float64
Latitud             float64
Direccion            object
dtype: object
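
The `Fecha de Alta` values shown by `head()` (for example 41813) look like Excel serial day counts rather than calendar dates. Should the column be needed later, a minimal sketch of a conversion could look like the following. It assumes the Excel serial interpretation is right, uses only values copied from the `head()` output, and `fecha_alta` is a hypothetical name:

```python
import pandas as pd

# Sample values copied from the head() output above.
sample = pd.DataFrame({"Fecha de Alta": [43803, 43867, 41813]})

# Assumption: the integers are Excel serial dates, i.e. days counted from 1899-12-30.
fecha_alta = pd.to_datetime(sample["Fecha de Alta"], unit="D", origin="1899-12-30")
print(fecha_alta.dt.date.tolist())
```

Under that assumption, 41813 maps to 2014-06-23.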

**NaN check**: there are no missing values in any column.

In [6]:
bases.isna().sum()

Número              0
Gis_X               0
Gis_Y               0
Fecha de Alta       0
Distrito            0
Barrio              0
Calle               0
Nº Finca            0
Tipo de Reserva     0
Número de Plazas    0
Longitud            0
Latitud             0
Direccion           0
dtype: int64

#### **2. DATA EXPLORATION**

Variable **"Tipo de Reserva"** has only one category (BiciMAD), so it carries no information and is irrelevant for the analysis.

In [7]:
bases["Tipo de Reserva"].value_counts()

BiciMAD    269
Name: Tipo de Reserva, dtype: int64
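
The same check can be run over every column at once instead of one `value_counts()` call per variable. A minimal sketch on a small illustrative frame (the rows mirror the `head()` output; `constant_cols` is a hypothetical name):

```python
import pandas as pd

# Small illustrative frame; in the notebook this would be `bases`.
sample = pd.DataFrame({
    "Tipo de Reserva": ["BiciMAD", "BiciMAD", "BiciMAD"],
    "Número de Plazas": [30, 24, 18],
})

# A column with a single distinct value cannot help distinguish rows.
constant_cols = [col for col in sample.columns if sample[col].nunique(dropna=False) == 1]
print(constant_cols)
```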

**Número de Plazas distribution**: 81% of bases have 24 docks.

In [8]:
fig = px.histogram(bases, x="Número de Plazas", nbins = 30, histnorm='probability density')
fig.update_traces(marker_color = "darkorange")
fig.show()
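
The share of bases per dock count can also be read directly from a frequency table rather than from the histogram. A minimal sketch, using only the five dock counts visible in the `head()` output (so the percentages here are not those of the full dataset):

```python
import pandas as pd

# Dock counts of the five rows shown by head(); in the notebook, use bases["Número de Plazas"].
plazas = pd.Series([30, 30, 24, 18, 24], name="Número de Plazas")

# normalize=True returns relative frequencies instead of absolute counts.
share = plazas.value_counts(normalize=True).sort_index()
print((share * 100).round(1))
```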

**Distrito distribution**: about half of the stations are concentrated in CENTRO, SALAMANCA and CHAMBERÍ.

In [9]:
count_distrito = bases["Distrito"].value_counts() 
labels_distrito = count_distrito.index

# The counts and labels are passed directly, so no data frame argument is needed
fig = px.pie(values=count_distrito, names=labels_distrito, color=labels_distrito,
             color_discrete_sequence=px.colors.sequential.RdBu)

fig.update_layout(title = "Number of bases by district")
fig.show()
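
The combined share of the three largest districts can be checked numerically as well. A minimal sketch on the five districts visible in the `head()` output (the sample contains only two districts, so the result differs from the full dataset; `top3_share` is a hypothetical name):

```python
import pandas as pd

# Districts of the five rows shown by head(); in the notebook, use bases["Distrito"].
distrito = pd.Series(
    ["01 CENTRO", "01 CENTRO", "01 CENTRO", "07 CHAMBERÍ", "01 CENTRO"],
    name="Distrito",
)

# value_counts sorts in descending order, so head(3) holds the three largest districts.
share = distrito.value_counts(normalize=True)
top3_share = share.head(3).sum()
print(round(top3_share, 3))
```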

**Barrio distribution**: bases are unevenly distributed across neighborhoods, ranging from 1 to 14 bases per neighborhood.

In [10]:
count_barrio = bases["Barrio"].value_counts() 
labels_barrio = count_barrio.index

fig = go.Figure()
fig.add_trace(
    go.Bar(
        x = labels_barrio,
        y = count_barrio,
        showlegend = False
    )
)
fig.update_layout(title = "Number of bases by neighborhood",
                  xaxis_title = "Neighborhood", yaxis_title = "Absolute number of stations")
fig.show()
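
The range of bases per neighborhood can be confirmed without reading it off the bar chart. A minimal sketch on the five neighborhoods visible in the `head()` output (`summary` is a hypothetical name, and the values here are for the sample only):

```python
import pandas as pd

# Neighborhoods of the five rows shown by head(); in the notebook, use bases["Barrio"].
barrio = pd.Series(
    ["01-06 SOL", "01-06 SOL", "01-05 UNIVERSIDAD", "07-02 ARAPILES", "01-05 UNIVERSIDAD"],
    name="Barrio",
)

count_barrio = barrio.value_counts()
# Smallest and largest number of bases found in any neighborhood.
summary = count_barrio.agg(["min", "max"])
print(summary)
```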

**Location of bases (latitude and longitude)**: all bases are located in Madrid, so the coordinates are correct.

In [15]:
# Reference: https://plotly.com/python/scattermapbox/

# Base layer: one marker per base, coloured by district and sized by number of docks
fig = px.scatter_mapbox(bases, lat="Latitud", lon="Longitud", color="Distrito",
                        size=bases["Número de Plazas"] * 0.5, zoom=12)
# Overlay: semi-transparent white markers drawn on top of each base
fig.add_trace(go.Scattermapbox(
    lat=bases["Latitud"],
    lon=bases["Longitud"],
    mode="markers",
    showlegend=False,
    marker=go.scattermapbox.Marker(
        size=bases["Número de Plazas"] * 0.5,
        opacity=0.7,
        color="white",
    ),
))
# Figure size, title, and map centre/style
fig.update_layout(
    title="BiciMAD bases by district",
    autosize=True,
    hovermode="closest",
    showlegend=True,
    width=1000,
    height=1000,
    mapbox=dict(
        bearing=0,
        center=dict(lat=40.435, lon=-3.69),
        zoom=12,
        style="carto-positron",  # alternative: "open-street-map"
    ),
)
fig.show()
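
Besides the visual check on the map, the coordinates can be tested against a bounding box. A minimal sketch, where the box limits are a rough assumption about the extent of Madrid (not taken from the dataset) and should be adjusted as needed:

```python
import pandas as pd

# Coordinates of the first rows shown by head(); in the notebook, use `bases`.
sample = pd.DataFrame({
    "Latitud": [40.417111, 40.417213, 40.42058],
    "Longitud": [-3.701998, -3.701564, -3.705674],
})

# Assumption: approximate bounding box around the city of Madrid.
LAT_MIN, LAT_MAX = 40.30, 40.65
LON_MIN, LON_MAX = -3.90, -3.50

inside = (
    sample["Latitud"].between(LAT_MIN, LAT_MAX)
    & sample["Longitud"].between(LON_MIN, LON_MAX)
)
outliers = sample[~inside]  # rows that fall outside the box, if any
print(len(outliers))
```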

#### **3. RESHAPING DATA**

To save space, variables that are irrelevant for the model are dropped. The result is stored in `bases_clean`.

Note: Calle and Nº Finca are dropped because, concatenated, these two variables form the variable Direccion.

In [240]:
bases_clean = bases.drop(columns = ["Gis_X", "Gis_Y", "Fecha de Alta", "Calle", "Nº Finca", "Tipo de Reserva"])
bases_clean.head()

Unnamed: 0,Número,Distrito,Barrio,Número de Plazas,Longitud,Latitud,Direccion
0,001 a,01 CENTRO,01-06 SOL,30,-3.701998,40.417111,"ALCALA, CALLE, DE, 2"
1,001 b,01 CENTRO,01-06 SOL,30,-3.701564,40.417213,"ALCALA, CALLE, DE, 6"
2,2,01 CENTRO,01-05 UNIVERSIDAD,24,-3.705674,40.42058,"MIGUEL MOYA, CALLE, DE, 1"
3,3,07 CHAMBERÍ,07-02 ARAPILES,18,-3.707212,40.430322,"CONDE DEL VALLE DE SUCHIL, PLAZA, DEL, 2"
4,4,01 CENTRO,01-05 UNIVERSIDAD,24,-3.702674,40.42859,"MANUELA MALASAÑA, CALLE, DE, 3"
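
Before dropping Calle and Nº Finca, it may be worth confirming that Direccion really is their concatenation. A minimal sketch on two rows copied from the `head()` output; the `", "` separator is inferred from those rows, and `rebuilt` and `matches` are hypothetical names:

```python
import pandas as pd

# Two rows copied from the head() output; in the notebook, use `bases`.
sample = pd.DataFrame({
    "Calle": ["ALCALA, CALLE, DE", "MIGUEL MOYA, CALLE, DE"],
    "Nº Finca": [2, 1],
    "Direccion": ["ALCALA, CALLE, DE, 2", "MIGUEL MOYA, CALLE, DE, 1"],
})

# Rebuild the address and compare it with the stored one, ignoring stray whitespace.
rebuilt = sample["Calle"].str.strip() + ", " + sample["Nº Finca"].astype(str).str.strip()
matches = rebuilt == sample["Direccion"].str.strip()
print(matches.all())
```

Any row where `matches` is False would carry information in Calle or Nº Finca that Direccion does not reproduce.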


#### **IDEAS**

In [197]:
# https://plotly.com/python/scattermapbox/
fig = go.Figure()
fig.add_trace(go.Scattermapbox(
        lat = bases["Latitud"],
        lon=bases["Longitud"],
        mode='markers',
        marker=go.scattermapbox.Marker(
            size=bases["Número de Plazas"],
            opacity=0.7        ),
        text=bases["Número"],  # hover text: the base number of each station
        hoverinfo='text'
    ))

fig.add_trace(go.Scattermapbox(
        lat = bases["Latitud"],
        lon=bases["Longitud"],
        mode='markers',
        marker=go.scattermapbox.Marker(
            size=bases["Número de Plazas"]*0.5,
            color='rgb(242, 177, 172)',
            opacity=0.7        )
    ))

fig.update_layout(
    title='Bases BiciMad',
    autosize=True,
    hovermode='closest',
    showlegend=False,
    width = 1000,
    height = 1000,
    mapbox=dict(
        bearing=0,
        center=dict(
            lat=40.435,
            lon=-3.69
        ),
        zoom=12,
        style='open-street-map'
        
    ),
)

fig.show()

In [195]:
# https://plotly.com/python/scatter-plots-on-maps/


fig = go.Figure(data=go.Scattergeo(
        lon = bases['Longitud'],
        lat = bases['Latitud'],
        text = bases['Número'],
        mode = 'markers',
        marker = dict(
            size = 8,
            opacity = 0.8,
            reversescale = True,
            autocolorscale = False,
            symbol = 'square',
            line = dict(
                width=1,
                color='rgb(102, 102, 102)'
            ),
            colorscale = 'Blues',
            cmin = 0,
            color = bases['Número de Plazas'],
            cmax = bases['Número de Plazas'].max(),
            colorbar_title="Número de Plazas"
        )))
fig.update_geos(fitbounds="locations", visible=False)

fig.update_layout(
        title = 'BiciMAD bases<br>(Hover for base number)',
        geo = dict(
            showland = True,
            landcolor = "rgb(250, 250, 250)",
            subunitcolor = "rgb(217, 217, 217)",
            countrycolor = "rgb(217, 217, 217)",
            countrywidth = 0.5,
            subunitwidth = 0.5
        ),
    )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()