# **DATA WRANGLING: STATION STATUS (SITUACIÓN ESTACIONES)**
This notebook loads the raw BiciMAD station-status dataset from the source linked below and transforms it so that the available data can be analysed. It covers the following processes:

1. Reading the .json files and transforming variables
2. Data exploration
3. Reshaping data
4. Filtering data

Source: https://datos.madrid.es/sites/v/index.jsp?vgnextoid=374512b9ace9f310VgnVCM100000171f5a0aRCRD&buscar=true&Texto=BiciMAD&Sector=&Formato=&Periodicidad=&orderByCombo=CONTENT_INSTANCE_NAME_DECODE

#### **LIBRARIES**

In [2]:
import pandas as pd 
import json

#### **1. READ DATA AND TRANSFORM VARIABLES**
**Dataset**: `2019MM.json`, where MM is the two-digit month number (07 to 12)

**Description**: snapshots of the status of every BiciMAD station (docked bikes, free bases, reservations, etc.) by date and time

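**Dataframe size**: each row is one station at one snapshot timestamp (`_id`), described by 14 variables in the raw JSON; the station `id` field is dropped on load, leaving 13 columns. A monthly file holds roughly 145,286 rows, and the six files from July to December total 883,777 rows.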
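To make the "one row per station per snapshot" layout concrete, the sketch below runs `pd.json_normalize` on a tiny hand-written sample. The sample is hypothetical: it assumes each file is a list of snapshot objects, each with an `_id` timestamp string and a `stations` list, which is the shape implied by the `record_path=['stations']` / `meta=['_id']` arguments used in the loading cell. Only a few station fields are included.

```python
import pandas as pd

# Hypothetical two-snapshot sample mimicking the assumed file layout:
# a list of snapshots, each with an `_id` timestamp and a `stations` list.
sample = [
    {
        "_id": "2019-07-01T00:29:26.018083",
        "stations": [
            {"id": 1, "number": "1a", "dock_bikes": 8, "free_bases": 15},
            {"id": 2, "number": "1b", "dock_bikes": 9, "free_bases": 14},
        ],
    },
    {
        "_id": "2019-07-01T01:29:26.018083",
        "stations": [
            {"id": 1, "number": "1a", "dock_bikes": 7, "free_bases": 16},
            {"id": 2, "number": "1b", "dock_bikes": 9, "free_bases": 14},
        ],
    },
]

# record_path explodes each `stations` list into rows;
# meta copies the parent snapshot's `_id` onto every one of those rows.
flat = pd.json_normalize(sample, record_path=["stations"], meta=["_id"])
flat["_id"] = pd.to_datetime(flat["_id"])

print(flat.shape)  # 2 snapshots x 2 stations -> (4, 5)
print(flat)
```

The meta column is appended after the record fields, which is why `_id` appears as the last column in the `head()` output of the real data.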

In [12]:
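months = ["201907", "201908", "201909", "201910", "201911", "201912"]

monthly_frames = []
for month in months:
    # Each file holds the station snapshots for one month
    with open('../Data/Situacion Estaciones/' + month + '.json', 'r', encoding='utf-8') as f:
        data = json.load(f)

    # One row per station per snapshot; the snapshot `_id` is copied onto each row
    df_month = pd.json_normalize(
        data,
        record_path=['stations'],
        meta=['_id']
    )
    df_month["_id"] = pd.to_datetime(df_month["_id"])
    # Drop the station `id` field before stacking the months
    monthly_frames.append(df_month.drop(columns="id"))

# Concatenate once at the end instead of growing a DataFrame inside the loop
situaciones = pd.concat(monthly_frames, ignore_index=True)

situaciones.head()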

| | activate | name | reservations_count | light | total_bases | free_bases | number | longitude | no_available | address | latitude | dock_bikes | _id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Puerta del Sol A | 0 | 2 | 24 | 15 | 1a | -3.7024255 | 0 | Puerta del Sol nº 1 | 40.4168961 | 8 | 2019-07-01 00:29:26.018083 |
| 1 | 1 | Puerta del Sol B | 0 | 2 | 24 | 14 | 1b | -3.7024207 | 0 | Puerta del Sol nº 1 | 40.4170009 | 9 | 2019-07-01 00:29:26.018083 |
| 2 | 1 | Miguel Moya | 0 | 2 | 24 | 12 | 2 | -3.7058415 | 0 | Calle Miguel Moya nº 1 | 40.4205886 | 10 | 2019-07-01 00:29:26.018083 |
| 3 | 1 | Plaza Conde Suchil | 0 | 1 | 18 | 4 | 3 | -3.7069171 | 0 | Plaza del Conde Suchil nº 2-4 | 40.4302937 | 11 | 2019-07-01 00:29:26.018083 |
| 4 | 1 | Malasaña | 0 | 0 | 24 | 17 | 4 | -3.7025875 | 0 | Calle Manuela Malasaña nº 5 | 40.4285524 | 3 | 2019-07-01 00:29:26.018083 |


In [13]:
situaciones.shape

(883777, 13)
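The shape confirms 883,777 rows and 13 columns. The per-month figure cannot hold for every month, though: six months of 145,286 rows would give 871,716, not 883,777. A small helper like the one below reports the actual number of rows and distinct snapshot timestamps per month. It is a sketch that assumes `_id` has already been converted to datetime, as in the loading cell; `monthly_summary` is a hypothetical name, and the demo runs on a tiny synthetic frame. On the real data it would be called as `monthly_summary(situaciones)`.

```python
import pandas as pd


def monthly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count rows and distinct snapshot timestamps per calendar month of `_id`."""
    month = df["_id"].dt.to_period("M").rename("month")
    return df.groupby(month).agg(
        records=("_id", "size"),       # station rows in the month
        snapshots=("_id", "nunique"),  # distinct snapshot timestamps
    )


# Synthetic frame: two snapshots in July, one in August
demo = pd.DataFrame({
    "_id": pd.to_datetime([
        "2019-07-01 00:29", "2019-07-01 00:29", "2019-07-01 01:29",
        "2019-08-01 00:10", "2019-08-01 00:10",
    ]),
    "number": ["1a", "1b", "1a", "1a", "1b"],
})

summary = monthly_summary(demo)
print(summary)
```

Dividing `records` by `snapshots` also gives the average number of stations reported per snapshot, which is a quick way to spot months where stations were added or removed.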

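**Variable type check**: mostly correct. `_id` is a datetime and the count columns are integers, but `longitude` and `latitude` are stored as `object` (strings) and must be converted to floats before any numeric or spatial analysis. `number` is legitimately text, since some station numbers carry a letter suffix (e.g. `1a`, `1b`).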

In [20]:
situaciones.dtypes

activate                       int64
name                          object
reservations_count             int64
light                          int64
total_bases                    int64
free_bases                     int64
number                        object
longitude                     object
no_available                   int64
address                       object
latitude                      object
dock_bikes                     int64
_id                   datetime64[ns]
dtype: object
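Since `longitude` and `latitude` come out as `object`, one way to fix them is `pd.to_numeric`. The sketch below is an assumption about how this step could be done, not the notebook's own code: `coerce_coordinates` is a hypothetical helper, and the demo uses two rows copied from the `head()` output with the coordinates held as strings. With `errors="coerce"`, any value that cannot be parsed becomes NaN instead of raising, so the NaN check should be re-run afterwards.

```python
import pandas as pd


def coerce_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with `longitude` and `latitude` cast from strings to float64.

    errors="coerce" turns unparseable values into NaN rather than raising.
    """
    out = df.copy()
    for col in ["longitude", "latitude"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Two rows taken from the head() output, with coordinates stored as strings
demo = pd.DataFrame({
    "name": ["Puerta del Sol A", "Miguel Moya"],
    "longitude": ["-3.7024255", "-3.7058415"],
    "latitude": ["40.4168961", "40.4205886"],
})

fixed = coerce_coordinates(demo)
print(fixed.dtypes)
```

On the full dataset this would be `situaciones = coerce_coordinates(situaciones)`, after which `situaciones.dtypes` should list both coordinate columns as `float64`.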

**Missing-value check**: no column contains NaN values

In [19]:
situaciones.isna().sum()

activate              0
name                  0
reservations_count    0
light                 0
total_bases           0
free_bases            0
number                0
longitude             0
no_available          0
address               0
latitude              0
dock_bikes            0
_id                   0
dtype: int64
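`isna()` only counts true nulls (`NaN`/`None`); an empty or whitespace-only string in a text column such as `name` or `address` would pass this check unnoticed. The sketch below extends the check to count both kinds of gap per column. It is an optional extra, not part of the original analysis: `missing_report` is a hypothetical helper, and the demo frame is synthetic.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per column: count of NaN/None values and of empty or whitespace-only strings."""

    def count_blank(col: pd.Series) -> int:
        # Non-string values (numbers, NaN, None) are never counted as blank
        return int(col.map(lambda v: isinstance(v, str) and v.strip() == "").sum())

    return pd.DataFrame({
        "nan": df.isna().sum(),
        "blank": pd.Series({c: count_blank(df[c]) for c in df.columns}),
    })


# Synthetic frame with one blank name, one empty address and one missing address
demo = pd.DataFrame({
    "name": ["Puerta del Sol A", "  ", "Malasaña"],
    "address": ["Puerta del Sol nº 1", "", None],
    "dock_bikes": [8, 9, 3],
})

report = missing_report(demo)
print(report)
```

Run on the real data as `missing_report(situaciones)`, a `blank` column of zeros together with the `isna()` result above would confirm that the text fields really are complete.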

#### **TEST: SINGLE MONTH (201909)**

In [8]:
# Load a single month with Python's json module to test the steps in isolation
with open('../Data/Situacion Estaciones/201909.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# One row per station per snapshot, with the snapshot `_id` copied onto each row
sit_201909 = pd.json_normalize(
    data,
    record_path=['stations'],
    meta=['_id']
)
sit_201909["_id"] = pd.to_datetime(sit_201909["_id"])
sit_201909 = sit_201909.drop(columns="id")
sit_201909.head()

| | activate | name | reservations_count | light | total_bases | free_bases | number | longitude | no_available | address | latitude | dock_bikes | _id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Puerta del Sol A | 0 | 0 | 24 | 17 | 1a | -3.7024255 | 0 | Puerta del Sol nº 1 | 40.4168961 | 3 | 2019-09-01 00:10:32.593867 |
| 1 | 1 | Puerta del Sol B | 0 | 2 | 24 | 13 | 1b | -3.7024207 | 0 | Puerta del Sol nº 1 | 40.4170009 | 10 | 2019-09-01 00:10:32.593867 |
| 2 | 1 | Miguel Moya | 0 | 2 | 24 | 8 | 2 | -3.7058415 | 0 | Calle Miguel Moya nº 1 | 40.4205886 | 16 | 2019-09-01 00:10:32.593867 |
| 3 | 1 | Plaza Conde Suchil | 0 | 0 | 18 | 12 | 3 | -3.7069171 | 0 | Plaza del Conde Suchil nº 2-4 | 40.4302937 | 3 | 2019-09-01 00:10:32.593867 |
| 4 | 1 | Malasaña | 0 | 2 | 24 | 8 | 4 | -3.7025875 | 0 | Calle Manuela Malasaña nº 5 | 40.4285524 | 12 | 2019-09-01 00:10:32.593867 |
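The single-month cell repeats the loop body for September, so a natural follow-up is to confirm that the standalone load matches the September slice of the combined frame. The sketch below does this with `pd.testing.assert_frame_equal`; `matches_month_slice` is a hypothetical helper and the demo frames are synthetic. It assumes row order is preserved, which holds because `pd.concat` stacks the monthly frames in file order. It would report a mismatch if a monthly file contained snapshots stamped in a neighbouring month, or if one frame had been transformed (e.g. coordinates converted) and the other had not.

```python
import pandas as pd


def matches_month_slice(combined: pd.DataFrame, single: pd.DataFrame,
                        year: int, month: int) -> bool:
    """True if `single` equals the rows of `combined` whose `_id` falls in year/month."""
    ts = combined["_id"]
    mask = (ts.dt.year == year) & (ts.dt.month == month)
    expected = combined.loc[mask].reset_index(drop=True)
    try:
        # Compares values, dtypes, column order and index
        pd.testing.assert_frame_equal(expected, single.reset_index(drop=True))
    except AssertionError:
        return False
    return True


# Synthetic combined frame spanning July and September
combined = pd.DataFrame({
    "number": ["1a", "1b", "1a"],
    "dock_bikes": [8, 9, 3],
    "_id": pd.to_datetime(["2019-07-01 00:29", "2019-07-01 00:29", "2019-09-01 00:10"]),
})
# Synthetic single-month frame for September
single = pd.DataFrame({
    "number": ["1a"],
    "dock_bikes": [3],
    "_id": pd.to_datetime(["2019-09-01 00:10"]),
})

print(matches_month_slice(combined, single, 2019, 9))
```

On the notebook's data the call would be `matches_month_slice(situaciones, sit_201909, 2019, 9)`.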
