# I Workshop on DataScience of POLI - USP
# Basic data visualization with the ARM/BR/Manacapuru (goamazon) dataset
## Surface Meteorological Instrumentation (MET)

* Gabriel A. von Winckler
* Kleberson Junio do Amaral Serique
* Launindo Campos
* Raphael M. Cobe 

# Introduction

This project aims to deliver a better interface for visualizing data from the ARM project (http://www.arm.gov).
To achieve this, we first developed a small library that parses the NetCDF files within a given time window and extracts a specific variable.

# Getting and cleaning the data

The data was downloaded from the ARM portal in NetCDF format. It comes from the Manacapuru station and contains readings from several sensors, such as:
* Temperature;
* Relative Humidity;
* Precipitation; and so on.

The complete list of the sensors can be seen at: https://github.com/raphaelmcobe/wds2015

The downloaded data is split into one file per day. Each daily file holds **1440 measurements** per sensor (1440 = 24 × 60, which is consistent with one reading per minute). The data ranges from 2013 to 2015, for a total of **701 files**.

We developed our project with the following technologies:

1. Python language;
2. Jupyter Notebook;
3. Numpy;
4. NetCDF4 for Python;
5. D3 JS; and
6. Mozilla Metrics Graphics

## The parser

The parser receives two dates and selects the files within that range. It also receives the name of a variable and collects the data for that variable from the selected files. Missing data is represented by Python's `None` object. We assume that the data is available in the `data` directory.
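The file selection step can be sketched on its own, without any NetCDF dependency. This is a minimal illustration, assuming file names of the form `<datastream>.<level>.<YYYYMMDD>.<HHMMSS>.cdf`; the helper `select_files` and the file names below are hypothetical and only serve the example.

```python
from datetime import datetime
import os


def select_files(file_names, start_date, end_date):
    """Return the file names whose embedded date lies in [start_date, end_date]."""
    start = datetime.strptime(start_date, "%Y%m%d").date()
    end = datetime.strptime(end_date, "%Y%m%d").date()
    selected = []
    for name in file_names:
        # Assumed layout: <datastream>.<level>.<YYYYMMDD>.<HHMMSS>.cdf
        stamp = os.path.basename(name).split(".")[2]
        file_date = datetime.strptime(stamp, "%Y%m%d").date()
        if start <= file_date <= end:
            selected.append(name)
    # Sort by the embedded date so the result is chronological
    # regardless of the order of the input list.
    return sorted(selected, key=lambda n: os.path.basename(n).split(".")[2])


# Hypothetical file names, for illustration only.
files = [
    "data/maometM1.b1.20151017.000000.cdf",
    "data/maometM1.b1.20151015.000000.cdf",
    "data/maometM1.b1.20151016.000000.cdf",
]
selected = select_files(files, "20151015", "20151016")
print(selected)
```

Both bounds are inclusive, matching the `>=` and `<=` comparisons used by the parser.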

We used NumPy to summarize the data for each day by calculating the mean and standard deviation of the 1440 daily measurements.
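To make the daily summary concrete, here is a stand-alone sketch that uses only the standard library instead of NumPy. `statistics.pstdev` is the population standard deviation, which is what `np.std` computes by default. The helper `daily_summary` is hypothetical, and dropping `None` readings before averaging is an assumption of this sketch, not the behavior of the parser.

```python
import statistics


def daily_summary(measurements):
    """Summarize one day's readings as the mean and a one-standard-deviation band."""
    # Assumption of this sketch: missing readings are given as None and skipped.
    valid = [m for m in measurements if m is not None]
    if not valid:
        return {"value": None, "l": None, "u": None}
    mean = statistics.fmean(valid)
    # Population standard deviation, matching np.std's default (ddof=0).
    std = statistics.pstdev(valid)
    return {"value": mean, "l": mean - std, "u": mean + std}


# Synthetic day of 1440 readings alternating between 29.0 and 31.0.
readings = [29.0, 31.0] * 720
summary = daily_summary(readings)
print(summary)
```

The keys `value`, `l` and `u` mirror the ones the parser writes for each day.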

The output is in JSON format and is generated with Python's `json` serialization module.
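A short sketch of how the `json` module treats missing values: `None` is serialized as JSON `null`, and decoding restores `None`. The records below are hypothetical and only mimic the shape of the parser's output.

```python
import json

# Hypothetical daily records; the second stands for a day with no usable data.
records = [
    {"date": "2015-10-15", "value": 30.4, "l": 25.9, "u": 35.0},
    {"date": "2015-10-16", "value": None, "l": None, "u": None},
]

encoded = json.dumps(records)
print(encoded)

# Decoding maps null back to None, so the round trip is lossless here.
decoded = json.loads(encoded)
```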

In [16]:
from netCDF4 import Dataset
from datetime import datetime
import glob
import os
import numpy as np
import json

DATA_HOME = "./data/"
def NetCDFtoJSON(variable="temp_mean", 
    start_date="20140101", end_date="20140131"):

    """ This function receives a time frame and produces a json version of
    the data stored at the netcdf repository.  """

    data_home = DATA_HOME

    start_date = datetime.strptime(start_date, "%Y%m%d").date()

    end_date = datetime.strptime(end_date, "%Y%m%d").date()

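    # glob returns files in arbitrary order; sort by name so that the files
    # of one datastream are processed in date order.
    data_files = sorted(glob.glob(os.path.join(data_home, "*.cdf")))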
    filtered_data = []
    for file_name in data_files:
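        file_name = os.path.abspath(file_name)
        # The date is the third dot-separated field of the base name; using
        # the base name keeps dots in the directory path from shifting the index.
        file_date = datetime.strptime(
            os.path.basename(file_name).split('.')[2], "%Y%m%d").date()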
        if file_date >= start_date and file_date <= end_date:
            file_obj = Dataset(file_name)
            date_value = {}
            # Read the variable once and reuse it for both statistics.
            values = file_obj.variables[variable][:]
            date_value["date"] = file_date.isoformat()
            # np.asscalar was removed in NumPy 1.23; .item() is the equivalent call.
            mean_value = np.mean(values).item()
            std_value = np.std(values).item()
            lower_std = mean_value - std_value
            upper_std = mean_value + std_value
            # A daily mean of exactly 0 is treated as missing data.
            if mean_value == 0:
                mean_value = None
                upper_std = None
                lower_std = None
            date_value["value"] = mean_value
            date_value["l"] = lower_std
            date_value["u"] = upper_std
            filtered_data.append(date_value)

    return json.dumps(filtered_data)

Example output:

In [17]:
NetCDFtoJSON("temp_mean", "20151015", "20151016")

'[{"date": "2015-10-15", "u": 34.96739721298218, "l": 25.873175144195557, "value": 30.420286178588867}, {"date": "2015-10-16", "u": 35.23344135284424, "l": 26.28056240081787, "value": 30.757001876831055}]'
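As a quick sanity check, the output above can be parsed back to confirm that `l` and `u` lie at the same distance from `value`, that distance being the daily standard deviation. The string below is the example output, split over several lines only for readability.

```python
import json

output = (
    '[{"date": "2015-10-15", "u": 34.96739721298218, '
    '"l": 25.873175144195557, "value": 30.420286178588867}, '
    '{"date": "2015-10-16", "u": 35.23344135284424, '
    '"l": 26.28056240081787, "value": 30.757001876831055}]'
)

records = json.loads(output)
# Distance from the mean to each edge of the band, per day.
upper_gap = {r["date"]: r["u"] - r["value"] for r in records}
lower_gap = {r["date"]: r["value"] - r["l"] for r in records}
for day in upper_gap:
    print(day, round(upper_gap[day], 3), round(lower_gap[day], 3))
```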

# Visualizing data

We use `IPython.display.Javascript` to hand the data to the browser as global JavaScript variables. The time window is set by two dates in `YYYYMMDD` format:

* Set the start date in `start_time`
* Set the end date in `end_time`

In [18]:
import json
from IPython.display import Javascript
#runs arbitrary javascript, client-side

start_time = "20150601"
end_time = "20150801"

Javascript("""
           window.chart_data_temp={0};
           window.chart_data_rh={1};
           window.chart_data_precip={2};
           """.format(NetCDFtoJSON("temp_mean", start_time, end_time),
                      NetCDFtoJSON("rh_mean", start_time, end_time),
                      NetCDFtoJSON("org_precip_rate_mean", start_time, end_time)))

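The cell above builds its script with `str.format`, which treats `{` and `}` as placeholders, so a template that contained JavaScript blocks would need doubled braces. A minimal sketch of an alternative that assembles the assignments line by line is given below; the helper `build_js_assignments` and the payloads are hypothetical, and in the notebook the resulting string would be passed to `Javascript(...)`.

```python
import json


def build_js_assignments(series):
    """Return JavaScript that assigns each JSON payload to a `window` global."""
    lines = []
    for name, payload in series.items():
        # Parse the payload once so that invalid JSON fails here, in Python,
        # rather than silently in the browser.
        json.loads(payload)
        lines.append("window.%s = %s;" % (name, payload))
    return "\n".join(lines)


# Hypothetical payloads standing in for the strings returned by NetCDFtoJSON.
script = build_js_assignments({
    "chart_data_temp": '[{"date": "2015-06-01", "value": 30.1}]',
    "chart_data_rh": '[{"date": "2015-06-01", "value": 81.5}]',
})
print(script)
```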

# Load Javascript APIs and CSS

We use D3 JS (http://cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min) and Metrics Graphics (metricsgraphics.min), both registered through `require.config`.

We also load the CSS for Metrics Graphics (metricsgraphics.css) and for the scatter matrix (scatter-matrix.css).


In [19]:
%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      mg: '//www.ncc.unesp.br/~winckler/metrics-graphics-2.7.0/dist/metricsgraphics.min',
      scatter: '//www.ncc.unesp.br/~winckler/metrics-graphics-2.7.0/dist/scatter'
  }
});

$('head').append('<link rel="stylesheet" type="text/css" href="css/metricsgraphics.css">');
$('head').append('<link rel="stylesheet" type="text/css" href="css/scatter-matrix.css">');


# Javascript Function to plot data

We create three `<div>` elements, one for each line chart:
* chart_temp 
* chart_rh
* chart_precip


In [20]:
%%javascript
require(['d3', 'mg'], function(d3){

  $(".graph").remove();
  element.append("<div id='chart_temp' class='graph' style='float: left'></div>");
  element.append("<div id='chart_rh' class='graph' style='float: left'></div>");
  element.append("<div id='chart_precip' class='graph' style='float: left'></div>");
  element.append("<div id='chart_break' style='float: none; display: block; clear: both;'></div>");
  element.append("<div id='chart_scatter' class='graph' style=''></div>");
  $(".graph").width("480px");
  $(".graph").height("300px");
  $('#chart_scatter').width("945px").height("730px");

  window.mg_data_temp = MG.convert.date(chart_data_temp, 'date');
  window.mg_data_rh = MG.convert.date(chart_data_rh, 'date');
  window.mg_data_precip = MG.convert.date(chart_data_precip, 'date');

  function draw_graph(div, data, title) {
      MG.data_graphic({
        animate_on_load: true,
        title: title,
        description: "This is a simple line chart.",
        data: data,
        width: 480,
        height: 300,
        right: 40,
        target: div,
        x_accessor: 'date',
        y_accessor: 'value',
        min_y_from_data: true,
        show_secondary_x_label: false,
        show_confidence_band: ['l', 'u'],
        area: false,
        linked: true,
        missing_is_hidden: true,
    });
  }
   
    draw_graph($('#chart_temp')[0], window.mg_data_temp, "Temperature (C)");
    draw_graph($('#chart_rh')[0], window.mg_data_rh, "Relative Humidity (%)");
    draw_graph($('#chart_precip')[0], window.mg_data_precip, "Precipitation (mm/hour)");
    
    // scatter
    require(['scatter'], function(){
        window.d3=d3;
        var scatter_data=[];
        for (var i in chart_data_temp) {
            scatter_data = scatter_data.concat({temp: chart_data_temp[i].value,
                                                rh: chart_data_rh[i].value,
                                                precip: chart_data_precip[i].value});
        }
        
        var sm = new ScatterMatrix(scatter_data, 'chart_scatter');
        sm.render();
        $('.scatter-matrix-control').css('float', 'right');
        
    });
    
})


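The scatter loop in the cell above pairs the three series by array index, which relies on all three lists having the same length and order. A hedged sketch of an alternative, done on the Python side before the data reaches the browser, pairs the records by date instead. The helper `join_by_date` and the sample payloads are hypothetical.

```python
import json


def join_by_date(temp_json, rh_json, precip_json):
    """Build scatter-matrix records, matching the three series on their date."""
    temp = {r["date"]: r["value"] for r in json.loads(temp_json)}
    rh = {r["date"]: r["value"] for r in json.loads(rh_json)}
    precip = {r["date"]: r["value"] for r in json.loads(precip_json)}
    # Keep only the dates present in all three series, in chronological
    # order (ISO dates sort correctly as plain strings).
    common = sorted(set(temp) & set(rh) & set(precip))
    return [{"temp": temp[d], "rh": rh[d], "precip": precip[d]} for d in common]


# Hypothetical payloads; note that the first one is not in date order.
temp_json = '[{"date": "2015-06-02", "value": 30.5}, {"date": "2015-06-01", "value": 29.8}]'
rh_json = '[{"date": "2015-06-01", "value": 82.0}, {"date": "2015-06-02", "value": 79.5}]'
precip_json = '[{"date": "2015-06-01", "value": 0.4}, {"date": "2015-06-02", "value": null}]'

scatter_data = join_by_date(temp_json, rh_json, precip_json)
print(scatter_data)
```

Matching on the date keeps each point consistent even if one series is missing a day or arrives in a different order.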