# Unknown Feature

In this notebook, I try to figure out what the feature "geopotential_height_zerodegc_isotherm" is.

I expect it to be some kind of height, but not the height of the ground.

## Initializing

First we build the schema and load the data into Spark. I'm loading the smaller data set for a faster turnaround time.

In [1]:
from pyspark.sql.types import StructType, StructField, FloatType, LongType, StringType

In [2]:
hdfs_port = "hdfs://orion11:26990"
# data_path = "/nam_s/nam_201501_s*"
data_path = "/nam_s/*"
# data_path = "/sample/nam_tiny*"

In [3]:
feats = []
# features.txt lists one column name per line, in column order.
# Open it with a context manager so the file handle is always closed.
with open('../features.txt') as f:
    for line_num, line in enumerate(f):
        line = line.strip()
        if line_num == 0:
            # Timestamp
            feats.append(StructField(line, LongType(), True))
        elif line_num == 1:
            # Geohash
            feats.append(StructField(line, StringType(), True))
        else:
            # Other features
            feats.append(StructField(line, FloatType(), True))

schema = StructType(feats)

In [4]:
%%time

df = spark.read.format('csv').option('sep', '\t').schema(schema).load(f'{hdfs_port}{data_path}')

CPU times: user 2.61 ms, sys: 554 µs, total: 3.17 ms
Wall time: 7.2 s


## Description

After running `describe`, we see that the mean height is about 3,093, far above sea level and much higher than I'd expect for the elevation of the ground. The values range from 0 to 6,380. I'm not yet sure what this height means.

In [5]:
df.describe("geopotential_height_zerodegc_isotherm").show()

+-------+-------------------------------------+
|summary|geopotential_height_zerodegc_isotherm|
+-------+-------------------------------------+
|  count|                            108000000|
|   mean|                   3093.3220493037425|
| stddev|                    1753.024775297478|
|    min|                                  0.0|
|    max|                               6380.0|
+-------+-------------------------------------+



## Research

After some minimal research, I found that it is the "lower tropospheric zero degree Celsius isotherm" (from http://www.theweatherprediction.com/habyhints/98/). I'm assuming this means the height at which the air temperature is 0 degrees Celsius, i.e. the freezing level.
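
As a rough sanity check on that interpretation, the sketch below asks what sea-level temperature would put the freezing level at the heights reported by `describe`. It rests on two assumptions that the data set does not confirm: that the heights are in meters, and that temperature falls at a constant 6.5 °C per km (the standard-atmosphere lapse rate for the troposphere). The helper `implied_sea_level_temp_c` is a hypothetical name used only for illustration.

```python
# Assumption: constant lapse rate of 6.5 degC per km (standard atmosphere)
# and heights in meters. Neither is confirmed by the data set itself.
LAPSE_RATE_C_PER_M = 6.5 / 1000.0


def implied_sea_level_temp_c(freezing_level_m, lapse_rate=LAPSE_RATE_C_PER_M):
    """Sea-level temperature (degC) that would place the 0 degC isotherm
    at the given height, under a constant lapse rate."""
    return freezing_level_m * lapse_rate


# Summary statistics copied from the describe() output above.
stats = {"min": 0.0, "mean": 3093.3220493037425, "max": 6380.0}

implied = {name: implied_sea_level_temp_c(h) for name, h in stats.items()}
for name, temp in implied.items():
    print(f"{name:>4}: {stats[name]:7.1f} m -> about {temp:5.1f} degC at sea level")
```

Under these assumptions the mean corresponds to roughly 20 °C at sea level and the maximum to roughly 41 °C, both plausible weather values, which is consistent with reading the feature as the freezing level. My guess is that the minimum of 0 covers places where the air is already at or below freezing at the surface, but I haven't verified that.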