<h1 style="text-align:center;text-decoration: underline">Getting Started Tutorial</h1>
<h1>Overview</h1>
<p>Welcome to the getting started tutorial for EpiData Lite's Jupyter Notebook interface. In this tutorial we will query, retrieve and analyze sample weather data acquired from a simulated wireless sensor network.</p>
<p><b>Note:</b> This tutorial assumes the EpiData Lite platform was started with the measurement-class="sensor_measurement" (default) setting in conf/application.conf. If the platform was started with the measurement-class="automated_test" setting, please follow the tutorial in the Automated Test folder instead.</p>

<h2>EpiDataLiteContext</h2>

<h3>1. Context and Modules Import</h3>
<p>As a first step, we will import the <i>EpiDataLiteContext</i> object <i>ec</i>. EpiDataLiteContext provides methods for querying batch data and running offline analytics on it.</p>

<p>We will also import the packages and modules required for this tutorial.</p>

In [None]:
from epidata.EpiDataLiteContext import ec

from datetime import datetime, timedelta
import pandas as pd
import matplotlib.pyplot as plt

print(ec)

<h3>2. Context Initialization</h3>
<p>Next, we initialize the EpiDataLiteContext object. This step opens the network connections required for querying data through the EpiDataLiteContext API.</p>

In [None]:
ec.init()

<h2>Data Ingestion</h2>

<h3>1. Download Python Script</h3>

<p>We will use the provided Python script <i>sensor_data_ingest_with_outliers.py</i> to simulate weather data and push it to the EpiData Lite platform. Download the example <i>sensor_data_ingest_with_outliers.py</i> from Jupyter Notebook's tree view as shown below.</p>
<img src="./static/jupyter_tree.png">

<h3>2. Ingest Data</h3>
<p>The next step is to run the Python script <i>sensor_data_ingest_with_outliers.py</i> using a Python 3 interpreter. The example sends simulated weather data to the EpiData Lite platform through its REST interface. You should see the status of each ingestion step in your standard output.</p>

<h2>Query and Retrieve</h2>

<h3>1. Query</h3>
<p>Data stored in the EpiData platform can be queried by specifying the primary data attributes, start time and stop time. Below are the primary data attributes for the current dataset:
<ul>
<li><i>company, site, station, sensor</i></li>
</ul>
</p>
<p>
We can use EpiDataLiteContext's <i>list_keys()</i> method to obtain the values of the primary data attributes for our simulated weather dataset.</p>

In [None]:
keys = ec.list_keys()

print(keys)

<p>Now that we know the valid values of the primary data attributes, we can specify them in EpiDataLiteContext's <i>query_measurements_original()</i> method. The method outputs the query result as a <i>Pandas DataFrame</i>.</p>

In [None]:
primary_key = {
    "company": "EpiData",
    "site": "San_Francisco",
    "station": "WSN-1",
    "sensor": ["Temperature_Probe", "Anemometer", "RH_Probe"],
}
start_time = datetime.strptime('01/01/2023 00:00:00', '%m/%d/%Y %H:%M:%S')
stop_time = datetime.strptime('01/01/2024 00:00:00', '%m/%d/%Y %H:%M:%S')

df = ec.query_measurements_original(primary_key, start_time, stop_time)
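
<p>The query window can also be computed rather than typed out as strings. The sketch below is only an illustration: <i>time_window</i> is a hypothetical helper, not part of the EpiDataLiteContext API. It uses the <i>timedelta</i> class imported earlier to derive the start time from the stop time.</p>

```python
from datetime import datetime, timedelta


def time_window(stop_time, days):
    """Hypothetical helper: return (start_time, stop_time) spanning `days` days."""
    return stop_time - timedelta(days=days), stop_time


# Same stop time as the query above, built with the datetime constructor
stop_time = datetime(2024, 1, 1)
start_time, stop_time = time_window(stop_time, days=365)

# The computed start matches the string-parsed start time used in the query
parsed_start = datetime.strptime('01/01/2023 00:00:00', '%m/%d/%Y %H:%M:%S')
print(start_time == parsed_start)
```

<p>The resulting <i>start_time</i> and <i>stop_time</i> values are ordinary <i>datetime</i> objects, so they can be passed to the query method in the same way as the parsed strings.</p>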

<h3>2. Retrieve</h3>
<p>Data is retrieved from the EpiData Lite platform as a <i>pandas DataFrame</i>. We can perform simple aggregation operations, such as counting the non-null values in each column, using the DataFrame's <i>count</i> method.</p>

In [None]:
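# DataFrame.count() returns one non-null count per column, not a single number
print("Number of non-null values per column:")
print(df.count())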
df.head(5)
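
<p>Because <i>count</i> reports non-null values per column, it can differ from the number of rows. The sketch below contrasts the common ways of counting on a small synthetic DataFrame. Only the <i>meas_name</i> and <i>meas_value</i> column names come from this tutorial; the rows and the non-temperature measurement names are invented for illustration.</p>

```python
import pandas as pd

# Synthetic stand-in for a query result (values are made up for illustration)
sample = pd.DataFrame({
    "meas_name": ["Temperature", "Temperature", "Wind_Speed", "Relative_Humidity"],
    "meas_value": [64.2, None, 8.1, 61.0],
})

n_rows = len(sample)                           # total number of rows
non_null = sample.count()                      # Series: non-null values per column
per_meas = sample.groupby("meas_name").size()  # number of rows per measurement name

print("Rows:", n_rows)
print(non_null)
print(per_meas)
```

<p>Use <i>len</i> when you want the record count, and <i>count</i> when you want to check for missing values in each column.</p>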

<h2>Data Analysis</h2>
<p>Once the data is available in a <i>pandas DataFrame</i>, we can call any of the data analysis functions available in the <i>pandas</i> library. Let's start by computing basic statistics, such as the minimum, maximum, mean, standard deviation and percentiles, for the temperature measurements.</p>

In [None]:
df = df.loc[df["meas_name"]=="Temperature"]
df["meas_value"].describe()
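
<p>By default <i>describe</i> reports the 25th, 50th and 75th percentiles. If other percentiles are of interest, <i>quantile</i> can compute them directly. The sketch below uses a few made-up temperature values rather than the queried data.</p>

```python
import pandas as pd

# Made-up temperature values, standing in for df["meas_value"]
temps = pd.Series([58.0, 60.0, 61.0, 63.0, 65.0, 70.0], name="meas_value")

summary = temps.describe()            # count, mean, std, min, 25%, 50%, 75%, max
tails = temps.quantile([0.05, 0.95])  # percentiles not reported by describe()

print(summary)
print(tails)
```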

<p>Next, we'll look at the distribution of the temperature measurements using a histogram.</p>

In [None]:
plt.rcParams["figure.figsize"] = [10,5]
plt.title("Histogram - Temperature Measurements")
plt.xlabel("Temperature (deg F)")
plt.ylabel("Frequency")

df["meas_value"].hist()
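
<p>The histogram groups values into 10 equal-width bins by default. To inspect the same kind of bin counts numerically, one option is NumPy's <i>histogram</i> function, sketched below on made-up values that include one unusually high reading.</p>

```python
import numpy as np
import pandas as pd

# Made-up temperature values with one unusually high reading
temps = pd.Series([58.0, 60.0, 61.0, 63.0, 65.0, 70.0, 120.0], name="meas_value")

# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges = np.histogram(temps, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

<p>Most values fall into the lowest bins, while the single high reading sits alone in the last bin, which is the pattern to look for in the plot.</p>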

<p>As we can see, most of the temperature measurements in our sample data are quite moderate. However, there are some measurements that are unusually high. Let's identify these outlier measurements using a simple method that compares each measurement with the sample mean and standard deviation.</p>

In [None]:
outliers = df.loc[abs(df["meas_value"] - df["meas_value"].mean()) > abs(3*df["meas_value"].std())]
print("Number of Outliers:", outliers["meas_value"].count())

outliers.head()
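
<p>The same rule can be wrapped in a small reusable function. This is a minimal sketch: <i>flag_outliers</i> is a hypothetical helper name, and the data is synthetic rather than the queried measurements.</p>

```python
import pandas as pd


def flag_outliers(values, n_sigma=3.0):
    """Hypothetical helper: True where a value lies more than
    n_sigma sample standard deviations away from the sample mean."""
    deviation = (values - values.mean()).abs()
    return deviation > n_sigma * values.std()


# Synthetic data: 30 moderate readings and one unusually high reading
temps = pd.Series([60.0] * 15 + [62.0] * 15 + [200.0], name="meas_value")

mask = flag_outliers(temps)
outliers = temps[mask]

print("Number of Outliers:", int(mask.sum()))
print(outliers)
```

<p>Note that the mean and standard deviation are themselves inflated by the outliers they are meant to detect, so with very small samples this rule may not flag anything.</p>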

<h2>Context Close</h2>
<p>Now, we can clear (reset) the EpiDataLiteContext object.</p>

In [None]:
ec.clear()

<h2>Next Steps</h2>
<p>Congratulations, you have successfully queried, retrieved and analyzed sample data acquired by a simulated weather station. The next step is to explore the various capabilities of <i>EpiData Lite</i> by creating your own Jupyter Notebook. Happy Data Exploring!</p>