# Visualizing Alaska Lightning Detection Network Data

In this notebook you will learn:
  1. how to download and visualize geospatial data using geopandas
  2. how to perform quick statistics and exploratory data analysis
  3. how to visualize spatial data using Python
  
The data we will be working with in this notebook comes from the Alaska Lightning Detection Network (ALDN).
The ALDN data is updated daily with the lightning strikes observed that day. For this part of the work
we will look at this year's lightning strikes. Follow the cells below to analyze some of the data.

## 1. Download the Data

An important step in any research project is to download and clean the necessary data so that we work from a high-quality dataset. In the next cell we will use the `curl` tool to download data from the Alaska Interagency Coordination Center (AICC).

In [None]:
!curl 'https://fire.ak.blm.gov/content/maps/aicc/Data/Data%20(zipped%20Shapefiles)/CurrentYearLightning_SHP.zip' --output CurrentYearLightning_SHP.zip

Note that by adding `!` before the command above, we can run shell commands (which are not Python) directly from the notebook. Once the data is downloaded, we can see that it is in `.zip` format, which means that the data is compressed.
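If `curl` is not available on your system, the same download can be sketched in pure Python with the standard library. The helper name `download_file` is hypothetical and not part of the original notebook; it simply streams whatever URL it is given to a local file.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str) -> Path:
    """Stream the resource at `url` into `destination` and return its path.

    `download_file` is a hypothetical helper, not part of the notebook.
    """
    target = Path(destination)
    # urlopen returns a file-like response; copyfileobj streams it to disk
    # in chunks so the whole archive never has to fit in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download_file(url, 'CurrentYearLightning_SHP.zip')` with the AICC URL from the `curl` cell should produce the same zip file.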

Search on Google: why do we compress data into zip format? To read the data, we first need to uncompress it. Before that, let's list the files available in our directory.

In [None]:
!ls

Note that there is the `CurrentYearLightning_SHP.zip` file which we just downloaded. Let's start working on some Python code to uncompress and open the file.

## 2. Uncompress the Data

Here we will import several Python packages and will look for the file in our current directory.

In [None]:
import shutil
import leafmap
from glob import glob
import geopandas as gpd
import matplotlib.pyplot as plt

After importing the necessary libraries, let's find the zip file in our current directory:

In [None]:
filename = glob('*.zip')
print(filename)

Note that the list above has the filename to uncompress. Let's uncompress our zip file using Python:

In [None]:
shutil.unpack_archive(filename[0], '.')

After uncompressing the file, let's look at the contents of our directory:

In [None]:
!ls

Note that we now have several new files. We are mostly interested in the `.shp` file in that directory.
The shapefile contains rows and columns of spatially referenced data.
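A shapefile is really a small family of files sharing one base name: `.shp` holds the geometry, `.shx` an index into it, and `.dbf` the attribute table, with `.prj` commonly describing the coordinate reference system. As a minimal sketch, the hypothetical helper below (`missing_sidecars` is not part of the notebook) reports which of the mandatory companions are absent, which can help diagnose a failed read. It assumes lowercase file extensions.

```python
from pathlib import Path

# The three files every shapefile must have; .prj is optional but common.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_sidecars(shp_path: str) -> list[str]:
    """Return the mandatory shapefile suffixes missing next to `shp_path`."""
    base = Path(shp_path)
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not base.with_suffix(suffix).exists()
    ]
```

An empty list means all three mandatory files were extracted next to each other.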

## 3. Open Shapefile

Let's find the shapefile's filename, open it, and compute some statistics.

In [None]:
lightning_filename = glob('*.shp')
print(lightning_filename)

Here we use the `geopandas` package to open our shapefile and look at some of the content.

In [None]:
gdf = gpd.read_file(lightning_filename[0])
gdf.head()

How many columns do we have in the dataset? What is the size of our dataset?

In [None]:
gdf.shape

## 4. Exploratory Data Analysis

Here we will look at our data to understand it. From the output above we can see that there is a date for when each lightning strike occurred, and there are also different types of strokes. Let's see what the different stroke types are.

In [None]:
gdf['STROKETYPE'].unique()

Note that we selected the `STROKETYPE` column, and then grabbed the unique values from that column. In this case, we only have two types of strokes: `GROUND_STROKE` and `CLOUD_STROKE`. We are only interested in `GROUND_STROKE`, so we will clean our dataset to only include those.

In [None]:
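# .copy() makes the filtered table independent, so adding columns later
# does not trigger a SettingWithCopyWarning
gdf = gdf[gdf['STROKETYPE'] != 'CLOUD_STROKE'].copy()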
gdf['STROKETYPE'].unique()

Note that we now have a single stroke type, which is `GROUND_STROKE`. How many samples did we lose because they were cloud strokes? We can simply compare the shape from before with the new one. We removed close to 2000 rows that were `CLOUD_STROKE`.

In [None]:
gdf.shape
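Rather than comparing the two shapes by eye, the number of dropped rows can be computed directly. The sketch below uses a small made-up table in place of the real `gdf` (the values are illustrative only) and relies on the fact that a GeoDataFrame supports the same pandas operations.

```python
import pandas as pd

# Toy stand-in for the lightning table; values are invented for illustration.
strikes = pd.DataFrame(
    {"STROKETYPE": ["GROUND_STROKE", "CLOUD_STROKE", "GROUND_STROKE",
                    "GROUND_STROKE", "CLOUD_STROKE"]}
)

rows_before = len(strikes)
# Keep only ground strokes; .copy() makes the result an independent table.
ground_only = strikes[strikes["STROKETYPE"] == "GROUND_STROKE"].copy()
rows_removed = rows_before - len(ground_only)

# value_counts gives the per-type breakdown of the unfiltered table.
counts = strikes["STROKETYPE"].value_counts()
print(f"Removed {rows_removed} of {rows_before} rows")
```

On the real data, calling `value_counts()` on `STROKETYPE` before filtering would show the exact number of cloud strokes that get removed.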

## 5. Data Cleaning

One thing to always keep in mind is to not trust the data blindly. Based on what you have seen in the readings from this project, our fire season generally spans April to October at most. Thus we will clean the data to only include those observations. First, let's see which months we have data for.

In [None]:
gdf.head()

In [None]:
gdf.columns

Since we only have the full date of each lightning strike, we want to add a new column to our dataset with the month. For this we will extract the month from the `LOCALDATET` column, and create a new `MONTH` column.

In [None]:
gdf['MONTH'] = gdf['LOCALDATET'].str[5:7].astype(int)
gdf.head()
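String slicing works as long as every timestamp has exactly the same layout. A more defensive sketch parses the column into real datetimes first. This assumes `LOCALDATET` holds text such as `2023-06-15 14:32:10`, which the `[5:7]` slice suggests but the notebook does not confirm; the sample values below are invented.

```python
import pandas as pd

# Invented timestamps in the assumed 'YYYY-MM-DD HH:MM:SS' layout.
strikes = pd.DataFrame(
    {"LOCALDATET": ["2023-04-30 23:59:01", "2023-06-15 14:32:10",
                    "2023-06-16 02:10:45"]}
)

# Parse once, then read month and day from the datetime accessor.
timestamps = pd.to_datetime(strikes["LOCALDATET"], format="%Y-%m-%d %H:%M:%S")
strikes["MONTH"] = timestamps.dt.month
strikes["DAY"] = timestamps.dt.day
print(strikes)
```

If the real column uses a different layout, the `format` string would need to change accordingly; a mismatch raises an error instead of silently producing wrong months.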

Now that we have a new `MONTH` column, let's start visualizing the data. Let's do a bar plot of strikes per month.

In [None]:
gdf.MONTH.value_counts().sort_index().plot(kind='bar')

What can we say from this plot? What is the x axis? What is the y axis? To answer this we need to label the axes. Let's go ahead and label both the x and y axes.

In [None]:
ax = gdf.MONTH.value_counts().sort_index().plot(kind='bar')
ax.set_xlabel("Month")
ax.set_ylabel("Number of strikes")

Here we can see that the month with the highest number of strikes was June. What is missing in this plot? We are missing a title. Try to add a title to this figure so we can have it ready for our paper. Uncomment the lines below and Google "how to add a title to a pandas plot?". Hint: you will need to add a `title` argument to the plot function. If you want to experiment further, try adding different colors.

In [None]:
# ax = gdf.MONTH.value_counts().sort_index().plot(kind='bar')
# ax.set_xlabel("Month")
# ax.set_ylabel("Number of strikes")

We want only the months of fire season, so let's clean up the dataset and remove data from January through March.

In [None]:
# .copy() keeps the filtered table independent before we add the DAY column
gdf = gdf[gdf['MONTH'] >= 4].copy()
gdf.MONTH.unique()

In [None]:
ax = gdf.MONTH.value_counts().sort_index().plot(kind='bar')
ax.set_xlabel("Month")
ax.set_ylabel("Number of strikes")
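The filter above drops January through March, but it would keep any November or December strikes. If the intent is to enforce the April-October season on both ends, one option is `Series.between`, sketched here on invented month values.

```python
import pandas as pd

# Invented month numbers standing in for the real MONTH column.
strikes = pd.DataFrame({"MONTH": [2, 4, 6, 6, 10, 11]})

# between() is inclusive on both ends by default, so 4 and 10 are kept.
in_season = strikes["MONTH"].between(4, 10)
fire_season = strikes[in_season].copy()
print(fire_season["MONTH"].tolist())
```

On the real data the equivalent would be `gdf[gdf['MONTH'].between(4, 10)]`.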

Let's visualize the individual days of June and see which days had the most lightning. For that we need to add a day column. Run the cell below to add the `DAY` column.

In [None]:
gdf['DAY'] = gdf['LOCALDATET'].str[8:10].astype(int)
gdf.head()

In [None]:
ax = gdf[gdf['MONTH'] == 6].DAY.value_counts().sort_index().plot(kind='bar')
ax.set_xlabel("Day")
ax.set_ylabel("Number of strikes")
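One caveat with `value_counts` is that days without any strikes do not appear at all, so the bar chart silently skips them. A possible fix, sketched on invented data, is to reindex the counts over every day of June and fill the gaps with zero.

```python
import pandas as pd

# Invented June strike days; only days 1 and 3 have any strikes.
june_days = pd.Series([1, 1, 3, 3, 3], name="DAY")

daily_counts = june_days.value_counts().sort_index()
# June has 30 days; reindex adds the missing ones with a count of 0.
full_june = daily_counts.reindex(range(1, 31), fill_value=0)
print(full_june.head())
```

Plotting `full_june` with `plot(kind='bar')` would then show all 30 days, including the quiet ones.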

- When did we get the highest number of strikes in June?
- Can you change the X axis values to replace 4,5,6,7 with the actual month names?
- Can you plot the data as a line?

## 6. Visualize the Data

Let's visualize some of the data using its spatial information. Let's look at the first 400 points from June. All of the blue markers below are lightning strikes recorded in the ALDN database.

In [None]:
m = leafmap.Map(center=[20, 0], zoom=1)
m.add_gdf(gdf[gdf['MONTH'] == 6].head(400), "June Lightning")
m
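Note that `head(400)` returns the first 400 rows in file order, which may all come from the same few days. For a random subset, `sample` is an alternative. The sketch uses an invented table (the `STRIKE_ID` column is hypothetical), and `random_state=42` is an arbitrary seed chosen here so the draw is repeatable.

```python
import pandas as pd

# Invented table of 1000 June rows standing in for the filtered gdf.
june = pd.DataFrame({"MONTH": [6] * 1000, "STRIKE_ID": range(1000)})

# Cap n at the table size so sample() never asks for more rows than exist.
n_points = min(400, len(june))
random_subset = june.sample(n=n_points, random_state=42)
print(len(random_subset))
```

On the real data, the sampled result of `gdf[gdf['MONTH'] == 6]` could be passed to `m.add_gdf` in place of the `head(400)` selection.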

## 7. Closing Thoughts

We opened spatial data and used Python to uncompress and clean it. We used geopandas and pandas to read the data and matplotlib to visualize some of it. We also used leafmap for some interactive mapping.

For when we meet next week:
- Any ideas on how we can create visualizations to help us understand how good our models are compared to the data you just visualized here?
- Any other ideas on how to visualize this data to make it look cool for a wider audience?
- Anything interesting you might want to do with this data?