Explore the data
Connect to your ArcGIS Online organization.

In [None]:
import pandas as pd

from arcgis.map.renderers import UniqueValueRenderer
from arcgis.gis import GIS

In [None]:
agol_gis = GIS()

Search for the Commercial Permits since 2010 layer. You can specify the owner's name to get more specific results. To search for content from the Living Atlas, or content shared by other users on ArcGIS Online, set outside_org=True.

In [None]:
data = agol_gis.content.search("title: Commercial Permits since 2010", "Feature layer",
                               outside_org=True)
data[0]

Get the first item from the results.

In [None]:
permits = data[0]

Since the item is a Feature Layer Collection, accessing the layers property gives us a list of FeatureLayer objects. The permit layer is the first layer in this item. Visualize this layer on a map of Montgomery County, Maryland.

In [None]:
permit_layer = permits.layers[0]

In [None]:
permit_map = agol_gis.map("Montgomery County, Maryland")
permit_map

In [None]:
permit_map.content.add(permit_layer)

Data Exploration
Convert the layer into a spatially enabled dataframe to explore its attributes.

In [None]:
permit_layer

In [None]:
sdf = pd.DataFrame.spatial.from_layer(permit_layer)

In [None]:
sdf.tail()

The permit data contains a long list of attributes. Some have self-explanatory names, while others are hard to interpret without context. You can list the attributes with the columns property of the dataframe. The code below first converts every column name except SHAPE to lowercase.

In [None]:
sdf.rename(columns=lambda x: x.lower() if x != "SHAPE" else x, inplace=True)
sdf.columns
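As a minimal sketch of what the rename step does, the following uses a small stand-in dataframe. The column names and values here are hypothetical, not the permit layer's actual schema; the non-in-place form of rename() is used so the original frame is left untouched.

```python
import pandas as pd

# Hypothetical columns standing in for the permit layer's attributes
df = pd.DataFrame(
    {"Use_Code": ["BUSINESS BUILDING"], "Status": ["Open"], "SHAPE": [None]}
)

# Lowercase every column name except the geometry column
renamed = df.rename(columns=lambda name: name if name == "SHAPE" else name.lower())
columns = list(renamed.columns)
print(columns)
```

Keeping SHAPE in uppercase matters because the spatial accessor looks for the geometry column under that name.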

In [None]:
sdf.describe().T

In [None]:
sdf.dtypes

In [None]:
sdf["work_type"].unique()

In [None]:
sdf["status"].unique()

In [None]:
sdf["use_code"].unique()

Permits by Status
The groupby() method groups rows by the values in a column and lets you run a calculation on each group, such as counting its rows with size(), as shown in the following code.

In [None]:
permits_by_status = sdf.groupby(sdf["status"]).size()
permits_by_status

status
Finaled      5341
Issued       4696
Open          757
Stop Work     430
dtype: int64
There are only four permit statuses: Issued, Finaled, Open, and Stop Work. To visualize the number of permits for each status, you'll create a pie chart.

Because size() counts rows rather than the values of a particular column, the result does not depend on which attribute you pick. The counts in permits_by_status are all you need to graph.
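To illustrate the point about counting rows, here is a small sketch on made-up status values (not the real permit data). It shows that groupby().size() and value_counts() give the same counts, because both count rows per status.

```python
import pandas as pd

# Toy status values, invented for illustration
df = pd.DataFrame(
    {"status": ["Issued", "Finaled", "Issued", "Open", "Finaled", "Issued"]}
)

# Count rows per status in two equivalent ways
by_group = df.groupby("status").size()
by_value = df["status"].value_counts().sort_index()

print(by_group)
```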

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
plt.axis("equal")
permits_by_status.plot(kind="pie", legend=False, label="Permits by Status")

The pie chart above shows the four permit statuses, with the size of each status determined by the number of permits. The vast majority of permits are either Issued or Finaled. Finaled permits are issued permits that have also had the requisite inspections performed.
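If you want numbers to go with the chart, the shares can be computed directly from the counts printed earlier. This sketch re-enters those four counts by hand instead of querying the layer.

```python
import pandas as pd

# Counts copied from the permits_by_status output above
counts = pd.Series({"Finaled": 5341, "Issued": 4696, "Open": 757, "Stop Work": 430})

# Fraction of all permits in each status
shares = counts / counts.sum()
issued_or_finaled = shares["Issued"] + shares["Finaled"]

print(shares.round(3))
print(round(issued_or_finaled, 3))
```

With these counts, Issued and Finaled together account for roughly 89 percent of the permits.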

It's helpful to visualize the spatial distribution of permit attributes on a map. You'll change the map so that each permit's symbol represents its status.

In [None]:
permits_by_status_map = agol_gis.map("Montgomery County, Maryland")
permits_by_status_map

In [None]:
status_value_infos = [
    {
        "value": "Stop Work",
        "symbol": {
            "type": "esriSMS",
            "style": "esriSMSCircle",
            "color": [205, 51, 46, 255],
            "size": 6,
        },
    },
    {
        "value": "Issued",
        "symbol": {
            "type": "esriSMS",
            "style": "esriSMSCircle",
            "color": [246, 132, 34, 255],
            "size": 6,
        },
    },
    {
        "value": "Open",
        "symbol": {
            "type": "esriSMS",
            "style": "esriSMSCircle",
            "color": [60, 159, 48, 255],
            "size": 6,
        },
    },
    {
        "value": "Finaled",
        "symbol": {
            "type": "esriSMS",
            "style": "esriSMSCircle",
            "color": [54, 117, 179, 255],
            "size": 6,
        },
    }
]
sdf.spatial.plot(
    map_widget=permits_by_status_map,
    renderer=UniqueValueRenderer(
        field1="status",
        unique_value_infos=status_value_infos
    )
)
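The four symbol dictionaries above differ only in their value and color, so they could also be generated. The helper below, make_value_infos, is a hypothetical convenience function and not part of the ArcGIS API; it only builds plain dictionaries in the same shape as the ones written out by hand.

```python
def make_value_infos(colors, style="esriSMSCircle", size=6):
    """Build unique-value entries shaped like the hand-written dictionaries."""
    return [
        {
            "value": value,
            "symbol": {
                "type": "esriSMS",
                "style": style,
                "color": list(rgba),
                "size": size,
            },
        }
        for value, rgba in colors.items()
    ]


# Same status colors as in the cell above
status_colors = {
    "Stop Work": (205, 51, 46, 255),
    "Issued": (246, 132, 34, 255),
    "Open": (60, 159, 48, 255),
    "Finaled": (54, 117, 179, 255),
}
generated_infos = make_value_infos(status_colors)
print(generated_infos[0])
```

The resulting list could be passed as unique_value_infos in the same way as status_value_infos.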

Permits by Type

In [None]:
permits_by_type = sdf.groupby(["use_code"]).size()
permits_by_type

The series is sorted by use code, not by count. Use the sort_values() method to sort it from highest count to lowest count. The most common use code, Business Buildings, has almost twice as many permits as the second highest, Multi-family Dwelling. The top four use codes together comprise the majority of all permits, so these use codes may be the most important to focus on in your analysis later.

In [None]:
permits_by_type.sort_values(ascending=False, inplace=True)
permits_by_type.head()
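One way to check a claim like "the top four comprise the majority" is a cumulative share over the sorted counts. The sketch below uses invented counts with placeholder labels A to E, so the numbers say nothing about the real permit data.

```python
import pandas as pd

# Invented counts with placeholder category labels
counts = pd.Series({"A": 10, "B": 40, "C": 5, "D": 25, "E": 20})

# Sort from highest to lowest, then accumulate each category's share
ranked = counts.sort_values(ascending=False)
cumulative_share = ranked.cumsum() / ranked.sum()

print(cumulative_share)
```

Reading the fourth row of cumulative_share gives the fraction covered by the top four categories.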

Clean up the data
Before you begin analysis of your data, you'll hide attribute fields you don't intend to use, rename fields with unclear names, and filter your dataset to only show permits with the four most common use codes. These changes won't permanently affect the original dataset, but they will make the data easier to work with and understand.

The Declared_V, Building_A, and Applicatio attribute fields (now lowercase in the dataframe) describe aspects of the data that aren't important for your analysis. You'll drop these fields.

In [None]:
sdf.drop(["declared_v", "building_a", "applicatio"], axis=1, inplace=True)
sdf.columns

Next, you'll rename some of the attribute fields with shortened or unclear names so that their names are more descriptive.

In [None]:
sdf.rename(columns={"descriptio": "Description",
                    "bldgareanu": "Building_Area",
                    "declvalnu": "Declared_Value"}, inplace=True)

In [None]:
sdf.columns


There are other fields that you may want to either rename or remove, but for the purposes of this lesson, these are enough.
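The same cleanup can be written without inplace=True, which makes it explicit that the source frame is not modified. This is only a sketch on a one-row stand-in frame; the values are hypothetical.

```python
import pandas as pd

# One-row stand-in using a few of the truncated field names; values are made up
raw = pd.DataFrame(
    {"descriptio": ["New office"], "bldgareanu": [1200.0], "applicatio": ["A-1"]}
)

# Drop and rename in one chain; each call returns a new dataframe
clean = (
    raw.drop(columns=["applicatio"])
       .rename(columns={"descriptio": "Description", "bldgareanu": "Building_Area"})
)

print(list(clean.columns))
```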

Filter the permits
Next, you'll filter the permits to reduce the number of records in your analysis. As you saw previously, there are four types of permits that comprise over half the total number of permits. Focusing your analysis on just these four types will reduce the amount of data to analyze without ignoring the most important types of development. To remove the other use codes, you'll create a filter.

In [None]:
permits_by_type.head(4)  # top 4 Use_Codes

In [None]:
filtered_permits = list(permits_by_type.head(4).index)
filtered_permits

In [None]:
filtered_df = sdf[sdf["use_code"].isin(filtered_permits)]
sdf.shape, filtered_df.shape

The dataset is filtered. Instead of more than 11,000 permits, the filtered dataframe has about 7,500.
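As a toy version of this kind of filter, the sketch below keeps only the rows whose category is among the most frequent ones, using isin(). The single-letter use codes are placeholders, and the top two are kept instead of the top four to keep the example small.

```python
import pandas as pd

# Placeholder use codes, invented for illustration
df = pd.DataFrame({"use_code": ["A", "B", "A", "C", "D", "E", "B", "A"]})

# Find the two most frequent codes, then keep only rows with those codes
top_codes = list(df["use_code"].value_counts().head(2).index)
subset = df[df["use_code"].isin(top_codes)]

print(top_codes, len(df), len(subset))
```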

Visualize filtered dataset

In [None]:
filtered_map = agol_gis.map("Montgomery County, Maryland")
filtered_map

In [None]:
use_code_value_infos = [
    {
        "value": "MULTI-FAMILY DWELLING",
        "symbol": {
            "type": "esriSMS",
            "style": "esriSMSDiamond",
            "color": [75, 210, 254, 255],
            "size": 6,
        },
    },
    {
        "value": "MERCANTILE BUILDING",
        "symbol": {
            "type": "esriSMS",
            "style": "esriSMSDiamond",
            "color": [250, 230, 38, 255],
            "size": 6,
        },
    },
    {
        "value": "BUSINESS BUILDING",
        "symbol": {
            "type": "esriSMS",
            "style": "esriSMSDiamond",
            "color": [121, 14, 6, 255],
            "size": 6,
        },
    },
    {
        "value": "COMMERCIAL MISCELLANEOUS STRUC",
        "symbol": {
            "type": "esriSMS",
            "style": "esriSMSDiamond",
            "color": [19, 0, 126, 255],
            "size": 6,
        },
    }
]
filtered_df.spatial.plot(
    map_widget=filtered_map,
    renderer=UniqueValueRenderer(
        field1="use_code",
        unique_value_infos=use_code_value_infos
    )
)

Visualize temporal and spatial trends
Your data shows permits, but what do these permits say about when and where growth is happening in the county? Your data also contains temporal attribute fields, such as Added_Date, which indicates when a permit was first added to the system. This field can be split into components that break down the data by year, month, and even hour.

Split the Added_Date field to get the year, month, and day_of_week.

In [None]:
sdf["datetime"] = pd.to_datetime(sdf["added_date"], unit="ms")
sdf["year"] = sdf["datetime"].dt.year
sdf["month"] = sdf["datetime"].dt.month
sdf["day_of_week"] = sdf["datetime"].dt.dayofweek
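To make the conversion concrete, here is a small sketch that turns two epoch timestamps in milliseconds into dates and extracts the same three parts. The timestamps are chosen for illustration (midnight UTC on 1 January 2011 and 1 July 2011) and are not taken from the permit layer.

```python
import pandas as pd

# Two illustrative epoch timestamps in milliseconds
added = pd.Series([1293840000000, 1309478400000])

# Interpret the integers as milliseconds since 1970-01-01
stamps = pd.to_datetime(added, unit="ms")

parts = pd.DataFrame(
    {
        "year": stamps.dt.year,
        "month": stamps.dt.month,
        "day_of_week": stamps.dt.dayofweek,  # 0 is Monday, 6 is Sunday
    }
)
print(parts)
```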

Visualize permits by time of issue
You'll create charts for the year, month, and day_of_week fields to visualize patterns in permit activity over time.

In [None]:
import seaborn as sns
sns.set_palette("colorblind")
sns.countplot(data=sdf, x="year", hue="year", palette="deep", legend=False)

The chart shows the number of permits issued each year since 2010. (The year 2017 has significantly fewer permits because the dataset only covers part of 2017.) You can compare the number of permits visually by the size of each bar. Although some fluctuation occurs from year to year, most years had similar permit activity.

Similarly, you can visualize the permits by month or by day_of_week.

In [None]:
sns.countplot(data=sdf, x="day_of_week", hue="day_of_week",
              palette="pastel", legend=False)

Almost all permit activity occurs on weekdays (pandas numbers the days from 0 for Monday to 6 for Sunday). Government offices are closed on weekends, so few permits are issued then.
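Since the chart labels days by number, it can help to translate them into names or a weekend flag. The following is a sketch on four hand-picked dates from 2011, not on the permit data.

```python
import pandas as pd

# Four hand-picked dates: Monday, Tuesday, Friday, Saturday
stamps = pd.Series(
    pd.to_datetime(["2011-06-27", "2011-06-28", "2011-07-01", "2011-07-02"])
)

# Human-readable day names and a weekend flag (5 is Saturday, 6 is Sunday)
day_names = stamps.dt.day_name()
is_weekend = stamps.dt.dayofweek >= 5
weekend_share = is_weekend.mean()

print(day_names.tolist(), weekend_share)
```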

In [None]:
ddf = sdf.set_index("datetime")
ddf["num"] = 1
ddf["num"].resample("M").sum().plot()

A huge spike in permit activity occurred in mid-2011. What caused this spike? Is it an increase in overall permit activity, or is it mostly an increase in a certain type of permit? You'll plot the number of permits based on Use_Code to find which one caused the spike.

In [None]:
fig = plt.figure(figsize=(15, 5))
ax = fig.add_subplot(1, 1, 1)

ax.plot(ddf["num"].resample("M").sum(), "k", label="Total permits")
for use_code in filtered_permits:
    x = ddf[ddf.use_code == use_code]["num"].resample("M").sum()
    ax.plot(x, label=use_code)
ax.legend()

Based on the legend, permit activity spiked in 2011 due to a sharp increase in the number of multifamily dwelling permits issued. This likely means that there was large residential growth in 2011.
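The same question can be answered with a table instead of a chart. The sketch below counts rows per month and use code on a handful of invented records, then looks up which use code dominates the busiest month. It groups by monthly periods with to_period("M") as an alternative to resample(); this is an illustrative variation, not the method used above.

```python
import pandas as pd

# Invented records; only the two use code labels come from the tutorial
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2011-05-03", "2011-06-10", "2011-06-21",
             "2011-06-30", "2011-06-12", "2011-07-15"]
        ),
        "use_code": [
            "BUSINESS BUILDING", "MULTI-FAMILY DWELLING", "MULTI-FAMILY DWELLING",
            "MULTI-FAMILY DWELLING", "BUSINESS BUILDING", "BUSINESS BUILDING",
        ],
    }
)

# Rows per (month, use code), with use codes spread across columns
month = df["datetime"].dt.to_period("M")
table = df.groupby([month, "use_code"]).size().unstack(fill_value=0)

# Busiest month overall, and the use code contributing most to it
peak_month = table.sum(axis=1).idxmax()
driver = table.loc[peak_month].idxmax()

print(table)
print(peak_month, driver)
```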

You've investigated some temporal patterns in your data. Next, you'll look at spatial patterns. Are there certain areas in the county that have experienced a relatively high degree of permit activity? Was the 2011 spike in residential permits in a specific location? To find out, you'll change the symbology of the map to show hot spots, or areas with concentrations of points.

In [None]:
hotspot_map = agol_gis.map("Germantown, Montgomery County, Maryland")
hotspot_map

In [None]:
sdf.spatial.plot(map_widget=hotspot_map)

In [None]:
sdf_sm = hotspot_map.content.renderer(0).smart_mapping()
sdf_sm.heatmap_renderer()

The hot spots show up where there is a high concentration of permits. The highest concentration areas are in the southeast and northwest corners of the county, which correspond to the suburban communities near Washington, D.C., and the major population center of Germantown, respectively.

Next, you'll see if the 2011 permit spike corresponds to a specific area of the map. The code below filters the dataframe to only show permits from 2011 and highlights related data in the map. In this case, the heat map changes to show the hot spot in the northwest part of the county, near Germantown.

In [None]:
hotspot_2011_map = agol_gis.map("Germantown, Montgomery County, Maryland")
hotspot_2011_map

In [None]:
sdf.loc[sdf.year == 2011].copy().spatial.plot(map_widget=hotspot_2011_map)
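The filter in the cell above combines a boolean mask with copy(). As a sketch of why the copy is useful, the example below filters a small made-up frame and then modifies the result without touching the original. The permit_id and flag columns are hypothetical.

```python
import pandas as pd

# Made-up frame; permit_id is a hypothetical column
df = pd.DataFrame({"year": [2010, 2011, 2011, 2012], "permit_id": [1, 2, 3, 4]})

# Boolean mask selects the 2011 rows; copy() makes the result independent
only_2011 = df.loc[df["year"] == 2011].copy()
only_2011["flag"] = True  # changes the copy only

print(only_2011)
```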

In [None]:
sedf_2011_sm = hotspot_2011_map.content.renderer(0).smart_mapping()
sedf_2011_sm.heatmap_renderer()