To run a cell, press `Shift` + `Return`. Run the cell below to get started.

In [None]:
from workshop_utils import * 
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import datetime
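# tqdm_notebook is deprecated; tqdm.notebook provides the Jupyter progress bar
from tqdm.notebook import tqdm
# Register progress_apply and friends on pandas objects
tqdm.pandas()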
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# All Edits Example



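The following query retrieves every edit ever made to objects inside a bounding box around Haiti, joined to the changesets those edits belong to. The `central_america` history table covers the wider region; the `ST_WITHIN` filter keeps only objects inside the Haiti box. The result is large, so expect the download to take some time.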

```sql
SELECT
  central_america.id,
  geom,
  central_america.tags, changeset, updated, valid_until, version, minor_version,
  changesets.id AS c_id,
  changesets.tags AS c_tags,
  changesets.uid,
  changesets.user
FROM central_america
JOIN changesets ON central_america.changeset = changesets.id
WHERE ST_WITHIN(
  geom,
  ST_Polygon('POLYGON((-74.4862 20.1269, -71.5923 20.1269, -71.5923 17.9824, -74.4862 17.9824, -74.4862 20.1269))')
)
```
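The polygon passed to `ST_Polygon` is simply the bounding box around Haiti written as WKT: the corners are listed as `lon lat` pairs and the first corner is repeated to close the ring. A small helper such as the hypothetical `bbox_to_wkt` below (it is not part of `workshop_utils`) makes it easy to swap in a different area of interest; called with Haiti's extent, it reproduces the string used in the query.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT POLYGON for an axis-aligned bounding box.

    Hypothetical helper: coordinates are written as "lon lat" pairs and the
    first vertex is repeated at the end to close the ring.
    """
    corners = [
        (min_lon, max_lat),  # north-west
        (max_lon, max_lat),  # north-east
        (max_lon, min_lat),  # south-east
        (min_lon, min_lat),  # south-west
        (min_lon, max_lat),  # back to north-west to close the ring
    ]
    ring = ", ".join("{} {}".format(lon, lat) for lon, lat in corners)
    return "POLYGON(({}))".format(ring)


haiti_wkt = bbox_to_wkt(-74.4862, 17.9824, -71.5923, 20.1269)
print(haiti_wkt)
```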

In [None]:
edits = load_dataframe_from_s3('https://us-east-2.console.aws.amazon.com/athena/query/results/aec9795f-0e38-478e-b884-c3f531b5e712/csv')
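`load_dataframe_from_s3` comes from the workshop's helper module. If you download the query result CSV yourself instead, plain `pandas.read_csv` can load it and parse the timestamps in one step. This sketch assumes the file has an `updated` column in `YYYY-MM-DD HH:MM:SS.fff` text form; the tiny inline CSV and its rows are made up so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for a downloaded query result file; the rows are made up.
csv_text = """id,changeset,updated,version,minor_version,c_id,uid
101,5001,2017-03-01 10:15:00.000,1,0,5001,42
101,5002,2017-03-02 08:00:00.000,2,0,5002,43
"""

# parse_dates converts the 'updated' column to datetime64 while reading.
edits_local = pd.read_csv(io.StringIO(csv_text), parse_dates=['updated'])
print(edits_local.dtypes['updated'])
```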


In [None]:
edits.head(2)

In [None]:
# Parse the `updated` strings into pandas Timestamps
edits['updated'] = edits.updated.progress_apply(pd.Timestamp)

In [None]:
edits['date'] = edits.updated.progress_apply(pd.Timestamp.date)
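`progress_apply` parses each row separately, which is easy to monitor but slow on a long edit history. A vectorized alternative, sketched here on three made-up timestamps, is `pd.to_datetime` followed by the `.dt` accessor. `.dt.date` gives the same Python `date` objects as the per-row `pd.Timestamp.date`, while `.dt.floor('D')` is an option if you prefer to keep a datetime dtype.

```python
import pandas as pd

# Toy stand-in for the 'updated' column; the values are made up.
updated = pd.Series(['2017-03-01 10:15:00', '2017-03-01 23:59:00', '2017-03-02 00:01:00'])

# One vectorized parse instead of calling pd.Timestamp on every row.
updated = pd.to_datetime(updated)

# .dt.date yields datetime.date objects, like pd.Timestamp.date per row;
# .dt.floor('D') truncates to midnight but keeps the datetime64 dtype.
as_dates = updated.dt.date
as_days = updated.dt.floor('D')
print(as_dates.tolist())
print(as_days.tolist())
```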

In [None]:
gb_date = edits.groupby('date').aggregate({
    'c_id':'count',
    'id':'nunique',
    'uid':'nunique'
});
gb_date.head(2)
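The aggregation mixes two kinds of reducers. `'count'` on `c_id` counts rows, so it measures edits per day rather than distinct changesets, while `'nunique'` on `id` and `uid` counts distinct objects and distinct editors. The made-up rows below show the difference: two edits in the same changeset are still counted twice.

```python
import pandas as pd

# Made-up rows: each row is one edit (one version of one object).
toy = pd.DataFrame({
    'date': ['2017-03-01', '2017-03-01', '2017-03-01', '2017-03-02'],
    'c_id': [5001, 5001, 5002, 5003],  # changeset of the edit
    'id':   [101, 102, 101, 103],      # OSM object id
    'uid':  [42, 42, 43, 42],          # editor's user id
})

per_day = toy.groupby('date').aggregate({
    'c_id': 'count',   # number of edits (rows) that day
    'id': 'nunique',   # distinct objects touched
    'uid': 'nunique',  # distinct active editors
})
print(per_day)
```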

In [None]:
sns.set_style('whitegrid')
ax = gb_date['c_id'].plot(style='.', figsize=(14,4))
ax.set_xlabel("Date"); ax.set_ylabel("");
ax.set_title("Edits in Haiti",fontsize=16);

In [None]:
sns.set_style('whitegrid')
ax = gb_date['uid'].plot(style='.', figsize=(14,4))
ax.set_xlabel("Date"); ax.set_ylabel("");
ax.set_title("Unique editors active per day in Haiti",fontsize=16);

### `minor_version`

We can use the `minor_version` attribute to identify geometry updates to buildings. Such adjustments are often a form of validation, so spikes in how often they occur can point to map-validation activity.
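Following the paragraph above, the sketch treats any row with `minor_version > 0` as a geometry update and counts those rows per day, which is what the plot below does on the real data. The two-way history is made up for illustration.

```python
import pandas as pd

# Made-up history for two ways; rows with minor_version > 0 are treated
# as geometry adjustments.
toy = pd.DataFrame({
    'id':            [101, 101, 101, 102, 102],
    'version':       [1, 1, 2, 1, 1],
    'minor_version': [0, 1, 0, 0, 1],
    'date':          ['2017-03-01', '2017-03-02', '2017-03-02', '2017-03-01', '2017-03-02'],
})

# Keep only the geometry adjustments, then count them per day.
adjustments = toy[toy['minor_version'] > 0]
per_day = adjustments.groupby('date')['id'].count()
print(per_day)
```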

In [None]:
ax = edits[edits.minor_version > 0].groupby('date')['id'].count().plot(figsize=(14,4))
ax.set_title("Adjustments to geometries each day in Haiti",fontsize=16);

What is the most edited object? 

In [None]:
# Treat the object that reached the highest version number as the most edited one
most_edited_id = edits.loc[edits['version'].idxmax(), 'id']

# Pull that object's full history, newest edit first
most_edited_object = edits[edits['id'] == most_edited_id].sort_values(
    by=['version', 'minor_version'], ascending=False)

print("Most edited building: http://openstreetmap.org/way/{}\nThe three most recent edits:".format(most_edited_id))

most_edited_object.head(3)
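Ranking objects by their highest `version`, as the cell above does, counts only full version bumps. If geometry adjustments (non-zero `minor_version`) should also count as edits, ranking by the number of rows per `id` is an alternative, and the two definitions can disagree. The made-up history below shows both.

```python
import pandas as pd

# Made-up history: object 101 has many rows (including minor versions),
# object 102 has fewer rows but a higher version number.
toy = pd.DataFrame({
    'id':            [101, 101, 101, 101, 102, 102],
    'version':       [1, 1, 1, 2, 7, 8],
    'minor_version': [0, 1, 2, 0, 0, 0],
})

# Highest version number reached by each object, then the top object.
by_version = toy.groupby('id')['version'].max().idxmax()

# Alternative: the object with the most rows (full plus minor versions).
by_row_count = toy['id'].value_counts().idxmax()

print(by_version, by_row_count)
```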