# Not stand-alone
In contrast to most "solution" notebooks, this notebook isn't stand-alone. That's because running the MongoDB cluster means you shouldn't be running several notebooks at once. To see the results of these solutions, you'll have to copy the code into the [`16.1 Accidents over time`](16.1 Accidents over time.ipynb) notebook.

# Activity 1

In [None]:
# Solution
pipeline = [
    {'$group': {'_id': '$AADFYear',
                'count': {'$sum': '$FdAll_MV'}}},
    {'$sort': {'_id': 1}}
]
results = list(roads.aggregate(pipeline))
results

In [None]:
traffic_volume_by_year = pd.Series([y['count'] for y in results], 
          index=pd.to_datetime([datetime.datetime(y['_id'], 12, 31) for y in results]))
traffic_volume_by_year.plot()

Let's take the traffic for just 2009–12, and plot it with zero on the *y*-axis.

In [None]:
pipeline = [
    # Keep only the years 2009 to 2012 inclusive.
    {'$match': {'AADFYear': {'$gte': 2009, '$lte': 2012}}},
    {'$group': {'_id': '$AADFYear',
                'count': {'$sum': '$FdAll_MV'}}},
    {'$sort': {'_id': 1}}
]
results = list(roads.aggregate(pipeline))
traffic_volume_by_year = pd.Series([y['count'] for y in results], 
          index=pd.to_datetime([datetime.datetime(y['_id'], 12, 31) for y in results]))
traffic_volume_by_year.plot(ylim=(0, traffic_volume_by_year.max() * 1.1))

Traffic volumes barely changed over the same period, so the fall in accidents can't be put down to less traffic. It looks like some accident prevention measures have worked.

# Activity 2

In [None]:
# Generate the data.
pipeline = [
    {'$project': {'Accident_Severity': '$Accident_Severity',
                  'year': {'$year': '$Datetime'}}},
    {'$group': {'_id': {'Accident_Severity': '$Accident_Severity',
                        'year': '$year'},
                'count': {'$sum': 1}}},
    {'$sort': {'_id': 1}}
]
results = list(accidents.aggregate(pipeline))
results

In [None]:
severity_by_year_long_df = pd.DataFrame([
        {'Accident_Severity': r['_id']['Accident_Severity'],
         'year': r['_id']['year'],
         'count': r['count']}
        for r in results
    ])
severity_by_year_long_df

In [None]:
# Keyword arguments are required by recent versions of pandas.
severity_by_year_df = severity_by_year_long_df.pivot(index='year', columns='Accident_Severity', values='count')
severity_by_year_df.columns = [label_of['Accident_Severity', c] for c in severity_by_year_df.columns]
severity_by_year_df
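
The long-to-wide reshaping in the cell above can be tried on a tiny table first. This is a minimal sketch: the counts are made up for illustration, and `toy_long_df` and `toy_wide_df` are hypothetical names that don't appear in the main notebook.

```python
import pandas as pd

# Made-up counts, purely illustrative: not taken from the accidents dataset.
toy_long_df = pd.DataFrame([
    {'Accident_Severity': 1, 'year': 2009, 'count': 10},
    {'Accident_Severity': 2, 'year': 2009, 'count': 100},
    {'Accident_Severity': 1, 'year': 2010, 'count': 8},
    {'Accident_Severity': 2, 'year': 2010, 'count': 90},
])

# One row per year, one column per severity code, cells filled from 'count'.
toy_wide_df = toy_long_df.pivot(index='year',
                                columns='Accident_Severity',
                                values='count')
toy_wide_df
```

Each distinct `year` becomes a row label and each distinct `Accident_Severity` becomes a column, which is the shape that `plot()` needs to draw one line per severity.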

In [None]:
severity_by_year_df.plot()

The trends are difficult to spot because there are so many more slight accidents than accidents of the other types. What if we plot the trends using different *y*-axes?

In [None]:
severity_by_year_df['Slight'].plot(legend=True)
severity_by_year_df['Serious'].plot(secondary_y=True, legend=True)
severity_by_year_df['Fatal'].plot(secondary_y=True, legend=True)

This plot shows that the numbers of fatal and serious accidents have declined slightly faster than slight accidents. 
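
Another way to compare rates of decline, sketched here as an alternative rather than as part of the original solution, is to rescale every series so that its first year equals 100. The numbers below are made up for illustration, and `toy_df` and `toy_indexed_df` are hypothetical names.

```python
import pandas as pd

# Made-up counts, purely illustrative: not taken from the accidents dataset.
toy_df = pd.DataFrame(
    {'Slight': [1000, 950, 900, 880],
     'Serious': [200, 185, 170, 160],
     'Fatal': [20, 18, 16, 15]},
    index=[2009, 2010, 2011, 2012])

# Divide every row by the first row, column by column,
# so that each series starts at 100.
toy_indexed_df = toy_df / toy_df.iloc[0] * 100
toy_indexed_df
```

Calling `toy_indexed_df.plot()` then puts all three series on a single axis, where a steeper line means a faster relative decline. The same calculation could be applied to `severity_by_year_df`.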

Now we have the data, do pairwise Pearson correlation tests (Pearson's *r*) to see if the correlations are significant. If the mix of accidents changes, we should expect to see non-significant correlations.

In [None]:
scipy.stats.pearsonr(severity_by_year_df['Slight'], severity_by_year_df['Serious'])

In [None]:
scipy.stats.pearsonr(severity_by_year_df['Slight'], severity_by_year_df['Fatal'])

In [None]:
scipy.stats.pearsonr(severity_by_year_df['Serious'], severity_by_year_df['Fatal'])

These results all have _p_ values (the second of the two returned) greater than 0.05, so we cannot reject the null hypothesis that the trends are uncorrelated. In other words, we cannot say that the proportions of accidents have remained the same.
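
To make the two returned values easier to read, they can be unpacked into named variables. This is a minimal sketch on made-up numbers: `slight` and `fatal` are hypothetical lists, not the real counts.

```python
import scipy.stats

# Made-up counts, purely illustrative: not taken from the accidents dataset.
slight = [1000, 950, 900, 880]
fatal = [20, 18, 16, 15]

# pearsonr returns the correlation coefficient first and the p value second.
r, p = scipy.stats.pearsonr(slight, fatal)
print(f'r = {r:.3f}, p = {p:.3f}')

# Compare the p value with the usual 0.05 threshold.
if p > 0.05:
    print('Cannot reject the null hypothesis of no correlation.')
else:
    print('Reject the null hypothesis of no correlation.')
```

The same unpacking works on the columns of `severity_by_year_df` used in the cells above.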