# MBTA passenger data exploration

The [MBTA systemwide passenger survey dashboard](https://www.bostonmpo.org/dv/mbtasurvey2018/) presents an impressive amount of data in a well-organized way, but the underlying dataset contains even more than the dashboard shows.  In this project, you'll dig into the raw data to answer a few more questions, and then generate some plots of your own.


## Loading the data

The data is available for download as an [Excel spreadsheet file](https://www.bostonmpo.org/dv/mbtasurvey2018/MBTA%20systemwide%20survey%20results%20by%20station%20and%20line.xlsx).  To simplify things, we've converted it into a JSON file, `mbta_line_data.json`, that contains the same data.  It should have been copied to your working directory along with this notebook, and you can load it like any other JSON file:

In [None]:
import json
# Use a context manager so the file is closed once it has been read
with open("mbta_line_data.json") as f:
    data = json.load(f)

Take some time to poke around the data (you can open the JSON file in JupyterLab or a web browser to get a "tree view" of its structure) and figure out how it is organized.
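If the file is too large to browse comfortably, a small recursive helper can print an outline of its nesting instead. The sketch below is not part of the provided materials: `describe` is a hypothetical helper, and it assumes only that the file decodes to ordinary nested dicts, lists, and scalars. To keep the output short it shows at most `max_items` keys per dict and only the first element of each list, on the assumption that list items share a shape.

```python
def describe(node, level=0, max_level=3, max_items=5):
    """Summarize the shape of a nested JSON value as a list of indented lines."""
    pad = "    " * level
    if isinstance(node, dict):
        lines = [f"{pad}dict ({len(node)} keys)"]
        if level < max_level:
            for key in list(node)[:max_items]:
                lines.append(f"{pad}  {key!r} ->")
                lines.extend(describe(node[key], level + 1, max_level, max_items))
            if len(node) > max_items:
                lines.append(f"{pad}  ... {len(node) - max_items} more keys")
        return lines
    if isinstance(node, list):
        lines = [f"{pad}list ({len(node)} items)"]
        if node and level < max_level:
            # Assume every item has the same shape, so outline only the first one.
            lines.extend(describe(node[0], level + 1, max_level, max_items))
        return lines
    # Leaf value: show its type and a repr truncated to 40 characters.
    return [f"{pad}{type(node).__name__}: {node!r:.40}"]


# Tiny made-up value, just to show the output format.
print("\n".join(describe({"a": [1, 2], "b": {"c": "x"}})))
```

In the notebook, `print("\n".join(describe(data)))` would outline the loaded file; raising `max_level` reveals deeper layers.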

## Answering some data questions
Write code to help you find the answers to each of the following questions:

1. Which bus route carries the most passengers? (Note that the survey numbers have been normalized using actual rider counts from a similar time period to account for different response rates, so the published survey numbers are proportional to the number of riders.  The "count" value for each category holds the normalization factor that was applied; don't include it when you count passengers!)
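One way to respect the note about "count" is to sum every entry of a category except that one. The sketch below assumes, hypothetically, that each route's category is a dict mapping response labels to normalized rider numbers, with the normalization factor stored under `"count"`; check the real layout before relying on it. The helper names, route names, and numbers in the example are made up for illustration.

```python
def rider_total(category):
    """Sum the rider numbers in one survey category.

    The "count" entry is skipped because it holds the normalization
    factor, not a number of riders.
    """
    return sum(value for key, value in category.items() if key != "count")


def busiest_route(routes):
    """Return (route_name, total) for the route with the most riders.

    `routes` is assumed (hypothetically) to map each route name to one
    survey category dict of {response label: normalized rider number}.
    """
    totals = {name: rider_total(category) for name, category in routes.items()}
    best = max(totals, key=totals.get)
    return best, totals[best]


# Hypothetical layout and made-up numbers, for illustration only.
example_routes = {
    "Route X": {"count": 1.8, "Yes": 300.0, "No": 150.0},
    "Route Y": {"count": 2.4, "Yes": 200.0, "No": 100.0},
}
print(busiest_route(example_routes))  # ('Route X', 450.0)
```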

In [None]:
# Your code here...

2. Which route (across all modes: bus, subway, commuter rail, and so on) has the highest percentage of students riding to school?
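A share is the number of riders giving one response divided by the total for that category, again leaving out `"count"`. This sketch assumes a hypothetical layout in which each route name maps to a dict of response labels and normalized rider numbers; the response label `"School"` and the toy numbers are placeholders, not the survey's actual labels.

```python
def response_share(category, response):
    """Fraction of riders in one survey category who gave `response`.

    The "count" entry (the normalization factor) is excluded from the total.
    """
    total = sum(value for key, value in category.items() if key != "count")
    return category.get(response, 0) / total if total else 0.0


def route_with_highest_share(routes, response):
    """Name of the route whose riders most often gave `response`."""
    return max(routes, key=lambda name: response_share(routes[name], response))


# Hypothetical layout and made-up numbers, for illustration only.
example_routes = {
    "Route A": {"count": 1.2, "School": 10.0, "Work": 90.0},
    "Route B": {"count": 2.0, "School": 30.0, "Work": 70.0},
}
print(route_with_highest_share(example_routes, "School"))  # Route B
```

Working with shares rather than raw totals matters here: a large route can carry many students while still having a smaller student percentage than a quieter route.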

In [None]:
# Your code here...

3. Make a scatterplot showing the percentage of riders receiving [reduced fares](https://www.mbta.com/fares/reduced) versus the percentage of riders who are classified as low-income.
  Note that there are two reduced-fare categories (one for a monthly pass, and one for pay-per-ride), so you'll want to add both of these together and divide by the total number of fares.
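The reduced-fare percentage described above is (monthly-pass reduced fares + pay-per-ride reduced fares) / all fares. Here is a minimal sketch, assuming a hypothetical dict of fare type to normalized rider number that may also contain the `"count"` factor. The key names below are invented, so substitute the labels used in the real file.

```python
def reduced_fare_share(fares, reduced_keys):
    """Return the fraction of fares in `fares` that are reduced fares.

    `fares` is assumed (hypothetically) to map fare type -> normalized rider
    number and may include the "count" normalization factor, which is skipped.
    `reduced_keys` lists the reduced-fare categories to add together.
    """
    total = sum(value for key, value in fares.items() if key != "count")
    reduced = sum(fares.get(key, 0) for key in reduced_keys)
    return reduced / total if total else 0.0


# Made-up numbers for illustration; the key names are placeholders.
example_fares = {
    "count": 1.5,
    "Reduced monthly pass": 20.0,
    "Reduced pay-per-ride": 10.0,
    "Full fare": 70.0,
}
share = reduced_fare_share(example_fares, ["Reduced monthly pass", "Reduced pay-per-ride"])
print(f"{share:.0%}")  # 30%
```

Computing this value for every route gives the y coordinates. Paired with each route's low-income percentage as the x coordinates, the two lists can be passed to `matplotlib.pyplot.scatter(x, y)`.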

In [None]:
# Your code here...

## Making your own visualization

Construct your own visualization of the MBTA data.

Your visualization must:
* Display lots of data at once (e.g., show all the bus routes on one figure, or all of the demographic data for all of the subway lines).
* Allow the viewer to examine multiple variables, and ideally make connections between variables.
* Be written with matplotlib.

[The matplotlib gallery](https://matplotlib.org/stable/gallery/) is always good if you need inspiration.  You might also get ideas from the [Altair gallery](https://altair-viz.github.io/gallery/index.html), but note that you'll need to implement your visualization using matplotlib.

Data visualization is an iterative process.  You'll make a graph or two, analyze them to find interesting trends, and then redraw the graph to show those trends more clearly.  Experiment, try new ideas, and make mistakes!


In [None]:
# Your code here...

Fill in your answers to the questions below:
* What were your goals for your visualization?  What were you hoping to achieve?
* Why did you choose the encodings (position, length, color, etc.) you did?  Do they save space, invite comparison, etc.?
* What interesting things do you see from your visualization?
