<div align="center"> 
    <h2>Scoping Project Goals:</h2>
</div>

## PROJECT GOAL

The Seattle Region Partnership (SRP) would like an update on the estimated number of opportunity youth (OY) in South King County. According to a recent Seattle Times article, the number of OY in South King County has held steady at 19,000<sup>4</sup>. However, that estimate comes from a report that is over three years old. As Data Science Consultants, your task is to **inform the SRP on the current status of OY in South King County using updated data.**

## PROJECT REQUIREMENTS

At minimum, the SRP is expecting the following:

1.  A map that visualizes which parts of King County are a part of South King County:

- Research how to generate a map of King County in Python/pandas
- Research how to manipulate the map so that the South King County region can be highlighted
- Look into how South King County is defined by the data we're given (and by outside sources as well)
- Work out how to feed our own definition of SKC into the map so that it is highlighted
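
One possible way to approach the last two bullets, as a minimal sketch: keep our definition of SKC as an explicit list of PUMA codes and flag the matching rows. The codes in `SKC_PUMAS`, the `puma` column name, and the `flag_south_king_county` helper are all hypothetical placeholders, not the real definition.

```python
import pandas as pd

# Hypothetical placeholder codes: replace with the PUMAs that our research
# identifies as South King County.
SKC_PUMAS = {"00001", "00002"}


def flag_south_king_county(regions: pd.DataFrame, skc_pumas=SKC_PUMAS) -> pd.DataFrame:
    """Return a copy of `regions` with a boolean `is_skc` column."""
    flagged = regions.copy()
    # Compare as zero-padded strings so that 1 and "00001" are treated alike.
    codes = flagged["puma"].astype(str).str.zfill(5)
    flagged["is_skc"] = codes.isin(skc_pumas)
    return flagged


regions = pd.DataFrame({"puma": ["00001", "00002", "00003"]})
flagged = flag_south_king_county(regions)
print(flagged)
```

If the regions end up being loaded as a geopandas `GeoDataFrame`, the `is_skc` column could probably be passed to its `plot(column="is_skc")` method to colour the highlighted area, but that still needs to be researched.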

2.  An update of the estimated number of OY in South King County.  In addition to the estimate, be sure to include a breakdown of the count of OY by Public Use Microdata Area (PUMA) within South King County:

- Main goal:  Get an estimate of the current number of OY in SKC.
- Include a breakdown of this count by PUMA.  Think about what this means:
    - Look into what data is included in the PUMA data
    - Decide what should be included in the breakdown, e.g. other demographic information such as living expenses, income, etc.
    - How do we want to break the data down?  By age group, gender, living situation/circumstance?
    - Are there any other interpretations of what "including a breakdown of the count by PUMA within SKC" means?
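
A rough sketch of how the estimate and the PUMA breakdown might be computed, assuming person-level survey records with a weight column. The column names (`agep`, `sch`, `esr`, `pwgtp`) and the code values are assumptions modelled on census microdata naming and must be checked against the data dictionary; the rows are invented toy values.

```python
import pandas as pd

# Toy person-level records; every value here is invented for illustration.
people = pd.DataFrame({
    "puma":  ["A", "A", "A", "B", "B"],
    "agep":  [17, 20, 30, 23, 22],   # age in years
    "sch":   [1, 1, 1, 2, 1],        # assumed: 1 = not enrolled in school
    "esr":   [3, 6, 6, 6, 3],        # assumed: 3 = unemployed, 6 = not in labor force
    "pwgtp": [10, 20, 30, 40, 50],   # person weight
})


def oy_count_by_puma(df: pd.DataFrame) -> pd.Series:
    """Sum person weights for 16-24 year olds who are neither in school nor working."""
    is_oy = (
        df["agep"].between(16, 24)
        & (df["sch"] == 1)
        & df["esr"].isin([3, 6])
    )
    return df.loc[is_oy].groupby("puma")["pwgtp"].sum()


counts = oy_count_by_puma(people)
print(counts)
print("Total:", counts.sum())
```

Summing weights rather than counting rows matters here, because each survey record stands in for more than one person.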


3.  An update of the table “Opportunity Youth Status by Age” located on page 2 of the 2016 report “Opportunity Youth in the Road Map Project Region”:

- Find the "Opportunity Youth Status by Age" table mentioned above
- Update this table with the results found in part 2
    - Find the report mentioned here ("Opportunity Youth in the Road Map Project Region")
    - Find the table on page 2 of the report
    - Look into whether this table needs to be downloaded separately or whether it is part of the original data download and we just need to query it with SQL
    - Work out how to update this table with our results (initial thoughts:  query with SQL to get the table, turn it into a pandas DataFrame, then manipulate it with pandas)
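
The "query with SQL, then manipulate with pandas" idea could look roughly like the sketch below. An in-memory SQLite database stands in for the project database, and the `persons` table, its columns, and its rows are all invented for illustration. The layout of the resulting table is a guess at what the report's table looks like.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the real database (an assumption);
# the table name, columns, and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (agep INTEGER, is_oy INTEGER, pwgtp INTEGER);
    INSERT INTO persons VALUES
        (16, 1, 10), (18, 0, 20), (19, 1, 30),
        (21, 0, 40), (22, 1, 50), (24, 0, 60);
""")

# Pull the relevant rows into a DataFrame.
youth = pd.read_sql_query(
    "SELECT agep, is_oy, pwgtp FROM persons WHERE agep BETWEEN 16 AND 24;",
    conn,
)
conn.close()

# Bucket ages into the three groups, then sum weights per group and status.
youth["age_group"] = pd.cut(
    youth["agep"], bins=[15, 18, 21, 24], labels=["16-18", "19-21", "22-24"]
).astype(str)
youth["status"] = youth["is_oy"].map(
    {1: "Opportunity Youth", 0: "Not Opportunity Youth"}
)
table = youth.pivot_table(
    index="status", columns="age_group", values="pwgtp", aggfunc="sum"
)
table["Total"] = table.sum(axis=1)
print(table)
```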

4.  A visualization that highlights a trend between the 2016 report and current data:

- Decide on appropriate plots to use:
    - Initial thoughts:  bar graphs and line graphs would be the most appropriate for showing a trend between two data sets
    - A bar graph would suit categorical data (e.g. age) by showing 2016 and current data side by side.  The x-axis has age clusters (e.g. 16-18, 19-21, 22-24), with 2016 data in one colour and 2020 data in another.  The y-axis is the population count.  We can then repeat this for the other categories we broke down in part 2.
    - A line plot may be appropriate for showing any upward/downward trend between 2016 and now; it is better suited to continuous data than categorical data.  We would want to fill in the years between 2016 and 2020 to show the trend, otherwise a line drawn through only two points could be misleading.
    - Decide as a group on our visualisation standards:  matplotlib or seaborn?  Naming conventions, etc.
- Decide on what data we want to plot:
    - We obviously want to plot population differences, but how do we want to categorise the data?
    - What relationships do we want to highlight or investigate further?
- It might also be good to have a map visualisation comparing the population in 2016 vs. now in SKC (since we will already have worked out how to create a map visualisation in part 1)
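
The side-by-side bar graph described above could be sketched as follows, assuming we go with matplotlib via the pandas plotting API. The counts are placeholder numbers only, not estimates from either period, and the labels are suggestions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder counts only: NOT real estimates from either period.
counts = pd.DataFrame(
    {"2016 report": [100, 200, 300], "Current": [110, 190, 310]},
    index=["16-18", "19-21", "22-24"],
)

fig, ax = plt.subplots(figsize=(8, 5))
# One cluster per age group, one bar per data source.
counts.plot(kind="bar", ax=ax, rot=0)
ax.set_xlabel("Age group")
ax.set_ylabel("Estimated number of opportunity youth")
ax.set_title("Opportunity youth by age group: 2016 report vs. current data")
ax.legend(title="Data source")
```

Keeping the data in one DataFrame with a column per period means the same frame can be reused for other breakdowns by swapping the index.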

## Stretch Goals (optional):
Do we maybe want to pick a stretch goal each to own individually?  
Or perhaps we pick 1 or 2 stretch goals and work on them as a team?  
Or alternatively we could leave it open:  if you want to do one, do it; if you don't, don't.

1. Create a choropleth map of the count of OY by PUMA within South King County:

- Decide who might want to do this.
- Look into how to create a choropleth map.  This probably won't be too hard once parts 2 & 3 are done:  it should be a natural extension of that work, so it will mostly be a matter of working out how to make choropleth maps.

2. For South King County, create a choropleth map that shows the percentage of jobs for workers age 29 or younger out of the total number of jobs per census block:

- Decide who might want to do this.
- Same situation as stretch 1
- Extra data digging to find total number of jobs per census block.  
- Interpretations of the question:  
    - Are we finding the jobs available to workers aged 29 or younger, or the jobs currently filled by them?
    - Is the total number of jobs per census block the total of unfilled jobs available to that age group, or the total for all ages?
    - How is a census block defined?
- Calculate percentages based on the above digging.
- It may be good to add a 'percentage' column to the relevant table(s) created for previous work.  Then we can reference the same DataFrame when creating the visualisation.
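
A minimal sketch of the 'percentage' column, assuming one row per census block with a total job count and a count of jobs held by workers aged 29 or younger. The column names mimic the LODES workplace files (`C000` for all jobs, `CA01` for the youngest age band), which is an assumption to check against the data dictionary; the values are invented.

```python
import numpy as np
import pandas as pd

# Toy block-level job counts; all values are invented.
blocks = pd.DataFrame({
    "w_geocode": ["block_1", "block_2", "block_3"],
    "C000": [200, 50, 0],   # assumed: total jobs in the block
    "CA01": [50, 25, 0],    # assumed: jobs held by workers aged 29 or younger
})

# Blocks with no jobs get NaN instead of a divide-by-zero result.
total = blocks["C000"].replace(0, np.nan)
blocks["pct_29_or_younger"] = blocks["CA01"] / total * 100
print(blocks)
```

Leaving empty blocks as NaN rather than 0 keeps them distinguishable on a choropleth from blocks that have jobs but no young workers.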

3.  Of the census blocks where jobs for workers age 29 or younger are the majority of employed people, what are a few of the **industries** that employ this group of people?

- Decide who might want to do this
- Data digging for info on the industries that employ workers aged 29 or younger
    - Data from stretch 2 will be useful
- Might be good to create a visualisation for this (bar graph - industries on x-axis)
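
One way the industry question might be approached, sketched with invented data: filter to the blocks where young workers hold more than half of the jobs, then rank industries by job count across those blocks. The column names, including the three industry columns, are hypothetical stand-ins for whatever the real source provides.

```python
import pandas as pd

# Toy data with invented values and hypothetical column names.
blocks = pd.DataFrame({
    "block": ["b1", "b2", "b3"],
    "total_jobs": [100, 40, 60],
    "jobs_29_or_younger": [60, 10, 45],
    "retail": [50, 5, 20],
    "food_service": [30, 5, 35],
    "construction": [20, 30, 5],
})
industry_cols = ["retail", "food_service", "construction"]

# Keep blocks where workers aged 29 or younger hold more than half of the jobs.
is_majority = blocks["jobs_29_or_younger"] > 0.5 * blocks["total_jobs"]
young_majority = blocks[is_majority]

# Rank industries by their total jobs across those blocks.
top_industries = young_majority[industry_cols].sum().sort_values(ascending=False)
print(top_industries)
```

One caveat: if the industry counts cover workers of all ages, this shows which industries dominate youth-majority blocks, not which industries employ the young workers directly. The resulting Series could feed the bar graph suggested above via `top_industries.plot(kind="bar")`.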

Utilise additional data sources to support your recommendations:  e.g. [Census Bureau APIs](https://www.census.gov/data/developers/data-sets.html), [King County Open Data](https://data.kingcounty.gov/browse?limitTo=datasets&provenance=official), or [King County GIS Open Data](https://gis-kingcounty.opendata.arcgis.com/)

## DELIVERABLES

To complete this project, you will need to turn in the following deliverables:

1. A public GitHub repository with a well organized directory structure (this structure has been provided for you in this project, but will not be provided in future projects)
2. An `environment.yml` file that contains all the necessary packages needed to recreate your conda environment.
    - Start with the provided `environment.yml`, then as you install any additional packages be sure to [export](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#exporting-the-environment-yml-file) the new version and commit the changes in git.
    - For Windows users, generate a `windows.yml` based on the provided `windows.yml`
    - For Linux users, generate a `linux.yml` based on the provided `linux.yml`
3. A standalone `src/` directory that stores all relevant source code.
    - Although you may not be able to achieve this goal in Mod 1, we encourage you to package code into `.py` files, store them in `src`, and then import the functions into the appropriate notebooks. Beware of premature optimization, however.  Don't try to package your code before it works.
    - All functions have docstrings that act as [professional-quality documentation](http://google.github.io/styleguide/pyguide.html#381-docstrings).
    - [Well documented](https://www.sqlstyle.guide/) SQL queries with appropriate single-line or multiline comments.
4. A user-focused `README.md` file that explains your process, methodology and findings.
    - Provide a directory of your repository so a visitor will know where to look for your report notebook, your source code, etc. 
    - Take the time to craft your story well, and explain your process and findings in a way that shows both your technical expertise and your ability to communicate your results!
    - Begin with framing questions, describe your data source, and include relevant, well-labeled visualizations that support your conclusions, which come at the end.
5. A record of your workflow stored in `notebooks/exploratory`.  Don't be afraid to leave in error messages, so you know what didn't work!
6. One final Jupyter Notebook file stored in `notebooks/report` that focuses on visualization and presentation.
    - The very beginning of the notebook contains a description of the purpose of the notebook.
       - This is helpful for your future self and anyone of your colleagues that needs to view your notebook. Without this context, you’re implicitly asking your peers to invest a lot of energy to help solve your problem. Help them by enabling them to jump into your project by providing them the purpose of this Jupyter Notebook.
    - Explanation of the data sources and where one can retrieve them
        - Whenever possible, link to the corresponding data dictionary
    - We encourage you to import custom functions and classes from Python modules and not create them directly in the notebook.  As soon as you have a working function in one of your exploratory notebooks, copy it over to `src` so it is reusable.
    - Much of the content in the report will be shared with the README.
7. An "Executive Summary" Keynote/PowerPoint/Google Slides presentation (delivered as a PDF export) that explains what you have found for the SRP. The presentation that accompanies the deck should be 4-5 minutes, so use your space wisely.
    - Add and commit the PDF of your non-technical presentation to your repository as `reports/presentation.pdf`.
    - The deck should contain 5-10 professional-quality slides detailing:
       - A high-level overview of your methodology
       - The results you’ve uncovered
       - Any real-world recommendations you would like to make based on your findings (ask yourself--why should the executive team care about what you found? How can your findings help the company/stakeholder?)
       - Avoid technical jargon and explain results in a clear, actionable way for non-technical audiences.
    - All visualizations included in this presentation should also be exported as image files (e.g. with `plt.savefig`, not by taking a screenshot) and saved under `reports/figures/`
8. Be sure to generate at least 3 high-quality, well-labeled visualizations that support your conclusions. There should be a clear takeaway from each. These visualizations will reappear in the README, Jupyter Notebook report, and presentation deck.
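
As an illustration of two of the deliverables above (documented functions in `src/` and figures exported with `savefig`), a small helper might look like the sketch below. The function name `save_figure` and its parameters are hypothetical, and the demo writes to a temporary directory rather than the real `reports/figures/`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside a notebook
import matplotlib.pyplot as plt


def save_figure(fig, name, out_dir="reports/figures"):
    """Save a matplotlib figure as a PNG and return the file path.

    Args:
        fig: The matplotlib Figure to export.
        name: File name without extension, e.g. "oy_by_age".
        out_dir: Directory to write to; created if it does not exist.

    Returns:
        The path of the written PNG file as a string.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{name}.png")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    return path


fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_title("Example figure")
# A temporary directory keeps this demo from touching the real repository.
demo_path = save_figure(fig, "example", out_dir=tempfile.mkdtemp())
print(demo_path)
```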

## Citations

<sup>1</sup> Yohalem, N., Cooley, S. 2016. “Opportunity Youth in the Road Map Project Region”. Community Center for Education Results. Available at: https://bit.ly/2P2XRF3.

<sup>2</sup> Anderson, T., Braga, B., Derrick-Mills, T., Dodkowitz, A., Peters, E., Runes, C., and Winkler, M. 2019. “New Insights into the Back on Track Model’s Effects on Opportunity Youth Outcomes”. Urban Institute. Available at: https://bit.ly/2BuCLr1.

<sup>3</sup> Seattle Region Partnership. 2016. “King County Opportunity Youth Overview: Demographics of opportunity youth and systemic barriers to employment”. Available at: https://bit.ly/2oRGz37.

<sup>4</sup> Morton, N. 2019. “Nearly 19,000 youth in King County are neither working nor in school. How one Seattle nonprofit is changing that.” The Seattle Times. Available at: https://bit.ly/2W5EufR.