# [CptS 215 Introduction to Algorithmic Problem Solving](https://github.com/gsprint23/cpts215)
[Washington State University](https://wsu.edu)

[Gina Sprint](http://eecs.wsu.edu/~gsprint/)
## Project (100 pts)

### Learner Objectives
At the conclusion of this project, participants should be able to:
* Write a project proposal
* Define data science questions
* Select/define/clean an appropriate dataset
* Implement algorithms and data structures to support answering data science questions
* Create, update, and maintain a GitHub repository

### Prerequisites
Before starting this project, participants should be able to:
* Write object-oriented code in Python
* Implement data structures and algorithms covered in CptS 215
* Write Markdown and code cells in Jupyter Notebook

### Acknowledgments
Content used in this assignment is based upon information in the following sources:
* None to report

### Overview and Requirements
For the final project in CptS 215, you are going to implement your own data analytics system in pairs. As long as you conform to the requirements specified in this document, you will have the freedom to define the project topic, the dataset, and the analysis performed.

The overview of the project timeline is as follows:
1. Schedule a meeting with your partner
1. Brainstorm a project
1. Write a project proposal
1. Implement the project
1. Present your project

More details about these topics are in the following sections.

#### Brainstorm a Project
With your partner, identify several topics that interest both of you. For example, analytics applies to sports, politics, news, education, etc.

Next, brainstorm data related to those topics that could be collected or may already exist. For example, if a topic you are both interested in is fitness, you could collect your own Fitbit data to analyze. If a topic you are interested in is climate change, you may search the internet to find publicly available datasets, web services you could query, or sites you can *ethically* scrape. A great place to start looking for open access datasets is [data.gov](https://www.data.gov/). 

Now, brainstorm problems related to the topic and data that are important to solve. For example, the Google Maps API graph mining case study we looked at solved important problems related to travel, commuting, and routing.

Here are a few project ideas to help get your creative juices flowing:
* Mine current tweet trends/content using the [Twitter API](https://dev.twitter.com/overview/api), the [`twython`](https://twython.readthedocs.io/en/latest/) Python library, and natural language processing (e.g. an [n-gram model](https://en.wikipedia.org/wiki/N-gram)).
* Extend your decision tree code to implement a [Random Forest](https://en.wikipedia.org/wiki/Random_forest) classifier. Test your random forest on a machine learning dataset.
* Implement [Google's PageRank algorithm](https://en.wikipedia.org/wiki/PageRank) and apply it to a network of pages related to a current world affair.
* Read about [recommender systems](https://en.wikipedia.org/wiki/Recommender_system). Implement a recommender system based on information scraped from the web or queried from APIs like [Yelp](https://www.yelp.com/developers/documentation/v2/overview) or [Spotify](https://developer.spotify.com/web-api/).
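
As an illustration of the n-gram idea mentioned in the first project idea, here is a minimal sketch that counts word bigrams in a handful of short texts. The function names `ngrams` and `count_ngrams`, the sample sentences, and the whitespace tokenization are all assumptions made for this example; a real project would likely need more careful text cleaning.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n=2):
    """Count n-grams across several texts (hypothetical helper).

    Assumes lowercasing and whitespace splitting are good enough tokenization.
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


# Made-up sample data, standing in for tweets or other short documents
sample = ["the cat sat on the mat", "the cat ate"]
bigram_counts = count_ngrams(sample, n=2)
print(bigram_counts.most_common(1))  # [(('the', 'cat'), 2)]
```

The most frequent n-grams give a rough picture of which phrases dominate a collection of texts.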

Here are a few dataset sites to check out when looking for publicly available datasets:
* [data.gov](https://www.data.gov/)
* [CORGIS (collection of really great, interesting, situated datasets) from Virginia Tech](https://think.cs.vt.edu/corgis/)
* [UC Irvine machine learning repository](http://archive.ics.uci.edu/ml/)
* [Wikipedia list of datasets for machine learning](https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research)
* [kdnuggets](http://www.kdnuggets.com/datasets/index.html)
* [WSU CASAS smart home datasets](http://casas.wsu.edu/datasets/)

Here are a few sites that list available APIs (both authenticated and unauthenticated):
* [http://www.pythonforbeginners.com/development/list-of-python-apis/](http://www.pythonforbeginners.com/development/list-of-python-apis/)
* [https://shkspr.mobi/blog/2014/04/wanted-simple-apis-without-authentication/](https://shkspr.mobi/blog/2014/04/wanted-simple-apis-without-authentication/)

#### Propose the Project
Formally write up your proposed project. Your write-up should be single-spaced, at least one page long, and emailed to the instructor by the deadline specified in the course schedule.

Content to be included in the proposal:
1. Names of team members
1. Project name (you pick!)
1. Project description
    1. Motivation
    1. Stakeholders
    1. Dataset(s) and sources
    1. Impact
1. Implementation
    1. OOP design
    1. Data structures (see section below)
    1. Python libraries
1. Proposed analysis
    1. Data mining techniques
    1. Questions asked/answered
    1. Results
1. Days and times during dead week when your team is available to present (see section below)

#### Implement the Project
Your project will be turned in via the "Project" assignment on the [CptS 215 GitHub Classroom](https://classroom.github.com/classrooms/27835955-wsucpts215classroom) site. The invitation link to this project is: [https://classroom.github.com/group-assignment-invitations/404d2af48848ff23a455616bca1880c6](https://classroom.github.com/group-assignment-invitations/404d2af48848ff23a455616bca1880c6).

Your implementation should include at least one data structure (or a variation of one) that we covered in CptS 215 from the following list:
* Hash table
* Tree (BST/AVL/B+ tree)
* Heap
* Graph

Note: you must implement this data structure yourself.
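
To give a sense of what "implementing it yourself" might look like, here is a minimal sketch of a hash table that uses separate chaining. The class name `ChainedHashTable`, its method names, and the resize threshold are assumptions made for this example, not course requirements. It stores buckets in built-in Python lists; check with the instructor if you are unsure which built-ins are acceptable.

```python
class ChainedHashTable:
    """Hash table with separate chaining (hypothetical example class)."""

    def __init__(self, capacity=8):
        # Each bucket is a list of (key, value) pairs
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _index(self, key):
        """Map a key to a bucket index."""
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        """Insert a new key or overwrite the value of an existing key."""
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1
        # Assumed policy: grow once the load factor exceeds 2
        if self._size > 2 * len(self._buckets):
            self._resize()

    def get(self, key, default=None):
        """Return the value stored for key, or default if key is absent."""
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def _resize(self):
        """Double the number of buckets and rehash every stored pair."""
        old_items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in old_items:
            self._buckets[self._index(k)].append((k, v))

    def __len__(self):
        return self._size


# Usage: count word frequencies with the table
table = ChainedHashTable()
for word in "to be or not to be".split():
    table.put(word, table.get(word, 0) + 1)
print(table.get("to"), table.get("be"), len(table))  # 2 2 4
```

A structure like this could back the data wrangling or data mining step, for example by mapping words, users, or pages to counts or records.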

The directory structure of your project GitHub repository should be organized in the following manner:
1. `README.md`
    1. Describe your project in the format of a traditional [GitHub README file](https://help.github.com/articles/about-readmes/)
1. Source code folder (e.g. `src`)
    1. One Jupyter Notebook that tells the story of your project
        * Is the main driver of your program
        * All descriptions, relevant formulas, plots, and summary results are inline
        * All auxiliary code (classes and functions) is in separate .py files and is imported
    1. At least one .py file that implements the auxiliary code (classes and functions) of your program
    1. All other necessary code dependencies
1. Data folder (e.g. `data`)
    1. All datasets you tested your program with
1. Results folder (e.g. `results`)
    1. Write all results to this folder in comma separated value (.csv) files
1. Documentation folder (e.g. `docs`)
    1. Project proposal
    1. Sources/sites/tutorials etc. used 
1. Any other directories/files your team deems necessary (e.g. `etc`)
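
The requirement to write results as .csv files could be met with a small helper like the one sketched below. The function name `write_results`, the column names, and the sample rows are hypothetical. The sketch assumes each result is a dictionary, that all rows share the same keys, and that the list of rows is not empty.

```python
import csv
import tempfile
from pathlib import Path


def write_results(rows, filename, results_dir="results"):
    """Write a non-empty list of dicts to results_dir/filename as CSV.

    Hypothetical helper: assumes every row has the same keys as the first row.
    Returns the path of the file that was written.
    """
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    out_path = out_dir / filename
    fieldnames = list(rows[0].keys())
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Made-up results; a temporary folder is used here so the sketch runs anywhere.
# In a project repository you would keep the default results_dir="results".
rows = [{"word": "to", "count": 2}, {"word": "be", "count": 2}]
with tempfile.TemporaryDirectory() as tmp:
    path = write_results(rows, "word_counts.csv", results_dir=tmp)
    print(path.read_text())
```

Keeping a helper like this in a .py file and importing it into the notebook matches the layout described above, where the notebook is the driver and auxiliary code lives in separate modules.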
    
#### Present the Project
During dead week, each team will present its project. The presentation should be about 15 minutes long and cover the major topics of the proposal, some key components of the code, and a demo. After the 15-minute presentation, attendees will have time to ask questions. The instructor and at least 3 students will attend each presentation. Your team will be assigned one half-hour time slot to present. The available time slots are listed below. In your project proposal, state all of the time slots during dead week in which your team is available to present.

|Date|Time|Team presenting|Peer reviewer #1|Peer reviewer #2|Peer reviewer #3|
|-|-|-|-|-|-|
|12/4 (M)| | | | | |
|12/5 (Tu)| | | | | |
|12/6 (W)| | | | | |
|12/7 (Th)| | | | | |
|12/8 (F)| | | | | |

Individually, you will be assigned 3 time slots in which to attend other teams' presentations. Once the team presentation times have been identified, the time table above will be updated, and you will need to email your instructor the presentations you are able to peer review. When you attend another team's presentation, you will peer review your fellow students' project and presentation. The rubric to use when peer reviewing is [available here](https://github.com/gsprint23/cpts215/blob/master/project/peer_evaluation.docx).

### Bonus (10 pts)
Make a high-quality 2-4 minute video that showcases your project. The video should include the following information about your project:
* Motivation
* Context
* Algorithm description
* Demo
* Impact

While you may include slides in your video, these should be kept to a minimum. Include footage of each team member speaking, describing key code snippets, interacting with the data, and presenting the results. 

### Submitting Assignments
1. Submit your code via the [CptS 215 GitHub Classroom project assignment](https://classroom.github.com/classrooms/27835955-wsucpts215classroom). You must upload all of your code and supporting documentation by the due date and time.
1. Your GitHub repository should contain all of your source files, data files, results files, and documentation files (see the previous repository description).

### Grading Guidelines
This assignment is worth 100 points + 10 points bonus. Your assignment will be evaluated based on whether your program runs successfully and adheres to the program requirements. We will grade according to the following criteria:
* 11 pts for project proposal
* 9 pts (3 pts each) for peer reviewing other teams' project presentations
* 65 pts for implementation
    * 10 pts for Jupyter Notebook storytelling of your data science process
    * 5 pts for data wrangling and storage using appropriate data structures
    * 15 pts for data mining using at least one implemented CptS 215 data structure
    * 10 pts for relevant and clearly described/organized results
    * 10 pts for GitHub repository organization and code collaboration
    * 10 pts for class and function design
    * 5 pts for adherence to proper programming style and comments established for the class
* 15 pts for presentation
    * 5 pts average of peer reviewers' evaluations
    * 10 pts instructor evaluation