# Phase 1 Project Overview - FLEX 

- 08/23/2021

## Objectives

- Review the Phase 1 Project README
- Discuss some General Project Tips and Advice:
    - Getting Started
    - Business Case/Stakeholder
    - The Data
- Walk through Getting Started and coding a loop to preview all files. 

# 📖PHASE 1 PROJECT README

## Phase 1 Project

You've made it all the way through the first phase of this course - take a minute to celebrate your awesomeness!

![awesome](https://raw.githubusercontent.com/learn-co-curriculum/dsc-phase-1-project/master/awesome.gif)

Now you will put your new skills to use with a large end-of-Phase project! This project should take 20 to 30 hours to complete.

### Project Overview

For this project, you will use exploratory data analysis to generate insights for a business stakeholder.

#### Business Problem

Microsoft sees all the big companies creating original video content and they want to get in on the fun. They have decided to create a new movie studio, but they don’t know anything about creating movies. You are charged with exploring what types of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the head of Microsoft's new movie studio can use to help decide what type of films to create.

#### The Data

In the folder `zippedData` are movie datasets from:

* [Box Office Mojo](https://www.boxofficemojo.com/)
* [IMDB](https://www.imdb.com/)
* [Rotten Tomatoes](https://www.rottentomatoes.com/)
* [TheMovieDB](https://www.themoviedb.org/)
* [The Numbers](https://www.the-numbers.com/)

It is up to you to decide what data from this to use and how to use it. If you want to make this more challenging, you can scrape websites or make API calls to get additional data. If you are feeling overwhelmed or behind (e.g. struggled with the Phase 1 Code Challenge), we recommend you use only the following data files:

* imdb.title.basics
* imdb.title.ratings
* bom.movie_gross

### Deliverables

There are three deliverables for this project:

* A **GitHub repository**
* A **Jupyter Notebook**
* A **non-technical presentation**

Review the "Project Submission & Review" page in the "Milestones Instructions" topic for instructions on creating and submitting your deliverables. Refer to the rubric associated with this assignment for specifications describing high-quality deliverables.

#### Key Points

* **Your analysis should yield three concrete business recommendations.** The ultimate purpose of exploratory analysis is not just to learn about the data, but to help an organization perform better. Explicitly relate your findings to business needs by recommending actions that you think the business (Microsoft) should take.

* **Communicating about your work well is extremely important.** Your ability to provide value to an organization - or to land a job there - is directly reliant on your ability to communicate with them about what you have done and why it is valuable. Create a storyline your audience (the head of Microsoft's new movie studio) can follow by walking them through the steps of your process, highlighting the most important points and skipping over the rest.

* **Use plenty of visualizations.** Visualizations are invaluable for exploring your data and making your findings accessible to a non-technical audience. Spotlight visuals in your presentation, but only ones that relate directly to your recommendations. Simple visuals are usually best (e.g. bar charts and line graphs), and don't forget to format them well (e.g. labels, titles).

### Getting Started

Please start by reviewing this assignment, the rubric at the bottom of it, and the "Project Submission & Review" page. If you have any questions, please ask your instructor ASAP.

Next, we recommend you check out [the Phase 1 Project Templates and Examples repo](https://github.com/learn-co-curriculum/dsc-project-template) and use the MVP template for your project.

Alternatively, you can fork [the Phase 1 Project Repository](https://github.com/learn-co-curriculum/dsc-phase-1-project), clone it locally, and work in the `student.ipynb` file. Make sure to also add and commit a PDF of your presentation to your repository with a file name of `presentation.pdf`.

### Project Submission and Review

Review the "Project Submission & Review" page in the "Milestones Instructions" topic to learn how to submit your project and how it will be reviewed. Your project must pass review for you to progress to the next Phase.

### Summary

This project will give you a valuable opportunity to develop your data science skills using real-world data. The end-of-phase projects are a critical part of the program because they give you a chance to bring together all the skills you've learned, apply them to realistic projects for a business stakeholder, practice communication skills, and get feedback to help you improve. You've got this!
___

# ⭐️General Project Tips & Advice 


## Getting Started

1. **Primary project repo:**
    - I recommend forking the dsc-phase-1-project repo to use as your project repo:
        - Fork: https://github.com/learn-co-curriculum/dsc-phase-1-project
        - Clone your fork. Do all of your work for the project inside this folder.
            
2. **Rename key files to prepare for merging with the template project:**
    - Rename `README.md` -> `project_assignment.md` (or `original_README.md`)
    - Either delete `student.ipynb` or rename it -> `original_student.ipynb`
        
    
3. **Download and add the project-template files:**
    - I recommend **downloading** the template repo branch of your choice.
    - Go to: https://github.com/learn-co-curriculum/dsc-project-template/tree/template-mvp
    - Make sure you are on the template branch, then click the green "`Code`" button on github.com and select "`Download ZIP`".
    - Unzip the file (your Downloads folder is fine).



4. Now, **copy all of the non-hidden files from the unzipped folder to your cloned dsc-phase-1-project repo's folder.**
    - Do not copy the hidden files:
        - `.git`
        - `.gitignore`

5. **After copying the files to your fork of the dsc-phase-1-project repo:**
    - Rename `README.md` -> `how_to_use_the_template.md` (you can eventually delete it from the final repo)
    - Rename `TEMPLATE_README.md` -> `README.md`
    - Rename `dsc-phase1-project-template.ipynb` -> `student.ipynb`
    
6. **Open the newly renamed "`student.ipynb`" and get started!**
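
If you would rather script steps 2, 4, and 5 than do them by hand, the following is a minimal sketch under some assumptions: `merge_template_into_repo` is a hypothetical helper name, both folder paths are placeholders you supply, and it takes the "rename" option for `student.ipynb` rather than deleting it.

```python
from pathlib import Path
import shutil


def merge_template_into_repo(repo_dir, template_dir):
    """Sketch of steps 2, 4, and 5. Both arguments are paths you supply."""
    repo, template = Path(repo_dir), Path(template_dir)

    # Step 2: move the original files out of the way.
    (repo / "README.md").rename(repo / "project_assignment.md")
    (repo / "student.ipynb").rename(repo / "original_student.ipynb")

    # Step 4: copy everything except hidden entries such as .git and .gitignore.
    for item in template.iterdir():
        if item.name.startswith("."):
            continue
        target = repo / item.name
        if item.is_dir():
            shutil.copytree(item, target, dirs_exist_ok=True)
        else:
            shutil.copy2(item, target)

    # Step 5: rename the files that came from the template.
    (repo / "README.md").rename(repo / "how_to_use_the_template.md")
    (repo / "TEMPLATE_README.md").rename(repo / "README.md")
    (repo / "dsc-phase1-project-template.ipynb").rename(repo / "student.ipynb")
```

Call it once with the path to your cloned fork and the path to the unzipped template folder. It is not safe to run twice, because the renames in step 2 assume the original file names are still in place.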
___

## Business Case/Stakeholder 

#### Put yourself in the shoes of your **stakeholders**
 * What do they want to know?
 * Is the information geared towards them (i.e., non-data scientists)?
 * You are providing **insights** which lead to **actionable items**
   


#### Emphasize your points with a narrative
* We like stories: both technical & non-technical
* Use visuals to emphasize your points (**explanatory visualizations**)
* Each slide should contain a single idea that reinforces what you are saying

## The Data

- There are 11 different CSV files provided in the `zippedData` folder.
- Using these files is optional. 




### Approaches to consider:

1. **Using the provided CSVs and merging dataframes**
2. Sourcing your own dataset using an API
3. Sourcing your own dataset using web scraping
4. Supplementing one or more of the provided CSVs with data from an API


### Using the Provided Data - Join with Pandas

>#### Questions to consider:
>1. Where is the financial data?
>2. What columns are primary keys/unique indices?
>3. What tables could I join on what columns?
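
One way to work through the three questions above is sketched here on toy data. Every table, column name, and value in it is invented for illustration, and the keyword list used to spot financial columns is only a guess; inspect the real files to find the actual names.

```python
import pandas as pd

# Toy stand-ins for three provided tables. All column names and values here
# are invented for illustration; inspect the real files for the actual ones.
tables = {
    "basics": pd.DataFrame({"movie_id": ["m1", "m2", "m3"],
                            "title": ["Alpha", "Beta", "Gamma"]}),
    "ratings": pd.DataFrame({"movie_id": ["m1", "m2", "m4"],
                             "rating": [7.1, 6.3, 8.0]}),
    "gross": pd.DataFrame({"title": ["Alpha", "Beta", "Beta"],
                           "domestic_gross": [100.0, 250.0, 40.0]}),
}

# Question 1: which tables have columns that look financial?
money_words = ("gross", "budget")  # hypothetical keywords
financial_columns = {
    name: [col for col in df.columns if any(w in col.lower() for w in money_words)]
    for name, df in tables.items()
}

# Question 2: which columns hold a different value on every row?
unique_columns = {
    name: [col for col in df.columns if df[col].is_unique]
    for name, df in tables.items()
}

# Question 3: join two tables on a shared key and see which rows matched.
merged = tables["basics"].merge(
    tables["ratings"],
    on="movie_id",
    how="outer",
    indicator=True,         # adds a _merge column: both / left_only / right_only
    validate="one_to_one",  # raises MergeError if movie_id repeats on either side
)

print(financial_columns)
print(unique_columns)
print(merged)
```

A column that passes the uniqueness check is only a candidate key: a numeric column such as the toy `rating` can be unique by coincidence, so confirm that the column is meant to identify a row before joining on it.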

### If you feel that you are behind on material

>- Then limit yourself to using just the following 3 provided tables:
>    - imdb.title.basics
>    - imdb.title.ratings
>    - bom.movie_gross

# 🚸 Walkthrough

- [ ] Walk through the steps above
- [ ] Write a loop to visualize a preview of all provided data files
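
A possible starting point for the preview loop is sketched below. It assumes the files sit in a `zippedData` folder next to the notebook, that each file name contains `.csv` or `.tsv`, and that any compressed files carry an extension pandas recognizes (such as `.gz`) so the compression can be inferred. `preview_data_files` is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd


def preview_data_files(folder="zippedData", n_rows=5):
    """Load every CSV/TSV file in `folder`; return {file name: DataFrame}."""
    tables = {}
    folder = Path(folder)
    if not folder.is_dir():
        return tables  # nothing to preview if the folder is missing

    for path in sorted(folder.iterdir()):
        name = path.name
        # Pick the separator from the file name; skip anything else.
        if ".csv" in name:
            sep = ","
        elif ".tsv" in name:
            sep = "\t"
        else:
            continue

        # pandas infers compression (e.g. gzip) from the file extension.
        df = pd.read_csv(path, sep=sep)
        tables[name] = df

        print(f"=== {name}: {df.shape[0]} rows x {df.shape[1]} columns ===")
        print(df.head(n_rows))
        print()
    return tables


# Keep the loaded tables so they can be explored and merged later.
tables = preview_data_files("zippedData")
```

In a notebook, swapping `print(df.head(n_rows))` for `display(df.head(n_rows))` renders each preview as a formatted table. If a file raises a `UnicodeDecodeError`, passing an `encoding` argument to `pd.read_csv` (for example `encoding="latin-1"`) is one thing to try.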

## Questions?

- Any questions or comments?